## Hugging Face
Sentence Transformers is a Python library, built on top of Hugging Face Transformers, for tasks that require meaningful sentence-level embeddings. Its models are hosted on the Hugging Face Hub, so the first step is to create an access token.

- Go to https://huggingface.co/, create an account, and complete the email verification.
- Click your profile icon (top right) => Settings => Access Tokens => Create new token.
- Copy the token and save it in a `.env` file on your system, in the form shown below.
- `HF_TOKEN="Your-TOKEN"`


In [2]:
from dotenv import load_dotenv
import os
load_dotenv()

True
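
Once `load_dotenv()` has run, it can help to fail early with a clear message if the token did not make it into the environment. The helper below is a minimal sketch: `require_env` is a hypothetical name chosen for illustration, not part of `python-dotenv` or any Hugging Face package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("HF_TOKEN")` right after `load_dotenv()` would surface a missing or empty token before any model is loaded.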

## Hugging Face Sentence Transformers
Sentence Transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. The `HuggingFaceEmbeddings` class uses one of its embedding models. An alias, `SentenceTransformerEmbeddings`, is also provided for users who are more familiar with using that package directly.

In [10]:
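from langchain_huggingface import HuggingFaceEmbeddings

# load_dotenv() has already placed HF_TOKEN in the environment; this line
# raises TypeError if the variable is missing from .env.
os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN')

# Load the all-MiniLM-L6-v2 Sentence Transformers model as the embedding backend.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")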


In [None]:
text = "This is atest documents"
query_result = embeddings.embed_query(text)
query_result

In [12]:
len(query_result)

384

In [13]:
doc_result = embeddings.embed_documents(["This is atest documents","This is not a test document."])
doc_result[0]

[-0.04311219975352287,
 0.13562117516994476,
 0.022339945659041405,
 0.007216725964099169,
 0.03421059250831604,
 0.0240340419113636,
 -0.024848798289895058,
 0.04566730558872223,
 0.018850604072213173,
 0.04899340122938156,
 -0.004306007642298937,
 0.05968935042619705,
 0.002952359616756439,
 -0.05999087914824486,
 -0.1198037713766098,
 -0.005690686404705048,
 -0.020968446508049965,
 0.00972126517444849,
 0.04023443162441254,
 0.050469979643821716,
 -0.002160770585760474,
 0.0988808199763298,
 0.021964702755212784,
 -0.0585198700428009,
 0.02956192009150982,
 0.004117692355066538,
 -0.09333010017871857,
 -0.04305516555905342,
 0.06968400627374649,
 -0.046840813010931015,
 0.04395326226949692,
 0.010073404759168625,
 0.09620821475982666,
 0.02793021872639656,
 0.07333722710609436,
 -0.012976840138435364,
 0.07613669335842133,
 -0.011923229321837425,
 0.011215190403163433,
 -0.008163253776729107,
 -0.010897460393607616,
 -0.07058071345090866,
 -0.02759602479636669,
 -0.00615308247506618
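
Each embedding is a plain list of floats, and two texts are usually compared through the cosine similarity of their vectors. The following is a minimal sketch in plain Python; `cosine_similarity` is a helper written for illustration here, not a LangChain function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the variables from the cell above, `cosine_similarity(doc_result[0], doc_result[1])` would give a score for how close the two example sentences are in embedding space.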

In [14]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
docs = loader.load()
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Good morning everyone,\n\nToday, we gather here to discuss the future of artificial intelligence and its impact on our society. Over the past decade, AI has grown from a theoretical concept to a transformative force reshaping industries, economies, and the way we live our lives.\n\nArtificial intelligence has the potential to revolutionize healthcare by enabling earlier diagnoses, personalized treatments, and improved patient care. It can optimize supply chains, reduce waste, and improve sustainability in industries. In education, AI-powered tools are helping students learn in more personalized and effective ways.\n\nHowever, with these opportunities come challenges. Issues like bias in algorithms, data privacy concerns, and the displacement of jobs require thoughtful consideration. It is our responsibility to ensure that AI is developed and deployed ethically and inclusively.\n\nLet us work together to harness the power of art

In [15]:
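# Recent LangChain releases ship the splitters in the langchain_text_splitters package.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of at most 500 characters, with up to 10 characters shared between neighbours.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)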

splitted_doc = text_splitter.split_documents(docs)

splitted_doc

[Document(metadata={'source': 'speech.txt'}, page_content='Good morning everyone,\n\nToday, we gather here to discuss the future of artificial intelligence and its impact on our society. Over the past decade, AI has grown from a theoretical concept to a transformative force reshaping industries, economies, and the way we live our lives.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Artificial intelligence has the potential to revolutionize healthcare by enabling earlier diagnoses, personalized treatments, and improved patient care. It can optimize supply chains, reduce waste, and improve sustainability in industries. In education, AI-powered tools are helping students learn in more personalized and effective ways.'),
 Document(metadata={'source': 'speech.txt'}, page_content='However, with these opportunities come challenges. Issues like bias in algorithms, data privacy concerns, and the displacement of jobs require thoughtful consideration. It is our responsibility to e
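
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of overlapping chunks. It cuts at fixed character offsets, which is an assumption made for illustration: `RecursiveCharacterTextSplitter` instead prefers to break on separators such as blank lines, which is why the chunks above line up with the paragraphs of the speech. `sliding_window_chunks` is a hypothetical helper, not a LangChain API.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `sliding_window_chunks("abcdefghij", 4, 1)` produces three windows in which each one starts with the last character of the previous one.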

In [16]:
from langchain_community.vectorstores import Chroma

# Use the same Hugging Face embedding model to index the chunks.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Embed every chunk and store the vectors in a Chroma collection.
db = Chroma.from_documents(splitted_doc, embeddings)

db

<langchain_community.vectorstores.chroma.Chroma at 0x24cf68c39d0>

In [17]:
query = "Artificial intelligence has the potential to revolutionize"
# The query is embedded with the same model and compared against the stored chunks.
retrieved_results = db.similarity_search(query)
print(retrieved_results)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[Document(metadata={'source': 'speech.txt'}, page_content='Good morning everyone,\n\nToday, we gather here to discuss the future of artificial intelligence and its impact on our society. Over the past decade, AI has grown from a theoretical concept to a transformative force reshaping industries, economies, and the way we live our lives.'), Document(metadata={'source': 'speech.txt'}, page_content='Artificial intelligence has the potential to revolutionize healthcare by enabling earlier diagnoses, personalized treatments, and improved patient care. It can optimize supply chains, reduce waste, and improve sustainability in industries. In education, AI-powered tools are helping students learn in more personalized and effective ways.'), Document(metadata={'source': 'speech.txt'}, page_content='However, with these opportunities come challenges. Issues like bias in algorithms, data privacy concerns, and the displacement of jobs require thoughtful consideration. It is our responsibility to ens
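
The warning in the output above appears because the search asked for 4 results while the index holds only 3 chunks, so the request is capped at 3. Conceptually, a similarity search ranks stored vectors by how close they are to the query vector and returns the best `k`. The sketch below shows that idea with a brute-force scan and cosine similarity; both are assumptions for illustration, since Chroma uses its own index and distance metric, and `top_k` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    k = min(k, len(vectors))  # cannot return more results than stored vectors
    scored = [(i, _cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]
```

With three stored vectors and the default `k=4`, `top_k` returns three pairs, mirroring the capped result count seen above.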