#### Embeddings using sentence-transformers/all-MiniLM-L6-v2
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

In [4]:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Apple is a fruit",
    "I like eating apples",
    "Microsoft builds software"
]

embeddings = model.encode(sentences)
print("Embeddings:", embeddings)
  

Embeddings: [[ 0.07331281  0.04409567  0.0085246  ... -0.03772323  0.1416195
   0.03301753]
 [ 0.00726297 -0.01411778  0.00293348 ...  0.05920228  0.03950173
  -0.01490284]
 [-0.04567775 -0.01632963  0.01116786 ... -0.03889105  0.12456898
   0.02443378]]


In [4]:
print("Embedding shape:", embeddings[0].shape)


Embedding shape: (384,)


#### Compare similarity - cosine similarity

- ✔ Fruit sentences → high similarity
- ✔ Unrelated sentences → low similarity

💡 This is the core idea behind retrieval: embed the query and return the texts whose embeddings are most similar to it.

In [5]:
score1 = util.cos_sim(embeddings[0], embeddings[1])
score2 = util.cos_sim(embeddings[0], embeddings[2])

print("Fruit similarity:", score1)
print("Unrelated similarity:", score2)


Fruit similarity: tensor([[0.5660]])
Unrelated similarity: tensor([[0.0867]])
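
To make the score less of a black box, here is a minimal sketch of cosine similarity written with only the standard library. The three vectors are made-up 3-dimensional values for illustration, not real model embeddings, and the helper name `cosine_similarity` is an assumption of this sketch rather than part of `sentence_transformers`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real embeddings)
fruit = [1.0, 2.0, 0.0]
apples = [2.0, 4.0, 0.0]    # points in the same direction as `fruit`
software = [0.0, 0.0, 3.0]  # orthogonal to `fruit`

print("Same direction:", cosine_similarity(fruit, apples))
print("Orthogonal:", cosine_similarity(fruit, software))
```

Vectors pointing the same way score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure compares direction (meaning) rather than magnitude.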


#### Task 2

In [6]:
sentences = [
    "Python is a programming language",
    "JavaScript is used for web development",
    "I love pizza"
]

embeddings = model.encode(sentences)

score1 = util.cos_sim(embeddings[0], embeddings[1])
score2 = util.cos_sim(embeddings[0], embeddings[2])

print("Python-JavaScript similarity:", score1)
print("Python-Pizza similarity:", score2)

Python-JavaScript similarity: tensor([[0.2778]])
Python-Pizza similarity: tensor([[0.1320]])


#### Task 3

In [7]:
sentences = [
    "Apple is a fruit",
    "I enjoy consuming fruits",
    "Microsoft builds software",
]

embeddings = model.encode(sentences)

score1 = util.cos_sim(embeddings[0], embeddings[1])
score2 = util.cos_sim(embeddings[0], embeddings[2])

print("Fruit similarity:", score1)
print("Unrelated similarity:", score2)

Fruit similarity: tensor([[0.5506]])
Unrelated similarity: tensor([[0.0867]])


#### Example 1: create embeddings, store them in a vector store, and retrieve similar text
https://dev.to/moni121189/next-gen-qa-retrieval-augmented-ai-with-chroma-vector-store-4kie

In [None]:
# ingest data
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
docs = loader.load()
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Freedom was not gifted; it was earned through courage and sacrifice.\nCountless voices rose together to demand dignity and self-rule.\nEvery step toward independence carried the weight of hope and loss.\nThe struggle taught us unity beyond language, region, or belief.\nIndependence is not just a date, but a responsibility we carry daily.\nIt reminds us to protect justice, equality, and truth.\nThe past whispers lessons of resilience and bravery.\nThe present asks us to build with integrity and compassion.\nThe future depends on how wisely we use our freedom today.\nIndependence lives on when we choose progress over fear.\n')]

In [3]:
# split data into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(docs)
texts

[Document(metadata={'source': 'speech.txt'}, page_content='Freedom was not gifted; it was earned through courage and sacrifice.\nCountless voices rose together to demand dignity and self-rule.\nEvery step toward independence carried the weight of hope and loss.\nThe struggle taught us unity beyond language, region, or belief.\nIndependence is not just a date, but a responsibility we carry daily.\nIt reminds us to protect justice, equality, and truth.\nThe past whispers lessons of resilience and bravery.'),
 Document(metadata={'source': 'speech.txt'}, page_content='The present asks us to build with integrity and compassion.\nThe future depends on how wisely we use our freedom today.\nIndependence lives on when we choose progress over fear.')]
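
The roles of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler fixed-window splitter. This is only a sketch under stated assumptions: `split_with_overlap` is a hypothetical helper, and it is not the algorithm `RecursiveCharacterTextSplitter` uses, which prefers to break on separators such as paragraph and line boundaries (that is why the chunks above end at line breaks).

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbours share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.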

In [31]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

vector_store = Chroma.from_documents(
    documents=texts,
    embedding=embedding_model,
    persist_directory="./chroma_db"
)

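# Chroma 0.4+ writes to persist_directory automatically; on those versions
# this call is deprecated and can be dropped.
vector_store.persist()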


In [29]:
query = "is freedom gifted"

# k=1 returns only the single closest chunk
results = vector_store.similarity_search(query, k=1)

for doc in results:
    print(doc.page_content)

Freedom was not gifted; it was earned through courage and sacrifice.
Countless voices rose together to demand dignity and self-rule.
Every step toward independence carried the weight of hope and loss.
The struggle taught us unity beyond language, region, or belief.
Independence is not just a date, but a responsibility we carry daily.
It reminds us to protect justice, equality, and truth.
The past whispers lessons of resilience and bravery.
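
Conceptually, a similarity search scores every stored vector against the query vector and keeps the best `k`. The sketch below does this by brute force over toy data. The helper names `cosine_similarity` and `top_k`, the chunk labels, and the 3-dimensional vectors are all assumptions made for illustration; this is not how Chroma is implemented, and a real vector store typically uses an index and its own distance metric instead of scanning every entry.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=1):
    """Return the k (score, text) pairs most similar to query_vec.

    `store` is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical store: labels and vectors are invented for this example
store = [
    ("chunk about freedom", [0.9, 0.1, 0.0]),
    ("chunk about the future", [0.1, 0.9, 0.0]),
    ("chunk about pizza", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # stands in for the embedded query

for score, text in top_k(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```

With `k=1`, as in the cell above, only the single best-scoring chunk would be returned.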
