#### Ollama Embeddings
- https://github.com/ollama/ollama?tab=readme-ov-file
- https://docs.langchain.com/oss/python/integrations/text_embedding/ollama
- https://ollama.com/blog/embedding-models

```python
# Requires a local Ollama server with the model pulled first: `ollama pull mxbai-embed-large`.
# langchain_community's OllamaEmbeddings is deprecated; newer code can use
# `from langchain_ollama import OllamaEmbeddings` (langchain-ollama package) instead.
from langchain_community.embeddings import OllamaEmbeddings

# Initialize the embedding model
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# List of texts to embed
texts = [
    "Freedom was not gifted; it was earned through sacrifice.",
    "Independence carries responsibility.",
    "The future depends on how wisely we use freedom.",
]

# Generate one embedding vector per text
vectors = embeddings.embed_documents(texts)

# Inspect results
print(f"Number of vectors: {len(vectors)}")
print(f"Dimension of one vector: {len(vectors[0])}")
for vector in vectors:
    print(vector[:10])  # first 10 dimensions of each vector
```

```
Number of vectors: 3
Dimension of one vector: 1024
[0.013753477483987808, -0.00788838416337967, -0.020158085972070694, 0.07587914168834686, -0.0599953792989254, -0.054242875427007675, -0.017382148653268814, -0.024384701624512672, 0.023145612329244614, 0.011930054984986782]
[-0.013456552289426327, -0.02911004237830639, -0.012630164623260498, 0.04389433562755585, -0.026465805247426033, -0.0032459318172186613, 0.012245533056557178, -0.01780848205089569, 0.049900442361831665, 0.035783275961875916]
[0.0037643371615558863, -0.0018393656937405467, -0.012713055126369, 0.0576869398355484, -0.05178232863545418, -0.024839291349053383, 0.017634239047765732, -0.011185872368514538, 0.030057398602366447, 0.041468508541584015]
```
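Vectors like these are usually compared with cosine similarity: texts with related meaning should produce vectors that point in similar directions. The sketch below is a minimal, dependency-free illustration; the three-dimensional vectors are made-up stand-ins for the 1024-dimensional outputs above, and `cosine_similarity` is a hypothetical helper, not a LangChain function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

# Pairwise similarity matrix: entry [i][j] compares text i with text j
matrix = [[cosine_similarity(u, v) for v in toy_vectors] for u in toy_vectors]
for row in matrix:
    print([round(x, 3) for x in row])
```

With the real output, `cosine_similarity(vectors[0], vectors[2])` would compare the first and third sentences in the same way.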


```python
# embed_query embeds a single query string and returns one vector
single_vector = embeddings.embed_query("is freedom important?")
print(single_vector)
```

```
[0.03683486580848694, 0.008859277702867985, -0.019556205719709396, 0.03386453911662102, -0.062492456287145615, -0.031327612698078156, 0.012439466081559658, 0.0036480191629379988, -0.012239604257047176, 0.014869713224470615, 0.028012854978442192, 0.03138095885515213, -0.01971389353275299, -0.015162617899477482, -0.0024765129201114178, 0.010033983737230301, -0.03296355530619621, 0.03278322145342827, -0.0032912367023527622, 0.01910456083714962, 0.03891363367438316, -0.004483990371227264, -0.01717185787856579, -0.0025192901957780123, -0.005768466740846634, -0.024465279653668404, -0.016441481187939644, 0.03246401622891426, 0.05402287095785141, 0.13328519463539124, 0.019684914499521255, 0.0168668944388628, -0.006279272027313709, -0.022376224398612976, -0.018143685534596443, -0.01539886835962534, 0.012666418217122555, -0.050653863698244095, -0.007796820253133774, -0.03054005280137062, -0.03255236893892288, -0.015352972783148289, 0.018560824915766716, -0.06292083859443665, -0.06251741200685501
```
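`embed_query` returns a single vector in the same space as the document vectors, so retrieval reduces to ranking stored vectors by how close they are to the query vector. The toy in-memory store below sketches that idea under stated assumptions: `ToyVectorStore` and its `search` method are hypothetical, the vectors are made up, and it does a brute-force scan with cosine similarity, whereas Chroma uses its own approximate-nearest-neighbour index and a configurable distance metric.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class ToyVectorStore:
    """Hypothetical brute-force vector store holding (text, vector) pairs in a list."""

    items: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        # Score every stored vector against the query, highest similarity first
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Freedom was earned through sacrifice.", [0.9, 0.1, 0.0])
store.add("Independence carries responsibility.", [0.6, 0.6, 0.1])
store.add("Recipe for banana bread.", [0.0, 0.1, 0.9])

# Made-up query vector standing in for embed_query("is freedom important?")
print(store.search([1.0, 0.2, 0.0], k=2))
```

The unrelated "banana bread" entry scores lowest, so it drops out when only the top two results are requested.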

#### Storing the embeddings in a vector database and performing similarity search
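Before anything is stored, the cell below splits `speech.txt` into chunks of at most 500 characters with up to 50 characters of overlap, so text near a chunk boundary may appear in both neighbouring chunks. `RecursiveCharacterTextSplitter` first tries to split on separators such as blank lines, newlines and spaces; the sketch below ignores separators and uses a plain fixed-size sliding window, purely to illustrate how `chunk_size` and `chunk_overlap` interact. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3))
```

Each window starts 7 characters after the previous one, so consecutive chunks share their last and first 3 characters.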

```python
# Store the chunk embeddings in a Chroma vector database and run a similarity search
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Load speech.txt as a list of Document objects
loader = TextLoader("speech.txt")
docs = loader.load()

# Split into chunks of at most 500 characters with up to 50 characters of overlap;
# split_documents accepts Document objects directly, as returned by the loader
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(docs)

embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# Embed every chunk and store it in the "speech_collection" collection
vectorstore = Chroma.from_documents(
    documents=texts, embedding=embeddings, collection_name="speech_collection"
)

# Retrieve the 3 chunks most similar to the query
query = "What is the main message of the speech?"
results = vectorstore.similarity_search(query=query, k=3)

print(f"Top 3 results for the query: '{query}'")
for i, res in enumerate(results):
    print(f"Result {i+1}: {res.page_content}\n")
```

```
Top 3 results for the query: 'What is the main message of the speech?'
Result 1: The present asks us to build with integrity and compassion.
The future depends on how wisely we use our freedom today.
Independence lives on when we choose progress over fear.

Result 2: The present asks us to build with integrity and compassion.
The future depends on how wisely we use our freedom today.
Independence lives on when we choose progress over fear.

Result 3: Freedom was not gifted; it was earned through courage and sacrifice.
Countless voices rose together to demand dignity and self-rule.
Every step toward independence carried the weight of hope and loss.
The struggle taught us unity beyond language, region, or belief.
Independence is not just a date, but a responsibility we carry daily.
It reminds us to protect justice, equality, and truth.
The past whispers lessons of resilience and bravery.
```
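Results 1 and 2 are identical, and the collection info below reports 4 documents although the speech appears to split into only two chunks. A likely explanation (an assumption, not confirmed by the notebook) is that the cell was executed more than once in the same kernel: `Chroma.from_documents` reuses an existing collection with the same `collection_name` and assigns fresh random UUIDs when no IDs are given, so each run adds another copy of every chunk. One way to make re-runs idempotent is to derive IDs from the chunk itself and pass them through the `ids` argument of `Chroma.from_documents`. The sketch below shows only the ID derivation; `content_id` is a hypothetical helper, and including the chunk index avoids collisions if two chunks ever have identical text.

```python
import hashlib


def content_id(text: str, source: str, index: int) -> str:
    """Hypothetical helper: a stable ID derived from the source, chunk position and text."""
    key = f"{source}:{index}:{text}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


chunks = [
    "Freedom was not gifted; it was earned through courage and sacrifice.",
    "The present asks us to build with integrity and compassion.",
]

# Simulate running the notebook cell twice
first_run = [content_id(c, "speech.txt", i) for i, c in enumerate(chunks)]
second_run = [content_id(c, "speech.txt", i) for i, c in enumerate(chunks)]

print(first_run == second_run)  # True: the same chunks get the same IDs on every run
```

In the notebook this would mean computing one ID per chunk in `texts` (for example from `doc.page_content` and `doc.metadata["source"]`) and calling `Chroma.from_documents(documents=texts, embedding=embeddings, ids=ids, collection_name="speech_collection")`; with stable IDs, re-running the cell should overwrite the existing entries instead of appending copies.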



```python
# Print database info (_collection is the underlying Chroma collection, a private attribute)
print(f"Collection name: {vectorstore._collection}")
print(f"Number of documents in the collection: {vectorstore._collection.count()}")
print(f"documents: {vectorstore._collection.get()}")
```


Collection name: Collection(name=speech_collection)
Number of documents in the collection: 4
documents: {'ids': ['1befab00-f3af-404a-ad02-fcde0840191b', '4752dee4-3469-4723-93f0-cd1f3e4c61e8', '2411cd83-65c0-471e-bf58-611a4a795519', '74de70bc-ad45-47c8-bc18-d14ba68f2ae3'], 'embeddings': None, 'documents': ['Freedom was not gifted; it was earned through courage and sacrifice.\nCountless voices rose together to demand dignity and self-rule.\nEvery step toward independence carried the weight of hope and loss.\nThe struggle taught us unity beyond language, region, or belief.\nIndependence is not just a date, but a responsibility we carry daily.\nIt reminds us to protect justice, equality, and truth.\nThe past whispers lessons of resilience and bravery.', 'The present asks us to build with integrity and compassion.\nThe future depends on how wisely we use our freedom today.\nIndependence lives on when we choose progress over fear.', 'Freedom was not gifted; it was earned through courage and