
## Concept: Embeddings
Ref: [LangChain Embedding Models](https://python.langchain.com/docs/concepts/embedding_models/)

<img src="https://python.langchain.com/assets/images/embeddings_concept-975a9aaba52de05b457a1aeff9a7393a.png" width="800">


## Concept: Vector Store (DB)
Ref: [LangChain Vector Stores](https://python.langchain.com/docs/concepts/vectorstores/)

<img src="https://python.langchain.com/assets/images/vectorstores-2540b4bc355b966c99b0f02cfdddb273.png" width="800">

In [3]:
from langchain_ollama import OllamaEmbeddings

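# Requires a running Ollama server with the model already pulled
# (for example via `ollama pull nomic-embed-text`).
embeddings = OllamaEmbeddings(model="nomic-embed-text")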

### Example: Text to Vector Embeddings

In [40]:
text = "LangChain is the framework for building context-aware reasoning applications."

single_vector = embeddings.embed_query(text)

print(str(single_vector)[:100])  # Show the first 100 characters of the vector's string form

[0.021211749, 0.018929051, -0.122605085, -0.103397556, 0.057456363, -0.0870874, -0.017253302, 0.0055


In [41]:
single_vector

[0.021211749,
 0.018929051,
 -0.122605085,
 -0.103397556,
 0.057456363,
 ...]

In [39]:
len(single_vector)

768
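
A 768-dimensional vector only becomes useful once it is compared against other vectors. A common comparison is cosine similarity, which measures the angle between two vectors and ignores their length. The sketch below uses three tiny made-up vectors in place of real embeddings, so it runs without an Ollama server; the values and the `cosine_similarity` helper are illustrative assumptions, not part of the notebook or of LangChain.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional stand-ins for real 768-dimensional embeddings.
query_vec = [1.0, 0.0, 1.0]
close_vec = [2.0, 0.0, 2.0]  # same direction as the query, different length
far_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

sim_close = cosine_similarity(query_vec, close_vec)
sim_far = cosine_similarity(query_vec, far_vec)
print(f"close: {sim_close:.3f}, far: {sim_far:.3f}")
```

With real embeddings, the same helper could be applied to two results of `embeddings.embed_query(...)`: a score near 1 suggests similar meaning, a score near 0 suggests unrelated text.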

### Example: Beginner-friendly Semantic Similarity Search with LangChain

In [36]:
# Step 1: Import required library
from langchain_core.vectorstores import InMemoryVectorStore

# Step 2: Prepare a list of sample texts (documents)
text_list = [
    "LangChain is a framework for building context-aware reasoning applications.",
    "I am from Pakistan.",
    "LLMs are powerful!",
    "The Great Wall of China is one of the Seven Wonders of the World.",
    "Python is a versatile programming language popular among data scientists.",
    "Mount Everest is the highest mountain in the world.",
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "The Eiffel Tower is located in Paris, France.",
    "Artificial Intelligence is transforming industries worldwide.",
    "The Amazon Rainforest is home to a vast variety of plant and animal species.",
]

# Step 3: Create a single vector store from all texts
vectorstore = InMemoryVectorStore.from_texts(text_list, embedding=embeddings)

# Step 4: Convert the vector store into a retriever
# (k=len(text_list) makes it return every document, ranked by similarity to the query)
retriever = vectorstore.as_retriever(search_kwargs={"k": len(text_list)})

# Step 5: Enter a query to find similar texts
keyword = input("Enter keyword(s): ")  # input() already returns a str
retrieved_documents = retriever.invoke(keyword)

# Step 6: Show all retrieved documents
print(f"keyword(s): {keyword}\n")
print("Documents sorted by relevance:")
for i, doc in enumerate(retrieved_documents, start=1):
    print(f"Document {i}: {doc.page_content}")

# Step 7: Display the most relevant document's content
print("\nMost relevant document:")
print(retrieved_documents[0].page_content)

keyword(s): automation

Documents sorted by relevance:
Document 1: Artificial Intelligence is transforming industries worldwide.
Document 2: Python is a versatile programming language popular among data scientists.
Document 3: Photosynthesis is the process by which plants convert sunlight into energy.
Document 4: LangChain is a framework for building context-aware reasoning applications.
Document 5: The Amazon Rainforest is home to a vast variety of plant and animal species.
Document 6: The Eiffel Tower is located in Paris, France.
Document 7: LLMs are powerful!
Document 8: I am from Pakistan.
Document 9: Mount Everest is the highest mountain in the world.
Document 10: The Great Wall of China is one of the Seven Wonders of the World.

Most relevant document:
Artificial Intelligence is transforming industries worldwide.
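
To make the retrieval step less of a black box, the following sketch re-creates the ranking in plain Python: embed every text, embed the query, score each pair with cosine similarity, then sort. The `toy_embed` function is a hypothetical word-count stand-in for `nomic-embed-text`, so it only matches on shared words and will not reproduce the semantic ranking shown above. `rank_texts` is likewise an illustrative helper, not LangChain's implementation.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_texts(query, texts, k):
    """Return the k texts most similar to the query as (score, text) pairs."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(t)), t) for t in texts]
    # Sort on the score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


sample_texts = [
    "I am from Pakistan.",
    "Mount Everest is the highest mountain in the world.",
    "The Eiffel Tower is located in Paris, France.",
]

ranked = rank_texts("highest mountain", sample_texts, k=2)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

A word-count embedding gives a query such as "automation" a score of zero against every sample text, because none of them contains that word. A learned embedding model places related concepts near each other even when no words overlap, which is why the notebook's retriever can rank the Artificial Intelligence sentence first.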
