# LangChain + Chroma (Demo)

**Purpose:** Demonstrate creating LangChain `Document` objects, building embeddings with Google Gemini, storing them in a Chroma collection, and running similarity searches.

**Prerequisites:**
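- Store `GEMINI_API_KEY` as a Colab secret; the notebook reads it with `google.colab.userdata`. Outside Colab, set it as an environment variable and read it from `os.environ` instead.
- Run the cells in order. The first cell installs the required packages.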

**Notes:**
- This notebook uses `langchain_google_genai` for embeddings and `langchain_chroma` for the vector store.  
- Keep sample documents small to control token and quota usage.

In [1]:
!pip install langchain chromadb openai tiktoken pypdf langchain_google_genai langchain-community



In [2]:
import os

In [6]:
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY')
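
The lookup above works only inside Colab. As a minimal sketch, assuming the key may instead be set as an environment variable (as the prerequisites mention), a hypothetical helper `load_gemini_api_key` could try Colab first and fall back to `os.environ`:

```python
import os


def load_gemini_api_key(name="GEMINI_API_KEY"):
    """Return the API key from Colab user data, falling back to the environment.

    `load_gemini_api_key` is a hypothetical helper, not part of any library.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    key = None
    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; try the environment.
            key = None

    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab user data or the environment")
    return key
```

Calling `gemini_api_key = load_gemini_api_key()` in place of the cell above should behave the same inside Colab and also work in a local Jupyter session.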

## Creating sample `Document` objects

The next cell defines five `langchain.schema.Document` objects. Each pairs a `page_content` string (the text that gets embedded) with a `metadata` dict (here, a single `team` field). Keep `page_content` concise for demos.

In [9]:
from langchain.schema import Document

# Create LangChain documents for famous footballers

doc1 = Document(
    page_content="Lionel Messi is regarded as one of the greatest footballers of all time. Known for his dribbling, playmaking, and finishing, he has spent most of his career with FC Barcelona and now plays for Inter Miami.",
    metadata={"team": "Inter Miami"}
)

doc2 = Document(
    page_content="Cristiano Ronaldo is a record-breaking forward known for his athleticism, goal-scoring ability, and leadership. He has played for clubs like Manchester United, Real Madrid, and Juventus, and currently represents Al-Nassr.",
    metadata={"team": "Al-Nassr"}
)

doc3 = Document(
    page_content="Kylian Mbappé is one of the fastest and most talented forwards in modern football. Known for his explosive pace and clinical finishing, he has been a key player for Paris Saint-Germain and the French national team.",
    metadata={"team": "Paris Saint-Germain"}
)

doc4 = Document(
    page_content="Kevin De Bruyne is a world-class midfielder known for his passing vision, long-range shooting, and ability to control the tempo of the game. He plays for Manchester City in the English Premier League.",
    metadata={"team": "Manchester City"}
)

doc5 = Document(
    page_content="Virgil van Dijk is a commanding center-back who is known for his strength, composure, and aerial ability. As a key player for Liverpool FC, he has helped solidify their defense and lead them to major titles.",
    metadata={"team": "Liverpool FC"}
)


In [10]:
docs = [doc1, doc2, doc3, doc4, doc5]
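
Since the notes recommend keeping sample documents small, it can help to check their size before embedding them. A rough sketch, assuming the common rule of thumb of about four characters per token for English text (an approximation, not the embedding model's actual tokenizer); `estimate_tokens` and `summarize_sizes` are hypothetical helpers:

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def summarize_sizes(documents):
    """Return (character count, estimated tokens) for each document.

    Works with any object exposing a `page_content` string, such as the
    LangChain `Document` objects defined above.
    """
    return [
        (len(doc.page_content), estimate_tokens(doc.page_content))
        for doc in documents
    ]
```

In this notebook, `summarize_sizes(docs)` would give one pair per footballer document.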

## Setup embeddings and Chroma

The next cells install `langchain-chroma` (if needed), import the embedding and vector-store classes, and create the embedding model and the Chroma collection used later.

In [14]:
!pip install -U langchain-chroma

Collecting langchain-chroma
  Downloading langchain_chroma-0.2.6-py3-none-any.whl.metadata (1.1 kB)
Downloading langchain_chroma-0.2.6-py3-none-any.whl (12 kB)
Installing collected packages: langchain-chroma
Successfully installed langchain-chroma-0.2.6


In [15]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma

In [16]:
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    google_api_key=gemini_api_key
)

In [17]:
vector_store = Chroma(
    embedding_function=embeddings,     # embeds documents on insert and queries on search
    persist_directory='my_chroma_db',  # on-disk location of the collection
    collection_name='sample'
)

## Add documents to the vector store

Run this cell to add the sample `Document` objects to the Chroma collection. `add_documents` returns the ids assigned to the stored records.

In [18]:
# add documents
vector_store.add_documents(docs)

['94f2cf54-03c4-4b2e-8f8f-9322f2d5159a',
 '359169df-13eb-44e0-8904-cfe0f053e18f',
 '61db7c49-df43-4acc-8b70-5dfd0255f722',
 '0c83693c-93ed-4d07-896a-54ff7269e657',
 '75218c31-6011-4c9a-971d-77c9c317828b']
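
The ids returned by `add_documents` are random UUIDs, so running the add cell again against the same `persist_directory` stores the documents a second time under new ids. One possible way around this, sketched here under the assumption that identical text should always map to the same record, is to derive ids from the content with `uuid.uuid5` and pass them through the `ids` argument. `content_id` is a hypothetical helper.

```python
import uuid


def content_id(text, namespace=uuid.NAMESPACE_URL):
    """Derive a stable UUID string from document text (hypothetical helper).

    uuid5 is deterministic: the same namespace and text always give the same id.
    The choice of NAMESPACE_URL is arbitrary; any fixed namespace works.
    """
    return str(uuid.uuid5(namespace, text))


sample_texts = ["first sample text", "second sample text"]
sample_ids = [content_id(t) for t in sample_texts]

# With the notebook's objects this would look like (not run here):
#   ids = [content_id(d.page_content) for d in docs]
#   vector_store.add_documents(docs, ids=ids)
```

Because the ids no longer change between runs, they also do not need to be copied from cell output before an update or delete.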

## View stored documents and embeddings

Use this cell to inspect what was saved in Chroma. The `include` argument selects which fields are returned alongside the ids: embeddings, document content, and metadata.

In [19]:
# view documents
vector_store.get(include=['embeddings', 'documents', 'metadatas'])

{'ids': ['94f2cf54-03c4-4b2e-8f8f-9322f2d5159a',
  '359169df-13eb-44e0-8904-cfe0f053e18f',
  '61db7c49-df43-4acc-8b70-5dfd0255f722',
  '0c83693c-93ed-4d07-896a-54ff7269e657',
  '75218c31-6011-4c9a-971d-77c9c317828b'],
 'embeddings': array([[-0.01602241,  0.01614691,  0.02156998, ...,  0.00904962,
          0.01005693, -0.00607986],
        [-0.00796063, -0.00279262,  0.0076505 , ...,  0.01047483,
          0.00565746, -0.01442141],
        [-0.02555987,  0.00559623,  0.01505783, ...,  0.01594054,
         -0.00846015, -0.01061948],
        [-0.00408142,  0.00315169,  0.01247231, ..., -0.0022489 ,
         -0.01917078,  0.00226506],
        [-0.01209005,  0.00499694,  0.00654598, ..., -0.00517881,
         -0.01468057,  0.00068887]]),
 'documents': ['Lionel Messi is regarded as one of the greatest footballers of all time. Known for his dribbling, playmaking, and finishing, he has spent most of his career with FC Barcelona and now plays for Inter Miami.',
  'Cristiano Ronaldo is a record
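
The dictionary returned by `get` holds parallel lists: position `i` in `ids`, `documents`, and `metadatas` all refer to the same record. A small sketch of pairing them up for easier reading; `to_rows` is a hypothetical helper, and the sample dictionary is made up to mirror the shape of the output:

```python
def to_rows(result, preview_chars=40):
    """Zip the parallel lists from a `get`-style result into one dict per record."""
    rows = []
    for doc_id, text, meta in zip(
        result["ids"], result["documents"], result["metadatas"]
    ):
        rows.append(
            {
                "id": doc_id,
                "preview": text[:preview_chars],
                "metadata": meta,
            }
        )
    return rows


# Made-up data shaped like the dictionary that `get` returns.
example = {
    "ids": ["id-1", "id-2"],
    "documents": ["Alpha text about a forward.", "Beta text about a defender."],
    "metadatas": [{"team": "Team A"}, {"team": "Team B"}],
}

for row in to_rows(example):
    print(row["id"], row["metadata"]["team"], row["preview"])
```

With the notebook's store, the same helper could be applied to `vector_store.get(include=['documents', 'metadatas'])`.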

## Search examples

The cells below run three kinds of similarity search: a basic search, a search that also returns scores, and a search filtered by metadata.

In [20]:
# search documents
vector_store.similarity_search(
    query='Who among these are a striker?',
    k=2
)

[Document(id='61db7c49-df43-4acc-8b70-5dfd0255f722', metadata={'team': 'Paris Saint-Germain'}, page_content='Kylian Mbappé is one of the fastest and most talented forwards in modern football. Known for his explosive pace and clinical finishing, he has been a key player for Paris Saint-Germain and the French national team.'),
 Document(id='359169df-13eb-44e0-8904-cfe0f053e18f', metadata={'team': 'Al-Nassr'}, page_content='Cristiano Ronaldo is a record-breaking forward known for his athleticism, goal-scoring ability, and leadership. He has played for clubs like Manchester United, Real Madrid, and Juventus, and currently represents Al-Nassr.')]

In [22]:
# search with similarity score
vector_store.similarity_search_with_score(
    query='Who among these are a striker?',
    k=2
)

[(Document(id='61db7c49-df43-4acc-8b70-5dfd0255f722', metadata={'team': 'Paris Saint-Germain'}, page_content='Kylian Mbappé is one of the fastest and most talented forwards in modern football. Known for his explosive pace and clinical finishing, he has been a key player for Paris Saint-Germain and the French national team.'),
  0.48598504066467285),
 (Document(id='359169df-13eb-44e0-8904-cfe0f053e18f', metadata={'team': 'Al-Nassr'}, page_content='Cristiano Ronaldo is a record-breaking forward known for his athleticism, goal-scoring ability, and leadership. He has played for clubs like Manchester United, Real Madrid, and Juventus, and currently represents Al-Nassr.'),
  0.5011409521102905)]
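
The scores here are distances, not similarities: the first-ranked result carries the lower value (about 0.486 versus 0.501), so smaller means closer. As an illustration only, assuming the collection uses Chroma's default squared-L2 space (this notebook does not set the distance metric explicitly), the following sketch computes that distance for toy vectors and ranks them the same way. `squared_l2` and `rank_by_distance` are hypothetical helpers.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query, candidates, k=2):
    """Return the k (name, distance) pairs closest to the query, nearest first."""
    scored = [(name, squared_l2(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D vectors standing in for real embeddings.
toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(rank_by_distance([1.0, 0.0], toy))
```

Here `a` matches the query exactly and ranks first, followed by the nearby `c`; the orthogonal `b` is dropped because `k=2`.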

In [28]:
# metadata filtering: only documents whose metadata matches `filter` are considered
vector_store.similarity_search_with_score(
    query="Who plays for Liverpool",
    filter={"team": "Liverpool FC"}
)

[(Document(id='75218c31-6011-4c9a-971d-77c9c317828b', metadata={'team': 'Liverpool FC'}, page_content='Virgil van Dijk is a commanding center-back who is known for his strength, composure, and aerial ability. As a key player for Liverpool FC, he has helped solidify their defense and lead them to major titles.'),
  0.40427422523498535)]
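
A `filter` restricts the search to records whose metadata matches, which is why only one result comes back even though `k` is left at its default. Conceptually, an exact-match filter behaves like the sketch below. This is an illustration, not Chroma's implementation; `matches` and `filter_records` are hypothetical names, and the records are made up.

```python
def matches(metadata, flt):
    """True when every key in the filter equals the record's metadata value."""
    return all(metadata.get(key) == value for key, value in flt.items())


def filter_records(records, flt):
    """Keep only records whose metadata satisfies the exact-match filter."""
    return [rec for rec in records if matches(rec["metadata"], flt)]


records = [
    {"text": "doc about player one", "metadata": {"team": "Liverpool FC"}},
    {"text": "doc about player two", "metadata": {"team": "Manchester City"}},
]
print(filter_records(records, {"team": "Liverpool FC"}))
```

Only the surviving records would then be ranked by distance to the query.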

## Update and delete documents

The cells below update a document by id and then delete it by id. Keep a record of your document ids if you plan to modify or remove records.

In [30]:
# update documents
updated_doc1 = Document(
    page_content="Lionel Messi, the legendary forward and former captain of FC Barcelona, is celebrated for his extraordinary vision, dribbling, and finishing. Holding numerous records, including the most goals for a single club and multiple Ballon d’Or awards, Messi has redefined playmaking in modern football. Even after moving to Inter Miami, his influence on and off the pitch continues to inspire millions. Known for his humility, consistency, and creative brilliance, Messi remains one of the most complete players in football history.",
    metadata={"team": "Inter Miami"}
)
vector_store.update_document(document_id='94f2cf54-03c4-4b2e-8f8f-9322f2d5159a', document=updated_doc1)

In [31]:
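# view documents after the update; the first embedding row differs from before because the new text was re-embedded
vector_store.get(include=['embeddings', 'documents', 'metadatas'])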

{'ids': ['94f2cf54-03c4-4b2e-8f8f-9322f2d5159a',
  '359169df-13eb-44e0-8904-cfe0f053e18f',
  '61db7c49-df43-4acc-8b70-5dfd0255f722',
  '0c83693c-93ed-4d07-896a-54ff7269e657',
  '75218c31-6011-4c9a-971d-77c9c317828b'],
 'embeddings': array([[-0.02135512,  0.0097917 ,  0.0265045 , ...,  0.0143773 ,
          0.005615  , -0.01927766],
        [-0.00796063, -0.00279262,  0.0076505 , ...,  0.01047483,
          0.00565746, -0.01442141],
        [-0.02555987,  0.00559623,  0.01505783, ...,  0.01594054,
         -0.00846015, -0.01061948],
        [-0.00408142,  0.00315169,  0.01247231, ..., -0.0022489 ,
         -0.01917078,  0.00226506],
        [-0.01209005,  0.00499694,  0.00654598, ..., -0.00517881,
         -0.01468057,  0.00068887]]),
 'documents': ['Lionel Messi, the legendary forward and former captain of FC Barcelona, is celebrated for his extraordinary vision, dribbling, and finishing. Holding numerous records, including the most goals for a single club and multiple Ballon d’Or awar

In [32]:
# delete the updated document by id
vector_store.delete(ids=['94f2cf54-03c4-4b2e-8f8f-9322f2d5159a'])
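
After a delete it can be worth confirming that the ids are really gone. A minimal sketch, assuming the store's `get` method accepts an `ids` argument and returns a dictionary with an `ids` list, as the outputs in this notebook suggest. `missing_ids` is a hypothetical helper, and `FakeStore` is a stand-in used only to make the example runnable without Chroma.

```python
def missing_ids(store, ids):
    """Return the ids from `ids` that the store no longer contains."""
    found = set(store.get(ids=list(ids))["ids"])
    return [doc_id for doc_id in ids if doc_id not in found]


class FakeStore:
    """In-memory stand-in that mimics the shape of `get` and `delete`."""

    def __init__(self, ids):
        self._ids = list(ids)

    def get(self, ids=None):
        wanted = self._ids if ids is None else [i for i in self._ids if i in ids]
        return {"ids": wanted}

    def delete(self, ids):
        self._ids = [i for i in self._ids if i not in ids]


store = FakeStore(["id-1", "id-2", "id-3"])
store.delete(ids=["id-1"])
print(missing_ids(store, ["id-1", "id-2"]))
```

With the notebook's store, the equivalent check would be `missing_ids(vector_store, ['94f2cf54-03c4-4b2e-8f8f-9322f2d5159a'])`.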

In [33]:
# view documents after the delete; four records remain
vector_store.get(include=['embeddings', 'documents', 'metadatas'])

{'ids': ['359169df-13eb-44e0-8904-cfe0f053e18f',
  '61db7c49-df43-4acc-8b70-5dfd0255f722',
  '0c83693c-93ed-4d07-896a-54ff7269e657',
  '75218c31-6011-4c9a-971d-77c9c317828b'],
 'embeddings': array([[-0.00796063, -0.00279262,  0.0076505 , ...,  0.01047483,
          0.00565746, -0.01442141],
        [-0.02555987,  0.00559623,  0.01505783, ...,  0.01594054,
         -0.00846015, -0.01061948],
        [-0.00408142,  0.00315169,  0.01247231, ..., -0.0022489 ,
         -0.01917078,  0.00226506],
        [-0.01209005,  0.00499694,  0.00654598, ..., -0.00517881,
         -0.01468057,  0.00068887]]),
 'documents': ['Cristiano Ronaldo is a record-breaking forward known for his athleticism, goal-scoring ability, and leadership. He has played for clubs like Manchester United, Real Madrid, and Juventus, and currently represents Al-Nassr.',
  'Kylian Mbappé is one of the fastest and most talented forwards in modern football. Known for his explosive pace and clinical finishing, he has been a key pla