<img src="../images/coefficient-aidl.png" width=1200>

# Build Your Own Private ChatGPT Super-Assistant Using Streamlit, LangChain, Chroma & Llama 2
## Chroma Demo
**Questions?** contact@coefficient.ai / [@CoefficientData](https://twitter.com/CoefficientData)

---

## 0. Imports

In [None]:
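import chromadb
from dotenv import load_dotenv

from utils import scrape_page  # local helper module: fetches a web page as plain text

load_dotenv()  # load any environment variables defined in a local .env file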

## 1. Chroma Basics

In [None]:
# Get the Chroma client
chroma_client = chromadb.Client()

In [None]:
# Create a collection
collection = chroma_client.create_collection(name="my_collection")

Collections are where you'll store your embeddings, documents, and any additional metadata. 

In [None]:
# Add some text documents to the collection
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"],
)

By default, Chroma stores your text and handles tokenization, embedding (with its built-in `all-MiniLM-L6-v2` model), and indexing automatically.

In [None]:
collection2 = chroma_client.create_collection(name="another_collection")

In [None]:
# Load in pre-generated embeddings
collection2.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"],
)

In [None]:
# Query the collection
results = collection.query(query_texts=["This is a query document"], n_results=2)

In [None]:
results
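`results` is a dict whose values hold one inner list per query text: `ids`, `documents`, `metadatas` and `distances` (smaller means closer), plus `embeddings`, which is `None` unless you ask for it. A small helper makes one query's hits easier to read. This is a sketch that assumes that shape; `flatten_query_results` is a hypothetical helper, and the distances in the example dict are made-up values, not real Chroma output.

```python
from typing import Any, Dict, List


def _column(results: Dict[str, Any], key: str, query_index: int, n: int) -> List[Any]:
    """One query's values for `key`, or Nones if that field wasn't included."""
    values = results.get(key)
    return values[query_index] if values is not None else [None] * n


def flatten_query_results(results: Dict[str, Any], query_index: int = 0) -> List[Dict[str, Any]]:
    """Turn one query's slice of a Chroma-style query result into a list of row dicts."""
    ids = results["ids"][query_index]
    n = len(ids)
    rows = zip(
        ids,
        _column(results, "documents", query_index, n),
        _column(results, "metadatas", query_index, n),
        _column(results, "distances", query_index, n),
    )
    return [
        {"id": i, "document": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in rows
    ]


# Hand-written dict in the same shape; the distances are illustrative only
example = {
    "ids": [["id1", "id2"]],
    "documents": [["This is a document", "This is another document"]],
    "metadatas": [[{"source": "my_source"}, {"source": "my_source"}]],
    "distances": [[0.5, 0.7]],
    "embeddings": None,
}
for row in flatten_query_results(example):
    print(row["id"], row["distance"], row["document"])
```

In the notebook you would call `flatten_query_results(results)` on the real query output.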

- **Where is data stored?** By default, data stored in Chroma is ephemeral (held in memory and lost when the process ends), which makes it easy to prototype scripts.
- **Can data be persisted?** Yes. Create the client with `chromadb.PersistentClient(path=...)` and Chroma writes your collections to disk automatically, so you can reuse them and add more documents later; a new client pointed at the same path loads them again.

Check out the [Usage Guide](https://docs.trychroma.com/usage-guide) for more info.

In [None]:
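# Store data on disk (here, in the current directory) so it survives restarts
persistent_client = chromadb.PersistentClient(path=".")
# get_or_create_collection avoids an error when the collection already exists on disk
persistent_collection = persistent_client.get_or_create_collection(name="example_collection")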

---

## 2. Create embeddings with LangChain

### Create embeddings with Llama

In [None]:
from langchain.embeddings.llamacpp import LlamaCppEmbeddings

In [None]:
# Make sure the model path is correct!
llama_embedder = LlamaCppEmbeddings(model_path="../models/llama-2-7b-chat.Q4_K_M.gguf")

In [None]:
text = "This is a test document."
query_result = llama_embedder.embed_query(text)

In [None]:
len(query_result)

In [None]:
query_result[:10]

In [None]:
doc_result = llama_embedder.embed_documents([text])

In [None]:
len(doc_result)

In [None]:
doc_result[0][:10]
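Embeddings are compared geometrically. Chroma's default distance for a collection is squared L2, and cosine similarity is another common choice; for unit-length vectors the two rank results identically, because $\|a-b\|^2 = 2 - 2\cos(a,b)$. A stdlib sketch on toy 2-D vectors (the helper names are illustrative; in the notebook you could compare `query_result` with `doc_result[0]` instead):

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: 0.0 = identical, larger = less similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy vectors; real embeddings such as query_result are far longer
a, b = [3.0, 4.0], [4.0, 3.0]
cos = cosine_similarity(a, b)                  # 24 / 25 = 0.96
dist = squared_l2(normalize(a), normalize(b))  # 2 - 2 * 0.96 = 0.08
print(round(cos, 4), round(dist, 4))
```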

### Load, split and embed a web page with LangChain

In [None]:
# Let's get some more interesting data
url = "https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/frontier-ai-capabilities-and-risks-discussion-paper"
paper = scrape_page(url)
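`utils.scrape_page` isn't shown in this notebook. If you don't have it, a hypothetical stdlib stand-in might fetch the page and keep only the visible text, skipping `<script>` and `<style>` contents. This is a rough sketch (`scrape_page_sketch` and `html_to_text` are illustrative names, not the helper the notebook uses), and it puts every text node on its own line.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Extract visible text from an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def scrape_page_sketch(url: str) -> str:
    """Hypothetical stand-in for utils.scrape_page: download a URL and return its text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```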

In [None]:
# Take a peek
print(f"{len(paper)=}\n\nExtract:")
print(paper[10000:15000])

In [None]:
# Save the first 5,000 characters to disk; Llama 2 is very slow at embedding
with open("frontier-ai-paper.txt", "w") as f:
    f.write(paper[:5000])

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the document
raw_documents = TextLoader("frontier-ai-paper.txt").load()

# Split it into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

In [None]:
len(documents)

In [None]:
documents[:2]
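Both `chunk_size` and `chunk_overlap` count characters here (the splitter's default length function is `len`), so `chunk_size=500` caps each chunk at 500 characters. `RecursiveCharacterTextSplitter` tries to break on paragraphs first, then lines, then spaces, so its chunks are often shorter than the limit. To see what the two parameters mean in isolation, here is a deliberately simplified fixed-window splitter; `split_fixed` is a hypothetical helper, not LangChain's algorithm.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Unlike LangChain's
    RecursiveCharacterTextSplitter, this ignores paragraph, line and word breaks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


print(split_fixed("0123456789", chunk_size=4))                   # ['0123', '4567', '89']
print(split_fixed("0123456789", chunk_size=4, chunk_overlap=2))  # ['0123', '2345', '4567', '6789']
```

Overlap repeats a little text across chunk boundaries so a sentence cut in half still appears whole in at least one chunk, at the cost of storing and embedding more text.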

In [None]:
%%time
from langchain.vectorstores import Chroma

# Embed each chunk with Llama 2 and load it into the vector store (this can take several minutes)
db = Chroma.from_documents(documents, llama_embedder, collection_name="llama_embeddings")

### Similarity search

In [None]:
query = "What are scaffolds in AI?"
docs = db.similarity_search(query)
print(docs[0].page_content.replace("\n", " "))
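Conceptually, `similarity_search` embeds the query with the same embedder and returns the `k` stored chunks closest to it (LangChain's default is `k=4`). Chroma does this with an approximate HNSW index; the exact brute-force version below is only a sketch of the idea, using cosine similarity on toy vectors, and `top_k` is a hypothetical helper rather than Chroma's implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
          k: int = 4) -> List[Tuple[int, float]]:
    """Exact nearest neighbours: score every document, keep the best k (index, score)."""
    scored = ((i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy "document embeddings"
print([i for i, _ in top_k([1.0, 0.0], doc_vecs, k=2)])  # [0, 2]
```

Brute force costs one comparison per stored chunk per query, which is fine for a few hundred chunks; approximate indexes like HNSW trade a little recall for much faster lookups on large collections.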

### Using SentenceTransformerEmbeddings

In [None]:
# Initialise the new embedder
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

st_embedder = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

In [None]:
%%time
# Compare this with SentenceTransformerEmbeddings
db2 = Chroma.from_documents(documents, st_embedder, collection_name="st_embeddings")

**Note: It takes `SentenceTransformerEmbeddings` <1 second, and Llama 2 several minutes!**

In [None]:
# Save the whole paper this time; Sentence-Transformers can handle it
print(f"{len(paper)=}")
with open("frontier-ai-paper.txt", "w") as f:
    f.write(paper)

In [None]:
raw_documents = TextLoader("frontier-ai-paper.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

In [None]:
len(documents)

In [None]:
%%time
# Use a new collection name: reusing "st_embeddings" can add these chunks to the earlier ones
db2 = Chroma.from_documents(documents, st_embedder, collection_name="st_embeddings_full")

In [None]:
docs = db2.similarity_search("What are scaffolds in AI?")
print(docs[0].page_content.replace("\n", " "))

In [None]:
docs = db2.similarity_search("What are the top risks of frontier models?")
print(docs[0].page_content.replace("\n", " "), "\n\n")
print(docs[1].page_content.replace("\n", " "))

### Maximal marginal relevance search (MMR)
Maximal marginal relevance selects results that are both similar to the query and diverse among themselves, so you get fewer near-duplicate chunks. LangChain also supports it in its async API.

In [None]:
query = "What are the top risks of frontier models?"
retriever = db2.as_retriever(search_type="mmr")
docs = retriever.get_relevant_documents(query)

print(docs[0].page_content.replace("\n", " "), "\n\n")
print(docs[1].page_content.replace("\n", " "))
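For LangChain's Chroma store, MMR first fetches `fetch_k` candidates by similarity (default 20), then greedily re-ranks them down to `k` (default 4) using `lambda_mult` (default 0.5); you can override these through `search_kwargs` in `as_retriever`. The greedy step can be sketched in plain Python as below. This is an illustration of the technique with cosine similarity on toy vectors, not LangChain's exact code, and `mmr` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
        k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance; returns indices into doc_vecs in pick order.

    Each step picks the unselected document maximizing
        lambda_mult * sim(doc, query) - (1 - lambda_mult) * max(sim(doc, picked))
    so lambda_mult=1.0 ranks by relevance only and lambda_mult=0.0 by diversity only.
    """
    query_sims = [cosine_similarity(query_vec, v) for v in doc_vecs]
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        # Penalize similarity to the closest already-selected document
        redundancy = max((cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [0.8, 0.6]
docs = [[1.0, 0.0], [0.995, 0.0998], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2))                   # [1, 2]: skips the near-duplicate
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [1, 0]: pure relevance ranking
```

For example, `db2.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.3})` would lean further towards diversity.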

### Deep linking

In [None]:
docs = db2.similarity_search("Which model has the best benchmark?")
result = docs[1].page_content
print(result.replace("\n", " "))

In [None]:
import urllib.parse

In [None]:
encoded_result = urllib.parse.quote(result[:50])
encoded_result

In [None]:
deeplink = f"{url}#:~:text={encoded_result}"
deeplink
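The `#:~:text=` suffix is a URL text fragment: supporting browsers scroll to the first match on the page and highlight it. Two details can make the `result[:50]` link above miss. Matches must start and end on word boundaries, so a slice that cuts a word in half may not match, and `-`, `,` and `&` must be percent-encoded because the directive syntax uses them (`urllib.parse.quote` leaves `-` and, by default, `/` alone). A sketch that trims to whole words and encodes those characters; `text_fragment_link` is a hypothetical helper.

```python
from urllib.parse import quote


def text_fragment_link(url: str, passage: str, max_chars: int = 50) -> str:
    """Build a #:~:text= deep link to the start of passage.

    Collapses whitespace, keeps only whole words up to max_chars, and
    percent-encodes '-', ',' and '&', which are special in text directives.
    """
    words = passage.split()
    snippet = ""
    for word in words:
        candidate = f"{snippet} {word}" if snippet else word
        if len(candidate) > max_chars:
            break
        snippet = candidate
    if not snippet and words:
        snippet = words[0]  # a single very long word: use it whole
    # quote(safe="") encodes ',' and '&'; '-' is unreserved, so encode it by hand
    encoded = quote(snippet, safe="").replace("-", "%2D")
    return f"{url.split('#')[0]}#:~:text={encoded}"


print(text_fragment_link("https://example.com/page", "Frontier AI -  risks,\n and more", max_chars=20))
# https://example.com/page#:~:text=Frontier%20AI%20%2D%20risks%2C
```

In the notebook, `text_fragment_link(url, result)` would replace the manual slicing. The scraped text can still differ from what the browser renders, so a link may occasionally fail to match.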

---

## 3. Exercise: Q&A bot with vector database

> Combine the Chroma vector database with a Llama-based LangChain LLM to create a Q&A bot for the provided (or any other) URL.
> Tips:
> - Encode your queries using the Sentence-Transformer embedding & return the top documents
> - Include the question alongside the top N documents into your LangChain LLM's context window
> - Use Llama 2 to synthesise a coherent answer
>
> This approach lets LLMs answer questions about things they weren't trained on, by using the vector database as an "encyclopedia" they can consult as needed. This is known as "retrieval-augmented generation" or "RAG".
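One way to approach the second tip is a plain prompt template: number the retrieved passages, cap their total length so the prompt fits in Llama 2's 4,096-token context window, and append the question. A sketch only; the template wording, the 3,000-character budget and the name `build_rag_prompt` are assumptions. In the notebook, `passages` could be `[d.page_content for d in db2.similarity_search(question)]`, and the returned string would be passed to your Llama 2 LLM.

```python
from typing import Sequence


def build_rag_prompt(question: str, passages: Sequence[str], max_context_chars: int = 3000) -> str:
    """Assemble a retrieval-augmented prompt from the top-ranked passages.

    Passages are added in rank order until the character budget would be
    exceeded, keeping the prompt within the model's context window.
    """
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        text = " ".join(passage.split())  # collapse newlines and repeated spaces
        if used + len(text) > max_context_chars:
            break
        context_parts.append(f"[{i}] {text}")
        used += len(text)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_rag_prompt("What are scaffolds in AI?", ["First retrieved chunk.", "Second chunk."]))
```

Telling the model to answer only from the context, and to admit when the answer is missing, is a common way to reduce made-up answers; numbering the passages also makes it easy to ask the model to cite them.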

---

## Where next?

LangChain is far more powerful than we've seen so far! Here's an idea of what else you can do:
- [Learn to use agents and tools with LangChain](https://python.langchain.com/docs/modules/agents/tools/) such as searching the web, querying APIs, reading papers on ArXiv, checking the weather, digesting articles on Wikipedia, making (and transcribing) calls with Twilio, accessing financial data and much more. Check out the [list of integrations here](https://python.langchain.com/docs/integrations/tools).
- [Query a SQL database](https://python.langchain.com/docs/expression_language/cookbook/sql_db) with LangChain Runnables
- [Write Python code](https://python.langchain.com/docs/expression_language/cookbook/code_writing) with LangChain
- [Learn more about RAG](https://python.langchain.com/docs/expression_language/cookbook/retrieval) or use [this example to combine agents with the Chroma vector store](https://python.langchain.com/docs/modules/agents/how_to/agent_vectorstore)