## TODO:
- Write a function that wraps the creation of a LangChain Chroma retriever

In [1]:
# specify your working directory
working_dir = "/Users/pietromascheroni/open-modular-rag"

In [2]:
from dotenv import load_dotenv
import os
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from torch import cuda
from typing import Callable
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import chromadb

import pandas as pd
import re
import string

In [3]:
embed_model_id = 'sentence-transformers/all-mpnet-base-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# Initialize embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32},
    cache_folder=working_dir + '/emb_model'
)



In [4]:
# ChromaDB setup to initialize the collection, including the indices of all documents
# (in case of errors, perform pip uninstall chromadb and pip install chromadb)
chroma_client = chromadb.PersistentClient(path=working_dir + "/vectordb")

In [5]:
# provide a name to set up and reference the vector index
collection_name = "more_agents_paper_self_rag"
# initialize the vector index with the respective similarity search metric
vectorstore = chroma_client.get_or_create_collection(collection_name, metadata={"hnsw:space": "cosine"})
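
The `hnsw:space` setting above selects cosine distance as the similarity metric. Chroma defines this distance as one minus the cosine similarity, so smaller values mean closer vectors. The plain-Python sketch below illustrates the formula only; the helper name `cosine_distance` is hypothetical, and Chroma computes the metric internally over its HNSW index rather than with code like this.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors.

    Illustrative helper (not part of Chroma or LangChain).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing in the same direction have distance 0, whatever their length
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
# Orthogonal vectors have distance 1
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the metric ignores vector length, only the direction of the embeddings affects the ranking of retrieved chunks.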

In [6]:
print(f"We have {vectorstore.count()} chunks in the vector store")

We have 139 chunks in the vector store


In [7]:
# Load the vectordb as a langchain object
langchain_chroma = Chroma(
    client=chroma_client,
    collection_name=collection_name,
    embedding_function=embedding_model,
)

print("There are", langchain_chroma._collection.count(), "chunks in the langchain-formatted collection")

There are 139 chunks in the langchain-formatted collection


In [12]:
# Query the database
query = "improve performance of LLMs"
docs = langchain_chroma.similarity_search(query, k=5)

# print results
for i, doc in enumerate(docs):
    print("chunk", i, ":", doc.page_content)

chunk 0 : across a wide range of tasks. Surpris- ingly, a brute-force ensemble of smaller LLMs can achieve comparable or superior performance to larger LLMs, with a nutshell shown in Figure 1, which will be further expanded in later sections. Moreover, by combining our method with other existing methods, we find the performance can be further improved. By comparing with the performance of complicated
chunk 1 : task designed to isolate each one. Consider the task detailed below: To start the analysis, we select two datasets with increasing difficulty, i.e., GSM8K and MATH, to calculate the rela- tive performance gain. The relative performance gain η is given by: η = Pm−Ps where Pm and Ps are the perfor- mances (accuracy) with our method and a single LLM query, respectively. The results are shown in
chunk 2 : compared with the result of GPT-4 with K = 32, the hierarchical method improves performance from 35% to 47%, suggesting the deployment of different LLMs at the corresponding level o
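
The printing loop above dumps each chunk in full, which gets long for k=5. A possible alternative is a small formatter that collapses whitespace, truncates the text, and shows the page the chunk came from. This is a sketch under assumptions: `format_chunk` is a hypothetical helper, it only relies on the `page_content` and `metadata` attributes of the returned documents, and it assumes the page number is stored under the `'Page'` metadata key. The `SimpleNamespace` object is a stand-in for a retrieved document so the block runs without the vector store.

```python
from types import SimpleNamespace


def format_chunk(index: int, doc, max_chars: int = 80) -> str:
    """Return a one-line preview of a retrieved chunk (hypothetical helper)."""
    # Collapse newlines and repeated spaces left over from PDF extraction
    text = " ".join(doc.page_content.split())
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + "..."
    # Assumes the page number is stored under the 'Page' metadata key
    page = doc.metadata.get("Page", "?")
    return f"chunk {index} (page {page}): {text}"


# Stand-in for a document returned by similarity_search
example_doc = SimpleNamespace(
    page_content="Wu et al., 2023). In these works, multiple LLM agents "
                 "are used to improve the performance of LLMs.",
    metadata={"Page": "1"},
)
print(format_chunk(0, example_doc, max_chars=40))
```

With the real results, the same helper could be applied as `format_chunk(i, doc)` inside the `enumerate(docs)` loop.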

In [22]:
# Build a retriever that returns the single closest chunk, restricted to page 1
retriever = langchain_chroma.as_retriever(
    search_kwargs={'k': 1, 'filter': {'Page': '1'}}
)

retriever.invoke(query)

[Document(page_content='Wu et al., 2023). In these works, multiple LLM agents are used to improve the performance of LLMs. For instance, LLM-Debate (Du et al., 2023) employs multiple LLM agents in a debate form. The reasoning performance is improved by creating a frame- work that allows more than one agent to “debate” the final answer of arithmetic tasks. They show performance im- provements compared to using one single', metadata={'Last Modified': '2024-05-02T21:13:10', 'Page': '1', 'Source': '/Users/pietromascheroni/open-modular-rag/docs/2402.05120v1.pdf'})]
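
The TODO at the top of this notebook asks for a function that wraps retriever creation. A minimal sketch follows, assuming the wrapper only needs to set `k` and an optional metadata filter, as in the cell above. The name `build_retriever` and its parameters are hypothetical; the only library call it relies on is the `as_retriever(search_kwargs=...)` method already used in this notebook.

```python
from typing import Any, Optional


def build_retriever(vectorstore: Any, k: int = 1,
                    metadata_filter: Optional[dict] = None):
    """Create a retriever from a LangChain vector store (hypothetical helper).

    vectorstore: any object exposing `as_retriever`, e.g. the
        `langchain_chroma` instance created earlier in this notebook.
    k: number of chunks to return per query.
    metadata_filter: optional metadata filter such as {'Page': '1'}.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    search_kwargs = {"k": k}
    if metadata_filter:
        # Copy so later changes to the caller's dict do not leak in
        search_kwargs["filter"] = dict(metadata_filter)
    return vectorstore.as_retriever(search_kwargs=search_kwargs)


# Usage in this notebook (requires the objects defined in earlier cells):
# retriever = build_retriever(langchain_chroma, k=1, metadata_filter={'Page': '1'})
# retriever.invoke(query)
```

Taking the vector store as an argument keeps the helper independent of the notebook's global variables, so it can be reused with other collections.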

In [21]:
langchain_chroma._collection.metadata

{'hnsw:space': 'cosine'}