This notebook sets up a question answering system using LangChain, Google Generative AI embeddings, and a Chroma vector database. Here's a breakdown:

1. **Data Preparation:**
    - Loads Python code files from a directory.
    - Splits these code files into smaller, more manageable chunks.
    - Embeds these chunks using Google's embedding model.

2. **Vector Database:**
    - Checks if a local Chroma database exists. 
    - If not, creates one and stores the embedded code chunks.
    - If it exists, loads the database from disk.

3. **Retrieval System Setup:**
    - Configures a retriever to search for relevant code chunks in the database using the MMR (Maximal Marginal Relevance) method.

4. **Language Model Configuration:**
    - Initializes Google's `gemini-1.5-flash` language model for generating responses.
    - Sets the safety threshold to `BLOCK_NONE` for four harm categories, which turns off the model's content blocking for those categories.

5. **Question Answering Chain:**
    - Constructs a multi-step chain:
        - Takes user questions and the conversation history as input.
        - When a conversation history is present, generates a search query with the language model; otherwise uses the question itself as the query.
        - Retrieves relevant code chunks from the database.
        - Uses the retrieved context and language model to formulate an answer.

6. **Testing:**
    - Tests the question-answering system with sample questions about Python code concepts.
    - Prints both the questions and the generated answers.

In short, the notebook builds a retrieval-augmented question answering system over a codebase: it retrieves the code chunks most relevant to a question and passes them to the language model as context for the answer.


In [1]:
# Import necessary libraries
import os  # Used for interacting with the operating system
from dotenv import load_dotenv  # Used to load environment variables from a .env file

load_dotenv()  # Load environment variables

True

In [2]:
# Import LangChain libraries
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import Language
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_chroma import Chroma

from langchain_google_genai import ChatGoogleGenerativeAI
from google.generativeai.types.safety_types import (
    HarmBlockThreshold,
    HarmCategory
)



In [3]:
# Define the directory to persist the database
PERSIST_DIRECTORY = "/home/rafael/Downloads/qa-with-rag/src/data/"  

In [4]:
# !wget https://github.com/langchain-ai/langchain/archive/refs/heads/master.zip
# !unzip /home/rafael/Downloads/qa-with-rag/notebooks/master.zip
# !rm -rf /home/rafael/Downloads/qa-with-rag/notebooks/langchain-master/.github

In [5]:
# --- Document Loading and Processing ---
# Load documents from the filesystem
loader = GenericLoader.from_filesystem(
    "/home/rafael/Downloads/qa-with-rag/notebooks/langchain-master/libs/core/langchain_core",  # Path to the files
    glob="**/*",  # Load all files within the directory
    suffixes=[".py"],  # Load only Python files
    exclude=["**/non-utf8-encoding.py"],  # Exclude this specific file
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500),  # Segment Python files into functions and classes; files shorter than 500 lines are loaded whole
)
documents = loader.load()  # Load the documents
len(documents)  # Display the number of documents loaded

413

In [6]:
# --- Split Documents into Smaller Chunks ---
# Initialize the text splitter for Python code
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=2000, chunk_overlap=200
)
texts = python_splitter.split_documents(documents)  # Split the documents into smaller texts
len(texts)  # Display the number of chunks after splitting

1288
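
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter measures length in characters by default but prefers to break on Python-specific separators such as class and function boundaries. The function name `split_with_overlap` is hypothetical and used only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means that a definition cut at a chunk boundary still appears with some surrounding context in the next chunk, at the cost of embedding a little duplicated text.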

In [7]:
# --- Create or Load the Vector Database ---
# Initialize the Google embeddings model
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Check if the database already exists
if not os.path.exists(PERSIST_DIRECTORY):
    print(f"Creating database in {PERSIST_DIRECTORY}...")
    # Create the Chroma database and persist the documents
    db = Chroma.from_documents(
        documents=texts,
        embedding=embeddings,
        persist_directory=PERSIST_DIRECTORY
    )
    # No explicit save call is needed: with persist_directory set, langchain_chroma writes to disk automatically
else:
    print(f"Loading existing database from {PERSIST_DIRECTORY}...")
    # Load the existing database from disk
    db = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)

Loading existing database from /home/rafael/Downloads/qa-with-rag/src/data/...
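
One caveat with the existence check in this cell: a directory that exists but is empty still takes the "load" branch, which would most likely open an empty collection rather than build the index. A stricter check could look like the sketch below. The helper name `index_needs_build` is hypothetical, and the placeholder file only stands in for whatever the vector store writes to disk.

```python
import os
import tempfile


def index_needs_build(persist_directory: str) -> bool:
    """Return True when the directory is missing or has no entries (hypothetical helper)."""
    if not os.path.isdir(persist_directory):
        return True
    with os.scandir(persist_directory) as entries:
        return next(entries, None) is None


results = []
with tempfile.TemporaryDirectory() as tmp:
    results.append(index_needs_build(os.path.join(tmp, "missing")))  # no such directory
    results.append(index_needs_build(tmp))  # directory exists but is empty
    # Create a placeholder file to stand in for stored index data
    with open(os.path.join(tmp, "placeholder.bin"), "w"):
        pass
    results.append(index_needs_build(tmp))  # directory is now non-empty
print(results)  # [True, True, False]
```

In the notebook, `if index_needs_build(PERSIST_DIRECTORY):` could then replace `if not os.path.exists(PERSIST_DIRECTORY):`.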




In [8]:
# --- Retrieval Chain Configuration ---
# Define the retriever to search for relevant documents in the database
retriever = db.as_retriever(
    search_type="mmr",  # Use the MMR (Maximal Marginal Relevance) search method
    search_kwargs={"k": 8},  # Return 8 documents, selected for both relevance and diversity
)
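
The idea behind MMR can be sketched in a few lines of plain Python. This is an illustration of the technique, not the code LangChain or Chroma runs: the function names and the toy two-dimensional vectors are assumptions made for the example. Each step picks the candidate with the best trade-off between similarity to the query and similarity to the documents already selected, weighted by `lambda_mult` (to the best of my knowledge LangChain defaults to 0.5 and draws candidates from the top `fetch_k=20` matches).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return the indices of up to k candidates chosen by Maximal Marginal Relevance."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        # Redundancy is the highest similarity to anything already selected
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.2]
candidates = [
    [1.0, 0.0],   # index 0: close to the query
    [1.0, 0.05],  # index 1: near-duplicate of index 0, slightly closer to the query
    [0.5, 1.0],   # index 2: less similar, but different from the others
]
diverse = mmr_select(query, candidates, k=2)
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(diverse)         # [1, 2]
print(relevance_only)  # [1, 0]
```

With `lambda_mult=1.0` the redundancy term disappears and the selection reduces to plain similarity search, which returns the two near-duplicates.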

In [9]:
# --- Language Model (LLM) Configuration ---
# Initialize the ChatGoogleGenerativeAI language model
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # Specify the 'gemini-1.5-flash' language model
    temperature=0,  # Set the temperature to 0 (more deterministic responses)
    top_k=10,  # Consider the top 10 most likely tokens during text generation
    safety_settings={  # Disable blocking (BLOCK_NONE) for these four harm categories
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    }
)

In [10]:
# --- Question Answering Chain Creation ---
# Define a prompt to generate search queries based on the conversation history
prompt = ChatPromptTemplate.from_messages(
    [
        ("placeholder", "{chat_history}"),  # Use the conversation history as context
        ("user", "{input}"),  # User input
        (
            "user",
            "Given the above conversation, generate a search query to look up to get information relevant to the conversation",
        ),  # Instruction to generate the search query
    ]
)
# Create a history-aware retriever chain
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)

# Define a prompt to provide context to the language model
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user's questions based on the below context:\n\n{context}",
        ),  # Instruction for the model to use the provided context
        ("placeholder", "{chat_history}"),  # Use the conversation history as additional context
        ("user", "{input}"),  # User input
    ]
)
# Create a document chain to format retrieved documents
document_chain = create_stuff_documents_chain(llm, prompt)

# Create the question answering chain by combining the retriever chain and document chain
qa = create_retrieval_chain(retriever_chain, document_chain)
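
The data flow of the chain built above can be sketched with plain functions. The stubs below are hypothetical stand-ins for the language model and the retriever, not LangChain APIs. The sketch mirrors two behaviours I understand the LangChain helpers to have: the history-aware retriever skips query rewriting when there is no chat history, and the final result carries the original inputs plus `context` and `answer` keys.

```python
def history_aware_retrieve(inputs, rewrite_query, search):
    """Search on the raw input when there is no history, else on a rewritten query."""
    if not inputs.get("chat_history"):
        query = inputs["input"]
    else:
        query = rewrite_query(inputs)
    return search(query)


def retrieval_chain(inputs, rewrite_query, search, answer):
    """Retrieve documents, join them into one context string, and answer."""
    docs = history_aware_retrieve(inputs, rewrite_query, search)
    context = "\n\n".join(docs)
    return {**inputs, "context": docs, "answer": answer(context, inputs)}


# Hypothetical stubs that record the order in which they are called
calls = []


def fake_rewrite(inputs):
    calls.append("rewrite")
    return "RunnableBinding bind method"


def fake_search(query):
    calls.append(f"search:{query}")
    return [f"doc about {query}"]


def fake_answer(context, inputs):
    return f"answered '{inputs['input']}' using: {context}"


first = retrieval_chain(
    {"input": "What is a RunnableBinding?"}, fake_rewrite, fake_search, fake_answer
)
second = retrieval_chain(
    {
        "input": "How do I create one?",
        "chat_history": [("human", "What is a RunnableBinding?"), ("ai", "...")],
    },
    fake_rewrite,
    fake_search,
    fake_answer,
)
print(calls)
```

The first call searches on the question as typed. Only the second call, which carries a history, goes through the rewrite step before searching.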

In [11]:
# --- Question Answering Chain Testing ---
# Define a test question
question = "What is a RunnableBinding?"
# Run the question answering chain with the question
result = qa.invoke({"input": question})
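
The calls in this notebook pass only `input`, so the conversation history stays empty. A follow-up question needs the earlier turns passed in as `chat_history`. The sketch below shows one way to keep that history. The helper `ask` and the `EchoChain` stub are hypothetical. The stub stands in for `qa` so the example runs without an API key, and the `("human", ...)` and `("ai", ...)` tuples assume the role/content pairs that LangChain message placeholders accept.

```python
def ask(chain, question, history):
    """Invoke the chain with the history so far, then record the new turn."""
    result = chain.invoke({"input": question, "chat_history": list(history)})
    history.append(("human", question))
    history.append(("ai", result["answer"]))
    return result["answer"]


class EchoChain:
    """Stub with the same invoke() shape as the qa chain; reports how much history it saw."""

    def invoke(self, inputs):
        return {"answer": f"turns seen: {len(inputs['chat_history'])}"}


history = []
chain = EchoChain()
first_answer = ask(chain, "What is a RunnableBinding?", history)
second_answer = ask(chain, "Which class does it inherit from?", history)
print(first_answer)   # turns seen: 0
print(second_answer)  # turns seen: 2
```

With the real chain, `ask(qa, question, history)` would let the second question refer to "it" and still retrieve chunks about `RunnableBinding`.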

In [12]:
# Define a list of questions to test
questions = [
    "What classes are derived from the Runnable class?",
    "What one improvement do you propose in code in relation to the class hierarchy for the Runnable class?",
]
# Run the question answering chain on each question and print the question with its answer
for question in questions:
    result = qa.invoke({"input": question})
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

-> **Question**: What classes are derived from the Runnable class? 

**Answer**: The provided code snippet doesn't explicitly list all classes derived from `Runnable`, but it does show several classes that inherit from it either directly or indirectly:

**Directly Inheriting from Runnable:**

* **RunnableSequence:** This class represents a sequence of Runnables.
* **RunnableParallel:** This class represents a parallel execution of Runnables.
* **RunnableGenerator:** This class represents a Runnable that generates a sequence of outputs.
* **RunnableLambda:** This class represents a Runnable that wraps a lambda function.

**Indirectly Inheriting from Runnable:**

* **RunnableEach:** This class inherits from `RunnableEachBase`, which in turn inherits from `RunnableSerializable`, which inherits from `Runnable`.
* **RunnableBinding:** This class inherits from `RunnableBindingBase`, which in turn inherits from `RunnableSerializable`, which inherits from `Runnable`.

**Other Relevant Classes: