**LangChain: Basic concepts recap**


1. Preprocessing the data

LangChain provides tools that help structure documents for convenient use with LLMs. Document loaders simplify loading data into documents, and text splitters break lengthy pieces of text into smaller chunks for better processing. Finally, indexing involves creating a structured database of information that the language model can query to enhance its understanding and responses.

1.1 Document Loaders
* load raw data into structured `Document` objects
* over 100 loaders, plus integrations with Airbyte, Unstructured and others
* CSVLoader - each row into a separate `Document`
* TextLoader - text files
* DirectoryLoader - files in a directory
* UnstructuredMarkdownLoader - markdown files
* PyPDFLoader - loading PDF files
* WikipediaLoader - content of the Wikipedia pages that match a search query
* UnstructuredURLLoader - reading from public web pages
* Proprietary Loaders - additional authentication/setup required, e.g. GoogleDriveLoader, MongodbLoader

In [None]:
from langchain.document_loaders import CSVLoader, WikipediaLoader

# CSV file
loader = CSVLoader('./data/data.csv')
documents = loader.load()

# access content and metadata
for document in documents:
    content = document.page_content
    metadata = document.metadata

# Wikipedia page
# requires the `wikipedia` package; load() returns a list of Documents
loader = WikipediaLoader(query='Machine learning', load_max_docs=1)
wiki_docs = loader.load()
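To make the one-row-per-`Document` behaviour of `CSVLoader` concrete, here is a pure-Python sketch that uses only the standard `csv` module. The `Document` dataclass and `load_csv_as_documents` function are hypothetical stand-ins, not LangChain classes, and rendering each row as `column: value` lines with `source`/`row` metadata is an assumption modelled loosely on what `CSVLoader` produces.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_as_documents(csv_text: str, source: str) -> list[Document]:
    """Create one Document per CSV row (sketch, not LangChain's CSVLoader)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # Render the row as "column: value" lines
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            Document(page_content=content, metadata={"source": source, "row": row_index})
        )
    return documents


sample_csv = "name,role\nAda,engineer\nGrace,admiral\n"
csv_docs = load_csv_as_documents(sample_csv, source="data.csv")
for doc in csv_docs:
    print(doc.metadata, repr(doc.page_content))
```

Keeping the row index in the metadata lets you trace any retrieved chunk back to the exact row it came from.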

1.2 Document transformers (chunking methods)
* splitting documents into smaller segments
* necessary because of context window limits, e.g. GPT-4 (8,192 tokens) and text-embedding-ada-002 (8,191 input tokens)
* Transformation strategies:

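1.2.1 Fixed-size chunks - chunks of a fixed size, large enough to hold a semantically meaningful paragraph, with some overlap between neighbours to preserve continuity and context, which improves coherence and accuracy, e.g. `CharacterTextSplitter`, which produces chunks of up to `chunk_size` characters (or tokens, if configured to count tokens) with `chunk_overlap` shared between consecutive chunks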
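A minimal sliding-window sketch of fixed-size chunking with overlap, measured in characters. `fixed_size_chunks` is a hypothetical helper, not `CharacterTextSplitter` itself (which first splits on a separator and then merges the pieces up to `chunk_size`); it only shows how `chunk_size` and `chunk_overlap` interact.

```python
from __future__ import annotations


def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; consecutive windows
    share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last character of one chunk reappears at the start of the next.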

1.2.2 Variable-sized chunks - partition based on content characteristics, such as end-of-sentence punctuation, end-of-line markers or features from NLP libraries, which keeps the content coherent and contextually intact, e.g. `RecursiveCharacterTextSplitter`
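The idea behind recursive splitting can be sketched in a few lines: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This `recursive_split` is a simplified, hypothetical helper, not LangChain's implementation; the separator list is an assumption (LangChain's default for `RecursiveCharacterTextSplitter` is paragraphs, lines, spaces, then single characters), and overlap is omitted.

```python
from __future__ import annotations


def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first and recurse into pieces that are
    still longer than chunk_size (simplified sketch, no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring pieces while they still fit
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample_text = "First paragraph here.\n\nSecond one is a bit longer than the first."
for chunk in recursive_split(sample_text, chunk_size=30):
    print(repr(chunk))
```

The first paragraph fits and stays whole; only the second, too-long paragraph is broken down further, at word boundaries.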

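1.2.3 Customized chunking - attach document-level context, such as the title or section headers, to chunks from the middle of a document so they do not lose context, e.g. `MarkdownHeaderTextSplitter`, which splits on Markdown headers and records them in each chunk's metadata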
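Header-aware chunking can be sketched as follows: split a Markdown document at its headers and copy the enclosing headers into each chunk's metadata, so that a chunk from the middle of the document still knows which section it belongs to. `split_by_headers` and its `{"content", "metadata"}` output are hypothetical; `MarkdownHeaderTextSplitter` has its own configuration (a list of header markers paired with metadata names) and output types.

```python
from __future__ import annotations


def split_by_headers(markdown: str, levels: tuple[str, ...] = ("#", "##")) -> list[dict]:
    """Split Markdown at header lines and attach the enclosing headers to each
    chunk as metadata (sketch of header-based chunking)."""
    chunks: list[dict] = []
    headers: dict[str, str] = {}
    lines: list[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"content": body, "metadata": dict(headers)})
        lines.clear()

    for line in markdown.splitlines():
        marker, _, title = line.partition(" ")
        if marker in levels and title.strip():
            flush()
            # A new header replaces any header at the same or a deeper level
            for key in [k for k in headers if len(k) >= len(marker)]:
                del headers[key]
            headers[marker] = title.strip()
        else:
            lines.append(line)
    flush()
    return chunks


sample_md = "# Guide\nIntro text.\n## Install\nRun pip.\n## Usage\nCall it."
for chunk in split_by_headers(sample_md):
    print(chunk)
```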

* a drawback of chunking is the risk of losing vital context about the overall document: each chunk may capture only part of the nuances and interconnected elements of the full text, leading to a fragmented understanding

1.3 Indexing
* storing and organizing data into a vector store, essential for efficient retrieval

2. Models

2.1 LLMs
* `LLM` class for interfacing with various model providers such as OpenAI, Cohere and HuggingFace.
* Chat models are a variation of language models that use message-based input and output, with three types of messages:

2.1.1 `SystemMessage` - sets the behaviour and objectives of the chat model

2.1.2 `HumanMessage` - input prompts from the user

2.1.3 `AIMessage` - responses from the model 

In [None]:
import os
os.environ['OPENAI_API_KEY'] = 'key'

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

messages = [
    SystemMessage(
        content='You are a helpful assistant.'
    ),
    HumanMessage(
        content='What is the capital of France?'
    )
]
chat.invoke(messages)

2.2 Embedding models
* standardized interface for various embedding model providers like OpenAI, Cohere and HuggingFace
* transform text into vector representations, enabling semantic search through text similarity in vector space
* consistent output dimensionality, regardless of input length

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings()
embeddings = embedding_model.embed_documents(
    [
        'Hi there!',
        'Oh, hello!'
    ]
)
print('Number of documents embedded:', len(embeddings))
print('Dimension of each embedding:', len(embeddings[0]))
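The fixed-dimensionality property can be illustrated without an API key. `toy_embed` below is a hypothetical stand-in that hashes words into a fixed number of buckets; unlike a real embedding model it captures only word overlap, not meaning, but it shows that inputs of any length map to vectors of the same size that can be compared with cosine similarity. (For comparison, the Deep Lake log later in this document shows 1536-dimensional OpenAI embeddings.)

```python
from __future__ import annotations

import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hash each word into one of `dim` buckets and L2-normalize the counts.
    The output length is always `dim`, regardless of input length."""
    vector = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


short_vec = toy_embed("Oh, hello!")
long_vec = toy_embed("Oh, hello! " * 50)
print(len(short_vec), len(long_vec))  # 8 8
print(round(cosine_similarity(short_vec, long_vec), 3))
```

The repeated text contains the same words in the same proportions, so after normalization both vectors point in the same direction even though one input is fifty times longer.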

3. Vector Stores and Retrievers

Vector stores
* embeddings are high-dimensional vectors that capture the semantics of textual data
* traditional databases are not optimized for high-dimensional data, whereas vector stores can store and search these embeddings efficiently
* advantages of using vector stores in LangChain: speed, scalability and precision

Retrievers
* interfaces that return documents in response to a query; the most basic ones rank documents by a similarity metric, such as the cosine similarity between the query embedding and each document embedding
* LangChain offers more advanced retrieval strategies:

3.1 Parent document retriever - indexes small chunks, each linked to the larger parent document (or section) it came from; the query is matched against the small chunks, but the parent's larger context is returned and used to generate the final answer
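A toy version of the parent-document idea: index each sentence of a document as a small child chunk that remembers its parent's id, match the query against the children, and return the whole parent. Bag-of-words cosine similarity stands in for real embeddings, and `build_child_index`/`retrieve_parent` are hypothetical helpers; LangChain's `ParentDocumentRetriever` does this with a vector store for the children and a separate document store for the parents.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words counts, used here in place of an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_child_index(parents: dict[str, str]) -> list[tuple[Counter, str]]:
    """Split every parent into sentences (child chunks) tagged with the parent id."""
    index = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if sentence:
                index.append((bow(sentence), parent_id))
    return index


def retrieve_parent(query: str, index: list[tuple[Counter, str]], parents: dict[str, str]) -> str:
    """Find the best-matching child chunk, then return its full parent text."""
    query_vec = bow(query)
    _, parent_id = max(index, key=lambda item: cosine(query_vec, item[0]))
    return parents[parent_id]


parents = {
    "cats": "Cats are small mammals. They purr when content. Many live indoors.",
    "rockets": "Rockets burn fuel to create thrust. Orbits require high speed.",
}
child_index = build_child_index(parents)
print(retrieve_parent("Why do cats purr?", child_index, parents))
```

The query matches a single short sentence, but the caller receives the full "cats" paragraph, which is the extra context the parent-document strategy is meant to supply.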

3.2 Self-query retriever - uses an LLM to turn the user prompt into a search query plus metadata filters, then applies those filters to the document metadata
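The second half of a self-query retriever, applying the generated filters, can be sketched in plain Python. The structured query below is hard-coded to show the kind of output an LLM might produce for "thrillers released after 2000"; no LLM is called, and the `(field, operator, value)` filter format and `apply_filters` helper are hypothetical rather than LangChain's query language.

```python
from __future__ import annotations

from typing import Any

movies = [
    {"text": "A heist thriller set in Paris", "metadata": {"genre": "thriller", "year": 2019}},
    {"text": "A quiet family drama", "metadata": {"genre": "drama", "year": 1995}},
    {"text": "A sci-fi thriller about time loops", "metadata": {"genre": "thriller", "year": 1998}},
]

# Hard-coded example of what an LLM might extract from
# "thrillers released after 2000" (no LLM is called here)
structured_query = {
    "query": "thriller",
    "filters": [("genre", "eq", "thriller"), ("year", "gt", 2000)],
}

OPERATORS = {
    "eq": lambda a, b: a == b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
}


def apply_filters(docs: list[dict], filters: list[tuple[str, str, Any]]) -> list[dict]:
    """Keep documents whose metadata satisfies every (field, op, value) filter."""
    return [
        doc for doc in docs
        if all(
            key in doc["metadata"] and OPERATORS[op](doc["metadata"][key], value)
            for key, op, value in filters
        )
    ]


matches = apply_filters(movies, structured_query["filters"])
print([doc["text"] for doc in matches])  # ['A heist thriller set in Paris']
```

A full retriever would then run a similarity search for the `query` string over the documents that survive the filters.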

4. Chains
* powerful, reusable components that can be linked together to perform complex tasks
* integrating prompt templates with LLMs through chains creates a powerful synergy: feeding the output of one LLM call into the next makes it possible to connect multiple prompts sequentially
* allows integrating LLMs with other components such as long-term memory and output guarding
* chains can enhance overall depth and quality of interactions
* two classes - `LLMChain` and `SequentialChain`

4.1 `LLMChain`
* simplest form of chain that transforms user inputs using a `PromptTemplate`
* the prompt object defines the parameters of our request to the model and determines the expected format of the output
* then we can use `LLMChain` to tie the prompt and model to make predictions

In [None]:
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser

template = '''List all the colours in a rainbow.'''

prompt = PromptTemplate(
    template=template,
    input_variables=[],
    output_parser=StrOutputParser()
)

chat = ChatOpenAI(
    model_name='gpt-3.5-turbo',
    temperature=0
)

llm_chain = LLMChain(
    prompt=prompt,
    llm=chat
)

llm_chain.predict()

In [None]:
# using LCEL
prompt = PromptTemplate.from_template(
    'List all the colours in a {item}.'
)

runnable = prompt | ChatOpenAI(temperature=0) | StrOutputParser()
runnable.invoke({'item': 'rainbow'})
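The `|` operator in the LCEL cell above builds a pipeline in which each step's output becomes the next step's input. The toy `Step` class below reproduces that idea with Python's `__or__` hook; it is a hypothetical illustration, not LangChain's `Runnable`, and the fake model simply echoes its prompt.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `a | b`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b -> a new Step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_prompt = Step(lambda inputs: f"List all the colours in a {inputs['item']}.")
toy_llm = Step(lambda text: {"content": f"(model reply to: {text})"})  # fake model
toy_parser = Step(lambda message: message["content"])  # like a string output parser

toy_chain = toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke({"item": "rainbow"}))
# (model reply to: List all the colours in a rainbow.)
```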

4.2 `SequentialChain`
* making a series of calls to an LLM
* especially beneficial for using the outputs of one call as the input for the next, streamlining the process and enabling complex interactions across various applications

In [None]:
from langchain.prompts import PromptTemplate

post_prompt = PromptTemplate.from_template(
    '''You are a business owner. Given the theme of a post, write a social media article.

    Theme: {theme}
    Content: This is a social media post based on the theme above.'''
)

review_prompt = PromptTemplate.from_template(
    '''You are an expert social media manager. Given the post, your job is to review it.
     
Social media post: {post}
Review from a Social Media Expert:'''
)

llm = ChatOpenAI(temperature=0)

chain = (
    {'post': post_prompt | llm | StrOutputParser()}
    | review_prompt
    | llm
    | StrOutputParser()
)

chain.invoke(
    {'theme': 'Having a black friday sale with 50% off on everything.'}
)

In [None]:
# printing both the post and the review
from langchain.schema.runnable import RunnablePassthrough

llm = ChatOpenAI(temperature=0)

post_chain = post_prompt | llm | StrOutputParser()
review_chain = review_prompt | llm | StrOutputParser()
chain = {'post': post_chain} | RunnablePassthrough.assign(review=review_chain)
chain.invoke(
    {
        'theme': 'Having a black friday sale with 50% off on everything.'
    }
)

5. Memory
* backbone for maintaining context in ongoing dialogues
* LLM calls are stateless, so LangChain's Memory module maintains context by storing both input and output messages in a structured way and supplying them to later prompts
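A minimal sketch of what conversational memory does: store past exchanges and prepend them to the next prompt. `WindowMemory` is hypothetical; it keeps only the last `k` exchanges, similar in spirit to LangChain's `ConversationBufferWindowMemory`, but it is not that class's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WindowMemory:
    """Keep the last `k` exchanges and render them into the next prompt."""
    k: int = 2
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))
        # Forget exchanges that fall outside the window
        self.turns = self.turns[-self.k:] if self.k > 0 else []

    def render(self, new_input: str) -> str:
        history = "".join(f"Human: {h}\nAI: {a}\n" for h, a in self.turns)
        return f"{history}Human: {new_input}\nAI:"


memory = WindowMemory(k=1)
memory.save("Hi!", "Hello! How can I help?")
memory.save("My name is Sam.", "Nice to meet you, Sam.")
print(memory.render("What's my name?"))
```

The window keeps prompts bounded in length; the trade-off is that anything older than `k` exchanges is forgotten.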

**LlamaIndex: Precision and Simplicity in Information Retrieval**

Vector Stores

* storage of large, high-dimensional data and semantic search tools
* goes beyond traditional keyword matching by seeking information that aligns conceptually with the user query

Data Connectors
* `Readers` in LlamaIndex parse and convert data into a simplified `Document` representation consisting of text and basic metadata

Nodes
* once data is ingested as documents, it passes through a processing structure that transforms these documents into `Node` objects
* Nodes are smaller, more granular data units created from the original documents
* besides content, nodes also contain metadata and contextual information
* the `NodeParser` class converts document content into structured nodes automatically; `SimpleNodeParser` converts a list of document objects into nodes

In [None]:
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

Indices

* the first step in storing information for retrieval: an index organizes unstructured data (often as embeddings that capture semantic meaning) into a format that can be easily accessed and queried
* variety of index types:

1. Summary Index - stores nodes as a sequential list; by default, a query passes every node to the LLM, which suits summarizing the whole collection (the related `DocumentSummaryIndex` extracts a summary of each document and stores it alongside that document's nodes)
2. Vector Store Index - generates embeddings during index construction and uses them to identify the top-k most similar nodes for a query; suitable for small-scale applications and easily scaled to larger datasets by backing it with a high-performance vector database
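The practical difference between the two index types is which nodes reach the LLM. In this hypothetical sketch, the summary-index-style retriever hands over every node, while the vector-index-style retriever ranks nodes by similarity (Jaccard word overlap standing in for embeddings) and keeps only the top `k`.

```python
from __future__ import annotations

import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def summary_index_retrieve(nodes: list[str], query: str) -> list[str]:
    """Summary-index style: every node goes to the response synthesizer."""
    return list(nodes)


def vector_index_retrieve(nodes: list[str], query: str, top_k: int = 1) -> list[str]:
    """Vector-index style: rank nodes by similarity and keep the top k.
    Jaccard word overlap stands in for embedding similarity."""
    query_words = words(query)

    def score(node: str) -> float:
        node_words = words(node)
        union = query_words | node_words
        return len(query_words & node_words) / len(union) if union else 0.0

    return sorted(nodes, key=score, reverse=True)[:top_k]


sample_nodes = [
    "LlamaIndex loads documents.",
    "Nodes are chunks of documents.",
    "Vector stores hold embeddings.",
]
print(len(summary_index_retrieve(sample_nodes, "what do vector stores hold")))  # 3
print(vector_index_retrieve(sample_nodes, "what do vector stores hold"))
```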

Query Engines

* wrapper around a retriever and a response synthesizer
* uses the query string to fetch nodes and sends them to LLM to generate a response
* indexes can also function solely as retrievers for fetching documents relevant to the query
* custom query engines can also be built
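Conceptually, a query engine is a retriever followed by a response synthesizer. The hypothetical `ToyQueryEngine` below makes that wiring explicit; the retriever ranks nodes by word overlap and the synthesizer is a fake that quotes its context instead of calling an LLM, so nothing here is LlamaIndex's API.

```python
from __future__ import annotations

import re
from typing import Callable


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyQueryEngine:
    """Query engine = retriever (fetch nodes) + synthesizer (answer from nodes)."""

    def __init__(self, retriever: Callable[[str], list[str]],
                 synthesizer: Callable[[str, list[str]], str]):
        self.retriever = retriever
        self.synthesizer = synthesizer

    def query(self, question: str) -> str:
        nodes = self.retriever(question)
        return self.synthesizer(question, nodes)


knowledge = ["Paris is the capital of France.", "Berlin is the capital of Germany."]


def overlap_retriever(question: str, top_k: int = 1) -> list[str]:
    # Rank nodes by how many words they share with the question
    return sorted(knowledge, key=lambda n: len(words(n) & words(question)), reverse=True)[:top_k]


def fake_synthesizer(question: str, nodes: list[str]) -> str:
    # A real synthesizer would send the question and nodes to an LLM
    return f"Based on {len(nodes)} node(s): {' '.join(nodes)}"


engine = ToyQueryEngine(overlap_retriever, fake_synthesizer)
print(engine.query("What is the capital of France?"))
```

Because the two parts are independent, either can be swapped (a different retriever, a different synthesis strategy) without touching the other.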

Routers
* determine the most appropriate retriever for extracting context from the knowledge base
* selects the optimal query engine for each task, improving performance and accuracy
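A router chooses among several query engines based on the question. In LlamaIndex the choice is typically made by an LLM reading each engine's description; in this hypothetical sketch, word overlap between the question and each description stands in for that LLM-based selector, and the engines are placeholders that echo the question.

```python
from __future__ import annotations

import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


# name -> (description, engine); the engines are placeholders that echo the question
engines = {
    "code": ("Questions about source code, functions and classes",
             lambda q: f"[code engine] {q}"),
    "docs": ("Questions about installation and usage documentation",
             lambda q: f"[docs engine] {q}"),
}


def route(question: str) -> str:
    """Return the name of the engine whose description best matches the question."""
    question_words = words(question)
    return max(engines, key=lambda name: len(question_words & words(engines[name][0])))


for question in ("Which functions are defined?", "Where is the usage documentation?"):
    name = route(question)
    print(name, "->", engines[name][1](question))
```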

Saving and Loading indexes locally
* saving the index data, which includes nodes and their associated embeddings to disk
* done using the `persist()` method from the `storage_context` object related to the index

In [None]:
import os

from llama_index import (
    StorageContext,
    VectorStoreIndex,
    download_loader,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"  # persist() saves to ./storage by default

# Check whether the index has already been saved to disk
if not os.path.exists(PERSIST_DIR):
    # If not, load the Wikipedia data and create a new index
    WikipediaReader = download_loader("WikipediaReader")
    loader = WikipediaReader()
    documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence'])
    index = VectorStoreIndex.from_documents(documents)
    # Save the nodes and their embeddings to disk
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # If the index already exists, just load it
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

**LangChain vs. LlamaIndex**

LlamaIndex: focused on processing, structuring and accessing private or domain-specific data for specific LLM interactions that demand high precision and quality; its main strength is connecting LLMs to any data source

LangChain: more dynamic, suited to context-rich interactions, and effective for chatbots and virtual assistants

**Example**

Requirements
1. Python - Version 3.7 or newer is required.
2. An active account on OpenAI, along with an OpenAI API key.
3. An Activeloop Deep Lake account, complete with a Deep Lake API key.
4. A 'classic' personal token from GitHub.

In [None]:
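# Create and activate a virtual environment in a terminal; in a notebook,
# `!source ...` runs in a throwaway subshell and has no lasting effect
python3 -m venv repo-ai
source repo-ai/bin/activate
pip install llama-index llama-hub deeplake openai python-dotenv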

**How does LlamaIndex work?**

When you use LlamaIndex for data-driven applications, the underlying workflow is fairly simple. Here's a breakdown:

Load Documents: The first step is loading your raw data into the system, either manually or through a data loader that automates the process. LlamaIndex offers specialized data loaders that ingest data from various sources and transform it into Document objects, and you can find many more as plugins on Llama Hub. This step sets the stage for the subsequent data manipulation and querying.

Parse the Documents into Nodes: Once the documents are loaded, they are parsed into Nodes, which are structured data units. Nodes contain chunks of the original documents and carry metadata and relationship information. This organizes the raw data into a structured format that the system can handle more easily and efficiently.

Construct an Index from Nodes or Documents: After the Nodes are prepared, an index is constructed to make the data searchable and queryable. Depending on your needs, this index can be built directly from the original documents or from the parsed Nodes. A common choice is VectorStoreIndex, which embeds each node so that similar content can be retrieved quickly. This step is the heart of the system, turning your structured data into a queryable database.

Query the Index: With the index in place, the final step is to query it. A query engine is initialized, allowing you to make natural language queries against the indexed data: you ask the system a question conversationally, and it retrieves the most relevant parts of the indexed data and uses the LLM to compose an answer from them.

In [None]:
# .env
GITHUB_TOKEN="YOUR_GH_CLASSIC_TOKEN"
OPENAI_API_KEY="YOUR_OPENAI_KEY"
ACTIVELOOP_TOKEN="YOUR_ACTIVELOOP_TOKEN"
DATASET_PATH="hub://YOUR_ORG/repository_vector_store"

# main.py
import os
import textwrap
from dotenv import load_dotenv
from llama_index import download_loader
from llama_hub.github_repo import GithubRepositoryReader, GithubClient
from llama_index import VectorStoreIndex
from llama_index.vector_stores import DeepLakeVectorStore
from llama_index.storage.storage_context import StorageContext
import re

# Load environment variables
load_dotenv()

# Fetch and set API keys
openai_api_key = os.getenv("OPENAI_API_KEY")
active_loop_token = os.getenv("ACTIVELOOP_TOKEN")
dataset_path = os.getenv("DATASET_PATH")


def parse_github_url(url):
    pattern = r"https://github\.com/([^/]+)/([^/]+)"
    match = re.match(pattern, url)
    return match.groups() if match else (None, None)


def validate_owner_repo(owner, repo):
    return bool(owner) and bool(repo)


def initialize_github_client():
    github_token = os.getenv("GITHUB_TOKEN")
    return GithubClient(github_token)


def main():
    # Check for OpenAI API key
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        raise EnvironmentError("OpenAI API key not found in environment variables")

    # Check for GitHub Token
    github_token = os.getenv("GITHUB_TOKEN")
    if not github_token:
        raise EnvironmentError("GitHub token not found in environment variables")

    # Check for Activeloop Token
    active_loop_token = os.getenv("ACTIVELOOP_TOKEN")
    if not active_loop_token:
        raise EnvironmentError("Activeloop token not found in environment variables")

    github_client = initialize_github_client()
    download_loader("GithubRepositoryReader")

    github_url = input("Please enter the GitHub repository URL: ")

    while True:
        owner, repo = parse_github_url(github_url)
        if validate_owner_repo(owner, repo):
            loader = GithubRepositoryReader(
                github_client,
                owner=owner,
                repo=repo,
                filter_file_extensions=(
                    [".py", ".js", ".ts", ".md"],
                    GithubRepositoryReader.FilterType.INCLUDE,
                ),
                verbose=False,
                concurrent_requests=5,
            )
            print(f"Loading {repo} repository by {owner}")
            docs = loader.load_data(branch="main")
            print("Documents uploaded:")
            for doc in docs:
                print(doc.metadata)
            break  # Exit the loop once the valid URL is processed
        else:
            print("Invalid GitHub URL. Please try again.")
            github_url = input("Please enter the GitHub repository URL: ")

    print("Uploading to vector store...")

    # ====== Create vector store and upload data ======

    vector_store = DeepLakeVectorStore(
        dataset_path=dataset_path,
        overwrite=True,
        runtime={"tensor_db": True},
    )

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
    query_engine = index.as_query_engine()

    # Include a simple question to test.
    intro_question = "What is the repository about?"
    print(f"Test question: {intro_question}")
    print("=" * 50)
    answer = query_engine.query(intro_question)

    print(f"Answer: {textwrap.fill(str(answer), 100)} \n")
    while True:
        user_question = input("Please enter your question (or type 'exit' to quit): ")
        if user_question.lower() == "exit":
            print("Exiting, thanks for chatting!")
            break

        print(f"Your question: {user_question}")
        print("=" * 50)

        answer = query_engine.query(user_question)
        print(f"Answer: {textwrap.fill(str(answer), 100)} \n")


if __name__ == "__main__":
    main()

**Initialization and Environment Setup**

Import Required Libraries: The script starts by importing all the necessary modules and packages.

Load Environment Variables: Using dotenv, it loads the environment variables stored in the .env file, which keeps API keys and tokens out of the source code.

Helper Functions
1. parse_github_url(url): This function takes a GitHub URL and extracts the repository owner and name using regular expressions.
2. validate_owner_repo(owner, repo): Validates that both the repository owner and name are present.
3. initialize_github_client(): Initializes the GitHub client using the token fetched from the environment variables.
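The regular expression in parse_github_url accepts a URL such as `https://github.com/owner/repo.git` but captures `repo.git` as the repository name, and it rejects `http://` and `www.` variants. If you want a more lenient parser, a sketch based on `urllib.parse` is one option; it is a hypothetical drop-in replacement that returns the same `(owner, repo)` or `(None, None)` tuple, not part of the original script.

```python
from __future__ import annotations

from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_github_url(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (owner, repo) from a GitHub URL, or return (None, None)."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None, None
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None, None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None, None
    owner, repo = parts[0], parts[1]
    # Accept clone URLs such as https://github.com/owner/repo.git
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return (owner, repo) if repo else (None, None)


print(parse_github_url("https://github.com/owner/repo.git"))
print(parse_github_url("https://github.com/owner/repo/tree/main"))
```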

API Key Checks: Before proceeding, the script checks for the presence of the OpenAI API key, GitHub token, and Activeloop token, raising an error if any are missing.

Initialize GitHub Client: Calls initialize_github_client() to get a GitHub client instance.

User Input for GitHub URL: Asks the user to input a GitHub repository URL.

URL Parsing and Validation: Parses the URL to get the repository owner and name and validates them.

Data Loading: If the URL is valid, it uses GithubRepositoryReader from llama_hub to load the repository data, specifically Python, JavaScript, TypeScript, and Markdown files.

Indexing: The loaded data is then indexed using VectorStoreIndex and stored in a Deep Lake vector store. This makes the data queryable.

Query Engine Initialization: Initializes a query engine based on the indexed data.

Test Query: Performs a test query to demonstrate the system's operation.

User Queries: Enters a loop where the user can input natural language queries to interact with the indexed GitHub repository. The loop continues until the user types 'exit'.

Execution Entry Point: The script uses the standard `if __name__ == "__main__":` Python idiom to ensure that `main()` is called when the script is executed directly.

For this project, we used the github_repo data loader from LlamaIndex; you can find its documentation on Llama Hub. When loading data from a GitHub repository, the key component is the GithubRepositoryReader class, which fetches, filters, and formats repository data for indexing. Let's break down its key components:

1. GitHub Client: You'll first notice that the initialized GitHub client is passed into GithubRepositoryReader. This client provides authenticated access to GitHub repositories.
2. Repository Details: Next, the repository owner and name are specified. These are extracted from the URL you input, ensuring the data is fetched from the correct source. They are also used in the console message that reports which repository is being loaded.
3. File Type Filters: One of the most flexible features here is the ability to specify which file types to load. In this example, we're focusing on Python, JavaScript, TypeScript, and Markdown files. The inclusion of Markdown files is so the reader will also pull in README files, offering valuable context for the language model's understanding.
4. Verbose Logging: If you're the kind of person who likes to see every detail, you can enable verbose logging. This will print out detailed logs of the data loading process.
5. Concurrent Requests: This is where you can speed things up. The number of concurrent requests specifies how many data-fetching operations will happen simultaneously. A word of caution, though: cranking this number up could make you hit GitHub's rate limits, so tread carefully.
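
Once all these parameters are set, the load_data() method fetches the repository data from the given branch (main in this script; change the branch argument if the repository's default branch has another name, such as master) and packages it into a list of Document objects, ready for the next stage: indexing.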

Indexing: Transforming Data into Queryable Intelligence

Vector Store: The first thing that happens is the creation of a DeepLakeVectorStore. Think of this as a specialized database designed to hold vectors. It's a storage unit and an enabler for high-speed queries. The dataset_path parameter specifies where this vector store will reside, and the overwrite flag allows you to control whether existing data should be replaced. The parameter runtime={"tensor_db": True} specifies that the vector store will use the Managed Tensor Database for storage and query execution on the Deep Lake infrastructure.
The default here is to create a cloud vector DB. Instead, you can create a local vector DB by setting dataset_path to the name of the directory you want to use as the DB, for example `dataset_path = "repository_db"`; the vector store will then create a directory with that name. For a local dataset, also remove runtime={"tensor_db": True}, since the Managed Tensor Database only hosts datasets stored in Activeloop's cloud.

Storage Context: Next up is the StorageContext, which essentially acts as a manager for the vector store. It ensures the vector store is accessible and manageable throughout the indexing process.
From Documents to Index: The VectorStoreIndex.from_documents() method is the workhorse here. It takes the list of Document objects you got from the GitHub repository and transforms them into a searchable index. This index is stored in the previously initialized DeepLakeVectorStore.

The Interactive Finale: Querying and User Engagement

After the data loading and indexing process, the stage is set for the user to interact with the system. This is where the query engine comes into play.

Query Engine: Once the index is built, a query engine is initialized using index.as_query_engine(). You'll interact with this engine when you want to search through the GitHub repository. For each natural language query, it retrieves the most similar nodes from the index and passes them to the LLM, which synthesizes an answer.
Introductory Question: The code starts with an introductory question: "What is the repository about?" This serves multiple purposes. It acts as a litmus test to ensure the system is operational and gives the user an immediate sense of what questions can be asked.
Formatting and Display: For better readability, the answer is then wrapped into lines of at most 100 characters using textwrap.fill() from Python's standard library.

User Input: The code enters an infinite loop, inviting the user to ask questions. The user can type any query related to the GitHub repository, and the system will attempt to provide a relevant answer.

Exit Strategy: The loop continues indefinitely until the user types 'exit', providing a simple yet effective way to end the interaction.

Query Execution: Each time the user asks a question, the query_engine.query() method is called. This method consults the index built earlier and retrieves the most relevant information.

Answer Presentation: Like the introductory question, the answer to the user's query is formatted and displayed. This ensures that, regardless of the complexity or length of the answer, it's presented in a readable manner.

Now you're fully equipped to index and query GitHub repositories. To tailor the system to your specific needs, pay attention to the filter_file_extensions parameter, which specifies the types of files to include in your index. The current setting, [".py", ".js", ".ts", ".md"], focuses on Python, JavaScript, TypeScript, and Markdown files.

Consider storing these extensions in your .env file for a more dynamic setup. You can easily switch between different configurations without modifying the codebase. It's a best practice that adds a layer of flexibility to your system.
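One way to do this is sketched below. The `FILE_EXTENSIONS` variable name and the `file_extensions_from_env` helper are hypothetical; the helper reads a comma-separated list from the environment (which `load_dotenv()` populates from `.env`) and falls back to the extensions currently hard-coded in main.py.

```python
from __future__ import annotations

import os

DEFAULT_EXTENSIONS = [".py", ".js", ".ts", ".md"]


def file_extensions_from_env(var_name: str = "FILE_EXTENSIONS") -> list[str]:
    """Read e.g. FILE_EXTENSIONS=".py,.md" from the environment, falling back
    to the defaults used in main.py when the variable is unset or empty."""
    raw = os.getenv(var_name, "")
    extensions = [ext.strip() for ext in raw.split(",") if ext.strip()]
    # Normalize entries written without a leading dot, e.g. "py" -> ".py"
    extensions = [ext if ext.startswith(".") else f".{ext}" for ext in extensions]
    return extensions or list(DEFAULT_EXTENSIONS)


os.environ["FILE_EXTENSIONS"] = "py, .md"
print(file_extensions_from_env())  # ['.py', '.md']
```

In main.py you could then pass `filter_file_extensions=(file_extensions_from_env(), GithubRepositoryReader.FilterType.INCLUDE)`.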

Run the command to start:

`python3 main.py`

The console will ask for a repository URL. To test, I used my repository explaining how to build a ChatGPT plugin using Express.js, so apart from the README it only contains .js files.

This is how the console will look during the interaction:

Please enter the GitHub repository URL: https://github.com/soos3d/chatgpt-plugin-development-quickstart-express
Loading chatgpt-plugin-development-quickstart-express repository by soos3d
Documents uploaded:
{'file_path': 'README.md', 'file_name': 'README.md'}
{'file_path': 'index.js', 'file_name': 'index.js'}
{'file_path': 'src/app.js', 'file_name': 'app.js'}
Uploading to vector store...
Your Deep Lake dataset has been successfully created!
Dataset(path='hub://YOUR_ORG/repository_db', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  -------
 embedding  embedding  (5, 1536)  float32   None
    id        text      (5, 1)      str     None
 metadata     json      (5, 1)      str     None
   text       text      (5, 1)      str     None
Test question: What is the repository about?
==================================================
Answer: The repository is a ChatGPT Plugin Quickstart with Express.js. It provides a foundation for
developing custom ChatGPT plugins using JavaScript and Express.js. The sample plugin in the
repository showcases how ChatGPT can integrate with external APIs, specifically API-Ninja's API, to
enhance its capabilities. The plugin fetches airport data based on a city name provided by the user.

Please enter your question (or type 'exit' to quit): how does the server work?
Your question: how does the server work?
==================================================
Answer: The server in this context works by setting up an Express.js server with various endpoints to serve
a ChatGPT plugin. It initializes the server, configures it to parse JSON in the body of incoming
requests, and sets up routes for serving the plugin manifest, OpenAPI schema, logo image, and
handling API requests. It also defines a catch-all route to handle any other requests. Finally, the
server starts and listens for requests on the specified port.

Please enter your question (or type 'exit' to quit): exit
Exiting, thanks for chatting!

Diving Deeper: LlamaIndex's Low-Level API for Customizations
While the high-level API of LlamaIndex offers a seamless experience for most use cases, there might be situations where you need more granular control over the query logic. This is where the low-level API shines, offering customizations to fine-tune your interactions with the indexed data.

Building the Index
To start, you'll need to build an index from your documents, the same as we have done so far.

# Create an index of the documents
try:
    vector_store = DeepLakeVectorStore(
        dataset_path=dataset_path,
        overwrite=True,
        runtime={"tensor_db": True},
    )
except Exception as e:
    # Without a vector store there is nothing to index into, so stop here
    raise RuntimeError(
        f"An unexpected error occurred while creating or fetching the vector store: {e}"
    ) from e

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

Configuring the Retriever
The retriever is responsible for fetching relevant nodes from the index. LlamaIndex supports various retrieval modes, allowing you to choose the one that best fits your needs:

from llama_index.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(index=index, similarity_top_k=4)

Let's break this down:

In modern retrieval systems, documents and queries are represented as vectors, often generated by machine learning models. When a query is made, its vector is compared to document vectors in the index using metrics like cosine similarity. The documents are then ranked based on their similarity scores to the query.

Top-k Retrieval: Instead of returning all the documents sorted by their similarity, often only the top k most similar documents are of interest. This is where similarity_top_k comes into play. If similarity_top_k=4, the system will retrieve the top 4 most similar documents to the given query.

In the code above, the VectorIndexRetriever is configured to retrieve the top 4 nodes most similar to any given query.

Benefits of using similarity_top_k:

Efficiency: Retrieving only the top-k results can be faster and more memory-efficient than retrieving all results, especially when dealing with large datasets.
Relevance: Users are primarily interested in the most relevant results in many applications. By focusing on the top-k results, the system can provide the most pertinent information without overwhelming the user with less relevant results.
However, choosing an appropriate value for k is essential. Some relevant results might be missed if k is too small. If k is too large, the system might return more results than necessary, which could be less efficient and potentially less helpful to the user.
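Selecting the top k by score does not require sorting every result: Python's `heapq.nlargest` keeps only k candidates while scanning. The sketch below applies it to a handful of made-up similarity scores; it illustrates the idea behind `similarity_top_k` and is not how `VectorIndexRetriever` or Deep Lake implement the search internally.

```python
from __future__ import annotations

import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k (node_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Made-up similarity scores between one query and five nodes
scores = {"n1": 0.82, "n2": 0.74, "n3": 0.91, "n4": 0.12, "n5": 0.40}
print(top_k(scores, k=4))
# [('n3', 0.91), ('n1', 0.82), ('n2', 0.74), ('n5', 0.4)]
```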

While creating this guide, I noticed that values above 4 for this parameter led the LLM to produce off-context responses.

Customize the Query Engine
In LlamaIndex, you can pass extra parameters and customizations to the query engine.

Copy
from llama_index import get_response_synthesizer

response_synthesizer = get_response_synthesizer()
query_engine = RetrieverQueryEngine.from_args(
        retriever=retriever,
        response_mode='default',
        response_synthesizer=response_synthesizer,
        node_postprocessors=[
            SimilarityPostprocessor(similarity_cutoff=0.7)]
    )Copy
Let's break down this customization:

Getting the Response Synthesizer:

Copy
response_synthesizer = get_response_synthesizer()Copy
Here, the get_response_synthesizer function is called to get an instance of the response synthesizer. The query engine will use this synthesizer to combine and refine the information retrieved by the retriever.

Configuring the Query Engine:

Copy
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_mode='default',
    response_synthesizer=response_synthesizer,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)]
)Copy
This section configures and initializes the query engine with the following components and settings:

retriever: This component fetches relevant nodes (or documents) based on the query. It's passed as an argument, and we set it up in the previous step.
response_mode='default': This sets the mode in which the response will be synthesized. The 'default' mode means the system will "create and refine" an answer by sequentially going through each retrieved node, making a separate LLM call for each node. This mode is suitable for generating more detailed explanations.
response_synthesizer=response_synthesizer: The previously obtained response synthesizer is passed to the query engine. This component will generate the final response using the specified response_mode.
node_postprocessors: This is a list of postprocessors applied to the nodes fetched by the retriever. Here a single SimilarityPostprocessor with a similarity_cutoff of 0.7 is used; it filters nodes by their similarity scores, so only nodes scoring at least 0.7 are passed on to the response synthesizer.
All of these are optional except the retriever; I recommend testing the various modes and values to find what best suits your use case.
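To see what the similarity cutoff does, here is a minimal pure-Python sketch of the filtering step. It is not LlamaIndex's code: `ScoredNode` and `filter_by_similarity` are hypothetical stand-ins, and the sketch assumes that a node is kept when its score is at least the cutoff and dropped when it has no score at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node and its similarity score."""
    text: str
    score: Optional[float]


def filter_by_similarity(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Keep nodes whose score is >= cutoff; nodes without a score are dropped."""
    return [n for n in nodes if n.score is not None and n.score >= cutoff]


retrieved = [
    ScoredNode("main.py: load_repo()", 0.91),
    ScoredNode("README.md: badges", 0.42),
    ScoredNode("utils.py: GithubClient", 0.70),
    ScoredNode("setup.cfg", None),
]
kept = filter_by_similarity(retrieved, cutoff=0.7)
print([n.text for n in kept])  # ['main.py: load_repo()', 'utils.py: GithubClient']
```

Raising the cutoff trades recall for precision: fewer, more relevant nodes reach the LLM, but a strict value can leave it with no context at all.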

Exploring Different Response Modes
The RetrieverQueryEngine offers various response modes to tailor the synthesis of responses based on the retrieved nodes.

Here's a breakdown of some of the available modes:

Default Mode:
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='default')
In the default mode, the system processes each retrieved node sequentially, making a separate LLM call for each one. This mode is suitable for generating detailed answers.
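The create-and-refine loop behind this mode can be sketched in a few lines. This is an illustration under assumptions, not LlamaIndex's implementation: `create_and_refine` and `fake_llm` are hypothetical, and the prompt wording is made up rather than taken from LlamaIndex's templates.

```python
from typing import Callable, List


def create_and_refine(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine that answer once per remaining chunk.

    This makes exactly one LLM call per chunk.
    """
    if not chunks:
        raise ValueError("need at least one retrieved chunk")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Question: {query}\nRefined answer:"
        )
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stand-in: records the prompt and returns a numbered label."""
    calls.append(prompt)
    return f"answer-{len(calls)}"


result = create_and_refine("What does main.py do?", ["chunk A", "chunk B", "chunk C"], fake_llm)
print(result, len(calls))  # answer-3 3
```

With three chunks the sketch makes three LLM calls, which is why this mode gets slower and more expensive as more nodes are retrieved.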

Compact Mode:
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='compact')
The compact mode fits as many node text chunks as possible within the maximum prompt size during each LLM call. If there are too many chunks, it refines the answer by processing multiple prompts.
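The packing idea behind compact mode can be sketched as a greedy merge of consecutive chunks. This is a simplification: it measures prompt size in characters instead of tokens, ignores the space taken by the question and the prompt template, and `pack_chunks` is a hypothetical helper, not a LlamaIndex function.

```python
from typing import List


def pack_chunks(chunks: List[str], max_chars: int, sep: str = "\n\n") -> List[str]:
    """Greedily merge consecutive chunks so each packed text stays within max_chars.

    A single chunk longer than max_chars is kept on its own rather than split.
    """
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + sep + chunk
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep filling this prompt
        else:
            if current:
                packed.append(current)  # close the full prompt
            current = chunk  # start a new one with this chunk
    if current:
        packed.append(current)
    return packed


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 90]
packed = pack_chunks(chunks, max_chars=100)
print([len(p) for p in packed])  # [82, 40, 90]
```

Four chunks become three packed prompts here, so the refine pass over them needs three LLM calls instead of four.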

Tree Summarize Mode:
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='tree_summarize')
This mode takes the retrieved nodes and the query, recursively builds a tree of summaries from them, and returns the root node as the response. It's well suited to summarization tasks.
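The bottom-up summarization can be illustrated with a toy recursion. `tree_summarize` and `fake_summarize` here are hypothetical; a real summarizer would call the LLM with the query and a summary prompt, and the group size (`fan_in`, an assumption of this sketch) would depend on how many chunks fit in one prompt.

```python
from typing import Callable, List


def tree_summarize(texts: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2) -> str:
    """Summarize groups of fan_in texts, then summarize those summaries, until one root remains."""
    if not texts:
        raise ValueError("need at least one text")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = texts
    while len(level) > 1:
        # Each pass builds the next level up of the tree.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_summarize(group: List[str]) -> str:
    """Hypothetical summarizer stand-in: shows which inputs were merged."""
    return "(" + "+".join(group) + ")"


root = tree_summarize(["n1", "n2", "n3", "n4", "n5"], fake_summarize)
print(root)  # (((n1+n2)+(n3+n4))+((n5)))
```

The nesting in the output mirrors the tree: leaves are the retrieved nodes and the outermost summary is the root returned as the answer.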

No Text Mode:
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='no_text')
In the no-text mode, the retriever fetches the nodes that would have been sent to the LLM but doesn't send them. This mode allows for inspecting the retrieved nodes without generating a synthesized response.
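Conceptually, no_text mode stops after retrieval. The toy sketch below shows that step on its own: it scores stored vectors against a query vector with cosine similarity and returns the top matches with their scores, without any LLM call. The hand-written 2-D vectors stand in for real embeddings, and `retrieve_only` is a hypothetical helper, not LlamaIndex's retriever; its `top_k` plays the role of the number of nodes to retrieve.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_only(query_vec: List[float], index: Dict[str, List[float]], top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k (node_id, score) pairs; nothing is sent to an LLM."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


index = {
    "utils.py": [1.0, 0.0],
    "main.py": [0.8, 0.6],
    "README.md": [0.0, 1.0],
}
results = retrieve_only([1.0, 0.0], index, top_k=2)
for node_id, score in results:
    print(node_id, round(score, 2))
# utils.py 1.0
# main.py 0.8
```

Inspecting these pairs is a cheap way to tune the cutoff and the number of retrieved nodes before paying for any LLM calls.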

How does LlamaIndex compare to LangChain?
Now that you understand how LlamaIndex works, this is a good time to compare it with the big-name LangChain.

LlamaIndex and LangChain are both frameworks designed to enhance the capabilities of large language models by letting them interact with external data. While they serve similar overarching goals, their approaches and features differ.

Key Features:

LlamaIndex:

Data Connectors: Offers a variety of connectors to ingest data, making it versatile for different data sources. LlamaHub is an excellent source of community-made tools.
Data Structuring: Allows users to structure data using various index types, such as list index, tree index, and even the ability to compose indices.
Query Abstraction: Provides layers of abstraction, enabling both simple and complex data querying.
LangChain:

Modular Design: Comes with modules for tools, agents, and chains, each serving a distinct purpose.
Chains: A series of steps or actions the language model takes, ideal for tasks requiring multiple interactions.
Agents: Autonomous entities that can decide the next steps in a process, adding a layer of decision-making to the model's interactions.
Tools: Utilities that agents use to perform specific tasks, such as searching or querying an index.
Use Cases:

LlamaIndex is best suited for applications that require complex data structures and querying. Its strength lies in handling and querying structured data.
LangChain, on the other hand, excels in scenarios that require multiple interactions with a language model, especially when those interactions involve decision-making or a series of steps.
Of course, you can also combine the two, and the potential of doing so is promising. By integrating LlamaIndex's structured data querying capabilities with LangChain's multi-step interaction and decision-making features, developers can create robust and versatile applications. This combination can offer the best of both worlds.

So which one should you use? Pick the tool that makes your use case easiest to handle; that is usually the deciding factor when I choose between them.

Conclusion
In this comprehensive guide, we've journeyed through the intricacies of integrating LlamaIndex with Activeloop Deep Lake to create a conversational interface for GitHub repositories. We've seen how these powerful tools can transform a static codebase into an interactive, queryable entity. The synergy between LlamaIndex's data structuring and Deep Lake's optimized storage offers a robust solution for managing and understanding your code repositories.

The code we've explored indexes GitHub repositories and makes them accessible through natural language queries. This opens up a plethora of possibilities. Imagine a future where you don't just browse through code; you have conversations with it.

This guide covered the basics of using LlamaIndex and Deep Lake; to practice, try improving and customizing this app.