<h3 style="text-align:center">Garrett Zimmerman</h3>
<h3 style="text-align:center">January 6, 2024</h3>
<h1 style="font-weight:bold;text-align:center">Concepts of RAG Retrieval</h1>
<h2 style="font-weight:bold">Goal:</h2>
<p>
<dl>
    <dt style="font-weight:bold">RAG Retrieval:</dt> 
    <dd>Review the concept of RAG Retrieval</dd>
    <dd>Question and Answering vs. Fine-Tuning a Model</dd>
    <dd>Advanced Retrieval vs. Basic Retrieval</dd>
    <dt style="font-weight:bold">Build Several Examples:</dt>
    <dd>Explain different advanced retrieval methods
        <ul>
            <li>LlamaIndex</li>
            <li>LangChain</li>
            <li>Pinecone</li>
            <li>ChromaDB</li>
        </ul>
    </dd> 
</dl>
</p> 

<h2 style="font-weight:bold">RAG (Retrieval-Augmented Generation) Retrieval:</h2> 
<p>RAG is a natural language processing method that combines the retrieval of relevant documents with a generative language model, so that the model can ground its responses in that material and answer more informatively and accurately. It is particularly useful for question-answering tasks and other applications that benefit from access to a broad range of external information.</p>
<h3>Here's a breakdown of how RAG retrieval works</h3>
<dl>
    <dt>1. Retrieval Component</dt> 
    <dd>The first step in RAG retrieval involves retrieving relevant documents or information. When a query or question is posed, the system searches a large database of texts to find the most relevant documents. This database can be anything from a simple collection of articles to a comprehensive knowledge base.</dd>
    <dt>2. Generative Component</dt> 
    <dd>Once the relevant documents are retrieved, a generative language model, like GPT (Generative Pretrained Transformer), is used. This model takes the input query and the retrieved documents as context to generate a response.</dd>
    <dt>3. Combination of Retrieval and Generation</dt> 
    <dd>The key aspect of RAG retrieval is how it combines these two components. The retrieved documents provide the model with specific, detailed information relevant to the query, which the model might not have in its pre-trained knowledge. The generative model then synthesizes this information to create a coherent and contextually appropriate response.</dd>
    <dt>4. Benefits</dt>
    <dd>This approach allows the language model to answer questions or provide information that is more up-to-date, detailed, and specific than what it could generate based solely on its pre-trained knowledge. It's particularly useful for queries where the answer might not be common knowledge or is very specific.</dd>
    <dt>5. Applications</dt>
    <dd>RAG retrieval is commonly used in advanced chatbots, question-answering systems, and research tools. It's especially valuable in situations where keeping up with the latest information or covering a vast range of topics is essential.</dd>
    <dt>Example:</dt> 
    <dd>Suppose someone asks a question about a recent scientific discovery. A RAG retrieval system would first find relevant scientific articles or papers about that discovery, then use a generative model to construct an answer that accurately reflects the current understanding as presented in those documents.</dd>
</dl>
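<p>The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any particular library implements RAG: the word-overlap scoring, the prompt wording, and the <code>placeholder_llm</code> function are all hypothetical stand-ins, and a real system would call an actual language model and use a stronger retriever.</p>

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Retrieval component: rank documents by how many query terms they share."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best matches that share at least one term with the query.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, context_docs):
    """Combination step: place the retrieved text and the query in one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer(query, documents, llm):
    """Run retrieval, then pass the augmented prompt to the generative component."""
    context_docs = retrieve(query, documents)
    return llm(build_prompt(query, context_docs))


def placeholder_llm(prompt):
    """Hypothetical stand-in for a real generative model call."""
    return f"(model response to a {len(prompt)}-character prompt)"


documents = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "The Louvre is a museum in Paris.",
]

print(answer("Where is the Eiffel Tower?", documents, placeholder_llm))
```

<p>Swapping <code>placeholder_llm</code> for a function that calls a real model is the only change needed to turn the sketch into a working, if very basic, pipeline.</p>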

<h2 style="font-weight:bold">Question and Answering vs. Fine-Tuning a Model</h2>
<h3>Question and Answering (Q&A):</h3>
<dl>
    <dt>Definition:</dt> 
    <dd>Question and Answering in the context of AI and language models refers to the process where a model responds to queries posed in natural language. The goal is to provide accurate, relevant, and concise answers based on the model's training and knowledge.</dd>
    <dt>Usage:</dt>
    <dd>Q&A systems are widely used in chatbots, virtual assistants, and information retrieval systems. They are designed to understand a wide range of questions and provide answers that are drawn from their training data or real-time data sources.</dd>
    <dt>Mechanism:</dt> 
    <dd>These systems typically use pre-trained models like GPT-3, which have been trained on vast amounts of text data, enabling them to generate responses based on patterns and information they have learned.</dd>
</dl>
<h3>Fine-Tuning a Model:</h3>
<dl>
    <dt>Definition:</dt> 
    <dd>Fine-tuning refers to the process of taking a pre-trained language model and further training it on a specific dataset to specialize its responses for a particular domain or task.</dd>
    <dt>Usage:</dt> 
    <dd>Fine-tuning is common when you need a model to perform well on a specific type of data or task that was not the primary focus of the original, broader training. For instance, fine-tuning a model for medical Q&A, legal advice, or technical support.</dd>
    <dt>Mechanism:</dt>
    <dd>During fine-tuning, the model's weights are slightly adjusted so that it becomes more adept at understanding and generating responses relevant to the specialized domain. This process requires a smaller dataset and less computational power compared to training a model from scratch.</dd>
</dl>
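<p>To make the idea of "slightly adjusting the weights" concrete, here is a toy sketch. It assumes a tiny linear model in place of a language model, and the starting weights and the three-example "domain dataset" are invented for illustration. The point is only the mechanism: training starts from existing weights and takes a few small gradient steps on a small dataset.</p>

```python
def predict(weights, bias, features):
    """Linear model: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mean_squared_error(weights, bias, dataset):
    """Average squared difference between predictions and targets."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in dataset) / len(dataset)


def fine_tune(weights, bias, dataset, learning_rate=0.05, epochs=50):
    """Start from the given weights and nudge them with small gradient steps."""
    weights = list(weights)  # copy so the "pre-trained" weights stay untouched
    for _ in range(epochs):
        for features, target in dataset:
            error = predict(weights, bias, features) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical "pre-trained" parameters and a small domain-specific dataset.
pretrained_weights, pretrained_bias = [1.0, 1.0], 0.0
domain_data = [([1.0, 0.0], 1.2), ([0.0, 1.0], 0.8), ([1.0, 1.0], 2.0)]

loss_before = mean_squared_error(pretrained_weights, pretrained_bias, domain_data)
tuned_weights, tuned_bias = fine_tune(pretrained_weights, pretrained_bias, domain_data)
loss_after = mean_squared_error(tuned_weights, tuned_bias, domain_data)

print("loss before:", loss_before)
print("loss after: ", loss_after)
print("tuned weights:", tuned_weights)
```

<p>The tuned weights stay close to where they started, while the error on the domain data drops. That is the same trade the section describes, at a much smaller scale.</p>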
<h3>Summary:</h3>
<p>Q&A systems answer user queries directly using pre-trained or fine-tuned models, while fine-tuning is the training step that tailors a model to a specific domain or task.</p>

<h2 style="font-weight: bold">Advanced Retrieval vs. Basic Retrieval</h2>
<h3>Basic Retrieval:</h3>
<dl>
    <dt>Definition:</dt> 
    <dd>Basic retrieval involves straightforward methods of finding information in response to a query. This often involves keyword matching, where the system looks for documents or data entries that contain the same words or phrases as the query.</dd>
    <dt>Limitations:</dt> 
    <dd>Basic retrieval can be limited in handling complex queries, understanding context, or providing nuanced responses. It's generally less effective when dealing with ambiguous or multi-faceted questions.</dd>
</dl>
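<p>A minimal sketch of keyword matching, assuming a small in-memory list of documents invented for illustration, shows both how basic retrieval works and where it falls short.</p>

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(query, documents):
    """Return the documents that contain every word of the query."""
    terms = set(tokenize(query))
    return [doc for doc in documents if terms <= set(tokenize(doc))]


documents = [
    "How to repair a car engine",
    "Automobile maintenance schedule",
    "Baking bread at home",
]

print(keyword_search("car", documents))             # ['How to repair a car engine']
print(keyword_search("vehicle repair", documents))  # []
```

<p>The query "car" misses the document about automobiles, and "vehicle repair" finds nothing at all, even though two of the documents are clearly relevant. Exact word matching has no notion of synonyms or meaning.</p>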
<h3>Advanced Retrieval:</h3>
<dl>
    <dt>Definition:</dt> 
    <dd>Advanced retrieval encompasses more sophisticated techniques that go beyond simple keyword matching. These methods might include semantic search, contextual understanding, and the integration of AI models.</dd>
    <dt>Features:</dt>
    <dd>
        <dl>
            <dt>Semantic Understanding:</dt>
            <dd>Advanced systems understand the meaning behind words in a query, allowing them to retrieve information that is conceptually related, even if it doesn’t contain the exact keywords.</dd>
            <dt>Context Awareness:</dt>
            <dd>They can consider the context of a query, providing more relevant and precise results. For example, they can take into account the user's previous queries or the broader topic at hand.</dd>
            <dt>Integration with AI:</dt>
            <dd>Advanced retrieval often involves the use of AI models like neural networks, which can process and understand natural language at a more sophisticated level.</dd>
        </dl>
    </dd>
    <dt>Applications:</dt> 
    <dd>Advanced retrieval is essential in complex domains where queries require deep understanding and nuanced responses, such as in legal research, academic literature search, and specialized information databases.</dd>
</dl>
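<p>Semantic search is usually built on embeddings: each text is mapped to a vector, and texts with similar meaning end up close together. The sketch below assumes hand-made three-dimensional vectors (hypothetical values, with axes loosely meaning "vehicles", "cooking", and "finance") in place of a real embedding model, and ranks documents by cosine similarity to the query vector.</p>

```python
import math

# Hypothetical hand-made "embeddings"; a real system would compute these
# with an embedding model rather than writing them by hand.
TOY_EMBEDDINGS = {
    "How to repair a car engine": [0.9, 0.0, 0.1],
    "Automobile maintenance schedule": [0.8, 0.0, 0.2],
    "Baking bread at home": [0.0, 1.0, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vector, embeddings, top_k=2):
    """Return the top_k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Stands in for the embedding of a query such as "vehicle upkeep".
query_vector = [1.0, 0.0, 0.0]
results = semantic_search(query_vector, TOY_EMBEDDINGS)
print(results)
```

<p>Both vehicle-related documents rank above the baking one, even though a query like "vehicle upkeep" shares no words with either of them.</p>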
<h3>Summary:</h3>
<p>Basic retrieval relies on simple keyword matching, whereas advanced retrieval uses more sophisticated techniques for a deeper understanding and contextual awareness in information retrieval.</p>

<h2 style="font-weight:bold">Examples</h2>
<h3 style="font-weight:bold">LlamaIndex</h3>
<p>LlamaIndex uses RAG Retrieval. By using RAG Retrieval, LlamaIndex overcomes some weaknesses of the fine-tuning approach:</p>
<ul>
    <li>There’s no training involved, so it’s cheap</li>
    <li>Data is fetched only when you ask for it, so it’s always up to date</li>
    <li>LlamaIndex can show you the retrieved documents, so it’s more trustworthy</li>
</ul>
<h4 style="font-weight:bold">Tools Provided by LlamaIndex</h4>
<dl>
    <dt>Data Connectors</dt> 
    <dd>Ingest your existing data from their native source and format. These could be APIs, PDFs, SQL, and (much) more</dd>
    <dt>Data Indexes</dt> 
    <dd>Structure your data in intermediate representations that are easy and performant for LLMs to consume</dd>
    <dt>Engines</dt>
    <dd>Provide natural language access to your data</dd>
    <dd>For example: Query engines are powerful retrieval interfaces for knowledge-augmented output</dd>
    <dd>For example: Chat engines are conversational interfaces for multi-message, “back and forth” interactions with your data</dd>
    <dt>Data Agents</dt>
    <dd>LLM-powered knowledge workers augmented by tools, from simple helper functions to API integrations and more</dd>
    <dt>Application Integrations</dt>
    <dd>Tie LlamaIndex back into the rest of your ecosystem. This could be LangChain, Flask, Docker, ChatGPT, or… anything else</dd>
</dl>
<h4>Documentation</h4>
<p>To find out more about LlamaIndex and to see its documentation, please visit: <a href="https://docs.llamaindex.ai/en/stable">LlamaIndexDoc</a></p>
<h4>Getting Started in LlamaIndex</h4>
<p>Run this line in the terminal to install LlamaIndex: pip install llama-index</p>
<p>In the following example, we use LlamaIndex to find out information about the author of a written piece. Make sure the folder named data is in the same directory as the Python script or the Jupyter Notebook for this script to work. If the data folder is located elsewhere, update the path in the code.</p>


In [7]:
# Import dependencies and load the API key. The key should be stored in a file named .env
# in the same directory as this script. If it is stored elsewhere, update the code to point to that file.
# Note: these top-level imports match llama-index releases before 0.10;
# newer releases expose the same names under llama_index.core.
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
import os
from dotenv import load_dotenv
import logging
import sys

# Load the environment variables from .env file
load_dotenv()

# Read the OpenAI API key from the environment
api_key = os.environ.get("OPENAI_API_KEY")

if api_key is None:
    raise ValueError("OPENAI_API_KEY is not set in the environment variables")



In [9]:
# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

The author mentioned that before college, they worked on writing and programming. They wrote short stories and tried writing programs on the IBM 1401 computer. They also mentioned getting a microcomputer, a TRS-80, and started programming on it. They wrote simple games, a program to predict rocket heights, and even a word processor.
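<p>One of the advantages listed earlier is that LlamaIndex can show the retrieved documents. The helper below is a small sketch of how that might look. It assumes the <code>response</code> object returned by the query above exposes a <code>source_nodes</code> list whose items carry a <code>score</code> and a <code>node</code> with a <code>get_content()</code> method, which matches the LlamaIndex API as I understand it; check the documentation for your installed version. The function name <code>summarize_sources</code> is my own.</p>

```python
def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for the nodes retrieved for a response."""
    summaries = []
    for item in response.source_nodes:
        # Collapse newlines so each snippet prints on a single line.
        snippet = item.node.get_content().strip().replace("\n", " ")[:max_chars]
        summaries.append((item.score, snippet))
    return summaries


# Usage with the `response` object from the query above:
# for score, snippet in summarize_sources(response):
#     print(score, snippet)
```

<p>Printing the snippets next to the generated answer makes it easy to check whether the answer is actually supported by the retrieved text.</p>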


<h3 style="font-weight:bold">LangChain</h3>
<p>LangChain is a framework for developing applications powered by language models. It enables applications that:</p>
<ul>
<li>Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)</li>
<li>Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)</li>
</ul>
<p>This framework consists of LangChain Libraries, LangChain Templates, LangServe, and LangSmith.</p>
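<p>To illustrate what "context-aware" means in practice, the sketch below assembles a prompt from the three sources of context mentioned above: instructions, few-shot examples, and grounding content. It is plain Python and deliberately does not use LangChain's API; the function name and prompt layout are assumptions made for illustration. LangChain provides its own prompt abstractions for this job.</p>

```python
def build_context_aware_prompt(instructions, few_shot_examples, grounding_content, question):
    """Combine instructions, worked examples, and grounding text into one prompt."""
    parts = [instructions.strip(), ""]
    # Few-shot examples show the model the expected question/answer format.
    for example_question, example_answer in few_shot_examples:
        parts.append(f"Q: {example_question}\nA: {example_answer}")
    parts.append("")
    # Grounding content is the material the answer should be based on.
    parts.append(f"Context:\n{grounding_content.strip()}")
    parts.append("")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


prompt = build_context_aware_prompt(
    instructions="Answer in one short sentence, using only the context.",
    few_shot_examples=[("What color is the sky on a clear day?", "Blue.")],
    grounding_content="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(prompt)
```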
<p>Together, these products simplify the entire application lifecycle:</p>
<dl>
    <dt>Develop:</dt>
    <dd>Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.</dd>
    <dt>Productionize:</dt>
    <dd>Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.</dd>
    <dt>Deploy:</dt>
    <dd>Turn any chain into an API with LangServe.</dd>
</dl>
<p>For more information and documentation on LangChain, visit: <a href="https://python.langchain.com/docs/get_started/introduction">LangChain Doc</a></p>