###### Source: 'Building LLMs for Production' by Louis-Francois Bouchard, Louie Peters

#### 0. Install related packages 

In [None]:
!pip install langchain==0.0.208 deeplake openai==0.27.8 tiktoken 

#### 1. Load the text into a LangChain `Document`

In [1]:
from langchain.document_loaders import TextLoader 

text =""" Google opens up its AI language model PaLM to challenge OpenAI and GPT-3 Google offers developers access to one of its most advanced AI language models: PaLM. The search giant is launching an API for PaLM alongside a number of AI enterprise tools it says will help businesses "generate text, images, code, videos, audio, and more from simple natural language prompts."

PaLM is a large language model, or LLM, similar to the GPT series created by OpenAI or Meta's LLaMA family of models. Google first announced PaLM in April 2022. Like other LLMs, PaLM is a flexible system that can potentially carry out all sorts of text generation and editing tasks. You could train PaLM to be a conversational chatbot like ChatGPT, for example, or you could use it for tasks like summarizing text or even writing code. (It's similar to features Google also announced today for its Workspace apps like Google Docs and Gmail.)
"""

# Write text to local file 
with open('my_file.txt','w') as file: 
    file.write(text)

In [2]:
# Use TextLoader to load text from local file 

loader = TextLoader('my_file.txt')      # Create an instance of TextLoader class 
docs_from_file = loader.load()          # Call the load method to load the text from the file

print(len(docs_from_file))              # Print the number of documents loaded from the file

1


#### 2. Split the documents into Chunks with `CharacterTextSplitter`

`chunk_overlap` is the number of characters shared between two consecutive chunks.

>It preserves context and improves coherence by ensuring that important information is not cut off at the boundaries of chunks.
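
To make the overlap concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. `chunk_with_overlap` is a hypothetical helper written only for illustration; it is not how LangChain's splitters are implemented, since they split on a separator first.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Slice text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

With `chunk_overlap=3`, the last three characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary is still partly visible in both chunks.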

In [3]:
from langchain.text_splitter import CharacterTextSplitter 

# Create a text splitter instance 
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20)
# Split the text into chunks
docs = text_splitter.split_documents(docs_from_file)
print(len(docs))

Created a chunk of size 369, which is longer than the specified 200


2
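
The warning above appears because `CharacterTextSplitter` first splits on a separator (by default a blank line, `"\n\n"`) and then merges the pieces up to `chunk_size`; a single piece that is already longer than `chunk_size` is kept whole. The simplified sketch below mimics that behavior. `split_on_separator` is a hypothetical helper, it ignores overlap, and it is not the library's code; the strings of repeated letters merely stand in for the two paragraphs.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Split on separator, then greedily merge adjacent pieces that fit in chunk_size."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # still fits: keep merging
        else:
            if current:
                chunks.append(current)   # flush what was accumulated so far
            current = piece              # an oversized piece is kept whole, never cut
    if current:
        chunks.append(current)
    return chunks


paragraph_a = "a" * 369  # same size as the chunk reported in the warning
paragraph_b = "b" * 500  # arbitrary stand-in, also longer than chunk_size
chunks = split_on_separator(paragraph_a + "\n\n" + paragraph_b, chunk_size=200)
print([len(c) for c in chunks])  # prints [369, 500]
```

Both paragraphs exceed 200 characters and contain no blank line, so each becomes one chunk. If chunks closer to the requested size are needed, LangChain's `RecursiveCharacterTextSplitter`, which falls back to finer separators, may be a better fit.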


#### 3. Set up a vector store & create an embedding for each chunk

> A vector store is a system to store embeddings, allowing us to query them.

> Chunking is done because LLMs typically have a limited context window, and storing smaller chunks improves retrieval accuracy and ensures relevant sections are retrieved.

We'll utilize the Deep Lake vector store, offered by Activeloop. They provide a cloud-based vector store solution, but other options like Chroma DB would also be suitable.
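
As a rough illustration of what any vector store does internally, the sketch below keeps texts and their embeddings in memory and ranks them by cosine similarity. Everything here is an assumption made for illustration: `toy_embed` is a keyword counter standing in for a real embedding model, and `ToyVectorStore` is a hypothetical class, not the Deep Lake or LangChain API.

```python
import math
from typing import List, Tuple


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: counts of a few hand-picked keywords."""
    vocabulary = ["google", "palm", "openai", "chatbot", "code"]
    lowered = text.lower()
    return [float(lowered.count(word)) for word in vocabulary]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, embedding) pairs in a list and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self._rows.append((text, toy_embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        query_vec = toy_embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_vec, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Google opens up its AI language model PaLM to challenge OpenAI.",
    "You could train PaLM to be a conversational chatbot or to write code.",
])
top = store.similarity_search("How does Google challenge OpenAI?", k=1)
print(top[0])
```

A production store replaces the keyword counter with a learned embedding model and the linear scan with an index, but the add-then-search flow is the same.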

In [4]:
from langchain.embeddings import OpenAIEmbeddings

# Before executing the following code, make sure to have
# your OpenAI key saved in the "OPENAI_API_KEY" environment variable.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [5]:
from langchain.vectorstores import DeepLake 

# Sign up and get your Deep Lake API token (https://app.activeloop.ai/), then save it in the "ACTIVELOOP_TOKEN" environment variable before executing the following code

# create a DeepLake dataset 

my_activeloop_org_id = "bichpham102" # TODO: use your organization id here. (by default, org id is your username)
my_activeloop_dataset_name = 'langchain_course_indexers_retrievers'

dataset_path = f'hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}'
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)
db.add_documents(docs)
# Embeddings are generated for the documents (docs), and both are stored in the DeepLake dataset

Deep Lake Dataset in hub://bichpham102/langchain_course_indexers_retrievers already exists, loading from the storage


Creating 2 embeddings in 1 batches of size 2:: 100%|██████████| 1/1 [00:26<00:00, 26.99s/it]

Dataset(path='hub://bichpham102/langchain_course_indexers_retrievers', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (6, 1536)  float32   None   
    id        text      (6, 1)      str     None   
 metadata     json      (6, 1)      str     None   
   text       text      (6, 1)      str     None   





['252d5ad8-7839-11ef-93ec-000d3ac95fb9',
 '252d5c22-7839-11ef-93ec-000d3ac95fb9']

#### 4. Create & work with a LangChain retriever

In [6]:
retriever = db.as_retriever() 
# calling the method as_retriever() on the vector store instance 

> Once the retriever is created, we can use the `RetrievalQA` class to define a question-answering chain over an external data source. The `chain_type` argument controls how the retrieved documents are passed to the LLM.

### 1. **`chain_type='stuff'`**
   - **How it works**: This method takes **all retrieved documents** and "stuffs" them into a single prompt to the language model (LLM). It concatenates the documents and the query into one large input and then sends that to the LLM to generate a response.
   - **Use case**: Suitable when the **total size of documents is small** enough to fit within the LLM’s context window (the maximum input size the model can process).
   - **Advantages**: It's **simple and fast** because it only makes one call to the LLM.
   - **Limitations**: This approach **fails** when the combined document size exceeds the context length of the model, leading to truncation or incomplete processing of the documents.
   - **Best for**: Short documents or queries that need to be processed all at once.
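
The "stuff" strategy can be sketched in a few lines. This is a simplified illustration, not LangChain's implementation: `stuff_answer` and the prompt wording are assumptions, and `fake_llm` is a hypothetical stand-in for a chat model so the example runs offline.

```python
from typing import Callable, List


def stuff_answer(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """Concatenate every document into one prompt and call the model once."""
    context = "\n\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: records the prompt and reports its size."""
    prompts_seen.append(prompt)
    return f"(answer based on a {len(prompt)}-character prompt)"


docs = ["PaLM is a large language model.", "Google is launching an API for PaLM."]
answer = stuff_answer(fake_llm, docs, "What is Google launching?")
print(len(prompts_seen), "LLM call")
print(answer)
```

The prompt grows with every retrieved document, which is why this strategy only works while the combined text fits in the context window.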

### 2. **`chain_type='map_reduce'`**
   - **How it works**: Once the relevant chunks are retrieved from the vector store, the map-reduce process treats each retrieved chunk **independently**. It runs the LLM on each chunk to generate partial answers (the map step), and then **aggregates** (reduces) these partial results to form a final answer (the reduce step).
   - **Use case**: Ideal for scenarios where the documents are **too large** to fit into the context window at once.
   - **Advantages**: It allows for **scaling** to larger datasets because each chunk can be processed separately. It reduces the risk of important information being missed due to context limitations.
   - **Limitations**: This approach may lead to **inconsistencies** across chunks or loss of context between parts of the document.
   - **Best for**: Handling larger documents that exceed the model’s context window.
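
The map and reduce steps can be sketched as follows. This is a simplified illustration under stated assumptions: `map_reduce_answer` and the prompt wording are hypothetical, `fake_llm` stands in for a chat model, and a single reduce call is assumed to be enough to combine the partial answers.

```python
from typing import Callable, List


def map_reduce_answer(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    # Map step: one independent call per document.
    partial_answers = [
        llm(f"Context:\n{doc}\n\nQuestion: {question}\nPartial answer:")
        for doc in docs
    ]
    # Reduce step: one final call that combines the partial answers.
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}\nFinal answer:")


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: logs the call and returns a label."""
    calls.append(prompt)
    return f"response #{len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
final = map_reduce_answer(fake_llm, docs, "What do the chunks say?")
print(len(calls), "LLM calls")
print(final)
```

Three documents cost four calls here (three map calls plus one reduce call), and no map call ever sees another document, which is the source of the possible inconsistencies mentioned above.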

### 3. **`chain_type='refine'`**
   - **How it works**: In this method, the LLM processes the first document, generates an answer, and then **iteratively refines** that answer by incorporating information from each subsequent document. Each document is used to improve or adjust the previous answer.
   - **Use case**: Best for situations where you want the model to **build upon** or **refine** an answer by looking at documents one-by-one.
   - **Advantages**: This method is beneficial when **context continuity** is essential because the model refines its response with each new document. It ensures that **new insights** from each document are incorporated into the final answer.
   - **Limitations**: It can be **slow** as the model processes each document individually and performs multiple iterations.
   - **Best for**: Complex queries where iterative improvements in the answer based on additional information are crucial.
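
The iterative refinement loop can be sketched like this. As before, this is an illustration rather than LangChain's implementation: `refine_answer` and the prompt wording are assumptions, and `fake_llm` is a hypothetical stand-in for a chat model.

```python
from typing import Callable, List


def refine_answer(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    if not docs:
        raise ValueError("at least one document is required")
    # Initial answer from the first document only.
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}\nAnswer:")
    # Each later document gets a chance to improve the running answer.
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\n\nNew context:\n{doc}\n\n"
            f"Question: {question}\nRefined answer:"
        )
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: logs the call and returns a draft label."""
    calls.append(prompt)
    return f"draft {len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
final = refine_answer(fake_llm, docs, "What do the chunks say?")
print(len(calls), "LLM calls")
print(final)
```

Each call depends on the previous draft, so the calls cannot run in parallel; that sequential dependency is what makes this strategy slower than the others.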

### Summary of Use Cases:
- **`stuff`**: Use when documents are small enough to fit into the context window. Simple and fast but limited by input size.
- **`map_reduce`**: Ideal for large documents that need to be broken into chunks. Efficient for handling large datasets but may lead to slight inconsistencies.
- **`refine`**: Best for complex answers that require refinement over time by integrating information from multiple documents. Provides thoroughness but is slower due to multiple iterations.

These methods allow you to tailor the retrieval and answering process based on the size and complexity of the documents you're working with.

In [7]:
from langchain.chains import RetrievalQA 
from langchain.chat_models import ChatOpenAI


llm = ChatOpenAI(model_name='gpt-3.5-turbo') 

# create a retrieval chain 
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
)

In [8]:
query = 'How Google plans to challenge OpenAI?'
response = qa_chain.run(query)
print(response)

Google plans to challenge OpenAI by offering developers access to its advanced AI language model, PaLM. By launching an API for PaLM and providing AI enterprise tools, Google aims to help businesses generate text, images, code, videos, audio, and more from simple natural language prompts. PaLM is designed to be a flexible system that can carry out various text generation and editing tasks, similar to the GPT series created by OpenAI.

#### 5. Compress retrieved documents with `ContextualCompressionRetriever`

> A `ContextualCompressionRetriever` wraps a base retriever and passes each retrieved document through a compressor. Here, `LLMChainExtractor` asks the LLM to keep only the passages that are relevant to the query.


In [15]:
from langchain.retrievers import ContextualCompressionRetriever 
from langchain.retrievers.document_compressors import LLMChainExtractor 
from langchain.chat_models import ChatOpenAI

# create a chat model wrapper (gpt-3.5-turbo)
llm = ChatOpenAI(model_name='gpt-3.5-turbo') 

# create compressor for the retriever 
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)

In [16]:
# retrieving compressed documents 
query = 'How Google plans to challenge OpenAI?'
retrieved_docs = compression_retriever.get_relevant_documents(query)
print(retrieved_docs[0].page_content)

Google opens up its AI language model PaLM to challenge OpenAI
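
The output above is much shorter than the stored chunk because the compressor kept only the part that answers the query. The offline sketch below imitates that flow under loud assumptions: `keyword_extractor` is a crude keyword filter standing in for the LLM-based extractor, and `base_retrieve` and `compressed_retrieve` are hypothetical helpers, not LangChain classes.

```python
import re
from typing import Callable, List


def keyword_extractor(doc: str, query: str) -> str:
    """Hypothetical stand-in for the LLM extractor.

    Keeps only sentences that share a word longer than four letters with the query.
    """
    query_words = {w for w in re.findall(r"\w+", query.lower()) if len(w) > 4}
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    kept = [s for s in sentences if query_words & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


def compressed_retrieve(
    base_retrieve: Callable[[str], List[str]],
    compress: Callable[[str, str], str],
    query: str,
) -> List[str]:
    """Retrieve first, then shrink each document to its query-relevant part."""
    compressed = []
    for doc in base_retrieve(query):
        extract = compress(doc, query)
        if extract:  # documents with nothing relevant are dropped entirely
            compressed.append(extract)
    return compressed


def base_retrieve(query: str) -> List[str]:
    """Hypothetical base retriever that always returns the same two chunks."""
    return [
        "Google opens up PaLM to challenge OpenAI. PaLM was first announced in April 2022.",
        "You could use it for tasks like summarizing text or even writing code.",
    ]


results = compressed_retrieve(
    base_retrieve, keyword_extractor, "How does Google plan to challenge OpenAI?"
)
print(results)
```

The first chunk is reduced to its one relevant sentence and the second chunk is dropped, so less irrelevant text reaches the final prompt. With a real LLM extractor, every retrieved document costs one extra model call.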
