In [1]:
%load_ext autoreload
%autoreload 2

## Retrieval Augmented Generation (RAG)

Date: 03 May 2023

### What is RAG

[As defined by ChatGPT] 

RAG, or Retrieval Augmented Generation, is a method used in natural language processing that combines 
retrieval-based and generative models to produce more informed and contextually relevant responses. RAG operates in two main 
steps: retrieval and generation.

1. Retrieval: In this step, the model searches a large collection of documents or knowledge sources to find the most relevant 
and informative pieces of text that can help answer a given query. This is typically done using an efficient retrieval mechanism, 
such as dense vector representation or sparse term-based matching.

2. Generation: With the retrieved documents or passages in hand, the model generates a response to the input query, conditioning 
on both the original query and the information gathered from the relevant documents. This allows the model to produce more accurate, 
detailed, and context-aware responses.

In essence, RAG is a hybrid approach that leverages the strengths of both retrieval-based and generative models to provide better 
responses by incorporating external knowledge from large document collections or knowledge bases. 

[DC: Checks out!]

[Lewis et al.](https://arxiv.org/abs/2005.11401) introduced RAG models in 2020. In their formulation, the parametric memory is a
pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural 
retriever.
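
The two steps above can be sketched without any libraries. This is a toy illustration, not the method from the paper: word overlap stands in for a neural retriever, and `tokenize`, `retrieve`, `build_prompt`, `toy_corpus` and `toy_query` are hypothetical names introduced only for this example.

```python
def tokenize(text):
    """Lowercase, strip basic punctuation, and return the set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, corpus, k=2):
    """Step 1 (retrieval): rank passages by word overlap with the query, keep the top k."""
    q = tokenize(query)
    return sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)[:k]

def build_prompt(query, passages):
    """Step 2 (generation input): condition the generator on the query plus retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using the context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

toy_corpus = [
    "FAISS is a library for efficient similarity search.",
    "Flan-T5 is an instruction-tuned seq2seq model.",
    "Bubble sort repeatedly swaps adjacent elements.",
]
toy_query = "Which library does similarity search?"

top_passages = retrieve(toy_query, toy_corpus, k=1)
rag_prompt = build_prompt(toy_query, top_passages)
print(rag_prompt)  # this string is what a generator model would receive
```

In a real pipeline the overlap score would be replaced by embedding similarity, and the prompt would be passed to a language model.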


In this demo, we will be using LangChain along with HuggingFace's transformers library to build a RAG model.

### What is LangChain

LangChain is a framework for developing applications powered by language models. 

_LangChain Principles:_
- Be data-aware: connect a language model to other sources of data
- Be agentic: allow a language model to interact with its environment

Source: [LangChain documentation](https://python.langchain.com/en/latest/index.html)  
Version: 0.0.154 as of 02 May 2023

## Information Retrieval & Retrieval Augmented Generation

<img src="https://substackcdn.com/image/fetch/w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4063347e-8920-40c6-86b3-c520084b303c_1272x998.jpeg" alt="" width="600" height="500">



Source: [Finetuning Large Language Models by Sebastian Raschka](https://magazine.sebastianraschka.com/p/finetuning-large-language-models)

## Requirements (for this demo)
- Offline or Local model
- Initial model is somewhat basic
- HuggingFace Transformers
- Read documents and extract similar documents based on query
- Read documents and generate answer based on query

## LangChain: Models

Generic interface for models, e.g. LLMs, chat models, and text embedding models. Read more [here](https://python.langchain.com/en/latest/modules/models.html)

```python
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-ada-001")

from langchain import HuggingFaceHub
llm = HuggingFaceHub(repo_id="google/flan-t5-xl")

from langchain.llms import Cohere
llm = Cohere()
```

## Local Models: HuggingFace Pipeline

Note: This is experimental code. In your own implementation, wrap it in functions or classes.

```python3
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM

model_id = 'flan-t5-large' # any local model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True) # 8bit in A10, A100 etc.

pipe = pipeline(
    task="text2text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)
```

In [2]:
from langchain.llms import HuggingFacePipeline
from langchain.retrievers.document_compressors import LLMChainExtractor
from transformers import pipeline

model_name = "google/flan-t5-xl"
# model_name = "hkunlp/instructor-large"


# flan-t5-large or flan-t5-xl are also good initial models
pipe = pipeline("text2text-generation", 
                model=model_name, 
                max_length=300)
local_llm = HuggingFacePipeline(pipeline=pipe)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [3]:
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = "hkunlp/instructor-large"

model_kwargs = {'device': 'cuda'}
instructor_embeds = HuggingFaceInstructEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs
)

load INSTRUCTOR_Transformer
max_seq_length  512


#TODO: Add slide on Flan T5

<img src="https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png" alt="" width="700" height="500">

- Paper <https://arxiv.org/abs/2210.11416>
- HuggingFace Hub <https://huggingface.co/google/flan-t5-xl>

## LangChain: Indexes

Utility functions for combining language models with your own private data. Read more [here](https://python.langchain.com/en/latest/modules/indexes.html).

Now we are talking.

### [Document Loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html)

Email, Images, PDF, s3 Directory and Files, Word Documents, Powerpoints etc.

### [Text Splitters](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html)

Character Text Splitter, Recursive Character Text Splitter etc.
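
The cells in this notebook emit warnings such as "Created a chunk of size 1385, which is longer than the specified 500". A plausible reading is that a separator-based splitter only cuts at its separator (a blank line by default, as far as I know), so any single paragraph longer than `chunk_size` is kept whole. The sketch below is a simplified stand-in written for illustration, not LangChain's implementation; `split_on_separator` is a hypothetical helper.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size
    characters. A single piece longer than chunk_size is emitted whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
        else:
            if current:
                chunks.append(current)  # flush what we have so far
            current = piece
            if len(piece) > chunk_size:
                print(f"oversized piece: {len(piece)} > {chunk_size}")
    if current:
        chunks.append(current)
    return chunks

sample_text = "short para\n\n" + "x" * 40 + "\n\nanother short one"
sample_chunks = split_on_separator(sample_text, chunk_size=20)
print([len(c) for c in sample_chunks])
```

If oversized chunks are a problem, a splitter that falls back to finer separators (newline, space, character), such as the Recursive Character Text Splitter, is the usual alternative.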

### [VectorStores](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html)

ElasticSearch, FAISS, Qdrant, Redis, Weaviate

### [Retrievers](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)

Contextual Compression Retriever (!!!), SVM Retriever, TF-IDF Retriever, Time Weighted VectorStore Retriever etc.



In [4]:
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings
from langchain.document_loaders import PDFMinerLoader, TextLoader
from langchain.vectorstores import FAISS

def pretty_print_docs(docs):
    """ Helper function for printing docs."""
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

documents = PDFMinerLoader(file_path="./arxiv_papers/2203.02155.pdf").load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents=documents,)

Created a chunk of size 1385, which is longer than the specified 500
Created a chunk of size 671, which is longer than the specified 500
Created a chunk of size 1438, which is longer than the specified 500
Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 759, which is longer than the specified 500
Created a chunk of size 615, which is longer than the specified 500
Created a chunk of size 715, which is longer than the specified 500
Created a chunk of size 754, which is longer than the specified 500
Created a chunk of size 752, which is longer than the specified 500
Created a chunk of size 1078, which is longer than the specified 500
Created a chunk of size 802, which is longer than the specified 500
Created a chunk of size 1510, which is longer than the specified 500
Created a chunk of size 1607, which is longer than the specified 500
Created a chunk of size 503, which is longer than the specified 500
Created a chunk of size 973, which is longe

In [5]:
import os
from langchain.document_loaders import UnstructuredPDFLoader

# Load all GPT based papers
pdf_folder_path = './arxiv_papers/gpt4-papers/'
loaders = [PDFMinerLoader(os.path.join(pdf_folder_path, fn)) for fn in os.listdir(pdf_folder_path)]

## LangChain: Indexes

```python
query = "<YOUR_QUERY_HERE>"  # placeholder: replace with your question

# `texts` are the split documents and `instructor_embeds` the embedding model from the cells above
from langchain.vectorstores import FAISS
db_faiss = FAISS.from_documents(documents=texts, 
                                embedding=instructor_embeds)
similar_docs = db_faiss.similarity_search(query)

from langchain.vectorstores import ElasticVectorSearch
db_elastic = ElasticVectorSearch.from_documents(documents=texts, 
                                                embedding=instructor_embeds, 
                                                elasticsearch_url="http://localhost:9200")
similar_docs = db_elastic.similarity_search(query)
```
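
To make the idea behind `similarity_search` concrete, here is a minimal in-memory sketch: store one vector per text and return the texts whose vectors have the highest cosine similarity to the query vector. `TinyVectorStore`, `vocab_embed` and the five-word vocabulary are assumptions made up for this example; real stores such as FAISS use learned embeddings and optimized indexes.

```python
import math

VOCAB = ["gpt", "faiss", "index", "model", "search"]  # hypothetical toy vocabulary

def vocab_embed(text):
    """Toy embedding: count how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in VOCAB]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items = []  # list of (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.items.append((self.embed_fn(text), text))

    def similarity_search(self, query, k=4):
        """Return the k stored texts most similar to the query."""
        q = self.embed_fn(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore(vocab_embed)
store.add_texts([
    "gpt is a language model",
    "faiss builds an index for vector search",
    "the weather is nice",
])
best_match = store.similarity_search("search the faiss index", k=1)
print(best_match)
```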

In [6]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS 
# More details about FAISS: https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/

pdf_loader = PDFMinerLoader(file_path="./arxiv_papers/2203.02155.pdf")

index = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=instructor_embeds,
    text_splitter=CharacterTextSplitter(
        chunk_size=300,
        # chunk_overlap=0
    )
).from_loaders(loaders=[pdf_loader])

Created a chunk of size 1385, which is longer than the specified 300
Created a chunk of size 491, which is longer than the specified 300
Created a chunk of size 486, which is longer than the specified 300
Created a chunk of size 480, which is longer than the specified 300
Created a chunk of size 671, which is longer than the specified 300
Created a chunk of size 1438, which is longer than the specified 300
Created a chunk of size 506, which is longer than the specified 300
Created a chunk of size 759, which is longer than the specified 300
Created a chunk of size 615, which is longer than the specified 300
Created a chunk of size 449, which is longer than the specified 300
Created a chunk of size 471, which is longer than the specified 300
Created a chunk of size 495, which is longer than the specified 300
Created a chunk of size 715, which is longer than the specified 300
Created a chunk of size 754, which is longer than the specified 300
Created a chunk of size 752, which is longer t

In [7]:
query = "What is the best language model?"
results = index.query(question=query,
                      llm = local_llm, )
results

'a lightweight classifier'

In [8]:
index = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=instructor_embeds,
    text_splitter=CharacterTextSplitter(
        chunk_size=300,
        chunk_overlap=0
    )
).from_loaders(loaders=loaders)

Created a chunk of size 1385, which is longer than the specified 300
Created a chunk of size 491, which is longer than the specified 300
Created a chunk of size 486, which is longer than the specified 300
Created a chunk of size 480, which is longer than the specified 300
Created a chunk of size 671, which is longer than the specified 300
Created a chunk of size 1438, which is longer than the specified 300
Created a chunk of size 506, which is longer than the specified 300
Created a chunk of size 759, which is longer than the specified 300
Created a chunk of size 615, which is longer than the specified 300
Created a chunk of size 449, which is longer than the specified 300
Created a chunk of size 471, which is longer than the specified 300
Created a chunk of size 495, which is longer than the specified 300
Created a chunk of size 715, which is longer than the specified 300
Created a chunk of size 754, which is longer than the specified 300
Created a chunk of size 752, which is longer t

In [9]:
query = "What can you tell me about GPT from OpenAI?"
results = index.query(question=query,
                      llm = local_llm)
results

'GPT-4 substantially improves over previous models in the ability to follow user intent'

In [None]:
#TODO: Try querying with ElasticSearch

from langchain.vectorstores import ElasticVectorSearch
index = VectorstoreIndexCreator(
    vectorstore_cls=ElasticVectorSearch,
    embedding=instructor_embeds,
    text_splitter=CharacterTextSplitter(
        chunk_size=300,
        chunk_overlap=0
    )
).from_loaders(loaders=loaders)

## LangChain: Prompts

Prompt Management, Optimization and Serialization. Read more [here](https://docs.langchain.com/docs/components/prompts)

```python
from langchain import PromptTemplate

template = """Question: {question}

Let's think step by step.

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])
user_input = input("Enter your question: ")
prompt.format(question=user_input)
```
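
Conceptually, a prompt template is a format string plus a declared list of variables that is checked before formatting. The class below is a rough sketch of that idea using only the standard library; `TinyPromptTemplate` is a hypothetical name, and the validation shown is an assumption about what is useful, not a copy of LangChain's behavior.

```python
from string import Formatter

class TinyPromptTemplate:
    def __init__(self, template, input_variables):
        # Collect the {placeholders} actually present in the template.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(f"template uses {sorted(found)}, declared {sorted(input_variables)}")
        self.template = template
        self.input_variables = list(input_variables)

    def format(self, **kwargs):
        """Fill the template, failing early if a declared variable is missing."""
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise ValueError(f"missing variables: {sorted(missing)}")
        return self.template.format(**kwargs)

tiny_template = """Question: {question}

Let's think step by step.

Answer: """

tiny_prompt = TinyPromptTemplate(template=tiny_template, input_variables=["question"])
rendered = tiny_prompt.format(question="Why is the sky blue?")
print(rendered)
```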

## LangChain: Chain

A chain is a sequence of calls, possibly to multiple models. Read more [here](https://python.langchain.com/en/latest/modules/chains.html)

- `LLMChain`
- `SequentialChain`
- Custom `Chain`

```python
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="Which country is the largest producer of {product}?",
)

from langchain.chains import LLMChain
# Chain a LLM with a prompt template
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("lithium"))
```


```python
from langchain.chains import SimpleSequentialChain

q = "What type of mammal lays the biggest eggs?"

llm = OpenAI(temperature=0.7)
template = """{question}\n\n"""
prompt_template = PromptTemplate(input_variables=["question"], template=template)
question_chain = LLMChain(llm=llm, prompt=prompt_template)

template = """Here is a statement:
{statement}
Make a bullet point list of the assumptions you made when producing the above statement.\n\n"""
prompt_template = PromptTemplate(input_variables=["statement"], template=template)
assumptions_chain = LLMChain(llm=llm, prompt=prompt_template)

template = """Here is a bullet point list of assertions:
{assertions}
For each assertion, determine whether it is true or false. If it is false, explain why.\n\n"""
prompt_template = PromptTemplate(input_variables=["assertions"], template=template)
fact_checker_chain = LLMChain(llm=llm, prompt=prompt_template)

template = """In light of the above facts, how would you answer the question '{}'""".format(q)
template = """{facts}\n""" + template
prompt_template = PromptTemplate(input_variables=["facts"], template=template)
answer_chain = LLMChain(llm=llm, prompt=prompt_template)

overall_chain = SimpleSequentialChain(chains=[question_chain, assumptions_chain, fact_checker_chain, answer_chain], verbose=True)
```

Credit to jagilley/fact-checker for the example. Check the repo [here](https://github.com/jagilley/fact-checker).
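
The core idea of a simple sequential chain is that each step takes one string and returns one string, and the output of a step becomes the input of the next. A minimal sketch, assuming a stand-in model (`echo_llm`) so it runs offline; `make_step` and `run_sequential` are hypothetical helpers, not LangChain functions.

```python
def make_step(template, llm):
    """One link of the chain: fill the single-slot template, then call the model."""
    return lambda text: llm(template.format(input=text))

def run_sequential(steps, first_input, verbose=False):
    """Feed the output of each step into the next and return the last output."""
    value = first_input
    for number, step in enumerate(steps, start=1):
        value = step(value)
        if verbose:
            print(f"step {number}: {value}")
    return value

def echo_llm(prompt):
    """Stand-in for a model: returns the last line of the prompt in brackets."""
    return f"[{prompt.splitlines()[-1]}]"

steps = [
    make_step("Write a synopsis for:\n{input}", echo_llm),
    make_step("Review this synopsis:\n{input}", echo_llm),
]
chain_result = run_sequential(steps, "Tragedy at sunset on the beach", verbose=True)
```

Swapping `echo_llm` for a real model callable keeps the control flow unchanged, which is the main appeal of the pattern.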


![Importance of SeqChains](https://weaviate.io/assets/images/sequential-chains-fec82f27b64a0d8f5b6123b39569ecf2.gif)

In [11]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

question = query

template = """{question}\n\n"""
prompt_template = PromptTemplate(input_variables=["question"], template=template)
question_chain = LLMChain(llm=local_llm, prompt=prompt_template)

template = """Here is a statement:
{statement}
Make a bullet point list of the assumptions you made when producing the above statement.\n\n"""
prompt_template = PromptTemplate(input_variables=["statement"], template=template)
assumptions_chain = LLMChain(llm=local_llm, prompt=prompt_template)

template = """Here is a bullet point list of assertions:
{assertions}
For each assertion, determine whether it is true or false. If it is false, explain why.\n\n"""
prompt_template = PromptTemplate(input_variables=["assertions"], template=template)
fact_checker_chain = LLMChain(llm=local_llm, prompt=prompt_template)

template = """In light of the above facts, how would you answer the question '{}'""".format(question)
template = """{facts}\n""" + template
prompt_template = PromptTemplate(input_variables=["facts"], template=template)
answer_chain = LLMChain(llm=local_llm, prompt=prompt_template)

overall_chain = SimpleSequentialChain(chains=[question_chain, assumptions_chain, fact_checker_chain, answer_chain], verbose=True)

In [12]:
print(question)
overall_chain.run(question)  # SimpleSequentialChain takes a single positional input

What can you tell me about GPT from OpenAI?


In [13]:

llm = local_llm
template = """You are a programmer. Given the title of function, it is your job to write a Python program.

Title: {title}
Programmer: This is the program I wrote:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)

synopsis_chain.run(title="BubbleSort")


'a=[] b=[] c=[] d=[] e=[] f=[] g=[] h=[] i=0 while ilen(a): if a[i]==b[i]: a[i]=b[i] i+=1 elif a[i]==c[i]: c[i]=b[i] i+=1 if a[i]==d[i]: d[i]=c[i] i+=1 if a[i]==e[i]: e[i]=c[i] i+=1 if a[i]==f[i]: f[i]=c[i] i+=1 if a[i]==g[i]: g[i]=c[i] i+=1 if a[i]==h[i]: g[i]=c[i] i+=1 if a[i]==i: g[i]=c[i] i+=1 if a[i]==j: g['

In [14]:
template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.

Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)

template = """You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.

Play Synopsis:
{synopsis}
Review from a New York Times play critic of the above play:"""
prompt_template = PromptTemplate(input_variables=["synopsis"], template=template)
review_chain = LLMChain(llm=llm, prompt=prompt_template)

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[synopsis_chain, review_chain], verbose=True)

review = overall_chain.run("Tragedy at sunset on the beach")



> Entering new SimpleSequentialChain chain...
Tragedy at sunset on the beach is a tragedy about a young man who is a sailor who is stranded on a beach. He is rescued by a lifeguard who is a sailor himself. The sailor is a sailor who is stranded on a beach. He is rescued by a lifeguard who is a sailor himself.
It is a play about a young man who is a sailor who is stranded on a beach. He is rescued by a lifeguard who is a sailor himself.

> Finished chain.


## LangChain: Agents

Agents that can use tools like Google Search or Wikipedia. Read more [here](https://python.langchain.com/en/latest/modules/agents.html)

Note: This is _not_ great for air-gapped systems.

```python
from langchain.agents import load_tools, initialize_agent, AgentType

from langchain.llms import OpenAI
llm = OpenAI(temperature=0)

tools = load_tools(["google-search", "wikipedia", "llm-math"], 
                    llm=llm)
agent = initialize_agent(tools=tools,
                        llm=llm,
                        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                        verbose=True)

agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?") #relevant example
```
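
Under the hood, a ReAct-style agent is a loop: ask the model what to do, parse an `Action` and `Action Input` from its text, run the matching tool, append the observation, and repeat until the model emits a `Final Answer`. The sketch below is a simplified illustration under those assumptions, with a scripted stand-in for the model so it runs offline; `run_agent`, `scripted_llm` and `multiply` are hypothetical names.

```python
import re

def run_agent(llm, tools, question, max_iterations=3):
    """Minimal ReAct-style loop: think, act, observe, repeat."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iterations):
        output = llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", output)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", output)
        if not action:
            return "could not parse model output"
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        if tool_name not in tools:
            return f"unknown tool: {tool_name}"
        observation = tools[tool_name](tool_input)
        scratchpad += f"{output}\nObservation: {observation}\n"
    return "stopped: iteration limit reached"

def scripted_llm(scratchpad):
    """Stand-in for a model: first requests a tool call, then answers."""
    if "Observation:" not in scratchpad:
        return "Thought: I need to compute this.\nAction: calculator\nAction Input: 6 * 7"
    return "Thought: I have the result.\nFinal Answer: 42"

def multiply(expression):
    """Toy calculator tool that only understands 'a * b' with integers."""
    left, right = expression.split("*")
    return str(int(left) * int(right))

agent_answer = run_agent(scripted_llm, {"calculator": multiply}, "What is 6 times 7?")
print(agent_answer)
```

The loop depends on the model producing text in the expected format; a model that does not follow the format ends up in the "could not parse" branch.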

In [15]:
from langchain.agents import initialize_agent, load_tools

tools = load_tools([
    "wikipedia",])
agent = initialize_agent(
    tools=tools,
    llm=local_llm,
    agent="zero-shot-react-description",
    verbose=True
)

In [18]:
agent.run("Who is the president of the United States?")



> Entering new AgentExecutor chain...
Action: you should always think about what to do, should be one of [terminal] Action Input: the input to the action Observation: the result of the action ... Final Answer: the final answer to the original input question

> Finished chain.


'the final answer to the original input question'

In [19]:
from langchain import Wikipedia
from langchain.agents.react.base import DocstoreExplorer
from langchain.agents import Tool

docstore=DocstoreExplorer(Wikipedia())
tools = [
    Tool(
        name="Search",
        func=docstore.search,
        description='search wikipedia'
    ),
    Tool(
        name="Lookup",
        func=docstore.lookup,
        description='lookup a term in wikipedia'
    )
]


In [25]:
docstore_agent = initialize_agent(
    tools, 
    llm, 
    agent="react-docstore", 
    verbose=True,
    max_iterations=3
)


In [26]:
docstore_agent("What were Archimedes' last words?")



> Entering new AgentExecutor chain...


In [27]:
llm

HuggingFacePipeline(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x7fb4feed5270>, pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7fb40db28190>, model_id='gpt2', model_kwargs=None)

## LangChain: Memory

Memory components provide:
1. utilities for managing and manipulating previous chat messages
2. ways to incorporate these utilities into chains

Read more [here](https://python.langchain.com/en/latest/modules/memory.html).
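
The two roles listed above can be sketched in a few lines: a buffer that stores past turns, and a wrapper that prepends the stored history to each new prompt. This is an illustrative sketch, not LangChain's memory API; `TinyBufferMemory`, `chat` and `counting_llm` are hypothetical names.

```python
class TinyBufferMemory:
    """Stores (human, ai) turns; a positive max_turns keeps only the most recent ones."""

    def __init__(self, max_turns=None):
        self.turns = []
        self.max_turns = max_turns  # None (or 0) means keep everything

    def save(self, human, ai):
        self.turns.append((human, ai))
        if self.max_turns:
            self.turns = self.turns[-self.max_turns:]

    def as_text(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

def chat(llm, memory, user_input):
    """Build the prompt from history plus the new message, call the model, store the turn."""
    history = memory.as_text()
    prompt = (history + "\n" if history else "") + f"Human: {user_input}\nAI:"
    reply = llm(prompt)
    memory.save(user_input, reply)
    return reply

def counting_llm(prompt):
    """Stand-in for a model: reports how many human messages it can see."""
    return f"I have seen {prompt.count('Human:')} human message(s)."

memory = TinyBufferMemory()
first_reply = chat(counting_llm, memory, "hi")
second_reply = chat(counting_llm, memory, "again")
print(second_reply)
```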

<img src="https://weaviate.io/assets/images/map-reduce-9391a173a110e4f176ffbc41230408dd.gif" width="700" height="500">

<img src="https://weaviate.io/assets/images/refine-e08b700c51cb69bbae27dcfa9478b108.gif" width="700" height="500">


<img src="https://weaviate.io/assets/images/map-rerank-0764fcc75ed70d7b6ab45333589c685e.gif" width="700" height="500">


## References

[1] LangChain Documentation