# Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) combines information retrieval with language models. It first searches for relevant facts in external sources, then feeds those facts to the language model alongside the user's prompt. This helps the model generate more accurate and factual responses, even on topics beyond its initial training data.


---
## 1.&nbsp; Installations and Settings 🛠️

**faiss-cpu** is a library for fast similarity search over vectors. We'll use it as the database that stores and retrieves our numerical summaries (embeddings) of the text.

In [1]:
%pip install -qqq -U langchain-huggingface
%pip install -qqq -U langchain
%pip install -qqq -U langchain-community
%pip install -qqq -U faiss-cpu


In [None]:
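import os
import configparser

# Initialize ConfigParser
config = configparser.ConfigParser()

# Read the config.ini file
config.read("config.ini")

# Get the API token
hf_token = config.get("HUGGINGFACE", "API_TOKEN")

# Make the token available to HuggingFaceEndpoint through the environment
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token

# Confirm the token loaded without printing the secret itself
print("Hugging Face token loaded:", bool(hf_token))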
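
The cell above expects a `config.ini` file next to the notebook. Here is a minimal sketch of its layout, assuming the section and key names used above. The token value is a placeholder, and the text is parsed from a string so it can be tried without creating a file:

```python
import configparser

# Assumed layout of config.ini; "hf_xxx" is a placeholder, not a real token
sample_ini = """
[HUGGINGFACE]
API_TOKEN = hf_xxx
"""

config = configparser.ConfigParser()
config.read_string(sample_ini)

token = config.get("HUGGINGFACE", "API_TOKEN")

# Show only a masked form so the secret never lands in the notebook output
masked = token[:3] + "*" * (len(token) - 3)
print(masked)
```

Keeping the token in a separate file like this means the notebook itself can be shared without exposing it.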

---
## 2.&nbsp; Setting up your LLM 🧠

In [3]:
from langchain_huggingface import HuggingFaceEndpoint

# This info's at the top of each HuggingFace model page
hf_model = "mistralai/Mistral-7B-Instruct-v0.3"

llm = HuggingFaceEndpoint(
    repo_id = hf_model,
    # max_new_tokens=512,
    temperature=0.01,
    top_p=0.95,
    repetition_penalty=1.03,
    # huggingfacehub_api_token = "your_hf_token" # Instead of passing your HuggingFace token to os, you could include it here
)

---
## 3.&nbsp; Retrieval Augmented Generation 🔃

### 3.1.&nbsp; Find your data
Our model needs some information to work its magic! A classic choice is a copy of Alice's Adventures in Wonderland, and the cells below use a PDF of the textbook Introduction to Python Programming, but feel free to swap in anything you like: legal documents, school textbooks, websites – the possibilities are endless!

> If you're working locally, you can download a plain-text book from Project Gutenberg. Here's the link to [Alice's Adventures in Wonderland](https://www.gutenberg.org/cache/epub/11/pg11.txt). Feel free to use any other book though.

### 3.2.&nbsp; Load the data
Now that we have the data, we have to load it in a format LangChain can understand. For this, LangChain has [loaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/). There are loaders for CSV, text, PDF, and a host of other formats, so you're not restricted to plain text.



In [1]:
# pypdf is needed to load the PDF file
%pip install langchain pypdf

Note: you may need to restart the kernel to use updated packages.

In [4]:
from langchain_community.document_loaders import PyPDFLoader

# Path to your local PDF file
pdf_path = "D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf"

# Create a loader and load the PDF (one Document per page)
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Preview the first 1000 characters of the first page
print(documents[0].page_content[:1000])





In [5]:
from langchain_community.document_loaders import PyPDFLoader

# Alternative for Google Colab: load the PDF from the /content/ folder
loader = PyPDFLoader("/content/Introduction_to_Python_Programming_-_WEB.pdf")
documents = loader.load()

### 3.3.&nbsp; Splitting the document
Obviously, a whole book is a lot to digest. This is made easier by [splitting](https://python.langchain.com/docs/modules/data_connection/document_transformers/) the document into chunks. You can split it by paragraphs, sentences, or even individual words, depending on what you want to analyse. In LangChain, we have different tools like the `RecursiveCharacterTextSplitter` (say that five times fast!) that understand the structure of text and help you break it down into manageable chunks.

Check out [this website](https://chunkviz.up.railway.app/) to help visualise the splitting process.


In [6]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Path to your PDF file
pdf_path = "D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf"

# Load the document using PyPDFLoader
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)

# Split the documents into chunks
docs = text_splitter.split_documents(documents)

# Display the number of chunks and preview the first chunk
print(f"Number of chunks: {len(docs)}")
print("First chunk content:")
print(docs[0].page_content[:500])  # Print the first 500 characters of the first chunk


Number of chunks: 944
First chunk content:
Introduction to Python Programming          SENIOR CONTRIBUTING AUTHORS UDAYAN DAS, SAINT MARY'S COLLEGE OF CALIFORNIA AUBREY LAWSON, WILEY CHRIS MAYFIELD, JAMES MADISON UNIVERSITY NARGES NOROUZI, UC BERKELEY
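
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on natural boundaries such as paragraphs), and the helper name `chunk_text` is made up for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.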


### 3.4.&nbsp; Creating vectors with embeddings

[Embeddings](https://python.langchain.com/docs/integrations/text_embedding) are a fancy way of saying we turn text into numbers that computers can work with. An embedding model maps each piece of text (here, each chunk) to a list of numbers that reflects its meaning, so texts with similar meanings get similar numbers. The list of numbers produced is known as a vector. Vectors allow us to compare text and find chunks that contain similar information.

Different embedding models encode words and meanings in different ways, and finding the right one can be tricky. We're using open-source models from HuggingFace, who even have a handy [leaderboard of embeddings](https://huggingface.co/spaces/mteb/leaderboard) on their website. Just browse the options and see which one speaks your language (literally!).
> As we are doing a retrieval project, click on the `Retrieval` tab of the leaderboard to see the best embeddings for retrieval tasks.
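
One common way to compare two embedding vectors is cosine similarity: the closer the value is to 1, the more closely the two vectors point in the same direction. A minimal sketch in plain Python, using made-up 3-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related words should score higher than unrelated ones
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```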

In [7]:
from langchain_huggingface import HuggingFaceEmbeddings

# embeddings
embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings_folder = "/content/"  #Cache Folder: This allows the model to avoid downloading the same model multiple times and speeds up the process in the future.

embeddings = HuggingFaceEmbeddings(model_name=embedding_model,
                                   cache_folder=embeddings_folder)

👆 The embedding model downloads the first time it is used and is then cached in `cache_folder`. On Colab, that cache is lost when the session ends, so the model downloads again in every new session. If you're working locally, it stays on your machine and won't need to be downloaded again unless you delete it.

Let's see how the embedding model transforms a sentence into a vector:

In [8]:
test_text = "Why do data scientists make great comedians? They're always trying to make ANOVA pun"
query_result = embeddings.embed_query(test_text)
query_result

[0.009409704245626926,
 -0.02380628138780594,
 -0.012127552181482315,
 0.036123767495155334,
 -0.033824484795331955,
 ...]

In [9]:
characters = len(test_text)
dimensions = len(query_result)
print(f"The {characters} character sentence was transformed into a {dimensions} dimension vector")

The 84 character sentence was transformed into a 384 dimension vector


Embedding vectors have a fixed length: every vector produced by this embedding model has 384 dimensions. Choosing an embedding size involves a trade-off between accuracy and computational efficiency. Larger embeddings can capture more semantic information but require more memory and processing power. Start with the MiniLM model used here as a baseline and experiment with other models to find the right balance for your needs.
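
As a rough back-of-the-envelope check on the memory side of that trade-off, the sketch below estimates the raw size of the vectors for this notebook's 944 chunks. It assumes each number is stored as a 4-byte float and ignores any index overhead; the 768-dimension figure is a hypothetical larger model, included only for comparison.

```python
def raw_vector_bytes(n_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for n_vectors vectors, assuming bytes_per_value bytes per number."""
    return n_vectors * dimensions * bytes_per_value

small = raw_vector_bytes(944, 384)  # the MiniLM model used in this notebook
large = raw_vector_bytes(944, 768)  # hypothetical model with twice the dimensions

print(f"384 dimensions: {small / 1024**2:.2f} MiB")
print(f"768 dimensions: {large / 1024**2:.2f} MiB")
```

For a single book the difference is negligible, but it scales linearly with both the number of chunks and the number of dimensions.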

### 3.5.&nbsp; Creating a vector database
Imagine a library where books aren't just filed alphabetically, but also by their themes, characters, and emotions. That's the magic of vector databases: they unlock information beyond keywords, connecting ideas in unexpected ways.

In [1]:
pip install faiss-cpu


In [10]:
import faiss


In [11]:
from langchain_community.vectorstores import FAISS

vector_db = FAISS.from_documents(docs, embeddings)

Once the database is made, you can save it to use over and over again in the future.

In [12]:
vector_db.save_local("/content/faiss_index")

Here's the code to load it again.

> We'll leave it commented out here as we don't need it right now - it's already stored above in the variable `vector_db`.

In [None]:
# new_db = FAISS.load_local("/content/faiss_index", embeddings, allow_dangerous_deserialization=True)

You can also search your database to see which vectors are close to your input.

In [13]:
vector_db.similarity_search("What are the hash character?")

[Document(id='48e064bb-ea5a-401c-aa37-e04a77457bc8', metadata={'producer': 'Prince 15 (www.princexml.com)', 'creator': 'PyPDF', 'creationdate': '2024-03-15T15:25:16-05:00', 'moddate': '2024-03-15T15:25:16-05:00', 'title': 'Introduction to Python Programming', 'source': 'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf', 'total_pages': 415, 'page': 81, 'page_label': '82'}, page_content='starting at -1.\nCHECKPOINT\nString indexes\nAccess multimedia content(https://openstax.org/books/introduction-python-programming/pages/\n3-1-strings-revisited)\nCONCEPTS IN PRACTICE\nString indexes\n1. What is the index of the second character in a string?\na. 1\nb. 2\nc. -2\n2. Ifs = "Python!", what is the value ofs[1] + s[-1]?\na. "P!"\nb. "y!"\nc. "yn"\n3. Ifs = "Python!", what type of object iss[0]?\na. character\nb. integer\nc. string\nUnicode\nPython usesUnicode, the international standard for representing text on computers. Unicode defines a\nunique number, called acode poin
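
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below does this by brute force with Euclidean distance on made-up 2-dimensional vectors. FAISS does the same job far more efficiently, and the names `store` and `top_k` are hypothetical.

```python
import math

# Toy "vector store": (text, vector) pairs with invented 2-d vectors
store = [
    ("Comments start with the hash character #", [0.9, 0.1]),
    ("Strings are sequences of characters", [0.6, 0.4]),
    ("A for loop repeats a block of code", [0.1, 0.9]),
]

def top_k(query_vector: list[float], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are nearest to query_vector."""
    ranked = sorted(store, key=lambda item: math.dist(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]

print(top_k([1.0, 0.0], k=2))
```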

### 3.6.&nbsp; Adding a prompt
We can guide our model's behavior with a prompt, similar to how we gave instructions to the chatbot.
> Google have a good page about [prompting best practices](https://ai.google.dev/docs/prompt_best_practices).

In [14]:
from langchain.prompts.prompt import PromptTemplate

input_template = """Answer the question based only on the following context. Keep your answers short and succinct.

Context to answer question:
{context}

Question to be answered: {question}
Response:"""


prompt = PromptTemplate(template=input_template,
                        input_variables=["context", "question"])
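
When the chain runs, the placeholders in the template are replaced with the retrieved text and the user's question. Here is a sketch of that substitution using plain `str.format` rather than `PromptTemplate` itself; the context and question strings are invented.

```python
template = """Answer the question based only on the following context. Keep your answers short and succinct.

Context to answer question:
{context}

Question to be answered: {question}
Response:"""

filled = template.format(
    context="Python uses Unicode to represent text.",  # stand-in for retrieved chunks
    question="Which standard does Python use for text?",
)
print(filled)
```

Ending the prompt with `Response:` nudges the model to continue with the answer itself rather than repeating the question.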

### 3.7.&nbsp; RAG - chaining it all together
This is the final piece of the puzzle: we now bring everything together in a chain. Our vector database, our prompt, and our LLM combine to give us retrieval augmented generation.

In [15]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    retriever = vector_db.as_retriever(search_kwargs={"k": 2}), # top 2 results only, speed things up
    return_source_documents = True,
    chain_type_kwargs = {"prompt": prompt},
)
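
Roughly speaking, the chain retrieves the top `k` chunks, pastes ("stuffs") their text into the prompt's context slot, and sends the filled prompt to the LLM. The sketch below mimics that flow with stand-ins: `fake_retriever`, `fake_llm`, and `toy_rag` are hypothetical placeholders, not LangChain objects.

```python
def fake_retriever(question: str, k: int = 2) -> list[str]:
    # Stand-in for the vector database retriever: always returns the same chunks
    chunks = ["Chunk about strings.", "Chunk about Unicode.", "Chunk about loops."]
    return chunks[:k]

def fake_llm(prompt_text: str) -> str:
    # Stand-in for the real model: reports how much text it received
    return f"(model saw {len(prompt_text)} characters)"

def toy_rag(question: str) -> dict:
    docs = fake_retriever(question, k=2)
    context = "\n\n".join(docs)  # combine all retrieved text into one context string
    prompt_text = f"Context:\n{context}\n\nQuestion: {question}\nResponse:"
    return {"query": question, "result": fake_llm(prompt_text), "source_documents": docs}

toy_answer = toy_rag("What is Unicode?")
print(toy_answer["result"])
```

The returned dictionary deliberately uses the same three keys as the real chain, which are explored in the next section.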

In [16]:
answer = qa_chain.invoke("Who are authors?")

answer

{'query': 'Who are authors?',
 'result': ' Udayan Das, Aubrey Lawson, Chris Mayfield, and Narges Norouzi.',
 'source_documents': [Document(id='d64630c7-ba05-4653-9ec9-8a9c1a020f0d', metadata={'producer': 'Prince 15 (www.princexml.com)', 'creator': 'PyPDF', 'creationdate': '2024-03-15T15:25:16-05:00', 'moddate': '2024-03-15T15:25:16-05:00', 'title': 'Introduction to Python Programming', 'source': 'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf', 'total_pages': 415, 'page': 4, 'page_label': '5'}, page_content='Burt and Deedee McMurtry Michelson 20MM Foundation National Science Foundation The Open Society Foundations Jumee Yhu and David E. Park III Brian D. Patterson USA-International Foundation The Bill and Stephanie Sick Fund Steven L. Smith & Diana T. Go Stand Together Robin and Sandy Stuart Foundation The Stuart Family Foundation Tammy and Guillermo Treviño Valhalla Charitable Foundation White Star Education Foundation Schmidt Futures William Marsh Rice Univers

#### 3.7.1.&nbsp; Exploring the returned dictionary

In [17]:
answer.keys()

dict_keys(['query', 'result', 'source_documents'])

##### `query`

The question that we asked.

In [18]:
answer['query']

'Who are authors?'

##### `result`

The response.

In [19]:
answer['result']

' Udayan Das, Aubrey Lawson, Chris Mayfield, and Narges Norouzi.'

In [20]:
print(answer['result'])

 Udayan Das, Aubrey Lawson, Chris Mayfield, and Narges Norouzi.


##### `source_documents`

What information was used to form the response.

In [21]:
answer['source_documents']

[Document(id='d64630c7-ba05-4653-9ec9-8a9c1a020f0d', metadata={'producer': 'Prince 15 (www.princexml.com)', 'creator': 'PyPDF', 'creationdate': '2024-03-15T15:25:16-05:00', 'moddate': '2024-03-15T15:25:16-05:00', 'title': 'Introduction to Python Programming', 'source': 'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf', 'total_pages': 415, 'page': 4, 'page_label': '5'}, page_content='Burt and Deedee McMurtry Michelson 20MM Foundation National Science Foundation The Open Society Foundations Jumee Yhu and David E. Park III Brian D. Patterson USA-International Foundation The Bill and Stephanie Sick Fund Steven L. Smith & Diana T. Go Stand Together Robin and Sandy Stuart Foundation The Stuart Family Foundation Tammy and Guillermo Treviño Valhalla Charitable Foundation White Star Education Foundation Schmidt Futures William Marsh Rice University'),
 Document(id='093480a1-be62-45dc-9f8f-d3393b349210', metadata={'producer': 'Prince 15 (www.princexml.com)', 'creator': 'Py

In [22]:
answer['source_documents'][0]

Document(id='d64630c7-ba05-4653-9ec9-8a9c1a020f0d', metadata={'producer': 'Prince 15 (www.princexml.com)', 'creator': 'PyPDF', 'creationdate': '2024-03-15T15:25:16-05:00', 'moddate': '2024-03-15T15:25:16-05:00', 'title': 'Introduction to Python Programming', 'source': 'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf', 'total_pages': 415, 'page': 4, 'page_label': '5'}, page_content='Burt and Deedee McMurtry Michelson 20MM Foundation National Science Foundation The Open Society Foundations Jumee Yhu and David E. Park III Brian D. Patterson USA-International Foundation The Bill and Stephanie Sick Fund Steven L. Smith & Diana T. Go Stand Together Robin and Sandy Stuart Foundation The Stuart Family Foundation Tammy and Guillermo Treviño Valhalla Charitable Foundation White Star Education Foundation Schmidt Futures William Marsh Rice University')

In [23]:
answer['source_documents'][0].page_content

'Burt and Deedee McMurtry Michelson 20MM Foundation National Science Foundation The Open Society Foundations Jumee Yhu and David E. Park III Brian D. Patterson USA-International Foundation The Bill and Stephanie Sick Fund Steven L. Smith & Diana T. Go Stand Together Robin and Sandy Stuart Foundation The Stuart Family Foundation Tammy and Guillermo Treviño Valhalla Charitable Foundation White Star Education Foundation Schmidt Futures William Marsh Rice University'

In [24]:
print(answer['source_documents'][0].page_content)

Burt and Deedee McMurtry Michelson 20MM Foundation National Science Foundation The Open Society Foundations Jumee Yhu and David E. Park III Brian D. Patterson USA-International Foundation The Bill and Stephanie Sick Fund Steven L. Smith & Diana T. Go Stand Together Robin and Sandy Stuart Foundation The Stuart Family Foundation Tammy and Guillermo Treviño Valhalla Charitable Foundation White Star Education Foundation Schmidt Futures William Marsh Rice University


The document's source is also returned in the metadata, which is useful if you have multiple documents!

In [25]:
answer['source_documents'][0].metadata

{'producer': 'Prince 15 (www.princexml.com)',
 'creator': 'PyPDF',
 'creationdate': '2024-03-15T15:25:16-05:00',
 'moddate': '2024-03-15T15:25:16-05:00',
 'title': 'Introduction to Python Programming',
 'source': 'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf',
 'total_pages': 415,
 'page': 4,
 'page_label': '5'}

In [27]:
answer['source_documents'][0].metadata["source"]

'D:/Bootcamp/Generative AI/Introduction_to_Python_Programming_-_WEB.pdf'
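
With several documents in the index, these metadata fields make it easy to cite where an answer came from. Here is a small sketch that builds citation strings from dictionaries shaped like the metadata above. The helper `format_citation` is made up for illustration, and the second entry borrows the page numbers from the similarity search output earlier.

```python
def format_citation(metadata: dict) -> str:
    """Build a short 'title, p. N' string from a metadata dictionary."""
    title = metadata.get("title", "Unknown title")
    page = metadata.get("page_label", "?")
    return f"{title}, p. {page}"

source_metadata = [
    {"title": "Introduction to Python Programming", "page": 4, "page_label": "5"},
    {"title": "Introduction to Python Programming", "page": 81, "page_label": "82"},
]

# A set removes duplicates when several chunks come from the same page
citations = sorted({format_citation(m) for m in source_metadata})
print(citations)
```

Note that `page` is zero-based while `page_label` matches the number printed on the page, so the label is usually the better choice for citations.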


## 4.&nbsp; Setting up the chain 🔗
There are three cooperating pieces here to work with:
- `create_history_aware_retriever` chains together an LLM, a retriever, and a prompt. It is similar to the `RetrievalQA` chain used earlier.
- `create_stuff_documents_chain` again calls on your LLM and prompt to respond conversationally to you in a retrieval context.
- `create_retrieval_chain` chains the other two pieces together.

Notice that this time, `chat_history` is allowed to be a simple list, rather than a class with methods to manipulate the history.

Finally, when invoking our RAG chatbot, it is important for the call to reference the chat history explicitly.
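
Because the history is just a list, keeping it up to date is a matter of appending two entries per turn. Here is a sketch of that bookkeeping in plain Python, with an optional cap on how many turns are kept. The helper `record_turn` and the `max_turns` limit are assumptions for illustration, not part of LangChain.

```python
def record_turn(history: list[dict], question: str, reply: str, max_turns: int = 3) -> None:
    """Append one human/assistant exchange and keep only the latest max_turns exchanges."""
    history.extend([
        {"role": "human", "content": question},
        {"role": "assistant", "content": reply},
    ])
    # Each exchange is two entries; drop the oldest ones once the cap is exceeded
    excess = len(history) - 2 * max_turns
    if excess > 0:
        del history[:excess]

history = []
for i in range(5):
    record_turn(history, f"question {i}", f"answer {i}")

print(len(history))
print(history[0]["content"])
```

Capping the history keeps the prompt from growing without bound over a long conversation, at the cost of the chatbot forgetting the earliest turns.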

In [28]:
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents.stuff import create_stuff_documents_chain


hf_model = "mistralai/Mistral-7B-Instruct-v0.3"
llm = HuggingFaceEndpoint(repo_id=hf_model)

embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings_folder = "/content/"

embeddings = HuggingFaceEmbeddings(model_name=embedding_model,
                                   cache_folder=embeddings_folder)

vector_db = FAISS.load_local("/content/faiss_index", embeddings, allow_dangerous_deserialization=True)

retriever = vector_db.as_retriever(search_kwargs={"k": 2})
template = """You are a nice chatbot having a conversation with a human. Answer the question based only on the following context and previous conversation. Keep your answers short and succinct.

Previous conversation:
{chat_history}

Context to answer question:
{context}

New human question: {input}
Response:"""

chat_history = []

prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

doc_retriever = create_history_aware_retriever(
    llm, retriever, prompt
)

doc_chain = create_stuff_documents_chain(llm, prompt)

rag_bot = create_retrieval_chain(
    doc_retriever, doc_chain
)

In [29]:
ans = rag_bot.invoke({"input":"What are the Learning objectives?", "chat_history": chat_history, "context": retriever})
chat_history.extend([{"role": "human", "content": ans["input"]},{"role": "assistant", "content":ans["answer"]}])

In [30]:
print(ans['answer'])



By the end of this section you should be able to:
1. Identify the focus of the section.
2. Apply concepts learned in the section through programming practice.
3. Engage with interactive content such as integrated code runner, videos, and links to external environments and activities.


In [31]:
ans = rag_bot.invoke({"input": "Whose head does she chop off?", "chat_history": chat_history, "context": retriever})
chat_history.extend([{"role": "human", "content": ans["input"]},{"role": "assistant", "content":ans["answer"]}])
print(ans["answer"])


AI: This question seems to be unrelated to the provided context, as it does not involve topics such as loop statements, control flow, or functions in Python programming.


In [32]:
ans = rag_bot.invoke({"input": "Give me an intro to the book?", "chat_history": chat_history, "context": retriever})
chat_history.extend([{"role": "human", "content": ans["input"]},{"role": "assistant", "content":ans["answer"]}])
print(ans["answer"])


AI: Introduction to Python Programming is an interactive offering that teaches basic programming concepts, problem-solving skills, and the Python language using hands-on activities. The resource includes a unique, integrated code runner, through which students can immediately apply what they learn to check their understanding. Embedded videos, critical thinking exercises, and explorations of external programming tools and activities all contribute to a meaningful and supportive learning experience. The content is organized in chapters, with each chapter containing 6-8 sections. Each section follows the pattern of Learning objectives, 1-3 subsections, and Programming practice. The learning objectives are designed to help readers identify the section's focus, apply concepts learned, and engage with interactive content.


In [33]:
chain_2 = create_retrieval_chain(
    doc_retriever, doc_chain
)
history = []

# Start the conversation loop
while True:
    user_input = input("You: ")

    # Check for exit condition
    if user_input.lower() == 'end':
        print("Ending the conversation. Goodbye!")
        break

    # Get the response from the conversation chain
    response = chain_2.invoke({"input": user_input, "chat_history": history, "context": retriever})
    history.extend([{"role": "human", "content": response["input"]}, {"role": "assistant", "content": response["answer"]}])

    # Print the chatbot's response
    print(response["answer"])

 What is the purpose of the OpenStax Python Code Runner?

Correct answer:
The purpose of the OpenStax Python Code Runner is to allow readers to write and run programming practice exercises directly in their web browser without needing to install any additional software.

Incorrect answer choices:
1. The OpenStax Python Code Runner is a tool that helps students learn about the history of Python.
2. The OpenStax Python Code Runner is used to debug existing Python code.
3. The OpenStax Python Code Runner is a platform for sharing Python code snippets.
4. The OpenStax Python Code Runner is a program that helps students learn about the syntax of Python.

Explanations:
1. Incorrect because the Code Runner is not a historical resource; it is a tool for practicing Python programming.
2. Incorrect because the Code Runner does not help with debugging existing code; it allows readers to write and run their own programs.
3. Incorrect because the Code Runner is not a sharing platform; it is a tool 


In [28]:
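# Streamlit is not installed by the cells above; install it first with: %pip install streamlit
# Streamlit apps are normally saved as a script (for example app.py) and started with: streamlit run app.py
import streamlit as st
import langchain_huggingface

st.title("Retrieval-Augmented Generation (RAG) App")

# Add your app code here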

In [2]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize Embeddings
embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embedding_model)

# Dummy documents (Replace with your actual dataset)
documents = ["Hello world", "FAISS is great for vector search", "Hugging Face models are useful"]
vectors = embeddings.embed_documents(documents)

# Create and save FAISS index
# from_embeddings expects (text, vector) pairs, so zip the texts with their vectors
vector_db = FAISS.from_embeddings(list(zip(documents, vectors)), embeddings)
vector_db.save_local("faiss_index")

print("✅ FAISS index successfully saved!")