<a href="https://colab.research.google.com/github/Dntfreitas/introduction-agents-ai/blob/main/rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG) with OpenAI API

In this notebook, we'll explore the concept of Retrieval-Augmented Generation (RAG) and how to use it with OpenAI's API. This approach combines the strengths of information retrieval and text generation to create more informed and accurate responses.

## What is RAG?
RAG enhances the response generation process by incorporating relevant external documents or data. It involves two main steps:
1. **Retrieve**: Fetch relevant documents from a knowledge base.
2. **Generate**: Use a language model (e.g., OpenAI GPT) to generate an answer based on both the query and retrieved documents.

We'll walk through a basic example using PDF documents stored in a local directory.
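
Before turning to PDFs, the two steps above can be sketched without any external service. This is a minimal illustration, not the pipeline built later in the notebook: the knowledge base is made up, `toy_retrieve` and `toy_build_prompt` are hypothetical helper names, and word overlap stands in for the embedding-based retrieval used in the rest of the tutorial.

```python
import re
from typing import List, Set

# Toy knowledge base; these sentences are illustrative only.
KNOWLEDGE_BASE = [
    "Lisbon is the capital of Portugal.",
    "FAISS is a library for similarity search over dense vectors.",
    "Gradio builds simple web interfaces for Python functions.",
]


def words(text: str) -> Set[str]:
    """Lowercase the text and return its set of words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Step 1 (Retrieve): rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(docs, key=lambda doc: len(query_words & words(doc)), reverse=True)
    return ranked[:k]


def toy_build_prompt(query: str, retrieved: List[str]) -> str:
    """Step 2 (Generate): assemble the text a language model would receive."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


question = "What is the capital of Portugal?"
print(toy_build_prompt(question, toy_retrieve(question, KNOWLEDGE_BASE)))
```

The printed prompt contains only the Lisbon sentence as context, which is the information a language model would then use to answer.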

In [None]:
# Let's make sure we have the required libraries installed for this tutorial.
!pip install openai faiss-cpu tiktoken PyPDF2 langchain gradio

In [None]:
# Now, let's import the necessary libraries and set up our environment.
import os
from typing import List

import PyPDF2
import faiss
import gradio as gr
import numpy as np
from IPython.display import Markdown, display
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI

In [None]:
# In Google Colab, the API key is read from the notebook's secrets, so no `.env` file is needed.
# Outside Colab, you can load the environment variables from a `.env` file instead:
# from dotenv import load_dotenv
# load_dotenv(override=True)

from google.colab import userdata

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [None]:
# Now, let's initialize the OpenAI API client.
openai = OpenAI(api_key=OPENAI_API_KEY)

In [None]:
# Load every PDF in a directory, extract its text, and split the text into overlapping chunks.
def load_pdfs_from_directory(directory_path: str) -> List[str]:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    documents = []
    for filename in os.listdir(directory_path):
        if filename.endswith(".pdf"):
            filepath = os.path.join(directory_path, filename)
            with open(filepath, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                # Extract each page once and skip pages that yield no text
                page_texts = [page.extract_text() for page in reader.pages]
                text = " ".join(t for t in page_texts if t)
                # Chunk the text
                chunks = text_splitter.split_text(text)
                documents.extend(chunks)
    return documents


pdf_directory = "./pdfs"
documents = load_pdfs_from_directory(pdf_directory)
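
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break at separators such as paragraph breaks and spaces, so its chunks will differ. The function name `chunk_with_overlap` is hypothetical and uses fixed-size character windows only.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows of up to chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With the notebook's settings (500 and 100), the overlap helps a sentence that straddles a chunk boundary appear whole in at least one chunk.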

## Embedding

Embeddings measure the relatedness of text strings. Embeddings are commonly used for:
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)

**An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.**

Source: [OpenAI](https://platform.openai.com/docs/guides/embeddings).
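
As a small illustration of "distance measures relatedness", the sketch below compares hand-made 3-dimensional vectors. These are not real embeddings (real ones have hundreds or thousands of dimensions), and the labels `cat`, `kitten`, and `car` are assumptions chosen for the example.

```python
import math
from typing import List


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Straight-line (L2) distance between two vectors; smaller means more related."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1 means more related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors standing in for embeddings of three words.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(euclidean_distance(cat, kitten) < euclidean_distance(cat, car))  # True
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))    # True
```

For vectors of unit length, ranking by Euclidean distance and ranking by cosine similarity give the same order, because the squared distance equals 2 minus twice the cosine similarity.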


In [None]:
def embed_texts(texts: List[str]) -> List[List[float]]:
    response = openai.embeddings.create(
        input=texts,
        model="text-embedding-3-small"
    )
    return [e.embedding for e in response.data]

In [None]:
embeddings = embed_texts(documents)
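
The call above sends every chunk in a single request. For a large PDF collection this may exceed the request limits of the embeddings endpoint, so one option is to embed in batches. The sketch below is an assumption-laden illustration: `embed_in_batches` is a hypothetical helper, the `batch_size` of 100 is an arbitrary choice, and `fake_embed` stands in for `embed_texts` so the example runs without an API key.

```python
from typing import Callable, List


def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,  # arbitrary; tune to the provider's documented limits
) -> List[List[float]]:
    """Call embed_fn on consecutive slices of texts and concatenate the results in order."""
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
    return embeddings


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for embed_texts: one-dimensional 'embedding' equal to the text length."""
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
# [[1.0], [2.0], [3.0]]
```

In the notebook, this would be used as `embed_in_batches(documents, embed_texts)`, which keeps the embeddings in the same order as `documents`.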

# Build FAISS Index

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors.
For the RAG pipeline, we will use FAISS to index the embeddings of the documents. This allows us to quickly retrieve the most relevant documents based on a query.

In [None]:
# Build FAISS Index
dim = len(embeddings[0])  # dimension of the embeddings
index = faiss.IndexFlatL2(dim)  # L2 distance (Euclidean distance)
index.add(np.array(embeddings).astype('float32'))  # add embeddings to the index
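
Conceptually, a flat L2 index performs an exhaustive search: it compares the query with every stored vector and keeps the closest ones. The sketch below reproduces that idea in plain Python for illustration only. `flat_l2_search` is a hypothetical name, and FAISS itself is far more optimized. Like `IndexFlatL2`, it reports squared L2 distances.

```python
from typing import List, Tuple


def flat_l2_search(
    vectors: List[List[float]], query: List[float], k: int
) -> Tuple[List[float], List[int]]:
    """Return the squared L2 distances and indices of the k vectors closest to query."""
    scored = [
        (sum((v - q) ** 2 for v, q in zip(vector, query)), position)
        for position, vector in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first
    top = scored[:k]
    distances = [distance for distance, _ in top]
    indices = [position for _, position in top]
    return distances, indices


stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
distances, indices = flat_l2_search(stored, query=[0.9, 0.8], k=2)
print(indices)  # [1, 0]
```

The returned indices play the same role as `I` in `index.search`: they are positions in the list of stored vectors, which is why the notebook can use them to look up the matching entries in `documents`.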

# Retrieve Relevant Documents

The `retrieve` function takes a query string and retrieves the top `k` most relevant documents from the FAISS index. It does this by embedding the query and searching for the nearest neighbors in the index.

In [None]:
# Retrieve Relevant Documents
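def retrieve(query: str, k: int = 50) -> List[str]:
    query_embedding = embed_texts([query])[0]
    # Never ask for more neighbors than the index holds
    k = min(k, index.ntotal)
    # D contains distances, I contains indices of the nearest neighbors
    D, I = index.search(np.array([query_embedding]).astype('float32'), k)
    # FAISS pads missing results with -1, so skip those indices
    return [documents[i] for i in I[0] if i != -1]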

In [None]:
retrieved = retrieve("Talk about Lisbon")

In [None]:
def generate_answer(query: str, retrieved_docs: List[str]) -> str:
    context = "\n".join(retrieved_docs)
    prompt = f"""
    Answer the question based only on the context below.
    If the context does not contain the answer, say "I don't know".

    Context:
    {context}

    Question: {query}
    Answer:
    """
    response = openai.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
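
With `k=50` chunks of up to 500 characters each, the context can grow large. One possible refinement, sketched here under assumptions, is to build the prompt in a separate function that numbers each chunk and caps the total context size. `build_prompt` and `max_context_chars` are hypothetical names, and the 4000-character default is arbitrary.

```python
from typing import List


def build_prompt(query: str, retrieved_docs: List[str], max_context_chars: int = 4000) -> str:
    """Assemble a grounded prompt, numbering each chunk and capping the context size."""
    parts: List[str] = []
    used = 0
    for number, doc in enumerate(retrieved_docs, start=1):
        entry = f"[{number}] {doc.strip()}"
        if used + len(entry) > max_context_chars:
            break  # chunks arrive most-relevant first, so the tail is dropped
        parts.append(entry)
        used += len(entry)
    context = "\n".join(parts)
    return (
        "Answer the question based only on the context below.\n"
        'If the context does not contain the answer, say "I don\'t know".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


print(build_prompt("What is Lisbon?", ["Lisbon is the capital of Portugal.", "Unrelated text."]))
```

Building the string from separate lines also avoids sending the indentation of a triple-quoted string to the model.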


In [None]:
question = "What are the main attractions in Lisbon?"
most_relevant_documents = retrieve(question)
completion = generate_answer(question, most_relevant_documents)

In [None]:
display(Markdown(completion))

In [None]:
question = "What are the main attractions in Jupyter?"
most_relevant_documents = retrieve(question)
completion = generate_answer(question, most_relevant_documents)

In [None]:
display(Markdown(completion))

# Build a RAG Pipeline

The RAG pipeline combines the retrieval and generation steps into a single function. It takes a query string, retrieves relevant documents, and generates an answer based on those documents.

In [None]:
def rag_pipeline(query: str) -> str:
    retrieved_docs = retrieve(query)
    answer = generate_answer(query, retrieved_docs)
    return answer

In [None]:
display(Markdown(rag_pipeline("Talk about Lisbon")))

# Build a Q&A Interface using Gradio

Gradio is a Python library that allows you to quickly create user interfaces for machine learning models. We'll use Gradio to build a simple question-and-answer interface for our RAG pipeline.

In [None]:
gr.Interface(
    fn=rag_pipeline,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="markdown",
    title="RAG with OpenAI and PDF Knowledge Base",
    description="Ask questions based on content extracted from your local PDF files."
).launch()