# Part 3: Pinecone VectorDB and RAG

## 3.1 Task Description


In this part, we will explore the capabilities of Pinecone VectorDB and implement a basic Retrieval-Augmented Generation (RAG) pipeline. The goal is to enhance the performance of a standard Question Answering (QA) model by leveraging relevant documents from a vector database.



### 3.1.1 Dataset Selection


First, we need to find a dataset for which a standard QA model fails to accurately answer the questions, often resulting in hallucinations. The correct answers should be available within a set of documents that will be stored in the Pinecone VectorDB.



In [1]:
# Install all the packages we need

!pip install sentence_transformers
!pip install datasets
!pip install pinecone-client
!pip install cohere


In [2]:
# Import necessary libraries
from sentence_transformers import SentenceTransformer
from datasets import load_dataset
import pinecone
import os
from tqdm import tqdm
import cohere
import numpy as np
import warnings
from IPython.display import display
import time

# Ignore warnings
warnings.filterwarnings("ignore")

# Set the API keys obtained from the Pinecone and Cohere sites (values removed for privacy)
PINECONE_API_KEY = ""
COHERE_API_KEY = ""

# Initialize Pinecone connection
pc = pinecone.Pinecone(
    api_key=PINECONE_API_KEY
)

# Define index name and create it if it doesn't exist
index_name = "jokes-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,
        metric='cosine',
        spec=pinecone.ServerlessSpec(
            cloud='aws',
            region='us-east-1'  # Use a supported region
        )
    )

# Connect to the index
index = pc.Index(index_name)

# Initialize the Cohere client
co = cohere.Client(COHERE_API_KEY)

# Load the dataset
dataset = load_dataset('SocialGrep/one-million-reddit-jokes', split='train')


In [30]:
def test_QA_model(query):
    """
    Sends a query to the Cohere API and returns the response.

    Args:
    - query (str): The query string to send to Cohere.

    Returns:
    - response (str): The response from the Cohere API.
    """
    try:
        # Send the query to Cohere
        response = co.generate(
            prompt=query,
            max_tokens=50  # Adjust the max tokens as needed
        )
        # Extract and return the response text
        return response.generations[0].text.strip()
    except Exception as e:
        return f"An error occurred: {e}"

### 3.1.2 Building a RAG Pipeline


We will now build a pipeline that retrieves relevant documents from the Pinecone VectorDB and integrates them with a generative model to form the RAG pipeline. This involves several steps, which we will outline below.



#### 3.1.2.1 Embedding the Dataset

To begin, we will use the SentenceTransformer model to embed the jokes dataset. These embeddings will help us to efficiently search for relevant jokes later on.
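
To make the idea of searching by embedding concrete, here is a minimal sketch of cosine similarity, the metric the index created earlier is configured with (`metric='cosine'`). The three-dimensional vectors are made up for illustration; the embeddings produced by the model have 384 dimensions, matching the index dimension.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
query_vec = [1.0, 0.0, 1.0]
similar_vec = [2.0, 0.0, 2.0]    # same direction as the query, different length
unrelated_vec = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query_vec, similar_vec))    # close to 1: same direction
print(cosine_similarity(query_vec, unrelated_vec))  # 0: no shared direction
```

Because cosine similarity ignores vector length, two texts with the same direction in embedding space score as near-identical regardless of magnitude.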




In [10]:
# Function to load and embed dataset
def load_and_embedd_dataset(dataset, model, text_field, rec_num=None):
    """
    Load a dataset and embed the text field using a sentence-transformer model
    Args:
        dataset: The loaded dataset
        model: The model to use for embedding
        text_field: The field in the dataset that contains the text
        rec_num: The number of records to load and embed (None to embed all)
    Returns:
        tuple: A tuple containing the dataset and the embeddings
    """
    print("Loading and embedding the dataset")

    # If rec_num is not None, limit the dataset
    if rec_num is not None:
        subset = dataset[text_field][:rec_num]
    else:
        subset = dataset[text_field]

    # Embed the dataset
    embeddings = model.encode(subset, convert_to_tensor=True)

    print("Done!")
    return dataset, embeddings

# Embed only the first rec_num records (set rec_num = None to embed the full dataset)
text_field = 'title'
rec_num = 1000  # Subset size used for testing

start_time = time.time()

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
print("Model loaded successfully.")

# Load and embed dataset
dataset, embeddings = load_and_embedd_dataset(dataset, model, text_field, rec_num)

end_time = time.time()
print(f"Time for embedding the dataset: {end_time - start_time} seconds")


Model loaded successfully.
Loading and embedding the dataset
Done!
Time for embedding the dataset: 4.46078085899353 seconds


#### 3.1.2.2 Inserting Embeddings into Pinecone

Next, we will insert these embeddings into Pinecone. We'll do this in batches for efficiency. Each joke will get an ID and some metadata to keep track of it.
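
The batching step can be sketched independently of Pinecone. The helper `iter_batches` is a hypothetical name introduced here for illustration, and the data is fake; the `(id, vector, metadata)` tuple layout mirrors the one the upsert code in this section builds.

```python
def iter_batches(records, batch_size=100):
    """Yield consecutive slices of `records` with at most `batch_size` items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Toy data: 250 fake two-dimensional vectors with title metadata
ids = [str(i) for i in range(250)]
vectors = [[0.0, float(i)] for i in range(250)]
meta = [{"title": f"joke {i}"} for i in range(250)]
records = list(zip(ids, vectors, meta))

batches = list(iter_batches(records, batch_size=100))
print([len(b) for b in batches])  # [100, 100, 50]
```

The last batch is simply shorter than the others, so no padding or special casing is needed when the record count is not a multiple of the batch size.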


In [None]:
# Define a function to create Pinecone index
def create_pinecone_index(index_name: str, dimension: int, metric: str = 'cosine'):
    print("Creating a Pinecone index...")
    pc = pinecone.Pinecone(api_key=PINECONE_API_KEY)
    existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]
    if index_name not in existing_indexes:
        pc.create_index(
            name=index_name,
            dimension=dimension,
            metric=metric,
            spec=pinecone.ServerlessSpec(
                cloud="aws",
                region="us-east-1"
            )
        )
    print("Done!")
    return pc

# Create the vector database
pc = create_pinecone_index(index_name, embeddings.shape[1])

# Define a function to upsert vectors to Pinecone
def upsert_vectors(index, embeddings, dataset, text_field='title', batch_size=100):
    print("Upserting the embeddings to the Pinecone index...")
    shape = embeddings.shape

    ids = [str(i) for i in range(shape[0])]
    meta = [{text_field: text} for text in dataset[text_field][:shape[0]]]

    to_upsert = list(zip(ids, embeddings.tolist(), meta))

    for i in tqdm(range(0, shape[0], batch_size)):
        i_end = min(i + batch_size, shape[0])
        index.upsert(vectors=to_upsert[i:i_end])
    print("Upsert completed.")

# Upsert the vectors
index = pc.Index(index_name)
upsert_vectors(index, embeddings, dataset)


#### 3.1.2.3 Retrieving Relevant Documents


We need a function to retrieve the most relevant jokes when given a query. This function will take a query, embed it, and use Pinecone to find the top matching jokes.
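
Conceptually, the query step is a nearest-neighbour search. The sketch below is a brute-force, in-memory stand-in for what the vector database does at scale. It is not how Pinecone is implemented, and `top_k_matches`, `normalize`, and the toy vectors are hypothetical.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k_matches(query_vec, stored, top_k=2):
    """Return the top_k (score, text) pairs, highest cosine similarity first."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), text)
        for text, vec in stored.items()
    )
    return heapq.nlargest(top_k, scored)

# Toy 2-dimensional "embeddings" keyed by joke title (made-up values)
stored = {
    "joke about arrays": [0.9, 0.1],
    "joke about robots": [0.5, 0.5],
    "joke about kittens": [0.1, 0.9],
}

matches = top_k_matches([1.0, 0.0], stored, top_k=2)
print([text for _, text in matches])  # ['joke about arrays', 'joke about robots']
```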


In [None]:
# Define a function to retrieve relevant documents from Pinecone
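def retrieve_relevant_docs(query, model, index, top_k=5):
    """Embed the query and return the titles of the top_k closest jokes."""
    # Encode a single string so the result is a flat vector, as index.query expects
    query_emb = model.encode(query)
    result = index.query(vector=query_emb.tolist(), top_k=top_k, include_metadata=True)
    # Each match carries the metadata stored at upsert time
    docs = [match['metadata']['title'] for match in result['matches']]
    return docs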


#### 3.1.2.4 Building the RAG Pipeline

In this section, we combine retrieval and generation into a single function and then compare the standard QA model with the RAG pipeline on three queries. The comparison shows how grounding the model in retrieved jokes affects the relevance of its answers.



In [8]:
# Build the RAG pipeline
def RAG_pipeline(query):
    relevant_docs = retrieve_relevant_docs(query, model, index)
    context = ' '.join(relevant_docs)
    response = co.generate(
        model='command-xlarge-nightly',  # Cohere generation model used for the RAG answer
        prompt=f'Answer the question based on the context provided.\n\nContext: {context}\n\nQuestion: {query}\nAnswer:',
        max_tokens=100
    )
    return response.generations[0].text.strip()
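
One possible refinement, shown as a sketch rather than the approach used above: factor the prompt assembly into a helper that numbers each retrieved document, so the boundaries between jokes stay visible to the model. `build_prompt` is a hypothetical name, the example titles are made up, and the instruction wording is copied from `RAG_pipeline`.

```python
def build_prompt(query, docs):
    """Format retrieved documents and the user query into a single prompt string."""
    context = "\n".join(f"{i}. {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Answer the question based on the context provided.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Example with made-up retrieved titles
docs = ["Why do programmers prefer dark mode?", "A SQL query walks into a bar"]
prompt = build_prompt("Tell me a joke about programming", docs)
print(prompt)
```

Keeping prompt construction separate from the API call also makes it possible to inspect exactly what context the model receives for a given query.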

***Question 1***

In [34]:
# Question 1
query1 = "Tell me a joke about programming?"
qa_answer1 = test_QA_model(query1)
rag_answer1 = RAG_pipeline(query1)
print(f"Query: \"{query1}\"\n\n")
print("QA Model Answer:\n")
print(f"Answer: \"{qa_answer1}\"\n\n")
print("RAG Pipeline Answer:\n")
print(f"Answer: \"{rag_answer1}\"\n\n")


Query: "Tell me a joke about programming?"


QA Model Answer:

Answer: "Here's a programming joke for you:

Why did the Java developer wear a jacket and tie?

Because he had a Java-script tonight!

I hope you found that joke to be witty and clever, it is a popular"


RAG Pipeline Answer:

Answer: "Why did the programmer quit his job? Because he didn't get arrays."




***Question 2***

In [33]:
# Question 2
query2 = "Tell me a funny joke about artificial intelligence."
qa_answer2 = test_QA_model(query2)
rag_answer2 = RAG_pipeline(query2)
print(f"Query: \"{query2}\"\n\n")
print("QA Model Answer:\n")
print(f"Answer: \"{qa_answer2}\"\n\n")
print("RAG Pipeline Answer:\n")
print(f"Answer: \"{rag_answer2}\"\n\n")


Query: "Tell me a funny joke about artificial intelligence."


QA Model Answer:

Answer: "I would love to! 

I love A.I., but apparently it hates me and wants to destroy me. It's kind of a toxic relationship I never seem to learn from. Maybe one day I'll wake up and it'll be over"


RAG Pipeline Answer:

Answer: "Why did the robot quit his job? He couldn't stand the commute!"




***Question 3***

In [22]:
# Question 3
query3 = "Can u tell me a dark joke?"
qa_answer3 = test_QA_model(query3)
rag_answer3 = RAG_pipeline(query3)
print(f"Query: \"{query3}\"\n\n")
print("QA Model Answer:\n")
print(f"Answer: \"{qa_answer3}\"\n\n")
print("RAG Pipeline Answer:\n")
print(f"Answer: \"{rag_answer3}\"\n\n")


Query: "Can u tell me a dark joke?"


QA Model Answer:

Answer: "Sure, here's a dark joke for you:

Why did the skeleton cross the road?
To get to the other side, of course!

This joke is dark because it presents a typical response to the question "why did the"


RAG Pipeline Answer:

Answer: "Sure, here's a dark joke: What do you call a pile of kittens at the bottom of a cliff? A meowntain."





## Conclusion


#### Effectiveness of RAG:

Compared with the standard QA model, the RAG pipeline returned jokes that were more relevant to the query and more coherent. A likely reason is that the jokes retrieved from the dataset give the generative model concrete examples of humor to draw on.

#### Insights and Observations:  

Grounding the generation in retrieved data reduces hallucinations and improves the accuracy of the answers. A standard QA model can produce plausible but wrong answers, and this carries over to creative tasks such as jokes. The RAG pipeline, by contrast, steers the generated answer toward content that fits the retrieved context. Combining retrieval with generation therefore looks like a robust way to improve answer quality. In the comparison above, the RAG pipeline produced more relevant and funnier responses than the standard QA model, which suggests it handles contextually rich queries better.


