
# **Step-by-Step Instructions for Implementing RAG**

Step 1: Environment Setup

Install Necessary Libraries: !pip install transformers datasets torch

Step 2: Load Data

Step 3: Initialize Retriever and Generator

Set Up the Retriever: Choose a retriever model. For text data, models like BM25 or a neural retriever can be used.

Set Up the Generator: Choose a generator model. This could be a pretrained model from Hugging Face's Transformers library, such as GPT or BART.

Step 4: Data Processing

Prepare Data: Depending on the task, you may need to preprocess the data, for example by tokenizing or vectorizing it.

Step 5: Implement Retrieval

Execute Retrieval: Use the retriever model to fetch relevant documents or data snippets based on queries.

Step 6: Generate Responses

Generate Answer: Use the generator model to produce an answer based on the retrieved documents.

Step 7: Evaluation and Refinement

Evaluate Performance: Test the system's performance on known queries and check the relevance and quality of the responses.

Refine Models: Based on performance, you may need to fine-tune the models or adjust the retrieval method.

Step 8: Deployment

Deploy the Model: Once satisfied, deploy the RAG system into a production environment or integrate it into the application.

Additional Tips:

Monitor and Update: Continuously monitor the performance and update the underlying data or models as needed to maintain accuracy and relevance.

Experiment with Configurations: Depending on the specifics of the application, different configurations of retrievers and generators may yield better results.

To proceed effectively, review the documentation or tutorials for each component you plan to use, especially those provided by Hugging Face if you are using their models.
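The retrieval and generation steps above (Steps 5 and 6) can be sketched without any model at all. The example below is only a toy under stated assumptions: the three documents are made up, `retrieve` ranks by shared words rather than with BM25 or a neural retriever, and `build_prompt` is a hypothetical helper that prepares the text a generator model would receive.

```python
import re

# Toy corpus standing in for the loaded data (Step 2); the contents are made up.
DOCS = [
    "Malbec from Mendoza with notes of blackberry and smoke.",
    "Napa Valley Cabernet aged in French oak.",
    "Barossa Shiraz with pepper and dark fruit.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, docs, k=2):
    """Step 5: rank documents by how many query tokens they share."""
    query_tokens = set(tokenize(query))
    scored = [(len(query_tokens & set(tokenize(doc))), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop those with no overlap at all.
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, context_docs):
    """Step 6 (input side): combine retrieved text and the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

question = "Which wine comes from Mendoza?"
prompt = build_prompt(question, retrieve(question, DOCS))
print(prompt)
```

In a real system the word-overlap scorer would be replaced by the chosen retriever, and the prompt would be sent to the chosen generator model.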



# **Tokenization vs. Vectorization**

Tokenization and vectorization are two fundamental concepts in natural language processing (NLP) that often work together but serve different purposes.

# Tokenization:


Tokenization is the process of breaking down text into smaller pieces, called tokens. Tokens can be words, phrases, or even single characters. The goal is to simplify the text and prepare it for further processing like vectorization.

# Vectorization:

Vectorization is the process of converting tokens into numerical representations that a machine learning model can understand. This often involves transforming sparse categorical data (tokens) into a dense numerical vector.
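To make the distinction concrete, here is a minimal sketch that assumes a simple regex tokenizer and a bag-of-words count vector. This is the simplest form of vectorization and it produces sparse counts; embedding models such as the one used later in this notebook produce dense vectors instead. The function names are illustrative only.

```python
import re

def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(token_lists):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(tokens, vocab):
    """Vectorization: count how often each vocabulary token occurs."""
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] += 1
    return vector

sentences = ["Red wine, red fruit.", "White wine."]
token_lists = [tokenize(s) for s in sentences]
vocab = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocab) for tokens in token_lists]

print(token_lists)  # [['red', 'wine', 'red', 'fruit'], ['white', 'wine']]
print(vectors)      # [[2, 1, 1, 0], [0, 1, 0, 1]]
```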

# **UPLOAD DATA:**

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('/content/top_rated_wines.csv')
df.head()

| | name | region | variety | rating | notes |
|---|---|---|---|---|---|
| 0 | 3 Rings Reserve Shiraz 2004 | Barossa Valley, Barossa, South Australia, Aust... | Red Wine | 96.0 | Vintage Comments : Classic Barossa vintage con... |
| 1 | Abreu Vineyards Cappella 2007 | Napa Valley, California | Red Wine | 96.0 | Cappella is a proprietary blend of two clones ... |
| 2 | Abreu Vineyards Cappella 2010 | Napa Valley, California | Red Wine | 98.0 | Cappella is one of the oldest vineyard sites i... |
| 3 | Abreu Vineyards Howell Mountain 2008 | Howell Mountain, Napa Valley, California | Red Wine | 96.0 | When David purchased this Howell Mountain prop... |
| 4 | Abreu Vineyards Howell Mountain 2009 | Howell Mountain, Napa Valley, California | Red Wine | 98.0 | As a set of wines, it is hard to surpass the f... |


In [3]:
df.describe()

| | rating |
|---|---|
| count | 1365.0 |
| mean | 96.859341 |
| std | 0.995957 |
| min | 96.0 |
| 25% | 96.0 |
| 50% | 97.0 |
| 75% | 98.0 |
| max | 99.0 |


# **Convert DataFrame to List of Dictionaries**

This conversion makes it easier to manipulate each record individually or pass the data to functions that expect a dictionary format.

With the 'records' orientation, to_dict converts the DataFrame into a list in which each element is a dictionary representing one row. Each key in a dictionary is a column name, and each value is that row's entry for the column.

In [37]:
data = df.to_dict('records')
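As a small illustration of the 'records' orientation, the sketch below applies the same call to a two-row DataFrame. The DataFrame `toy` and its wine names are made up for the example and are not part of the dataset.

```python
import pandas as pd

# A made-up two-row DataFrame standing in for the wine data.
toy = pd.DataFrame({"name": ["Wine A", "Wine B"], "rating": [96.0, 98.0]})

# One dictionary per row, keyed by column name.
records = toy.to_dict("records")
print(records)
# [{'name': 'Wine A', 'rating': 96.0}, {'name': 'Wine B', 'rating': 98.0}]
```

The row index is not included in the dictionaries, only the column values.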

# **Create Embeddings**

Objective: Use Sentence Transformers to generate embeddings from the imported data, which will be stored in the Qdrant vector database for efficient retrieval.

Tools: Sentence Transformers, Qdrant

Tasks:

Install Sentence Transformers: pip install sentence-transformers

Convert textual data into embeddings: model.encode(df['notes'].tolist())

Install and set up Qdrant: pip install qdrant-client (see the Qdrant documentation for other setup options).

Store embeddings in Qdrant and validate by performing test queries.


# **Install Sentence Transformers:**

In [5]:
!pip install sentence-transformers


In [7]:
!pip install qdrant-client


# **Generate Embeddings:**

Embeddings in machine learning are a way to represent data, typically text, in a format (usually a vector of continuous numbers) that preserves the contextual or semantic relationships between elements. This approach allows complex items like words, sentences, or even entire documents to be converted into a form that a computer can understand and process mathematically.


Embeddings transform high-dimensional categorical data into a lower-dimensional, continuous, dense vector space. Each dimension of the vector captures some underlying attribute of the data, and similar data points are placed closer together in this space.
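"Closer together" is usually measured with a similarity function such as cosine similarity, which is also the distance metric configured for the Qdrant collection in this notebook. The sketch below uses hand-written three-dimensional vectors as stand-ins for real embeddings; the numbers are invented purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy "embeddings"; real ones from all-MiniLM-L6-v2 have many more dimensions.
malbec = [0.9, 0.1, 0.3]
shiraz = [0.8, 0.2, 0.4]
riesling = [0.1, 0.9, 0.2]

print(cosine_similarity(malbec, shiraz))    # high: the vectors point the same way
print(cosine_similarity(malbec, riesling))  # lower: the vectors point apart
```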

In [8]:
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer



In [10]:
# Load the Sentence Transformer model; its encode() method creates the embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

In [33]:
# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

In [35]:
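# Create a collection to store the wines.
# recreate_collection is deprecated, so drop any existing collection and create it again.
if qdrant.collection_exists("top_wines"):
    qdrant.delete_collection("top_wines")
qdrant.create_collection(
    collection_name="top_wines",
    vectors_config=models.VectorParams(
        size=model.get_sentence_embedding_dimension(),  # vector size is set by the model
        distance=models.Distance.COSINE
    )
)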


True

In [38]:
# vectorize!
qdrant.upload_points(
    collection_name="top_wines",
    points=[
        models.PointStruct(
            id=idx,
            vector=model.encode(doc["notes"]).tolist(),
            payload=doc
        ) for idx, doc in enumerate(data) # data is the variable holding all the wines
    ]
)

In [40]:
# Let's find some awesome wines from Mendoza, Argentina!

# First, we encode our search query into a vector using the encoder.
# This vector represents the semantic meaning of the query.
query = "A wine from Mendoza, Argentina"
query_vector = model.encode(query).tolist()

# Now, we'll ask our Qdrant database to find the top 3 wines that best match our query.
# We search in the 'top_wines' collection.
hits = qdrant.search(
    collection_name="top_wines",
    query_vector=query_vector,
    limit=3
)

# Let's print out what we found!
# For each wine that matches our search, we'll see its details and how well it matched.
for hit in hits:
    print("Wine Details:", hit.payload, "| Match Score:", hit.score)


Wine Details: {'name': 'Catena Zapata Nicasia Vineyard Malbec 2004', 'region': 'Argentina', 'variety': 'Red Wine', 'rating': 96.0, 'notes': '"The single-vineyard 2004 Malbec Nicasia Vineyard is located in the Altamira district of Mendoza. It was aged for 18 months in new French oak. Opaque purple-colored, it exhibits a complex perfume of pain grille, scorched earth, mineral, licorice, blueberry, and black cherry. Thick on the palate, bordering on opulent, it has layers of fruit, silky tannins, and a long, fruit-filled finish. It will age effortlessly for another 6-8 years and provide pleasure through 2025. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both architecture and wine in Mendoza. It is hard to believe

In [41]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

In [57]:
# Search time for awesome wines!

top = qdrant.search(
    collection_name="top_wines",
    query_vector=model.encode(user_prompt).tolist(),
    limit=3
)
for hit in top:
    print(hit.payload, "score:", hit.score)

{'name': 'Catena Zapata Argentino Vineyard Malbec 2004', 'region': 'Argentina', 'variety': 'Red Wine', 'rating': 98.0, 'notes': '"The single-vineyard 2004 Malbec Argentino Vineyard spent 17 months in new French oak. Remarkably fragrant and complex aromatically, it offers up aromas of wood smoke, creosote, pepper, clove, black cherry, and blackberry. Made in a similar, elegant style, it is the most structured of the three single vineyard wines, needing a minimum of a decade of additional cellaring. It should easily prove to be a 25-40 year wine. It is an exceptional achievement in Malbec. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both architecture and wine in Mendoza. It is hard to believe, given the surge i

The for loop iterates over the search results (top), and for each result (hit), it prints:

hit.payload: The payload typically contains detailed data about each item, in this case, wine details such as name, region, variety, rating, and tasting notes.

hit.score: A numerical score that represents the relevance of each result to the query. A higher score indicates a closer match to the query.
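Conceptually, a search with a limit scores every stored vector against the query vector and returns the best few. The sketch below is a brute-force stand-in written in plain Python, not Qdrant's actual implementation; `brute_force_search` and the two-dimensional vectors are hypothetical and exist only to show how payloads and scores relate.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def brute_force_search(query_vector, points, limit=3):
    """Score every (payload, vector) pair and return the top `limit` by score."""
    q = normalize(query_vector)
    scored = []
    for payload, vector in points:
        score = sum(a * b for a, b in zip(q, normalize(vector)))
        scored.append((payload, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

# Made-up payloads and vectors.
points = [
    ({"name": "Wine A"}, [1.0, 0.0]),
    ({"name": "Wine B"}, [0.7, 0.7]),
    ({"name": "Wine C"}, [0.0, 1.0]),
]

results = brute_force_search([1.0, 0.1], points, limit=2)
for payload, score in results:
    print(payload, "score:", round(score, 3))
```

A vector database returns the same kind of ranked list but typically uses an index to avoid comparing the query against every stored vector.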

In [43]:
# Collect the payloads of the results for user_prompt (held in `top`) for the generator step
search_results = [hit.payload for hit in top]

In [45]:
!pip install openai

Successfully installed jiter-0.5.0 openai-1.40.3
