# External lab: Create a RAG with LLM using your own data

## Concepts Covered

* Retrieval Augmented Generation

* Large Language Models using Llamafile

* Using Vector databases like Qdrant

* Creating embeddings with Sentence Transformers

* Using OpenAI's Python API to connect to the LLM and produce responses

In [1]:
import pandas as pd
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
# Load dataset
wine_datafile_path = "top_rated_wines.csv"
df = pd.read_csv(wine_datafile_path)

df

Unnamed: 0,name,region,variety,rating,notes
0,3 Rings Reserve Shiraz 2004,"Barossa Valley, Barossa, South Australia, Aust...",Red Wine,96.0,Vintage Comments : Classic Barossa vintage con...
1,Abreu Vineyards Cappella 2007,"Napa Valley, California",Red Wine,96.0,Cappella is a proprietary blend of two clones ...
2,Abreu Vineyards Cappella 2010,"Napa Valley, California",Red Wine,98.0,Cappella is one of the oldest vineyard sites i...
3,Abreu Vineyards Howell Mountain 2008,"Howell Mountain, Napa Valley, California",Red Wine,96.0,When David purchased this Howell Mountain prop...
4,Abreu Vineyards Howell Mountain 2009,"Howell Mountain, Napa Valley, California",Red Wine,98.0,"As a set of wines, it is hard to surpass the f..."
...,...,...,...,...,...
1360,Lewis Cellars Alec's Blend Red 2002,"Napa Valley, California",Red Wine,96.0,Number 12 on
1361,Lewis Cellars Cabernet Sauvignon 2002,"Napa Valley, California",Red Wine,96.0,Showcasing the unique personalities of small h...
1362,Lewis Cellars Cuvee L Cabernet Sauvignon 2015,"Napa Valley, California",Red Wine,96.0,"Straight from James Fenimore Cooper’s novel, L..."
1363,Lewis Cellars Reserve Cabernet Sauvignon 2010,"Napa Valley, California",Red Wine,96.0,


In [3]:
#Convert data to dictionary
data = df.to_dict(orient='records')

In [4]:
# Initialize encoder
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # Model to create embeddings

In [5]:
# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

In [6]:
# Collection name
collection_name = "wines_quality"

# Check if the collection exists, then create or delete as needed
if qdrant.collection_exists(collection_name):
    qdrant.delete_collection(collection_name)

# Create a new collection
qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # Vector size from model
        distance=models.Distance.COSINE
    )
)

# Vectorize and upload points
qdrant.upload_points(
    collection_name=collection_name,
    points=[
        models.PointStruct(
            id=idx,
            vector=encoder.encode(str(doc["name"])),  # Convert int to string
            payload=doc
        ) for idx, doc in enumerate(data)
    ]
)

print(f"Collection '{collection_name}' created successfully and data uploaded.")

Collection 'wines_quality' created successfully and data uploaded.


In [7]:
user_prompt = "Suggest me a n amazing Malbec wine from Argentina"

In [8]:
# Search for the best matching wines
hits = qdrant.query_points(
    collection_name=collection_name,
    query= encoder.encode(user_prompt),
    limit=5
)

# Print results
print("Hit Score ", " | ID", " | Payload")
print("--------", "--", "-------")
for hit in hits.points:
    print(hit.score, hit.id, hit.payload)


Hit Score   | ID  | Payload
-------- -- -------
0.6155315838862419 296 {'name': 'Catena Zapata Argentino Vineyard Malbec 2004', 'region': 'Argentina', 'variety': 'Red Wine', 'rating': 98.0, 'notes': '"The single-vineyard 2004 Malbec Argentino Vineyard spent 17 months in new French oak. Remarkably fragrant and complex aromatically, it offers up aromas of wood smoke, creosote, pepper, clove, black cherry, and blackberry. Made in a similar, elegant style, it is the most structured of the three single vineyard wines, needing a minimum of a decade of additional cellaring. It should easily prove to be a 25-40 year wine. It is an exceptional achievement in Malbec. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both arc

In [9]:
# Define a variable to hold the search results including ID and payload
search_results = [{"id": hit.id, "payload": hit.payload} for hit in hits.points]

print(type(search_results))  # Debugging check

# Ensure search_results is properly formatted as a string
if isinstance(search_results, list):
    # Convert all three selected results into a formatted string
    formatted_search_results = "\n".join([f"ID: {item['id']}, Data: {item['payload']}" for item in search_results])
else:
    formatted_search_results = str(search_results)  # Convert to string if needed

print(type(formatted_search_results))  # Debugging check
print(formatted_search_results)  # Print the final formatted output


<class 'list'>
<class 'str'>
ID: 296, Data: {'name': 'Catena Zapata Argentino Vineyard Malbec 2004', 'region': 'Argentina', 'variety': 'Red Wine', 'rating': 98.0, 'notes': '"The single-vineyard 2004 Malbec Argentino Vineyard spent 17 months in new French oak. Remarkably fragrant and complex aromatically, it offers up aromas of wood smoke, creosote, pepper, clove, black cherry, and blackberry. Made in a similar, elegant style, it is the most structured of the three single vineyard wines, needing a minimum of a decade of additional cellaring. It should easily prove to be a 25-40 year wine. It is an exceptional achievement in Malbec. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both architecture and wine in Mendo

In [10]:
len(formatted_search_results)
#The Llama-3.2-1B-Instruct.Q6_K model has a maximum context window size of 128,000 tokens

3616

# Use LLAMA File to run locally
https://github.com/Mozilla-Ocho/llamafile

Command: Llama-3.2-1B-Instruct.Q6_K.exe --server --v2 -ngl 9999

Note: A llama.log file will be created automatically; use it to get the details about the model

One the above is up and running, move forward.

In [11]:
# Testing if the model is up and running
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)
completion = client.chat.completions.create(
    model="LLaMA_CPP",
    messages=[
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ]
)
print(completion.choices[0].message)

ChatCompletionMessage(content='Here is a limerick about Python exceptions:\n\nThere once was a Python case so fine,\nRaised by a ValueError, sublime.\nIt caught a bad sign,\nAnd caused the code to decline,\nNow debuggers must re-align.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)


In [12]:
# Now time to connect to the local large language model
from openai import OpenAI
client = OpenAI(
    base_url="http://127.0.0.1:8080/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)

# Ensure model name is correct (check from the API)
model_name = "LLaMA_CPP" 

# Query the model
print("Runnning the model...")
completion = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a wine specialist. Your priority is to help users select wines based on their needs. Only select one most suitable wine from the available options given in the content."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": formatted_search_results}  # Ensure variable exists
    ]
)

print(completion.choices[0].message)
print("----------------")
print("User Prompt: ", user_prompt)
print(completion.choices[0].message.content)  # Print the final output from the model

Runnning the model...
ChatCompletionMessage(content="Based on the options provided, I would highly recommend the:\n\n**Araujo Eisele Vineyard Cabernet Sauvignon 2003**\n\nThis wine is a standout in its category, with a complex and nuanced profile that showcases the best of Argentine Malbec. The wine is known for its rich, velvety texture, with flavors of blackberries, cassis, floral notes, and spicy undertones. The tannins are smooth and well-integrated, making this a great value for the price.\n\nThe wine's aging potential is impressive, with a minimum of 25-40 years expected to be at its best. Additionally, its popularity and recognition among wine enthusiasts make it a great choice for those looking to try a high-quality Argentine Malbec.\n\nOverall, the Araujo Eisele Vineyard Cabernet Sauvignon 2003 is an exceptional wine that is sure to impress even the most discerning palates.", refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)
----------------
User

In [13]:
print(completion.choices[0].message)

ChatCompletionMessage(content="Based on the options provided, I would highly recommend the:\n\n**Araujo Eisele Vineyard Cabernet Sauvignon 2003**\n\nThis wine is a standout in its category, with a complex and nuanced profile that showcases the best of Argentine Malbec. The wine is known for its rich, velvety texture, with flavors of blackberries, cassis, floral notes, and spicy undertones. The tannins are smooth and well-integrated, making this a great value for the price.\n\nThe wine's aging potential is impressive, with a minimum of 25-40 years expected to be at its best. Additionally, its popularity and recognition among wine enthusiasts make it a great choice for those looking to try a high-quality Argentine Malbec.\n\nOverall, the Araujo Eisele Vineyard Cabernet Sauvignon 2003 is an exceptional wine that is sure to impress even the most discerning palates.", refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)
