<a href="https://colab.research.google.com/github/DataSavvyYT/AI-engineering-course/blob/main/03_rag/selecting_wines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [52]:
import kagglehub
path = kagglehub.dataset_download("zynicide/wine-reviews")

Using Colab cache for faster access to the 'wine-reviews' dataset.


In [56]:
print(path)

/kaggle/input/wine-reviews


In [57]:
!ls /kaggle/input/wine-reviews/

winemag-data-130k-v2.csv  winemag-data-130k-v2.json  winemag-data_first150k.csv


In [58]:
import pandas as pd

# Load the wine review data (all varieties, not only red wines)
df = pd.read_csv('/kaggle/input/wine-reviews/winemag-data-130k-v2.csv')
print(df.head())

   Unnamed: 0   country                                        description  \
0           0     Italy  Aromas include tropical fruit, broom, brimston...   
1           1  Portugal  This is ripe and fruity, a wine that is smooth...   
2           2        US  Tart and snappy, the flavors of lime flesh and...   
3           3        US  Pineapple rind, lemon pith and orange blossom ...   
4           4        US  Much like the regular bottling from 2012, this...   

                          designation  points  price           province  \
0                        Vulkà Bianco      87    NaN  Sicily & Sardinia   
1                            Avidagos      87   15.0              Douro   
2                                 NaN      87   14.0             Oregon   
3                Reserve Late Harvest      87   13.0           Michigan   
4  Vintner's Reserve Wild Child Block      87   65.0             Oregon   

              region_1           region_2         taster_name  \
0              

In [59]:
df = df[df['variety'].notna()] # Drop rows with a missing variety; NaN values can break serialization
data = df.sample(700).to_dict('records') # Sample only 700 records; indexing more records takes longer
len(data)

700
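The cell above only drops rows whose `variety` is missing, so other columns (for example `price` or `region_2`) can still hold NaN in the sampled records. A minimal sketch of one way to neutralize them before indexing, assuming plain dict records like those returned by `to_dict('records')`. The helper name `clean_record` and the example record are hypothetical and not part of the original notebook.

```python
import math

def clean_record(record):
    """Return a copy of the record with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }

# Illustrative record shaped like one row of the dataset (values made up for the example)
example = {"country": "US", "price": float("nan"), "points": 87}
cleaned = clean_record(example)
print(cleaned)
```

In the notebook this could be applied as `data = [clean_record(doc) for doc in data]` before uploading points.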

In [61]:
!pip install -q qdrant-client sentence-transformers

In [62]:
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

In [63]:
encoder = SentenceTransformer('all-MiniLM-L6-v2') # Model to create embeddings

In [64]:
# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

In [65]:
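# Create collection to store wines
# recreate_collection is deprecated in recent qdrant-client releases,
# so drop any existing collection and create it explicitly
if qdrant.collection_exists("top_wines"):
    qdrant.delete_collection("top_wines")
qdrant.create_collection(
    collection_name="top_wines",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is defined by the model used
        distance=models.Distance.COSINE
    )
)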


True
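The collection is configured with `Distance.COSINE`, so the scores returned by a search are cosine similarities between the query vector and each stored vector. A small pure-Python sketch of that measure, for intuition only. The function `cosine_similarity` is written here for illustration and is not how Qdrant computes it internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score close to 1, orthogonal vectors score 0
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because only the angle matters, the length of a wine description's embedding does not affect its score, only its direction relative to the query.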

In [67]:
# vectorize!
qdrant.upload_points(
    collection_name="top_wines",
    points=[
        models.PointStruct(
            id=idx,
            vector=encoder.encode(doc["description"]).tolist(),
            payload=doc,
        ) for idx, doc in enumerate(data) # data is the variable holding all the wines
    ]
)

In [68]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

In [69]:
# Search time for awesome wines!

hits = qdrant.query_points(
    collection_name="top_wines",
    query=encoder.encode(user_prompt).tolist(),
    limit=3
)

In [70]:
for hit in hits.points:
    print(hit.payload, "score:", hit.score)

{'Unnamed: 0': 87609, 'country': 'France', 'description': 'An expressive wine, with solid, dense and powerful tannins. With a spicy, leathery, dry textured feel, and perfumed fruits supporting it, this is a wine for the seriously long-term. It is black and solid, an impressive evocation of the structure of great Malbec in Cahors. Keep for at least eight years.', 'designation': 'K-2', 'points': 95, 'price': 60.0, 'province': 'Southwest France', 'region_1': 'Cahors', 'region_2': nan, 'taster_name': 'Roger Voss', 'taster_twitter_handle': '@vossroger', 'title': 'Clos Troteligotte 2011 K-2 Malbec (Cahors)', 'variety': 'Malbec', 'winery': 'Clos Troteligotte'} score: 0.7267671486647858
{'Unnamed: 0': 84393, 'country': 'Argentina', 'description': "A little bit baked, but overall it's a good, semirich Malbec with roasted, dark fruit flavors and a backdrop of chocolate. The palate has slightly more acidity than you might expect, so it doesn't come across heavy or out of whack. Regular wine done 

In [71]:
# define a variable to hold the search results
search_results = [hit.payload for hit in hits.points]
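In the output above, the top-scoring hit is a French Malbec even though the prompt asks for Argentina, because the search ranks purely by embedding similarity. One possible remedy is to filter the payloads on a field afterwards. The sketch below assumes payloads are plain dicts. The helper `filter_by_field` and the sample records are hypothetical, not part of the notebook.

```python
def filter_by_field(results, field, value):
    """Keep only the payload dicts whose `field` equals `value`."""
    return [r for r in results if r.get(field) == value]

# Illustrative payloads (made up for the example)
sample_results = [
    {"title": "Wine A", "country": "France", "variety": "Malbec"},
    {"title": "Wine B", "country": "Argentina", "variety": "Malbec"},
]
argentinian = filter_by_field(sample_results, "country", "Argentina")
print([r["title"] for r in argentinian])
```

In the notebook this would be called as `filter_by_field(search_results, "country", "Argentina")`. Filtering after the search can leave fewer than `limit` results, so raising `limit` first may be necessary. Qdrant also supports payload filters at query time, which avoids that problem.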

In [73]:
from google.colab import userdata
import os
OPENAI_API_KEY=userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

In [74]:
# Connect to the OpenAI API (a hosted model, not a local one)
from openai import OpenAI
client = OpenAI(api_key = os.environ.get('OPENAI_API_KEY'))

In [77]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a wine specialist chatbot. Your top priority is to help users select amazing wines and to guide them with their requests."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": str(search_results)}
    ]
)
out=completion.choices[0].message.content
print(out)

For an amazing Malbec from Argentina, I recommend trying the **Four Aces 2008 Malbec** from Mendoza. 

### Four Aces 2008 Malbec
- **Region:** Mendoza Province, Mendoza
- **Points:** 86
- **Price:** $13.00
- **Description:** This Malbec features roasted dark fruit flavors with a backdrop of chocolate. It has a slightly higher acidity which gives it a refreshing quality, balancing the richness of the fruit. Overall, it is a well-rounded and approachable wine.

If you're looking for a more premium option, consider the **Catena Zapata Malbec Argentino**, which is known for its depth and complexity, showcasing dark fruit notes along with hints of floral and spice. 

Would you like more recommendations or details about a specific wine?
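The call above passes the retrieved records to the model as an `assistant` message built with `str()`. An alternative, sketched here under assumptions, is to place the records in the system message as JSON and ask the model to stay within them. The helper `build_messages`, its prompt wording, and the sample record are hypothetical. This is one possible layout, not the notebook's method.

```python
import json

def build_messages(user_prompt, search_results):
    """Build a chat message list that carries retrieved records as context."""
    # default=str keeps json.dumps from failing on values it cannot encode
    context = json.dumps(search_results, ensure_ascii=False, default=str)
    system = (
        "You are a wine specialist chatbot. Recommend only wines from the "
        "retrieved records below.\n\nRetrieved records:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Illustrative call with a made-up record
messages = build_messages(
    "Suggest me an amazing Malbec wine from Argentina",
    [{"title": "Wine B", "country": "Argentina", "variety": "Malbec"}],
)
print(messages[0]["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.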


In [76]:
from IPython.display import Markdown, display

In [79]:
display(Markdown(out))
