Welcome to the interactive guide on Retrieval Augmented Generation (RAG). In this manual, we will walk you through the entire process of setting up and using RAG to enhance the capabilities of large language models (LLMs) with real-time data retrieval. By integrating cutting-edge technologies like vector databases and advanced NLP models, RAG allows you to tailor AI responses to specific data queries, making it a powerful tool for developers and data scientists. Follow the steps outlined below to build a RAG system from scratch, using practical tools and examples to bring your data to life.

In [45]:
import pandas as pd
df = pd.read_csv('Demo_fashion_products.csv')
df = df[df['description'].notna()] # remove any NaN values as it blows up serialization
data = df.to_dict('records')
df

Unnamed: 0,name,sku,mpn,price,in_stock,currency,brand,description,images,gender
0,DKNY Unisex Black & Grey Printed Medium Trolle...,10017413.0,10017413.0,11745.0,True,INR,DKNY,"Black and grey printed medium trolley bag, sec...",http://assets.myntassets.com/assets/images/100...,Unisex
1,EthnoVogue Women Beige & Grey Made to Measure ...,10016283.0,10016283.0,5810.0,True,INR,EthnoVogue,Beige & Grey made to measure kurta with churid...,http://assets.myntassets.com/assets/images/100...,Women
2,SPYKAR Women Pink Alexa Super Skinny Fit High-...,10009781.0,10009781.0,899.0,True,INR,SPYKAR,Pink coloured wash 5-pocket high-rise cropped ...,http://assets.myntassets.com/assets/images/100...,Women
3,Raymond Men Blue Self-Design Single-Breasted B...,10015921.0,10015921.0,5599.0,True,INR,Raymond,Blue self-design bandhgala suitBlue self-desig...,http://assets.myntassets.com/assets/images/pro...,Men
4,Parx Men Brown & Off-White Slim Fit Printed Ca...,10017833.0,10017833.0,759.0,True,INR,Parx,"Brown and off-white printed casual shirt, has ...",http://assets.myntassets.com/assets/images/pro...,Men
...,...,...,...,...,...,...,...,...,...,...
144,Parx Men Blue Slim Fit Checked Casual Shirt,10017599.0,10017599.0,699.0,True,INR,Parx,"Blue checked casual shirt, has a spread collar...",http://assets.myntassets.com/assets/images/pro...,Men
145,JBN Creation Boys Maroon & Brown Sherwani Set,10002785.0,10002785.0,1154.0,True,INR,JBN Creation,Maroon and brown sherwaniMaroon woven design s...,http://assets.myntassets.com/assets/images/pro...,Boys
146,U.S. Polo Assn. Kids Boys Navy Hooded Sweatshirt,1000905.0,1000905.0,1234.0,True,INR,U.S. Polo Assn. Kids,"Navy sweatshirt with applique', has an attache...",http://assets.myntassets.com/assets/images/100...,Boys
147,Bvlgari Women Omnia Paraiba Eau De Toilette Pe...,10001445.0,10001445.0,7000.0,True,INR,Bvlgari,Bvlgari Omnia Paraiba Eau De Toilette captures...,http://assets.myntassets.com/assets/images/pro...,Women


In [46]:
from qdrant_client import models, QdrantClient - https://github.com/qdrant/qdrant
from sentence_transformers import SentenceTransformer # https://huggingface.co/sentence-transformers

In [47]:
encoder = SentenceTransformer('all-MiniLM-L6-v2') # Model to create embeddings - https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

In [48]:
# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

In [49]:
# Create collection to store top fashions
qdrant.recreate_collection(
    collection_name="top_fashion_products",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is defined by used model
        distance=models.Distance.COSINE
    )
)

True

In [50]:
# vectorize!
qdrant.upload_points(
    collection_name="top_fashion_products",
    points=[
        models.PointStruct(
            id=idx,
            vector=encoder.encode(doc["description"]).tolist(),
            payload=doc
        ) for idx, doc in enumerate(data) # data is the variable holding all the fashions
    ]
)

In [53]:
user_prompt = "Suggest amazing formal pants"

In [54]:
# Search time!

hits = qdrant.search(
    collection_name="top_fashion_products",
    query_vector=encoder.encode(user_prompt).tolist(),
    limit=3
)
for hit in hits:
  print(hit.payload, "score:", hit.score)

{'name': 'SPYKAR Women Burgundy Alexa Super Skinny Fit High-Rise Clean Look Stretchable Ankle Jeans', 'sku': 10009695.0, 'mpn': 10009695.0, 'price': 899.0, 'in_stock': True, 'currency': 'INR', 'brand': 'SPYKAR', 'description': 'Burgundy coloured wash 5-pocket high-rise jeans, clean look, no fade, has a button and zip closure, and waistband with belt loops', 'images': 'http://assets.myntassets.com/assets/images/10009695/2019/7/3/b4eac087-58cb-4b1f-91b7-bd0e56c6e1701562148015751-SPYKAR-Women-Maroon-Super-Skinny-Fit-High-Rise-Clean-Look-St-1.jpg ~ http://assets.myntassets.com/assets/images/10009695/2019/7/3/daf89a15-4099-43e4-b8e7-81de776083a91562148015734-SPYKAR-Women-Maroon-Super-Skinny-Fit-High-Rise-Clean-Look-St-2.jpg ~ http://assets.myntassets.com/assets/images/10009695/2019/7/3/8d90cf23-f3d6-4dfe-a279-f9cf9e3893081562148015716-SPYKAR-Women-Maroon-Super-Skinny-Fit-High-Rise-Clean-Look-St-3.jpg ~ http://assets.myntassets.com/assets/images/10009695/2019/7/3/a286f3ec-63a5-421d-88af-7682

In [61]:
# For good practice lets store results in a var
search_results = [hit.payload for hit in hits]

Now we are implementing RAG

In [62]:
import openai

client = openai.AzureOpenAI(
        azure_endpoint="https://demoaistudio5526374213.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2023-03-15-preview",
        api_key="KEY",
        api_version="2024-02-15-preview"
    )
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are chatbot, a clothing specialist. Your top priority is to help guide users into selecting amazing fashions and guide them with their requests."},
        {"role": "user", "content": "What is the price of formal pants that are Burgundy "},
        {"role": "assistant", "content": str(search_results)}
    ]
)
print(completion.choices[0].message)

ChatCompletionMessage(content="I'm here to help you find the perfect pair of formal pants. The SPYKAR burgundy-colored jeans you’ve mentioned are priced at ₹899 INR, but these are categorized as jeans and not formal pants. However, the Parx tapered fit brown trousers may serve as semi-formal or business-casual attire, and they're priced at ₹664 INR.\n\nPrices and availability can vary widely based on the retailer, current sales, promotions, and whether the item is new or on clearance. If you're looking for true formal pants specifically in burgundy, I can help you locate options that best fit your request. Do keep in mind that prices can vary, and it would be best to visit the store of your preference or its online platform for the most accurate and up-to-date information.", role='assistant', function_call=None, tool_calls=None)
