# MONTHLY MISSION PROJECT FROM SUPERDATASCIENCE

#### This project is made up of 3 Levels: The initiate level, The Specialist(level 2) and The Operative(level 3) 

####  I'll attend the three levels  and make my work easy to understand

#### Business statement: This project involves using the provided files: company.txt, product.txt, and shipping.txt—from SuperDataScience to develop a retrieval augmented generation (RAG) system. The goal is to create a solution that can answer specific questions by leveraging the context within these files. The objective is to help the organization improve time management and operational efficiency 

## Level 1: The Initiate

 -- Requirements:

 * Use HuggingFace for LLM interaction
* Support 2 languages (English + 1 other)
* Basic text input/output interface

#### At the end of this level the system should be able to answer the following questions seamlessy
 * What is the name of the company?
   * What types of products do they sell?

In [2]:
import os
import pandas as pd
import torch
import transformers
import google.generativeai as genai
import numpy as np

## Reading the document and storing into a dataframe

In [3]:

# Initialize an empty DataFrame with columns 'Title' and 'Text'

df = pd.DataFrame(columns=['Title', 'Text'])

title_mapping = {
    'company.txt': 'company_data',
    'products.txt':'product_data',
    'shipping.txt': 'shipping_data'
}

# Loop through each file in the current directory
for file_name in os.listdir('.'):
    # Check if the file is in the mapping
    if file_name in title_mapping:
        try:
            with open(file_name, 'r', encoding='utf-8') as file:                
                # Read the content of the .txt file and replace newline with spaces
                text = file.read().replace('\n', ' ') 
                custom_title = title_mapping[file_name]
                new_row = pd.DataFrame({'Title': [custom_title], 'Text': [text]})
                df = pd.concat([df, new_row], ignore_index=True)
        except Exception as e:
            print(f"Error processing file {file_name}: {e}")


In [4]:
df

Unnamed: 0,Title,Text
0,product_data,# TechStyle Global - Product Catalog ## Consu...
1,company_data,# TechStyle Global - Company Overview Founded...
2,shipping_data,# TechStyle Global - Shipping Policy ## Deliv...


In [5]:
df['Text'][0]

'# TechStyle Global - Product Catalog  ## Consumer Electronics  ### Smartphones & Accessories - TechStyle Smartphone Stand Pro   - Price: $29.99   - Adjustable aluminum stand   - Compatible with all smartphones   - Colors: Silver, Black, Rose Gold  - Premium Screen Protector Set   - Price: $19.99   - 9H hardness tempered glass   - Oleophobic coating   - Pack of 3  - Wireless Charging Pad   - Price: $39.99   - 15W fast charging   - LED charging indicator   - Compatible with Qi devices  ### Laptops & Computing - PowerBook Pro 15"   - Price: $1299.99   - Intel i7 processor   - 16GB RAM   - 512GB SSD   - NVIDIA Graphics  - UltraBook Air 13"   - Price: $899.99   - AMD Ryzen 5   - 8GB RAM   - 256GB SSD   - 18-hour battery life  - Laptop Accessories Bundle   - Price: $79.99   - Wireless mouse   - Laptop sleeve   - USB-C hub   - Cleaning kit  ## Smart Home Devices  ### Security - SmartCam Pro   - Price: $149.99   - 2K resolution   - Night vision   - Two-way audio   - Cloud storage included  - 

In [6]:
df['Text'][1]

"# TechStyle Global - Company Overview  Founded in 2015, TechStyle Global is a leading e-commerce marketplace specializing in consumer electronics, fashion, and lifestyle products. Our platform connects millions of customers worldwide with quality products at competitive prices.  ## Our Story TechStyle began as a small online electronics store in San Francisco, founded by tech entrepreneurs Sarah Chen and Marcus Rodriguez. Their vision was to create a seamless shopping experience that bridges the gap between technology and lifestyle products. Within five years, we expanded globally, now serving customers in over 50 countries.  ## Mission Statement To provide accessible, innovative, and sustainable shopping solutions that enhance people's daily lives through technology and style.  ## Core Values - Customer First: Every decision we make starts with our customers - Innovation: Constantly improving our platform and services - Sustainability: Committed to reducing our environmental impact -

In [7]:
df['Text'][2]

'# TechStyle Global - Shipping Policy  ## Delivery Options  ### Standard Shipping - Delivery Time: 3-5 business days - Cost: Free for orders over $50 - Orders under $50: $4.99 - Available in: United States and Canada - Tracking provided via email  ### Express Shipping - Delivery Time: 1-2 business days - Cost: $9.99 - Free for TechStyle Plus members - Available in: United States and Canada - Real-time tracking via app and email  ### International Shipping - Delivery Time: 7-14 business days - Cost: Calculated based on:   - Destination country   - Package weight   - Package dimensions - Available in: 50+ countries - Tracking provided where available  ## Order Processing  ### Processing Times - In-stock items: 1-2 business days - Custom orders: 3-5 business days - Pre-orders: As specified on product page - Business days are Monday-Friday, excluding holidays  ### Cut-off Times - Orders placed before 2 PM EST: Same-day processing - Orders placed after 2 PM EST: Next business day processing

## Step 2: Load Embedding Model

In [10]:
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

In [11]:
for model in genai.list_models():
    print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.0-flash-exp
models/gemini-exp-1206
models/gemini-exp-1121
models/gemini-exp-1114
models/gemini-2.0-flash-thinking-exp
models/gemini-2.0-flash-thinking-exp-1219
models/learnlm-1.5-pro-experimental
models/embedding-001
models/text-embedding-004
models

## Step 3: Create Vector Embeddings

In [12]:
## using embedding-001

def embed_text(text):
    return genai.embed_content(model='models/embedding-001',content=text,
                             task_type='retrieval_document')['embedding']

In [13]:
embed_text(df)

[[0.036324266,
  -0.016104134,
  -0.06378887,
  0.0003293303,
  0.06889862,
  0.025572963,
  0.015267983,
  -0.030628,
  -0.001611677,
  0.04255618,
  -0.018528556,
  0.02613645,
  -0.020302687,
  0.035702106,
  0.00093939365,
  -0.018568495,
  0.024387764,
  0.021278517,
  0.026603155,
  -0.0232328,
  0.020398743,
  0.012248343,
  -0.0011772101,
  -0.014038618,
  0.01914998,
  -0.00991502,
  0.0027459278,
  -0.053401366,
  -0.046979614,
  0.0029287226,
  -0.0465679,
  0.004048616,
  -0.058430996,
  0.009856659,
  0.0022491156,
  -0.035116293,
  -0.029151557,
  0.014875902,
  -0.005494093,
  0.019807024,
  0.01414047,
  -0.017134167,
  -0.04283156,
  0.0017684806,
  0.008481331,
  -0.0111722825,
  -0.039881635,
  0.03966658,
  0.0041230447,
  -0.04116027,
  0.037181858,
  0.010318138,
  0.07368109,
  -0.043018118,
  0.0036252341,
  -0.05481147,
  0.044152737,
  -0.007069333,
  -0.008919668,
  0.01248939,
  -0.021858148,
  0.016412556,
  0.03117762,
  0.014073012,
  -0.05606764,
  -0.07

# storing the embeddings in my dataframe

## Step 4: Storing  Embeddings

In [14]:
df['Embeddings'] = df['Text'].apply(embed_text)

In [15]:
df

Unnamed: 0,Title,Text,Embeddings
0,product_data,# TechStyle Global - Product Catalog ## Consu...,"[0.04988395, -0.029523415, -0.012447186, -0.03..."
1,company_data,# TechStyle Global - Company Overview Founded...,"[0.068495, -0.018913778, -0.030388031, -0.0293..."
2,shipping_data,# TechStyle Global - Shipping Policy ## Deliv...,"[0.043810476, -0.02836824, -0.019548077, -0.03..."


## NOTE:  note that this can also be done using chromadb or pinecoin. In the specialist session i'll use chromadb to store embeddings 

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Title       3 non-null      object
 1   Text        3 non-null      object
 2   Embeddings  3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes


## Step 5: Similarity Searching approach

In [17]:


def query_similarity_score(query,vector):
    '''
    INPUTS:
        query: str: The user prompt
        vector: array: The existing vector embedding from a document
    OUTPUT:
        score: float - Cosine similarity score
    '''
    query_embedding = embed_text(query)
    return np.dot(query_embedding,vector)

In [18]:
query = "What is the name of the company?"

In [19]:
df['Similarity'] = df['Embeddings'].apply(lambda vector: query_similarity_score(query,vector))

In [20]:
df

Unnamed: 0,Title,Text,Embeddings,Similarity
0,product_data,# TechStyle Global - Product Catalog ## Consu...,"[0.04988395, -0.029523415, -0.012447186, -0.03...",0.643179
1,company_data,# TechStyle Global - Company Overview Founded...,"[0.068495, -0.018913778, -0.030388031, -0.0293...",0.628646
2,shipping_data,# TechStyle Global - Shipping Policy ## Deliv...,"[0.043810476, -0.02836824, -0.019548077, -0.03...",0.656525


In [21]:
df.sort_values('Similarity',ascending=False)[['Title','Text']]

Unnamed: 0,Title,Text
2,shipping_data,# TechStyle Global - Shipping Policy ## Deliv...
0,product_data,# TechStyle Global - Product Catalog ## Consu...
1,company_data,# TechStyle Global - Company Overview Founded...


## Creating a simple function that returns the most similar document based on the user prompt 

In [22]:

def most_similar_document(query):
    df['Similarity'] = df['Embeddings'].apply(lambda vector: query_similarity_score(query,vector))
    title = df.sort_values('Similarity',ascending=False)[['Title','Text']].iloc[0]['Title']
    text = df.sort_values('Similarity',ascending=False)[['Title','Text']].iloc[0]['Text']
    return title,text

## Answer the case study Questions and  return the most similiar document that can answer them

In [23]:
query1 = "What is the name of the company?"

In [24]:
most_similar_document(query1)

('shipping_data',
 '# TechStyle Global - Shipping Policy  ## Delivery Options  ### Standard Shipping - Delivery Time: 3-5 business days - Cost: Free for orders over $50 - Orders under $50: $4.99 - Available in: United States and Canada - Tracking provided via email  ### Express Shipping - Delivery Time: 1-2 business days - Cost: $9.99 - Free for TechStyle Plus members - Available in: United States and Canada - Real-time tracking via app and email  ### International Shipping - Delivery Time: 7-14 business days - Cost: Calculated based on:   - Destination country   - Package weight   - Package dimensions - Available in: 50+ countries - Tracking provided where available  ## Order Processing  ### Processing Times - In-stock items: 1-2 business days - Custom orders: 3-5 business days - Pre-orders: As specified on product page - Business days are Monday-Friday, excluding holidays  ### Cut-off Times - Orders placed before 2 PM EST: Same-day processing - Orders placed after 2 PM EST: Next busi

In [25]:
query2 = "What types of products do they sell?"

In [26]:
most_similar_document(query2)

('product_data',
 '# TechStyle Global - Product Catalog  ## Consumer Electronics  ### Smartphones & Accessories - TechStyle Smartphone Stand Pro   - Price: $29.99   - Adjustable aluminum stand   - Compatible with all smartphones   - Colors: Silver, Black, Rose Gold  - Premium Screen Protector Set   - Price: $19.99   - 9H hardness tempered glass   - Oleophobic coating   - Pack of 3  - Wireless Charging Pad   - Price: $39.99   - 15W fast charging   - LED charging indicator   - Compatible with Qi devices  ### Laptops & Computing - PowerBook Pro 15"   - Price: $1299.99   - Intel i7 processor   - 16GB RAM   - 512GB SSD   - NVIDIA Graphics  - UltraBook Air 13"   - Price: $899.99   - AMD Ryzen 5   - 8GB RAM   - 256GB SSD   - 18-hour battery life  - Laptop Accessories Bundle   - Price: $79.99   - Wireless mouse   - Laptop sleeve   - USB-C hub   - Cleaning kit  ## Smart Home Devices  ### Security - SmartCam Pro   - Price: $149.99   - 2K resolution   - Night vision   - Two-way audio   - Cloud st

## Step 6: Inject Text as Context using RAG

## Compiling the RAG system and prompt engineering 

In [27]:
def RAG(query):
    title,text = most_similar_document(query)
    model = genai.GenerativeModel('gemini-pro')
    prompt = f'Answer this query:\n{query}.\nOnly use this context to answer:\n{text}'
    response = model.generate_content(prompt)
    return f'{response.text}\n\nSource Document:{title}'

### Using the RAG the system to answer the questions correctly 

In [39]:
print(RAG("What is the name of the company?"))

TechStyle Global

Source Document:shipping_data


In [40]:
print(RAG("What types of products do they sell?"))

- Smartphones & Accessories
- Laptops & Computing
- Smart Home Devices
- Wearable Technology
- TechStyle Basics (Private Label)
- Special Offers

Source Document:product_data


## The system has answered our question correctly

# Level 2: The Specialist

 -- Requirements:

 * Use HuggingFace for LLM interaction
* langchain
* Basic text input/output interface

### At the end of this level the system should be able to Handle more complex customer queries using the txt files
* Product specific queries
* Shipping queries

In [66]:
# import neccessary libraries

import os
from langchain_google_genai import  ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.prompts  import PromptTemplate
from IPython.display import Markdown
from langchain.document_loaders import TextLoader
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain.vectorstores import Chroma
import torch

In [30]:
### Authenticating my GOOGLE ADC

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "horizontal-veld-446019-m0-cf19f12da986.json"

In [31]:
model = ChatGoogleGenerativeAI(model="gemini-pro")

In [32]:
embeddings = GoogleGenerativeAIEmbeddings(model = "models/embedding-001")

In [34]:
## Using the TextLoader to load and store them in a folder caller document 

file_paths = ["products.txt", "company.txt", "shipping.txt"]

# Load all text files
documents = []
for file_path in file_paths:
    loader = TextLoader(file_path)
    documents.extend(loader.load())

In [35]:
documents

[Document(metadata={'source': 'products.txt'}, page_content='# TechStyle Global - Product Catalog\n\n## Consumer Electronics\n\n### Smartphones & Accessories\n- TechStyle Smartphone Stand Pro\n  - Price: $29.99\n  - Adjustable aluminum stand\n  - Compatible with all smartphones\n  - Colors: Silver, Black, Rose Gold\n\n- Premium Screen Protector Set\n  - Price: $19.99\n  - 9H hardness tempered glass\n  - Oleophobic coating\n  - Pack of 3\n\n- Wireless Charging Pad\n  - Price: $39.99\n  - 15W fast charging\n  - LED charging indicator\n  - Compatible with Qi devices\n\n### Laptops & Computing\n- PowerBook Pro 15"\n  - Price: $1299.99\n  - Intel i7 processor\n  - 16GB RAM\n  - 512GB SSD\n  - NVIDIA Graphics\n\n- UltraBook Air 13"\n  - Price: $899.99\n  - AMD Ryzen 5\n  - 8GB RAM\n  - 256GB SSD\n  - 18-hour battery life\n\n- Laptop Accessories Bundle\n  - Price: $79.99\n  - Wireless mouse\n  - Laptop sleeve\n  - USB-C hub\n  - Cleaning kit\n\n## Smart Home Devices\n\n### Security\n- SmartCa

In [31]:
# Loop through the documents and print their metadata
for doc in documents:
    print(doc.metadata)


{'source': 'products.txt'}
{'source': 'company.txt'}
{'source': 'shipping.txt'}


In [36]:
## Creating my vector stores using  ChromaDB

vectordb = Chroma.from_documents(documents,embeddings)

In [9]:
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x7fc72c4bb400>

In [37]:
## Creating the retriver variable and returning top 2 similiar context

retriever = vectordb.as_retriever(search_kwargs={"k":2})

In [39]:
template ="""
You are a helpful AI assistant
Answer based on the context provided.
content:{context}
input:{input}
answer: 
"""


In [40]:
prompt = PromptTemplate.from_template(template)

####  Combine  model and prompt and chain together with the retriver chain.

In [41]:
combine_docs_chain = create_stuff_documents_chain(model, prompt)

In [42]:
retrieval_chain = create_retrieval_chain(retriever,combine_docs_chain)

In [45]:
Question1 = "What other package offered and how much?"

In [46]:
response1 = retrieval_chain.invoke({"input":Question1})

In [75]:
print(response1["answer"])

- Work From Home Bundle: $1499.99
- Smart Home Starter Bundle: $299.99


In [57]:
Question2 = "What products do they sell?"

In [58]:
response2 = retrieval_chain.invoke({"input":Question2})

In [65]:
display(Markdown(response2["answer"]))

TechStyle Global sells a wide range of products, including consumer electronics, fashion tech, lifestyle products, TechStyle Basics (private label tech accessories), and TechStyle Plus (premium membership program with exclusive benefits).

**Consumer Electronics:**
- Smartphones & Accessories
- Laptops & Computing
- Smart Home Devices

**Fashion Tech:**
- Wearable Technology
- Fitness Trackers
- Smartwatches

**Lifestyle Products:**
- Home Automation
- Fitness Tech

**TechStyle Basics (Private Label):**
- Cables & Chargers
- Audio

**TechStyle Plus (Premium Membership Program):**
- Exclusive benefits such as discounts, free shipping, early access to new products, extended warranty, and priority customer support

In [77]:
Question3 = "What's the name of the company "
response3 = retrieval_chain.invoke({"input":Question3})
display(Markdown(response3["answer"]))

TechStyle Global

In [78]:
Question4 = "list all the products and their prices "
response4 = retrieval_chain.invoke({"input":Question4})
display(Markdown(response4["answer"]))

**Smartphones & Accessories**
- TechStyle Smartphone Stand Pro: $29.99
- Premium Screen Protector Set: $19.99
- Wireless Charging Pad: $39.99

**Laptops & Computing**
- PowerBook Pro 15": $1299.99
- UltraBook Air 13": $899.99
- Laptop Accessories Bundle: $79.99

**Smart Home Devices**
- SmartCam Pro: $149.99
- Smart Door Lock: $199.99
- Smart Light Starter Kit: $89.99
- Smart Thermostat: $129.99

**Wearable Technology**
- FitStyle Pro: $149.99
- SportBand Lite: $79.99
- TechWatch Premium: $299.99
- TechWatch Lite: $199.99

**TechStyle Basics (Private Label)**
- Premium USB-C Cable: $14.99
- Power Bank 10000mAh: $34.99
- TechStyle Wireless Earbuds: $69.99
- TechStyle Over-Ear Headphones: $89.99

**Bundles**
- Work From Home Bundle: $1499.99
- Smart Home Starter Bundle: $299.99

In [79]:
Question5= "Where is the located "
response5 = retrieval_chain.invoke({"input":Question5})
display(Markdown(response5["answer"]))

The headquarters of TechStyle Global is located in San Francisco, California. They also have regional offices in London, Singapore, Sydney, and Toronto.

In [80]:
Question6= "Give me full address "
response6 = retrieval_chain.invoke({"input":Question6})
display(Markdown(response6["answer"]))

123 Innovation Drive
San Francisco, CA 94105
United States