<a href="https://colab.research.google.com/github/Menna0Ameen/TRIAL/blob/main/AI_project_Hugging_Face.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Install Python libraries for NLP and machine learning**

In [None]:
!pip install transformers datasets torch pandas


**Load Microsoft's DialoGPT-small conversational model and its tokenizer**

In [None]:
!pip install langchain-community  # Provides the HuggingFacePipeline wrapper
from langchain_community.llms import HuggingFacePipeline  # LangChain wrapper around a transformers pipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the small conversational model
model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create an LLM pipeline for LangChain
chatbot_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200)
llm = HuggingFacePipeline(pipeline=chatbot_pipeline)



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device set to use cpu


**Generate a scalable product catalog and save it to a JSON file**

In [None]:
import random
import json

# Sample categories, brands, and attributes
categories = ["Electronics", "Furniture", "Stationery", "Clothing", "Accessories", "Home Appliances", "Laptops"]
brands = ["Apple", "Samsung", "Sony", "LG", "HP", "Dell", "IKEA", "Nike", "Adidas"]
delivery_times = ["Same-day", "Next-day", "2-3 days", "1 week"]

# Generate an enhanced product catalog
def generate_product_catalog(num_items):
    catalog = []
    for i in range(1, num_items + 1):
        product = {
            "id": i,
            "name": f"Product {i}",
            "brand": random.choice(brands),
            "price": round(random.uniform(5, 2000), 2),
            "category": random.choice(categories),
            "stock": random.randint(0, 200),
            "rating": round(random.uniform(1, 5), 1),  # Ratings between 1.0 and 5.0
            "discount": random.randint(0, 50),         # Discount percentage
            "delivery_time": random.choice(delivery_times),
        }
        catalog.append(product)
    return catalog

# Generate the catalog (a list of dicts; it is loaded into a DataFrame later)
product_catalog = generate_product_catalog(1000)
print(f"Catalog size: {len(product_catalog)} items")

# Save the catalog to a JSON file for reuse
with open("product_catalog.json", "w") as f:
    json.dump(product_catalog, f, indent=4)

Catalog size: 1000 items
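
Because the catalog is drawn from the global `random` state, every run of the cell above produces different products, which makes later outputs hard to compare. A minimal sketch of a seeded variant follows; `generate_seeded_catalog`, its `seed` parameter, and the shortened category and brand lists are assumptions made for illustration and are not part of the notebook.

```python
import random

# Shortened sample lists, for illustration only
CATEGORIES = ["Electronics", "Furniture", "Laptops"]
BRANDS = ["Apple", "Samsung", "Sony"]
DELIVERY_TIMES = ["Same-day", "Next-day", "2-3 days", "1 week"]

def generate_seeded_catalog(num_items, seed=42):
    """Hypothetical variant of generate_product_catalog that accepts a seed."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    catalog = []
    for i in range(1, num_items + 1):
        catalog.append({
            "id": i,
            "name": f"Product {i}",
            "brand": rng.choice(BRANDS),
            "price": round(rng.uniform(5, 2000), 2),
            "category": rng.choice(CATEGORIES),
            "stock": rng.randint(0, 200),
            "rating": round(rng.uniform(1, 5), 1),
            "discount": rng.randint(0, 50),
            "delivery_time": rng.choice(DELIVERY_TIMES),
        })
    return catalog

first = generate_seeded_catalog(100, seed=7)
second = generate_seeded_catalog(100, seed=7)
print(first == second)  # the same seed yields the same catalog
```

With a fixed seed, the saved `product_catalog.json` and any query results built on it would stay stable between sessions.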


**Complex Query Handling**

In [11]:
!pip install langchain
import pandas as pd
from langchain.prompts import PromptTemplate  # Prompt template class
from langchain.chains import LLMChain  # Chain that feeds a prompt to the LLM


# Load JSON file into a DataFrame
df_catalog = pd.read_json("product_catalog.json")

# Function to filter products based on user queries
def find_products(query):
    """Filter products based on price, category, delivery_time, brand, rating, and discount."""
    query_lower = query.lower()

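    # Extract price condition (e.g., "under $500")
    price_limit = None
    if "under $" in query_lower:
        try:
            # Accept decimals and thousands separators such as "$499.99" or "$1,000"
            price_limit = float(query_lower.split("under $")[1].split()[0].replace(",", ""))
        except (ValueError, IndexError):
            pass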

    # Extract category
    category = None
    for cat in df_catalog["category"].unique():
        if cat.lower() in query_lower:
            category = cat

    # Extract delivery_time
    delivery_time = None
    for d in df_catalog["delivery_time"].unique():
        if d.lower() in query_lower:
            delivery_time = d

    # Extract brand
    brand = None
    for b in df_catalog["brand"].unique():
        if b.lower() in query_lower:
            brand = b

    # Extract rating condition (e.g., "rated above 4 stars")
    min_rating = None
    if "rated above" in query_lower:
        try:
            min_rating = float(query_lower.split("rated above ")[1].split()[0])
        except (ValueError, IndexError):
            pass

    # Extract discount condition (e.g., "at least 20% off")
    min_discount = None
    if "at least" in query_lower and "%" in query_lower:
        try:
            min_discount = int(query_lower.split("at least ")[1].split("%")[0])
        except (ValueError, IndexError):
            pass

    # Apply filters: start from in-stock items, then narrow by each condition found in the query.
    # Conditions left as None add no filter.
    mask = df_catalog["stock"] > 0
    if price_limit is not None:
        mask &= df_catalog["price"] <= price_limit
    if category is not None:
        mask &= df_catalog["category"] == category
    if delivery_time is not None:
        mask &= df_catalog["delivery_time"] == delivery_time
    if brand is not None:
        mask &= df_catalog["brand"] == brand
    if min_rating is not None:
        mask &= df_catalog["rating"] >= min_rating
    if min_discount is not None:
        mask &= df_catalog["discount"] >= min_discount
    filtered = df_catalog[mask]

    return filtered.head(5)  # Return the first 5 matches (results are not sorted)

# Test the new function with multiple conditions:
print(find_products("Find me Apple laptops under $1000"))
#print(find_products("Find me Sony TVs  delivered on Same-day and under $500"))
#print(find_products("Find me Sony TVs that will be delivered on Same-day and under $500"))
#print(find_products("Find me Furniture with at least 20% off"))
#print(find_products("Find me Electronics rated above 4 stars"))
#print(find_products("Find me Sony TVs under $500 rated above 4.5 stars with at least 30% off"))
#print(find_products("Find me Apple laptops under $1000 rated above 4 stars with at least 20% off"))


      id         name  brand   price category  stock  rating  discount  \
42    43   Product 43  Apple  141.32  Laptops    162     4.3        11   
204  205  Product 205  Apple  904.89  Laptops     68     1.4        44   
408  409  Product 409  Apple  900.30  Laptops    125     1.4        16   
411  412  Product 412  Apple  864.49  Laptops     36     5.0        44   
437  438  Product 438  Apple  147.24  Laptops     65     4.5        45   

    delivery_time  
42       Same-day  
204      2-3 days  
408      Next-day  
411      Same-day  
437      Next-day  
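
The string-splitting approach in `find_products` depends on exact spacing and on the number being followed by whitespace. As a hedged alternative, the same three numeric conditions could be pulled out with regular expressions. The helper name `parse_constraints` and the keys of the returned dictionary are hypothetical; only the trigger phrases ("under $", "rated above", "at least ...%") come from the cell above.

```python
import re

def parse_constraints(query):
    """Hypothetical helper: extract numeric limits from a shopping query."""
    q = query.lower()
    constraints = {}

    # Price, e.g. "under $500", "under $1,000" or "under $499.99"
    m = re.search(r"under \$\s*(\d[\d,]*(?:\.\d+)?)", q)
    if m:
        constraints["max_price"] = float(m.group(1).replace(",", ""))

    # Rating, e.g. "rated above 4.5 stars"
    m = re.search(r"rated above (\d+(?:\.\d+)?)", q)
    if m:
        constraints["min_rating"] = float(m.group(1))

    # Discount, e.g. "at least 30% off"
    m = re.search(r"at least (\d+)\s*%", q)
    if m:
        constraints["min_discount"] = int(m.group(1))

    return constraints

print(parse_constraints(
    "Find me Sony TVs under $1,000 rated above 4.5 stars with at least 30% off"
))
# {'max_price': 1000.0, 'min_rating': 4.5, 'min_discount': 30}
```

A query with no recognized phrase returns an empty dictionary, so the caller can treat every missing key as "no filter".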


**Version 1 of the prompt-based chatbot**

In [13]:
# Define a prompt template
prompt = PromptTemplate(
    input_variables=["query"],
    template="I am a helpful shopping assistant.\n{query}"
)

# Create LangChain LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

def classify_query(user_query):
    """Classifies the query type based on keywords."""
    user_query = user_query.lower()

    # Keywords must be lowercase because user_query has been lowercased
    if any(word in user_query for word in ["find", "show", "recommend", "under", "cheapest", "i need"]):
        return "product_search"
    elif any(word in user_query for word in ["stock", "available", "in stock","in-stock"]):
        return "availability_check"
    elif any(word in user_query for word in ["deliver", "shipping", "arrive"]):
        return "delivery_check"
    else:
        return "general"

# Global variable to store the last searched products
filtered_products = pd.DataFrame()  # Empty DataFrame at the start

def chat_with_bot(user_query):
    """Handles user queries with chatbot and product recommendations."""
    global filtered_products  # Access the global variable
    query_type = classify_query(user_query)

    if query_type == "product_search":
        # Search for products
        filtered_products = find_products(user_query)

        if not filtered_products.empty:
            product_list = "\n".join([f"{row['name']} - ${row['price']} - {row['category']} - {row['brand']} - {row['rating']} - {row['discount']}" for _, row in filtered_products.iterrows()])
            response_text = f"Here are some options:\n{product_list}"
        else:
            response_text = "Sorry, no matching products found."

    elif query_type == "availability_check":
        if not filtered_products.empty:
            in_stock = [f"Checking stock for: {row['name']} - Stock: {row['stock']}" for _, row in filtered_products.iterrows() if row["stock"] > 0]
            response_text = "\n".join(in_stock) if in_stock else "None of these items are currently in stock."

        else:
            response_text = "Please search for a product first."

    elif query_type == "delivery_check":
        if not filtered_products.empty:
            # Compare against the catalog's exact labels ("Same-day", "Next-day")
            fast_delivery = [row["name"] for _, row in filtered_products.iterrows() if row["delivery_time"] in ("Same-day", "Next-day")]
            response_text = f"These items can be delivered by tomorrow at the latest: {', '.join(fast_delivery)}" if fast_delivery else "None of these items can be delivered today or tomorrow."
        else:
            response_text = "Please search for a product first."

    else:
        response_text = "I'm here to help! You can ask me to find products, check availability, or delivery options."

    # Generate chatbot response
    chatbot_reply = chain.run(response_text)

    return chatbot_reply

############################################## Test chatbot################################################################
exit_words = {"thank you", "bye", "exit", "quit","thanks", "thanks!", "thanks alot", "ok"}

while True:
    inquiry = input("Enter your inquiry:  ").strip().lower()
    if inquiry in exit_words:
        print("Any time. Goodbye!")
        break
    print(chat_with_bot(inquiry))

#Samples of questions:

#Group 1 questions:
#print(chat_with_bot("Hi"))
#print(chat_with_bot("Show me Apple Laptops under $500"))
#print(chat_with_bot("Can these be delivered tomorrow?"))

#Group 2 questions:
#print(chat_with_bot("I want to ask about something"))
#print(find_products("I need Sony Accessories under $500"))
#print(chat_with_bot("How many items are available?"))

#Group 3 questions:
#print(find_products("Find me Furniture with at least 20% off"))
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("ok, thank you"))
#print(chat_with_bot("thank you"))

#Group 4 questions:
#print(chat_with_bot("Good evening!"))
#print(find_products("Find me Sony Laptops under $500 rated above 4.5 stars with at least 30% off"))
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("Can these be delivered today?"))
#print(chat_with_bot("I have another inquiry"))
#print(find_products("Find me Electronics rated above 4 stars"))
#print(chat_with_bot("How many items are in stock?"))
#print(chat_with_bot("thanks"))

#Group 5 questions:
#print(chat_with_bot("Find me Sony laptops that will be delivered on Same-day and under $500"))

#Group 6 questions:
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("ok"))


Enter your inquiry:  bye
Any time. Goodbye!
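
`classify_query` tests keywords as plain substrings, so "under" also fires inside "understand", and the first matching group always wins. The sketch below shows one way to match whole words instead. The `INTENT_KEYWORDS` table and the function name `classify_by_word` are assumptions made for this example; the keywords are taken from the cell above, with "delivered" and "delivery" added because whole-word matching no longer catches them through "deliver".

```python
import re

# Hypothetical keyword table; list order decides which intent wins when several match
INTENT_KEYWORDS = [
    ("product_search", ["find", "show", "recommend", "under", "cheapest", "i need"]),
    ("availability_check", ["stock", "available"]),
    ("delivery_check", ["deliver", "delivered", "delivery", "shipping", "arrive"]),
]

def classify_by_word(user_query):
    """Classify a query by whole-word keyword matches."""
    q = user_query.lower()
    for intent, keywords in INTENT_KEYWORDS:
        for kw in keywords:
            # \b anchors the keyword at word boundaries on both sides
            if re.search(rf"\b{re.escape(kw)}\b", q):
                return intent
    return "general"

print(classify_by_word("I don't understand"))                # general
print(classify_by_word("I need a laptop"))                   # product_search
print(classify_by_word("Can these be delivered tomorrow?"))  # delivery_check
print(classify_by_word("Are these in-stock?"))               # availability_check
```

The trade-off is that inflected forms have to be listed explicitly, which is why the delivery group is longer here than in the notebook.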


**Version 2 of the prompt-based chatbot**

In [None]:
import os  # Import os to check file existence

# File to save filtered products
filtered_products_file = "filtered_products.csv"

# Load last saved filtered products if file exists
if os.path.exists(filtered_products_file):
    filtered_products = pd.read_csv(filtered_products_file)
else:
    filtered_products = pd.DataFrame()  # Start with an empty DataFrame



# Define a prompt template
prompt = PromptTemplate(
    input_variables=["query"],
    template="I am a helpful shopping assistant.\n{query}"
)


# Create LangChain LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

def classify_query(user_query):
    """Classifies the query type based on keywords."""
    user_query = user_query.lower()

    # Keywords must be lowercase because user_query has been lowercased
    if any(word in user_query for word in ["find", "show", "recommend", "under", "cheapest", "i need"]):
        return "product_search"
    elif any(word in user_query for word in ["stock", "available", "in stock","in-stock"]):
        return "availability_check"
    elif any(word in user_query for word in ["deliver", "shipping", "arrive"]):
        return "delivery_check"
    else:
        return "general"

def chat_with_bot(user_query):
    """Handles user queries with chatbot and product recommendations."""
    global filtered_products  # Access the global variable
    query_type = classify_query(user_query)

    if query_type == "product_search":
        # Search for products
        filtered_products = find_products(user_query)

        if not filtered_products.empty:
            product_list = "\n".join([f"{row['name']} - ${row['price']} - {row['category']} - {row['brand']} - {row['rating']} - {row['discount']}" for _, row in filtered_products.iterrows()])
            response_text = f"Here are some options:\n{product_list}"
            # Save filtered products to CSV
            filtered_products.to_csv(filtered_products_file, index=False)
        else:
            response_text = "Sorry, no matching products found."

    elif query_type == "availability_check":
        if not filtered_products.empty:
            in_stock = [f"Checking stock for: {row['name']} - Stock: {row['stock']}" for _, row in filtered_products.iterrows() if row["stock"] > 0]
            response_text = "\n".join(in_stock) if in_stock else "None of these items are currently in stock."
        else:
            response_text = "Please search for a product first."

    elif query_type == "delivery_check":
        if not filtered_products.empty:
            # Compare against the catalog's exact labels ("Same-day", "Next-day")
            fast_delivery = [row["name"] for _, row in filtered_products.iterrows() if row["delivery_time"] in ("Same-day", "Next-day")]
            response_text = f"These items can be delivered by tomorrow at the latest: {', '.join(fast_delivery)}" if fast_delivery else "None of these items can be delivered today or tomorrow."
        else:
            response_text = "Please search for a product first."

    else:
        response_text = "I'm here to help! You can ask me to find products, check availability, or delivery options."

    # Generate chatbot response
    chatbot_reply = chain.run(response_text)

    return chatbot_reply

############################################## Test chatbot################################################################
exit_words = {"thank you", "bye", "exit", "quit","thanks", "thanks!", "thanks alot", "ok"}

while True:
    inquiry = input("Enter your inquiry:  ").strip().lower()
    if inquiry in exit_words:
        print("Any time. Goodbye!")
        break
    print(chat_with_bot(inquiry))

#Samples of questions:

#Group 1 questions:
#print(chat_with_bot("Hi"))
#print(chat_with_bot("Show me Apple Laptops under $500"))
#print(chat_with_bot("Can these be delivered tomorrow?"))

#Group 2 questions:
#print(chat_with_bot("I want to ask about something"))
#print(find_products("I need Sony Accessories under $500"))
#print(chat_with_bot("How many items are available?"))

#Group 3 questions:
#print(find_products("Find me Furniture with at least 20% off"))
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("ok, thank you"))
#print(chat_with_bot("thank you"))

#Group 4 questions:
#print(chat_with_bot("Good evening!"))
#print(find_products("Find me Sony Laptops under $500 rated above 4.5 stars with at least 30% off"))
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("Can these be delivered today?"))
#print(chat_with_bot("I have another inquiry"))
#print(find_products("Find me Electronics rated above 4 stars"))
#print(chat_with_bot("How many items are in stock?"))
#print(chat_with_bot("thanks"))

#Group 5 questions:
#print(chat_with_bot("Find me Sony laptops that will be delivered on Same-day and under $500"))

#Group 6 questions:
#print(chat_with_bot("Are these in stock?"))
#print(chat_with_bot("ok"))


Enter your inquiry:  Are these in stock?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I am a helpful shopping assistant.
Checking stock for: Product 43 - Stock: 137
Checking stock for: Product 53 - Stock: 71
Checking stock for: Product 91 - Stock: 4
Checking stock for: Product 103 - Stock: 160
Checking stock for: Product 105 - Stock: 36
Enter your inquiry:  ok
Any time. Goodbye!
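
The reply printed above starts with the prompt text itself ("I am a helpful shopping assistant." followed by the stock lines). Assuming this is the text-generation pipeline returning the prompt followed by whatever it generated, a small post-processing step could remove the echoed part before the reply is shown. `strip_prompt_echo` is a hypothetical helper, and the continuation string in the example is made up purely to exercise it.

```python
def strip_prompt_echo(generated_text, prompt_text):
    """Hypothetical helper: drop the prompt if the model echoed it verbatim."""
    if generated_text.startswith(prompt_text):
        return generated_text[len(prompt_text):].strip()
    return generated_text.strip()

prompt_text = (
    "I am a helpful shopping assistant.\n"
    "Checking stock for: Product 43 - Stock: 137"
)
# Made-up continuation, used only to demonstrate the helper
echoed_reply = prompt_text + " Sure, it is available."

print(strip_prompt_echo(echoed_reply, prompt_text))
# Sure, it is available.
```

If the model generates nothing beyond the prompt, the helper returns an empty string, so a caller would probably fall back to showing `response_text` directly in that case.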
