# Task 5: Chatbot for Analytical Insights (Superstore Dataset)

**Objective:**  
- Load the Superstore dataset and query it through a chatbot to get analytical insights.
- Example queries:
  - "Top 5 products by Sales"
  - "Most profitable Category"
  - "Total Quantity sold by Region"
- Technology: Python, pandas, and an optional open-source LLM


## Step 2: Import Libraries


In [3]:
import pandas as pd
from transformers import pipeline





## Step 3: Load Superstore Dataset


In [5]:
df = pd.read_excel(r"C:\Users\shrut\Descriptive and Predictive Analysis with Interactive Dashboard\SuperStore Sales DataSet (1).xlsx")
df.head()

| | Row ID+O6G3A1:R6 | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Category | Sub-Category | Product Name | Sales | Quantity | Profit | Returns | Payment Mode | ind1 | ind2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4918 | CA-2019-160304 | 2019-01-01 | 2019-01-07 | Standard Class | BM-11575 | Brendan Murry | Corporate | United States | Gaithersburg | ... | Furniture | Bookcases | Bush Westfield Collection Bookcases, Medium Ch... | 73.94 | 1 | 28.2668 | | Online | | |
| 1 | 4919 | CA-2019-160304 | 2019-01-02 | 2019-01-07 | Standard Class | BM-11575 | Brendan Murry | Corporate | United States | Gaithersburg | ... | Furniture | Bookcases | Bush Westfield Collection Bookcases, Medium Ch... | 173.94 | 3 | 38.2668 | | Online | | |
| 2 | 4920 | CA-2019-160304 | 2019-01-02 | 2019-01-07 | Standard Class | BM-11575 | Brendan Murry | Corporate | United States | Gaithersburg | ... | Technology | Phones | GE 30522EE2 | 231.98 | 2 | 67.2742 | | Cards | | |
| 3 | 3074 | CA-2019-125206 | 2019-01-03 | 2019-01-05 | First Class | LR-16915 | Lena Radford | Consumer | United States | Los Angeles | ... | Office Supplies | Storage | Recycled Steel Personal File for Hanging File ... | 114.46 | 2 | 28.615 | | Online | | |
| 4 | 8604 | US-2019-116365 | 2019-01-03 | 2019-01-08 | Standard Class | CA-12310 | Christine Abelman | Corporate | United States | San Antonio | ... | Technology | Accessories | Imation Clip USB flash drive - 8 GB | 30.08 | 2 | -5.264 | | Online | | |


In [6]:
df.columns


Index(['Row ID+O6G3A1:R6', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
       'Sales', 'Quantity', 'Profit', 'Returns', 'Payment Mode', 'ind1',
       'ind2'],
      dtype='object')
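
The column list shows a stray cell-range suffix in `'Row ID+O6G3A1:R6'`, and the preview suggests `ind1` and `ind2` may hold no values. A minimal cleanup sketch follows. The helper name `clean_columns` and the sample frame are illustrative assumptions, not part of the original notebook, and whether `ind1`/`ind2` are empty across the whole dataset has not been verified here.

```python
import pandas as pd

def clean_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a tidy 'Row ID' name and no fully empty columns."""
    cleaned = frame.copy()
    # Remove the stray cell-range suffix from the first column name.
    cleaned = cleaned.rename(columns={"Row ID+O6G3A1:R6": "Row ID"})
    # Drop only columns in which every value is missing.
    cleaned = cleaned.dropna(axis="columns", how="all")
    return cleaned

# Tiny stand-in frame (illustrative values, not the real dataset).
sample = pd.DataFrame({
    "Row ID+O6G3A1:R6": [4918, 4919],
    "Sales": [73.94, 173.94],
    "ind1": [None, None],
})
cleaned_sample = clean_columns(sample)
print(list(cleaned_sample.columns))
```

On the real data this would be called as `df = clean_columns(df)` right after loading; columns that contain at least one value are left untouched.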

## Step 4: Optional LLM Initialization


In [8]:
# Optional: small text-generation model
chatbot_llm = pipeline("text-generation", model="gpt2", tokenizer="gpt2")


Device set to use cpu


## Step 5: Chatbot Function (Pandas-based)


In [10]:
def superstore_chatbot(query):
    """Answer a query with a pandas result, or fall back to the optional LLM."""
    query_lower = query.lower()
    
    # Top 5 rows by Sales (individual order lines, not aggregated per product)
    if "top" in query_lower and "sales" in query_lower:
        try:
            result = df.sort_values('Sales', ascending=False).head(5)
            return result[['Product Name','Sales','Category','Region']]
        except KeyError:
            # Raised when any of the referenced columns is absent
            return "Error: 'Sales', 'Product Name', 'Category' or 'Region' column missing."
    
    # Most profitable Categories (total Profit per Category, highest first)
    elif "most profitable" in query_lower:
        try:
            result = df.groupby('Category')['Profit'].sum().sort_values(ascending=False).head(5)
            return result
        except KeyError:
            return "Error: 'Profit' or 'Category' missing."
    
    # Total Quantity by Region
    elif "total quantity" in query_lower and "region" in query_lower:
        try:
            result = df.groupby('Region')['Quantity'].sum()
            return result
        except KeyError:
            return "Error: 'Quantity' or 'Region' missing."
    
    # Optional LLM response for other queries (GPT-2 has no access to df,
    # so this text is not derived from the dataset)
    else:
        response = chatbot_llm(query, max_new_tokens=100, truncation=True, do_sample=True)
        return response[0]['generated_text']
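
The "top" branch above matches on keywords only, so it ignores both the requested count and the requested dimension: a query such as "Top 5 Country by Sales" returns product rows. One possible refinement is sketched below. It assumes queries follow the shape "top N <dimension> by <metric>"; the names `top_n`, `DIMENSIONS`, `METRICS`, and `TOP_N_PATTERN` are hypothetical and not part of the original notebook.

```python
import re
import pandas as pd

# Hypothetical mappings from words a user might type to real column names.
DIMENSIONS = {
    "product": "Product Name", "products": "Product Name",
    "product name": "Product Name", "country": "Country",
    "state": "State", "states": "State", "city": "City",
    "region": "Region", "category": "Category",
}
METRICS = {"sales": "Sales", "profit": "Profit", "quantity": "Quantity"}

# Captures: the count, the dimension phrase, and the metric word.
TOP_N_PATTERN = re.compile(r"top\s+(\d+)\s+(.+?)\s+by\s+(\w+)", re.IGNORECASE)

def top_n(frame, query):
    """Return the N largest metric totals per dimension, or None if unparsed."""
    match = TOP_N_PATTERN.search(query)
    if match is None:
        return None
    n = int(match.group(1))
    dimension = DIMENSIONS.get(match.group(2).strip().lower())
    metric = METRICS.get(match.group(3).strip().lower())
    if dimension is None or metric is None:
        return None
    # Aggregate first, then rank, so repeated products or countries are combined.
    return frame.groupby(dimension)[metric].sum().nlargest(n)

# Tiny stand-in frame (illustrative values, not the real dataset).
sample = pd.DataFrame({
    "Country": ["United States", "United States", "Canada"],
    "Product Name": ["Desk", "Chair", "Desk"],
    "Sales": [100.0, 50.0, 120.0],
})
by_country = top_n(sample, "Top 1 Country by Sales")
by_product = top_n(sample, "Top 2 products by Sales")
print(by_country)
print(by_product)
```

Phrasings outside this pattern, such as "Top 5 States with highest Sales", return `None` here, which leaves the caller free to decide how to respond.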


## Step 6: Interact with Chatbot


In [12]:
while True:
    query = input("Ask your question (type 'exit' to stop): ")
    if query.lower() == 'exit':
        break
    response = superstore_chatbot(query)
    print(response)


Ask your question (type 'exit' to stop):  Top 5 Country by Sales


                                          Product Name    Sales  \
430   3D Systems Cube Printer, 2nd Generation, Magenta  9099.93   
5236             Canon imageCLASS 2200 Advanced Copier  5517.97   
2654         GBC DocuBind P400 Electric Binding System  5455.96   
1694              Hewlett Packard LaserJet 3310 Copier  5399.91   
4735  3D Systems Cube Printer, 2nd Generation, Magenta  5234.96   

             Category   Region  
430        Technology     East  
5236       Technology     East  
2654  Office Supplies  Central  
1694       Technology     East  
4735       Technology     East  


Ask your question (type 'exit' to stop):  Top 5 Country by Profit


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Top 5 Country by Profit

Country Rank Country Rank 1 USA $8,000 $32,950 2 Canada $11,000 $32,950 3 Netherlands $11,000 $32,950 4 Norway $11,000 $32,950 5 Germany $11,000 $32,950 6 Sweden $12,000 $32,950 7 France $12,000 $32,950 8 Denmark $12,000 $32,950 9 Belgium $12,000 $32,950 10 Italy $12


Ask your question (type 'exit' to stop):  Total Sales by Region


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Total Sales by Region

Inventory in the United States is estimated to be $3.3 trillion (up 35% from $4.2 trillion in 2015). The overall sales of inventory in the United States is estimated to be $3.8 trillion (up 47% from $3.8 trillion in 2015). The overall sales of inventory in the United States is estimated to be $3.1 trillion (up 35% from $3.1 trillion in 2015).

Inventory per Unit of Goods


Ask your question (type 'exit' to stop):  Average Profit by Category


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Average Profit by Category

Here are some more table-specific statistics for this category:

Category Average Profit by Category Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit by Category Average Profit


Ask your question (type 'exit' to stop):  Top 5 States with highest Sales


                                          Product Name    Sales  \
430   3D Systems Cube Printer, 2nd Generation, Magenta  9099.93   
5236             Canon imageCLASS 2200 Advanced Copier  5517.97   
2654         GBC DocuBind P400 Electric Binding System  5455.96   
1694              Hewlett Packard LaserJet 3310 Copier  5399.91   
4735  3D Systems Cube Printer, 2nd Generation, Magenta  5234.96   

             Category   Region  
430        Technology     East  
5236       Technology     East  
2654  Office Supplies  Central  
1694       Technology     East  
4735       Technology     East  


Ask your question (type 'exit' to stop):  Which City has the highest Returns


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Which City has the highest Returns per Taxpayer of any municipality in the world.

The City's returns are based on a taxonomy of the largest four cities in the world. The largest city in the world is Detroit, with a return of $39.05. The smallest city in the world is Zurich, Switzerland ($31.46).

In 2011, the average return per taxpayer of the City of Detroit was $15, a return of $1.53.

As of 2011, the City


Ask your question (type 'exit' to stop):  Top 10 Product Name by Sales


                                          Product Name    Sales  \
430   3D Systems Cube Printer, 2nd Generation, Magenta  9099.93   
5236             Canon imageCLASS 2200 Advanced Copier  5517.97   
2654         GBC DocuBind P400 Electric Binding System  5455.96   
1694              Hewlett Packard LaserJet 3310 Copier  5399.91   
4735  3D Systems Cube Printer, 2nd Generation, Magenta  5234.96   

             Category   Region  
430        Technology     East  
5236       Technology     East  
2654  Office Supplies  Central  
1694       Technology     East  
4735       Technology     East  


Ask your question (type 'exit' to stop):  exit
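
In the session above, "Total Sales by Region" and "Average Profit by Category" matched none of the pandas branches and fell through to GPT-2, whose output is unrelated to the dataset. A rough sketch of a generic handler for queries shaped like "<total|average> <metric> by <dimension>" follows, with a fixed message in place of generated text. The names `aggregate_query`, `AGGREGATIONS`, and `AGG_PATTERN` are hypothetical, and the sketch assumes single-word column names such as `Sales` or `Region`.

```python
import re
import pandas as pd

# Hypothetical mapping from query words to pandas aggregation names.
AGGREGATIONS = {"total": "sum", "average": "mean"}
AGG_PATTERN = re.compile(r"(total|average)\s+(\w+)\s+by\s+(\w+)", re.IGNORECASE)

def aggregate_query(frame, query):
    """Answer '<total|average> <metric> by <dimension>' or return a message."""
    match = AGG_PATTERN.search(query)
    if match is None:
        return "Sorry, I can only answer questions like 'Total Sales by Region'."
    agg_word, metric, dimension = (part.lower() for part in match.groups())
    # Match the typed words case-insensitively against the real column names.
    columns = {name.lower(): name for name in frame.columns}
    if metric not in columns or dimension not in columns:
        return f"Unknown column in query: {query!r}"
    grouped = frame.groupby(columns[dimension])[columns[metric]]
    return grouped.agg(AGGREGATIONS[agg_word])

# Tiny stand-in frame (illustrative values, not the real dataset).
sample = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Category": ["Technology", "Technology", "Furniture"],
    "Sales": [10.0, 30.0, 20.0],
    "Profit": [10.0, 30.0, -5.0],
})
totals = aggregate_query(sample, "Total Sales by Region")
averages = aggregate_query(sample, "Average Profit by Category")
unsupported = aggregate_query(sample, "Which City has the highest Returns")
print(totals)
print(averages)
print(unsupported)
```

Returning an explicit message for unsupported questions keeps every answer traceable to the data, at the cost of covering fewer phrasings than a language model would attempt.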
