## 🍕 Pizza Sales: Data Analysis with Generative AI

### 📌 Project Overview
This project analyzes key performance indicators (KPIs) for a pizza business using sales transaction data. The primary goal is to gain insight into sales performance and customer behavior through Python-based analysis.


### 🤖 How Generative AI Will Help

This project uses my **personal LLM-based Gen AI library** to derive deeper, faster, and more contextual business insights from pizza sales data.

Our primary aim is not only to calculate KPIs but also to demonstrate how **Gen AI** can dramatically improve business intelligence workflows.


### 🧬 Why This Matters

By integrating traditional data science with Gen AI capabilities, we gain:

- **Speed** – Faster turnaround on data questions
- **Depth** – Discover insights that might be missed manually
- **Adaptability** – Over time, the LLM learns from the insights.

### ❓ Problems We Aim to Solve

We'll answer the following critical business questions:

> ✅ **Performance KPIs**
- What is the **total revenue** from pizza sales?
- What is the **average order value (AOV)**?
- How many **pizzas were sold** overall?
- How many **orders were placed**?
- What is the **average number of pizzas per order**?
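The five KPIs above map onto simple pandas aggregations. The following is a minimal sketch, assuming the column names reported later in the notebook (`order_id`, `quantity`, `total_price`); the helper name `compute_kpis` and the tiny sample frame are hypothetical and are not taken from the project or the real dataset.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame) -> dict:
    """Compute the headline KPIs from a line-item sales table (hypothetical helper)."""
    total_revenue = df["total_price"].sum()
    total_orders = df["order_id"].nunique()   # one order can span several rows
    total_pizzas = df["quantity"].sum()
    return {
        "total_revenue": float(total_revenue),
        "total_orders": int(total_orders),
        "total_pizzas_sold": int(total_pizzas),
        "average_order_value": float(total_revenue / total_orders),
        "avg_pizzas_per_order": float(total_pizzas / total_orders),
    }


# Hand-made illustrative rows, not real results from the dataset.
sample = pd.DataFrame({
    "order_id": [1, 2, 2, 2],
    "quantity": [1, 1, 1, 2],
    "total_price": [13.25, 16.0, 18.5, 41.5],
})
kpis = compute_kpis(sample)
for name, value in kpis.items():
    print(f"{name}: {value}")
```

Orders are counted with `nunique()` rather than `len()` because each row is one pizza line item, so a single order can appear on several rows.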

> 🔍 **Business Insights**
- Which pizzas are the **best-selling**?
- On which days do we have **peak sales**?
- Are certain pizza sizes or types more **profitable**?
- Can we identify **seasonal trends** or **order patterns**?
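The insight questions above can be approached with grouped aggregations. The sketch below assumes the columns `pizza_name`, `order_date`, `order_id`, `quantity`, `total_price`, and `pizza_category` from the dataset; the function names and sample rows are hypothetical. Note that the dataset lists prices but no costs, so revenue per group is used here only as a rough proxy for profitability.

```python
import pandas as pd


def best_sellers(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Total quantity sold per pizza, highest first."""
    return (
        df.groupby("pizza_name")["quantity"]
        .sum()
        .sort_values(ascending=False)
        .head(top_n)
    )


def orders_by_weekday(df: pd.DataFrame) -> pd.Series:
    """Distinct orders per weekday name, busiest day first."""
    weekday = df["order_date"].dt.day_name()
    return df.groupby(weekday)["order_id"].nunique().sort_values(ascending=False)


def revenue_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Revenue per group (for example pizza_size or pizza_category)."""
    return df.groupby(column)["total_price"].sum().sort_values(ascending=False)


# Hand-made illustrative rows, not real results from the dataset.
sample = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02"]
    ),
    "pizza_name": [
        "The Hawaiian Pizza",
        "The Classic Deluxe Pizza",
        "The Hawaiian Pizza",
        "The Mexicana Pizza",
    ],
    "pizza_category": ["Classic", "Classic", "Classic", "Veggie"],
    "quantity": [1, 1, 2, 1],
    "total_price": [13.25, 16.0, 26.5, 16.0],
})

top = best_sellers(sample)
weekdays = orders_by_weekday(sample)
by_category = revenue_by(sample, "pizza_category")
print(top, weekdays, by_category, sep="\n\n")
```

For seasonal trends, the same pattern would apply with `df["order_date"].dt.month_name()` as the grouping key.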

### Tech Stack
- Custom Gen AI library
- Python
- Pandas
- Matplotlib & Seaborn 📊
- Jupyter Notebook 📓
- Plotly (for interactive visuals) 📈

### LIBRARIES

In [1]:
import sys
import os
sys.path.append(os.path.abspath(".."))

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime

#CUSTOM GEN AI LIBRARY
from genai_dataanalyst.assistant import AnalystAssistant
assistant = AnalystAssistant()


### DATA SET

>ABOUT THE DATA SET

The dataset provides a snapshot of pizza sales transactions over time, including order IDs, pizza types, quantities sold, prices, order timestamps, and categorical attributes such as pizza size and category. It consists of nearly 49,000 rows covering a broad range of customer orders, making it well suited for KPI analysis, business intelligence tasks, and AI-augmented insight generation.


In [3]:
raw_data = pd.read_excel("/workspaces/GEN-AI-DATA-ANALYST/data/sales.xlsx")

In [4]:
def data_overview(df, sample_size=10000, top_n=5):
    print("📊 BASIC INFO")
    print(f"- Shape: {df.shape}")
    print(f"- Columns: {list(df.columns[:10])}...")  # show only first 10
    print(f"- Memory Usage: ~{df.memory_usage(deep=False).sum() / 1024**2:.2f} MB\n")

    print("MISSING VALUES (%):")
    missing = df.isnull().mean() * 100
    print(missing[missing > 0].sort_values(ascending=False).head(10), "\n")

    print("UNIQUE VALUES (Top 10 Columns):")
    unique_counts = df.nunique().sort_values(ascending=False).head(10)
    print(unique_counts, "\n")

    print("NUMERICAL STATS (Sampled if > sample_size):")
    df_sample = df.sample(sample_size) if len(df) > sample_size else df
    print(df_sample.describe(include=[np.number]).T, "\n")

    print("CATEGORICAL PREVIEW:")
    cat_cols = df.select_dtypes(include=['object', 'category']).columns
    for col in cat_cols[:20]:  # process only the first 20 for speed
        print(f"\n🔹 Column: {col}")
        print(f" - Unique: {df[col].nunique()}")
        print(f" - Top {top_n}:\n{df[col].astype(str).value_counts(dropna=False).head(top_n)}")

    print("\nCORRELATION MATRIX (Top Pairs Only):")
    num_cols = df.select_dtypes(include=np.number)
    if num_cols.shape[1] >= 2:
        corr = num_cols.corr().abs().unstack().sort_values(ascending=False)
        corr = corr[corr < 1.0].drop_duplicates().head(10)
        print(corr)
    else:
        print(" - Not enough numerical columns for correlation.")

In [5]:
data_overview(raw_data)

📊 BASIC INFO
- Shape: (48620, 12)
- Columns: ['pizza_id', 'order_id', 'pizza_name_id', 'quantity', 'order_date', 'order_time', 'unit_price', 'total_price', 'pizza_size', 'pizza_category']...
- Memory Usage: ~4.45 MB

MISSING VALUES (%):
Series([], dtype: float64) 

UNIQUE VALUES (Top 10 Columns):
pizza_id             48620
order_id             21350
order_time           16382
order_date             358
pizza_name_id           91
total_price             56
pizza_ingredients       32
pizza_name              32
unit_price              25
pizza_size               5
dtype: int64 

NUMERICAL STATS (Sampled if > sample_size):
               count         mean           std   min       25%      50%  \
pizza_id     10000.0  24029.28560  14046.711521  2.00  11779.00  23922.5   
order_id     10000.0  10577.81510   6185.443700  2.00   5178.25  10518.0   
quantity     10000.0      1.02110      0.149857  1.00      1.00      1.0   
unit_price   10000.0     16.45588      3.611700  9.75     12.50     1

### SECTION 1: Data Preparation

In [6]:
raw_data.head(5)

| | pizza_id | order_id | pizza_name_id | quantity | order_date | order_time | unit_price | total_price | pizza_size | pizza_category | pizza_ingredients | pizza_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | hawaiian_m | 1 | 2015-01-01 | 11:38:36 | 13.25 | 13.25 | M | Classic | Sliced Ham, Pineapple, Mozzarella Cheese | The Hawaiian Pizza |
| 1 | 2 | 2 | classic_dlx_m | 1 | 2015-01-01 | 11:57:40 | 16.0 | 16.0 | M | Classic | Pepperoni, Mushrooms, Red Onions, Red Peppers,... | The Classic Deluxe Pizza |
| 2 | 3 | 2 | five_cheese_l | 1 | 2015-01-01 | 11:57:40 | 18.5 | 18.5 | L | Veggie | Mozzarella Cheese, Provolone Cheese, Smoked Go... | The Five Cheese Pizza |
| 3 | 4 | 2 | ital_supr_l | 1 | 2015-01-01 | 11:57:40 | 20.75 | 20.75 | L | Supreme | Calabrese Salami, Capocollo, Tomatoes, Red Oni... | The Italian Supreme Pizza |
| 4 | 5 | 2 | mexicana_m | 1 | 2015-01-01 | 11:57:40 | 16.0 | 16.0 | M | Veggie | Tomatoes, Red Peppers, Jalapeno Peppers, Red O... | The Mexicana Pizza |
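In the preview rows, `total_price` equals `quantity` times `unit_price`. Whether that holds for every row is an assumption worth checking before computing revenue KPIs. A minimal sketch of such a check follows; the helper name `price_mismatches`, the tolerance, and the sample rows are hypothetical.

```python
import pandas as pd


def price_mismatches(df: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Return rows where total_price differs from quantity * unit_price by more than tol."""
    expected = df["quantity"] * df["unit_price"]
    mask = (df["total_price"] - expected).abs() > tol
    return df[mask]


# Hand-made rows: the last one is deliberately inconsistent.
sample = pd.DataFrame({
    "pizza_id": [1, 2, 3],
    "quantity": [1, 2, 1],
    "unit_price": [13.25, 16.0, 18.5],
    "total_price": [13.25, 32.0, 20.0],
})
bad_rows = price_mismatches(sample)
print(f"{len(bad_rows)} inconsistent row(s)")
print(bad_rows)
```

On the real data this would be called as `price_mismatches(raw_data)`; an empty result would support the assumption.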


### DATA CLEANING (PYTHON + GEN AI)

In [7]:
raw_data.dtypes

pizza_id                      int64
order_id                      int64
pizza_name_id                object
quantity                      int64
order_date           datetime64[ns]
order_time                   object
unit_price                  float64
total_price                 float64
pizza_size                   object
pizza_category               object
pizza_ingredients            object
pizza_name                   object
dtype: object
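The dtypes above show `order_time` stored as `object` while `order_date` is already `datetime64[ns]`. For reference, a pandas-only conversion can be sketched as follows. It assumes the values are `HH:MM:SS` strings or `datetime.time` objects (both render as `HH:MM:SS` through `astype(str)`); the helper name `convert_order_time` and the extra `order_datetime` column are hypothetical and not part of the project.

```python
import pandas as pd


def convert_order_time(df: pd.DataFrame) -> pd.DataFrame:
    """Parse order_time and add a combined order_datetime column (hypothetical helper)."""
    out = df.copy()
    time_text = out["order_time"].astype(str)

    # Parse strictly as HH:MM:SS, then keep only the time-of-day part.
    out["order_time"] = pd.to_datetime(time_text, format="%H:%M:%S").dt.time

    # A true datetime64 column: order date plus time of day as an offset.
    out["order_datetime"] = out["order_date"] + pd.to_timedelta(time_text)
    return out


# Hand-made rows mirroring the first preview rows of the dataset.
sample = pd.DataFrame({
    "order_date": pd.to_datetime(["2015-01-01", "2015-01-01"]),
    "order_time": ["11:38:36", "11:57:40"],
})
converted = convert_order_time(sample)
print(converted.dtypes)
print(converted)
```

A column of `datetime.time` values still has `object` dtype in pandas, which is why the sketch also builds `order_datetime` for time-based grouping such as hourly order counts.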

In [8]:
data_clean = assistant.transform(raw_data, prompt="update column 'order_time' which is in format hr:min:sec  to datatype date time")

[INFO] [TRANSFORM] Prompt: update column 'order_time' which is in format hr:min:sec  to datatype date time
[DEBUG] [TRANSFORM] Generated Code:
 import pandas as pd
from datetime import datetime

df['order_time'] = df['order_time'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
[ERROR] Code execution failed: name 'datetime' is not defined
[DEBUG] Failed code was:
 import pandas as pd
from datetime import datetime

df['order_time'] = df['order_time'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
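The `NameError` in the log is surprising at first, because the generated code does import `datetime`. One plausible cause, assuming (the source does not confirm this) that the library runs generated code through `exec` with separate globals and locals dictionaries, is a Python scoping rule: top-level imports are bound in the locals dictionary, while a `lambda` body looks names up in the globals dictionary. The sketch below reproduces that behavior with the standard library only; the runner functions are hypothetical and are not the library's actual implementation.

```python
# Generated-style snippet: an import followed by a lambda that uses it.
GENERATED = (
    "from datetime import datetime\n"
    "to_time = lambda x: datetime.strptime(x, '%H:%M:%S').time()\n"
    "result = to_time('11:38:36')\n"
)


def run_with_split_namespaces(code: str):
    """Execute with distinct globals and locals; return the error text, if any."""
    try:
        exec(code, {}, {})  # import lands in locals, lambda reads globals
    except NameError as exc:
        return str(exc)
    return None


def run_with_single_namespace(code: str):
    """Execute with one shared namespace, so the lambda can see the import."""
    namespace = {}
    exec(code, namespace)
    return namespace["result"]


split_error = run_with_split_namespaces(GENERATED)
single_result = run_with_single_namespace(GENERATED)
print("split namespaces  ->", split_error)
print("single namespace  ->", single_result)
```

If this is indeed the cause, passing one dictionary to `exec` (or injecting common imports into the globals dictionary) would let the generated code run unchanged.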
