# AI Applications Project: Automated Report Generation

This notebook demonstrates an end-to-end automated report generation system that combines:
- **Supervised Machine Learning Models** for predictions and analysis
- **Generative AI** for natural language report creation  
- **Database Integration** for real-time data processing
- **PDF Generation** for professional report output

## Features
- 🔍 **Location Prediction**: Optimal storage location recommendations
- 📊 **Sample Categorization**: Automated product classification
- ⚠️ **Disposal Risk Analysis**: Identify items at risk of disposal
- 📈 **Demand Forecasting**: Predict future product demand
- 🚨 **Anomaly Detection**: Identify storage optimization opportunities
- 📋 **Automated Report Generation**: Professional PDF reports with insights

## Models Used
- Sample Categorization: Gradient Boosting (Jason's Model)
- Location Prediction: Storage Prediction (Samuel's Model)
- Disposal Risk Prediction (Kendrick's Model)
- Demand Forecasting: Sales Forecasting (ShernFai's Model)

In [1]:
# Install the Google Gen AI SDK (google-genai)
%pip install -U -q "google-genai>=1.4.0"

Note: you may need to restart the kernel to use updated packages.


## Installation Requirements

Before running this notebook, ensure you have the following packages installed:

```bash
pip install google-genai pandas numpy scikit-learn joblib sqlalchemy python-dateutil markdown-pdf
```

**Note:** This notebook requires:
- Access to the supervised model files in the `Supervised_Models` directory
- Database connection configured in `Database/db.py`  
- Google API key for generative AI features

**File Structure Expected:**
```
BackEnd/
├── Generative_Models/
│   └── ReportGeneration/
│       └── ReportGeneration.ipynb (this file)
├── Supervised_Models/
│   ├── Jason/model/
│   ├── Kendrick/
│   ├── Samuel/
│   └── ShernFai/model/
└── Database/
    └── db.py
```
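A quick pre-flight check can confirm this layout before any model is loaded. The sketch below is illustrative only: `find_missing_artifacts` and `EXPECTED_ARTIFACTS` are hypothetical names, not part of the project, and the relative paths are the ones the model-loading code in this notebook opens, expressed relative to `BackEnd/`.

```python
from pathlib import Path

# Relative paths taken from the model-loading code in this notebook.
EXPECTED_ARTIFACTS = [
    "Supervised_Models/Samuel/storage_prediction_model.pkl",
    "Supervised_Models/Jason/model/gradient_boosting_model.pkl",
    "Supervised_Models/Kendrick/best_model.joblib",
    "Supervised_Models/ShernFai/model/salesforecast(categories).pkl",
    "Supervised_Models/ShernFai/model/salesforecast_preprocessor.pkl",
    "Database/db.py",
]


def find_missing_artifacts(backend_root, expected=EXPECTED_ARTIFACTS):
    """Return the expected relative paths that are not files under backend_root."""
    root = Path(backend_root)
    return [rel for rel in expected if not (root / rel).is_file()]


# "../../" resolves to BackEnd/ when run from Generative_Models/ReportGeneration/.
missing = find_missing_artifacts("../../")
if missing:
    print("Missing files:")
    for rel in missing:
        print(f"  - {rel}")
else:
    print("All expected files found.")
```

Running this first makes a missing model file show up as one clear list instead of a series of "model file not found" messages later in the notebook.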

### Setting up API Key

In [None]:
from google import genai
import os
from getpass import getpass

# Security note: prefer reading the API key from the GOOGLE_API_KEY environment variable.
# If it is not set (development/demo use), prompt for it without echoing the key.
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY") or getpass("Enter your Google API key: ")
client = genai.Client(api_key=GOOGLE_API_KEY)

In [3]:
MODEL_ID = "gemini-2.5-flash" # @param ["gemini-2.5-flash-lite","gemini-2.5-flash","gemini-2.5-pro","gemini-2.0-flash"] {"allow-input":true, isTemplate: true}

# Data

## Database Integration

In [None]:
import numpy as np
import pandas as pd
import os
import sys

# Configuration
DEBUG_MODE = False  # Set to True for debugging output

# Add parent directories to path for module imports
sys.path.append('../../')

try:
    from Database.db import SessionLocal
    from Database_Table import Inventory, Order
    
    def getDbContent():
        """
        Retrieves inventory and order data from the database.
        Returns empty lists if database connection fails.
        """
        session = None
        try:
            session = SessionLocal()
            inventory_records = session.query(Inventory).all()
            order_records = session.query(Order).all()

            if DEBUG_MODE:
                print(f"Successfully loaded {len(inventory_records)} inventory records")
                print(f"Successfully loaded {len(order_records)} order records")

            return inventory_records, order_records

        except Exception as e:
            print(f"Database connection error: {e}")
            print("Using empty data - some features may not work")
            return [], []

        finally:
            # Always release the connection, even if a query fails
            if session is not None:
                session.close()

    inventory, order = getDbContent()
    
except ImportError as e:
    print(f"Database import error: {e}")
    print("Database modules not available - using empty data")
    inventory, order = [], []

2025-08-19 03:15:36,124 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2025-08-19 03:15:36,124 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-08-19 03:15:36,127 INFO sqlalchemy.engine.Engine SELECT @@sql_mode
2025-08-19 03:15:36,129 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-08-19 03:15:36,130 INFO sqlalchemy.engine.Engine SELECT @@lower_case_table_names
2025-08-19 03:15:36,130 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-08-19 03:15:36,133 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-08-19 03:15:36,135 INFO sqlalchemy.engine.Engine SELECT `Invent

In [5]:
def dbtoList(records):
    output_list = []
    for r in records:
        if isinstance(r, Inventory):
            data = {
                "ItemId": r.ItemId, 
                "ItemName": r.ItemName,
                "Category": r.ItemCategory,
                "Quantity": r.ItemQuantity, 
                "UnitsSold": r.UnitsSold,
                "Weight": r.Weight, 
                "Size": r.Size,
                "Priority": r.Priority, 
                "Location": r.Location,
                "Date": r.Date, 
                "Dispose": r.Dispose                
            }
        elif isinstance(r, Order):
            data = {
                "OrderId": r.OrderId,
                "ItemId": r.ItemId, 
                "OrderQuantity": r.OrderQuantity, 
                "Sales": r.Sales, 
                "Price": r.Price, 
                "Discount": r.Discount,
                "Profit": r.Profit, 
                "DateOrdered": r.DateOrdered,
                "DateReceived": r.DateReceived,
                "CustomerSegment": r.CustomerSegment
            }
        else:
            continue
        output_list.append(data)
    
    return output_list

inventoryData = dbtoList(inventory)
orderData = dbtoList(order)

## Supervised Models

In [26]:
import pickle
import joblib
import numpy as np
import pandas as pd
import os
from datetime import datetime
from IPython.display import Markdown
from sklearn.metrics import classification_report, confusion_matrix

# Configuration
DEBUG_MODE = False  # Set to True for debugging output


class Supervised_Models:
    # Category mapping for sample categorization
    CATEGORY_MAPPING = {
        "Apparel": "Clothing",
        "Footwear": "Clothing",
        "Discs Shop": "Technology", 
        "Technology": "Technology",
        "Fitness": "Sports and Fitness",
        "Outdoors": "Sports and Fitness", 
        "Golf": "Sports and Fitness",
        "Health and Beauty": "Other",
        "Pet Shop": "Other",
        "Book Shop": "Other",
        "Fan Shop": "Other"
    }
    
    # Reverse mapping for prediction decoding. Assumes the model emits integer labels
    # 0-10 in alphabetical category order; verify against label_encoder.pkl.
    LABEL_TO_CATEGORY = {
        0: "Apparel",
        1: "Book Shop", 
        2: "Discs Shop",
        3: "Fan Shop",
        4: "Fitness",
        5: "Footwear",
        6: "Golf",
        7: "Health and Beauty",
        8: "Outdoors", 
        9: "Pet Shop",
        10: "Technology"
    }

    # Location Prediction Model
    @staticmethod
    def predict_location(input_data):
        try:
            # Use relative path from ReportGeneration directory
            model_path = "../../Supervised_Models/Samuel/storage_prediction_model.pkl"
            with open(model_path, "rb") as file:
                storage_prediction_model = pickle.load(file)

            categorical_features = {
                "Priority": ["High", "Low", "Medium"],
                "Product_Type": ["Clothing", "Technology", "Other", "Sports and Fitness"],
                "Size": ["Large", "Medium", "Small"],
            }
            numerical_features = ["Order_Quantity", "Weight"]
            one_hot_columns = []

            for feature, values in categorical_features.items():
                for value in values:
                    one_hot_columns.append(f"{feature}_{value}")

            all_feature_names = one_hot_columns + numerical_features

            features_dict = {col: 0 for col in all_feature_names}

            for feature, values in categorical_features.items():
                if feature in input_data:
                    selected_value = input_data[feature]
                    one_hot_col = f"{feature}_{selected_value}"
                    if one_hot_col in features_dict:
                        features_dict[one_hot_col] = 1

            for feature in numerical_features:
                if feature in input_data:
                    features_dict[feature] = float(input_data[feature])

            features_array = np.array(
                [features_dict[col] for col in all_feature_names]
            ).reshape(1, -1)

            prediction = storage_prediction_model.predict(features_array)
            return prediction
            
        except FileNotFoundError as e:
            print(f"Storage prediction model file not found: {e}")
            return ["Model not available"]
        except Exception as e:
            print(f"Error in location prediction: {e}")
            return ["Error occurred"]

    # Sample Categorization Model (Supervised Model 1)
    @staticmethod
    def predict_sample_category(input_data):
        try:
            # Use relative paths from ReportGeneration directory
            model_path = "../../Supervised_Models/Jason/model/gradient_boosting_model.pkl"
            encoder_path = "../../Supervised_Models/Jason/model/label_encoder.pkl"
            
            sample_categorization_model = joblib.load(model_path)
            
            input_df = pd.DataFrame([input_data])
            prediction = sample_categorization_model.predict(input_df)
            
            if DEBUG_MODE:
                print(f"Raw prediction: {prediction}")
            
            # Convert numeric prediction to category name
            predicted_label = int(prediction[0]) if len(prediction) > 0 else 0
            
            # Get subcategory name
            subcategory = Supervised_Models.LABEL_TO_CATEGORY.get(predicted_label, "Fitness")
            
            # Map to main category
            main_category = Supervised_Models.CATEGORY_MAPPING.get(subcategory, "Other")
            
            if DEBUG_MODE:
                print(f"Predicted label: {predicted_label}")
                print(f"Subcategory: {subcategory}")
                print(f"Main category: {main_category}")
            
            return {
                "subcategory": subcategory,
                "main_category": main_category,
                "prediction_confidence": "Model prediction"
            }
                
        except FileNotFoundError as e:
            print(f"Sample categorization model file not found: {e}")
            return {
                "subcategory": "Model not available",
                "main_category": "Other",
                "prediction_confidence": "Error"
            }
        except Exception as e:
            print(f"Error in sample categorization: {e}")
            return {
                "subcategory": "Error occurred", 
                "main_category": "Other",
                "prediction_confidence": "Error"
            }

    # Disposal Risk Prediction Model (Supervised Model 2)
    @staticmethod
    def predict_disposal_risk(input_data):
        try:
            # Use relative path from ReportGeneration directory
            model_path = "../../Supervised_Models/Kendrick/best_model.joblib"
            
            # Load the model
            loaded_object = joblib.load(model_path)
            
            # Check if it's a dictionary containing the model or the model itself
            if isinstance(loaded_object, dict):
                if 'model' in loaded_object:
                    disposal_risk_model = loaded_object['model']
                elif 'best_model' in loaded_object:
                    disposal_risk_model = loaded_object['best_model']
                else:
                    # Try to find any sklearn model in the dictionary
                    for key, value in loaded_object.items():
                        if hasattr(value, 'predict'):
                            disposal_risk_model = value
                            break
                    else:
                        raise ValueError("No valid model found in the loaded dictionary")
            else:
                # Assume it's the model directly
                disposal_risk_model = loaded_object
            
            # Ensure the model has a predict method
            if not hasattr(disposal_risk_model, 'predict'):
                raise ValueError("Loaded object does not have a predict method")

            input_df = pd.DataFrame([input_data])
            prediction = disposal_risk_model.predict(input_df)
            
            # Convert prediction to meaningful output
            risk_level = "High Risk" if prediction[0] > 0.5 else "Low Risk"
            
            return {
                "risk_prediction": risk_level,
                "risk_score": float(prediction[0]) if hasattr(prediction[0], '__float__') else prediction[0],
                "recommendation": "Consider disposal" if prediction[0] > 0.5 else "Keep in inventory"
            }
            
        except FileNotFoundError as e:
            print(f"Disposal risk model file not found: {e}")
            return {
                "risk_prediction": "Model not available",
                "risk_score": 0.0,
                "recommendation": "Manual assessment required"
            }
        except Exception as e:
            print(f"Error in disposal risk prediction: {e}")
            return {
                "risk_prediction": "Error occurred",
                "risk_score": 0.0, 
                "recommendation": "Manual assessment required"
            }

    # Demand Forecast Preprocessing
    @staticmethod
    def demand_forecast_preprocessor(order_data, inventory_data):
        try:
            # Create DataFrames
            order = pd.DataFrame(order_data)
            inventory = pd.DataFrame(inventory_data)
            
            # Debug: Print column names to see what we have (optional for production)
            if DEBUG_MODE:
                print("Order columns:", order.columns.tolist())
                print("Inventory columns:", inventory.columns.tolist())

            # Ensure dates are in datetime format
            if "DateOrdered" in order.columns:
                order["DateOrdered"] = pd.to_datetime(order["DateOrdered"])
                # Extract Month (period or string, depending on preference)
                order["OrderMonth"] = order["DateOrdered"].dt.to_period("M")
            else:
                if DEBUG_MODE:
                    print("Warning: DateOrdered column not found, creating dummy OrderMonth")
                order["OrderMonth"] = "2025-01"  # Default value

            # Handle Category column - inventory has ItemCategory, orders might not have category
            if "ItemCategory" in inventory.columns:
                inventory["Category"] = inventory["ItemCategory"]
            elif "Category" not in inventory.columns:
                if DEBUG_MODE:
                    print("Warning: No category column found in inventory, setting to 'Unknown'")
                inventory["Category"] = "Unknown"

            # Add Category to orders if missing (will be filled from inventory after merge)
            if "Category" not in order.columns:
                order["Category"] = None

            # Handle CustomerSegment - make sure it exists
            if "CustomerSegment" not in order.columns:
                if DEBUG_MODE:
                    print("Warning: CustomerSegment not found, setting to 'Consumer'")
                order["CustomerSegment"] = "Consumer"

            # Handle Price and Discount - make sure they exist
            if "Price" not in order.columns:
                if DEBUG_MODE:
                    print("Warning: Price not found in order data, setting to 0")
                order["Price"] = 0.0
                
            if "Discount" not in order.columns:
                if DEBUG_MODE:
                    print("Warning: Discount not found in order data, setting to 0")
                order["Discount"] = 0.0

            # Merge with inventory to get category information
            merged_df = order.merge(inventory[["ItemId", "Category"]], on="ItemId", how="left", suffixes=('', '_inv'))
            
            # Use inventory category if order category is missing
            if "Category_inv" in merged_df.columns:
                merged_df["Category"] = merged_df["Category_inv"].fillna(merged_df["Category"])
                merged_df = merged_df.drop("Category_inv", axis=1)
            
            # Fill any remaining missing values
            merged_df["Category"] = merged_df["Category"].fillna("Unknown")
            merged_df["CustomerSegment"] = merged_df["CustomerSegment"].fillna("Consumer")
            merged_df["Price"] = merged_df["Price"].fillna(0.0)
            merged_df["Discount"] = merged_df["Discount"].fillna(0.0)

            # Group and aggregate
            result_df = merged_df.groupby(
                ["OrderMonth", "Category", "CustomerSegment"], as_index=False
            ).agg(AveragePrice=("Price", "mean"), AverageDiscount=("Discount", "mean"))

            return result_df
            
        except Exception as e:
            print(f"Error in demand forecast preprocessing: {e}")
            return pd.DataFrame()  # Return empty DataFrame on error

    # Demand forecast model
    @staticmethod
    def predict_demand_forecast(input_data):
        try:
            # Use relative paths from ReportGeneration directory
            model_path = "../../Supervised_Models/ShernFai/model/salesforecast(categories).pkl"
            preprocessor_path = "../../Supervised_Models/ShernFai/model/salesforecast_preprocessor.pkl"
            
            demand_forecast_model = joblib.load(model_path)
            with open(preprocessor_path, "rb") as f:
                preprocessor_data = pickle.load(f)

            categories = {
                "Clothing": ["Cleats", "Men's Footwear", "Women's Apparel"],
                "Technology": [
                    "Electronics",
                    "Video Games",
                    "Cameras",
                    "Computers",
                ],
                "Sports and Fitness": [
                    "Cardio Equipment",
                    "Indoor/Outdoor Games",
                    "Water Sports",
                    "Shop By Sport",
                    "Camping & Hiking",
                    "Fishing",
                ],
                "Other": ["Garden", "Pet Supplies"],
            }

            cat_keys = list(categories.keys())

            # Extract preprocessor components
            le_category = preprocessor_data["label_encoder_category"]
            reference_date = preprocessor_data["reference_date"]
            unique_categories = preprocessor_data["unique_categories"]
            feature_columns = preprocessor_data["feature_columns"]

            # Get data
            category_name = input_data["category"]
            future_month = input_data["month"]
            avg_price = float(input_data["avg_price"])
            customer_segment = input_data["customer_segment"]
            discount_rate = float(input_data["discount_rate"])

            # Parse the future date
            future_date = pd.to_datetime(future_month)

            # Calculate time features for the future date
            months_since_start = (future_date - reference_date).days / 30.44

            # Create test data with numerical time features
            test_data = {
                "Category Name": category_name,
                "Average Product Price": avg_price,
                "Customer Segment": customer_segment,
                "Order Item Discount Rate": discount_rate,
                # Time features (numerical - can handle ANY future date!)
                "Year": future_date.year,
                "Month": future_date.month,
                "Quarter": future_date.quarter,
                "Months_Since_Start": int(months_since_start),
                "Month_Sin": np.sin(2 * np.pi * future_date.month / 12),
                "Month_Cos": np.cos(2 * np.pi * future_date.month / 12),
                "Year_Trend": future_date.year - reference_date.year,
            }

            # Create DataFrame
            test_df = pd.DataFrame([test_data])

            # Handle unknown category
            if category_name not in cat_keys:
                if DEBUG_MODE:
                    print(f"Unknown category '{category_name}' - using default: {cat_keys[0]}")
                test_df["Category Name"] = cat_keys[0]
                category_name = cat_keys[0]

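            # One-hot encode customer segment. Do not pass drop_first=True here: with a
            # single row it drops the only level, so the segment would be ignored.
            test_df = pd.get_dummies(test_df, columns=["Customer Segment"])

            # Ensure same columns as training (crucial!). Reindexing also discards the
            # baseline segment column if training dropped it.
            test_df = test_df.reindex(columns=feature_columns, fill_value=0)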

            # Make prediction
            total = 0
            num = len(categories[category_name])
            for subclass in categories[category_name]:
                test_df["Category Name"] = subclass
                test_df["Category Name"] = le_category.transform(test_df["Category Name"])
                total += demand_forecast_model.predict(test_df)

            avg_demand = total / num
            return avg_demand
            
        except FileNotFoundError as e:
            print(f"Demand forecast model file not found: {e}")
            return [0.0]  # Return default prediction
        except Exception as e:
            print(f"Error in demand forecast prediction: {e}")
            return [0.0]  # Return default prediction

    # Anomaly detection
    @staticmethod
    def detect_anomalies(inventory_list):
        try:
            anomalies_detected = []
            for item in inventory_list:
                current_location = item["Location"]
                predicted_location = Supervised_Models.predict_location(
                    {
                        "Priority": item["Priority"],
                        "Product_Type": item["Category"],
                        "Size": item["Size"],
                        "Order_Quantity": item["Quantity"],
                        "Weight": item["Weight"],
                    }
                )[0]

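                # Skip items where the model failed; the sentinel strings returned by
                # predict_location are not real locations
                if predicted_location in ("Model not available", "Error occurred"):
                    continue

                if current_location != predicted_location: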
                    anomalies_detected.append(
                        {
                            "ItemId": item["ItemId"],
                            "ItemName": item["ItemName"],
                            "CurrentLocation": current_location,
                            "PredictedLocation": predicted_location,
                        }
                    )

            return anomalies_detected
            
        except Exception as e:
            print(f"Error in anomaly detection: {e}")
            return []  # Return empty list on error


# Initialize the Supervised Models instance
s = Supervised_Models()
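
The feature vector that `predict_location` assembles can be inspected without the pickled model. The sketch below copies the category vocabulary from the method above and rebuilds the same column order using only the standard library; `build_location_features` is a hypothetical helper introduced for illustration, and whether this column order matches the one used at training time is an assumption carried over from the original code.

```python
# Feature vocabulary copied from predict_location above.
CATEGORICAL_FEATURES = {
    "Priority": ["High", "Low", "Medium"],
    "Product_Type": ["Clothing", "Technology", "Other", "Sports and Fitness"],
    "Size": ["Large", "Medium", "Small"],
}
NUMERICAL_FEATURES = ["Order_Quantity", "Weight"]


def build_location_features(input_data):
    """Return (column_names, values) in the order predict_location uses."""
    columns = [
        f"{feature}_{value}"
        for feature, values in CATEGORICAL_FEATURES.items()
        for value in values
    ] + NUMERICAL_FEATURES
    row = dict.fromkeys(columns, 0.0)
    for feature in CATEGORICAL_FEATURES:
        key = f"{feature}_{input_data.get(feature)}"
        if key in row:  # unknown or missing values leave the whole group at 0
            row[key] = 1.0
    for feature in NUMERICAL_FEATURES:
        if feature in input_data:
            row[feature] = float(input_data[feature])
    return columns, [row[c] for c in columns]


columns, values = build_location_features({
    "Priority": "Medium",
    "Product_Type": "Sports and Fitness",
    "Size": "Medium",
    "Order_Quantity": 12,
    "Weight": 10.78,
})
for name, value in zip(columns, values):
    print(f"{name:32s} {value}")
```

One consequence worth checking against real data: a `Product_Type` outside the four listed values (for example a subcategory such as "Apparel") leaves that whole one-hot group at zero, so the model receives no product-type signal for that item.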

# Testing
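
Before exercising the pickled models, the calendar features that `predict_demand_forecast` derives can be checked in isolation. This is a minimal standard-library sketch of the same arithmetic; `time_features` is a hypothetical helper, and the reference date used here is made up for illustration (the real one is read from `salesforecast_preprocessor.pkl`).

```python
import math
from datetime import date


def time_features(future, reference):
    """Calendar features for a forecast month, relative to a reference date."""
    months_since_start = (future - reference).days / 30.44  # average month length
    angle = 2 * math.pi * future.month / 12
    return {
        "Year": future.year,
        "Month": future.month,
        "Quarter": (future.month - 1) // 3 + 1,
        "Months_Since_Start": int(months_since_start),
        "Month_Sin": math.sin(angle),
        "Month_Cos": math.cos(angle),
        "Year_Trend": future.year - reference.year,
    }


# Illustrative reference date only; not the value stored in the preprocessor.
features = time_features(date(2025, 5, 1), reference=date(2015, 1, 1))
for name, value in features.items():
    print(f"{name}: {value}")
```

The sine/cosine pair places December and January next to each other on a circle, which a plain month number cannot do. Note also that truncating with `int()` after dividing by 30.44 can land one below the calendar month count: for the dates above it yields 123, although 124 calendar months separate them.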

In [27]:
# Test all Supervised Models
print("=== Testing Supervised Models ===\n")

# Test Location Prediction Model
print("1. Testing Location Prediction Model:")
try:
    location_result = s.predict_location({
        "Priority": "Medium",
        "Product_Type": "Sports and Fitness",
        "Size": "Medium",
        "Order_Quantity": 12,
        "Weight": 10.78,
    })
    print(f"   ✓ Location prediction: {location_result}")
except Exception as e:
    print(f"   ✗ Location prediction failed: {e}")

# Test Sample Categorization Model
print("\n2. Testing Sample Categorization Model:")
try:
    category_result = s.predict_sample_category({
        "Price": 30.0,
        "Sales": 1000,
        "Order_Profit": 500,
        "ProductWeight": 2.5,
        "Quantity": 50,
    })
    print(f"   ✓ Sample categorization:")
    print(f"      • Subcategory: {category_result['subcategory']}")
    print(f"      • Main Category: {category_result['main_category']}")
    print(f"      • Status: {category_result['prediction_confidence']}")
except Exception as e:
    print(f"   ✗ Sample categorization failed: {e}")

# Test Disposal Risk Prediction Model
print("\n3. Testing Disposal Risk Prediction Model:")
try:
    disposal_result = s.predict_disposal_risk({
        "Inventory_Level": 150,
        "Inventory_Turnover": 1.5,
        "Units_Sold": 200,
        "Demand_Forecast": 180,
        "Inventory_Lag_1": 120,
        "Turnover_Lag_1": 1.2,
    })
    print(f"   ✓ Disposal risk prediction:")
    print(f"      • Risk Level: {disposal_result['risk_prediction']}")
    print(f"      • Risk Score: {disposal_result['risk_score']}")
    print(f"      • Recommendation: {disposal_result['recommendation']}")
except Exception as e:
    print(f"   ✗ Disposal risk prediction failed: {e}")

# Test Demand Forecast Preprocessor
print("\n4. Testing Demand Forecast Preprocessor:")
try:
    # Sample test data
    test_order_data = [
        {
            "ItemId": 101,
            "DateOrdered": "2025-01-10",
            "Category": "Clothing",
            "CustomerSegment": "Consumer",
            "Price": 30.0,
            "Discount": 0.05,
        },
        {
            "ItemId": 102,
            "DateOrdered": "2025-02-15",
            "Category": "Technology",
            "CustomerSegment": "Corporate",
            "Price": 200.0,
            "Discount": 0.10,
        },
    ]
    test_inventory_data = [
        {"ItemId": 101, "Category": "Clothing", "Price": 30.0, "Stock": 50},
        {"ItemId": 102, "Category": "Technology", "Price": 200.0, "Stock": 30},
    ]
    
    preprocessed_result = s.demand_forecast_preprocessor(test_order_data, test_inventory_data)
    print(f"   ✓ Preprocessor output shape: {preprocessed_result.shape}")
    if DEBUG_MODE:
        print(f"   Preprocessed data:\n{preprocessed_result}")
except Exception as e:
    print(f"   ✗ Demand forecast preprocessing failed: {e}")

# Test Demand Forecast Model
print("\n5. Testing Demand Forecast Model:")
try:
    demand_result = s.predict_demand_forecast({
        "category": "Clothing",
        "month": "2025-05",
        "avg_price": 10.0,
        "customer_segment": "Consumer",
        "discount_rate": 0.12,
    })
    print(f"   ✓ Demand forecast: {demand_result}")
except Exception as e:
    print(f"   ✗ Demand forecast failed: {e}")

# Test Anomaly Detection
print("\n6. Testing Anomaly Detection:")
try:
    test_inventory = [
        {
            "ItemId": 1,
            "ItemName": "Test Item A",
            "Priority": "High",
            "Category": "Clothing",
            "Size": "Large",
            "Quantity": 5,
            "Weight": 8.5,
            "Location": "A1",
        },
        {
            "ItemId": 2,
            "ItemName": "Test Item B",
            "Priority": "Low",
            "Category": "Technology",
            "Size": "Medium",
            "Quantity": 10,
            "Weight": 3.4,
            "Location": "B2",
        },
    ]
    
    anomalies_result = s.detect_anomalies(test_inventory)
    print(f"   ✓ Anomalies detected: {len(anomalies_result)} items")
    if DEBUG_MODE and anomalies_result:
        print(f"   Anomaly details: {anomalies_result}")
except Exception as e:
    print(f"   ✗ Anomaly detection failed: {e}")

print("\n=== All Tests Completed ===")

# Test with different categorization examples
print("\n=== Testing Category Mapping Examples ===")
test_examples = [
    {"Price": 30.0, "Sales": 1000, "Order_Profit": 500, "ProductWeight": 2.5, "Quantity": 50},
    {"Price": 150.0, "Sales": 500, "Order_Profit": 200, "ProductWeight": 1.2, "Quantity": 20},
    {"Price": 80.0, "Sales": 800, "Order_Profit": 300, "ProductWeight": 5.0, "Quantity": 30}
]

for i, example in enumerate(test_examples, 1):
    try:
        result = s.predict_sample_category(example)
        print(f"Example {i}: {result['subcategory']} → {result['main_category']}")
    except Exception as e:
        print(f"Example {i}: Error - {e}")

# Use real data if available, otherwise use test data
if len(inventoryData) > 0 and len(orderData) > 0:
    print(f"\n✓ Using real data: {len(inventoryData)} inventory items, {len(orderData)} orders")
else:
    print(f"\n⚠ Using test data for demonstration")
    # Fallback to test data for the rest of the notebook
    inventoryData = test_inventory
    orderData = test_order_data

=== Testing Supervised Models ===

1. Testing Location Prediction Model:
   ✓ Location prediction: ['B-3']

2. Testing Sample Categorization Model:
   ✓ Sample categorization:
      • Subcategory: Fitness
      • Main Category: Sports and Fitness
      • Status: Model prediction

3. Testing Disposal Risk Prediction Model:
   ✓ Disposal risk prediction:
      • Risk Level: High Risk
      • Risk Score: 1.0
      • Recommendation: Consider disposal

4. Testing Demand Forecast Preprocessor:
   ✓ Preprocessor output shape: (2, 5)

5. Testing Demand Forecast Model:
   ✓ Demand forecast: [186.10974]

6. Testing Anomaly Detection:
   ✓ Anomalies detected: 2 items

=== All Tests Completed ===

=== Testing Category Mapping Examples ===
Example 1: Fitness → Sports and Fitness
Example 2: Fitness → Sports and Fitness
Example 3: Fitness → Sports and Fitness

✓ Using real data: 2 inventory items, 2 orders

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [28]:
# Check real database data
print("=== Real Database Data Analysis ===\n")

print(f"📊 Database Status:")
print(f"   • Inventory records: {len(inventoryData)}")
print(f"   • Order records: {len(orderData)}")

if len(inventoryData) > 0:
    print(f"\n📦 Inventory Data Sample:")
    print(f"   • Sample item: {inventoryData[0]}")
    
    # Check categories in inventory
    categories = [item['Category'] for item in inventoryData if 'Category' in item]
    unique_categories = list(set(categories))
    print(f"   • Categories found: {unique_categories}")
    
    # Check locations
    locations = [item['Location'] for item in inventoryData if 'Location' in item]
    unique_locations = list(set(locations))
    print(f"   • Storage locations: {unique_locations}")

if len(orderData) > 0:
    print(f"\n🛒 Order Data Sample:")
    print(f"   • Sample order: {orderData[0]}")
    
    # Check customer segments
    segments = [order['CustomerSegment'] for order in orderData if 'CustomerSegment' in order]
    unique_segments = list(set(segments))
    print(f"   • Customer segments: {unique_segments}")

print(f"\n✅ Ready to generate reports with real data!")

=== Real Database Data Analysis ===

📊 Database Status:
   • Inventory records: 2
   • Order records: 2

📦 Inventory Data Sample:
   • Sample item: {'ItemId': 101, 'Category': 'Clothing', 'Price': 30.0, 'Stock': 50}
   • Categories found: ['Clothing', 'Technology']
   • Storage locations: []

🛒 Order Data Sample:
   • Sample order: {'ItemId': 101, 'DateOrdered': '2025-01-10', 'Category': 'Clothing', 'CustomerSegment': 'Consumer', 'Price': 30.0, 'Discount': 0.05}
   • Customer segments: ['Corporate', 'Consumer']

✅ Ready to generate reports with real data!


In [29]:
# Test Full Report Generation with Real Database Content
print("=== Generating Full Automated Report with Real Data ===\n")

try:
    # 1. Test Location Predictions with Real Data
    print("🗺️ Testing Location Predictions:")
    location_predictions = []
    for item in inventoryData[:2]:  # Test with first 2 items
        try:
            prediction = s.predict_location({
                'Priority': item.get('Priority', 'Medium'),
                'Product_Type': item.get('Category', 'Other'),
                'Size': item.get('Size', 'Medium'),
                'Order_Quantity': item.get('Quantity', 1),
                'Weight': item.get('Weight', 1.0)
            })
            location_predictions.append({
                'ItemId': item['ItemId'],
                'ItemName': item.get('ItemName', 'Unknown'),
                'Current': item.get('Location', 'Unknown'),
                'Predicted': prediction[0] if prediction else 'Unknown'
            })
            print(f"   • Item {item['ItemId']}: {item.get('Location', 'Unknown')} → {prediction[0] if prediction else 'Unknown'}")
        except Exception as e:
            print(f"   ✗ Error predicting location for item {item['ItemId']}: {e}")
    
    # 2. Test Sample Categorization with Real Data
    print("\n📊 Testing Sample Categorization:")
    for item in inventoryData[:2]:  # Test with first 2 items
        try:
            # Prepare input data for categorization model
            categorization_input = {
                'Price': item.get('Price', 50.0),
                'Sales': item.get('UnitsSold', 100),
                'Order_Profit': item.get('UnitsSold', 100) * 0.3,  # Estimated profit
                'ProductWeight': item.get('Weight', 2.0),
                'Quantity': item.get('Quantity', 10)
            }
            
            category_result = s.predict_sample_category(categorization_input)
            print(f"   • Item {item['ItemId']}: {category_result['subcategory']} → {category_result['main_category']}")
        except Exception as e:
            print(f"   ✗ Error categorizing item {item['ItemId']}: {e}")
    
    # 3. Test Disposal Risk with Real Data
    print("\n⚠️ Testing Disposal Risk Analysis:")
    for item in inventoryData[:2]:  # Test with first 2 items
        try:
            disposal_input = {
                'Inventory_Level': item.get('Quantity', 50),
                'Inventory_Turnover': 1.5,
                'Units_Sold': item.get('UnitsSold', 100),
                'Demand_Forecast': item.get('UnitsSold', 100) * 1.1,
                'Inventory_Lag_1': item.get('Quantity', 50) * 0.8,
                'Turnover_Lag_1': 1.2
            }
            
            disposal_result = s.predict_disposal_risk(disposal_input)
            print(f"   • Item {item['ItemId']}: {disposal_result['risk_prediction']} (Score: {disposal_result['risk_score']:.2f})")
        except Exception as e:
            print(f"   ✗ Error analyzing disposal risk for item {item['ItemId']}: {e}")
    
    # 4. Test Demand Forecasting with Real Data
    print("\n📈 Testing Demand Forecasting:")
    try:
        # Get unique categories from real data
        real_categories = list(set([item.get('Category', 'Other') for item in inventoryData]))
        print(f"   • Found categories in data: {real_categories}")
        
        # Map to our model categories
        category_mapping = {
            'Clothing': 'Clothing',
            'Technology': 'Technology', 
            'Sports': 'Sports and Fitness',
            'Other': 'Other'
        }
        
        for category in real_categories[:2]:  # Test first 2 categories
            try:
                model_category = category_mapping.get(category, 'Other')
                demand_result = s.predict_demand_forecast({
                    'category': model_category,
                    'month': '2025-09',
                    'avg_price': 50.0,
                    'customer_segment': 'Consumer',
                    'discount_rate': 0.1
                })
                print(f"   • {category} → {model_category}: Predicted demand = {demand_result[0]:.2f}")
            except Exception as e:
                print(f"   ✗ Error forecasting demand for {category}: {e}")
    except Exception as e:
        print(f"   ✗ Error in demand forecasting: {e}")
    
    # 5. Test Anomaly Detection with Real Data
    print("\n🔍 Testing Anomaly Detection:")
    try:
        anomalies = s.detect_anomalies(inventoryData)
        print(f"   • Found {len(anomalies)} storage anomalies:")
        for anomaly in anomalies[:3]:  # Show first 3 anomalies
            print(f"     - Item {anomaly['ItemId']} ({anomaly['ItemName']}): {anomaly['CurrentLocation']} → {anomaly['PredictedLocation']}")
    except Exception as e:
        print(f"   ✗ Error in anomaly detection: {e}")
    
    # 6. Test AI Report Generation with Real Data Summary
    print("\n🤖 Testing AI Report Generation:")
    try:
        # Create a sample report section with real data
        data_summary = {
            'total_inventory_items': len(inventoryData),
            'total_orders': len(orderData),
            'categories': list(set([item.get('Category', 'Unknown') for item in inventoryData])),
            'locations': list(set([item.get('Location', 'Unknown') for item in inventoryData])),
            'customer_segments': list(set([order.get('CustomerSegment', 'Unknown') for order in orderData]))
        }
        
        sample_report = client.models.generate_content(
            model=MODEL_ID,
            contents=f'''Generate a brief executive summary based on this real business data:

            Database Summary:
            - Total inventory items: {data_summary['total_inventory_items']}
            - Total orders: {data_summary['total_orders']}
            - Product categories: {data_summary['categories']}
            - Storage locations: {data_summary['locations']}
            - Customer segments: {data_summary['customer_segments']}
            
            Provide a 2-3 sentence business insight.'''
        )
        
        print("   ✓ AI-Generated Executive Summary:")
        print(f"     {sample_report.text}")
        
    except Exception as e:
        print(f"   ✗ Error in AI report generation: {e}")
    
    print(f"\n✅ Full Report Generation Test Completed!")
    print(f"📊 System Status: Ready for production report generation with {len(inventoryData)} inventory items and {len(orderData)} orders")

except Exception as e:
    print(f"❌ Error in full report generation test: {e}")

=== Generating Full Automated Report with Real Data ===

🗺️ Testing Location Predictions:
   • Item 101: Unknown → B-6
   • Item 102: Unknown → C-1

📊 Testing Sample Categorization:
   • Item 101: Fan Shop → Other
   • Item 102: Fan Shop → Other

⚠️ Testing Disposal Risk Analysis:
   • Item 101: High Risk (Score: 1.00)
   • Item 102: High Risk (Score: 1.00)

📈 Testing Demand Forecasting:
   • Found categories in data: ['Clothing', 'Technology']


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


   • Clothing → Clothing: Predicted demand = 503.67
   • Technology → Technology: Predicted demand = 429.55

🔍 Testing Anomaly Detection:
Error in anomaly detection: 'Location'
   • Found 0 storage anomalies:

🤖 Testing AI Report Generation:
   ✓ AI-Generated Executive Summary:
     This nascent business operates at a very limited scale, processing two orders from two inventory items, yet remarkably serves diverse product categories (Clothing, Technology) and customer segments (Corporate, Consumer). A critical operational gap exists with unknown storage locations for inventory. Prioritizing the establishment of clear storage and inventory management is crucial for the business's foundational stability and future growth.

✅ Full Report Generation Test Completed!
📊 System Status: Ready for production report generation with 2 inventory items and 2 orders
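
The anomaly-detection step above printed `'Location'`, which is how a `KeyError` looks when formatted, so the records probably lack a key that the detector indexes directly. One option is to fill defaults before calling any model. This is a sketch under that assumption: `with_defaults` and `DEFAULTS` are hypothetical names, the default values mirror the fallbacks used in the test cell above, and how `s.detect_anomalies` treats an `'Unknown'` location has not been verified here.

```python
# Hypothetical defaults, mirroring the .get() fallbacks used in the test cell
DEFAULTS = {
    'ItemName': 'Unknown',
    'Location': 'Unknown',
    'Priority': 'Medium',
    'Size': 'Medium',
    'Quantity': 1,
    'Weight': 1.0,
}

def with_defaults(item, defaults=DEFAULTS):
    """Return a copy of item where missing or None fields take the default value."""
    filled = dict(item)  # copy so the original record is left untouched
    for key, value in defaults.items():
        if filled.get(key) is None:
            filled[key] = value
    return filled

# Record copied from the inventory sample printed earlier
raw_item = {'ItemId': 101, 'Category': 'Clothing', 'Price': 30.0, 'Stock': 50}
normalized = with_defaults(raw_item)
print(normalized['Location'], normalized['Quantity'])
```

With records normalized this way, a call such as `s.detect_anomalies([with_defaults(i) for i in inventoryData])` should at least receive every key it expects, although the predictions are then based on placeholder values.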


In [31]:
# Full Automated Report Generation with MySQL Database
print("=== PRODUCTION REPORT GENERATION WITH MYSQL DATABASE ===\n")

# Verify database connection and fetch fresh data
print("🔌 Database Connection Status:")
try:
    # Get fresh data from MySQL database
    session = SessionLocal()
    try:
        fresh_inventory = session.query(Inventory).all()
        fresh_orders = session.query(Order).all()
    finally:
        # Always release the connection, even if a query raises
        session.close()
    
    print(f"   ✓ MySQL Connection: ACTIVE")
    print(f"   ✓ Inventory Records: {len(fresh_inventory)} items")
    print(f"   ✓ Order Records: {len(fresh_orders)} orders")
    
    # Convert to working format
    live_inventory_data = dbtoList(fresh_inventory)
    live_order_data = dbtoList(fresh_orders)
    
    print(f"\n📊 Live Data Summary:")
    if live_inventory_data:
        sample_inventory = live_inventory_data[0]
        print(f"   • Sample Inventory Item:")
        print(f"     - ID: {sample_inventory.get('ItemId', 'N/A')}")
        print(f"     - Name: {sample_inventory.get('ItemName', 'N/A')}")
        print(f"     - Category: {sample_inventory.get('Category', 'N/A')}")
        print(f"     - Location: {sample_inventory.get('Location', 'N/A')}")
        print(f"     - Quantity: {sample_inventory.get('Quantity', 'N/A')}")
        
    if live_order_data:
        sample_order = live_order_data[0] 
        print(f"   • Sample Order:")
        print(f"     - Order ID: {sample_order.get('OrderId', 'N/A')}")
        print(f"     - Item ID: {sample_order.get('ItemId', 'N/A')}")
        print(f"     - Customer Segment: {sample_order.get('CustomerSegment', 'N/A')}")
        print(f"     - Sales Amount: ${sample_order.get('Sales', 'N/A')}")
    
except Exception as e:
    print(f"   ✗ Database connection failed: {e}")
    print("   Using previously loaded data...")
    live_inventory_data = inventoryData
    live_order_data = orderData

print(f"\n🚀 AUTOMATED REPORT GENERATION PIPELINE:")
print("=" * 60)

# 1. LOCATION OPTIMIZATION ANALYSIS
print("\n1️⃣ LOCATION OPTIMIZATION ANALYSIS")
location_analysis = []
storage_issues = 0

for item in live_inventory_data[:5]:  # Analyze first 5 items
    try:
        current_location = item.get('Location', 'Unknown')
        
        # Predict optimal location
        prediction_input = {
            'Priority': item.get('Priority', 'Medium'),
            'Product_Type': item.get('Category', 'Other'),
            'Size': item.get('Size', 'Medium'),
            'Order_Quantity': item.get('Quantity', 1),
            'Weight': item.get('Weight', 1.0)
        }
        
        predicted_location = s.predict_location(prediction_input)[0]
        
        is_optimized = current_location == predicted_location
        if not is_optimized:
            storage_issues += 1
            
        location_analysis.append({
            'item_id': item.get('ItemId'),
            'item_name': (item.get('ItemName') or 'Unknown')[:20],  # also handles ItemName = None
            'current': current_location,
            'predicted': predicted_location,
            'optimized': is_optimized
        })
        
        status = "✓ OPTIMAL" if is_optimized else "⚠ NEEDS RELOCATION"
        print(f"   Item {item.get('ItemId')}: {current_location} → {predicted_location} {status}")
        
    except Exception as e:
        print(f"   ✗ Error analyzing item {item.get('ItemId')}: {e}")

print(f"\n   📊 Location Analysis Summary:")
print(f"   • Items analyzed: {len(location_analysis)}")
print(f"   • Storage issues found: {storage_issues}")
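# Guard against division by zero when every prediction failed or no items were loaded
optimization_rate = ((len(location_analysis) - storage_issues) / len(location_analysis) * 100) if location_analysis else 0.0
print(f"   • Optimization rate: {optimization_rate:.1f}%")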

# 2. CATEGORY PREDICTION & BUSINESS INSIGHTS
print(f"\n2️⃣ CATEGORY PREDICTION & BUSINESS INSIGHTS")
category_insights = {}

for item in live_inventory_data[:3]:  # Analyze first 3 items
    try:
        # Prepare model input
        category_input = {
            'Price': float(item.get('Price', 50.0)) if item.get('Price') else 50.0,
            'Sales': float(item.get('UnitsSold', 100)) if item.get('UnitsSold') else 100,
            'Order_Profit': float(item.get('UnitsSold', 100)) * 0.3 if item.get('UnitsSold') else 30,
            'ProductWeight': float(item.get('Weight', 2.0)) if item.get('Weight') else 2.0,
            'Quantity': float(item.get('Quantity', 10)) if item.get('Quantity') else 10
        }
        
        category_prediction = s.predict_sample_category(category_input)
        actual_category = item.get('Category', 'Unknown')
        
        predicted_main = category_prediction.get('main_category', 'Other')
        predicted_sub = category_prediction.get('subcategory', 'Unknown')
        
        # Track category insights
        if predicted_main not in category_insights:
            category_insights[predicted_main] = {'count': 0, 'items': []}
        category_insights[predicted_main]['count'] += 1
        category_insights[predicted_main]['items'].append(item.get('ItemId'))
        
        match_status = "✓ MATCH" if actual_category == predicted_main else "⚠ DIFFERENT"
        print(f"   Item {item.get('ItemId')}: {actual_category} vs {predicted_sub}→{predicted_main} {match_status}")
        
    except Exception as e:
        print(f"   ✗ Error categorizing item {item.get('ItemId')}: {e}")

print(f"\n   📊 Category Distribution:")
for category, data in category_insights.items():
    print(f"   • {category}: {data['count']} items (IDs: {data['items']})")

# 3. DISPOSAL RISK ASSESSMENT  
print(f"\n3️⃣ DISPOSAL RISK ASSESSMENT")
high_risk_items = []
total_risk_score = 0

for item in live_inventory_data[:3]:
    try:
        disposal_input = {
            'Inventory_Level': float(item['Quantity']) if item.get('Quantity') is not None else 50.0,  # float(None) would raise
            'Inventory_Turnover': 1.5,  # Default value
            'Units_Sold': float(item.get('UnitsSold', 100)) if item.get('UnitsSold') else 100,
            'Demand_Forecast': float(item.get('UnitsSold', 100)) * 1.1 if item.get('UnitsSold') else 110,
            'Inventory_Lag_1': float(item.get('Quantity', 50)) * 0.8 if item.get('Quantity') else 40,
            'Turnover_Lag_1': 1.2
        }
        
        disposal_result = s.predict_disposal_risk(disposal_input)
        risk_level = disposal_result.get('risk_prediction', 'Unknown')
        risk_score = disposal_result.get('risk_score', 0.0)
        
        total_risk_score += risk_score
        
        if 'High Risk' in risk_level:
            high_risk_items.append({
                'id': item.get('ItemId'),
                'name': item.get('ItemName', 'Unknown'),
                'score': risk_score
            })
            
        status_emoji = "🔴" if 'High Risk' in risk_level else "🟢"
        print(f"   Item {item.get('ItemId')}: {risk_level} (Score: {risk_score:.2f}) {status_emoji}")
        
    except Exception as e:
        print(f"   ✗ Error assessing disposal risk for item {item.get('ItemId')}: {e}")

avg_risk = total_risk_score / len(live_inventory_data[:3]) if live_inventory_data else 0
print(f"\n   📊 Risk Assessment Summary:")
print(f"   • High-risk items: {len(high_risk_items)}")
print(f"   • Average risk score: {avg_risk:.2f}")
print(f"   • Items needing attention: {[item['id'] for item in high_risk_items]}")

# 4. DEMAND FORECASTING
print(f"\n4️⃣ DEMAND FORECASTING FOR NEXT MONTH")
forecast_results = []

try:
    # Get unique categories from live data
    live_categories = list(set([item.get('Category', 'Other') for item in live_inventory_data]))
    print(f"   Categories in database: {live_categories}")
    
    # Map to model categories
    category_mapping = {
        'Clothing': 'Clothing',
        'Technology': 'Technology',
        'Sports': 'Sports and Fitness',
        'Other': 'Other'
    }
    
    for db_category in live_categories[:2]:  # Forecast for first 2 categories
        try:
            model_category = category_mapping.get(db_category, 'Other')
            
            # Calculate average price for this category
            category_items = [item for item in live_inventory_data if item.get('Category') == db_category]
            avg_price = 50.0  # Default
            if category_items:
                prices = [float(item.get('Price', 50.0)) for item in category_items if item.get('Price')]
                if prices:
                    avg_price = sum(prices) / len(prices)
            
            forecast_input = {
                'category': model_category,
                'month': '2025-09',  # Next month
                'avg_price': avg_price,
                'customer_segment': 'Consumer',
                'discount_rate': 0.1
            }
            
            predicted_demand = s.predict_demand_forecast(forecast_input)[0]
            
            forecast_results.append({
                'category': db_category,
                'model_category': model_category,
                'predicted_demand': predicted_demand,
                'avg_price': avg_price
            })
            
            print(f"   {db_category} → {model_category}: {predicted_demand:.1f} units (Avg Price: ${avg_price:.2f})")
            
        except Exception as e:
            print(f"   ✗ Error forecasting {db_category}: {e}")
            
except Exception as e:
    print(f"   ✗ Error in demand forecasting: {e}")

print(f"\n   📊 Forecast Summary:")
total_predicted = sum([result['predicted_demand'] for result in forecast_results])
print(f"   • Total predicted demand: {total_predicted:.1f} units")
print(f"   • Categories forecasted: {len(forecast_results)}")

    
    

=== PRODUCTION REPORT GENERATION WITH MYSQL DATABASE ===

🔌 Database Connection Status:
2025-08-19 06:11:54,792 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-08-19 06:11:54,794 INFO sqlalchemy.engine.Engine SELECT `Inventory`.`ItemId` AS `Inventory_ItemId`, `Inventory`.`ItemName` AS `Inventory_ItemName`, `Inventory`.`ItemCategory` AS `Inventory_ItemCategory`, `Inventory`.`ItemQuantity` AS `Inventory_ItemQuantity`, `Inventory`.`UnitsSold` AS `Inventory_UnitsSold`, `Inventory`.`Weight` AS `Inventory_Weight`, `Inventory`.`Size` AS `Inventory_Size`, `Inventory`.`Priority` AS `Inventory_Priority`, `Inventory`.`Location` AS `Inventory_Location`, `Inventory`.`Date` AS `Inventory_Date`, `Inventory`.`Dispose` AS `Inventory_Dispose` 
FROM `Inventory`
2025-08-19 06:11:54,794 INFO sqlalchemy.engine.Engine [cached since 1.058e+04s ago] {}
2025-08-19 06:11:54,797 INFO sqlalchemy.engine.Engine SELECT `Order`.`OrderId` AS `Order_OrderId`, `Order`.`ItemId` AS `Order_ItemId`, `Order`.`OrderQuantit

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
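
Summary figures such as the optimization rate and the average risk score are most reliable when they are computed only over items that were scored successfully, with a guard for the empty case. A minimal sketch follows; `summarize_locations` and `summarize_risk` are hypothetical helpers, and the sample rows reuse values printed earlier in this notebook.

```python
def summarize_locations(rows):
    """rows: dicts with a boolean 'optimized' flag, one per successfully analysed item."""
    analysed = len(rows)
    issues = sum(1 for row in rows if not row['optimized'])
    # Avoid ZeroDivisionError when no item could be analysed
    rate = (analysed - issues) / analysed * 100 if analysed else 0.0
    return {'analysed': analysed, 'issues': issues, 'optimization_rate': rate}

def summarize_risk(scores):
    """scores: risk scores for the items that were scored without error."""
    return sum(scores) / len(scores) if scores else 0.0

# Values taken from the location and disposal-risk output shown earlier
location_rows = [
    {'item_id': 101, 'current': 'Unknown', 'predicted': 'B-6', 'optimized': False},
    {'item_id': 102, 'current': 'Unknown', 'predicted': 'C-1', 'optimized': False},
]
location_summary = summarize_locations(location_rows)
average_risk = summarize_risk([1.0, 1.0])
print(location_summary)
print(average_risk)
```

Collecting the per-item scores in a list, rather than a running total, keeps the denominator tied to the items that actually produced a score.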


print(s.predict_location({
    "Priority": "Medium",
    "Product_Type": "Sports and Fitness",
    "Size": "Medium",
    "Order_Quantity": 12,
    "Weight": 10.78
}))

print(s.detect_anomalies(inventoryData))

result = s.demand_forecast_preprocessor(orderData, inventoryData)
result

print(s.predict_demand_forecast({
    'category': "Clothing",
    'month': "2025-05",
    'avg_price': 10.0,
    'customer_segment': "Consumer",
    'discount_rate': 0.12
}))
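
The forecast calls in this notebook pass `month` as a hard-coded `'YYYY-MM'` string. If the target is always the month after the run date, the value can be derived instead. The sketch below uses only the standard library; `next_month_label` is a hypothetical helper, and the assumption that the model expects the `'YYYY-MM'` format comes from the literal values used above.

```python
from datetime import date

def next_month_label(today):
    """Return the month after `today` as a 'YYYY-MM' string."""
    if today.month == 12:
        # December rolls over to January of the following year
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return f"{year:04d}-{month:02d}"

label_from_run_date = next_month_label(date(2025, 8, 19))
label_from_december = next_month_label(date(2025, 12, 1))
print(label_from_run_date, label_from_december)
```

In a live run, `next_month_label(date.today())` could replace the literal month string passed to `s.predict_demand_forecast`.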

# Sections

## Products Overview

In [11]:
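# Product Overview: This section summarizes the overall performance of products
# NOTE: `contents` is a plain string, so {sales_data} and the other placeholders are sent
# to the model literally. Format real data into the prompt before calling the model.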
product_overview = client.models.generate_content(
    model=MODEL_ID,
    contents='''
Generate an overview of products based on the following:

1. **Top Products**: Identify the top-selling products by sales volume and revenue over the past month.
2. **Product Performance**: Analyze the performance of each product category in terms of sales, demand, and revenue.
3. **Sales by Product**: Present a summary of sales for each product, including revenue, volume sold, and average price.

Data:
- Sales Data: {sales_data}
- Product Categories: {product_categories_data}
- Product Sales Volume: {sales_volume_data}
- Product Revenue: {product_revenue_data}
'''
)

# Markdown must be imported before use; without this import the cell raises NameError
from IPython.display import Markdown, display

display(Markdown(product_overview.text))
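
The prompt in this section contains `{sales_data}`-style placeholders, but a plain triple-quoted string does not substitute them, so the model only ever sees the placeholder names. A minimal sketch of filling a template with `str.format` follows. `build_prompt` and the shortened `PROMPT_TEMPLATE` are hypothetical, and the sample values are taken from the inventory output shown earlier.

```python
import json

# Shortened version of the section prompt, for illustration only
PROMPT_TEMPLATE = """
Generate an overview of products based on the following:

Data:
- Sales Data: {sales_data}
- Product Categories: {product_categories_data}
"""

def build_prompt(template, **data):
    """Fill every {placeholder} in template with a JSON rendering of the matching value."""
    # JSON keeps nested lists and dicts readable for the model; default=str handles dates
    rendered = {name: json.dumps(value, default=str) for name, value in data.items()}
    # str.format raises KeyError if the template names a placeholder that was not supplied
    return template.format(**rendered)

prompt = build_prompt(
    PROMPT_TEMPLATE,
    sales_data=[{'ItemId': 101, 'Category': 'Clothing', 'Price': 30.0}],
    product_categories_data=['Clothing', 'Technology'],
)
print(prompt)
```

The resulting string can then be passed as `contents` to `client.models.generate_content`. The `KeyError` on a missing placeholder is deliberate: it surfaces an incomplete prompt before any API call is made.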

## Category Distribution

In [None]:
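# Category Distribution: This section analyzes the distribution of sales across categories
# NOTE: the {placeholders} under "Data:" are not substituted (plain string, not an f-string),
# which is why the response below asks for the actual data.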
category_distribution = client.models.generate_content(
    model=MODEL_ID,
    contents='''
Generate insights on the distribution of sales across categories based on the following:

1. **Category-wise Sales Volume**: Summarize the total sales volume across all product categories.
2. **Category-wise Revenue**: Display the total revenue generated from each product category over the past month.
3. **Category-wise Performance**: Provide insights on which product categories are performing the best in terms of sales, revenue, and customer demand.

Data:
- Sales Volume by Category: {sales_volume_by_category_data}
- Revenue by Category: {revenue_by_category_data}
- Product Category Performance: {category_performance_data}
'''
)

display(Markdown(category_distribution.text))


To provide precise insights on the distribution of sales across categories, the actual data for `sales_volume_by_category_data`, `revenue_by_category_data`, and `category_performance_data` is required.

However, I can outline *how* these insights would be generated and what *types* of conclusions could be drawn once the data is provided. Please replace the placeholders with your actual sales data to get the specific analysis.

---

**Insights on Sales Distribution Across Categories (Framework for Analysis)**

**Data Required:**

*   **`sales_volume_by_category_data`**: e.g., `{"Electronics": 1500, "Apparel": 2200, "Home Goods": 800, "Books": 1000, "Food & Beverage": 3500}`
*   **`revenue_by_category_data`**: e.g., `{"Electronics": 350000, "Apparel": 120000, "Home Goods": 90000, "Books": 25000, "Food & Beverage": 70000}`
*   **`category_performance_data`**: This would ideally be a more detailed structure, possibly including trends, growth rates, and specific demand metrics.
    *   Example for a single category: `{"Electronics": {"SalesGrowthMonthOverMonth": "10%", "RevenueGrowthMonthOverMonth": "15%", "CustomerDemandScore": "High (4.5/5 reviews, frequent searches)"}}`

---

**Analysis & Insights:**

Once the data is populated, we would perform the following analysis:

---

### 1. Category-wise Sales Volume

**Analysis:**
We would sum the total sales volume across all product categories to get an overall picture of transactional activity. Then, we would break down the units sold by each category.

**Expected Insights:**
*   **High Volume Categories:** Identify which categories move the most units. This indicates products with high transactional frequency or broad appeal. (e.g., if `Food & Beverage` has the highest volume, it suggests many small, frequent purchases).
*   **Low Volume Categories:** Pinpoint categories with fewer unit sales. These might be niche products, higher-priced items, or items with longer purchase cycles.
*   **Volume Distribution:** Understand if sales volume is concentrated in a few categories or evenly distributed across many.

---

### 2. Category-wise Revenue

**Analysis:**
We would display the total revenue generated from each product category over the past month, allowing for a clear understanding of the monetary contribution of each segment.

**Expected Insights:**
*   **Revenue Drivers:** Clearly identify the "cash cow" categories that bring in the most money. (e.g., if `Electronics` has the highest revenue, it's a key financial pillar).
*   **Revenue vs. Volume Discrepancy:**
    *   **High Revenue, Moderate/Low Volume:** This suggests high-value or high-margin products (e.g., `Electronics` might have lower volume than `Food & Beverage` but significantly higher revenue). These are often critical for profitability.
    *   **High Volume, Moderate/Low Revenue:** Indicates commodity-like products or items with lower price points/margins (e.g., `Food & Beverage` might have high volume but lower total revenue compared to `Electronics`). These contribute to customer traffic and basket size.
*   **Profitability Potential:** Revenue figures are crucial for assessing the financial health and potential of each category.

---

### 3. Category-wise Performance (Sales, Revenue, Customer Demand)

**Analysis:**
This section would combine the volume and revenue data with specific performance metrics related to customer demand and growth trends.

**Expected Insights:**

*   **Best Performing Categories:**
    *   **Definition:** These categories would show a strong positive correlation across high sales volume, high revenue, and robust customer demand.
    *   **Characteristics:**
        *   **High Sales/Revenue Growth:** Consistently increasing unit sales and monetary value month-over-month.
        *   **Strong Customer Demand:** Indicated by high search interest, good conversion rates, positive customer reviews, low return rates, and potentially high repeat purchase rates.
    *   **Example:** If `Electronics` shows high revenue, steady sales growth, and a high "Customer Demand Score" (reflecting strong interest and satisfaction), it would be a top performer.

*   **Underperforming Categories:**
    *   **Definition:** Categories with stagnant or declining sales volume, low revenue contribution, and/or waning customer interest.
    *   **Characteristics:**
        *   Low or negative sales/revenue growth.
        *   Low customer engagement, poor reviews, high return rates, or declining search trends.
    *   **Example:** If `Books` have low volume, low revenue, and "Customer Demand Score" indicating declining interest, it would be an underperformer.

*   **Emerging or Niche Performers:**
    *   **Definition:** Categories that might not be top in absolute volume or revenue yet but show significant positive trends in growth or demand.
    *   **Characteristics:** Rapid month-over-month growth in sales/revenue, sudden spikes in customer interest or searches, positive early feedback.
    *   **Example:** A new category like "Sustainable Living" might have lower absolute numbers but show a "200% MoM growth" in demand and sales, marking it as an emerging opportunity.

---

**Overall Insights & Key Takeaways (Upon Data Provision):**

Once the actual data is provided, the combined analysis would allow us to:

1.  **Identify Core Business Drivers:** Pinpoint the 2-3 categories that are the absolute backbone of your sales and revenue.
2.  **Uncover Hidden Gems:** Discover categories that, despite lower volume, yield high revenue (high-margin products).
3.  **Spot Areas for Improvement:** Clearly see which categories are lagging and require intervention (e.g., marketing boost, pricing adjustment, product line review, inventory reduction).
4.  **Understand Customer Preferences:** Gain a clearer picture of where genuine customer interest and purchasing power lie.
5.  **Inform Strategic Decisions:** Guide decisions on inventory management, marketing budget allocation, product development, and resource deployment across categories.

---

**To get specific, actionable insights, please provide your sales, revenue, and performance data for each category.**
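
The response above asks for the category-level data that the prompt left as placeholders. A minimal sketch of deriving two of those inputs from order records follows. It assumes each order is a dictionary with `Category`, `Price`, and `Discount` keys, as in the sample order printed earlier, and it treats one order as one unit sold at `Price * (1 - Discount)`. Both are simplifying assumptions, `aggregate_by_category` is a hypothetical helper, and the second order below is illustrative data.

```python
from collections import defaultdict

def aggregate_by_category(orders):
    """Return (order count per category, discounted revenue per category)."""
    volume = defaultdict(int)
    revenue = defaultdict(float)
    for order in orders:
        category = order.get('Category', 'Unknown')
        price = float(order.get('Price') or 0.0)
        discount = float(order.get('Discount') or 0.0)
        volume[category] += 1  # assumption: one order equals one unit
        revenue[category] += price * (1 - discount)
    # Round revenue to cents for presentation in the prompt
    return dict(volume), {name: round(total, 2) for name, total in revenue.items()}

orders = [
    # Copied from the sample order printed earlier
    {'ItemId': 101, 'DateOrdered': '2025-01-10', 'Category': 'Clothing',
     'CustomerSegment': 'Consumer', 'Price': 30.0, 'Discount': 0.05},
    # Illustrative record only
    {'ItemId': 102, 'DateOrdered': '2025-01-12', 'Category': 'Technology',
     'CustomerSegment': 'Corporate', 'Price': 100.0, 'Discount': 0.10},
]

sales_volume_by_category_data, revenue_by_category_data = aggregate_by_category(orders)
print(sales_volume_by_category_data)
print(revenue_by_category_data)
```

If the order table carries a quantity column, multiplying by it would give a more faithful volume and revenue figure than the one-unit assumption used here.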

## Product Usage Forecast

In [None]:
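# Product Usage Forecast: This section forecasts the future usage of products based on historical sales
# NOTE: the {placeholders} under "Data:" are not substituted (plain string, not an f-string),
# so the model simulates its own data in the response below.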
product_usage_forecast = client.models.generate_content(
    model=MODEL_ID,
    contents='''
Generate a product usage forecast for the upcoming period based on historical data and trends:

1. **Sales Forecast by Product**: Predict the sales volume for each product in the next quarter/year.
2. **Demand Forecast**: Provide a demand forecast for products based on historical sales, usage probabilities, and seasonal patterns.
3. **Future Stock Levels**: Estimate the required stock levels for each product to meet forecasted demand.

Data:
- Historical Sales Data: {historical_sales_data}
- Product Usage Data: {usage_probabilities}
- Seasonal Patterns: {seasonal_sales_patterns_data}
- Current Stock Levels: {current_inventory_data}
'''
)

display(Markdown(product_usage_forecast.text))


To generate a comprehensive product usage forecast, I first need to define and simulate the `historical_sales_data`, `usage_probabilities`, `seasonal_sales_patterns_data`, and `current_inventory_data` as these were provided as placeholders.

**Disclaimer:** The following forecast is based on *simulated data* and a simplified forecasting methodology. For real-world applications, actual granular data and more sophisticated time-series analysis (e.g., ARIMA, Prophet, exponential smoothing) would be required.

---

### Executive Summary

This report provides a product usage forecast for the upcoming quarter (Q1 2024), covering sales predictions, overall demand, and recommended stock levels. Based on simulated historical data, Product A is expected to maintain its lead in sales and demand, followed by Product B and Product C, which shows significant growth potential but lower individual usage probability. Strategic ordering is recommended to ensure optimal stock levels and avoid both stockouts and excessive inventory.

---

### 1. Simulated Data Used for Forecast

To demonstrate the methodology, I've created plausible data sets:

**A. Historical Sales Data (Monthly for 2023)**

| Month | Product A (Units) | Product B (Units) | Product C (Units) |
| :---- | :------------------ | :------------------ | :------------------ |
| Jan-23 | 95                  | 48                  | 18                  |
| Feb-23 | 98                  | 47                  | 19                  |
| Mar-23 | 102                 | 50                  | 22                  |
| Apr-23 | 105                 | 52                  | 23                  |
| May-23 | 108                 | 51                  | 24                  |
| Jun-23 | 110                 | 53                  | 26                  |
| Jul-23 | 107                 | 50                  | 25                  |
| Aug-23 | 106                 | 49                  | 24                  |
| Sep-23 | 103                 | 48                  | 22                  |
| Oct-23 | 112                 | 54                  | 27                  |
| Nov-23 | 115                 | 55                  | 28                  |
| Dec-23 | 118                 | 56                  | 30                  |

**B. Product Usage Probabilities**
*(Interpretation: The probability that a sold unit of a product will be actively used or consumed within the forecast period.)*

*   **Product A:** 0.95 (95% of sold units are typically used)
*   **Product B:** 0.85 (85% of sold units are typically used)
*   **Product C:** 0.70 (70% of sold units are typically used)

**C. Seasonal Sales Patterns Data (Quarterly Multipliers)**
*(Based on general observation, Q1 is often slower, Q2/Q3 stable, Q4 peak.)*

*   **Q1 (Jan-Mar):** 0.90 (10% lower than average)
*   **Q2 (Apr-Jun):** 1.05 (5% higher than average)
*   **Q3 (Jul-Sep):** 1.00 (Average)
*   **Q4 (Oct-Dec):** 1.15 (15% higher than average)

**D. Current Stock Levels (As of End of Dec 2023)**

*   **Product A:** 200 units
*   **Product B:** 100 units
*   **Product C:** 50 units

---

### 2. Assumptions for Forecasting

*   **Forecasting Period:** Q1 2024 (January, February, March)
*   **Forecasting Method:**
    *   **Base Sales:** Average monthly sales from the preceding year (2023) are used as a baseline.
    *   **Trend:** A conservative overall growth trend of **+5%** year-over-year (applied to the quarterly average) is assumed for all products, reflecting general market expansion or internal initiatives.
    *   **Seasonality:** The predefined quarterly multipliers are applied to adjust for seasonal fluctuations.
*   **Safety Stock:** A **15% safety stock** buffer is added to the forecasted demand to mitigate against unforeseen demand spikes or supply chain delays.
*   **Lead Time:** Assumed to be short enough that stock can be acquired to meet the Q1 demand by the start of the period.
*   **Usage Probability Interpretation:** Directly applied as a multiplier to the sales forecast to derive effective usage demand.

---

### 3. Forecasting Methodology & Calculations

#### Step 1: Calculate Average Monthly/Quarterly Historical Sales (2023)

*   **Product A:** Sum = 1279 units. Average Monthly = 1279 / 12 = 106.58 units. Average Quarterly = 106.58 * 3 = **319.75 units.**
*   **Product B:** Sum = 603 units. Average Monthly = 603 / 12 = 50.25 units. Average Quarterly = 50.25 * 3 = **150.75 units.**
*   **Product C:** Sum = 288 units. Average Monthly = 288 / 12 = 24.00 units. Average Quarterly = 24.00 * 3 = **72.00 units.**
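
The Step 1 averages can be reproduced with a few lines of Python. This is a minimal sketch rather than the notebook's own code: the dictionary name `annual_totals` and the helper `quarterly_average` are illustrative, and the totals are the 2023 sums quoted above. Dividing the annual total by 4 directly avoids carrying a rounded monthly figure forward (106.58 × 3 would give 319.74 rather than 319.75).

```python
# 2023 unit totals per product, as quoted in Step 1.
annual_totals = {"Product A": 1279, "Product B": 603, "Product C": 288}


def quarterly_average(annual_total: int) -> float:
    """Average units per quarter, computed directly as total / 4."""
    return annual_total / 4


quarterly_averages = {
    name: quarterly_average(total) for name, total in annual_totals.items()
}

for name, avg in quarterly_averages.items():
    monthly = annual_totals[name] / 12
    print(f"{name}: {monthly:.2f} per month, {avg:.2f} per quarter")
```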

---

### 1. Sales Forecast by Product (Q1 2024)

**Formula:** `(Average Quarterly Sales 2023) * (1 + Annual Growth Trend) * Seasonal Multiplier (Q1)`

*   **Product A:**
    *   319.75 * (1 + 0.05) * 0.90
    *   319.75 * 1.05 * 0.90 = **302.16 units ≈ 302 units**
*   **Product B:**
    *   150.75 * (1 + 0.05) * 0.90
    *   150.75 * 1.05 * 0.90 = **142.46 units ≈ 142 units**
*   **Product C:**
    *   72.00 * (1 + 0.05) * 0.90
    *   72.00 * 1.05 * 0.90 = **68.04 units ≈ 68 units**

**Sales Forecast Summary (Q1 2024)**

| Product   | Forecasted Sales (Units) |
| :-------- | :----------------------- |
| Product A | 302                      |
| Product B | 142                      |
| Product C | 68                       |
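
As a cross-check, the sales forecast formula can be sketched in Python. The constant and function names below (`GROWTH_TREND`, `Q1_MULTIPLIER`, `sales_forecast`) are illustrative; the values are the assumptions stated in the report (+5% trend, 0.90 Q1 multiplier).

```python
GROWTH_TREND = 0.05   # +5% year-over-year growth assumed in the report
Q1_MULTIPLIER = 0.90  # Q1 seasonal multiplier from the seasonality table

# Average quarterly sales for 2023, from Step 1.
quarterly_averages = {"Product A": 319.75, "Product B": 150.75, "Product C": 72.00}


def sales_forecast(avg_quarterly, growth=GROWTH_TREND, seasonal=Q1_MULTIPLIER):
    """Average quarterly sales scaled by the growth trend and seasonal multiplier."""
    return avg_quarterly * (1 + growth) * seasonal


# Round to whole units, as the summary table does.
sales_forecasts = {
    name: round(sales_forecast(avg)) for name, avg in quarterly_averages.items()
}
print(sales_forecasts)
```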

---

### 2. Demand Forecast (Q1 2024)

**Formula:** `Sales Forecast * Product Usage Probability`

*   **Product A:**
    *   302 units (Sales Forecast) * 0.95 (Usage Probability) = **286.9 units ≈ 287 units**
*   **Product B:**
    *   142 units (Sales Forecast) * 0.85 (Usage Probability) = **120.7 units ≈ 121 units**
*   **Product C:**
    *   68 units (Sales Forecast) * 0.70 (Usage Probability) = **47.6 units ≈ 48 units**

**Demand Forecast Summary (Q1 2024)**

| Product   | Forecasted Demand (Units) |
| :-------- | :------------------------ |
| Product A | 287                       |
| Product B | 121                       |
| Product C | 48                        |
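
The demand step is a single multiplication per product. The sketch below uses illustrative names (`usage_probability`, `demand_forecast`) and takes the rounded sales forecasts and usage probabilities from the tables above.

```python
# Rounded Q1 2024 sales forecasts and usage probabilities from the report.
sales_forecasts = {"Product A": 302, "Product B": 142, "Product C": 68}
usage_probability = {"Product A": 0.95, "Product B": 0.85, "Product C": 0.70}


def demand_forecast(sales_units, usage_prob):
    """Effective usage demand: forecast sales scaled by usage probability."""
    return sales_units * usage_prob


demand_forecasts = {
    name: round(demand_forecast(units, usage_probability[name]))
    for name, units in sales_forecasts.items()
}
print(demand_forecasts)
```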

---

### 3. Future Stock Levels (Required to meet Q1 2024 Demand)

**Formulas:**
*   `Safety Stock = Forecasted Demand * 0.15`
*   `Required Stock Level = Forecasted Demand + Safety Stock`
*   `Units to Order = Required Stock Level - Current Stock Level` (If positive, order; if zero or negative, no order needed, or potentially surplus to manage).

*Note: "Future Stock Levels" here means the target stock level to hold at the start of the period to cover that period's demand; "Units to Order" is the action needed to reach that target.*

**A. Product A:**
*   Demand: 287 units
*   Safety Stock: 287 * 0.15 = 43.05 units ≈ 43 units
*   **Required Stock Level (Target):** 287 + 43 = **330 units**
*   Current Stock: 200 units
*   **Units to Order:** 330 - 200 = **130 units**

**B. Product B:**
*   Demand: 121 units
*   Safety Stock: 121 * 0.15 = 18.15 units ≈ 18 units
*   **Required Stock Level (Target):** 121 + 18 = **139 units**
*   Current Stock: 100 units
*   **Units to Order:** 139 - 100 = **39 units**

**C. Product C:**
*   Demand: 48 units
*   Safety Stock: 48 * 0.15 = 7.2 units ≈ 7 units
*   **Required Stock Level (Target):** 48 + 7 = **55 units**
*   Current Stock: 50 units
*   **Units to Order:** 55 - 50 = **5 units**

**Future Stock Levels & Ordering Summary (Q1 2024)**

| Product   | Forecasted Demand (Units) | Safety Stock (Units) | Required Stock Level (Target) | Current Stock (Units) | Units to Order (Approx.) |
| :-------- | :------------------------ | :------------------- | :---------------------------- | :-------------------- | :----------------------- |
| Product A | 287                       | 43                   | 330                           | 200                   | 130                      |
| Product B | 121                       | 18                   | 139                           | 100                   | 39                       |
| Product C | 48                        | 7                    | 55                            | 50                    | 5                        |
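
The three stock formulas can be combined into one small function. This is a sketch under the report's assumptions (15% safety stock, rounding to whole units at each step); the function name `stock_plan` and the dictionary layout are illustrative. Negative order quantities are clamped to zero, matching the "no order needed" case.

```python
SAFETY_STOCK_RATE = 0.15  # 15% buffer assumed in the report

demand_forecasts = {"Product A": 287, "Product B": 121, "Product C": 48}
current_stock = {"Product A": 200, "Product B": 100, "Product C": 50}


def stock_plan(demand, on_hand, rate=SAFETY_STOCK_RATE):
    """Return safety stock, target stock level, and units to order."""
    safety = round(demand * rate)
    required = demand + safety
    to_order = max(required - on_hand, 0)  # never order a negative quantity
    return {"safety": safety, "required": required, "to_order": to_order}


plans = {
    name: stock_plan(demand, current_stock[name])
    for name, demand in demand_forecasts.items()
}
for name, plan in plans.items():
    print(name, plan)
```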

---

### 4. Recommendations and Next Steps

1.  **Prioritize Product A:** It continues to be the highest demand product. Ensure its supply chain is robust to meet the forecasted 130 units needed for Q1.
2.  **Monitor Product C:** While its forecasted sales and demand are lower, it has a lower usage probability (0.70), suggesting some units sold might not translate to immediate active use. The current stock is nearly sufficient, but 5 units are still needed. Its historical data shows strong growth, so continued monitoring for a higher growth trend in future forecasts might be warranted.
3.  **Review Safety Stock Levels:** The 15% safety stock is a general assumption. For critical products or those with high demand variability or long lead times, a more dynamic safety stock calculation (e.g., based on forecast error and lead time variability) would be beneficial.
4.  **Refine Forecasting Model:** For more accuracy, consider:
    *   **More Granular Data:** Weekly or daily sales data can capture short-term trends.
    *   **Advanced Models:** Time series analysis methods (ARIMA, Exponential Smoothing, Prophet) can better capture complex trends, seasonality, and cycles.
    *   **External Factors:** Incorporate market trends, marketing campaigns, competitor activities, economic indicators, and holiday effects.
    *   **Product Lifecycle:** Account for new product introductions, mature products, and products nearing end-of-life.
5.  **Iterate and Adjust:** Forecasting is an ongoing process. Regularly review actual sales and demand against forecasts and adjust the model parameters accordingly for improved accuracy in subsequent periods.

This structured forecast provides a solid basis for operational planning for Q1 2024.
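
One common way to make the safety stock in recommendation 3 responsive to demand variability is the textbook formula `z * sigma * sqrt(lead_time)`, where `z` is the normal quantile for the chosen service level. The sketch below is an assumption about how that could look, not something the report computes: the function name, the 95% service level, and the demand history are all hypothetical, and lead time is treated as constant (lead time variability would need an extra term).

```python
from math import sqrt
from statistics import NormalDist, stdev


def dynamic_safety_stock(demand_history, lead_time_periods, service_level=0.95):
    """Safety stock as z * sigma_demand * sqrt(lead time), constant lead time."""
    z = NormalDist().inv_cdf(service_level)  # normal quantile for the service level
    sigma = stdev(demand_history)            # sample std dev of per-period demand
    return z * sigma * sqrt(lead_time_periods)


# Hypothetical monthly demand, for illustration only (not from the report).
illustrative_monthly_demand = [100, 110, 95, 105, 120, 90]

print(round(dynamic_safety_stock(illustrative_monthly_demand, lead_time_periods=1)))
```

Unlike a flat 15% buffer, this grows with observed volatility and with the square root of the lead time.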

## Sales Insights

In [None]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Sales Data
sales_data = orderData

# Sales Predictions
current_date = datetime.now()
next_month_date = current_date + relativedelta(months=1)
next_month_yearmonth = next_month_date.strftime("%Y-%m")

sales_pred_input = s.demand_forecast_preprocessor(orderData, inventoryData)
sales_predictions = []

for index, row in sales_pred_input.iterrows():
    sales_predictions.append({
        'Category Name': row.Category,
        'Customer Segment': row.CustomerSegment,
        'Predicted Demand for next month': s.predict_demand_forecast({
            'category': row.Category,
            'month': next_month_yearmonth,
            'avg_price': row.AveragePrice,
            'customer_segment': row.CustomerSegment,
            'discount_rate': row.AverageDiscount
        })[0]
    })

# Product Categories
product_categories = ["Clothing","Technology","Sports and Fitness","Other"]

# Current Inventory
current_inventory = inventoryData

# Usage Probabilities
usage_probabilities = "Currently empty. Please ignore this section for now."

In [None]:
# Sales Insights Section
section_sales_insights = client.models.generate_content(
    model=MODEL_ID,
    contents=
f'''Generate a sales insights report that describes information for the following areas:

1. Sales Trends: Summarize sales based on the provided data. Provide insights on which product categories are seeing the highest demand.
2. Product Performance: Analyze the best-selling product categories by quantity. Highlight the top 3 performing categories.
3. Product Demand Forecast: Based on the historical sales and usage probability, forecast the demand for the next month.
4. Restocking or Discontinuation: Recommend which products should be restocked and which should be discontinued, based on sales trends and inventory levels.

Data:
- Historical sales data: {sales_data}
- Sales volume predictions for next month: {sales_predictions}
- Product categories: {product_categories}
- Current inventory levels: {current_inventory}
- Usage probabilities: {usage_probabilities}'''
)

display(Markdown(section_sales_insights.text))
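
The prompt above embeds the raw Python `repr` of lists of dictionaries. A possible refinement, sketched here as an assumption rather than the notebook's method, is to render such lists as a Markdown table before interpolating them, which tends to be more compact and easier to read. The helper `to_markdown_table` is hypothetical, and the example rows (including the segment names) are placeholders shaped like `sales_predictions`.

```python
def to_markdown_table(rows):
    """Render a list of same-keyed dicts as a Markdown table string."""
    if not rows:
        return "(no rows)"
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


# Placeholder rows with the same keys as sales_predictions.
example_predictions = [
    {"Category Name": "Clothing", "Customer Segment": "Consumer",
     "Predicted Demand for next month": 120},
    {"Category Name": "Technology", "Customer Segment": "Corporate",
     "Predicted Demand for next month": 45},
]

table = to_markdown_table(example_predictions)
print(table)
```

The resulting string could then be placed in the prompt in place of `{sales_predictions}`.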

## Storage Optimizations

In [None]:
location_predictions = []
for item in inventoryData:
    location_predictions.append({
        'Item Id': item['ItemId'],
        'Current Location': item['Location'],
        'Predicted Location': s.predict_location({
            'Priority': item['Priority'],
            'Product_Type': item['Category'],
            'Size': item['Size'],
            'Order_Quantity': item['Quantity'],
            'Weight': item['Weight']
        })[0]
    })

section_storage_optimizations = client.models.generate_content(
    model=MODEL_ID,
    contents=
f'''Provide detailed storage optimization recommendations based on:

1. Current storage utilization metrics
2. Model-predicted optimal locations vs current locations
3. List of items flagged for relocation, including:
   - Current location
   - Recommended location

Data:
{inventoryData}
{location_predictions}
'''
)

display(Markdown(section_storage_optimizations.text))
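
The prompt asks the model to compare predicted and current locations itself. A small pre-filter could narrow the list to items that actually need to move before it reaches the prompt. This is a sketch only: `relocation_candidates` is a hypothetical helper, and the example rows and location codes are placeholders with the same keys as `location_predictions`.

```python
def relocation_candidates(predictions):
    """Keep only items whose predicted location differs from the current one."""
    return [
        p for p in predictions
        if p["Current Location"] != p["Predicted Location"]
    ]


# Placeholder rows shaped like location_predictions.
example_predictions = [
    {"Item Id": 1, "Current Location": "A1", "Predicted Location": "A1"},
    {"Item Id": 2, "Current Location": "B3", "Predicted Location": "A2"},
]

flagged = relocation_candidates(example_predictions)
print(flagged)
```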

## Anomalies Detected

In [None]:
section_anomalies_detected = client.models.generate_content(
    model=MODEL_ID,
    contents=
f'''Generate an anomalies section that lists all detected storage anomalies in a table. Include each item's current location, predicted location, item id, and name.
Include the reason for each anomaly.

Data:
{s.detect_anomalies(inventoryData)}
'''
)

display(Markdown(section_anomalies_detected.text))

## Summary

In [None]:
# Assemble the report body in Markdown (the summary is generated from this text)
monthly_report = f'''<h1 style="text-align:center;">Monthly Report</h1><br>

# Products Overview:
{section_products_overview.text}

# Category Distribution:
{section_category_distribution.text}

# Product Usage Forecast:
{section_product_usage.text}

# Sales Insights:
{section_sales_insights.text}

# Storage Optimizations:
{section_storage_optimizations.text}

# Anomalies Detected:
{section_anomalies_detected.text}'''

In [None]:
section_summary = client.models.generate_content(
    model=MODEL_ID,
    contents=
f'''Provide a brief and concise summary of the provided report, covering all the key points highlighted by each section. Highlight the important information for each section in a bullet list, with the final paragraph providing general insight on overall performance.

Report:
{monthly_report}'''
)

display(Markdown(section_summary.text))

# PDF File Generation

In [None]:
current_date = datetime.now()
monthly_report = f"""# <h1 style="text-align:center;">Monthly Report ({current_date.date()})</h1><br>

# Products Overview:
{section_products_overview.text}

# Category Distribution:
{section_category_distribution.text}

# Product Usage Forecast:
{section_product_usage.text}

# Sales Insights:
{section_sales_insights.text}

# Storage Optimizations:
{section_storage_optimizations.text}

# Anomalies Detected:
{section_anomalies_detected.text}

# Summary:
{section_summary.text}"""
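
The report template is written out twice in this notebook, once for the summary prompt and once for the PDF. One way to keep the two in sync, sketched here under the assumption that each section is available as a plain string, is to build the document from an ordered list of `(heading, body)` pairs. The helper `build_report` and the placeholder section text are hypothetical.

```python
def build_report(title, sections):
    """Join (heading, body) pairs into one Markdown document under a title."""
    parts = [f"# {title}"]
    for heading, body in sections:
        parts.append(f"# {heading}:\n{body.strip()}")
    return "\n\n".join(parts)


# Placeholder section text; in the notebook these would be the `.text`
# attributes of the generated sections.
example_sections = [
    ("Sales Insights", "Placeholder sales text."),
    ("Summary", "Placeholder summary text."),
]

report = build_report("Monthly Report", example_sections)
print(report)
```

Appending the summary pair to the same list would then produce the PDF version without retyping the template.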

In [None]:
from markdown_pdf import MarkdownPdf, Section

pdf = MarkdownPdf(toc_level=1)
pdf.add_section(Section(monthly_report)) # Add Section(md_content, user_css=css_content) for custom CSS
pdf.save(f"MonthlyReport_({current_date.date()}).pdf")