In [None]:
# Waste Reduction System: Dead Stock Prediction & Recommendation Engine

This notebook implements a system to:
1. Predict dead stock (products likely to expire before being sold)
2. Recommend soon-to-expire items to appropriate users to minimize wastage

## Phase 1: Data Foundation


In [None]:
### Step 1.1: Import Required Libraries


In [None]:
# Data manipulation and analysis
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import classification_report, confusion_matrix, mean_squared_error
from sklearn.metrics.pairwise import cosine_similarity

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Libraries imported successfully!")



In [None]:
### Step 1.2: Load the Datasets

We'll load the three datasets generated by the walmart_new.py script:
- **fake_users.csv**: Customer profiles with dietary preferences, allergies, and shopping habits
- **fake_products.csv**: Product inventory with expiry dates, prices, and nutritional info
- **fake_transactions.csv**: Historical purchase data linking users and products


In [None]:
# Load the datasets
users_df = pd.read_csv('../datasets/fake_users.csv')
products_df = pd.read_csv('../datasets/fake_products.csv')
transactions_df = pd.read_csv('../datasets/fake_transactions.csv')

print("Dataset shapes:")
print(f"Users: {users_df.shape}")
print(f"Products: {products_df.shape}")
print(f"Transactions: {transactions_df.shape}")

# Convert date columns to datetime
# (pair each DataFrame with its date columns directly instead of looking names up via eval)
date_columns = [
    (users_df, ['last_purchase_date']),
    (products_df, ['packaging_date', 'expiry_date']),
    (transactions_df, ['purchase_date']),
]

for df, cols in date_columns:
    for col in cols:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col])

print("\nDate columns converted successfully!")


In [None]:
### Step 1.3: Data Exploration and Understanding

Let's explore each dataset to understand the data structure and identify key features for our models.


In [None]:
# Explore Users Dataset
print("USERS DATASET OVERVIEW:")
print("="*50)
print(users_df.head())
print("\nColumn Info:")
print(users_df.info())
print("\nDiet Type Distribution:")
print(users_df['diet_type'].value_counts())
print("\nDiscount Preference Distribution:")
print(users_df['prefers_discount'].value_counts())


In [None]:
# Explore Products Dataset
print("\nPRODUCTS DATASET OVERVIEW:")
print("="*50)
print(products_df.head())
print("\nColumn Info:")
print(products_df.info())
print("\nCategory Distribution:")
print(products_df['category'].value_counts())
print("\nCurrent Discount Distribution:")
print(products_df['current_discount_percent'].value_counts())


In [None]:
# Explore Transactions Dataset
print("\nTRANSACTIONS DATASET OVERVIEW:")
print("="*50)
print(transactions_df.head())
print("\nColumn Info:")
print(transactions_df.info())
print("\nTransaction Statistics:")
print(transactions_df.describe())


In [None]:
## Phase 2: Dead Stock Prediction

### Step 2.1: Feature Engineering for Dead Stock Prediction

To predict dead stock, we need to engineer features that capture:
1. **Product shelf life characteristics** - How long until expiry?
2. **Sales velocity** - How fast is the product selling?
3. **Inventory turnover** - What percentage of stock moves in a given time?
4. **Product characteristics** - Category, price, discount patterns
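
The first two features reduce to simple date arithmetic. The sketch below works through them in plain Python for a single hypothetical product, so each step is visible. The dates, quantities, and the fixed `as_of` reference date are made up for illustration; the notebook computes the same quantities with pandas.

```python
from datetime import date

# Hypothetical product and sales records (illustrative values only)
as_of = date(2024, 6, 1)             # fixed reference date so the result is reproducible
packaging_date = date(2024, 5, 1)
expiry_date = date(2024, 6, 30)
sales = [                            # (purchase_date, quantity)
    (date(2024, 5, 10), 4),
    (date(2024, 5, 15), 2),
    (date(2024, 5, 19), 6),
]

# Shelf life features
days_until_expiry = (expiry_date - as_of).days
total_shelf_life = (expiry_date - packaging_date).days
shelf_life_remaining_pct = days_until_expiry / total_shelf_life * 100

# Sales velocity: units sold per day between first and last sale (inclusive)
first_sale = min(d for d, _ in sales)
last_sale = max(d for d, _ in sales)
days_on_market = (last_sale - first_sale).days + 1
total_quantity_sold = sum(q for _, q in sales)
sales_velocity = total_quantity_sold / days_on_market
days_since_last_sale = (as_of - last_sale).days

print(days_until_expiry, total_shelf_life, round(shelf_life_remaining_pct, 1))
print(days_on_market, total_quantity_sold, sales_velocity, days_since_last_sale)
```

Because `days_on_market` spans only the first to the last sale, a product with a single sale gets a value of 1, so its velocity equals the quantity of that one sale.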


In [None]:
# Calculate current date for reference
current_date = pd.Timestamp.now()

# Add days until expiry for each product
products_df['days_until_expiry'] = (products_df['expiry_date'] - current_date).dt.days
products_df['total_shelf_life'] = (products_df['expiry_date'] - products_df['packaging_date']).dt.days
products_df['shelf_life_remaining_pct'] = products_df['days_until_expiry'] / products_df['total_shelf_life'] * 100

# Calculate sales metrics for each product
sales_metrics = transactions_df.groupby('product_id').agg({
    'quantity': ['sum', 'mean', 'count'],
    'purchase_date': ['min', 'max'],
    'discount_percent': 'mean',
    'user_engaged_with_deal': 'mean'
}).reset_index()

# Flatten column names
sales_metrics.columns = ['product_id', 'total_quantity_sold', 'avg_quantity_per_sale', 
                        'number_of_sales', 'first_sale_date', 'last_sale_date',
                        'avg_discount_given', 'avg_user_engagement']

# Calculate days since last sale
sales_metrics['days_since_last_sale'] = (current_date - sales_metrics['last_sale_date']).dt.days

# Calculate sales velocity (units sold per day)
sales_metrics['days_on_market'] = (sales_metrics['last_sale_date'] - sales_metrics['first_sale_date']).dt.days + 1
sales_metrics['sales_velocity'] = sales_metrics['total_quantity_sold'] / sales_metrics['days_on_market']

# Merge sales metrics with products
products_enhanced = products_df.merge(sales_metrics, on='product_id', how='left')

# Fill NaN values for products with no sales
# (assign the result back; chained fillna(..., inplace=True) is unreliable in recent pandas)
products_enhanced = products_enhanced.fillna({
    'total_quantity_sold': 0,
    'number_of_sales': 0,
    'sales_velocity': 0,
    'days_since_last_sale': 999,  # Large number for never sold
})

print("Enhanced Product Features:")
print(products_enhanced[['product_id', 'name', 'days_until_expiry', 'total_quantity_sold', 
                        'sales_velocity', 'days_since_last_sale']].head(10))


In [None]:
### Step 2.2: Define Dead Stock

We'll flag a product as a "dead stock" risk if it has already expired, or if:
1. It has fewer than 30 days until expiry AND
2. Its sales velocity suggests it won't sell out before expiry

This is a practical definition that helps us identify products at risk of wastage.
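
To make the rule concrete, the sketch below applies it to a few hypothetical products in plain Python. The function name `at_risk` and the sample numbers are assumptions for illustration. The 30-day window comes from the definition above, and the 80-unit sell-through target mirrors the threshold used in this notebook's labeling code.

```python
def at_risk(days_until_expiry, sales_velocity, window_days=30, target_units=80):
    """Return True if a product is projected to miss its sell-through target."""
    if days_until_expiry <= 0:            # already expired
        return True
    if days_until_expiry >= window_days:  # not close enough to expiry to flag
        return False
    projected_sales = sales_velocity * days_until_expiry
    return projected_sales < target_units

# (days_until_expiry, sales_velocity in units/day) for hypothetical products
examples = {
    "slow seller, 10 days left": (10, 2.0),   # projects 20 units
    "fast seller, 10 days left": (10, 9.0),   # projects 90 units
    "slow seller, 60 days left": (60, 0.5),   # outside the 30-day window
}
for label, (days, velocity) in examples.items():
    print(f"{label}: {at_risk(days, velocity)}")
```

A product with no sales at all has a velocity of 0, so inside the 30-day window it is always flagged.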


In [None]:
# Define dead stock based on our criteria
def calculate_dead_stock_risk(row):
    """
    Calculate if a product is at risk of becoming dead stock.
    Returns: 1 if dead stock risk, 0 otherwise
    """
    # If already expired
    if row['days_until_expiry'] <= 0:
        return 1
    
    # If no sales history and less than 30 days to expiry
    if row['sales_velocity'] == 0 and row['days_until_expiry'] < 30:
        return 1
    
    # If sales velocity suggests won't sell out before expiry
    # Assuming we need to sell at least 80% of typical inventory
    if row['sales_velocity'] > 0:
        projected_sales = row['sales_velocity'] * row['days_until_expiry']
        # Assuming average inventory of 100 units per product
        if projected_sales < 80 and row['days_until_expiry'] < 30:
            return 1
    
    return 0

# Apply the function to create labels
products_enhanced['is_dead_stock_risk'] = products_enhanced.apply(calculate_dead_stock_risk, axis=1)

# Create additional risk score (continuous variable)
products_enhanced['dead_stock_risk_score'] = (
    (30 - products_enhanced['days_until_expiry'].clip(lower=0, upper=30)) / 30 * 0.5 +  # Expiry urgency
    (1 - products_enhanced['sales_velocity'].clip(upper=5) / 5) * 0.3 +  # Low sales velocity
    (products_enhanced['days_since_last_sale'].clip(upper=30) / 30) * 0.2  # Stagnation
)

print(f"Dead Stock Risk Distribution:")
print(products_enhanced['is_dead_stock_risk'].value_counts())
print(f"\nPercentage of products at risk: {products_enhanced['is_dead_stock_risk'].mean() * 100:.1f}%")


In [None]:
### Step 2.3: Build Dead Stock Prediction Model

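We'll use a Random Forest Classifier to predict dead stock risk from product features, so that we can proactively identify products that need intervention. Because the labels come from the rule in Step 2.2, which itself uses days until expiry and sales velocity, strong test scores mainly show that the model has recovered that rule.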


In [None]:
# Prepare features for the model
feature_columns = [
    'days_until_expiry', 'total_shelf_life', 'shelf_life_remaining_pct',
    'total_quantity_sold', 'avg_quantity_per_sale', 'number_of_sales',
    'days_since_last_sale', 'sales_velocity', 'price_mrp', 
    'current_discount_percent', 'weight_grams'
]

# One-hot encode categorical variables
categorical_features = ['category', 'diet_type']
products_encoded = pd.get_dummies(products_enhanced, columns=categorical_features, prefix=categorical_features)

# Update feature columns to include encoded features
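# Match on the dummy prefix (e.g. "category_") so unrelated columns are not picked up
encoded_prefixes = tuple(f"{cat}_" for cat in categorical_features)
encoded_columns = [col for col in products_encoded.columns if col.startswith(encoded_prefixes)]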
feature_columns.extend(encoded_columns)

# Prepare X and y
X = products_encoded[feature_columns].fillna(0)
y = products_encoded['is_dead_stock_risk']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set shape: {X_train.shape}")
print(f"Test set shape: {X_test.shape}")
print(f"Class distribution in training set:")
print(y_train.value_counts(normalize=True))


In [None]:
# Train the Random Forest model
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42,
    class_weight='balanced'  # Handle class imbalance
)

rf_model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = rf_model.predict(X_test_scaled)
y_pred_proba = rf_model.predict_proba(X_test_scaled)[:, 1]

# Evaluate the model
print("Model Performance:")
print("="*50)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False).head(10)

print("\nTop 10 Most Important Features:")
print(feature_importance)


In [None]:
## Phase 3: Recommendation System

### Step 3.1: Build User-Product Compatibility Matrix

The recommendation system will match users to products based on:
1. **Dietary compatibility** - Matching user dietary restrictions with product attributes
2. **Allergy safety** - Ensuring products don't contain user allergens
3. **Price sensitivity** - Matching discount preferences
4. **Category preferences** - Recommending from preferred categories
5. **Urgency** - Prioritizing products closer to expiry
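
The first two criteria act as hard filters: a product that fails either one is excluded before any points are awarded. A minimal sketch in plain Python follows. The helper names are hypothetical, the diet ranking mirrors the one used in this notebook's code, and list-valued fields are assumed to arrive as stringified Python lists such as `"['nuts', 'dairy']"`. `ast.literal_eval` is used so that only literals are parsed.

```python
import ast

# Rank diets from most to least restrictive; a user can eat any product
# whose rank is at or below their own.
DIET_RANK = {"vegan": 0, "vegetarian": 1, "eggs": 2, "non-vegetarian": 3}

def parse_list(value):
    """Accept a real list or its string form, e.g. "['nuts', 'dairy']"."""
    if isinstance(value, str):
        return list(ast.literal_eval(value)) if value.strip() else []
    return list(value)

def passes_hard_filters(user_diet, user_allergies, product_diet, product_allergens):
    """Return True only if the product fits the diet and contains no user allergen."""
    diet_ok = DIET_RANK[product_diet] <= DIET_RANK[user_diet]
    allergens = set(parse_list(product_allergens))
    allergy_ok = allergens.isdisjoint(parse_list(user_allergies))
    return diet_ok and allergy_ok

print(passes_hard_filters("vegetarian", "['nuts']", "vegan", "['soy']"))
print(passes_hard_filters("vegetarian", "[]", "eggs", "[]"))
print(passes_hard_filters("non-vegetarian", "['nuts']", "vegan", "['nuts', 'soy']"))
```

In this sketch an unknown diet label raises a `KeyError` instead of falling back to a default, which makes bad data visible early.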


In [None]:
# Create recommendation system functions
def is_diet_compatible(user_diet, product_diet):
    """Check if product diet type is compatible with user's diet"""
    diet_hierarchy = {
        "non-vegetarian": 3,
        "eggs": 2,
        "vegetarian": 1,
        "vegan": 0
    }
    return diet_hierarchy.get(product_diet, 0) <= diet_hierarchy.get(user_diet, 3)

import ast

def _parse_list(value):
    """Parse a stringified list (e.g. "['nuts']") without executing arbitrary code"""
    if isinstance(value, str):
        return ast.literal_eval(value) if value.strip() else []
    return value

def is_allergen_safe(user_allergies, product_allergens):
    """Check if product is safe for user's allergies"""
    user_allergies = _parse_list(user_allergies)
    product_allergens = _parse_list(product_allergens)
    
    return not any(allergen in product_allergens for allergen in user_allergies)

def calculate_user_product_score(user, product, urgency_weight=0.3):
    """
    Calculate compatibility score between user and product
    Higher score = better match
    """
    score = 0
    
    # Diet compatibility (mandatory)
    if not is_diet_compatible(user['diet_type'], product['diet_type']):
        return 0
    
    # Allergen safety (mandatory)
    if not is_allergen_safe(user['allergies'], product['allergens']):
        return 0
    
    # Category preference (0-30 points)
    preferred_cats = _parse_list(user['preferred_categories'])
    
    if product['category'] in preferred_cats:
        score += 30
    
    # Discount preference (0-20 points)
    if user['prefers_discount'] and product['current_discount_percent'] > 0:
        score += min(20, product['current_discount_percent'] / 2)
    elif not user['prefers_discount'] and product['current_discount_percent'] == 0:
        score += 10
    
    # Price range compatibility (0-20 points)
    # Assuming users prefer products in moderate price range
    if 50 <= product['price_mrp'] <= 200:
        score += 20
    elif 200 < product['price_mrp'] <= 350:
        score += 10
    
    # Urgency score for expiring products (0-30 points, then scaled by urgency_weight)
    if product['days_until_expiry'] <= 7:
        urgency_score = 30
    elif product['days_until_expiry'] <= 14:
        urgency_score = 20
    elif product['days_until_expiry'] <= 30:
        urgency_score = 10
    else:
        urgency_score = 0
    
    score += urgency_score * urgency_weight
    
    # Dead stock risk bonus (0-20 points)
    if product.get('is_dead_stock_risk', 0) == 1:
        score += 20
    
    return score

print("Recommendation system functions created successfully!")


In [None]:
### Step 3.2: Generate Recommendations

Now let's create a function that generates personalized recommendations for each user, prioritizing products at risk of becoming dead stock.
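
Selecting the top N comes down to scoring each candidate and keeping the highest scores. The sketch below uses `heapq.nlargest` on hypothetical pre-computed scores. The product names and scores are made up, and the zero-score entry stands for a product that failed a mandatory diet or allergy check.

```python
import heapq

# Hypothetical (product, compatibility score) pairs for one user
scored = [
    ("yogurt", 59.0),
    ("almond milk", 0.0),    # filtered out: failed a mandatory check
    ("bread", 43.0),
    ("spinach", 76.0),
    ("cheese", 26.0),
]

def top_n(scored_products, n=3):
    """Keep compatible products (score > 0) and return the n highest scores."""
    compatible = [(name, score) for name, score in scored_products if score > 0]
    return heapq.nlargest(n, compatible, key=lambda item: item[1])

print(top_n(scored))
```

For small catalogs a full sort works just as well; `nlargest` simply avoids sorting candidates that will be discarded.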


In [None]:
def get_user_recommendations(user_id, products_df, users_df, top_n=5, focus_on_expiring=True):
    """
    Get top N product recommendations for a specific user
    """
    # Get user data
    user = users_df[users_df['user_id'] == user_id].iloc[0]
    
    # Filter products that are still available (not expired)
    available_products = products_df[products_df['days_until_expiry'] > 0].copy()
    
    # If focusing on expiring products, filter to those expiring soon
    if focus_on_expiring:
        available_products = available_products[
            (available_products['days_until_expiry'] <= 30) | 
            (available_products['is_dead_stock_risk'] == 1)
        ]
    
    # Calculate scores for all available products
    scores = []
    for _, product in available_products.iterrows():
        score = calculate_user_product_score(user, product)
        if score > 0:  # Only include compatible products
            scores.append({
                'product_id': product['product_id'],
                'product_name': product['name'],
                'category': product['category'],
                'days_until_expiry': product['days_until_expiry'],
                'current_discount': product['current_discount_percent'],
                'price': product['price_mrp'],
                'compatibility_score': score,
                'is_dead_stock_risk': product['is_dead_stock_risk']
            })
    
    # Sort by score and return top N
    recommendations = sorted(scores, key=lambda x: x['compatibility_score'], reverse=True)[:top_n]
    
    return pd.DataFrame(recommendations)

# Test the recommendation system with a sample user
sample_user_id = users_df['user_id'].iloc[0]
recommendations = get_user_recommendations(sample_user_id, products_enhanced, users_df)

print(f"Recommendations for User {sample_user_id}:")
print("="*80)
print(f"User Profile:")
user_info = users_df[users_df['user_id'] == sample_user_id].iloc[0]
print(f"- Diet Type: {user_info['diet_type']}")
print(f"- Allergies: {user_info['allergies']}")
print(f"- Prefers Discount: {user_info['prefers_discount']}")
print(f"- Preferred Categories: {user_info['preferred_categories']}")
print("\nRecommended Products:")
print(recommendations)


In [None]:
## Phase 4: Integrated Waste Reduction System

### Step 4.1: Create the Complete System

Now we'll integrate both components into a comprehensive waste reduction system that:
1. Identifies products at risk of becoming dead stock
2. Matches these products with the most suitable users
3. Generates targeted recommendations to minimize waste


In [None]:
class WasteReductionSystem:
    """
    Integrated system for predicting dead stock and recommending products to minimize waste
    """
    
    def __init__(self, users_df, products_df, transactions_df, model=None, scaler=None):
        self.users_df = users_df
        self.products_df = products_df
        self.transactions_df = transactions_df
        self.model = model
        self.scaler = scaler
        self.products_enhanced = None
        
    def enhance_product_features(self):
        """Add calculated features to products dataframe"""
        current_date = pd.Timestamp.now()
        
        # Copy products_df to avoid modifying original
        self.products_enhanced = self.products_df.copy()
        
        # Add time-based features
        self.products_enhanced['days_until_expiry'] = (
            self.products_enhanced['expiry_date'] - current_date
        ).dt.days
        
        # Calculate sales metrics
        sales_metrics = self.transactions_df.groupby('product_id').agg({
            'quantity': ['sum', 'mean', 'count'],
            'purchase_date': ['min', 'max'],
            'discount_percent': 'mean'
        }).reset_index()
        
        sales_metrics.columns = ['product_id', 'total_quantity_sold', 'avg_quantity_per_sale',
                                'number_of_sales', 'first_sale_date', 'last_sale_date',
                                'avg_discount_given']
        
        # Calculate sales velocity
        sales_metrics['days_since_last_sale'] = (
            current_date - sales_metrics['last_sale_date']
        ).dt.days
        sales_metrics['days_on_market'] = (
            sales_metrics['last_sale_date'] - sales_metrics['first_sale_date']
        ).dt.days + 1
        sales_metrics['sales_velocity'] = (
            sales_metrics['total_quantity_sold'] / sales_metrics['days_on_market']
        )
        
        # Merge with products
        self.products_enhanced = self.products_enhanced.merge(
            sales_metrics, on='product_id', how='left'
        )
        
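        # Fill NaN values for products with no sales
        # (assign the result back; chained fillna(..., inplace=True) is unreliable in recent pandas)
        self.products_enhanced = self.products_enhanced.fillna({
            'total_quantity_sold': 0,
            'number_of_sales': 0,
            'sales_velocity': 0,
            'days_since_last_sale': 999,  # Large number for never sold
        })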
        
        # Calculate dead stock risk
        self.products_enhanced['is_dead_stock_risk'] = self.products_enhanced.apply(
            calculate_dead_stock_risk, axis=1
        )
        
    def get_dead_stock_products(self, threshold_days=30):
        """Get products at risk of becoming dead stock"""
        if self.products_enhanced is None:
            self.enhance_product_features()
            
        at_risk = self.products_enhanced[
            (self.products_enhanced['is_dead_stock_risk'] == 1) |
            (self.products_enhanced['days_until_expiry'] <= threshold_days)
        ].sort_values('days_until_expiry')
        
        return at_risk[['product_id', 'name', 'category', 'days_until_expiry',
                       'sales_velocity', 'total_quantity_sold', 'current_discount_percent']]
    
    def generate_waste_reduction_recommendations(self, top_n_per_user=3):
        """
        Generate recommendations for all users focusing on products at risk
        """
        if self.products_enhanced is None:
            self.enhance_product_features()
        
        # Get products at risk
        at_risk_products = self.products_enhanced[
            self.products_enhanced['is_dead_stock_risk'] == 1
        ]
        
        all_recommendations = []
        
        for _, user in self.users_df.iterrows():
            user_recs = get_user_recommendations(
                user['user_id'], 
                at_risk_products, 
                self.users_df,
                top_n=top_n_per_user,
                focus_on_expiring=True
            )
            
            if not user_recs.empty:
                user_recs['user_id'] = user['user_id']
                all_recommendations.append(user_recs)
        
        if all_recommendations:
            return pd.concat(all_recommendations, ignore_index=True)
        else:
            return pd.DataFrame()
    
    def calculate_potential_waste_reduction(self):
        """
        Calculate metrics showing potential waste reduction
        """
        if self.products_enhanced is None:
            self.enhance_product_features()
            
        at_risk_products = self.products_enhanced[
            self.products_enhanced['is_dead_stock_risk'] == 1
        ]
        
        metrics = {
            'total_products': len(self.products_enhanced),
            'products_at_risk': len(at_risk_products),
            'percentage_at_risk': len(at_risk_products) / len(self.products_enhanced) * 100,
            'products_expiring_7_days': len(
                self.products_enhanced[self.products_enhanced['days_until_expiry'] <= 7]
            ),
            'products_expiring_30_days': len(
                self.products_enhanced[self.products_enhanced['days_until_expiry'] <= 30]
            ),
            'avg_days_until_expiry_at_risk': at_risk_products['days_until_expiry'].mean()
        }
        
        return metrics

# Initialize the system
waste_reduction_system = WasteReductionSystem(
    users_df=users_df,
    products_df=products_df,
    transactions_df=transactions_df,
    model=rf_model,
    scaler=scaler
)

# Enhance product features
waste_reduction_system.enhance_product_features()

print("Waste Reduction System initialized successfully!")


In [None]:
### Step 4.2: Demonstrate the System

Let's see the waste reduction system in action by:
1. Identifying products at risk
2. Generating targeted recommendations
3. Calculating potential impact
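
One simple way to read the impact is coverage: the share of at-risk products that appear in at least one user's recommendations. The sketch below computes it in plain Python from hypothetical IDs. The variable names and values are illustrative and are not output from this notebook.

```python
# Hypothetical at-risk product IDs and (user_id, product_id) recommendations
at_risk_ids = {"P001", "P002", "P003", "P004"}
sample_recs = [
    ("U1", "P001"),
    ("U1", "P003"),
    ("U2", "P001"),
    ("U3", "P004"),
]

# Products that reached at least one user
recommended_ids = {product_id for _, product_id in sample_recs}
covered = at_risk_ids & recommended_ids
coverage_pct = len(covered) / len(at_risk_ids) * 100 if at_risk_ids else 0.0
uncovered = sorted(at_risk_ids - recommended_ids)

print(f"Coverage: {coverage_pct:.0f}% of at-risk products")
print(f"Not recommended to anyone: {uncovered}")
```

Products left uncovered are candidates for other interventions, such as a deeper discount.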


In [None]:
# 1. Get waste reduction metrics
metrics = waste_reduction_system.calculate_potential_waste_reduction()
print("WASTE REDUCTION METRICS:")
print("="*50)
for key, value in metrics.items():
    if isinstance(value, float):
        print(f"{key}: {value:.2f}")
    else:
        print(f"{key}: {value}")

# 2. Get products at risk of becoming dead stock
print("\n\nPRODUCTS AT RISK OF BECOMING DEAD STOCK:")
print("="*50)
dead_stock_products = waste_reduction_system.get_dead_stock_products()
print(f"Found {len(dead_stock_products)} products at risk")
print("\nTop 10 most urgent products:")
print(dead_stock_products.head(10))


In [None]:
# 3. Generate recommendations to reduce waste
print("\n\nGENERATING WASTE REDUCTION RECOMMENDATIONS:")
print("="*50)
recommendations = waste_reduction_system.generate_waste_reduction_recommendations(top_n_per_user=2)

if not recommendations.empty:
    print(f"Generated {len(recommendations)} total recommendations")
    print(f"Covering {recommendations['user_id'].nunique()} users")
    print(f"Targeting {recommendations['product_id'].nunique()} at-risk products")
    
    # Show sample recommendations
    print("\n\nSAMPLE RECOMMENDATIONS:")
    print("="*50)
    sample_users = recommendations['user_id'].unique()[:3]
    
    for user_id in sample_users:
        user_recs = recommendations[recommendations['user_id'] == user_id]
        user_info = users_df[users_df['user_id'] == user_id].iloc[0]
        
        print(f"\nUser {user_id}:")
        print(f"  Diet: {user_info['diet_type']}, Prefers Discount: {user_info['prefers_discount']}")
        print("  Recommendations:")
        for _, rec in user_recs.iterrows():
            print(f"    - {rec['product_name']} ({rec['category']})")
            print(f"      Days until expiry: {rec['days_until_expiry']}, Score: {rec['compatibility_score']:.1f}")
else:
    print("No recommendations generated (possibly no at-risk products match user preferences)")


In [None]:
### Step 4.3: Visualize System Performance

Let's create visualizations to better understand the system's impact and performance.


In [None]:
# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Distribution of days until expiry
ax1 = axes[0, 0]
products_enhanced['days_until_expiry'].hist(bins=30, ax=ax1, color='skyblue', edgecolor='black')
ax1.axvline(x=30, color='red', linestyle='--', label='30-day threshold')
ax1.set_xlabel('Days Until Expiry')
ax1.set_ylabel('Number of Products')
ax1.set_title('Distribution of Product Expiry Times')
ax1.legend()

# 2. Dead stock risk by category
ax2 = axes[0, 1]
risk_by_category = products_enhanced.groupby('category')['is_dead_stock_risk'].mean() * 100
risk_by_category.plot(kind='bar', ax=ax2, color='coral')
ax2.set_xlabel('Category')
ax2.set_ylabel('% Products at Risk')
ax2.set_title('Dead Stock Risk by Product Category')
ax2.tick_params(axis='x', rotation=45)

# 3. Sales velocity distribution
ax3 = axes[1, 0]
products_enhanced[products_enhanced['sales_velocity'] > 0]['sales_velocity'].hist(
    bins=30, ax=ax3, color='lightgreen', edgecolor='black'
)
ax3.set_xlabel('Sales Velocity (units/day)')
ax3.set_ylabel('Number of Products')
ax3.set_title('Distribution of Sales Velocity (excluding zero sales)')

# 4. Recommendation coverage
ax4 = axes[1, 1]
if not recommendations.empty:
    # One bar per group; a Series keeps the group names on the x-axis
    coverage_data = pd.Series({
        'Total Products': len(products_enhanced),
        'At Risk': int(products_enhanced['is_dead_stock_risk'].sum()),
        'Recommended': recommendations['product_id'].nunique()
    })
    coverage_data.plot(kind='bar', ax=ax4, color=['blue', 'orange', 'green'])
    ax4.set_xlabel('Product Group')
    ax4.set_ylabel('Number of Products')
    ax4.set_title('Recommendation System Coverage')
    ax4.tick_params(axis='x', rotation=0)
else:
    ax4.text(0.5, 0.5, 'No recommendations generated', 
             horizontalalignment='center', verticalalignment='center')
    ax4.set_title('Recommendation System Coverage')

plt.tight_layout()
plt.show()


In [None]:
## Summary and Usage Guide

### System Overview

This Waste Reduction System combines two components:

1. **Dead Stock Prediction Model**
   - Uses Random Forest to predict products at risk of expiring unsold
   - Key features: days until expiry, sales velocity, days since last sale
   - Helps identify products needing immediate attention

2. **Smart Recommendation Engine**
   - Matches at-risk products with compatible users
   - Considers: dietary restrictions, allergies, price preferences, category preferences
   - Prioritizes products closest to expiry

### How to Use the System

```python
# 1. Initialize the system
system = WasteReductionSystem(users_df, products_df, transactions_df)

# 2. Get products at risk
at_risk = system.get_dead_stock_products(threshold_days=30)

# 3. Generate recommendations
recommendations = system.generate_waste_reduction_recommendations(top_n_per_user=5)

# 4. Get metrics
metrics = system.calculate_potential_waste_reduction()
```

### Business Impact

- **Reduced Waste**: Proactively identifies products before they expire
- **Increased Revenue**: Converts potential losses into sales through targeted recommendations
- **Customer Satisfaction**: Users receive personalized deals on products they can actually use
- **Sustainability**: Contributes to reducing food waste and environmental impact

