# Promotion Response Prediction Model

This notebook trains a Random Forest classifier to predict whether a transaction is high-value, which is used here as a proxy for a successful promotion response.

## Objectives:
- Analyze promotion effectiveness
- Build a predictive model for promotion response
- Evaluate model performance
- Provide actionable insights for promotion strategy

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.preprocessing import StandardScaler
import snowflake.connector
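# snowflake.connector.pandas_tools provides write helpers (write_pandas, pd_writer) but no pd_read_sql;
# pandas' own reader is imported under the same name so the query cell below still works
from pandas import read_sql as pd_read_sql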

plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
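# Connect to Snowflake
# NOTE: these are placeholder credentials; do not commit real secrets to a notebook
conn_params = {
    'user': 'workshop_user',
    'password': 'VotreMotDePasse123!',  # placeholder ("your password"), replace with your own
    'account': 'dnb65599',  # account identifier only, without the .snowflakecomputing.com suffix
    'warehouse': 'ANYCOMPANY_WH',
    'database': 'ANYCOMPANY_LAB',
    'schema': 'ANALYTICS'
}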

conn = snowflake.connector.connect(**conn_params)
print("Connected to Snowflake!")

In [None]:
# Load ML features
query = """
SELECT * FROM ANALYTICS.ml_features
"""

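df = pd_read_sql(query, conn)
# Snowflake returns unquoted column names in upper case; lower-case them
# so the lower-case names used in `features` and `target` resolve
df.columns = df.columns.str.lower()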
print(f"Loaded {len(df)} records")
print(df.head())
print("\nData info:")
df.info()  # info() prints its summary itself and returns None

In [None]:
# Data preprocessing
# Features for prediction
features = [
    'month', 'day_of_week', 'is_weekend', 'has_promotion',
    'discount_percentage', 'promotion_duration',
    'avg_amount_region_month', 'transaction_count_region_month',
]

# Target: high value transaction (proxy for promotion success)
target = 'high_value_transaction'

# Prepare data
X = df[features].fillna(0)
y = df[target]

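print(f"Features: {features}")
print(f"Target: {target}")
print(f"Class distribution: {y.value_counts(normalize=True)}")

In [None]:
# Split data first so the scaler never sees the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features: fit on the training split only, then apply to both splits
# (tree ensembles do not need scaling; kept so other model types can reuse these arrays)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)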

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")

In [None]:
# Train Random Forest model
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    class_weight='balanced'
)

rf_model.fit(X_train, y_train)

# Make predictions
y_pred = rf_model.predict(X_test)
y_pred_proba = rf_model.predict_proba(X_test)[:, 1]

print("Model trained successfully!")

In [None]:
# Model evaluation
print("Classification Report:")
print(classification_report(y_test, y_pred))

print(f"ROC AUC Score: {roc_auc_score(y_test, y_pred_proba):.3f}")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Low Value', 'High Value'],
            yticklabels=['Low Value', 'High Value'])
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
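The classification report above uses the default 0.5 decision threshold. With an imbalanced target, a different cut-off may suit the business goal better. The sketch below is one possible approach, not the notebook's method: it picks the threshold that maximizes F1 on a held-out validation split. The data is synthetic, and the names `X_val`, `y_val` and `best_threshold` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), roughly 20% positives
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=42
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

model = RandomForestClassifier(
    n_estimators=100, max_depth=10, random_state=42, class_weight="balanced"
).fit(X_tr, y_tr)
val_proba = model.predict_proba(X_val)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_val, val_proba)

# The final precision/recall pair has no matching threshold, so drop it
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / np.maximum(p + r, 1e-12)  # guard against 0/0

best_idx = int(np.argmax(f1))
best_threshold = thresholds[best_idx]
print(f"Threshold with highest validation F1: {best_threshold:.3f}")

# Apply the chosen cut-off instead of the default 0.5
y_val_pred = (val_proba >= best_threshold).astype(int)
```

Choosing the threshold on a validation split rather than the test set keeps the final test metrics unbiased.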

In [None]:
# Feature importance
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='importance', y='feature', data=feature_importance)
plt.title('Feature Importance for Promotion Response Prediction')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()

print("Top 5 most important features:")
print(feature_importance.head())
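`feature_importances_` is computed from impurity reductions on the training data and can favor features with many distinct values. A common cross-check is permutation importance on held-out data. The sketch below shows the idea on synthetic data; the column names `feature_0` to `feature_7` are placeholders for the real feature list, and the ROC AUC scoring choice is an assumption.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption) with placeholder feature names
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=42
)
demo_features = [f"feature_{i}" for i in range(8)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
model = RandomForestClassifier(
    n_estimators=100, max_depth=10, random_state=42
).fit(X_tr, y_tr)

# Shuffle one column at a time on the test split and measure the drop in ROC AUC
result = permutation_importance(
    model, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=42
)

perm_df = pd.DataFrame({
    "feature": demo_features,
    "importance_mean": result.importances_mean,
    "importance_std": result.importances_std,
}).sort_values("importance_mean", ascending=False)

print(perm_df)
```

Features that rank high in both tables are more likely to be genuine drivers. Large disagreements are worth investigating before drawing business conclusions.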

## Business Insights and Recommendations

### Key Findings:
1. **Promotion Impact**: [Analysis based on model results]
2. **Important Features**: [List key drivers]
3. **Model Performance**: [ROC AUC, precision, recall]

### Recommendations:
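1. **Promotion Strategy**: Prioritize the promotions the model scores as most likely to produce high-value transactions.
2. **Timing**: Use the `month`, `day_of_week` and `is_weekend` effects to decide when to schedule promotions.
3. **Targeting**: Target customers the model scores as likely to respond.
4. **Discount Levels**: Use the `discount_percentage` and `promotion_duration` effects to choose discount depth and length.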
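A simple way to explore the discount recommendation is a what-if sweep: hold the other features at their observed values, set `discount_percentage` to each candidate level, and compare the mean predicted response. The sketch below uses synthetic data with an assumed relationship between discount and response, so the numbers it prints are illustrative only. The result is a model-based association, not a causal estimate.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 2000

# Synthetic stand-in data (assumption): two of the notebook's feature names
demo = pd.DataFrame({
    "discount_percentage": rng.integers(0, 51, size=n).astype(float),
    "is_weekend": rng.integers(0, 2, size=n),
})

# Assumed data-generating rule: response odds rise with discount and on weekends
logit = -2.0 + 0.06 * demo["discount_percentage"] + 0.5 * demo["is_weekend"]
demo["high_value_transaction"] = rng.random(n) < 1 / (1 + np.exp(-logit))

cols = ["discount_percentage", "is_weekend"]
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(demo[cols], demo["high_value_transaction"])

# What-if sweep: vary only the discount, keep every other column as observed
rows = []
for level in [0, 10, 20, 30, 40, 50]:
    what_if = demo[cols].copy()
    what_if["discount_percentage"] = float(level)
    mean_response = model.predict_proba(what_if)[:, 1].mean()
    rows.append({"discount_percentage": level,
                 "mean_predicted_response": mean_response})

sweep = pd.DataFrame(rows)
print(sweep)
```

Candidate discount levels found this way are best confirmed with a controlled experiment before rollout.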

### Next Steps:
- Deploy model for real-time prediction
- A/B test recommended promotions
- Monitor and update model regularly
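A first step toward deployment is persisting the fitted model so a scoring service can load it without retraining. The sketch below uses `joblib`, one common option for scikit-learn estimators. The model is trained on synthetic data, and the file name `promo_response_rf.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory and load it back, as a scoring service would
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "promo_response_rf.joblib")  # hypothetical name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original scores exactly
same_scores = np.allclose(
    model.predict_proba(X_demo), restored.predict_proba(X_demo)
)
print(f"Restored model reproduces scores: {same_scores}")
```

Any fitted preprocessing, such as the scaler, needs to be saved alongside the model. Loading should use the same scikit-learn version that produced the file.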

In [None]:
# Close connection
conn.close()
print("Analysis completed and connection closed!")