# Revenue Forecast Explorer 📊

**Interactive Notebook for Finance Team**

This notebook helps you explore revenue forecasts from Snowflake using:
- **SQL** - Query data directly from Snowflake tables
- **Python** - Create beautiful charts and visualizations  
- **Markdown** - Plain English explanations

---

## What You'll Learn

1. How to connect to Snowflake
2. How to query forecast data
3. How to create charts and visualizations
4. How to export data for presentations

No coding experience required! Just run each cell in order.

## 1. Setup

Import the Python libraries we need for data analysis and visualizations.

**What's happening here?**
- We load Python tools for working with data (pandas) and charts (plotly)
- The Snowflake `session` object is already available in Snowflake Notebooks
- No manual connection needed!

In [None]:
# Import libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Get Snowflake session (automatically available in Snowflake Notebooks)
from snowflake.snowpark.context import get_active_session
session = get_active_session()

print("✅ Libraries loaded successfully!")
print(f"✅ Connected to: {session.get_current_database()}.{session.get_current_schema()}")
print(f"✅ Warehouse: {session.get_current_warehouse()}")

## 2. Explore Available Models

Let's see what forecast models are available in our database.

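**What's this query doing?**
- Lists the 20 most recently updated revenue forecast model runs
- Shows each run's model family, as-of fiscal month, and last update time
- Adds overall accuracy metrics (WAPE, MAE, RMSE, bias, MAPE_EPS, MASE) to help you pick which model to analyze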

In [None]:
# Query: Get list of available forecast models with performance metrics
query = """
WITH model_metrics AS (
    SELECT 
        m.model_run_id,
        MAX(CASE WHEN m.metric_name = 'WAPE' THEN ROUND(m.value * 100, 2) END) AS wape_pct,
        MAX(CASE WHEN m.metric_name = 'MAE' THEN ROUND(m.value, 2) END) AS mae,
        MAX(CASE WHEN m.metric_name = 'RMSE' THEN ROUND(m.value, 2) END) AS rmse,
        MAX(CASE WHEN m.metric_name = 'BIAS' THEN ROUND(m.value * 100, 2) END) AS bias_pct,
        MAX(CASE WHEN m.metric_name = 'MAPE_EPS' THEN ROUND(m.value * 100, 2) END) AS mape_eps_pct,
        MAX(CASE WHEN m.metric_name = 'MASE' THEN ROUND(m.value, 2) END) AS mase
    FROM DB_BI_P_SANDBOX.SANDBOX.FORECAST_MODEL_METRICS m
    WHERE m.metric_scope = 'OVERALL'
        AND m.horizon IS NULL
    GROUP BY m.model_run_id
)
SELECT 
    r.model_run_id,
    r.model_family,
    r.asof_fiscal_yyyymm AS forecast_as_of_month,
    r.updated_at,
    m.wape_pct,
    m.mae,
    m.rmse,
    m.bias_pct,
    m.mape_eps_pct,
    m.mase
FROM DB_BI_P_SANDBOX.SANDBOX.FORECAST_MODEL_RUNS r
LEFT JOIN model_metrics m ON m.model_run_id = r.model_run_id
ORDER BY r.updated_at DESC, m.wape_pct ASC
LIMIT 20
"""

models_df = session.sql(query).to_pandas()
print(f"📊 Found {len(models_df)} recent models")

# Show best performing model (lowest WAPE) if WAPE is available.
# Snowflake returns unquoted column aliases in uppercase, hence 'WAPE_PCT'.
if 'WAPE_PCT' in models_df.columns and models_df['WAPE_PCT'].notna().any():
    best_idx = models_df['WAPE_PCT'].idxmin()
    print(f"🏆 Best performing model: {models_df.loc[best_idx, 'MODEL_RUN_ID']}")
    print(f"   WAPE: {models_df.loc[best_idx, 'WAPE_PCT']:.2f}%")
    print(f"   MAE: {models_df.loc[best_idx, 'MAE']:.2f}")
    print(f"   RMSE: {models_df.loc[best_idx, 'RMSE']:.2f}")
else:
    print("⚠️  Metrics not available for these models")

models_df
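
**How to read the metrics:** The metric columns above are easier to interpret with a small worked example. The sketch below computes MAE, RMSE, WAPE, and bias on made-up numbers using their common textbook definitions. How the values in `FORECAST_MODEL_METRICS` were actually calculated is not shown in this notebook, so treat the formulas (especially the sign convention for bias) as assumptions.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not from Snowflake)
toy = pd.DataFrame({
    "actual":   [100.0, 200.0, 300.0],
    "forecast": [110.0, 190.0, 330.0],
})

# Assumed sign convention: positive error = forecast above actual
err = toy["forecast"] - toy["actual"]

mae = err.abs().mean()                                   # average absolute miss, in dollars
rmse = np.sqrt((err ** 2).mean())                        # like MAE, but penalizes large misses more
wape_pct = err.abs().sum() / toy["actual"].abs().sum() * 100  # total absolute miss as % of total actuals
bias_pct = err.sum() / toy["actual"].sum() * 100         # net over- or under-forecast as % of actuals

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"WAPE: {wape_pct:.2f}%")
print(f"Bias: {bias_pct:.2f}%")
```

Lower WAPE, MAE, and RMSE mean a closer fit. Under the sign convention assumed here, a positive bias means the model over-forecasts on average.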

## 3. Load Forecast Data

Now let's load the forecast data for the most recent model run. The query keeps only rows with `HORIZON = 1` and returns at most 50,000 rows.

**💡 Tip:** Copy a `model_run_id` from the table above and paste it below to analyze a specific model.

In [None]:
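# Pick the most recent model (or paste a specific model_run_id here)
if models_df.empty:
    raise ValueError("No forecast models found. Re-run the query in section 2 first.")
selected_model = models_df.iloc[0]['MODEL_RUN_ID']
print(f"📌 Analyzing model: {selected_model}")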

# Query: Load forecast data for selected model
forecast_query = f"""
SELECT 
    FORECAST_MONTH,
    FORECAST_FISCAL_YEAR,
    FORECAST_FISCAL_PERIOD,
    ROLL_UP_SHOP AS PC,
    ROLL_UP_SHOP_NAME AS PC_NAME,
    CUSTOMER_GROUP,
    REASON_CODE_GROUP,
    FORECAST_AMOUNT AS FORECAST,
    FORECAST_LO80 AS FORECAST_LOW,
    FORECAST_HI80 AS FORECAST_HIGH,
    Y_TRUE AS ACTUAL
FROM DB_BI_P_SANDBOX.SANDBOX.vw_forecast_report_mart
WHERE MODEL_RUN_ID = '{selected_model}'
    AND HORIZON = 1
LIMIT 50000
"""

forecast_df = session.sql(forecast_query).to_pandas()
print(f"✅ Loaded {len(forecast_df):,} forecast rows")
print(f"📅 Date range: {forecast_df['FORECAST_MONTH'].min()} to {forecast_df['FORECAST_MONTH'].max()}")
forecast_df.head()
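
**Optional: a safer way to pass the model ID.** The cell above builds the SQL with an f-string, which works for IDs copied from the table but breaks if a pasted value contains a quote character. Below is a minimal sketch of an alternative. It assumes your Snowpark version supports the `params` argument of `session.sql` with `?` placeholders. `load_forecast` is a hypothetical helper, and the column list is shortened for brevity. The helper also warns when the row limit is reached, since a truncated result would understate totals.

```python
import pandas as pd

# Shortened column list for illustration; the full query is in the cell above
FORECAST_SQL = """
SELECT FORECAST_MONTH, FORECAST_AMOUNT AS FORECAST
FROM DB_BI_P_SANDBOX.SANDBOX.vw_forecast_report_mart
WHERE MODEL_RUN_ID = ?
  AND HORIZON = ?
LIMIT {row_limit}
"""

def load_forecast(session, model_run_id, horizon=1, row_limit=50_000):
    """Hypothetical helper: load forecast rows using bound parameters."""
    row_limit = int(row_limit)  # only an integer is ever formatted into the SQL text
    sql = FORECAST_SQL.format(row_limit=row_limit)
    # The model ID and horizon are sent as bind values, not pasted into the SQL
    df = session.sql(sql, params=[model_run_id, horizon]).to_pandas()
    if len(df) >= row_limit:
        print(f"Warning: result hit the {row_limit:,}-row limit and may be incomplete")
    return df
```

In the notebook you would call it as `forecast_df = load_forecast(session, selected_model)`.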

## 4. Monthly Revenue Forecast - Big Picture 📈

Let's see the total forecasted revenue by month across all product centers and customer groups.

**What you're looking at:**
- **Blue line** = Forecasted revenue for each month
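- **Gray shaded area** = 80% confidence interval (the range the model expects revenue to fall in about 80% of the time). The chart sums the low and high bounds across all segments, so treat the band as an approximation.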
- **Green dots** = Actual revenue (when available)

In [None]:
# Aggregate forecast by month
monthly_forecast = forecast_df.groupby('FORECAST_MONTH').agg({
    'FORECAST': 'sum',
    'FORECAST_LOW': 'sum',
    'FORECAST_HIGH': 'sum',
    'ACTUAL': 'sum'
}).reset_index()

# Create interactive chart
fig = go.Figure()

# Add confidence interval (shaded area)
fig.add_trace(go.Scatter(
    x=monthly_forecast['FORECAST_MONTH'],
    y=monthly_forecast['FORECAST_HIGH'],
    mode='lines',
    line=dict(width=0),
    showlegend=False,
    hoverinfo='skip'
))
fig.add_trace(go.Scatter(
    x=monthly_forecast['FORECAST_MONTH'],
    y=monthly_forecast['FORECAST_LOW'],
    mode='lines',
    line=dict(width=0),
    fillcolor='rgba(68, 68, 68, 0.2)',
    fill='tonexty',
    name='80% Confidence Interval'
))

# Add forecast line
fig.add_trace(go.Scatter(
    x=monthly_forecast['FORECAST_MONTH'],
    y=monthly_forecast['FORECAST'],
    mode='lines+markers',
    name='Forecast',
    line=dict(color='#1f77b4', width=3)
))

# Add actuals (where available).
# Summing a month with no actuals yields 0 rather than NaN, so also require ACTUAL > 0.
actuals_mask = monthly_forecast['ACTUAL'].notna() & (monthly_forecast['ACTUAL'] > 0)
if actuals_mask.any():
    fig.add_trace(go.Scatter(
        x=monthly_forecast.loc[actuals_mask, 'FORECAST_MONTH'],
        y=monthly_forecast.loc[actuals_mask, 'ACTUAL'],
        mode='markers',
        name='Actual',
        marker=dict(size=10, color='green')
    ))

fig.update_layout(
    title="Monthly Revenue Forecast - All Product Centers",
    xaxis_title="Month",
    yaxis_title="Revenue ($)",
    hovermode='x unified',
    height=500
)

fig.show()

# Summary stats
total_forecast = monthly_forecast['FORECAST'].sum()
print(f"\n💰 Total Forecasted Revenue: ${total_forecast:,.2f}")
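
**Why the chart filters on `ACTUAL > 0`:** By default, pandas sums a group whose values are all missing to 0, so future months would otherwise show up as "actual revenue of $0". The sketch below demonstrates this on made-up data and shows one alternative, `min_count=1`, which keeps such months as missing. It illustrates pandas behavior only and is not a required change to the cell above.

```python
import numpy as np
import pandas as pd

# Toy data: January has actuals, February does not yet
toy = pd.DataFrame({
    "FORECAST_MONTH": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "FORECAST": [100.0, 50.0, 120.0, 60.0],
    "ACTUAL": [90.0, 55.0, np.nan, np.nan],
})

# Default behavior: the all-missing month sums to 0.0
default_sum = toy.groupby("FORECAST_MONTH")["ACTUAL"].sum()

# With min_count=1, a month needs at least one real value, otherwise it stays NaN
monthly = toy.groupby("FORECAST_MONTH").agg(
    FORECAST=("FORECAST", "sum"),
    ACTUAL=("ACTUAL", lambda s: s.sum(min_count=1)),
).reset_index()

print(default_sum)
print(monthly)
```

With the second form, `monthly['ACTUAL'].notna()` alone is enough to find months that have actuals, and a genuine month of zero revenue would still be plotted.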

## 5. Top Product Centers by Revenue 🏆

Which product centers (PCs) generate the most revenue?

**Why this matters:**
- Helps identify where to focus attention
- Shows which locations drive the business
- Useful for resource allocation decisions

In [None]:
# Aggregate forecast by PC
pc_forecast = forecast_df.groupby(['PC', 'PC_NAME']).agg({
    'FORECAST': 'sum'
}).reset_index().sort_values('FORECAST', ascending=False)

# Get top 15 PCs
top_pcs = pc_forecast.head(15)

# Create bar chart
fig = px.bar(
    top_pcs,
    x='FORECAST',
    y='PC_NAME',
    orientation='h',
    title='Top 15 Product Centers by Forecasted Revenue',
    labels={'FORECAST': 'Forecasted Revenue ($)', 'PC_NAME': 'Product Center'},
    color='FORECAST',
    color_continuous_scale='Blues'
)

fig.update_layout(
    height=600,
    showlegend=False,
    yaxis={'categoryorder': 'total ascending'}
)

fig.show()

print("\n📊 Top 3 Product Centers:")
for _, row in top_pcs.head(3).iterrows():
    print(f"   {row['PC']} - {row['PC_NAME']}: ${row['FORECAST']:,.2f}")

## 6. Revenue by Customer Group 🏢

How is revenue distributed across different types of customers?

**Customer Groups Explained:**
- Different customer segments (e.g., TRANS, other categories)
- Helps understand customer mix
- Useful for targeting and marketing strategies

In [None]:
# Aggregate by customer group
customer_forecast = forecast_df.groupby('CUSTOMER_GROUP').agg({
    'FORECAST': 'sum'
}).reset_index().sort_values('FORECAST', ascending=False)

# Create pie chart
fig = px.pie(
    customer_forecast,
    values='FORECAST',
    names='CUSTOMER_GROUP',
    title='Revenue Distribution by Customer Group',
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(height=500)

fig.show()

print("\n📊 Revenue by Customer Group:")
customer_total = customer_forecast['FORECAST'].sum()
for _, row in customer_forecast.iterrows():
    pct = (row['FORECAST'] / customer_total) * 100
    print(f"   {row['CUSTOMER_GROUP']}: ${row['FORECAST']:,.2f} ({pct:.1f}%)")
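
**Optional: keep the percentages in the table.** The loop above computes each group's share only for printing. A minimal sketch of storing the share as a column instead, so it travels with the data when exported. The group names other than `TRANS` and all amounts are made up for illustration.

```python
import pandas as pd

# Toy data; OTHER_A and OTHER_B are hypothetical group names
toy = pd.DataFrame({
    "CUSTOMER_GROUP": ["TRANS", "OTHER_A", "OTHER_B"],
    "FORECAST": [600.0, 300.0, 100.0],
})

# Share of total forecast, computed for all rows at once
toy["SHARE_PCT"] = toy["FORECAST"] / toy["FORECAST"].sum() * 100

for row in toy.itertuples(index=False):
    print(f"{row.CUSTOMER_GROUP}: ${row.FORECAST:,.2f} ({row.SHARE_PCT:.1f}%)")
```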

## 7. Revenue Trends by Reason Code 📝

How does forecasted revenue split between routine and non-routine work, and how does that split change over time?

**Reason Code Groups:**
- **Routine** - Regular, predictable work
- **Non-Routine** - Special projects, emergency work
- Helps with workforce and resource planning

In [None]:
# Aggregate by month and reason code group
reason_monthly = forecast_df.groupby(['FORECAST_MONTH', 'REASON_CODE_GROUP']).agg({
    'FORECAST': 'sum'
}).reset_index()

# Create stacked area chart
fig = px.area(
    reason_monthly,
    x='FORECAST_MONTH',
    y='FORECAST',
    color='REASON_CODE_GROUP',
    title='Revenue Forecast by Reason Code Group Over Time',
    labels={'FORECAST': 'Revenue ($)', 'FORECAST_MONTH': 'Month'}
)

fig.update_layout(height=500, hovermode='x unified')
fig.show()

# Summary by reason code
reason_summary = forecast_df.groupby('REASON_CODE_GROUP').agg({
    'FORECAST': 'sum'
}).reset_index().sort_values('FORECAST', ascending=False)

print("\n📊 Total Revenue by Reason Code:")
reason_total = reason_summary['FORECAST'].sum()
for _, row in reason_summary.iterrows():
    pct = (row['FORECAST'] / reason_total) * 100
    print(f"   {row['REASON_CODE_GROUP']}: ${row['FORECAST']:,.2f} ({pct:.1f}%)")
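
**Optional: monthly mix in percent.** The stacked area chart shows dollar amounts, which can hide a shift in the routine versus non-routine mix when total revenue also changes. A small sketch, on made-up numbers, of turning the monthly totals into percentage shares per month. The group labels are taken from the list above; the amounts are illustrative.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "FORECAST_MONTH": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "REASON_CODE_GROUP": ["Routine", "Non-Routine", "Routine", "Non-Routine"],
    "FORECAST": [80.0, 20.0, 60.0, 40.0],
})

# One row per month, one column per reason code group
wide = toy.pivot_table(
    index="FORECAST_MONTH",
    columns="REASON_CODE_GROUP",
    values="FORECAST",
    aggfunc="sum",
)

# Divide each row by its own total so every month adds up to 100%
mix_pct = wide.div(wide.sum(axis=1), axis=0) * 100
print(mix_pct.round(1))
```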

## 8. Deep Dive: Specific Product Center

Let's zoom into a specific product center to see monthly trends.

**How to use this:**
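1. Look at the Top PCs chart and the printed top 3 list above
2. Copy a PC code (e.g., "555") from the printed list or from the `top_pcs` table
3. In the code below, replace the value assigned to `selected_pc` with that code to analyze that specific location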

In [None]:
# Pick a specific PC to analyze (change this to any PC code)
selected_pc = top_pcs.iloc[0]['PC']  # Uses top PC by default
print(f"📍 Analyzing PC: {selected_pc}")

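# Filter data for this PC
pc_data = forecast_df[forecast_df['PC'] == selected_pc].copy()
if pc_data.empty:
    raise ValueError(f"No forecast rows found for PC {selected_pc!r}. Check the code and its type (text vs. number).")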
pc_monthly = pc_data.groupby('FORECAST_MONTH').agg({
    'FORECAST': 'sum',
    'FORECAST_LOW': 'sum',
    'FORECAST_HIGH': 'sum'
}).reset_index()

# Create chart
fig = go.Figure()

# Confidence interval
fig.add_trace(go.Scatter(
    x=pc_monthly['FORECAST_MONTH'],
    y=pc_monthly['FORECAST_HIGH'],
    mode='lines',
    line=dict(width=0),
    showlegend=False,
    hoverinfo='skip'
))
fig.add_trace(go.Scatter(
    x=pc_monthly['FORECAST_MONTH'],
    y=pc_monthly['FORECAST_LOW'],
    mode='lines',
    line=dict(width=0),
    fillcolor='rgba(255, 165, 0, 0.2)',
    fill='tonexty',
    name='80% Confidence'
))

# Forecast line
fig.add_trace(go.Scatter(
    x=pc_monthly['FORECAST_MONTH'],
    y=pc_monthly['FORECAST'],
    mode='lines+markers',
    name='Forecast',
    line=dict(color='orange', width=3),
    marker=dict(size=8)
))

pc_name = pc_data.iloc[0]['PC_NAME']
fig.update_layout(
    title=f"Monthly Forecast for PC {selected_pc} - {pc_name}",
    xaxis_title="Month",
    yaxis_title="Revenue ($)",
    hovermode='x unified',
    height=500
)

fig.show()

total_pc_forecast = pc_monthly['FORECAST'].sum()
print(f"\n💰 Total Forecast for PC {selected_pc}: ${total_pc_forecast:,.2f}")
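
**If your pasted PC code is not found:** A common cause is a type mismatch, for example typing `555` as a number when the column stores the text `"555"`. Below is a minimal sketch of a lookup that compares codes as text and fails with a readable message. `filter_pc` is a hypothetical helper, and it assumes PC codes are stored as text or whole numbers.

```python
import pandas as pd

def filter_pc(df, pc_code):
    """Hypothetical helper: return rows for one PC, comparing codes as trimmed text."""
    wanted = str(pc_code).strip()
    codes = df["PC"].astype(str).str.strip()
    mask = codes == wanted
    if not mask.any():
        sample = sorted(codes.unique())[:10]
        raise ValueError(f"PC {wanted!r} not found. Some available codes: {sample}")
    return df.loc[mask].copy()

# Toy data for illustration only
toy = pd.DataFrame({
    "PC": ["555", "555", "601"],
    "PC_NAME": ["Shop A", "Shop A", "Shop B"],
    "FORECAST": [10.0, 20.0, 5.0],
})

pc_rows = filter_pc(toy, 555)  # a number still matches the text code "555"
print(len(pc_rows), "rows for PC 555")
```

In the notebook, `pc_data = filter_pc(forecast_df, selected_pc)` would take the place of the filter line.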

## 9. Export Data for Presentations 💾

Save your analysis results to files that you can share with others.

**What gets saved:**
- Monthly forecast summary (CSV file)
- Top PCs (CSV file)
- Customer group breakdown (CSV file)

Files are saved to the current directory and ready to attach to emails or presentations!

In [None]:
from datetime import datetime
import os

# Create timestamp for filenames
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Export monthly forecast summary
monthly_file = f"monthly_forecast_{timestamp}.csv"
monthly_forecast.to_csv(monthly_file, index=False)
print(f"✅ Saved: {monthly_file}")

# Export top PCs
top_pcs_file = f"top_product_centers_{timestamp}.csv"
top_pcs.to_csv(top_pcs_file, index=False)
print(f"✅ Saved: {top_pcs_file}")

# Export customer group breakdown
customer_file = f"customer_groups_{timestamp}.csv"
customer_forecast.to_csv(customer_file, index=False)
print(f"✅ Saved: {customer_file}")

print(f"\n📁 All files saved to: {os.getcwd()}")
print("\n💡 Tip: You can now attach these CSV files to emails or import them into Excel!")
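
**Optional: export everything in one step.** The cell above repeats the same three lines for each table. The sketch below wraps that pattern in a function that writes any number of tables into one folder. `export_frames` and the folder name `exports` are hypothetical choices, and the demo writes to a temporary folder so it leaves nothing behind.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

def export_frames(frames, out_dir="exports"):
    """Hypothetical helper: write each DataFrame to <out_dir>/<name>_<timestamp>.csv."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    written = []
    for name, df in frames.items():
        target = out_path / f"{name}_{stamp}.csv"
        df.to_csv(target, index=False)
        written.append(target)
    return written

# Demo on toy data, written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    demo = {"monthly_forecast": pd.DataFrame({"FORECAST": [1.0, 2.0]})}
    demo_files = export_frames(demo, out_dir=Path(tmp) / "exports")
    demo_names = [f.name for f in demo_files]
    print(demo_names)
```

In the notebook, the equivalent call would be `export_frames({"monthly_forecast": monthly_forecast, "top_product_centers": top_pcs, "customer_groups": customer_forecast})`.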