## Deployment to Production

What could go wrong?
1. Data integrity issues: upstream system errors, transformation errors, schema changes. Examples: a weather API changes its units from metric to imperial; solar panel readings stop arriving after maintenance.
2. Operational failures: code bugs, missing dependencies, incorrect logic. Examples: a forecast calculation fails with a division-by-zero error; the pipeline crashes after an incompatible pandas version update.
3. Infrastructure issues: network failures, insufficient compute resources, system limitations. Examples: the pipeline times out after a cloud provider network outage lasting 3 hours; GPU memory allocation fails during model training because resources are insufficient.
4. Model drift: performance decline, new patterns in data, missing features. Examples: model accuracy drops after 2 months in production because of seasonal changes; a new weather pattern (e.g., a sudden temperature drop) not captured in the training data leads to poor forecasts.



### Observability
Collecting evidence and data about how the system behaves. Examples: the application writes detailed execution logs with timestamps to a centralized log file; the API tracks and records response times for every forecast request.
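
As a rough illustration of the two examples above, the sketch below writes timestamped log records and measures how long a call takes. It uses only the standard `logging` and `time` modules. The logger name `forecast_pipeline`, the log format, and the helper names `configure_logging` and `timed_call` are assumptions made for this example, not part of any existing pipeline.

```python
import logging
import time

# Hypothetical logger name and format, chosen for illustration only.
logger = logging.getLogger("forecast_pipeline")


def configure_logging(log_path=None):
    """Send timestamped records to a file if log_path is given, else to stderr."""
    handler = logging.FileHandler(log_path) if log_path else logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def timed_call(func, *args, **kwargs):
    """Run func, log how long it took, and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception records the traceback alongside the message.
        logger.exception("call to %s failed", func.__name__)
        raise
    elapsed = time.perf_counter() - start
    logger.info("call to %s took %.4f s", func.__name__, elapsed)
    return result, elapsed
```

Wrapping the forecast request handler in `timed_call` would give one log line per request, which a monitoring layer can later aggregate.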

### Monitoring
Analyzing the collected data and triggering alerts. Examples: Prometheus alerts when forecast model accuracy drops below 85%; a Slack notification fires when the daily data ingestion job fails.
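
The alert rules in these examples can be sketched as plain functions, independent of the tool that delivers the alert. This is a minimal sketch: `evaluate_alerts` and `dispatch` are hypothetical names, and the `notify` callback stands in for whatever channel is actually used (it is not a Prometheus or Slack API call).

```python
def evaluate_alerts(accuracy, ingestion_ok, accuracy_threshold=0.85):
    """Return a list of alert messages for the current pipeline state."""
    alerts = []
    if accuracy < accuracy_threshold:
        alerts.append(
            f"Forecast accuracy {accuracy:.1%} is below {accuracy_threshold:.0%}"
        )
    if not ingestion_ok:
        alerts.append("Daily data ingestion job failed")
    return alerts


def dispatch(alerts, notify=print):
    """Pass each alert to notify (a placeholder for a real channel)."""
    for message in alerts:
        notify(message)
    return len(alerts)


if __name__ == "__main__":
    dispatch(evaluate_alerts(accuracy=0.82, ingestion_ok=False))
```

Keeping rule evaluation separate from delivery makes the thresholds easy to test without sending real notifications.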

## Pipeline Monitoring Strategy
1. Mapping components 
2. Analyzing risks: (a) well-defined risks (e.g., negative electricity demand); (b) ambiguous risks (e.g., a sudden demand spike due to an unexpected heatwave)
3. Developing a mitigation plan: (a) logs to capture errors; (b) KPIs; (c) success/failure
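
One way to treat the two risk types differently is to reject values that break a hard rule and only flag values that look unusual. The sketch below is an assumption-laden illustration: `check_demand` is a hypothetical helper, and the `spike_factor` of 2.0 times the series median is an arbitrary choice, not a recommended setting.

```python
from statistics import median


def check_demand(readings, spike_factor=2.0):
    """Split demand readings into hard errors and soft warnings.

    readings: demand values in time order.
    spike_factor: hypothetical multiplier; values above spike_factor times
    the series median are flagged for review rather than rejected.
    """
    errors, warnings = [], []
    baseline = median(readings) if readings else 0.0
    for i, value in enumerate(readings):
        if value < 0:
            # Well-defined risk: electricity demand can never be negative.
            errors.append((i, value))
        elif baseline > 0 and value > spike_factor * baseline:
            # Ambiguous risk: could be a real heatwave or a bad reading.
            warnings.append((i, value))
    return errors, warnings
```

Errors would typically stop the pipeline or trigger a fallback, while warnings are logged and reviewed.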

* Data integrity monitoring- the transaction amount returns null values; date formats vary between API endpoints
* Availability monitoring- API requests time out after 60 seconds during peak hours; the API experiences 15-minute service outages weekly
* Restatements monitoring- last week's transaction totals were revised 3 days later
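
The data integrity and restatement checks above can be sketched with the standard library alone. Everything named here is an assumption for illustration: the record layout (`date` and `amount` keys), the two date formats in `KNOWN_DATE_FORMATS`, and the helpers `integrity_issues` and `restated_days`.

```python
from datetime import datetime

# Date formats assumed for illustration; a real feed may use others.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return a date parsed with the first matching known format, else None."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def integrity_issues(records):
    """List (row_index, problem) pairs for null amounts and unparseable dates."""
    issues = []
    for i, row in enumerate(records):
        if row.get("amount") is None:
            issues.append((i, "null amount"))
        if parse_date(row.get("date") or "") is None:
            issues.append((i, "unrecognized date format"))
    return issues


def restated_days(previous, current, tolerance=1e-9):
    """Return days whose total differs between two snapshots of daily totals."""
    return sorted(
        day
        for day in previous.keys() & current.keys()
        if abs(previous[day] - current[day]) > tolerance
    )
```

Comparing a stored snapshot with a fresh pull of the same period is what surfaces restatements; without the stored snapshot, a revised total is indistinguishable from the original.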

### Model Drift
* Sudden drift- Unexpected, rapid shift, caused by external events introducing structural breaks
* Gradual drift- Data distribution slowly changes, new behaviours replacing old patterns
* Recurring drift- Seasonal patterns, cyclical trends

Other causes: (1) data integrity (input data issues); (2) feature engineering.
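
Rolling averages of the error metric make sustained drift easier to see than the raw series, because they smooth out one-off spikes. The sketch below assumes the forecast log is a pandas DataFrame with one row per forecast and a `mape` column stored as a fraction, matching the column names used by the plotting code in this section. The helper name `add_rolling_means` and the sample values are hypothetical.

```python
import pandas as pd


def add_rolling_means(log, column, windows=(7, 14)):
    """Return a copy of log with <column>_ma_<window> rolling-mean columns."""
    out = log.copy()
    for window in windows:
        out[f"{column}_ma_{window}"] = out[column].rolling(window=window).mean()
    return out


# Hypothetical log: error jumps from 2% to 9% halfway through (sudden drift).
fc_log = pd.DataFrame({
    "forecast_start": pd.date_range("2024-01-01", periods=14, freq="D"),
    "mape": [0.02] * 7 + [0.09] * 7,
})
fc_log = add_rolling_means(fc_log, "mape")
```

The first `window - 1` rows of each rolling column are `NaN`, so a 14-day average only becomes available after two weeks of logged forecasts.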




In [None]:
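# Identify drift: plot MAPE and its rolling averages against a fixed threshold
import plotly.graph_objects as go

threshold = 3  # alert level, in MAPE percentage points
p = go.Figure()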

p.add_trace(go.Scatter(x=fc_log['forecast_start'],
                       y=100*fc_log['mape'],
                       mode='lines',
                       name='MAPE (%)',
                       line = dict(color='royalblue', width=2)))

p.update_layout(title='Model Error Rate Since Deployment',
                xaxis_title='Forecast Date',
                yaxis_title='MAPE (%)')

p.add_shape(type='line',
            x0=fc_log['forecast_start'].min(),
            x1=fc_log['forecast_start'].max(),
            y0=threshold,
            y1=threshold,
            line=dict(color='Red', width=2, dash='dash'))

p.add_trace(go.Scatter(x=fc_log['forecast_start'],
                       y=100*fc_log['mape_ma_7'],
                       mode='lines', name='7 Days MA',
                       line = dict(color='green', width=2)))

p.add_trace(go.Scatter(x=fc_log['forecast_start'],
                       y=100*fc_log['mape_ma_14'],
                       mode='lines', name='14 Days MA',
                       line = dict(color='orange', width=2)))  # distinct from the 7-day line

p.show()



In [None]:
# Set threshold: mean + 3 standard deviations
rmse_threshold = fc_log_test["rmse"].mean() + 3 * fc_log_test["rmse"].std()

# Create rolling window averages for RMSE
fc_log["rmse_ma_7"] = fc_log["rmse"].rolling(window=7).mean()
fc_log["rmse_ma_14"] = fc_log["rmse"].rolling(window=14).mean()

print(f"RMSE threshold: {round(rmse_threshold, 2)}")
print()
print("Forecast log with rolling averages:")
print(fc_log[["forecast_start", "rmse", "rmse_ma_7", "rmse_ma_14"]].head(20))

p = go.Figure()

# Add RMSE line
p.add_trace(go.Scatter(x=fc_log["forecast_start"], y=fc_log["rmse"],
                        mode='lines',
                        name='RMSE',
                        line=dict(color='royalblue', width=2)))

# Add the RMSE rolling windows for 7 and 14 days
p.add_trace(go.Scatter(x=fc_log["forecast_start"], y=fc_log["rmse_ma_7"],
                        mode='lines',
                        name='7 Days MA',
                        line=dict(color='green', width=2)))

p.add_trace(go.Scatter(x=fc_log["forecast_start"], y=fc_log["rmse_ma_14"],
                        mode='lines',
                        name='14 Days MA',
                        line=dict(color='orange', width=2)))

# Add the threshold as a horizontal dashed line
p.add_trace(go.Scatter(x=[fc_log["forecast_start"].min(), fc_log["forecast_start"].max()],
                        y=[rmse_threshold, rmse_threshold],
                        name="Threshold",
                        line=dict(color="red", width=2, dash="dash")))

# Add plot titles and show the plot
p.update_layout(title="Forecast Error Rate Over Time",
                xaxis_title="Forecast Date",
                yaxis_title="RMSE", 
                height=400,
                title_x=0.5,
                margin=dict(t=50, b=50, l=50, r=50))
p.show()
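
The same threshold can be checked programmatically instead of by eye. The sketch below is one possible rule, not the only one: it reports the first date on which a rolling-average column stays above the threshold for several consecutive forecasts. The helper `first_breach` and its `min_consecutive` parameter are hypothetical, and the sample values are made up for illustration.

```python
import pandas as pd


def first_breach(log, column, threshold, min_consecutive=3):
    """Return the forecast_start that begins the first run of min_consecutive
    rows where log[column] exceeds threshold, or None if no such run exists."""
    # NaN values (start of a rolling window) compare as False, so they count as 0.
    above = (log[column] > threshold).astype(int)
    # The rolling sum equals min_consecutive only when every row in the window is above.
    run_complete = above.rolling(window=min_consecutive).sum() == min_consecutive
    if not run_complete.any():
        return None
    end_pos = run_complete.to_numpy().argmax()  # position of the first True
    return log["forecast_start"].iloc[end_pos - min_consecutive + 1]


# Hypothetical rolling-average values; a single spike on day 3 is ignored.
sample = pd.DataFrame({
    "forecast_start": pd.date_range("2024-01-01", periods=8, freq="D"),
    "rmse_ma_7": [None, 1.0, 5.0, 1.0, 5.0, 5.0, 5.0, 1.0],
})
breach_date = first_breach(sample, "rmse_ma_7", threshold=4.0)
```

Requiring several consecutive breaches trades detection speed for fewer false alarms; `min_consecutive=1` reduces the rule to a plain threshold check.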


## Best Practices for Production Deployment
1. Reproducibility- version control, environment management, data versioning, containers 
2. Deployment as code- automate with scripts
3. Staging vs production- use a staging environment to test changes before production
4. Prototype- prototype the pipeline o