# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Time Series Forecasting - Solar Energy Prediction
**Instructor:** Amir Charkhi | **Dataset:** Australian Distributed Solar PV Generation

---

## 📚 What You'll Learn

- Time series components (trend, seasonality, residuals)
- Stationarity and why it matters
- Multiple forecasting approaches
- Evaluation metrics for forecasts
- Real-world energy forecasting

---

## 🎯 The Problem: Predicting Solar Generation

**Business Context:** Energy grid operators need to predict solar generation so they can:
- Plan backup power sources
- Balance supply and demand
- Optimize energy storage

**Your Task:** Forecast solar generation for the next day using historical data

**Dataset:** Australian distributed solar PV generation (30-minute intervals)

---

## 1. Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Plotly for interactive visualizations
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

print("✅ Libraries loaded!")

---
## 2. Load and Explore Time Series Data

### 📊 What Makes Time Series Special?

**Regular Data:**
```
Customer | Age | Purchase
   A     | 25  |   Yes
   B     | 30  |   No
```
→ **Order doesn't matter**

**Time Series:**
```
Time    | Solar Generation
8:00 AM |    800 MW
8:30 AM |    890 MW
9:00 AM |   1346 MW
```
→ **Order MATTERS! Time dependency is key**
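
One way to see why order matters is to compare how strongly each point relates to the one before it (the lag-1 autocorrelation), before and after shuffling. This is a minimal sketch on a synthetic sine wave standing in for a daily solar cycle, not the real dataset; `lag1_autocorr` is a hypothetical helper written for this illustration.

```python
import numpy as np

def lag1_autocorr(values):
    """Correlation between each point and the point right before it."""
    values = np.asarray(values, dtype=float)
    return np.corrcoef(values[:-1], values[1:])[0, 1]

# Synthetic stand-in for two weeks of hourly data with a 24-hour cycle
t = np.arange(24 * 14)
ordered = np.sin(2 * np.pi * t / 24)

# Shuffling keeps the same values but destroys their order
rng = np.random.default_rng(42)
shuffled = rng.permutation(ordered)

print(f"Ordered:  {lag1_autocorr(ordered):.2f}")
print(f"Shuffled: {lag1_autocorr(shuffled):.2f}")
```

The ordered series should score close to 1 and the shuffled one close to 0: the values are identical, but the information carried by their order is gone.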

In [None]:
# Load data
df = pd.read_csv('distributed-pv-2025.csv')

# Parse datetime
df['datetime'] = pd.to_datetime(df['Trading Interval'])
df['generation_mw'] = df['Estimated DPV Generation (MW)']

# Keep only essential columns
df = df[['datetime', 'generation_mw']].sort_values('datetime')

print(f"📊 Dataset: {len(df):,} observations")
print(f"📅 Period: {df['datetime'].min()} to {df['datetime'].max()}")
print("⏰ Frequency: 30-minute intervals")
print(f"\n{df.head()}")

### 📈 Visualize the Time Series

In [None]:
# Interactive time series plot
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['datetime'],
    y=df['generation_mw'],
    mode='lines',
    name='Solar Generation',
    line=dict(color='orange', width=1)
))

fig.update_layout(
    title='Solar PV Generation Over Time',
    xaxis_title='Date',
    yaxis_title='Generation (MW)',
    template='plotly_white',
    height=500,
    hovermode='x unified'
)

fig.show()

print("💡 Zoom in to see daily patterns! Click and drag to zoom.")

### 🔍 Zoom Into One Week

In [None]:
# Show one week for detail (7 full days: Jan 1 through Jan 7)
week_data = df[df['datetime'].between('2025-01-01', '2025-01-08', inclusive='left')]

fig = px.line(week_data, x='datetime', y='generation_mw',
              title='One Week of Solar Generation (Clear Daily Pattern)',
              labels={'generation_mw': 'Generation (MW)', 'datetime': 'Date/Time'})

fig.update_traces(line_color='orange')
fig.update_layout(template='plotly_white', height=400)
fig.show()

print("\n📊 Key Observations:")
print("   - Zero generation at night")
print("   - Peak around noon (12-1 PM)")
print("   - Clear daily cycle")
print("   - This is SEASONALITY!")

---
## 3. Time Series Components

### 🧩 Every Time Series Has:

**Decomposition Formula:**
```
Time Series = Trend + Seasonality + Residual
```

**1. Trend:** Long-term increase/decrease
```
Example: Solar capacity growing over months
```

**2. Seasonality:** Regular, repeating patterns
```
Example: Daily cycle (high at noon, zero at night)
```

**3. Residual:** Random noise
```
Example: Weather variations, clouds
```
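
To make the additive formula concrete, here is a hand-rolled sketch of a classical decomposition on synthetic data (a made-up trend, a sine-wave daily cycle, and random noise; none of it comes from the solar dataset). It estimates the trend with a centered moving average, then the seasonality as the average detrended value per hour of day. Treat it as an illustration of the idea rather than a replacement for a library routine.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
period = 24
n = period * 14  # two weeks of hourly points
t = np.arange(n)

# Synthetic series: slow trend + daily cycle + noise
y = pd.Series(
    500 + 0.5 * t + 300 * np.sin(2 * np.pi * t / period) + rng.normal(0, 20, n),
    index=pd.date_range("2025-01-01", periods=n, freq="h"),
)

# 1) Trend: centered 2x24 moving average (weights sum to 1)
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend = pd.Series(np.nan, index=y.index)
trend.iloc[period // 2 : n - period // 2] = np.convolve(y.values, weights, mode="valid")

# 2) Seasonality: mean detrended value per hour of day, centered on zero
detrended = y - trend
hourly_means = detrended.groupby(detrended.index.hour).mean()
hourly_means -= hourly_means.mean()
seasonal = pd.Series(hourly_means.to_numpy()[y.index.hour.to_numpy()], index=y.index)

# 3) Residual: whatever trend and seasonality do not explain
resid = y - trend - seasonal

print(f"Seasonal swing: {seasonal.min():.0f} to {seasonal.max():.0f}")
print(f"Residual std:   {resid.std():.1f}")
```

By construction, `trend + seasonal + resid` adds back up to the original series wherever the trend is defined (the first and last 12 hours are lost to the moving average).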

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Resample to hourly for cleaner decomposition
# (lowercase 'h': the uppercase 'H' alias is deprecated in recent pandas)
hourly = df.set_index('datetime').resample('1h')['generation_mw'].mean()

# Decompose (takes a moment...)
print("🔄 Decomposing time series...")
decomposition = seasonal_decompose(hourly.dropna(), model='additive', period=24)  # 24-hour cycle

print("✅ Decomposition complete!")

In [None]:
# Visualize components
fig = make_subplots(
    rows=4, cols=1,
    subplot_titles=['Original', 'Trend', 'Seasonality', 'Residual'],
    vertical_spacing=0.08
)

# Original
fig.add_trace(go.Scatter(x=decomposition.observed.index, y=decomposition.observed.values,
                         mode='lines', name='Original', line=dict(color='blue')),
              row=1, col=1)

# Trend
fig.add_trace(go.Scatter(x=decomposition.trend.index, y=decomposition.trend.values,
                         mode='lines', name='Trend', line=dict(color='red')),
              row=2, col=1)

# Seasonality
fig.add_trace(go.Scatter(x=decomposition.seasonal.index, y=decomposition.seasonal.values,
                         mode='lines', name='Seasonality', line=dict(color='green')),
              row=3, col=1)

# Residual
fig.add_trace(go.Scatter(x=decomposition.resid.index, y=decomposition.resid.values,
                         mode='lines', name='Residual', line=dict(color='gray')),
              row=4, col=1)

fig.update_layout(height=1000, showlegend=False, title_text="Time Series Decomposition")
fig.update_xaxes(title_text="Date", row=4, col=1)

fig.show()

print("\n💡 Interpretation:")
print("   Trend: Slight upward trend (more solar capacity?)")
print("   Seasonality: Strong 24-hour pattern")
print("   Residual: Random variations (weather, clouds)")

---
## 4. Stationarity - The Foundation

### ❓ What is Stationarity?

**Stationary Time Series:**
- Mean doesn't change over time
- Variance doesn't change over time
- No trend or seasonality (either one makes the mean depend on time)

```
Non-Stationary (BAD):        Stationary (GOOD):
     ↗                              ~  ~
   ↗                                 ~  ~
 ↗                                  ~  ~
(trend!)                        (constant)
```

**Why care?** Many classical forecasting models (such as the ARIMA family) assume stationarity!
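
A quick, informal way to get a feel for the first two conditions is to split a series in half and compare the mean and variance of each half. The sketch below does this for synthetic white noise and for the same noise with an added upward drift (both invented for illustration; `half_stats` is a hypothetical helper). It is only a rough sanity check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

noise = rng.normal(0, 1, n)              # constant mean and variance
trending = noise + 0.01 * np.arange(n)   # mean drifts upward over time

def half_stats(values):
    """Mean and variance of the first and second half of a series."""
    first, second = np.array_split(values, 2)
    return (first.mean(), second.mean()), (first.var(), second.var())

for name, series in [("noise", noise), ("trending", trending)]:
    means, variances = half_stats(series)
    print(f"{name:>9}: means {means[0]:.2f} vs {means[1]:.2f}, "
          f"variances {variances[0]:.2f} vs {variances[1]:.2f}")
```

The noise series should show nearly identical halves, while the trending series should show a clear jump in the mean.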

### 🧪 Test: Augmented Dickey-Fuller (ADF)

In [None]:
from statsmodels.tsa.stattools import adfuller

def check_stationarity(series, name='Series'):
    """Perform ADF test for stationarity"""
    result = adfuller(series.dropna())
    
    print(f"\n📊 Stationarity Test: {name}")
    print("="*50)
    print(f"ADF Statistic: {result[0]:.4f}")
    print(f"P-value: {result[1]:.4f}")
    print("Critical Values:")
    for key, value in result[4].items():
        print(f"   {key}: {value:.3f}")
    
    # ADF null hypothesis: the series has a unit root (non-stationary)
    if result[1] < 0.05:
        print("\n✅ Result: STATIONARY (p < 0.05)")
    else:
        print("\n❌ Result: NON-STATIONARY (p >= 0.05)")
        print("   → Need to transform data!")
    
    return result[1] < 0.05

# Test original series
is_stationary = check_stationarity(hourly, 'Original Solar Generation')

### 🔧 Making Data Stationary: Differencing

**Differencing:** Calculate change between consecutive points

```
Original:    [100, 150, 200, 180]
Differenced: [ 50,  50, -20]
             (150-100, 200-150, 180-200)
```
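
The worked example above can be reproduced with `Series.diff()`. The second half of the sketch goes one step beyond this section: differencing at a lag equal to the cycle length (often called seasonal differencing) removes a repeating pattern. The toy 4-step cycle is made up for illustration.

```python
import pandas as pd

# The worked example: change between consecutive points
original = pd.Series([100, 150, 200, 180])
first_diff = original.diff().dropna()
print(first_diff.tolist())     # [50.0, 50.0, -20.0]

# A toy repeating 4-step cycle: differencing at lag 4 removes it entirely
cycle = pd.Series([0, 5, 9, 5] * 3)
seasonal_diff = cycle.diff(4).dropna()
print(seasonal_diff.tolist())  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Note that `diff()` leaves a `NaN` in the first position (there is nothing to subtract from), which is why `dropna()` follows it.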

In [None]:
# Apply differencing
hourly_diff = hourly.diff().dropna()

# Test again
is_stationary_diff = check_stationarity(hourly_diff, 'Differenced Series')

---
## 5. Simple Forecasting: Moving Average

### 📊 Naive Approach: Use Recent Average

**Moving Average:**
```
Forecast = Average of last N values

Example (N=3):
Last 3 values: [100, 110, 105]
Forecast: (100 + 110 + 105) / 3 = 105
```

**Good:** Simple, fast  
**Bad:** No seasonality, no trend
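
As a quick check of the arithmetic above, here is a minimal one-step version. `moving_average_next` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def moving_average_next(values, n):
    """Forecast the next value as the mean of the last n observations."""
    values = np.asarray(values, dtype=float)
    if len(values) < n:
        raise ValueError("need at least n observations")
    return float(values[-n:].mean())

print(moving_average_next([100, 110, 105], n=3))  # 105.0
```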

In [None]:
# Create train/test split
# Use last 7 days for testing
train_size = len(hourly) - (7 * 24)
train = hourly[:train_size]
test = hourly[train_size:]

print(f"📊 Train: {len(train)} hours ({len(train)/24:.0f} days)")
print(f"📊 Test:  {len(test)} hours ({len(test)/24:.0f} days)")

In [None]:
# Simple moving average forecast
window = 24  # Use last 24 hours

def moving_average_forecast(series, window, steps):
    """Forecast next 'steps' values using a recursive moving average"""
    history = series.iloc[-window:].tolist()  # only the last 'window' values matter
    forecasts = []
    for _ in range(steps):
        forecast = float(np.nanmean(history[-window:]))  # ignore gaps, like Series.mean()
        forecasts.append(forecast)
        history.append(forecast)  # feed each forecast back in as the newest value
    return forecasts

ma_forecast = moving_average_forecast(train, window, len(test))

print(f"✅ Moving average forecast generated ({len(ma_forecast)} hours)")

In [None]:
# Visualize forecast
fig = go.Figure()

# Last week of training data
fig.add_trace(go.Scatter(
    x=train.index[-168:],  # Last 7 days
    y=train.values[-168:],
    mode='lines',
    name='Training Data',
    line=dict(color='blue')
))

# Actual test data
fig.add_trace(go.Scatter(
    x=test.index,
    y=test.values,
    mode='lines',
    name='Actual',
    line=dict(color='green')
))

# Forecast
fig.add_trace(go.Scatter(
    x=test.index,
    y=ma_forecast,
    mode='lines',
    name='Moving Average Forecast',
    line=dict(color='red', dash='dash')
))

fig.update_layout(
    title='Moving Average Forecast vs Actual',
    xaxis_title='Date',
    yaxis_title='Generation (MW)',
    template='plotly_white',
    height=500
)

fig.show()

print("\n💡 Problem: Flat line! Doesn't capture daily pattern.")

---
## 6. Better Approach: Seasonal Naive

### 🔄 Use Same Time Yesterday

**Seasonal Naive:**
```
Forecast for 12 PM today = Actual at 12 PM yesterday

Why? Solar generation at noon is similar day-to-day!
```

**Perfect for data with strong daily patterns**
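
The idea fits in a few lines. This sketch uses a toy "day" with only four readings so the output is easy to follow; `seasonal_naive` is a hypothetical helper name, and the numbers are invented.

```python
import numpy as np

def seasonal_naive(history, period, steps):
    """Repeat the last full cycle of `history` for `steps` future points."""
    history = np.asarray(history, dtype=float)
    if len(history) < period:
        raise ValueError("history must cover at least one full period")
    last_cycle = history[-period:]
    # Step h (0-based) reuses the value from the same position in the last cycle
    return last_cycle[np.arange(steps) % period]

# Two toy "days" with 4 readings each: night, morning, noon, evening
history = [0, 40, 90, 30, 0, 50, 100, 35]
print(seasonal_naive(history, period=4, steps=6).tolist())
# [0.0, 50.0, 100.0, 35.0, 0.0, 50.0]
```

Only the most recent cycle is used: the first toy day (`0, 40, 90, 30`) has no influence on the forecast.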

In [None]:
# Seasonal naive forecast
seasonal_period = 24  # 24-hour cycle
seasonal_naive_forecast = train.values[-seasonal_period:].tolist() * (len(test) // seasonal_period + 1)
seasonal_naive_forecast = seasonal_naive_forecast[:len(test)]

In [None]:
# Visualize
fig = go.Figure()

fig.add_trace(go.Scatter(x=train.index[-168:], y=train.values[-168:],
                         mode='lines', name='Training', line=dict(color='blue')))
fig.add_trace(go.Scatter(x=test.index, y=test.values,
                         mode='lines', name='Actual', line=dict(color='green')))
fig.add_trace(go.Scatter(x=test.index, y=seasonal_naive_forecast,
                         mode='lines', name='Seasonal Naive', line=dict(color='orange', dash='dash')))

fig.update_layout(
    title='Seasonal Naive Forecast (Much Better!)',
    xaxis_title='Date',
    yaxis_title='Generation (MW)',
    template='plotly_white',
    height=500
)

fig.show()

print("\n✅ Much better! Captures daily pattern.")

---
## 7. Evaluation Metrics

### 📏 How Good is Our Forecast?

**Common Metrics:**

1. **MAE (Mean Absolute Error):** Average error in MW
   ```
   MAE = mean(|actual - forecast|)
   Lower is better
   Same units as data (MW)
   ```

2. **RMSE (Root Mean Squared Error):** Penalizes large errors
   ```
   RMSE = sqrt(mean((actual - forecast)²))
   Lower is better
   Sensitive to outliers
   ```

3. **MAPE (Mean Absolute Percentage Error):** Error as %
   ```
   MAPE = mean(|actual - forecast| / |actual|) × 100%
   Scale-independent
   Easy to interpret
   Undefined when actual = 0 (e.g., solar at night)
   ```
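
The three formulas can be checked by hand on a tiny example. This is a plain-NumPy sketch with invented numbers; `forecast_errors` is a hypothetical helper. Because MAPE divides by the actual value, the sketch assumes zero actuals are simply skipped, which is one common workaround rather than the only option.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Return MAE, RMSE, and MAPE (MAPE over non-zero actuals only)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = actual - forecast

    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))

    # Assumption: skip points where actual == 0, since the ratio is undefined there
    nonzero = actual != 0
    mape = np.mean(np.abs(error[nonzero] / actual[nonzero])) * 100
    return mae, rmse, mape

actual = [100, 200, 0, 400]      # MW, including one zero "night" reading
forecast = [110, 190, 10, 380]
mae, rmse, mape = forecast_errors(actual, forecast)
print(f"MAE:  {mae:.2f} MW")     # 12.50 MW
print(f"RMSE: {rmse:.2f} MW")    # 13.23 MW
print(f"MAPE: {mape:.2f}%")      # 6.67%
```

RMSE comes out higher than MAE here because the single 20 MW miss is weighted more heavily once squared.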

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

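def evaluate_forecast(actual, forecast, model_name):
    """Calculate forecast metrics"""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mae = mean_absolute_error(actual, forecast)
    rmse = np.sqrt(mean_squared_error(actual, forecast))
    # MAPE divides by the actual value, so skip zero-generation points (night)
    nonzero = actual != 0
    mape = np.mean(np.abs((actual[nonzero] - forecast[nonzero]) / actual[nonzero])) * 100
    
    print(f"\n📊 {model_name} Performance:")
    print("="*50)
    print(f"MAE:  {mae:.2f} MW")
    print(f"RMSE: {rmse:.2f} MW")
    print(f"MAPE: {mape:.2f}% (non-zero actuals only)")
    
    return {'Model': model_name, 'MAE': mae, 'RMSE': rmse, 'MAPE': mape}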

# Evaluate both models
results = []
results.append(evaluate_forecast(test.values, ma_forecast, 'Moving Average'))
results.append(evaluate_forecast(test.values, seasonal_naive_forecast, 'Seasonal Naive'))

---
## 8. Advanced: Prophet (Facebook's Time Series Tool)

### 🔮 Prophet Benefits:

- Handles seasonality automatically
- Handles missing data
- Robust to outliers
- Fast and easy to use
- Built by Facebook for production forecasting

**Perfect for business forecasting!**

In [None]:
# Install Prophet if needed (uncomment)
# !pip install prophet

try:
    from prophet import Prophet
    prophet_available = True
    print("✅ Prophet available!")
except ImportError:
    prophet_available = False
    print("⚠️ Prophet not installed. Run: pip install prophet")
    print("   Skipping Prophet section...")

In [None]:
if prophet_available:
    # Prepare data for Prophet (needs 'ds' and 'y' columns)
    prophet_train = pd.DataFrame({
        'ds': train.index,
        'y': train.values
    })
    
    # Create and fit model
    print("🔄 Training Prophet model...")
    model = Prophet(
        daily_seasonality=True,
        weekly_seasonality=False,
        yearly_seasonality=False
    )
    model.fit(prophet_train)
    
    # Make forecast
    future = pd.DataFrame({'ds': test.index})
    prophet_forecast = model.predict(future)
    
    print("✅ Prophet forecast complete!")
else:
    print("⏭️ Skipping Prophet (not installed)")

In [None]:
if prophet_available:
    # Visualize Prophet forecast
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(x=train.index[-168:], y=train.values[-168:],
                             mode='lines', name='Training', line=dict(color='blue')))
    fig.add_trace(go.Scatter(x=test.index, y=test.values,
                             mode='lines', name='Actual', line=dict(color='green')))
    fig.add_trace(go.Scatter(x=prophet_forecast['ds'], y=prophet_forecast['yhat'],
                             mode='lines', name='Prophet Forecast', line=dict(color='purple', dash='dash')))
    
    # Uncertainty interval (Prophet's default width is 80%)
    fig.add_trace(go.Scatter(
        x=prophet_forecast['ds'].tolist() + prophet_forecast['ds'].tolist()[::-1],
        y=prophet_forecast['yhat_upper'].tolist() + prophet_forecast['yhat_lower'].tolist()[::-1],
        fill='toself',
        fillcolor='rgba(128, 0, 128, 0.2)',
        line=dict(color='rgba(255,255,255,0)'),
        name='Uncertainty Interval',
        showlegend=True
    ))
    
    fig.update_layout(
        title='Prophet Forecast with Uncertainty Interval',
        xaxis_title='Date',
        yaxis_title='Generation (MW)',
        template='plotly_white',
        height=500
    )
    
    fig.show()
    
    # Evaluate Prophet
    results.append(evaluate_forecast(test.values, prophet_forecast['yhat'].values, 'Prophet'))

---
## 9. Compare All Models

In [None]:
# Create comparison DataFrame
comparison = pd.DataFrame(results).sort_values('MAE')

print("\n📊 MODEL COMPARISON")
print("="*70)
print(comparison.to_string(index=False))
print("="*70)

print(f"\n🏆 Best Model: {comparison.iloc[0]['Model']}")
print(f"   MAE: {comparison.iloc[0]['MAE']:.2f} MW")
print(f"   MAPE: {comparison.iloc[0]['MAPE']:.2f}%")

In [None]:
# Visualize comparison
fig = go.Figure()

fig.add_trace(go.Bar(
    x=comparison['Model'],
    y=comparison['MAE'],
    text=[f"{x:.1f}" for x in comparison['MAE']],
    textposition='auto',
    marker_color=['gold', 'silver', '#CD7F32'][:len(comparison)]
))

fig.update_layout(
    title='Model Comparison: Mean Absolute Error (Lower = Better)',
    xaxis_title='Model',
    yaxis_title='MAE (MW)',
    template='plotly_white',
    height=450
)

fig.show()

---
## 10. Key Takeaways

### ✅ What We Learned:

**1. Time Series Components:**
- **Trend:** Long-term direction
- **Seasonality:** Repeating patterns (daily, weekly, yearly)
- **Residual:** Random noise

**2. Stationarity:**
- Many classical models (such as ARIMA) assume stationary data
- Test with ADF test
- Fix with differencing or transformations

**3. Forecasting Methods:**

| Method | Pros | Cons | Best For |
|--------|------|------|----------|
| **Moving Average** | Simple, fast | No seasonality | Baseline |
| **Seasonal Naive** | Captures patterns | No trend | Strong seasonality |
| **Prophet** | Automatic, robust | Needs tuning | Business forecasting |
| **ARIMA/SARIMA** | Statistical, accurate | Complex | Expert use |

**4. Evaluation Metrics:**
- **MAE:** Easy to interpret (same units)
- **RMSE:** Penalizes large errors
- **MAPE:** Scale-independent (%), but undefined when actual values are zero

**5. Solar Generation Insights:**
- Strong daily seasonality (24-hour cycle)
- Zero at night, peak at noon
- Weather creates residual variability
- Seasonal naive works well for short-term

---

### 🎯 Practical Recommendations:

**For Solar Energy Forecasting:**
1. **Short-term (1-2 days):** Seasonal Naive or Prophet
2. **Medium-term (1 week):** Prophet with weather data
3. **Long-term (months):** Prophet with trend analysis

**General Time Series:**
1. **Always visualize first** - understand your data
2. **Check for seasonality** - determines method choice
3. **Start simple** - Moving Average baseline
4. **Add complexity gradually** - Seasonal → Prophet → ARIMA
5. **Cross-validate** - Don't trust single test period
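
For time series, cross-validation usually means moving the train/test split forward in time and scoring each window separately (often called rolling-origin evaluation). The sketch below does this for a seasonal naive forecast on synthetic data: a clipped sine wave plus noise standing in for hourly solar output. `rolling_origin_mae` and all the numbers are illustrative assumptions.

```python
import numpy as np

def rolling_origin_mae(values, period, horizon, n_folds):
    """MAE of a seasonal naive forecast over several consecutive test windows."""
    values = np.asarray(values, dtype=float)
    scores = []
    for fold in range(n_folds, 0, -1):
        split = len(values) - fold * horizon   # end of training data for this fold
        train, test = values[:split], values[split:split + horizon]
        if len(train) < period:
            raise ValueError("not enough history for the earliest fold")
        # Seasonal naive: repeat the last full cycle seen in training
        forecast = train[-period:][np.arange(horizon) % period]
        scores.append(float(np.mean(np.abs(test - forecast))))
    return scores

# Synthetic stand-in: 30 days of hourly data, zero at night, peak at midday
rng = np.random.default_rng(1)
t = np.arange(24 * 30)
series = np.clip(np.sin(2 * np.pi * (t - 6) / 24), 0, None) * 1000 + rng.normal(0, 30, t.size)

scores = rolling_origin_mae(series, period=24, horizon=24, n_folds=5)
print([round(s, 1) for s in scores])
print(f"Mean MAE: {np.mean(scores):.1f} MW (spread: {np.std(scores):.1f})")
```

Each fold only ever trains on data that comes before its test window, so no future information leaks into the forecast. The spread across folds gives a sense of how much a single test period could mislead.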

---

### 💡 Real-World Applications:

**Energy:**
- Solar/wind generation forecasting
- Grid load balancing
- Energy storage optimization

**Business:**
- Sales forecasting
- Inventory management
- Demand prediction

**Finance:**
- Stock price prediction
- Risk management
- Portfolio optimization

---

### 🚀 Next Steps:

**To improve forecasts:**
1. Add external features (weather, holidays)
2. Try ARIMA/SARIMA for statistical approach
3. Use ML models (LSTM, XGBoost) for complex patterns
4. Ensemble multiple models
5. Real-time updating as new data arrives

---

**Excellent work! You can now forecast time series data!** 🌟

**AI Tech Institute** | *Building Tomorrow's AI Engineers Today*