# **Chapter 76: Weather and Climate Prediction**

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the fundamental differences between weather forecasting (short‑term) and climate prediction (long‑term).
- Identify common data sources for meteorological time series (station observations, satellite, reanalysis).
- Engineer spatial and temporal features relevant to weather prediction (e.g., pressure gradients, humidity, wind vectors).
- Understand the role of physics‑based numerical weather prediction (NWP) and apply machine learning models to weather forecasting.
- Implement probabilistic forecasts and quantify uncertainty.
- Evaluate weather forecasts using domain‑specific metrics (RMSE, anomaly correlation, CRPS).
- Build a complete weather prediction pipeline that integrates with the system architecture developed in previous chapters.

---

## **76.1 Introduction to Weather and Climate Prediction**

Weather and climate prediction are among the oldest and most challenging time‑series forecasting problems. Unlike financial or retail data, atmospheric processes are governed by complex physics, exhibit chaotic behaviour, and involve massive spatiotemporal datasets.

**Weather prediction** focuses on short‑term forecasts (hours to days) with high spatial resolution. **Climate prediction** deals with longer timescales (seasons to decades) and often involves ensemble methods to account for uncertainty.

Key characteristics:

- **High dimensionality**: Weather data includes multiple variables (temperature, pressure, humidity, wind speed/direction) at thousands of spatial grid points.
- **Spatiotemporal dependencies**: Conditions at one location affect nearby locations, with time lags due to atmospheric motion.
- **Non‑stationarity**: Climate change introduces trends and changing variability.
- **Chaotic dynamics**: Small initial errors grow rapidly (the butterfly effect), making deterministic forecasts beyond a few days inherently uncertain.
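The butterfly effect is easy to see in the Lorenz (1963) system, a three-variable toy model of atmospheric convection. The sketch below integrates it with a classical fourth-order Runge-Kutta step from two initial states that differ by only 1e-8 and records how far apart the trajectories drift. The parameters (sigma=10, rho=28, beta=8/3) are the classic chaotic choice; the step size and run length are illustrative assumptions.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def separation_over_time(perturbation=1e-8, dt=0.01, n_steps=4000):
    """Distance between two trajectories whose initial states differ slightly."""
    a = np.array([1.0, 1.0, 1.0])
    b = a + np.array([perturbation, 0.0, 0.0])
    distances = np.empty(n_steps)
    for t in range(n_steps):
        a = rk4_step(lorenz63, a, dt)
        b = rk4_step(lorenz63, b, dt)
        distances[t] = np.linalg.norm(a - b)
    return distances

d = separation_over_time()
for step in (0, 1000, 2000, 3000, 3999):
    print(f"t = {(step + 1) * 0.01:5.2f}  separation = {d[step]:.2e}")
```

The separation grows roughly exponentially until it is as large as the attractor itself, at which point the two "forecasts" are no more alike than two random states.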

Modern operational weather forecasting relies on **Numerical Weather Prediction (NWP)** — physics‑based models that solve partial differential equations. However, machine learning is increasingly used for **post‑processing** (correcting NWP output) and even for fully data‑driven forecasting (e.g., GraphCast, Pangu‑Weather).

In this chapter, we will build a simplified weather prediction system using machine learning, treating it as a time‑series problem with multiple stations or grid points. We'll use the NEPSE pipeline as a template but adapt it to the unique requirements of weather data.

---

## **76.2 Weather Data Sources and Formats**

Weather data comes in various forms:

- **Station data**: Point measurements from meteorological stations (temperature, precipitation, wind). Often provided as CSV or text files with metadata (latitude, longitude, elevation).
- **Gridded datasets**: Fields on a regular latitude/longitude grid, produced by interpolating observations or by data assimilation (e.g., ERA5 from ECMWF). Usually stored in NetCDF or GRIB format.
- **Satellite and radar**: High‑resolution imagery (e.g., cloud cover, precipitation intensity).
- **Reanalysis**: Combines historical observations with a consistent NWP model to produce a long‑term, gap‑filled dataset (e.g., ERA5, NCEP/NCAR Reanalysis).

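For our example, we will work with a simplified station‑based dataset. We'll generate synthetic data that mimics daily temperature, precipitation, pressure, and wind speed at multiple stations, with seasonality, an elevation effect, and a monsoon season. For simplicity, each station's noise is drawn independently, so the stations are not spatially correlated.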

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

def generate_weather_data(num_stations=10, days=365*3, start_date='2020-01-01'):
    """
    Generate synthetic weather station data.
    
    Parameters
    ----------
    num_stations : int
        Number of weather stations.
    days : int
        Number of days of data.
    start_date : str
        Start date.
    
    Returns
    -------
    df : pd.DataFrame
        Columns: station_id, date, lat, lon, elevation, temperature, precipitation, pressure, wind_speed
    """
    np.random.seed(42)
    dates = pd.date_range(start=start_date, periods=days, freq='D')
    
    # Station locations (random lat/lon within a region, e.g., Nepal)
    stations = []
    for i in range(1, num_stations+1):
        lat = np.random.uniform(26, 30)   # Nepal latitude range
        lon = np.random.uniform(80, 88)   # Nepal longitude range
        elev = np.random.uniform(500, 4000)  # elevation in meters
        stations.append({'station_id': i, 'lat': lat, 'lon': lon, 'elevation': elev})
    
    station_df = pd.DataFrame(stations)
    
    # Generate daily data for each station
    records = []
    for _, station in station_df.iterrows():
        sid = station['station_id']
        elev = station['elevation']
        
        # Base temperature falls with elevation (lapse rate of 6 °C per km); latitude is ignored for simplicity
        base_temp = 25 - 0.006 * elev + np.random.normal(0, 2)
        
        # Seasonal cycle (amplitude depends on location)
        seasonal_amp = 10 + np.random.normal(0, 2)
        
        # Generate time series
        for t, date in enumerate(dates):
            # Day of year (1-366)
            doy = date.dayofyear
            
            # Temperature with seasonal cycle and noise
            seasonal = seasonal_amp * np.sin(2 * np.pi * (doy - 80) / 365)  # peaks near day 171 (about 20 June)
            temp = base_temp + seasonal + np.random.normal(0, 2)
            
            # Precipitation: only on some days, with higher probability in monsoon
            monsoon = (doy > 150) & (doy < 270)  # approximate monsoon months
            precip_prob = 0.3 + 0.4 * monsoon
            precip = np.random.exponential(5) if np.random.random() < precip_prob else 0
            
            # Pressure (simplified, decreases with elevation)
            pressure = 1013.25 * np.exp(-elev / 8000) + np.random.normal(0, 5)
            
            # Wind speed
            wind_speed = np.random.gamma(2, 2) + 2 * monsoon  # stronger winds in monsoon
            
            records.append({
                'station_id': sid,
                'date': date,
                'lat': station['lat'],
                'lon': station['lon'],
                'elevation': elev,
                'temperature': temp,
                'precipitation': precip,
                'pressure': pressure,
                'wind_speed': wind_speed
            })
    
    df = pd.DataFrame(records)
    return df

# Generate sample data
weather_df = generate_weather_data(num_stations=5, days=365*2)
print(weather_df.head())
```

**Explanation:**

- This function generates synthetic weather data for multiple stations. Each station has a fixed latitude, longitude, and elevation.
- Temperature includes a seasonal cycle (sine wave) and elevation‑based lapse rate, plus random noise.
- Precipitation is generated with a higher probability during a simplified monsoon season (days 151–269, roughly June to late September).
- Pressure and wind speed are simplified but capture some realistic variation.
- The resulting DataFrame is in a tidy format: one row per station per day.
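Because each station's daily noise is drawn independently, nearby stations in this dataset are no more alike than distant ones. One way to add spatial correlation is sketched below, assuming an exponential covariance that decays with inter-station distance (the 100 km length scale is an illustrative choice, not a calibrated value): draw one multivariate-normal vector per day using the Cholesky factor of the covariance matrix. The function name `correlated_station_noise` is hypothetical.

```python
import numpy as np

def correlated_station_noise(dist_km, n_days, sigma=2.0, length_scale_km=100.0, seed=0):
    """
    Draw daily noise for each station with corr(i, j) = exp(-d_ij / length_scale_km).
    dist_km: (n_stations, n_stations) matrix of distances in km.
    Returns an array of shape (n_days, n_stations).
    """
    rng = np.random.default_rng(seed)
    cov = sigma**2 * np.exp(-np.asarray(dist_km, dtype=float) / length_scale_km)
    # Small jitter keeps the Cholesky factorisation numerically stable
    chol = np.linalg.cholesky(cov + 1e-9 * np.eye(len(cov)))
    white = rng.standard_normal((n_days, len(cov)))
    return white @ chol.T  # each row is a draw from N(0, cov)

# Three stations: 0 and 1 are 10 km apart, station 2 is 500 km from both
dist = np.array([[0.0, 10.0, 500.0],
                 [10.0, 0.0, 500.0],
                 [500.0, 500.0, 0.0]])
noise = correlated_station_noise(dist, n_days=20000)
print(np.corrcoef(noise.T).round(2))
```

To use this inside `generate_weather_data`, one would compute the station distance matrix once and replace the independent temperature noise with the corresponding column of `noise`.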

---

## **76.3 Feature Engineering for Weather Prediction**

Weather features must capture both temporal dynamics and spatial relationships. We will engineer:

- **Temporal features**: day of year (cyclically encoded), month, season flags.
- **Lagged features**: past values of temperature, pressure, etc., at the same station.
- **Rolling statistics**: moving averages and standard deviations over various windows.
- **Spatial features**: for each station, include data from neighbouring stations (with appropriate time lags to account for advection).
- **Derived meteorological quantities**: dew point, relative humidity, pressure tendency, etc.
- **External indices**: teleconnection indices (ENSO, NAO) if available.
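Of the derived quantities listed, dew point and relative humidity are linked through the saturation vapour pressure. A common approximation is the Magnus formula; the sketch below uses the coefficients a = 17.62 and b = 243.12 °C, which are one commonly used set among several. It assumes a relative-humidity measurement, which our synthetic data does not include, and the function names are hypothetical.

```python
import numpy as np

# Magnus coefficients (one commonly used set; others exist)
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius

def dew_point_c(temp_c, rel_humidity_pct):
    """Approximate dew point (deg C) from temperature (deg C) and relative humidity (%)."""
    temp_c = np.asarray(temp_c, dtype=float)
    rh = np.clip(np.asarray(rel_humidity_pct, dtype=float), 1e-6, 100.0) / 100.0
    gamma = np.log(rh) + MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)

def relative_humidity_pct(temp_c, dew_c):
    """Inverse relation: relative humidity (%) from temperature and dew point (deg C)."""
    temp_c = np.asarray(temp_c, dtype=float)
    dew_c = np.asarray(dew_c, dtype=float)
    return 100.0 * np.exp(MAGNUS_A * dew_c / (MAGNUS_B + dew_c)
                          - MAGNUS_A * temp_c / (MAGNUS_B + temp_c))

print(dew_point_c(20.0, 50.0))    # about 9.26 deg C
print(dew_point_c(20.0, 100.0))   # saturated air: dew point equals temperature
```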

For simplicity, we'll focus on single‑station forecasting with added spatial context.

```python
class WeatherFeatureEngineer:
    """
    Feature engineering for weather station data.
    """
    
    def __init__(self):
        self.feature_columns = []
    
    def add_temporal_features(self, df):
        """Add cyclical time features."""
        df = df.copy()
        df['doy'] = df['date'].dt.dayofyear
        df['month'] = df['date'].dt.month
        df['year'] = df['date'].dt.year
        # Cyclical encoding of day of year
        df['doy_sin'] = np.sin(2 * np.pi * df['doy'] / 365.25)
        df['doy_cos'] = np.cos(2 * np.pi * df['doy'] / 365.25)
        return df
    
    def add_lag_features(self, df, variables, lags):
        """Add lagged values for given variables per station."""
        df = df.copy()
        for var in variables:
            for lag in lags:
                df[f'{var}_lag_{lag}'] = df.groupby('station_id')[var].shift(lag)
        return df
    
    def add_rolling_features(self, df, variables, windows, stats=('mean', 'std', 'min', 'max')):
        """Add rolling statistics per station (each window ends at, and includes, the current day)."""
        df = df.copy()
        for var in variables:
            for window in windows:
                for stat in stats:
                    # stat names a pandas Rolling method ('mean', 'std', 'min', 'max');
                    # default arguments bind the current loop values inside the lambda
                    df[f'{var}_rolling_{stat}_{window}'] = df.groupby('station_id')[var].transform(
                        lambda x, w=window, s=stat: getattr(x.rolling(w, min_periods=1), s)()
                    )
        return df
    
    def add_spatial_features(self, df, station_coords=None, radius_km=100):
        """
        Add the same-day mean temperature of neighbouring stations within radius_km.
        Coordinates are read from df's lat/lon columns (station_coords is unused and
        kept only for compatibility). Stations with no neighbour get NaN.
        This is a simplified example; real systems might use advection or interpolation.
        """
        from scipy.spatial.distance import cdist
        df = df.copy()
        stations = df[['station_id', 'lat', 'lon']].drop_duplicates('station_id').reset_index(drop=True)
        
        # Equirectangular approximation: ~111 km per degree of latitude and
        # ~111 * cos(latitude) km per degree of longitude
        cos_lat = np.cos(np.deg2rad(stations['lat'].mean()))
        xy_km = np.column_stack([stations['lat'] * 111.0, stations['lon'] * 111.0 * cos_lat])
        dist_km = cdist(xy_km, xy_km)
        
        # Wide table: one row per date, one column per station
        temp_wide = df.pivot(index='date', columns='station_id', values='temperature')
        
        df['temp_neighbour_mean'] = np.nan
        for i, sid in enumerate(stations['station_id']):
            is_neighbour = (dist_km[i] < radius_km) & (dist_km[i] > 0)
            neighbours = stations.loc[is_neighbour, 'station_id'].tolist()
            if neighbours:
                # Mean over the neighbour columns for every date at once
                neighbour_mean = temp_wide[neighbours].mean(axis=1)
                rows = df['station_id'] == sid
                df.loc[rows, 'temp_neighbour_mean'] = df.loc[rows, 'date'].map(neighbour_mean).to_numpy()
        return df
    
    def compute_derived_features(self, df):
        """Compute meteorological derived quantities from the available variables."""
        df = df.copy()
        # Pressure tendency: day-over-day change in station pressure
        df['pressure_tendency'] = df.groupby('station_id')['pressure'].diff()
        # Dew point and relative humidity require a humidity measurement (the Magnus
        # formula, for example, uses temperature and relative humidity). The synthetic
        # data has no humidity, so they are not computed here.
        return df
    
    def compute_features(self, df, target_var='temperature', forecast_horizon=1):
        """
        Main entry point: compute all features and create target.
        """
        df = df.copy()
        # Ensure sorted
        df = df.sort_values(['station_id', 'date'])
        
        # Add features
        df = self.add_temporal_features(df)
        df = self.add_lag_features(df, 
                                    variables=['temperature', 'pressure', 'precipitation', 'wind_speed'],
                                    lags=[1, 2, 3, 7, 14])
        df = self.add_rolling_features(df,
                                       variables=['temperature', 'pressure', 'precipitation'],
                                       windows=[7, 14, 30],
                                       stats=['mean', 'std'])
        df = self.compute_derived_features(df)
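        # Spatial features are disabled: the synthetic stations are independent, and
        # stations with no neighbour inside radius_km would get NaN, which the
        # dropna() below would turn into dropped rows
        # df = self.add_spatial_features(df, station_coords=None)
        
        # Create target: the value of target_var forecast_horizon days ahead
        df[f'target_{target_var}'] = df.groupby('station_id')[target_var].shift(-forecast_horizon)
        
        # Drop rows with NaN created by shifts, then re-sort chronologically so that
        # positional splits (e.g. TimeSeriesSplit) separate past from future
        # rather than one station from another
        df = df.dropna().sort_values(['date', 'station_id']).reset_index(drop=True)
        
        # Store feature columns. station_id is kept so it can serve as a categorical
        # feature; year is excluded because test years never appear in training.
        exclude = ['date', 'lat', 'lon', 'elevation', 'year', f'target_{target_var}']
        self.feature_columns = [c for c in df.columns if c not in exclude]
        
        return df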
```

**Explanation:**

- The feature engineer adds temporal, lag, rolling, and derived features. Spatial features are implemented in `add_spatial_features` but not enabled in `compute_features`: the synthetic stations are independent, and a station with no neighbour inside the radius would get missing values. In a real system, you would precompute neighbour averages or use more sophisticated methods like kriging.
- Cyclical encoding of day of year (`doy_sin`, `doy_cos`) captures seasonality without discontinuities.
- Lag features use past observations up to 14 days.
- Rolling statistics capture recent trends and variability.
- Derived features like dew point and pressure tendency are common in meteorology.
- The target is shifted by the forecast horizon (default 1 day ahead).
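A quick numerical check of why the sine/cosine encoding helps: 31 December and 1 January are 364 apart as raw day-of-year values but adjacent on the unit circle. The hypothetical helper below reuses the 365.25-day period from `add_temporal_features`.

```python
import numpy as np

def encode_doy(doy, period=365.25):
    """Map day of year onto the unit circle, as in add_temporal_features."""
    angle = 2 * np.pi * np.asarray(doy, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])

raw_gap = abs(365 - 1)
enc = encode_doy([365, 1, 182])
circle_gap = np.linalg.norm(enc[0] - enc[1])      # 31 Dec vs 1 Jan
half_year_gap = np.linalg.norm(enc[0] - enc[2])   # 31 Dec vs about 1 Jul
print(raw_gap, round(circle_gap, 3), round(half_year_gap, 3))
```

Tree models can split on either coordinate, and distance-based models see the year-end wrap as a small step instead of a jump.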

---

## **76.4 Modeling Approaches**

Weather forecasting can be approached with:

- **Persistence and climatology**: simple baselines.
- **Statistical models**: ARIMA, VAR, etc.
- **Machine learning**: random forests, gradient boosting, neural networks.
- **Hybrid**: post‑processing of NWP output.
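Before training anything, it helps to know the scores of the two classic reference forecasts. A minimal sketch, assuming the tidy `station_id`/`date` layout produced by `generate_weather_data`: persistence forecasts the value `horizon` days ahead as today's value, and climatology forecasts the training-period mean for that station and calendar day. The function name and the split date are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def baseline_mae(df, var='temperature', horizon=1, split_date='2022-01-01'):
    """Test-period MAE of persistence and per-station day-of-year climatology."""
    df = df.sort_values(['station_id', 'date']).copy()
    df['target'] = df.groupby('station_id')[var].shift(-horizon)
    df['target_doy'] = (df['date'] + pd.Timedelta(days=horizon)).dt.dayofyear
    df = df.dropna(subset=['target'])
    train = df[df['date'] < split_date]
    test = df[df['date'] >= split_date]

    # Persistence: the latest observation is the forecast
    persistence_err = (test[var] - test['target']).abs()

    # Climatology: training-period mean of var per station and calendar day
    clim = (train.assign(doy=train['date'].dt.dayofyear)
                 .groupby(['station_id', 'doy'], as_index=False)[var].mean()
                 .rename(columns={var: 'climatology', 'doy': 'target_doy'}))
    test = test.merge(clim, on=['station_id', 'target_doy'], how='left')
    climatology_err = (test['climatology'] - test['target']).abs()

    # mean() skips NaN, i.e. calendar days never seen in training
    return {'persistence_mae': persistence_err.mean(),
            'climatology_mae': climatology_err.mean()}

# Synthetic check: seasonal cycle plus independent daily noise (sd 3) at 5 stations
rng = np.random.default_rng(0)
dates = pd.date_range('2020-01-01', '2022-12-31', freq='D')
doy = dates.dayofyear.to_numpy()
frames = [pd.DataFrame({'station_id': sid, 'date': dates,
                        'temperature': 15 + 10 * np.sin(2 * np.pi * (doy - 80) / 365)
                                       + rng.normal(0, 3, len(dates))})
          for sid in range(5)]
res = baseline_mae(pd.concat(frames, ignore_index=True))
print(res)
```

With purely independent day-to-day noise, climatology beats persistence; real temperatures are autocorrelated, so persistence is usually the harder baseline at a one-day horizon.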

Here we'll use a gradient boosting model (LightGBM) as a global model across all stations, similar to retail. This allows the model to learn shared patterns (e.g., seasonality, elevation effects) while still having station‑specific characteristics via the station ID as a categorical feature.

We'll also demonstrate a simple **ensemble** approach: training several models on successively longer training windows (the folds of a time‑series split) and averaging their predictions, which can improve robustness.

```python
import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error, mean_squared_error

class WeatherForecaster:
    def __init__(self, feature_columns, categorical_features=None, n_estimators=100):
        self.feature_columns = feature_columns
        # Keep only categorical features that are actually model inputs; LightGBM
        # rejects categorical_feature names that are not columns of X
        self.categorical_features = [c for c in (categorical_features or []) if c in feature_columns]
        self.n_estimators = n_estimators
        self.models = []  # for ensemble
    
    def prepare_data(self, df, target_var='temperature'):
        # .copy() avoids pandas' SettingWithCopyWarning when casting dtypes below
        X = df[self.feature_columns].copy()
        y = df[f'target_{target_var}']
        for col in self.categorical_features:
            X[col] = X[col].astype('category')
        return X, y
    
    def train_single(self, X_train, y_train, X_val, y_val):
        """Train a single LightGBM model with early stopping."""
        model = lgb.LGBMRegressor(
            objective='regression',
            num_leaves=31,
            learning_rate=0.05,
            colsample_bytree=0.8,   # scikit-learn API name for feature_fraction
            subsample=0.8,          # bagging_fraction
            subsample_freq=5,       # bagging_freq
            n_estimators=self.n_estimators,
            verbose=-1
        )
        model.fit(
            X_train, y_train,
            eval_set=[(X_val, y_val)],
            callbacks=[lgb.early_stopping(10)],
            categorical_feature=self.categorical_features
        )
        return model
    
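    def train_ensemble(self, X, y, n_splits=5, n_models=5):
        """
        Train an ensemble of models using time-series cross-validation.
        Model k is trained on an expanding window (the first k+1 blocks) and
        early-stopped on the block that follows it. Rows of X must be in
        chronological order, otherwise TimeSeriesSplit does not separate past from future.
        """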
        tscv = TimeSeriesSplit(n_splits=n_splits)
        self.models = []
        
        for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
            if fold >= n_models:
                break
            X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
            y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
            model = self.train_single(X_train, y_train, X_val, y_val)
            self.models.append(model)
            print(f"Trained model {fold+1}/{min(n_models, n_splits)}")
    
    def predict(self, X):
        """Ensemble prediction: average of all models."""
        if not self.models:
            raise ValueError("No models trained.")
        preds = np.zeros((len(X), len(self.models)))
        for i, model in enumerate(self.models):
            preds[:, i] = model.predict(X)
        return preds.mean(axis=1)
```

**Explanation:**

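- The forecaster trains an ensemble of LightGBM models on expanding time windows (one per time‑series split) and averages them. Averaging can smooth out the quirks of individual models, but the earliest members see much less data. The spread between members is only a rough proxy for uncertainty, because the members are not designed to sample forecast error.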
- Categorical features might include `station_id`, `month`, `season`, etc.
- In prediction, we average the outputs of all models.
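If the member predictions are kept rather than only their average (for example, the `preds` matrix inside `WeatherForecaster.predict`, transposed to members by points), a standard sanity check on ensemble uncertainty is the spread-skill comparison. For a well-calibrated ensemble of M members, the RMSE of the ensemble mean should be close to sqrt((M+1)/M) times the ensemble spread. The hypothetical helper below computes both; the synthetic check uses an ensemble that is calibrated by construction.

```python
import numpy as np

def spread_skill(member_preds, observed):
    """
    member_preds: array (n_members, n_points); observed: array (n_points,).
    Returns (RMSE of ensemble mean, ensemble spread, ratio expected if calibrated).
    """
    member_preds = np.asarray(member_preds, dtype=float)
    observed = np.asarray(observed, dtype=float)
    m = member_preds.shape[0]
    rmse = np.sqrt(np.mean((member_preds.mean(axis=0) - observed) ** 2))
    # Spread: square root of the average unbiased member variance
    spread = np.sqrt(np.mean(member_preds.var(axis=0, ddof=1)))
    return rmse, spread, np.sqrt((m + 1) / m)

# Synthetic check: observation and members are exchangeable draws around a shared centre
rng = np.random.default_rng(1)
centre = rng.normal(20.0, 5.0, size=20000)
observed = centre + rng.normal(0.0, 1.0, size=centre.shape)
members = centre + rng.normal(0.0, 1.0, size=(10, centre.size))
rmse, spread, expected_ratio = spread_skill(members, observed)
print(f"RMSE {rmse:.3f}, spread {spread:.3f}, ratio {rmse / spread:.3f} (calibrated: {expected_ratio:.3f})")
```

A ratio well above the calibrated value means the ensemble is under-dispersive (overconfident); well below means it is over-dispersive.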

---

## **76.5 Evaluation Metrics for Weather Forecasts**

Standard regression metrics (MAE, RMSE) are used, but meteorology also has specialised metrics:

- **Anomaly Correlation Coefficient (ACC)**: Measures how well the forecast captures anomalies from climatology.
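- **Continuous Ranked Probability Score (CRPS)**: For probabilistic forecasts, the integrated squared difference between the forecast cumulative distribution function and the step function that jumps from 0 to 1 at the observed value. For a deterministic forecast it reduces to the absolute error.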
- **Brier Score**: For binary events (e.g., rain yes/no).
- **Equitable Threat Score (ETS)**: For categorical forecasts.
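The two event-based scores are simple to compute from arrays. A minimal sketch, assuming forecast probabilities in [0, 1] and 0/1 outcomes for the Brier score, and boolean forecast/observed event arrays for the Equitable Threat Score (also known as the Gilbert skill score), which corrects the hit count for hits expected by chance. The rain threshold and the six days of numbers are illustrative.

```python
import numpy as np

def brier_score(prob, occurred):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    prob = np.asarray(prob, dtype=float)
    occurred = np.asarray(occurred, dtype=float)
    return np.mean((prob - occurred) ** 2)

def equitable_threat_score(forecast_yes, observed_yes):
    """ETS from boolean forecast and observed event arrays."""
    f = np.asarray(forecast_yes, dtype=bool)
    o = np.asarray(observed_yes, dtype=bool)
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    # Hits expected from a random forecast with the same event frequencies
    hits_random = (hits + misses) * (hits + false_alarms) / f.size
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom > 0 else np.nan

# Example: rain (precipitation > 1 mm) forecasts for 6 days
observed_precip = np.array([0.0, 3.2, 0.4, 7.5, 0.0, 1.8])
prob_rain = np.array([0.1, 0.8, 0.3, 0.9, 0.2, 0.4])
occurred = observed_precip > 1.0
bs = brier_score(prob_rain, occurred)
ets = equitable_threat_score(prob_rain >= 0.5, occurred)
print(f"Brier score: {bs:.3f}, ETS: {ets:.3f}")
```

Lower Brier scores are better (0 is perfect); ETS is 1 for a perfect forecast and 0 for one with no skill beyond chance.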

We'll implement a few of these for evaluation.

```python
from scipy.stats import pearsonr

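def anomaly_correlation(forecast, observed, climatology):
    """
    Compute the (centered) anomaly correlation coefficient: the Pearson
    correlation between forecast and observed anomalies from climatology.
    forecast, observed, climatology are 1D arrays of equal length.
    """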
    f_anom = forecast - climatology
    o_anom = observed - climatology
    corr, _ = pearsonr(f_anom, o_anom)
    return corr

def crps(forecast_dist, observed):
    """
    Ensemble CRPS estimator.
    forecast_dist: array of ensemble members, shape (n_members, n_points)
    observed: array of observations, shape (n_points,)
    Returns the CRPS averaged over points (lower is better).
    """
    forecast_dist = np.asarray(forecast_dist, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_members = forecast_dist.shape[0]
    # CRPS = (1/M) * sum_i |x_i - y|  -  (1/(2 M^2)) * sum_{i,j} |x_i - x_j|
    # This is the standard ensemble estimator; a "fair" variant divides the
    # second sum by 2 M (M - 1) instead.
    abs_error = np.abs(forecast_dist - observed).mean(axis=0)
    # Pairwise member differences via broadcasting: shape (M, M, n_points)
    pairwise = np.abs(forecast_dist[:, None, :] - forecast_dist[None, :, :]).sum(axis=(0, 1))
    spread_term = pairwise / (2 * n_members**2)
    return (abs_error - spread_term).mean()
```

**Explanation:**

- ACC requires a climatology baseline (e.g., long‑term average for each day of year). It's widely used in operational weather forecasting.
- CRPS is a proper scoring rule for probabilistic forecasts. Our implementation uses the standard ensemble estimator; for parametric forecasts, such as a Gaussian, closed‑form expressions are available.
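When a forecast is expressed as a Gaussian (for instance, an ensemble mean and standard deviation, or a model that predicts both), the CRPS has a well-known closed form. The sketch below implements it with `scipy.stats.norm` and checks it against the sample-based definition E|X - y| - 0.5 E|X - X'| using a large random sample; the function name `crps_gaussian` and the example numbers are assumptions of this illustration.

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, observed):
    """Closed-form CRPS of N(mu, sigma^2) forecasts against observations (mean over points)."""
    mu, sigma, observed = (np.asarray(a, dtype=float) for a in (mu, sigma, observed))
    z = (observed - mu) / sigma
    score = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return score.mean()

# Sanity check against the sample-based definition
rng = np.random.default_rng(0)
mu, sigma, y = 15.0, 2.0, 17.0
x = rng.normal(mu, sigma, size=200000)
x_prime = rng.normal(mu, sigma, size=200000)
mc = np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x - x_prime))
print(crps_gaussian(mu, sigma, y), mc)
```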

---

## **76.6 Backtesting and Validation**

We'll use walk‑forward validation (expanding window) to simulate realistic forecast conditions.

```python
class WeatherBacktester:
    def __init__(self, forecaster, target_var='temperature',
                 initial_train_days=365, step_days=1):
        """
        Walk-forward (expanding window) backtest.
        The forecaster is retrained every step_days and then forecasts each date in
        the following block of step_days dates. Assumes forecast_horizon=1; for a
        horizon h > 1, drop the last h-1 training dates to avoid target leakage.
        """
        self.forecaster = forecaster
        self.target_var = target_var
        self.initial_train_days = initial_train_days
        self.step_days = step_days
    
    def run(self, df, n_splits=3, n_models=3):
        df = df.sort_values(['date', 'station_id']).reset_index(drop=True)
        unique_dates = np.sort(df['date'].unique())
        target_col = f'target_{self.target_var}'
        results = []
        
        for i in range(self.initial_train_days, len(unique_dates), self.step_days):
            train_end_date = unique_dates[i - 1]
            test_dates = unique_dates[i:i + self.step_days]
            
            train_df = df[df['date'] <= train_end_date]
            test_df = df[df['date'].isin(test_dates)]
            if test_df.empty:
                continue
            
            # Retrain on the expanding window (slow when step_days=1; increase it to speed up)
            X_train, y_train = self.forecaster.prepare_data(train_df, self.target_var)
            X_test, y_test = self.forecaster.prepare_data(test_df, self.target_var)
            self.forecaster.train_ensemble(X_train, y_train, n_splits=n_splits, n_models=n_models)
            y_pred = self.forecaster.predict(X_test)
            
            # Store predictions and errors for this block
            out = test_df[['station_id', 'date', target_col]].copy()
            out['predicted'] = y_pred
            out['error'] = y_pred - y_test.to_numpy()
            out['abs_error'] = np.abs(out['error'])
            out['train_end_date'] = train_end_date
            results.append(out)
        
        if not results:
            raise ValueError("No test dates after the initial training period.")
        return pd.concat(results, ignore_index=True)
```
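The date bookkeeping inside `run` can be isolated into a small generator, which also makes room for a gap when the horizon is longer than one day: a training row dated t has a target at t + h, and that target must already be observed when the forecast is issued. `walk_forward_splits` is a hypothetical helper, not part of the backtester above.

```python
import pandas as pd

def walk_forward_splits(dates, initial_train_days, step_days=1, horizon=1):
    """Yield (train_dates, test_dates) for an expanding-window backtest."""
    dates = pd.DatetimeIndex(dates).unique().sort_values()
    for i in range(initial_train_days, len(dates), step_days):
        # Forecasts are issued on dates[i]; training targets (t + horizon) must be
        # observed by then, so the last horizon-1 candidate training dates are skipped
        train_dates = dates[: i - (horizon - 1)]
        test_dates = dates[i : i + step_days]
        yield train_dates, test_dates

dates = pd.date_range('2024-01-01', periods=10, freq='D')
splits_h1 = list(walk_forward_splits(dates, initial_train_days=5, step_days=2, horizon=1))
splits_h3 = list(walk_forward_splits(dates, initial_train_days=5, step_days=2, horizon=3))
for train_dates, test_dates in splits_h1:
    print(train_dates[-1].date(), '->', [d.date() for d in test_dates])
```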

---

## **76.7 Integration with the Prediction System**

We can now integrate the weather components into our existing pipeline. The flow is:

1. **Ingestion**: Load station data (CSV, NetCDF) using a custom `WeatherIngestion` class (similar to `RetailIngestion`).
2. **Feature engineering**: Use `WeatherFeatureEngineer`.
3. **Model training**: Use `WeatherForecaster` with time‑series CV.
4. **Backtesting**: Use `WeatherBacktester`.
5. **Deployment**: Batch prediction for all stations, possibly a REST API for a single station.
6. **Monitoring**: Track forecast errors, detect drift (e.g., if model performance degrades during extreme events).
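For the monitoring step, one simple drift signal is a rolling error compared with the error seen during backtesting. The sketch below assumes a results table with `date` and `abs_error` columns, like the one `WeatherBacktester.run` returns; the function name, the 30-day window, and the 1.5x threshold are illustrative assumptions, not recommended values.

```python
import numpy as np
import pandas as pd

def rolling_error_alerts(results, baseline_mae, window=30, threshold=1.5):
    """Flag dates where the trailing-window MAE exceeds threshold * baseline_mae."""
    daily = results.groupby('date')['abs_error'].mean().sort_index()
    rolling = daily.rolling(window, min_periods=window).mean()
    out = pd.DataFrame({'daily_mae': daily, 'rolling_mae': rolling})
    out['alert'] = out['rolling_mae'] > threshold * baseline_mae
    return out

# Synthetic example: the daily error jumps from 1 to 3 after 60 days
dates = pd.date_range('2023-01-01', periods=120, freq='D')
abs_err = np.r_[np.full(60, 1.0), np.full(60, 3.0)]
results = pd.DataFrame({'date': dates, 'abs_error': abs_err})
alerts = rolling_error_alerts(results, baseline_mae=1.0)
print(alerts[alerts['alert']].index.min())
```

A longer window reacts more slowly but raises fewer false alarms during short extreme events.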

We'll sketch a deployment script for daily batch prediction:

```python
class WeatherBatchPredictor:
    def __init__(self, model, feature_engineer, feature_columns, categorical_features):
        self.model = model
        self.feature_engineer = feature_engineer
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features
    
    def predict_next_day(self, df_history):
        """
        df_history: raw observations (the columns produced by generate_weather_data)
        for all stations, covering at least the 15 most recent days (the longest lag
        is 14 days) up to and including today.
        Returns one prediction per station for tomorrow (assumes forecast_horizon=1).
        """
        today = df_history['date'].max()
        # compute_features() builds a shifted target and drops rows where it is NaN,
        # which would discard today's rows. Appending a placeholder copy of today's
        # rows dated tomorrow gives today a (dummy) target; lag, rolling and diff
        # features only look backwards, so the placeholder never affects today's features.
        placeholder = df_history[df_history['date'] == today].assign(date=today + pd.Timedelta(days=1))
        df_ext = pd.concat([df_history, placeholder], ignore_index=True)
        df_features = self.feature_engineer.compute_features(df_ext)
        
        # Keep only today's feature vectors: they are the inputs for tomorrow's forecast
        df_today_features = df_features[df_features['date'] == today]
        if df_today_features.empty:
            raise ValueError("No complete feature rows for today; supply more history.")
        X = df_today_features[self.feature_columns].copy()
        for col in self.categorical_features:
            if col in X.columns:
                X[col] = X[col].astype('category')
        preds = self.model.predict(X)
        # Return DataFrame with station_id and prediction
        result = df_today_features[['station_id']].copy()
        result['predicted_tomorrow'] = preds
        return result.reset_index(drop=True)
```

---

## **76.8 Case Study: Forecasting Temperature at Multiple Stations**

Let's run a complete example with our synthetic data.

```python
# Generate data
weather_df = generate_weather_data(num_stations=5, days=365*3)

# Feature engineering
engineer = WeatherFeatureEngineer()
featured_df = engineer.compute_features(weather_df, target_var='temperature', forecast_horizon=1)
print(featured_df.head())

# Define categorical features
cat_features = ['station_id', 'month']

# Prepare forecaster
forecaster = WeatherForecaster(engineer.feature_columns, categorical_features=cat_features, n_estimators=100)

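# Split into train/val/test (time-based). The ensemble runs its own time-series
# validation inside the training data, so train and val are combined for fitting.
train_df = featured_df[featured_df['date'] < '2022-01-01']
val_df = featured_df[(featured_df['date'] >= '2022-01-01') & (featured_df['date'] < '2022-07-01')]
test_df = featured_df[featured_df['date'] >= '2022-07-01']

# Concatenate the frames before prepare_data: concatenating categorical columns
# whose category sets differ would silently fall back to object dtype.
# Sorting by date keeps TimeSeriesSplit chronological.
trval_df = pd.concat([train_df, val_df]).sort_values(['date', 'station_id'])
X_trval, y_trval = forecaster.prepare_data(trval_df)
X_test, y_test = forecaster.prepare_data(test_df)

# Train ensemble
forecaster.train_ensemble(X_trval, y_trval, n_splits=3, n_models=3)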

# Predict on test
y_pred = forecaster.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test MAE: {mae:.2f}, RMSE: {rmse:.2f}")

# Anomaly correlation against a day-of-year climatology from the training period.
# With only two training years this climatology is noisy; operational
# climatologies typically span about 30 years.
clim_df = train_df.groupby('doy')['target_temperature'].mean().reset_index()
# A left merge keeps test_df's row order, so the climatology stays aligned with y_pred
test_with_clim = test_df.merge(clim_df, on='doy', how='left', suffixes=('', '_clim'))
acc = anomaly_correlation(y_pred, y_test.to_numpy(), test_with_clim['target_temperature_clim'].to_numpy())
print(f"Anomaly Correlation: {acc:.3f}")
```
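The `anomaly_correlation` helper is the centered form: `pearsonr` subtracts the mean anomaly before correlating, so a constant bias in the forecast anomalies goes unpenalised. An uncentered form, which correlates the anomalies directly, is also widely reported. A minimal numpy-only sketch comparing the two; the function names and the 2-degree bias are arbitrary choices for illustration.

```python
import numpy as np

def acc_centered(forecast, observed, climatology):
    """Pearson correlation of anomalies (mean anomaly removed)."""
    f = np.asarray(forecast, dtype=float) - climatology
    o = np.asarray(observed, dtype=float) - climatology
    f, o = f - f.mean(), o - o.mean()
    return np.sum(f * o) / np.sqrt(np.sum(f**2) * np.sum(o**2))

def acc_uncentered(forecast, observed, climatology):
    """Correlation of anomalies without removing their means."""
    f = np.asarray(forecast, dtype=float) - climatology
    o = np.asarray(observed, dtype=float) - climatology
    return np.sum(f * o) / np.sqrt(np.sum(f**2) * np.sum(o**2))

rng = np.random.default_rng(0)
clim = np.full(365, 15.0)
obs = clim + rng.normal(0.0, 3.0, 365)
biased = obs + 2.0   # forecast = observation + constant warm bias
print(acc_centered(biased, obs, clim), acc_uncentered(biased, obs, clim))
```

The centered score is a perfect 1.0 despite the bias, while the uncentered score drops, which is why MAE or RMSE should always be reported alongside ACC.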

---

## **76.9 Lessons Learned from Weather Prediction**

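1. **Spatial context matters**: In real atmospheric data, neighbouring stations carry information about approaching weather systems, so neighbour features are usually worth including. Our synthetic stations are independent, so this chapter's example cannot demonstrate the benefit.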
2. **Uncertainty quantification is essential**: Deterministic forecasts are rarely sufficient; ensembles or probabilistic methods provide valuable confidence intervals.
3. **Seasonal cycles must be handled carefully**: Simple sin/cos encoding works well, but climate change may introduce non‑stationarity.
4. **Model interpretability**: Meteorologists often need to understand why a forecast was made; SHAP values can help.
5. **Data quality is paramount**: Missing data, instrument errors, and changes in station location must be tracked.
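A first line of defence for data quality is automatic flagging. The sketch below applies two common checks to the tidy station table: a plausible-range check and a day-to-day step check per station. The function name and the limits (-60 °C to 50 °C, 15 °C per day) are placeholder assumptions; real networks set them per variable, station, and season.

```python
import numpy as np
import pandas as pd

def flag_temperature_qc(df, lower=-60.0, upper=50.0, max_step=15.0):
    """Add boolean QC flags; the observations themselves are not modified or dropped."""
    df = df.sort_values(['station_id', 'date']).copy()
    # Missing values also fail the range check, because between() returns False for NaN
    df['qc_range'] = ~df['temperature'].between(lower, upper)
    step = df.groupby('station_id')['temperature'].diff().abs()
    df['qc_step'] = step > max_step
    df['qc_any'] = df['qc_range'] | df['qc_step']
    return df

obs = pd.DataFrame({
    'station_id': [1, 1, 1, 1, 2, 2],
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
                            '2024-01-01', '2024-01-02']),
    'temperature': [5.0, 6.0, 99.0, 7.0, -2.0, -1.0],
})
flags = flag_temperature_qc(obs)
print(flags[['station_id', 'date', 'temperature', 'qc_range', 'qc_step']])
```

Note that the step check also flags the day after a spike, because the return to normal is itself a large jump; flagged values are best reviewed rather than deleted automatically.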

---

## **76.10 Future Directions**

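- **Deep learning**: Architectures such as ConvLSTM, graph neural networks (suited to irregular station networks and spherical meshes; GraphCast is one example), and Transformers (e.g., Pangu‑Weather) have produced strong data‑driven forecasts and are active research areas.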
- **Integration with NWP**: Use NWP output as features (e.g., from GFS) and apply ML for post‑processing (MOS – Model Output Statistics).
- **High‑resolution forecasting**: Downscaling coarse NWP to local stations.
- **Climate projections**: Extending to seasonal or decadal timescales.
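Model Output Statistics in its simplest form is a linear regression of observed values on the NWP forecast for the same station and lead time, which removes systematic bias and amplitude errors. A minimal sketch with synthetic numbers: the "raw NWP" series is generated with a deliberate cold bias and damped amplitude (an assumption of the example, not a property of any real model), and the correction is fitted on the first year and applied to the second. `fit_mos` is a hypothetical helper name.

```python
import numpy as np

def fit_mos(nwp, observed):
    """Least-squares fit of observed ~ a + b * nwp; returns (a, b)."""
    A = np.column_stack([np.ones_like(nwp), nwp])
    (a, b), *_ = np.linalg.lstsq(A, observed, rcond=None)
    return a, b

rng = np.random.default_rng(0)
truth = 15 + 8 * np.sin(np.linspace(0, 4 * np.pi, 730)) + rng.normal(0, 1.5, 730)
nwp = 0.8 * truth - 1.0 + rng.normal(0, 1.0, 730)  # damped and cold-biased raw forecast

train, test = slice(0, 365), slice(365, 730)
a, b = fit_mos(nwp[train], truth[train])
corrected = a + b * nwp[test]

raw_mae = np.mean(np.abs(nwp[test] - truth[test]))
mos_mae = np.mean(np.abs(corrected - truth[test]))
print(f"a={a:.2f}, b={b:.2f}, raw MAE={raw_mae:.2f}, MOS MAE={mos_mae:.2f}")
```

Replacing the linear fit with a gradient boosting model that also takes station and calendar features is the ML post-processing route described above.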

---

## **Chapter Summary**

In this chapter, we adapted our time‑series prediction system to the domain of weather forecasting. We generated synthetic station data, engineered temporal, lag, rolling, and derived features, trained an ensemble of gradient boosting models, and evaluated using weather‑specific metrics. We discussed the importance of spatial context and uncertainty quantification. The pipeline remains flexible and can be extended to more sophisticated models and real datasets.

This concludes our exploration of three diverse domains: finance (NEPSE), retail, and weather. The principles of robust data ingestion, feature engineering, model validation, and deployment apply universally.

In the next chapter, we will dive into **Healthcare Prediction Systems**, addressing unique challenges such as privacy, regulatory compliance, and model interpretability.

---

**End of Chapter 76**

