# Advanced Time Series Analysis: Traffic Accident Risk Prediction

This notebook implements advanced time series analysis techniques to create a practical model for predicting traffic accident risks. We'll learn how to:

* Analyze temporal patterns in accident data
* Implement multiple forecasting approaches (SARIMA and Prophet)
* Create a production-ready prediction model
* Visualize results in PowerBI

## Prerequisites

Before starting this tutorial, ensure you have:
- Basic understanding of Python programming
- Familiarity with pandas and numpy
- PowerBI Desktop installed (for visualization)

## Required Libraries

Let's start by importing all the libraries we'll need for our analysis:

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
from prophet import Prophet
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt
from typing import Dict
from datetime import datetime

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

## Data Loading and Preparation

First, let's define our TimeSeriesAnalyser class that will handle all our analysis:

In [2]:
class TimeSeriesAnalyser:
    """Handles time series analysis of accident data."""
    
    def __init__(self):
        self.prophet_model = None
        self.sarima_model = None
        self.sarima_results = None  # set by fit_sarima
        self.seasonal_patterns = {}
        self.weather_columns = None
        self.weather_means = None
    
    def analyse_seasonality(self, data: pd.DataFrame) -> Dict:
        """Analyse seasonal patterns in the data."""
        patterns = {
            'hourly': data.groupby('hour').size(),
            'daily': data.groupby('day_of_week').size(),
            'monthly': data.groupby('month').size()
        }
        
        self.seasonal_patterns = {
            'peak_hour': patterns['hourly'].idxmax(),
            'peak_day': patterns['daily'].idxmax(),
            'peak_month': patterns['monthly'].idxmax()
        }
        
        return self.seasonal_patterns
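
`analyse_seasonality` groups by `hour`, `day_of_week`, and `month` columns, which the notebook does not create. As a minimal sketch, assuming a hypothetical `timestamp` column that holds the time of each accident (the column name and the sample rows are illustrative, not taken from the dataset), the columns could be derived and the peaks computed like this:

```python
import pandas as pd

# Hypothetical sample: one row per accident; `timestamp` is an assumed column name
sample = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2023-01-02 08:15', '2023-01-02 08:40', '2023-01-03 17:05',
        '2023-01-09 08:55', '2023-02-06 12:30',
    ])
})

# Derive the calendar columns that analyse_seasonality groups by
sample['hour'] = sample['timestamp'].dt.hour
sample['day_of_week'] = sample['timestamp'].dt.dayofweek  # Monday=0 ... Sunday=6
sample['month'] = sample['timestamp'].dt.month            # January=1 ... December=12

# Same pattern as the class: count rows per group, then take the largest group
peak_hour = sample.groupby('hour').size().idxmax()
peak_day = sample.groupby('day_of_week').size().idxmax()
peak_month = sample.groupby('month').size().idxmax()
print(peak_hour, peak_day, peak_month)
```

The integer encodings matter later: the plotting code indexes a list of day names with `peak_day` and a list of month names with `peak_month - 1`.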

Next, let's add methods for visualizing our seasonal patterns:

In [3]:
def visualise_seasonality(self, data: pd.DataFrame):
    """Create visualisations of seasonal patterns."""
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 12))
    
    # Hourly pattern
    hourly = data.groupby('hour').size()
    ax1.bar(hourly.index, hourly.values, color='skyblue')
    ax1.set_title('Hourly Distribution of Accidents')
    ax1.set_xlabel('Hour of Day')
    ax1.set_ylabel('Number of Accidents')
    ax1.grid(True, alpha=0.3)
    
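    # Daily pattern (day_of_week is expected as 0=Monday ... 6=Sunday)
    # Reindex so that all seven days are present and line up with the labels
    daily = data.groupby('day_of_week').size().reindex(range(7), fill_value=0)
    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    ax2.bar(days, daily.values, color='lightgreen')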
    ax2.set_title('Daily Distribution of Accidents')
    ax2.set_xlabel('Day of Week')
    ax2.set_ylabel('Number of Accidents')
    plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45)
    ax2.grid(True, alpha=0.3)
    
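    # Monthly pattern (month is expected as 1=January ... 12=December)
    # Reindex so that all twelve months are present and line up with the labels
    monthly = data.groupby('month').size().reindex(range(1, 13), fill_value=0)
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ax3.bar(months, monthly.values, color='salmon')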
    ax3.set_title('Monthly Distribution of Accidents')
    ax3.set_xlabel('Month')
    ax3.set_ylabel('Number of Accidents')
    plt.setp(ax3.xaxis.get_majorticklabels(), rotation=45)
    ax3.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
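    # Print summary statistics (requires analyse_seasonality to have been run)
    print("\nSeasonal Pattern Summary:")
    print(f"Peak hour: {self.seasonal_patterns['peak_hour']:02d}:00")
    print(f"Peak day: {days[self.seasonal_patterns['peak_day']]}")
    print(f"Peak month: {months[self.seasonal_patterns['peak_month']-1]}")

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.visualise_seasonality = visualise_seasonality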

## Model Implementation

Now let's implement our prediction models, starting with Prophet. The Prophet model is particularly good at handling multiple seasonality patterns and incorporating external regressors like weather conditions.

In [4]:
def fit_prophet(self, data: pd.DataFrame):
    """Implement Prophet model for accident prediction.
    
    Prophet handles multiple seasonal patterns and can incorporate
    additional regressors like weather conditions. We configure it
    with multiplicative seasonality to account for varying amplitudes.
    """
    # Prepare data for Prophet
    prophet_data = data.groupby('date').size().reset_index()
    prophet_data.columns = ['ds', 'y']
    
    # Add weather as a regressor
    weather_dummies = pd.get_dummies(data['weather_conditions'], prefix='weather')
    daily_weather = weather_dummies.groupby(data['date']).mean()
    prophet_data = prophet_data.join(daily_weather, on='ds')
    
    # Store weather columns for prediction
    self.weather_columns = daily_weather.columns
    self.weather_means = daily_weather.mean()
    
    # Initialise Prophet with yearly and weekly seasonality. Daily (within-day)
    # seasonality is disabled because the series has only one value per day.
    self.prophet_model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,
        seasonality_mode='multiplicative'
    )
    
    # Add weather regressors
    for column in self.weather_columns:
        self.prophet_model.add_regressor(str(column))
    
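    # Fit the model
    self.prophet_model.fit(prophet_data)

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.fit_prophet = fit_prophet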
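
To make the data preparation concrete, here is a small pandas-only sketch of the frame that `fit_prophet` builds before fitting. The records are invented for illustration; only the column layout (`ds`, `y`, and one `weather_*` column per condition) mirrors the method above.

```python
import pandas as pd

# Hypothetical accident records: one row per accident
records = pd.DataFrame({
    'date': pd.to_datetime([
        '2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-02',
    ]),
    'weather_conditions': ['rain', 'clear', 'rain', 'rain', 'rain'],
})

# Target: number of accidents per day, in Prophet's ds/y layout
frame = records.groupby('date').size().reset_index()
frame.columns = ['ds', 'y']

# Regressors: share of each day's accidents recorded under each weather condition
dummies = pd.get_dummies(records['weather_conditions'], prefix='weather')
daily_weather = dummies.groupby(records['date']).mean()
frame = frame.join(daily_weather, on='ds')
print(frame)
```

Each `weather_*` value is a proportion among that day's recorded accidents, not an independent measurement of the day's weather, which is worth keeping in mind when interpreting the fitted regressor effects.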

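Next, let's implement our SARIMA model. SARIMA (Seasonal Autoregressive Integrated Moving Average) extends ARIMA with seasonal autoregressive, differencing, and moving-average terms, so it can model dependence on both the most recent days and the same point in the previous seasonal cycle.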

In [5]:
def fit_sarima(self, data: pd.DataFrame):
    """Implement SARIMA model following time series best practices.
    
    We configure SARIMA with parameters that capture:
    - Short-term dependencies (AR and MA components)
    - Trend (Integration)
    - Seasonal patterns
    """
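    # Prepare daily counts on a regular daily index; dates with no
    # recorded accidents become zero counts
    daily_counts = data.groupby('date').size().asfreq('D', fill_value=0)
    
    # Initialise SARIMA. The series is daily, so the seasonal period is 7
    # (a weekly cycle); a period of 12 would suit monthly data instead.
    self.sarima_model = SARIMAX(
        daily_counts,
        order=(2, 1, 1),             # Non-seasonal (p, d, q)
        seasonal_order=(1, 1, 1, 7)  # Seasonal (P, D, Q, s) with s = 7 days
    )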
    
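    # Fit the model and store results
    self.sarima_results = self.sarima_model.fit()
    
    return self.sarima_results

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.fit_sarima = fit_sarima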
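
The last element of `seasonal_order` is the seasonal period, measured in observations. One rough way to sanity-check a candidate period on daily counts is to place the series on a regular daily index and compare autocorrelations at the candidate lags. The sketch below uses synthetic data with a built-in weekly cycle; the numbers are invented and only illustrate the check.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts with a weekly cycle (illustrative data only)
rng = np.random.default_rng(0)
days = pd.date_range('2023-01-01', periods=140, freq='D')
weekly_shape = np.array([30, 32, 31, 33, 40, 25, 20])  # one level per weekday
noise = rng.integers(-2, 3, size=len(days))
counts = pd.Series(weekly_shape[days.dayofweek.to_numpy()] + noise, index=days)

# Drop a few days to mimic dates with no recorded accidents, then restore
# a regular daily index so that lag k always means "k days earlier"
sparse = counts.drop(counts.index[[10, 55, 90]])
regular = sparse.asfreq('D', fill_value=0)

# Autocorrelation at two candidate seasonal lags
acf_weekly = regular.autocorr(lag=7)
acf_lag12 = regular.autocorr(lag=12)
print(f"lag 7: {acf_weekly:.2f}, lag 12: {acf_lag12:.2f}")
```

A clearly higher autocorrelation at lag 7 than at lag 12 would support a weekly seasonal period for daily data. A full autocorrelation plot gives a more complete picture than two lags.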

Now let's implement methods to generate predictions from our models:

In [6]:
def predict_prophet(self, dates):
    """Generate Prophet predictions including weather effects.
    
    This method creates a prediction dataframe with appropriate
    weather conditions (using historical averages) and generates
    forecasts with uncertainty intervals.
    """
    # Create future dataframe
    future = pd.DataFrame({'ds': dates})
    
    # Add weather regressors with mean values
    for column in self.weather_columns:
        future[column] = self.weather_means[column]
    
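    return self.prophet_model.predict(future)

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.predict_prophet = predict_prophet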
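
As a pandas-only illustration of the frame that `predict_prophet` passes to Prophet, the sketch below builds a seven-day horizon and fills the weather regressors with stored averages. The start date and the average values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical stored averages, standing in for self.weather_means
weather_means = pd.Series({'weather_clear': 0.7, 'weather_rain': 0.3})

# Seven consecutive days starting at an assumed forecast start date
future = pd.DataFrame({'ds': pd.date_range('2023-07-01', periods=7, freq='D')})

# Every future day gets the same historical-average weather mix
for column in weather_means.index:
    future[column] = weather_means[column]
print(future.head())
```

Because the regressor values are identical on every row, the weather terms contribute the same amount to each forecast day; day-to-day variation in the forecast comes only from trend and seasonality.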

## Model Evaluation

Let's create comprehensive evaluation methods to compare our models' performance:

In [7]:
def evaluate_models(self, test_data: pd.DataFrame) -> Dict:
    """Evaluate model performance using multiple metrics.
    
    This method calculates several key performance metrics:
    - RMSE (Root Mean Square Error)
    - MAE (Mean Absolute Error)
    - Prediction intervals coverage
    """
    # Prepare actual values
    daily_actuals = test_data.groupby('date').size()
    test_dates = daily_actuals.index
    
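    # Get predictions from both models
    prophet_preds = self.predict_prophet(test_dates)
    # SARIMA predicts every day in the range, so keep only the dates
    # that actually appear in the test data
    sarima_preds = self.sarima_results.predict(
        start=test_dates[0],
        end=test_dates[-1]
    ).reindex(test_dates)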
    
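    # Share of actuals that fall inside Prophet's uncertainty interval
    actual_values = daily_actuals.to_numpy()
    within_interval = (
        (actual_values >= prophet_preds['yhat_lower'].to_numpy())
        & (actual_values <= prophet_preds['yhat_upper'].to_numpy())
    )
    
    # Calculate metrics
    metrics = {
        'Prophet': {
            'RMSE': np.sqrt(mean_squared_error(daily_actuals, prophet_preds['yhat'])),
            'MAE': mean_absolute_error(daily_actuals, prophet_preds['yhat']),
            'Interval_Coverage': float(within_interval.mean())
        },
        'SARIMA': {
            'RMSE': np.sqrt(mean_squared_error(daily_actuals, sarima_preds)),
            'MAE': mean_absolute_error(daily_actuals, sarima_preds)
        }
    }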
    
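    return metrics

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.evaluate_models = evaluate_models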
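
For readers who want to see what these metrics measure, here is a small worked example with NumPy. The values are toy numbers chosen for easy arithmetic, and the fixed-width interval is an assumption for illustration (a fitted model would supply its own bounds).

```python
import numpy as np

# Toy values (illustrative only): observed daily counts, point forecasts, interval bounds
actual = np.array([10.0, 12.0, 9.0, 15.0])
forecast = np.array([11.0, 10.0, 9.0, 12.0])
lower = forecast - 2.5
upper = forecast + 2.5

errors = actual - forecast                 # [-1, 2, 0, 3]
rmse = np.sqrt(np.mean(errors ** 2))       # squares errors, so large misses weigh more
mae = np.mean(np.abs(errors))              # average miss, in accidents per day
coverage = np.mean((actual >= lower) & (actual <= upper))  # share of actuals inside the interval
print(rmse, mae, coverage)
```

Here the mean squared error is 14 / 4 = 3.5, so RMSE is about 1.87, MAE is 1.5, and three of the four actual values fall inside their interval, giving a coverage of 0.75.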

## PowerBI Integration

Let's create methods to export our results in a PowerBI-friendly format:

In [8]:
def export_for_powerbi(self, predictions: pd.DataFrame,
                      actual_data: pd.Series) -> Dict:
    """Prepare and export results for PowerBI visualization.
    
    Creates a structured set of dataframes that can be easily
    imported into PowerBI for interactive visualization.
    """
    # Prepare predictions table
    powerbi_predictions = pd.DataFrame({
        'Date': predictions.index,
        'Prophet_Prediction': predictions['prophet_pred'],
        'SARIMA_Prediction': predictions['sarima_pred'],
        'Actual_Value': actual_data,
        'Year': predictions.index.year,
        'Month': predictions.index.month,
        'Day': predictions.index.day,
        'DayOfWeek': predictions.index.dayofweek
    })
    
    # Prepare seasonal patterns table
    seasonal_patterns = pd.DataFrame({
        'Pattern_Type': ['Hourly', 'Daily', 'Monthly'],
        'Peak_Time': [
            self.seasonal_patterns['peak_hour'],
            self.seasonal_patterns['peak_day'],
            self.seasonal_patterns['peak_month']
        ]
    })
    
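    return {
        'predictions': powerbi_predictions,
        'patterns': seasonal_patterns
    }

# This cell defines the method outside the class body, so attach it explicitly
TimeSeriesAnalyser.export_for_powerbi = export_for_powerbi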
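
`export_for_powerbi` returns DataFrames in memory; PowerBI Desktop still needs a file or other data source to load them from. One simple option is to write each table to CSV and import it through PowerBI's Text/CSV connector. The helper below is a sketch: `write_tables` is a hypothetical name, and the sample tables are made up to mirror the structure returned above.

```python
import tempfile
from pathlib import Path

import pandas as pd

def write_tables(tables, out_dir):
    """Write each DataFrame in `tables` to <out_dir>/<name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, table in tables.items():
        path = out_dir / f"{name}.csv"
        # ISO-formatted dates and no index column keep the file simple to import
        table.to_csv(path, index=False, date_format='%Y-%m-%d')
        paths[name] = path
    return paths

# Stand-in for the dictionary returned by export_for_powerbi (values are made up)
tables = {
    'predictions': pd.DataFrame({
        'Date': pd.to_datetime(['2023-07-01', '2023-07-02']),
        'Prophet_Prediction': [31.2, 28.9],
        'SARIMA_Prediction': [30.5, 29.4],
    }),
    'patterns': pd.DataFrame({'Pattern_Type': ['Hourly'], 'Peak_Time': [8]}),
}
paths = write_tables(tables, tempfile.mkdtemp())
print(paths)
```

Writing one file per table keeps the predictions and the seasonal-pattern summary as separate queries in PowerBI.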

## Example Usage

Let's demonstrate how to use our complete analysis framework:

In [9]:
def main():
    """Demonstrate the complete accident prediction workflow."""
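    # Load and prepare data
    data = pd.read_csv('accident_data.csv')
    data['date'] = pd.to_datetime(data['date'])
    
    # Hold out the most recent days so that evaluation uses unseen data
    # (the 80/20 proportion is an arbitrary choice for this demonstration)
    unique_dates = data['date'].drop_duplicates().sort_values()
    cutoff = unique_dates.iloc[int(len(unique_dates) * 0.8)]
    train = data[data['date'] < cutoff]
    test = data[data['date'] >= cutoff]
    
    # Initialise analyser
    analyser = TimeSeriesAnalyser()
    
    # Analyse patterns
    patterns = analyser.analyse_seasonality(train)
    analyser.visualise_seasonality(train)
    
    # Train models
    print("Training models...")
    analyser.fit_prophet(train)
    analyser.fit_sarima(train)
    
    # Generate and evaluate predictions
    evaluation_results = analyser.evaluate_models(test)
    
    # Assemble predictions and actuals for export
    actual_data = test.groupby('date').size()
    prophet_forecast = analyser.predict_prophet(actual_data.index)
    sarima_forecast = analyser.sarima_results.predict(
        start=actual_data.index[0], end=actual_data.index[-1]
    )
    predictions = pd.DataFrame({
        'prophet_pred': prophet_forecast['yhat'].to_numpy(),
        'sarima_pred': sarima_forecast.reindex(actual_data.index).to_numpy()
    }, index=actual_data.index)
    
    # Export for PowerBI
    powerbi_data = analyser.export_for_powerbi(predictions, actual_data)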
    
    return analyser, evaluation_results, powerbi_data
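
The nested dictionary returned by `evaluate_models` is easier to compare as a table. A minimal sketch, using made-up metric values in the same shape:

```python
import pandas as pd

# Made-up numbers in the shape returned by evaluate_models
evaluation_results = {
    'Prophet': {'RMSE': 4.2, 'MAE': 3.1},
    'SARIMA': {'RMSE': 4.8, 'MAE': 3.4},
}

# One row per model, one column per metric
comparison = pd.DataFrame(evaluation_results).T
best_by_rmse = comparison['RMSE'].idxmin()
print(comparison)
print(f"Lowest RMSE: {best_by_rmse}")
```

Lower is better for both RMSE and MAE, so `idxmin` picks the model with the smallest error on the chosen metric.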

## Conclusion

This notebook has implemented a framework for traffic accident prediction using time series analysis. Prophet and SARIMA provide two independent forecasts of daily accident counts, and the seasonal pattern analysis highlights the hours, weekdays, and months with the most accidents, all of which can inform traffic safety planning.

Key features of our implementation:
1. Multiple modeling approaches for robust predictions
2. Comprehensive seasonal pattern analysis
3. Integration with PowerBI for interactive visualization
4. Production-ready code structure

For practical application, consider:
- Regular retraining of models with new data
- Monitoring prediction accuracy over time
- Incorporating additional features, such as actual weather forecasts in place of the historical weather averages used at prediction time
- Setting up automated alerts for high-risk periods
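
As a rough sketch of the monitoring and alerting ideas above, the following uses a made-up log of forecasts and actual counts. The seven-day window and the alert threshold are assumptions for illustration and would need to be chosen from real data.

```python
import pandas as pd

# Made-up forecast log: actual counts next to the forecasts issued for those days
log = pd.DataFrame({
    'actual':   [30, 28, 35, 40, 22, 25, 31, 45],
    'forecast': [29, 30, 33, 36, 25, 24, 30, 38],
}, index=pd.date_range('2023-07-01', periods=8, freq='D'))

# Monitoring: rolling mean absolute error over the last 7 days
log['abs_error'] = (log['actual'] - log['forecast']).abs()
log['rolling_mae'] = log['abs_error'].rolling(window=7).mean()

# Alerting: flag days whose forecast exceeds a hypothetical threshold
ALERT_THRESHOLD = 35  # assumed value; would need to be set from local data
high_risk_days = log.index[log['forecast'] > ALERT_THRESHOLD]
print(log['rolling_mae'].iloc[-1])
print(list(high_risk_days.strftime('%Y-%m-%d')))
```

A sustained rise in the rolling error is one possible signal that the models need retraining.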