## Type of Data in This Time Series:
This dataset represents simulated monthly temperatures for six European capital cities—Berlin, Paris, London, Rome, Madrid, and Vienna—over a period of 20 years. Each city has its own time series of temperatures, with a distinction between winter (cooler months) and summer (warmer months).

- Winter months (January, February, March, October, November, December) are drawn from a normal distribution with a base mean of 5°C and a standard deviation of 6°C.
- Summer months (April through September) are drawn from a normal distribution with a base mean of 22°C and the same standard deviation of 6°C.

A linear warming trend of 0.7°C per year since 2003 is added to both base means.
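
Reading the parameters off the generating code further down, the simulation can be summarized in one line. The symbols are notation introduced here for illustration: $T_{c,t}$ is the temperature of city $c$ in month $t$, $s(t)$ is the season of that month, and $y(t)$ is its calendar year. Draws are independent across cities and months.

$$
T_{c,t} \sim \mathcal{N}\!\left(b_{s(t)} + 0.7\,\bigl(y(t) - 2003\bigr),\; 6^2\right),
\qquad b_{\text{winter}} = 5,\quad b_{\text{summer}} = 22
$$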

## Seasonality:
Seasonality refers to regular, predictable patterns in time series data that repeat over a specific time period (e.g., monthly, yearly). In this case, the seasonality is evident:

Temperatures are higher during the summer months (April–September) and lower during the winter months (October–March). In this simulation the pattern is a two-level step between the seasons rather than a smooth curve.
This seasonal cycle, typical of meteorological data, repeats annually.
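
One simple way to confirm an annual cycle is to average the series by calendar month. The sketch below does this on a small stand-in series (one hypothetical city, no trend, generated here only for illustration); the same `groupby` could be applied to a column of the notebook's `df`.

```python
import numpy as np
import pandas as pd

# Stand-in data for illustration: one city, 20 years, no warming trend.
rng = np.random.default_rng(0)
dates = pd.date_range("2003-01-01", periods=240, freq="MS")
is_summer = dates.month.isin([4, 5, 6, 7, 8, 9])
temps = pd.Series(
    np.where(is_summer, 22.0, 5.0) + rng.normal(0, 6, size=240),
    index=dates,
)

# Average over all years for each calendar month (1 = January, ..., 12 = December).
monthly_mean = temps.groupby(temps.index.month).mean()
print(monthly_mean.round(1))
```

If the series is seasonal, the twelve monthly means separate into a warm group and a cool group instead of scattering around one common value.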



In [30]:
import pandas as pd
import numpy as np
import plotly.express as px

# Generate date range: 20 years with 12 months per year (monthly data)
years = pd.date_range(start='2003-01-01', periods=240, freq='M')  # 240 month-end dates (20 years); pandas >= 2.2 prefers freq='ME'
# Simulate temperatures for 6 capital cities
city_names = ['Berlin', 'Paris', 'London', 'Rome', 'Madrid', 'Vienna']

# Create an empty DataFrame to store the temperatures
df = pd.DataFrame(index=years, columns=city_names)

# Generate temperature data with a trend: rising by 0.7°C per year on average
np.random.seed(42)
base_winter_temp = 5  # Base temperature for winter months
base_summer_temp = 22  # Base temperature for summer months
yearly_trend = 0.7  # Temperature increases by 0.7°C per year

for i, (year, month) in enumerate(zip(df.index.year, df.index.month)):
    # Calculate how many years have passed since 2003 to apply the temperature trend
    year_offset = year - 2003
    trend_adjustment = year_offset * yearly_trend

    if month in [1, 2, 3, 10, 11, 12]:  # Winter months
        # Apply trend adjustment to winter base temperature
        df.iloc[i] = np.random.normal(base_winter_temp + trend_adjustment, 6, size=6)
    else:  # Summer months (April - September)
        # Apply trend adjustment to summer base temperature
        df.iloc[i] = np.random.normal(base_summer_temp + trend_adjustment, 6, size=6)

# Round the temperatures to one decimal place
df = df.apply(pd.to_numeric)  # Ensure the DataFrame contains numeric values
df = df.round(1)  # Round to 1 decimal place

# Reshape df for Plotly Express, custom colors and symbols
df_reset = df.reset_index().melt(id_vars="index", var_name="City", value_name="Temperature")
df_reset.columns = ['Date', 'City', 'Temperature']

# Plot using Plotly Express
fig = px.scatter(df_reset, x='Date', y='Temperature', color='City', symbol='City',
              title="Temperatures in 6 European Capitals - Monthly for 20 years",
              labels={'Temperature': 'Temperature (°C)'}, trendline="ols", color_discrete_map={
                  'Berlin': '#636EFA',   # Medium Light Blue
                  'Paris': '#EF553B',    # Bright Red
                  'London': '#00CC96',   # Teal
                  'Rome': '#AB63FA',     # Light Purple
                  'Madrid': '#FFA15A',   # Vibrant Orange
                  'Vienna': '#19D3F3'    # Light Blue (Cyan)
                }) 

# trendline="ols" fits one ordinary least squares trendline per city (requires statsmodels)

# Show the plot
fig.show()


In [19]:
from statsmodels.tsa.vector_ar.var_model import VAR

# Train a VAR model on the multivariate time series
# Split into train and test sets
train_data = df[:-10]  # All but the last 10 periods for training
test_data = df[-10:]   # Last 10 periods for testing/validation

# Choose the lag order based on AIC, BIC, FPE, and HQIC
model = VAR(train_data)
lag_order = model.select_order(maxlags=15)  # test up to 15 lags
print(lag_order.summary())



 VAR Order Selection (* highlights the minimums)  
       AIC         BIC         FPE         HQIC   
--------------------------------------------------
0        24.01      24.10*   2.663e+10       24.04
1        23.58       24.24   1.739e+10      23.84*
2        23.73       24.95   2.026e+10       24.23
3        23.85       25.64   2.288e+10       24.57
4        23.94       26.29   2.500e+10       24.89
5        23.96       26.87   2.568e+10       25.13
6        24.01       27.49   2.743e+10       25.42
7        23.59       27.64   1.821e+10       25.23
8        23.53       28.14   1.741e+10       25.39
9        23.63       28.80   1.960e+10       25.72
10       23.68       29.42   2.126e+10       26.00
11       23.65       29.96   2.133e+10       26.20
12      23.39*       30.25  1.698e+10*       26.16
13       23.49       30.92   1.973e+10       26.49
14       23.53       31.53   2.182e+10       26.76
15       23.56       32.12   2.411e+10       27.02
--------------------------------------------------

## VAR Order Explanation:

The table reports four information criteria for VAR models with lag orders from 0 to 15. Each row corresponds to one lag order (up to the `maxlags=15` bound), each column to one criterion, and an asterisk marks the minimum of each column. Here AIC and FPE select 12 lags, HQIC selects 1 lag, and BIC selects 0 lags.

#### AIC (Akaike Information Criterion): 
A measure of model performance that penalizes complexity. Lower values indicate a better model. The model with the lowest AIC is preferred.

#### BIC (Bayesian Information Criterion): 
Similar to AIC, but it penalizes model complexity more heavily, so it tends to select fewer lags. Lower values indicate a better model.

#### FPE (Final Prediction Error): 
Estimates the prediction error of the model. The smaller the FPE, the better the model's performance.

#### HQIC (Hannan-Quinn Information Criterion): 
Another criterion that penalizes complexity but falls between AIC and BIC in terms of strictness. Lower values indicate a better model.
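
To make the difference in penalties concrete, here is a minimal sketch of the three log-based criteria in their common textbook form: the log-determinant of the residual covariance plus a penalty on the number of free parameters. The function name, its arguments, and the example numbers are hypothetical, and the exact constants used by a given library may differ.

```python
import math

def var_information_criteria(log_det_sigma, n_obs, n_series, n_lags, n_trend=1):
    """Textbook-style AIC, BIC and HQIC for a VAR(p) model.

    log_det_sigma: log-determinant of the residual covariance matrix
    n_obs:         number of observations used in the fit
    n_series:      number of variables (six cities in this notebook)
    n_lags:        lag order p
    n_trend:       deterministic terms per equation (1 = constant only)
    """
    free_params = n_lags * n_series ** 2 + n_series * n_trend
    aic = log_det_sigma + 2.0 / n_obs * free_params
    bic = log_det_sigma + math.log(n_obs) / n_obs * free_params
    hqic = log_det_sigma + 2.0 * math.log(math.log(n_obs)) / n_obs * free_params
    return {"aic": aic, "bic": bic, "hqic": hqic}

# Hypothetical inputs: the same fit quality at two lag orders, to isolate the penalty.
for p in (1, 12):
    crit = var_information_criteria(log_det_sigma=20.0, n_obs=218, n_series=6, n_lags=p)
    print(p, {name: round(value, 2) for name, value in crit.items()})
```

With six series, each additional lag adds 36 coefficients, so the BIC penalty grows much faster than the AIC penalty. That is consistent with the table above, where BIC prefers 0 lags while AIC prefers 12.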



In [20]:
# Holdout validation to find the best lag length (a single train/test split, not k-fold cross-validation)

best_lag = 0
best_score = float('inf')

for lags in range(1, 15):  # test lag values 1 to 14
    model = VAR(train_data)
    model_fit = model.fit(lags)
    forecast = model_fit.forecast(train_data.values, steps=len(test_data))
    mse = np.mean((forecast - test_data.values) ** 2)  # Mean Squared Error (MSE)
    if mse < best_score:
        best_score = mse
        best_lag = lags

print(f'Best lag length: {best_lag} with MSE: {best_score}')


Best lag length: 12 with MSE: 47.932053349292474


In [36]:
import plotly.graph_objects as go

# Train the VAR model with a lag length of 12
model = VAR(train_data)
model_fit = model.fit(maxlags=12)  # Use 12 lags, as selected by AIC, FPE, and the holdout MSE

# Forecast for the next 10 time periods
forecast = model_fit.forecast(train_data.values, steps=10)

# Convert the forecast into a DataFrame
forecast_df = pd.DataFrame(forecast, index=test_data.index, columns=city_names)

# Reshape historical data for Plotly Express
df_reset = train_data.reset_index().melt(id_vars='index', var_name='City', value_name='Temperature')
df_reset.columns = ['Date', 'City', 'Temperature']

# Reshape forecasted data for Plotly Express
forecast_reset = forecast_df.reset_index().melt(id_vars='index', var_name='City', value_name='Temperature')
forecast_reset.columns = ['Date', 'City', 'Temperature']

# Plot historical data using Plotly Express with pastel colors
fig = px.line(df_reset, x='Date', y='Temperature', color='City',
              title='VAR Model: Historical Data and Forecast (with 12 Lags)',
              labels={'Temperature': 'Temperature (°C)'},  color_discrete_map={
              'Berlin': '#818CFA',  # Lighter Blue
              'Paris': '#F27868',   # Lighter Red
              'London': '#33D5B0',  # Lighter Teal
              'Rome': '#BF86FB',    # Lighter Lavender
              'Madrid': '#FFBA7F',  # Lighter Orange
              'Vienna': '#4DDFF9'   # Lighter Cyan
              }) 


# Add forecast data with manually assigned colors
forecast_colors = {
    'Berlin': '#3A4BB7',  # Dark Blue
    'Paris': '#A23321',   # Dark Red
    'London': '#007F66',  # Dark Teal
    'Rome': '#7A3DA8',    # Dark Purple
    'Madrid': '#CC6634',  # Dark Orange
    'Vienna': '#10819C'   # Dark Cyan
}

for city in forecast_df.columns:
    forecast_city_data = forecast_reset[forecast_reset['City'] == city]
    
    # Add forecast trace using go.Scatter
    fig.add_trace(go.Scatter(
        x=forecast_city_data['Date'],
        y=forecast_city_data['Temperature'],
        mode='lines',
        name=f'Forecast - {city}',
        line=dict(color=forecast_colors[city], dash='dash')  # Apply the darker color and dashed line
    ))

# Show the plot
fig.show()
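
The plot shows the forecast visually; a per-city error figure can complement it. The helper below is a minimal sketch with a hypothetical name (`per_city_rmse`) and toy numbers that are not the notebook's results.

```python
import numpy as np
import pandas as pd

def per_city_rmse(actual, predicted):
    """Root mean squared error per column; both frames must share index and columns."""
    errors = predicted - actual
    return np.sqrt((errors ** 2).mean(axis=0))

# Toy numbers for illustration only.
idx = pd.date_range("2022-03-01", periods=3, freq="MS")
actual = pd.DataFrame({"Berlin": [10.0, 20.0, 30.0], "Paris": [12.0, 22.0, 32.0]}, index=idx)
predicted = pd.DataFrame({"Berlin": [13.0, 16.0, 30.0], "Paris": [12.0, 22.0, 32.0]}, index=idx)

rmse = per_city_rmse(actual, predicted)
print(rmse)
```

In the notebook, `per_city_rmse(test_data, forecast_df)` would give one RMSE per city in °C, which is easier to interpret than the pooled MSE.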
