# Results

<font size = '3.5'>
    
>Consolidating the results for the three models (arima_manual, auto_arima, and Prophet) and comparing their performance using the cross-validation results for the one-year forecasts and then the three-year forecasts.
</font>

In [1]:
# Importing relevant libraries
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns


## One year forecast cross-validation.

In [2]:
# Load the one-year cross-validation results for all three models.

arima_manual_cv = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/arima_manual_cv.csv')
auto_arima_cv = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/auto_arima_cv.csv')
prophet_365_df = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/prophet_365_metrics_df.csv')
# Show the first few rows of each DataFrame to understand the data structure
arima_manual_cv.head(), auto_arima_cv.head(), prophet_365_df.head()

(  Metric       Value
 0    MSE  1887702.06
 1   RMSE     1373.94
 2    MAE     1334.83
 3   MASE        1.44,
   Metric  ARIMA (2,2,0)
 0    MSE   1.831175e+06
 1   RMSE   1.353209e+03
 2    MAE   1.314914e+03
 3   MASE   8.116751e+00,
   Metric  Prophet (365 days)
 0    MSE        1.395395e+09
 1   RMSE        3.735498e+04
 2    MAE        3.638702e+04
 3   MASE        3.511304e+01)

In [3]:
# Combine all metrics into a single DataFrame for easier comparison
all_metrics_df = pd.merge(arima_manual_cv, auto_arima_cv, on='Metric', how='outer')
all_metrics_df = pd.merge(all_metrics_df, prophet_365_df, on='Metric', how='outer')

# Rename columns for clarity
all_metrics_df.columns = ['Metric', 'Manual_ARIMA (1,1,3)', 'Auto_ARIMA (2,2,0)', 'Prophet (365 days)']

# Round the numeric columns to whole numbers (a MASE of 1.44 therefore becomes 1)
numeric_columns = ['Manual_ARIMA (1,1,3)', 'Auto_ARIMA (2,2,0)', 'Prophet (365 days)']

all_metrics_df[numeric_columns] = all_metrics_df[numeric_columns].round(0).astype(int)

all_metrics_df


Unnamed: 0,Metric,"Manual_ARIMA (1,1,3)","Auto_ARIMA (2,2,0)",Prophet (365 days)
0,MSE,1887702,1831175,1395394612
1,RMSE,1374,1353,37355
2,MAE,1335,1315,36387
3,MASE,1,8,35
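
A minimal sketch of the same merge, assuming small in-memory frames that mirror the three CSV outputs shown above (the frame names and the `rename` mappings are illustrative and not part of the original notebook). Renaming each value column by name before merging avoids relying on column order, and rounding to two decimals instead of casting to `int` keeps a MASE of 1.44 from collapsing to 1.

```python
import pandas as pd

metric_order = ["MSE", "RMSE", "MAE", "MASE"]

# Illustrative stand-ins for the three CSV files (values copied from the outputs above).
manual = pd.DataFrame({"Metric": metric_order,
                       "Value": [1887702.06, 1373.94, 1334.83, 1.44]})
auto = pd.DataFrame({"Metric": metric_order,
                     "ARIMA (2,2,0)": [1.831175e06, 1.353209e03, 1.314914e03, 8.116751]})
prophet = pd.DataFrame({"Metric": metric_order,
                        "Prophet (365 days)": [1.395395e09, 3.735498e04, 3.638702e04, 35.11304]})

# Rename by name rather than by position.
manual = manual.rename(columns={"Value": "Manual_ARIMA (1,1,3)"})
auto = auto.rename(columns={"ARIMA (2,2,0)": "Auto_ARIMA (2,2,0)"})

# validate="one_to_one" raises if a metric appears more than once in either frame.
merged = manual.merge(auto, on="Metric", how="outer", validate="one_to_one")
merged = merged.merge(prophet, on="Metric", how="outer", validate="one_to_one")

# Set the row order explicitly rather than relying on the order the merge returns.
merged = merged.set_index("Metric").loc[metric_order].round(2)
print(merged)
```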


### Observations:

<font size = '3.5'>
    

>Manual_ARIMA (1,1,3) vs. Auto_ARIMA (2,2,0): These two models perform almost identically on MSE, RMSE, and MAE, with Auto ARIMA lower by only about 1 to 3%. This suggests that the automatically selected model is as effective as the manually configured one at the one-year horizon, without the need for manual tuning. Their MASE values differ markedly (1.44 vs. 8.12) even though their MAE values are within 2% of each other, which suggests the two MASE figures were scaled differently and should not be compared directly.

>Prophet (365 days): The Prophet model exhibits far higher MSE, RMSE, MAE, and MASE than both ARIMA models (an RMSE of 37,355 against roughly 1,350 to 1,375), indicating much less accurate forecasts at this horizon. This disparity could result from Prophet's handling of trends and seasonality, possibly following larger movements than the ARIMA models at the cost of higher individual prediction errors.

>Mean Absolute Error (MAE): This metric averages the absolute values of the errors and is less sensitive to outliers than the MSE. The Auto_ARIMA (2,2,0) model has the lowest MAE (1,315), narrowly ahead of Manual_ARIMA (1,1,3) at 1,335.

MASE Insights: MASE expresses the MAE relative to the MAE of a naive benchmark forecast, so a value above 1 means the model did worse than that benchmark. All three models exceed 1 here (1.44, 8.12, and 35.11), with Manual ARIMA closest to the benchmark and Prophet furthest from it.


In summary, at the one-year horizon the Auto ARIMA model matches the manually tuned ARIMA on MSE, RMSE, and MAE, suggesting efficient performance without manual intervention, while the Prophet model is the least accurate on every reported metric.
</font>
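
For reference, the four metrics in the table can be sketched as below. This is an illustrative implementation on made-up numbers, not the code that produced the results. In particular, MASE is assumed here to scale the forecast MAE by the in-sample MAE of a one-step naive forecast; the notebooks that generated the CSV files may have used a different denominator.

```python
import numpy as np

def forecast_metrics(y_train, y_true, y_pred):
    """Return MSE, RMSE, MAE and MASE for a forecast (illustrative helper)."""
    y_train, y_true, y_pred = map(np.asarray, (y_train, y_true, y_pred))
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    # Assumed MASE scaling: in-sample MAE of the naive forecast y[t] = y[t-1].
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return {"MSE": mse, "RMSE": np.sqrt(mse), "MAE": mae, "MASE": mae / naive_mae}

# Toy data, made up for this example.
train = [10, 12, 11, 13, 14]
actual = [15, 16, 18]
predicted = [14, 17, 16]

metrics = forecast_metrics(train, actual, predicted)
print(metrics)
```

Under this definition, a MASE above 1 means the forecast's MAE is larger than that of the naive benchmark on the training data.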

In [4]:
# Extract the Manual ARIMA and Prophet values.

selected_columns = ['Metric', 'Prophet (365 days)', 'Manual_ARIMA (1,1,3)']
new_df = all_metrics_df[selected_columns]

# Display the new DataFrame
new_df


Unnamed: 0,Metric,Prophet (365 days),"Manual_ARIMA (1,1,3)"
0,MSE,1395394612,1887702
1,RMSE,37355,1374
2,MAE,36387,1335
3,MASE,35,1


## Calculating the percentage accuracy.

In [5]:
# Calculate the percentage by which Auto ARIMA's error is lower than each other model's
# (positive = Auto ARIMA has the lower error; computed on the rounded values above)

percentage_accuracy_df = pd.DataFrame({
    'Metric': all_metrics_df['Metric'],
    'Auto_ARIMA_vs_Manual_ARIMA (%)': (((all_metrics_df['Manual_ARIMA (1,1,3)'] - all_metrics_df['Auto_ARIMA (2,2,0)']) / all_metrics_df['Manual_ARIMA (1,1,3)']) * 100).round(0),
    'Auto_ARIMA_vs_Prophet (%)': (((all_metrics_df['Prophet (365 days)'] - all_metrics_df['Auto_ARIMA (2,2,0)']) / all_metrics_df['Prophet (365 days)']) * 100).round(0)
})

# Add a row to calculate the average percentage accuracy across all metrics
average_percentage_accuracy = pd.DataFrame({
    'Metric': ['Average'],
    'Auto_ARIMA_vs_Manual_ARIMA (%)': [percentage_accuracy_df['Auto_ARIMA_vs_Manual_ARIMA (%)'].mean().round(0)],
    'Auto_ARIMA_vs_Prophet (%)': [percentage_accuracy_df['Auto_ARIMA_vs_Prophet (%)'].mean().round(0)]
})

percentage_accuracy_df = pd.concat([percentage_accuracy_df, average_percentage_accuracy]).reset_index(drop=True)

percentage_accuracy_df


Unnamed: 0,Metric,Auto_ARIMA_vs_Manual_ARIMA (%),Auto_ARIMA_vs_Prophet (%)
0,MSE,3.0,100.0
1,RMSE,2.0,96.0
2,MAE,1.0,96.0
3,MASE,-700.0,77.0
4,Average,-174.0,92.0
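
The percentages above follow the pattern `(baseline - candidate) / baseline * 100`, so a positive value means the candidate (Auto ARIMA) has the lower error. The sketch below, with a hypothetical helper name, applies the same formula to the MASE values and shows how much the earlier integer rounding changes the result.

```python
def relative_improvement(baseline, candidate):
    """Percentage by which `candidate` reduces the error of `baseline`."""
    return (baseline - candidate) / baseline * 100

# MASE for Manual ARIMA (baseline) and Auto ARIMA (candidate), one-year forecast.
rounded = relative_improvement(1, 8)              # values after .round(0).astype(int)
unrounded = relative_improvement(1.44, 8.116751)  # values as loaded from the CSV files

print(f"rounded inputs:   {rounded:.0f}%")
print(f"unrounded inputs: {unrounded:.0f}%")
```

With the rounded inputs the result is -700%, whereas the unrounded values give roughly -464%. Because the MASE row dominates the mean of the four rows, the `Average` row is sensitive to this rounding as well.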


## Three year forecast cross-validation.

In [6]:
# Load the three-year cross-validation results for all three models.

arima_manual_cv3 = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/arima_manual_cv3.csv')
auto_arima_cv3 = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/auto_arima_cv3.csv')
prophet_1096_df = pd.read_csv('/Users/gorji1/Desktop/MSC_RESEARCH_PROJECT/Results/prophet_1096_metrics_df.csv')
# Show the first few rows of each DataFrame to understand the data structure
arima_manual_cv3.head(), auto_arima_cv3.head(), prophet_1096_df.head()

(  Metric         Value
 0    MSE  3.189651e+08
 1   RMSE  1.785959e+04
 2    MAE  1.379657e+04
 3   MASE  1.195251e+01,
   Metric  ARIMA (2,2,0)
 0    MSE   1.248106e+09
 1   RMSE   3.532854e+04
 2    MAE   2.175351e+04
 3   MASE   1.932909e+01,
   Metric  Prophet (1096 days)
 0    MSE         2.945410e+09
 1   RMSE         5.427163e+04
 2    MAE         5.060995e+04
 3   MASE         4.883799e+01)

In [7]:
# Combine all metrics into a single DataFrame for easier comparison
all_metrics_3yr_df = pd.merge(arima_manual_cv3, auto_arima_cv3, on='Metric', how='outer')
all_metrics_3yr_df = pd.merge(all_metrics_3yr_df, prophet_1096_df, on='Metric', how='outer')

# Rename columns for clarity
all_metrics_3yr_df.columns = ['Metric', 'Manual_ARIMA (1,1,3)', 'Auto_ARIMA (2,2,0)', 'Prophet (1096 days)']

# Round the numeric columns to whole numbers
numeric_columns = ['Manual_ARIMA (1,1,3)', 'Auto_ARIMA (2,2,0)', 'Prophet (1096 days)']

all_metrics_3yr_df[numeric_columns] = all_metrics_3yr_df[numeric_columns].round(0).astype(int)

all_metrics_3yr_df.head()

Unnamed: 0,Metric,"Manual_ARIMA (1,1,3)","Auto_ARIMA (2,2,0)",Prophet (1096 days)
0,MSE,318965132,1248106044,2945409991
1,RMSE,17860,35329,54272
2,MAE,13797,21754,50610
3,MASE,12,19,49


<font size = '3.5'>
General Analysis:

Among the three models, the Manual ARIMA (1,1,3) model outperforms the others on every metric, suggesting it is the most suitable of the models tested for this dataset.
The Auto ARIMA model performs moderately: not as well as the Manual ARIMA but better than the Prophet model.
The Prophet model has the least accurate forecasts, as indicated by its higher error metrics. This result may be due to various factors, including the model's sensitivity to the data's characteristics (such as seasonality and holiday effects), the chosen parameters, or the data preprocessing steps.

Conclusion:

While the Manual ARIMA model is the best of the three, the high values of the metrics across all models indicate that there is significant room for improvement in the forecasts. Improvement might involve more advanced preprocessing of the data, feature engineering, tuning of model parameters, or exploring different forecasting models. This will be added to the recommendations and future research section of the thesis. As this project is mainly concerned with determining the accuracy of ARIMA and Prophet in forecasting cyber incidents, the conclusion is drawn on that basis.
</font>
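
The ranking described above can be checked directly from the three-year table. A minimal sketch, assuming the rounded values shown in the output (the frame is rebuilt by hand here so that the example is self-contained):

```python
import pandas as pd

three_year = pd.DataFrame(
    {
        "Manual_ARIMA (1,1,3)": [318965132, 17860, 13797, 12],
        "Auto_ARIMA (2,2,0)": [1248106044, 35329, 21754, 19],
        "Prophet (1096 days)": [2945409991, 54272, 50610, 49],
    },
    index=["MSE", "RMSE", "MAE", "MASE"],
)

# Rank the models within each metric (1 = lowest error = best).
ranks = three_year.rank(axis=1).astype(int)
print(ranks)

# The model with the lowest error for each metric.
best = three_year.idxmin(axis=1)
print(best)
```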

In [8]:
# Extract the Manual ARIMA and Prophet values for the three-year forecast.

selected_columns2 = ['Metric', 'Prophet (1096 days)','Manual_ARIMA (1,1,3)']
new_df2 = all_metrics_3yr_df[selected_columns2]

# Display the new DataFrame

new_df2

Unnamed: 0,Metric,Prophet (1096 days),"Manual_ARIMA (1,1,3)"
0,MSE,2945409991,318965132
1,RMSE,54272,17860
2,MAE,50610,13797
3,MASE,49,12


In [9]:
# Calculate the percentage accuracy for the 3-year forecast
percentage_accuracy_3yr_df = pd.DataFrame({
    'Metric': all_metrics_3yr_df['Metric'],
    'Auto_ARIMA_vs_Manual_ARIMA (%)': (((all_metrics_3yr_df['Manual_ARIMA (1,1,3)'] - all_metrics_3yr_df['Auto_ARIMA (2,2,0)']) / all_metrics_3yr_df['Manual_ARIMA (1,1,3)']) * 100).round(0),
    'Auto_ARIMA_vs_Prophet (%)': (((all_metrics_3yr_df['Prophet (1096 days)'] - all_metrics_3yr_df['Auto_ARIMA (2,2,0)']) / all_metrics_3yr_df['Prophet (1096 days)']) * 100).round(0)
})

# Add a row to calculate the average percentage accuracy across all metrics
average_percentage_accuracy_3yr = pd.DataFrame({
    'Metric': ['Average'],
    'Auto_ARIMA_vs_Manual_ARIMA (%)': [percentage_accuracy_3yr_df['Auto_ARIMA_vs_Manual_ARIMA (%)'].mean().round(0)],
    'Auto_ARIMA_vs_Prophet (%)': [percentage_accuracy_3yr_df['Auto_ARIMA_vs_Prophet (%)'].mean().round(0)]
})

percentage_accuracy_3yr_df = pd.concat([percentage_accuracy_3yr_df, average_percentage_accuracy_3yr]).reset_index(drop=True)

percentage_accuracy_3yr_df


Unnamed: 0,Metric,Auto_ARIMA_vs_Manual_ARIMA (%),Auto_ARIMA_vs_Prophet (%)
0,MSE,-291.0,58.0
1,RMSE,-98.0,35.0
2,MAE,-58.0,57.0
3,MASE,-58.0,61.0
4,Average,-126.0,53.0


## Comparing ARIMA (manual) with Prophet for the 1-year and 3-year forecasts.

In [10]:

# Combine the 1-year metrics into a single DataFrame for easier comparison
comparison_1yr_df = pd.merge(arima_manual_cv, prophet_365_df, on='Metric', how='outer')
comparison_1yr_df.columns = ['Metric', 'Manual_ARIMA (1 Year)', 'Prophet (1 Year)']

# Combine the 3-year metrics into a single DataFrame for easier comparison
comparison_3yr_df = pd.merge(arima_manual_cv3,prophet_1096_df, on='Metric', how='outer')
comparison_3yr_df.columns = ['Metric', 'Manual_ARIMA (3 Years)', 'Prophet (3 Years)']

# Calculate the percentage accuracy for Manual_ARIMA vs Prophet for the 1-year and 3-year forecasts
percentage_accuracy_comparison_1yr = pd.DataFrame({
    'Metric': comparison_1yr_df['Metric'],
    'Manual_ARIMA_vs_Prophet (1 Year) (%)': ((comparison_1yr_df['Prophet (1 Year)'] - comparison_1yr_df['Manual_ARIMA (1 Year)']) / comparison_1yr_df['Prophet (1 Year)']) * 100
})

percentage_accuracy_comparison_3yr = pd.DataFrame({
    'Metric': comparison_3yr_df['Metric'],
    'Manual_ARIMA_vs_Prophet (3 Years) (%)': ((comparison_3yr_df['Prophet (3 Years)'] - comparison_3yr_df['Manual_ARIMA (3 Years)']) / comparison_3yr_df['Prophet (3 Years)']) * 100
})

# Add a row to calculate the average percentage accuracy across all metrics for both 1-year and 3-year forecasts
average_percentage_accuracy_comparison_1yr = pd.DataFrame({
    'Metric': ['Average'],
    'Manual_ARIMA_vs_Prophet (1 Year) (%)': [percentage_accuracy_comparison_1yr['Manual_ARIMA_vs_Prophet (1 Year) (%)'].mean()]
})

average_percentage_accuracy_comparison_3yr = pd.DataFrame({
    'Metric': ['Average'],
    'Manual_ARIMA_vs_Prophet (3 Years) (%)': [percentage_accuracy_comparison_3yr['Manual_ARIMA_vs_Prophet (3 Years) (%)'].mean()]
})

percentage_accuracy_comparison_1yr = pd.concat([percentage_accuracy_comparison_1yr, average_percentage_accuracy_comparison_1yr]).reset_index(drop=True)
percentage_accuracy_comparison_3yr = pd.concat([percentage_accuracy_comparison_3yr, average_percentage_accuracy_comparison_3yr]).reset_index(drop=True)

# Round the values to nearest whole numbers
percentage_accuracy_comparison_1yr['Manual_ARIMA_vs_Prophet (1 Year) (%)'] = percentage_accuracy_comparison_1yr['Manual_ARIMA_vs_Prophet (1 Year) (%)'].round(0)
percentage_accuracy_comparison_3yr['Manual_ARIMA_vs_Prophet (3 Years) (%)'] = percentage_accuracy_comparison_3yr['Manual_ARIMA_vs_Prophet (3 Years) (%)'].round(0)


percentage_accuracy_comparison_1yr.head() 


Unnamed: 0,Metric,Manual_ARIMA_vs_Prophet (1 Year) (%)
0,MSE,100.0
1,RMSE,96.0
2,MAE,96.0
3,MASE,96.0
4,Average,97.0


In [11]:
percentage_accuracy_comparison_3yr.head()

Unnamed: 0,Metric,Manual_ARIMA_vs_Prophet (3 Years) (%)
0,MSE,89.0
1,RMSE,67.0
2,MAE,73.0
3,MASE,76.0
4,Average,76.0


In [12]:
# To get Prophet vs Manual_ARIMA, swap the roles of the two models in the formula

# Calculate the percentage accuracy for Prophet vs Manual_ARIMA for the 1-year and 3-year forecasts
percentage_accuracy_comparison_1yr_prophet = pd.DataFrame({
    'Metric': comparison_1yr_df['Metric'],
    'Prophet_vs_Manual_ARIMA (1 Year) (%)': ((comparison_1yr_df['Manual_ARIMA (1 Year)'] - comparison_1yr_df['Prophet (1 Year)']) / comparison_1yr_df['Manual_ARIMA (1 Year)']) * 100
})

percentage_accuracy_comparison_3yr_prophet = pd.DataFrame({
    'Metric': comparison_3yr_df['Metric'],
    'Prophet_vs_Manual_ARIMA (3 Years) (%)': ((comparison_3yr_df['Manual_ARIMA (3 Years)'] - comparison_3yr_df['Prophet (3 Years)']) / comparison_3yr_df['Manual_ARIMA (3 Years)']) * 100
})

# Add a row to calculate the average percentage accuracy across all metrics for both 1-year and 3-year forecasts
average_percentage_accuracy_comparison_1yr_prophet = pd.DataFrame({
    'Metric': ['Average'],
    'Prophet_vs_Manual_ARIMA (1 Year) (%)': [percentage_accuracy_comparison_1yr_prophet['Prophet_vs_Manual_ARIMA (1 Year) (%)'].mean()]
})

average_percentage_accuracy_comparison_3yr_prophet = pd.DataFrame({
    'Metric': ['Average'],
    'Prophet_vs_Manual_ARIMA (3 Years) (%)': [percentage_accuracy_comparison_3yr_prophet['Prophet_vs_Manual_ARIMA (3 Years) (%)'].mean()]
})

percentage_accuracy_comparison_1yr_prophet = pd.concat([percentage_accuracy_comparison_1yr_prophet, average_percentage_accuracy_comparison_1yr_prophet]).reset_index(drop=True)
percentage_accuracy_comparison_3yr_prophet = pd.concat([percentage_accuracy_comparison_3yr_prophet, average_percentage_accuracy_comparison_3yr_prophet]).reset_index(drop=True)

# Round the values to nearest whole numbers
percentage_accuracy_comparison_1yr_prophet['Prophet_vs_Manual_ARIMA (1 Year) (%)'] = percentage_accuracy_comparison_1yr_prophet['Prophet_vs_Manual_ARIMA (1 Year) (%)'].round(0)
percentage_accuracy_comparison_3yr_prophet['Prophet_vs_Manual_ARIMA (3 Years) (%)'] = percentage_accuracy_comparison_3yr_prophet['Prophet_vs_Manual_ARIMA (3 Years) (%)'].round(0)

percentage_accuracy_comparison_1yr_prophet.head()


Unnamed: 0,Metric,Prophet_vs_Manual_ARIMA (1 Year) (%)
0,MSE,-73820.0
1,RMSE,-2619.0
2,MAE,-2626.0
3,MASE,-2338.0
4,Average,-20351.0


In [13]:
percentage_accuracy_comparison_3yr_prophet.head()

Unnamed: 0,Metric,Prophet_vs_Manual_ARIMA (3 Years) (%)
0,MSE,-823.0
1,RMSE,-204.0
2,MAE,-267.0
3,MASE,-309.0
4,Average,-401.0
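
Cells 10 to 13 repeat the same calculation with the roles of the two models swapped. One way to avoid the duplication is a small helper. The function name, its signature, and the short column labels below are illustrative, and the input frame is rebuilt from the unrounded one-year values printed when the CSV files were loaded.

```python
import pandas as pd

def pct_improvement(df, candidate, baseline):
    """Percentage by which `candidate` lowers each error metric relative to
    `baseline`, with an 'Average' row appended (illustrative helper)."""
    label = f"{candidate}_vs_{baseline} (%)"
    pct = (df[baseline] - df[candidate]) / df[baseline] * 100
    rows = pd.DataFrame({"Metric": df["Metric"], label: pct})
    average = pd.DataFrame({"Metric": ["Average"], label: [pct.mean()]})
    # DataFrame.round only affects the numeric column.
    return pd.concat([rows, average], ignore_index=True).round(0)

one_year = pd.DataFrame({
    "Metric": ["MSE", "RMSE", "MAE", "MASE"],
    "Manual_ARIMA": [1887702.06, 1373.94, 1334.83, 1.44],
    "Prophet": [1.395395e09, 3.735498e04, 3.638702e04, 35.11304],
})

arima_vs_prophet = pct_improvement(one_year, "Manual_ARIMA", "Prophet")
prophet_vs_arima = pct_improvement(one_year, "Prophet", "Manual_ARIMA")
print(arima_vs_prophet)
print(prophet_vs_arima)
```

The two tables are not mirror images because the denominator changes with the baseline: dividing by the much smaller Manual ARIMA error is what produces values in the thousands of percent.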


## Calculating the performance scores of both models for comparison...

In [14]:
# Define the metrics for the 1-year forecast for Prophet and ARIMA.
# The values below are copied from the Manual ARIMA one-year cross-validation results (rounded).

mse = 1887702
rmse = 1374
mae = 1335
mase = 1

metrics_arima_1yr = {
    "MSE":mse,
    "RMSE":rmse,
    "MAE": mae,
    "MASE":mase  # Assuming this is already in a standard unit
}

# create variables for prophet's measures

prophet_mse = 1395394612
prophet_rmse = 37355
prophet_mae = 36387
prophet_mase = 35

metrics_prophet_1yr = {
    "MSE":prophet_mse ,
    "RMSE":prophet_rmse ,
    "MAE":prophet_mae ,
    "MASE":prophet_mase  # Assuming this is already in a standard unit
}

# Calculate performance scores for each metric
performance_scores_prophet_1yr = {}
performance_scores_arima_1yr = {}

for metric in metrics_prophet_1yr.keys():
    performance_scores_prophet_1yr[metric] = (1 - metrics_prophet_1yr[metric] / (metrics_prophet_1yr[metric] + metrics_arima_1yr[metric])) * 100
    performance_scores_arima_1yr[metric] = (1 - metrics_arima_1yr[metric] / (metrics_prophet_1yr[metric] + metrics_arima_1yr[metric])) * 100

performance_scores_prophet_1yr, performance_scores_arima_1yr


({'MSE': 0.13509811017331508,
  'RMSE': 3.5477290918949578,
  'MAE': 3.5390488309209434,
  'MASE': 2.777777777777779},
 {'MSE': 99.86490188982668,
  'RMSE': 96.45227090810505,
  'MAE': 96.46095116907905,
  'MASE': 97.22222222222221})
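
The score used above is a share-of-total-error measure: for each metric, a model's score is `(1 - own_error / (own_error + other_error)) * 100`, which equals the other model's share of the combined error. The two scores therefore always sum to 100, so they describe relative rather than absolute accuracy. A minimal sketch with a hypothetical helper name:

```python
def performance_scores(error_a, error_b):
    """Return (score_a, score_b), where each model's score is the other
    model's share of the combined error, in percent."""
    total = error_a + error_b
    score_a = (1 - error_a / total) * 100
    score_b = (1 - error_b / total) * 100
    return score_a, score_b

# One-year RMSE: Prophet = 37355, Manual ARIMA = 1374.
prophet_score, arima_score = performance_scores(37355, 1374)
print(round(prophet_score, 1), round(arima_score, 1))
```

Because MSE is the square of RMSE, the MSE scores sit further apart than the RMSE scores for the same pair of forecasts, so a plain average over the four metrics gives extra weight to large errors.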

In [15]:
# Create a DataFrame to display the scores in a table
scores_df = pd.DataFrame({
    "Metric": ["MSE", "RMSE", "MAE", "MASE"],
    "Prophet (365_Days)%": [performance_scores_prophet_1yr["MSE"], performance_scores_prophet_1yr["RMSE"], performance_scores_prophet_1yr["MAE"], performance_scores_prophet_1yr["MASE"]],
    "ARIMA (1,1,3)%": [performance_scores_arima_1yr["MSE"], performance_scores_arima_1yr["RMSE"], performance_scores_arima_1yr["MAE"], performance_scores_arima_1yr["MASE"]]
})
# If you want to globally set the display format for floating point numbers in Pandas:

pd.options.display.float_format = '{:.1f}'.format

scores_df


Unnamed: 0,Metric,Prophet (365_Days)%,"ARIMA (1,1,3)%"
0,MSE,0.1,99.9
1,RMSE,3.5,96.5
2,MAE,3.5,96.5
3,MASE,2.8,97.2


In [16]:
# Calculating the overall performance score for the 1-year forecast for both models
overall_performance_prophet_1yr = sum(performance_scores_prophet_1yr.values()) / 4
overall_performance_arima_1yr = sum(performance_scores_arima_1yr.values()) / 4

pd.options.display.float_format = '{:.1f}'.format

overall_performance_prophet_1yr, overall_performance_arima_1yr



(2.499913452691749, 97.50008654730826)

In [17]:

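# Add the overall performance scores as an extra column to the table.
# Note: a column holds one entry per metric row, so the two overall scores simply
# alternate down the rows here; the next cell replaces this column with a dedicated row.
scores_df["Overall Performance"] = [overall_performance_prophet_1yr, overall_performance_arima_1yr, overall_performance_prophet_1yr, overall_performance_arima_1yr]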

scores_df

Unnamed: 0,Metric,Prophet (365_Days)%,"ARIMA (1,1,3)%",Overall Performance
0,MSE,0.1,99.9,2.5
1,RMSE,3.5,96.5,97.5
2,MAE,3.5,96.5,2.5
3,MASE,2.8,97.2,97.5


In [18]:

scores_df = scores_df.drop(columns=["Overall Performance"])  # removing the overall performance column

# Creating a new DataFrame for overall performance scores
overall_scores_df = pd.DataFrame({
    "Metric": ["Overall Performance"],
    "ARIMA (1,1,3)%": [overall_performance_arima_1yr],
    "Prophet (365_Days)%": [overall_performance_prophet_1yr]
})

# Concatenating the original DataFrame with the overall performance scores DataFrame
scores_df = pd.concat([scores_df, overall_scores_df], ignore_index=True)

scores_df



Unnamed: 0,Metric,Prophet (365_Days)%,"ARIMA (1,1,3)%"
0,MSE,0.1,99.9
1,RMSE,3.5,96.5
2,MAE,3.5,96.5
3,MASE,2.8,97.2
4,Overall Performance,2.5,97.5


In [19]:
# Rounding the scores to the nearest whole number
scores_df["Prophet (365_Days)%"] = scores_df["Prophet (365_Days)%"].round(0)
scores_df["ARIMA (1,1,3)%"] = scores_df["ARIMA (1,1,3)%"].round(0)

scores_df

Unnamed: 0,Metric,Prophet (365_Days)%,"ARIMA (1,1,3)%"
0,MSE,0.0,100.0
1,RMSE,4.0,96.0
2,MAE,4.0,96.0
3,MASE,3.0,97.0
4,Overall Performance,2.0,98.0


In [20]:
# Define the metrics for the 3-year forecast for Prophet and ARIMA (to the nearest whole number)

# Create variables to store the measures

mse_cv3=318965132
rmse_cv3=17860
mae_cv3=13797
mase_cv3=12

metrics_arima_3yr = {
    "MSE": mse_cv3,
    "RMSE":rmse_cv3,
    "MAE":mae_cv3,
    "MASE":mase_cv3 
}

# Create variables to store the measures

mse_prophet_cv3=2945409991
rmse_prophet_cv3=54272
mae_prophet_cv3=50610
mase_prophet_cv3=49 


metrics_prophet_3yr = {
    "MSE":mse_prophet_cv3, 
    "RMSE":rmse_prophet_cv3,
    "MAE": mae_prophet_cv3,
    "MASE": mase_prophet_cv3 
}

# Calculate performance scores for each metric

performance_scores_prophet_3yr = {}
performance_scores_arima_3yr = {}

for metric in metrics_prophet_3yr.keys():
    performance_scores_prophet_3yr[metric] = (1 - metrics_prophet_3yr[metric] / (metrics_prophet_3yr[metric] + metrics_arima_3yr[metric])) * 100
    performance_scores_arima_3yr[metric] = (1 - metrics_arima_3yr[metric] / (metrics_prophet_3yr[metric] + metrics_arima_3yr[metric])) * 100

performance_scores_prophet_3yr, performance_scores_arima_3yr


({'MSE': 9.771093087698423,
  'RMSE': 24.760161925359057,
  'MAE': 21.421584610368438,
  'MASE': 19.672131147540984},
 {'MSE': 90.22890691230158,
  'RMSE': 75.23983807464094,
  'MAE': 78.57841538963156,
  'MASE': 80.32786885245902})

In [21]:
# Create a DataFrame to display the scores in a table
scores_df2 = pd.DataFrame({
    "Metric": ["MSE", "RMSE", "MAE", "MASE"],
    "Prophet (1096_Days)%": [performance_scores_prophet_3yr["MSE"], performance_scores_prophet_3yr["RMSE"], performance_scores_prophet_3yr["MAE"], performance_scores_prophet_3yr["MASE"]],
    "ARIMA (1,1,3)%": [performance_scores_arima_3yr["MSE"], performance_scores_arima_3yr["RMSE"], performance_scores_arima_3yr["MAE"], performance_scores_arima_3yr["MASE"]]
})

scores_df2

Unnamed: 0,Metric,Prophet (1096_Days)%,"ARIMA (1,1,3)%"
0,MSE,9.8,90.2
1,RMSE,24.8,75.2
2,MAE,21.4,78.6
3,MASE,19.7,80.3


In [22]:
# Calculating the overall performance score for the 3-year forecast for both models
overall_performance_prophet_3yr = sum(performance_scores_prophet_3yr.values()) / 4
overall_performance_arima_3yr = sum(performance_scores_arima_3yr.values()) / 4

overall_performance_prophet_3yr, overall_performance_arima_3yr


(18.906242692741724, 81.09375730725827)

In [23]:
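# Add the overall performance scores as an extra column to the table.
# Note: a column holds one entry per metric row, so the two overall scores simply
# alternate down the rows here; the next cell replaces this column with a dedicated row.
scores_df2["Overall Performance"] = [overall_performance_prophet_3yr, overall_performance_arima_3yr, overall_performance_prophet_3yr, overall_performance_arima_3yr]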

scores_df2

Unnamed: 0,Metric,Prophet (1096_Days)%,"ARIMA (1,1,3)%",Overall Performance
0,MSE,9.8,90.2,18.9
1,RMSE,24.8,75.2,81.1
2,MAE,21.4,78.6,18.9
3,MASE,19.7,80.3,81.1


In [24]:
# Drop overall performance column, add to row

scores_df2 = scores_df2.drop(columns=["Overall Performance"])  # removing the overall performance column

# Creating a new DataFrame for overall performance scores
overall_scores_df2 = pd.DataFrame({
    "Metric": ["Overall Performance"],
    "Prophet (1096_Days)%": [overall_performance_prophet_3yr],  
    "ARIMA (1,1,3)%": [overall_performance_arima_3yr]  
})

# Concatenating the original DataFrame with the overall performance scores DataFrame
scores_df2 = pd.concat([scores_df2, overall_scores_df2], ignore_index=True)

scores_df2

Unnamed: 0,Metric,Prophet (1096_Days)%,"ARIMA (1,1,3)%"
0,MSE,9.8,90.2
1,RMSE,24.8,75.2
2,MAE,21.4,78.6
3,MASE,19.7,80.3
4,Overall Performance,18.9,81.1


In [25]:
# Rounding the scores to the nearest whole number
scores_df2["Prophet (1096_Days)%"] = scores_df2["Prophet (1096_Days)%"].round(0)
scores_df2["ARIMA (1,1,3)%"] = scores_df2["ARIMA (1,1,3)%"].round(0)

scores_df2

Unnamed: 0,Metric,Prophet (1096_Days)%,"ARIMA (1,1,3)%"
0,MSE,10.0,90.0
1,RMSE,25.0,75.0
2,MAE,21.0,79.0
3,MASE,20.0,80.0
4,Overall Performance,19.0,81.0
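
The per-metric scores and the overall row can also be built in one step, which avoids adding and then dropping a temporary column. A sketch using the rounded three-year values from the earlier table; the short column labels are illustrative:

```python
import pandas as pd

errors = pd.DataFrame(
    {
        "Prophet": [2945409991, 54272, 50610, 49],
        "ARIMA (1,1,3)": [318965132, 17860, 13797, 12],
    },
    index=["MSE", "RMSE", "MAE", "MASE"],
)

# Each model's score is the other model's share of the combined error, in percent.
scores = (1 - errors.div(errors.sum(axis=1), axis=0)) * 100

# Append the mean over the four metrics as an extra row.
scores.loc["Overall Performance"] = scores.mean()
print(scores.round(1))
```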


### This script is part of the author's research project, 4th stage: Results


    
### This script can be reproduced without permission.
    


### Author : Chinyere.O.Ugorji &copy; 2023


In [26]:
import sys
print(sys.version)


3.10.12 (main, Jul  5 2023, 14:49:34) [Clang 14.0.6 ]
