In [42]:
from argotools.dataFormatter import Load
from argotools.visualize import InputVis
from argotools.visualize import OutputVis
from argotools.experiment import ARGO_lrmse
from argotools.forecastlib.argo_methods_ import *
from argotools.forecastlib.functions import *
from sklearn.linear_model import LassoCV
import pandas as pd  # used below to read the result CSV files


In [28]:
path_to_ili = '/Users/leonardo/Desktop/FORECASTING/TESIS/MXNAT/ILI/ALL_ILI.csv'
path_to_GC = '/Users/leonardo/Desktop/FORECASTING/TESIS/MXNAT/gc_output'
country_codes = [ 'AR','BR', 'CL', 'MX']
study_period = ['2010-01-02', '2016-12-26']

## Initializing object

Load is the main class from dataFormatter. It contains several functions that automate the loading, formatting and basic preprocessing of data prior to any more serious procedures. To generate an object, you just call its constructor:

In [29]:
data_object = Load(start_period=study_period[0], end_period=study_period[1], ids=country_codes)

Load object initialized with the following ids : ['AR', 'BR', 'CL', 'MX']


To load data into the object, you first need to initialize an ID for each location you will be working with. In this example we generate influenza predictions for Argentina, Brazil, Chile and Mexico, so we pass their country codes. This instantiates several dictionaries within the Load object, where each key corresponds to one of the IDs we'll be working with.

If you'd like to separately initialize an ID after creating the object, you can do it through the "new_id" function.

In [30]:
print(data_object.id)
print(data_object.target)
print(data_object.features)

['AR', 'BR', 'CL', 'MX']
{'AR': None, 'BR': None, 'CL': None, 'MX': None}
{'AR': None, 'BR': None, 'CL': None, 'MX': None}
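
The printed output suggests a simple layout: one dictionary for targets and one for features, each keyed by location ID and holding None until data is added. A minimal plain-Python sketch of that layout follows. The class `TinyLoad` and the body of its `new_id` method are illustrative assumptions, not argotools code.

```python
class TinyLoad:
    """Toy stand-in for the layout printed above; not the argotools implementation."""

    def __init__(self, ids=None):
        self.id = []
        self.target = {}
        self.features = {}
        for id_ in ids or []:
            self.new_id(id_)

    def new_id(self, id_):
        # Register a location once; its data slots stay empty until filled.
        if id_ not in self.id:
            self.id.append(id_)
            self.target[id_] = None
            self.features[id_] = None


toy = TinyLoad(ids=['AR', 'BR', 'CL', 'MX'])
toy.new_id('PE')  # hypothetical extra location added after construction
print(toy.id)
print(toy.target)
```

Keeping one dictionary per kind of data, all sharing the same keys, makes it easy to loop over locations later, as the cells below do.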


## Adding features and target data

Internally, the Load object keeps target and feature data in separate dictionaries. There are several functions for adding data to the Load object, depending on your data source. The most commonly used are "add_target" and "add_features_customSource". Once you know how to use these two, the other available functions are fairly easy to understand.

We'll add target data from a formatted source (influenza cases from Flunet) and features data from a custom source (Google Correlate data).

NOTE: In this example we use GC data directly from Google Correlate. This data has not been properly filtered, and many words that are correlated with the term influenza may not be useful to fit a model. For real model-fitting, please make sure to properly preprocess your data.  
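
As a rough idea of what such preprocessing could look like, the sketch below drops columns that never vary and columns whose correlation with the target is weak. Everything in it is an assumption made for illustration: the data is synthetic, and `filter_features` and the `min_corr` threshold are hypothetical names, not part of argotools.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 100 weekly observations, not real Google Correlate output.
rng = np.random.default_rng(0)
index = pd.date_range('2010-01-04', periods=100, freq='W-MON')
signal = np.sin(np.linspace(0, 8 * np.pi, 100))

target = pd.Series(signal, index=index, name='ILI')
features = pd.DataFrame({
    'related_term': signal + 0.1 * rng.standard_normal(100),
    'unrelated_term': rng.standard_normal(100),
    'constant_term': np.full(100, -0.607),  # flat columns carry no information
}, index=index)


def filter_features(features, target, min_corr=0.5):
    """Keep columns that vary and whose |Pearson r| with the target is >= min_corr."""
    varying = features.loc[:, features.nunique() > 1]
    corr = varying.corrwith(target).abs()
    return varying.loc[:, corr >= min_corr]


kept = filter_features(features, target)
print(list(kept.columns))
```

A correlation threshold is only one possible filter; a manual review of the search terms is usually worth doing as well.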
    

In [31]:
for country in country_codes:
    data_object.add_target(id_=country, path_to_file=path_to_ili)
    data_object.add_features_customSource(id_=country, path_to_file='{0}/{1}_GC.csv'.format(path_to_GC,country), source='GC', overwrite=False, verbose=True, autoanswer=None)
    




Following features read from file :                GC  mandarinas  mandarina  bisolvon  torta de mandarina  \
Date                                                                     
2010-01-03 -0.868      -0.783     -0.741    -0.992              -0.607   
2010-01-10 -0.879      -0.489     -0.655    -0.476              -0.607   
2010-01-17 -0.885      -0.476     -0.627    -0.992              -0.607   
2010-01-24 -1.007      -0.474     -0.643    -0.378              -0.607   
2010-01-31 -0.897      -0.902     -0.788    -0.523              -0.607   

            bronquitis  la bronquitis  nebulizador  cerro caviahue  nebulizar  \
Date                                                                            
2010-01-03      -0.997         -0.531       -1.098          -0.609     -0.685   
2010-01-10      -0.902         -0.531       -1.186          -0.609     -0.429   
2010-01-17      -1.042         -0.588       -1.025          -0.609     -0.576   
2010-01-24      -1.124         -0.977   

Following features read from file :                GC  influenza sintomas  sintomas de la influenza  \
Date                                                              
2010-01-03  2.424               0.200                     0.173   
2010-01-10  2.263               0.082                     0.309   
2010-01-17  2.827               0.276                     0.478   
2010-01-24  1.705               0.046                     0.366   
2010-01-31  2.181               0.165                     0.239   

            sintomas influenza  influenza  h1n1 sintomas  \
Date                                                       
2010-01-03               0.198      0.001          2.338   
2010-01-10               0.121      0.112          1.606   
2010-01-17               0.342      0.114          1.793   
2010-01-24               0.105      0.082          1.744   
2010-01-31               0.023      0.041          1.077   

            sintomas de la influenza h1n1  sintomas de influenza   n1h1  

In [32]:
print(data_object.target)

{'AR':                 AR
2010-01-04   445.0
2010-01-11   434.0
2010-01-18   427.0
2010-01-25   298.0
2010-02-01   415.0
2010-02-08   410.0
2010-02-15   420.0
2010-02-22   430.0
2010-03-01   485.0
2010-03-08   553.0
2010-03-15   709.0
2010-03-22   804.0
2010-03-29   772.0
2010-04-05  1190.0
2010-04-12  1261.0
2010-04-19  1635.0
2010-04-26  1721.0
2010-05-03  2114.0
2010-05-10  2313.0
2010-05-17  2510.0
2010-05-24  2442.0
2010-05-31  3039.0
2010-06-07  3378.0
2010-06-14  3500.0
2010-06-21  2935.0
2010-06-28  2975.0
2010-07-05  2280.0
2010-07-12  2574.0
2010-07-19  2321.0
2010-07-26  2206.0
...            ...
2016-06-06  5431.0
2016-06-13  4806.0
2016-06-20  4862.0
2016-06-27  4578.0
2016-07-04  3355.0
2016-07-11  3657.0
2016-07-18  3099.0
2016-07-25  2609.0
2016-08-01  2326.0
2016-08-08  1910.0
2016-08-15  1748.0
2016-08-22  1592.0
2016-08-29  1602.0
2016-09-05  1286.0
2016-09-12  1262.0
2016-09-19  1201.0
2016-09-26  1108.0
2016-10-03  1016.0
2016-10-10   855.0
2016-10-17   823.0
2016-

In [33]:
print(data_object.features)

{'AR':                GC  mandarinas  mandarina  bisolvon  torta de mandarina  \
Date                                                                     
2010-01-03 -0.868      -0.783     -0.741    -0.992              -0.607   
2010-01-10 -0.879      -0.489     -0.655    -0.476              -0.607   
2010-01-17 -0.885      -0.476     -0.627    -0.992              -0.607   
2010-01-24 -1.007      -0.474     -0.643    -0.378              -0.607   
2010-01-31 -0.897      -0.902     -0.788    -0.523              -0.607   
2010-02-07 -0.901      -0.778     -0.731    -0.992              -0.607   
2010-02-14 -0.892      -0.838     -0.726    -0.461              -0.607   
2010-02-21 -0.883      -0.954     -0.750    -0.619              -0.607   
2010-02-28 -0.831      -0.842     -0.626    -0.992              -0.607   
2010-03-07 -0.767      -0.741     -0.615    -0.108              -0.607   
2010-03-14 -0.620      -0.826     -0.606    -0.277              -0.607   
2010-03-21 -0.531      -0.528  

Finally, we'll be dropping the 'GC' column in the features, since this is actually the input data we used to correlate terms in Google Correlate (See the dbscrape tutorial for more information about this).

In [34]:
for country in country_codes:
    data_object.features[country].drop(['GC'], axis=1, inplace=True)
    

## Different indices
Note that the target and feature data have different indices (Load displays a message when this happens). This problem is common when combining different data sources. It is strongly recommended to make all the indices identical, to avoid misaligned rows in pandas during the later stages of the analysis. In this case, changing the indices is fairly easy because our data sources have no missing rows (both have 365 rows) and the dates differ by only one day. That may not hold for other problems, in which case it is a good idea to fix these differences before using Load.

In this example, we'll use the target's index as our standard and overwrite the features' indices using the pandas function "set_index".
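
Overwriting an index by position is only safe when the two tables really line up row by row. The sketch below checks the two conditions mentioned above (same number of rows, dates at most one day apart) before re-indexing. The helper `align_to_target` and its `max_offset_days` parameter are hypothetical, and the data is a small stand-in.

```python
import numpy as np
import pandas as pd

# Small stand-ins: targets dated on Mondays, features on the preceding Sundays.
target = pd.DataFrame({'MX': [1225.0, 1168.0, 1368.0]},
                      index=pd.to_datetime(['2010-01-04', '2010-01-11', '2010-01-18']))
features = pd.DataFrame({'influenza': [0.001, 0.112, 0.114]},
                        index=pd.to_datetime(['2010-01-03', '2010-01-10', '2010-01-17']))


def align_to_target(features, target, max_offset_days=1):
    """Return a copy of `features` re-indexed by position onto the target's dates.

    Raises ValueError if the row counts differ or any pair of dates is further
    apart than `max_offset_days`, since a positional overwrite would then
    silently mislabel rows.
    """
    if len(features) != len(target):
        raise ValueError('row counts differ: {0} vs {1}'.format(len(features), len(target)))
    offsets = np.abs(np.asarray((target.index - features.index).days))
    if offsets.max() > max_offset_days:
        raise ValueError('dates differ by up to {0} days'.format(offsets.max()))
    return features.set_index(target.index)


aligned = align_to_target(features, target)
print(aligned)
```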

In [35]:
# Fixing indices
for country in country_codes:
    data_object.features[country].set_index(data_object.target[country].index, inplace=True)

print(data_object.target['MX'])
print(data_object.features['MX'])

                MX
2010-01-04  1225.0
2010-01-11  1168.0
2010-01-18  1368.0
2010-01-25   970.0
2010-02-01  1139.0
2010-02-08  1211.0
2010-02-15  1274.0
2010-02-22  1184.0
2010-03-01  1363.0
2010-03-08  1017.0
2010-03-15   781.0
2010-03-22   573.0
2010-03-29   443.0
2010-04-05   371.0
2010-04-12   427.0
2010-04-19   316.0
2010-04-26   252.0
2010-05-03   192.0
2010-05-10   164.0
2010-05-17   198.0
2010-05-24   201.0
2010-05-31   201.0
2010-06-07   142.0
2010-06-14   142.0
2010-06-21   138.0
2010-06-28   107.0
2010-07-05   105.0
2010-07-12    87.0
2010-07-19   104.0
2010-07-26   110.0
...            ...
2016-06-06   147.0
2016-06-13   174.0
2016-06-20   108.0
2016-06-27   208.0
2016-07-04   233.0
2016-07-11   177.0
2016-07-18   117.0
2016-07-25   106.0
2016-08-01    87.0
2016-08-08   109.0
2016-08-15   123.0
2016-08-22    95.0
2016-08-29   110.0
2016-09-05   119.0
2016-09-12   155.0
2016-09-19   177.0
2016-09-26   185.0
2016-10-03   137.0
2016-10-10   188.0
2016-10-17   181.0
2016-10-24  

### Visualizing input data

After loading the data we'll be using to fit a predictive model, you may be interested in performing some preliminary analysis (or EDA). We can use the InputVis library for this purpose.

In [36]:
datavis = InputVis(dataObject=data_object, folder_name='INPUTVIS_TEST', output_dir=None, verbose = True)


Unable to find an specified output directory. Using current working directory (/Users/leonardo/Desktop/flu-code/argotools-pkg) as output.
Successfully generated class folder. All results will be written to: /Users/leonardo/Desktop/flu-code/argotools-pkg/INPUTVIS_TEST


InputVis generates a separate folder for its output. Since this library is meant to handle a large number of locations, it automatically organizes the visualizations into one sub-folder per location (in this case, one per country) plus a folder called _overview, which contains grouped visualizations.
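
To make the layout concrete, here is a small standard-library sketch that builds the same kind of tree in a temporary directory. The helper `make_vis_folders` is hypothetical; it only mirrors the structure described above and is not how InputVis itself is implemented.

```python
import tempfile
from pathlib import Path


def make_vis_folders(root, folder_name, ids):
    """Create <root>/<folder_name>/<id>/ for every id, plus an _overview folder."""
    base = Path(root) / folder_name
    for name in list(ids) + ['_overview']:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


with tempfile.TemporaryDirectory() as tmp:
    base = make_vis_folders(tmp, 'INPUTVIS_TEST', ['AR', 'BR', 'CL', 'MX'])
    created = sorted(p.name for p in base.iterdir())
    print(created)
```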

In [37]:
study_period[0] = '2010-01-04' # We changed the dates to the exact ones because datavis does not accept approximate dates.
study_period[1] = '2016-12-26'
for country_code in country_codes:
    feature_names = list(data_object.features[country_code])
    datavis.plot_features(id_=country_code, mode='save', feature_names=feature_names[0:10], start_period=study_period[0], end_period=study_period[1])
    datavis.similarity_barplot(id_=country_code, mode='save', start_period=study_period[0], end_period=study_period[1], alpha=3)

datavis.target_heatmap_plot(ids='all', to_overview=True, start_period=study_period[0],\
end_period=study_period[1], filename='initial_dataset', ext='png', alpha=1.5)

'plot_features' draws a timeseries plot of the features along with the target (in black) and reports the Pearson correlation coefficient between each feature and the target over the specified period. 'similarity_barplot' serves the same purpose, but lets you choose the metric (the default is Pearson, but you can create your own and pass it in) and is better suited to cases with many features. Finally, "target_heatmap_plot" gives an overall view of the target data from the different locations in the form of a heatmap. This visualization helps you see how often data is missing and also gives you a rough intuition of the seasonality of each region.
![features.png](attachment:features.png)
![features_barplot.png](attachment:features_barplot.png)
![initial_dataset.png](attachment:initial_dataset.png)
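
The Pearson coefficient shown in these plots can be reproduced by hand with pandas. The sketch below restricts two date-indexed series to a period and correlates them. The function name `pearson_in_period` is an assumption, and the exact interface that 'similarity_barplot' expects for a custom metric is not shown in this tutorial, so treat this only as an illustration of the computation.

```python
import pandas as pd


def pearson_in_period(feature, target, start_period, end_period):
    """Pearson correlation between two date-indexed series inside [start, end]."""
    x = feature.loc[start_period:end_period]
    y = target.loc[start_period:end_period]
    return x.corr(y)  # pandas aligns on the index and uses Pearson by default


index = pd.date_range('2010-01-04', periods=8, freq='W-MON')
target = pd.Series([445., 434., 427., 298., 415., 410., 420., 430.], index=index)
feature = 2.0 * target + 10.0  # perfectly correlated toy feature
inverse = -target              # perfectly anti-correlated toy feature

r_pos = pearson_in_period(feature, target, '2010-01-04', '2010-02-22')
r_neg = pearson_in_period(inverse, target, '2010-01-04', '2010-02-22')
print(r_pos, r_neg)
```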

There are other functions within inputvis, but they follow the same intuition (the library is still in development, so please send an e-mail to clemclem1991@gmail.com if you'd like to contribute / report any bugs).

## Fitting a linear regression using argotools

After you've looked at your data and performed the necessary pre-processing steps, it is time to fit a model. We use the experiment library to do this.

In [41]:
mod = LassoCV(cv=10, fit_intercept=True, n_alphas=1000, max_iter=20000, tol=.001, normalize=True,\
             positive=True)

model_dict = {
    'AR': [lasso_family, preproc_rmv_values, None, None, False, mod],
    'ARGO': [lasso_family, preproc_rmv_values, None, None, False, mod],
    'ARGO_filtered': [lasso_family, preproc_rmv_values, None, .70, False, mod],
}

#Loading per_state params

argo_tester = ARGO_lrmse(data=data_object, model_dict=model_dict, output_name=None, training_window= 'static', \
            training_window_size=104, horizon=1, feature_preprocessing='zscore',\
            ar_model=52, load_folder = None, ar_column_filtering=True, out_of_sample_rmse=False)
argo_tester.run(period_start='2013-01-07', period_end='2016-12-26', verbose=False, cont=False)

Succesfully initialized experiment object.
Entering main computation loop. 

Preparing data and performing a forecasting loop for AR
Starting predictor AR with model_funct=<function lasso_family at 0x124a00d08> and model_preproc=<function preproc_rmv_values at 0x123901730>
Starting predictor ARGO with model_funct=<function lasso_family at 0x124a00d08> and model_preproc=<function preproc_rmv_values at 0x123901730>
Starting predictor ARGO_filtered with model_funct=<function lasso_family at 0x124a00d08> and model_preproc=<function preproc_rmv_values at 0x123901730>
AR [[ 523.39968551]
 [ 539.53030266]
 [ 555.74436953]
 [ 508.70181787]
 [ 477.2787148 ]
 [ 573.9243585 ]
 [ 458.08698589]
 [ 578.36025904]
 [ 636.47643929]
 [ 712.70973018]
 [ 771.78989128]
 [ 850.57704087]
 [ 583.81452727]
 [ 671.19279181]
 [ 805.11192201]
 [ 944.45965969]
 [1044.11064877]
 [1186.90155152]
 [1451.54135625]
 [1746.40047651]
 [2190.85547377]
 [2634.25186185]
 [3263.0223185 ]
 [3502.78143029]
 [3441.92695323]
 [4

  r = r_num / r_den



AR [[  78.30209702]
 [ 130.15751557]
 [ 112.97660822]
 [ 134.9770012 ]
 [ 125.36353181]
 [ 125.30812903]
 [  90.78488794]
 [  97.29195297]
 [ 115.78762854]
 [ 122.72080771]
 [ 191.51647133]
 [ 174.4424325 ]
 [ 148.73342546]
 [ 240.71431592]
 [ 273.63136673]
 [ 348.74385022]
 [ 506.40534135]
 [ 638.26864567]
 [ 790.27072547]
 [ 945.9983166 ]
 [1300.18692681]
 [1373.59615895]
 [1436.25618199]
 [1797.23791596]
 [1435.11440191]
 [1027.50909536]
 [1217.12226475]
 [1282.50155992]
 [1290.42527085]
 [ 787.5036809 ]
 [ 610.47919314]
 [ 497.53452922]
 [ 412.47477141]
 [ 423.49415645]
 [ 374.57108369]
 [ 302.43489275]
 [ 261.19300937]
 [ 223.20136016]
 [ 175.7068205 ]
 [ 175.6739003 ]
 [ 203.9949423 ]
 [ 177.77714695]
 [ 166.77852061]
 [ 210.69953966]
 [ 158.39097711]
 [ 117.25685648]
 [ 201.97921511]
 [ 121.90630359]
 [ 143.22100088]
 [ 107.79357212]
 [ 142.307819  ]
 [  91.75824218]
 [ 111.7033978 ]
 [ 163


AR [[ 348.18747403]
 [ 431.04462458]
 [ 598.69110016]
 [ 658.80553733]
 [ 599.28636422]
 [ 596.71168946]
 [ 572.94587722]
 [ 461.29566532]
 [ 471.55659247]
 [ 452.42324025]
 [ 360.26430527]
 [ 287.48095793]
 [ 210.457585  ]
 [ 215.05781439]
 [ 239.83350475]
 [ 213.45722749]
 [ 186.31120539]
 [ 200.96551439]
 [ 199.30593986]
 [ 183.87123365]
 [ 196.45738295]
 [ 187.76677191]
 [ 228.95322101]
 [ 179.41123442]
 [ 189.85917312]
 [ 169.66820865]
 [ 218.13705285]
 [ 193.67308146]
 [ 227.23828279]
 [ 210.17279568]
 [ 205.77030603]
 [ 235.29144965]
 [ 233.18173682]
 [ 237.9085364 ]
 [ 241.15877471]
 [ 298.06236779]
 [ 279.87127284]
 [ 355.2596114 ]
 [ 366.41783509]
 [ 363.26570102]
 [ 336.51062893]
 [ 355.51181145]
 [ 347.77074113]
 [ 350.50689996]
 [ 381.39429224]
 [ 313.72730988]
 [ 438.12548835]
 [ 437.6227447 ]
 [ 477.17982931]
 [ 551.759402  ]
 [ 631.54416023]
 [ 684.40348048]
 [ 899.32662167]
 [2134.46421353]
 [1554.39390159]
 [

The experiment library fits a multivariate linear model using the ARGO methodology. In timeseries prediction, data becomes available over time (every week, in this case), so it is useful to update and recalibrate your prediction model every time a new prediction week arrives. This library handles the recalibration process for every model in every location. After the fitting and prediction process finishes, it writes the model predictions to a csv file in the folder structure it created. The library also keeps track of the model coefficients during recalibration, which lets you analyze how each feature's impact changes over time.

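
The recalibration idea can be sketched in a few lines of NumPy. This is not the ARGO_lrmse implementation: the data is synthetic, ordinary least squares (`numpy.linalg.lstsq`) is swapped in for LassoCV to keep the example light, and refitting on a fixed-size window of the most recent 104 weeks is an assumption made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_features, window = 160, 3, 104
X = rng.standard_normal((n_weeks, n_features))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 0.1 * rng.standard_normal(n_weeks)

predictions = []
coef_history = []
for t in range(window, n_weeks):
    # Refit on the most recent `window` weeks only, then predict week t.
    X_train, y_train = X[t - window:t], y[t - window:t]
    A = np.column_stack([np.ones(window), X_train])  # intercept column
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    coef_history.append(coef)
    predictions.append(coef[0] + X[t] @ coef[1:])

predictions = np.array(predictions)
coef_history = np.array(coef_history)  # one row of coefficients per forecast week
rmse = np.sqrt(np.mean((predictions - y[window:]) ** 2))
print(coef_history.shape, round(float(rmse), 3))
```

Storing one coefficient vector per forecast week is what makes it possible to plot how the weight of each feature drifts over time.
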
## Visualizing the output

Let's take a look at the data that we have created through the ARGO_lrmse class.

In [44]:
results_df = pd.read_csv('/Users/leonardo/Desktop/flu-code/argotools-pkg/ARGO_experiment/AR/preds.csv', index_col=0)
print(results_df)

               ILI           AR         ARGO  ARGO_filtered
2013-01-07   490.0   523.399686   557.189137     557.189137
2013-01-14   526.0   539.530303   627.035354     627.035354
2013-01-21   495.0   555.744370   547.764090     547.764090
2013-01-28   460.0   508.701818   488.074090     488.074090
2013-02-04   608.0   477.278715   477.634215     477.634215
2013-02-11   439.0   573.924358   530.988124     530.988124
2013-02-18   563.0   458.086986   439.061478     439.061478
2013-02-25   565.0   578.360259   517.663583     517.663583
2013-03-04   650.0   636.476439   577.740583     577.740583
2013-03-11   747.0   712.709730   735.994293     735.994293
2013-03-18   834.0   771.789891   879.507739     879.507739
2013-03-25   563.0   850.577041   880.438866     880.438866
2013-04-01   556.0   583.814527   777.867286     777.867286
2013-04-08   762.0   671.192792   886.895382     886.895382
2013-04-15   848.0   805.111922   947.816776     947.816776
2013-04-22   985.0   944.459660  1081.59

The csv file contains the predictions of each model for each week. There are several things we can do with these results. We'll use OutputVis to benchmark and visualize them.

## Computing metrics from the results

We'll start by generating the metrics. We initialize the class object by passing the results folder name (since we didn't give it a particular name, the experiment object wrote its output under a generic name, "ARGO_experiment") and the location identifiers.

To compute the metrics, we use the "group_compute_metrics" function, which loads the results of every location and computes the metrics specified in the "which_metrics" variable. Several metrics are already available within the class, and it is fairly easy to write your own metric to work with this function.

To compute your metrics you only need the following:

1.- The intervals over which to compute the metrics. The intervals are passed as a dictionary: each key is an ID and each value is a list of tuples marking the intervals: [('YYYY-MM-DD', 'YYYY-MM-DD')...]

2.- The names of the intervals, also as a dictionary keyed by ID.

3.- The metrics you want to use (they should be available within the class)
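
To see how a list of (start, end) tuples selects rows from a weekly table, the sketch below slices a toy predictions DataFrame by label. The helper `rows_per_interval` is hypothetical and the data is synthetic; only the dictionary shapes follow the description above.

```python
import pandas as pd

index = pd.date_range('2013-01-07', periods=104, freq='W-MON')
preds = pd.DataFrame({'ILI': [float(i) for i in range(104)],
                      'AR': [float(i + 1) for i in range(104)]}, index=index)

intervals = {'AR': [('2013-01-06', '2013-12-29'), ('2014-01-05', '2014-12-28')]}
interval_labels = {'AR': ['Y2013', 'Y2014']}


def rows_per_interval(df, id_intervals, id_labels):
    """Map each interval label to the number of rows that fall inside it."""
    counts = {}
    for (start, end), label in zip(id_intervals, id_labels):
        counts[label] = len(df.loc[start:end])  # label-based slice, both ends inclusive
    return counts


counts = rows_per_interval(preds, intervals['AR'], interval_labels['AR'])
print(counts)
```

In this toy data the row dated 2013-12-30 lies between the two intervals and is counted in neither, so it is worth checking interval boundaries against the weekday your index uses.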

In [53]:
results_visualizer = OutputVis('ARGO_experiment', ids=country_codes)


# To compute metrics, we need to set the intervals where to compute the metrics, here we do it yearly and as whole period
start_interval = ['2013-01-06', '2014-01-05', '2015-01-04', '2016-01-03', '2013-01-06'] #'2013-01-05'
end_interval = ['2013-12-29', '2014-12-28', '2015-12-28', '2016-12-25', '2016-12-25']
period_labels = ['Y2013', 'Y2014', 'Y2015', 'Y2016', 'ALL_YEARS']

i = list(zip(start_interval, end_interval))
intervals = dict( zip(country_codes,[i]*len(country_codes)) )
interval_labels = dict( zip(country_codes, [period_labels]*len(country_codes)))

results_visualizer.group_compute_metrics(intervals, interval_labels, which_metrics=['PEARSON', 'RMSE', 'NRMSE'], write_to_overview=True)

metrics_example = pd.read_csv('/Users/leonardo/Desktop/flu-code/argotools-pkg/ARGO_experiment/AR/metrics.csv')
print('This are the metrics for Argentina: \n',metrics_example)

Visualizer initialized
Finished iterating over all ids. Writing out condensed file in _overview folder
This are the metrics for Argentina: 
     METRIC          MODEL       Y2013       Y2014       Y2015       Y2016  \
0  PEARSON             AR    0.970698    0.980056    0.982453    0.967739   
1  PEARSON           ARGO    0.976914    0.963389    0.981372    0.977771   
2  PEARSON  ARGO_filtered    0.976914    0.963348    0.981365    0.974820   
3     RMSE             AR  371.154312  230.491084  179.126742  383.057850   
4     RMSE           ARGO  350.300295  288.448467  180.230583  320.286254   
5     RMSE  ARGO_filtered  350.300295  288.429611  180.423363  336.135149   
6    NRMSE             AR    0.024609    0.022105    0.016846    0.024566   
7    NRMSE           ARGO    0.023227    0.027664    0.016949    0.020540   
8    NRMSE  ARGO_filtered    0.023227    0.027662    0.016968    0.021557   

    ALL_YEARS  
0    0.969721  
1    0.973389  
2    0.972297  
3  302.286541  
4  290.2
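
The three metrics in the table above can be written in a few lines of NumPy. This is a sketch under assumptions: NRMSE follows the one-line description given later in this tutorial (RMSE scaled by the target's Euclidean norm), and the function names are illustrative, so details may differ from what OutputVis computes internally.

```python
import numpy as np


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between target and prediction."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))


def nrmse(y_true, y_pred):
    # RMSE divided by the Euclidean (L2) norm of the target.
    return rmse(y_true, y_pred) / float(np.linalg.norm(y_true))


# First four weeks of the Argentina predictions printed earlier (ILI vs. AR columns).
ili = np.array([490.0, 526.0, 495.0, 460.0])
ar = np.array([523.399686, 539.530303, 555.744370, 508.701818])
print(pearson(ili, ar), rmse(ili, ar), nrmse(ili, ar))
```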

After computing the metrics, we'll produce a series of visualizations for the models. First we set up a pair of style dictionaries that give each model a color and a transparency, so all the plots share the same style. After that, we call a series of functions that do the work for us.

In [61]:
color_dict = {
    'AR':'blue',
    'ARGO':'r',
    'ARGO_filtered':'blueviolet',
    'ILI':'black'
}

alpha_dict = {
    'AR': .8,
    'ARGO':.8,
    'ARGO_filtered':.8,
    'ILI':.8
}


mods = ['AR', 'ARGO', 'ARGO_filtered']
results_visualizer.group_barplot_metric(ids=country_codes, metric='NRMSE', period='ALL_YEARS',\
                             models=mods, color_dict=color_dict,\
                             alpha_dict=alpha_dict, metric_filename='metrics.csv',\
                             bar_separation_multiplier=1.5, mode='save', output_filename='NRMSE_ALLYEARS', ext='png')

results_visualizer.season_analysis(country_codes, ['Y2014', 'Y2015', 'Y2016'], mods, main_folder=None, metrics=['PEARSON', 'NRMSE'], filename='metrics_condensed.csv', output_filename='season_analysis',\
 color_dict=None, alpha_dict=None, mode='save', ext='png')

for country_code in country_codes:
    results_visualizer.plot_series(id_=country_code,series_names=['AR','ARGO', 'ARGO_filtered','ILI'], color_dict=color_dict, alpha_dict=alpha_dict,\
                        add_weekly_winner=True, winner_models=['AR', 'ARGO'], mode='save')




"group_barplot_metric" produces a horizontal barplot that lets you compare the performance of the models you fit based on a given metric (In this example, we use the 'NRMSE' metric, which is an RMSE scaled by the target's euclidean norm).

'season_analysis' provides two visualizations. The first is a violin plot and box plot of the metrics. The distribution shown by the violin plot gives an idea of where most of the data is concentrated within the range spanned by the box plot. The second is a heatmap that contains the number of times each model is ranked in first, second, ... nth place over the time intervals you ask the function to look at (for example, if you look at the yearly performance (2013, 2014, 2015, 2016) of these three models (AR, ARGO, ARGO_filtered), you'll have 4 first places, 4 second places and 4 third places). The models with the "best" performance land more often in first and second place, and thus show a stronger shade of red in the upper squares.
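
The counts behind such a ranking heatmap can be sketched with pandas. The NRMSE values below are toy numbers chosen for illustration, and the code is an assumption about how the ranking might be tallied rather than the 'season_analysis' source.

```python
import pandas as pd

# Toy NRMSE values (lower is better); rows are intervals, columns are models.
nrmse = pd.DataFrame(
    {'AR': [0.025, 0.022, 0.017, 0.025],
     'ARGO': [0.023, 0.028, 0.016, 0.021],
     'ARGO_filtered': [0.024, 0.027, 0.018, 0.022]},
    index=['Y2013', 'Y2014', 'Y2015', 'Y2016'])

# Rank models within each interval: 1 = smallest error.
ranks = nrmse.rank(axis=1, method='first').astype(int)

# Count how often each model lands in each place (rows: place, columns: model).
rank_counts = pd.DataFrame(
    {model: ranks[model].value_counts() for model in ranks.columns}
).reindex([1, 2, 3]).fillna(0).astype(int)
print(rank_counts)
```

Each row of `rank_counts` sums to the number of intervals, which matches the "4 first places, 4 second places and 4 third places" bookkeeping described above.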


'plot_series' works individually on each location of study (here we show just one). The function makes a quick plot of the target and the model predictions using the color and transparency schemes we provided through our dictionaries. The plot also gives some extra information below the timeseries. The rectangular heatmap tells you which model (identified by the colorbar) had the lowest error in each week. For example, we can see that during Mexico's unusual 2015 flu outbreak, the autoregressive model dominated, having the lowest error for almost the whole outbreak.
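
The "weekly winner" idea can be reproduced with pandas by comparing absolute errors week by week. Using the absolute error as the criterion is an assumption made for this sketch; the numbers are the first five weeks of the Argentina predictions printed earlier.

```python
import pandas as pd

# First five weeks of the Argentina predictions printed earlier.
preds = pd.DataFrame(
    {'ILI': [490.0, 526.0, 495.0, 460.0, 608.0],
     'AR': [523.399686, 539.530303, 555.744370, 508.701818, 477.278715],
     'ARGO': [557.189137, 627.035354, 547.764090, 488.074090, 477.634215]},
    index=pd.to_datetime(['2013-01-07', '2013-01-14', '2013-01-21',
                          '2013-01-28', '2013-02-04']))

winner_models = ['AR', 'ARGO']
abs_error = preds[winner_models].sub(preds['ILI'], axis=0).abs()
weekly_winner = abs_error.idxmin(axis=1)  # model with the smallest error per week
print(weekly_winner)
```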

![NRMSE_ALLYEARS.png](attachment:NRMSE_ALLYEARS.png)

![season_analysis.png](attachment:season_analysis.png)

![series.png](attachment:series.png)

We have performed basic EDA, pre-processing, prototyping and benchmarking by writing only a few lines of code. Moreover, our data has an organized structure and is easily compatible with other libraries through the CSV files. Hopefully you'll find some value in this library.