# celverapi

> Library for forecasting multiple time series in parallel using multiprocessing.

## Install

`pip install celverapi`

## How to use

In this example we use a DataFrame with sales data from 4 different grocery stores. 
The idea is to fit several forecasting models for every store simultaneously.

In [None]:
from celverapi.core import *
from celverapi.main import *
import pandas as pd

In [None]:
df = pd.read_parquet('Data/example.parquet')
df.head(10)

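|   | date | store_nbr | family | sales | onpromotion |
|---|------|-----------|--------|-------|-------------|
| 0 | 2013-01-01 | 6 | GROCERY I | 0.0 | 0.0 |
| 1 | 2013-01-01 | 7 | GROCERY I | 0.0 | 0.0 |
| 2 | 2013-01-01 | 8 | GROCERY I | 0.0 | 0.0 |
| 3 | 2013-01-01 | 9 | GROCERY I | 0.0 | 0.0 |
| 4 | 2013-01-02 | 6 | GROCERY I | 5535.0 | 0.0 |
| 5 | 2013-01-02 | 7 | GROCERY I | 4172.0 | 0.0 |
| 6 | 2013-01-02 | 8 | GROCERY I | 5277.0 | 0.0 |
| 7 | 2013-01-02 | 9 | GROCERY I | 7718.0 | 0.0 |
| 8 | 2013-01-03 | 6 | GROCERY I | 4040.0 | 0.0 |
| 9 | 2013-01-03 | 7 | GROCERY I | 3279.0 | 0.0 |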


Using the function `unique_dataframe_values()`, we create a list of DataFrames, one for each unique ID value in the main DataFrame. 
In this case, we need one DataFrame per store.

In [None]:
list_of_DataFrames = unique_dataframe_values('store_nbr')
list_of_DataFrames

[            date  store_nbr     family   sales  onpromotion
 0     2013-01-01          6  GROCERY I     0.0          0.0
 4     2013-01-02          6  GROCERY I  5535.0          0.0
 8     2013-01-03          6  GROCERY I  4040.0          0.0
 12    2013-01-04          6  GROCERY I  3314.0          0.0
 16    2013-01-05          6  GROCERY I  4857.0          0.0
 ...          ...        ...        ...     ...          ...
 6716  2017-08-11          6  GROCERY I  4466.0        776.0
 6720  2017-08-12          6  GROCERY I  4027.0        770.0
 6724  2017-08-13          6  GROCERY I  5481.0        817.0
 6728  2017-08-14          6  GROCERY I  4142.0        773.0
 6732  2017-08-15          6  GROCERY I  4334.0        779.0
 
 [1684 rows x 5 columns],
             date  store_nbr     family   sales  onpromotion
 1     2013-01-01          7  GROCERY I     0.0          0.0
 5     2013-01-02          7  GROCERY I  4172.0          0.0
 9     2013-01-03          7  GROCERY I  3279.0          
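
To illustrate what this step produces, here is a minimal sketch in plain pandas. It is not the library's implementation: `split_by_id` is a hypothetical helper, and the toy frame reuses only a few rows from the example data.

```python
import pandas as pd


def split_by_id(df: pd.DataFrame, id_column: str) -> list:
    """Hypothetical helper: return one DataFrame per unique value in id_column."""
    # groupby yields (key, sub-frame) pairs; sort=True orders the groups by ID.
    return [group for _, group in df.groupby(id_column, sort=True)]


# Toy data with two stores, taken from the first rows of the example above.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02"]
        ),
        "store_nbr": [6, 7, 6, 7],
        "sales": [0.0, 0.0, 5535.0, 4172.0],
    }
)

frames = split_by_id(toy, "store_nbr")
print(len(frames))  # one DataFrame per store
```

Each sub-frame keeps its original row index, which matches the stepped index values (0, 4, 8, ...) visible in the output above.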

Once we have the list of DataFrames we want to work with, we split each of them into a train and a test DataFrame.
In this case the train DataFrame covers the start date through 2017-07-31, so the split date passed to the function is 2017-08-01.

In [None]:
list_of_trained_df, list_of_tested = split_into_train_test_dataframe(list_of_DataFrames, "date", "2017-08-01")
list_of_trained_df

[            date  store_nbr     family   sales  onpromotion
 0     2013-01-01          6  GROCERY I     0.0          0.0
 4     2013-01-02          6  GROCERY I  5535.0          0.0
 8     2013-01-03          6  GROCERY I  4040.0          0.0
 12    2013-01-04          6  GROCERY I  3314.0          0.0
 16    2013-01-05          6  GROCERY I  4857.0          0.0
 ...          ...        ...        ...     ...          ...
 6656  2017-07-27          6  GROCERY I  3820.0        765.0
 6660  2017-07-28          6  GROCERY I  4602.0        807.0
 6664  2017-07-29          6  GROCERY I  6050.0        817.0
 6668  2017-07-30          6  GROCERY I  6664.0        840.0
 6672  2017-07-31          6  GROCERY I  5237.0        806.0
 
 [1669 rows x 5 columns],
             date  store_nbr     family   sales  onpromotion
 1     2013-01-01          7  GROCERY I     0.0          0.0
 5     2013-01-02          7  GROCERY I  4172.0          0.0
 9     2013-01-03          7  GROCERY I  3279.0          
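
The split itself can be pictured with a short pandas sketch. This assumes that rows strictly before the split date go to the train set and rows on or after it go to the test set, which is consistent with the output above but not confirmed against the library's source. `split_train_test` is a hypothetical name.

```python
import pandas as pd


def split_train_test(frames, date_column, split_date):
    """Hypothetical helper: split every DataFrame at split_date.

    Assumption: rows before split_date are train, rows on/after it are test.
    """
    split = pd.Timestamp(split_date)
    train, test = [], []
    for frame in frames:
        dates = pd.to_datetime(frame[date_column])
        train.append(frame[dates < split])
        test.append(frame[dates >= split])
    return train, test


# Toy frame for one store with dates around the split point.
store = pd.DataFrame(
    {
        "date": ["2017-07-30", "2017-07-31", "2017-08-01", "2017-08-02"],
        "store_nbr": [6, 6, 6, 6],
        "sales": [6664.0, 5237.0, 0.0, 0.0],  # August values are placeholders
    }
)

train_frames, test_frames = split_train_test([store], "date", "2017-08-01")
print(len(train_frames[0]), len(test_frames[0]))
```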

After that, we convert these DataFrames into TimeSeries objects and store them in their respective lists. 

In [None]:
trained_timeseries, valed_timeseries = converting_dataframes_into_objects('date', 'store_nbr', 'sales')
trained_timeseries

[          date  store_nbr     family   sales  onpromotion
 0   2013-01-01          6  GROCERY I     0.0          0.0
 4   2013-01-02          6  GROCERY I  5535.0          0.0
 8   2013-01-03          6  GROCERY I  4040.0          0.0
 12  2013-01-04          6  GROCERY I  3314.0          0.0
 16  2013-01-05          6  GROCERY I  4857.0          0.0,
           date  store_nbr     family   sales  onpromotion
 1   2013-01-01          7  GROCERY I     0.0          0.0
 5   2013-01-02          7  GROCERY I  4172.0          0.0
 9   2013-01-03          7  GROCERY I  3279.0          0.0
 13  2013-01-04          7  GROCERY I  2681.0          0.0
 17  2013-01-05          7  GROCERY I  2662.0          0.0,
           date  store_nbr     family   sales  onpromotion
 2   2013-01-01          8  GROCERY I     0.0          0.0
 6   2013-01-02          8  GROCERY I  5277.0          0.0
 10  2013-01-03          8  GROCERY I  3783.0          0.0
 14  2013-01-04          8  GROCERY I  3481.0         

The last preparation step is to create a dictionary that maps the ID of each TimeSeries object to its ForecastingTask object. 


In [None]:
dict_of_forecasting_tasks = creating_forecasting_task_dict(trained_timeseries, valed_timeseries, forecast_horizon=15)
dict_of_forecasting_tasks

{6: <celverapi.core.ForecastingTask at 0x148d64d8400>,
 7: <celverapi.core.ForecastingTask at 0x148d64d8460>,
 8: <celverapi.core.ForecastingTask at 0x148d64d8880>,
 9: <celverapi.core.ForecastingTask at 0x148d64d8790>}

To forecast these TimeSeries, we call the `forecast()` function, passing a list of the models we want to use and the dictionary of forecasting tasks.


In [None]:
if __name__ == "__main__":
    final_result = forecast(['ets', 'naive', 'trend'], dict_of_forecasting_tasks)

final_result

|   | Qty | ID | Date | Model |
|---|-----|----|------|-------|
| 8344 | 5234.366950 | 6 | 2017-08-01 | ETS-Model |
| 8348 | 5234.366950 | 6 | 2017-08-02 | ETS-Model |
| 8352 | 5234.366950 | 6 | 2017-08-03 | ETS-Model |
| 8356 | 5234.366950 | 6 | 2017-08-04 | ETS-Model |
| 8360 | 5234.366950 | 6 | 2017-08-05 | ETS-Model |
| ... | ... | ... | ... | ... |
| 8387 | 17776.977872 | 9 | 2017-08-11 | Trend Model |
| 8391 | 17783.141263 | 9 | 2017-08-12 | Trend Model |
| 8395 | 17789.304655 | 9 | 2017-08-13 | Trend Model |
| 8399 | 17795.468047 | 9 | 2017-08-14 | Trend Model |
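
The `if __name__ == "__main__":` guard matters for multiprocessing. On platforms that start workers by spawning a fresh interpreter (Windows, for example), each worker re-imports the main module, and unguarded code would run again in every worker. The sketch below shows the general pattern with the standard library only. It is not how `forecast()` is implemented: `naive_forecast` and `forecast_all` are hypothetical names, and the "model" simply repeats the last observed value.

```python
from multiprocessing import Pool


def naive_forecast(task):
    """Hypothetical worker: task is (series_id, history, horizon)."""
    series_id, history, horizon = task
    # Naive model: repeat the last observed value for every future step.
    return series_id, [history[-1]] * horizon


def forecast_all(tasks, processes=2):
    """Run one naive forecast per series across a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # pool.map returns results in the same order as the input tasks.
        return dict(pool.map(naive_forecast, tasks))


if __name__ == "__main__":
    # First sales values of stores 6 and 7 from the example data.
    tasks = [
        (6, [5535.0, 4040.0, 3314.0], 3),
        (7, [4172.0, 3279.0, 2681.0], 3),
    ]
    print(forecast_all(tasks))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.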
