# Time series regression models - Covid-19

Francisco Rosa Dias de Miranda

PREDICT-ICMC

Universidade de São Paulo

Experimental module! See the documentation at https://pycaret.readthedocs.io/en/time_series/api/time_series.html

In [1]:
import pandas as pd
from pycaret.time_series import *

## Reading the data

In this example we use the new cases reported in the state of São Paulo from 2020-02-25 to 2021-07-08, that is, the first 500 rows of the dataset.

In [2]:
dados = pd.read_csv("https://github.com/predict-icmc/covid19/raw/master/leitura-dados/sp_ts_.csv").head(500)
dados.tail()

Unnamed: 0,date,newCases
495,2021-07-04,6451
496,2021-07-05,4231
497,2021-07-06,19132
498,2021-07-07,14889
499,2021-07-08,14453


## Data preparation

Converting the dates into the series index

In [3]:
# create a pd.Series from the newCases column
ts = pd.Series(dados['newCases'])
ts.index = pd.to_datetime(dados['date'])
ts.head()


date
2020-02-25    1
2020-02-26    0
2020-02-27    0
2020-02-28    1
2020-02-29    0
Name: newCases, dtype: int64

## Data splitting

We split the data into training and test sets, the latter holding the last 20 observations (4% of the sample). This step is not required, but we do it to test our forecaster on data it has not seen.

In [4]:
# train on the first 480 rows, test on the last 20
train = ts.head(480)
test = ts.tail(20)
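
The boundaries of this split can be checked without loading the data. The sketch below assumes the series has exactly one observation per day with no gaps (an assumption, although the first and last dates shown above agree with it); the variable names are illustrative only.

```python
from datetime import date, timedelta

# Assumption: one observation per calendar day, starting on 2020-02-25.
start = date(2020, 2, 25)
dates = [start + timedelta(days=i) for i in range(500)]

# Chronological split: first 480 days for training, last 20 for testing.
train_dates, test_dates = dates[:480], dates[480:]
test_share = len(test_dates) / len(dates)

print("train ends: ", train_dates[-1].isoformat())
print("test starts:", test_dates[0].isoformat())
print("test ends:  ", test_dates[-1].isoformat())
print("test share: ", test_share)
```

Because the split is chronological, the test set always lies after the training set, which is what a forecasting evaluation needs.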

## Setup

In [5]:
ts_reg = setup(data=train, session_id=42,
               seasonal_period=7,
               use_gpu=True,
               fh=20)  # forecast horizon

Unnamed: 0,Description,Value
0,session_id,42
1,Original Data,"(480, 1)"
2,Missing Values,False
3,Transformed Train Set,"(460,)"
4,Transformed Test Set,"(20,)"
5,Fold Generator,ExpandingWindowSplitter
6,Fold Number,3
7,Enforce Prediction Interval,False
8,Seasonal Period Tested,7
9,Seasonality Detected,True
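
The summary reports an ExpandingWindowSplitter with 3 folds. Below is a minimal sketch of how such folds could be laid out, assuming each fold extends the training window by `fh` observations and is scored on the `fh` observations that follow. `expanding_window_cutoffs` is a hypothetical helper, not a PyCaret function, and daily data starting on 2020-02-25 is assumed.

```python
from datetime import date, timedelta

def expanding_window_cutoffs(n_obs, n_folds, fh):
    """Return the index of the last training point of each fold.

    Assumes the folds are packed against the end of the series:
    the last fold is tested on the final `fh` observations.
    """
    first_cutoff = n_obs - n_folds * fh - 1
    return [first_cutoff + k * fh for k in range(n_folds)]

start = date(2020, 2, 25)  # first date of the series
cutoffs = expanding_window_cutoffs(n_obs=460, n_folds=3, fh=20)
cutoff_dates = [start + timedelta(days=c) for c in cutoffs]

for idx, day in zip(cutoffs, cutoff_dates):
    print(f"train on rows 0..{idx}, cutoff {day.isoformat()}")
```

Under these assumptions the cutoff dates agree with the `cutoff` column of the cross-validation tables printed by `create_model` and `tune_model` in this notebook.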


## Comparing the models

In [6]:
# show available models
models()

ID,Name,Reference,Turbo
naive,Naive Forecaster,sktime.forecasting.naive.NaiveForecaster,True
grand_means,Grand Means Forecaster,sktime.forecasting.naive.NaiveForecaster,True
snaive,Seasonal Naive Forecaster,sktime.forecasting.naive.NaiveForecaster,True
polytrend,Polynomial Trend Forecaster,sktime.forecasting.trend.PolynomialTrendForeca...,True
arima,ARIMA,sktime.forecasting.arima.ARIMA,True
auto_arima,Auto ARIMA,sktime.forecasting.arima.AutoARIMA,True
exp_smooth,Exponential Smoothing,sktime.forecasting.exp_smoothing.ExponentialSm...,True
ets,ETS,sktime.forecasting.ets.AutoETS,True
theta,Theta Forecaster,sktime.forecasting.theta.ThetaForecaster,True
tbats,TBATS,sktime.forecasting.tbats.TBATS,False


In [7]:

compare_models()


Unnamed: 0,Model,MAE,RMSE,MAPE,SMAPE,R2,TT (Sec)
snaive,Seasonal Naive Forecaster,2431.5,3555.793,0.3315,0.1961,0.6121,0.02
arima,ARIMA,2617.6108,3738.2558,0.3589,0.2088,0.569,0.0733
et_cds_dt,Extra Trees w/ Cond. Deseasonalize & Detrending,2506.9055,3508.8681,0.3756,0.2221,0.6276,2.4067
rf_cds_dt,Random Forest w/ Cond. Deseasonalize & Detrending,2518.6676,3489.0982,0.3807,0.2274,0.6278,2.4733
catboost_cds_dt,CatBoost Regressor w/ Cond. Deseasonalize & Detrending,2859.7548,3988.4945,0.3977,0.2324,0.4785,2.6967
knn_cds_dt,K Neighbors w/ Cond. Deseasonalize & Detrending,2592.892,3489.5984,0.3781,0.2348,0.6318,2.97
gbr_cds_dt,Gradient Boosting w/ Cond. Deseasonalize & Detrending,2680.6796,3766.7958,0.4146,0.236,0.566,0.2167
lightgbm_cds_dt,Light Gradient Boosting w/ Cond. Deseasonalize & Detrending,3104.9906,4155.1763,0.4054,0.2454,0.3813,1.0467
huber_cds_dt,Huber w/ Cond. Deseasonalize & Detrending,2920.8983,3909.5526,0.406,0.2458,0.5255,0.1167
exp_smooth,Exponential Smoothing,2717.8194,3783.0055,0.4142,0.249,0.5656,0.3533


INFO:logs:master_model_container: 26
INFO:logs:display_container: 2
INFO:logs:NaiveForecaster(sp=7, strategy='last', window_length=None)
INFO:logs:compare_models() successfully completed......................................


NaiveForecaster(sp=7, strategy='last', window_length=None)
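
The best model returned here is a seasonal naive forecaster with `sp=7` and `strategy='last'`. As a rough sketch of the idea (not the sktime implementation), each forecast repeats the most recent observation from the same position in the weekly cycle. The function name and the toy data are illustrative.

```python
def seasonal_naive_forecast(history, sp, fh):
    """Forecast `fh` steps by repeating the last `sp` observed values."""
    if len(history) < sp:
        raise ValueError("need at least one full season of history")
    last_season = history[-sp:]
    return [last_season[h % sp] for h in range(fh)]

# Two toy "weeks" of data; the forecast cycles through the second week.
history = [10, 20, 30, 40, 50, 60, 70,
           11, 21, 31, 41, 51, 61, 71]
forecast = seasonal_naive_forecast(history, sp=7, fh=10)
print(forecast)
```

A baseline like this is cheap to compute, which is consistent with the very small training time reported in the table.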

## Creating a model: ARIMA

In [9]:
aa = create_model('arima')

Unnamed: 0,cutoff,MAE,RMSE,MAPE,SMAPE,R2
0,2021-03-30,2912.6734,3911.3656,0.2692,0.213,0.6933
1,2021-04-19,2884.5505,3829.2469,0.2735,0.2157,0.4294
2,2021-05-09,2055.6085,3474.1549,0.5341,0.1978,0.5845
Mean,,2617.6108,3738.2558,0.3589,0.2088,0.569
SD,,397.5614,189.7328,0.1239,0.0079,0.1083


INFO:logs:master_model_container: 27
INFO:logs:display_container: 3
INFO:logs:ARIMA(maxiter=50, method='lbfgs', order=(1, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(0, 1, 0, 7),
      with_intercept=True)
INFO:logs:create_model() successfully completed......................................
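
The `Mean` and `SD` rows summarize the three folds. The sketch below recomputes them for the MAE column, assuming that `SD` is the population standard deviation (dividing by the number of folds); that assumption is not stated in the output, but it agrees with the printed values up to rounding.

```python
from statistics import mean, pstdev

# MAE of each cross-validation fold, copied from the table above.
fold_mae = [2912.6734, 2884.5505, 2055.6085]

mae_mean = mean(fold_mae)   # arithmetic mean over folds
mae_sd = pstdev(fold_mae)   # population standard deviation (ddof = 0)

print(f"mean MAE: {mae_mean:.4f}")
print(f"SD of MAE: {mae_sd:.4f}")
```

With only three folds the standard deviation is a coarse indication of variability, so small differences between models in the mean row should be read with care.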


## Tuning the ARIMA model

In [10]:
tuned_aa = tune_model(aa)
print(tuned_aa)

Unnamed: 0,cutoff,MAE,RMSE,MAPE,SMAPE,R2
0,2021-03-30,2411.018,3644.3171,0.2096,0.1698,0.7337
1,2021-04-19,3070.063,3996.0035,0.2872,0.2258,0.3786
2,2021-05-09,1992.0907,3406.9758,0.5318,0.2054,0.6004
Mean,,2491.0572,3682.4321,0.3429,0.2003,0.5709
SD,,443.7047,241.9751,0.1373,0.0231,0.1465


INFO:logs:master_model_container: 30
INFO:logs:display_container: 4
INFO:logs:ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(1, 1, 0, 7),
      with_intercept=False)
INFO:logs:tune_model() succesfully completed......................................


ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(1, 1, 0, 7),
      with_intercept=False)
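
The tuned model has `order=(0, 0, 0)`, `seasonal_order=(1, 1, 0, 7)` and no intercept, which corresponds to y_t = y_{t-7} + Φ (y_{t-7} - y_{t-14}) + e_t. The sketch below shows the point-forecast recursion this implies, with future errors set to zero. The coefficient `phi` is hypothetical, since the notebook does not print the fitted value, and the function is an illustration rather than the library's code.

```python
def seasonal_ar1_diff_forecast(history, phi, sp, fh):
    """Point forecasts for ARIMA(0,0,0)(1,1,0)[sp] without intercept.

    Each forecast is the value one season back plus `phi` times the
    last seasonal difference; forecasts are fed back into the recursion.
    """
    if len(history) < 2 * sp:
        raise ValueError("need at least two full seasons of history")
    values = list(history)
    for _ in range(fh):
        seasonal_diff = values[-sp] - values[-2 * sp]
        values.append(values[-sp] + phi * seasonal_diff)
    return values[len(history):]

history = list(range(14))  # toy data, two "weeks"
print(seasonal_ar1_diff_forecast(history, phi=0.5, sp=7, fh=3))
print(seasonal_ar1_diff_forecast(history, phi=0.0, sp=7, fh=3))
```

With `phi=0` the recursion reduces to the seasonal naive forecast, which helps explain why the two models score so similarly in the comparison table.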


## Evaluating the model

In [11]:
plot_model(tuned_aa)

INFO:logs:Visual Rendered Successfully


## Predictions within the training sample

In [None]:
predict_model(tuned_aa, 20)

2021-06-18    14361.5282
2021-06-19    16839.9486
2021-06-20     9696.7732
2021-06-21     6925.4692
2021-06-22    16241.8640
2021-06-23    15325.6852
2021-06-24    18493.6299
2021-06-25    16122.1184
2021-06-26    17016.3496
2021-06-27    10117.2370
2021-06-28     7149.0019
2021-06-29    19833.6637
2021-06-30    20105.1223
2021-07-01    17189.8843
2021-07-02    15205.2602
2021-07-03    17183.3809
2021-07-04    10203.3394
2021-07-05     7495.3256
2021-07-06    16598.9552
2021-07-07    15703.6999
Freq: D, Name: newCases, dtype: float64

## Finalizing the model

In [12]:
final_tuned_aa = finalize_model(tuned_aa)
print(final_tuned_aa)

INFO:logs:Initializing finalize_model()
INFO:logs:finalize_model(display=None, model_only=True, groups=None, fit_kwargs=None, estimator=ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(1, 1, 0, 7),
      with_intercept=False), self=<pycaret.internal.pycaret_experiment.time_series_experiment.TimeSeriesExperiment object at 0x0000000062371358>)
INFO:logs:Finalizing ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(1, 1, 0, 7),
      with_intercept=False)
INFO:logs:Initializing create_model()
INFO:logs:create_model(kwargs={}, display=None, metrics=None, add_to_model_list=False, system=False, verbose=False, refit=True, groups=None, fit_kwargs={}, predict=True, cross_validation=True, round=4, fold=None, estimator=ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_or

ARIMA(maxiter=50, method='lbfgs', order=(0, 0, 0), out_of_sample_size=0,
      scoring='mse', scoring_args=None, seasonal_order=(1, 1, 0, 7),
      with_intercept=False)


## Predictions on the test sample

In [29]:
unseen_predictions = predict_model(final_tuned_aa, 20)

unseen_predictions.tail()

2021-07-04     9483.2846
2021-07-05     5236.7265
2021-07-06    18550.7526
2021-07-07    16926.2165
2021-07-08    17507.9150
Freq: D, Name: newCases, dtype: float64
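
These five forecasts can be compared with the observed values printed by `dados.tail()` at the start of the notebook. The sketch below computes MAE, RMSE and MAPE using their standard textbook definitions; PyCaret's own implementation may differ in details, and five days are too few to compare with the cross-validation scores.

```python
from math import sqrt

# Observed new cases for 2021-07-04 .. 2021-07-08 (from dados.tail()).
actual = [6451, 4231, 19132, 14889, 14453]
# Forecasts for the same days (from unseen_predictions.tail()).
predicted = [9483.2846, 5236.7265, 18550.7526, 16926.2165, 17507.9150]

errors = [p - a for p, a in zip(predicted, actual)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n                       # mean absolute error
rmse = sqrt(sum(e * e for e in errors) / n)                 # root mean squared error
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n  # mean absolute percentage error

print(f"MAE:  {mae:.1f}")
print(f"RMSE: {rmse:.1f}")
print(f"MAPE: {mape:.3f}")
```

MAPE divides by the observed value, so it is only defined here because none of the five observations is zero.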

## Plots

`plot_model` accepts the following plot types:

- ‘ts’ - Time Series Plot
- ‘train_test_split’ - Train Test Split
- ‘cv’ - Cross Validation
- ‘acf’ - Auto Correlation (ACF)
- ‘pacf’ - Partial Auto Correlation (PACF)
- ‘decomp_classical’ - Decomposition Classical
- ‘decomp_stl’ - Decomposition STL
- ‘diagnostics’ - Diagnostics Plot
- ‘forecast’ - “Out-of-Sample” Forecast Plot
- ‘insample’ - “In-Sample” Forecast Plot
- ‘residuals’ - Residuals Plot


In [31]:
# ts plot
plot_model(final_tuned_aa)



INFO:logs:Visual Rendered Successfully


In [33]:
# train test plot
plot_model(final_tuned_aa, "train_test_split")

INFO:logs:Visual Rendered Successfully


In [34]:
plot_model(final_tuned_aa, "cv")

INFO:logs:Visual Rendered Successfully


In [35]:
plot_model(final_tuned_aa, "acf")

INFO:logs:Visual Rendered Successfully
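
For reference, the quantity shown in an ACF plot can be computed by hand. The sketch below uses the usual sample autocorrelation estimator on a toy weekly pattern; it is an illustration of the concept, not the routine PyCaret calls, and `sample_acf` is a hypothetical helper.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Toy series with an exact 7-day cycle, repeated for four weeks.
weekly = [1, 2, 3, 4, 5, 6, 7] * 4
acf = sample_acf(weekly, max_lag=7)

for lag, value in enumerate(acf):
    print(f"lag {lag}: {value:+.3f}")
```

A series with weekly seasonality shows a pronounced peak at lag 7, which is the kind of pattern that motivates `seasonal_period=7` in `setup`.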


In [36]:
plot_model(final_tuned_aa, "pacf")

INFO:logs:Visual Rendered Successfully


In [40]:
plot_model(final_tuned_aa, "decomp_classical")

INFO:logs:Visual Rendered Successfully


In [39]:
plot_model(final_tuned_aa, "decomp_stl")

INFO:logs:Visual Rendered Successfully


In [37]:
plot_model(final_tuned_aa, "diagnostics")

INFO:logs:Visual Rendered Successfully


In [41]:
plot_model(final_tuned_aa, "forecast")



INFO:logs:Visual Rendered Successfully


In [42]:
plot_model(final_tuned_aa, "insample")

INFO:logs:Visual Rendered Successfully


In [43]:
plot_model(final_tuned_aa, "residuals")

INFO:logs:Visual Rendered Successfully


## References


- **pycaret.org**. PyCaret, April 2020. URL https://pycaret.org/about. PyCaret version 1.0.0.
