# Forecasting the IBOV with Prophet

### Why use Prophet?

1. **Simplicity and Ease of Use:**
   - Prophet was designed to be approachable, even for people who are not specialists in statistics or machine learning.

2. **Flexible Seasonality:**
   - Prophet handles time series with multiple, complex seasonal patterns (yearly, weekly, and daily). It can also accommodate holidays and special events, which is particularly useful for market data.

3. **Robustness to Missing Data and Trend Changes:**
   - The model tolerates missing observations and shifts in the trend, which makes it suitable for datasets that are not perfectly consistent or complete.

4. **Performance and Accuracy:**
   - Although models such as LSTMs, GRUs, or DNNs can be more accurate in some situations, Prophet often strikes a good balance between accuracy and complexity. Neural networks need large amounts of data and compute, and they are more prone to overfitting.

5. **Interpretable Results:**
   - Prophet exposes the fitted components (trend, seasonality, holidays) separately, which makes the results easier to interpret. Neural networks, in contrast, are often treated as "black boxes" whose output is hard to explain.

6. **Fast Development and Testing:**
   - Implementing and testing Prophet usually takes less time than building and tuning a neural network.

7. **Less Fine-Tuning:**
   - Models such as LSTMs and DNNs can require extensive hyperparameter tuning, whereas Prophet has fewer parameters to adjust, which simplifies the modeling process.

8. **Holidays:**
   - Finally, I chose Prophet because it makes it easy to account for the holidays that affect the IBOV: US holidays, Brazilian national holidays, and São Paulo state holidays.
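
The components mentioned in item 5 correspond to the additive decomposition that Prophet fits by default. As a sketch, in the notation of the Prophet paper (these symbols are not used elsewhere in this notebook):

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the trend, $s(t)$ is the sum of the seasonal terms, $h(t)$ is the holiday effect, and $\varepsilon_t$ is the residual error.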

### Loading the data:

In [50]:
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
from sklearn.metrics import mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

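# Download daily IBOV (^BVSP) quotes starting at 2021-01-01.
end_data = datetime.today().strftime('%Y-%m-%d')
df = yf.download("^BVSP", start="2021-01-01", end=end_data, progress=False)
# Move the date index into a regular `Date` column.
df.reset_index(inplace=True)
df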

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2021-01-04 | 119024.0 | 120354.0 | 118062.0 | 118558.0 | 118558.0 | 8741400 |
| 1 | 2021-01-05 | 118835.0 | 119790.0 | 116756.0 | 119223.0 | 119223.0 | 9257100 |
| 2 | 2021-01-06 | 119377.0 | 120924.0 | 118917.0 | 119851.0 | 119851.0 | 11638200 |
| 3 | 2021-01-07 | 119103.0 | 121983.0 | 119101.0 | 121956.0 | 121956.0 | 11774800 |
| 4 | 2021-01-08 | 122387.0 | 125324.0 | 122386.0 | 125077.0 | 125077.0 | 11085800 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 757 | 2024-01-18 | 128524.0 | 129047.0 | 127316.0 | 127316.0 | 127316.0 | 12460800 |
| 758 | 2024-01-19 | 127319.0 | 127820.0 | 126533.0 | 127636.0 | 127636.0 | 11956900 |
| 759 | 2024-01-22 | 127636.0 | 127843.0 | 125876.0 | 126602.0 | 126602.0 | 9509100 |
| 760 | 2024-01-23 | 126612.0 | 128331.0 | 126612.0 | 128263.0 | 128263.0 | 9366100 |


### Preparing the data for Prophet:

In [17]:
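# Prophet expects the columns `ds` (datestamp) and `y` (value to forecast).
# Chaining rename returns a new frame, which avoids modifying a slice in place.
df = df[['Date', 'Close']].rename(columns={'Date': 'ds', 'Close': 'y'})
# The dates are already datetime-like, so no explicit format string is needed.
df['ds'] = pd.to_datetime(df['ds'])
df.head()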

| | ds | y |
|---|---|---|
| 0 | 2021-01-04 | 118558.0 |
| 1 | 2021-01-05 | 119223.0 |
| 2 | 2021-01-06 | 119851.0 |
| 3 | 2021-01-07 | 121956.0 |
| 4 | 2021-01-08 | 125077.0 |


In [18]:
df.count()

ds    762
y     762
dtype: int64

### Adding the relevant holidays:

In [19]:
import holidays

years = list(range(2021, 2026))

# US federal holidays and NYSE market closures.
us_holidays = holidays.country_holidays('US', years=years)
nyse_holidays = holidays.financial_holidays('NYSE', years=years)

# Brazilian national holidays.
br_holidays = holidays.country_holidays('BR', years=years)

# São Paulo state holidays; recent `holidays` releases use `subdiv` in place of `state`.
sp_holidays = holidays.country_holidays('BR', subdiv='SP', years=years)

# Prophet expects a frame with `ds` (date) and `holiday` (name) columns.
us_holidays_df = pd.DataFrame(list(us_holidays.items()), columns=['ds', 'holiday'])
nyse_holidays_df = pd.DataFrame(list(nyse_holidays.items()), columns=['ds', 'holiday'])
br_holidays_df = pd.DataFrame(list(br_holidays.items()), columns=['ds', 'holiday'])
sp_holidays_df = pd.DataFrame(list(sp_holidays.items()), columns=['ds', 'holiday'])

# drop_duplicates removes only rows whose date and name both match.
total_holidays = pd.concat([us_holidays_df, nyse_holidays_df, br_holidays_df, sp_holidays_df]).drop_duplicates().reset_index(drop=True)
total_holidays['ds'] = pd.to_datetime(total_holidays['ds'])

total_holidays.count()

ds         123
holiday    123
dtype: int64
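
Prophet's holiday frame can also carry optional `lower_window` and `upper_window` columns, which extend a holiday's effect to the days before or after it. The sketch below only shows the expected shape of such a frame. The entries are hand-written for illustration, and the names `manual_holidays` and `extended` are hypothetical; the notebook itself builds its frame from the `holidays` package.

```python
import pandas as pd

# Hand-written illustrative entries (not taken from the notebook's data).
manual_holidays = pd.DataFrame({
    "holiday": ["Carnival", "Carnival", "Christmas Day"],
    "ds": pd.to_datetime(["2024-02-12", "2024-02-13", "2024-12-25"]),
    # lower_window <= 0: number of days before the holiday to include.
    "lower_window": [0, 0, -1],
    # upper_window >= 0: number of days after the holiday to include.
    "upper_window": [0, 1, 0],
})

# Rows with a non-zero window cover more than the holiday date itself.
extended = manual_holidays[
    (manual_holidays["lower_window"] < 0) | (manual_holidays["upper_window"] > 0)
]
print(extended)
```

A frame like this would be passed to `Prophet(holidays=...)` in the same way as `total_holidays`.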

### Splitting the data into training and test sets:

In [20]:
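# NOTE: df.sample draws rows at random, so the test dates are interleaved with
# the training dates instead of being held out at the end of the series.
train_data = df.sample(frac=0.8, random_state=0)
test_data = df.drop(train_data.index)
print(f'training data size : {train_data.shape}')
print(f'testing data size : {test_data.shape}')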

training data size : (610, 2)
testing data size : (152, 2)
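
For a time series, a chronological split is the more common choice, because it keeps every test date after every training date. The following is a minimal sketch of that alternative, not the split used for the results in this notebook. It runs on synthetic data, and the names `df_demo`, `train_demo`, and `test_demo` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the notebook's `df` (business days, made-up prices).
dates = pd.bdate_range("2021-01-04", periods=762)
steps = np.random.default_rng(0).normal(0, 1_000, len(dates))
df_demo = pd.DataFrame({"ds": dates, "y": 120_000 + np.cumsum(steps)})

# Keep the first 80% of rows for training and the most recent 20% for testing.
split_at = int(len(df_demo) * 0.8)
train_demo = df_demo.iloc[:split_at]
test_demo = df_demo.iloc[split_at:]

print(f"training data size : {train_demo.shape}")
print(f"testing data size : {test_demo.shape}")
# Every training date precedes every test date, so no future data leaks in.
print(train_demo["ds"].max() < test_demo["ds"].min())
```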


### Training the model:

In [33]:
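# Fit Prophet with the combined holiday calendar.
m = Prophet(holidays=total_holidays)
m.fit(train_data)
# Extend the training dates by len(test_data) periods. The default frequency is
# daily, so these are calendar days and include weekends.
future = m.make_future_dataframe(periods=len(test_data))
forecast = m.predict(future)
forecast.head()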

20:17:01 - cmdstanpy - INFO - Chain [1] start processing
20:17:01 - cmdstanpy - INFO - Chain [1] done processing


| | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | Christmas Day | Christmas Day_lower | Christmas Day_upper | Christmas Day (observed) | ... | weekly | weekly_lower | weekly_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-04 | 119799.143814 | 114875.17127 | 121943.511806 | 119799.143814 | 119799.143814 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 423.741778 | 423.741778 | 423.741778 | -1857.763823 | -1857.763823 | -1857.763823 | 0.0 | 0.0 | 0.0 | 118365.121769 |
| 1 | 2021-01-05 | 119753.475813 | 115305.560687 | 121959.674241 | 119753.475813 | 119753.475813 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 610.528633 | 610.528633 | 610.528633 | -1880.816061 | -1880.816061 | -1880.816061 | 0.0 | 0.0 | 0.0 | 118483.188385 |
| 2 | 2021-01-06 | 119707.807812 | 114712.120774 | 122081.179713 | 119707.807812 | 119707.807812 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 480.509386 | 480.509386 | 480.509386 | -1902.804585 | -1902.804585 | -1902.804585 | 0.0 | 0.0 | 0.0 | 118285.512613 |
| 3 | 2021-01-07 | 119662.139811 | 114620.531441 | 121631.854075 | 119662.139811 | 119662.139811 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 467.857853 | 467.857853 | 467.857853 | -1921.661929 | -1921.661929 | -1921.661929 | 0.0 | 0.0 | 0.0 | 118208.335735 |
| 4 | 2021-01-08 | 119616.47181 | 114885.850415 | 121838.086897 | 119616.47181 | 119616.47181 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 763.926789 | 763.926789 | 763.926789 | -1935.336102 | -1935.336102 | -1935.336102 | 0.0 | 0.0 | 0.0 | 118445.062497 |
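
Because `make_future_dataframe` uses a daily frequency by default, a horizon of 152 periods covers 152 calendar days, not 152 trading days. The sketch below uses plain pandas to show the difference; the names `calendar_days`, `weekdays_only`, and `business_days` are hypothetical, and the start date is the last date shown in the data above.

```python
import pandas as pd

last_observed = pd.Timestamp("2024-01-23")  # last date shown in the data above
horizon = 152                               # len(test_data) in this notebook
start = last_observed + pd.Timedelta(days=1)

# Daily frequency: 152 consecutive calendar days, weekends included.
calendar_days = pd.date_range(start, periods=horizon, freq="D")
weekdays_only = calendar_days[calendar_days.dayofweek < 5]

# Business-day frequency: 152 weekdays (exchange holidays are not excluded).
business_days = pd.bdate_range(start, periods=horizon)

print(len(calendar_days), len(weekdays_only), len(business_days))
print(calendar_days[-1].date(), business_days[-1].date())
```

If the goal is a forecast row per trading day, passing `freq='B'` to `make_future_dataframe` is one option to consider, since the argument is forwarded to pandas' date-range generation.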


In [41]:
plot_plotly(m, forecast, xlabel='Date', ylabel='Close', figsize=(1200, 600))

In [44]:
plot_components_plotly(m, forecast, figsize=(1200, 300))

In [48]:
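forecast_cols = ['ds', 'yhat']
valores_reais_cols = ['ds', 'y']

forecast = forecast[forecast_cols]
# NOTE: the actual values come from train_data, so the score below is the
# in-sample (fit) error, not an out-of-sample test error.
valores_reais = train_data[valores_reais_cols]

# Keep only the dates present in both frames.
resultados = pd.merge(forecast, valores_reais, on='ds', how='inner')

# Absolute percentage error per row, then its mean (MAPE).
resultados['mape'] = np.abs((resultados['y'] - resultados['yhat']) / resultados['y']) * 100

mape = np.mean(resultados['mape'])

print(f"Mean Absolute Percentage Error (MAPE): {mape:.2f}%")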

Mean Absolute Percentage Error (MAPE): 1.93%
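
To score the model on data it has not seen, the same merge can be run against held-out rows instead of the training rows. This is a minimal sketch with made-up numbers: `held_out`, `predictions`, and `mape_percent` are hypothetical names, and the values are not results from this notebook.

```python
import numpy as np
import pandas as pd

def mape_percent(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Made-up stand-ins for held-out observations and model output.
held_out = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-22", "2024-01-23"]),
    "y": [100.0, 200.0],
})
predictions = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-19", "2024-01-22", "2024-01-23"]),
    "yhat": [95.0, 110.0, 190.0],
})

# The inner join keeps only dates that have both a prediction and an actual value.
scored = held_out.merge(predictions, on="ds", how="inner")
test_mape = mape_percent(scored["y"], scored["yhat"])
print(f"Out-of-sample MAPE: {test_mape:.2f}%")
```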


### Cross-validation:

In [46]:
df_cv = cross_validation(m, initial='365 days', period='30 days', horizon = '7 days')

Seasonality has period of 365.25 days which is larger than initial window. Consider increasing initial.

In [26]:
df_cv.tail(10)

| | ds | yhat | yhat_lower | yhat_upper | y | cutoff |
|---|---|---|---|---|---|---|
| 84 | 2023-12-18 | 124781.44897 | 121044.309935 | 128342.320637 | 131084.0 | 2023-12-17 |
| 85 | 2023-12-19 | 124957.679577 | 121454.679016 | 128267.339619 | 131851.0 | 2023-12-17 |
| 86 | 2023-12-20 | 124927.359329 | 121167.102224 | 128527.045768 | 130804.0 | 2023-12-17 |
| 87 | 2023-12-21 | 125132.546929 | 121709.296124 | 128502.548174 | 132182.0 | 2023-12-17 |
| 88 | 2023-12-22 | 125505.440439 | 122077.178745 | 129041.15248 | 132753.0 | 2023-12-17 |
| 89 | 2024-01-17 | 133998.621266 | 130558.347375 | 137287.614662 | 128524.0 | 2024-01-16 |
| 90 | 2024-01-18 | 134307.325161 | 130568.432428 | 137758.540375 | 127316.0 | 2024-01-16 |
| 91 | 2024-01-19 | 134908.494698 | 131575.782493 | 138207.784868 | 127636.0 | 2024-01-16 |
| 92 | 2024-01-22 | 135622.889974 | 132087.339314 | 139069.418994 | 126602.0 | 2024-01-16 |
| 93 | 2024-01-23 | 136084.863145 | 132599.132937 | 139487.760322 | 128263.0 | 2024-01-16 |
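
The `cutoff` column above marks where each cross-validation fold stops training. The sketch below is a simplified reconstruction of how `initial`, `period`, and `horizon` translate into cutoff dates: start one horizon before the last observation and step back one period at a time while at least `initial` of history remains. It is not Prophet's own code (which also skips cutoffs with no data in the following horizon), and `sketch_cutoffs` is a hypothetical helper.

```python
import pandas as pd

def sketch_cutoffs(first_date, last_date, initial, period, horizon):
    """Simplified cutoff schedule; an approximation, not Prophet's implementation."""
    earliest = pd.Timestamp(first_date) + pd.Timedelta(initial)
    cutoff = pd.Timestamp(last_date) - pd.Timedelta(horizon)
    cutoffs = []
    # Walk backwards from the most recent cutoff, one period at a time.
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff = cutoff - pd.Timedelta(period)
    return cutoffs[::-1]  # oldest first

# First and last dates as they appear in the outputs above.
cutoffs = sketch_cutoffs("2021-01-04", "2024-01-23",
                         initial="365 days", period="30 days", horizon="7 days")
print(len(cutoffs))
print([c.date().isoformat() for c in cutoffs[-2:]])
```

With these inputs the last two cutoffs are 2023-12-17 and 2024-01-16, matching the `cutoff` values in the table.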


In [27]:
df_p = performance_metrics(df_cv)
df_p

| | horizon | mse | rmse | mae | mape | mdape | smape | coverage |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 days | 30132830.0 | 5489.337864 | 4562.358149 | 0.040123 | 0.042596 | 0.039641 | 0.454545 |
| 1 | 2 days | 30943890.0 | 5562.723207 | 4727.599342 | 0.041776 | 0.040635 | 0.041935 | 0.285714 |
| 2 | 3 days | 49905170.0 | 7064.359165 | 5752.719533 | 0.050045 | 0.044445 | 0.05062 | 0.333333 |
| 3 | 4 days | 29031040.0 | 5388.045996 | 3836.767 | 0.033813 | 0.022775 | 0.03476 | 0.533333 |
| 4 | 5 days | 10230300.0 | 3198.484234 | 2506.150091 | 0.021552 | 0.01615 | 0.021661 | 0.615385 |
| 5 | 6 days | 31478670.0 | 5610.585347 | 4488.074069 | 0.039036 | 0.037315 | 0.039031 | 0.357143 |
| 6 | 7 days | 48193590.0 | 6942.16046 | 5220.09826 | 0.047344 | 0.035579 | 0.047601 | 0.333333 |


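### Results:

MAPE: Across the 7-day horizon, MAPE ranges from about 2.2% (5 days) to 5.0% (3 days), which corresponds to a mean absolute error of roughly 2,500 to 5,750 index points. The error does not grow steadily with the horizon; it fluctuates from one day to the next, which is plausible given that the 94 cross-validation forecasts are spread over 7 horizons, leaving only about a dozen points per horizon.

Coverage: The share of actual values that fall inside the prediction interval ranges from about 29% to 62%, with no clear trend across horizons. That is well below the 80% nominal level of Prophet's default interval (`interval_width=0.8`), so the intervals are too narrow for this series and should not be read as reliable bounds.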
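
Coverage is the fraction of actual values that land between `yhat_lower` and `yhat_upper`. The sketch below recomputes it by hand for the ten rows shown in the `df_cv.tail(10)` output, with values rounded to two decimals. The frame name `cv_tail` is hypothetical, and the calculation is a simplified version of what `performance_metrics` reports per horizon.

```python
import pandas as pd

# The ten rows from df_cv.tail(10), rounded to two decimals.
cv_tail = pd.DataFrame({
    "yhat": [124781.45, 124957.68, 124927.36, 125132.55, 125505.44,
             133998.62, 134307.33, 134908.49, 135622.89, 136084.86],
    "yhat_lower": [121044.31, 121454.68, 121167.10, 121709.30, 122077.18,
                   130558.35, 130568.43, 131575.78, 132087.34, 132599.13],
    "yhat_upper": [128342.32, 128267.34, 128527.05, 128502.55, 129041.15,
                   137287.61, 137758.54, 138207.78, 139069.42, 139487.76],
    "y": [131084.0, 131851.0, 130804.0, 132182.0, 132753.0,
          128524.0, 127316.0, 127636.0, 126602.0, 128263.0],
})

# True where the actual value falls inside the prediction interval.
inside = cv_tail["y"].between(cv_tail["yhat_lower"], cv_tail["yhat_upper"])
coverage = inside.mean()

# Mean absolute percentage error over the same rows, as a fraction.
mape = ((cv_tail["y"] - cv_tail["yhat"]).abs() / cv_tail["y"]).mean()

print(f"coverage: {coverage:.2f}")  # 0.00: none of the ten actuals is inside its interval
print(f"mape: {mape:.4f}")
```

In the last two folds the actual index sits above the interval in December and below it in January, which illustrates how the intervals can miss in both directions.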