# Forecasting the IBOV with Prophet

### Why use Prophet?

1. **Simplicity and Ease of Use:**
   - Prophet was designed to be accessible and easy to use, even for people who are not experts in statistics or machine learning.

2. **Flexibility with Seasonality:**
   - Prophet handles time series with multiple, complex seasonal patterns well, such as yearly, weekly, and daily seasonality. It can also accommodate holidays and special events, which is particularly useful for market data.

3. **Robustness to Missing Data and Trend Changes:**
   - The model is robust to missing data and to shifts in the trend, which makes it suitable for datasets that are not perfectly consistent or complete.

4. **Performance and Accuracy:**
   - Although models such as LSTMs, GRUs, or DNNs can be more accurate in some situations, Prophet often strikes a good balance between accuracy and complexity. More complex models such as neural networks require large amounts of data and computing power, and are more prone to overfitting.

5. **Interpretability of Results:**
   - Prophet exposes its modeled components (trend, seasonality, holidays) separately, which makes the results easier to interpret. Neural networks, in contrast, are often regarded as "black boxes" whose results are hard to interpret.

6. **Fast Development and Testing:**
   - Implementing and testing Prophet usually takes less time than building and tuning neural network models.

7. **Less Need for Fine-Tuning:**
   - Models such as LSTMs and DNNs can require extensive hyperparameter tuning, whereas Prophet has fewer parameters to adjust, which simplifies the modeling process.

8. **Holidays:**
   - Finally, I chose Prophet because it makes it easy to account for the holidays that affect the IBOV, such as US holidays, Brazilian national holidays, and São Paulo state holidays.
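
The components mentioned in item 5 correspond to the additive decomposition that Prophet fits. A compact way to write it, using the conventional symbol names from the Prophet literature rather than anything defined in this notebook, is:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the trend, $s(t)$ collects the seasonal terms, $h(t)$ is the holiday effect, and $\varepsilon_t$ is the error term. The component plot produced later in the notebook shows these pieces one by one.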

### Loading the data:

In [328]:
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
from sklearn.metrics import mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

end_data = datetime.today().strftime('%Y-%m-%d')
df = yf.download("^BVSP", start="2021-01-01", end=end_data)
df.reset_index(inplace=True)
df

[*********************100%%**********************]  1 of 1 completed


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2021-01-04,119024.0,120354.0,118062.0,118558.0,118558.0,8741400
1,2021-01-05,118835.0,119790.0,116756.0,119223.0,119223.0,9257100
2,2021-01-06,119377.0,120924.0,118917.0,119851.0,119851.0,11638200
3,2021-01-07,119103.0,121983.0,119101.0,121956.0,121956.0,11774800
4,2021-01-08,122387.0,125324.0,122386.0,125077.0,125077.0,11085800
...,...,...,...,...,...,...,...
757,2024-01-18,128524.0,129047.0,127316.0,127316.0,127316.0,12460800
758,2024-01-19,127319.0,127820.0,126533.0,127636.0,127636.0,11956900
759,2024-01-22,127636.0,127843.0,125876.0,126602.0,126602.0,9509100
760,2024-01-23,126612.0,128331.0,126612.0,128263.0,128263.0,9366100


### Preparing the data for Prophet:

In [329]:
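# Keep only the date and closing price, renamed to the column names Prophet expects.
# Renaming on the selection (instead of inplace on a slice) avoids SettingWithCopyWarning.
df = df[['Date', 'Close']].rename(columns={'Date': 'ds', 'Close': 'y'})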
df.head()

Unnamed: 0,ds,y
0,2021-01-04,118558.0
1,2021-01-05,119223.0
2,2021-01-06,119851.0
3,2021-01-07,121956.0
4,2021-01-08,125077.0


In [330]:
df.count()

ds    762
y     762
dtype: int64

### Adding the relevant holidays:

In [331]:
import holidays

years = list(range(2021, 2026))

us_holidays = holidays.country_holidays('US', years=years)
nyse_holidays = holidays.financial_holidays('NYSE', years=years)

br_holidays = holidays.country_holidays('BR', years=years)

sp_holidays = holidays.Brazil(state='SP', years=years)

us_holidays_df = pd.DataFrame(list(us_holidays.items()), columns=['ds', 'holiday'])
nyse_holidays_df = pd.DataFrame(list(nyse_holidays.items()), columns=['ds', 'holiday'])
br_holidays_df = pd.DataFrame(list(br_holidays.items()), columns=['ds', 'holiday'])
sp_holidays_df = pd.DataFrame(list(sp_holidays.items()), columns=['ds', 'holiday'])

total_holidays = pd.concat([us_holidays_df, nyse_holidays_df, br_holidays_df, sp_holidays_df]).drop_duplicates().reset_index(drop=True)
total_holidays['ds'] = pd.to_datetime(total_holidays['ds'])

total_holidays.count()

ds         123
holiday    123
dtype: int64
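
Because `drop_duplicates()` compares the full (`ds`, `holiday`) pair, a date that appears under different names in two calendars stays in the frame more than once, and Prophet then fits a separate effect for each name. A minimal sketch for inspecting such overlaps, using a small hand-written frame in place of `total_holidays` (the rows are illustrative, not this notebook's output):

```python
import pandas as pd

# Illustrative stand-in for total_holidays: one date listed under two names
sample_holidays = pd.DataFrame({
    'ds': pd.to_datetime(['2021-12-25', '2021-12-25', '2021-09-07', '2021-07-04']),
    'holiday': ['Christmas Day', 'Natal', 'Independência do Brasil', 'Independence Day'],
})

# Exact (ds, holiday) duplicates are removed; same-date rows with different names remain
deduped = sample_holidays.drop_duplicates().reset_index(drop=True)

# Count how many distinct holiday names fall on each date
names_per_date = deduped.groupby('ds')['holiday'].nunique()
overlapping_dates = names_per_date[names_per_date > 1]
print(overlapping_dates)
```

Whether overlapping names should be merged into a single label is a modeling choice; the notebook keeps them separate.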

### Splitting the data into training and test sets:

In [332]:
train_data = df.sample(frac=0.8, random_state=0)
test_data = df.drop(train_data.index)
print(f'training data size : {train_data.shape}')
print(f'testing data size : {test_data.shape}')

training data size : (610, 2)
testing data size : (152, 2)
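
Note that `df.sample` draws rows at random, so the training set contains dates that fall after some of the test dates. For a time series, a chronological split is the more common choice because it keeps the test period strictly in the future. A minimal sketch of that alternative, using a synthetic frame in place of `df` (the helper name `chronological_split` is hypothetical):

```python
import numpy as np
import pandas as pd

def chronological_split(frame, train_frac=0.8):
    """Split a frame so that every test date follows every training date."""
    ordered = frame.sort_values('ds').reset_index(drop=True)
    split_idx = int(len(ordered) * train_frac)
    return ordered.iloc[:split_idx], ordered.iloc[split_idx:]

# Synthetic stand-in for df: 100 business days with made-up values
demo = pd.DataFrame({
    'ds': pd.bdate_range('2021-01-04', periods=100),
    'y': np.linspace(100.0, 120.0, 100),
})

train_demo, test_demo = chronological_split(demo)
print(train_demo.shape, test_demo.shape)
print(train_demo['ds'].max() < test_demo['ds'].min())
```

With a split like this, passing the held-out dates directly, as in `m.predict(test_data[['ds']])`, is one way to line predictions up with the test rows, since `make_future_dataframe` generates calendar days rather than trading days.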


### Training the model:

In [341]:
m = Prophet(holidays=total_holidays,daily_seasonality=True)
m.fit(train_data)
future = m.make_future_dataframe(periods=len(test_data))
forecast = m.predict(future)
forecast.head()

23:32:20 - cmdstanpy - INFO - Chain [1] start processing
23:32:20 - cmdstanpy - INFO - Chain [1] done processing


Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,Christmas Day,Christmas Day_lower,Christmas Day_upper,Christmas Day (observed),...,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2021-01-04,117695.384476,114761.709639,121744.944054,117695.384476,117695.384476,0.0,0.0,0.0,0.0,...,-1.788321,-1.788321,-1.788321,-1883.387503,-1883.387503,-1883.387503,0.0,0.0,0.0,118299.475146
1,2021-01-05,117654.623109,114725.32363,121956.793054,117654.623109,117654.623109,0.0,0.0,0.0,0.0,...,185.4525,185.4525,185.4525,-1907.116701,-1907.116701,-1907.116701,0.0,0.0,0.0,118422.2254
2,2021-01-06,117613.861741,114788.287474,121697.897267,117613.861741,117613.861741,0.0,0.0,0.0,0.0,...,55.147956,55.147956,55.147956,-1929.796393,-1929.796393,-1929.796393,0.0,0.0,0.0,118228.479798
3,2021-01-07,117573.100374,114819.284835,121556.068487,117573.100374,117573.100374,0.0,0.0,0.0,0.0,...,44.62447,44.62447,44.62447,-1949.383639,-1949.383639,-1949.383639,0.0,0.0,0.0,118157.607699
4,2021-01-08,117532.339007,115005.265775,121526.846616,117532.339007,117532.339007,0.0,0.0,0.0,0.0,...,338.881524,338.881524,338.881524,-1963.852085,-1963.852085,-1963.852085,0.0,0.0,0.0,118396.634939


In [342]:
plot_plotly(m, forecast, xlabel='Date', ylabel='Close', figsize=(1200, 600))

In [343]:
plot_components_plotly(m, forecast, figsize=(1200, 300))

In [344]:
forecast_cols = ['ds', 'yhat']
valores_reais_cols = ['ds', 'y']

forecast = forecast[forecast_cols]
# Actual values come from the training rows, so the error below is in-sample
valores_reais = train_data[valores_reais_cols]

# Keep only the dates that have both a prediction and an actual value
resultados = pd.merge(forecast, valores_reais, on='ds', how='inner')

# Absolute percentage error per row, in percent
resultados['mape'] = np.abs((resultados['y'] - resultados['yhat']) / resultados['y']) * 100

mape = np.mean(resultados['mape'])

print(f"Mean Absolute Percentage Error (MAPE): {mape:.2f}%")

Mean Absolute Percentage Error (MAPE): 1.93%
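
The 1.93% above is computed against `train_data`, so it measures in-sample fit. A sketch of the same metric on held-out rows, assuming one frame of predictions and one frame of actuals that share a `ds` column (the helper `mape_on_dates` and all numbers below are hypothetical):

```python
import numpy as np
import pandas as pd

def mape_on_dates(predictions, actuals):
    """MAPE in percent over the dates present in both frames."""
    merged = pd.merge(predictions[['ds', 'yhat']], actuals[['ds', 'y']], on='ds', how='inner')
    return float(np.mean(np.abs((merged['y'] - merged['yhat']) / merged['y'])) * 100)

# Made-up values for illustration only
predictions = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
    'yhat': [110.0, 95.0, 100.0],
})
actuals = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-05']),
    'y': [100.0, 100.0, 100.0],
})

# Only 2024-01-02 and 2024-01-03 appear in both frames
print(f"{mape_on_dates(predictions, actuals):.2f}%")
```

In the notebook, `actuals` would be `test_data` and `predictions` the forecast for the same dates.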


### Cross-validation:

In [345]:
df_cv = cross_validation(m, initial='365 days', period='30 days', horizon = '7 days')

Seasonality has period of 365.25 days which is larger than initial window. Consider increasing initial.

In [346]:
df_cv.tail()

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper,y,cutoff
89,2024-01-17,133539.27261,129979.579015,137059.54777,128524.0,2024-01-16
90,2024-01-18,133856.35219,130387.075414,137312.040257,127316.0,2024-01-16
91,2024-01-19,134443.483571,130938.189251,137898.26917,127636.0,2024-01-16
92,2024-01-22,135089.015796,131543.873093,138643.190566,126602.0,2024-01-16
93,2024-01-23,135573.354272,132051.488121,139262.87697,128263.0,2024-01-16


In [347]:
df_p = performance_metrics(df_cv)
df_p

Unnamed: 0,horizon,mse,rmse,mae,mape,mdape,smape,coverage
0,1 days,30329460.0,5507.21893,4595.86595,0.040457,0.039022,0.040001,0.454545
1,2 days,30161130.0,5491.914555,4637.115503,0.040987,0.039081,0.041162,0.285714
2,3 days,48984500.0,6998.893059,5688.990088,0.049488,0.044513,0.050103,0.333333
3,4 days,29450280.0,5426.811266,3886.044929,0.034256,0.022368,0.035202,0.533333
4,5 days,10599100.0,3255.626157,2556.385156,0.022009,0.018185,0.022114,0.615385
5,6 days,31397310.0,5603.330573,4521.361534,0.039414,0.036892,0.039388,0.357143
6,7 days,48124000.0,6937.14652,5218.50653,0.047366,0.040092,0.047632,0.333333


### Results:

MAPE: Ranges from about 2.2% to 4.9% across horizons of 1 to 7 days, so the forecasts are reasonably accurate at this range. The error does not grow steadily with the horizon: it peaks at 3 days (4.9%), reaches its minimum at 5 days (2.2%), and rises again to 4.7% at 7 days.

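Coverage: The share of actual values that fall inside the prediction interval ranges from about 29% to 62%, with no steady decline as the horizon grows. This is well below the 80% that Prophet's default `interval_width` targets, which suggests the uncertainty intervals are too narrow for this series.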
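
Coverage is the fraction of actual values that lie inside the `[yhat_lower, yhat_upper]` interval. A small sketch of that calculation on the five rows shown by `df_cv.tail()` above, with values rounded to two decimals (the helper name `interval_coverage` is hypothetical):

```python
import pandas as pd

def interval_coverage(cv_frame):
    """Fraction of rows whose actual value lies inside the prediction interval."""
    inside = (cv_frame['y'] >= cv_frame['yhat_lower']) & (cv_frame['y'] <= cv_frame['yhat_upper'])
    return float(inside.mean())

# Rows copied from the df_cv.tail() output (cutoff 2024-01-16), rounded
tail_rows = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-17', '2024-01-18', '2024-01-19', '2024-01-22', '2024-01-23']),
    'yhat_lower': [129979.58, 130387.08, 130938.19, 131543.87, 132051.49],
    'yhat_upper': [137059.55, 137312.04, 137898.27, 138643.19, 139262.88],
    'y': [128524.0, 127316.0, 127636.0, 126602.0, 128263.0],
})

print(interval_coverage(tail_rows))
```

In this last fold every actual close sits below `yhat_lower`, so none of the five rows is covered.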

In [348]:
# Last observed trading day and its closing price
day = df['ds'].iloc[-1]
price_day = df['y'].iloc[-1]

# Extend one calendar day past the last date the model was trained on
future = m.make_future_dataframe(periods=1, freq='D')
forecast = m.predict(future)
predicted_price = np.round(forecast['yhat'].iloc[-1], 2)
predicted_day = forecast['ds'].iloc[-1]

# Gap between prediction and actual close, as a percentage of the predicted price
change_percent = np.round(100 - (price_day * 100)/predicted_price, 2)

plus = '+'; minus = ''
print(f'The IBOV closing value on {day} was {price_day}')
print(f'The predicted closing value for {predicted_day} is {predicted_price} ({plus if change_percent > 0 else minus}{change_percent}%)')

The IBOV closing value on 2024-01-24 00:00:00 was 127816.0
The predicted closing value for 2024-01-24 00:00:00 is 132811.1 (+3.76%)
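
Two details of the cell above are worth noting: `freq='D'` steps one calendar day past the last training date, which can land on a weekend, and the percentage is expressed relative to the predicted price. A sketch of the alternatives, using pandas' business-day offset and the two prices printed above (exchange holidays are not handled here, and the variable names are illustrative):

```python
import pandas as pd

last_day = pd.Timestamp('2024-01-24')   # last date in the data (a Wednesday)
price_day = 127816.0                    # actual close printed above
predicted_price = 132811.1              # predicted close printed above

# Next weekday after the last observation; exchange holidays are not considered
next_trading_day = last_day + pd.offsets.BDay(1)

# Percentage difference expressed relative to the actual close
change_vs_actual = round((predicted_price - price_day) / price_day * 100, 2)

print(next_trading_day.date(), f"{change_vs_actual:+.2f}%")
```

Dividing by the actual close is the usual convention for a percentage change, which is why this figure differs slightly from the one printed by the notebook.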
