# Capture the Flag Miguel: Itajaí-Açu River Level Forecast
This notebook covers every stage of developing a Machine Learning model that forecasts the level of the Itajaí-Açu River. Rio do Sul has a long history of floods that severely affect the city and directly impact the lives of its residents. The model will try to predict the river level, to a reasonable degree of accuracy, from historical weather data retrieved from the [Open Meteo](https://open-meteo.com/) API and from river-level readings collected by the [Defesa Civil de Rio do Sul](https://defesacivil.riodosul.sc.gov.br/).

Steps:
- Data collection via API and web scraping ✅
- Exploratory analysis of floods over the last 6 years ⌛️
    - Report with indicators, averages, and charts on flood events
- Feature Engineering ⌛️
    - Creation of new features
    - Inclusion of new features (more information from the Open Meteo API)
- Model training ⌛️
    - Model comparison (random forest vs xgboost vs others)
    - Performance metrics
    - Cross-validation
    - Hyperparameter tuning
    - Selection of the best target: water_level_next_1h, water_level_next_3h, water_level_next_6h ...
- Performance evaluation of the best model ⌛️
- Deployment via API ⌛️
- Website with a river-level chart that includes the forecast ⌛️
    - See example: https://miro.medium.com/v2/resize:fit:1400/1*N3rDEJAvV_wolqXFl8HtEw.png
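
The cross-validation and metrics items above apply to hourly, time-ordered data, where shuffled folds would let a model train on observations recorded after the ones it is tested on. One possible approach is an expanding-window split scored with mean absolute error. The sketch below is illustrative only: `expanding_window_splits`, the sample levels, and the persistence baseline (forecast equals the last observed level) are assumptions for demonstration, not part of the project.

```python
from statistics import mean


def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test block directly follows its training block, so a model is
    never evaluated on observations older than the ones it was fit on.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical hourly water levels in metres (illustrative values only).
levels = [1.46, 1.46, 1.45, 1.46, 1.44, 1.44, 1.43, 1.41, 1.40, 1.38, 1.37, 1.35]
horizon = 1  # forecast the level one hour ahead

# Persistence baseline: the forecast for t + horizon is the level at t.
current = levels[:-horizon]
future = levels[horizon:]

fold_errors = []
for train_idx, test_idx in expanding_window_splits(len(future), n_splits=3, test_size=2):
    # A trained model would be fit on train_idx here; persistence needs no fitting.
    y_true = [future[i] for i in test_idx]
    y_pred = [current[i] for i in test_idx]
    fold_errors.append(mean_absolute_error(y_true, y_pred))

print([round(e, 3) for e in fold_errors])
```

A baseline like this gives the random forest and xgboost candidates a reference error to beat at each forecast horizon.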

In [1]:
import os
import sys

# Add the project root (the parent of the notebook's working directory)
# to sys.path so that the `data` package can be imported.
sys.path.append(os.path.abspath('..'))

In [2]:
from data.generate_data import generate

In [3]:
train_df = generate(
    start_date="2025-03-31",
    end_date="2025-04-06",
    type="train",
    save=False
)

2025-04-13 15:32:53.665 | INFO     | data.scraping:__init__:12 - WebScraper inicializado
2025-04-13 15:32:53.669 | INFO     | data.generate_data:__init__:31 - DataGenerator initialized with output directory: /Users/matheus/Documents/projects/Enchentes/data/output
2025-04-13 15:32:53.670 | INFO     | data.generate_data:generate:170 - Generating train dataset from 2025-03-31 to 2025-04-06
2025-04-13 15:32:53.671 | INFO     | data.generate_data:_get_weather_data:44 - Fetching weather data from 2025-03-31 to 2025-04-06
2025-04-13 15:32:54.426 | INFO     | data.generate_data:_get_weather_data:48 - Weather data fetched successfully: (168, 4) rows
2025-04-13 15:32:54.426 | INFO     | data.generate_data:_get

Processando intervalos:   0%|          | 0/3 [00:00<?, ?it/s]

2025-04-13 15:32:54.959 | INFO     | data.scraping:parse_data:68 - Análise de dados concluída com sucesso
2025-04-13 15:32:54.963 | INFO     | data.scraping:parse_data:80 - Formato final do DataFrame: (98, 2)
2025-04-13 15:32:54.964 | INFO     | data.generate_data:_get_water_level_data:68 - Water level data scraped successfully: (98, 2) rows
2025-04-13 15:32:54.968 | INFO     | data.generate_data:_merge_datasets:106 - Datasets merged successfully: (168, 6) rows
2025-04-13 15:32:54.976 | INFO     | data.generate_data:_process_data:145 - Data processing completed: (121, 17) rows
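
The log above shows hourly weather data being fetched for a date range. As a rough sketch of what such a request to Open Meteo could look like, the snippet below only builds the URL and does not call the network. The archive endpoint, the coordinates, the timezone, and the helper name `build_weather_url` are assumptions; the project's `_get_weather_data` may use a different endpoint or client.

```python
from urllib.parse import urlencode

# Assumption: the historical archive endpoint; the project's client may differ.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Approximate coordinates for Rio do Sul, SC (assumed, not taken from the project).
LATITUDE, LONGITUDE = -27.21, -49.64

# Hourly variables matching the weather columns seen in the DataFrame previews.
HOURLY_VARIABLES = [
    "temperature_2m",
    "relative_humidity_2m",
    "apparent_temperature",
    "rain",
]


def build_weather_url(start_date, end_date):
    """Return a request URL for hourly weather between two ISO dates (inclusive)."""
    params = {
        "latitude": LATITUDE,
        "longitude": LONGITUDE,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(HOURLY_VARIABLES),
        "timezone": "America/Sao_Paulo",
    }
    # Keep commas and slashes readable instead of percent-encoding them.
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',/')}"


url = build_weather_url("2025-03-31", "2025-04-06")
print(url)
```

An inclusive 7-day range at hourly resolution corresponds to 7 × 24 = 168 rows, which is consistent with the `(168, 4)` shape reported in the log.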


In [4]:
train_df.head()

| | time | temperature_2m | relative_humidity_2m | apparent_temperature | rain | water_level | hour | day_of_week | month | rain_24h | temperature_24h_avg | humidity_24h_avg | water_level_next_1h | water_level_next_3h | water_level_next_6h | water_level_next_12h | water_level_next_24h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 2025-03-31 23:00:00 | 23.0 | 87 | 27.0 | 0.0 | 1.46 | 23 | 0 | 3 | 0.4 | 24.9125 | 80.0 | 1.46 | 1.46 | 1.44 | 1.29 | 1.32 |
| 24 | 2025-04-01 00:00:00 | 22.5 | 90 | 26.0 | 0.0 | 1.46 | 0 | 1 | 4 | 0.4 | 24.920833 | 79.833333 | 1.45 | 1.44 | 1.44 | 1.32 | 1.33 |
| 25 | 2025-04-01 01:00:00 | 21.6 | 93 | 25.5 | 0.0 | 1.45 | 1 | 1 | 4 | 0.4 | 24.8875 | 79.833333 | 1.46 | 1.44 | 1.44 | 1.28 | 1.32 |
| 26 | 2025-04-01 02:00:00 | 21.1 | 96 | 24.9 | 0.0 | 1.46 | 2 | 1 | 4 | 0.4 | 24.8375 | 79.958333 | 1.44 | 1.44 | 1.38 | 1.25 | 1.31 |
| 27 | 2025-04-01 03:00:00 | 20.7 | 98 | 24.6 | 0.0 | 1.44 | 3 | 1 | 4 | 0.4 | 24.770833 | 80.166667 | 1.44 | 1.44 | 1.37 | 1.22 | 1.32 |
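
The derived columns in this preview (`rain_24h`, `temperature_24h_avg`, `humidity_24h_avg`, and the `water_level_next_*h` targets) look like trailing 24-hour aggregates and forward-shifted copies of `water_level`. A minimal sketch of how such columns could be built with pandas follows. The function `add_features_and_targets` and the synthetic demo frame are hypothetical, and the sketch assumes a gap-free hourly series; the project's own `_process_data` may work differently.

```python
import pandas as pd


def add_features_and_targets(df, horizons=(1, 3, 6, 12, 24)):
    """Assumed reconstruction of the derived columns, not the project's code."""
    out = df.copy()
    out["time"] = pd.to_datetime(out["time"])
    out = out.sort_values("time").reset_index(drop=True)

    # Calendar features read directly from the timestamp.
    out["hour"] = out["time"].dt.hour
    out["day_of_week"] = out["time"].dt.dayofweek  # Monday = 0
    out["month"] = out["time"].dt.month

    # Trailing 24-hour aggregates; the first 23 rows have an incomplete window.
    out["rain_24h"] = out["rain"].rolling(24).sum()
    out["temperature_24h_avg"] = out["temperature_2m"].rolling(24).mean()
    out["humidity_24h_avg"] = out["relative_humidity_2m"].rolling(24).mean()

    # Targets: the water level observed h hours after each row.
    for h in horizons:
        out[f"water_level_next_{h}h"] = out["water_level"].shift(-h)

    # Drop rows whose rolling window or future target is undefined.
    return out.dropna()


# Synthetic 72-hour frame (illustrative values only).
hours = pd.date_range("2025-03-31 00:00", periods=72, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({
    "time": hours,
    "temperature_2m": 20.0,
    "relative_humidity_2m": 85.0,
    "rain": 0.1,
    "water_level": [1.0 + 0.01 * i for i in range(72)],
})

result = add_features_and_targets(demo)
print(result.shape)
```

Under these assumptions, a 168-hour frame loses 23 rows to the rolling window and 24 to the longest horizon, leaving 121 rows, which is consistent with the `(121, 17)` shape in the log and with the preview starting at index 23.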


In [5]:
predict_df = generate(
    start_date="2025-04-07",
    end_date="2025-04-12",
    type="predict",
    save=False
)

2025-04-13 15:34:02.875 | INFO     | data.scraping:__init__:12 - WebScraper inicializado
2025-04-13 15:34:02.877 | INFO     | data.generate_data:__init__:31 - DataGenerator initialized with output directory: /Users/matheus/Documents/projects/Enchentes/data/output
2025-04-13 15:34:02.877 | INFO     | data.generate_data:generate:170 - Generating predict dataset from 2025-04-07 to 2025-04-12
2025-04-13 15:34:02.878 | INFO     | data.generate_data:_get_weather_data:44 - Fetching weather data from 2025-04-07 to 2025-04-12
2025-04-13 15:34:03.599 | INFO     | data.generate_data:_get_weather_data:48 - Weather data fetched successfully: (144, 4) rows
2025-04-13 15:34:03.605 | INFO     | data.generate_data:_p

In [6]:
predict_df.head()

| | time | temperature_2m | relative_humidity_2m | apparent_temperature | rain | hour | day_of_week | month | rain_24h | temperature_24h_avg | humidity_24h_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 2025-04-07 23:00:00 | 18.5 | 81.0 | 19.5 | 0.0 | 23 | 0 | 4 | 1.3 | 18.670833 | 82.125 |
| 24 | 2025-04-08 00:00:00 | 18.2 | 84.0 | 19.3 | 0.0 | 0 | 1 | 4 | 1.3 | 18.841667 | 81.666667 |
| 25 | 2025-04-08 01:00:00 | 18.2 | 86.0 | 19.2 | 0.0 | 1 | 1 | 4 | 1.3 | 19.016667 | 81.25 |
| 26 | 2025-04-08 02:00:00 | 18.1 | 88.0 | 18.9 | 0.0 | 2 | 1 | 4 | 1.3 | 19.1625 | 80.916667 |
| 27 | 2025-04-08 03:00:00 | 18.1 | 89.0 | 19.0 | 0.0 | 3 | 1 | 4 | 1.3 | 19.275 | 80.666667 |
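
The predict frame has no `water_level` column and no `water_level_next_*h` targets, so a model trained on the train frame can only rely on the columns both frames share. The sketch below derives those groups from the two headers shown in this notebook. Leaving the raw `time` column out of the feature list is an assumption, made because `hour`, `day_of_week`, and `month` already encode it.

```python
# Column names copied from the train and predict previews in this notebook.
TRAIN_COLUMNS = [
    "time", "temperature_2m", "relative_humidity_2m", "apparent_temperature",
    "rain", "water_level", "hour", "day_of_week", "month", "rain_24h",
    "temperature_24h_avg", "humidity_24h_avg", "water_level_next_1h",
    "water_level_next_3h", "water_level_next_6h", "water_level_next_12h",
    "water_level_next_24h",
]
PREDICT_COLUMNS = [
    "time", "temperature_2m", "relative_humidity_2m", "apparent_temperature",
    "rain", "hour", "day_of_week", "month", "rain_24h",
    "temperature_24h_avg", "humidity_24h_avg",
]

# Candidate targets: one column per forecast horizon.
target_columns = [c for c in TRAIN_COLUMNS if c.startswith("water_level_next_")]

# Columns that exist only at training time and cannot be used as model inputs.
train_only = [c for c in TRAIN_COLUMNS if c not in PREDICT_COLUMNS]

# Inputs available in both frames (raw timestamp excluded by assumption).
feature_columns = [
    c for c in TRAIN_COLUMNS if c in PREDICT_COLUMNS and c != "time"
]

print("features:", feature_columns)
print("targets:", target_columns)
print("train only:", train_only)
```

With `train_df` and `predict_df` in memory, `train_df[feature_columns]` and `predict_df[feature_columns]` would then select matching inputs for fitting and for inference.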
