## The Importance of Features in Prediction

Having many features does not necessarily make a model better. What matters is whether they capture relevant patterns that help predict the target variable.




**How Do Features Contribute to Prediction?**
- They Capture Relationships with the Target
    - A feature that correlates strongly with the target is likely to be useful for prediction.
    - Example: To predict whether it will rain, air humidity can be a relevant feature.

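**📌 How to check?**
We can use correlation to measure the relationship between variables:
```python
df.corr()  # pairwise Pearson correlation: shows whether there is a linear relationship
```
- If a feature's correlation with the target is low, it may not help. A low Pearson correlation only rules out a linear relationship, though; a nonlinear one may still exist.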
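To make the correlation check concrete, here is a minimal sketch that ranks features by their absolute correlation with the target. The data is synthetic and the column names (`humidity`, `temp_anomaly`, `noise_feature`, `target`) are hypothetical, chosen only for illustration. It also shows a feature that looks useless under Pearson correlation while carrying a nonlinear signal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical features (names and relationships are assumptions for this demo).
humidity = rng.uniform(20, 100, n)      # linear effect on the target
temp_anomaly = rng.normal(0, 1, n)      # effect only through its square
noise_feature = rng.normal(0, 1, n)     # no effect at all

target = 0.05 * humidity + temp_anomaly**2 + rng.normal(0, 0.5, n)

demo = pd.DataFrame({
    "humidity": humidity,
    "temp_anomaly": temp_anomaly,
    "noise_feature": noise_feature,
    "target": target,
})

# Absolute Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    demo.corr()["target"].drop("target").abs().sort_values(ascending=False)
)
print(corr_with_target)

# The squared feature exposes the relationship that Pearson missed.
squared_corr = (demo["temp_anomaly"] ** 2).corr(demo["target"])
print(f"corr(temp_anomaly**2, target) = {squared_corr:.2f}")
```

In this sketch `temp_anomaly` ranks low even though the target depends on it, which is why a low correlation alone is not enough reason to drop a feature.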


Another approach is to check feature importances in models such as Random Forest:

```python
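from sklearn.ensemble import RandomForestClassifier

# X (feature matrix) and y (target labels) are assumed to be defined beforehand.
model = RandomForestClassifier(random_state=42)  # fixed seed so the importances are reproducible
model.fit(X, y)
importances = model.feature_importances_  # impurity-based, one value per column of X, summing to 1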
```
- This helps us understand which variables have the most impact on the model's decisions.
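The raw importance array is easier to read when paired with column names and sorted. The sketch below does that on synthetic data; the feature names (`strong_signal`, `weak_signal`, `noise_a`, `noise_b`) and the rule that generates `y` are assumptions made up for this example, not part of the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 600

# Hypothetical feature matrix: two informative columns and two pure-noise columns.
X = pd.DataFrame({
    "strong_signal": rng.normal(size=n),
    "weak_signal": rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# The label depends mostly on strong_signal, a little on weak_signal.
y = (
    2.0 * X["strong_signal"] + 0.5 * X["weak_signal"] + rng.normal(0, 0.5, n) > 0
).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Pair each importance with its column name and sort, most important first.
ranked = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(ranked)
```

These are impurity-based importances, which are computed on the training data and can favor features with many distinct values, so they are best read as a rough ranking rather than an exact measure.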


**They Improve Class Separation**
- Some features make the classes more distinct, which makes the model's decision easier.
    - Example: If we want to predict whether a customer will buy, time spent on the site can be a good indicator.
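One simple way to quantify how well a feature separates two classes is the difference between the class means divided by a pooled standard deviation. The sketch below uses synthetic data; `time_on_site`, `day_of_month`, `bought`, and the helper `separation` are hypothetical names introduced for illustration, and the assumption that buyers spend more time on the site is built into the generated data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400

bought = rng.integers(0, 2, n)  # 0 = did not buy, 1 = bought

demo = pd.DataFrame({
    "bought": bought,
    # Assumption for the demo: buyers spend about 4 more minutes on average.
    "time_on_site": rng.normal(loc=5 + 4 * bought, scale=2.0),
    # Unrelated to the purchase by construction.
    "day_of_month": rng.integers(1, 29, n).astype(float),
})


def separation(df: pd.DataFrame, feature: str, target: str) -> float:
    """Absolute difference of class means in units of the pooled std."""
    groups = df.groupby(target)[feature]
    means = groups.mean()
    pooled_std = np.sqrt(groups.var().mean())  # equal-weight average of class variances
    return float(abs(means.loc[1] - means.loc[0]) / pooled_std)


for col in ["time_on_site", "day_of_month"]:
    print(col, round(separation(demo, col, "bought"), 2))
```

A larger value means the two class distributions overlap less along that feature, so a model has an easier time drawing a boundary on it.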

**They Help Algorithms Find Patterns**
- Models such as decision trees or neural networks look for informative splits or boundaries in the data.
    - Example: In facial recognition, the distance between the eyes can be a useful feature.
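To see what "finding a split" means in practice, the sketch below fits a depth-1 decision tree (a stump) on synthetic data and inspects which feature and threshold it chose. The data and the true cutoff of 6.0 are assumptions made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 300

# Column 0 determines the label; column 1 is noise (by construction).
X = rng.uniform(0, 10, size=(n, 2))
y = (X[:, 0] > 6.0).astype(int)

stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X, y)

# Node 0 is the root: read the feature index and threshold it splits on.
split_feature = int(stump.tree_.feature[0])
split_threshold = float(stump.tree_.threshold[0])
print(f"split on column {split_feature} at {split_threshold:.2f}")
```

The tree picks the informative column and places the threshold near the true cutoff, because that split removes the most impurity. A noisy or irrelevant feature gives the algorithm no such split to find.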





In [4]:
%load_ext autoreload 
%autoreload 2

In [5]:
import os
import numpy as np
import pandas as pd
from datetime import datetime
from pathlib import Path

pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 100)



In [6]:
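# In a notebook `__file__` is undefined, so the string literal "__file__" is resolved
# against the current working directory; BASE_DIR ends up as the parent of that directory.
BASE_DIR = os.path.dirname(Path("__file__").resolve().parent)
DATA_DIR = os.path.join(BASE_DIR, "pred_soccer", "modelagem", "feature_eng", "data")
MODEL_DIR = os.path.join(os.path.dirname(BASE_DIR), "database")
LOG_DIR = os.path.join(os.path.dirname(BASE_DIR), "logs")

# Load the engineered feature table.
df = pd.read_csv(os.path.join(DATA_DIR, "ft_df.csv"))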

In [11]:
df[df["season"]==2024].head(30)

Unnamed: 0,season,home_team_encoder,away_team_encoder,winner,ht_rank,ht_ls_rank,ht_days_ls_match,ht_points,ht_l_points,ht_l_wavg_points,ht_goals,ht_l_goals,ht_l_wavg_goals,ht_goals_sf,ht_l_goals_sf,ht_l_wavg_goals_sf,ht_wins,ht_draws,ht_losses,ht_win_streak,ht_loss_streak,ht_draw_streak,at_rank,at_ls_rank,at_days_ls_match,at_points,at_l_points,at_l_wavg_points,at_goals,at_l_goals,at_l_wavg_goals,at_goals_sf,at_l_goals_sf,at_l_wavg_goals_sf,at_wins,at_draws,at_losses,at_win_streak,at_loss_streak,at_draw_streak
4559,2024,13,24,0,18,-33.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,15,-33.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4560,2024,22,5,2,5,9.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,8,16.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4561,2024,18,7,0,13,8.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,16,6.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4562,2024,32,19,1,6,11.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,4,10.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4563,2024,1,15,2,17,7.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,19,12.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4564,2024,2,17,1,20,-33.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,3,4.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4565,2024,11,3,0,7,13.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,11,3.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4566,2024,34,21,2,10,15.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,14,2.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4567,2024,14,6,2,9,14.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,1,5.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
4568,2024,35,26,1,12,-33.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0,2,1.0,-33.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0,0,0,0,0
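
In the ten rows displayed above, cumulative columns such as `ht_points` and `at_points` are 0 for every match, while `ht_rank` and `at_rank` vary. A feature that is constant within a slice cannot help distinguish outcomes in that slice. The sketch below checks this on a small frame whose values are copied from a few columns of the output above; it says nothing about the rest of the dataset.

```python
import pandas as pd

# Values copied from the ten 2024 rows shown above (only five of the columns).
sample = pd.DataFrame({
    "winner":    [0, 2, 0, 1, 2, 1, 0, 2, 2, 1],
    "ht_rank":   [18, 5, 13, 6, 17, 20, 7, 10, 9, 12],
    "ht_points": [0] * 10,
    "at_rank":   [15, 8, 16, 4, 19, 3, 11, 14, 1, 2],
    "at_points": [0] * 10,
})

features = sample.drop(columns="winner")

# A column with a single distinct value carries no information in this slice.
constant_cols = [c for c in features.columns if features[c].nunique() == 1]
print(constant_cols)
```

Running the same check on the full `df` (or per season) would show which features only become informative after a few rounds have been played.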
