# Dataset Preprocessing for the RCBD Experiment

**Team F**: Bernardo Bacha de Resende, Gustavo Augusto Faria dos Reis, Marília Macêdo de Melo

**Course**: EEE933 - Planejamento e Análise de Experimentos (Design and Analysis of Experiments), 2025/2

---

This notebook preprocesses the five binary-classification datasets used in the RCBD (randomized complete block design) experiment.

**Datasets:**
1. Breast Cancer (569 samples)
2. Titanic (891 samples)
3. Water Potability (3,276 samples)
4. Employee Attrition (4,653 samples)
5. Australia Rain (145,460 samples → subsampled to 10,000)

**Preprocessing applied:**
- Removal of uninformative columns (IDs, names, etc.)
- Missing-value handling (median for numeric features, mode for categorical features; for Australia Rain, rows with missing values are dropped instead)
- One-hot encoding of categorical features
- Standardization with StandardScaler (z-score)
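
The steps listed above can be sketched as one reusable function. This is a minimal illustration on toy data, not the code the notebook runs: the helper name `preprocess_features` and the toy columns `age` and `port` are assumptions introduced for the example.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess_features(X: pd.DataFrame) -> pd.DataFrame:
    """Impute, one-hot encode, and standardize a feature table (hypothetical helper)."""
    X = X.copy()
    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)

    # Median for numeric columns, mode for categorical columns.
    for col in numeric_cols:
        X[col] = X[col].fillna(X[col].median())
    for col in categorical_cols:
        X[col] = X[col].fillna(X[col].mode()[0])

    # One-hot encode the categorical columns, dropping the first level of each.
    X = pd.get_dummies(X, columns=list(categorical_cols), drop_first=True, dtype=int)

    # z-score: zero mean and unit (population) standard deviation per column.
    scaled = StandardScaler().fit_transform(X)
    return pd.DataFrame(scaled, columns=X.columns, index=X.index)


# Toy data with one missing value in each column.
toy = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "port": ["S", "C", None, "S"],
})
result = preprocess_features(toy)
print(result.round(3))
```

Passing `index=X.index` keeps the row labels of the input, which matters when the target vector is stored separately and some rows were dropped earlier.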

## 1. Imports and Configuration

In [12]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported successfully!")

✓ Libraries imported successfully!


## 2. Dataset 1: Breast Cancer

**Characteristics:**
- 569 samples × 32 columns
- Target: `diagnosis` (M=Malignant, B=Benign)
- No missing values
- The cleanest of the five datasets

In [13]:
print("=" * 80)
print("DATASET 1: BREAST CANCER")
print("=" * 80)

# Load the dataset
df_breast = pd.read_csv('../data/breast_cancer.csv')
print(f"Original shape: {df_breast.shape}")
print(f"Columns: {list(df_breast.columns)}")

# Drop the ID column
df_breast = df_breast.drop(columns=['id'])

# Separate the target (M = malignant -> 1, B = benign -> 0)
y_breast_cancer = df_breast['diagnosis'].map({'M': 1, 'B': 0})
X_breast_cancer = df_breast.drop(columns=['diagnosis'])

# Check for missing values
print(f"\nMissing values: {X_breast_cancer.isnull().sum().sum()}")

# Standardize with StandardScaler
scaler_breast = StandardScaler()
X_breast_cancer = pd.DataFrame(
    scaler_breast.fit_transform(X_breast_cancer),
    columns=X_breast_cancer.columns
)

print(f"\n✓ Preprocessing complete!")
print(f"X_breast_cancer shape: {X_breast_cancer.shape}")
print(f"y_breast_cancer shape: {y_breast_cancer.shape}")
print(f"Target distribution:\n{y_breast_cancer.value_counts()}")
print(f"Proportion: {y_breast_cancer.value_counts(normalize=True)}")

DATASET 1: BREAST CANCER
Original shape: (569, 32)
Columns: ['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean', 'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean', 'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean', 'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se', 'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se', 'fractal_dimension_se', 'radius_worst', 'texture_worst', 'perimeter_worst', 'area_worst', 'smoothness_worst', 'compactness_worst', 'concavity_worst', 'concave points_worst', 'symmetry_worst', 'fractal_dimension_worst']

Missing values: 0

✓ Preprocessing complete!
X_breast_cancer shape: (569, 30)
y_breast_cancer shape: (569,)
Target distribution:
diagnosis
0    357
1    212
Name: count, dtype: int64
Proportion: diagnosis
0    0.627417
1    0.372583
Name: proportion, dtype: float64


## 3. Dataset 2: Titanic

**Characteristics:**
- 891 samples × 12 columns
- Target: `Survived` (0/1)
- Missing values in Age, Cabin, Embarked
- Categorical features: Sex, Embarked, Pclass

In [14]:
print("=" * 80)
print("DATASET 2: TITANIC")
print("=" * 80)

# Load the dataset
df_titanic = pd.read_csv('../data/titanic.csv')
print(f"Original shape: {df_titanic.shape}")
print(f"\nMissing values per column:\n{df_titanic.isnull().sum()[df_titanic.isnull().sum() > 0]}")

# Drop uninformative columns (Cabin is also mostly missing)
df_titanic = df_titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])

# Separate the target
y_titanic = df_titanic['Survived']
X_titanic = df_titanic.drop(columns=['Survived'])

# Handle missing values
# (assign the result back instead of using a chained inplace fillna,
# which newer pandas versions warn about and may not apply)
# Age: fill with the median
X_titanic['Age'] = X_titanic['Age'].fillna(X_titanic['Age'].median())
# Embarked: fill with the mode
X_titanic['Embarked'] = X_titanic['Embarked'].fillna(X_titanic['Embarked'].mode()[0])

print(f"\nMissing values after imputation: {X_titanic.isnull().sum().sum()}")

# One-hot encode the categorical columns
categorical_cols = ['Sex', 'Embarked']
X_titanic = pd.get_dummies(X_titanic, columns=categorical_cols, drop_first=True, dtype=int)

# Pclass is stored as a number (1, 2, 3) but is a class label, so it is one-hot encoded too.
# Note: no drop_first here, so all three levels are kept.
X_titanic = pd.get_dummies(X_titanic, columns=['Pclass'], prefix='Pclass', dtype=int)

print(f"\nColumns after one-hot encoding ({len(X_titanic.columns)}): {list(X_titanic.columns)}")

# Standardize with StandardScaler
scaler_titanic = StandardScaler()
X_titanic = pd.DataFrame(
    scaler_titanic.fit_transform(X_titanic),
    columns=X_titanic.columns
)

print(f"\n✓ Preprocessing complete!")
print(f"X_titanic shape: {X_titanic.shape}")
print(f"y_titanic shape: {y_titanic.shape}")
print(f"Target distribution:\n{y_titanic.value_counts()}")
print(f"Proportion: {y_titanic.value_counts(normalize=True)}")

DATASET 2: TITANIC
Original shape: (891, 12)

Missing values per column:
Age         177
Cabin       687
Embarked      2
dtype: int64

Missing values after imputation: 0

Columns after one-hot encoding (10): ['Age', 'SibSp', 'Parch', 'Fare', 'Sex_male', 'Embarked_Q', 'Embarked_S', 'Pclass_1', 'Pclass_2', 'Pclass_3']

✓ Preprocessing complete!
X_titanic shape: (891, 10)
y_titanic shape: (891,)
Target distribution:
Survived
0    549
1    342
Name: count, dtype: int64
Proportion: Survived
0    0.616162
1    0.383838
Name: proportion, dtype: float64
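
The Titanic cell encodes `Sex` and `Embarked` with `drop_first=True` but keeps all three `Pclass` levels. The toy example below (made-up rows, not the Titanic data) shows the difference: with `drop_first=True` a feature with k levels yields k-1 indicator columns, and without it the k indicators sum to 1 in every row and are therefore perfectly collinear. That redundancy is usually harmless for an RBF-kernel SVM but can matter for unregularized linear models.

```python
import pandas as pd

toy = pd.DataFrame({"Sex": ["male", "female", "female"], "Pclass": [3, 1, 2]})

# drop_first=True: the first level ("female") becomes the implicit baseline.
sex_encoded = pd.get_dummies(toy[["Sex"]], columns=["Sex"], drop_first=True, dtype=int)

# Default (drop_first=False): one indicator column per level.
pclass_encoded = pd.get_dummies(
    toy[["Pclass"]], columns=["Pclass"], prefix="Pclass", dtype=int
)

print(list(sex_encoded.columns))
print(list(pclass_encoded.columns))
# Row sums of the full indicator set: each row belongs to exactly one level.
print(pclass_encoded.sum(axis=1).tolist())
```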


## 4. Dataset 3: Water Potability

**Characteristics:**
- 3,276 samples × 10 columns
- Target: `Potability` (0/1)
- Missing values in pH, Sulfate, Trihalomethanes
- All features are numeric

In [15]:
print("=" * 80)
print("DATASET 3: WATER POTABILITY")
print("=" * 80)

# Load the dataset
df_water = pd.read_csv('../data/water_potability.csv')
print(f"Original shape: {df_water.shape}")
print(f"\nMissing values per column:\n{df_water.isnull().sum()[df_water.isnull().sum() > 0]}")

# Separate the target
y_water_potability = df_water['Potability']
X_water_potability = df_water.drop(columns=['Potability'])

# Handle missing values (fill with the column median)
for col in X_water_potability.columns:
    if X_water_potability[col].isnull().any():
        # Compute the median once and assign the result back
        # (avoids a chained inplace fillna)
        median = X_water_potability[col].median()
        X_water_potability[col] = X_water_potability[col].fillna(median)
        print(f"Filled {col} with median: {median:.2f}")

print(f"\nMissing values after imputation: {X_water_potability.isnull().sum().sum()}")

# Standardize with StandardScaler
scaler_water = StandardScaler()
X_water_potability = pd.DataFrame(
    scaler_water.fit_transform(X_water_potability),
    columns=X_water_potability.columns
)

print(f"\n✓ Preprocessing complete!")
print(f"X_water_potability shape: {X_water_potability.shape}")
print(f"y_water_potability shape: {y_water_potability.shape}")
print(f"Target distribution:\n{y_water_potability.value_counts()}")
print(f"Proportion: {y_water_potability.value_counts(normalize=True)}")

DATASET 3: WATER POTABILITY
Original shape: (3276, 10)

Missing values per column:
ph                 491
Sulfate            781
Trihalomethanes    162
dtype: int64
Filled ph with median: 7.04
Filled Sulfate with median: 333.07
Filled Trihalomethanes with median: 66.62

Missing values after imputation: 0

✓ Preprocessing complete!
X_water_potability shape: (3276, 9)
y_water_potability shape: (3276,)
Target distribution:
Potability
0    1998
1    1278
Name: count, dtype: int64
Proportion: Potability
0    0.60989
1    0.39011
Name: proportion, dtype: float64
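
The median imputation above could also be written with scikit-learn's `SimpleImputer`, which stores the learned medians so the same values can be reused on new data. This is a sketch of an alternative, not what the notebook runs. The toy values are made up, and only the column names `ph` and `Sulfate` are borrowed from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with one missing value per column.
toy = pd.DataFrame({
    "ph": [6.5, np.nan, 7.5, 8.0],
    "Sulfate": [300.0, 340.0, np.nan, 360.0],
})

imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns, index=toy.index)

# statistics_ holds the per-column medians learned during fit.
medians = dict(zip(toy.columns, imputer.statistics_))
print(medians)
print(filled)
```

Because the medians live in the fitted imputer, calling `imputer.transform` on held-out rows fills them with training-set medians rather than recomputing them.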


## 5. Dataset 4: Employee Attrition

**Characteristics:**
- 4,653 samples × 9 columns
- Target: `LeaveOrNot` (0=Stayed, 1=Left the job)
- No missing values
- Categorical features: Education, City, Gender, EverBenched
- **Note**: Classes are reasonably balanced (~34% of employees left)

In [16]:
print("=" * 80)
print("DATASET 4: EMPLOYEE ATTRITION")
print("=" * 80)

# Load the dataset
df_employee = pd.read_csv('../data/Employee.csv')
print(f"Original shape: {df_employee.shape}")
# This is the total across all columns, not a per-column breakdown
print(f"\nTotal missing values:\n{df_employee.isnull().sum().sum()} (none!)")

# Separate the target
y_employee = df_employee['LeaveOrNot']
X_employee = df_employee.drop(columns=['LeaveOrNot'])

print(f"\nMissing values: {X_employee.isnull().sum().sum()}")

# One-hot encode the categorical columns
categorical_cols = ['Education', 'City', 'Gender', 'EverBenched']
print(f"\nCategorical columns to encode: {categorical_cols}")

X_employee = pd.get_dummies(X_employee, columns=categorical_cols, drop_first=True, dtype=int)

print(f"\nColumns after one-hot encoding ({len(X_employee.columns)}): {list(X_employee.columns)}")

# Standardize with StandardScaler
scaler_employee = StandardScaler()
X_employee = pd.DataFrame(
    scaler_employee.fit_transform(X_employee),
    columns=X_employee.columns
)

print(f"\n✓ Preprocessing complete!")
print(f"X_employee shape: {X_employee.shape}")
print(f"y_employee shape: {y_employee.shape}")
print(f"Target distribution:\n{y_employee.value_counts()}")
print(f"Proportion: {y_employee.value_counts(normalize=True)}")

DATASET 4: EMPLOYEE ATTRITION
Original shape: (4653, 9)

Total missing values:
0 (none!)

Missing values: 0

Categorical columns to encode: ['Education', 'City', 'Gender', 'EverBenched']

Columns after one-hot encoding (10): ['JoiningYear', 'PaymentTier', 'Age', 'ExperienceInCurrentDomain', 'Education_Masters', 'Education_PHD', 'City_New Delhi', 'City_Pune', 'Gender_Male', 'EverBenched_Yes']

✓ Preprocessing complete!
X_employee shape: (4653, 10)
y_employee shape: (4653,)
Target distribution:
LeaveOrNot
0    3053
1    1600
Name: count, dtype: int64
Proportion: LeaveOrNot
0    0.656136
1    0.343864
Name: proportion, dtype: float64


## 6. Dataset 5: Australia Rain (Weather)

**Characteristics:**
- 145,460 samples × 23 columns (very large)
- Target: `RainTomorrow` (Yes/No)
- Many missing values: 21 of the 23 columns have some, and Evaporation, Sunshine, Cloud9am, and Cloud3pm are each roughly 38-48% missing
- Categorical features: WindGustDir, WindDir9am, WindDir3pm, RainToday

**Strategy:**
1. Drop Date and Location (uninformative or too many categories)
2. Drop rows with a missing target, then every row with any missing feature (dropna)
3. Draw a stratified sample of 10,000 rows
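
Step 3 can be illustrated in isolation. The sketch below uses synthetic data (an assumption, with about 22% positives) and shows that `train_test_split` with `stratify` draws a subsample whose class ratio matches the full set and whose `X` and `y` row labels stay aligned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the cleaned table: 5,000 rows, 22% positive class.
X_full = pd.DataFrame({"feature": rng.normal(size=5000)})
y_full = pd.Series((np.arange(5000) % 100 < 22).astype(int), name="target")

# Keep the "train" part as the subsample and discard the rest.
X_sample, _, y_sample, _ = train_test_split(
    X_full, y_full, train_size=1000, stratify=y_full, random_state=42
)

print(f"full positive rate:   {y_full.mean():.3f}")
print(f"sample positive rate: {y_sample.mean():.3f}")
```

Note that the subsample keeps the original row labels, so any later step that rebuilds `X` as a new DataFrame should pass the same index (or reset it on both `X` and `y`).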

In [17]:
print("=" * 80)
print("DATASET 5: AUSTRALIA RAIN (WEATHER)")
print("=" * 80)

# Load the dataset
df_weather = pd.read_csv('../data/weather.csv')
print(f"Original shape: {df_weather.shape}")
print(f"\nMissing values per column:")
null_counts = df_weather.isnull().sum()
print(null_counts[null_counts > 0])
print(f"\nTotal columns with missing values: {(null_counts > 0).sum()}")

# Drop uninformative / high-cardinality columns
df_weather = df_weather.drop(columns=['Date', 'Location'])
print(f"\nShape after dropping Date and Location: {df_weather.shape}")

# Drop rows with a missing target first
df_weather = df_weather.dropna(subset=['RainTomorrow'])
print(f"Shape after dropping rows with missing target: {df_weather.shape}")

# Separate the target
y_weather = df_weather['RainTomorrow'].map({'Yes': 1, 'No': 0})
X_weather = df_weather.drop(columns=['RainTomorrow'])

# Drop every row that has at least one missing value
initial_rows = len(X_weather)
valid_indices = X_weather.dropna().index
X_weather = X_weather.loc[valid_indices]
y_weather = y_weather.loc[valid_indices]

print(f"\nRows removed due to missing values: {initial_rows - len(X_weather)}")
print(f"Shape after dropna: X={X_weather.shape}, y={y_weather.shape}")

# Convert RainToday to numeric before one-hot encoding the remaining categoricals
if 'RainToday' in X_weather.columns:
    X_weather['RainToday'] = X_weather['RainToday'].map({'Yes': 1, 'No': 0})

# Identify the categorical columns (wind directions).
# A single `and` condition avoids the and/or precedence pitfall of mixing both operators.
categorical_cols = [
    col for col in X_weather.columns
    if 'Wind' in col and X_weather[col].dtype == 'object'
]
print(f"\nCategorical columns identified: {categorical_cols}")

if categorical_cols:
    X_weather = pd.get_dummies(X_weather, columns=categorical_cols, drop_first=True, dtype=int)
    print(f"Columns after one-hot encoding: {len(X_weather.columns)}")

# STRATIFIED SAMPLING down to 10,000 rows
# (train_test_split was already imported in the first cell)
if len(X_weather) > 10000:
    sample_size = 10000
    print(f"\nStratified sampling from {len(X_weather)} down to {sample_size} samples...")

    X_weather, _, y_weather, _ = train_test_split(
        X_weather, y_weather,
        train_size=sample_size,
        stratify=y_weather,
        random_state=42
    )
    print(f"Shape after sampling: X={X_weather.shape}, y={y_weather.shape}")

# Standardize with StandardScaler; keep the sampled row labels so X stays aligned with y
scaler_weather = StandardScaler()
X_weather = pd.DataFrame(
    scaler_weather.fit_transform(X_weather),
    columns=X_weather.columns,
    index=X_weather.index
)

print(f"\n✓ Preprocessing complete!")
print(f"X_weather shape: {X_weather.shape}")
print(f"y_weather shape: {y_weather.shape}")
print(f"Target distribution:\n{y_weather.value_counts()}")
print(f"Proportion: {y_weather.value_counts(normalize=True)}")

DATASET 5: AUSTRALIA RAIN (WEATHER)
Original shape: (145460, 23)

Missing values per column:
MinTemp           1485
MaxTemp           1261
Rainfall          3261
Evaporation      62790
Sunshine         69835
WindGustDir      10326
WindGustSpeed    10263
WindDir9am       10566
WindDir3pm        4228
WindSpeed9am      1767
WindSpeed3pm      3062
Humidity9am       2654
Humidity3pm       4507
Pressure9am      15065
Pressure3pm      15028
Cloud9am         55888
Cloud3pm         59358
Temp9am           1767
Temp3pm           3609
RainToday         3261
RainTomorrow      3267
dtype: int64

Total columns with missing values: 21

Shape after dropping Date and Location: (145460, 21)
Shape after dropping rows with missing target: (142193, 21)

Rows removed due to missing values: 85773
Shape after dropna: X=(56420, 20), y=(56420,)

Categorical columns identified: ['WindGustDir', 'WindDir9am', 'WindDir3pm']
Columns after one-hot encoding: 62

Stratified sampling from 56420 down to 10000 samples...
[output truncated]

## 7. Final Summary

Validation of all processed datasets.
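
Beyond printing a summary table, the same properties can be enforced programmatically. The helper below is a hypothetical sketch (the name `check_dataset` is not part of the notebook) that fails loudly if a processed `(X, y)` pair has mismatched lengths, missing values, non-numeric features, a non-binary target, or unstandardized columns. It is demonstrated on a small made-up table.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def check_dataset(X: pd.DataFrame, y: pd.Series) -> None:
    """Raise AssertionError if (X, y) violates the expected invariants (hypothetical helper)."""
    assert len(X) == len(y), "X and y must have the same number of rows"
    assert not X.isnull().values.any(), "X contains missing values"
    assert not y.isnull().any(), "y contains missing values"
    assert all(pd.api.types.is_numeric_dtype(t) for t in X.dtypes), "non-numeric feature"
    assert set(y.unique()) <= {0, 1}, "target must be binary 0/1"
    # StandardScaler output: per-column mean ~0 and population std ~1
    # (a constant column comes out with std 0).
    means = X.mean(axis=0).to_numpy()
    stds = X.std(axis=0, ddof=0).to_numpy()
    assert np.allclose(means, 0.0, atol=1e-8), "features are not centered"
    assert np.all(np.isclose(stds, 1.0) | np.isclose(stds, 0.0)), "features are not unit-variance"


# Toy example: standardize a small table and validate it.
raw = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 0.0, 5.0, 5.0]})
X_ok = pd.DataFrame(StandardScaler().fit_transform(raw), columns=raw.columns)
y_ok = pd.Series([0, 1, 0, 1])

check_dataset(X_ok, y_ok)
print("all checks passed")
```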

In [18]:
print("=" * 80)
print("FINAL SUMMARY - ALL PROCESSED DATASETS")
print("=" * 80)

datasets_summary = {
    'Breast Cancer': (X_breast_cancer, y_breast_cancer),
    'Titanic': (X_titanic, y_titanic),
    'Water Potability': (X_water_potability, y_water_potability),
    'Employee': (X_employee, y_employee),
    'Weather': (X_weather, y_weather)
}

summary_data = []

# One summary row per dataset: size, missing values, and class balance
for name, (X, y) in datasets_summary.items():
    summary_data.append({
        'Dataset': name,
        'Samples': X.shape[0],
        'Features': X.shape[1],
        'Nulls in X': X.isnull().sum().sum(),
        'Nulls in y': y.isnull().sum(),
        'Class 0': (y == 0).sum(),
        'Class 1': (y == 1).sum(),
        'Class 1 (%)': f"{(y == 1).sum() / len(y) * 100:.1f}%"
    })

summary_df = pd.DataFrame(summary_data)
print("\n")
print(summary_df.to_string(index=False))

print("\n" + "=" * 80)
print("✓ ALL DATASETS READY FOR THE RCBD EXPERIMENT!")
print("=" * 80)

print("\n📊 Available variables:")
print("  • X_breast_cancer, y_breast_cancer")
print("  • X_titanic, y_titanic")
print("  • X_water_potability, y_water_potability")
print("  • X_employee, y_employee")
print("  • X_weather, y_weather")

print("\n🎯 Characteristics:")
print("  • All features are numeric")
print("  • No missing values")
print("  • Standardized with StandardScaler (z-score)")
print("  • Ready for ML classifiers")

FINAL SUMMARY - ALL PROCESSED DATASETS


         Dataset  Samples  Features  Nulls in X  Nulls in y  Class 0  Class 1 Class 1 (%)
   Breast Cancer      569        30           0           0      357      212       37.3%
         Titanic      891        10           0           0      549      342       38.4%
Water Potability     3276         9           0           0     1998     1278       39.0%
        Employee     4653        10           0           0     3053     1600       34.4%
         Weather    10000        62           0           0     7797     2203       22.0%

✓ ALL DATASETS READY FOR THE RCBD EXPERIMENT!

📊 Available variables:
  • X_breast_cancer, y_breast_cancer
  • X_titanic, y_titanic
  • X_water_potability, y_water_potability
  • X_employee, y_employee
  • X_weather, y_weather

🎯 Characteristics:
  • All features are numeric
  • No missing values
  • Standardized with [output truncated]

In [19]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("=" * 80)
print("BASELINE - SVM WITH RBF KERNEL")
print("=" * 80)

# List to store the per-dataset results
baseline_results = []

# Datasets to evaluate
datasets = [
    ('Breast Cancer', X_breast_cancer, y_breast_cancer),
    ('Titanic', X_titanic, y_titanic),
    ('Water Potability', X_water_potability, y_water_potability),
    ('Employee', X_employee, y_employee),
    ('Weather', X_weather, y_weather)
]

# For each dataset
for name, X, y in datasets:
    print(f"\n{'-' * 80}")
    print(f"Dataset: {name}")
    print(f"{'-' * 80}")
    
    # Stratified train/test split (80/20)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, 
        test_size=0.2, 
        stratify=y, 
        random_state=42
    )
    
    print(f"Train: {X_train.shape[0]} samples | Test: {X_test.shape[0]} samples")
    
    # Train an SVM with RBF kernel (default hyperparameters)
    svm = SVC(kernel='rbf', random_state=42)
    svm.fit(X_train, y_train)
    
    # Predictions
    y_pred = svm.predict(X_test)
    
    # Compute metrics (class 1 is the positive class)
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    
    # Store results
    baseline_results.append({
        'Dataset': name,
        'Train': X_train.shape[0],
        'Test': X_test.shape[0],
        'Accuracy (%)': acc * 100,
        'Precision (%)': prec * 100,
        'Recall (%)': rec * 100,
        'F1-Score (%)': f1 * 100
    })
    
    print(f"Accuracy:  {acc*100:.2f}%")
    print(f"Precision: {prec*100:.2f}%")
    print(f"Recall:    {rec*100:.2f}%")
    print(f"F1-Score:  {f1*100:.2f}%")

# Build a DataFrame with the consolidated results
print("\n" + "=" * 80)
print("SUMMARY TABLE - BASELINE SVM (RBF KERNEL)")
print("=" * 80)
print()

baseline_df = pd.DataFrame(baseline_results)
# Format the metric columns with 2 decimal places
for col in ['Accuracy (%)', 'Precision (%)', 'Recall (%)', 'F1-Score (%)']:
    baseline_df[col] = baseline_df[col].map('{:.2f}'.format)

print(baseline_df.to_string(index=False))

print("\n" + "=" * 80)
print("✓ BASELINE COMPLETE!")
print("=" * 80)

print("\n📝 Notes:")
print("  • All datasets were trained successfully")
print("  • Baseline metrics are available for future comparison")
print("  • Employee: reasonably balanced dataset (~34% attrition) with good metrics")
print("  • Next step: RCBD experiment with different treatments")

BASELINE - SVM WITH RBF KERNEL

--------------------------------------------------------------------------------
Dataset: Breast Cancer
--------------------------------------------------------------------------------
Train: 455 samples | Test: 114 samples
Accuracy:  97.37%
Precision: 100.00%
Recall:    92.86%
F1-Score:  96.30%

--------------------------------------------------------------------------------
Dataset: Titanic
--------------------------------------------------------------------------------
Train: 712 samples | Test: 179 samples
Accuracy:  81.01%
Precision: 85.71%
Recall:    60.87%
F1-Score:  71.19%

--------------------------------------------------------------------------------
Dataset: Water Potability
--------------------------------------------------------------------------------
Train: 2620 samples | Test: 656 samples
Accuracy:  67.07%
Precision: 70.41%
Recall:    26.95%
F1-Score:  38.98%

[output truncated]

## 8. Baseline - SVM with RBF Kernel

**Goal:** Train a simple SVM on each dataset to:
1. Validate that the processed data works correctly
2. Obtain baseline reference metrics

**Configuration:**
- Train/test split: 80/20 (stratified)
- Model: SVM with RBF kernel (default hyperparameters)
- Metrics: Accuracy, Precision, Recall, F1-Score
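
In this notebook each dataset is standardized before the train/test split, so the test rows contribute to the scaling statistics. One possible variant, sketched below on synthetic data from `make_classification` (an assumption, not one of the five datasets), wraps the scaler and the SVM in a scikit-learn pipeline so that `StandardScaler` is fit on the training rows only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one dataset, with roughly 35% positive class.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.65, 0.35], random_state=42
)

# Same configuration as the baseline: stratified 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler is fit inside the pipeline, using the training rows only.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"F1-score: {f1_score(y_test, y_pred):.3f}")
```

With this layout the unscaled features would be passed to the pipeline, and the same object can be handed to cross-validation utilities without re-fitting the scaler by hand.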