# 🤖 Model Ensembles - A Practical Comparison

This notebook demonstrates the main types of ensemble:
- Bagging (RandomForest)
- Boosting (GradientBoosting)
- Voting
- Stacking

Each one is trained on the Iris dataset and compared by accuracy, first on a held-out test set and then with 5-fold cross-validation.


In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the samples (45) for testing; the fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [2]:
# Bagging: a forest of decision trees, each trained on a bootstrap sample
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
print("RandomForest Accuracy:", accuracy_score(y_test, rf_pred))


RandomForest Accuracy: 1.0


In [3]:
# Boosting: trees are added one at a time, each fitted to correct the current ensemble's errors
gb = GradientBoostingClassifier(random_state=42)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
print("GradientBoosting Accuracy:", accuracy_score(y_test, gb_pred))


GradientBoosting Accuracy: 1.0
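
To make the sequential nature of boosting visible, the sketch below tracks training accuracy as stages are added, using `staged_predict`. It rebuilds the same split as above so that it runs on its own; the stage counts printed (1, 5, 20, 100) are arbitrary choices for illustration, and 100 is scikit-learn's default `n_estimators`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

gb = GradientBoostingClassifier(random_state=42)
gb.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators stages
staged_acc = [
    accuracy_score(y_train, pred) for pred in gb.staged_predict(X_train)
]

# Illustrative checkpoints only; any stage index up to len(staged_acc) works
for n_stages in (1, 5, 20, 100):
    print(f"{n_stages:>3} stages: train accuracy = {staged_acc[n_stages - 1]:.4f}")
```

Passing `X_test` and `y_test` instead gives the same curve on held-out data, which is the usual way to judge how many stages are actually needed.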


In [4]:
# Hard voting: each base model casts one vote and the majority class wins
voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(random_state=42)),
        ('gb', GradientBoostingClassifier(random_state=42)),
        ('gnb', GaussianNB())
    ],
    voting='hard'
)
voting.fit(X_train, y_train)
voting_pred = voting.predict(X_test)
print("VotingClassifier Accuracy:", accuracy_score(y_test, voting_pred))


VotingClassifier Accuracy: 1.0
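
The cell above uses `voting='hard'`. As a point of comparison, this sketch fits the same three base models with both hard and soft voting; soft voting averages the predicted class probabilities instead of counting votes. The helper `make_voting` is a hypothetical name introduced only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def make_voting(mode):
    """Hypothetical helper: same base models as the notebook, configurable voting mode."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=42)),
            ("gb", GradientBoostingClassifier(random_state=42)),
            ("gnb", GaussianNB()),
        ],
        voting=mode,
    )


results = {}
for mode in ("hard", "soft"):
    clf = make_voting(mode).fit(X_train, y_train)
    results[mode] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{mode} voting accuracy: {results[mode]:.4f}")

# predict_proba is only available with soft voting (clf is the soft model here)
proba = clf.predict_proba(X_test[:1])
print("Averaged class probabilities for the first test sample:", proba.round(3))
```

Soft voting requires every base model to implement `predict_proba`, which all three models used here do.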


In [5]:
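# Stacking: the base models' predictions become the input features of a meta-model.
# No random_state is set on the base models, so results can vary slightly between runs.
base_models = [
    ('svc', SVC(probability=True)),
    ('dt', DecisionTreeClassifier())
]
# The meta-model learns how to combine the base models' outputs
meta_model = LogisticRegression()

stack = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stack.fit(X_train, y_train)
stack_pred = stack.predict(X_test)
print("StackingClassifier Accuracy:", accuracy_score(y_test, stack_pred))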


StackingClassifier Accuracy: 1.0
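
To see what the meta-model actually receives, the sketch below fits the same stack and inspects the stacked features through `transform`. With three classes and two base models that each output class probabilities, the meta-model should see six columns. The `random_state=42` arguments on the base models are an addition for this example and are not part of the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

stack = StackingClassifier(
    estimators=[
        # random_state added here (assumption) so the example is repeatable
        ("svc", SVC(probability=True, random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)

# transform returns the base models' outputs, i.e. the meta-model's input features
meta_features = stack.transform(X_test)
print("Meta-feature matrix shape:", meta_features.shape)

# One row of coefficients per class, one column per meta-feature
coef_shape = stack.final_estimator_.coef_.shape
print("Meta-model coefficient shape:", coef_shape)
```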


In [6]:
models = {
    "RandomForest": rf,
    "GradientBoosting": gb,
    "Voting": voting,
    "Stacking": stack
}

# cross_val_score clones each estimator and refits it on every fold,
# so the models fitted above are not reused here
print("Comparison with Cross-Validation (5-fold):")
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(f"{name}: {scores.mean():.4f}")


Comparison with Cross-Validation (5-fold):
RandomForest: 0.9667
GradientBoosting: 0.9600
Voting: 0.9667
Stacking: 0.9733
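
The means above differ by at most 0.0133, which on 150 samples corresponds to two classifications. A rough way to judge whether such gaps matter is to look at the spread across folds as well as the mean. The sketch below does this for RandomForest only; the shuffled `StratifiedKFold` with `random_state=42` is an assumption of this example, so its numbers need not match the table above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Shuffled, stratified folds (assumption: the notebook itself passes cv=5)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(
    RandomForestClassifier(random_state=42), X, y, cv=cv, scoring="accuracy"
)

# Each fold holds 30 samples, so a single error moves a fold's accuracy by about 0.033
print(f"RandomForest: {scores.mean():.4f} +/- {scores.std():.4f}")
print("Per-fold accuracy:", scores.round(4))
```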


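## ✅ Conclusions

- **Bagging (RF)** reduces variance by averaging many trees, each trained on a bootstrap sample.
- **Boosting (GB)** improves performance by adding models sequentially, each one correcting the errors of the ensemble so far.
- **Voting** is simple and effective when the base models are complementary, that is, when they make different mistakes.
- **Stacking** can outperform the others (it had the highest cross-validation mean here, 0.9733), but it is more complex to set up and train.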
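
The variance claim for bagging can be probed with a small experiment: compare a single decision tree against a bagged ensemble of the same tree under repeated cross-validation. This is a sketch under assumptions (50 estimators, 10 repeats, and score spread across folds as a loose proxy for variance); on a dataset as easy as Iris the difference may be small.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds repeated 10 times gives 50 scores per model (illustrative settings)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

candidates = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "bagged trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=42),  # base model, passed positionally
        n_estimators=50,
        random_state=42,
    ),
}

summary = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```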

Choosing the right type of ensemble depends on the **problem**, the available **training time**, and the need for **interpretability**.
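
Training time is easy to measure directly. The sketch below times a single `fit` call for three of the ensembles on the full dataset; absolute numbers depend on the machine, so only the relative order is informative. Stacking is expected to be the slowest of the three because, with its default `cv` setting, it also refits each base model on five internal folds to produce the meta-model's training data. The `random_state` values on the stacking base models are an assumption added for repeatability.

```python
import time

from sklearn.datasets import load_iris
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "Stacking": StackingClassifier(
        estimators=[
            ("svc", SVC(probability=True, random_state=42)),
            ("dt", DecisionTreeClassifier(random_state=42)),
        ],
        final_estimator=LogisticRegression(),
    ),
}

fit_seconds = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    fit_seconds[name] = time.perf_counter() - start
    print(f"{name}: fit took {fit_seconds[name]:.3f} s")
```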
