<font color="red" size="6">Ensemble methods</font>
<p> <font color="Yellow" size="5"><b>8_Blending</b></font></p>

Blending is an ensemble learning technique similar to stacking, but it is typically simpler and faster to implement. In both, a meta-model learns to combine the base models' predictions; the main difference lies in how the predictions used to train that meta-model are generated.

<font color="pink" size=4>Differences Between Blending and Stacking:</font>
<ol>
    <li><font color="orange">Data Split:</font>
        <ol><li><font color="violet">Blending:</font> In blending, the dataset is typically split into two parts. The first part is used to train the base models, and the second part (validation set) is used to generate predictions, which are then used to train the meta-model.</li>
        <li><font color="violet">Stacking:</font> Stacking typically uses k-fold cross-validation to train the base models and generate out-of-fold predictions for training the meta-model.</li></ol></li>
    <li><font color="orange">Complexity:</font>
        Blending is simpler and faster to implement than stacking because it uses a single validation set for meta-model training, whereas stacking requires cross-validation.</li>
    <li><font color="orange">Performance:</font>
        Stacking can sometimes yield better performance because the meta-model is trained on out-of-fold predictions for every training sample, which uses the data more fully and helps mitigate overfitting. Blending trains the meta-model only on the hold-out predictions, so it might overfit if the validation set is not large enough or not representative of the test set.</li></ol>
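
To make the data-split difference concrete, here is a minimal sketch that builds one meta-feature both ways. It is an illustration rather than part of the walkthrough below: the 5-fold setting, the `stratify` argument, and the names `blend_feature` and `stack_feature` are assumptions chosen for the example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
base = DecisionTreeClassifier(random_state=42)

# Blending: fit on one part, then predict the hold-out part once.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
blend_feature = base.fit(X_fit, y_fit).predict(X_hold)

# Stacking: each sample is predicted by a model fitted on the other folds
# (5 folds here, an assumed setting).
stack_feature = cross_val_predict(base, X, y, cv=5)

print(blend_feature.shape)  # (36,)  -> meta-model sees only the hold-out rows
print(stack_feature.shape)  # (178,) -> meta-model sees one row per sample
```

The blended meta-model would be trained on 36 rows here, while the stacked one would get all 178, at the cost of fitting the base model five times.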

In [7]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import numpy as np

In [8]:
# 1. Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# 2. Split the dataset into training and validation sets (80% train, 20% validation)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
# 3. Define base models
log_reg = LogisticRegression(max_iter=1000)
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier(random_state=42)

In [10]:
# 4. Train the base models
log_reg.fit(X_train, y_train)
knn.fit(X_train, y_train)
dt.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
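
The warning above suggests scaling the data. One possible way to do that, sketched here under the assumption that standardization is acceptable for this dataset, is to wrap the logistic regression in a pipeline with `StandardScaler`. The name `scaled_log_reg` is hypothetical, and the `warnings` filter is only there to show that the fit no longer hits the iteration limit.

```python
import warnings

from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then reused at predict time.
scaled_log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn a convergence warning into an exception so a failure would be visible.
    warnings.simplefilter("error", ConvergenceWarning)
    scaled_log_reg.fit(X_train, y_train)

pred_scaled = scaled_log_reg.predict(X_val)
```

Because the pipeline exposes the usual `fit` and `predict` methods, it could stand in for `log_reg` in the later steps. The results shown in this notebook were produced without scaling.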


In [11]:
# 5. Get predictions from each base model on the validation set
pred_log_reg = log_reg.predict(X_val)
pred_knn = knn.predict(X_val)
pred_dt = dt.predict(X_val)


In [12]:
# 6. Stack the predictions of the base models as features for the meta-model
X_meta = np.column_stack((pred_log_reg, pred_knn, pred_dt))
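
The meta-features above are hard class labels (0, 1, 2), which a logistic-regression meta-model treats as ordered numbers. A common alternative, sketched here as an assumption rather than as the method used in this notebook, is to feed the meta-model each base model's class probabilities instead. The name `X_meta_proba` is hypothetical, and only two base models are used to keep the example short.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier().fit(X_train, y_train)
dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Each model contributes one column per class (3 classes in the Wine dataset).
X_meta_proba = np.hstack([knn.predict_proba(X_val), dt.predict_proba(X_val)])
print(X_meta_proba.shape)  # (36, 6)
```

Probabilities carry each model's confidence, so the meta-model can weight a confident prediction differently from a borderline one.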


In [13]:
# 7. Train the meta-model (Logistic Regression)
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(X_meta, y_val)

In [14]:
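# 8. Predict with the meta-model on the validation-set meta-features.
#    Note: this is the same data the meta-model was trained on, so the
#    scores reported below are optimistic rather than a held-out estimate.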
meta_pred = meta_model.predict(X_meta)


In [15]:
# 9. Evaluate the blended model
accuracy = accuracy_score(y_val, meta_pred)
print(f"Blended Model Accuracy: {accuracy:.4f}")

# Display the classification report and confusion matrix
print("\nBlended Model Classification Report:")
print(classification_report(y_val, meta_pred))

print("\nBlended Model Confusion Matrix:")
print(confusion_matrix(y_val, meta_pred))

Blended Model Accuracy: 0.9722

Blended Model Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.93      1.00      0.97        14
           2       1.00      1.00      1.00         8

    accuracy                           0.97        36
   macro avg       0.98      0.98      0.98        36
weighted avg       0.97      0.97      0.97        36


Blended Model Confusion Matrix:
[[13  1  0]
 [ 0 14  0]
 [ 0  0  8]]
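
The scores above are measured on the same validation set that was used to fit the meta-model. For an estimate on unseen data, one option is a three-way split, sketched below. The 60/20/20 proportions, the `stratify` arguments, the helper `meta_features`, and the choice of two base models with probability features are all assumptions for illustration. No expected accuracy is quoted because this variant was not run as part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Assumed 60/20/20 split: base-model training, meta-model training, final test.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=42)]
for model in base_models:
    model.fit(X_train, y_train)


def meta_features(X_part):
    """Concatenate each base model's class probabilities column-wise."""
    return np.hstack([m.predict_proba(X_part) for m in base_models])


# The meta-model is fitted on validation predictions only.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features(X_val), y_val)

# The test set was seen by neither the base models nor the meta-model.
test_accuracy = accuracy_score(y_test, meta_model.predict(meta_features(X_test)))
print(f"Held-out test accuracy: {test_accuracy:.4f}")
```

With only 178 samples, each part is small, so the held-out score will vary noticeably with the random seed.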
