In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_wine




data = load_wine()
X = data.data
y = data.target


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


The Wine dataset, available from the UCI Machine Learning Repository, is a classic benchmark for classifying wine samples by their chemical properties. It contains 178 samples from three classes (three different cultivars), each described by 13 chemical features. We chose it because it is small enough to train quickly yet has enough samples for the models to learn relationships between the features, and because its three classes make it a suitable example of a multi-class classification problem. It also reflects a real-world task: identifying the type of a wine from laboratory measurements.

The Wine dataset contains no missing values and is well documented, which minimizes preprocessing and lets us focus on model training. It can be loaded directly with the load_wine() function from scikit-learn. It includes 13 chemical features (e.g., alcohol, flavanoids, proline) and three wine classes (class_0, class_1, class_2). In summary, the Wine dataset is a convenient choice for evaluating and comparing the models in this assignment.
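The properties described above can be checked directly. The snippet below is a small sketch that only inspects the dataset; the variable names `class_counts` and `has_missing` are chosen here for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_wine

data = load_wine()

# Shape is (n_samples, n_features)
print(data.data.shape)

# Names of the chemical measurements and of the target classes
print(list(data.feature_names))
print(list(data.target_names))

# Number of samples per class label
class_counts = Counter(data.target.tolist())
print(sorted(class_counts.items()))

# Check for missing values in the feature matrix
has_missing = bool(np.isnan(data.data).any())
print("missing values:", has_missing)
```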

In [11]:
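from sklearn.base import BaseEstimator, ClassifierMixin

# Define a standalone multi-layer perceptron (not used by the ensemble below)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)


# Create the AdaBoost model
# Note: the weight update and the np.sign vote below follow the binary AdaBoost
# formulation for labels in {-1, +1}; the Wine labels are {0, 1, 2}.
class AdaBoostMLP(BaseEstimator, ClassifierMixin):
    def __init__(self, base_estimator=None, n_estimators=50, random_state=None):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.models = []

    def fit(self, X, y):
        np.random.seed(self.random_state)
        self.models = []
        # Start from uniform sample weights
        weights = np.ones(X.shape[0]) / X.shape[0]
        for _ in range(self.n_estimators):
            # Note: the same estimator object is refit every round (no clone),
            # so every entry in self.models refers to the last fitted network.
            model = self.base_estimator
            # Draw a weighted bootstrap sample instead of passing sample weights
            indices = np.random.choice(np.arange(X.shape[0]), size=X.shape[0], p=weights)
            model.fit(X[indices], y[indices])
            predictions = model.predict(X)
            incorrect = (predictions != y)
            # Weighted training error and the resulting model weight
            error = np.dot(weights, incorrect) / np.sum(weights)
            alpha = 0.5 * np.log((1 - error) / (error + 1e-10))
            # Binary-style update: only meaningful for labels in {-1, +1}
            weights = weights * np.exp(-alpha * y * predictions)
            weights /= np.sum(weights)
            self.models.append((model, alpha))
        return self

    def predict(self, X):
        model_predictions = np.array([model.predict(X) for model, alpha in self.models])
        weighted_predictions = np.zeros(model_predictions.shape[1])
        for model, alpha in self.models:
            weighted_predictions += alpha * model.predict(X)
        # np.sign returns only -1, 0, or 1, so label 2 can never be predicted
        return np.sign(weighted_predictions)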

# Create and train AdaBoost model
ada_boost_mlp = AdaBoostMLP(base_estimator=MLPClassifier(hidden_layer_sizes=(1,), max_iter=1000, random_state=42), n_estimators=50, random_state=42)
ada_boost_mlp.fit(X_train, y_train)

# Make predictions
y_pred_ada_mlp = ada_boost_mlp.predict(X_test)

# Report results
print("AdaBoost with MLP Classifier")
print("Accuracy:", accuracy_score(y_test, y_pred_ada_mlp))
print(classification_report(y_test, y_pred_ada_mlp))





AdaBoost with MLP Classifier
Accuracy: 0.35185185185185186
              precision    recall  f1-score   support

           0       0.35      1.00      0.52        19
           1       0.00      0.00      0.00        21
           2       0.00      0.00      0.00        14

    accuracy                           0.35        54
   macro avg       0.12      0.33      0.17        54
weighted avg       0.12      0.35      0.18        54





**Explanation and Advantages of the Model**

AdaBoost is an ensemble method that combines weak learners into a strong classifier. In this section, we applied the AdaBoost algorithm using a multi-layer perceptron (MLP) as the base classifier. An MLP is an artificial neural network made of layers of neurons, where each neuron computes a weighted sum of its inputs and passes it through a non-linear activation function. AdaBoost, in turn, increases the weights of misclassified samples in each iteration so that later learners concentrate on the samples earlier learners got wrong.

**Advantages**

**Flexibility and Power:** MLP can learn complex, non-linear relationships. When combined with AdaBoost, the model can capture various patterns more effectively.

**Adaptive Learning:** AdaBoost adjusts the weights of misclassified samples in each iteration, improving the model's performance on these samples.

**Ensemble Approach:** AdaBoost increases overall performance by combining many weak learners.

**Results and Evaluation**

The model was trained and tested on the Wine dataset. However, the results showed lower-than-expected accuracy and classification performance:


**AdaBoost with MLP Classifier**

Accuracy: 0.35185185185185186

                  precision    recall  f1-score   support

               0       0.35      1.00      0.52        19
               1       0.00      0.00      0.00        21
               2       0.00      0.00      0.00        14

        accuracy                           0.35        54
       macro avg       0.12      0.33      0.17        54
    weighted avg       0.12      0.35      0.18        54

**Evaluation of Results**

**Low Performance:** The model's accuracy was 35% (19 of 54 test samples), which is exactly what always predicting class 0 would achieve on this test set.

**Collapsed Predictions:** Recall for class 0 was 1.00, while recall for class 1 and class 2 was 0.00, and precision for class 0 equals the share of class 0 in the test set (19/54 ≈ 0.35). Together these values show that the model predicted class 0 for every test sample and never recognized the other two classes.

**F1-Score:** The macro-averaged F1-score of 0.17 confirms that the model was unsuccessful in the classification task.
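The metric pattern in this report can be reproduced with a constant predictor. The sketch below rebuilds a label vector from the class supports shown above (19, 21, and 14) and scores a hypothetical classifier that always answers class 0; `y_const` is an illustrative stand-in, not the output of the trained model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Test-set supports taken from the report: 19, 21 and 14 samples
y_true = np.repeat([0, 1, 2], [19, 21, 14])

# Assumption for illustration: a classifier that always predicts class 0
y_const = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_const)
# zero_division=0 reports 0.0 for classes that are never predicted
prec = precision_score(y_true, y_const, average=None, zero_division=0)
rec = recall_score(y_true, y_const, average=None, zero_division=0)

print("accuracy:", round(acc, 2))
print("precision per class:", prec.round(2))
print("recall per class:", rec.round(2))
```

Passing `zero_division=0` also avoids the undefined-metric warnings that scikit-learn emits when a class receives no predictions.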

**Recommendations**

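**Implementation:** The custom AdaBoostMLP class uses the binary AdaBoost update, which assumes labels in {-1, +1}, and combines votes with np.sign, which can only return -1, 0, or 1. On the three-class Wine labels this cannot work as intended, so a multi-class variant such as SAMME is needed. The base estimator should also be cloned in every round so that each round keeps its own fitted network.

**Data Imbalance:** The classes are only mildly imbalanced (59, 71, and 48 samples), so imbalance alone does not explain the result. On more strongly imbalanced data, techniques such as resampling or class weighting can be used.

**Model Parameters:** Optimizing the parameters of the MLP and AdaBoost can improve performance. A single hidden neuron, as used here, is a very weak base learner.

**Alternative Methods:** Other ensemble methods, or an MLP used directly, may be considered instead of AdaBoost.

Using AdaBoost with an MLP is theoretically a powerful method, but in this application it did not perform as expected. To achieve better results, the boosting implementation and the model configuration need to be improved.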
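As one possible direction, the sketch below shows a multi-class boosting loop in the style of SAMME, using weighted resampling and a fresh clone of the base network in each round. It is a minimal illustration under stated assumptions, not a tuned implementation: the class name `SammeResamplingBoost`, the hidden layer size, and the number of rounds are all chosen here for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


class SammeResamplingBoost(ClassifierMixin, BaseEstimator):
    """Hypothetical SAMME-style booster that resamples instead of weighting."""

    def __init__(self, estimator=None, n_estimators=10, random_state=None):
        self.estimator = estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.classes_ = np.unique(y)
        n_classes = len(self.classes_)
        n_samples = X.shape[0]
        weights = np.full(n_samples, 1.0 / n_samples)
        self.models_, self.alphas_ = [], []
        for _ in range(self.n_estimators):
            # Weighted bootstrap sample; a fresh clone keeps rounds independent
            idx = rng.choice(n_samples, size=n_samples, p=weights)
            model = clone(self.estimator).fit(X[idx], y[idx])
            incorrect = model.predict(X) != y
            error = np.clip(np.dot(weights, incorrect), 1e-10, 1 - 1e-10)
            if error >= 1.0 - 1.0 / n_classes:
                # No better than chance: drop this round, reset the weights
                weights = np.full(n_samples, 1.0 / n_samples)
                continue
            # SAMME model weight; the log(K - 1) term handles K classes
            alpha = np.log((1 - error) / error) + np.log(n_classes - 1)
            # Increase the weight of misclassified samples only
            weights = weights * np.exp(alpha * incorrect)
            weights /= weights.sum()
            self.models_.append(model)
            self.alphas_.append(alpha)
        return self

    def predict(self, X):
        # Each model adds its alpha to the class it votes for
        votes = np.zeros((X.shape[0], len(self.classes_)))
        for model, alpha in zip(self.models_, self.alphas_):
            pred = model.predict(X)
            for k, label in enumerate(self.classes_):
                votes[pred == label, k] += alpha
        return self.classes_[np.argmax(votes, axis=1)]


# Same split and scaling as in the first cell of the notebook
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

base = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)
boost = SammeResamplingBoost(estimator=base, n_estimators=10, random_state=42)
boost.fit(X_tr, y_tr)
y_pred_boost = boost.predict(X_te)
acc_boost = accuracy_score(y_te, y_pred_boost)
print("Accuracy:", acc_boost)
```

Voting with an argmax over per-class scores, rather than taking the sign of a weighted sum, is what allows all three labels to be predicted.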

In [12]:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_wine

class PerceptronNode(BaseEstimator, ClassifierMixin):
    def __init__(self):
        self.model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier



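# Create a Random Forest intended to use a Perceptron at each node
# Note: RandomForestClassifier.fit does not call a `_build_tree` method, and
# PerceptronNode is never fitted, so this class trains and predicts exactly
# like a plain RandomForestClassifier.

class RandomForestWithPerceptron(RandomForestClassifier):
    def _build_tree(self, *args, **kwargs):
        tree = super()._build_tree(*args, **kwargs)
        tree.estimator = PerceptronNode()
        return tree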

# Create the model

rf_with_perceptron = RandomForestWithPerceptron(n_estimators=100, random_state=42)

# Model training

rf_with_perceptron.fit(X_train, y_train)

# Make predictions

y_pred_rf = rf_with_perceptron.predict(X_test)

# Report results

print("Random Forest with Perceptron at each node")
print("Accuracy:", accuracy_score(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))






Random Forest with Perceptron at each node
Accuracy: 1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00        14

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54



# **Random Forest with Perceptron Nodes**

**Model Explanation:**

In this part of the assignment, we set out to build a Random Forest model in which each decision node uses a Perceptron (MLP) as its base classifier. A Random Forest is an ensemble method that trains many decision trees on bootstrap samples and combines their votes to get a more accurate and stable prediction. The intent of placing Perceptron nodes in each tree was to enhance the model's ability to capture complex patterns in the data. Note that in the code above the PerceptronNode is never fitted or used for prediction, so the trained model behaves like a standard RandomForestClassifier.
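To make the idea of a perceptron at each node concrete, here is a minimal sketch of a single oblique tree whose internal nodes route samples with a linear `Perceptron`. It is a hypothetical illustration and not the code used in this notebook: the `PerceptronTree` class, its parameters, and the choice of splitting the node's majority class from the rest are all assumptions made for the example. A forest would additionally train many such trees on bootstrap samples and combine their votes, which is omitted here.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


class PerceptronTree:
    """Hypothetical oblique tree: each internal node holds a Perceptron."""

    def __init__(self, max_depth=3, min_samples=5):
        self.max_depth = max_depth
        self.min_samples = min_samples

    def fit(self, X, y):
        self.root_ = self._grow(X, y, depth=0)
        return self

    def _grow(self, X, y, depth):
        labels, counts = np.unique(y, return_counts=True)
        majority = labels[np.argmax(counts)]
        # Stop on pure nodes, small nodes, or at the depth limit
        if depth >= self.max_depth or len(labels) == 1 or len(y) < self.min_samples:
            return {"leaf": majority}
        # Node task: separate the node's majority class from everything else
        target = (y == majority).astype(int)
        clf = Perceptron(max_iter=1000, random_state=0).fit(X, target)
        left = clf.predict(X) == 1
        # A split that sends all samples one way is useless: make a leaf
        if left.all() or not left.any():
            return {"leaf": majority}
        return {
            "clf": clf,
            "left": self._grow(X[left], y[left], depth + 1),
            "right": self._grow(X[~left], y[~left], depth + 1),
        }

    def _predict_one(self, x):
        node = self.root_
        # Follow the perceptron decisions down to a leaf
        while "leaf" not in node:
            goes_left = node["clf"].predict(x.reshape(1, -1))[0] == 1
            node = node["left"] if goes_left else node["right"]
        return node["leaf"]

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])


# Same split and scaling as in the first cell of the notebook
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

tree = PerceptronTree(max_depth=3).fit(X_tr, y_tr)
y_pred_tree = tree.predict(X_te)
acc_tree = accuracy_score(y_te, y_pred_tree)
print("Accuracy:", acc_tree)
```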

**Advantages of the Model**

**Enhanced Learning Capability:** In principle, Perceptron nodes at each decision point let the model learn more complex, non-linear relationships within the data, which a simple decision tree might miss.

**Robustness:** Random Forest models are known for their robustness and their ability to generalize well to unseen data. Perceptron nodes are intended to build on this property.

**Ensemble Power:** Combining multiple trees in a Random Forest leverages the power of ensemble learning, improving overall model performance and reducing the risk of overfitting.

**Results and Evaluation**

The model was trained and tested on the Wine dataset, yielding excellent results:


**Random Forest with Perceptron at each node**

Accuracy: 1.0

                  precision    recall  f1-score   support

               0       1.00      1.00      1.00        19
               1       1.00      1.00      1.00        21
               2       1.00      1.00      1.00        14

        accuracy                           1.00        54
       macro avg       1.00      1.00      1.00        54
    weighted avg       1.00      1.00      1.00        54

**Evaluation of Results:**

**High Performance:** The model achieved 100% accuracy, indicating that all 54 test samples were correctly classified.

**Balanced Classification:** The precision, recall, and F1-scores for all classes were 1.00, demonstrating that the model learned to separate all three classes in the dataset.

**Effective Ensemble Learning:** The ensemble clearly outperformed the AdaBoost model from the previous section on the same split. Because the PerceptronNode is never fitted, this gain should be attributed to the standard Random Forest rather than to Perceptron nodes.

**Conclusion:**

The model achieved perfect classification on the test set. Since the Perceptron component does not take part in training or prediction, the result demonstrates the strength of a standard Random Forest on this dataset rather than of a hybrid of neural networks and forests. A single 54-sample test split is also a small basis for a claim of perfect generalization. These results underscore the importance of selecting the right combination of algorithms and base classifiers, and of verifying that the implementation matches the intended design.
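A single train/test split on 178 samples gives a fairly noisy estimate, so it can help to cross-validate the candidate models. The sketch below is one way to do this with scikit-learn; the two candidates, the five stratified folds, and the variable names are assumptions chosen for illustration, and the scores it prints are not results from this assignment.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Stratified folds keep the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # The pipeline refits the scaler inside each fold to avoid leakage
    "scaled MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42),
    ),
}

cv_scores = {
    name: cross_val_score(model, X, y, cv=cv) for name, model in candidates.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```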