# Classification with Iris and Wine Datasets

This notebook trains `RandomForestClassifier` and `GradientBoostingClassifier` from scikit-learn on the Iris and Wine datasets and evaluates each model on a held-out test set.

The notebook is divided into two problems:
1. Classification with the Iris dataset
2. Classification with the Wine dataset

## Problem 1: Classification with the Iris Dataset

**Objective:** Learn how to classify the Iris dataset using `RandomForestClassifier` and `GradientBoostingClassifier`.

In [1]:
# Step 1: Setup and Import Libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

In [2]:
# Step 2: Load the Iris Dataset
# load_iris() returns a Bunch: .data is the (150, 4) feature matrix,
# .target holds the integer class labels (0, 1, 2)
iris_data = load_iris()
X_iris, y_iris = iris_data.data, iris_data.target
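
Displaying the whole `Bunch` prints every row, which is hard to read. As an alternative, the sketch below prints only the shapes and the feature and class names. It reloads the dataset so that it can run on its own.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Shapes of the feature matrix and the label vector
print(iris_data.data.shape)    # (150, 4)
print(iris_data.target.shape)  # (150,)

# Human-readable names for the four columns and the three classes
print(iris_data.feature_names)
print(list(iris_data.target_names))
```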

In [3]:
iris_data 

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [4]:
# Step 3: Split the Data into Training and Test Sets
# Hold out 30% of the samples for testing; random_state fixes the shuffle
# so the split is reproducible
X_iris_train, X_iris_test, y_iris_train, y_iris_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42)
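
The split above is purely random, so the test set is not guaranteed to contain each class in equal proportion. One possible variant, shown here only as a sketch and not as the notebook's method, passes `stratify=` to `train_test_split` so that class proportions are preserved. The `*_s` variable names are hypothetical and chosen to avoid overwriting the notebook's variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)

# Same 70/30 split, but stratified on the labels
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris)

# Number of test samples per class; these should be (near-)equal
test_class_counts = np.bincount(y_test_s)
print(test_class_counts)
```

Using this variant would change the test set, so the scores reported in this notebook would no longer apply as printed.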

In [5]:
# Step 4: Train the Classifiers
# Both models use default hyperparameters; random_state makes training reproducible
rf_iris = RandomForestClassifier(random_state=42)
gb_iris = GradientBoostingClassifier(random_state=42)

rf_iris.fit(X_iris_train, y_iris_train)
gb_iris.fit(X_iris_train, y_iris_train)
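
Both ensembles expose a `feature_importances_` attribute once fitted, which gives a rough view of which measurements drive the predictions. The following sketch refits the two models on the same split and prints the importances. The `models` dictionary is an illustrative convenience, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris_data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_data.data, iris_data.target, test_size=0.3, random_state=42)

# Hypothetical container so both models can be handled in one loop
models = {
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    # Importances are normalized so that they sum to 1 across features
    for feature, importance in zip(iris_data.feature_names,
                                   model.feature_importances_):
        print(f'  {feature}: {importance:.3f}')
```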

In [6]:
# Step 5: Evaluate the Models
# Predict on the held-out test set, then report accuracy and per-class metrics
y_iris_pred_rf = rf_iris.predict(X_iris_test)
y_iris_pred_gb = gb_iris.predict(X_iris_test)

print('Iris Dataset - Random Forest')
print(f'Accuracy: {accuracy_score(y_iris_test, y_iris_pred_rf):.2f}')
print(classification_report(y_iris_test, y_iris_pred_rf))

print('Iris Dataset - Gradient Boosting')
print(f'Accuracy: {accuracy_score(y_iris_test, y_iris_pred_gb):.2f}')
print(classification_report(y_iris_test, y_iris_pred_gb))

Iris Dataset - Random Forest
Accuracy: 1.00
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Iris Dataset - Gradient Boosting
Accuracy: 1.00
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
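
A perfect score on a 45-sample test set comes from a single split and may be optimistic. As a complementary check, the sketch below uses `cross_val_score` with 5 folds to average accuracy over several splits. The choice of 5 folds and the `cv_scores` dictionary are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X_iris, y_iris = load_iris(return_X_y=True)

cv_scores = {}
for name, model in [
    ('Random Forest', RandomForestClassifier(random_state=42)),
    ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
]:
    # For classifiers, an integer cv uses stratified k-fold splitting
    scores = cross_val_score(model, X_iris, y_iris, cv=5)
    cv_scores[name] = scores
    print(f'{name}: mean={scores.mean():.3f}, std={scores.std():.3f}')
```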



## Problem 2: Classification with the Wine Dataset

**Objective:** Learn how to classify the Wine dataset using `RandomForestClassifier` and `GradientBoostingClassifier`.

In [7]:
# Step 1: Setup and Import Libraries
# Only load_wine is new; the other imports repeat Problem 1 so that
# this section can also be run on its own
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

In [8]:
# Step 2: Load the Wine Dataset
# load_wine() returns a Bunch: .data is the (178, 13) feature matrix,
# .target holds the integer class labels (0, 1, 2)
wine_data = load_wine()
X_wine, y_wine = wine_data.data, wine_data.target
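
The Wine features are measured on very different scales. A compact way to see this, sketched below, is to print each feature's minimum and maximum. Tree-based ensembles split on per-feature thresholds, so they generally do not need feature scaling, which is why the notebook can skip that step.

```python
from sklearn.datasets import load_wine

wine_data = load_wine()
print(wine_data.data.shape)  # (178, 13)
print(list(wine_data.target_names))

# Per-feature minimum and maximum, to show how much the scales differ
feature_min = wine_data.data.min(axis=0)
feature_max = wine_data.data.max(axis=0)
for name, lo, hi in zip(wine_data.feature_names, feature_min, feature_max):
    print(f'{name:>30}: {lo:8.2f} to {hi:8.2f}')
```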

In [9]:
wine_data

{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
         1.065e+03],
        [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
         1.050e+03],
        [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
         1.185e+03],
        ...,
        [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
         8.350e+02],
        [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
         8.400e+02],
        [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
         5.600e+02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

In [10]:
# Step 3: Split the Data into Training and Test Sets
# Same 70/30 split and random_state as in Problem 1
X_wine_train, X_wine_test, y_wine_train, y_wine_test = train_test_split(
    X_wine, y_wine, test_size=0.3, random_state=42)

In [11]:
# Step 4: Train the Classifiers
# Default hyperparameters again; random_state makes training reproducible
rf_wine = RandomForestClassifier(random_state=42)
gb_wine = GradientBoostingClassifier(random_state=42)

rf_wine.fit(X_wine_train, y_wine_train)
gb_wine.fit(X_wine_train, y_wine_train)

In [12]:
# Step 5: Evaluate the Models
# Predict on the held-out test set, then report accuracy and per-class metrics
y_wine_pred_rf = rf_wine.predict(X_wine_test)
y_wine_pred_gb = gb_wine.predict(X_wine_test)

print('Wine Dataset - Random Forest')
print(f'Accuracy: {accuracy_score(y_wine_test, y_wine_pred_rf):.2f}')
print(classification_report(y_wine_test, y_wine_pred_rf))

print('Wine Dataset - Gradient Boosting')
print(f'Accuracy: {accuracy_score(y_wine_test, y_wine_pred_gb):.2f}')
print(classification_report(y_wine_test, y_wine_pred_gb))

Wine Dataset - Random Forest
Accuracy: 1.00
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00        14

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54

Wine Dataset - Gradient Boosting
Accuracy: 0.91
              precision    recall  f1-score   support

           0       0.86      1.00      0.93        19
           1       0.90      0.90      0.90        21
           2       1.00      0.79      0.88        14

    accuracy                           0.91        54
   macro avg       0.92      0.90      0.90        54
weighted avg       0.91      0.91      0.91        54
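
The classification report shows that Gradient Boosting loses recall on class 2, but not which class those samples were assigned to. A confusion matrix answers that question. The sketch below refits the model on the same split so that it is self-contained. Exact counts may vary with the scikit-learn version, so no expected output is shown.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X_wine, y_wine = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_wine, y_wine, test_size=0.3, random_state=42)

gb_wine = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, gb_wine.predict(X_test))
print(cm)
```

Off-diagonal entries are the misclassified samples; the row gives the true class and the column gives the predicted class.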



### Key Learning Points

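- **Loading Data**: Built-in scikit-learn datasets such as `load_iris` and `load_wine` return an object whose `data` and `target` attributes hold the features and labels.
- **Model Training**: `RandomForestClassifier` and `GradientBoostingClassifier` are both tree ensembles and share the same `fit`/`predict` interface.
- **Evaluation**: `accuracy_score` and `classification_report` summarize test-set performance. Both models scored 1.00 on Iris; on Wine, Random Forest scored 1.00 and Gradient Boosting 0.91.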
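
Both problems follow the same load, split, train, and evaluate steps, so the workflow can be folded into one function. The sketch below is one possible way to do that. `compare_classifiers` is a hypothetical helper name, and its defaults mirror the `test_size=0.3` and `random_state=42` used throughout this notebook.

```python
from sklearn.datasets import load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_classifiers(X, y, test_size=0.3, random_state=42):
    """Hypothetical helper: split, fit both ensembles, return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    models = {
        'Random Forest': RandomForestClassifier(random_state=random_state),
        'Gradient Boosting': GradientBoostingClassifier(random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


for dataset_name, loader in [('Iris', load_iris), ('Wine', load_wine)]:
    X, y = loader(return_X_y=True)
    for model_name, acc in compare_classifiers(X, y).items():
        print(f'{dataset_name} - {model_name}: {acc:.2f}')
```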