
# Wine Dataset Classification with Various Models

This notebook demonstrates classification of the Wine dataset using multiple machine learning models.
The objective is to compare the performance of different models on the Wine dataset to find the most
accurate and efficient model for classifying wine types based on their characteristics.

## Models Used
- Logistic Regression
- Random Forest
- XGBoost
- K-Nearest Neighbors (KNN)

Each model is trained on the same train/test split and evaluated with the accuracy score and a per-class classification report.
Features are standardized before fitting the scale-sensitive models (Logistic Regression and KNN).


## Import Libraries

In [1]:

# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report


## Load the Wine Dataset

In [2]:

# Load the Wine dataset
data = load_wine()
X = data.data
y = data.target
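
Before modeling, it can help to check what was loaded. The following is a minimal sketch, not part of the original notebook, that prints the shape of the feature matrix, the class names, and the number of samples per class. It reloads the data so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_wine

data = load_wine()
X, y = data.data, data.target

# Number of samples (rows) and features (columns)
print("Shape of X:", X.shape)

# Class names and how many samples belong to each class
print("Classes:", ", ".join(data.target_names))
print("Samples per class:", np.bincount(y))

# A few of the feature names, to see what the columns represent
print("First features:", ", ".join(data.feature_names[:5]))
```

The classes are not perfectly balanced, which is one reason a stratified split is a reasonable choice for this dataset.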


## Split the Dataset into Training and Testing Sets

In [3]:

# Hold out 30% of the samples for testing. stratify=y keeps the class proportions
# the same in both subsets, and random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
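
The effect of `stratify=y` can be checked directly. The sketch below is an illustrative addition that repeats the same split and compares the class proportions in the full data, the training set, and the test set. The variable names mirror the cell above.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Compare class proportions in the full data and in each subset
for name, labels in [("full", y), ("train", y_train), ("test", y_test)]:
    counts = np.bincount(labels)
    proportions = np.round(counts / len(labels), 2)
    print(f"{name:>5}: n={len(labels):3d}, counts={counts}, proportions={proportions}")
```

The test set holds 54 samples, which matches the `support` column in the classification reports further down.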


## Scale the Features

In [4]:

# Standardize features by removing the mean and scaling to unit variance,
# so that features measured on large scales do not dominate distance- or gradient-based models.
scaler = StandardScaler()

# Fit the scaler on the training data only, then apply the same transformation to the test set.
# The scaled copies get new names so that X_train and X_test keep their original values.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
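
To see what the scaler does, the sketch below (an illustrative addition, self-contained so it can run on its own) checks the per-feature mean and standard deviation after scaling. Because the scaler is fitted on the training data only, the training features come out with mean 0 and unit variance, while the test features are only approximately standardized.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training features: mean ~0 and standard deviation ~1 by construction
train_std = X_train_scaled.std(axis=0)
print("train mean (max abs):", np.abs(X_train_scaled.mean(axis=0)).max())
print("train std (min, max):", train_std.min(), train_std.max())

# Test features reuse the training statistics, so their mean is close to 0 but not exactly 0
print("test mean (max abs):", np.abs(X_test_scaled.mean(axis=0)).max())
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing step.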


## Logistic Regression

In [5]:

# Train logistic regression on the scaled features and evaluate it on the test set
logistic_regression = LogisticRegression(max_iter=200)
logistic_regression.fit(X_train_scaled, y_train)
y_pred_lr = logistic_regression.predict(X_test_scaled)

print("Model: Logistic Regression")
print(f"Accuracy: {accuracy_score(y_test, y_pred_lr):.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_lr))
print("-" * 50)


Model: Logistic Regression
Accuracy: 0.98
Classification Report:
              precision    recall  f1-score   support

           0       0.95      1.00      0.97        18
           1       1.00      0.95      0.98        21
           2       1.00      1.00      1.00        15

    accuracy                           0.98        54
   macro avg       0.98      0.98      0.98        54
weighted avg       0.98      0.98      0.98        54

--------------------------------------------------
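
The report above is consistent with a single class 1 wine being predicted as class 0: precision for class 0 is 18/19 ≈ 0.95 and recall for class 1 is 20/21 ≈ 0.95. A confusion matrix shows such errors directly. The sketch below is an illustrative addition that rebuilds the same split and model so it can run on its own. `confusion_matrix` is not imported in the original notebook.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count the misclassified samples
cm = confusion_matrix(y_test, y_pred)
print(cm)
```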


## Random Forest

In [6]:

# Train a random forest with default hyperparameters.
# Tree-based models are insensitive to feature scaling, so X_train is used directly.
random_forest = RandomForestClassifier()
random_forest.fit(X_train, y_train)
y_pred_rf = random_forest.predict(X_test)

print("Model: Random Forest")
print(f"Accuracy: {accuracy_score(y_test, y_pred_rf):.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_rf))
print("-" * 50)


Model: Random Forest
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00        15

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54

--------------------------------------------------
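
A fitted random forest also exposes impurity-based feature importances, which give a rough view of which wine characteristics drive the predictions. The sketch below is an illustrative addition. It fixes `random_state=42` only to make the ranking repeatable, which is an assumption, since the original cell leaves it unset.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42, stratify=data.target
)

# random_state is set here only for repeatability (assumption, not in the original cell)
forest = RandomForestClassifier(random_state=42)
forest.fit(X_train, y_train)

# Sort feature indices from most to least important
order = np.argsort(forest.feature_importances_)[::-1]
for idx in order[:5]:
    print(f"{data.feature_names[idx]:<30} {forest.feature_importances_[idx]:.3f}")
```

Impurity-based importances sum to 1 and can shift from one run to another on a dataset this small, so the ranking is best read as indicative.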


## XGBoost

In [7]:

# Train a gradient-boosted tree classifier; mlogloss is the multiclass log loss.
# The use_label_encoder argument is omitted because recent XGBoost versions ignore it and emit a warning.
xgboost = XGBClassifier(eval_metric='mlogloss')
xgboost.fit(X_train, y_train)
y_pred_xgb = xgboost.predict(X_test)

print("Model: XGBoost")
print(f"Accuracy: {accuracy_score(y_test, y_pred_xgb):.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_xgb))
print("-" * 50)


Model: XGBoost
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00        15

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54

--------------------------------------------------


## K-Nearest Neighbors

In [8]:

# Train KNN with the default n_neighbors=5.
# KNN relies on distances between samples, so it is trained on the scaled features.
knn = KNeighborsClassifier()
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)

print("Model: KNN")
print(f"Accuracy: {accuracy_score(y_test, y_pred_knn):.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_knn))
print("-" * 50)


Model: KNN
Accuracy: 0.94
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      0.86      0.92        21
           2       0.83      1.00      0.91        15

    accuracy                           0.94        54
   macro avg       0.94      0.95      0.94        54
weighted avg       0.95      0.94      0.94        54

--------------------------------------------------
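
With only 54 test samples, one misclassified wine changes accuracy by about 0.02 (1/54), so the gap between 0.98 and 1.00 above comes down to a single sample. Cross-validation gives a steadier comparison. The sketch below is an illustrative addition rather than part of the original analysis. It uses 5-fold stratified cross-validation, wraps the scale-sensitive models in a pipeline so the scaler is refitted within each fold, and leaves XGBoost out to keep the example dependent on scikit-learn only. The fold count and `random_state` values are assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Pipelines refit the scaler on the training part of every fold
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Same folds for every model, so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name:<20} mean={scores[name].mean():.3f} std={scores[name].std():.3f}")
```

The mean and standard deviation across folds indicate whether the differences seen on the single test split hold up across other splits.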
