# Wine Quality Classification

This notebook shows how to load the Wine Quality dataset, preprocess it, and train a Random Forest classifier to predict whether a wine is of good quality.

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

## Step 1: Load the Wine Quality Dataset

We load the red-wine file of the Wine Quality dataset from the UCI Machine Learning Repository and display its first few rows. The file is semicolon-delimited, so we pass `sep=';'` to `pd.read_csv`.

In [2]:
# Load the Wine Quality dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine_data = pd.read_csv(url, sep=';')
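
To see what the separator changes, here is a minimal sketch on a tiny in-memory sample. The sample text, its three columns, and its values are made up for illustration and are not rows from the real file; `raw`, `wrong`, and `right` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical three-column sample in a semicolon-delimited layout
# (illustrative values, not rows from the real file).
raw = "fixed acidity;alcohol;quality\n7.4;9.4;5\n8.1;10.2;7\n"

# With the default comma separator, each row is read as one long field.
wrong = pd.read_csv(io.StringIO(raw))

# With sep=';', the fields are split into separate columns.
right = pd.read_csv(io.StringIO(raw), sep=';')

print(wrong.shape)
print(right.shape)
print(list(right.columns))
```

If a loaded DataFrame has a single column whose name contains semicolons, a wrong separator is a likely cause.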

In [3]:
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(wine_data.head())

First few rows of the dataset:
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8 

## Step 2: Preprocess the Data

For simplicity, we'll treat wine quality as a binary label: a wine is 'good' (1) if its quality score is 7 or higher, and 'not good' (0) otherwise.

In [4]:
# Preprocess the data
# Convert wine quality into binary labels: 1 if the score is 7 or higher, else 0
# (this overwrites the original scores, so run this cell only once per load)
wine_data['quality'] = (wine_data['quality'] >= 7).astype(int)
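
A threshold like this can leave the two classes very unequal in size, so it may be worth checking the class balance before training. The sketch below applies the same rule to a short list of hypothetical scores (illustrative values only, not taken from the dataset); `scores`, `labels`, `counts`, and `shares` are names introduced here.

```python
import pandas as pd

# Hypothetical quality scores (illustrative only, not from the dataset).
scores = pd.Series([5, 5, 6, 7, 5, 6, 8, 4, 6, 5], name="quality")

# Same rule as in the notebook: 1 for scores of 7 or higher, otherwise 0.
labels = (scores >= 7).astype(int)

# Number of wines per class, and the share of each class.
counts = labels.value_counts().sort_index()
shares = labels.value_counts(normalize=True).sort_index()

print(counts)
print(shares)
```

On the real data, the equivalent check would be `wine_data['quality'].value_counts(normalize=True)` after the conversion.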

## Step 3: Split the Dataset

Next, we'll separate features (X) and the target variable (y), and split the dataset into training and testing sets.

In [5]:
# Separate features (X) and target variable (y)
X = wine_data.drop('quality', axis=1)
y = wine_data['quality']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
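
When one class is much rarer than the other, a purely random split can give the test set a different class ratio than the training set. One possible variation, not used in this notebook, is to pass `stratify` to `train_test_split`. The sketch below uses hypothetical imbalanced data (`X_demo`, `y_demo`) rather than the wine data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 100 samples, 20 of them positive.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 20 + [0] * 80)

# stratify=y_demo asks train_test_split to keep the class proportions
# the same in the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("positive share overall:", y_demo.mean())
print("positive share in train:", y_tr.mean())
print("positive share in test:", y_te.mean())
```

Applied to this notebook, that would mean adding `stratify=y` to the existing call, which changes which rows land in each set.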

## Step 4: Apply Random Forest Classifier

Now, let's initialize a Random Forest classifier, train it on the training data, and predict the labels for the test set.

In [6]:
# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier on the training data
rf_classifier.fit(X_train, y_train)

# Predict the labels for the test set
y_pred = rf_classifier.predict(X_test)
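
Beyond hard predictions, a fitted Random Forest in scikit-learn also exposes per-feature importances and class probabilities, which can help interpret the model. The sketch below is a minimal illustration on synthetic stand-in data from `make_classification`, not on the wine dataset; `forest`, `X_demo`, and `y_demo` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not the wine dataset): 200 samples, 5 features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = forest.feature_importances_

# Class probabilities for the first three samples; columns follow forest.classes_.
proba = forest.predict_proba(X_demo[:3])

print(importances)
print(proba)
```

In this notebook, the same attributes would be read from `rf_classifier`, for example by pairing `rf_classifier.feature_importances_` with `X.columns`.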

## Step 5: Evaluate the Classifier

Finally, let's calculate the accuracy of the classifier on the test set, that is, the fraction of test wines whose label it predicts correctly.

In [7]:
# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
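
Accuracy alone can look good on imbalanced labels even when the rarer class is often missed, so it can help to compare it with a majority-class baseline and to look at the confusion matrix, precision, and recall. The sketch below uses hypothetical label lists (not results from this notebook) to show the idea; all `*_demo` names are made up for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels (not results from this notebook): 8 'not good', 2 'good'.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# A baseline that always predicts the majority class.
y_base_demo = [0] * len(y_true_demo)

acc = accuracy_score(y_true_demo, y_pred_demo)
base_acc = accuracy_score(y_true_demo, y_base_demo)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Precision and recall for the positive ('good') class.
prec = precision_score(y_true_demo, y_pred_demo)
rec = recall_score(y_true_demo, y_pred_demo)

print("accuracy:", acc, "baseline accuracy:", base_acc)
print(cm)
print("precision:", prec, "recall:", rec)
```

Here the model beats the baseline on accuracy only slightly while finding just half of the 'good' wines, which accuracy by itself would not reveal. The same functions could be called with `y_test` and `y_pred` from the cells above.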