# **Wine Classification Using RandomForestClassifier**



---



## **Objective**

The objective is to build a model that classifies wines into predefined classes based on their measured characteristics. In this dataset each wine is described by 13 chemical features, such as alcohol, malic acid, ash, magnesium, flavanoids, color intensity, and proline, and belongs to one of three classes: Barolo, Grignolino, or Barbera.

RandomForestClassifier is an ensemble learning method that combines the predictions of many decision trees. Trained on a labeled dataset in which every wine already has a class, it learns the relationships between the input features and the corresponding wine classes.
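To make the idea of combining trees concrete, here is a minimal sketch of hard majority voting. The tree predictions are made-up values for illustration, and `majority_vote` is a hypothetical helper, not part of scikit-learn. scikit-learn's own `RandomForestClassifier` averages the per-tree class probabilities rather than counting hard votes, so treat this as a simplified picture of the principle.

```python
from collections import Counter

# Hypothetical predictions from five decision trees for three wines.
# Each inner list holds one tree's predicted class for every wine.
tree_predictions = [
    ["Barolo", "Grignolino", "Barbera"],
    ["Barolo", "Grignolino", "Grignolino"],
    ["Barolo", "Barbera", "Barbera"],
    ["Grignolino", "Grignolino", "Barbera"],
    ["Barolo", "Grignolino", "Barbera"],
]


def majority_vote(predictions_per_tree):
    """Return the most common predicted class for each sample."""
    votes_per_sample = zip(*predictions_per_tree)  # regroup votes by sample
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]


forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)
```

Individual trees disagree on every wine in this toy input, yet the combined vote settles on one class per wine, which is why an ensemble tends to be more stable than a single tree.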

## **Data Source**

This project uses the Wine dataset from the YBI Foundation GitHub repository. The URL is provided below:

https://github.com/YBI-Foundation/Dataset/blob/main/Wine.csv

## **Import Library**

In [None]:
import pandas as pd

## **Import Data**


In [None]:
wine = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Wine.csv')

## **Describe Data**

In [None]:
wine.head(8)

Unnamed: 0,class_label,class_name,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280,proline
0,1,Barolo,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,Barolo,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,Barolo,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,Barolo,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,Barolo,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735
5,1,Barolo,14.2,1.76,2.45,15.2,112,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450
6,1,Barolo,14.39,1.87,2.45,14.6,96,2.5,2.52,0.3,1.98,5.25,1.02,3.58,1290
7,1,Barolo,14.06,2.15,2.61,17.6,121,2.6,2.51,0.31,1.25,5.05,1.06,3.58,1295


In [None]:
wine.columns

Index(['class_label', 'class_name', 'alcohol', 'malic_acid', 'ash',
       'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids',
       'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue',
       'od280', 'proline'],
      dtype='object')

## **Define Target Variable 'y' and Feature Variables 'x'**

In [None]:
# Target: the wine class name
y = wine['class_name']
# Features: every remaining column (both label columns are dropped)
x = wine.drop(['class_label', 'class_name'], axis=1)

## **Train Test Split**

In [None]:
from sklearn.model_selection import train_test_split
# test_size defaults to 0.25; random_state fixes the shuffle so the split is repeatable
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2529)
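The split above is not stratified, so the class mix in the test set depends on the shuffle. A possible variation, sketched here under stated assumptions, passes `stratify` so both parts keep the same class proportions. The labels are a stand-in built from the class counts of the classic UCI Wine data (59, 71, 48), which is an assumption about `Wine.csv` rather than something read from it, and `rows` is only a placeholder for the feature rows.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Stand-in labels: 59 / 71 / 48 are the class counts of the classic UCI Wine
# data (an assumption about Wine.csv, not read from it).
labels = ["Barolo"] * 59 + ["Grignolino"] * 71 + ["Barbera"] * 48
rows = list(range(len(labels)))  # placeholder for the feature rows

rows_train, rows_test, y_train_s, y_test_s = train_test_split(
    rows,
    labels,
    test_size=0.25,     # the library default, written out explicitly
    stratify=labels,    # keep class proportions equal in both parts
    random_state=2529,
)

print(len(rows_train), len(rows_test))
print(Counter(y_test_s))
```

With `x` and `y` from the notebook, the equivalent call would be `train_test_split(x, y, stratify=y, random_state=2529)`.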

## **Modeling**

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
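Because `RandomForestClassifier()` is created without a `random_state`, each run grows slightly different trees and the reported accuracy can change from run to run. The sketch below shows one way to make the model repeatable. It uses scikit-learn's bundled wine dataset as a stand-in (assumed comparable to `Wine.csv`), and the seed `42` and the `_demo` variable names are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: scikit-learn's bundled wine dataset (assumed comparable to
# Wine.csv; it avoids a network download in this sketch).
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=2529)

# Two forests built with the same seed grow identical trees.
model_a = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
model_b = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

same_predictions = bool((model_a.predict(X_te) == model_b.predict(X_te)).all())
print("Identical predictions:", same_predictions)
```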

## **Model Training**

In [None]:
model.fit(x_train,y_train)
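Once a forest is fitted, its `feature_importances_` attribute gives a rough view of which measurements the trees relied on most. The sketch below is illustrative only: it fits on scikit-learn's bundled wine dataset as a stand-in for `Wine.csv` (an assumption), and the names `wine_demo`, `forest`, and `ranked` are hypothetical. On the notebook's own model the same idea would read `model.feature_importances_` paired with `x.columns`.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): the wine dataset bundled with scikit-learn.
wine_demo = load_wine()
forest = RandomForestClassifier(random_state=0).fit(wine_demo.data, wine_demo.target)

# Pair each feature name with its importance and sort from high to low.
ranked = sorted(
    zip(wine_demo.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Show the five features the forest relied on most.
for feature_name, importance in ranked[:5]:
    print(f"{feature_name:<30} {importance:.3f}")
```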

## **Model Prediction**

In [None]:
y_pred = model.predict(x_test)

In [None]:
y_pred

array(['Barbera', 'Barbera', 'Grignolino', 'Barolo', 'Grignolino',
       'Grignolino', 'Barolo', 'Barbera', 'Grignolino', 'Grignolino',
       'Grignolino', 'Grignolino', 'Barbera', 'Barolo', 'Barbera',
       'Barolo', 'Barbera', 'Barbera', 'Barolo', 'Grignolino', 'Barbera',
       'Grignolino', 'Barolo', 'Barbera', 'Barbera', 'Grignolino',
       'Grignolino', 'Barbera', 'Barolo', 'Grignolino', 'Grignolino',
       'Barolo', 'Grignolino', 'Barolo', 'Grignolino', 'Grignolino',
       'Barolo', 'Barbera', 'Grignolino', 'Barolo', 'Barolo', 'Barolo',
       'Barolo', 'Barbera', 'Grignolino'], dtype=object)

## **Model Accuracy**

In [None]:
from sklearn.metrics import classification_report, accuracy_score

In [None]:
accuracy_score(y_test, y_pred)

0.9777777777777777

In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

     Barbera       0.92      1.00      0.96        12
      Barolo       1.00      1.00      1.00        14
  Grignolino       1.00      0.95      0.97        19

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
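The figures in the report can be reproduced by hand. The confusion counts below are inferred from the report and the predictions above, not printed by the notebook: every Barbera and Barolo is classified correctly, and one of the 19 Grignolino wines is predicted as Barbera. The following sketch recomputes precision, recall, F1, and accuracy from those counts.

```python
# Confusion counts inferred from the report (rows: true class,
# columns: predicted class). The single error is one Grignolino
# predicted as Barbera.
classes = ["Barbera", "Barolo", "Grignolino"]
confusion = [
    [12, 0, 0],   # true Barbera
    [0, 14, 0],   # true Barolo
    [1, 0, 18],   # true Grignolino
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
accuracy = correct / total

metrics = {}
for i, name in enumerate(classes):
    true_positive = confusion[i][i]
    predicted_as_class = sum(row[i] for row in confusion)  # column sum
    actual_in_class = sum(confusion[i])                    # row sum (support)
    precision = true_positive / predicted_as_class
    recall = true_positive / actual_in_class
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (precision, recall, f1)
    print(f"{name:>12}  {precision:.2f}  {recall:.2f}  {f1:.2f}  {actual_in_class}")

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")
```

Barbera's precision of 0.92 is 12 correct out of 13 predicted, Grignolino's recall of 0.95 is 18 found out of 19 actual, and the accuracy is 44/45.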



## **Explanation**

We tested two classification algorithms on this dataset:


*   Logistic Regression
*   RandomForestClassifier

The accuracy score differs between the two:

*   Accuracy score using Logistic Regression is 95%.
*   Accuracy score using RandomForestClassifier is about 98% (0.978 on the 45 test samples).
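With only 45 test samples, each wine is worth about 2.2 percentage points (1/45), so a gap of this size can come down to a single sample. Cross-validation gives a steadier comparison. The sketch below is one possible setup, not the procedure used in this project: it relies on scikit-learn's bundled wine dataset as a stand-in for `Wine.csv`, and the scaling step, `max_iter=1000`, and 5 folds are assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the wine dataset bundled with scikit-learn.
X_cv, y_cv = load_wine(return_X_y=True)

candidates = {
    # Scaling helps logistic regression converge on features with very
    # different ranges (for example proline versus hue).
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "RandomForestClassifier": RandomForestClassifier(random_state=2529),
}

cv_scores = {}
for name, estimator in candidates.items():
    # 5-fold cross-validation: each fold serves once as the test set.
    scores = cross_val_score(estimator, X_cv, y_cv, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Comparing the mean and spread across folds shows whether one model is consistently better or whether the difference sits within fold-to-fold noise.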