# Supervised Learning Using Wine Dataset

This notebook contains sample code that predicts the variety (class) of a wine from its measured features using four supervised learning methods:

- Naive Bayes (Gaussian)
- Decision Tree
- Random Forest
- SVM

All models in this notebook follow the same pattern:

1. We create an untrained "black box" model using sklearn.
2. We train the model by giving it a set of features (the training dataframe) and their corresponding labels (the y values).
3. We use the trained model to predict classes for a different set of features (the test dataframe). It returns an array of predicted labels.
4. We compute the accuracy and the confusion matrix from the predicted labels to evaluate the model.
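The four steps above can be sketched in a few lines. This is a minimal illustration, not the notebook's own cell: it assumes the copy of the wine data bundled with sklearn (`load_wine`) instead of `wine.csv`, and it uses `train_test_split` for the split, so its numbers need not match the results printed later.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: sklearn's bundled wine data stands in for wine.csv.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: create an untrained model.
model = GaussianNB()

# Step 2: train it on the training features and labels.
model.fit(X_train, y_train)

# Step 3: predict labels for features the model has not seen.
predicted = model.predict(X_test)

# Step 4: evaluate the predictions against the true labels.
accuracy = accuracy_score(y_test, predicted)
matrix = confusion_matrix(y_test, predicted)

print(accuracy)
print(matrix)
```

Swapping `GaussianNB()` for any other sklearn classifier leaves steps 2 to 4 unchanged, which is why every section below looks alike.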

## Importing Packages

In [1]:
# Naive Bayes
from sklearn.naive_bayes import GaussianNB

# Decision Tree
from sklearn import tree

# Random Forest
from sklearn.ensemble import RandomForestClassifier

# SVM
from sklearn.svm import SVC

# DataFrame
import pandas as pd

# Evaluation (confusion matrix)
from sklearn.metrics import confusion_matrix

## Preparing Wine Dataset

In [2]:
# Load and preprocess wine dataset
wine_full_df = pd.read_csv("wine.csv")
wine_full_df.rename(columns={'OD280/OD315 of diluted wines': 'Wine Dilution'}, inplace=True)
wine_classes = ["1", "2", "3"]

# Features: every column except the label
wine_df = wine_full_df.drop(columns=['Wine Variety'])
# Labels: the wine variety to predict
wine_y = wine_full_df['Wine Variety']

wine_features = wine_df.columns.tolist()
wine_features

['Alcohol',
 'Malic acid',
 'Ash',
 'Alcalinity of ash',
 'Magnesium',
 'Total Phenols',
 'Flavanoids',
 'Nonflavanoid phenols',
 'Proanthocyanins',
 'Color intensity',
 'Hue',
 'Wine Dilution',
 'Proline']

In [3]:
# Split the data into training (75%) and test (25%) sets

train_df = wine_df.sample(frac = 0.75, random_state = 0)
train_y = wine_y[train_df.index]

test_df = wine_df.drop(train_df.index)
test_y = wine_y[test_df.index]

train_df

|     | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total Phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | Wine Dilution | Proline |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 54 | 13.74 | 1.67 | 2.25 | 16.4 | 118 | 2.60 | 2.90 | 0.21 | 1.62 | 5.85 | 0.92 | 3.20 | 1060 |
| 151 | 12.79 | 2.67 | 2.48 | 22.0 | 112 | 1.48 | 1.36 | 0.24 | 1.26 | 10.80 | 0.48 | 1.47 | 480 |
| 63 | 12.37 | 1.13 | 2.16 | 19.0 | 87 | 3.50 | 3.10 | 0.19 | 1.87 | 4.45 | 1.22 | 2.87 | 420 |
| 55 | 13.56 | 1.73 | 2.46 | 20.5 | 116 | 2.96 | 2.78 | 0.20 | 2.45 | 6.25 | 0.98 | 3.03 | 1120 |
| 123 | 13.05 | 5.80 | 2.13 | 21.5 | 86 | 2.62 | 2.65 | 0.30 | 2.01 | 2.60 | 0.73 | 3.10 | 380 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 143 | 13.62 | 4.95 | 2.35 | 20.0 | 92 | 2.00 | 0.80 | 0.47 | 1.02 | 4.40 | 0.91 | 2.05 | 550 |
| 42 | 13.88 | 1.89 | 2.59 | 15.0 | 101 | 3.25 | 3.56 | 0.17 | 1.70 | 5.43 | 0.88 | 3.56 | 1095 |
| 105 | 12.42 | 2.55 | 2.27 | 22.0 | 90 | 1.68 | 1.84 | 0.66 | 1.42 | 2.70 | 0.86 | 3.30 | 315 |
| 132 | 12.81 | 2.31 | 2.40 | 24.0 | 98 | 1.15 | 1.09 | 0.27 | 0.83 | 5.70 | 0.66 | 1.36 | 560 |
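The split above samples rows at random, so the class proportions in the test set can drift from those in the full data. One possible alternative is sklearn's `train_test_split` with `stratify`, sketched below. It is not what this notebook does, and it assumes the bundled `load_wine` data in place of `wine.csv`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled wine data stands in for wine.csv.
X, y = load_wine(return_X_y=True)

# stratify=y keeps each class's share roughly equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Share of each class in the full data and in the test subset.
full_share = np.bincount(y) / len(y)
test_share = np.bincount(y_test) / len(y_test)

print(full_share)
print(test_share)
```

The two printed arrays should be close to each other, which makes per-class results on a small test set easier to compare across models.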


## Naive Bayes

In [4]:
gnb = GaussianNB()
gnb.fit(train_df, train_y)

In [5]:
# Predict labels for the whole test set, then compute the accuracy and confusion matrix
gnb_prediction = gnb.predict(test_df)
gnb_accuracy = gnb.score(test_df, test_y)
gnb_conf_matrix = confusion_matrix(test_y, gnb_prediction)

print(gnb_accuracy)
print(gnb_conf_matrix)

0.9545454545454546
[[14  1  0]
 [ 0 15  1]
 [ 0  0 13]]
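The accuracy and the confusion matrix are two views of the same predictions. In sklearn's convention, rows are true classes and columns are predicted classes, so the diagonal counts the correct predictions. The sketch below recomputes the accuracy from the matrix printed above; the array is copied by hand from that output, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the Naive Bayes output above.
conf_matrix = np.array([
    [14, 1, 0],
    [0, 15, 1],
    [0, 0, 13],
])

# Correct predictions sit on the diagonal.
correct = int(np.trace(conf_matrix))  # 42
total = int(conf_matrix.sum())        # 44 test observations
accuracy = correct / total

# Per-class recall: correct predictions divided by the true count of each class.
recall_per_class = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

print(accuracy)
print(recall_per_class)
```

The result, 42 correct out of 44, matches the value returned by `gnb.score`. The same check works for the matrices in the sections below.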


## Decision Tree

In [6]:
dt_clf = tree.DecisionTreeClassifier()
dt_clf.fit(train_df, train_y)

In [7]:
# Predict labels for the whole test set, then compute the accuracy and confusion matrix
dt_prediction = dt_clf.predict(test_df)
dt_accuracy = dt_clf.score(test_df, test_y)
dt_conf_matrix = confusion_matrix(test_y, dt_prediction)

print(dt_accuracy)
print(dt_conf_matrix)

0.8863636363636364
[[13  2  0]
 [ 1 15  0]
 [ 0  2 11]]


## Random Forest

In [8]:
rf_clf = RandomForestClassifier(n_estimators=100)
rf_clf.fit(train_df, train_y)

In [9]:
# Predict labels for the whole test set, then compute the accuracy and confusion matrix
rf_prediction = rf_clf.predict(test_df)
rf_accuracy = rf_clf.score(test_df, test_y)
rf_conf_matrix = confusion_matrix(test_y, rf_prediction)

print(rf_accuracy)
print(rf_conf_matrix)

0.9772727272727273
[[14  1  0]
 [ 0 16  0]
 [ 0  0 13]]
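Decision trees and random forests involve randomness during training, and the models above are created without a fixed seed, so rerunning the cells may give slightly different numbers. The sketch below shows one way to make a run repeatable by passing `random_state`. It assumes the bundled `load_wine` data in place of `wine.csv`, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled wine data stands in for wine.csv.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def seeded_forest_predictions(seed):
    """Train a forest with a fixed seed and return its test predictions."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)
    return forest.predict(X_test)


# Two runs with the same seed produce identical predictions.
first_run = seeded_forest_predictions(0)
second_run = seeded_forest_predictions(0)
runs_match = bool(np.array_equal(first_run, second_run))

print(runs_match)
```

`tree.DecisionTreeClassifier` accepts the same `random_state` parameter.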


## SVM

In [10]:
svm_clf = SVC(kernel='linear')
svm_clf.fit(train_df, train_y)

In [11]:
# Predict labels for the whole test set, then compute the accuracy and confusion matrix
svm_prediction = svm_clf.predict(test_df)
svm_accuracy = svm_clf.score(test_df, test_y)
svm_conf_matrix = confusion_matrix(test_y, svm_prediction)

print(svm_accuracy)
print(svm_conf_matrix)

0.9545454545454546
[[14  1  0]
 [ 0 15  1]
 [ 0  0 13]]
