#**About Dataset**
Wine quality prediction is a common exercise in data science and machine learning. Predicting how a wine will be assessed from its chemical and physical measurements can help producers with quality control and help consumers choose wines.

This dataset describes each wine sample by its chemical composition, with attributes such as alcohol content, malic acid levels, and total phenols. The column used as the prediction target in this notebook is `class_label`, which takes the values 1, 2, and 3; the corresponding names in `class_name` are Barolo, Grignolino, and Barbera. The dataset includes the following feature variables:

- **Alcohol**: The percentage of alcohol in the wine, which often correlates with its body and flavor profile.
- **Malic Acid**: A component that contributes to the wine's acidity and overall taste.
- **Ash**: The inorganic residue left after burning, which can impact the wine's flavor.
- **Alcalinity of Ash**: The alkalinity of the ash, which helps in understanding the wine's pH and stability.
- **Magnesium**: An essential mineral that can affect the wine's taste and mouthfeel.
- **Total Phenols**: Compounds that contribute to the wine's color, flavor, and mouthfeel.
- **Flavanoids**: A subclass of phenols that are particularly influential in the wine's taste and color.
- **Nonflavanoid Phenols**: Phenolic compounds that also play a role in the wine's sensory characteristics.
- **Proanthocyanins**: Polyphenolic compounds contributing to the astringency and color of the wine.
- **Color Intensity**: A measure of the wine's color depth, which can be indicative of its age and type.
- **Hue**: The color hue of the wine, reflecting its oxidative state and aging process.
- **OD280/OD315 of Diluted Wines**: An optical density measure that can be associated with the wine's quality and concentration of phenolic compounds.
- **Proline**: An amino acid that can influence the wine's flavor profile and overall quality.

The goal is to build a predictive model from these features. The notebook below fits a linear support vector classifier that predicts each sample's class label, then evaluates it on a held-out test set with a classification report and a confusion matrix. A model of this kind can support quality assurance for producers and help consumers select wines that match their preferences.

In [1]:
import warnings
warnings.filterwarnings('ignore')

#**Importing Libraries and Dataset**

In [43]:
import numpy as np
import pandas as pd
# Load the Wine dataset from the YBI Foundation GitHub repository
df = pd.read_csv("https://github.com/YBI-Foundation/Dataset/raw/main/Wine.csv")
df

Unnamed: 0,class_label,class_name,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280,proline
0,1,Barolo,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,Barolo,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,1,Barolo,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,1,Barolo,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,1,Barolo,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,3,Barbera,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,3,Barbera,13.40,3.91,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,3,Barbera,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,3,Barbera,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


#**Describing the dataset**

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   class_label           178 non-null    int64  
 1   class_name            178 non-null    object 
 2   alcohol               178 non-null    float64
 3   malic_acid            178 non-null    float64
 4   ash                   178 non-null    float64
 5   alcalinity_of_ash     178 non-null    float64
 6   magnesium             178 non-null    int64  
 7   total_phenols         178 non-null    float64
 8   flavanoids            178 non-null    float64
 9   nonflavanoid_phenols  178 non-null    float64
 10  proanthocyanins       178 non-null    float64
 11  color_intensity       178 non-null    float64
 12  hue                   178 non-null    float64
 13  od280                 178 non-null    float64
 14  proline               178 non-null    int64  
dtypes: float64(11), int64(3), object(1)

In [46]:
df.describe()

Unnamed: 0,class_label,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280,proline
count,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0
mean,1.938202,13.000618,2.336348,2.366517,19.494944,99.741573,2.295112,2.02927,0.361854,1.590899,5.05809,0.957449,2.611685,746.893258
std,0.775035,0.811827,1.117146,0.274344,3.339564,14.282484,0.625851,0.998859,0.124453,0.572359,2.318286,0.228572,0.70999,314.907474
min,1.0,11.03,0.74,1.36,10.6,70.0,0.98,0.34,0.13,0.41,1.28,0.48,1.27,278.0
25%,1.0,12.3625,1.6025,2.21,17.2,88.0,1.7425,1.205,0.27,1.25,3.22,0.7825,1.9375,500.5
50%,2.0,13.05,1.865,2.36,19.5,98.0,2.355,2.135,0.34,1.555,4.69,0.965,2.78,673.5
75%,3.0,13.6775,3.0825,2.5575,21.5,107.0,2.8,2.875,0.4375,1.95,6.2,1.12,3.17,985.0
max,3.0,14.83,5.8,3.23,30.0,162.0,3.88,5.08,0.66,3.58,13.0,1.71,4.0,1680.0


In [47]:
df.isna().sum()

Unnamed: 0,0
class_label,0
class_name,0
alcohol,0
malic_acid,0
ash,0
alcalinity_of_ash,0
magnesium,0
total_phenols,0
flavanoids,0
nonflavanoid_phenols,0
proanthocyanins,0
color_intensity,0
hue,0
od280,0
proline,0


In [48]:
df.duplicated().sum()

0

In [50]:
df['class_label'].unique()

array([1, 2, 3])

In [49]:
df['class_name'].unique()

array(['Barolo', 'Grignolino', 'Barbera'], dtype=object)

#**Defining target variable (y) and feature variables (x)**

In [51]:
df.columns

Index(['class_label', 'class_name', 'alcohol', 'malic_acid', 'ash',
       'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids',
       'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue',
       'od280', 'proline'],
      dtype='object')

In [52]:
# Target: the numeric class label (1, 2, or 3)
y = df['class_label']
# Features: the 13 chemical measurements
x = df[['alcohol', 'malic_acid', 'ash',
        'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids',
        'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue',
        'od280', 'proline']]
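
An equivalent way to build the feature matrix is to drop the two label columns instead of listing all thirteen features by name. The sketch below is an illustration rather than part of the original notebook: `demo` is a hypothetical two-row frame (values copied from the preview above, with only three of the feature columns) so that it runs without downloading the CSV.

```python
import pandas as pd

# Hypothetical two-row stand-in for df; values copied from rows 0 and 173 of the preview.
demo = pd.DataFrame({
    "class_label": [1, 3],
    "class_name": ["Barolo", "Barbera"],
    "alcohol": [14.23, 13.71],
    "malic_acid": [1.71, 5.65],
    "proline": [1065, 740],
})

label_columns = ["class_label", "class_name"]
y_demo = demo["class_label"]                # target: the numeric class label
x_demo = demo.drop(columns=label_columns)   # features: every remaining column

print(list(x_demo.columns))
```

Dropping the label columns keeps the selection correct if feature columns are added or renamed, at the cost of being less explicit about which columns the model sees.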

#**Train-Test Split**

In [72]:
from sklearn.model_selection import train_test_split
# Hold out 40% of the rows for testing; random_state fixes the shuffle for reproducibility
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.6, random_state=2529)
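
The split above does not pass `stratify`, so the class proportions in the two partitions depend on the shuffle. A possible variation is a stratified split, sketched below. As an assumption, it uses scikit-learn's bundled `load_wine` data as a stand-in for the CSV; that copy appears to hold the same 178 samples, but it codes the classes as 0, 1, 2 and names some columns differently.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's bundled copy of the UCI Wine data,
# 178 rows and 13 features, with classes coded 0, 1, 2 rather than 1, 2, 3.
X_demo, y_demo = load_wine(return_X_y=True)

# stratify=y_demo keeps the class proportions similar in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.6, random_state=2529, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
```

With 178 rows and `train_size=0.6`, the test partition holds 72 rows, which matches the total support reported in the metrics further down.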

#**Model Selection**

In [73]:
from sklearn.svm import SVC
support_vector_classifier = SVC(kernel='linear')

#**Training the model**

In [74]:
support_vector_classifier.fit(x_train,y_train)
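
The feature columns sit on very different scales: in the `describe()` output above, `proline` has a standard deviation of about 315, while `hue` has about 0.23. SVMs are generally sensitive to feature scale, so a common variation is to standardize the features before fitting. The sketch below is not the notebook's method; it assumes scikit-learn's bundled `load_wine` data as a stand-in for the CSV and wraps the scaler and classifier in a pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): bundled UCI Wine copy, classes coded 0, 1, 2.
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.6, random_state=2529
)

# The pipeline fits the scaler on the training rows only,
# then applies the same transformation when predicting on test rows.
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svc.fit(X_tr, y_tr)

test_accuracy = scaled_svc.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.2f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training. Whether scaling changes the results for this particular split is something to check rather than assume.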

#**Testing the model**

In [75]:
y_pred = support_vector_classifier.predict(x_test)

#**Calculating the metrics**

In [76]:
from sklearn.metrics import confusion_matrix, classification_report
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           1       0.92      0.96      0.94        24
           2       0.96      0.89      0.93        28
           3       0.95      1.00      0.98        20

    accuracy                           0.94        72
   macro avg       0.94      0.95      0.95        72
weighted avg       0.95      0.94      0.94        72
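
The report above comes from a single 60/40 split with 72 test rows, so the scores may shift with a different `random_state`. One way to gauge that variation is k-fold cross-validation, sketched below under the assumption that scikit-learn's bundled `load_wine` data stands in for the CSV. The fold count of 5 is an illustrative choice, not something taken from the notebook.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data (assumption): bundled UCI Wine copy, classes coded 0, 1, 2.
X_demo, y_demo = load_wine(return_X_y=True)

# Same model family as the notebook: a linear-kernel support vector classifier.
model = SVC(kernel="linear")

# Five stratified folds; each sample is used for testing exactly once.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=2529)
scores = cross_val_score(model, X_demo, y_demo, cv=folds)  # accuracy per fold

print(f"mean accuracy: {scores.mean():.3f}, std: {scores.std():.3f}")
```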



In [77]:
confusion_matrix(y_test,y_pred)

array([[23,  1,  0],
       [ 2, 25,  1],
       [ 0,  0, 20]])
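
The per-class figures in the classification report can be recomputed by hand from this matrix, where rows are the true classes and columns are the predicted classes. The following sketch uses only the numbers printed above and plain Python, as a cross-check on how precision, recall, and F1 are defined.

```python
# Confusion matrix copied from the output above:
# rows are true classes 1-3, columns are predicted classes 1-3.
cm = [
    [23, 1, 0],
    [2, 25, 1],
    [0, 0, 20],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = sum(cm[i][i] for i in range(n_classes)) / total

metrics = {}
for i in range(n_classes):
    true_pos = cm[i][i]
    predicted_as_i = sum(cm[r][i] for r in range(n_classes))  # column sum
    actually_i = sum(cm[i])                                   # row sum (support)
    precision = true_pos / predicted_as_i
    recall = true_pos / actually_i
    f1 = 2 * precision * recall / (precision + recall)
    metrics[i + 1] = (round(precision, 2), round(recall, 2), round(f1, 2))

print(round(accuracy, 2))   # 0.94
print(metrics)
```

For class 2, for example, 25 of the 26 samples predicted as class 2 are correct (precision 0.96), and 25 of the 28 true class 2 samples are recovered (recall 0.89), which agrees with the report.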