# Wine Quality Prediction through SVM and LR
- Data Exploration and Preprocessing

    - Explore the dataset thoroughly and provide a summary of your observations.
    - Perform necessary preprocessing steps:
        * Preparing feature values to be used by your models.
        * Optionally, applying data augmentation techniques.
        * Splitting the data into training and test sets appropriately.

- SVM and LR Implementation
    - Implement both SVM and LR from scratch. Evaluate and compare their performance.
    - Clearly define the two models and describe your implementation, also listing their hyperparameters if any.
    - Train the two models using an appropriate performance metric.
    - Demonstrate proper hyperparameter tuning, and evaluate at least one of your models using accuracy estimates via 5-fold cross-validation.

- Kernel Methods
    - Extend the above models to a kernelized form by adopting non-linear kernels.
    - Clearly describe how the kernelization happens and its consequences for both predictions and performance.
    - Comment on how the kernelized models compare with their standard (non-kernelized) counterparts.

- Evaluation and Analysis
    - Evaluate your model performance using suitable metrics such as accuracy, precision, recall, and F1-score.
    - Provide appropriate visualizations of the performance of each model (loss and accuracy).
    - When reasonable, conduct an analysis of misclassified examples to understand potential model limitations.
    - Discuss the presence or absence of overfitting and underfitting at any point.
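
The metrics named in the evaluation item can all be derived from the four confusion counts. The following is a minimal sketch, not a prescribed implementation: the function name `binary_metrics`, the `positive` parameter, and the toy labels are all hypothetical, and it assumes the quality score has already been turned into a binary label (the outline does not say how).

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumption: any label different from `positive` counts as negative,
    so both {0, 1} and {-1, +1} encodings work.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion counts
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, made up for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Returning 0.0 when a denominator is zero is one common convention; another option would be to return `nan` so that undefined scores stand out in the analysis.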

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
# render plots inline in the notebook
%matplotlib inline

#!git clone https://github.com/Abudo-S/WineQualityPrediction.git

In [7]:
red_wine_quality = pd.read_csv("/content/WineQualityPrediction/wine+quality/winequality-red.csv", sep=';')
red_wine_quality.info()

white_wine_quality = pd.read_csv("/content/WineQualityPrediction/wine+quality/winequality-white.csv", sep=';')
white_wine_quality.info()
# Should we merge both datasets into a single one, adding a new field wine_type = 'Red' / 'White'?

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898 entries, 0 to 4897
Data columns (total 12 columns):
 #   Column        

In [11]:
red_wine_quality.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


In [12]:
white_wine_quality.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
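
On the question raised in the loading cell about merging the two datasets: one possible approach is to tag each frame with its origin and stack them. The sketch below is hedged accordingly. The helper `combine_wines`, the column names `wine_type` and `is_red`, and the small stand-in frames are hypothetical; in the notebook the real `red_wine_quality` and `white_wine_quality` frames would be passed instead.

```python
import pandas as pd

def combine_wines(red, white, type_col="wine_type"):
    """Stack the red and white frames, tagging each row with its origin.

    `type_col` is a hypothetical column name, not part of the original CSVs.
    """
    red_tagged = red.assign(**{type_col: "Red"})
    white_tagged = white.assign(**{type_col: "White"})
    # ignore_index=True rebuilds a clean 0..n-1 index for the stacked frame
    return pd.concat([red_tagged, white_tagged], ignore_index=True)

# Tiny stand-in frames with placeholder values (not real measurements)
red_demo = pd.DataFrame({"alcohol": [10.0, 11.0], "quality": [5, 6]})
white_demo = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "quality": [6, 6, 6]})

combined = combine_wines(red_demo, white_demo)

# Numeric indicator, since SVM and LR need numeric features
combined["is_red"] = (combined["wine_type"] == "Red").astype(int)
print(combined)
```

Whether merging helps is an empirical question: white wines outnumber red ones (4898 vs. 1599 rows), so a combined model may be dominated by the white samples unless the split is stratified on the type column.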
