## WineQuality-Red Dataset

The "winequality-red.csv" dataset contains various physicochemical properties of red wine samples along with their quality ratings. This dataset is often used for regression or classification tasks.

Here are the details of its columns:

1. **Fixed acidity**: The amount of fixed acids in the wine.
2. **Volatile acidity**: The amount of volatile acids in the wine, which contribute to vinegar-like flavors.
3. **Citric acid**: The amount of citric acid in the wine, which can add freshness and flavor.
4. **Residual sugar**: The amount of residual sugar left after fermentation.
5. **Chlorides**: The amount of chlorides in the wine, which can contribute to saltiness.
6. **Free sulfur dioxide**: The amount of free sulfur dioxide in the wine, which acts as an antioxidant and antimicrobial agent.
7. **Total sulfur dioxide**: The total amount of sulfur dioxide present in the wine.
8. **Density**: The density of the wine, which depends mainly on its alcohol and sugar content.
9. **pH**: The pH level of the wine, which indicates its acidity or basicity.
10. **Sulphates**: The amount of sulphates in the wine, which can contribute to its preservation and flavor enhancement.
11. **Alcohol**: The alcohol content of the wine.
12. **Quality**: The quality rating of the wine, an integer score on a 0 to 10 scale; the scores in this dataset range from 3 to 8. This is the target variable.

Each row in the dataset represents a single red wine sample, with its corresponding physicochemical properties and quality rating.
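Because the quality rating is an integer score, the same column can serve as a regression target or be binned into classes. The following is a minimal sketch of both options; the sample scores and the cutoff of 7 for a "good" wine are assumptions made for illustration, not part of the dataset.

```python
import numpy as np

# Illustrative quality scores; the real column holds one integer score per wine.
quality = np.array([5, 5, 6, 7, 3, 8])

# Hypothetical cutoff (an assumption for this sketch): scores >= 7 count as "good".
GOOD_THRESHOLD = 7

# Regression target: keep the scores as numbers.
y_regression = quality.astype(float)

# Binary classification target: 1 for "good", 0 otherwise.
y_binary = (quality >= GOOD_THRESHOLD).astype(int)

print(y_regression)
print(y_binary)
```

Which framing fits best depends on the task; a different cutoff, or keeping all six score levels as classes, would be equally valid choices.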

## Load the required libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Load the dataset

In [None]:
dataset = pd.read_csv("winequality-red.csv")
print(dataset.columns)

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')


In [None]:
print(dataset.head())

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 

In [None]:
# Features: every column except the last one
X = dataset.iloc[:, :-1].values
# Target: the last column (quality)
y = dataset.iloc[:, -1].values
print(X.shape)

(1599, 11)


## Take care of missing values

In [None]:
null_values = dataset.isnull().sum()
print(null_values)

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64


In [None]:
# isna() is an alias of isnull(), so this repeats the previous check
nan_values = dataset.isna().sum()
print(nan_values)

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64


In [None]:
zero_values = (dataset == 0).sum()
print(zero_values)

fixed acidity             0
volatile acidity          0
citric acid             132
residual sugar            0
chlorides                 0
free sulfur dioxide       0
total sulfur dioxide      0
density                   0
pH                        0
sulphates                 0
alcohol                   0
quality                   0
dtype: int64


In [None]:
# Zeros occur only in the citric acid column (132 rows). A wine can genuinely contain
# no citric acid, so these zeros are valid measurements, not missing values.
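The zero counts above show zeros only in citric acid, where they are plausible values. If zeros in some column did stand for missing readings, one option would be to convert them to `NaN` and impute. The sketch below uses a toy frame in which a zero pH is treated as a placeholder; the toy values, the `zero_means_missing` list, and the choice of median imputation are all assumptions for illustration and do not apply to the real dataset, which has no such zeros.

```python
import numpy as np
import pandas as pd

# Toy frame (not the real data): a pH of 0 is physically implausible,
# while a citric acid value of 0 is a legitimate measurement.
toy = pd.DataFrame({
    "pH": [3.5, 0.0, 3.2, 3.3],
    "citric acid": [0.0, 0.04, 0.56, 0.0],
})

# Hypothetical list of columns whose zeros should be treated as missing.
zero_means_missing = ["pH"]

cleaned = toy.copy()
# Turn the placeholder zeros into NaN in the selected columns only.
cleaned[zero_means_missing] = cleaned[zero_means_missing].replace(0, np.nan)

# Fill each gap with that column's median (one simple imputation choice).
medians = cleaned[zero_means_missing].median()
cleaned[zero_means_missing] = cleaned[zero_means_missing].fillna(medians)

print(cleaned)
```

Columns left out of the list, such as citric acid here, keep their zeros untouched.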

## Encoding categorical data

In [None]:
# Every column in this dataset is numeric, so there is nothing to encode.

## Split the data into train and test set

In [None]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape)
print(X_test.shape)

(1279, 11)
(320, 11)
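Most wines in this dataset are rated 5 or 6, so a purely random split can leave the rarer scores unevenly represented between the two subsets. One possible variation is to pass `stratify` to `train_test_split`, which keeps the class proportions the same on both sides. The sketch below uses synthetic stand-in arrays (`X_demo`, `y_demo`) rather than the real data, so the class shares are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and an imbalanced target (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([5] * 60 + [6] * 30 + [7] * 10)

# stratify=y_demo keeps each class's share the same in the train and test subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each class ended up in the test subset.
values, counts = np.unique(y_te, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

Stratification requires at least two samples of every class, which is worth checking before applying it to the rarest quality scores.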


## Feature Scaling

In [None]:
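scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only
X_train = scaler.fit_transform(X_train)
# Reuse the training statistics on the test set to avoid data leakage
X_test = scaler.transform(X_test)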
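The scaling step above amounts to subtracting each feature's training mean and dividing by its training standard deviation. The following NumPy sketch reproduces that arithmetic on tiny stand-in arrays (`train` and `test` are made-up values, not the wine data) to show why the test set is transformed with statistics learned from the training set.

```python
import numpy as np

# Tiny stand-in arrays (not the real wine data).
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

# Statistics are computed from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# Standardize: z = (x - mean) / std, applied column by column.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # approximately 1 for every column
print(test_scaled)
```

After scaling, the training columns have zero mean and unit variance, while the test rows are expressed in the same units without having influenced the statistics.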