# Red Wine Quality with ANNs
The given dataset contains information about the red variant of the Portuguese *Vinho Verde* wine. It is also available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality.

### Column descriptions
- **fixed acidity**: the nonvolatile acids that do not evaporate easily and remain dissolved in the wine, contributing directly to its taste, structure, and freshness;

- **volatile acidity**: the amount of acetic acid in the wine, which at too high levels can lead to an unpleasant, vinegar taste;

- **citric acid**: found in small quantities, citric acid can add 'freshness' and flavor to wines;

- **residual sugar**: the amount of sugar remaining after fermentation stops; it is rare to find wines with less than 1 gram/liter;

- **chlorides**: the amount of salt in the wine;

- **free sulfur dioxide**: unbound sulfur dioxide (SO₂) that protects the wine from oxidation and microbes; its effectiveness depends on pH;

- **total sulfur dioxide**: the sum of free and bound SO₂; at free SO₂ concentrations above 50 ppm, SO₂ becomes noticeable in the wine's aroma and taste;

- **density**: close to that of water, depending on the alcohol percentage and sugar content of the wine;

- **pH**: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale;

- **sulphates**: wine additives that help preserve the wine by contributing to SO₂ levels; they act as antioxidants and antimicrobials;

- **alcohol**: ethanol content of the wine (% vol); influences body, warmth, and overall quality;

- **quality**: sensory score assigned by experts, on a scale from 0 (worst) to 10 (best).

*In this notebook we will build a neural network that predicts a wine's quality score; in this dataset the observed scores range from 3 to 8.*
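Because the network built below is a regressor with a single continuous output, its predictions are real numbers rather than integer scores. A minimal sketch, assuming we want integer ratings, is to round each prediction and clip it to the 3 to 8 range observed in this dataset. The helper name `to_rating` is hypothetical and not part of the notebook.

```python
import numpy as np

def to_rating(predictions, low=3, high=8):
    """Hypothetical helper: map continuous model outputs to integer quality scores.

    Rounds to the nearest integer, then clips to [low, high]. Note that np.rint
    rounds exact halves to the nearest even integer (5.5 -> 6, 6.5 -> 6).
    """
    return np.clip(np.rint(predictions), low, high).astype(int)

print(to_rating(np.array([2.4, 5.49, 5.5, 6.5, 9.1])))  # [3 5 6 6 8]
```

Clipping keeps out-of-range outputs (such as 9.1) inside the scale the model was trained on.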


### Load wine dataset

In [None]:
import pandas as pd

wine_df = pd.read_csv('./../datasets/red-wine-quality.csv')
wine_df.head(n=5)

In [None]:
wine_df.describe()

### Matrix of correlations

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,5))
sns.heatmap(wine_df.corr(), cmap='coolwarm', annot=True)

The heatmap shows almost no linear correlation (Pearson coefficients close to zero) between *quality* and *residual sugar*, *free sulfur dioxide*, and *pH*, so we drop these three columns.
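The decision above relies on the Pearson coefficient, which measures only linear association. The sketch below applies the same selection rule, assuming a hypothetical cut-off of |r| < 0.1 and a hypothetical helper named `weakly_correlated`, to synthetic data rather than the wine dataset. Column `b` is flagged even though `quality` depends on it through `b ** 2`, so a near-zero coefficient alone does not prove a feature is useless.

```python
from typing import List

import numpy as np
import pandas as pd

def weakly_correlated(df: pd.DataFrame, target: str, threshold: float = 0.1) -> List[str]:
    """Return the columns whose absolute Pearson correlation with `target` is below `threshold`."""
    r = df.corr()[target].drop(target)
    return sorted(r[r.abs() < threshold].index)

# Synthetic data (not the wine dataset): quality depends linearly on `a`,
# non-linearly on `b`, and not at all on `noise`.
rng = np.random.default_rng(0)
n = 2000
toy = pd.DataFrame({
    'a': rng.uniform(-1, 1, n),
    'b': rng.uniform(-1, 1, n),
    'noise': rng.normal(size=n),
})
toy['quality'] = 3 * toy['a'] + toy['b'] ** 2 + rng.normal(scale=0.1, size=n)

print(toy.corr()['quality'].round(3))
print(weakly_correlated(toy, 'quality'))  # 'b' is flagged despite the b ** 2 term
```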

In [None]:
wine_df.drop(columns=['residual sugar', 'free sulfur dioxide', 'pH'], inplace=True)

In [None]:
wine_df.columns

### Prepare training and testing datasets

In [None]:
from sklearn.model_selection import train_test_split

features = ['fixed acidity', 'volatile acidity', 'citric acid', 'chlorides',
       'total sulfur dioxide', 'density', 'sulphates', 'alcohol']
target = 'quality'

X = wine_df[features]
y = wine_df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32, shuffle=True)

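Standardize the features: fit the scaler on the training set only, then apply the same transformation to the test set so that no test-set statistics leak into training.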

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
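`StandardScaler` rescales each column to z = (x - mean) / std, where the mean and the population standard deviation come from the data passed to `fit`. The sketch below uses small made-up arrays in place of `X_train` and `X_test` to check this by hand and to show that the test set is transformed with the training statistics, which is why the scaled test columns are generally not exactly zero-mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up arrays standing in for X_train / X_test (2 features each).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[5.0, 50.0], [2.5, 25.0]])

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)
Z_test = scaler.transform(X_test)

# Manual equivalent: statistics come from the training set only (ddof=0).
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
assert np.allclose(Z_train, (X_train - mean) / std)
assert np.allclose(Z_test, (X_test - mean) / std)

print(Z_train.mean(axis=0))  # ~[0, 0]
print(Z_test.mean(axis=0))   # not zero: the test data was not used to fit
```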

Convert the data to PyTorch tensors and wrap them in `DataLoader`s for mini-batch training.

In [None]:
from torch.utils.data import DataLoader, TensorDataset
import torch

X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)
y_test = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)

train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)  # no need to shuffle for evaluation
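The `.view(-1, 1)` calls turn the targets into column vectors of shape (N, 1), matching the model's output. The short sketch below, with made-up values, shows what happens without it: `nn.MSELoss` broadcasts an (N, 1) output against an (N,) target into an N x N grid of differences (PyTorch emits a warning), so the loss is silently wrong.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()
outputs = torch.tensor([[5.0], [6.0], [7.0]])  # model output, shape (3, 1)
y_flat = torch.tensor([5.0, 6.0, 7.0])         # target, shape (3,)
y_col = y_flat.view(-1, 1)                     # target, shape (3, 1)

loss_ok = criterion(outputs, y_col)  # perfect predictions -> 0.0

with warnings.catch_warnings():
    warnings.simplefilter('ignore')        # silence the size-mismatch warning
    loss_bad = criterion(outputs, y_flat)  # compares every output with every target

print(loss_ok.item(), loss_bad.item())  # 0.0 and ~1.3333 (= 12 / 9)
```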

### Create Wine Neural Network

In [None]:
import torch.nn as nn

class WineNN(nn.Module):

    def __init__(self):
        super().__init__()
        
        self.layers = nn.Sequential(
            nn.Linear(8, 64),   # 8 inputs: one per column in `features`
            nn.ReLU(),

            nn.Linear(64, 16),
            nn.ReLU(),

            nn.Linear(16, 1)    # single continuous output: predicted quality
        )

    def forward(self, x):
        return self.layers(x)
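Each `nn.Linear(in_features, out_features)` layer holds in_features x out_features weights plus out_features biases, so this network has (8·64 + 64) + (64·16 + 16) + (16·1 + 1) = 576 + 1040 + 17 = 1633 trainable parameters. The first layer's 8 inputs must match the number of columns in `features`. A quick check, rebuilding the same layer stack so the snippet runs on its own:

```python
import torch.nn as nn

# Same layer stack as WineNN, rebuilt here so the snippet is self-contained.
layers = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

n_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(n_params)  # 1633
```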

In [None]:
model = WineNN()

### Train Neural Network Model

In [None]:
import torch.optim as optim

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
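With its default `reduction='mean'`, `nn.MSELoss()` averages the squared error over every element of the batch. For a batch of $N$ wines with true scores $y_i$ and predictions $\hat{y}_i$ (targets and outputs both of shape $(N, 1)$), the training objective and the mean absolute error reported at the end of the notebook are:

```latex
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2,
\qquad
\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|
```

Adam uses the gradient of $\mathcal{L}_{\text{MSE}}$ with a learning rate of 0.001 to update the weights.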

In [None]:
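n_epochs = 100

for epoch in range(n_epochs):
    model.train()
    epoch_loss = 0.0

    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()               # reset gradients from the previous step
        outputs = model(x_batch)            # forward pass
        loss = criterion(outputs, y_batch)  # mean squared error for this batch

        loss.backward()                     # backpropagation
        optimizer.step()                    # update the weights

        epoch_loss += loss.item() * len(x_batch)

    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch + 1}/{n_epochs}, train MSE: {epoch_loss / len(train_dataset):.4f}')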
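The training loop follows the standard PyTorch pattern: clear old gradients, run the forward pass, compute the loss, backpropagate, and step the optimizer. The self-contained sketch below applies the same pattern to a small synthetic regression problem (made-up data, not the wine dataset; the layer sizes and learning rate are arbitrary choices) and records the average loss per epoch, a simple way to confirm that training is making progress.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data: y = X @ w + small noise, with 8 features like the wine model.
X = torch.randn(512, 8)
w = torch.randn(8, 1)
y = X @ w + 0.1 * torch.randn(512, 1)

loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(50):
    model.train()
    running, count = 0.0, 0
    for x_batch, y_batch in loader:
        optimizer.zero_grad()                      # clear gradients from the previous step
        loss = criterion(model(x_batch), y_batch)  # batch mean squared error
        loss.backward()                            # backpropagate
        optimizer.step()                           # update weights
        running += loss.item() * len(x_batch)      # batch MSE x batch size = sum of squared errors
        count += len(x_batch)
    epoch_losses.append(running / count)

print(f"first epoch: {epoch_losses[0]:.3f}, last epoch: {epoch_losses[-1]:.3f}")
```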


### Evaluate Neural Network Model

In [None]:
import numpy as np

model.eval()
n_elements = 0
absolute_error = 0.0
squared_error = 0.0

y_true = []
y_pred = []

with torch.no_grad():  # gradients are not needed for evaluation
    for x_batch, y_batch in test_loader:
        outputs = model(x_batch)

        # Accumulate error sums; divide by the sample count once at the end
        n_elements += len(y_batch)
        squared_error += ((y_batch - outputs) ** 2).sum().item()
        absolute_error += (y_batch - outputs).abs().sum().item()

        y_true.extend(y_batch.numpy().reshape(-1))
        y_pred.extend(outputs.numpy().reshape(-1))

y_true = np.array(y_true)
y_pred = np.array(y_pred)

In [None]:
plt.scatter(y_true, y_pred, alpha=0.5)
plt.gca().set(xlabel='True quality', ylabel='Predicted quality')

In [None]:
print('Mean Squared Error:', squared_error / n_elements)
print('Mean Absolute Error:', absolute_error / n_elements)
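The evaluation loop sums squared and absolute errors over all batches and divides by the total number of samples once at the end. This gives the true per-sample mean even when the last batch is smaller; averaging the per-batch means instead would over-weight that last batch. A small sketch with made-up squared errors:

```python
import numpy as np

# Made-up squared errors for 5 test samples, split into batches of 2 (last batch has 1).
sq_errors = np.array([1.0, 1.0, 1.0, 1.0, 4.0])
batches = [sq_errors[i:i + 2] for i in range(0, len(sq_errors), 2)]

# What the evaluation loop does: accumulate sums, divide once by the sample count.
mse_accumulated = sum(b.sum() for b in batches) / sum(len(b) for b in batches)

# Averaging per-batch means instead gives the lone last sample too much weight.
mse_mean_of_means = np.mean([b.mean() for b in batches])

print(mse_accumulated, mse_mean_of_means)  # 1.6 2.0
```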
