# Wine Quality Dataset: Predicting Wine Quality with a Feedforward Neural Network  
**Date:** February 4, 2025  
**Author:** Dario Piga  

In this notebook, we will implement a **feedforward neural network (FNN)** using PyTorch to predict the **quality of red wine** based on various **chemical properties**. The **Wine Quality Dataset** is a well-known real-world dataset for regression tasks.

---

## **The Wine Quality Dataset**  

The **Wine Quality Dataset** comes from the UCI Machine Learning Repository and contains **1,599 samples** of red wines, each described by **11 numerical features**. The goal is to predict the **wine quality score** (on a scale from 0 to 10) based on its **physicochemical properties**.

### **🔹 Features in the Dataset**
| **Feature Name**         | **Description**                                  |
|--------------------------|--------------------------------------------------|
| `fixed acidity`         | Tartaric acid content (g/dm³)                     |
| `volatile acidity`      | Acetic acid content (g/dm³)                       |
| `citric acid`           | Citric acid content (g/dm³)                       |
| `residual sugar`        | Sugar remaining after fermentation (g/dm³)        |
| `chlorides`             | Salt content (g/dm³)                              |
| `free sulfur dioxide`   | Free SO₂ content (mg/dm³)                         |
| `total sulfur dioxide`  | Total SO₂ content (mg/dm³)                        |
| `density`               | Density of the wine (g/cm³)                       |
| `pH`                    | Acidity level (pH scale)                          |
| `sulphates`             | Sulfate content (g/dm³)                           |
| `alcohol`               | Alcohol percentage (%)                            |
| **Target: `quality`**   | Wine quality score (integer between 0 and 10)    |

---

## **Implementation Steps**
1. **Load the Wine Quality dataset** from UCI.
2. **Preprocess the data** (normalize and convert to tensors).
3. **Define a feedforward neural network** for regression.
4. **Define the training loss** (MSE loss).
5. **Train the model.**
6. **Evaluate the model** performance.



## **1. Import Required Libraries**

In [8]:
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)


In [9]:
# Load dataset
path = 'data/winequality-red.csv'  # path where your dataset is saved (forward slashes work on every OS)

df = pd.read_csv(path, sep=";")

# Display dataset information
print(df.head())
df.info()  # info() prints its summary itself and returns None, so no print() is needed

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 

In [10]:
# Define features and target
features = df.columns[:-1]  # All columns except the last one (quality)
target = "quality"  # Target variable (wine score from 0 to 10)

X = df[features].values
y = df[target].values.reshape(-1, 1)  # Ensure y is a column vector

## Step 1. Make sure you have understood the structure of your dataset.
### Note that the input features are stored in the variable X and the target in y

## Step 2. Split your dataset into training and test sets, normalize your variables, and convert them to torch tensors

### Note: All tensors should use the float32 data type. The parameters of PyTorch layers such as `nn.Linear` are float32 by default, and feeding them float64 inputs (NumPy's default) raises a dtype mismatch error.

In [11]:
# Write your code here
...
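
One possible way to approach Step 2 is sketched below. It is not the reference solution: the 80/20 split ratio, the choice of `StandardScaler`, and the decision to scale only the features (leaving the target unscaled) are assumptions. To keep the example runnable on its own, `X` and `y` are random stand-ins with the same layout as the notebook's arrays; in the notebook you would use the `X` and `y` defined above instead.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 200 samples, 11 features, integer-valued scores.
# In the notebook, X and y come from the wine DataFrame instead.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 11))
y = rng.integers(3, 9, size=(200, 1)).astype(float)

# Hold out 20% of the samples for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split,
# so that no test-set statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to float32 tensors to match the default dtype of PyTorch layers.
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training data alone keeps the test set a fair proxy for unseen wines. Scaling the target as well is a reasonable variation, provided predictions are transformed back before reporting errors.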

## Step 3. Define your FeedForward Neural Network

In [12]:
# Write your solution here (you can complete the code below or rewrite it from scratch)

class FeedforwardNN(nn.Module):
    def __init__(self, input_dim, hidden1, hidden2):
        super().__init__()
        ...

    def forward(self, x):
        ...
        y = ...
        return y


model = FeedforwardNN(input_dim=X_train.shape[1], hidden1=12, hidden2=8)

print(f"Model structure: {model}")



Model structure: FeedforwardNN()
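
As a hedged illustration of Step 3, here is one way the skeleton could be completed. The use of `nn.Sequential`, the ReLU activations, and the fixed `input_dim=11` in the usage lines are assumptions made for this sketch; other layer layouts and activations are equally valid answers to the exercise.

```python
import torch
import torch.nn as nn


class FeedforwardNN(nn.Module):
    """Two hidden layers with ReLU, followed by a single linear output."""

    def __init__(self, input_dim, hidden1, hidden2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden1),
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, 1),  # one output, no activation: plain regression
        )

    def forward(self, x):
        return self.net(x)


# 11 input features, matching the number of columns in the feature table.
model = FeedforwardNN(input_dim=11, hidden1=12, hidden2=8)

# Sanity check on a random batch of 4 samples.
dummy_batch = torch.randn(4, 11)
print(model(dummy_batch).shape)  # torch.Size([4, 1])
```

The last layer has no activation because the target is an unbounded real-valued score from the network's point of view; with the layer sizes above the model has 257 trainable parameters.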


## Step 4. Define the loss function and optimizer

In [14]:
# Write your solution here
criterion = ...

optimizer = ...
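
A minimal sketch for Step 4, assuming mean squared error as the loss (as listed in the implementation steps) and Adam with a learning rate of `1e-3` as the optimizer. The optimizer and learning rate are assumptions, and the `nn.Linear` model is only a placeholder for your own `FeedforwardNN` instance.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model (assumption); in the notebook, use your FeedforwardNN.
model = nn.Linear(11, 1)

# Mean squared error: the average of the squared prediction errors.
criterion = nn.MSELoss()

# Adam with lr=1e-3 is a common starting point, not a prescribed choice.
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Tiny worked example: errors of 0 and 2 give (0**2 + 2**2) / 2 = 2.0.
pred = torch.tensor([[5.0], [6.0]])
true = torch.tensor([[5.0], [8.0]])
loss_value = criterion(pred, true).item()
print(loss_value)  # 2.0
```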

## Step 5. Train your model (during training, print the value of the loss to check that it is decreasing)

In [11]:
# Write your solution here
...
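
The training step could look roughly like the loop below. This is a sketch under stated assumptions: it uses full-batch updates (no `DataLoader`), 200 epochs, Adam with `lr=1e-2`, and synthetic stand-in tensors so that it runs on its own. In the notebook you would reuse your own `model`, `criterion`, `optimizer`, `X_train`, and `y_train`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)

# Synthetic stand-in data (assumption): a noisy linear target on 11 features.
X_train = torch.randn(160, 11)
true_w = torch.randn(11, 1)
y_train = X_train @ true_w + 0.1 * torch.randn(160, 1)

# Stand-in network with the same layer sizes as the exercise (12 and 8 units).
model = nn.Sequential(
    nn.Linear(11, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1),
)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

num_epochs = 200
losses = []
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(X_train), y_train)
    loss.backward()                  # backpropagate
    optimizer.step()                 # update the weights
    losses.append(loss.item())
    if (epoch + 1) % 50 == 0:
        print(f"Epoch {epoch + 1:3d}/{num_epochs}  loss = {loss.item():.4f}")
```

Storing the per-epoch values in `losses` makes it easy to plot the training curve afterwards with `plt.plot(losses)`.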

## Step 6. Assess model performance
### Note: check whether your model has overfitted the training set (very good performance on the training data but clearly worse performance on the test data)


In [None]:
# Write your code here
...
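
One way to check for overfitting is to compute the same metrics on both splits and compare them. The helper below is a hypothetical sketch: the name `regression_metrics` and the choice of MSE, MAE, and R² are assumptions, and the model and tensors are synthetic stand-ins so that the block runs on its own.

```python
import torch
import torch.nn as nn


def regression_metrics(model, X, y):
    """Return MSE, MAE and R^2 of `model` on (X, y), without tracking gradients."""
    model.eval()
    with torch.no_grad():
        pred = model(X)
    residual = y - pred
    ss_res = torch.sum(residual ** 2)
    ss_tot = torch.sum((y - y.mean()) ** 2)
    return {
        "mse": torch.mean(residual ** 2).item(),
        "mae": torch.mean(torch.abs(residual)).item(),
        "r2": (1.0 - ss_res / ss_tot).item(),
    }


# Stand-in model and data (assumption): targets are the model's own outputs
# plus a little noise, so both splits should score similarly well.
torch.manual_seed(42)
model = nn.Linear(11, 1)
X_train, X_test = torch.randn(160, 11), torch.randn(40, 11)
with torch.no_grad():
    y_train = model(X_train) + 0.1 * torch.randn(160, 1)
    y_test = model(X_test) + 0.1 * torch.randn(40, 1)

train_metrics = regression_metrics(model, X_train, y_train)
test_metrics = regression_metrics(model, X_test, y_test)
print("train:", train_metrics)
print("test: ", test_metrics)
```

A training MSE far below the test MSE (or a training R² far above the test R²) suggests overfitting; similar values on both splits suggest the model generalizes about as well as it fits.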