# PyTorch Demo: Wine Quality Prediction Model Training

This notebook demonstrates how to use PyTorch within Azure Machine Learning Studio notebooks, leveraging their built-in PyTorch environments.

In this example, we use the wine quality dataset to show how you can use Azure Machine Learning for a regression problem with PyTorch neural networks. The goal is to predict wine quality (score 0-10) based on physicochemical properties like acidity, sugar content, alcohol level, etc.

## Learning Objectives:
- Set up and verify PyTorch environment in Azure ML
- Load and preprocess data for deep learning
- Build a simple neural network for regression
- Train and evaluate the model

## Prerequisites
You need a compute instance to run this notebook, because it relies on a custom environment that is not available with "Serverless Spark Compute". If you don't have a compute instance, select **Create compute** on the toolbar to create one first. You can use all the default settings.

## Set your kernel

* If your compute instance is stopped, start it now.  
        
* Once your compute instance is running, make sure that the kernel, shown at the top right, is `Python 3.10 - AzureML`. If it isn't, use the dropdown to select this kernel.

## 1. Environment Setup and Imports

First, let's verify our PyTorch installation and import necessary libraries.

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sklearn for preprocessing and metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# Check PyTorch version and CUDA availability
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")

Because we're using free-subscription compute, CUDA won't be available, as there are no GPUs. If you want to use GPUs, speak to your IT department about access under your university's or institute's Azure tenant.

## 2. Load Wine Quality Dataset

For this demo, we'll use the UCI Wine Quality dataset. You can access this dataset from a web URL; however, in a typical Azure ML scenario you would load it from your Azure storage. Here we'll show both.

In [None]:
# Loading from your Azure storage
# 1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
# 2. Copy the value for workspace, resource group and subscription ID into the code.
# This will only work if you have a dataset called 'wine_quality_kaggle' in your workspace storage

from azureml.core import Workspace, Dataset

subscription_id = "SUBSCRIPTION ID HERE"
resource_group = "RESOURCE GROUP NAME HERE"
workspace_name = "WORKSPACE NAME HERE"

workspace = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(workspace, name='wine_quality_kaggle')
wine_data = dataset.to_pandas_dataframe()

# Display basic information
print(f"Dataset shape: {wine_data.shape}")
print(f"\nColumn names: {wine_data.columns.tolist()}")
print(f"\nQuality distribution:")
print(wine_data['quality'].value_counts().sort_index())

# Show first few rows
wine_data.head()

To generate this code for your own dataset, go to the Data section in ML Studio, find your dataset and select the "Consume" tab. Under the "Interactive Development" drop-down you'll find the code needed to access your data.

For more information on importing data in Azure, head to the explore-data.ipynb notebook from the getting-started section in Azure ML Studio.

<strong>We can also load the data directly from a URL, if that option is available to you</strong>

In [None]:
# Load wine quality dataset from URL
# For this demo, choose this option

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine_data = pd.read_csv(url, sep=';')

# Display basic information
print(f"Dataset shape: {wine_data.shape}")
print(f"\nColumn names: {wine_data.columns.tolist()}")
print(f"\nQuality distribution:")
print(wine_data['quality'].value_counts().sort_index())

# Show first few rows
wine_data.head()

## 3. Data Preprocessing

Preprocessing is a crucial step for neural network training. We separate our features (wine properties) from our target variable (quality score), then split the data into training and testing sets so that model performance can be evaluated on unseen data. Next, we apply standardisation so that every feature has zero mean and unit variance. This ensures that no single feature dominates the learning process simply because of its scale. The scaler is fitted on the training set only and then reused on the test set, so no information about the test data leaks into training. Finally, we convert our data to PyTorch tensors and create DataLoaders that batch the data (and, for training, shuffle it).
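To show what `StandardScaler` actually computes, here is a minimal sketch on made-up toy data (not the wine dataset). It checks the scaler against the formula z = (x - mean) / std, applied per column. All names are prefixed with `toy_` so that running the cell does not overwrite the notebook's own `X_train` or `scaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative toy data (not the wine dataset): two features on very different scales.
# Names are prefixed with toy_ so running this does not overwrite the notebook's variables.
rng = np.random.default_rng(0)
toy_train = np.column_stack([rng.normal(1.0, 0.002, 100), rng.normal(40.0, 30.0, 100)])
toy_test = np.column_stack([rng.normal(1.0, 0.002, 20), rng.normal(40.0, 30.0, 20)])

toy_scaler = StandardScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learns mean_ and scale_ from training rows only
toy_test_scaled = toy_scaler.transform(toy_test)        # reuses the training statistics

# Standardisation is z = (x - mean) / std, applied to each feature (column) separately
manual = (toy_train - toy_train.mean(axis=0)) / toy_train.std(axis=0)
print("Matches manual formula:", np.allclose(toy_train_scaled, manual))
print("Scaled means:", toy_train_scaled.mean(axis=0).round(6))
print("Scaled stds:", toy_train_scaled.std(axis=0).round(6))
```

Before scaling, the second column's spread is thousands of times larger than the first. After scaling, both columns contribute on the same footing.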

In [None]:
# Separate features and target
X = wine_data.drop('quality', axis=1).values
y = wine_data['quality'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardise features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.FloatTensor(y_test)

# Create PyTorch datasets and dataloaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Number of features: {X_train.shape[1]}")

<strong>What is a tensor?</strong>

A tensor is a mathematical object that generalises familiar concepts like numbers, vectors, and matrices. Think of it as a container for data that can have different dimensions - a single number (0D), a list of numbers (1D), a table of numbers (2D), or even higher-dimensional arrangements. In machine learning, tensors are fundamental because they provide a unified way to represent and manipulate the complex, multi-dimensional data that neural networks process, from simple input features to the intricate weight matrices that define how networks learn and make predictions.
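The dimensions described above map directly onto `ndim` and `shape` in PyTorch. The following sketch uses arbitrary values. The batch shape assumes the 11 input features of the red wine CSV and the batch size of 32 used earlier. In this notebook, `X_train_tensor` is 2D (samples x features) and `y_train_tensor` is 1D (one score per sample).

```python
import torch

scalar = torch.tensor(5.0)                       # 0D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])           # 1D: a list of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2D: a table of numbers
batch = torch.zeros(32, 11)                      # 2D: 32 samples x 11 features, the shape of one training batch
stacked = torch.zeros(4, 32, 11)                 # 3D: four such batches stacked together

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("batch", batch), ("stacked", stacked)]:
    print(f"{name:>7}: ndim={t.ndim}, shape={tuple(t.shape)}")
```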

## 4. Define the Neural Network Model

Create a simple feedforward neural network for wine quality prediction.

**Neural Network Architecture Explanation:**

Our `WineQualityNet` is a feedforward neural network with four fully connected (linear) layers: input (11 features for the red wine data) → 64 → 32 → 16 → 1 neuron. The first layer expands the inputs to 64 units. Each layer after that narrows the representation, creating a funnel-like structure that learns increasingly abstract features of the wine properties before producing a single quality prediction. ReLU (Rectified Linear Unit) activation functions between layers introduce non-linearity, allowing the network to learn complex patterns. Dropout layers (20% probability) after the first two hidden layers help prevent overfitting. During training, they randomly set about 20% of the activations to zero and scale the rest up to compensate, which forces the network to be more robust and generalisable. Dropout is only active in training mode; calling `model.eval()` switches it off.
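This sketch illustrates two claims from the paragraph above using standalone toy objects (`drop` and `funnel` are illustrative names, not the notebook's model). First, `nn.Dropout(0.2)` zeroes roughly 20% of values in training mode and scales the survivors by 1 / (1 - 0.2), while leaving inputs untouched in evaluation mode. Second, the funnel's parameter count follows from n_in * n_out weights plus n_out biases per linear layer. The count assumes 11 input features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.2)
activations = torch.ones(10_000)

drop.train()  # training mode: dropout is active
train_out = drop(activations)
zeroed_fraction = (train_out == 0).float().mean().item()
survivors = train_out[train_out != 0]
print(f"Fraction zeroed: {zeroed_fraction:.3f}")                    # close to, but not exactly, 0.2
print(f"Surviving values scaled to: {survivors.max().item():.4f}")  # 1 / (1 - 0.2) = 1.25

drop.eval()  # evaluation mode: dropout passes inputs through unchanged
eval_out = drop(activations)
print("Unchanged in eval mode:", torch.equal(eval_out, activations))

# Parameter count of the funnel: each nn.Linear(n_in, n_out) has n_in * n_out weights + n_out biases
sizes = [11, 64, 32, 16, 1]  # assumes the 11 input features of the red wine CSV
by_formula = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
funnel = nn.Sequential(*[nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])])
by_pytorch = sum(p.numel() for p in funnel.parameters())
print("Parameters (formula vs PyTorch):", by_formula, by_pytorch)
```

ReLU and dropout have no parameters. The count therefore matches the "Total parameters" printed for `WineQualityNet` when the input has 11 features.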

In [None]:
class WineQualityNet(nn.Module):
    def __init__(self, input_dim):
        super(WineQualityNet, self).__init__()
        
        # Define layers
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 16)
        self.fc4 = nn.Linear(16, 1)
        
        # Activation and dropout
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x

# Initialize model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = WineQualityNet(input_dim=X_train.shape[1]).to(device)

# Display model architecture
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")

## 5. Training Setup

Configure loss function, optimiser, and training parameters.

**Key Components Explained:**

**MSE (Mean Squared Error):** Our loss function that measures the average squared difference between predicted and actual wine quality scores. MSE is ideal for regression problems as it penalises larger errors more heavily and provides smooth gradients for optimisation.
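Here is a small worked example of the MSE calculation on made-up scores (not model output). It compares a hand computation with `nn.MSELoss` and checks the gradient with respect to each prediction, which is 2 * error / n.

```python
import torch
import torch.nn as nn

# Illustrative scores made up for this example (not model output)
predicted = torch.tensor([5.2, 6.1, 4.0, 7.5])
actual = torch.tensor([5.0, 6.0, 6.0, 7.0])

errors = predicted - actual                     # roughly [0.2, 0.1, -2.0, 0.5]
manual_mse = (errors ** 2).mean()               # (0.04 + 0.01 + 4.0 + 0.25) / 4
builtin_mse = nn.MSELoss()(predicted, actual)
print(f"Manual MSE: {manual_mse.item():.4f}, nn.MSELoss: {builtin_mse.item():.4f}")

# Gradient of MSE with respect to each prediction is 2 * error / n, so it grows with the error
p = predicted.clone().requires_grad_(True)
nn.MSELoss()(p, actual).backward()
print("Gradient:", p.grad)
```

The single error of 2.0 contributes 4.0 of the 4.30 total squared error, which is how MSE penalises large mistakes more heavily than small ones.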

**Adam Optimiser:** An adaptive learning rate optimiser that combines ideas from momentum and RMSprop. Adam keeps running averages of each parameter's gradients and squared gradients and uses them to scale the step size for each parameter individually, so it often trains neural networks well with little hyperparameter tuning.
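The following minimal sketch makes the "per-parameter step size" idea concrete. It implements the standard Adam update rule by hand (no weight decay, no AMSGrad) and checks it against `torch.optim.Adam` on a made-up quadratic loss. The helper names `manual_adam` and `grad_fn` and the toy loss are illustrative assumptions, not part of the notebook. float64 is used so the two results can be compared tightly.

```python
import torch

def manual_adam(w0, grad_fn, lr=0.001, betas=(0.9, 0.999), eps=1e-8, steps=5):
    """Standard Adam update (no weight decay, no AMSGrad), written out by hand for illustration."""
    w = w0.clone()
    m = torch.zeros_like(w)  # first moment: running mean of gradients (the momentum part)
    v = torch.zeros_like(w)  # second moment: running mean of squared gradients (the RMSprop part)
    b1, b2 = betas
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)  # bias correction, because m and v start at zero
        v_hat = v / (1 - b2 ** t)
        # Element-wise division: every parameter gets its own effective step size
        w = w - lr * m_hat / (v_hat.sqrt() + eps)
    return w

# Toy loss sum(c * w**2) whose curvature differs by a factor of 100 between elements
c = torch.tensor([1.0, 100.0, 1.0], dtype=torch.float64)
w0 = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)

def grad_fn(w):
    return 2 * c * w  # analytic gradient of sum(c * w**2)

by_hand = manual_adam(w0, grad_fn)

# Same problem with PyTorch's built-in optimiser
w = w0.clone().requires_grad_(True)
adam = torch.optim.Adam([w], lr=0.001)
for _ in range(5):
    adam.zero_grad()
    loss = (c * w ** 2).sum()
    loss.backward()
    adam.step()

print("Manual Adam:     ", by_hand)
print("torch.optim.Adam:", w.detach())
print("Total movement per element:", (w0 - by_hand).abs())
```

The middle element's gradient is about 100 times larger than the others. Even so, all three elements move by roughly the same amount (about `5 * lr`), because Adam divides each gradient by its own running root-mean-square.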

**Training Functions:** The `train_epoch` function performs forward propagation (calculating predictions), computes the loss, and uses backpropagation to update model weights. The `validate` function evaluates model performance on unseen data without updating weights, helping us monitor for overfitting.
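One detail in `train_epoch` that is easy to miss is the `optimizer.zero_grad()` call. PyTorch adds each new gradient to whatever is already stored in `.grad`, so the gradients must be reset before every backward pass. The sketch below shows this accumulation on a single illustrative parameter `w_demo` with a toy loss L = 3 * w.

```python
import torch

# One illustrative parameter and a toy loss L = 3 * w, so dL/dw = 3 on every call
w_demo = torch.tensor(1.0, requires_grad=True)

(3 * w_demo).backward()
first = w_demo.grad.item()

(3 * w_demo).backward()   # no reset in between, so the new gradient is added to the old one
second = w_demo.grad.item()

w_demo.grad = None        # the reset optimizer.zero_grad() performs (set_to_none=True is the default in PyTorch 2.x)
(3 * w_demo).backward()
after_reset = w_demo.grad.item()

print(first, second, after_reset)
```

Without the reset, each batch's update in `train_epoch` would be based on the sum of all previous batches' gradients rather than the current one.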

In [None]:
# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training parameters
num_epochs = 50
train_losses = []
val_losses = []

# Training function
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    
    for batch_X, batch_y in dataloader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        
        # Forward pass
        outputs = model(batch_X).squeeze(1)  # (batch, 1) -> (batch,), even when the batch has one sample
        loss = criterion(outputs, batch_y)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    return total_loss / len(dataloader)

# Validation function
def validate(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0
    
    with torch.no_grad():
        for batch_X, batch_y in dataloader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            outputs = model(batch_X).squeeze(1)  # (batch, 1) -> (batch,), even when the batch has one sample
            loss = criterion(outputs, batch_y)
            total_loss += loss.item()
    
    return total_loss / len(dataloader)

## 6. Train the Model

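Train the neural network and monitor its performance by calling the functions defined above once per epoch. For simplicity, the test set also serves as the validation set here. In practice, you would hold out a separate validation split so that the final evaluation in Section 8 uses data that has not influenced any training decisions.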

In [None]:
# Training loop
print("Starting training...")
print("-" * 50)

for epoch in range(num_epochs):
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    val_loss = validate(model, test_loader, criterion, device)
    
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

print("-" * 50)
print("Training completed!")

## 7. Visualise Training Progress

**What we expect to see:** 
Both training and validation loss should decrease over time, indicating the model is learning. Ideally, both curves should converge to a similar low value. If the training loss continues decreasing while validation loss increases, this indicates overfitting. If both curves plateau at a high value, the model may be underfitting and need more complexity or different hyperparameters.
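The overfitting rule of thumb above can be written as a small heuristic. `diagnose_curves` is a hypothetical helper, and its `patience` threshold is an assumption, not a standard value. It finds the epoch with the lowest validation loss and flags possible overfitting if validation loss has not improved for `patience` epochs while training loss kept falling. It does not try to detect underfitting, because judging whether a plateau is "high" needs a reference point, such as the MSE of always predicting the mean quality. The curves used to exercise it are made up.

```python
def diagnose_curves(train_losses, val_losses, patience=5):
    """Hypothetical helper: a rough heuristic for reading training and validation loss curves.

    Returns the 1-based epoch with the lowest validation loss and a short verdict.
    `patience` (an assumed threshold) is how many epochs validation loss may go
    without improving before we flag possible overfitting.
    """
    best = min(range(len(val_losses)), key=lambda i: val_losses[i])
    epochs_since_best = len(val_losses) - 1 - best
    train_still_falling = train_losses[-1] < train_losses[best]
    if epochs_since_best >= patience and train_still_falling:
        verdict = "possible overfitting: validation loss stopped improving while training loss kept falling"
    else:
        verdict = "no clear sign of overfitting"
    return best + 1, verdict

# Made-up curves purely to exercise the helper
overfit_train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15]
overfit_val = [1.0, 0.85, 0.7, 0.65, 0.66, 0.68, 0.7, 0.72, 0.75, 0.78]
healthy_val = [1.0, 0.85, 0.7, 0.65, 0.6, 0.57, 0.55, 0.54, 0.53, 0.52]

print(diagnose_curves(overfit_train, overfit_val))
print(diagnose_curves(overfit_train, healthy_val))
```

After training, the same helper could be called as `diagnose_curves(train_losses, val_losses)` on the lists filled in by the training loop.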

In [None]:
# Plot training history
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True)
plt.show()

## 8. Model Evaluation

Evaluate the model's performance on the test set.

**Metrics Explained:**

**MSE (Mean Squared Error):** The average of squared differences between predicted and actual values. Lower values indicate better performance.

**RMSE (Root Mean Squared Error):** The square root of MSE, expressed in the same units as our target variable (wine quality scores). On the 0-10 quality scale, an RMSE of 0.5-0.7 corresponds to errors on the order of half to two-thirds of a quality point, which would be considered good for this dataset.

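**R² Score:** The coefficient of determination, indicating how much of the variance in the target variable is explained by the model. Values closer to 1.0 indicate better performance. A value of 0 means the model does no better than always predicting the mean quality, and negative values mean it does worse.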
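The three metrics can be checked by hand on a few made-up scores (not results from the model above). This sketch uses the same scikit-learn functions as the evaluation cell. It computes R² from its definition, 1 minus the ratio of squared residuals to squared deviations from the mean, and confirms that a mean-only baseline scores exactly 0.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative scores only (not results from the model above)
y_true = np.array([5, 6, 6, 7, 5, 4], dtype=float)
y_pred = np.array([5.3, 5.8, 6.4, 6.6, 5.1, 4.9])

mse_demo = mean_squared_error(y_true, y_pred)
rmse_demo = np.sqrt(mse_demo)  # back in quality-score units
# R² = 1 - (sum of squared residuals) / (sum of squared deviations from the mean)
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MSE: {mse_demo:.4f}, RMSE: {rmse_demo:.4f}")
print(f"R² (sklearn): {r2_score(y_true, y_pred):.4f}, R² (formula): {r2_manual:.4f}")

# A baseline that always predicts the mean is the reference R² is measured against
baseline = np.full_like(y_true, y_true.mean())
print(f"R² of mean-only baseline: {r2_score(y_true, baseline):.4f}")
```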

**What we expect to see:** A scatter plot where points cluster around the diagonal line (perfect predictions), with minimal spread. The error distribution should be roughly normal and centred around zero.

In [None]:
# Make predictions
model.eval()
with torch.no_grad():
    X_test_device = X_test_tensor.to(device)
    predictions = model(X_test_device).squeeze().cpu().numpy()

# Calculate metrics
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, predictions)

print(f"Test Set Performance:")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R² Score: {r2:.4f}")

# Visualization of predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test, predictions, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Quality')
plt.ylabel('Predicted Quality')
plt.title('Actual vs Predicted Wine Quality')
plt.grid(True)
plt.show()

# Distribution of prediction errors
errors = predictions - y_test
plt.figure(figsize=(10, 6))
plt.hist(errors, bins=30, edgecolor='black')
plt.xlabel('Prediction Error')
plt.ylabel('Frequency')
plt.title('Distribution of Prediction Errors')
plt.grid(True, alpha=0.3)
plt.show()

## 9. Save the Model to the local folder

You would typically save the model to a local file so that you can reload it later for further training or to make predictions on new data.

In [None]:
# Save the trained model as a .pth file in the current directory
model_path = 'wine_quality_model.pth'
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scaler': scaler,
    'input_dim': X_train.shape[1],
    'train_losses': train_losses,
    'val_losses': val_losses
}, model_path)


print(f"Model saved to {model_path}")

To load the model back in later:

In [None]:
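# Choose the device first so the checkpoint can be mapped onto it
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the checkpoint dictionary saved above.
# weights_only=False is required on PyTorch 2.6+ because the checkpoint also contains
# the pickled scikit-learn scaler; only load checkpoint files from sources you trust.
checkpoint = torch.load('./wine_quality_model.pth', map_location=device, weights_only=False)

# Grab the input dimensions and initialise the model
input_dim = checkpoint['input_dim']
model = WineQualityNet(input_dim=input_dim)

# Load the model state dictionary and move the model to the appropriate device
model.load_state_dict(checkpoint['model_state_dict'])
model = model.to(device)

# Restore the fitted scaler so new data is standardised exactly like the training data
scaler = checkpoint['scaler']

# Set to evaluation mode for predictions (this switches off the dropout layers so all neurons are active).
# If you are training further, call model.train() instead, create a new Adam optimiser for this model
# and restore its state with optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
model.eval()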
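Once the model and scaler are restored, predicting on new data means applying the same scaler before the forward pass. The sketch below wraps that in a hypothetical helper, `predict_quality`. To keep the cell self-contained, it runs the helper on stand-ins: an untrained 11-input network and a scaler fitted on random numbers. These are not the notebook's trained model, so the predicted values themselves are meaningless.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

def predict_quality(model, scaler, samples, device):
    """Hypothetical helper: predict quality scores for raw (unscaled) feature rows.

    samples: array of shape (n_samples, n_features), columns in the same order as training.
    """
    scaled = scaler.transform(np.asarray(samples, dtype=float))  # same standardisation as training
    inputs = torch.tensor(scaled, dtype=torch.float32, device=device)
    model.eval()                                                 # make sure dropout is off
    with torch.no_grad():
        return model(inputs).squeeze(1).cpu().numpy()            # shape (n_samples,)

# Stand-ins so this cell runs on its own: an untrained 11-input network and a scaler fit on random data
rng = np.random.default_rng(0)
demo_scaler = StandardScaler().fit(rng.normal(size=(50, 11)))
demo_model = nn.Sequential(nn.Linear(11, 8), nn.ReLU(), nn.Linear(8, 1))
new_rows = rng.normal(size=(3, 11))
demo_preds = predict_quality(demo_model, demo_scaler, new_rows, torch.device("cpu"))
print(demo_preds.shape)
```

With the restored checkpoint, the call would be `predict_quality(model, scaler, new_samples, device)`, where `new_samples` is a hypothetical array holding the 11 physicochemical measurements per wine, in the training column order.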

## 10. Summary and Next Steps

### What we've accomplished:
- ✅ Set up PyTorch in Azure ML environment
- ✅ Loaded and preprocessed wine quality data
- ✅ Built a simple neural network for regression
- ✅ Trained and evaluated the model
- ✅ Visualised results and saved the model