# **Neural Network: Wine Classification**

## **Introduction**

In this project, I explore the basics of neural networks using PyTorch. I use the **Wine** dataset from **sklearn** and aim to predict each wine's class from its chemical features. This is my first project in this domain, and I am excited to share what I have learned along the way.

Before diving into the code, let's revisit the task and dataset. The Wine dataset describes the chemical constituents of wines grown in the same region of Italy but derived from three different cultivars (types of grape plants). It provides 13 features based on these constituents and a target label representing the wine's class. The classes are as follows:

- Class 0: Wine from the first cultivar
- Class 1: Wine from the second cultivar
- Class 2: Wine from the third cultivar

Traditionally, this would be a classification task where the goal is to assign one of the discrete class labels (0, 1, or 2) based on the wine's chemical composition. However, in this approach, I have converted the class labels to floating-point numbers to frame this as a regression task.

## **Setup and Data Loading**

First, we need to import the necessary libraries and load the data.


```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data = load_wine()
X = data.data
y = data.target.astype(float)  # class labels 0, 1, 2 cast to floats for the regression framing

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (statistics are fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

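Here, the Wine dataset is loaded and split into training (80%) and test (20%) sets. The features are then standardized to zero mean and unit variance. This matters because the raw features are on very different scales, and features with large magnitudes would otherwise tend to dominate the gradient updates. The scaler is fit on the training set only and then applied to the test set, so no information from the test data leaks into training.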
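`StandardScaler` applies a per-feature z-score, z = (x - mean) / std, where the mean and standard deviation come from the training set (sklearn uses the population standard deviation, `ddof=0`). The NumPy sketch below reproduces that by hand on a small made-up array, so the "fit on train, apply to test" pattern is explicit. The values are illustrative only and are not taken from the Wine dataset.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales
X_train_toy = np.array([[1.0, 200.0],
                        [2.0, 400.0],
                        [3.0, 600.0]])
X_test_toy = np.array([[2.5, 300.0]])

# "fit": per-feature statistics computed from the training split only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same training statistics are applied to both splits
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
print(X_test_scaled)
```

If sklearn is installed, `StandardScaler().fit(X_train_toy).transform(X_test_toy)` should agree with `X_test_scaled`.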

## **Data Conversion to PyTorch Tensors**

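Next, the data is converted to PyTorch tensors, the primary data structure PyTorch uses for multidimensional arrays. The arrays are cast to `float32`: NumPy defaults to `float64`, while `nn.Linear` weights are `float32` by default. The targets are also reshaped with `.view(-1, 1)` into a column of shape `(N, 1)`, which matches the shape of the model's output.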

```python
# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)
```
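The `.view(-1, 1)` reshape matters because `nn.MSELoss` compares its two arguments elementwise after broadcasting, and PyTorch follows NumPy's broadcasting rules. Suppose the targets were left as a flat vector of shape `(N,)` while the model outputs `(N, 1)`. The difference would then broadcast to an `(N, N)` matrix, and the loss would average over every prediction/target pair. PyTorch emits a UserWarning about the mismatched sizes in that case but still computes the broadcast result. The NumPy sketch below uses toy values to show the effect.

```python
import numpy as np

outputs_demo = np.array([[0.0], [1.0], [2.0]])  # model output, shape (N, 1)
targets_flat = np.array([0.0, 1.0, 2.0])        # shape (N,): no .view(-1, 1)
targets_col = targets_flat.reshape(-1, 1)       # shape (N, 1): like .view(-1, 1)

diff_wrong = outputs_demo - targets_flat  # broadcasts to (N, N)
diff_right = outputs_demo - targets_col   # stays (N, 1)

mse_wrong = np.mean(diff_wrong ** 2)
mse_right = np.mean(diff_right ** 2)

print(diff_wrong.shape, mse_wrong)  # a perfect prediction still gets a non-zero loss
print(diff_right.shape, mse_right)  # loss is 0.0 for a perfect prediction
```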

## **Neural Network Model**

Now, let's define the neural network model. We will start with a simple two-layer model.

```python
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # No activation between the layers, so the model as a whole is a linear map
        x = self.layer1(x)
        x = self.layer2(x)
        return x

# Initialize the model
input_size = X_train.shape[1]  # 13 features
hidden_size = 10
output_size = 1
model = NeuralNet(input_size, hidden_size, output_size)
```

The **'NeuralNet'** class defines a simple neural network with two fully connected layers, created to perform a regression task on the wine dataset.

- **Layer 1**: A linear layer that transforms the input features into an intermediate representation of size 10.
- **Layer 2**: Another linear layer that takes the intermediate representation and outputs a single continuous value.

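The forward pass takes an input, passes it through both layers in sequence, and produces the final output. The model is initialized with the number of input features (13), an arbitrarily chosen hidden size of 10, and an output size of 1. Note that there is no activation function between the two layers, so their composition is still a linear function of the input. The model can therefore represent nothing beyond a single linear layer (that is, linear regression), which makes it a simple baseline rather than a network able to learn non-linear patterns.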
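To see why two stacked `nn.Linear` layers collapse into one, write layer 1 as h = W1 x + b1 and layer 2 as y = W2 h + b2. Substituting gives y = (W2 W1) x + (W2 b1 + b2), which is a single linear layer with weight W2 W1 and bias W2 b1 + b2. The NumPy sketch below checks this numerically. It uses random, purely illustrative weights with the same shapes as the model above (13 inputs, 10 hidden units, 1 output). Inputs are batched row vectors, following the same convention as `nn.Linear`, which computes x Wᵀ + b.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 13, 10, 1

# Random weights in nn.Linear's layout: weight is (out_features, in_features)
W1 = rng.normal(size=(hidden_size, input_size))
b1 = rng.normal(size=hidden_size)
W2 = rng.normal(size=(output_size, hidden_size))
b2 = rng.normal(size=output_size)

x = rng.normal(size=(5, input_size))  # a batch of 5 hypothetical inputs

# Two linear layers in sequence (each computes x W^T + b), no activation in between
two_layers = (x @ W1.T + b1) @ W2.T + b2

# A single equivalent linear layer
W = W2 @ W1       # shape (output_size, input_size)
b = W2 @ b1 + b2  # shape (output_size,)
one_layer = x @ W.T + b

print(np.allclose(two_layers, one_layer))  # True
```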

## **Model Training**

Next, we train the model using Mean Squared Error (MSE) as the loss function and Adam as the optimizer.

```python
# Set hyperparameters and loss function
learning_rate = 0.01
epochs = 5000
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(epochs):
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 500 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")
```

```
Epoch [500/5000], Loss: 0.0580
Epoch [1000/5000], Loss: 0.0580
Epoch [1500/5000], Loss: 0.0580
Epoch [2000/5000], Loss: 0.0580
Epoch [2500/5000], Loss: 0.0580
Epoch [3000/5000], Loss: 0.0580
Epoch [3500/5000], Loss: 0.0580
Epoch [4000/5000], Loss: 0.0580
Epoch [4500/5000], Loss: 0.0582
Epoch [5000/5000], Loss: 0.0580
```


- **Hyperparameters**: The learning rate is set to 0.01, and the model is trained for 5000 epochs. The learning rate controls the step size during optimization, while the number of epochs defines how many times the learning algorithm works through the entire training dataset. Since the whole training set is passed through the model at once, each epoch here is a single full-batch gradient update.

- **Loss Function**: The MSE loss function measures the average squared difference between the predicted values and the actual values. Squaring penalizes large errors more heavily than small ones, and MSE is a standard choice for regression tasks.

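- **Optimizer**: The Adam optimizer is a popular choice because it adapts the step size of each parameter individually, based on running averages of the gradient and of its square. This makes it relatively robust to sparse or noisy gradients.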

- **Training Loop**: In each epoch of the training loop, the model performs the following steps:
  
    1. **Forward Pass**: Compute the predicted values.
    2. **Calculate Loss**: Compute the loss using the predicted values and actual labels.
    3. **Backward Pass**: Compute the gradient of the loss with respect to the parameters.
    4. **Update Weights**: Adjust the weights to minimize the loss.

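The training loss is printed every 500 epochs to track progress. Here the loss has already plateaued at about 0.058 by epoch 500 and barely changes afterwards. This suggests that the (effectively linear) model reached its best fit early and that the remaining epochs add little.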
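To make the four training-loop steps and the Adam update concrete, here is a minimal NumPy sketch. It trains a single linear layer, which is the function class the two-layer `NeuralNet` above can represent because it has no activation. Training uses full-batch MSE and an Adam update written out by hand. The sketch assumes PyTorch's default Adam settings (betas 0.9 and 0.999, eps 1e-8), uses hypothetical synthetic data, and derives the gradients by hand instead of using autograd. It illustrates the mechanics and is not a drop-in replacement for `torch.optim.Adam`.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic regression data (not the Wine dataset)
n_samples, n_features = 100, 13
X_syn = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y_syn = X_syn @ true_w + 0.5 + 0.1 * rng.normal(size=n_samples)

# Parameters of a single linear layer, initialised at zero
w = np.zeros(n_features)
b = 0.0

# Adam hyperparameters (PyTorch defaults for betas and eps) and moment estimates
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
m_w, v_w = np.zeros(n_features), np.zeros(n_features)
m_b, v_b = 0.0, 0.0

losses = []
steps = 2000
for t in range(1, steps + 1):
    # 1. Forward pass: compute the predicted values
    pred = X_syn @ w + b
    # 2. Calculate loss: mean squared error
    err = pred - y_syn
    loss = np.mean(err ** 2)
    losses.append(loss)
    # 3. Backward pass: gradients of the MSE, derived by hand
    grad_w = 2.0 * X_syn.T @ err / n_samples
    grad_b = 2.0 * np.mean(err)
    # 4. Update weights: Adam with bias-corrected first and second moment estimates
    m_w = beta1 * m_w + (1 - beta1) * grad_w
    v_w = beta2 * v_w + (1 - beta2) * grad_w ** 2
    m_b = beta1 * m_b + (1 - beta1) * grad_b
    v_b = beta2 * v_b + (1 - beta2) * grad_b ** 2
    w -= lr * (m_w / (1 - beta1 ** t)) / (np.sqrt(v_w / (1 - beta2 ** t)) + eps)
    b -= lr * (m_b / (1 - beta1 ** t)) / (np.sqrt(v_b / (1 - beta2 ** t)) + eps)

    if t % 500 == 0:
        print(f"Step [{t}/{steps}], Loss: {loss:.4f}")
```

In PyTorch, step 3 is handled by `loss.backward()` and step 4 by `optimizer.step()`, which is why the notebook's loop is so short.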

## **Model Evaluation**

Now, let's evaluate the model on the test data.

```python
# Evaluate the model
model.eval()
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    test_loss = criterion(test_outputs, y_test_tensor)
    print(f"Test Loss: {test_loss.item():.4f}")
```

```
Test Loss: 0.0685
```


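In the evaluation phase, **'model.eval()'** first switches the model to evaluation mode. This only changes the behavior of layers such as dropout and batch normalization, neither of which this model contains, but calling it is good practice. Gradient tracking is then turned off with the context manager **'torch.no_grad()'**, since we are not updating the model's weights any further. This reduces memory consumption and speeds up computation.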

Within this context, the test data, **'X_test_tensor'**, is passed through the model to get predictions. We then calculate and print the MSE between these predictions and the actual values, **'y_test_tensor'**, which gives a test loss of 0.0685. This is slightly higher than the final training loss of 0.0580, as is typical on unseen data. The lower the loss, the closer the predictions are to the true labels.
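The sketch below shows what `model.eval()` and `torch.no_grad()` actually change. It uses a hypothetical stand-in model with the same 13, 10, 1 layer sizes. A dropout layer is added purely so that the effect of `eval()` is visible, since the notebook's `NeuralNet` has no dropout. The `training` flag flips from `True` to `False`, dropout becomes a no-op so that repeated evaluations agree, and tensors produced inside `no_grad()` do not track gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model: same layer sizes as NeuralNet, plus dropout for illustration
demo_model = nn.Sequential(nn.Linear(13, 10), nn.Dropout(p=0.5), nn.Linear(10, 1))
x = torch.randn(4, 13)

print(demo_model.training)  # True: modules start in training mode

out_train = demo_model(x)
print(out_train.requires_grad)  # True: autograd records this computation

demo_model.eval()
with torch.no_grad():
    out_eval_1 = demo_model(x)
    out_eval_2 = demo_model(x)

print(demo_model.training)                  # False after eval()
print(out_eval_1.requires_grad)             # False inside no_grad()
print(torch.equal(out_eval_1, out_eval_2))  # True: dropout is disabled in eval mode
```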

## **Predictions**

Finally, let's print out some of the model's predictions along with the true values.

```python
# Print predictions and true values
num_examples = 10
predicted_values = test_outputs[:num_examples].squeeze().numpy()
true_values = y_test_tensor[:num_examples].squeeze().numpy()

for i in range(num_examples):
    print(f"Example {i+1}")
    print(f"Predicted Value: {predicted_values[i]:.4f}")
    print(f"True Value: {true_values[i]}")
    print("------")
```

```
Example 1
Predicted Value: 0.1695
True Value: 0.0
------
Example 2
Predicted Value: 0.3071
True Value: 0.0
------
Example 3
Predicted Value: 1.4919
True Value: 2.0
------
Example 4
Predicted Value: 0.2259
True Value: 0.0
------
Example 5
Predicted Value: 0.9322
True Value: 1.0
------
Example 6
Predicted Value: 0.1266
True Value: 0.0
------
Example 7
Predicted Value: 0.9608
True Value: 1.0
------
Example 8
Predicted Value: 1.9369
True Value: 2.0
------
Example 9
Predicted Value: 0.4734
True Value: 1.0
------
Example 10
Predicted Value: 1.4022
True Value: 2.0
------
```


This section outputs the predicted class values as floating-point numbers alongside the true class values. Most predictions land near the correct label, but several are far off. Example 9 is predicted as 0.4734 for a true value of 1.0, and Examples 3 and 10 are predicted at about 1.4 to 1.5 for a true value of 2.0. If the predictions were rounded to the nearest class, these three would be misclassified, leaving 7 of the 10 examples correct. There is clearly room for improvement.
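Because the model outputs continuous values, one simple way to map them back to class labels is to round to the nearest integer and clip to the valid range of 0 to 2. This is an assumption made here for illustration; the original notebook only reports MSE. The sketch below applies the rule to the ten predictions printed above. The resulting accuracy covers those ten examples only, not the full test set.

```python
import numpy as np

# The ten predictions and true values printed above (basic model)
predicted = np.array([0.1695, 0.3071, 1.4919, 0.2259, 0.9322,
                      0.1266, 0.9608, 1.9369, 0.4734, 1.4022])
true = np.array([0, 0, 2, 0, 1, 0, 1, 2, 1, 2])

# Round to the nearest class index and clip to the valid labels {0, 1, 2}
predicted_class = np.clip(np.rint(predicted), 0, 2).astype(int)
accuracy = np.mean(predicted_class == true)

print(predicted_class)  # [0 0 1 0 1 0 1 2 0 1]
print(f"Accuracy on these 10 examples: {accuracy:.0%}")  # 70%
```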

## **Enhanced Neural Network Model**

Given the foundation of this basic neural network, we can now work on enhancing the model by introducing additional layers and non-linear activation functions. This aims to capture more complex patterns in the data and improve the model's predictive performance.

## **Model Definition**

Let's define the enhanced neural network model with three layers and ReLU activation functions.

```python
class EnhancedNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size1)
        self.layer2 = nn.Linear(hidden_size1, hidden_size2)
        self.layer3 = nn.Linear(hidden_size2, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.layer3(x)
        return x

hidden_size1 = 10
hidden_size2 = 10
enhanced_model = EnhancedNeuralNet(input_size, hidden_size1, hidden_size2, output_size)
```

In this enhanced model, a ReLU activation is applied after each of the first two layers. Because ReLU is non-linear, the layers no longer collapse into a single linear map. This allows the network to capture more complex relationships in the data.
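Here is a quick numerical check of that claim, written as a NumPy sketch with hypothetical random weights in the same 13, 10, 10, 1 layout as `EnhancedNeuralNet`. Any affine map f(x) = Wx + c satisfies f(x1) + f(x2) - f(x1 + x2) - f(0) = 0. Running the layers without activations keeps that quantity at zero, while inserting ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [13, 10, 10, 1]  # same layer sizes as EnhancedNeuralNet above
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]


def relu(z):
    return np.maximum(z, 0.0)


def identity(z):
    return z


def forward(x, activation):
    """Apply each layer in turn, with `activation` after every layer except the last."""
    for i, (W, bias) in enumerate(zip(weights, biases)):
        x = x @ W.T + bias
        if i < len(weights) - 1:
            x = activation(x)
    return x


def affine_gap(f, x1, x2):
    """f(x1) + f(x2) - f(x1 + x2) - f(0), which is exactly zero for any affine f."""
    return f(x1) + f(x2) - f(x1 + x2) - f(np.zeros_like(x1))


x1, x2 = rng.normal(size=13), rng.normal(size=13)
gap_linear = affine_gap(lambda x: forward(x, identity), x1, x2)
gap_relu = affine_gap(lambda x: forward(x, relu), x1, x2)

print(gap_linear)  # zero up to floating-point error: the layers form one affine map
print(gap_relu)    # generally non-zero: ReLU makes the network non-linear
```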

## **Training the Enhanced Model**

Now the enhanced model is trained, this time with L2 regularization (through Adam's `weight_decay` argument) and a learning rate scheduler.

```python
# Set hyperparameters
learning_rate = 0.01
weight_decay = 0.01  # L2 regularization
epochs = 5000
criterion = nn.MSELoss()
optimizer = optim.Adam(enhanced_model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9)

# Train the enhanced model
for epoch in range(epochs):
    outputs = enhanced_model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

    if (epoch+1) % 500 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")
```

```
Epoch [500/5000], Loss: 0.0090
Epoch [1000/5000], Loss: 0.0087
Epoch [1500/5000], Loss: 0.0084
Epoch [2000/5000], Loss: 0.0084
Epoch [2500/5000], Loss: 0.0084
Epoch [3000/5000], Loss: 0.0084
Epoch [3500/5000], Loss: 0.0069
Epoch [4000/5000], Loss: 0.0068
Epoch [4500/5000], Loss: 0.0061
Epoch [5000/5000], Loss: 0.0061
```


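Here, L2 regularization penalizes large weights, which helps reduce overfitting. With `optim.Adam`, `weight_decay=0.01` adds `0.01 * param` to each parameter's gradient before the Adam update. The `StepLR` scheduler multiplies the learning rate by `gamma=0.9` every `step_size=1000` epochs, because `scheduler.step()` is called once per epoch. As a result, the learning rate falls from 0.01 to about 0.0066 for the last 1000 epochs. Gradually shrinking the step size aims to let the optimizer settle into a better minimum.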
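The `StepLR` schedule has a simple closed form. When `scheduler.step()` is called once per epoch, the learning rate in effect at (0-indexed) epoch e is `lr * gamma ** (e // step_size)`. The plain-Python sketch below evaluates this formula for the settings above. It mirrors the documented StepLR behavior rather than calling PyTorch, and `step_lr` is a hypothetical helper name.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate in effect at a 0-indexed epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)


base_lr, gamma, step_size, epochs = 0.01, 0.9, 1000, 5000

for start in range(0, epochs, step_size):
    lr = step_lr(base_lr, gamma, step_size, start)
    print(f"Epochs {start:4d}-{start + step_size - 1:4d}: lr = {lr:.6f}")
```

This prints 0.010000, 0.009000, 0.008100, 0.007290, and 0.006561 for the five 1000-epoch blocks.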

## **Evaluating the Enhanced Model**

Let's evaluate the enhanced model's performance on the test set.

```python
# Evaluate the enhanced model
enhanced_model.eval()
with torch.no_grad():
    test_outputs_enhanced = enhanced_model(X_test_tensor)
    test_loss_enhanced = criterion(test_outputs_enhanced, y_test_tensor)
    print(f"Enhanced Model Test Loss: {test_loss_enhanced.item():.4f}")
```

```
Enhanced Model Test Loss: 0.0141
```


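The model is set to evaluation mode, and the test loss is computed to assess the enhanced model's performance. The test MSE falls from 0.0685 for the basic model to 0.0141, roughly a fivefold reduction. Relative to its training loss (0.0061), the gap is proportionally larger than for the basic model, which may hint at some overfitting. With only 36 test samples, though, these estimates are noisy.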
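MSE is measured in squared label units. Taking its square root (the RMSE) puts both test losses on the same scale as the class labels, which makes the comparison easier to interpret. Using the test losses reported above:

$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \\
\mathrm{RMSE}_{\text{basic}} &= \sqrt{0.0685} \approx 0.262 \\
\mathrm{RMSE}_{\text{enhanced}} &= \sqrt{0.0141} \approx 0.119
\end{aligned}
$$

Here $n$ is the number of test samples, and $\hat{y}_i$ is the model's prediction for the true label $y_i$. On this scale, a typical test prediction from the basic model is off by about a quarter of a class, and one from the enhanced model by roughly an eighth.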

## **Predictions from Enhanced Model**

Finally, let's compare the predictions from the enhanced model to the true values.

```python
# Print predictions and true values from the enhanced model
predicted_values_enhanced = test_outputs_enhanced[:num_examples].squeeze().numpy()
for i in range(num_examples):
    print(f"Example {i+1}")
    print(f"Predicted Value (Enhanced): {predicted_values_enhanced[i]:.4f}")
    print(f"True Value: {true_values[i]}")
    print("------")
```

```
Example 1
Predicted Value (Enhanced): 0.0031
True Value: 0.0
------
Example 2
Predicted Value (Enhanced): 0.0059
True Value: 0.0
------
Example 3
Predicted Value (Enhanced): 1.9167
True Value: 2.0
------
Example 4
Predicted Value (Enhanced): 0.0244
True Value: 0.0
------
Example 5
Predicted Value (Enhanced): 1.0435
True Value: 1.0
------
Example 6
Predicted Value (Enhanced): 0.0043
True Value: 0.0
------
Example 7
Predicted Value (Enhanced): 0.9385
True Value: 1.0
------
Example 8
Predicted Value (Enhanced): 1.9299
True Value: 2.0
------
Example 9
Predicted Value (Enhanced): 0.9299
True Value: 1.0
------
Example 10
Predicted Value (Enhanced): 1.7543
True Value: 2.0
------
```


This section presents the enhanced model's predictions alongside the true values. They are noticeably closer than the basic model's: every one of the ten predictions is within 0.25 of its true value. The largest miss is Example 10, at 1.7543 for a true value of 2.0. Rounding each prediction to the nearest class would therefore label all ten correctly.
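To quantify "closer", the sketch below computes the mean absolute error (MAE) and the rounded-class accuracy of both models over the ten printed examples. The values are copied from the outputs above, so the figures describe only this small sample, not the full test set. The rounding rule (nearest class, clipped to 0 to 2) is an illustrative assumption, not part of the original notebook.

```python
import numpy as np

# The ten printed examples (copied from the outputs above)
true = np.array([0, 0, 2, 0, 1, 0, 1, 2, 1, 2], dtype=float)
basic = np.array([0.1695, 0.3071, 1.4919, 0.2259, 0.9322,
                  0.1266, 0.9608, 1.9369, 0.4734, 1.4022])
enhanced = np.array([0.0031, 0.0059, 1.9167, 0.0244, 1.0435,
                     0.0043, 0.9385, 1.9299, 0.9299, 1.7543])

results = {}
for name, pred in [("basic", basic), ("enhanced", enhanced)]:
    mae = np.mean(np.abs(pred - true))
    # Round to the nearest class and clip to {0, 1, 2} before comparing
    acc = np.mean(np.clip(np.rint(pred), 0, 2) == true)
    results[name] = (mae, acc)
    print(f"{name:>8}: MAE = {mae:.4f}, rounded accuracy = {acc:.0%}")
```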

## **Conclusion**

This project walked through building, training, and evaluating neural networks with PyTorch, starting with a basic two-layer model and progressing to a deeper architecture. The initial model demonstrated the essential steps of preparing a dataset and training a network for a regression task. However, with no activation function between its layers, it could only learn a linear mapping, and its training loss plateaued early. The enhanced network, featuring an additional layer, ReLU activations, L2 regularization, and a learning rate scheduler, performed markedly better. Its final training loss fell from 0.0580 to 0.0061, and its test loss from 0.0685 to 0.0141.

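However, it is important to acknowledge that the Wine labels are categorical, so a classification approach is likely more suitable. Treating the classes as the numbers 0, 1, and 2 implies that they are ordered and evenly spaced (for example, that class 1 lies "between" classes 0 and 2), and nothing about the three cultivars guarantees this. Despite this limitation, the project illustrates the capabilities of neural networks and the flexibility of PyTorch, even when applied to a less conventional framing of the task.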
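For comparison, here is a minimal sketch of the classification framing in PyTorch, assuming the same 13 input features. The network outputs three logits (one per class), is trained with `nn.CrossEntropyLoss` on integer (`torch.long`) labels, and predicts with `argmax`. To keep the block self-contained, it uses hypothetical synthetic data in place of the Wine tensors. `WineClassifier` is an illustrative name, not part of the original notebook. With the real data, `data.target` would be used directly as integer labels instead of being cast to float.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in data: 3 classes, 13 features, one random centre per class
n_per_class, n_features, n_classes = 40, 13, 3
centers = 2 * torch.randn(n_classes, n_features)
X_demo = torch.cat([centers[c] + torch.randn(n_per_class, n_features) for c in range(n_classes)])
y_demo = torch.arange(n_classes).repeat_interleave(n_per_class)  # int64 labels, shape (N,)


class WineClassifier(nn.Module):
    """Hypothetical classification variant: one logit per class instead of one regression output."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),  # raw logits; no softmax here
        )

    def forward(self, x):
        return self.net(x)


clf = WineClassifier(n_features, 10, n_classes)
criterion = nn.CrossEntropyLoss()  # combines log-softmax and negative log-likelihood
optimizer = optim.Adam(clf.parameters(), lr=0.01)

with torch.no_grad():
    initial_loss = criterion(clf(X_demo), y_demo).item()

for epoch in range(200):
    loss = criterion(clf(X_demo), y_demo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

clf.eval()
with torch.no_grad():
    final_loss = criterion(clf(X_demo), y_demo).item()
    predicted = clf(X_demo).argmax(dim=1)  # class index 0, 1 or 2 per sample
    accuracy = (predicted == y_demo).float().mean().item()

print(f"Loss: {initial_loss:.4f} -> {final_loss:.4f}, training accuracy: {accuracy:.2%}")
```

Because `CrossEntropyLoss` applies log-softmax internally, the model returns raw logits. Class probabilities, if needed, can be obtained with `torch.softmax(logits, dim=1)`.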

Future work could explore classification models, compare their effectiveness with the regression models developed here, and experiment with more advanced architectures and training techniques. Comprehensive hyperparameter tuning could also yield better results and provide deeper insights into the model's behavior.
