In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler 

PyTorch (torch, torch.nn, torch.optim) is used to build and train the model.

load_breast_cancer loads a medical dataset about breast cancer.

train_test_split is a tool to split our data into training and test sets.

StandardScaler standardizes the input features.

In [3]:
print(torch.__version__)

2.6.0


The next cell shows how PyTorch handles the gradient calculation automatically. Training relies on this mechanism once a model and a loss function are defined.

In [4]:
x = torch.tensor([6.0], requires_grad=True)

# Define a function y = x^2 + 1
y = x ** 2 + 1

# Compute the gradient
y.backward()
print(x.grad)

tensor([12.])


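The example creates a tensor x with the value 6.

Setting requires_grad=True tells PyTorch to track the operations on x so that it can compute the gradient (slope) of y with respect to x. Calling y.backward() computes dy/dx = 2x, which is 12 at x = 6, and stores the result in x.grad.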
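
The value 12 can be cross-checked without autograd. The sketch below compares the analytic derivative 2x with a numerical central-difference estimate; the helper names f and central_difference are illustrative and not part of the notebook.

```python
def f(x):
    """Same function as in the cell above: y = x^2 + 1."""
    return x ** 2 + 1


def central_difference(func, x, h=1e-5):
    """Approximate the derivative of func at x numerically."""
    return (func(x + h) - func(x - h)) / (2 * h)


analytic = 2 * 6.0                       # dy/dx = 2x evaluated at x = 6
numeric = central_difference(f, 6.0)

print(analytic)            # 12.0
print(round(numeric, 4))   # 12.0
```

Both values agree with the tensor([12.]) that x.grad reports.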

The next example adds a loss function. Suppose we want to learn y = 2x + 1. We model it as y = wx + b, so w and b are the learnable parameters and their target values are 2 and 1.

In [5]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=False)
y = torch.tensor([3.0, 5.0, 7.0], requires_grad=False)  # Three observations: when x = 1, y = 3; x = 2, y = 5; x = 3, y = 7

# Initialize weights with requires_grad=True so we can compute gradients
w = torch.tensor([1.0], requires_grad=True)  # initial guess for weight
b = torch.tensor([0.0], requires_grad=True)  # initial guess for bias
print(f'Before optimization, w is {w}')
print(f'Before optimization, b is {b}')


optimizer = torch.optim.Adam([w,b], lr = 0.1)
y_pred = w*x+b
optimizer.zero_grad() # clear any gradients accumulated from a previous step
loss = ((y_pred - y) ** 2).mean() # calculate the loss value 
loss.backward() # as shown before, this step calculates the gradient of both w and b
optimizer.step()   # it performs the change of w and b

# Print gradients
print("Gradient of w:", w.grad)  # ∂loss/∂w
print("Gradient of b:", b.grad)  # ∂loss/∂b

print(f'After optimization, w is {w}')
print(f'After optimization, b is {b}')

Before optimization, w is tensor([1.], requires_grad=True)
Before optimization, b is tensor([0.], requires_grad=True)
Gradient of w: tensor([-13.3333])
Gradient of b: tensor([-6.])
After optimization, w is tensor([1.1000], requires_grad=True)
After optimization, b is tensor([0.1000], requires_grad=True)


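The model starts with w = 1 and b = 0.

loss.backward() computes the gradients of the loss with respect to w and b, which indicate how to adjust them to reduce the error.

optimizer.step() then updates w to 1.1000 and b to 0.1000.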
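
The printed gradients and the updated values can be reproduced by hand. The sketch below computes the mean-squared-error gradients in plain Python and then applies the first update of Adam starting from a fresh state, where the bias-corrected step reduces to roughly lr * sign(gradient). The helper adam_first_step is illustrative and assumes Adam's default eps of 1e-8.

```python
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
w, b = 1.0, 0.0
n = len(xs)

# Residuals of the current prediction w*x + b
residuals = [(w * x + b) - y for x, y in zip(xs, ys)]

# Gradients of the mean squared error with respect to w and b
grad_w = sum(2 * r * x for r, x in zip(residuals, xs)) / n
grad_b = sum(2 * r for r in residuals) / n


def adam_first_step(param, grad, lr=0.1, eps=1e-8):
    """First Adam update from a fresh state: after bias correction,
    m_hat = grad and v_hat = grad**2, so the step is about lr * sign(grad)."""
    return param - lr * grad / (abs(grad) + eps)


w_new = adam_first_step(w, grad_w)
b_new = adam_first_step(b, grad_b)

print(round(grad_w, 4), round(grad_b, 4))  # -13.3333 -6.0
print(round(w_new, 4), round(b_new, 4))    # 1.1 0.1
```

This is why both parameters move by exactly the learning rate, 0.1, even though their gradients differ in size.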

Repeating this step many times moves w and b toward the values that minimize the loss.

In [6]:
for i in range(20):
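    # Note: creating the optimizer inside the loop resets Adam's internal state on every
    # iteration; it is normally created once, before the loop.
    optimizer = torch.optim.Adam([w,b], lr = 0.1)
    y_pred = w*x+b
    optimizer.zero_grad() # clear any gradients accumulated from the previous iteration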
    loss = ((y_pred - y) ** 2).mean() # calculate the loss value 
    print(f"##### this is the {i} times try")
    print(f"current loss is {loss}")
    loss.backward() # as shown before, this step calculates the gradient of both w and b
    optimizer.step()   # it performs the change of w and b
    print(f'After optimization, w is {w}')
    print(f'After optimization, b is {b}')

##### this is the 0 times try
current loss is 7.829999923706055
After optimization, w is tensor([1.2000], requires_grad=True)
After optimization, b is tensor([0.2000], requires_grad=True)
##### this is the 1 times try
current loss is 6.186666011810303
After optimization, w is tensor([1.3000], requires_grad=True)
After optimization, b is tensor([0.3000], requires_grad=True)
##### this is the 2 times try
current loss is 4.736665725708008
After optimization, w is tensor([1.4000], requires_grad=True)
After optimization, b is tensor([0.4000], requires_grad=True)
##### this is the 3 times try
current loss is 3.479998826980591
After optimization, w is tensor([1.5000], requires_grad=True)
After optimization, b is tensor([0.5000], requires_grad=True)
##### this is the 4 times try
current loss is 2.41666579246521
After optimization, w is tensor([1.6000], requires_grad=True)
After optimization, b is tensor([0.6000], requires_grad=True)
##### this is the 5 times try
current loss is 1.5466665029525
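
In the loop above, w and b rise by exactly 0.1 per iteration because the optimizer is re-created each time, so every update is a first Adam step. A minimal sketch of the more usual pattern, with the optimizer created once before the loop, is given below; the step count of 1000 is an arbitrary choice for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([3.0, 5.0, 7.0])

w = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Created once, so Adam's running averages persist across iterations
optimizer = torch.optim.Adam([w, b], lr=0.1)

initial_loss = ((w * x + b - y) ** 2).mean().item()

for step in range(1000):  # arbitrary number of steps for this sketch
    optimizer.zero_grad()
    loss = ((w * x + b - y) ** 2).mean()
    loss.backward()
    optimizer.step()

final_loss = ((w * x + b - y) ** 2).mean().item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
print(f"w = {w.item():.3f}, b = {b.item():.3f}")
```

With enough steps, w and b should approach the target values 2 and 1.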

In [7]:
# Load the breast cancer dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
print(X[:10, :])
print(y[:10])
print(type(X))

[[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
  1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
  6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
  1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
  4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 1.326e+03 8.474e-02 7.864e-02 8.690e-02
  7.017e-02 1.812e-01 5.667e-02 5.435e-01 7.339e-01 3.398e+00 7.408e+01
  5.225e-03 1.308e-02 1.860e-02 1.340e-02 1.389e-02 3.532e-03 2.499e+01
  2.341e+01 1.588e+02 1.956e+03 1.238e-01 1.866e-01 2.416e-01 1.860e-01
  2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 1.203e+03 1.096e-01 1.599e-01 1.974e-01
  1.279e-01 2.069e-01 5.999e-02 7.456e-01 7.869e-01 4.585e+00 9.403e+01
  6.150e-03 4.006e-02 3.832e-02 2.058e-02 2.250e-02 4.571e-03 2.357e+01
  2.553e+01 1.525e+02 1.709e+03 1.444e-01 4.245e-01 4.504e-01 2.430e-01
  3.613e-01 8.758e-02]
 [1.142e+01 2.038e+01 7.758e+01 3.861e+02 1.425e-01 2.839e-01 2.414

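Normalization is a common preprocessing step in machine learning. It rescales the input features to a comparable range, for example to [0, 1] or [-1, 1] (min-max scaling) or to zero mean and unit standard deviation (standardization).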

It has several main benefits:

1. Faster Convergence during Training

2. Enhanced Stability of the Optimization Process

3. Improved Model Accuracy

The sklearn package provides functions for several types of normalization: https://scikit-learn.org/stable/modules/preprocessing.html
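
To make the difference between the two kinds of scaling concrete, the sketch below applies MinMaxScaler and StandardScaler to a small made-up array. The values in X_toy are invented for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: two features on very different scales (values are made up)
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])

X_minmax = MinMaxScaler().fit_transform(X_toy)   # each column mapped to [0, 1]
X_std = StandardScaler().fit_transform(X_toy)    # each column: mean 0, std 1

print(X_minmax.min(axis=0), X_minmax.max(axis=0))
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```

Standardized values are not confined to a fixed interval, which is why the scaled tensors later in the notebook contain values such as 3.1505.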

In [8]:
# Normalize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)

This creates a StandardScaler object and uses it to rescale the features (the columns of the data) so that each has a mean of 0 and a standard deviation of 1.

The data is then split into a training set (80%) and a test set (20%).

Setting random_state=40 ensures the split is repeatable.
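
The cell above fits the scaler on the full dataset before splitting. A common alternative, sketched here as an option rather than as the notebook's method, is to split first and fit the scaler on the training portion only, so that test-set statistics do not influence preprocessing. The variable names (X_raw, X_tr, X_te and so on) are chosen for this sketch and do not appear in the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = load_breast_cancer(return_X_y=True)

# Split first, then fit the scaler on the training portion only
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=40
)
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)   # reuses the training mean and std

print(X_tr_scaled.shape, X_te_scaled.shape)
print(np.abs(X_tr_scaled.mean(axis=0)).max())  # close to 0 for the training set
```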

All the variables so far are stored as NumPy arrays. PyTorch models operate on tensors, so the arrays need to be converted before they can be used for neural network computing.

In [9]:
# Convert to PyTorch tensors
X_train = torch.tensor(X_train)
y_train = torch.tensor(y_train)
X_test = torch.tensor(X_test)
y_test = torch.tensor(y_test)
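
One detail worth knowing about this conversion: torch.tensor keeps the dtype of the NumPy array, which is float64 for the scaled features, while nn.Linear layers use float32 by default. That is why the notebook later calls .to(torch.float32). The sketch below shows the dtypes on a small made-up array; arr, t64 and t32 are illustrative names.

```python
import numpy as np
import torch

arr = np.array([[0.5, -1.2], [1.0, 0.3]])      # NumPy defaults to float64

t64 = torch.tensor(arr)                        # dtype follows the array
t32 = torch.tensor(arr, dtype=torch.float32)   # cast while converting

print(t64.dtype)                            # torch.float64
print(t32.dtype)                            # torch.float32
print(torch.nn.Linear(2, 1).weight.dtype)   # torch.float32
```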

We can try different ways of slicing the data here.

In [10]:
print(X_train[:, 0])
print(X_train[0, :])
print(X_train[0,0])
print(y_train[:10])

tensor([-0.2719, -0.1214, -0.9706, -0.1072, -0.1611,  1.6367,  0.6625, -0.6241,
        -0.0986, -0.3542, -0.3457, -0.8655,  1.4975, -0.6780, -0.9393, -0.0788,
        -0.2634, -0.8712, -0.9592, -0.4735, -0.5332, -0.5076, -0.2974,  0.2109,
         0.6086, -0.6752,  0.0803, -0.6581, -1.0813,  1.4379,  0.0604, -0.0305,
        -0.2719, -0.3628, -0.8144,  1.7134, -0.3344,  0.9011, -0.8087, -0.7689,
        -0.1100, -0.6922,  1.5089, -0.4281,  0.9494, -1.2517,  1.7503, -0.2605,
        -1.0330, -0.6468, -0.4735, -0.7547, -0.8172, -1.2665,  1.1084,  0.2848,
         3.1505, -0.6610, -1.0984,  0.3331, -0.4707,  0.3387, -0.6951, -0.5389,
        -0.1668, -1.0274, -1.0898, -0.1810, -0.7149,  0.0207, -1.6846, -0.4821,
        -0.8257,  1.8241,  0.0349, -0.9280, -0.5076, -0.7490,  0.5404,  0.3984,
        -0.2605,  1.3328, -0.3514,  0.2763,  0.7534, -1.1211,  1.5401,  1.7191,
        -0.8655, -0.3315, -0.7121,  0.6029, -0.9848,  1.0800,  1.2306,  0.0405,
         1.3186, -1.1154, -1.6780, -0.49

X_train[:, 0] - all the rows in the first column

X_train[0, :] - the first row, all columns

X_train[0,0] - the top-left value in the table

y_train[:10] - the first 10 labels
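
The same indexing patterns are easier to follow on a tiny tensor. The 2 x 3 tensor t below is made up for illustration.

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # 2 rows, 3 columns

print(t[:, 0])   # tensor([1., 4.])
print(t[0, :])   # tensor([1., 2., 3.])
print(t[0, 0])   # tensor(1.)

# A column slice has one entry per row, a row slice one entry per column,
# and a single element is a 0-dimensional tensor
print(t[:, 0].shape, t[0, :].shape, t[0, 0].shape)
```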

A tensor can be placed on either the CPU or the GPU with the following commands. Remember that a tensor on the CPU cannot be used in a computation with a tensor on the GPU; both must be on the same device.

In [11]:
device = torch.device('cpu') # to put the tensor on the GPU, use 'cuda'
x_GPU = X_train.to(device)
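
A common pattern, shown here as a sketch, is to pick the device at runtime with torch.cuda.is_available() and move every tensor involved in a computation to that same device. The tensors a and b are made up for illustration.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(3).to(device)
b = torch.arange(3, dtype=torch.float32).to(device)

total = a + b   # works because both tensors live on the same device
print(total.device.type == device.type)  # True
```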

Define the Model

In [48]:
class CancerNet(nn.Module):
    def __init__(self):
        super(CancerNet, self).__init__()
        self.fc1 = nn.Linear(30, 10) # 30 input features (breast cancer).
        self.fc2 = nn.Linear(10, 1)  # 1 output (0 or 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

model = CancerNet()
model = model.to(torch.float32)


The model looks at 30 medical measurements of a breast tumor and decides whether it is

0 = cancerous (malignant)

1 = not cancerous (benign)

It takes the 30 input numbers and passes them through a hidden layer of 10 neurons to find patterns.

The sigmoid converts the output to a value between 0 and 1, which is read as the probability of class 1.
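
To get a feel for the size of this network, the sketch below rebuilds the same two-layer architecture, counts its learnable parameters, and checks the output shape on random input. The names net and dummy and the random data are illustrative only.

```python
import torch
import torch.nn as nn


class CancerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(30, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))


net = CancerNet()
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 321 = (30*10 + 10) + (10*1 + 1)

dummy = torch.randn(4, 30)   # 4 made-up samples with 30 features each
probs = net(dummy)
print(probs.shape)  # torch.Size([4, 1])
```

Note that the output has one column per sample, so it needs to be flattened (for example with ravel()) before it is compared with a 1-D label tensor.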

Define Loss and Optimizer

In [49]:
criterion = nn.BCELoss() # Binary cross-entropy loss: measures how wrong the model's predictions are by comparing the predicted probabilities with the true labels.
optimizer = optim.Adam(model.parameters(), lr=0.01) # we have seen this optimizer before.
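
Binary cross-entropy averages -[y*log(p) + (1-y)*log(1-p)] over the samples, which is what nn.BCELoss computes with its default mean reduction. The plain-Python sketch below evaluates that formula on made-up probabilities to show how confident wrong answers are penalized; binary_cross_entropy is an illustrative helper, not a library function.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


confident_right = binary_cross_entropy([0.9, 0.2], [1, 0])
confident_wrong = binary_cross_entropy([0.1, 0.8], [1, 0])

print(round(confident_right, 4))  # 0.1643
print(round(confident_wrong, 4))  # 1.956
```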

In [57]:
X_train = X_train.to(torch.float32)
y_train = y_train.to(torch.float32)

In [64]:
epochs = 500
loss_values = []

for epoch in range(epochs):
    optimizer.zero_grad()  # Remember this training loop: it is the standard way to train a model, i.e. adjusting the parameters until the loss value is minimal.
    outputs = model(X_train)
    loss = criterion(outputs.ravel(), y_train)
    loss.backward()
    optimizer.step()
    
    loss_values.append(loss.item())
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")
        
# The training process could also be stopped early once the loss value falls below a threshold.

Epoch [10/500], Loss: 0.0131
Epoch [20/500], Loss: 0.0121
Epoch [30/500], Loss: 0.0112
Epoch [40/500], Loss: 0.0104
Epoch [50/500], Loss: 0.0097
Epoch [60/500], Loss: 0.0090
Epoch [70/500], Loss: 0.0084
Epoch [80/500], Loss: 0.0078
Epoch [90/500], Loss: 0.0073
Epoch [100/500], Loss: 0.0068
Epoch [110/500], Loss: 0.0064
Epoch [120/500], Loss: 0.0060
Epoch [130/500], Loss: 0.0056
Epoch [140/500], Loss: 0.0053
Epoch [150/500], Loss: 0.0049
Epoch [160/500], Loss: 0.0047
Epoch [170/500], Loss: 0.0044
Epoch [180/500], Loss: 0.0041
Epoch [190/500], Loss: 0.0039
Epoch [200/500], Loss: 0.0037
Epoch [210/500], Loss: 0.0035
Epoch [220/500], Loss: 0.0033
Epoch [230/500], Loss: 0.0031
Epoch [240/500], Loss: 0.0030
Epoch [250/500], Loss: 0.0028
Epoch [260/500], Loss: 0.0027
Epoch [270/500], Loss: 0.0026
Epoch [280/500], Loss: 0.0024
Epoch [290/500], Loss: 0.0023
Epoch [300/500], Loss: 0.0022
Epoch [310/500], Loss: 0.0021
Epoch [320/500], Loss: 0.0020
Epoch [330/500], Loss: 0.0019
Epoch [340/500], Lo

In [65]:
y_train.type()
outputs.type()

'torch.FloatTensor'

The model goes through the entire training set 500 times (epochs = 500).

In the output shown, the loss keeps decreasing, from 0.0131 at epoch 10 to 0.0019 at epoch 330.
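
The closing comment of the training cell mentions stopping once the loss drops below a threshold. A minimal sketch of that idea follows. It uses made-up toy data and a small stand-in model so that it runs on its own; demo_model, X_demo, y_demo and the threshold of 0.1 are all hypothetical choices, not values from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up, linearly separable toy data standing in for the real training set
X_demo = torch.randn(64, 5)
y_demo = (X_demo[:, 0] > 0).float()

demo_model = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.05)

max_epochs = 500
loss_threshold = 0.1   # hypothetical stopping threshold
epochs_run = 0

for epoch in range(max_epochs):
    optimizer.zero_grad()
    loss = criterion(demo_model(X_demo).ravel(), y_demo)
    loss.backward()
    optimizer.step()
    epochs_run = epoch + 1

    if loss.item() < loss_threshold:
        break   # stop early once the loss is low enough

print(f"stopped after {epochs_run} epochs, loss = {loss.item():.4f}")
```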

In [69]:
with torch.no_grad():
    test_outputs = model(X_test.to(torch.float32))
    # Note: test_outputs has shape (N, 1), so argmax over dim=1 is 0 for every sample
    predicted = torch.argmax(test_outputs, dim=1)
    accuracy = (predicted == y_test).float().mean()
    print(f"Test Accuracy: {accuracy:.2f}")

Test Accuracy: 0.34


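torch.no_grad() disables gradient tracking, so the model is only evaluated on the test data and does not learn from it.

The predicted line is meant to convert the probabilities into a clear 0/1 decision.

The accuracy line compares the predictions with the real diagnoses (y_test).

The reported test accuracy is very poor: the predictions match the true cancerous/non-cancerous label only 34% of the time. This is a consequence of the evaluation code rather than of the model. test_outputs has a single column, so torch.argmax(test_outputs, dim=1) returns 0 for every sample, and the 34% is simply the share of test samples labeled 0. Thresholding the probability at 0.5 is the usual way to obtain the predicted class.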
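
The effect of argmax on a single-column output, and the thresholding alternative, can be seen on a few made-up probabilities. The tensors probs and labels below are invented for illustration and are not model outputs.

```python
import torch

# Made-up sigmoid outputs with shape (4, 1), like the model's test_outputs
probs = torch.tensor([[0.92], [0.08], [0.61], [0.35]])
labels = torch.tensor([1, 0, 1, 1])

# argmax over a single column can only ever return index 0
argmax_pred = torch.argmax(probs, dim=1)
print(argmax_pred)  # tensor([0, 0, 0, 0])

# Thresholding the probability at 0.5 gives the intended 0/1 decision
threshold_pred = (probs.ravel() > 0.5).long()
print(threshold_pred)  # tensor([1, 0, 1, 0])

accuracy = (threshold_pred == labels).float().mean().item()
print(f"{accuracy:.2f}")  # 0.75
```

Applied to the notebook's variables, the same idea would read predicted = (test_outputs.ravel() > 0.5).long(); the resulting accuracy is not shown here because that cell was not re-run.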