## Activation Functions

- Activation functions add non-linearity to the network
    * **Sigmoid** for binary classification
    * **Softmax** for multi-class classification

- With non-linearity, a network can learn more complex relationships than a purely linear model
- The "pre-activation" output of a linear layer is passed to the activation function
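
To illustrate why non-linearity matters, the sketch below checks that two stacked linear layers with no activation between them behave like a single linear layer. The layer sizes, the random input `x`, and the seed are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Two linear layers stacked with no activation in between
layer1 = nn.Linear(6, 4)
layer2 = nn.Linear(4, 1)
x = torch.randn(5, 6)  # hypothetical batch: 5 samples, 6 features

stacked_output = layer2(layer1(x))

# The same mapping collapsed into one weight matrix and one bias
combined_weight = layer2.weight @ layer1.weight            # shape (1, 6)
combined_bias = layer2.weight @ layer1.bias + layer2.bias  # shape (1,)
single_output = x @ combined_weight.T + combined_bias

# Without an activation, the stack is equivalent to one linear layer
is_linear = torch.allclose(stacked_output, single_output, atol=1e-5)
print(is_linear)

# With a sigmoid between the layers, the result is no longer
# reproduced by the collapsed linear map
nonlinear_output = layer2(torch.sigmoid(layer1(x)))
print(torch.allclose(nonlinear_output, single_output, atol=1e-5))
```

The first check suggests that depth alone adds no expressive power; the activation between layers is what lets the network represent non-linear relationships.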


In [1]:
import torch
import torch.nn as nn

input_tensor = torch.tensor([[6.0]])  # float input, matching the float output below
sigmoid = nn.Sigmoid()
output = sigmoid(input_tensor)
print(output)

tensor([[0.9975]])


In [2]:
model = nn.Sequential(
    nn.Linear(6, 4), # First linear layer
    nn.Linear(4, 1), # Second linear layer
    nn.Sigmoid() # Sigmoid activation function
)

In [3]:
# Create an input tensor
input_tensor = torch.tensor([[4.3, 6.1, 2.3]])

# Apply softmax along the last dimension
probabilities = nn.Softmax(dim=-1)
output_tensor = probabilities(input_tensor)
print(output_tensor)

tensor([[0.1392, 0.8420, 0.0188]])


- **dim=-1** indicates that softmax is applied along the input tensor's last dimension
- **nn.Softmax()** can be used as the last step in **nn.Sequential()**
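
As a rough check of what softmax computes, the sketch below rebuilds it by hand (exponentiate, then normalize along the last dimension) and compares the result with `nn.Softmax`. It reuses the input values from the cell above; the variable names are illustrative.

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([[4.3, 6.1, 2.3]])

# Softmax by hand: exponentiate, then normalize along the last dimension
exp_scores = torch.exp(input_tensor)
manual_softmax = exp_scores / exp_scores.sum(dim=-1, keepdim=True)

# Built-in module for comparison
builtin_softmax = nn.Softmax(dim=-1)(input_tensor)

print(manual_softmax)
print(torch.allclose(manual_softmax, builtin_softmax))
print(manual_softmax.sum(dim=-1))  # each row sums to 1
```

Because every row is normalized by its own sum, the outputs can be read as class probabilities.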

In [4]:
# training

input_tensor = torch.tensor([[2.4]])

# Create a sigmoid function and apply it on input_tensor
sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)
print(probability)

tensor([[0.9168]])


In [5]:
# training ver 2

input_tensor = torch.tensor([[1.0, -6.0, 2.5, -0.3, 1.2, 0.8]])

# Create a softmax function and apply it on input_tensor
softmax = nn.Softmax(dim = -1)
probabilities = softmax(input_tensor)
print(probabilities)

tensor([[1.2828e-01, 1.1698e-04, 5.7492e-01, 3.4961e-02, 1.5669e-01, 1.0503e-01]])


## Running a Forward Pass

Forward Pass:
- Input data flows through layers
- Calculations performed at each layer
- Final layer generates outputs
- Outputs produced based on weights and biases
- Used for training and making predictions

Possible outputs:
- Binary classification
- Multi-class classification
- Regression


### Binary Classification: Forward Pass

In [6]:
# Create input data of shape 5x6
input_data = torch.tensor(
 [[-0.4421, 1.5207, 2.0607, -0.3647,  0.4691,  0.0946],
  [-0.9155, -0.0475, -1.3645,  0.6336, -1.9520, -0.3398],
  [ 0.7406,  1.6763, -0.8511,  0.2432,  0.1123, -0.0633],
  [-1.6630, -0.0718, -0.1285,  0.5396, -0.0288, -0.8622],
  [-0.7413,  1.7920, -0.0883, -0.6685,  0.4745, -0.4245]]
)

In [7]:
# Create binary classification model
model = nn.Sequential(
    nn.Linear(6, 4),  # First linear layer
    nn.Linear(4, 1),  # Second linear layer
    nn.Sigmoid()      # Sigmoid activation function
)

# Pass input data through model
output = model(input_data)
print(output)

tensor([[0.3420],
        [0.6496],
        [0.4133],
        [0.5792],
        [0.3903]], grad_fn=<SigmoidBackward0>)
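
The sigmoid outputs above are probabilities, not class labels. One common way to turn them into labels is to apply a cutoff; the sketch below assumes a threshold of 0.5 and copies the printed probabilities into a tensor, since the model's weights are random and would differ on another run.

```python
import torch

# Probabilities copied from the forward-pass output above
probabilities = torch.tensor([[0.3420], [0.6496], [0.4133], [0.5792], [0.3903]])

threshold = 0.5  # assumed cutoff; 0.5 is a common default
predicted_labels = (probabilities >= threshold).int()
print(predicted_labels.squeeze(1).tolist())  # [0, 1, 0, 1, 0]
```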


### Multi-class Classification: Forward Pass

In [8]:
n_classes = 3 

# Create multi-class classification model
model = nn.Sequential(
    nn.Linear(6, 4),          # First linear layer
    nn.Linear(4, n_classes),  # Second linear layer
    nn.Softmax(dim=-1)        # Softmax activation
)

# Pass input data through model
output = model(input_data)
print(output.shape)
print("\n",output)

torch.Size([5, 3])

 tensor([[0.2291, 0.5181, 0.2528],
        [0.3912, 0.3141, 0.2947],
        [0.1658, 0.5470, 0.2872],
        [0.3649, 0.3823, 0.2528],
        [0.1698, 0.5507, 0.2795]], grad_fn=<SoftmaxBackward0>)
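
Each row of the softmax output is a probability distribution over the three classes. A minimal sketch of reading off the predicted class per sample, assuming the prediction is simply the most probable class and using the printed values from the output above:

```python
import torch

# Class probabilities copied from the forward-pass output above
output = torch.tensor([[0.2291, 0.5181, 0.2528],
                       [0.3912, 0.3141, 0.2947],
                       [0.1658, 0.5470, 0.2872],
                       [0.3649, 0.3823, 0.2528],
                       [0.1698, 0.5507, 0.2795]])

# Each row sums to (approximately) 1
row_sums = output.sum(dim=-1)
print(row_sums)

# Index of the largest probability in each row
predicted_classes = output.argmax(dim=-1)
print(predicted_classes.tolist())  # [1, 0, 1, 1, 1]
```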


### Regression: Forward Pass

In [9]:
# Create regression model
model = nn.Sequential(
    nn.Linear(6, 4),  # First linear layer
    nn.Linear(4, 1)   # Second linear layer
)

# Pass input data through model
output = model(input_data)

# Return output
print(output)

tensor([[ 0.0557],
        [ 0.5150],
        [-0.3101],
        [ 0.7701],
        [-0.3550]], grad_fn=<AddmmBackward0>)


In [10]:
# training ver 3

input_tensor = torch.tensor([[3, 4, 6, 2, 3, 6, 8, 9]], dtype=torch.float32)

# Implement a small neural network for binary classification
model = nn.Sequential(
  nn.Linear(8, 1),
  nn.Sigmoid()
)

output = model(input_tensor)
print(output)

tensor([[0.9627]], grad_fn=<SigmoidBackward0>)


In [11]:
# training ver 4

input_tensor = torch.tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]], dtype=torch.float32)

# Implement a neural network with exactly four linear layers
model = nn.Sequential(
  nn.Linear(11, 64),   # Layer 1: from 11 input features to 64
  nn.Linear(64, 32),   # Layer 2: from 64 to 32
  nn.Linear(32, 16),   # Layer 3: from 32 to 16
  nn.Linear(16, 1)     # Layer 4: from 16 to 1 output (regression)
)

output = model(input_tensor)
print(output)

tensor([[-0.3306]], grad_fn=<AddmmBackward0>)


In [12]:
# training ver 5

input_tensor = torch.tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]], dtype=torch.float32)

# Update network below to perform a multi-class classification with four labels
model = nn.Sequential(
  nn.Linear(11, 20),
  nn.Linear(20, 12),
  nn.Linear(12, 6),
  nn.Linear(6, 4),
  nn.Softmax(dim=-1)
)

output = model(input_tensor)
print(output)

tensor([[0.1137, 0.6410, 0.0748, 0.1706]], grad_fn=<SoftmaxBackward0>)


## Loss Function

- Measures how well the model's predictions match the ground truth during training (lower is better)
- Takes a model prediction $\hat{y}$ and the ground truth $y$
- Outputs a single float

$\text{loss} = F(y, \hat{y})$

$y$ is a single integer (the class label)

$\hat{y}$ is a tensor of raw scores (the prediction before softmax)

In [13]:
# Transforming labels with one-hot encoding

import torch.nn.functional as F

print(F.one_hot(torch.tensor(0), num_classes = 3))

print("\n", F.one_hot(torch.tensor(1), num_classes = 3))

print("\n", F.one_hot(torch.tensor(2), num_classes = 3))

tensor([1, 0, 0])

 tensor([0, 1, 0])

 tensor([0, 0, 1])


In [14]:
# Cross entropy loss in PyTorch

from torch.nn import CrossEntropyLoss

scores = torch.tensor([-5.2, 4.6, 0.8])
one_hot_target = torch.tensor([1, 0, 0])
criterion = CrossEntropyLoss()

print(criterion(scores.double(), one_hot_target.double()))


tensor(9.8222, dtype=torch.float64)
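
`CrossEntropyLoss` also accepts the label as a plain class index instead of a one-hot vector. The sketch below passes the same scores with the target `0` as an index, and recomputes the loss by hand as the negative log-softmax of the true class. The batch dimension added to `scores` is an assumption made so the index target has the shape the loss expects.

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

scores = torch.tensor([[-5.2, 4.6, 0.8]])  # shape (batch=1, classes=3), raw scores
target = torch.tensor([0])                 # class index, shape (batch,)

criterion = CrossEntropyLoss()
loss_from_index = criterion(scores, target)

# Manual computation: negative log of the softmax probability of the true class
log_probs = F.log_softmax(scores, dim=-1)
manual_loss = -log_probs[0, target.item()]

print(loss_from_index)  # tensor(9.8222)
print(manual_loss)      # tensor(9.8222)
```

The loss is large because the true class (index 0) received the lowest score.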


In [15]:
# training ver 5

import numpy as np

y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=num_classes)

print("One-hot vector using NumPy:", one_hot_numpy)
print("One-hot vector using PyTorch:", one_hot_pytorch)

One-hot vector using NumPy: [0 1 0]
One-hot vector using PyTorch: tensor([0, 1, 0])


In [16]:
# training ver 6

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=4)

# Create the cross entropy loss function
criterion = CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)


## Backpropagation in PyTorch

In [18]:
# Define a sample input tensor (batch size = 1, input features = 16)
sample = torch.randn(1, 16)  # random float tensor with shape (1, 16)

# Define a target tensor of class indices with shape (batch_size,), as CrossEntropyLoss expects
target = torch.tensor([1])  # class index for the single sample, assuming 2 classes: 0 or 1

# Define the model
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2)
)

# Run a forward pass
prediction = model(sample)

# Calculate the loss and gradients
criterion = CrossEntropyLoss()
loss = criterion(prediction, target)
loss.backward()

# Access each layer's gradients
print("Layer 0 weight grad:\n", model[0].weight.grad)
print("Layer 0 bias grad:\n", model[0].bias.grad)
print("Layer 1 weight grad:\n", model[1].weight.grad)
print("Layer 1 bias grad:\n", model[1].bias.grad)
print("Layer 2 weight grad:\n", model[2].weight.grad)
print("Layer 2 bias grad:\n", model[2].bias.grad)


Layer 0 weight grad:
 tensor([[-0.0447,  0.0411,  0.0220,  0.0040,  0.0658,  0.0156,  0.0333,  0.0338,
         -0.0132,  0.0699, -0.0336, -0.0146, -0.0129,  0.0273, -0.0432,  0.0111],
        [ 0.0766, -0.0704, -0.0377, -0.0069, -0.1129, -0.0268, -0.0570, -0.0579,
          0.0227, -0.1199,  0.0577,  0.0251,  0.0221, -0.0468,  0.0741, -0.0191],
        [ 0.1172, -0.1078, -0.0578, -0.0105, -0.1728, -0.0410, -0.0873, -0.0886,
          0.0347, -0.1836,  0.0883,  0.0384,  0.0338, -0.0717,  0.1135, -0.0292],
        [-0.2169,  0.1995,  0.1069,  0.0195,  0.3198,  0.0759,  0.1616,  0.1641,
         -0.0642,  0.3398, -0.1634, -0.0710, -0.0625,  0.1327, -0.2100,  0.0540],
        [ 0.1375, -0.1265, -0.0678, -0.0123, -0.2027, -0.0481, -0.1024, -0.1040,
          0.0407, -0.2154,  0.1036,  0.0450,  0.0396, -0.0841,  0.1331, -0.0342],
        [-0.1167,  0.1074,  0.0575,  0.0105,  0.1721,  0.0409,  0.0870,  0.0883,
         -0.0346,  0.1829, -0.0879, -0.0382, -0.0336,  0.0714, -0.1130,  0.0291],


In [19]:
# Learning rate is typically small
lr = 0.001

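# Compute the updated weights (one gradient descent step)
# Note: the assignment rebinds the local name `weight` to a new tensor;
# the parameter stored in model[0] itself is not modified
weight = model[0].weight
weight_grad = model[0].weight.grad
weight = weight - lr * weight_grad

# Compute the updated biases the same way
bias = model[0].bias
bias_grad = model[0].bias.grad
bias = bias - lr * bias_grad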
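
The update rule above produces new tensors rather than changing the model's parameters. A minimal sketch of applying the same rule in place, assuming the three-layer model from the backpropagation cell and a random sample; the seed and the use of `torch.no_grad()` are choices made for this illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4), nn.Linear(4, 2))
sample = torch.randn(1, 16)  # hypothetical input
target = torch.tensor([1])   # hypothetical class label

loss = nn.CrossEntropyLoss()(model(sample), target)
loss.backward()

lr = 0.001
weight_before = model[0].weight.detach().clone()

# Update every parameter in place; no_grad keeps the update
# itself out of the autograd graph
with torch.no_grad():
    for param in model.parameters():
        param -= lr * param.grad

# The stored parameter now equals old_weight - lr * gradient
expected = weight_before - lr * model[0].weight.grad
weights_changed = torch.allclose(model[0].weight, expected)
print(weights_changed)
```

This is roughly what a plain SGD optimizer does on each step, which is why the optimizer is usually preferred over writing the loop by hand.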

In [20]:
# Gradient Descent

import torch.optim as optim

# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Perform parameter updates
optimizer.step()

In [23]:
# training ver 7

model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 2)
                     )

# Access the weight of the first linear layer
weight_0 = model[0].weight
print("Weight of the first layer:", weight_0)

# Access the bias of the second linear layer
bias_1 = model[1].bias
print("Bias of the second layer:", bias_1)

Weight of the first layer: Parameter containing:
tensor([[-0.1036, -0.1248, -0.1848,  0.0141, -0.1558, -0.0666,  0.1544, -0.1820,
          0.1267, -0.1320, -0.1446, -0.1575,  0.1177,  0.0787,  0.0488,  0.1520],
        [ 0.1046,  0.0921, -0.1327,  0.2453, -0.0696, -0.2135,  0.0331, -0.1939,
          0.2161,  0.1761,  0.1340, -0.1116, -0.0188, -0.1235,  0.0274, -0.1254],
        [-0.1565, -0.2480, -0.1953,  0.0049, -0.1227, -0.1135,  0.2010,  0.0796,
         -0.1304,  0.2435,  0.1611,  0.1908,  0.1889,  0.2149,  0.2492,  0.2395],
        [ 0.1736,  0.1248, -0.0503,  0.1658,  0.0115,  0.0879, -0.1803,  0.1376,
          0.0195, -0.1506,  0.1189,  0.0783,  0.2294,  0.1511,  0.1976,  0.2394],
        [-0.1578, -0.0007, -0.2233,  0.2405, -0.1803, -0.0566,  0.2214, -0.0371,
         -0.0817, -0.1036, -0.1219, -0.0834, -0.0039,  0.2437,  0.0330,  0.1861],
        [ 0.1130, -0.0859, -0.0590, -0.0926, -0.2295,  0.1491, -0.1013, -0.2133,
          0.1659,  0.2019,  0.0551,  0.2438,  0.0117, -

In [25]:
# training ver 8

# weight0 = model[0].weight
# weight1 = model[1].weight
# weight2 = model[2].weight

# # Access the gradients of the weight of each linear layer
# grads0 = weight0.grad
# grads1 = weight1.grad
# grads2 = weight2.grad

# Update the weights using the learning rate and the gradients
# weight0 = weight0 - lr * grads0
# weight1 = weight1 - lr * grads1
# weight2 = weight2 - lr * grads2

In [27]:
# training ver 9

# Create the optimizer
# optimizer = optim.SGD(model.parameters(), lr=0.001)

# loss = criterion(pred, target)
# loss.backward()

# # Update the model's parameters using the optimizer
# optimizer.step()
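
The commented-out steps above can be combined into a runnable sketch of a short training loop. The model, the random sample, the label, the seed, and the number of steps are all assumptions for illustration; `optimizer.zero_grad()` is added because PyTorch accumulates gradients across calls to `backward()`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4), nn.Linear(4, 2))
sample = torch.randn(1, 16)  # hypothetical input
target = torch.tensor([1])   # hypothetical class label

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

losses = []
for step in range(20):
    optimizer.zero_grad()                 # clear gradients from the previous step
    prediction = model(sample)            # forward pass
    loss = criterion(prediction, target)  # compare raw scores with the label
    loss.backward()                       # compute gradients
    optimizer.step()                      # update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With a small learning rate, the loss on this single sample should drift downward over the steps, though the exact values depend on the random initialization.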