# Week 6: Introduction to PyTorch & Neural Networks


## Chapter Outline
<hr>

<div class="toc"><ul class="toc-item"><li><span><a href="#Chapter-Learning-Objectives" data-toc-modified-id="Chapter-Learning-Objectives-2">Chapter Learning Objectives</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-3">Imports</a></span></li><li><span><a href="#1.-Introduction" data-toc-modified-id="1.-Introduction-4">1. Introduction</a></span></li><li><span><a href="#2.-PyTorch's-Tensor" data-toc-modified-id="2.-PyTorch's-Tensor-5">2. PyTorch's Tensor</a></span></li><li><span><a href="#3.-Neural-Network-Basics" data-toc-modified-id="3.-Neural-Network-Basics-6">3. Neural Network Basics</a></span></li></ul></div>

## Chapter Learning Objectives
<hr>

- Describe the difference between `NumPy` and `torch` arrays (`np.array` vs. `torch.Tensor`).
- Explain fundamental concepts of neural networks such as layers, nodes, activation functions, etc.
- Create a simple neural network in PyTorch for regression or classification.

## Imports
<hr>

In [None]:
import sys
import numpy as np
import pandas as pd
import torch
from torchsummary import summary
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_regression, make_circles, make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

## 1. Introduction
<hr>

PyTorch is a Python-based tool for scientific computing that provides two main features:
- `torch.Tensor`, an n-dimensional array similar to NumPy's `ndarray`, but which can run on GPUs
- Computational graphs and an automatic differentiation engine for building and training neural networks
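
A minimal sketch of the second feature, using nothing beyond `torch` itself: when a tensor is created with `requires_grad=True`, PyTorch records the operations applied to it and can then compute derivatives automatically. For $y = x^2$ the derivative is $2x$, so at $x = 3$ we expect 6:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # ask PyTorch to track operations on x
y = x ** 2                                 # builds a small computational graph
y.backward()                               # automatic differentiation: computes dy/dx
print(x.grad)                              # tensor(6.)
```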

You can install PyTorch from: https://pytorch.org/.

## 2. PyTorch's Tensor
<hr>

In PyTorch, a tensor is very similar to NumPy's `ndarray`.

A key difference between PyTorch's `torch.Tensor` and NumPy's `np.array` is that `torch.Tensor` was designed to integrate with GPUs and with PyTorch's computational graphs (more on that in the next chapter).

### 2.1. `ndarray` vs `tensor`

Creating and working with tensors is much the same as with NumPy `ndarrays`. You can create a tensor with `torch.tensor()`:

In [None]:
tensor_1 = torch.tensor([1, 2, 3])
tensor_2 = torch.tensor([1, 2, 3], dtype=torch.float32)
tensor_3 = torch.tensor(np.array([1, 2, 3]))

for t in [tensor_1, tensor_2, tensor_3]:
    print(f"{t}, dtype: {t.dtype}")
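
One practical difference worth knowing, shown as a small sketch (the array `arr` is made up for illustration): `torch.tensor()` always copies its input, whereas `torch.from_numpy()` creates a tensor that shares memory with the NumPy array. Going the other way, `.numpy()` on a CPU tensor also shares memory:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

copied = torch.tensor(arr)      # torch.tensor() copies the data
shared = torch.from_numpy(arr)  # torch.from_numpy() shares memory with arr

arr[0] = 100.0                  # modify the original NumPy array in place

print(copied)                   # first element is still 1.0
print(shared)                   # first element is now 100.0

back = shared.numpy()           # CPU tensor -> ndarray, again sharing memory
print(type(back), back[0])
```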

PyTorch also provides equivalents of most of the NumPy functions you're probably already familiar with:

In [None]:
torch.zeros(2, 2)  # zeros

In [None]:
torch.ones(2, 2)  # ones

In [None]:
torch.randn(3, 2)  # random normal

In [None]:
torch.rand(4, 2, 3)  # random uniform on [0, 1)

Just like in NumPy, we can look at the shape of a tensor with the `.shape` attribute and at its number of dimensions with `.ndim`:

In [None]:
x = torch.rand(2, 3, 2, 2)
x.shape

In [None]:
x.ndim

### 2.2. Tensors and Data Types

Different data types have different memory and computational implications. In PyTorch we'll be building networks that require thousands or even millions of floating point calculations! In such cases, using a smaller dtype like `float32` can significantly speed up computations and reduce memory requirements. The default float dtype in PyTorch is `float32`, as opposed to NumPy's `float64`.

In [None]:
print(np.array([3.14159]).dtype)
print(torch.tensor([3.14159]).dtype)

But just like in NumPy, you can always specify the particular dtype you want using the `dtype` argument:

In [None]:
print(torch.tensor([3.14159], dtype=torch.float64).dtype)
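
Why this matters in practice, sketched with a made-up two-feature input: NumPy floats arrive in PyTorch as `float64`, while layers such as `nn.Linear` create `float32` weights by default, and mixing the two in a forward pass raises a `RuntimeError` (the exact message varies between PyTorch versions). Converting the input with `.to(torch.float32)` resolves it:

```python
import numpy as np
import torch
from torch import nn

x64 = torch.tensor(np.array([[1.0, 2.0]]))  # NumPy floats arrive as float64
layer = nn.Linear(2, 1)                      # layer weights default to float32
print(x64.dtype, layer.weight.dtype)         # torch.float64 torch.float32

try:
    layer(x64)                               # float64 input vs float32 weights
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)              # message text depends on the PyTorch version

x32 = x64.to(torch.float32)                  # x64.float() is an equivalent shorthand
print(layer(x32).dtype)                      # torch.float32
```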

### 2.3. Operations on Tensors

Tensors operate just like `ndarrays` and have a variety of familiar methods that can be called on them:

In [None]:
a = torch.rand(1, 3)
b = torch.rand(3, 1)

a + b  # broadcasting between a 1 x 3 and a 3 x 1 tensor

In [None]:
a * b

In [None]:
a.mean()

In [None]:
a.sum()
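
Note that `*` is element-wise (with broadcasting), not matrix multiplication. A short sketch contrasting it with the matrix-product operator `@` on tensors with the same shapes as `a` and `b`:

```python
import torch

a = torch.rand(1, 3)
b = torch.rand(3, 1)

print((a * b).shape)  # element-wise with broadcasting -> torch.Size([3, 3])
print((a @ b).shape)  # matrix product (1 x 3)(3 x 1) -> torch.Size([1, 1])
print((b @ a).shape)  # matrix product (3 x 1)(1 x 3) -> torch.Size([3, 3])

# a @ b is the dot product of the two vectors
dot = (a.squeeze() * b.squeeze()).sum()
print(torch.allclose((a @ b).squeeze(), dot))  # True
```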

### 2.4. Indexing

Once again, same as NumPy!

In [None]:
X = torch.rand(5, 2)
print(X)

In [None]:
print(X[0, :])
print(X[0])
print(X[:, 0])
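
A couple of other NumPy-style behaviours also carry over, shown here as a small sketch on a fresh random `X`: boolean masks select rows (and return a copy), while basic indexing such as `X[0]` returns a view, so writing to it changes `X` too:

```python
import torch

torch.manual_seed(0)
X = torch.rand(5, 2)

mask = X[:, 0] > 0.5   # boolean tensor of shape (5,)
selected = X[mask]     # rows whose first column exceeds 0.5 (a copy)
print(mask)
print(selected)

row = X[0]             # basic indexing returns a view that shares memory with X
row[0] = -1.0
print(X[0, 0])         # tensor(-1.)
```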

### 2.5. GPU and CUDA Tensors

GPU stands for "graphics processing unit" (as opposed to a CPU: central processing unit). GPUs were originally developed for gaming. They are very fast at performing operations on large amounts of data because they perform them in parallel (think about updating the value of every pixel on a screen very quickly as a player moves around in a game). More recently, GPUs have been adapted for more general-purpose programming. Neural networks can typically be broken into smaller computations that can be performed in parallel on a GPU. PyTorch is tightly integrated with CUDA, NVIDIA's software layer that facilitates interactions with a GPU (if you have one). You can check whether you have GPU capability using:

In [None]:
torch.cuda.is_available()  # my MacBook Pro does not have a GPU

When training on a machine that has a GPU, you need to tell PyTorch you want to use it. You'll see the following at the top of most PyTorch code:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

You can then use the `device` argument when creating tensors to specify whether you wish to use a CPU or GPU. To move an existing tensor between the CPU and GPU, use the `.to()` method, which returns a tensor on the target device and leaves the original unchanged:

In [None]:
X = torch.rand(2, 2, 2, device=device)
print(X.device)

In [None]:
# X.to('cuda')  # this would give me an error as I don't have a GPU, so I'm commenting it out
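
Putting these pieces together, a common device-agnostic pattern (sketched here with a throwaway `nn.Linear` layer named `demo_model`) moves both the model and its input data to `device`, because a forward pass requires them to be on the same device. Note that `nn.Module.to()` moves the module's parameters in place and returns the module, while `Tensor.to()` returns a new tensor:

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

demo_model = nn.Linear(2, 1).to(device)  # parameters now live on `device`
x = torch.rand(4, 2).to(device)          # input data must be on the same device

out = demo_model(x)
print(out.device)                        # cpu, or cuda:0 on a GPU machine
```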

We'll revisit GPUs later in the course when we are working with bigger datasets and more complex networks. For now, we can work on the CPU just fine.

## 3. Neural Network Basics
<hr>

You've probably already learned about several machine learning algorithms (kNN, Random Forest, SVM, etc.). Neural networks are simply another algorithm, and actually one of the simplest in my opinion! As we'll see, a neural network is just a sequence of linear and non-linear transformations. Often you see something like this when learning about or using neural networks:

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-6.png)

So what on Earth does that all mean? Well, we are going to build up some intuition one step at a time.

### 3.1. Simple Linear Regression with a Neural Network

Let's create a simple regression dataset with 500 observations:

In [None]:
X, y = make_regression(n_samples=500, n_features=1, random_state=0, noise=10.0)
import matplotlib.pyplot as plt
%matplotlib inline
# Plot the regression data
plt.figure(figsize=(6, 5))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')
plt.title('Regression Data', fontsize=14)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.grid(True)
plt.legend()
plt.show()

We can fit a simple linear regression to this data using sklearn:

In [None]:
sk_model = LinearRegression().fit(X, y)

plt.figure(figsize=(6, 5))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')

plt.plot(X, sk_model.predict(X), color='red', label='Fitted line')
plt.title('Regression Data', fontsize=14)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.grid(True)
plt.legend()
plt.show()


Here are the parameters of that fitted line:

In [None]:
print(f"w_0: {sk_model.intercept_:.2f} (bias/intercept)")
print(f"w_1: {sk_model.coef_[0]:.2f}")

As an equation, that looks like this:

$$\hat{y}=-0.77 + 45.50X$$

Or in matrix form:

$$\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}=\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} -0.77 \\ 45.50 \end{bmatrix}$$
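
The matrix form can be reproduced numerically. A minimal sketch using the intercept and slope from the equation $\hat{y}=-0.77 + 45.50X$ and three made-up input values: prepend a column of ones to the inputs, and a single matrix product then gives every prediction at once:

```python
import torch

w = torch.tensor([-0.77, 45.50])    # [w_0, w_1]: intercept and slope
x = torch.tensor([-1.0, 0.0, 2.0])  # three example inputs, made up for illustration

X_design = torch.stack([torch.ones_like(x), x], dim=1)  # shape (3, 2): columns [1, x]
y_hat = X_design @ w                                    # shape (3,): one prediction per row
print(y_hat)                                            # approx. [-46.27, -0.77, 90.23]
```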

Or, in graph form, I'll represent it like this:

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-1.png)

### 3.2. Linear Regression with a Neural Network in PyTorch

So let's implement the above in PyTorch to start gaining an intuition about neural networks! Almost every neural network model you build in PyTorch will inherit from `torch.nn.Module`. 

Let's create a model called `linearRegression`:

In [None]:
class linearRegression(nn.Module):  # our class inherits from nn.Module and we can call it anything we like
    def __init__(self, input_size, output_size):
        super().__init__()                                # runs nn.Module's constructor so our class can register layers and parameters
        self.linear = nn.Linear(input_size, output_size)  # this is a simple linear layer: wX + b

    def forward(self, x):
        out = self.linear(x)
        return out

Let's step through the above:

```python
class linearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__() 
```

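^ Here we're creating a class called `linearRegression` that inherits the methods and attributes of `nn.Module` (hint: try typing `help(linearRegression)` to see everything we inherited from `nn.Module`). Calling `super().__init__()` runs `nn.Module`'s own constructor, which sets up the bookkeeping PyTorch uses to track layers and parameters; it must come before we assign any layers.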

```python
        self.linear = nn.Linear(input_size, output_size)
```

^ Here we're defining a "Linear" layer, which computes `wX + b`, i.e., the input features multiplied by the weights of the network, plus the bias.

```python
    def forward(self, x):
        out = self.linear(x)
        return out
```

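^ PyTorch networks created with `nn.Module` must have a `forward()` method. It accepts the input data `x` and passes it through the defined operations. In this case, we are passing `x` into our linear layer and getting an output `out`. We don't usually call `forward()` directly: instead we call the model itself, e.g., `model(x)`, and PyTorch runs `forward()` for us.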
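
To see that a "Linear" layer really is just `wX + b`, here is a small sketch with made-up random data. In PyTorch's convention the weight matrix has shape `(output_size, input_size)`, so the layer computes `x @ W.T + b` for a batch `x` with one row per observation:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)   # 3 input features -> 1 output
x = torch.rand(5, 3)      # a batch of 5 observations

manual = x @ layer.weight.T + layer.bias   # weight shape (1, 3), bias shape (1,)
print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-6))  # True
```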

After defining the model class, we can create an instance of that class:

In [None]:
model = linearRegression(input_size=1, output_size=1)

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-2.png)

We can check out our model using `print()`:

In [None]:
print(model)

Or the more useful `summary()` (which we imported at the top of this notebook with `from torchsummary import summary`):

In [None]:
summary(model, (1,));

Notice how we have two parameters? We have one for the weight (`w1`) and one for the bias (`w0`). These were initialized randomly by PyTorch when we created our model. They can be accessed with `model.state_dict()`:

In [None]:
model.state_dict()
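
The keys in `state_dict()` come from the attribute names in our class (`self.linear` becomes the `linear.` prefix). A short sketch that lists them and counts parameters with `numel()`; the class is repeated so the snippet runs on its own, and the instance is called `demo_model` so it doesn't overwrite `model`:

```python
import torch
from torch import nn

class linearRegression(nn.Module):  # same class as defined above
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

demo_model = linearRegression(input_size=1, output_size=1)

names = [name for name, _ in demo_model.named_parameters()]
print(names)                           # ['linear.weight', 'linear.bias']
print(list(demo_model.state_dict()))   # the same keys appear in state_dict()

n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)                        # 2
```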

Okay, before we move on: the `X` and `y` data I created are currently NumPy arrays, but they need to be PyTorch tensors. Let's convert them:

In [None]:
X_t = torch.tensor(X, dtype=torch.float32)  # float32 to match the dtype of the model's weights
y_t = torch.tensor(y, dtype=torch.float32)

We already have a working model, and we can ask it for a prediction with this syntax:

In [None]:
y_p = model(X_t[0]).item()
print(f"Predicted: {y_p:.2f}")
print(f"   Actual: {y[0]:.2f}")

Our prediction is pretty bad because our model is not trained/fitted yet! As we learned in the past few chapters, to fit our model we need:
1. **a loss function** (called "criterion" in PyTorch) to tell us how good/bad our predictions are - we'll use **Mean Squared Error**, `torch.nn.MSELoss()`
2. **an optimization algorithm** to help optimize model parameters - we'll use **Stochastic Gradient Descent**, `torch.optim.SGD()`

In [None]:
LEARNING_RATE = 0.1
criterion = nn.MSELoss()  # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)  # optimization algorithm is SGD

Before we train, we're going to create a "data loader" to help batch our data. We'll talk more about these in later chapters, but for now just think of them as iterables that yield batches of data on request. We'll use a `BATCH_SIZE = 50`, which should give us 10 batches because we have 500 data points:

In [None]:
BATCH_SIZE = 50
dataset = TensorDataset(X_t, y_t)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

So, we should have 10 batches:

In [None]:
print(f"Total number of batches: {len(dataloader)}")

We can look at a batch using this syntax:

In [None]:
XX, yy = next(iter(dataloader))
print(f" Shape of feature data (X) in batch: {XX.shape}")
print(f"Shape of response data (y) in batch: {yy.shape}")
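
Where does "10 batches" come from? The number of batches is $\lceil 500 / \text{batch size} \rceil$, with a smaller final batch when the division isn't exact, unless `drop_last=True` discards that incomplete batch. A sketch on 500 rows of dummy data (`demo_dataset` is made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

demo_dataset = TensorDataset(torch.rand(500, 1), torch.rand(500))  # 500 dummy observations

results = {}
for batch_size, drop_last in [(50, False), (64, False), (64, True)]:
    loader = DataLoader(demo_dataset, batch_size=batch_size, drop_last=drop_last)
    last_batch = [len(yb) for _, yb in loader][-1]   # size of the final batch
    results[(batch_size, drop_last)] = (len(loader), last_batch)
    print(f"batch_size={batch_size}, drop_last={drop_last}: "
          f"{len(loader)} batches, last batch has {last_batch} rows")
```

With 500 rows, a batch size of 64 leaves a final batch of 52 rows (500 - 7 × 64), which `drop_last=True` throws away.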

With our data loader defined, let's train our simple network for 5 epochs of SGD!

> I'll explain all the code here in the next chapter, but scan through it; it's not too hard to see what's going on!

In [None]:
def trainer(model, criterion, optimizer, dataloader, epochs=5, verbose=True):
    """Simple training wrapper for PyTorch network."""

    for epoch in range(epochs):
        losses = 0
        for X, y in dataloader:
            optimizer.zero_grad()       # Clear gradients w.r.t. parameters
            y_hat = model(X).flatten()  # Forward pass to get output
            loss = criterion(y_hat, y)  # Calculate loss
            loss.backward()             # Compute gradients w.r.t. parameters
            optimizer.step()            # Update parameters
            losses += loss.item()       # Add loss for this batch to running total
        if verbose: print(f"epoch: {epoch + 1}, loss: {losses / len(dataloader):.4f}")  # average loss per batch

trainer(model, criterion, optimizer, dataloader, epochs=5, verbose=True)

Now that our model has been trained, its parameters should be different from before:

In [None]:
model.state_dict()

Compared with the sklearn model, we get a very similar answer:

In [None]:
pd.DataFrame({"w0": [sk_model.intercept_, model.state_dict()['linear.bias'].item()],
              "w1": [sk_model.coef_[0], model.state_dict()['linear.weight'].item()]},
             index=['sklearn', 'pytorch']).round(2)

We got pretty close! We could do better by changing the number of epochs or the learning rate. So here is our simple network once again:

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-2.png)

By the way, check out what happens if we run `trainer()` again:

In [None]:
trainer(model, criterion, optimizer, dataloader, epochs=5, verbose=True)

Our model continues where we left off! This may or may not be what you want. We can start from scratch by re-making our `model` and `optimizer`.
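
A minimal sketch of starting over, using a hypothetical helper `make_model_and_optimizer()` and a plain `nn.Linear(1, 1)` in place of `linearRegression`. The optimizer has to be rebuilt along with the model, because it holds references to the parameters it was created with:

```python
from torch import nn, optim

def make_model_and_optimizer(lr=0.1):
    """Hypothetical helper: a freshly initialized model plus an optimizer bound to it."""
    model = nn.Linear(1, 1)                           # new, randomly initialized parameters
    optimizer = optim.SGD(model.parameters(), lr=lr)  # tracks THIS model's parameters
    return model, optimizer

fresh_model, fresh_optimizer = make_model_and_optimizer()
old_weight = fresh_model.weight

# Starting over: rebuild both. Rebuilding only the model would leave the old
# optimizer stepping the old parameters, and the new model would never be updated.
fresh_model, fresh_optimizer = make_model_and_optimizer()
print(fresh_optimizer.param_groups[0]["params"][0] is fresh_model.weight)  # True
print(old_weight is fresh_model.weight)                                    # False
```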

### 3.3. Multiple Linear Regression with a Neural Network

Okay, let's do a multiple linear regression now with 3 features. So our network will look like this:

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-3.png)

Let's go ahead and create some data:

In [None]:
# Create dataset
X, y = make_regression(n_samples=500, n_features=3, random_state=0, noise=10.0) # sklearn
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32)
# Create dataloader
dataset = TensorDataset(X_t, y_t)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

And let's create the above model:

In [None]:
model = linearRegression(input_size=3, output_size=1)

We should now have 4 parameters (3 weights and 1 bias):

In [None]:
summary(model, (3,));

Looks good to me! Let's train the model and then compare it to sklearn's `LinearRegression()`:

In [None]:
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
trainer(model, criterion, optimizer, dataloader, epochs=5, verbose=True)

In [None]:
sk_model = LinearRegression().fit(X, y)
pd.DataFrame({"w0": [sk_model.intercept_, model.state_dict()['linear.bias'].item()],
              "w1": [sk_model.coef_[0], model.state_dict()['linear.weight'][0, 0].item()],
              "w2": [sk_model.coef_[1], model.state_dict()['linear.weight'][0, 1].item()],
              "w3": [sk_model.coef_[2], model.state_dict()['linear.weight'][0, 2].item()]},
             index=['sklearn', 'pytorch']).round(2)

### 3.4. Non-linear Regression with a Neural Network

Okay, so we've made simple networks to imitate simple and multiple *linear* regression. You're probably thinking, "so what?" But we're getting to the good stuff, I promise! For example, what happens when we have more complicated datasets like this?

In [None]:
# Create dataset
np.random.seed(2020)
X = np.sort(np.random.randn(500))
y = X ** 2 + 15 * np.sin(X) ** 3
X_t = torch.tensor(X[:, None], dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32)

# Create dataloader
dataset = TensorDataset(X_t, y_t)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

In [None]:
plt.figure(figsize=(6, 5))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')
plt.title('Regression Data', fontsize=14)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.ylim([-25,25])
plt.xlim([-3,3])
plt.grid(True)
plt.legend()
plt.show()

This is obviously non-linear, and we need to introduce some **non-linearities** into our network. These non-linearities are what make neural networks so powerful and they are called **"activation functions"**. We are going to create a new model class that includes a non-linearity - a sigmoid function:

$$S(x)=\frac{1}{1+e^{-x}}$$
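
Before plotting it, a quick numerical sanity check (a sketch using `torch.sigmoid`, PyTorch's built-in version of this function): the built-in matches the formula, $S(0) = 0.5$, and the curve satisfies $S(-x) = 1 - S(x)$:

```python
import torch

x = torch.linspace(-5, 5, 11)
manual = 1 / (1 + torch.exp(-x))   # S(x) written out from the formula
builtin = torch.sigmoid(x)         # PyTorch's built-in sigmoid

print(torch.allclose(builtin, manual, atol=1e-6))               # True
print(torch.sigmoid(torch.tensor(0.0)))                         # tensor(0.5000)
print(torch.allclose(torch.sigmoid(-x), 1 - builtin, atol=1e-6))  # True
```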

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Generate a range of x values
x_range = np.linspace(-10, 10, 500)  # Covers a wide range to show the curve
sigmoid_values = sigmoid(x_range)

# Plot the sigmoid curve
plt.figure(figsize=(6, 5))
plt.plot(x_range, sigmoid_values, label="Sigmoid Curve", color="green", linewidth=2)

# Adding labels and title
# plt.axvline(0, color='gray', linestyle='--', alpha=0.7)  # Highlight x=0 (decision boundary)
# plt.axhline(0.5, color='gray', linestyle='--', alpha=0.7)  # Highlight y=0.5
plt.xlabel("Input Values (x)")
plt.ylabel("Sigmoid Output (S(x))")
plt.title("Sigmoid Function Curve")
plt.grid(alpha=0.4)
plt.legend()
plt.show()


We'll talk more about activation functions later, but note how the sigmoid function non-linearly maps `x` to a value between 0 and 1. Okay, so let's create the following network:

![](https://raw.githubusercontent.com/Shangyue-CWU/CS457Draft/refs/heads/main/img/nn-5.png)

All this means is that the value of each node in the hidden layer is transformed by the "activation function", thus introducing non-linear elements into our model! There are two main ways of creating the above model in PyTorch; I'll show you both:

In [None]:
class nonlinRegression(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.hidden(x)       # input -> hidden layer
        x = self.sigmoid(x)      # sigmoid activation function in hidden layer
        x = self.output(x)       # hidden -> output layer
        return x

Note how our `forward()` method now passes `x` through the `nn.Sigmoid()` function after the hidden layer. The above method is very clear and flexible, but I prefer using `nn.Sequential()` to combine my layers together in the constructor:

In [None]:
class nonlinRegression(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.main = nn.Sequential(
            nn.Linear(input_size, hidden_size),  # input -> hidden layer
            nn.Sigmoid(),                        # sigmoid activation function in hidden layer
            nn.Linear(hidden_size, output_size)  # hidden -> output layer
        )

    def forward(self, x):
        x = self.main(x)
        return x
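
To make the data flow concrete, here is a sketch that steps through the same `Sequential` stack layer by layer on a made-up batch shaped like `X_t` (500 rows, 1 feature). The hidden layer turns each row into 3 values, the sigmoid keeps that shape, and the output layer maps back to 1 value per row:

```python
import torch
from torch import nn

torch.manual_seed(0)
main = nn.Sequential(
    nn.Linear(1, 3),   # input -> hidden layer
    nn.Sigmoid(),      # sigmoid activation function in hidden layer
    nn.Linear(3, 1),   # hidden -> output layer
)

x = torch.rand(500, 1)   # 500 observations, 1 feature
h = x
for layer in main:       # nn.Sequential is iterable, so we can step through it
    h = layer(h)
    print(type(layer).__name__, tuple(h.shape))  # Linear (500, 3), Sigmoid (500, 3), Linear (500, 1)

print(torch.allclose(h, main(x)))  # stepping manually matches calling the whole stack
```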

Let's make an instance of our new class and confirm it has 10 parameters (6 weights + 4 biases):

In [None]:
model = nonlinRegression(1, 3, 1)
summary(model, (1,));
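
The count of 10 can be checked by hand: a linear layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ outputs has $n_{\text{in}} \times n_{\text{out}}$ weights plus $n_{\text{out}}$ biases, and the sigmoid has no parameters at all. For `nonlinRegression(1, 3, 1)`:

$$\underbrace{1 \times 3 + 3 \times 1}_{\text{weights}} + \underbrace{3 + 1}_{\text{biases}} = 6 + 4 = 10$$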

Okay, let's train:

In [None]:
criterion = nn.MSELoss() # Mean Squared Error (MSE) Loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.3) # Stochastic Gradient Descent (SGD) lr (learning rate) = 0.3

# Call the training function to train the model
# - model: The neural network to train
# - criterion: The loss function used to calculate the error
# - optimizer: The optimization algorithm to update model parameters
# - dataloader: Provides batches of data for training
# - epochs: Number of times the entire dataset is passed through the model
# - verbose: If True, provides detailed output during training (e.g., loss values after each epoch)
trainer(model, criterion, optimizer, dataloader, epochs=5, verbose=True)

In [None]:
y_p = model(X_t).detach().numpy().squeeze()
plt.figure(figsize=(6, 5))
plt.scatter(X, y, color='blue', alpha=0.5, label='Data Points')
plt.plot(X, y_p, color='red', label='Fitted line')
plt.title('<Your name> + Regression Data', fontsize=14)
plt.xlabel('Feature (X)', fontsize=12)
plt.ylabel('Target (y)', fontsize=12)
plt.ylim([-25,25])
plt.xlim([-3,3])
plt.grid(True)
plt.legend()
plt.show()

### Please run `y_p` and `model(X_t)`, and explain why we do this.