<p align="center">
    <img src="https://github.com/FRI-Energy-Analytics/energyanalytics/blob/main/EA_logo.jpg?raw=true" width="240" height="240" />
</p>

# Neural Networks in PyTorch: Forward Pass
## Freshman Research Initiative Energy Analytics CS 309


In [1]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
%matplotlib inline

In [2]:
data = pd.read_csv(r'well_data.csv') #read it in
data.tail()

| Unnamed: 0 | DEPT | AHT10 | AHT20 | AHT30 | AHT60 | AHT90 | AHTCO60 | AHTCO90 | DPHZ | DSOZ | ... | ITT | NPOR | PEFZ | RSOZ | RXOZ | SDEV | SP | SPHI | RHOZ | TOP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5466 | 1914.0 | 1.6167 | 3.0335 | 7.5475 | 8.5244 | 9.1691 | 117.3103 | 109.0619 | -0.4129 | 0.5048 | ... | 0.2649 | 0.4847 | 10.0 | 0.0348 | 2.9599 | 1.0538 | 1.625 | 0.7009 | 3.3313 | MATANUSKA |
| 5467 | 1913.5 | 1.6164 | 3.0324 | 7.5492 | 8.5195 | 9.183 | 117.3782 | 108.8963 | -0.6763 | 0.3208 | ... | 0.265 | 0.476 | 10.0 | 0.0 | 1.7452 | 1.077 | 10.9375 | 0.6161 | 3.7659 | MATANUSKA |
| 5468 | 1913.0 | 1.6163 | 3.0317 | 7.5488 | 8.5243 | 9.1852 | 117.3116 | 108.8711 | -0.9772 | 0.2371 | ... | 0.2651 | 0.4754 | 10.0 | 0.0 | 0.3407 | 1.0509 | 43.8125 | 0.5991 | 4.2624 | MATANUSKA |
| 5469 | 1912.5 | 1.6162 | 3.0311 | 7.5493 | 8.5248 | 9.1936 | 117.3051 | 108.7711 | -1.1748 | 0.212 | ... | 0.2652 | 0.4853 | 10.0 | 0.0 | 0.2168 | 0.8236 | 79.5 | 0.6521 | 4.5884 | MATANUSKA |
| 5470 | 1912.0 | 1.6161 | 3.0305 | 7.5496 | 8.5289 | 9.1974 | 117.2483 | 108.7263 | -1.1654 | 0.208 | ... | 0.2652 | 0.4471 | 9.9845 | 0.0 | 0.1797 | 0.7958 | 108.5 | 0.6699 | 4.5729 | MATANUSKA |


In [3]:
from sklearn import preprocessing #for label encoding
#label encode our formation data
le = preprocessing.LabelEncoder()
top_names = data.TOP
le.fit(data.TOP)
tops = le.transform(data.TOP)
data.drop('TOP', axis=1, inplace=True)

#### We have our data loaded and our labels encoded, now let's get into `PyTorch`

`PyTorch` is a lot like `NumPy`: we can do things like matrix multiplications and manipulate `tensors`, which are the `PyTorch` equivalent of `arrays`. In fact, we can create a `PyTorch` `tensor` directly from a `NumPy` `array`.
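As a small illustration of the array-to-tensor conversion described above, here is a sketch using a made-up 2x2 array (the values are placeholders, not well-log data). It shows two real conversion routes, `torch.from_numpy`, which shares memory with the array, and `torch.tensor`, which copies it.

```python
import numpy as np
import torch

# A tiny made-up array standing in for a few feature values
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# torch.from_numpy shares memory with the NumPy array (no copy is made)
t_shared = torch.from_numpy(arr)

# torch.tensor copies the data, so later changes to arr do not affect it
t_copy = torch.tensor(arr)

arr[0, 0] = 100.0
print(t_shared[0, 0].item())  # 100.0, because the memory is shared
print(t_copy[0, 0].item())    # 1.0, because this is an independent copy

# Matrix multiplication works much like np.matmul
product = torch.mm(t_copy, t_copy)
print(product)

# Going back to NumPy is a single method call
print(t_copy.numpy())
```

The notebook's own cell uses `torch.Tensor(np.array(...))`, which also copies the data and always produces 32-bit floats.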

In [4]:
# instead of doing a test, train split we are going to use torch's dataloader
# here we take our features and labels, convert them to numpy arrays, and then convert them to tensors

train = torch.utils.data.TensorDataset(torch.Tensor(np.array(data)), torch.Tensor(np.array(tops)))

# The DataLoader serves the data in shuffled batches of 64 samples
train_loader = torch.utils.data.DataLoader(train, batch_size = 64, shuffle = True)
# Grab one batch of features and labels
features, labels = next(iter(train_loader))
features

tensor([[ 3.6220e+03,  4.4251e+00,  4.5191e+00,  ...,  1.7500e+00,
          3.6780e-01,  2.4403e+00],
        [ 2.6905e+03,  1.2609e+01,  1.3061e+01,  ..., -8.6875e+00,
          2.4510e-01,  2.4811e+00],
        [ 4.0075e+03,  7.6512e+00,  7.6816e+00,  ...,  2.4375e+00,
          3.0040e-01,  2.4709e+00],
        ...,
        [ 3.8820e+03,  3.2788e+00,  3.3559e+00,  ...,  0.0000e+00,
          4.5630e-01,  2.4266e+00],
        [ 4.2270e+03,  1.1139e+01,  1.1348e+01,  ..., -1.0438e+01,
          2.9420e-01,  2.4057e+00],
        [ 3.1425e+03,  1.3644e+01,  1.4071e+01,  ..., -3.9375e+00,
          2.1720e-01,  2.5147e+00]])

We can also do pretty much everything that `NumPy` does in `PyTorch`. Let's go ahead and reimplement our network from the previous notebook in `PyTorch`. Start with a sigmoid activation function like we had in the previous notebook.

In [5]:
def activation(x):
    return 1 / (1 + torch.exp(-x))
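A quick sanity check, sketched on a few hand-picked inputs, is to compare this hand-written function against PyTorch's built-in `torch.sigmoid` and to look at how it behaves for large inputs.

```python
import torch

def activation(x):
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-2.0, 0.0, 2.0])
print(activation(x))

# The hand-written version agrees with the built-in sigmoid
matches_builtin = torch.allclose(activation(x), torch.sigmoid(x))
print(matches_builtin)

# Large positive or negative inputs saturate to exactly 1 or 0 in float32
saturated = activation(torch.tensor([-100.0, 100.0]))
print(saturated)
```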

Next, let's start to build our network. Since we have 27 features, let's make a network that takes in 27 input units, has 54 hidden units, and has two output units. We are going to use our sigmoid activation for the hidden layer of the network. 

In [6]:
n_input = features.shape[1] #27 features
n_hidden = 54
n_output = 2

# Weights for inputs to hidden layer
W1 = torch.randn(n_input, n_hidden)
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output)

# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
B2 = torch.randn((1, n_output))

In [7]:
hidden = activation(torch.mm(features, W1)+B1)
y_hat = torch.mm(hidden, W2)+B2
print(y_hat[:10])

tensor([[-3.2210,  1.3566],
        [-3.2209,  1.3564],
        [-3.2209,  1.3564],
        [-3.3356,  1.7754],
        [-3.2209,  1.3564],
        [-3.2209,  1.3564],
        [-3.3345,  0.7197],
        [-3.2209,  1.3564],
        [-3.2209,  1.3564],
        [-3.3356,  1.7754]])
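It can help to track the tensor shapes through this forward pass. The sketch below repeats the same computation on a random stand-in for `features` (an assumption made so the example runs without the well data) and uses `torch.sigmoid` in place of the `activation` function.

```python
import torch

torch.manual_seed(0)
batch_size, n_input, n_hidden, n_output = 64, 27, 54, 2

# Random stand-in for one batch of well-log features
features = torch.randn(batch_size, n_input)

W1 = torch.randn(n_input, n_hidden)   # (27, 54)
B1 = torch.randn(1, n_hidden)         # (1, 54), broadcast over the batch
W2 = torch.randn(n_hidden, n_output)  # (54, 2)
B2 = torch.randn(1, n_output)         # (1, 2), broadcast over the batch

# (64, 27) @ (27, 54) -> (64, 54)
hidden = torch.sigmoid(torch.mm(features, W1) + B1)
# (64, 54) @ (54, 2) -> (64, 2)
y_hat = torch.mm(hidden, W2) + B2

print(hidden.shape)
print(y_hat.shape)
```

Many rows of the printed `y_hat` above are nearly identical. One plausible explanation, not verified here, is that unscaled features with values in the thousands push most hidden units to exactly 0 or 1, so different samples produce almost the same hidden activations.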


Just like in the last notebook, we want to run our final output through a softmax function to get a probability for each class. Let's define a softmax function using `PyTorch`

In [8]:
def softmax(x):
    return torch.exp(x)/torch.sum(torch.exp(x), dim=1).view(-1, 1)

If we run our predictions through the softmax function, we get the probability for each class. Here we see that the first class has a low probability and the second class has a high probability for the first 10 samples

In [9]:
print(softmax(y_hat)[:10])

tensor([[0.0102, 0.9898],
        [0.0102, 0.9898],
        [0.0102, 0.9898],
        [0.0060, 0.9940],
        [0.0102, 0.9898],
        [0.0102, 0.9898],
        [0.0171, 0.9829],
        [0.0102, 0.9898],
        [0.0102, 0.9898],
        [0.0060, 0.9940]])
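The sketch below checks two properties of this softmax on a couple of the logit rows printed earlier: each row sums to 1, and the result matches the built-in `torch.softmax`. It also includes `stable_softmax`, a helper introduced here for illustration (not part of the notebook), which subtracts each row's maximum before exponentiating so that very large logits do not overflow.

```python
import torch

def softmax(x):
    return torch.exp(x) / torch.sum(torch.exp(x), dim=1).view(-1, 1)

def stable_softmax(x):
    # Hypothetical helper: shifting by the row maximum leaves the result
    # unchanged mathematically but keeps exp() from overflowing
    shifted = x - x.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)

# Two logit rows copied from the y_hat output above
logits = torch.tensor([[-3.2210, 1.3566],
                       [-3.3356, 1.7754]])
probs = softmax(logits)
print(probs)
print(probs.sum(dim=1))  # each row sums to 1
print(torch.allclose(probs, torch.softmax(logits, dim=1)))

# With very large logits the naive version produces nan (inf / inf)
big = torch.tensor([[1000.0, 1001.0]])
print(softmax(big))
print(stable_softmax(big))
```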


This can all get wildly complex, but luckily `PyTorch` has lots of modules that make it much easier to build and train neural networks. Let's use the `nn` module to build the same network from above and see how it does

In [10]:
from torch import nn

In [11]:
class Network(nn.Module):
    def __init__(self):
        # we are inheriting from the nn.Module class
        # This along with the line below creates a class that tracks architecture
        # and has those methods and attributes
        super().__init__() 
        
        # Inputs to hidden layer linear transformation
        # The weights and bias tensors are automatically created
        self.hidden = nn.Linear(27, 54)
        # Output layer, 2 units, one for each class
        # same with weights and bias tensor for the output layer
        self.output = nn.Linear(54, 2)
        
        # Define sigmoid activation and softmax output 
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=0)
        
        # nn.Module networks need to have a forward method defined
        # this method takes a tensor and passes it through the operations above
    def forward(self, x):
        # Pass the input tensor through each of our operations
        x = self.hidden(x)
        x = self.sigmoid(x)
        x = self.output(x)
        x = self.softmax(x)
        
        return x

In [12]:
# Instantiate the network and look at its text representation
model = Network()
model

Network(
  (hidden): Linear(in_features=27, out_features=54, bias=True)
  (output): Linear(in_features=54, out_features=2, bias=True)
  (sigmoid): Sigmoid()
  (softmax): Softmax(dim=0)
)

Now that we have defined our architecture let's go ahead and make a forward pass through the network and see what the output is

In [13]:
print(model.forward(features.float())[:10])

tensor([[0.0169, 0.0174],
        [0.0155, 0.0154],
        [0.0155, 0.0154],
        [0.0171, 0.0176],
        [0.0156, 0.0154],
        [0.0155, 0.0154],
        [0.0155, 0.0153],
        [0.0155, 0.0154],
        [0.0140, 0.0138],
        [0.0168, 0.0172]], grad_fn=<SliceBackward>)


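Every value here is close to 1/64 ≈ 0.0156 rather than 0.5. That is because `nn.Softmax(dim=0)` normalizes down each column, across the 64 samples in the batch, instead of across the two classes in each row. To get a probability per class for each sample, the softmax should use `dim=1`, so that each row sums to 1. Even then, the predictions would be essentially arbitrary, because the network has not been trained and its weights and biases are random.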
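One thing worth checking in the output above is that every value sits near 1/64 ≈ 0.0156, which is what a softmax taken over a batch of 64 samples (`dim=0`) would produce. The sketch below, on random logits standing in for the network output, contrasts `dim=0` with `dim=1`.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(64, 2)  # random stand-in: 64 samples, 2 classes

over_batch = torch.softmax(logits, dim=0)    # normalizes each column
over_classes = torch.softmax(logits, dim=1)  # normalizes each row

print(over_batch.sum(dim=0))        # each column sums to 1
print(over_classes.sum(dim=1)[:5])  # each row sums to 1

# With dim=0 the 128 entries sum to 2, so the average entry is 1/64
print(over_batch.mean().item())
```

For per-sample class probabilities, `dim=1` is the setting to use with a `(batch, classes)` tensor.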

### Building Networks

There are lots of ways we can build networks in `PyTorch`, let's take a look at a couple of different options. First, we can use the `nn.functional` module:

In [14]:
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Inputs to hidden layer 
        self.hidden = nn.Linear(27, 54)
        # Output layer, 2 units, same as before
        self.output = nn.Linear(54, 2)
        
    def forward(self, x):
        # Hidden layer with sigmoid activation
        x = torch.sigmoid(self.hidden(x))
        # Output layer with softmax activation
        # (dim=0 normalizes across the batch; use dim=1 for per-sample class probabilities)
        x = F.softmax(self.output(x), dim=0)
        
        return x

In [15]:
# Instantiate the network and run a forward pass on one batch
model = Network()
print(model.forward(features.float())[:10])

tensor([[0.0152, 0.0156],
        [0.0169, 0.0161],
        [0.0151, 0.0154],
        [0.0152, 0.0156],
        [0.0152, 0.0156],
        [0.0172, 0.0162],
        [0.0159, 0.0144],
        [0.0151, 0.0164],
        [0.0142, 0.0139],
        [0.0150, 0.0160]], grad_fn=<SliceBackward>)
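The module-style layers (`nn.Sigmoid`, `nn.Softmax`) and the functional calls (`torch.sigmoid`, `F.softmax`) compute the same values; the difference is mainly where the operation is declared. A minimal check on a made-up input:

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0],
                  [0.5, 0.5, 0.5]])

# Module objects are created once (typically in __init__) and then called
sigmoid_layer = nn.Sigmoid()
softmax_layer = nn.Softmax(dim=1)

# Functional versions are called directly inside forward()
same_sigmoid = torch.allclose(sigmoid_layer(x), torch.sigmoid(x))
same_softmax = torch.allclose(softmax_layer(x), F.softmax(x, dim=1))

print(same_sigmoid, same_softmax)
```

One visible difference is that functional activations do not appear in the model's printed representation, since they are not registered as submodules.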


We can also add in different activation functions such as `ReLU`. Let's add another hidden layer to our network and use `ReLU` activation functions for the hidden layers, and `Softmax` for our output layer

In [16]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers: 27 inputs, then hidden layers of 54 and 10 units
        self.fc1 = nn.Linear(27, 54)
        self.fc2 = nn.Linear(54, 10)
        # Output layer, 2 units, same as above
        self.fc3 = nn.Linear(10, 2)
        
    def forward(self, x):
        ''' Forward pass through the network, returns the class probabilities '''
        
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = F.softmax(x, dim=1)
        
        return x

model = Network()
model

Network(
  (fc1): Linear(in_features=27, out_features=54, bias=True)
  (fc2): Linear(in_features=54, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=2, bias=True)
)

We can print the weights and bias tensors for the different layers in the network

In [17]:
print(model.fc1.weight)
print(model.fc1.bias)

Parameter containing:
tensor([[-0.0103, -0.1655,  0.0881,  ..., -0.0273, -0.1077,  0.1087],
        [ 0.1207,  0.1368,  0.0889,  ..., -0.0606,  0.0185, -0.0042],
        [ 0.0782,  0.0809,  0.0131,  ..., -0.0685,  0.0523,  0.0028],
        ...,
        [-0.0340, -0.0925, -0.1813,  ..., -0.1811, -0.1805, -0.1083],
        [-0.1005, -0.1281,  0.1265,  ..., -0.1028,  0.0426,  0.0050],
        [-0.0429, -0.1316, -0.1778,  ...,  0.0416, -0.1582,  0.1541]],
       requires_grad=True)
Parameter containing:
tensor([-0.0979,  0.0906,  0.1698, -0.1031, -0.0441, -0.0693,  0.1183, -0.1859,
         0.1809,  0.0882,  0.0707,  0.0408, -0.0953,  0.0376, -0.1632,  0.1893,
        -0.1736, -0.1135, -0.1874, -0.1073, -0.0306, -0.0889, -0.0358,  0.0191,
         0.1872, -0.0086, -0.0286, -0.0331, -0.0768,  0.0154,  0.0049, -0.1875,
         0.0699,  0.1010, -0.1181, -0.1818,  0.1758,  0.0943,  0.0851, -0.0296,
        -0.0445,  0.0213,  0.1443,  0.1306, -0.0102, -0.1580,  0.1033,  0.1626,
         0.1302
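Rather than printing each tensor in full, it is often easier to list parameter names and shapes. The sketch below redefines a three-layer network with the same layer sizes (the softmax is left out because it has no parameters) and uses `named_parameters()` to walk through them.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(27, 54)
        self.fc2 = nn.Linear(54, 10)
        self.fc3 = nn.Linear(10, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

model = Network()

# nn.Linear stores its weight as (out_features, in_features)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# Total number of trainable values: weights plus biases of all three layers
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 27*54 + 54 + 54*10 + 10 + 10*2 + 2 = 2084
```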

### Sequential networks
We can also define networks using `nn.Sequential` where we define the input size, hidden layer sizes, and the output size

In [18]:
# Hyperparameters for our network
input_size = 27
hidden_sizes = [54, 10]
output_size = 2

# Build a feed-forward network
# Note: Softmax(dim=0) normalizes across the batch; dim=1 gives per-sample class probabilities
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=0))
print(model)

# Forward pass through the network and display output
print(model.forward(features.float())[:10])

Sequential(
  (0): Linear(in_features=27, out_features=54, bias=True)
  (1): ReLU()
  (2): Linear(in_features=54, out_features=10, bias=True)
  (3): ReLU()
  (4): Linear(in_features=10, out_features=2, bias=True)
  (5): Softmax(dim=0)
)
tensor([[3.4407e-09, 5.4438e-09],
        [4.6929e-18, 2.3776e-05],
        [2.8010e-06, 6.7094e-12],
        [1.8913e-06, 9.9627e-10],
        [1.5728e-09, 2.3568e-09],
        [4.0881e-20, 2.6142e-04],
        [3.4388e-23, 2.7660e-02],
        [3.2906e-14, 4.2045e-07],
        [1.1566e-04, 5.4128e-13],
        [4.5798e-23, 1.2102e-01]], grad_fn=<SliceBackward>)
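`nn.Sequential` also accepts an `OrderedDict`, which lets each layer have a name instead of an integer index. The layer names below are arbitrary choices for this sketch, the input is random, and the softmax here uses `dim=1` so that each row is a distribution over the two classes.

```python
from collections import OrderedDict

import torch
from torch import nn

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(27, 54)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(54, 10)),
    ('relu2', nn.ReLU()),
    ('output', nn.Linear(10, 2)),
    ('softmax', nn.Softmax(dim=1)),
]))

# Layers can be reached by position or by name
print(model[0])
print(model.fc1)

# Forward pass on a random batch of 4 samples
x = torch.randn(4, 27)
probs = model(x)
print(probs.sum(dim=1))  # each row sums to 1
```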


So far we have built the forward pass through the network, but how do we update the weights and biases? We do this through training and backpropagation
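As a preview only, here is a minimal sketch of a single training step on synthetic data. The loss function (`nn.CrossEntropyLoss`), optimizer (`torch.optim.SGD`), and learning rate are assumptions chosen for illustration, not necessarily what a later notebook uses. Note that `nn.CrossEntropyLoss` expects raw logits, so this model has no softmax layer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Model that outputs logits; CrossEntropyLoss applies log-softmax internally
model = nn.Sequential(nn.Linear(27, 54), nn.ReLU(), nn.Linear(54, 2))

X = torch.randn(64, 27)         # synthetic batch of features
y = torch.randint(0, 2, (64,))  # synthetic integer class labels

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = model[0].weight.detach().clone()

optimizer.zero_grad()          # clear any old gradients
loss = criterion(model(X), y)  # forward pass and loss
loss.backward()                # backpropagation fills in .grad for each parameter
optimizer.step()               # update the weights using those gradients

weights_changed = not torch.equal(before, model[0].weight.detach())
print(loss.item())
print(weights_changed)
```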