# TorchBook

A Spellbook for Torch (basically local lookup of my own stuff, stackoverflow posts & docs for torchy things)

- Each cell should roughly be a standalone runnable (copy and paste to the ether and it will work)


Do your best!


In [1]:
import torch
import numpy as np 

## Tensors 


+ Official Docs- https://pytorch.org/docs/stable/tensors.html 
+ Nice Tutorial- https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html
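+ basically numpy arrays (they even share memory with the source array when on CPU and created via `torch.from_numpy` or `.numpy()`, so in-place edits to one show up in the other; `torch.tensor` copies instead)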



In [2]:
# Creating a Tensor from Existing ndarray-like object (np.ndarrays or list of lists, etc.)

data = [[1, 2], [3, 4], [5, 6]] #list of lists work
x_data = torch.tensor(data) 

# or if in Numpy
np_array = np.array(data) #can also read from a numpy array (easy to connect to a numerically encoded PandasFrame)
x_data = torch.tensor(np_array) #same call as list-of-lists


In [None]:
# Generating Tensors w/o Existing Data
# More Methods- https://www.geeksforgeeks.org/creating-a-tensor-in-pytorch/

torch.randint(low=0, high=10, size=(3, 1, 5)) #create a tensor of shape [3x1x5] w random ints from 0 to 9 (`high` is exclusive)
torch.eye(n=5, m=5) #generate an identity matrix of shape nxm (diagonals=1, other_values=0)
torch.zeros(3,2) #returns tensor of zeroes in specified shape

In [None]:
# New Tensor Based on an Existing One (transformed in similar shape) -- "_like" is moniker for creating tensors shaped like an existing one

misc_data = torch.randint(low=-3, high=3, size=(7, 2))

x_ones = torch.ones_like(misc_data) #returns a tensor of same shape (and dtype) as misc_data with all values (x_ij) = 1
x_rand = torch.rand_like(misc_data, dtype=torch.float) #returns same shape tensor with random values, can change dtype as well

print(x_ones, x_rand, sep="\n\n")

In [4]:
# Tensor Attributes

t = torch.rand(4, 5)

t.shape #the dims of the Tensor (specified in above function)
t.device #where tensor is at the moment, either: ["cpu", "cuda"+device_index] (index provided since multi-gpu is possibility)
t.dtype #data-type for tensor, supported types are listed in docs
t.requires_grad #whether autograd tracks operations on this tensor so its gradient can be computed when `.backward()` is called (False by default for plain tensors)
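To make the `requires_grad` attribute a bit more concrete, here is a minimal sketch of how autograd uses it. The tensor names (`x`, `w`, `z`, `y`) are made up for illustration and are not from the cells above.

```python
import torch

# Only `x` asks autograd to track operations on it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, 0.5, 0.5])  # requires_grad defaults to False

z = (x * w).sum()       # z depends on x, so it is tracked too
print(z.requires_grad)  # True
print(z.grad_fn)        # a backward node (the sum op that produced z)

z.backward()            # gradients are computed here, not in the forward pass
print(x.grad)           # dz/dx equals w
print(w.grad)           # None, since w was never tracked

# Inside `torch.no_grad()` nothing is recorded, even for tensors that require grad
with torch.no_grad():
    y = (x * w).sum()
print(y.requires_grad, y.grad_fn)  # False None
```

A result with `grad_fn = None` simply means it was not produced by a tracked operation.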




In [105]:
# Tensor Functions -- callable methods available within Tensor

t = torch.rand(2, 10)

# Send a Tensor to Device (CPU or GPU)
compute_device = "cuda" if torch.cuda.is_available() else "cpu"
t_dev = t.to(compute_device) #returns the tensor on GPU or CPU (NOT in-place, so assign the result), use above to specify (can also include index, e.g. "cuda:0")


# Transformations & Math Operations
t.T #linalg-like transpose for a vector/matrix, can get finicky in higher dim tensors
t.matmul(t.T) #make sure dims are correct for matmul (vector needs multiply w transposed vector, i.e. Nx1 * 1XN)

t * t #elementwise product, need correct dims!
t.mul(t) #alt syntax for elementwise product

t.abs() #returns the absolute value of all items in tensor (converts negs into pos), same dim for returned tensor

sum_of_t = t.sum() #computes sum of all values in t (all dims condensed into a "scalar")
sum_of_t.item() #returns the value (python-like float, int, etc.) if this Tensor is a scalar (shape = "torch.Size([])")

torch.cat([t, t], dim=0) #concatenates N tensors (specified in list) along a specified dim (need same dimensions for sub-dims, but not necessarily same 'len' for cat dim)

# These are in-place operations: the trailing _ means the function modifies the tensor directly without needing assignment, omit the underscore to get a new tensor back instead
t.add_(42) #adds the specified value to all elements in the tensor
t.subtract_(2) #same for subtraction
t.mul_(2) #same for scaling multiplication
t.div_(2) #same for scaling division, need a float dtype!


# Numpy Compatibility -- if Tensor on CPU, any changes made to np array will be made to Tensor (same object in low-level memory)
np_t = t.numpy() #np array view of the tensor, shares memory rather than copying (this is a method of tensor, below is a torch method!)
t = torch.from_numpy(np_t) #alt way of creating tensor from an np.ndarray
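The memory-sharing caveat is easy to trip over, so here is a small sketch contrasting `torch.from_numpy` (shares the buffer) with `torch.tensor` (copies). The variable names are illustrative only, and everything stays on CPU.

```python
import numpy as np
import torch

arr = np.zeros(3)

shared = torch.from_numpy(arr)  # shares the underlying buffer with `arr`
copied = torch.tensor(arr)      # always copies the data

arr[0] = 7.0                    # modify the NumPy array in place

print(shared)  # first element is now 7
print(copied)  # still all zeros

back = shared.numpy()           # also a view onto the same memory
back[1] = -1.0
print(arr)                      # the original array sees this change too
```

If the sharing is unwanted, `torch.tensor(arr)` or `shared.clone()` gives an independent copy.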



In [None]:
# Indexing Tensors

t = torch.rand(10, 2, 2)

t[2] #access one of the ^ 10 matrices of shape 2x2 (total tensor shape is 10x2x2, a single index picks along the first dim)
t[0][1][1] #access a single item by indexing all 3 dims (same as `t[0, 1, 1]`)

t[1:4] #slice along the first dim, returns shape 3x2x2 (0-indexed like numpy, so for accessing the "10th" element we use: "t[9]")
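A quick sketch of what these indexing forms return, using `torch.arange` instead of random values so the results are predictable. The names here are illustrative.

```python
import torch

t = torch.arange(40).reshape(10, 2, 2)  # values 0..39 laid out as 10 matrices of 2x2

first = t[2]        # one 2x2 matrix
item_a = t[0][1][1] # chained indexing
item_b = t[0, 1, 1] # equivalent tuple indexing
rows = t[1:4]       # slice along the first dim -> 3 matrices
col = t[:, 0, :]    # first row of every matrix -> shape 10x2
last = t[-1]        # negative indices count from the end, same as t[9]

print(first.shape, rows.shape, col.shape)
print(item_a.item(), item_b.item())
```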


## Datasets & DataLoaders

Standardized way for interacting with external databases (in RL or similar scenario we could just use tensors for holding data, but images or audio need a wrapper for this layer)

- Data Docs- https://pytorch.org/docs/stable/data.html
- Related Tutorial– https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
- Custom Datasets are not too horrible to write
- Torch supports two "Dataset" styles:
  - [Iterable-Style](https://pytorch.org/docs/stable/data.html#iterable-style-datasets)
  - [Map-Style](https://pytorch.org/docs/stable/data.html#map-style-datasets)
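To show the difference between the two styles, here is a minimal sketch with toy datasets. `SquaresMap` and `SquaresStream` are hypothetical classes invented for this example: a map-style dataset implements `__len__` and `__getitem__`, while an iterable-style one only implements `__iter__`.

```python
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader


class SquaresMap(Dataset):
    """Map-style: random access by index."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx), torch.tensor(idx ** 2)


class SquaresStream(IterableDataset):
    """Iterable-style: items are produced in sequence, no indexing."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield torch.tensor(i), torch.tensor(i ** 2)


map_ds = SquaresMap(6)
stream_ds = SquaresStream(6)

print(map_ds[3])  # indexing works only on the map-style dataset

# Both plug into a DataLoader the same way (no shuffle for iterable-style)
map_batches = list(DataLoader(map_ds, batch_size=4))
stream_batches = list(DataLoader(stream_ds, batch_size=4))
print(len(map_batches), len(stream_batches))
```

Map-style is usually the easier default. Iterable-style tends to fit streams or data where random access is expensive.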


In [None]:
# Custom Dataset for a File Directory - Need Implement (AT MIN) the following in a Custom SubClass

import os
import pandas as pd
from torchvision.io import read_image
from torch.utils.data import Dataset


# BoilerPlate Code -- Need finagle for any unique usage
class CustomImageDataset(Dataset):
    # Need an `__init__` method w these params written (dir for data, any custom transforms, mapping of labels, etc.)
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file) #very important! need filepaths for instances and labels in this CSV
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    # Total Number of Instances (need count of this for other ops)
    def __len__(self):
        return len(self.img_labels)

    # Where to Fetch an instance and what transforms to apply
    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label



In [None]:
# Make dataloader from Tensor Dataset
from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(x_train, y_train) #items here are "torch.tensor"'s of a PandasFrame's data
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True) #need specify `batch_size` param

# View a Sampled Set of Data
for i_batch, samples in enumerate(train_loader):
    print("Batch:", i_batch)
    print("Data:", samples[0].shape)
    print("Labels:", samples[1])

# Quick test -- get samples and labels (quicker method of iterating through)
next(iter(train_loader))
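The cell above assumes `x_train`, `y_train` and `batch_size` already exist. As a standalone sketch, the same flow with synthetic placeholder tensors (the shapes are arbitrary choices for illustration) looks like this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 20 instances with 4 features each, binary labels
x_train = torch.rand(20, 4)
y_train = torch.randint(low=0, high=2, size=(20,))
batch_size = 8

train_set = TensorDataset(x_train, y_train)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

# 20 instances at batch_size=8 gives batches of 8, 8 and 4 (last one is smaller)
shapes = [tuple(samples.shape) for samples, labels in train_loader]
print(shapes)

# Datasets can be indexed directly: train_set[i] -> (data, label)
data_5, label_5 = train_set[5]
print(data_5.shape, label_5)
```

Passing `drop_last=True` to the `DataLoader` would discard the final short batch if uniform batch sizes matter.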

In [None]:
# Concatenating Two Datasets Together (combining train & test splits)

from torch.utils.data import ConcatDataset

dataset = ConcatDataset([training_data, test_data]) #where each of these "_data" objects is a torch dataset

In [None]:
# Indexing a DataSet Object for Specific Instances

train_set[500][1] #this is parsed as follows: train_set[instance][0=data, 1=label]

In [None]:
# Print Shape for Items in a single iter of a DataLoader
[i.shape for i in next(iter(train_loader))] #returns the shapes of the input data and correct output/label

## Model Architectures

Torch is the superior framework

- Models Docs- https://pytorch.org/docs/stable/nn.html
- If doing training intermittently ("warm start" training as the cool kids call it), reference- https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html


In [None]:
# Sample Model Architecture Code

import torch
import torch.nn as nn 
from torch.utils.data import DataLoader, TensorDataset


# Define NN Architecture w Regularization (not making use of sequential API for layer calls -- done manually in `forward`)
                                           # Sequential API- https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html
class FFN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout=False, dropout_rate=0.5, batchnorm=False, activation=None):
        super(FFN, self).__init__()
        
        # Network Architecture (Layers + HardCoded Activations) -- 1 Hidden Layer Here 
        self.fc = nn.Linear(input_size, hidden_size) #input-to-hidden Weight Matrix
        self.h1 = nn.Linear(hidden_size, output_size) #hidden-to-output Weight Matrix
        
        # Activation Functions -- Full List @ https://pytorch.org/docs/stable/nn.html
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1) #useful for multi-class probabilities (dim=1 normalizes across classes for [batch x classes] input)

        # Regularization
        self.batch_norm = nn.BatchNorm1d(output_size) if batchnorm else None
        self.dropout = nn.Dropout(dropout_rate) if dropout else None
        self.activation = activation
        self.flatten = nn.Flatten() #nice way of making sure input to model is always a vector!

    def forward(self, x):
        # Compute Forward Pass
        x = self.fc(x)
        x = self.sigmoid(x)
        x = self.h1(x)
        # x = self.sigmoid(x) #apply on output -- learning may fail if applied (vanishing gradient, predicts only one class for some architectures)
                              #learning (in some cases) may benefit from no output activation, larger gradient updates through backprop
                              
        # Apply Regularization (if specified)
        if self.batch_norm:
            x = self.batch_norm(x)
        if self.activation:
            x = self.activation(x) #calls activation function at end of forward pass
        if self.dropout:
            x = self.dropout(x)
        return x #output for given input


# Calling a Forward Pass w Model
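model = FFN(input_size=4, hidden_size=8, output_size=1) #placeholder sizes, match these to your data
x_train = torch.rand(16, 4) #placeholder batch (16 instances w 4 features each)
model(x_train) #this is the syntax for a forward pass, implicitly calls `forward` function, calling `forward` directly is discouraged since `model(...)` also runs any registered hooks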
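For comparison with the manual `forward` above, here is a rough sketch of a similar one-hidden-layer network using the Sequential API. The layer sizes are placeholder values, and this is not a drop-in replacement for `FFN` since it leaves out the optional regularization.

```python
import torch
import torch.nn as nn

input_size, hidden_size, output_size = 4, 8, 1  # placeholder sizes

seq_model = nn.Sequential(
    nn.Flatten(),                         # (batch, 2, 2) -> (batch, 4)
    nn.Linear(input_size, hidden_size),   # input-to-hidden
    nn.Sigmoid(),
    nn.Linear(hidden_size, output_size),  # hidden-to-output (raw logits)
)

x = torch.rand(16, 2, 2)  # batch of 16, flattened to 4 features each
out = seq_model(x)
print(out.shape)

# Parameter count: (4*8 + 8) + (8*1 + 1)
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)
```

Sequential is compact when layers run strictly in order. A custom `forward` is the better fit once there are branches or optional steps.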



In [None]:
# Model Param Summary (Keras-like -- can install sep module w `pip install torchinfo`)
import torchinfo 

torchinfo.summary(model) #can be issues w this if GPU, check on all available devices if being finicky

In [None]:
# HyperParameters - The Usual Suspects
# (these govern how a model selects parameters)

learning_rate = 7e-3 #7e-3=0.007 hehe, setting this varies, generally depends on optimizer of choice
batch_size = 8 #batch_size of 1 means we update model params on each instance, if larger the loss (and so the gradient) is averaged over the batch and params update once per batch (larger~=faster epochs, smaller~=more, noisier updates)
n_epochs = 10 #number of epochs to train for (total number of training iterations over entire dataset)

# Regularization HyperParams
dropout_rate = 0.8 #proportion of neurons to drop out (in this formulation)


In [None]:
# Loss Functions - https://pytorch.org/docs/stable/nn.html#loss-functions

# Instantiate Loss Function
loss_fn = torch.nn.CrossEntropyLoss() 

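# Backpropagate Loss -- assumes `loss = loss_fn(pred, labels)` was computed from a forward pass
loss.backward() #computes gradients of the loss wrt model parameters and accumulates them in each param's `.grad`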

In [None]:
# Optimizing Algorithms - https://pytorch.org/docs/stable/optim.html#algorithms

# Instantiate an `optimizer`
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) #adam ftw (class name is capitalized, `torch.optim.adam` is a module)

# Zero out gradients (typically ran during a training loop)
optimizer.zero_grad() #helps prevent gradients from being counted twice!

# Update Model Weights (after backprop)
optimizer.step() #adjusts the network weights (function parameters) by the gradients computed in the Backward Pass (i.e. `loss.backward()`)
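Putting the loss and optimizer calls together, a single update step could look roughly like the sketch below. The tiny `nn.Linear` model and random batch are placeholders for illustration. The second half shows why `zero_grad()` matters: gradients add up across `backward()` calls.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(3, 2)  # placeholder model: 3 features -> 2 classes
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=7e-3)

samples = torch.rand(8, 3)             # placeholder batch
labels = torch.randint(0, 2, (8,))     # class indices

before = model.weight.detach().clone()

optimizer.zero_grad()         # clear gradients left over from earlier steps
pred = model(samples)         # forward pass (raw logits)
loss = loss_fn(pred, labels)  # scalar loss
loss.backward()               # fill each param's .grad
optimizer.step()              # update params using those gradients

changed = not torch.equal(before, model.weight.detach())
print(loss.item(), changed)

# Gradients accumulate if they are not zeroed between backward calls
w = torch.ones(2, requires_grad=True)
(w * 3).sum().backward()
(w * 3).sum().backward()
print(w.grad)  # each call added 3 per element
```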

In [None]:
# Creating a Learning Rate Scheduler -- review docs before implementing

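from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

# assumes `model`, `loss_fn` & `dataset` (iterable of (input, target) pairs) are in memory
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9) #multiplies the lr by `gamma` at each `scheduler.step()`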

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()
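To see what the scheduler actually does to the learning rate, here is a small sketch that records the rate across a few epochs. The single random parameter stands in for a real model and the "loss" is just a sum, so this is only about the schedule itself.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

params = [torch.nn.Parameter(torch.randn(2, 2))]  # stand-in for model.parameters()
optimizer = SGD(params, lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

lrs = []
for epoch in range(3):
    lrs.append(scheduler.get_last_lr()[0])  # lr used during this epoch

    optimizer.zero_grad()
    params[0].sum().backward()  # dummy loss, just so step() has gradients
    optimizer.step()

    scheduler.step()            # decay once per epoch, after optimizer.step()

lrs.append(scheduler.get_last_lr()[0])
print(lrs)  # roughly 0.1, 0.09, 0.081, 0.0729
```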

In [None]:
# View Distribution of Output Predictions (Class Probs for a Forward Pass on Test Data) -- currently built for binary classes

model.eval()
y_pred = model(x_test) #predict on test set
y_pred = torch.sigmoid(y_pred) #if need class probabilities or some other transformation to raw preds (mapping to labels)

y_pred = y_pred.detach().numpy().reshape(-1) #make a np array out of preds & reshape if necessary
print(f"Distribution of Probs: {list(np.histogram(y_pred, bins=4)[0])}")
print(f"     Probability Bins: {list(np.histogram(y_pred, bins=4)[1])}") #could tie this to some sort of visualization like a histogram

In [None]:
# Sample Training & Evaluating Loops

# Classification Specific (Discrete-ly Correct Preds + Accuracy as a Metric to eval on)
def train(model, dataloader, n_epochs, loss_function, optimizer):
    model.train() #enable training behavior for layers like dropout & batchnorm
    for epoch in range(n_epochs):
        total = 0 
        correct = 0 

        # Expect Indices + Samples & Labels in this Version
        for i, (samples, labels) in enumerate(dataloader):
            # Clear Out Gradients per batch
            optimizer.zero_grad()

            # Forward Pass + Compute Loss
            pred = model(samples)
            loss = loss_function(pred, labels)

            # Backprop
            loss.backward()
            optimizer.step()

            # Num Correct Predictions -- Accuracy Metric 
            y_pred = torch.round(pred, decimals=6) #if binary output neuron (single), else can use argmax (softmax or multi-output)
            y_pred = (torch.sigmoid(y_pred.reshape(-1).detach()) > 0.5).float() #logits into labels- round the output probs @ a 0.5 threshold
            total += labels.size(0) #total num of instances seen (evaluated on all)
            correct += (y_pred == labels).sum().item() #total num of correct instances as a python int (count restarted per epoch)

        # Print Learning Info - per Epoch
        accuracy = (100 * correct/total) #binary classification case
        print(f'Epoch {epoch+1} - Training - Loss: {loss.item():.2f} & Accuracy: {accuracy:.2f}%')


# Written in the abstract (for binary classification, why we have `threshold` param), assumes CPU tensors
def evaluate(model, loss_function, x_test, y_test, threshold):
    model.eval() #inference behavior for dropout/batchnorm (does NOT turn off gradients by itself)

    with torch.no_grad(): #this is what disables gradient tracking
        # Get Preds + Compute Loss
        logits = model(x_test).reshape(-1)
        pred_performance = loss_function(logits, y_test) #raw logits here, same as in `train` (apply sigmoid first if the loss expects probabilities, e.g. `BCELoss`)

        # Compute Accuracy (wrt labels)
        y_pred = (torch.sigmoid(logits) > threshold).float() #squash values & threshold into labels (binary classification scenario)
        total = y_test.size(0)
        correct = (y_pred == y_test).sum().item()
        accuracy = 100 * correct / total #as a percentage

    print(f'Test Set - Loss: {pred_performance.item():.2f} & Accuracy: {accuracy:.2f}%')
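The logits-to-labels step used in both loops can be checked in isolation. Below is a small sketch with hand-picked logits and labels (made-up values) for the binary case, plus the `argmax` variant for multi-class outputs.

```python
import torch

# Binary case: one logit per instance
logits = torch.tensor([-2.0, 0.3, 1.5, -0.1])
labels = torch.tensor([0.0, 1.0, 0.0, 0.0])

probs = torch.sigmoid(logits)   # squash logits into (0, 1)
preds = (probs > 0.5).float()   # threshold into hard labels
accuracy = 100 * (preds == labels).sum().item() / labels.size(0)
print(preds, accuracy)          # 3 of 4 correct

# Multi-class case: one logit per class, take the largest
logits_mc = torch.tensor([[2.0, 1.0, 0.0],
                          [0.0, 3.0, 1.0]])
preds_mc = logits_mc.argmax(dim=1)
print(preds_mc)
```

Since sigmoid is monotonic, thresholding probabilities at 0.5 is the same as thresholding the raw logits at 0.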



In [None]:
# Getting Model Parameters

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

In [None]:
# Accessing a Model `state_dict()`

md = model.state_dict()
md.keys() #list out the keys in the dictionary (names of model parameters, layer weights or biases, etc.)


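# Save Model Parameters to Dir -- Non-Useful? method of saving info about models
import os
os.makedirs("model_parameters", exist_ok=True) #`np.savetxt` won't create the dir for us
meta = dict() #save shape info about saved parameters

for k in md.keys():
    # Specify Value per Parameter
    v = md[k].cpu().numpy() #move to CPU first, note `np.savetxt` below only handles 1D & 2D arrays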
    # print(k, "-->", v) #visualize the map from key to value

    # Change Key Name for Better Saving
    name = k.split(".")
    name = "-".join(name)
    np.savetxt(f"model_parameters/{name}.csv", v) #using numpy to nicely save
    meta[name] = v.shape


# Write out Metadata for each Params File
with open('model_parameters/METADATA.csv', 'w') as f:
    for key in meta.keys():
        f.write("%s, %s\n" % (key, meta[key]))


In [None]:
# Saving Models via `state_dict()`

torch.save(model.state_dict(), "../models/SampleModel.pt") #serialized version of model params, need same architecture for loading in later (device can be remapped w `map_location` in `torch.load`)

In [None]:
# Loading in a `state_dict()` Saved Model 

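model = FFN(input_size, hidden_size, output_size) #need specify the SAME architecture as saved model to load in the saved weights (including any potential finicky hyperparameters)
model.load_state_dict(torch.load("../models/SampleModel.pt", map_location="cpu")) #`map_location` remaps tensors that were saved on another device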
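A full save/load round trip, as a sketch: the small `build_model` helper and the temp-file path are assumptions made for the example (any architecture works, as long as the same one is rebuilt before loading).

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical architecture, must be identical at save and load time
    return nn.Sequential(nn.Linear(4, 8), nn.Sigmoid(), nn.Linear(8, 1))


model = build_model()

# Save only the parameters (state_dict), not the whole pickled object
path = os.path.join(tempfile.mkdtemp(), "SampleModel.pt")
torch.save(model.state_dict(), path)

# Rebuild the architecture, then load the saved weights into it
restored = build_model()
state = torch.load(path, map_location="cpu")  # remap to CPU regardless of where it was saved
restored.load_state_dict(state)
restored.eval()  # switch to inference mode before predicting

x = torch.rand(5, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```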

In [None]:
# Loading in a PreTrained Torch Model 
import torchvision

model = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT) #`pretrained=True` is deprecated in newer torchvision (0.13+); pre-trained models are available in each domain version of torch (will download the model so should save weights!)
torch.save(model.state_dict(), 'models/vgg16.pth')

## Misc Functions

- Accessing GPU 

In [None]:
# GPU Access Functions

# Check if GPU is available 
torch.cuda.is_available() #true if not poor

# Establish a Device for Sending Torch Objects to GPU or CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') #alt method of doing above


# Get Count of Available GPUs (if really rich)
torch.cuda.device_count()

# Get Name of GPU based on Index
torch.cuda.get_device_name(0) #returns string name of GPU based on supplied index (e.g. 'NVIDIA GeForce RTX 3060 Ti')

# Get Tuple of Device Compatibility Versions, Cryptic & Hopefully never use
torch.cuda.get_device_capability(0) #reference this post for info- https://stackoverflow.com/questions/64535324/pytorch-get-device-capability-output-explanation

# Move Tensor VERBOSELY to GPU -- assume tensor `t` is in memory (returns a copy on the GPU, so assign it)
t = t.cuda(device=0)
# or
t = t.cuda() #if no device to specify
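A device-agnostic version of the above, sketched so it also runs on machines without a GPU. The tensor and the `nn.Linear` model are placeholders. The main point is that tensors must be reassigned after `.to(...)`, while modules are moved in place.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(2, 3)               # created on CPU
t_dev = t.to(device)               # returns a tensor on `device`; `t` itself stays on CPU

model = nn.Linear(3, 1).to(device) # modules move their params in place (and return self)

out = model(t_dev)                 # inputs and model must live on the same device
print(t.device, t_dev.device, out.device)
```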



In [None]:
# Planting Seeds in Torch -- for Reproducibility & Testing Algorithms

# All the Modules that Torch needs Set to make reproducible results
import os
import random
import numpy as np
import torch

GLOBAL_SEED = 42 #what other default seed could there be?

# This func needs to get run before any model calls (at instantiation & before any training, need include at each point!)
def set_global_seed(seed=GLOBAL_SEED):
    random.seed(seed)
    np.random.seed(seed) #use the `seed` arg throughout so a non-default seed is respected
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) #safe to call without a GPU
    torch.use_deterministic_algorithms(mode=True) #uses deterministic algorithms where available, raises an error for ops that have none
    os.environ['PYTHONHASHSEED'] = str(seed) #only affects Python processes started after this point

def _init_fn(worker_id): #for dataloader's workers -- pass as `worker_init_fn=_init_fn` to the DataLoader so data fetching is deterministic
    np.random.seed(int(GLOBAL_SEED)) #note: every worker gets the same numpy seed here
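A quick way to check that seeding works is to draw the same values twice. The sketch below does that, and also shows one common pattern for a reproducible shuffle order: a seeded `torch.Generator` passed to the `DataLoader`, with a per-worker seed function. The `seed_everything` and `seed_worker` helpers are hypothetical names for this example, and with the default `num_workers=0` the worker function is never actually called.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_everything(seed):
    # Minimal version: Python, NumPy and Torch RNGs only
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def seed_worker(worker_id):
    # Derive a distinct but reproducible seed per worker from torch's base seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Same seed -> same random draws
seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
print(torch.equal(a, b))


def first_epoch_order(seed):
    g = torch.Generator()
    g.manual_seed(seed)  # controls the shuffle order
    ds = TensorDataset(torch.arange(10))
    loader = DataLoader(ds, batch_size=5, shuffle=True,
                        generator=g, worker_init_fn=seed_worker)
    return [batch[0].tolist() for batch in loader]


order_1 = first_epoch_order(42)
order_2 = first_epoch_order(42)
print(order_1 == order_2)
```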

