torch - a DL library for manipulation of tensors [multi-dimensional arrays]. \
supports 13 different types - float32, float16, bfloat16(higher exponent), float64
complex : 32,64,128 bits, int : int8, uint8, int16, int32, int64 and bool

Tensors of different types are represented by different classes - torch.FloatTensor (for float32), torch.LongTensor(int64), torch.ByteTensor(uint8)

In [87]:
import torch
import numpy as np
a = torch.FloatTensor(3,2) # calling the constructor
a

tensor([[       nan, 1.4013e-45],
        [1.3593e-43, 0.0000e+00],
        [4.8354e-19, 0.0000e+00]])

In [88]:
a = torch.zeros(3,2) # torch.FloatTensor(3,2) initializes with zeros but in the previous versions, it kept the tensor uninitialized.

In [89]:
#alternative approach
a = torch.FloatTensor(3,2)
a.zero_()
a

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

Two types of operations on tensors : inplace and functional \
Inplace operations have underscore appended to their name and operate on the tensor's content. The functional equivalent creates a copy. \
Inplace - more efficient and does not require extra memory but might lead to hidden bugs.

In [90]:
# tensor from python iterable like list, tuple

a = torch.FloatTensor([[1,2],[3,4],[5,6]])
a

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [91]:
n = np.zeros(shape = (3,2))
n.shape , n.dtype

((3, 2), dtype('float64'))

In [92]:
b = torch.tensor(n)

In [93]:
b.shape, b.dtype

(torch.Size([3, 2]), torch.float64)

Usually in deep learning, float64 is too much memory overhead. float32 or float16 is enough.

to convert from numpy array to torch tensor, torch.from_numpy was used but is now deprecated and torch.tensor() is encouraged with torch datatypes available as well

In [94]:
n = np.zeros(shape=(3,2))
print(n.shape, n.dtype)

(3, 2) float64


In [95]:
t = torch.tensor(n, dtype=torch.float32)
print(t.shape, t.dtype)

torch.Size([3, 2]) torch.float32


In [96]:
#Scalar tensor - Now, zero-dimensional tensors are natively supported and returned by the appropriate functions
a = torch.tensor([1,2,3])
s = a.sum()
print(s.shape)


torch.Size([])


In [97]:
print(s.item())

6


GPU tensors:
Pytorch supports CUDA GPUs. it has two versions - CPU and GPU.where to process the tensors depends on the tensor configuration. GPU tensors reside in the torch.cuda class instead of the torch package. So, the tensor is torch.cuda.FloatTensor instead of torch.FloatTensor. \
Under the hood, there is no CPU, GPU. there is a backend, which is an abstract computation device with memory. it could be CUDA, CPU or Apple Metal performance Shader given by mps

In [98]:
a = torch.Tensor([1,2,3,4])
print(a.shape, a.dtype)

torch.Size([4]) torch.float32


In [99]:
c = a.to('cpu') # tensor copied to Apple's MPS
c

tensor([1., 2., 3., 4.])

device = 'mps:0' refers to the fact that the computation device in use for tensor c is mps and it uses the first card. if there are multiple cards, we could have mps:1 as well.

In [100]:
a+1

tensor([2., 3., 4., 5.])

In [101]:
c+1

tensor([2., 3., 4., 5.])

In [102]:
c.device

device(type='cpu')

Gradient calculation methods :
Static graph method : define your calculations in advance and cannot be changed later. graph is optimized by the dl library like tensorflow/theano and many other DL toolkits
Dynamic graph method : As you apply transformations on the data, the dl library will keep track of the computations and when requested will compute the gradients , accumulating the gradients of the network parameters.

From version 2.0, pytorch has torch.compile() method which speeds up pytorch code by using JIT (just in time) compiling into optimized kernels.

Gradients :
Pytorch has useful fields related to gradient computation of tensors like \
grad : A property that holds a tensor of the same shape containing computed gradients.
is_leaf : true if created by user, false if part of function transformation
requires_grads : Equals true if the tensor requires gradients to be calculated.

In [103]:
v1 = torch.tensor([1.,2.,3.,4.], requires_grad=True)
v2 = torch.tensor([5.,6.,7.,8.]) # by default requires_grad = False


In [104]:
v3 = v1 + v2
v_res = (v3 * 2).sum()

In [105]:
v1.is_leaf, v2.is_leaf, v3.is_leaf, v_res.is_leaf

(True, True, False, False)

In [106]:
v1.requires_grad, v2.requires_grad, v3.requires_grad, v_res.requires_grad

(True, False, True, True)

gradients are preserved only for the leaf nodes for memory reasons. So, if we want to gradients to be preserved for non-leaf nodes, retain_grad() method is to be called

In [107]:
v_res.backward()

In [108]:
v1.grad

tensor([2., 2., 2., 2.])

In [109]:
v2.grad

In [110]:
v3.grad

  v3.grad


torch.nn has many useful building blocks used in neural networks.
it is designed to be callable. create an instance of the class and then pass input to it just like function is called

In [111]:
import torch.nn as nn
l = nn.Linear(2,5)
v = torch.FloatTensor([1.,2.])
l(v)


tensor([-0.8457, -1.9337,  0.0750,  0.5338, -1.7690], grad_fn=<ViewBackward0>)

Some useful methods of nn.Module Base class from which torch.nn packages inherit and to create custom NN blocks, we use this class :
parameters() - return all module parameters(weights) \
to(device) - transfer module parameters to cpu or gpu \
zero_grad() - zero out the module gradients \
state_dict() - returns the dictionary with all module parameters and is useful for model serialization
load_state_dict() - loads the module with dictionary of model parameters



In [112]:
#example
s= nn.Sequential(
    nn.Linear(2,5),
    nn.ReLU(),
    nn.Linear(5,20),
    nn.ReLU(),
    nn.Linear(20,10),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1)
)
s

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=20, bias=True)
  (3): ReLU()
  (4): Linear(in_features=20, out_features=10, bias=True)
  (5): Dropout(p=0.3, inplace=False)
  (6): Softmax(dim=1)
)

In [113]:
s(torch.FloatTensor([[1.,2.]]))

tensor([[0.1293, 0.0451, 0.0826, 0.1143, 0.0882, 0.1394, 0.1208, 0.1057, 0.1143,
         0.0603]], grad_fn=<SoftmaxBackward0>)

In [114]:
# creating custom nn layer from nn.Module
class OurModule(nn.Module):
    def __init__(self, num_inputs, num_classes, dropout = 0.3):
        super(OurModule, self).__init__()
        self.pipe = nn.Sequential(
            nn.Linear(num_inputs, 5),
            nn.ReLU(),
            nn.Linear(5,20),
            nn.ReLU(),
            nn.Linear(20, num_classes),
            nn.Dropout(p=dropout),
            nn.Softmax(dim=1)
        )
    def forward(self, x):
        return self.pipe(x)

To write our custom layer, we have to inherit from the base nn.Module and override the forward API

In [115]:
net = OurModule(num_inputs=2, num_classes=3)
print(net)

OurModule(
  (pipe): Sequential(
    (0): Linear(in_features=2, out_features=5, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
    (5): Dropout(p=0.3, inplace=False)
    (6): Softmax(dim=1)
  )
)


In [116]:
v = torch.FloatTensor([[2.,3.]])
out = net(v)

In [117]:
out

tensor([[0.2355, 0.1142, 0.6503]], grad_fn=<SoftmaxBackward0>)

In [118]:
print("mps availability : ", torch.mps.is_available())

mps availability :  False


In [119]:
out.to('cpu')

tensor([[0.2355, 0.1142, 0.6503]], grad_fn=<SoftmaxBackward0>)

In [120]:
#Tensorboard for visualization of neural network metrics
import math
from torch.utils.tensorboard.writer import SummaryWriter

funs = {"sin":math.sin, "cos":math.cos, "tan":math.tan}
writer = SummaryWriter()
for angle in range(-360,360):
    angle_rad = angle * math.pi / 180
    for name, fun in funs.items():
        val = fun(angle_rad)
        writer.add_scalar(name, val, angle)
writer.close()



Generative Adversial networks on Atari images :
GANs are neural networks which are used to generate images

In [10]:
import gymnasium as gym
import cv2

In [11]:
IMAGE_SIZE = 64

In [12]:
class InputWrapper(gym.ObservationWrapper):
  def __init__(self,*args):
    super(InputWrapper,self).__init__(*args)
    old_space = self.observation_space
    assert isinstance(old_space, gym.spaces.Box)
    self.observation_space = gym.spaces.Box(self.observation(old_space.low), self.observation(old_space.high), dtype=np.float32)
  def observation(self, observation):
    new_obs = cv2.resize(observation,(IMAGE_SIZE,IMAGE_SIZE))
    # transpose from (w,h,c) -> (c,w,h)
    new_obs = np.moveaxis(new_obs,2,0)
    return new_obs.astype(np.float32)



In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import random

In [125]:
LATENT_VECTOR_SIZE = 100
DISCR_FILTERS = 64
GENER_FILTERS = 64

let us now code up the generator and discriminator networks

In [14]:
# Disciminator : outputs a probability value that gives a measure of fakeness of the image produced by generator network
class Discriminator(nn.Module):
  def __init__(self,input_shape):
    super(Discriminator, self).__init__()
    self.pipe = nn.Sequential(
        nn.Conv2d(input_shape[0], DISCR_FILTERS, 4, 2, 1),
        nn.ReLU(),
        nn.Conv2d(DISCR_FILTERS, DISCR_FILTERS*2, 4, 2, 1),
        nn.BatchNorm2d(DISCR_FILTERS*2),
        nn.ReLU(),
        nn.Conv2d(DISCR_FILTERS*2, DISCR_FILTERS*4, 4, 2, 1),
        nn.BatchNorm2d(DISCR_FILTERS*4),
        nn.ReLU(),
        nn.Conv2d(DISCR_FILTERS*4, DISCR_FILTERS*8, 4, 2, 1),
        nn.BatchNorm2d(DISCR_FILTERS*8),
        nn.ReLU(),
        nn.Conv2d(DISCR_FILTERS*8, 1, 4, 1, 0),
        nn.Sigmoid()
    )
  def forward(self, x):
    conv_out =  self.pipe(x)
    return conv_out.view(-1,1).squeeze(dim=1)



In [15]:
class Generator(nn.Module):
  def __init__(self, output_shape):
    super(Generator,self).__init__()
    self.pipe = nn.Sequential(
        nn.ConvTranspose2d(LATENT_VECTOR_SIZE,GENER_FILTERS*8, 4,1,0),
        nn.BatchNorm2d(GENER_FILTERS*8),
        nn.ReLU(),
        nn.ConvTranspose2d(GENER_FILTERS*8,GENER_FILTERS*4, 4,2,1),
        nn.BatchNorm2d(GENER_FILTERS*4),
        nn.ReLU(),
        nn.ConvTranspose2d(GENER_FILTERS*4,GENER_FILTERS*2,4,2,1),
        nn.BatchNorm2d(GENER_FILTERS*2),
        nn.ReLU(),
        nn.ConvTranspose2d(GENER_FILTERS*2,GENER_FILTERS,4,2,1),
        nn.BatchNorm2d(GENER_FILTERS),
        nn.ReLU(),
        nn.ConvTranspose2d(GENER_FILTERS,output_shape[0], 4,2,1),
        nn.Tanh()
    )
  def forward(self,x):
    return self.pipe(x)

In [27]:
def iterate_batches(envs, batch_size):
  batch = [e.reset()[0] for e in envs] # extract the environment after resetting
  env_gen = iter(lambda : random.choice(envs),None) # returns an env from list of envs till None is returned
  while True:
    e = next(env_gen)
    action = e.action_space.sample() # random action, we are more interested in extracting images from the envs
    obs, reward, is_done,is_trunc,_ = e.step(action)
    if np.mean(obs) > 0.01 : # to fix the glitch in the env
        batch.append(obs)
    if len(batch) == batch_size:
      batch_np = np.array(batch, dtype=np.float32)
      yield torch.tensor(batch_np * 2.0 / 255. - 1.)
      batch.clear()
    if is_done or is_trunc:
      e.reset()



In [16]:
device = torch.device('cuda')

In [17]:
print(gym.envs.registry.keys())


dict_keys(['CartPole-v0', 'CartPole-v1', 'MountainCar-v0', 'MountainCarContinuous-v0', 'Pendulum-v1', 'Acrobot-v1', 'phys2d/CartPole-v0', 'phys2d/CartPole-v1', 'phys2d/Pendulum-v0', 'LunarLander-v3', 'LunarLanderContinuous-v3', 'BipedalWalker-v3', 'BipedalWalkerHardcore-v3', 'CarRacing-v3', 'Blackjack-v1', 'FrozenLake-v1', 'FrozenLake8x8-v1', 'CliffWalking-v0', 'Taxi-v3', 'tabular/Blackjack-v0', 'tabular/CliffWalking-v0', 'Reacher-v2', 'Reacher-v4', 'Reacher-v5', 'Pusher-v2', 'Pusher-v4', 'Pusher-v5', 'InvertedPendulum-v2', 'InvertedPendulum-v4', 'InvertedPendulum-v5', 'InvertedDoublePendulum-v2', 'InvertedDoublePendulum-v4', 'InvertedDoublePendulum-v5', 'HalfCheetah-v2', 'HalfCheetah-v3', 'HalfCheetah-v4', 'HalfCheetah-v5', 'Hopper-v2', 'Hopper-v3', 'Hopper-v4', 'Hopper-v5', 'Swimmer-v2', 'Swimmer-v3', 'Swimmer-v4', 'Swimmer-v5', 'Walker2d-v2', 'Walker2d-v3', 'Walker2d-v4', 'Walker2d-v5', 'Ant-v2', 'Ant-v3', 'Ant-v4', 'Ant-v5', 'Humanoid-v2', 'Humanoid-v3', 'Humanoid-v4', 'Humanoid-v5

In [131]:
!pip install gymnasium[atari]



In [132]:
!pip install "gymnasium[accept-rom-license, atari]"



In [18]:
from gymnasium import spaces

In [8]:
import ale_py


In [19]:
envs = [InputWrapper(gym.make(name)) for name in ('ALE/Breakout-v5', 'ALE/AirRaid-v5', 'ALE/Pong-v5')]

In [20]:
input_shape = envs[0].observation_space.shape
net_discr = Discriminator(input_shape = input_shape).to(device)
net_gener = Generator(output_shape = input_shape).to(device)



In [137]:
LEARNING_RATE = 0.0001
REPORT_EVERY_ITER = 100
SAVE_IMAGE_EVERY_ITER = 1000
BATCH_SIZE = 16

In [21]:
objective = nn.BCELoss()
gen_optimizer = optim.Adam(params=net_gener.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))
dis_optimizer = optim.Adam(params=net_discr.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))

In [22]:
true_labels = torch.ones(BATCH_SIZE, device=device)
false_labels= torch.ones(BATCH_SIZE,device=device)


In [140]:
 writer = SummaryWriter()
 gen_losses = []
 disc_losses = []


In [7]:
import logging
import time
import torchvision.utils as vutils
# Get the root logger of Gymnasium
gym_logger = logging.getLogger('gymnasium')

# Set the desired log level (e.g., INFO, DEBUG, WARNING, ERROR, CRITICAL)
gym_logger.setLevel(logging.INFO)

In [142]:
iter_no = 0
ts_start = time.time()
for batch in iterate_batches(envs, BATCH_SIZE):
  gen_input = torch.FloatTensor(BATCH_SIZE, LATENT_VECTOR_SIZE, 1,1)
  gen_input.normal_(0,1)
  gen_input = gen_input.to(device)
  batch = batch.to(device)
  generator_output = net_gener(gen_input)
  dis_optimizer.zero_grad()
  dis_output_true = net_discr(batch)
  dis_output_false = net_discr(generator_output.detach())
  dis_loss = objective(dis_output_true, true_labels) + objective(dis_output_false, false_labels)
  dis_loss.backward()
  dis_optimizer.step()
  disc_losses.append(dis_loss.item())

  gen_optimizer.zero_grad()
  dis_output = net_discr(generator_output)
  gen_loss = objective(dis_output, true_labels)
  gen_loss.backward()
  gen_optimizer.step()
  gen_losses.append(gen_loss.item())

  iter_no += 1
  if iter_no % REPORT_EVERY_ITER == 0:
    dt = time.time() - ts_start
    ts_start = time.time()
    gym_logger.info("Iter %d in %.2fs: gen_loss=%.3e, dis_loss=%.3e",
                     iter_no, dt, np.mean(gen_losses), np.mean(disc_losses))
    writer.add_scalar("gen_loss", np.mean(gen_losses), iter_no)
    writer.add_scalar("dis_loss", np.mean(disc_losses), iter_no)
    gen_losses = []
    disc_losses = []
    if iter_no % SAVE_IMAGE_EVERY_ITER == 0:
       img = vutils.make_grid(generator_output.data[:64], normalize=True)
       writer.add_image("fake", img, iter_no)
       img = vutils.make_grid(batch.data[:64], normalize=True)
       writer.add_image("real", img, iter_no)

  if iter_no >= 20000:
    break








INFO:gymnasium:Iter 100 in 14.53s: gen_loss=1.974e-03, dis_loss=1.676e-02
INFO:gymnasium:Iter 200 in 8.47s: gen_loss=4.776e-05, dis_loss=1.907e-04
INFO:gymnasium:Iter 300 in 4.50s: gen_loss=1.902e-05, dis_loss=6.179e-05
INFO:gymnasium:Iter 400 in 3.66s: gen_loss=1.119e-05, dis_loss=3.657e-05
INFO:gymnasium:Iter 500 in 3.65s: gen_loss=7.471e-06, dis_loss=2.698e-05
INFO:gymnasium:Iter 600 in 3.35s: gen_loss=5.444e-06, dis_loss=1.879e-05
INFO:gymnasium:Iter 700 in 3.34s: gen_loss=4.124e-06, dis_loss=1.587e-05
INFO:gymnasium:Iter 800 in 3.84s: gen_loss=3.279e-06, dis_loss=1.042e-05
INFO:gymnasium:Iter 900 in 3.61s: gen_loss=2.697e-06, dis_loss=1.073e-05
INFO:gymnasium:Iter 1000 in 3.35s: gen_loss=2.210e-06, dis_loss=9.186e-06
INFO:gymnasium:Iter 1100 in 3.42s: gen_loss=1.861e-06, dis_loss=7.985e-06
INFO:gymnasium:Iter 1200 in 3.96s: gen_loss=1.587e-06, dis_loss=5.805e-06
INFO:gymnasium:Iter 1300 in 3.36s: gen_loss=1.375e-06, dis_loss=4.938e-06
INFO:gymnasium:Iter 1400 in 3.34s: gen_loss=1.

For some mundane deep learning tasks, pytorch might be too low level. we may be benefitted by a high level wrapper. There are several libraries which offer open source , faster implementations of common pytorch tasks like ptlearn, fastai and ignite.

Ignite simplifies writing training loops in pytorch among other task simplifications

In [5]:
!pip install pytorch-ignite


Collecting pytorch-ignite
  Downloading pytorch_ignite-0.5.1-py3-none-any.whl.metadata (27 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch<3,>=1.3->pytorch-ignite)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch<3,>=1.3->pytorch-ignite)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch<3,>=1.3->pytorch-ignite)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch<3,>=1.3->pytorch-ignite)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch<3,>=1.3->pytorch-ignite)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-

In [6]:
#GAN training on Atari using Ignite

#includes
import cv2
import random
import torchvision.utils as vutils
import gymnasium as gym
from gymnasium import spaces

import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from ignite.engine import Engine, Events
from ignite.handlers import Timer
from ignite.metrics import RunningAverage
from ignite.contrib.handlers import tensorboard_logger as tb_logger


  from torch.distributed.optim import ZeroRedundancyOptimizer


In [3]:
LATENT_VECTOR_SIZE = 100
DISCR_FILTERS = 64
GENER_FILTERS = 64
BATCH_SIZE = 16

In [4]:
LEARNING_RATE = 0.0001
REPORT_EVERY_ITER = 100
SAVE_IMAGE_EVERY_ITER = 1000

In [23]:
def process_batch(trainer, batch):
  gen_input = torch.FloatTensor(BATCH_SIZE,LATENT_SIZE,1,1)
  gen_input.normal_(0,1)
  gen_input = gen_input.to(device)
  batch = batcn.to(device)
  gen_output = net_gener(gen_input)

  dis_optimizer.zero_grad()
  dis_output_true = net_discr(batch)
  dis_output_false = net_discr(gen_output.detach())
  dis_loss = objective(dis_output_true, true_labels) + objective(dis_output_false, false_labels)
  dis_loss.backward()
  dis_optimizer.step()

  gen_optimizer.zero_grad()
  dis_output = net_discr(gen_output)
  gen_loss = objective(dis_output, true_labels)
  gen_loss.backward()
  gen_optimizer.step()

  if trainer.state.iteration % SAVE_IMAGE_EVERY_ITER == 0:
            fake_img = vutils.make_grid(gen_output_v.data[:64], normalize=True)
            trainer.tb.writer.add_image("fake", fake_img, trainer.state.iteration)
            real_img = vutils.make_grid(batch_v.data[:64], normalize=True)
            trainer.tb.writer.add_image("real", real_img, trainer.state.iteration)
            trainer.tb.writer.flush()
  return dis_loss.item(), gen_loss.item()





In [31]:
LATENT_VECTOR_SIZE = 100
DISCR_FILTERS = 64
GENER_FILTERS = 64
BATCH_SIZE = 16

# dimension input image will be rescaled
IMAGE_SIZE = 64

LEARNING_RATE = 0.0001
REPORT_EVERY_ITER = 100
SAVE_IMAGE_EVERY_ITER = 1000

In [25]:
engine = Engine(process_batch) # ignite engine
tb = tb_logger.TensorboardLogger(log_dir=None)
engine.tb = tb
RunningAverage(output_transform=lambda x: x[0]).attach(engine,"dis_loss")
RunningAverage(output_transform = lambda x : x[1]).attach(engine,"gen_loss")

handler = tb_logger.OutputHandler(tag="train", metric_names=['gen_loss', 'dis_loss'])
tb.attach(engine, log_handler=handler, event_name=Events.ITERATION_COMPLETED)
timer = Timer()
timer.attach(engine)

@engine.on(Events.ITERATION_COMPLETED)
def log_losses(trainer):
        if trainer.state.iteration % REPORT_EVERY_ITER == 0:
            log.info("%d in %.2fs: gen_loss=%f, dis_loss=%f",
                     trainer.state.iteration, timer.value(),
                     trainer.state.metrics['avg_loss_gen'],
                     trainer.state.metrics['avg_loss_dis'])
            timer.reset()

In [32]:
engine.run(data=iterate_batches(envs,BATCH_SIZE))