<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0"> </div>
    <div style="float: left; margin-left: 10px;"> <h1>Machine Learning with PyTorch for Developers</h1>
<h1>Machine Learning Overview</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import torch
from torch import nn
import torch.nn.functional as F

import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.13.2
IPython version      : 9.0.0

Compiler    : Clang 16.0.0 (clang-1600.0.26.6)
OS          : Darwin
Release     : 24.3.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 9fc8d0140a8bc745a378db69347b7f8d00e6f884

matplotlib: 3.10.1
watermark : 2.5.0
torch     : 2.6.0
pandas    : 2.2.3
numpy     : 2.2.3



Load default figure style

In [3]:
plt.style.use('d4sci.mplstyle')

# Tensors

## Creation
Tensors can be created in a number of ways:

In [4]:
# From Python list
list_tensor = torch.tensor([[1, 2], [3, 4]])

# Factory methods
range_tensor = torch.arange(0, 10, step=2)
linspace_tensor = torch.linspace(0, 10, steps=5)

We can also create tensors directly from NumPy arrays:

In [5]:
numpy_tensor = torch.from_numpy(np.array([[1, 2], [3, 4]]))

Note that `torch.from_numpy` shares memory with the underlying NumPy array, so in-place operations on one will affect the other. The resulting tensor lives on the CPU; moving it to another device creates a copy.
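
As a small illustration of this memory sharing (the names `arr`, `shared`, and `copied` are introduced here only for this sketch), an in-place change to the array shows up in the tensor created with `torch.from_numpy`, while `torch.tensor` makes an independent copy:

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # copies the data

arr[0, 0] = 100                  # in-place modification of the NumPy array

print(shared[0, 0].item())       # reflects the change made to arr
print(copied[0, 0].item())       # keeps the original value
```

If the sharing is not wanted, copying with `torch.tensor` (or calling `.clone()` on the shared tensor) is one way to decouple the two objects.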

## Arithmetic and Indexing

Arithmetic operations are as straightforward as in NumPy:

In [6]:
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([3.0, 2.0, 1.0])

sum_xy = x + y 
print("Element-wise addition: ", sum_xy)

diff_xy = x - y
print("Element-wise subtraction: ", diff_xy)

mul_xy = x * y
print("Element-wise multiplication: ", mul_xy)

div_xy = x / y
print("Element-wise division: ", div_xy)

# For two 1D tensors, torch.matmul computes the dot product (a scalar)
mm_xy = torch.matmul(x, y)
print("Matrix multiplication: ", mm_xy)

Element-wise addition:  tensor([4., 4., 4.])
Element-wise subtraction:  tensor([-2.,  0.,  2.])
Element-wise multiplication:  tensor([3., 4., 3.])
Element-wise division:  tensor([0.3333, 1.0000, 3.0000])
Matrix multiplication:  tensor(10.)


Indexing and slicing are just as straightforward. Let us start by defining a 2D tensor

In [7]:
M = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]])

We can easily access the first element

In [8]:
first_element = M[0, 0]
print(first_element)


tensor(1)


As well as any subset of elements

In [9]:
first_row = M[0, :]
print(first_row)

tensor([1, 2, 3])


In [10]:
last_column = M[:, -1]
print(last_column)

tensor([3, 6])


Transpose

In [11]:
print(f"Original:\n{M}\n\nTranspose:\n{M.t()}")

Original:
tensor([[1, 2, 3],
        [4, 5, 6]])

Transpose:
tensor([[1, 4],
        [2, 5],
        [3, 6]])


## Reshaping

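Reshaping can be done with `view` (which never copies and therefore requires a compatible, typically contiguous, memory layout) or with the more general `reshape` (which returns a view when possible and a copy otherwise):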

In [12]:
print(f"M:{M}\n")

M_flat = M.view(-1)
print(f"M_flat:{M_flat}\n")

M_reshaped = M.view(3, 2)
print(f"M_reshaped:{M_reshaped}\n")

M_reshaped_alt = M.reshape(3, 2)
print(f"M_reshaped_alt:{M_reshaped_alt}\n")


M:tensor([[1, 2, 3],
        [4, 5, 6]])

M_flat:tensor([1, 2, 3, 4, 5, 6])

M_reshaped:tensor([[1, 2],
        [3, 4],
        [5, 6]])

M_reshaped_alt:tensor([[1, 2],
        [3, 4],
        [5, 6]])
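
The difference between the two only shows up for non-contiguous tensors. The sketch below assumes the same `M` as above and uses its transpose, which is a non-contiguous view; the names `M_t`, `view_failed`, `flat`, and `flat_alt` are introduced here for illustration:

```python
import torch

M = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

M_t = M.t()                        # transpose: same storage, different strides
print(M_t.is_contiguous())         # the transposed view is not contiguous

try:
    M_t.view(-1)                   # view cannot flatten this layout without copying
    view_failed = False
except RuntimeError:
    view_failed = True
print(f"view failed: {view_failed}")

flat = M_t.reshape(-1)             # reshape falls back to copying the data
flat_alt = M_t.contiguous().view(-1)  # explicit copy followed by a view
print(flat)
```

In practice, `reshape` is the safer default, while `view` is useful when you want a guarantee that no copy is made.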



## Linear Algebra

Let us define some toy matrices

In [13]:
A = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
B = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

Matrix multiplication comes in two equivalent flavors: the `@` operator and `torch.matmul`

In [14]:
C = A @ B
print(f"C={C}")

C_alt = torch.matmul(A, B)
print(f"\nC_alt={C_alt}")


C=tensor([[19., 22.],
        [43., 50.]])

C_alt=tensor([[19., 22.],
        [43., 50.]])


Determinant

In [15]:
det_A = torch.linalg.det(A)
print(det_A)

tensor(-2.)


Inverse

In [16]:
inv_A = torch.linalg.inv(A)
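
A quick sanity check, sketched under the assumption that `A` is the same toy matrix as above, is to confirm that `A @ inv_A` is numerically the identity. The right-hand side `b` is a hypothetical vector added only to show `torch.linalg.solve`, which solves a linear system without forming the inverse explicitly:

```python
import torch

A = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
inv_A = torch.linalg.inv(A)

# The product should match the identity up to floating-point error
is_inverse = torch.allclose(A @ inv_A, torch.eye(2), atol=1e-5)
print(f"A @ inv_A is the identity: {is_inverse}")

# Hypothetical right-hand side, used only for illustration
b = torch.tensor([1.0, 2.0])
x_solved = torch.linalg.solve(A, b)   # solves A x = b directly
print(x_solved)
```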

Eigenvalues and singular values

In [17]:
eigenvals, eigenvecs = torch.linalg.eig(A)
print(f"Eigenvalues: {eigenvals}")
print(f"Eigenvectors: {eigenvecs}")

U, S, Wt = torch.linalg.svd(A)
print(f"\nU: {U}\nS: {S}\nWt: {Wt}")

Eigenvalues: tensor([-0.3723+0.j,  5.3723+0.j])
Eigenvectors: tensor([[-0.8246+0.j, -0.4160+0.j],
        [ 0.5658+0.j, -0.9094+0.j]])

U: tensor([[-0.4046, -0.9145],
        [-0.9145,  0.4046]])
S: tensor([5.4650, 0.3660])
Wt: tensor([[-0.5760, -0.8174],
        [ 0.8174, -0.5760]])
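
Both decompositions can be checked numerically. The following sketch assumes the same matrix `A` and verifies that `A v = λ v` holds for every eigenpair and that `U diag(S) Wt` reconstructs `A`. Note that `torch.linalg.eig` returns complex tensors, so `A` is cast to the matching dtype before comparing (the names `A_c`, `eig_ok`, and `svd_ok` are introduced here):

```python
import torch

A = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

eigenvals, eigenvecs = torch.linalg.eig(A)    # complex-valued outputs
A_c = A.to(eigenvecs.dtype)                   # cast A to the same complex dtype

# Column j of eigenvecs is scaled by eigenvals[j] through broadcasting
eig_ok = torch.allclose(A_c @ eigenvecs, eigenvecs * eigenvals, atol=1e-4)
print(f"Eigen-decomposition consistent: {eig_ok}")

U, S, Wt = torch.linalg.svd(A)
svd_ok = torch.allclose(U @ torch.diag(S) @ Wt, A, atol=1e-4)
print(f"SVD reconstructs A: {svd_ok}")
```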


### Batch Operations

PyTorch's linear algebra routines natively support batching. When you provide a 3D (or higher-dimensional) tensor, PyTorch treats the last two dimensions as the matrix and all leading dimensions as batch dimensions, applying the operation to each matrix independently.

If `A` has shape `(batch_size, m, n)` and `B` has shape `(batch_size, n, p)`, then `A @ B` yields a tensor of shape `(batch_size, m, p)`.

In [18]:
A_batched = torch.randn((3, 5, 7)) # 3 batches of 5x7 matrices
B_batched = torch.randn((3, 7, 3)) # 3 batches of 7x3 matrices
C_batched = A_batched @ B_batched
print(f"C_batched: {C_batched}") # 3 batches of 5x3 matrices

C_batched: tensor([[[ 2.5735, -6.5772, -2.0316],
         [ 0.5111,  0.0955, -0.6558],
         [-1.4524,  2.4596,  1.3092],
         [-0.6378, -0.7847, -0.2282],
         [ 1.3726, -2.0131, -0.0568]],

        [[ 1.3394,  1.3047, -0.1981],
         [ 5.0303,  9.2996, -0.5089],
         [-3.9663,  0.2711, -2.7120],
         [-3.8638, -2.1184, -0.6659],
         [-2.3358,  2.2634, -1.0877]],

        [[ 1.4385,  5.0728, -3.2049],
         [-1.5706, -1.7867,  1.0062],
         [-8.1230, -1.4168, -5.4197],
         [ 3.0272,  4.4971, -2.1803],
         [-2.2893,  0.9117, -3.6431]]])
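
To make the batching semantics concrete, the sketch below compares the batched product with an explicit Python loop over the batch dimension. The seed and the name `C_loop` are assumptions added for this illustration only:

```python
import torch

torch.manual_seed(0)                      # arbitrary seed, for reproducibility only

A_batched = torch.randn((3, 5, 7))        # 3 batches of 5x7 matrices
B_batched = torch.randn((3, 7, 3))        # 3 batches of 7x3 matrices

C_batched = A_batched @ B_batched         # one batched call

# Equivalent result, computed one matrix pair at a time
C_loop = torch.stack([A_batched[i] @ B_batched[i] for i in range(3)])

print(C_batched.shape)
print(torch.allclose(C_batched, C_loop, atol=1e-5))
```

The batched call gives the same values as the loop while leaving the iteration to PyTorch.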


## Hardware acceleration

PyTorch supports CUDA GPUs and Apple's MPS (Metal Performance Shaders) devices straight out of the box:

In [19]:
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu") # Use the MPS device when available, otherwise the CPU
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Uncomment this line instead if you have a CUDA GPU

Tensors can be created directly on a device

In [20]:
x_mps = torch.randn((1000, 1000), device=device)
y_mps = torch.randn((1000, 1000), device=device)
z_mps = x_mps @ y_mps

Confirm that we used the right device

In [21]:
z_mps.device

device(type='mps', index=0)

Tensors can also be transferred between devices:

In [22]:
z_cpu = z_mps.to("cpu") # Move tensor to CPU
print(z_cpu.device)

cpu
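
One possible way to avoid editing the device line by hand is a small helper that checks the available backends in order. The function `pick_device` below is a hypothetical helper written for this sketch, not part of PyTorch, and the CUDA-first preference order is an assumption:

```python
import torch

def pick_device():
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn((4, 4), device=device)   # created directly on the chosen device
x_cpu = x.to("cpu")                      # a no-op if x is already on the CPU

print(device, x.device, x_cpu.device)
```

Code written this way runs unchanged on machines with or without an accelerator.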


## Automatic Differentiation

We start by defining a matrix and a vector with gradient tracking enabled

In [23]:
A = torch.tensor([[3.0, 1.0],
                  [2.0, 4.0]], requires_grad=True)
x = torch.tensor([5.0, 6.0], requires_grad=True)

Perform some operations and obtain a scalar output

In [24]:
y = A @ x             
z = y.sum()
print(z)

tensor(55., grad_fn=<SumBackward0>)


Compute the gradients

In [25]:
z.backward()

The gradient with respect to each of the variables is stored within those variables:

In [26]:
print(f"\nGradient of z with respect to x: {x.grad}")
print(f"\nGradient of z with respect to A: {A.grad}")


Gradient of z with respect to x: tensor([5., 5.])

Gradient of z with respect to A: tensor([[5., 6.],
        [5., 6.]])
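
These values can be checked by hand. Since z is the sum over i and j of A[i, j] * x[j], the derivative with respect to x[j] is the sum of column j of A, and the derivative with respect to A[i, j] is x[j]. The sketch below recomputes the gradients and compares them with these closed forms (the names `expected_grad_x` and `expected_grad_A` are introduced here):

```python
import torch

A = torch.tensor([[3.0, 1.0],
                  [2.0, 4.0]], requires_grad=True)
x = torch.tensor([5.0, 6.0], requires_grad=True)

z = (A @ x).sum()
z.backward()

# dz/dx[j] = sum_i A[i, j]: the column sums of A
expected_grad_x = A.detach().sum(dim=0)

# dz/dA[i, j] = x[j]: every row of the gradient equals x
expected_grad_A = x.detach().expand(2, 2)

print(torch.equal(x.grad, expected_grad_x))
print(torch.equal(A.grad, expected_grad_A))
```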


<center>
     <img src="data/D4Sci_logo_full.png" alt="Data For Science, Inc" border="0" width=300px> 
</center>