# Torch basics

> Author : Badr TAJINI - Machine Learning 2 & Deep learning - ECE 2025-2026

---

In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import torch
import numpy as np

In [2]:
torch.__version__

'2.10.0+cu128'

Largely inspired by the tutorial [What is PyTorch?](https://pytorch.org/tutorials/beginner/former_torchies/tensor_tutorial.html)

Tensors are used to encode the signal to process, but also the internal states and parameters of models.

Manipulating data through this constrained structure makes it possible to use CPUs and GPUs at peak performance.



## Tensors

Construct a 3x5 matrix, uninitialized:

In [3]:
# Sets the default floating point dtype.
# This dtype is used for type inference of Python floats in torch.tensor()
# and by factory functions such as torch.empty() and torch.randn().
# (torch.set_default_tensor_type is deprecated in favor of torch.set_default_dtype.)
torch.set_default_dtype(torch.float32)


In [4]:
x = torch.empty(3,5)
print(x.type())
print(x)

torch.FloatTensor
tensor([[8.1995e-25, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]])
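`torch.empty` only allocates memory, so the values printed above are whatever happened to be in that memory and may differ from run to run. A short sketch contrasting it with factory functions that also initialize the values:

```python
import torch

# torch.empty allocates storage without initializing it: the contents are arbitrary.
x_empty = torch.empty(3, 5)

# These factory functions allocate AND initialize the values.
x_zeros = torch.zeros(3, 5)
x_ones = torch.ones(3, 5)
x_full = torch.full((3, 5), 7.0)

print(x_empty.shape, x_zeros.shape)  # same shape, only the contents differ
print(x_zeros.sum().item(), x_ones.sum().item(), x_full.sum().item())  # 0.0 15.0 105.0
```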


In [5]:
x = torch.randn(3,5)
print(x)

tensor([[-0.4874, -0.3681, -0.1959,  1.7144,  1.4593],
        [-0.1928, -0.4261,  0.0498, -0.4387, -0.0411],
        [-0.0722, -0.5162, -1.3037, -0.6263, -0.8440]])


In [6]:
print(x.size())

torch.Size([3, 5])


torch.Size is in fact a [tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences), so it supports the same operations.

In [7]:
x.size()[1]

5

In [8]:
x.size() == (3,5)

True
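Because `torch.Size` subclasses `tuple`, the usual tuple idioms (unpacking, `len`, indexing, comparison) work directly on it. A minimal sketch:

```python
import torch

s = torch.randn(3, 5).size()

# torch.Size is a subclass of tuple, so the usual tuple operations apply.
print(isinstance(s, tuple))  # True
rows, cols = s               # tuple unpacking
print(rows, cols)            # 3 5
print(len(s), s[0], s[-1])   # 2 3 5
print(s == (3, 5))           # True: compares like a plain tuple
```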

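The brackets matter when defining a tensor: `torch.tensor([2.0])` is a 1-D tensor holding one element, while `torch.tensor(2.0)` is a 0-D (scalar) tensor whose size is empty.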

In [9]:
a = torch.tensor([2.0])
print(type(a))
print(a.dtype)
print(a.size())

b = torch.tensor(2.0)
print(type(b))
print(b.dtype)
print(b.size())

<class 'torch.Tensor'>
torch.float32
torch.Size([1])
<class 'torch.Tensor'>
torch.float32
torch.Size([])
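The difference also shows up in the number of dimensions and in indexing, while both forms convert to the same Python number. A small sketch:

```python
import torch

a = torch.tensor([2.0])  # 1-D tensor with one element
b = torch.tensor(2.0)    # 0-D (scalar) tensor

print(a.ndim, b.ndim)      # 1 0
print(a[0].shape)          # indexing a 1-D tensor yields a 0-D tensor: torch.Size([])
print(a.item(), b.item())  # both convert to the Python float 2.0
```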


Select some columns

In [10]:
cols = torch.zeros(5, dtype=torch.bool)
print(cols)
cols[1] = True
cols[4] = True
print(cols)
c = x[:, cols]  # selects all rows and the 2nd and 5th columns (indices 1 and 4) of x
print(c)

tensor([False, False, False, False, False])
tensor([False,  True, False, False,  True])
tensor([[-0.3681,  1.4593],
        [-0.4261, -0.0411],
        [-0.5162, -0.8440]])
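The same columns can also be selected with a list of integer indices instead of a boolean mask. A sketch checking that both give the same result:

```python
import torch

x = torch.randn(3, 5)

# Boolean mask: True marks the columns to keep (indices 1 and 4).
cols = torch.zeros(5, dtype=torch.bool)
cols[[1, 4]] = True
c_mask = x[:, cols]

# Equivalent selection with a list of integer indices.
c_idx = x[:, [1, 4]]

print(c_mask.shape)                # torch.Size([3, 2])
print(torch.equal(c_mask, c_idx))  # True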


All operations that modify a tensor in-place have an ``_`` suffix.

In [11]:
# x will be filled with the value 3.5
x.fill_(3.5)
print(x)

tensor([[3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
        [3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
        [3.5000, 3.5000, 3.5000, 3.5000, 3.5000]])


## Bridge to numpy

In [12]:
y = x.numpy()
print(y)

[[3.5 3.5 3.5 3.5 3.5]
 [3.5 3.5 3.5 3.5 3.5]
 [3.5 3.5 3.5 3.5 3.5]]


In [13]:
a = np.ones(5)
b = torch.from_numpy(a)
print(b)
 

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)


In [14]:
xr = torch.randn(3, 5)
a = np.ones(5).astype(int)
b = torch.from_numpy(a)
print(xr)
print(b)

tensor([[-0.6261, -0.7982,  0.8510,  1.5492,  0.6239],
        [-0.3400, -0.9991,  0.9205,  0.8365, -1.3516],
        [ 0.4258, -1.0592,  0.3294, -0.0551, -1.5634]])
tensor([1, 1, 1, 1, 1])


### Question: print the type of the content (data) of variables a, b and xr

In [15]:
print("Type of a:", type(a))
print("Type of b:", type(b))
print("Type of xr:", type(xr))

Type of a: <class 'numpy.ndarray'>
Type of b: <class 'torch.Tensor'>
Type of xr: <class 'torch.Tensor'>
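`type()` only reports the container class. The type of the *content* (the stored elements) is given by the `dtype` attribute. A sketch rebuilding the same three variables; the exact NumPy integer dtype is platform-dependent:

```python
import numpy as np
import torch

a = np.ones(5).astype(int)
b = torch.from_numpy(a)
xr = torch.randn(3, 5)

# dtype describes the type of the stored elements, not of the container.
print("dtype of a:", a.dtype)    # a NumPy integer dtype (int64 on most 64-bit Linux setups)
print("dtype of b:", b.dtype)    # follows a's dtype, e.g. torch.int64
print("dtype of xr:", xr.dtype)  # the default floating point dtype, torch.float32 here
```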


## Operations

There are multiple syntaxes for operations. In the following
example, we will take a look at the addition operation.

Addition: syntax 1

In [16]:
x = torch.rand(5, 3) 
y = torch.rand(5, 3)
print(x + y)

tensor([[1.8068, 0.9764, 1.5073],
        [0.8803, 0.7703, 0.4453],
        [1.2583, 1.3808, 0.7862],
        [1.7002, 0.8118, 0.3111],
        [0.9016, 1.3798, 0.8313]])


Addition: syntax 2

In [17]:
print(torch.add(x, y))

tensor([[1.8068, 0.9764, 1.5073],
        [0.8803, 0.7703, 0.4453],
        [1.2583, 1.3808, 0.7862],
        [1.7002, 0.8118, 0.3111],
        [0.9016, 1.3798, 0.8313]])


Addition: providing an output tensor as argument

In [18]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[1.8068, 0.9764, 1.5073],
        [0.8803, 0.7703, 0.4453],
        [1.2583, 1.3808, 0.7862],
        [1.7002, 0.8118, 0.3111],
        [0.9016, 1.3798, 0.8313]])


Addition: in-place

In [19]:
# adds x to y
y.add_(x)
print(y)

tensor([[1.8068, 0.9764, 1.5073],
        [0.8803, 0.7703, 0.4453],
        [1.2583, 1.3808, 0.7862],
        [1.7002, 0.8118, 0.3111],
        [0.9016, 1.3798, 0.8313]])


**Note:** Any operation that mutates a tensor in-place is post-fixed with an ``_``.
    For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``.




In [20]:
print(x.t())

tensor([[0.9058, 0.3686, 0.5222, 0.9737, 0.1741],
        [0.1328, 0.4908, 0.9151, 0.4332, 0.9029],
        [0.5908, 0.2709, 0.5133, 0.2264, 0.7994]])


In [21]:
print(x)

tensor([[0.9058, 0.1328, 0.5908],
        [0.3686, 0.4908, 0.2709],
        [0.5222, 0.9151, 0.5133],
        [0.9737, 0.4332, 0.2264],
        [0.1741, 0.9029, 0.7994]])


In [22]:
x.t_()
print(x)

tensor([[0.9058, 0.3686, 0.5222, 0.9737, 0.1741],
        [0.1328, 0.4908, 0.9151, 0.4332, 0.9029],
        [0.5908, 0.2709, 0.5133, 0.2264, 0.7994]])


You can use standard NumPy-like indexing with all bells and whistles!

In [23]:
print(x[:, 1])

tensor([0.3686, 0.4908, 0.2709])


Resizing (very useful): to resize/reshape a tensor, you can use ``Tensor.view``:

In [24]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
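`view` returns a tensor that shares memory with the original, so it only works when the requested shape is compatible with the existing memory layout (strides). After a transpose this is typically not the case, and `reshape` (which falls back to a copy) or `.contiguous()` is needed. A sketch:

```python
import torch

x = torch.randn(4, 4)
xt = x.t()  # transpose: a view with swapped strides, no longer contiguous
print(xt.is_contiguous())  # False

view_failed = False
try:
    xt.view(16)  # view needs a memory layout compatible with the new shape
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)  # view failed: True

flat = xt.reshape(16)             # reshape copies when a view is impossible
flat2 = xt.contiguous().view(16)  # or make the data contiguous first
print(torch.equal(flat, flat2))   # True
```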


If you have a one-element tensor, use ``.item()`` to get the value as a
Python number.

In [25]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-1.6367])
-1.6367301940917969


**Read later:**


  100+ Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, random numbers, etc.,
  are described
  [here](https://pytorch.org/docs/torch).

## 3D Tensors

### Question: What is the size of the following tensor?

In [26]:
y = torch.tensor([
     [
       [1, 2, 3],
       [4, 5, 6]
     ],
     [
       [1, 2, 3],
       [4, 5, 6]
     ],
     [
       [1, 2, 3],
       [4, 5, 6]
     ]
   ])
print(y)

tensor([[[1, 2, 3],
         [4, 5, 6]],

        [[1, 2, 3],
         [4, 5, 6]],

        [[1, 2, 3],
         [4, 5, 6]]])


In [27]:
print(y.shape)

torch.Size([3, 2, 3])


### Question: Explain the result of the next cell

In [28]:
torch.sum(y, dim=0)

tensor([[ 3,  6,  9],
        [12, 15, 18]])
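`y` has shape `(3, 2, 3)`. Summing over `dim=0` adds the three 2x3 slices `y[0]`, `y[1]` and `y[2]` element-wise and removes that dimension, so the result has shape `(2, 3)`. Since the three slices are identical, each entry is three times the corresponding entry of `[[1, 2, 3], [4, 5, 6]]`. The check below makes this explicit and, for comparison, sums over the other dimensions:

```python
import torch

y = torch.tensor([[[1, 2, 3], [4, 5, 6]]] * 3)  # same (3, 2, 3) tensor as above

s0 = torch.sum(y, dim=0)
print(s0.shape)                             # torch.Size([2, 3])
print(torch.equal(s0, y[0] + y[1] + y[2]))  # True

# Summing over the other dimensions, for comparison:
print(torch.sum(y, dim=1))  # adds the two rows of each slice -> shape (3, 3)
print(torch.sum(y, dim=2))  # adds the three entries of each row -> shape (3, 2)
```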

## Broadcasting semantics

In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data).

Two tensors are “broadcastable” if the following rules hold:

*   Each tensor has at least one dimension.
*   When iterating over the dimension sizes, **starting at the trailing dimension**, the dimension sizes must either be equal, or one of them must be 1, or one of them must not exist.

If two tensors x, y are “broadcastable”, the resulting tensor size is calculated as follows:
* If the number of dimensions of x and y is not equal, prepend 1 to the dimensions of the tensor with fewer dimensions to make them equal length.
* Then, for each dimension size, the resulting dimension size is the max of the sizes of x and y along that dimension.

More details [here](https://pytorch.org/docs/stable/notes/broadcasting.html)
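The two rules above can be turned into a small helper. `broadcast_shape` is a hypothetical name used only for illustration; its results are compared with PyTorch's own `torch.broadcast_shapes`:

```python
import torch

def broadcast_shape(shape_x, shape_y):
    """Hypothetical helper: result shape of broadcasting two shapes."""
    # Prepend 1s to the shorter shape so both have the same length.
    n = max(len(shape_x), len(shape_y))
    sx = (1,) * (n - len(shape_x)) + tuple(shape_x)
    sy = (1,) * (n - len(shape_y)) + tuple(shape_y)
    result = []
    for dx, dy in zip(sx, sy):
        # Sizes must be equal, or one of them must be 1.
        if dx != dy and dx != 1 and dy != 1:
            raise ValueError(f"shapes {tuple(shape_x)} and {tuple(shape_y)} are not broadcastable")
        # A size-1 dimension is stretched to the other size.
        result.append(dy if dx == 1 else dx)
    return tuple(result)

print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))         # (5, 3, 4, 1)
print(torch.broadcast_shapes((5, 1, 4, 1), (3, 1, 1)))  # torch.Size([5, 3, 4, 1])
```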



In [29]:
# can line up trailing dimensions to make reading easier
x=torch.empty(5,1,4,1)
y=torch.empty(  3,1,1)
print((x+y).size())



torch.Size([5, 3, 4, 1])


In [30]:
# but not necessary:
x=torch.empty(1)
y=torch.empty(3,1,7)
print((x+y).size())



torch.Size([3, 1, 7])


### Question: The following command does not work. Why?



In [31]:
x=torch.empty(5,2,4,1)
y=torch.empty(  2,1,1)
print((x+y).size())


torch.Size([5, 2, 4, 1])


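As executed here, with `y` of shape `(2, 1, 1)`, the addition actually succeeds: aligning the shapes from the trailing dimension, every pair of sizes is compatible (1 and 1, 4 and 1, 2 and 2, and 5 against a missing dimension). The PyTorch broadcasting notes use `y = torch.empty(3, 1, 1)` for this example: the third-from-last sizes are then 2 and 3, which are neither equal nor 1, so PyTorch cannot stretch the tensors to a common shape and raises a `RuntimeError`.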
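To see the failure, the sketch below uses `y` of shape `(3, 1, 1)` and catches the error instead of letting the cell stop (the exact message text may vary between PyTorch versions):

```python
import torch

x = torch.empty(5, 2, 4, 1)
y = torch.empty(3, 1, 1)  # third-from-last size 3 does not match x's 2

failed = False
try:
    print((x + y).size())
except RuntimeError as err:
    # Aligned from the right: (5, 2, 4, 1) vs (1, 3, 1, 1) -> 2 vs 3 is incompatible.
    failed = True
    print("RuntimeError:", err)
```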

In [32]:
x=2*torch.ones(  2,4)
y=torch.ones(3,2,4)
print(x+y)

tensor([[[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]]])


### Question: What is the difference between "x = xr" and "x = xr.clone()"?

In [33]:
x = xr.clone()
x.add_(-xr)
print(x)
print(xr)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
tensor([[-0.6261, -0.7982,  0.8510,  1.5492,  0.6239],
        [-0.3400, -0.9991,  0.9205,  0.8365, -1.3516],
        [ 0.4258, -1.0592,  0.3294, -0.0551, -1.5634]])


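`x = xr`: `x` is just another name for the same tensor object, so any in-place change to `x` also changes `xr`.

`x = xr.clone()`: `x` becomes a true copy with its own storage, so in the cell above `x.add_(-xr)` sets `x` to zero while `xr` is unchanged.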
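A sketch contrasting the two cases: an in-place change through a plain assignment is visible through `xr`, while a clone has its own memory (`data_ptr()` returns the address of the first element):

```python
import torch

xr = torch.randn(3, 5)
snapshot = xr.clone()

alias = xr                              # no copy: both names refer to the same tensor
alias.add_(1.0)                         # in-place change through the alias...
print(alias is xr)                      # True
print(torch.equal(xr, snapshot + 1.0))  # True: ...is visible through xr

x_copy = xr.clone()                     # new tensor with its own storage
x_copy.zero_()
print(x_copy.data_ptr() == xr.data_ptr())  # False: different memory
print(torch.equal(xr, snapshot + 1.0))     # True: xr untouched by x_copy.zero_()
```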

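Also be careful: a CPU tensor and the NumPy array obtained from it with `.numpy()` (or a tensor created with `torch.from_numpy`) share the same memory, so modifying one modifies the other, and vice versa.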

In [34]:
y=torch.ones(2,4)
print(y)
z = y.numpy()
print(z)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [35]:
np.add(z, 1, out=z)
print("z=", z)
print("y=", y,"\n")
torch.add(y, -4, out=y)
print("z=",z)
print("y=",y)

z= [[2. 2. 2. 2.]
 [2. 2. 2. 2.]]
y= tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.]]) 

z= [[-2. -2. -2. -2.]
 [-2. -2. -2. -2.]]
y= tensor([[-2., -2., -2., -2.],
        [-2., -2., -2., -2.]])
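When sharing is not wanted, copy explicitly: `torch.tensor(...)` always copies its input, and NumPy's `.copy()` gives an independent array. A sketch:

```python
import numpy as np
import torch

z = np.ones((2, 4), dtype=np.float32)

shared = torch.from_numpy(z)  # shares memory with z
copied = torch.tensor(z)      # always copies the data

z += 1                        # modify the NumPy array in place
print(shared[0, 0].item())    # 2.0: the change is visible through the tensor
print(copied[0, 0].item())    # 1.0: the copy is independent

# On the NumPy side, .copy() gives an independent array.
independent = shared.numpy().copy()
shared.zero_()                     # also zeroes z, which shares memory with shared
print(z[0, 0], independent[0, 0])  # 0.0 2.0
```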


## Computational graphs

In [36]:
import matplotlib.pyplot as plt
%matplotlib inline
import torch

In [37]:
!pip install torchviz



In [38]:
import torchviz

In [39]:
x = torch.ones(2, 2, requires_grad=True)
w = torch.rand(1, 1, requires_grad=True)
print(x)
print(w)
y = w * x + 2
print(y)
torchviz.make_dot(y)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[0.2089]], requires_grad=True)
tensor([[2.2089, 2.2089],
        [2.2089, 2.2089]], grad_fn=<AddBackward0>)


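ExecutableNotFound: failed to execute PosixPath('dot'), make sure the Graphviz executables are on your systems' PATH

Rendering the graph requires the Graphviz system package, which provides the `dot` executable. `pip install torchviz` only installs Python packages, so Graphviz itself must be installed separately (for example with the operating system's package manager).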
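Even without Graphviz, the graph recorded by autograd can be inspected and used. A sketch on the same small example: `grad_fn` is the last node of the graph, and `backward()` propagates gradients through it (it needs a scalar, hence the `sum()`):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
w = torch.rand(1, 1, requires_grad=True)
y = w * x + 2

# y records how it was computed.
print(y.grad_fn)                 # the addition node, e.g. <AddBackward0 object at ...>
print(y.grad_fn.next_functions)  # links back to the node of the multiplication w * x

# Reduce y to a scalar, then backpropagate through the graph.
y.sum().backward()
print(w.grad)  # d(sum y)/dw = sum of the entries of x = 4
print(x.grad)  # d(sum y)/dx = w, broadcast to every entry
```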


## Playing with pytorch: linear regression

Code for plotting the surface

In [40]:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def plot_figs(fig_num, elev, azim, x, y, weights, bias):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
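    # Recent Matplotlib versions no longer attach Axes3D(fig) to the figure
    # automatically, which leaves the figure empty; create the 3D axes explicitly.
    ax = fig.add_subplot(projection='3d')
    ax.view_init(elev=elev, azim=azim)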
    ax.scatter(x[:, 0], x[:, 1], y)
    ax.plot_surface(np.array([[0, 0], [1, 1]]),
                    np.array([[0, 1], [0, 1]]),
                    (np.dot(np.array([[0, 0, 1, 1],
                                          [0, 1, 0, 1]]).T, weights) + bias).reshape((2, 2)),
                    alpha=.5)
    ax.set_xlabel('x_1')
    ax.set_ylabel('x_2')
    ax.set_zlabel('y')
    
def plot_views(x, y, w, b):
    # Generate the different figures from different views
    elev = 43.5
    azim = -110
    plot_figs(1, elev, azim, x, y, w, b[0])

    plt.show()

Code for generating the 2D points

In [41]:
#Data generation (2D points)
w_source = torch.tensor([2., -3.], dtype=torch.float).view(-1,1)
b_source  = torch.tensor([1.], dtype=torch.float)

x = torch.empty(30, 2).uniform_(0, 1) # input of the regression model

print(x.shape)
print(w_source.shape)
print(b_source.shape)

y = torch.matmul(x,w_source)+b_source # output of the regression model
print(y.shape)

torch.Size([30, 2])
torch.Size([2, 1])
torch.Size([1])
torch.Size([30, 1])


Plot the dataset

In [42]:
plot_views(x.numpy(), y.numpy(), w_source.numpy(), b_source.numpy())


In [43]:
# randomly initialize learnable weights and bias
w_t_init = torch.empty(2, 1).uniform_(-1, 1) 
b_t_init = torch.empty(1, 1).uniform_(-1, 1)

print("Initial values of the parameters:")
print(w_t_init) 
print(b_t_init)



Initial values of the parameters:
tensor([[-0.7776],
        [ 0.4204]])
tensor([[0.6449]])


### Question: calculate the gradient of the loss and code it.
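With `forward_t(x) = x w + b` and the sum-of-squares loss used in `loss_t`, the gradients follow from the chain rule. Writing $X \in \mathbb{R}^{N \times 2}$ for the inputs (one row $x_i^\top$ per point), $\mathbf{1}$ for the all-ones vector of length $N$, and $e = Xw + b\,\mathbf{1} - y$ for the residuals (`err` in the code):

$$
L(w, b) = \sum_{i=1}^{N} \left(x_i^\top w + b - y_i\right)^2 = e^\top e,
\qquad
\nabla_w L = 2\, X^\top e,
\qquad
\frac{\partial L}{\partial b} = 2 \sum_{i=1}^{N} e_i = 2\, \mathbf{1}^\top e .
$$

These correspond to the lines `grad_w = 2 * x.t().mm(err)` and `grad_b = 2 * err.sum()` in `gradient_t`.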

In [44]:
# our model forward pass
def forward_t(x):
    return x.mm(w_t) + b_t

# loss function
def loss_t(x, y):
    y_pred = forward_t(x)
    return (y_pred - y).pow(2).sum()

# compute gradient
def gradient_t(x, y):
    y_pred = forward_t(x)
    err = y_pred - y
    grad_w = 2 * x.t().mm(err)
    grad_b = 2 * err.sum()
    grad_b = grad_b.reshape_as(b_t)
    return grad_w, grad_b
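A hand-derived gradient can be checked against autograd. The sketch below regenerates data the same way as above (true weights `[2, -3]`, bias `1`, with an arbitrary seed) and compares the formulas used in `gradient_t` with the gradients computed by `backward()`:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility
x = torch.rand(30, 2)
y = x @ torch.tensor([[2.0], [-3.0]]) + 1.0

w = torch.empty(2, 1).uniform_(-1, 1).requires_grad_()
b = torch.empty(1, 1).uniform_(-1, 1).requires_grad_()

# Analytical gradient (same formulas as gradient_t).
with torch.no_grad():
    err = x @ w + b - y
    grad_w = 2 * x.t() @ err
    grad_b = (2 * err.sum()).reshape(1, 1)

# Autograd gradient of the same sum-of-squares loss.
loss = (x @ w + b - y).pow(2).sum()
loss.backward()

print(torch.allclose(grad_w, w.grad, atol=1e-4))  # True
print(torch.allclose(grad_b, b.grad, atol=1e-4))  # True
```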


Main loop for computing the estimate (gradient descent)

### Question: code the gradient descent algorithm within the main loop.

In [45]:
learning_rate = 1

w_t = w_t_init.clone()
b_t = b_t_init.clone()

for epoch in range(10):
    l = loss_t(x,y)
    grad_w, grad_b = gradient_t(x,y)
    w_t -= learning_rate * grad_w
    b_t -= learning_rate * grad_b
    print("progress:", "epoch:", epoch, "loss",l)

progress: epoch: 0 loss tensor(57.7993)
progress: epoch: 1 loss tensor(3184.4573)
progress: epoch: 2 loss tensor(16098597.)
progress: epoch: 3 loss tensor(1.4374e+11)
progress: epoch: 4 loss tensor(1.2861e+15)
progress: epoch: 5 loss tensor(1.1507e+19)
progress: epoch: 6 loss tensor(1.0296e+23)
progress: epoch: 7 loss tensor(9.2125e+26)
progress: epoch: 8 loss tensor(8.2428e+30)
progress: epoch: 9 loss tensor(7.3752e+34)


In [46]:
# After training
print("Estimation of the parameters:")
print(w_t)
print(b_t)

Estimation of the parameters:
tensor([[-1.6727e+18],
        [-1.5903e+18]])
tensor([[-2.9120e+18]])


### Question: Test a higher learning rate (e.g., learning_rate = 1). Explain what you observe.

With learning_rate = 1, gradient descent diverges: the loss grows very quickly and the parameters blow up (they reach about 1e18 after 10 epochs). The step is too large relative to the curvature of the loss, so each iteration overshoots the minimum by more than it corrects, which makes the iterations unstable.
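For this quadratic loss, plain gradient descent is stable only if the learning rate is below $2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian $2\tilde X^\top \tilde X$ ($\tilde X$ is $X$ with an extra column of ones for the bias). A sketch computing this bound on inputs drawn like the dataset above (the seed is arbitrary, so the exact value differs slightly from the notebook's data):

```python
import torch

torch.manual_seed(0)   # arbitrary seed
x = torch.rand(30, 2)  # same distribution as the inputs above: uniform on [0, 1]

# Append a column of ones so that the bias is treated as an extra weight.
x_tilde = torch.cat([x, torch.ones(30, 1)], dim=1)

# Hessian of L = sum((x_tilde @ theta - y)**2) with respect to theta.
hessian = 2 * x_tilde.t() @ x_tilde
lambda_max = torch.linalg.eigvalsh(hessian).max().item()

lr_max = 2 / lambda_max
print(f"largest Hessian eigenvalue: {lambda_max:.1f}")
print(f"gradient descent is stable for learning_rate < {lr_max:.4f}")
```

With 30 points, the bound is far below 1, which is consistent with the divergence observed above.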

### Question: How to improve the quality of the estimate?

To improve the quality of the estimate: normalize the data, use a smaller learning rate, increase the number of iterations, possibly add regularization, and use a more advanced optimizer for a more stable and accurate convergence.
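A sketch of the "smaller learning rate, more iterations" suggestion, regenerating noise-free data the same way as above with an arbitrary seed. The values `0.01` and `2000` are illustrative choices, not tuned ones:

```python
import torch

torch.manual_seed(0)  # arbitrary seed
w_source = torch.tensor([[2.0], [-3.0]])
b_source = torch.tensor([1.0])
x = torch.rand(30, 2)
y = x @ w_source + b_source

w = torch.empty(2, 1).uniform_(-1, 1)
b = torch.empty(1, 1).uniform_(-1, 1)

learning_rate = 0.01  # below the stability limit 2 / lambda_max for this kind of data
for epoch in range(2000):
    err = x @ w + b - y
    w -= learning_rate * 2 * x.t() @ err  # same gradients as gradient_t
    b -= learning_rate * 2 * err.sum()

loss = (x @ w + b - y).pow(2).sum().item()
print(w.flatten().tolist(), b.item(), loss)  # close to [2, -3], 1 and a near-zero loss
```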