# Deep Learning

Deep Learning is a subfield of Machine Learning that uses ***neural networks***. Neural networks are loosely inspired by the human brain: they consist of many layers of interconnected "neurons" that process information.


# Neural Networks

<img src='https://miro.medium.com/v2/resize:fit:800/1*wDptpMvzYcmV62esTVZ_Bw.png' height=300 width=400>


A **neural network** is a computational model that approximates functions. It maps input data (features) to an output (predictions) by passing it through a network of layers.

It consists of:
- **Input Layer**: Takes in raw data (e.g., pixel values, text embeddings)
- **Hidden Layers**: Perform computations via weighted sums and activation functions
- **Output Layer**: Produces final predictions (e.g., class scores or regression values)
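To make the three kinds of layers concrete, here is a minimal sketch of a forward pass through a tiny network, written with PyTorch (introduced below). The sizes (4 inputs, 8 hidden units, 3 outputs), the ReLU activation, and the random weights are illustrative assumptions, not taken from the text.

```python
import torch

torch.manual_seed(0)  # make the random weights reproducible

# Illustrative sizes: 4 input features, 8 hidden units, 3 output scores
x = torch.rand(1, 4)                  # input layer: one sample with 4 features

W1, b1 = torch.randn(4, 8), torch.zeros(8)
W2, b2 = torch.randn(8, 3), torch.zeros(3)

hidden = torch.relu(x @ W1 + b1)      # hidden layer: weighted sum followed by an activation
scores = hidden @ W2 + b2             # output layer: one raw score per class
print(scores.shape)                   # torch.Size([1, 3])

# The same architecture expressed with built-in layers (weights initialized differently)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 3),
)
print(model(x).shape)                 # torch.Size([1, 3])
```

In practice the weights are learned from data rather than drawn at random. `torch.nn.Linear` stores its own weight and bias and computes the same kind of weighted sum.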

---


# PyTorch Fundamentals

## Installation

Using pip:
```
pip install torch
```

Using uv:

```
uv init
uv venv

source .venv/bin/activate  # On Linux/macOS
.venv\Scripts\activate     # On Windows

uv add torch
```


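# PyTorch Introduction

PyTorch is an open-source Python framework for building and training deep learning models. Its core building blocks are tensors, which can run on GPUs, and autograd, which computes gradients automatically.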

## Tensors
Tensors are the core PyTorch data type. They are multidimensional arrays used to store and manipulate a model's inputs and outputs, as well as its parameters. Tensors are similar to NumPy's ndarrays, except that tensors can also run on GPUs to accelerate computation.

## Graphs
A graph is a data structure consisting of nodes (also called vertices) connected by edges. Modern deep learning frameworks represent a neural network as a graph of computations. PyTorch keeps a record of tensors and the operations executed on them in a directed acyclic graph (DAG) consisting of `Function` objects. In this DAG, the leaves are the input tensors and the roots are the output tensors.

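Example: one step of a simple recurrent neural network (RNN). It combines the previous hidden state $\mathbf{h}_{t-1}$ with the input $\mathbf{x}$ at time step $t$. Both are stored as row vectors, hence the transposes:

$$
\mathbf{h}_t = \tanh(\mathbf{W}_h \mathbf{h}_{t-1}^\top + \mathbf{W}_x \mathbf{x}^\top)
$$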


```
           Wh    h_{t-1}     Wx    x(input at t)
            │       │         │       │
            └──┬────┘         └──┬────┘
               ▼                 ▼
              ┌─────────┐   ┌─────────┐
              │ Matrix  │   │ Matrix  │
              │Multiply │   │Multiply │
              └─────────┘   └─────────┘
                    │         │
                    └────┬────┘
                         ▼
                       ┌───┐
                       │ + │
                       └───┘
                         │
                         ▼
                      ┌─────┐
                      │tanh │
                      └─────┘
                         │
                         ▼
                        h_t
```

Nodes represent:
- Operations (such as `matmul`, `add`, and `tanh`)
- Tensors (especially leaf tensors, such as inputs and parameters that require gradients)

Edges represent the data dependencies between operations:
- They show how the output of one operation (node) is used as the input to another.
- During backpropagation, gradients flow along these edges in reverse, from the output back to the leaf tensors.
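The RNN step above can be built directly with tensors to see the graph autograd records. This sketch assumes a batch of one, an input size of 10, and a hidden size of 20. These are arbitrary choices, and the row-vector inputs are transposed to match the equation.

```python
import torch

# Assumed sizes: batch of 1, input size 10, hidden size 20
x = torch.randn(1, 10)
prev_h = torch.randn(1, 20)
W_h = torch.randn(20, 20, requires_grad=True)
W_x = torch.randn(20, 10, requires_grad=True)

i2h = torch.mm(W_x, x.t())       # W_x x^T:       (20, 10) @ (10, 1) -> (20, 1)
h2h = torch.mm(W_h, prev_h.t())  # W_h h_{t-1}^T: (20, 20) @ (20, 1) -> (20, 1)
next_h = (i2h + h2h).tanh()      # h_t, shape (20, 1)

# Walk the recorded graph from the root (output) towards the leaves
print(next_h.grad_fn.name())                       # TanhBackward0
print(next_h.grad_fn.next_functions[0][0].name())  # AddBackward0

# Gradients flow backwards along the edges to the leaf tensors that require them
next_h.sum().backward()
print(W_h.grad.shape, W_x.grad.shape)  # torch.Size([20, 20]) torch.Size([20, 10])
print(x.grad)                          # None: x does not require gradients
```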

----
#### **Module 1: The Core - Tensors**

```python
import torch
torch.__version__
```

```
'2.6.0+cu124'
```

##### Creating Tensors

```python
# From a nested Python list
a = [1, 2]
b = [3, 4]
x_data = torch.tensor([a, b])
print("Tensor from list:\n", x_data)
```

```
Tensor from list:
 tensor([[1, 2],
        [3, 4]])
```


```python
# From a NumPy array
import numpy as np

np_array = np.array([[1, 2], [3, 4]])
print("Numpy Array: \n", np_array)
x_np = torch.from_numpy(np_array)
print("Tensor from NumPy array:\n", x_np)
```

```
Numpy Array: 
 [[1 2]
 [3 4]]
Tensor from NumPy array:
 tensor([[1, 2],
        [3, 4]])
```
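One detail worth knowing is that `torch.from_numpy` shares memory with the source array, while `torch.tensor` copies the data. This short sketch makes the difference visible; the variable names are illustrative.

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # makes an independent copy

np_array[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item())            # 100: the change is visible through the shared tensor
print(copied[0, 0].item())            # 1: the copy is unaffected

back = shared.numpy()                 # tensor -> ndarray (CPU tensors only), also sharing memory
print(type(back))                     # <class 'numpy.ndarray'>
```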


```python
shape = (2, 3)  # (rows, columns)

rand_tensor = torch.rand(shape)  # uniform floats in [0, 1)
print("Random Tensor:\n", rand_tensor)
print("----------------------------")

rand_int_tensor = torch.randint(0, 50, shape)  # integers in [0, 50): high is exclusive
print("Random Integer Tensor:\n", rand_int_tensor)
print("----------------------------")

ones_tensor = torch.ones(shape)
print("Ones Tensor:\n", ones_tensor)
print("----------------------------")

zeros_tensor = torch.zeros(shape)
print("Zeros Tensor:\n", zeros_tensor)
```

```
Random Tensor:
 tensor([[0.6440, 0.7300, 0.4562],
        [0.0547, 0.3996, 0.9277]])
----------------------------
Random Integer Tensor:
 tensor([[ 1,  2, 47],
        [14,  4,  3]])
----------------------------
Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
----------------------------
Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
```
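The random values above change on every run. A minimal sketch of making them reproducible with `torch.manual_seed` is shown below. It also checks that `torch.randint` never returns its upper bound. The seed value 42 and the sample size are arbitrary choices.

```python
import torch

torch.manual_seed(42)              # fix the seed of PyTorch's global random number generator
first = torch.rand(2, 3)
torch.manual_seed(42)              # reset to the same seed
second = torch.rand(2, 3)
print(torch.equal(first, second))  # True: same seed, same values

# torch.randint excludes the upper bound
ints = torch.randint(0, 50, (1000,))
print(ints.min().item() >= 0, ints.max().item() <= 49)  # True True
```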


```python
# *_like functions create a tensor with the same shape as an existing one,
# optionally overriding its dtype

x_ones = torch.ones_like(zeros_tensor, dtype=torch.int)
print("Ones Tensor:\n", x_ones)
print("----------------------------")

x_rand = torch.rand_like(ones_tensor, dtype=torch.float)
print("Random Tensor:\n", x_rand)
```

```
Ones Tensor:
 tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int32)
----------------------------
Random Tensor:
 tensor([[0.0469, 0.0298, 0.7687],
        [0.7031, 0.2311, 0.3464]])
```


##### Tensor Attributes and Operations

```python
my_tensor = torch.randint(0, 9, (4, 4))  # integers in [0, 9)
print(my_tensor)
```

```
tensor([[5, 3, 6, 6],
        [6, 6, 8, 8],
        [3, 5, 4, 3],
        [2, 3, 2, 4]])
```


```python
# Inspect shape, dtype, and device

print("Shape:", my_tensor.shape)
print("Datatype:", my_tensor.dtype)  # torch.randint defaults to int64
print("Device:", my_tensor.device)   # tensors are created on the CPU by default
```

```
Shape: torch.Size([4, 4])
Datatype: torch.int64
Device: cpu
```
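A few other attributes come up often: the number of dimensions, the total number of elements, and converting to a different dtype. A minimal sketch follows; the tensor `t` is illustrative.

```python
import torch

t = torch.randint(0, 9, (4, 4))
print(t.ndim)             # 2: number of dimensions
print(t.numel())          # 16: total number of elements
f = t.to(torch.float32)   # dtype conversion returns a new tensor
print(f.dtype, t.dtype)   # torch.float32 torch.int64: the original is unchanged
```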


```python
# Reshaping

reshaped_tensor = my_tensor.reshape(2, 8)
print("Reshaped tensor:", reshaped_tensor)
print('-'*50, '\n')

reshaped_tensor = my_tensor.reshape(-1)  # -1 lets PyTorch infer the size of that dimension
print("Reshaped tensor with unknown dimension:", reshaped_tensor)
print('-'*50, '\n')

flat_tensor = my_tensor.view(16)  # view() shares memory with the original tensor
print("Flat tensor:", flat_tensor)
print('-'*50, '\n')

reshaped_tensor = my_tensor.view(8, 2)
print("Reshaped tensor using view:", reshaped_tensor)
print('-'*50, '\n')

reshaped_tensor = my_tensor.flatten()
print("Reshaped tensor with flatten:", reshaped_tensor)
```

```
Reshaped tensor: tensor([[5, 3, 6, 6, 6, 6, 8, 8],
        [3, 5, 4, 3, 2, 3, 2, 4]])
-------------------------------------------------- 

Reshaped tensor with unknown dimension: tensor([5, 3, 6, 6, 6, 6, 8, 8, 3, 5, 4, 3, 2, 3, 2, 4])
-------------------------------------------------- 

Flat tensor: tensor([5, 3, 6, 6, 6, 6, 8, 8, 3, 5, 4, 3, 2, 3, 2, 4])
-------------------------------------------------- 

Reshaped tensor using view: tensor([[5, 3],
        [6, 6],
        [6, 6],
        [8, 8],
        [3, 5],
        [4, 3],
        [2, 3],
        [2, 4]])
-------------------------------------------------- 

Reshaped tensor with flatten: tensor([5, 3, 6, 6, 6, 6, 8, 8, 3, 5, 4, 3, 2, 3, 2, 4])
```
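`view` and `reshape` gave the same results above, but they differ on non-contiguous tensors. `view` never copies, so it fails when the memory layout does not allow it. `reshape` falls back to a copy in that case. The sketch below uses a transposed tensor to trigger the difference.

```python
import torch

t = torch.arange(16).reshape(4, 4)
tt = t.T                          # transpose: a non-contiguous view of the same storage
print(tt.is_contiguous())         # False

view_failed = False
try:
    tt.view(16)                   # view() needs compatible strides, so this raises
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)  # view failed: True

flat = tt.reshape(16)             # reshape() copies when a view is impossible
print(flat[:4])                   # tensor([ 0,  4,  8, 12])
print(tt.contiguous().view(16)[:4])  # same values after making a contiguous copy
```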


```python
# Mathematical operations

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Element-wise operations
print("add =", a + b)
print("sub =", a - b)
print("mul =", a * b)
print("div =", a / b)
print("pow =", a ** 2)

print('-'*50, '\n')
# Reductions
x = torch.tensor([
    [2., 4., 6.],
    [1., 5., 3.]])

print("mean_all =", torch.mean(x))                # mean over all elements
print("mean_dim0 =", torch.mean(x, dim=0))        # mean of each column
print("sum_all =", torch.sum(x))
print("max_val, max_idx =", torch.max(x, dim=0))  # column-wise max and its row index
print("min_val, min_idx =", torch.min(x, dim=1))  # row-wise min and its column index
print("std = ", torch.std(x))                     # sample std: divides by N - 1 by default
```

```
add = tensor([5., 7., 9.])
sub = tensor([-3., -3., -3.])
mul = tensor([ 4., 10., 18.])
div = tensor([0.2500, 0.4000, 0.5000])
pow = tensor([1., 4., 9.])
-------------------------------------------------- 

mean_all = tensor(3.5000)
mean_dim0 = tensor([1.5000, 4.5000, 4.5000])
sum_all = tensor(21.)
max_val, max_idx = torch.return_types.max(
values=tensor([2., 5., 6.]),
indices=tensor([0, 1, 0]))
min_val, min_idx = torch.return_types.min(
values=tensor([2., 1.]),
indices=tensor([0, 0]))
std =  tensor(1.8708)
```
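The `torch.max(x, dim=0)` result above is a named tuple, so it can be unpacked directly. Reductions also combine naturally with broadcasting, where a smaller tensor is stretched to match a larger one. Here is a minimal sketch reusing the same `x`; centering the columns is just an illustrative use.

```python
import torch

x = torch.tensor([[2., 4., 6.],
                  [1., 5., 3.]])

# torch.max / torch.min with dim= return a (values, indices) named tuple
values, indices = torch.max(x, dim=0)
print(values)    # tensor([2., 5., 6.])
print(indices)   # tensor([0, 1, 0])

# Broadcasting: the (3,) column means are applied to both rows of the (2, 3) tensor
col_means = x.mean(dim=0)        # tensor([1.5000, 4.5000, 4.5000])
centered = x - col_means
print(centered)
print(centered.mean(dim=0))      # zeros: every column is now centered
```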


```python
# Matrix operations

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])

# Matrix multiplication
print("matmul = \n", torch.matmul(A, B))
print(A @ B)  # the @ operator gives the same result

# Transpose
print("transpose =\n", A.T)  # or A.transpose(0, 1)

# Determinant
print("det =\n", torch.linalg.det(A))

# Inverse
print("inv =\n", torch.linalg.inv(A))
```

```
matmul = 
 tensor([[19., 22.],
        [43., 50.]])
tensor([[19., 22.],
        [43., 50.]])
transpose =
 tensor([[1., 3.],
        [2., 4.]])
det =
 tensor(-2.)
inv =
 tensor([[-2.0000,  1.0000],
        [ 1.5000, -0.5000]])
```
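To sanity-check the inverse, multiply it back with `A`. For solving a linear system $Ax = b$, `torch.linalg.solve` is generally preferred over forming the inverse explicitly, because it is cheaper and numerically more stable. The right-hand side `rhs` below is an illustrative assumption.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
rhs = torch.tensor([5., 6.])       # illustrative right-hand side b

A_inv = torch.linalg.inv(A)
print(torch.allclose(A @ A_inv, torch.eye(2), atol=1e-5))  # True: A @ A^-1 is (numerically) the identity

sol = torch.linalg.solve(A, rhs)   # solves A @ sol = rhs without forming A^-1
print(sol)                         # approximately [-4.0, 4.5]
print(torch.allclose(A @ sol, rhs, atol=1e-5))             # True
```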


##### GPU Acceleration

```python
my_tensor = torch.randn(8000, 8000, dtype=torch.float32)  # large random matrix, created on the CPU
```

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)
```

```
Using device: cuda
```


```python
gpu_tensor = my_tensor.to(device)  # copy to the GPU (returns the same tensor if device is "cpu")
gpu_tensor
```

```
tensor([[-0.0843, -1.1978,  0.9261,  ..., -0.1400,  0.5865,  0.1649],
        [-0.1110, -0.2898, -1.5152,  ...,  1.1018,  0.0280,  0.2350],
        [-2.0120, -1.0060,  1.5909,  ...,  0.2863,  0.7272,  0.5691],
        ...,
        [ 0.4817, -0.4049,  0.8068,  ..., -1.4496, -2.1939,  1.0204],
        [-0.9117,  0.5796,  0.8540,  ...,  0.1452,  1.1311,  0.2764],
        [-0.5662, -0.5067, -1.4553,  ...,  0.9152,  1.9835, -0.7238]],
       device='cuda:0')
```

```python
# my_tensor + gpu_tensor
# On a CUDA machine this raises a RuntimeError: both operands must be on the same device
```

```python
# Matrix multiplication on the CPU (the measured time also includes printing the result)
import time
start = time.time()
print(my_tensor @ my_tensor)
print('total time taken: ', time.time()-start)
```

```
tensor([[  42.1795, -144.4513,  184.9128,  ..., -244.2494,  -61.0312,
          104.1585],
        [  31.9048,   68.0681,    8.1253,  ...,  -41.3922,    4.2698,
          104.0785],
        [-150.6989,  -65.7963,  -53.2894,  ...,   13.7322,  -24.9052,
           55.5764],
        ...,
        [  24.4931,  -41.3000,  131.7645,  ...,  -87.0403,  -50.6616,
         -102.7693],
        [ -35.7732,  -18.8666,  147.9960,  ...,   35.6521,   -7.1886,
          -66.7831],
        [ -60.5708,  129.8687,   80.7628,  ...,  -35.2043, -104.5603,
          -14.0187]])
total time taken:  15.728365421295166
```


```python
# Same multiplication on the GPU. print() copies the result back to the CPU, which also waits
# for the asynchronous CUDA kernel to finish; a first call may also include one-time setup costs
import time
start = time.time()
print(torch.matmul(gpu_tensor, gpu_tensor))
print('total time taken: ', time.time()-start)
```

```
tensor([[  42.1796, -144.4515,  184.9129,  ..., -244.2495,  -61.0315,
          104.1583],
        [  31.9047,   68.0679,    8.1253,  ...,  -41.3921,    4.2699,
          104.0784],
        [-150.6986,  -65.7963,  -53.2895,  ...,   13.7322,  -24.9054,
           55.5765],
        ...,
        [  24.4931,  -41.3003,  131.7643,  ...,  -87.0402,  -50.6618,
         -102.7692],
        [ -35.7733,  -18.8667,  147.9961,  ...,   35.6520,   -7.1885,
          -66.7833],
        [ -60.5708,  129.8686,   80.7626,  ...,  -35.2043, -104.5602,
          -14.0189]], device='cuda:0')
total time taken:  0.3053436279296875
```
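Timing a single call, as above, mixes in printing and possible start-up costs, and CUDA kernels run asynchronously. A fairer comparison does a warm-up run first. It then calls `torch.cuda.synchronize()` before reading the clock, as sketched below. The helper name `time_matmul`, the 2000 x 2000 size, and the repeat count are illustrative choices.

```python
import time
import torch


def time_matmul(t: torch.Tensor, repeats: int = 3) -> float:
    """Average seconds per t @ t, excluding a warm-up run (hypothetical helper)."""
    on_cuda = t.is_cuda
    t @ t                          # warm-up: absorbs one-time setup costs
    if on_cuda:
        torch.cuda.synchronize()   # CUDA ops are asynchronous; wait before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        t @ t
    if on_cuda:
        torch.cuda.synchronize()   # wait for all queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats


device = "cuda" if torch.cuda.is_available() else "cpu"
m = torch.randn(2000, 2000)
print("cpu :", time_matmul(m))
if device == "cuda":
    print("cuda:", time_matmul(m.to(device)))
```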


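#### Autograd (Automatic Differentiation)

PyTorch's autograd records the operations performed on tensors created with `requires_grad=True`. It then computes derivatives by traversing that graph backwards. As a running example, take

$$
p = x \cdot y + y
$$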

Compute the first derivative of $p$ with respect to $y$:

$$
\frac{dp}{dy} = \frac{d(xy + y)}{dy} = x + 1
$$

Substitute $x = 2$:

$$
\frac{dp}{dy} = 2 + 1 = 3
$$

----
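$$
q = p^2
$$

$$
\frac{dq}{dy} = \frac{d\,(x y + y)^2}{dy} = \frac{d\,\left[(x + 1)^2 y^2\right]}{dy} = 2y(x + 1)^2
$$

Substituting $x = 2$ and $y = 3$ gives $\frac{dq}{dy} = 2 \cdot 3 \cdot 3^2 = 54$. Differentiating once more gives $\frac{d^2q}{dy^2} = 2(x + 1)^2 = 18$.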

```python
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

p = x * y + y

q = p ** 2

print("Result of p = x * y + y:", p)
print("Result of q = p ** 2:", q)
```

```
Result of p = x * y + y: tensor(9., grad_fn=<AddBackward0>)
Result of q = p ** 2: tensor(81., grad_fn=<PowBackward0>)
```


```python
# The node that produced q, and the nodes that feed into it
q.grad_fn, q.grad_fn.next_functions
```

```
(<PowBackward0 at 0x7da682df6f80>, ((<AddBackward0 at 0x7da682df5cf0>, 0),))
```

```python
p.grad_fn, p.grad_fn.next_functions
```

```
(<AddBackward0 at 0x7da682df5cf0>,
 ((<MulBackward0 at 0x7da682df61a0>, 0),
  (<AccumulateGrad at 0x7da682df5c00>, 0)))
```

```python
# create_graph=True makes the returned gradient part of a computation graph,
# so it can be differentiated again (higher-order derivatives)
dp_dy = torch.autograd.grad(p, y, create_graph=True)
dp_dy
```

```
(tensor(3., grad_fn=<AddBackward0>),)
```

```python
dq_dy = torch.autograd.grad(q, y, create_graph=True)  # 2y(x + 1)^2 = 54 at x = 2, y = 3
dq_dy
```

```
(tensor(54., grad_fn=<AddBackward0>),)
```

```python
# Second derivative: differentiate dq/dy with respect to y again, 2(x + 1)^2 = 18
dq2_dy2 = torch.autograd.grad(dq_dy, y, create_graph=True)
dq2_dy2
```

```
(tensor(18., grad_fn=<AddBackward0>),)
```
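`torch.autograd.grad` returns gradients without storing them. The more common `backward()` call instead writes them into the `.grad` attribute of each leaf tensor. Those gradients accumulate across calls until they are reset. The sketch below recomputes the same example to show both behaviors.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

q = (x * y + y) ** 2
q.backward()                       # populates .grad on the leaf tensors x and y
first = (x.grad.item(), y.grad.item())
print(first)                       # (54.0, 54.0)

# Analytical check with p = x*y + y: dq/dx = 2p*y and dq/dy = 2p*(x + 1)
xv, yv = x.item(), y.item()
pv = xv * yv + yv
print(2 * pv * yv, 2 * pv * (xv + 1))  # 54.0 54.0

# Gradients accumulate across backward() calls until they are reset
q = (x * y + y) ** 2
q.backward()
accumulated = y.grad.item()
print(accumulated)                 # 108.0
x.grad.zero_()
y.grad.zero_()                     # reset before the next backward pass
```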

```python
x = torch.tensor([[1.], [2.], [3.], [4.]])  # input, shape (4, 1)
y = 5  # target
print(f"input, x = {x} \n target = {y}")
print("-"*50, "\n")

w = torch.ones(4, 4, requires_grad=True)
print("w = ", w)
print("-"*50, "\n")

b = torch.randn(4, 1, requires_grad=True)
print("b =", b)
print("-"*50, "\n")

o = w @ x + b
print("output = w @ x + b \n", o)
print("-"*50, "\n")

pred = o.sum()

# A toy scalar "loss" (not a real loss function); backward() without arguments needs a scalar
loss = pred - y

print("Loss: ", loss)
print("-"*50, "\n")
loss.backward()

# loss = sum_i (sum_j w[i, j] * x[j] + b[i]) - y, so d(loss)/d(w[i, j]) = x[j]:
# every row of w.grad equals x transposed, i.e. [1, 2, 3, 4]
print("Gradient for w:\n d(loss)/dw = w.grad = ", w.grad)
print("-"*50, "\n")

# Each b[i] enters the sum exactly once, so every entry of b.grad is 1
print("Gradient for b:\n d(loss)/db = b.grad = ", b.grad)
print("-"*50, "\n")
```

```
input, x = tensor([[1.],
        [2.],
        [3.],
        [4.]]) 
 target = 5
-------------------------------------------------- 

w =  tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], requires_grad=True)
-------------------------------------------------- 

b = tensor([[-0.4719],
        [-0.5372],
        [-0.0431],
        [-0.0083]], requires_grad=True)
-------------------------------------------------- 

output = w @ x + b 
 tensor([[9.5281],
        [9.4628],
        [9.9569],
        [9.9917]], grad_fn=<AddBackward0>)
-------------------------------------------------- 

Loss:  tensor(33.9395, grad_fn=<SubBackward0>)
-------------------------------------------------- 

Gradient for w:
 d(loss)/dw = w.grad =  tensor([[1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.]])
-------------------------------------------------- 

Gradient for b:
 d(loss)/db = b.grad =  tensor([[1.],
        [1.],
        [1.],
        [1.]])
-------------------------------------------------- 
```
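The reduction to a scalar before `backward()` is required. Calling `backward()` on a non-scalar tensor without a `gradient` argument raises an error. Passing `gradient=torch.ones_like(o)` is equivalent to backpropagating from `o.sum()`. A minimal sketch follows; `b` is set to zeros here purely for determinism.

```python
import torch

x = torch.tensor([[1.], [2.], [3.], [4.]])
w = torch.ones(4, 4, requires_grad=True)
b = torch.zeros(4, 1, requires_grad=True)

o = w @ x + b                            # shape (4, 1): not a scalar
raised = False
try:
    o.backward()                         # no gradient given for a non-scalar output
except RuntimeError:
    raised = True
print("backward() on a non-scalar raised RuntimeError:", raised)  # True

# Supplying d(loss)/d(o) = ones matches calling o.sum().backward()
o.backward(gradient=torch.ones_like(o))
print(torch.equal(w.grad, x.T.expand(4, 4)))  # True: every row of w.grad equals x transposed
print(torch.equal(b.grad, torch.ones(4, 1)))  # True
```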
      

```python
# Disabling gradient tracking
print(o.requires_grad)
with torch.no_grad():
    o_no_grad = w @ x + b
    print(o_no_grad.requires_grad)
```

```
True
False
```
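Two related tools are `Tensor.detach()` and the `torch.inference_mode()` context manager. The final lines of the sketch show a common reason to use `no_grad`: a manual parameter update. The tensors and the learning rate of 0.1 are illustrative assumptions.

```python
import torch

w = torch.ones(2, requires_grad=True)
o = w * 3

d = o.detach()                   # same values, but cut off from the graph
print(d.requires_grad)           # False

with torch.inference_mode():     # a stricter variant of no_grad intended for inference
    r = w * 3
print(r.requires_grad)           # False

# A typical use of no_grad: a manual gradient-descent update of a parameter
o.sum().backward()               # w.grad = [3., 3.]
with torch.no_grad():
    w -= 0.1 * w.grad            # in-place update of a leaf is allowed because it is not tracked
print(w)                         # values of about 0.7, still requires_grad=True
```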


#### *Questions*

1. **Basic Tensor Creation**: Write code to create a 3x3 tensor filled with the value 7. Print the tensor and its shape.
2. **Tensor from NumPy**: Given a NumPy array `np_arr = np.arange(9).reshape(3, 3)`, convert it to a PyTorch tensor and print both the NumPy array and the resulting tensor.
3. **Tensor Properties**: Create a random tensor of shape (4, 2). Print its shape, data type, and device.
4. **Reshaping Tensors**: Given a tensor of shape (2, 6), reshape it to (3, 4) and then flatten it to a 1D tensor. Print all intermediate results.
5. **Element-wise Operations**: Create two tensors of shape (2, 2) with random values. Perform element-wise addition, subtraction, and multiplication, and print the results.
6. **Matrix Multiplication**: Create two tensors, one of shape (2, 3) and another of shape (3, 4). Perform matrix multiplication and print the result.
7. **GPU Acceleration**: Write code to check whether a GPU is available. If so, move a tensor of your choice to the GPU and print its device. If not, print the message "GPU not available".
8. **Mini-Exercise: Transpose and Device**: Create a 5x3 tensor with random values, move it to the appropriate device (CPU or GPU), and print its transpose.
