# <center>**Chapter 10: Building Neural Networks with PyTorch**</center>

## **PyTorch Fundamentals**



In [1]:
import torch


In [2]:
X = torch.tensor([[1.0 , 4.0 , 7.0] , [2.0 , 3.0 , 6.0]])
X

tensor([[1., 4., 7.],
        [2., 3., 6.]])

In [3]:
X.shape , X.dtype

(torch.Size([2, 3]), torch.float32)

In [4]:
# Indexing works just like it does with NumPy arrays
X[0,1] , X[:,1]

(tensor(4.), tensor([4., 3.]))

In [5]:
import numpy as np
X.numpy()

array([[1., 4., 7.],
       [2., 3., 6.]], dtype=float32)

In [6]:
torch.tensor(np.array([[1, 4, 7], [2, 3, 6]]))

tensor([[1, 4, 7],
        [2, 3, 6]])
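
A detail worth knowing about these conversions: on the CPU, `X.numpy()` and `torch.from_numpy()` share memory with their source, while `torch.tensor()` makes a copy. The sketch below illustrates this; `torch.from_numpy()` is not used in the cells above, and the names `a`, `shared`, and `copied` are made up for this example.

```python
import numpy as np
import torch

a = np.array([[1.0, 4.0, 7.0], [2.0, 3.0, 6.0]], dtype=np.float32)
shared = torch.from_numpy(a)   # shares memory with `a`
copied = torch.tensor(a)       # independent copy of the data

a[0, 0] = 100.0                # modify the NumPy array in place

shared_sees_change = shared[0, 0].item() == 100.0   # the shared tensor follows the array
copied_sees_change = copied[0, 0].item() == 100.0   # the copy keeps the old value
print(shared_sees_change, copied_sees_change)
```

If you need an independent tensor, prefer the copying route; if you want to avoid a copy of a large array, the shared route is usually the better fit.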

In [7]:
# PyTorch's API also provides many in-place operations (their names end with an underscore)

X.relu_()

tensor([[1., 4., 7.],
        [2., 3., 6.]])
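
Every entry of `X` is already non-negative, so the output above looks unchanged. A small sketch with negative values makes the difference between `relu()` (returns a new tensor) and `relu_()` (overwrites the tensor) visible. The tensors `Y` and `Z` are made up for this illustration.

```python
import torch

Y = torch.tensor([[-1.0, 4.0], [2.0, -3.0]])
Y_before = Y.clone()

Z = Y.relu()                                    # out-of-place: returns a new tensor
y_untouched = torch.equal(Y, Y_before)          # Y still holds the negative values

Y.relu_()                                       # in-place: Y itself is overwritten
y_overwritten = torch.equal(Y, Z)               # Y now matches the clipped result
print(y_untouched, y_overwritten)
print(Y)
```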

## **Hardware Acceleration**


In [8]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

In [9]:
M = torch.tensor([[1. , 2. , 3.] , [4.,5.,6.]])
M = M.to(device)

In [10]:
M.device

device(type='cuda', index=0)

In [11]:
R = M @ M.T # this runs on the GPU
R

tensor([[14., 32.],
        [32., 77.]], device='cuda:0')

In [12]:
M = torch.rand(1000 , 1000) # on the CPU
%timeit M @ M.T

4.83 ms ± 91.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [13]:
M = torch.rand((1000 , 1000) , device = "cuda")
%timeit M @ M.T

446 μs ± 7.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
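
CUDA kernels are launched asynchronously, so when timing GPU code by hand it is safer to synchronize the device before reading the clock. The sketch below assumes a hypothetical helper, `time_matmul`, and falls back to the CPU when no GPU is available. The numbers it reports depend entirely on your hardware.

```python
import time
import torch

def time_matmul(device, size=1000, repeats=10):
    """Average seconds per `A @ A.T` on `device` (hypothetical helper)."""
    A = torch.rand((size, size), device=device)
    A @ A.T                                    # warm-up run
    if device == "cuda":
        torch.cuda.synchronize()               # wait for pending GPU work
    start = time.perf_counter()
    for _ in range(repeats):
        A @ A.T
    if device == "cuda":
        torch.cuda.synchronize()               # make sure all kernels have finished
    return (time.perf_counter() - start) / repeats

bench_device = "cuda" if torch.cuda.is_available() else "cpu"
seconds = time_matmul(bench_device)
print(f"{bench_device}: {seconds * 1e3:.3f} ms per matmul")
```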


## **Autograd**


In [14]:
x = torch.tensor(5.0 , requires_grad=True)
f = x ** 2
print(f)

tensor(25., grad_fn=<PowBackward0>)


In [19]:
f.backward()
x.grad

tensor(10.)
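
The value `10.` is the derivative of x² (that is, 2x) evaluated at x = 5. As a quick sanity check, autograd can be compared against a hand-derived gradient. The function below, t³ + 2t, and the variable names are chosen only for illustration.

```python
import torch

t = torch.tensor(5.0, requires_grad=True)
g = t ** 3 + 2 * t             # illustrative function, not part of the original notebook
g.backward()                   # fills t.grad with dg/dt evaluated at t = 5

analytic = 3 * 5.0 ** 2 + 2    # hand-derived: dg/dt = 3t^2 + 2
matches = torch.isclose(t.grad, torch.tensor(analytic)).item()
print(t.grad, matches)
```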

In [20]:
learning_rate = 0.1
with torch.no_grad():
    x -= learning_rate * x.grad  # this is the gradient descent step

In [None]:
# Another way to avoid gradient computation:

# The detach() method creates a new tensor that is detached from the computation graph,
# with requires_grad=False, but that still points to the same data in memory.
# This is useful when you need to run computations on a tensor without affecting the
# gradients (e.g., evaluation), or when you need fine-grained control over which
# operations contribute to gradient computation.

x_detached = x.detach()
x_detached -= learning_rate * x.grad
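
To see that a detached tensor really shares memory with its source, here is a minimal sketch. The names `w` and `w_detached` are illustrative, and `data_ptr()` is used only to compare the underlying storage addresses.

```python
import torch

w = torch.tensor(5.0, requires_grad=True)
w_detached = w.detach()            # same data, but requires_grad=False
w_detached -= 1.0                  # in-place update, not tracked by autograd

shares_memory = w.data_ptr() == w_detached.data_ptr()
w_value = w.item()                 # the original tensor sees the update
print(shares_memory, w_value, w_detached.requires_grad)
```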

### **Warning: If you forget to zero out the gradients at each training iteration, the backward() method will just accumulate them, causing incorrect gradient descent updates. Since there won't be any explicit error, just poor performance, this issue can be hard to debug.**

In [None]:
x.grad.zero_()

tensor(0.)
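
The accumulation behavior described in the warning is easy to reproduce. In this sketch (the variable `v` is made up for the example), calling `backward()` twice without zeroing the gradient doubles it:

```python
import torch

v = torch.tensor(5.0, requires_grad=True)

(v ** 2).backward()
first = v.grad.item()              # gradient of v**2 at v = 5

(v ** 2).backward()                # no zeroing in between
accumulated = v.grad.item()        # the second gradient was added to the first

v.grad.zero_()                     # reset, as in the cell above
(v ** 2).backward()
after_reset = v.grad.item()        # back to a single gradient
print(first, accumulated, after_reset)
```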

In [21]:
# Putting everything together, the whole training loop looks like this:

learning_rate = 0.1
x = torch.tensor(5.0 , requires_grad= True)
for iteration in range(100):
    f = x ** 2
    f.backward()
    with torch.no_grad():
        x -= learning_rate * x.grad  # this is the gradient descent step

    x.grad.zero_()
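
Each iteration of this loop replaces x with x - 0.1 · 2x = 0.8x, so after 100 steps x should be close to 5 · 0.8¹⁰⁰, which is on the order of 1e-9. The sketch below repeats the loop with different variable names (`p`, `lr`, `loss`, chosen so that the notebook's own variables are left alone) and compares the result with that closed form.

```python
import torch

lr = 0.1
p = torch.tensor(5.0, requires_grad=True)
for _ in range(100):
    loss = p ** 2
    loss.backward()
    with torch.no_grad():
        p -= lr * p.grad           # gradient descent step: p becomes 0.8 * p
    p.grad.zero_()

expected = 5.0 * 0.8 ** 100        # closed-form value after 100 steps
final_value = p.item()
print(final_value, expected)
```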

#### 1.  **Some operations, such as exp(), relu(), rsqrt(), sigmoid(), sqrt(), tan(), and tanh(), save their outputs in the computation graph during the forward pass, then use these outputs to compute the gradients during the backward pass. This means that you must not modify such an operation's output in place, or you will get an error during the backward pass.**

#### 2.  **Other operations, such as abs(), cos(), log(), sin(), square(), and var(), save their inputs instead of their outputs. Such an operation does not care if you modify its output in place, but you must not modify its inputs in place before the backward pass.**
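
The two rules above can be checked with a short experiment. This is a sketch only: the helper `backward_fails` and the three builder functions are hypothetical names introduced for this illustration.

```python
import torch

def backward_fails(build):
    """Return True if building the graph and calling backward() raises a RuntimeError."""
    s = torch.tensor(1.0, requires_grad=True)
    try:
        build(s).backward()
        return False
    except RuntimeError:
        return True

def modify_exp_output(s):
    out = s.exp()      # exp() saves its output for the backward pass
    out += 1.0         # in-place change to that saved output
    return out

def modify_cos_output(s):
    out = s.cos()      # cos() saves its input, not its output
    out += 1.0         # in-place change to the output only
    return out

def modify_cos_input(s):
    inp = s.clone()    # non-leaf copy, so it can be modified in place
    out = inp.cos()
    inp += 1.0         # in-place change to the saved input
    return out

exp_output_fails = backward_fails(modify_exp_output)
cos_output_fails = backward_fails(modify_cos_output)
cos_input_fails = backward_fails(modify_cos_input)
print(exp_output_fails, cos_output_fails, cos_input_fails)
```

Only the first and third cases should fail, matching rules 1 and 2 respectively. If you need an in-place update in such a situation, a common workaround is to use the out-of-place form (for example, `out = out + 1.0`) instead.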


## **Implementing Linear Regression**