In [1]:
import torch
import numpy as np

### **torch.nn.Linear()**
- **Note** : A matrix multiplication like this is also referred to as the `dot product` of two matrices.
- Neural networks are full of matrix multiplications and dot products.
- The `torch.nn.Linear()` module (we'll see this in action later on), also known as a __feed-forward layer__ or __fully connected layer__, implements a matrix multiplication between an input `x` and a weights matrix `A`, plus a bias `b`.
- `y = x·Aᵀ + b` (in PyTorch code, `x @ A.T + b`)
  - Where :
  - `x` is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
  - `A` is the weights matrix created by the layer. It starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "`T`": that's because the weights matrix gets transposed).
  - __Note__ : You might also often see `W` or another letter like `X` used to denote the weights matrix.
  - `b` is the bias term used to slightly offset the weights and inputs.
  - `y` is the output (a manipulation of the input in the hope of discovering patterns in it).
  - This is a linear function, similar to the straight-line equation `y = mx + c`. Try changing the values of `in_features` and `out_features` below and see what happens.
  - Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later).
  

In [2]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2,   # must match the inner (last) dimension of the input
                         out_features=6)  # sets the outer (last) dimension of the output

x = torch.tensor([[1, 2],   # each row of the input has 2 features
                  [3, 4],
                  [5, 6]], dtype=torch.float)

output = linear(x)   # each row of the output has 6 features
print(f"Input Shape : {x.shape}\n")
print(f"Output : \n{output}\n\nOutput Shape : {output.shape}")

Input Shape : torch.Size([3, 2])

Output : 
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output Shape : torch.Size([3, 6])
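To check the formula `y = x·Aᵀ + b` against the layer above, we can compute it by hand from the layer's own parameters. This is a small sketch that recreates the same layer with the same seed; `manual_output` is just an illustrative name. Note that PyTorch stores `linear.weight` with shape `(out_features, in_features)`, which is why it has to be transposed.

```python
import torch

# Recreate the layer from the cell above with the same seed so the weights match
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# linear.weight has shape (6, 2), so it is transposed to (2, 6) first:
# (3, 2) @ (2, 6) -> (3, 6), then the bias of shape (6,) is broadcast over the rows
manual_output = x @ linear.weight.T + linear.bias

print(linear.weight.shape)                       # torch.Size([6, 2])
print(linear.bias.shape)                         # torch.Size([6])
print(torch.allclose(linear(x), manual_output))  # True
```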


In [3]:
torch.manual_seed(32)
linearModel = torch.nn.Linear(in_features=4,    # must match the inner (last) dimension of the input
                              out_features=10)  # sets the outer (last) dimension of the output

x = torch.tensor([[1, 2, 3, 4],           # each row of the input has 4 features
                  [2, 4, 6, 8],
                  [3, 6, 9, 12],
                  [4, 8, 12, 16],
                  [5, 10, 15, 20],
                  [6, 12, 18, 24]], dtype=torch.float)

y = linearModel(x)                         # each row of the output has 10 features
print(x); print(x.shape); print()
print(y); print(y.shape); print()

tensor([[ 1.,  2.,  3.,  4.],
        [ 2.,  4.,  6.,  8.],
        [ 3.,  6.,  9., 12.],
        [ 4.,  8., 12., 16.],
        [ 5., 10., 15., 20.],
        [ 6., 12., 18., 24.]])
torch.Size([6, 4])

tensor([[  0.4974,  -1.5763,   1.4562,  -0.8506,   0.7215,  -0.4398,  -3.2376,
          -0.5411,   1.0880,   0.6005],
        [  1.3022,  -3.2688,   2.6275,  -1.3393,   1.8976,  -1.1590,  -5.9811,
          -0.7089,   1.9592,   1.3687],
        [  2.1071,  -4.9613,   3.7988,  -1.8280,   3.0737,  -1.8782,  -8.7246,
          -0.8768,   2.8304,   2.1368],
        [  2.9119,  -6.6538,   4.9701,  -2.3167,   4.2497,  -2.5974, -11.4682,
          -1.0447,   3.7016,   2.9050],
        [  3.7168,  -8.3462,   6.1413,  -2.8053,   5.4258,  -3.3166, -14.2117,
          -1.2125,   4.5729,   3.6732],
        [  4.5216, -10.0387,   7.3126,  -3.2940,   6.6019,  -4.0358, -16.9552,
          -1.3804,   5.4441,   4.4413]], grad_fn=<AddmmBackward0>)
torch.Size([6, 10])
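The "inner dimension" rule in the comments above can be seen directly: the layer only cares about the last dimension of its input, so any leading (batch) dimensions pass through, while a mismatched last dimension fails. A short sketch; `linear_model`, `good_x`, `batched_x` and `bad_x` are illustrative names only.

```python
import torch

torch.manual_seed(32)
linear_model = torch.nn.Linear(in_features=4, out_features=10)

good_x = torch.rand(6, 4)         # last dimension 4 == in_features
batched_x = torch.rand(2, 6, 4)   # extra leading dimensions are passed through
bad_x = torch.rand(6, 3)          # last dimension 3 != in_features

print(linear_model(good_x).shape)     # torch.Size([6, 10])
print(linear_model(batched_x).shape)  # torch.Size([2, 6, 10])

raised = False
try:
    linear_model(bad_x)  # shapes (6, 3) and (4, 10) can't be multiplied
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")
```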



## **<font color="red">Reproducibility : (Trying to take the random out of random)</font>**


- As you learn more about neural networks and machine learning, you'll start to discover how much of a part randomness plays.
- How does this relate to neural networks and deep learning then?
- We've discussed that neural networks start with random numbers to describe patterns in data and try to improve those random numbers using tensor operations to better describe patterns in data.
- In short : `start with random numbers` ---> `tensor operations` ---> `try to make better (again and again......)`
- Although randomness is nice and powerful, sometimes you'd like there to be a little less of it. For example, say I create an algorithm capable of achieving `X` performance and now want to verify whether I'm right or wrong.
- That's where `reproducibility` comes into the picture: it lets me check that my work is heading in the right direction.
- Anyone running the same code on their computer should get the same results as I get on mine.
- Let's see a brief example of reproducibility in PyTorch.
- We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [4]:
random_tensor_a = torch.rand(size=(3, 4))
random_tensor_b = torch.rand(size=(3, 4))
print(f"tensor A : \n{random_tensor_a}\n")
print(f"tensor B : \n{random_tensor_b}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
print(random_tensor_a == random_tensor_b)

tensor A : 
tensor([[0.8529, 0.3920, 0.5805, 0.2238],
        [0.5989, 0.0382, 0.1198, 0.1159],
        [0.5589, 0.4112, 0.9977, 0.5187]])

tensor B : 
tensor([[0.9310, 0.2028, 0.6240, 0.4142],
        [0.9531, 0.6668, 0.5360, 0.1804],
        [0.0755, 0.3405, 0.7576, 0.2296]])

Does Tensor A equal Tensor B? (anywhere)
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


- Just as you might've expected, the tensors come out with different values. But what if you wanted to create two random tensors with the same values?
- As in, the tensors would still contain random values but they would be of the same flavour. That's where `torch.manual_seed(seed)` comes in, where `seed` is an integer that flavours the randomness.
- Let's try it out by creating some more *flavoured* random tensors.

In [5]:
# Set the random seed
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)

random_tensor_C = torch.rand(3, 4)

# Reset the seed before creating the next random tensor.
# Without this, random_tensor_D would be different from random_tensor_C
torch.random.manual_seed(seed=RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C : \n{random_tensor_C}\n")
print(f"Tensor D : \n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
print(random_tensor_C == random_tensor_D)

Tensor C : 
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D : 
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
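Seeding does not make every call return the same numbers; it fixes the *sequence* of numbers the generator produces. A minimal sketch of that idea (the names `first_run` and `second_run` are illustrative): two draws after one seed differ from each other, but re-seeding replays the same sequence.

```python
import torch

RANDOM_SEED = 42

# Seed once, then draw two tensors: these two draws differ from each other
torch.manual_seed(RANDOM_SEED)
first_run = [torch.rand(3, 4), torch.rand(3, 4)]

# Re-seeding with the same value replays exactly the same sequence of numbers
torch.manual_seed(RANDOM_SEED)
second_run = [torch.rand(3, 4), torch.rand(3, 4)]

print(torch.equal(first_run[0], first_run[1]))   # False
print(torch.equal(first_run[0], second_run[0]))  # True
print(torch.equal(first_run[1], second_run[1]))  # True
```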


- Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.
- However, there are some steps you can take to limit the number of sources of nondeterministic behavior for a specific platform, device, and PyTorch release. **First**, you can control sources of randomness that can cause multiple executions of your application to behave differently. **Second**, you can configure PyTorch to avoid using nondeterministic algorithms for some operations, so that multiple calls to those operations, given the same inputs, will produce the same result.
- **Controlling Sources of Randomness** :
  - **PyTorch Random Number Generator** :
  - You can use `torch.manual_seed()` to seed the RNG for all devices (both CPU and CUDA).
  - Some PyTorch operations may use random numbers internally. `torch.svd_lowrank()` does this, for instance. Consequently, calling it multiple times back-to-back with the same input arguments may give different results. However, as long as `torch.manual_seed()` is set to a constant at the beginning of an application and all other sources of nondeterminism have been eliminated, the same series of random numbers will be generated each time the application is run in the same environment.
  - It is also possible to obtain identical results from an operation that uses random numbers by setting `torch.manual_seed()` to the same value between subsequent calls.
  - **PyTorch**:
    <pre>
        import torch
        torch.manual_seed(0)
    </pre>
  - **Python** :
    <pre>
        import random
        random.seed(0)
    </pre>
  - **NumPy**:
    <pre>
        import numpy as np
        np.random.seed(0)
    </pre>
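The three seeding calls above are often bundled into one helper that runs at the start of a script. Here is a sketch under that assumption; `set_seeds` is a hypothetical helper name, not a PyTorch API. It also ties back to the linear layer: re-seeding before creating a layer gives identical initial weights.

```python
import random

import numpy as np
import torch


def set_seeds(seed: int = 42) -> None:
    """Hypothetical helper: seed the Python, NumPy and PyTorch RNGs in one call."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNG for all devices (CPU and CUDA)


# Re-seeding before creating each layer gives identical initial weights
set_seeds(42)
layer_a = torch.nn.Linear(in_features=2, out_features=6)
set_seeds(42)
layer_b = torch.nn.Linear(in_features=2, out_features=6)

print(torch.equal(layer_a.weight, layer_b.weight))  # True
print(torch.equal(layer_a.bias, layer_b.bias))      # True
```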
### **CUDA Convolution Benchmarking**
- The `cuDNN` library, used by **CUDA** convolution operations, can be a source of nondeterminism across multiple executions of an application. When a `cuDNN` convolution is called with a new set of size parameters, an optional feature can run multiple convolution algorithms, benchmarking them to find the fastest one. Then, the fastest algorithm will be used consistently during the rest of the process for the corresponding set of size parameters. Due to benchmarking noise and different hardware, the benchmark may select different algorithms on subsequent runs, even on the same machine.
- Disabling the benchmarking feature with `torch.backends.cudnn.benchmark = False` causes cuDNN to deterministically select an algorithm, possibly at the cost of reduced performance.
- However, if you do not need reproducibility across multiple executions of your application, then performance might improve if the benchmarking feature is enabled with `torch.backends.cudnn.benchmark = True`.
- **Note**: This setting is different from the `torch.backends.cudnn.deterministic` setting.
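Putting the configuration options together, a typical "make it deterministic" setup might look like the sketch below. `torch.use_deterministic_algorithms(True)` is the switch for the second step mentioned earlier (avoiding nondeterministic algorithms); with it enabled, operations that only have a nondeterministic implementation raise an error instead of running. As an assumption based on the PyTorch reproducibility notes: when using CUDA 10.2 or later, some cuBLAS operations additionally require the environment variable `CUBLAS_WORKSPACE_CONFIG` (for example `:4096:8`) to be set before the program starts.

```python
import torch

# Let cuDNN pick its convolution algorithm deterministically instead of benchmarking
torch.backends.cudnn.benchmark = False
# Separate flag: restrict cuDNN to deterministic convolution algorithms
torch.backends.cudnn.deterministic = True
# Error out on operations that only have nondeterministic implementations
torch.use_deterministic_algorithms(True)

print(torch.backends.cudnn.benchmark)                # False
print(torch.are_deterministic_algorithms_enabled())  # True
```

These flags trade some speed for repeatability, so they are usually only worth enabling when you actually need reproducible runs.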

## **<font color="red">Running tensors on GPUs (and making faster computations)</font>**
- Deep learning algorithms require a lot of numerical operations, and by default these operations are often done on a `CPU` (Central Processing Unit).
- However, there's another common piece of hardware called a `GPU` (Graphics Processing Unit), which is often much faster than a CPU at the specific type of operations neural networks need (matrix multiplications).
- Your computer might have one. If so, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.
- There are a few ways to, first, get access to a GPU and, second, get PyTorch to use it.
- **Note** : When I reference `GPU` throughout this course, I'm referencing an `Nvidia GPU with CUDA` enabled (CUDA is a computing platform and API that allows GPUs to be used for general-purpose computing, not just graphics) unless otherwise specified.
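To see the CPU/GPU difference on your own hardware, a rough timing sketch like the one below can help. It times matrix multiplications (not the convolution benchmark quoted later in this section), `time_matmul` is a hypothetical helper, and the numbers you get depend entirely on your machine. Because GPU kernels run asynchronously, `torch.cuda.synchronize()` is called so the timer measures finished work.

```python
import time

import torch


def time_matmul(device: str, size: int = 1024, runs: int = 10) -> float:
    """Hypothetical helper: total seconds for `runs` (size x size) matmuls on `device`."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    _ = a @ b  # warm-up run, excluded from the timing
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(runs):
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all timed GPU work to finish
    return time.perf_counter() - start


print(f"CPU (s) : {time_matmul('cpu'):.4f}")
if torch.cuda.is_available():
    print(f"GPU (s) : {time_matmul('cuda'):.4f}")
```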

### **1. Getting a GPU**
- You may already know what's going on when I say GPU. But if not, there are a few ways to get access to one.
  1. ***Google Colab*** : Free to use, almost zero setup required, and you can share your work with others as easily as sharing a link.
  2. ***Use your own*** : Run everything locally on your own machine.
  3. ***Cloud computing (AWS, GCP, Azure)*** : Small upfront cost, access to almost infinite compute.
- Time (s) to convolve a 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel), summed over ten runs:
  - *CPU (s)* : 3.8624
  - *GPU (s)* : 0.1083
  - *GPU* speedup over *CPU* : ~35x

- For more on choosing a GPU for deep learning, see:
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
- To check if you've got access to an Nvidia GPU, you can run `!nvidia-smi`, where the `!` (also called bang) means "run this on the command line".

In [6]:
!nvidia-smi

Wed Oct 25 12:36:18 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 532.09                 Driver Version: 532.09       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce MX330          WDDM | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0               N/A /  N/A|      0MiB /  2048MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

- If you don't have an Nvidia GPU accessible, the above will output something like:
  - NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
  - In that case, go back up and follow the install steps.
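The `!` bang syntax only works inside Jupyter/IPython. In a plain Python script, one way to run the same check is through the standard library, as sketched below; `nvidia_smi_output` is a hypothetical helper name, and it simply returns `None` when `nvidia-smi` isn't on the PATH or exits with an error.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_smi_output() -> Optional[str]:
    """Hypothetical helper: return the output of `nvidia-smi`, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # the Nvidia driver tools aren't installed or aren't on the PATH
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


output = nvidia_smi_output()
print(output if output is not None else "No Nvidia GPU/driver found")
```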

### **2. Getting PyTorch to run on the GPU**

- Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).
- To do so, you can use the `torch.cuda` package.
- You can test if PyTorch has access to a GPU using `torch.cuda.is_available()`.

In [7]:
torch.cuda.is_available()

False

- If the above outputs `True`, PyTorch can see and use the GPU. If it outputs `False`, it can't see the GPU, and in that case you'll have to go back through the installation steps.
- Now, let's say you wanted to set up your code so it runs on the CPU, or on the GPU if one is available. That way, if you or someone else decides to run your code, it'll work regardless of the computing device they're using.
- Let's create a `device` variable to store what kind of device is available.

In [8]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
print("device : ", device)


device :  cpu


- If the above outputs `"cuda"`, it means we can set all of our PyTorch code to use the available CUDA device (a GPU); if it outputs `"cpu"`, our PyTorch code will stick with the CPU.
  - **Note**: In PyTorch, it's best practice to write ***device agnostic code***. This means code that'll run on CPU (always available) or GPU (if available).
- If you want to do faster computing you can use a GPU but if you want to do *much* faster computing, you can use multiple GPUs.
- You can count the number of GPUs PyTorch has access to using `torch.cuda.device_count()`.
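- `torch.cpu.device_count()` returns the number of CPU *devices* PyTorch sees (always 1), not the number of CPU cores; the number of threads PyTorch uses for CPU work is reported by `torch.get_num_threads()`.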

In [9]:
# Count number of devices
print("cuda : ", torch.cuda.device_count())
print("cpu : ", torch.cpu.device_count())


cuda :  0
cpu :  1
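Beyond just counting devices, it can be handy to print what each visible GPU actually is. A small sketch using `torch.cuda.get_device_properties()`; `describe_devices` is a hypothetical helper, and on a CPU-only machine (like the one above) it simply returns an empty list.

```python
from typing import List

import torch


def describe_devices() -> List[str]:
    """Hypothetical helper: one readable line per visible CUDA device."""
    lines = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        lines.append(f"cuda:{index} - {props.name} ({props.total_memory / 1024**3:.1f} GB)")
    return lines


print("CUDA devices :", describe_devices() or "none")
print("CPU threads used by PyTorch :", torch.get_num_threads())
```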


### **3. Putting tensors (and models) on the GPU**
- `CPU` >---`tensor.to(device)`--->>> `GPU`
- You can put tensors (and models, we'll see this later) on a specific device by calling `to(device)` on them. Where `device` is the target device you'd like the tensor (or model) to go to.
- Why do this?
  - GPUs offer far faster numerical computing than CPUs do, and if a GPU isn't available, our ***device agnostic code*** will simply run on the CPU.
- **Note :** Putting a tensor on the GPU using `to(device)` (e.g. `<tensor_name>.to(device)`) returns a copy of that tensor, i.e. the original stays on the CPU while the copy lives on the GPU. (If the tensor is already on the target device, `to()` returns the original tensor without copying.) To overwrite tensors, reassign them:
  `<tensor_name> = <tensor_name>.to(device)`
- The tensor will move from the CPU to the GPU if a GPU is available; otherwise it stays on the CPU.

In [10]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor)
print("device : ", tensor.device)

# Move tensor to GPU (if available)
tensor_to_gpu = tensor.to(device)
print(tensor_to_gpu)
print("device : ", tensor_to_gpu.device)

tensor([1, 2, 3])
device :  cpu
tensor([1, 2, 3])
device :  cpu
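The same device-agnostic pattern applies to models: the layer's parameters and the input tensor must live on the same device, otherwise the forward pass fails. A sketch using the `torch.nn.Linear` layer from earlier. One difference worth knowing: `Tensor.to()` returns a new tensor, whereas `nn.Module.to()` moves the module's parameters in place (and returns the module), so reassigning the model is optional.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(42)
model = torch.nn.Linear(in_features=2, out_features=6)
model = model.to(device)  # moves weight and bias in place; reassigning is harmless

x = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
x = x.to(device)          # tensors must be reassigned: .to() returns a new tensor

output = model(x)         # works because model and x are on the same device
print(next(model.parameters()).device, x.device, output.device)
```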


### **4. Moving tensors back to the CPU**
- `GPU` >---`Tensor.cpu()`--->>> `CPU`
- Move the tensor back to CPU.
- For example, you'll want to do this if you want to interact with your tensors with NumPy (NumPy does not leverage the GPU).
- Let's try using the `torch.Tensor.numpy()` method on our `tensor_on_gpu`.

In [11]:
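# Move the tensor to the target device (the GPU, if available)
tensor_on_gpu = tensor.to(device)

# If the tensor is on the GPU, it can't be transformed to NumPy directly (this will error)
# tensor_on_gpu.numpy()   # TypeError when the tensor lives on a CUDA device
tensor_on_gpu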

tensor([1, 2, 3])

- Instead, to get a tensor back to the CPU and usable with NumPy, we can use `Tensor.cpu()`. This copies the tensor to CPU memory so it can be used with NumPy.

In [12]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()    # copies the GPU tensor to CPU memory; the original tensor stays on the GPU
tensor_back_on_cpu

array([1, 2, 3], dtype=int64)

- The above returns a copy of the GPU tensor in CPU memory, so the original tensor is still on the GPU.
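One more case you'll hit quickly: outputs of a layer (like the `grad_fn=<AddmmBackward0>` tensors earlier) require gradients, and calling `.numpy()` on them raises a `RuntimeError`. A sketch of the usual `detach().cpu().numpy()` chain, plus the reverse direction with `torch.from_numpy()`, which shares memory with the NumPy array rather than copying it.

```python
import numpy as np
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
output = linear(torch.tensor([[1., 2.], [3., 4.], [5., 6.]]))  # requires_grad=True

# output.numpy() would raise a RuntimeError because the tensor requires grad,
# so detach it from the autograd graph and move it to the CPU first
output_np = output.detach().cpu().numpy()
print(type(output_np), output_np.shape)  # <class 'numpy.ndarray'> (3, 6)

# Going the other way: torch.from_numpy() shares memory with the NumPy array
array = np.array([1, 2, 3], dtype=np.int64)
tensor_from_np = torch.from_numpy(array)
array[0] = 10                 # changing the array also changes the tensor
print(tensor_from_np)         # tensor([10,  2,  3])
```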