# Day 13 - Introducing deep learning and the PyTorch Library

## The deep learning revolution

* Before the deep learning revolution, much of machine learning focused on $feature\ engineering$
* Deep learning enabled learned features to be used, instead of requiring practitioners to engineer them manually
* Automatically learned features are often better than manually engineered ones
* What is needed for deep learning:
    * A way to ingest data
    * A definition of the deep learning machine
    * A way to train the machine
* The goal of training is to drive the $criterion$ (loss function) lower

## PyTorch for deep learning

* PyTorch is an excellent library for introducing deep learning, as it is clear, streamlined, popular, and easy to debug
* The core data structure in PyTorch is the $tensor$, which is a multidimensional array similar to NumPy arrays
* Along with these, PyTorch comes with tools to accelerate mathematical operations on dedicated hardware, like GPUs
* The aim of the book is to cover enough ground to allow solving real-world problems and understanding new models as they pop up on arXiv
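
A minimal sketch of the tensor ideas above, assuming `torch` and `numpy` are installed. The variable names and values are made up for illustration, and the code falls back to the CPU when no CUDA device is found.

```python
import numpy as np
import torch

# A 2 x 3 tensor built from nested Python lists.
points = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Shape inspection and indexing work much like they do in NumPy.
shape = tuple(points.shape)
first_row = points[0]

# Round trip through NumPy; on the CPU, from_numpy shares memory with the array.
array = np.ones((2, 3), dtype=np.float32)
from_array = torch.from_numpy(array)
total = points + from_array

# Use a GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
total_on_device = total.to(device)
```

The same arithmetic runs unchanged on either device; only the `.to(device)` call differs.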

## Why PyTorch?

* To solve problems in practice, we need tools that are flexible, efficient, and able to cope with variability in the input data
* PyTorch is simple and Pythonic
* It provides two key features:
    * Acceleration via GPU
    * Numerical optimization for mathematical expressions
* These extend beyond deep learning into high-performance scientific computing in general
* PyTorch is very expressive, avoiding undue complexity
* In deep learning, it provides one of the most seamless transitions from idea to code
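
To make the numerical optimization feature concrete, here is a small sketch that minimizes $f(w) = (w - 3)^2$ by plain gradient descent, with autograd supplying the derivative. The function, the learning rate, and the step count are illustrative assumptions, not something taken from the book.

```python
import torch

# Minimize f(w) = (w - 3)^2; autograd computes df/dw for us.
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # illustrative value

for _ in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()                      # fills w.grad with 2 * (w - 3)
    with torch.no_grad():
        w -= learning_rate * w.grad      # update without recording it in the graph
    w.grad.zero_()                       # gradients accumulate, so reset them

final_w = w.item()  # ends up close to the minimizer, 3.0
```

Nothing here is specific to neural networks, which is why the same machinery is useful for scientific computing in general.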

### The deep learning competitive landscape

* The release of PyTorch marked the beginning of a unification in the deep learning space, which was previously composed of many tools and libraries
* In industry today, most technology is built using PyTorch, TensorFlow, or Hugging Face
* Hugging Face is an application-oriented, high-level wrapper that allows users to share pre-trained models

## An overview of how PyTorch supports deep learning projects

* PyTorch provides tensors and ways to operate on them, on the CPU or the GPU
* Moving from CPU to GPU, or back, is usually extremely simple
* Tensors keep track of the operations performed on them, which autograd, the automatic differentiation engine, uses to compute gradients
* The typical workflow is to load data, train a model, and then deploy the model
* The building blocks for neural networks, including layers, activation functions, and loss functions, reside in `torch.nn`
* The bridge between our raw data and PyTorch's tensors is the `Dataset` class from `torch.utils.data`
* Loading this data quickly during training is done with the `DataLoader` class
* Training is usually done in a simple `for` loop
* In the loop, the model is evaluated on a batch of samples from the data loader
* The model's output is compared to the desired output, using our $loss\ function$ or $criterion$
* Using autograd and an $optimizer$ from `torch.optim`, the model is then adjusted to produce the desired output
* As it is becoming increasingly common to use multiple GPUs, or even multiple machines, `torch.distributed` provides functionality for this
* For the $trained$ model to be useful, it needs to be $deployed$
* We can export the model by serializing it into $TorchScript$, or exporting it in the [$ONNX$](https://onnx.ai/) format
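
The pieces above can be put together in a short sketch: a `Dataset`, a `DataLoader`, a model and loss from `torch.nn`, an optimizer from `torch.optim`, and a `for` loop. The `LineDataset` class, the toy data (points on $y = 2x + 1$), and all hyperparameters are hypothetical choices made for illustration. Distributed training and export are left out.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class LineDataset(Dataset):
    """Hypothetical toy dataset: noise-free points on y = 2x + 1."""

    def __init__(self, n_samples=64):
        self.x = torch.linspace(-1.0, 1.0, n_samples).unsqueeze(1)
        self.y = 2.0 * self.x + 1.0

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]


torch.manual_seed(0)  # make shuffling and weight init repeatable
loader = DataLoader(LineDataset(), batch_size=16, shuffle=True)
model = nn.Linear(1, 1)                 # a single learnable weight and bias
loss_fn = nn.MSELoss()                  # the criterion
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(50):
    running = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()             # clear gradients from the last step
        outputs = model(inputs)           # evaluate the model on a batch
        loss = loss_fn(outputs, targets)  # compare with the desired output
        loss.backward()                   # autograd computes the gradients
        optimizer.step()                  # adjust the parameters
        running += loss.item()
    epoch_losses.append(running / len(loader))
```

After training, `model.weight` and `model.bias` should sit near 2 and 1, and `epoch_losses` should fall over time, which is the criterion being driven lower.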

## Hardware and software requirements

* Running a pretrained network is doable on a PC, or even a laptop
* Training, however, takes a long time, because the training loop has to run a very large number of times
* For this, a CUDA-capable GPU brings at least an order of magnitude of speedup, usually 40-50x
* Larger networks may take hours or days to train
* This can be reduced by using a high-end GPU, using multiple GPUs, or even using multiple machines with multiple GPUs
* The latter is less prohibitive than it sounds, due to cloud computing providers 

### Using Jupyter Notebooks

* This is a Jupyter notebook
* I clearly don't need to take notes on this

## Exercises