# Week G

ML review, Ingredients for ML models, Intro to Neural Networks

## Review

We've seen a couple of different methods of doing classic machine learning for classification, regression and clustering tasks.

Despite operating on different types of data, using different algorithms and serving different purposes, all of the methods we've seen follow a common set of operations, or pipeline:

<br>
<img src="./imgs/ML_00.jpg" width=800px />

- **Data**: we start with a collection of files or numbers that we process, analyze, visualize and study to understand their content and relationships. We then split this data into $2$ separate datasets, one for training our algorithm and another to test how it performs on data it hasn't seen.

- **Algorithm**: these are the mathematical operations that get performed on the data to extract patterns and relationships between our data points. The algorithm chosen depends on the type of task we are trying to accomplish.

- **Cost Function**: in order for the algorithm to learn anything we have to guide it by telling it how close it gets to correct answers. This is particularly important for supervised learning, where the algorithm gets the data to operate on and the correct answer for the task. Some algorithms have built-in cost functions, others take a cost function as a parameter, but either way, this is the function that the algorithm uses to adjust its parameters.

- **Evaluation Function**: once our algorithm builds a model from the training data, the evaluation function is what we use to measure how well the model performs on the test dataset. The evaluation function can be the same as the cost function, but is usually a little bit more legible. Where the cost function can be a complex mix of formulas, each meant to help the algorithm navigate tradeoffs when picking parameters, the evaluation function is meant to validate whether our choice of algorithm, cost function and data is sufficient for our overall goals.
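
The four components above can be sketched in a few lines. This is only an illustration, assuming scikit-learn and NumPy are installed. The data is synthetic and the `test_size` and `random_state` values are arbitrary choices, not something prescribed by the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Data: synthetic, made up for this sketch (y is roughly 3x + 2 plus noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

# Split into 2 datasets: one for training, one for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Algorithm: LinearRegression comes with a built-in cost function (squared error)
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluation function: measure the error on data the model hasn't seen
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, test MSE={test_mse:.3f}")
```

Here the evaluation function (MSE) happens to match the model's cost function. For a classifier we would more likely evaluate with something like accuracy or precision.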

<br>
<img src="./imgs/ML_01.jpg" width=800px />

Some examples of each of these components:
- **Data Processing**:
  - [Encoding](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
  - [Scaling](https://scikit-learn.org/stable/modules/preprocessing.html)
  - [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

- **Cost Function**:
  - [Distance](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.DistanceMetric.html)
  - [MSE](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.mean_squared_error.html)
  - [Class Likelihood](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.class_likelihood_ratios.html)

- **Algorithm**:
  - [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)
  - [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#random-forests-and-other-randomized-tree-ensembles)
  - [SVMs](https://scikit-learn.org/stable/modules/svm.html)
  - [Clustering](https://scikit-learn.org/stable/modules/clustering.html)

- **Evaluation Function**:
  - [Accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)
  - [Precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html)
  - [Other metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)

<br>
<img src="./imgs/ML_02.jpg" width=800px />

## Training Models

We've seen this a few times now, and it was usually in the form of a function called `fit()`.

Training, or fitting, is the process by which the algorithm combines our data and our cost function to produce a model.

The process can be summarized like this:

<br>
<img src="./imgs/training_00.jpg" width=800px />

Every model prediction is compared to the correct output value available in the training data, and the difference between prediction and true value is used to adjust the model's mathematical parameters.
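
As a rough sketch of that loop, here is a model with a single parameter `w` being adjusted step by step. The data, the `learning_rate` value and the number of steps are all made up for illustration, and the update rule shown (gradient descent) is one common way of using the error, not necessarily what every `fit()` function does internally.

```python
import numpy as np

# Toy training data, made up for this sketch: y = 2x exactly
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = 2.0 * x

w = 0.0               # the model's single parameter, starting from a bad guess
learning_rate = 0.01  # how big each adjustment is (an arbitrary choice)

losses = []
for step in range(200):
    y_pred = w * x                      # model prediction
    error = y_pred - y_true             # difference between prediction and true value
    losses.append(np.mean(error ** 2))  # cost function: mean squared error
    gradient = np.mean(2 * error * x)   # how the cost changes as w changes
    w -= learning_rate * gradient       # adjust the parameter to reduce the cost

print(f"w={w:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```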

If we were to visualize this process over time for a linear regression model and a clustering model, we might see something like this, where the model's performance improves as it uses more data to adjust its parameters:

<br>
<img src="./imgs/training_01.jpg" width=800px />
<br>
<img src="./imgs/training_02.jpg" width=800px />


Some of the models we've seen so far have "_closed-form_" solutions. This means that they're able to look at all of the training data at the same time and, using basic algebraic operations $(+$, $-$, $\times$, $\div)$ and matrix algebra, come up with optimal parameters almost instantaneously. This is the case for `PCA` and `Linear Regression` models.

This is both a strength and a weakness of these types of models. They train really fast and work really well, as long as we're working with not-so-large datasets for specific tasks. As the variation in our data becomes too large, and the relations we are trying to model grow in complexity, these models start to fall short in quality, or become impractical.
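
For linear regression, a closed-form solution can be sketched with NumPy as below. The data is synthetic, and this is the textbook normal equation rather than a claim about how any particular library implements its `fit()`.

```python
import numpy as np

# Toy data, made up for this sketch: y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
y = 1 + 2 * features[:, 0] - 3 * features[:, 1]

# Add a column of ones so the first parameter acts as the intercept
X = np.column_stack([np.ones(len(features)), features])

# Closed-form solution (normal equation): solve (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta.round(3))
```

There is no loop and no step-by-step adjustment: all of the training data goes into one set of matrix operations, and the parameters come out directly.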

## Neural Networks

Neural networks have been the most popular method for creating models that can handle very diverse types of data, while not requiring a lot of customization and manual parameter-tuning.

The process for training a neural network is exactly the same:

<br>
<img src="./imgs/nn_00.jpg" width=800px />

Except, instead of using very specific matrix operations and algebraic expressions $(+$, $-$, $\times$, $\div)$, all of the calculations (processing, predictions, etc) are done using A LOT of very generic, very simple, computational elements called "_neurons_" or "_perceptrons_":

<br>
<img src="./imgs/nn_01.jpg" width=800px />

### Neurons

These are supposed to mimic how actual brain neurons work: they fire and propagate signals depending on the combination and strength of signals present at their inputs.

The operation of a single neuron is quite simple:

<br>
<img src="./imgs/nn_02.jpg" width=800px />

They first perform a weighted sum of their input signals:

$\displaystyle Z = w_A \cdot A + w_B \cdot B + w_C \cdot C + ... $

and then, an _activation function_ determines if and how the neuron fires, based on this weighted sum and a threshold value.

This looks a lot like the function that we try to optimize during linear regression:

$\displaystyle Y = \beta_0 \cdot x_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + ... $

The difference is that before we had $N$ parameters for $N$ features, and with a neural network we'll have around $N$ parameters per node and about $N$ nodes (one for each feature), so roughly $N \times N$ parameters in total.
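
A single neuron can be sketched in a few lines of NumPy. The `step()` activation, its `threshold` default and the input and weight values are hypothetical choices for illustration. Real networks usually use smoother activation functions.

```python
import numpy as np

def step(z, threshold=0.0):
    """Activation function: fire (1.0) only if the weighted sum passes the threshold."""
    return 1.0 if z > threshold else 0.0

def neuron(inputs, weights, activation=step):
    z = np.dot(weights, inputs)  # weighted sum: w_A*A + w_B*B + w_C*C + ...
    return activation(z)

inputs = np.array([1.0, 0.5, -1.0])  # A, B, C
weights = np.array([0.8, 0.4, 0.3])  # w_A, w_B, w_C

# z = 0.8 + 0.2 - 0.3 = 0.7, above the threshold, so the neuron fires
print(neuron(inputs, weights))

# z = -1.0, below the threshold, so the neuron stays silent
print(neuron(inputs, np.array([-1.0, 0.0, 0.0])))

# A layer of N neurons over N features needs one weight per (neuron, feature) pair
N = 3
layer_weights = np.zeros((N, N))
print(layer_weights.size)  # N * N parameters
```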

### Networks

Despite this increase in the number of parameters, neural networks are beneficial because they are modular, and relatively simple to extend when needed.

Where previously we needed to know very specific strategies for dealing with different types of problems (adding non-linear functions to linear regression, or dimensionality reduction with PCA, or normalization), with neural networks, we can just add more layers and more nodes:

<br>
<img src="./imgs/nn_03.jpg" width=800px />
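
To give a feel for what "adding more layers and more nodes" looks like in code, here is a minimal sketch using PyTorch's `torch.nn` module. The layer sizes and the choice of `ReLU` activations are arbitrary assumptions made for this example.

```python
import torch
from torch import nn

# Sizes below are arbitrary choices for this sketch
n_features, n_hidden, n_outputs = 4, 8, 3

small_net = nn.Sequential(
    nn.Linear(n_features, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, n_outputs),
)

# Same recipe, with one extra hidden layer
deeper_net = nn.Sequential(
    nn.Linear(n_features, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, n_outputs),
)

def count_parameters(net):
    return sum(p.numel() for p in net.parameters())

batch = torch.rand(5, n_features)  # 5 made-up records
print(small_net(batch).shape, count_parameters(small_net))
print(deeper_net(batch).shape, count_parameters(deeper_net))
```

Both networks take the same inputs and produce outputs of the same shape. The deeper one simply has more parameters to adjust during training.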

Another benefit of neural networks is that the same architecture can be used for different tasks by just changing the activation function of the nodes in the last layer.

For regression tasks we remove any activation function and just use the node's weighted sum. For classification tasks, we use something called a `softmax()` function that turns the weighted sums of the output nodes into likelihood values that add up to $1$:

<br>
<img src="./imgs/softmax_00.jpg" width=800px />
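
A minimal NumPy sketch of `softmax()`, assuming $3$ output nodes with made-up weighted sums:

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))  # subtracting the max avoids overflow; the result is unchanged
    return exps / exps.sum()

weighted_sums = np.array([2.0, 1.0, 0.1])  # made-up outputs of 3 last-layer nodes
probabilities = softmax(weighted_sums)

print(probabilities.round(3))
print(probabilities.sum())     # the values add up to 1 (up to rounding)
print(probabilities.argmax())  # index of the most likely class
```

Larger weighted sums get larger likelihoods, and the class with the largest value is the model's prediction.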

Other tasks, like image detection, segmentation, generation or encoding, just use a combination of these two types of output nodes.

### Training Neural Networks

Training neural networks is a process similar to the `fit()` step we've seen in other models:

<br>
<img src="./imgs/nn_00.jpg" width=800px />

But, because they tend to have A LOT more parameters, training takes A LOT more data and A LOT more time.

<br>
<img src="./imgs/nn_04.jpg" width=800px />

The cost function is still responsible for starting the process of adjusting the parameters in a neural network, using a method called _Backpropagation_.

<br>
<img src="./imgs/nn_05.jpg" width=800px />

For every record (or batch of records) in the training dataset, the cost function informs each node how its parameters could be changed in order to decrease the error on the output:

<br>
<img src="./imgs/nn_06.jpg" width=800px />
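
A single backpropagation step can be sketched with PyTorch's automatic gradients. This example uses one node, one made-up record, a squared-error cost and an arbitrary `learning_rate`. A real network repeats this for every node and for many records.

```python
import torch

# One made-up record: 3 input features and the correct answer
x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor(10.0)

# A single node's parameters; requires_grad asks PyTorch to track gradients
w = torch.tensor([0.5, -0.5, 0.1], requires_grad=True)
learning_rate = 0.01  # arbitrary choice for this sketch

def cost(weights):
    y_pred = (weights * x).sum()   # the node's weighted sum
    return (y_pred - y_true) ** 2  # squared error

loss_before = cost(w)
loss_before.backward()  # backpropagation: fills in w.grad

with torch.no_grad():
    w -= learning_rate * w.grad  # nudge each weight against its gradient

loss_after = cost(w)
print(loss_before.item(), loss_after.item())
```

`w.grad` holds one value per weight, telling us in which direction (and how strongly) that weight should move to decrease the error.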

### A Bit More Detail

<img src="./imgs/slides_00.jpg" width=800px />

<a href="https://docs.google.com/presentation/d/1ppf-nxKS9QKvuNrx37SVkiAs4nk8qJv9JZz7WhhwjFo/">SLIDES</a>

## Tensors

We'll be using the [PyTorch](https://pytorch.org/) library for working with Neural Networks.

Before we start building, training and tuning models, we have to learn a little bit about [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)!

Tensor is a fancy word for a multi-dimensional list. Tensors are very much like lists: they keep a sequence of number values, or a sequence of other tensors. They are a little bit more picky than lists because they require all members to be of the same _type_ (all integers, or all floats, etc.), and they don't like having inner lists of different lengths.

PyTorch tensors are optimized for doing neural network operations, and so they come with a few extra capabilities beyond `sum()`, `sort()`, `mean()`, etc.
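
A quick sketch of those rules, using small made-up values:

```python
from torch import tensor

# A 2 x 3 tensor: a list of 2 inner lists with 3 numbers each
t = tensor([[1, 2, 3], [4, 5, 6]])
print(t.shape, t.dtype)

# All members share one type: mixing ints and floats converts everything to float
mixed = tensor([1, 2.5, 3])
print(mixed.dtype)

# Inner lists of different lengths are rejected
ragged_error = None
try:
    tensor([[1, 2, 3], [4, 5]])
except (ValueError, TypeError) as err:
    ragged_error = err
    print("can't build a ragged tensor:", err)

# The familiar list-style operations are available as methods
# (mean() needs float values, so we convert first)
print(t.sum(), t.float().mean(), tensor([3, 1, 2]).sort().values)
```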

Let's start by importing them, and taking a look at how to work with multi-dimensional tensors:

In [None]:
!wget -q https://github.com/DM-GY-9103-2024F-H/9103-utils/raw/main/src/image_utils.py

In [None]:
from torch import zeros_like, tensor

from image_utils import open_image, make_image

### Loading and Shaping

Let's open up an image and load its pixels into a tensor.

In [None]:
mimg = open_image("./data/arara.jpg")

display(mimg)
print(mimg.pixels[:5])

We just have to pass the list of pixels to the `tensor()` constructor.

We can check its size with the `shape` member variable, and use slicing and indexing like we've always used with lists:

In [None]:
mimg_t = tensor(mimg.pixels)
mimg_t.shape, mimg_t[:5], mimg_t[5], mimg_t[5][0]

The shape of this tensor is $607,500 \times 3$, meaning that we have $607,500$ pixels and each pixel has $3$ color values.

Let's reshape the tensor so it's more representative of our image's dimensions. We want to have a tensor of shape $h \times w \times 3$, where $h$ and $w$ are the image's `height` and `width` dimensions.

The `reshape()` function does just this. We just have to pass the parameters in the right order: rows (`height`) first, then columns (`width`), then colors.

In [None]:
mimg_t = tensor(mimg.pixels).reshape(mimg.size[1], mimg.size[0], 3)

mimg_t.shape, mimg_t[:5].shape, mimg_t[:5], mimg_t[0][5], mimg_t[0, 5]

Now `mimg_t[:5]` doesn't refer to the first $5$ pixels anymore, but to the first $5$ rows of our image.

To get the first $5$ pixels we can use `mimg_t[0][:5]` or `mimg_t[0, :5]`.

That's new syntax: multiple numbers inside the square brackets, separated by commas.

In [None]:
mimg_t[0][:5]
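
The same reshaping and indexing is easier to follow on a tiny example. The $2 \times 3$ "image" below is made up for illustration and doesn't use the `image_utils` helpers.

```python
from torch import tensor

# A made-up "image" with 2 rows and 3 columns, stored as a flat list of 6 RGB pixels
pixels = [[255, 0, 0], [0, 255, 0], [0, 0, 255],
          [10, 10, 10], [20, 20, 20], [30, 30, 30]]
height, width = 2, 3

flat_t = tensor(pixels)                     # shape: 6 x 3
image_t = flat_t.reshape(height, width, 3)  # shape: 2 x 3 x 3 (rows, columns, colors)

print(flat_t.shape, image_t.shape)
print(image_t[1])      # the whole second row: 3 pixels
print(image_t[1][0])   # first pixel of the second row
print(image_t[1, 0])   # same pixel, using the comma syntax
print(image_t[0, :2])  # first 2 pixels of the first row
```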

### Slicing

This is where it starts to get fun.

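Since we can index several dimensions at once, we can crop a rectangular region of the image by slicing its rows and columns in a single expression: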

In [None]:
# top-left corner of the crop
x0, y0 = 240, 30

# rows (y) come first, then columns (x): a 256 x 256 crop
mimg_crop_t = mimg_t[y0:y0+256, x0:x0+256]

mimg_crop_t.shape, mimg_crop_t[0,:5]

In [None]:
mimg_crop = make_image(mimg_crop_t)
display(mimg_crop)
mimg_crop.pixels[:5]

In [None]:
# keep only the red channel: copy the crop, then zero out green (1) and blue (2)
mimg_crop_r_t = mimg_crop_t.clone()
mimg_crop_r_t[:, :, 1:3] = 0

mimg_crop_r_t[0,:5]

In [None]:
display(make_image(mimg_crop_r_t))
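
The cropping and channel slicing above can be checked on a tiny tensor whose values are easy to read. In this made-up example each pixel stores its own row and column, so we can see exactly which pixels a slice picks up.

```python
from torch import tensor

# A made-up 3 x 4 "image" whose pixels encode their own position: [row, column, 9]
img_t = tensor([[[r, c, 9] for c in range(4)] for r in range(3)])

# Crop: rows 1 and 2, columns 1 and 2 (rows come first, then columns)
crop_t = img_t[1:3, 1:3]
print(crop_t.shape)
print(crop_t[0, 0])  # the pixel that was at row 1, column 1

# clone() makes an independent copy, so edits don't change the original
red_only_t = crop_t.clone()
red_only_t[:, :, 1:3] = 0  # every row, every column, channels 1 and 2
print(red_only_t[0, 0], crop_t[0, 0])
```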

In [None]:
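# keep only the green channel: zero out red (0) and blue (2)
mimg_crop_g_t = mimg_crop_t.clone()
mimg_crop_g_t[:, :, 0] = 0
mimg_crop_g_t[:, :, 2] = 0

# keep only the blue channel: zero out red (0) and green (1)
mimg_crop_b_t = mimg_crop_t.clone()
mimg_crop_b_t[:, :, 0:2] = 0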

In [None]:
display(make_image(mimg_crop_r_t))
display(make_image(mimg_crop_g_t))
display(make_image(mimg_crop_b_t))

In [None]:
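# start from the green channel
mimg_crop_rgb_t = mimg_crop_g_t.clone()

# add the red channel shifted 32 pixels to the right,
# and the blue channel shifted 32 pixels to the left
mimg_crop_rgb_t[:, 32:] += mimg_crop_r_t[:, :-32]
mimg_crop_rgb_t[:, :-32] += mimg_crop_b_t[:, 32:]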

display(make_image(mimg_crop_rgb_t))

In [None]:
display(make_image(mimg_crop_t[:,:,0]))
display(make_image(mimg_crop_t[:,:,1]))
display(make_image(mimg_crop_t[:,:,2]))

In [None]:
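# how much more red than green each pixel has
(mimg_crop_t[:,:,0] - mimg_crop_t[:,:,1])

In [None]:
# True where red exceeds green by more than 100, False everywhere else
(mimg_crop_t[:,:,0] - mimg_crop_t[:,:,1]) > 100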

In [None]:
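# True where red exceeds green (or blue) by more than 80
rgtg_idx = (mimg_crop_t[:,:,0] - mimg_crop_t[:,:,1]) > 80
rgtb_idx = (mimg_crop_t[:,:,0] - mimg_crop_t[:,:,2]) > 80

# "red" pixels pass both tests; ~ flips True and False
red_idx = rgtg_idx & rgtb_idx
not_red_idx = ~red_idx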

In [None]:
# copy the crop, then black out every pixel that isn't red
mimg_idx_bool_t = mimg_crop_t.clone()
mimg_idx_bool_t[not_red_idx] = tensor((0,0,0))

display(make_image(mimg_idx_bool_t))

In [None]:
# same result the other way around: start from black and copy only the red pixels
mimg_idx_bool_t = zeros_like(mimg_crop_t)
mimg_idx_bool_t[red_idx] = mimg_crop_t[red_idx]

display(make_image(mimg_idx_bool_t))
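
The boolean indexing used in the last few cells is easier to inspect on a tiny tensor. The $2 \times 2$ "image" below is made up and has exactly one reddish pixel. The threshold of $80$ is the same one used above.

```python
from torch import tensor, zeros_like

# A made-up 2 x 2 "image": one reddish pixel, three that are not
img_t = tensor([[[200, 30, 40], [50, 60, 70]],
                [[90, 200, 10], [120, 110, 100]]])

# Comparisons produce a tensor of booleans, one per pixel
r_gt_g = (img_t[:, :, 0] - img_t[:, :, 1]) > 80
r_gt_b = (img_t[:, :, 0] - img_t[:, :, 2]) > 80
red_mask = r_gt_g & r_gt_b  # & combines masks, ~ inverts them
print(red_mask)

# Indexing with the mask selects only the pixels where it is True
only_red_t = zeros_like(img_t)
only_red_t[red_mask] = img_t[red_mask]
print(only_red_t)
```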