# Machine Learning Basic Module
Florian Walter, Tobias Jülg, Pierre Krack

## General Information About Implementation Assignments
We will use Jupyter Notebook for our implementation exercises. The task descriptions are provided in the notebook, and the code is also run there. However, the implementation itself is done in additional Python files, which the notebook imports. Please write your code only at the marked positions in these files. The content of such a file could, for example, look like this:
```python
def f():
    ########################################################################
    # YOUR CODE
    # TODO: Implement this function
    ########################################################################
    pass
    ########################################################################
    # END OF YOUR CODE
    ########################################################################
```
To complete the exercise, remove the `pass` statement and write your solution only inside the `YOUR CODE` block. All other lines in the file must stay unchanged for the solution to be valid.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from data_loading import load_data, compute_summary
from perceptron import Perceptron, fit
from plotting import plot_decision_boundary

# Assignment 1: The Perceptron
The Perceptron is one of the simplest forms of a neural network, often referred to as a single-layer binary classifier. It was introduced by Frank Rosenblatt in the late 1950s and can be thought of as a building block for larger neural networks. Essentially, the Perceptron is an algorithm that, given an input vector, can decide whether it belongs to one class or another.

We will use the Perceptron as an example to briefly introduce all major parts of a standard machine learning pipeline, including
- Loading the dataset
- Visualizing the dataset and prediction outcomes
- Implementing the model and auxiliary training code
- Evaluating the model's performance


## Loading the Dataset
[Pandas](https://pandas.pydata.org/) is widely used in ML. It has more functionality than we need here but we can still use it to visualize the data.
> **Task 1** Implement the `load_data()` function in [`data_loading.py`](./data_loading.py), then execute the next cell to display the loaded data.
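
As a rough illustration of what loading tabular data with pandas looks like, the sketch below reads a small CSV from an in-memory string instead of a file. The three sample rows are made up for this example; only the column names `x`, `y` and `label` follow the dataset described in this notebook.

```python
import io

import pandas as pd

# Made-up sample in the same column layout as the dataset (assumption).
csv_text = """x,y,label
-0.5,-0.2,-1
0.9,0.4,1
0.3,1.1,1
"""

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)
print(toy.dtypes)
```

The same call works with a file path in place of the `StringIO` object.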

In [1]:
data: pd.DataFrame = load_data("data.csv")
display(data)

As you can see, each row has an $x$ and a $y$ value as well as a label.
Now imagine you get a new datapoint $(x, y) = (-0.8, -0.3)$, which originates from the same underlying distribution as the dataset displayed above.
Which class does it belong to, $-1$ or $1$? This is the classification problem.

The table above seems to suggest that the elements with negative $x$ and $y$ values belong to the $-1$ class and vice versa.
But maybe the small excerpt shown above does not tell the full story.
We can further inspect the dataset by computing summary statistics.

> **Task 2** Implement the `compute_summary()` function in [`data_loading.py`](./data_loading.py). It should return a dictionary with keys `cnt`, `avg` and `std`, which contain summary statistics for the value counts, the mean value and the standard deviation, grouped by label. Use the pandas functions intended for this purpose. You can find a quick tutorial [here](https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html).
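
Grouped statistics in pandas follow a split-apply-combine pattern. The sketch below shows the pattern on a made-up frame with a hypothetical `group` column and a single `value` column. It is not the solution for `compute_summary()`, only an illustration of the `groupby` calls.

```python
import pandas as pd

# Made-up data; the column names are hypothetical.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

grouped = toy.groupby("group")["value"]

counts = grouped.count()   # number of rows per group
means = grouped.mean()     # mean per group
stds = grouped.std()       # sample standard deviation (ddof=1) per group

print(counts, means, stds, sep="\n\n")
```

Note that `std()` in pandas uses the sample standard deviation by default, which differs from NumPy's default population formula.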

In [None]:
summary = compute_summary(data)
print("Counts")
display(summary["cnt"])
print("Averages")
display(summary["avg"])
print("Standard deviation")
display(summary["std"])


We now have a better overview of the data and it is easy to say with high confidence that $(x, y) = (-0.8, -0.3)$ belongs to the class $-1$.
However, this is still not enough: what about the data point $(x, y) = (-1.4, 1.5)$?
It seems like it could plausibly belong to either $-1$ or $1$.
Let us visualize the data with [matplotlib](https://matplotlib.org/). Run the cell below and see if you find it easier to classify the data point $(-1.4, 1.5)$.

In [2]:
color = lambda label: "tab:blue" if label == -1 else "tab:orange"
plt.scatter(data["x"].array,
            data["y"].array,
            color = np.vectorize(color)(data["label"]),
)
plt.show()

## Predicting Labels with an Artificial Neuron
Plotting the data can help us decide where the point should go.
However, this approach cannot be automated: you have to decide visually for each point, whereas we want the computer to make that decision for us.
Additionally, the visualization approach breaks down if each row of your dataset has 10 values instead of two.

We can solve this problem with the Perceptron.

>**Task 3** Open [`perceptron.py`](./perceptron.py) and implement the `initialize_weights()`, `activation()`, `predict_forloop()` and `predict_vectorized()` methods.
>- Initialize the weights using numpy's random module
>- Use the sign function as activation function:
>$$
>\textrm{sign}(x) = \begin{cases}
>1,& \text{if } x \geq 0\\
>-1,& \text{if } x < 0
>\end{cases}
>$$
>- The `predict_forloop` method should use a Python `for` loop and resemble the following formula:
>$$A\left(\sum_{i=0}^1 w_i*x_i +b\right)$$
>- The `predict_vectorized` method should use numpy functions and resemble the following formula:
>$$A\left(w^Tx+b\right)$$
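
The two formulas compute the same pre-activation value. The sketch below checks this on made-up numbers; `w`, `x` and `b` are illustrative values, not parameters taken from `perceptron.py`, and the activation $A$ is left out.

```python
import numpy as np

w = np.array([0.5, -1.0])   # made-up weights
x = np.array([2.0, 1.0])    # made-up input
b = 0.25                    # made-up bias

# Explicit sum: w_0*x_0 + w_1*x_1 + b
z_loop = b
for w_i, x_i in zip(w, x):
    z_loop += w_i * x_i

# Vectorized form: w^T x + b
z_vec = w @ x + b

print(z_loop, z_vec)
```

The vectorized form is usually preferred because NumPy runs the multiplication and summation in compiled code instead of the Python interpreter.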

Once you are done, come back here and execute the next cell to see what your randomly initialized Perceptron predicts. Try running the cell multiple times to see what happens with different weights.

In [None]:
p = Perceptron()
p.initialize_weights()
colors = np.array(tuple("tab:blue" if p.predict(x, vectorized=True) == -1 else "tab:orange" for x in data[["x", "y"]].to_numpy()))
plt.scatter(data["x"].array, data["y"].array, color = colors)
plt.title(f"{p.weights=}")
plt.show()

# Plotting the Decision Boundary
As you can see, the Perceptron appears to make decisions based on whether a sample is positioned on one side or the other of a decision boundary.

>**Task 4** Open [`plotting.py`](./plotting.py) and implement the function `plot_decision_boundary()`.
>It takes as inputs a [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/axes_api.html) object and a Perceptron, whose weights and bias define the boundary.
>If you are new to matplotlib, read the [quick start guide](https://matplotlib.org/stable/users/explain/quick_start.html).
When you are done, run the next cell to check that your code is correct. The plotted line should cleanly separate the points into two groups with different colors.
Make sure that plotting the line does not change the scaling of the figure, i.e. do not plot outside the current `x_lim` and `y_lim` of the plot.

*Hint:* This exercise might be harder than it appears. If you plot the line manually, for example by computing the formula of the line and writing it in Python, you will have to handle several different cases (computers do not like division by zero). You can use the matplotlib [`contour`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contour.html) method, which handles these cases for you.
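
One way to read the hint: the decision boundary is the set of points where $w^Tx + b = 0$, so it is the zero-level contour of that function. The sketch below draws that contour on a grid spanning the current axis limits. The weights, bias and scatter points are made up, and the code is a standalone illustration, not the required `plot_decision_boundary()` implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

w = np.array([1.0, 0.0])   # made-up weights; w[1] == 0 gives a vertical line
b = -0.5                   # made-up bias

fig, ax = plt.subplots()
ax.scatter([-1.0, 0.0, 1.0, 2.0], [0.0, 1.0, -1.0, 0.5])

# Remember the limits so the grid stays inside the visible area.
x_min, x_max = ax.get_xlim()
y_min, y_max = ax.get_ylim()

xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
zz = w[0] * xx + w[1] * yy + b

# The boundary is where zz == 0; no division by w[1] is needed.
ax.contour(xx, yy, zz, levels=[0.0])

# Restore the original limits in case plotting changed them.
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
```

Because the line is never solved for $y$ explicitly, the vertical-line case needs no special handling.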

In [None]:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
p = Perceptron()
p.initialize_weights()
colors = np.array(tuple("tab:blue" if p.predict(x, vectorized=True) == -1 else "tab:orange" for x in data[["x", "y"]].to_numpy()))
ax.scatter(data["x"].array, data["y"].array, color = colors)
plot_decision_boundary(ax, p)

Visualizing data and algorithms is part of the ML workflow. It helps you understand algorithms and share your results, and is therefore an extremely useful tool to have.
Run the next cell to use your plotting code to gain a better understanding of the weights and bias in a perceptron. Feel free to run the cell several times and to change the three tuples `bias`, `w1` and `w2`.

>**Task 5** Discuss: Can you come up with a geometric interpretation of the weights and bias of the perceptron?

In [None]:
fig, axs = plt.subplots(3, 3)
# Reuses the Perceptron `p` from the previous cell. Uncomment to start fresh:
#p = Perceptron()
#p.initialize_weights()
bias = (-0.5, 0, 0.5)
w1 = (-1, -0.6666, -0.333)
w2 = (-0.5, 0, 0.5)
def predict_and_plot(ax, p, point_size):
    # Color each point by its predicted label, then draw the decision boundary.
    colors = np.array(tuple("tab:blue" if p.predict(x, vectorized=False) == -1 else "tab:orange" for x in data[["x", "y"]].to_numpy()))
    ax.scatter(data["x"].array, data["y"].array, color = colors, s=point_size)
    plot_decision_boundary(ax, p)
# Top row: vary only the bias (loop variable renamed so the tuple is not overwritten).
for ax, b in zip(axs[0], bias):
    bias_old = p.bias
    p.bias = b
    predict_and_plot(ax, p, 1)
    p.bias = bias_old
# Middle row: vary only the first weight.
for ax, w in zip(axs[1], w1):
    w_old = p.weights[0]
    p.weights[0] = w
    predict_and_plot(ax, p, 1)
    p.weights[0] = w_old
# Bottom row: vary only the second weight.
for ax, w in zip(axs[2], w2):
    w_old = p.weights[1]
    p.weights[1] = w
    predict_and_plot(ax, p, 1)
    p.weights[1] = w_old
# Adjust the spacing after all subplots are drawn.
fig.tight_layout()
plt.show()

# Training a Perceptron

Now that we can use our Perceptron to predict labels, we would like it to make the correct predictions.
For this we need to adapt (i.e. learn) its parameters (weights & bias).

>**Task 6** Open [`perceptron.py`](./perceptron.py) and implement `update_step()`, `train_epoch()` and `fit()`.

This task illustrates the core of machine learning and why it is called that. First, we define a mathematical object, or model (the Perceptron in our case), that can solve a problem (in our case, classifying data). Then, we find a rule that updates the parameters (weights and bias in our case) of that model such that it solves the problem correctly. The solution, i.e. the correct parameters, is not given by the programmer but found by a mathematical optimization process. When such models become complex, with hundreds, thousands or even billions of parameters (as is the case in modern large language models), it becomes hard to understand what exactly they base their decisions on. This is why machine learning models are often described as "black boxes", and why you can read articles claiming that scientists do not understand their machine learning models.
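
To make the idea of a parameter update rule concrete, the sketch below uses the classic Perceptron learning rule, $w \leftarrow w + \eta\, y\, x$ and $b \leftarrow b + \eta\, y$ for every misclassified sample. This is an assumption about the rule: the notebook does not spell out what `update_step()` should do, and the function names, toy data and learning rate here are made up.

```python
import numpy as np

def sign(z):
    # Sign activation with 0 mapped to +1.
    return np.where(z >= 0, 1, -1)

def train_epoch_sketch(w, b, X, y, eta):
    """One pass over the data with the classic Perceptron rule (hypothetical helper)."""
    for x_i, y_i in zip(X, y):
        if sign(w @ x_i + b) != y_i:      # update only on mistakes
            w = w + eta * y_i * x_i
            b = b + eta * y_i
    return w, b

# Made-up, linearly separable toy data.
X = np.array([[-1.0, -1.0], [-2.0, -0.5], [1.0, 1.0], [2.0, 0.5]])
y = np.array([-1, -1, 1, 1])

w, b = np.zeros(2), 0.0
for epoch in range(10):
    w, b = train_epoch_sketch(w, b, X, y, eta=0.1)

accuracy = np.mean(sign(X @ w + b) == y)
print(w, b, accuracy)
```

Each update moves the boundary toward classifying the offending sample correctly; samples that are already classified correctly leave the parameters untouched.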

In [None]:
p = Perceptron()
p.initialize_weights()
accuracies = fit(p, data[["x", "y"]].to_numpy(), data["label"].array, eta=0.0001, max_epochs=1000, stop_accuracy=0.999)
fig, axs = plt.subplots(2, 1)
predict_and_plot(axs[0], p, 10)
axs[1].plot(accuracies)
plt.show()
print(f"Final accuracy: {accuracies[-1]}\n{p.weights=}\n{p.bias=}")

# Hyperparameter Optimization
One aspect of the ML workflow that you have not yet encountered in this notebook is hyperparameter optimization.
Most ML algorithms have many hyperparameters (like $\eta$ in the Perceptron's case) with intricate effects.
Modifying these hyperparameters can affect the learning speed and the performance of an algorithm.
Finding good hyperparameters is usually automated in a process called hyperparameter optimization.
In the case of the Perceptron, we only have one hyperparameter. We can therefore avoid the complexity of automatic hyperparameter search and find a good value manually.
In the previous example we have set $\eta$ to $0.0001$.
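
A manual search can be as simple as a loop over candidate values. The sketch below counts how many epochs a toy Perceptron needs to reach 100 percent accuracy for a few values of $\eta$. Everything in it is an assumption made for illustration: the helper `epochs_to_converge`, the classic update rule, the fixed starting weights and the toy data are not taken from `perceptron.py`.

```python
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)   # 0 maps to +1

def epochs_to_converge(X, y, eta, max_epochs=1000):
    """Hypothetical helper: epochs until 100 % training accuracy."""
    w, b = np.array([-1.0, -1.0]), 0.0   # fixed, deliberately bad start
    for epoch in range(1, max_epochs + 1):
        for x_i, y_i in zip(X, y):
            if sign(w @ x_i + b) != y_i:   # classic rule: update on mistakes
                w = w + eta * y_i * x_i
                b = b + eta * y_i
        if np.all(sign(X @ w + b) == y):
            return epoch
    return max_epochs   # did not converge within the budget

# Made-up, linearly separable toy data.
X = np.array([[-1.0, -1.0], [-2.0, -0.5], [1.0, 1.0], [2.0, 0.5]])
y = np.array([-1, -1, 1, 1])

result = {eta: epochs_to_converge(X, y, eta) for eta in (0.01, 0.1, 1.0)}
for eta, epochs in result.items():
    print(f"eta={eta}: {epochs} epochs")
```

The starting weights are fixed on purpose so that differences between runs come from $\eta$ alone and not from random initialization.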

> **Task 7** Find a better value for $\eta$. The perceptron should be able to classify with 100 percent accuracy after a few epochs. You can modify the value of $\eta$ by modifying `ETA` in [`perceptron.py`](./perceptron.py).

In [None]:
p = Perceptron()
p.initialize_weights()
accuracies = fit(p, data[["x", "y"]].to_numpy(), data["label"].array, max_epochs=1000, stop_accuracy=0.999)
plt.plot(accuracies)
plt.xlabel("Epochs")
plt.ylabel("Performance")
plt.show()
print(f"{accuracies=}, {p.weights=}, {p.bias=}")

>**Task 8** Share the weights your algorithm found with each other and compare your results. Did you all find the same weights? Are some weights better than others? Are there unique optimal weights?

# Another Simple Dataset
In the following we define a very simple minimal dataset consisting of only four data points and train a perceptron on it.

>**Task 9** Discuss: Which logical operator is behind this dataset? Does the Perceptron algorithm work for this dataset? What property of this dataset causes the Perceptron algorithm to fail?

Hint: Plotting the dataset & the decision boundary will help you.
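
A complementary check that does not depend on training: the sketch below evaluates every combination of weights and bias on a coarse grid and reports the best accuracy any of them reaches on the four points. The grid range and resolution are arbitrary choices made for this illustration, and the variable names `points` and `targets` are used only to avoid clashing with the cell below.

```python
import itertools

import numpy as np

points = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = np.array([-1, 1, 1, -1])

grid = np.linspace(-2.0, 2.0, 9)   # candidate values for w_0, w_1 and b
best_accuracy = 0.0
for w0, w1, b in itertools.product(grid, repeat=3):
    z = points @ np.array([w0, w1]) + b
    predictions = np.where(z >= 0, 1, -1)   # sign activation, 0 maps to +1
    best_accuracy = max(best_accuracy, np.mean(predictions == targets))

print(best_accuracy)
```

Comparing this number with the accuracy curve from training helps separate a limitation of the model from a problem with the training procedure.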

In [None]:
data = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
labels = np.array([-1, 1, 1, -1])
p = Perceptron()
p.initialize_weights()
accuracies = fit(p, data, labels, max_epochs=1000, stop_accuracy=0.999)
plt.plot(accuracies)
plt.show()