# Machine Learning - Supervised Methods
# The Learning Problem

## 1. Data Handling

* **a) Load the "banana" data set provided in the Moodle course into NumPy arrays for use with scikit-learn as follows:**

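```python
import numpy as np

def loadCSV(filename):
    # Each row holds the label in the first column and the input features in the remaining columns.
    # Passing the file name lets np.loadtxt open and close the file itself.
    data = np.loadtxt(filename, delimiter=',')
    X = data[:, 1:]
    y = data[:, 0]
    return X, y

X, y = loadCSV("banana.csv")
```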

**Output the number $N$ of data points and the dimension $d$ of the input space $X = \mathbb{R}^d$. Which label space $Y$ does this classification problem use?**
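A minimal sketch of how these quantities can be read off the arrays returned by `loadCSV`. The helper name `describe_dataset` and the small toy arrays are hypothetical stand-ins so the snippet runs without the CSV file; with the real data you would call `describe_dataset(X, y)`.

```python
import numpy as np

def describe_dataset(X, y):
    """Return N (number of points), d (input dimension) and the distinct label values."""
    N, d = X.shape
    labels = np.unique(y)  # sorted distinct values that occur in y
    return N, d, labels

# Hypothetical toy arrays standing in for the banana data.
X_demo = np.array([[0.5, -1.2], [-0.3, 0.8], [1.1, 0.1]])
y_demo = np.array([1.0, -1.0, 1.0])

N, d, labels = describe_dataset(X_demo, y_demo)
print(f"N = {N}, d = {d}, label values = {labels}")
```

The distinct values returned by `np.unique` show which label space the data actually uses.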

* **b) Create a NumPy array of predictions according to the rule**
$$ \hat y = h(x) = \begin{cases}
+1 & \text{if } x_1 \leq 0 \\
-1 & \text{if } x_1 > 0
\end{cases} $$
**where $x_1$ is the first component of the input $x$. Compute the error rate, as implemented by *sklearn.metrics.zero_one_loss*.**
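A short sketch of the rule and its error rate. Because `loadCSV` strips the label column, $x_1$ is column 0 of `X`. The helper `rule_predictions` and the four toy points are hypothetical; `zero_one_loss` returns the fraction of misclassified points by default (`normalize=True`).

```python
import numpy as np
from sklearn.metrics import zero_one_loss

def rule_predictions(X):
    """h(x) = +1 if x_1 <= 0, else -1 (x_1 is column 0 of X)."""
    return np.where(X[:, 0] <= 0, 1, -1)

# Hypothetical toy data; with the banana data use X and y from loadCSV.
X_demo = np.array([[-0.5, 2.0], [0.0, -1.0], [0.7, 0.3], [1.5, -0.2]])
y_demo = np.array([1, -1, -1, 1])

y_hat = rule_predictions(X_demo)
error_rate = zero_one_loss(y_demo, y_hat)  # fraction of points with y_hat != y
print(y_hat, error_rate)
```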

* **c) Split the data into 50% "training" and 50% "test" data. For each subset, output the number of points and the distribution of label values as fractions of the subset size.**
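One possible way to do the split, assuming a random split is acceptable: `sklearn.model_selection.train_test_split` shuffles by default, and `random_state` makes the split reproducible. The helper `label_fractions` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def label_fractions(y):
    """Map each label value to its fraction of the subset size."""
    values, counts = np.unique(y, return_counts=True)
    return {v.item(): c / len(y) for v, c in zip(values, counts)}

# Hypothetical toy data; with the banana data use X and y from loadCSV.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 2))
y_demo = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1, -1])

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=0)

for name, y_sub in [("training", y_train), ("test", y_test)]:
    print(name, len(y_sub), label_fractions(y_sub))
```

Passing `stratify=y` to `train_test_split` would keep the label fractions of both halves close to those of the full data set; whether that is wanted here is a judgment call.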

## 2. The Perceptron (Implementation)

* **a) Load the "workers" data set provided in the Moodle course. It consists of two files, one for the inputs and one for the labels (targets), which can be loaded with the function *numpy.load*. Print the number of data points and the shape of the input vectors.**
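A sketch of the loading step. The actual file names are not given here, so `workers_inputs.npy` and `workers_labels.npy` are hypothetical placeholders, and the snippet writes small stand-in files first so that it runs anywhere. The stand-in image shape (height 100, width 200, 3 channels) is also an assumption.

```python
import os
import tempfile
import numpy as np

def load_workers(inputs_path, labels_path):
    """Load inputs and labels from two .npy files."""
    X = np.load(inputs_path)
    y = np.load(labels_path)
    return X, y

# Stand-in files so the sketch is runnable; replace the paths with the Moodle files.
tmpdir = tempfile.mkdtemp()
inputs_path = os.path.join(tmpdir, "workers_inputs.npy")  # hypothetical name
labels_path = os.path.join(tmpdir, "workers_labels.npy")  # hypothetical name
np.save(inputs_path, np.zeros((6, 100, 200, 3)))
np.save(labels_path, np.array([1, -1, 1, -1, 1, -1]))

X, y = load_workers(inputs_path, labels_path)
print("number of data points:", X.shape[0])
print("shape of one input:", X.shape[1:])
```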

* **b) The input vectors are actually RGB images with resolution $200 \times 100$, giving a total of $200 \cdot 100 \cdot 3 = 60000$ numbers (features) per image. The task is to classify the images into two categories: images with and without construction workers. This information is used to improve safety when operating large construction machines. Display a few images using the following function to get an impression of the task.**

```python
%matplotlib inline
import matplotlib.pyplot as plt

def showimage(x):
    # x: a single RGB image array of shape (height, width, 3)
    plt.imshow(x)
    plt.show()
```
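To look at several images at once, a grid variant of `showimage` can help. The helper `show_images` is hypothetical; it assumes each image has shape (height, width, 3) with float values in $[0, 1]$ or integers in $0..255$, which is what `plt.imshow` expects for RGB data. If the pixels are stored as floats in $0..255$, divide by 255 first.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_images(images, labels=None, n=4):
    """Show the first n images side by side, optionally titled with their labels."""
    n = min(n, len(images))
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for k, ax in enumerate(axes[0]):
        ax.imshow(images[k])
        ax.axis("off")
        if labels is not None:
            ax.set_title(f"label = {labels[k]}")
    return fig
```

In the notebook, `fig = show_images(X, y)` followed by `plt.show()` displays the first four images with their labels.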

* **c) Reshape the data so that the 60000 features of each image form a "flat" vector. Afterwards, the data should be a two-dimensional array with one row per image.**
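A one-line reshape does this; `-1` lets NumPy infer the number of features per row. The helper name `flatten_images` and the stand-in shape are assumptions for illustration.

```python
import numpy as np

def flatten_images(X):
    """Turn an (N, height, width, 3) image array into an (N, height*width*3) feature matrix."""
    return X.reshape(X.shape[0], -1)

X_demo = np.zeros((4, 100, 200, 3))  # 4 stand-in images
X_flat = flatten_images(X_demo)
print(X_flat.shape)  # (4, 60000)
```

To display a flattened row again with `showimage`, reshape it back to the original image shape, e.g. `X_flat[0].reshape(X_demo.shape[1:])`.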

* **d) Implement the Perceptron algorithm, which works as follows:**

Infinite loop:
1. Find an index $i \in \{1, \dots, N\}$ such that $y_i \cdot w^T x_i \leq 0$.
2. If no such index exists, stop.
3. Otherwise, update the weight vector: $w \leftarrow w + y_i \cdot x_i$.
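A minimal NumPy sketch of the loop above. The pseudocode leaves a few details open, so these are assumptions: `w` starts at the zero vector, the first violating index is taken, labels are in $\{-1, +1\}$, and the hypothetical `max_updates` cap replaces the infinite loop. A bias $w_0$ is not handled inside the function; one common way to include it is to append a constant feature $1$ to every $x_i$.

```python
import numpy as np

def perceptron(X, y, max_updates=100_000):
    """Perceptron training following the pseudocode; returns (w, converged).

    Assumptions: zero start vector, first violating index, update cap.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_updates):
        margins = y * (X @ w)                 # y_i * w^T x_i for every i
        violated = np.flatnonzero(margins <= 0)
        if violated.size == 0:                # no such index exists: stop
            break
        i = violated[0]
        w = w + y[i] * X[i]                   # w <- w + y_i * x_i
    converged = bool(np.all(y * (X @ w) > 0))
    return w, converged
```

Usage: `w, converged = perceptron(X_train, y_train)`. Checking `converged` shows whether a separating `w` was found before the cap was reached.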

* **e) Test the perceptron on the following toy data. It should give a solution close to some positive multiple of $w = (1, 0)$ and $w_0 = -1$.**

```python
X_dummy = np.array([[0.99, -1], [1.01, -1], [0.99, 1], [1.01, 1]])
y_dummy = np.array([-1, +1, -1, +1])
```
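The toy points are separated by the line $x_1 = 1$. To obtain $w_0$, the sketch below appends a constant feature $1$ to each input so the last weight plays the role of $w_0$ (an assumption, since the exercise does not say how the bias is handled), and it repeats the perceptron loop in compact form (zero start, first violating index). Combining the four strict margin conditions shows that every separating $(w_1, w_2, w_0)$ for these points has $w_1 > 0$, $|w_2| < 0.01\,w_1$ and $-1.01 < w_0 / w_1 < -0.99$, which is why the result should be close to a positive multiple of $w = (1, 0)$, $w_0 = -1$.

```python
import numpy as np

X_dummy = np.array([[0.99, -1], [1.01, -1], [0.99, 1], [1.01, 1]])
y_dummy = np.array([-1, +1, -1, +1])

# Absorb the bias w_0 into w by appending a constant feature 1 to every input.
X_aug = np.hstack([X_dummy, np.ones((len(X_dummy), 1))])

# Compact perceptron loop; it terminates because the data are linearly separable.
w = np.zeros(X_aug.shape[1])
while True:
    violated = np.flatnonzero(y_dummy * (X_aug @ w) <= 0)
    if violated.size == 0:
        break
    i = violated[0]
    w = w + y_dummy[i] * X_aug[i]

w_vec, w0 = w[:2], w[2]
print("w =", w_vec / w_vec[0], " w_0 =", w0 / w_vec[0])  # scaled so that w_1 = 1
```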

* **f) Split the worker data into two equal halves. Train the perceptron on the first half of the data. Make predictions on the second half and count the number of wrongly predicted labels. Output the error rate. Comment on the quality of the classifier: do you expect it to be useful for the task at hand?**
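A sketch of the evaluation part. Since the task asks for the first half as training data, the split below does not shuffle. Predictions use the sign of $w^T x$, with ties ($w^T x = 0$) assigned to $-1$ as an arbitrary choice. `split_halves` and `error_rate` are hypothetical helper names, and the fixed `w` stands in for the weight vector obtained by training. If a bias feature was appended for training, append it to the test inputs too; if the labels are stored as 0/1 rather than $\pm 1$ (not stated in the exercise), map them first, e.g. with `2 * y - 1`.

```python
import numpy as np

def split_halves(X, y):
    """First half for training, second half for testing (no shuffling)."""
    m = len(y) // 2
    return X[:m], y[:m], X[m:], y[m:]

def error_rate(w, X, y):
    """Return (number of wrong predictions, error rate) for the rule sign(w^T x)."""
    y_hat = np.where(X @ w > 0, 1, -1)  # ties w^T x == 0 are predicted as -1
    n_wrong = int(np.sum(y_hat != y))
    return n_wrong, n_wrong / len(y)

# Hypothetical toy data and a stand-in weight vector.
X_demo = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0],
                   [-2.0, 1.0], [3.0, 0.0], [-3.0, 1.0]])
y_demo = np.array([1, 1, -1, -1, -1, 1])
X_tr, y_tr, X_te, y_te = split_halves(X_demo, y_demo)

w = np.array([1.0, 0.0])
n_wrong, rate = error_rate(w, X_te, y_te)
print(n_wrong, rate)
```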

## 3. The Perceptron (Concepts)

**We can measure the "quality" of the perceptron's prediction for a point $(x_i, y_i)$ by $y_i \cdot w^T x_i$: the prediction is correct if and only if this value is positive.**

* **a) Updating the weight vector with a labeled point $(x_i, y_i)$ improves the prediction for that point. By how much is the value of the linear function improved?**
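A quick numerical check of the answer. Expanding the quality after the update gives $y_i (w + y_i x_i)^T x_i = y_i w^T x_i + y_i^2 \|x_i\|^2$, and $y_i^2 = 1$ for labels in $\{-1, +1\}$. The random vectors below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=5)
x_i = rng.normal(size=5)
y_i = -1.0

before = y_i * (w @ x_i)
w_new = w + y_i * x_i           # perceptron update with (x_i, y_i)
after = y_i * (w_new @ x_i)
print(after - before, x_i @ x_i)  # both equal ||x_i||^2 up to rounding
```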

* **b) What happens to $w^T x_j$ when updating $w$ with $(x_i, y_i)$, i.e., with a different point? Does the prediction of $(x_j, y_j)$ necessarily improve?**
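For another point $x_j$, the same expansion gives a change in quality of $y_j (w + y_i x_i)^T x_j - y_j w^T x_j = y_i y_j \, x_i^T x_j$, which can have either sign. The hand-picked two-dimensional numbers below (chosen only for illustration) show a correctly classified point becoming misclassified after an update with a different point.

```python
import numpy as np

w = np.array([0.0, -1.0])
x_i, y_i = np.array([1.0, 0.0]), +1   # violating point: y_i * w^T x_i = 0 <= 0
x_j, y_j = np.array([2.0, 1.0]), -1   # currently correct: y_j * w^T x_j = 1 > 0

before = y_j * (w @ x_j)
w_new = w + y_i * x_i                 # update with (x_i, y_i), not with (x_j, y_j)
after = y_j * (w_new @ x_j)
print(before, after)                  # 1.0 -1.0
```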

* **c) What happens if the perceptron is run on data that is not linearly separable?**
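If the data are not linearly separable, every weight vector leaves at least one point with $y_i \cdot w^T x_i \leq 0$, so the stopping condition is never met. The sketch uses XOR-style toy data (an illustrative choice, with a constant bias feature appended) and caps the number of updates so that it terminates.

```python
import numpy as np

# XOR-style data with a constant bias feature: no (w, w_0) separates it.
X = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
y = np.array([-1, -1, +1, +1])

w = np.zeros(3)
n_updates = 0
for _ in range(10_000):               # the cap stands in for the infinite loop
    violated = np.flatnonzero(y * (X @ w) <= 0)
    if violated.size == 0:
        break
    i = violated[0]
    w = w + y[i] * X[i]
    n_updates += 1

print(n_updates, w)  # the loop is still updating when the cap is reached
```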