## Perceptron 

The task of this exercise is to reimplement and test the perceptron algorithm from UE 06 for verifying the authenticity of banknotes. Rewrite the program and replace basic Python with NumPy as much as possible. 

A revised version of the perceptron algorithm from UE 06 is shown below. Use the dataset `./data/banknote.csv`. To read the data into a NumPy array, call:

```python
import numpy as np

Z = np.loadtxt('./data/banknote.csv', delimiter=',')
```

<br>

**Task**

+ Load the data

+ Display information about the dataset (see below)

+ Randomly split the data into a training set (80 %) and test set (20 %)

+ Train a model using the Perceptron algorithm on the training set

+ Evaluate the error rate of the trained model on the test set

<br>
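
The random 80/20 split from the task list can be done with NumPy alone. The following is a minimal sketch, assuming the data is a 2-D array with one example per row. The helper name `split_data`, the fixed seed, and the toy array `Z_demo` are illustrative choices and not part of the exercise.

```python
import numpy as np

def split_data(Z, test_fraction=0.2, seed=0):
    """Shuffle the rows of Z and split them into a training and a test part."""
    rng = np.random.default_rng(seed)        # seeded generator for reproducibility
    idx = rng.permutation(Z.shape[0])        # random order of the row indices
    n_test = int(round(test_fraction * Z.shape[0]))
    return Z[idx[n_test:]], Z[idx[:n_test]]  # (training rows, test rows)

# toy stand-in for the banknote data: 10 rows, 4 features + 1 label column
Z_demo = np.arange(50, dtype=float).reshape(10, 5)
train_demo, test_demo = split_data(Z_demo)
print(train_demo.shape, test_demo.shape)     # (8, 5) (2, 5)
```

Shuffling the row indices rather than the array itself keeps features and labels of each example together.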

**Note:** Before training a model, it is important to explore the data you want to learn from. Display some general statistics such as the number of examples, number of features, number of classes, mean, standard deviation, minimum, and maximum of each feature, and so on.
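
The statistics listed in the note can be computed for all features at once by reducing along `axis=0`. This is a sketch under the assumption that the class label is stored in the last column; the function name `describe` and the toy array are hypothetical.

```python
import numpy as np

def describe(Z):
    """Return per-feature statistics, assuming the class label is the last column."""
    X, y = Z[:, :-1], Z[:, -1]
    return {
        'num_examples': X.shape[0],
        'num_features': X.shape[1],
        'classes': np.unique(y),
        'mean': X.mean(axis=0),   # one value per feature column
        'std': X.std(axis=0),
        'min': X.min(axis=0),
        'max': X.max(axis=0),
    }

# toy stand-in: 3 examples, 2 features, labels in the last column
Z_demo = np.array([[1.0, 2.0, 0.0],
                   [3.0, 6.0, 1.0],
                   [5.0, 4.0, 1.0]])
stats = describe(Z_demo)
for name, value in stats.items():
    print(f'{name}: {value}')
```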


---
### Perceptron Algorithm

The perceptron algorithm described here assumes class labels of $\pm 1$. A feature vector is denoted by $\mathbf{x} \in \mathbb{R}^n$ and its corresponding class label by $y \in \{\pm 1\}$.
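
If the dataset codes its classes as 0 and 1 instead of $\pm 1$, the labels have to be mapped before training. A minimal sketch with a toy label vector (the array `labels01` is made up for illustration):

```python
import numpy as np

labels01 = np.array([0.0, 1.0, 1.0, 0.0])  # toy labels coded as 0/1
y = np.where(labels01 == 1, 1, -1)          # map 1 -> +1, everything else -> -1
print(y)                                     # [-1  1  1 -1]
```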

A perceptron with weight vector $\mathbf{w} \in \mathbb{R}^n$ and bias $b \in \mathbb{R}$ classifies a point $\mathbf{x} \in \mathbb{R}^n$ to class $y$ according to the rule

$$
    f_{\mathbf{w}, b}(\mathbf{x}) = \begin{cases}
    +1 & \mathbf{w}^{t}\mathbf{x} + b \geq 0\\
    -1 & \text{otherwise}
    \end{cases},
$$

where $\mathbf{w}^{t}\mathbf{x} = w_1x_1 + \cdots + w_n x_n$ is the dot product of $\mathbf{w}$ and $\mathbf{x}$. The prediction $f_{\mathbf{w}, b}(\mathbf{x})$ is correct if $f_{\mathbf{w}, b}(\mathbf{x}) = y$. Otherwise, there is a misclassification.
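
The classification rule translates almost directly into NumPy. The sketch below assumes a 1-D weight vector and a feature matrix with one example per row; the name `predict` and the numbers are illustrative.

```python
import numpy as np

def predict(w, b, X):
    """Return +1 where w^T x + b >= 0 and -1 otherwise, for every row of X."""
    return np.where(X @ w + b >= 0, 1, -1)

w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[1.0, 0.0],    # 1.0 - 0.0 + 0.5 =  1.5 -> +1
              [0.0, 1.0],    # 0.0 - 2.0 + 0.5 = -1.5 -> -1
              [1.5, 1.0]])   # 1.5 - 2.0 + 0.5 =  0.0 -> +1 (boundary case)
print(predict(w, b, X))      # [ 1 -1  1]
```

Keeping `w` one-dimensional means `X @ w` has shape `(n,)`, which matches a 1-D label vector element by element.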

The goal of learning is to find suitable parameters $\mathbf{w}$ and $b$ such that the learned function $f_{\mathbf{w}, b}$ misclassifies as few training examples as possible. 
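
Counting misclassifications reduces to comparing two label vectors. A small sketch with made-up labels; the helper name `error_rate` is an assumption:

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of examples whose prediction differs from the label."""
    return np.mean(y_true != y_pred)

y_true = np.array([1, -1, -1, 1])
y_pred = np.array([1, 1, -1, 1])         # second example is misclassified
print(error_rate(y_true, y_pred))        # 0.25
print(1.0 - error_rate(y_true, y_pred))  # accuracy: 0.75
```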

The perceptron algorithm describes how suitable parameters $\mathbf{w}$ and $b$ can be learned.

**Perceptron Algorithm**

1. Initialize the weight vector $\mathbf{w}$ and bias $b$
2. Repeat for `max_iter` times
    1. For each training example $(\mathbf{x}, y)$:
        1. calculate $\hat{y} = f_{\mathbf{w}, b}(\mathbf{x})$
        2. update $\mathbf{w}$ and $b$ according to the rule
        
            $\mathbf{w} \leftarrow \mathbf{w} + \eta(y-\hat{y}) \mathbf{x}$

            $b \leftarrow b + \eta(y-\hat{y})$
        
        where $\eta \in [0, 1]$ is the learning rate (step size).
    2. Output the classification accuracy over all training examples
3. Return $\mathbf{w}$ and $b$.
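
To make the update rule concrete, here is a small worked example with illustrative numbers (not taken from the banknote data). Note that $y - \hat{y}$ is $0$ for a correct prediction and $\pm 2$ for a misclassification, so only misclassified examples change the parameters. Take $\eta = 0.1$, $\mathbf{w} = (0.2, -0.4)^{t}$, $b = 0$, and the example $\mathbf{x} = (1, 1)^{t}$ with $y = +1$:

$$
\begin{aligned}
    \mathbf{w}^{t}\mathbf{x} + b &= 0.2 - 0.4 + 0 = -0.2 < 0 \;\Rightarrow\; \hat{y} = -1,\\
    \mathbf{w} &\leftarrow (0.2, -0.4)^{t} + 0.1 \cdot 2 \cdot (1, 1)^{t} = (0.4, -0.2)^{t},\\
    b &\leftarrow 0 + 0.1 \cdot 2 = 0.2.
\end{aligned}
$$

After the update, $\mathbf{w}^{t}\mathbf{x} + b = 0.4 - 0.2 + 0.2 = 0.4 \geq 0$, so this example is now classified correctly.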

In **Step 1**, the weights $\mathbf{w}$ and the bias $b$ are initialized with random values from the interval $[-\sigma, +\sigma]$. Use a suitable NumPy function for this purpose. In **Step 2**, the pass over the training set is repeated for a predetermined number of iterations (`max_iter`). In each iteration, all training examples are visited one by one; for each example the prediction is calculated and the weight vector and bias are updated. After each iteration, the classification accuracy on the training set is calculated and displayed. In **Step 3**, the learned weight vector and bias are returned.
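
The three steps can be sketched as a single function. This is one possible implementation, not the reference solution: it assumes uniform initialization, a 1-D weight vector, and labels already coded as $\pm 1$. The function name `train_perceptron`, the `seed` parameter, and the toy data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, sigma=0.1, max_iter=10, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: random initialization in [-sigma, +sigma] (uniform is an assumption)
    w = rng.uniform(-sigma, sigma, size=X.shape[1])
    b = rng.uniform(-sigma, sigma)
    # Step 2: max_iter passes over the training set
    for epoch in range(max_iter):
        for x_i, y_i in zip(X, y):
            y_hat = 1 if w @ x_i + b >= 0 else -1
            w = w + eta * (y_i - y_hat) * x_i   # no change if y_i == y_hat
            b = b + eta * (y_i - y_hat)
        acc = np.mean(np.where(X @ w + b >= 0, 1, -1) == y)
        print(f'epoch {epoch + 1}: training accuracy = {acc:.3f}')
    # Step 3: return the learned parameters
    return w, b

# toy, linearly separable data (not the banknote dataset)
X_demo = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y_demo = np.array([1, 1, -1, -1])
w, b = train_perceptron(X_demo, y_demo)
```

The inner loop stays a Python loop because each update depends on the parameters produced by the previous example; only the accuracy computation is vectorized over the whole training set.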

You can use the following parameter settings, where `num_epochs` plays the role of `max_iter`:

```
    sigma = 0.1
    num_epochs = 10
    eta = 0.1
```

---
### Imports and settings

In [21]:
import numpy as np
from sklearn.model_selection import train_test_split


filename = 'data/banknote.csv'

data = np.loadtxt(filename, delimiter = ",")

def info_data(X):
    # columns 1-4 are the features, column 5 is the class label (0/1)
    print(f'number of lines/data samples: {X.shape[0]}')
    print(f'number of classes: {len(np.unique(X[:,4]))}')
    for i in range(5):
        print(f'info for column {i+1}:')
        print(f'mean: {np.mean(X[:,i])}')
        print(f'standard deviation: {np.std(X[:,i])}')
        print(f'min: {np.min(X[:,i])}')
        print(f'max: {np.max(X[:,i])}')

info_data(data)
train_data, test_data = train_test_split(data, test_size=0.2, train_size=0.8, random_state=42) # random 80/20 split; random_state=42 makes it reproducible (use None for a different split on each run)

number of lines/data samples: 1372
number of classes: 2
info for column 1:
mean: 0.4337352570699707
standard deviation: 2.841726405239481
min: -7.0421
max: 6.8248
info for column 2:
mean: 1.9223531206393585
standard deviation: 5.866907488387089
min: -13.7731
max: 12.9516
info for column 3:
mean: 1.3976271172667638
standard deviation: 4.308459093119795
min: -5.2861
max: 17.9274
info for column 4:
mean: -1.1916565200437317
standard deviation: 2.100247322449037
min: -8.5482
max: 2.4495
info for column 5:
mean: 0.4446064139941691
standard deviation: 0.49692207701954094
min: 0.0
max: 1.0


In [80]:
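#starting values:
sigma = 0.1
num_epochs = 10
eta = 0.1

def init_values(num_features):
    # weights are 1-D with shape (num_features,); a (num_features, 1) column
    # vector would broadcast against the 1-D label vector into an (n, n) matrix
    weights = np.random.uniform(low=-sigma, high=sigma, size=num_features)
    bias = np.random.uniform(low=-sigma, high=sigma)
    return (weights, bias)

def calc_y(weights, bias, X):
    # works for a single example (shape (4,)) and for a whole matrix (shape (n, 4))
    return np.where(np.dot(X, weights) + bias >= 0, 1, -1)

def update(weights, bias, x, y, y_pred):
    # perceptron rule for a single example; nothing changes if y == y_pred
    weights = weights + eta * (y - y_pred) * x
    bias = bias + eta * (y - y_pred)
    return (weights, bias)

# separate features and labels; the file codes the classes as 0/1,
# but the algorithm expects labels -1/+1
X_train, y_train = train_data[:, 0:4], np.where(train_data[:, 4] == 1, 1, -1)
X_test, y_test = test_data[:, 0:4], np.where(test_data[:, 4] == 1, 1, -1)

weights, bias = init_values(X_train.shape[1])
for epoch in range(num_epochs):
    for x, y in zip(X_train, y_train):
        weights, bias = update(weights, bias, x, y, calc_y(weights, bias, x))
    acc = np.mean(calc_y(weights, bias, X_train) == y_train)
    print(f'epoch {epoch + 1}: training accuracy = {acc:.4f}')

# error rate of the trained model on the held-out test set
error_rate = np.mean(calc_y(weights, bias, X_test) != y_test)
print(f'test error rate = {error_rate:.4f}')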