# DL4NLP SS17 Home Exercise 01
----------------------------------

## Task 1 Setup (1P)
Install Python 3, numpy and Jupyter on your machine.

## Task 2 Perceptron Learning
### Task 2.1 Sigmoid Activation Function (1P)
When optimizing functions, first- and higher-order derivatives (gradients, Hessians) are of major importance. In neural network learning, we typically want to choose the weight parameters so that the difference between the network output and the true labels is minimized, e.g.:

\begin{equation}
  \text{min}_\mathbf{w}\sum_{j=1}^N \Bigl(\sigma(\mathbf{x}_j \cdot \mathbf{w})-y_j\Bigr)^2
\end{equation}

Here, $\sigma$ is an activation function. A frequently used activation function is the *sigmoid* function, defined as:

\begin{equation}
  \text{sig}(x) = \frac{1}{1+\exp(-x)}
\end{equation}

Show that:

\begin{equation}
    \text{sig}'(x) = \text{sig}(x) \cdot \bigl(1-\text{sig}(x)\bigr)
\end{equation}

You may find the chain rule useful: $\bigl(f(g(x))\bigr)' = f'(g(x)) \cdot g'(x)$.
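A numerical sanity check can complement the proof. The following is a minimal sketch, not a derivation: it compares the closed form $\text{sig}(x)\cdot(1-\text{sig}(x))$ against a central finite difference. The helper names, the step size `h` and the sample grid are assumptions chosen for illustration.

```python
import numpy as np

def sig(x):
    """Sigmoid activation as defined above."""
    return 1.0 / (1.0 + np.exp(-x))

def sig_prime_closed_form(x):
    """Derivative claimed in the task: sig(x) * (1 - sig(x))."""
    s = sig(x)
    return s * (1.0 - s)

def sig_prime_numeric(x, h=1e-5):
    """Central finite difference; h is an arbitrary small step (assumption)."""
    return (sig(x + h) - sig(x - h)) / (2.0 * h)

# Compare both versions on a few sample points (grid chosen arbitrarily)
xs = np.linspace(-5.0, 5.0, 11)
max_gap = np.max(np.abs(sig_prime_closed_form(xs) - sig_prime_numeric(xs)))
print(max_gap)  # should be very small if the identity holds
```

A small gap does not prove the identity, but a large one would point to an error in the derivation.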

### Task 2.2 Perceptron Learning by Hand (2P)
A simple perceptron learning algorithm was introduced in the lecture (slide 88). Here is the weight update rule again for reference:
\begin{equation}
    \mathbf{w}' \leftarrow \mathbf{w} - \alpha \sum_{(\mathbf{x},y)\in\mathcal{T}'} \Bigl(\sigma(\mathbf{x} \cdot \mathbf{w}) - y\Bigr) \cdot \sigma'(\mathbf{x} \cdot \mathbf{w}) \cdot \mathbf{x}^T
\end{equation}
The weight update rule is designed to minimize the square loss between the perceptron output and the target labels (see slide 87).
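
The update rule can be sketched in vectorized form as follows. This is one possible reading of the formula, assuming the sigmoid activation from Task 2.1 and NumPy arrays with one data point per row; the function name `batch_update` and the toy batch are hypothetical and not taken from the lecture.

```python
import numpy as np

def sig(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def batch_update(w, X, y, alpha=1.0):
    """One weight update on a batch.

    w: weights, shape (d,); X: inputs, shape (n, d); y: labels, shape (n,).
    """
    out = sig(X @ w)                        # sigma(x . w) for every row
    delta = (out - y) * out * (1.0 - out)   # (sigma - y) * sigma'
    return w - alpha * (X.T @ delta)        # sum over the batch of delta_j * x_j

# Hypothetical toy batch, only to show the call
w = np.array([0.0, 0.0])
X = np.array([[1.0, 0.0],
              [0.0, 2.0]])
y = np.array([1.0, 0.0])
w_new = batch_update(w, X, y)
print(w_new)  # analytically (0.125, -0.25)
```

With a batch containing a single row, this reduces to the per-sample update used in Task 2.2.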

#### a) Training

Train a perceptron using the algorithm above and report the weight vector $w_j$ after each weight update. Run one epoch (one pass over the training data) with the following parameters:
* activation function $\sigma = \text{sig}$ (see Task 2.1)
* initial weight vector $w_0 = (-1, 1)^T$
* learning rate $\alpha = 1$
* batch size $N'=1$, i.e. one weight vector update per data point $(\mathbf{x}, y)$
* training data $\mathcal{T}$:

| $j$ | $x_1$ | $x_2$ | $y$ |
|----:|------:|------:|----:|
|  1  | -1.28 |  0.09 |  0  |
|  2  | 0.17  |  0.39 |  1  |
|  3  | 1.36  |  0.46 |  1  |
|  4  | -0.51 | -0.32 |  0  |
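
To cross-check the hand calculation, one epoch with the parameters above can be simulated. This is a sketch under the stated settings (sigmoid activation, $\alpha = 1$, batch size 1, data points processed in table order); the variable names are assumptions.

```python
import numpy as np

def sig(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Training data from the table above: columns x1, x2, y
train = np.array([[-1.28,  0.09, 0.0],
                  [ 0.17,  0.39, 1.0],
                  [ 1.36,  0.46, 1.0],
                  [-0.51, -0.32, 0.0]])

alpha = 1.0
w = np.array([-1.0, 1.0])   # initial weight vector w_0
weights = [w]               # collects w_0, w_1, ..., w_4

for x1, x2, y in train:     # batch size 1: one update per data point
    x = np.array([x1, x2])
    out = sig(x @ w)
    w = w - alpha * (out - y) * out * (1.0 - out) * x
    weights.append(w)

for j, w_j in enumerate(weights):
    print(f"w_{j} = {w_j}")
```

The printed vectors can then be compared step by step with the values computed by hand.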

#### b) Evaluation

Compute the square loss $L$ before (using $w_0$) and after training (using $w_4$) on the following test data:

| $j$ | $x_1$ | $x_2$ | $y$ |
|----:|------:|------:|----:|
|  1  | -0.50 | -1.00 |  0  |
|  2  |  0.75 |  0.25 |  1  |

Square loss is defined as (see Task 2.1 and slide 87):
\begin{equation}
    L = \sum_{j=1}^N \ell(\mathbf{x}_j, y_j) = \sum_{j=1}^N \bigl(\sigma(\mathbf{x}_j \cdot \mathbf{w}) - y_j\bigr)^2
\end{equation}
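
The loss definition translates almost directly into NumPy. The sketch below evaluates it on the test data for the initial weights $w_0$ only; the function name `square_loss` is an assumption, and the trained weights $w_4$ would be passed in the same way.

```python
import numpy as np

def sig(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def square_loss(w, X, y):
    """Sum of squared differences between sig(x_j . w) and y_j."""
    return float(np.sum((sig(X @ w) - y) ** 2))

# Test data from the table above
X_test = np.array([[-0.50, -1.00],
                   [ 0.75,  0.25]])
y_test = np.array([0.0, 1.0])

w0 = np.array([-1.0, 1.0])
loss_before = square_loss(w0, X_test, y_test)
print(loss_before)
```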

### Task 2.3 Decision Boundary and Plotting (1P)
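A perceptron learns a linear decision boundary. For the activation $\sigma(\mathbf{x} \cdot \mathbf{w})$ used throughout this exercise, the output equals $0.5$ exactly when $\mathbf{x} \cdot \mathbf{w} = 0$, so thresholding at $0.5$ gives the decision boundary $x_1 \cdot w_1 + x_2 \cdot w_2 = 0$, a line through the origin with normal vector $\mathbf{w}$ (Hesse normal form, up to normalization of $\mathbf{w}$).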

Create a plot with [matplotlib](https://matplotlib.org/contents.html) that shows:
* the training and test data points from Task 2.2
* the decision boundaries before and after training
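
One way to approach the plot is sketched below. It assumes $w_2 \neq 0$ so that the boundary can be written as $x_2 = -\frac{w_1}{w_2} x_1$, recomputes the trained weights with the update rule from Task 2.2, and writes the figure to a file; the helper `boundary_x2`, the colour map and the file name `decision_boundary.png` are illustrative choices, not part of the exercise.

```python
import matplotlib.pyplot as plt
import numpy as np

def sig(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def boundary_x2(w, x1):
    """Solve x1*w1 + x2*w2 = 0 for x2 (assumes w[1] != 0)."""
    return -w[0] / w[1] * x1

# Data from Task 2.2: columns x1, x2, y
train = np.array([[-1.28, 0.09, 0.0], [0.17, 0.39, 1.0],
                  [1.36, 0.46, 1.0], [-0.51, -0.32, 0.0]])
test = np.array([[-0.50, -1.00, 0.0], [0.75, 0.25, 1.0]])

# Weights before training and after one epoch (alpha = 1, batch size 1)
w_before = np.array([-1.0, 1.0])
w_after = w_before.copy()
for x1, x2, y in train:
    x = np.array([x1, x2])
    out = sig(x @ w_after)
    w_after = w_after - (out - y) * out * (1.0 - out) * x

fig, ax = plt.subplots()
ax.scatter(train[:, 0], train[:, 1], c=train[:, 2], cmap="coolwarm",
           marker="o", label="training data")
ax.scatter(test[:, 0], test[:, 1], c=test[:, 2], cmap="coolwarm",
           marker="x", label="test data")

x1_line = np.linspace(-1.5, 1.5, 50)
ax.plot(x1_line, boundary_x2(w_before, x1_line), "--", label="boundary before training")
ax.plot(x1_line, boundary_x2(w_after, x1_line), "-", label="boundary after training")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.legend()
fig.savefig("decision_boundary.png")  # hypothetical output file name
```

Inside a notebook the figure is usually displayed directly, so the `savefig` call can be dropped there.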

In [2]:
import matplotlib.pyplot as plt
import numpy as np

# Training data from Task 2.2: columns x1, x2, y
training_data_input_list = [[-1.28, 0.09, 0],
                            [ 0.17, 0.39, 1],
                            [ 1.36, 0.46, 1],
                            [-0.51,-0.32, 0]]
training_data = np.matrix(training_data_input_list)

# Initial weight vector w_0 as a 2x1 column
w0 = np.matrix([[-1],[1]])

def sigmoid(x):
    return 1/(1+np.exp(-x))

def w_update(weight_vector, data_point_shape_n_1, activation_function, alpha=1):
    """Apply one weight update (batch size 1) and return the new weight vector."""
    x_vector = data_point_shape_n_1[0,:-1]   # inputs as a 1x2 row
    print(x_vector)
    y_scalar = data_point_shape_n_1[0,-1]    # target label
    w = weight_vector
    print(y_scalar)

    # Perceptron output sigma(x . w) as a scalar
    activation = activation_function(np.dot(x_vector,w).astype('float64'))[0,0]
    print(activation)

    # The derivative term activation*(1-activation) is only valid for the sigmoid (Task 2.1)
    w_new = w - alpha * (activation - y_scalar) * (activation*(1-activation)) * x_vector.transpose()
    print(w_new)
    print(x_vector.shape)
    return w_new


# First update of the epoch: w_0 -> w_1 using the first training sample
w1 = w_update(w0, training_data[0,:], sigmoid)

    

[[-1.28  0.09]]
0.0
0.797380153569
[[-0.83509919]
 [ 0.98840541]]
(1, 2)
