Osnabrück University - Machine Learning (Summer Term 2018) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 10

## Introduction

This week's sheet should be solved and handed in before the end of **Sunday, June 17, 2018**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whomever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 1: Classification [8 Points]

In the lecture (ML-09 Slides 7ff) several types of classifiers have been introduced. In this assignment you will explore differences and similarities between them.

### a) LDA

How does the LDA classifier work? What restrictions have to be fulfilled by the data for this method to work and why?

Linear discriminant analysis (LDA) finds directions such that the classes are separated as well as possible once the data is projected onto them. The number of directions depends on the number of classes to be separated (at most one fewer than the number of classes). For two classes the procedure works as follows: First, the center of mass (mean) of each class is computed. Second, the difference of the two class means is computed. Third, this difference vector is scaled by the inverse of the summed within-class scatter of both classes (in one dimension this amounts to dividing by the sum of the variances). The result is a projection direction with the desired property. The decision boundary is the line (in general, the hyperplane) orthogonal to this direction, placed between the projected class means.

The data has to be (approximately) linearly separable. Otherwise LDA will not separate the classes well, because its decision boundary is always linear.
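The two-class procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the lecture's reference implementation: the function names `fisher_lda_direction` and `lda_predict`, the synthetic Gaussian data, and the choice of the midpoint between the projected class means as threshold are all made up for this example.

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Return the projection direction w and a threshold for two classes."""
    # Centers of mass of both classes.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of both classes.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # w = S_w^{-1} (m1 - m0); solve() avoids forming the inverse explicitly.
    w = np.linalg.solve(S_w, m1 - m0)
    # Assumed threshold: midpoint between the projected class means.
    threshold = w @ (m0 + m1) / 2
    return w, threshold

def lda_predict(X, w, threshold):
    """Project the rows of X onto w and assign class 1 above the threshold."""
    return (X @ w > threshold).astype(int)

# Two well-separated synthetic classes (illustrative data only).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))

w, threshold = fisher_lda_direction(X0, X1)
correct = np.concatenate([lda_predict(X0, w, threshold) == 0,
                          lda_predict(X1, w, threshold) == 1])
accuracy = correct.mean()
print("direction:", w, "accuracy:", accuracy)
```

Because the decision rule only compares a projection with a threshold, the boundary is a straight line, which is why overlapping or non-linearly arranged classes cannot be separated this way.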

### b) Nearest Neighbor

How does the nearest neighbor classifier work? When would you use it and how is it trained?

The nearest neighbor classifier assigns a class to a data point by looking at its k nearest neighbors in the training set: it sums their class values and divides by the number of neighbors. For discrete outputs, the discrete value closest to this average is chosen (for binary labels this amounts to a majority vote). k is a parameter of the algorithm. The algorithm can be modified so that each of the k neighbors is additionally weighted by its distance to the data point, with closer neighbors counting more. This variant is called the distance-weighted k-nearest neighbor algorithm.

The k-nearest neighbor algorithm is trained very quickly (actually there is no training at all, the examples are only stored), but it requires a lot of memory since all examples have to be kept available for classification.
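A minimal sketch of the unweighted variant, assuming Euclidean distance and a majority vote among the k nearest neighbors. The function name `knn_predict` and the toy data are hypothetical and only serve as illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by a majority vote among its k nearest training examples."""
    # "Training" is just storing X_train and y_train; all work happens here.
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data: two small clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3)
pred_b = knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3)
print(pred_a, pred_b)
```

Every prediction computes the distance to all stored examples, which illustrates the memory cost mentioned above.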

### c) Support Vector Machines

Name some differences between an SVM and an MLP. When would you use which?

In terms of training time, the SVM is typically slower to train than the MLP. This is because the SVM has to solve an optimization problem with a large number of variables, whereas backpropagation in the MLP is faster to compute. The MLP is also easier to parallelize. In addition, the SVM is best suited for binary classification problems; adapting it to multi-class problems comes at the cost of performance.

The classification performance of the SVM is typically better than that of the MLP. This is because the MLP is trained to minimize the squared error between its output and the training targets, while the SVM determines the decision boundary explicitly and directly from the training data.

Accordingly, I would choose the MLP if training performance matters more and the SVM if classification performance is more important.
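The two training criteria named in this answer can be contrasted in formulas. This is a sketch in common textbook notation, which is an assumption and may differ from the lecture slides: $\mathbf{x}_n$ are the training inputs, $t_n$ the targets (with $t_n \in \{-1, +1\}$ for the SVM), $y(\mathbf{x}; \mathbf{w})$ is the MLP output, and the SVM is shown in its linear hard-margin form.

$$
E_{\text{MLP}}(\mathbf{w}) = \frac{1}{2} \sum_{n} \left( y(\mathbf{x}_n; \mathbf{w}) - t_n \right)^2
$$

$$
\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad t_n \left( \mathbf{w}^\top \mathbf{x}_n + b \right) \ge 1 \quad \text{for all } n
$$

The first expression is an error on the outputs that is reduced step by step with gradient descent. The second is a constrained quadratic problem whose solution is the separating hyperplane with the largest margin.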

### d) Random forests

Explain in your own words the concept of a *random forest*. What is meant by *bagging of trees* and *bagging of features* and what are the respective benefits? How do random forests allow for *parallelization* in *training* and *classification*?

Random forests build on decision trees. Decision trees are simple classifiers but tend to overfit. The idea of a random forest is to draw random subsets of the training data and to train one decision tree on each subset. The output of the random forest is an aggregate of the outputs of all decision trees, for example a majority vote or a (weighted) sum. This reduces the overfitting that each individual decision tree shows.

Bagging of trees refers to training multiple trees (a bag full of trees), one for each random subset of the data. Bagging of features refers to the idea that, for each decision tree, the best split at a node is not chosen from all features but only from a random subset of the features, so that the trees differ more from each other.

Since the decision trees can be trained and can classify independently of each other, both training and classification can be parallelized.
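The two kinds of bagging can be made concrete with a small sketch. To stay short, it uses one-split trees (decision stumps) as a stand-in for full decision trees, which is a simplification and not what a real random forest does. All function names (`fit_stump`, `predict_stump`, `fit_forest`, `predict_forest`) and the synthetic data are assumptions for this example.

```python
import numpy as np

def fit_stump(X, y, feature_ids):
    """Fit a one-split tree (stump) using only the given candidate features."""
    best = None
    for f in feature_ids:
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            for left_label in (0, 1):
                pred = np.where(left, left_label, 1 - left_label)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, f, t, left_label)
    return best[1:]  # (feature, threshold, label on the left side)

def predict_stump(stump, X):
    f, t, left_label = stump
    return np.where(X[:, f] <= t, left_label, 1 - left_label)

def fit_forest(X, y, n_trees, n_sub_features, rng):
    n, d = X.shape
    forest = []
    # Each iteration is independent of the others, so this loop could run in parallel.
    for _ in range(n_trees):
        # Bagging of trees: bootstrap sample of the rows (drawn with replacement).
        rows = rng.integers(0, n, size=n)
        # Bagging of features: only a random subset of features is considered.
        feats = rng.choice(d, size=n_sub_features, replace=False)
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Majority vote over the (independent) predictions of all stumps."""
    votes = np.stack([predict_stump(s, X) for s in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 carry information

forest = fit_forest(X, y, n_trees=25, n_sub_features=2, rng=rng)
accuracy = np.mean(predict_forest(forest, X) == y)
print("training accuracy:", accuracy)
```

An odd number of trees avoids ties in the binary majority vote.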

## Assignment 2: Reinforcement Learning [12 Points]

In this assignment you will have a look at the Q-Learning algorithm described in the lecture (ML-10 Slide 18). For this we generate a field with random rewards. A learning agent then explores the field and learns the optimal path to navigate through it. The code below again contains some ``TODO``s that you should fill in to implement the Q-Learning algorithm.

Below the code there are some questions! You will also find a free code cell for a complete implementation of your own. You may use your own test mazes.
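For orientation, the update rule that the ``TODO``s in the code are meant to implement can be written compactly. The following is the common deterministic form of Q-learning, given here as an assumption about what the slide shows; the symbols are chosen for this note and may differ from the lecture's notation.

$$
\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')
$$

Here $s$ is the current position of the agent, $a$ the chosen action, $s'$ the position reached by performing $a$ in $s$, $r$ the reward collected on the way, and $\gamma \in [0, 1)$ the discount factor that is passed to the class as `gamma`.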

In [6]:
import numpy as np
import numpy.random as rand

def generate_field(x, y, num_rewards, max_reward):
    """
    Generate a random game field with rewards.
    
    Args:
        x (int):            x dimension of the field
        y (int):            y dimension of the field 
        num_rewards (int):  the number of rewards that should be randomly placed
        max_reward (int):   the maximum reward that can be placed 
        
    Returns:
        ndarray: A field with randomly initialized rewards, the rest of the 
        entries is zero
    """
    field = np.zeros((y,x), dtype=np.uint8)
    
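    for _ in range(num_rewards):
        # Positions may repeat, so fewer than num_rewards cells can end up non-zero.
        # Draw from 1..max_reward so that every placed reward is non-zero.
        field[rand.randint(y), rand.randint(x)] = rand.randint(1, max_reward + 1)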
    
    return field

In [None]:
%matplotlib notebook

import numpy as np
import matplotlib.pyplot as plt

class QLearning:
    """
    This class contains all the necessary methods to navigate through
    a maze or game with the help of a little bit of Q-Learning.
    """

    def __init__(self, field, actions, gamma):
        """
        Initializes the QLearning Algorithm with the necessary parameters.
        All q values are stored in self.q - this is an array that has
        ACTIONS x map_y x map_x dimensions to store a value for each action
        in each field. The starting position self.pos is randomly initialized.

        Args:
            field (ndarray):  the map
            actions (list):   the available actions
            gamma (float):    the discount factor (the gamma in the lecture slides)
        
        Returns:
            QLearning: An instance that can be used for Q-Learning on the field
        """
        # Store the field, the available actions and the discount factor.
        self.field = field
        self.actions = actions
        self.gamma = gamma

        # Remember the map extent for further navigation.
        self.map_y = self.field.shape[0]
        self.map_x = self.field.shape[1]

        # Create the q value matrix: one q value for each action in each cell of the field.
        self.q = np.zeros((len(self.actions), self.map_y, self.map_x))

        # Start on a random position in the field. A tuple is used so that the
        # position compares equal to the tuples returned by get_coordinates.
        self.pos = (np.random.randint(self.map_y), np.random.randint(self.map_x))


    def get_coordinates(self, position, action):
        """
        Returns the coordinates that follow a certain action, depending
        on the current position of the learner. If the border is reached
        the agent just stops there.
        
        Args:
            position (pair):  the current position
            action (string):  the action that should be performed (one of: 'up', 'down', ...)
            
        Returns:
            pair of int: the updated coordinates
        """
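        # Return the right new coordinates depending on the position.
        # YOUR CODE HERE
        # The position is (row, column): y indexes rows (top to bottom) and
        # x indexes columns (left to right), as displayed by imshow.
        y, x = position

        if action == 'up' and y > 0:
            y -= 1
        elif action == 'down' and y < self.map_y - 1:
            y += 1
        elif action == 'right' and x < self.map_x - 1:
            x += 1
        elif action == 'left' and x > 0:
            x -= 1
        # At the border the position is returned unchanged.
        return (y, x)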


    def update(self):
        """
        Implementation of the update step. Closely follows the algorithm described on
        ML-10 Sl.18. Note that you have the attributes available that are specified in the
        __init__ method of this class. In addition, the global FIELD variable stores the
        real field the agent is moving on, and ACTIONS stores the available actions.
        """
        # Select a random action that should be performed next.
        # Be careful to handle the case where you hit the wall!
        # YOUR CODE HERE
        action = np.random.choice(self.actions)
        new_pos = self.get_coordinates(self.pos, action)

        # If we hit a "wall", the position does not change and nothing is updated.
        # tuple() makes the comparison work even if self.pos is stored as a list.
        if new_pos == tuple(self.pos):
            return

        # Receive the reward for the new position from the field.
        # YOUR CODE HERE
        reward = float(self.field[new_pos[0], new_pos[1]])

        # Update the q-value for the performed action.
        # YOUR CODE HERE
        q_max = np.max(self.q[:, new_pos[0], new_pos[1]])
        self.q[self.actions.index(action), self.pos[0], self.pos[1]] = reward + self.gamma * q_max

        # Update the position of the player to the new field.
        # YOUR CODE HERE
        self.pos = new_pos


    def plot(self):
        """
        Plots the current state.
        """
        fig_player = plt.figure('QLearning State')

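        # One subplot per action (cells 2, 4, 6 and 8 of a 3x3 grid), each showing
        # the q-value of that action for every cell of the field.
        for i, direc in enumerate(self.actions):
            plt.subplot(3, 3, 2 * i + 2)
            plt.axis('off')
            plt.title(direc)
            plt.imshow(self.q[i, :, :], interpolation='none')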

        fig_player.canvas.draw()

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt

# Determine the size of the field; change these parameters as you like.
m_x = 5
m_y = 4

steps = 200

ACTIONS = ['up', 'left', 'right', 'down']  # These are the available actions for the QLearning.
FIELD = generate_field(m_x, m_y, 5, 10) # The field that is used for learning.

# Plotting the generated field
figure = plt.figure('Field')
plt.axis('off')
plt.imshow(FIELD, interpolation='none')
figure.canvas.draw()

# Generate a QLearning instance with the right parameters.
# YOUR CODE HERE
GAMMA = 0.2
player = QLearning(FIELD, ACTIONS,GAMMA)


# Now we perform `steps` learning iterations on the field with
# the generated QLearning instance.
for i in range(steps):     
    player.update()
    player.plot()

Explain in your own words how the algorithm works. What is depicted on the resulting plots? How can an action policy be derived from these data?

The Q-learning algorithm works as follows:
The aim is to learn the best action for each state. Unlike in supervised learning, there are no target actions available for the learning procedure. The value of an action is determined by the reward it yields and by the value of the best action in the state to which it leads. This is a chicken-and-egg problem, since we do not yet know the best action in that successor state. Q-learning solves this by choosing an action at random and updating its value with the reward plus the discounted best value currently known for the successor state. By repeating this procedure, the algorithm iteratively figures out the best action for each state.
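Regarding the plots and the policy: each of the four panels drawn by `plot` belongs to one action and shows, for every cell of the field, the learned q-value of performing that action there. An action policy can then be read off by picking, in each cell, the action with the largest q-value (a greedy policy). The following is a minimal sketch, assuming a q array laid out like `self.q` (actions x rows x columns); the tiny table and its values are made up for illustration.

```python
import numpy as np

actions = ['up', 'left', 'right', 'down']

# Hypothetical q-table for a 2x2 field, shape (number of actions, rows, columns).
q = np.zeros((len(actions), 2, 2))
q[actions.index('right'), 0, 0] = 1.0
q[actions.index('down'), 0, 1] = 2.0
q[actions.index('up'), 1, 0] = 0.5
q[actions.index('left'), 1, 1] = 0.3

# Greedy policy: for each cell, the index of the action with the highest q-value.
best_action_index = np.argmax(q, axis=0)
policy = np.array(actions)[best_action_index]
print(policy)
```

With a learned table, `player.q` would take the place of the made-up `q`. Cells whose q-values are all still zero have not been explored enough, so the action chosen there by `argmax` is arbitrary.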

You are also free to write your own complete implementation of the QLearning algorithm (instead of completing the code above). Use the following cell for your implementation.

In [None]:
# YOUR CODE HERE