Osnabrück University - Machine Learning (Summer Term 2016) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 09

## Introduction

This week's sheet should be solved and handed in before the end of **Sunday, June 19, 2016**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whomever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 1: Self-Organizing Maps [6 Points]

You have already briefly discussed self-organizing maps. In this exercise you will implement a self-organizing map and use it for a beautiful application: coloring countries with similar statistics in similar colors.

### Additional Information about the Data

The data is taken from the [World Bank's World DataBank](http://databank.worldbank.org/data/home.aspx) and preprocessed. Since their data is very sparse, we used the latest available value for each country in each category. This means a value can be from 1960 but also from 2015; for the exercise this shouldn't matter too much. Note that some countries don't have any data at all.

The [blank map](https://en.wikipedia.org/wiki/File:BlankMap-World6-Equirectangular.svg) is taken from Wikipedia. It is an [SVG](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) file, which suits this task well: we can easily display it in Jupyter Notebooks and it is very easy to color, as this just involves a modification of its style sheet. You can find the code to do this below; you just have to figure out how to use it.

### Coloring the Map

The following cell defines a method to create a colored version of the empty map and shows an example usage of it.

As you can see, the mapping parameter is a dictionary mapping lowercase two-letter [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country codes to `[R, G, B]` values which range from `0` to `1`.
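
To make the color format concrete, here is a minimal sketch of how an `[R, G, B]` triple in the range `0` to `1` can be turned into the hexadecimal color string used for an SVG `fill`. The helper name `rgb_to_hex` is hypothetical and not part of the provided code; the sketch assumes each channel is scaled by 255 and truncated to an integer.

```python
def rgb_to_hex(color):
    """Convert an iterable of three floats in [0, 1] to a CSS hex color."""
    # Scale each channel to 0..255 and truncate to an integer.
    channels = [int(255 * c) for c in color]
    # Format each channel as two lowercase hexadecimal digits.
    return '#' + ''.join('{:02x}'.format(c) for c in channels)


# Same example mapping as in the exercise text.
mapping = {'de': [1, 0, 0], 'fr': [0, 1, 0], 'us': [0, 0, 1]}
hex_mapping = {code: rgb_to_hex(rgb) for code, rgb in mapping.items()}
print(hex_mapping)
```

Values outside `[0, 1]` would produce more than two hex digits or a negative number, so colors should be clipped before they are passed on.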

In [None]:
from IPython.display import SVG, display_svg
# cElementTree was removed in Python 3.9; ElementTree provides the same API.
from xml.etree import ElementTree as ET
import time

def create_colored_map(color_mapping, scaling=0.7, display=True):
    """
    Takes a color mapping to create a world map with the specified 
    colors.
    For example:
    
    mapping = {'de': [1, 0, 0],
               'fr': [0, 1, 0]}
    create_colored_map(mapping)
    
    will create a worldmap and display it with Germany ('de') 
    colored red and France ('fr') colored green.
    Colors need to be iterables containing R G B values ranging
    from 0 (dark) to 1 (bright).
    
    The template used for the map can be found here:
    https://en.wikipedia.org/wiki/File:BlankMap-World6-Equirectangular.svg
    
    Args:
        color_mapping  A color mapping between country codes and colors.
        scaling     Scales the map by this factor. 
        display     If True, the image is displayed, if False, it is
                    returned.
    Returns:
        The svg image if display=False. Else nothing.
    """
    def color_css(color_mapping, map_id):
        """Creates a CSS string for the color mapping."""
        tmpl = '#{4} .{0} {{fill: #{1:0>2x}{2:0>2x}{3:0>2x} !important;}}'
        scale = lambda x : [int(255 * i) for i in x]
        return '\n' + '\n'.join([tmpl.format(country.lower(), *scale(color), map_id) for country, color in color_mapping.items()])

    # Read SVG file and get document root.
    tree = ET.parse('map.svg')
    root = tree.getroot()
    
    # Adjust the ID (Otherwise coloring will be global for all SVG images).
    time.sleep(1)
    root.attrib['id'] = "{}{}".format(root.attrib['id'], str(time.time())[0:10])
    
    # Search the style element and append the color mapping.
    style_element = tree.find('{http://www.w3.org/2000/svg}style')
    style_element.text = style_element.text + color_css(color_mapping, root.attrib['id'])

    # Adjust the image scale.
    root.attrib['height'] = str(float(root.attrib['height']) * scaling)
    root.attrib['width'] = str(float(root.attrib['width']) * scaling)
    
    # Create an SVG instance which can be displayed by Jupyter.
    svg = SVG(data=ET.tostring(root).decode('UTF-8').replace('ns0:',''))
    if display:
        display_svg(svg)
    else:
        return svg

# Example for coloring the map.
mapping = {'de': [1, 0, 0], 
           'fr': [0, 1, 0], 
           'us': [0, 0, 1]}
create_colored_map(mapping, scaling=0.4)

### a) Implement a Self-Organizing Map.

Below is a class definition for a self-organizing map. The initialization is already provided. Follow the instructions below to finish it.

1. Write a method `get_best_matching_index(self, X)` which returns the indices of the node with the weights closest to `X`. Use the `cdist` function to calculate the distances between `X` and all nodes. *Note:* You might need `unravel_index`.

1. Write a method `alpha(self, step, max_steps)` which defines the decaying learning rate. One possible formula for step $s$ of $S$ steps is:
$$\alpha(s, S) = 0.1 \exp\left(-\frac{s}{S-s}\right)$$

1. Write a method `theta(self, u, v, step, max_steps)` which defines the decaying neighborhood function. One possible formula for the coordinates of the best matching node $u$, the coordinates of the node $v$, $n$ the maximum number of nodes in one direction, and step $s$ of $S$ steps is: 
$$\begin{align*}   r &= n \exp\left( -\frac{s \log(n)}{S} \right) \\
  \theta(u, v, s, S) &= \exp\left(-\frac{||u - v||^2}{2r^2}\right)\end{align*}$$

1. Write a method `organize(self, max_steps)` which trains the map for `max_steps` steps. In each step, pick a random data sample $X$, calculate the best matching indices $u$ and update each node $v_i$ (with $w_{v_i}$ being the corresponding weight vector) according to the following formula ($s$, $S$, $\theta$ and $\alpha$ are as above): 
$$\Delta w_{v_i} = \theta(u, v_i, s, S)\ (X - w_{v_i})\ \alpha(s, S)$$
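
The formulas above can be tried out in isolation before they are put into the class. The following is a minimal sketch, not the reference solution: the free functions `alpha`, `radius` and `theta_grid` are hypothetical names, and `theta_grid` evaluates the neighborhood function for all nodes at once instead of one node at a time.

```python
import numpy as np


def alpha(step, max_steps):
    """Learning rate: 0.1 at step 0, decaying towards 0."""
    return 0.1 * np.exp(-step / (max_steps - step))


def radius(step, max_steps, n):
    """Neighborhood radius r: shrinks from n at step 0 to 1 at step S."""
    return n * np.exp(-step * np.log(n) / max_steps)


def theta_grid(u, grid_shape, step, max_steps):
    """Neighborhood weights of all nodes relative to the best match u."""
    r = radius(step, max_steps, max(grid_shape))
    # coords has shape (rows, cols, 2) and holds each node's grid position.
    coords = np.moveaxis(np.indices(grid_shape), 0, -1)
    squared_dist = np.sum((coords - np.asarray(u)) ** 2, axis=-1)
    return np.exp(-squared_dist / (2 * r ** 2))


S = 400
weights = theta_grid(u=(3, 4), grid_shape=(20, 20), step=0, max_steps=S)

# One update step for a random 20x20 map with three features per node.
nodes = np.random.rand(20, 20, 3)
X = np.array([1.0, 0.0, 0.0])
nodes += alpha(0, S) * weights[..., np.newaxis] * (X - nodes)
print(alpha(0, S), radius(0, S, 20), radius(S, S, 20))
```

Note that `alpha` divides by `max_steps - step`, so it must only be called for `step < max_steps`, which holds when the steps are counted from `0` to `max_steps - 1`.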

In [None]:
%matplotlib inline
import numpy as np
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt

class SelfOrganizingMap:
    """Implements a self-organizing map."""
    
    def __init__(self, data, map_size=(20,20), method='distance'):
        """
        Creates a grid self.nodes of map_size[0] x map_size[1] 
        many nodes with random weights for each dimension in the 
        data. This means self.nodes.shape will be 
        (map_size[0], map_size[1], data.shape[-1]).
        
        Stores the data in self.data.
        
        Args:
            data        The data to fit with this map.
            map_size    The size of the map. Defaults to 20x20.
            method      The activation method. Supports 'distance' and 'activation'.
        """
        self.nodes = np.random.rand(map_size[0], map_size[1], data.shape[-1])
        self.data = data
        self.method = method
    
    def theta(self, u, v, step, max_steps):
        """
        The neighborhood function. 
        
        Args:
            u           The best matching node's grid coordinates.
            v           Grid coordinates of the (possible) neighboring node.
            step        The current step.
            max_steps   The maximum number of steps.
        
        Returns:
            A weight of how strong the neighborhood relation between
            u and v is at step of max_steps.
        """
        # TODO: Implement this method.
        max_shape = max(self.nodes.shape[0:2])
        r = max_shape * np.exp(-(step * np.log(max_shape)) / max_steps)
        return np.exp(-np.linalg.norm(u - v) ** 2 / (2 * r ** 2))

    def alpha(self, step, max_steps):
        """
        The learning rate. Decays with step.
        
        Args:
            step        The current step.
            max_steps   The maximum number of steps.
        """
        # TODO: Implement this method.
        return 0.1 * np.exp(-step / (max_steps - step))

    def organize(self, max_steps):
        """
        Organizes the map with its data for max_steps steps.
        
        In each step it picks a random sample from the data and
        calculates the best matching node.
        The best matching node's indices are calculated with 
        get_best_matching_index.
        Using the indices of that node, all nodes are
        updated by applying alpha and theta.
        
        Args:
            max_steps   The number of steps.
        """
        # TODO: Implement this method.
        for step in range(max_steps):
            X = self.data[np.random.randint(0, len(self.data))]
            best_match = self.get_best_matching_index(X)

            for row_idx in range(self.nodes.shape[0]):
                for col_idx in range(self.nodes.shape[1]):
                    theta = self.theta(best_match, np.array([row_idx, col_idx]), step, max_steps)
                    alpha = self.alpha(step, max_steps)
                    delta = X - self.nodes[row_idx, col_idx]
                    self.nodes[row_idx, col_idx] += theta * delta * alpha
    
    def get_best_matching_index(self, X):
        """
        Calculates the best matching node for data sample X.
        Depending on the method used (see __init__), a different
        approach is used.
        
        method 'distance': 
            Finds the best matching node by minimal distance:
                argmin(||n-x||)
        other method ('activation'): 
            Finds the best matching node by maximal excitation:
                argmax(nx)
        
        Args:
            X       The data point.
        Returns:
            The grid coordinates of the best matching node.
        """
        # TODO: Implement this method.
        reshaped_nodes = self.nodes.reshape(-1, self.nodes.shape[-1])
        if self.method == 'distance':
            distances = cdist(X[np.newaxis], reshaped_nodes)
            best_matching_1D_index = np.argmin(distances)
        else: # method == 'activation'
            activations = np.sum(np.multiply(X[np.newaxis], reshaped_nodes), 1)
            best_matching_1D_index = np.argmax(activations)
        return np.array(np.unravel_index(best_matching_1D_index, self.nodes.shape[0:2]))

    def __getitem__(self, key):
        """
        Allows accessing the nodes via the self-organizing map directly.
        
        som[4,2] 
        is thus same as 
        som.nodes[4,2]
        
        Args:
            key The key (can be a slice or similar).
        Returns:
            self.nodes[key]
        """
        return self.nodes[key]

    def plot(self):
        """Plots the map's first three features as an image."""
        plt.imshow(self.nodes[:,:,0:3], interpolation='none')

### b) Apply the Self-Organizing Map

Now apply your self-organizing map on some data.

We already generate simple color data for you - you can change it if you like.

1. Load `world_data.csv`. We recommend using a `csv.reader` for this, as the first column contains strings: the labels you need to use to accomplish the mapping.

1. The data has some invalid values (`np.nan`). Use the `Imputer` (check the imports; since scikit-learn 0.20 the equivalent class is `sklearn.impute.SimpleImputer`) to fill them.

1. Additionally the data has to be scaled. Use `scale` (check the imports) for this.

1. Create two instances of the `SelfOrganizingMap` and organize them, one for the colors and one for the countries. Take care that both have the same sizes.
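
To see what the two preprocessing steps do, here is a minimal NumPy-only sketch under the assumption that missing values are replaced by the column mean and that scaling means zero mean and unit variance per column. The helper names `impute_mean` and `standardize` and the toy data are hypothetical; in the exercise itself the scikit-learn functions from the imports should be used.

```python
import numpy as np


def impute_mean(data):
    """Replace NaNs by the mean of the non-NaN values in each column."""
    column_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), column_means, data)


def standardize(data):
    """Shift each column to zero mean and scale it to unit variance."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


# Hypothetical toy data: 4 samples, 2 features, one missing value.
toy = np.array([[1.0, 10.0],
                [2.0, np.nan],
                [3.0, 30.0],
                [4.0, 20.0]])
prepared = standardize(impute_mean(toy))
print(prepared.mean(axis=0), prepared.std(axis=0))
```

This sketch does not handle columns that are entirely `NaN` or have zero variance; both would need special treatment.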

In [None]:
%matplotlib inline
import csv
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import scale

# Generate color data.
colors = np.array(list(itertools.product([0, 1], repeat=3)))

# TODO: Read world data.
with open('world_data.csv', 'r') as in_file:
    input_data = list(csv.reader(in_file))[1:]
countries = np.array([[d[0], d[1]] for d in input_data], dtype=str)
country_data = np.array([d[2:] for d in input_data], dtype=float)
# SimpleImputer replaces the Imputer class that was removed in scikit-learn 0.22.
# By default it fills NaNs with the column mean, like Imputer(axis=0) did.
country_data = scale(SimpleImputer().fit_transform(country_data), axis=0)

map_size = (20, 20)
max_steps = 400

# TODO: Create color map and organize it.
som_colors = SelfOrganizingMap(colors, map_size)
som_colors.organize(max_steps)
print('Organized colors by distance.')

som_colors_act = SelfOrganizingMap(colors, map_size, method='activation')
som_colors_act.organize(max_steps)
print('Organized colors by activation.')

# TODO: Create country map and organize it.
som_countries = SelfOrganizingMap(country_data, map_size)
som_countries.organize(max_steps)
print('Organized countries by distance.')

som_countries_act = SelfOrganizingMap(country_data, map_size, method='activation')
som_countries_act.organize(max_steps)
print('Organized countries by activation.')

# Take a look at the results.
plt.figure('SOM')
plt.subplot(221).set_title('Colors by Distance')
som_colors.plot()
plt.subplot(222).set_title('Countries by Distance')
som_countries.plot()
plt.subplot(223).set_title('Colors by Activation')
som_colors_act.plot()
plt.subplot(224).set_title('Countries by Activation')
som_countries_act.plot()

Select the best matching indices for each country from the country map. Use those indices to select the corresponding color from the color map. Create the mapping from ISO codes to colors and use the `create_colored_map` function to produce the colored SVG map.
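
The lookup described above can be sketched with two tiny hand-made maps. Everything in this example is hypothetical (the 2x2 grids, the country codes `'aa'` and `'bb'`, and the free function `best_matching_index`); it only illustrates that the grid position found in the country map is reused as an index into the color map.

```python
import numpy as np


def best_matching_index(nodes, x):
    """Grid coordinates of the node whose weights are closest to x."""
    flat = nodes.reshape(-1, nodes.shape[-1])
    distances = np.linalg.norm(flat - x, axis=1)
    return np.unravel_index(np.argmin(distances), nodes.shape[:2])


# Hypothetical 2x2 maps standing in for the two trained SOMs.
country_nodes = np.array([[[0.0, 0.0], [0.0, 1.0]],
                          [[1.0, 0.0], [1.0, 1.0]]])
color_nodes = np.array([[[1, 0, 0], [0, 1, 0]],
                        [[0, 0, 1], [1, 1, 1]]], dtype=float)

# Hypothetical samples labelled with lowercase country codes.
samples = {'aa': np.array([0.1, 0.9]), 'bb': np.array([0.8, 0.2])}

mapping = {}
for code, features in samples.items():
    row, col = best_matching_index(country_nodes, features)
    # Same grid position, but looked up in the color map.
    mapping[code] = color_nodes[row, col].tolist()
print(mapping)
```

This only works if both maps have the same grid size, which is why the two `SelfOrganizingMap` instances must be created with the same `map_size`.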

In [None]:
# TODO: Match countries to colors.
mapping = {}
for i, country_d in enumerate(country_data):
    x, y = som_countries.get_best_matching_index(country_d)
    mapping[countries[i,1]] = som_colors[x,y]

# TODO: Create colored map.
create_colored_map(mapping)

# TODO: Match countries to colors.
mapping_act = {}
for i, country_d in enumerate(country_data):
    x, y = som_countries_act.get_best_matching_index(country_d)
    mapping_act[countries[i,1]] = som_colors_act[x,y]

# TODO: Create colored map.
create_colored_map(mapping_act)

## Assignment 2: Self-Organizing Maps Theory [4 Points]

This exercise will highlight the theoretical differences between SOMs and other algorithms we have already looked at. There is again some research involved if the answers are not directly clear from the slides (or even better: your own ideas!).

### a) How is learning in such a network achieved? (As opposed to techniques used in MLP?) 

### b) In the task above we initialize the nodes randomly - what would be an alternative?

### c) Why are self-organizing maps possibly interesting for cognitive scientists in general?

## Assignment 3: Probability Theory [4 Points]

Consider three bags filled with three types of candy. The table below indicates for each bag how many candies of each type it contains.

| contains        || green candy | blue candy | red candy || total |
|-----------------||-------------|------------|-----------||-------|
|                 ||             |            |           ||       |
| **cyan bag**    ||          10 |          4 |         2 ||    16 |
| **magenta bag** ||           5 |          7 |         2 ||    14 |
| **yellow bag**  ||           2 |          2 |         8 ||    12 |
|                 ||             |            |           ||       |
| **total**       ||          17 |         13 |        12 ||    42 |

In the following we denote the bag by $B \in \{c,m,y\}$ and the candy by $C \in \{r, g, b\}$. So the probability for drawing a blue candy from the cyan bag would be: $P(C=b|B=c)=\frac{4}{16}=0.25$.

### a)

Give the probabilities for the following events:

$$
\begin{align*}
P(C=b|B=m) &= \frac{7}{14} = 0.5 \\
P(C=g|B=y) &= \frac{2}{12} = 0.167 \\
P(C=r) &= \frac{12}{42} = 0.286 \\
P(B=y|C=r) &= \frac{8}{12} = 0.667 \\ 
P(B=c)     &= \frac{16}{42} = 0.381
\end{align*}
$$

Note that $P(C=r)$ and $P(B=c)$ can be interpreted in two ways. In the above solution we assumed that every candy is equally likely to be drawn, regardless of the bag it is in. If you incorporate that a bag is chosen first (each with probability $\frac{1}{3}$) and a candy is then drawn from it, the probabilities change to:

$$
\begin{align*}
P(C=r)'     &= \frac{1}{3}\left(\frac{2}{16}+\frac{2}{14}+\frac{8}{12}\right) = 0.312\\
P(B=y|C=r)' &=  \frac{\frac{1}{3}*\frac{8}{12}}{\frac{157}{504}} = 0.713 \\
\end{align*}
$$

Likewise $P(B=c)$ could also be different: If we don't take a look at the candies inside (i.e. don't assume we draw each candy equally likely but draw from the bags equally likely) the probability can be just set a priori to $P(B=c) = \frac{1}{3}$.
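
Both interpretations can be checked with exact fractions. The following is a small sketch; the variable names are chosen for illustration, and the counts are copied from the table above.

```python
from fractions import Fraction

# Candy counts per bag, copied from the table above.
counts = {
    'c': {'g': 10, 'b': 4, 'r': 2},
    'm': {'g': 5, 'b': 7, 'r': 2},
    'y': {'g': 2, 'b': 2, 'r': 8},
}
bag_totals = {bag: sum(candies.values()) for bag, candies in counts.items()}
all_candies = sum(bag_totals.values())
all_red = sum(candies['r'] for candies in counts.values())

# Interpretation 1: every candy is equally likely to be drawn.
p_red_pooled = Fraction(all_red, all_candies)
p_yellow_given_red_pooled = Fraction(counts['y']['r'], all_red)

# Interpretation 2: pick a bag uniformly, then a candy from that bag.
p_bag = Fraction(1, 3)
p_red_uniform = sum(p_bag * Fraction(counts[b]['r'], bag_totals[b])
                    for b in counts)
p_yellow_given_red_uniform = (
    p_bag * Fraction(counts['y']['r'], bag_totals['y']) / p_red_uniform)

print(p_red_pooled, p_yellow_given_red_pooled)
print(p_red_uniform, p_yellow_given_red_uniform)
```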

### b) 

Let's assume we draw with the following probabilities from each bag: $P(B=c)=0.2$, $P(B=m)=0.7$, $P(B=y)=0.1$.
What are the probabilities to draw a green, blue or red candy?

$$\begin{align*}
P(C=g) &= \sum\limits_{x \in B}{\left(P(B=x)\ P(C=g|B=x)\right)} = 0.2\frac{10}{16} + 0.7\frac{5}{14} + 0.1\frac{2}{12} = 0.392 \\
P(C=b) &= \sum\limits_{x \in B}{\left(P(B=x)\ P(C=b|B=x)\right)} = 0.2\frac{4}{16} + 0.7\frac{7}{14} + 0.1\frac{2}{12} = 0.417 \\
P(C=r) &= \sum\limits_{x \in B}{\left(P(B=x)\ P(C=r|B=x)\right)} = 0.2\frac{2}{16} + 0.7\frac{2}{14} + 0.1\frac{8}{12} = 0.192 \\
\end{align*}$$
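
The same sums can be evaluated with exact fractions as a cross-check. This is a minimal sketch of the law of total probability; the function name `p_candy` is hypothetical, and the counts and bag probabilities are taken from the text above.

```python
from fractions import Fraction

# Candy counts per bag, copied from the table above.
counts = {
    'c': {'g': 10, 'b': 4, 'r': 2},
    'm': {'g': 5, 'b': 7, 'r': 2},
    'y': {'g': 2, 'b': 2, 'r': 8},
}
# Probabilities of drawing from each bag, as given in b).
priors = {'c': Fraction(2, 10), 'm': Fraction(7, 10), 'y': Fraction(1, 10)}


def p_candy(candy):
    """P(C=candy) = sum over bags x of P(B=x) * P(C=candy | B=x)."""
    return sum(
        priors[bag] * Fraction(counts[bag][candy], sum(counts[bag].values()))
        for bag in counts)


probabilities = {candy: p_candy(candy) for candy in 'gbr'}
for candy, p in probabilities.items():
    print(candy, p, round(float(p), 3))
```

Because the three candy types are exhaustive and mutually exclusive, the three results must sum to exactly 1.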

## Assignment 4: Bayes classifier [6 Points]

Consider the following data set. There are four features, runny nose ($N$), coughing ($C$), reddened skin ($R$), and fever ($F$), each of which can take the values true ($+$) or false ($-$).

| Diagnosis ID  | $N$ | $C$ | $R$ | $F$ | Classification     |
|---------------|-----|-----|-----|-----|--------------------|
|     $d_1$     | $+$ | $+$ | $+$ | $-$ | positive (ill)     |
|     $d_2$     | $+$ | $+$ | $-$ | $-$ | positive (ill)     |
|     $d_3$     | $-$ | $-$ | $+$ | $+$ | positive (ill)     |
|     $d_4$     | $+$ | $-$ | $-$ | $-$ | negative (healthy) |
|     $d_5$     | $-$ | $-$ | $-$ | $-$ | negative (healthy) |
|     $d_6$     | $-$ | $+$ | $+$ | $-$ | negative (healthy) |

Solve the following problems either by hand or programmatically. Assume all features are conditionally independent given the classification (the naive Bayes assumption).

### a)

Determine all probabilities required to apply a Bayes classifier for predicting whether a new person is ill or not.

$$\begin{align*}
P(ill) &= \frac{1}{2} & P(healthy) &= \frac{1}{2} \\
\\
x &\in \{N, C, R\}\\
P(x_\oplus)   &= \frac{1}{2} & P(x_\ominus)   &= \frac{1}{2} \\
P(x_\oplus\,|\,ill) &= \frac{2}{3} & P(x_\ominus\,|\,ill) &= \frac{1}{3} \\
P(x_\oplus\,|\,healthy) &= \frac{1}{3} & P(x_\ominus\,|\,healthy) &= \frac{2}{3} \\
\\
P(F_\oplus)   &= \frac{1}{6} & P(F_\ominus)   &= \frac{5}{6} \\
P(F_\oplus\,|\,ill) &= \frac{1}{3} & P(F_\ominus\,|\,ill) &= \frac{2}{3} \\
P(F_\oplus\,|\,healthy) &= 0   & P(F_\ominus\,|\,healthy) &= 1   \\
\end{align*}$$
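
For the programmatic route, the probabilities can be estimated as relative frequencies directly from the table. The following is a minimal sketch; the encoding of the table as tuples and the function names `prior` and `likelihood` are assumptions made for this example.

```python
from fractions import Fraction

FEATURES = ('N', 'C', 'R', 'F')
# Rows (N, C, R, F, ill) for d_1 ... d_6, copied from the table above.
DATA = [
    (True, True, True, False, True),
    (True, True, False, False, True),
    (False, False, True, True, True),
    (True, False, False, False, False),
    (False, False, False, False, False),
    (False, True, True, False, False),
]


def prior(ill):
    """P(class), estimated by relative frequency."""
    return Fraction(sum(row[-1] == ill for row in DATA), len(DATA))


def likelihood(feature, value, ill):
    """P(feature = value | class), estimated by relative frequency."""
    idx = FEATURES.index(feature)
    in_class = [row for row in DATA if row[-1] == ill]
    return Fraction(sum(row[idx] == value for row in in_class), len(in_class))


for feature in FEATURES:
    print(feature,
          likelihood(feature, True, True),
          likelihood(feature, True, False))
```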

### b)
Person $p_1$ is coughing and has a fever. Person $p_2$ has a runny nose and reddened skin. Person $p_3$ is coughing, suffers from reddened skin and has a fever. Determine the probability of being ill for each of the persons $p_1, p_2, p_3$. Symptoms that are not mentioned are treated as absent.

Using an [alternative form of the Bayes' Theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem#Alternative_form):
$$P(A\,|\,B) = \frac{P(B\,|\,A)*P(A)}{(P(B\,|\,A)*P(A)) + P(B\,|\,\neg A)*P(\neg A)}$$

$$\begin{align*}
P_{p_1}(ill\,|\,N_\ominus, C_\oplus, R_\ominus, F_\oplus) 
&= 
\frac{
  \frac{1}{3}\frac{2}{3}\frac{1}{3}\frac{1}{3} * \frac{1}{2}
}{
  \frac{1}{3}\frac{2}{3}\frac{1}{3}\frac{1}{3} * \frac{1}{2} + 
  \frac{2}{3}\frac{1}{3}\frac{2}{3}0 * \frac{1}{2}
}
= \frac{\frac{2}{81}\frac{1}{2}}{\frac{1}{81}+0} 
= 1
\\
P_{p_2}(ill\,|\,N_\oplus, C_\ominus, R_\oplus, F_\ominus) 
&= 
\frac{
  \frac{2}{3}\frac{1}{3}\frac{2}{3}\frac{2}{3} * \frac{1}{2}
}{
  \frac{2}{3}\frac{1}{3}\frac{2}{3}\frac{2}{3} * \frac{1}{2} + 
  \frac{1}{3}\frac{2}{3}\frac{1}{3}1 * \frac{1}{2}
}
= \frac{\frac{8}{81}\frac{1}{2}}{\frac{4}{81}+\frac{1}{27}} 
= \frac{4}{7} 
\\
P_{p_3}(ill\,|\,N_\ominus, C_\oplus, R_\oplus, F_\oplus) 
&= 
\frac{
  \frac{1}{3}\frac{2}{3}\frac{2}{3}\frac{1}{3} * \frac{1}{2}
}{
  \frac{1}{3}\frac{2}{3}\frac{2}{3}\frac{1}{3} * \frac{1}{2} + 
  \frac{2}{3}\frac{1}{3}\frac{1}{3}0 * \frac{1}{2}
}
= \frac{\frac{4}{81}\frac{1}{2}}{\frac{2}{81}+0} 
= 1
\end{align*}$$
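
These posteriors can be verified with a few lines of code. This is a minimal sketch that assumes the conditional probabilities from part a) and treats every symptom that is not listed for a person as absent; the names `P_PLUS`, `joint` and `posterior_ill` are hypothetical.

```python
from fractions import Fraction

PRIOR_ILL = Fraction(1, 2)
# P(feature = + | class) from part a); key True means ill, False healthy.
P_PLUS = {
    True: {'N': Fraction(2, 3), 'C': Fraction(2, 3),
           'R': Fraction(2, 3), 'F': Fraction(1, 3)},
    False: {'N': Fraction(1, 3), 'C': Fraction(1, 3),
            'R': Fraction(1, 3), 'F': Fraction(0)},
}


def joint(symptoms, ill):
    """P(symptoms | class) * P(class) under conditional independence."""
    result = PRIOR_ILL if ill else 1 - PRIOR_ILL
    for feature, p_plus in P_PLUS[ill].items():
        # Present symptoms contribute P(+ | class), absent ones P(- | class).
        result *= p_plus if feature in symptoms else 1 - p_plus
    return result


def posterior_ill(symptoms):
    """P(ill | symptoms) via the alternative form of Bayes' theorem."""
    ill, healthy = joint(symptoms, True), joint(symptoms, False)
    return ill / (ill + healthy)


persons = {'p1': {'C', 'F'}, 'p2': {'N', 'R'}, 'p3': {'C', 'R', 'F'}}
posteriors = {name: posterior_ill(s) for name, s in persons.items()}
print(posteriors)
```

The posterior of exactly 1 for $p_1$ and $p_3$ comes from $P(F_\oplus\,|\,healthy) = 0$: a single zero estimate removes the healthy class entirely, no matter what the other features say.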