Osnabrück University - Machine Learning (Summer Term 2016) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 01: Concept Learning

**IMPORTANT NOTE:** The provided PDF only contains information on how to get Jupyter to run. You will need Jupyter to open the `*.ipynb` file, which contains the rest (and the more important part) of the sheet. [Assignment 0](#Assignment-0:-Setup-your-homework-environment) will walk you through that process.

## Introduction

This is only a part of the first exercise sheet; you will find the rest in the `*.ipynb` file. The homework sheets will usually be available on Tuesdays and are supposed to be solved in groups of three. They have to be handed in before Monday morning of the following week. The exercises are then presented to your tutor in a small feedback session. To be admitted to the final exam, you have to pass $N-2$ of the $N$ weekly exercise sheets.

Sign up for a group on Stud.IP (see `Participants` -> `Functions/Groups`). The times mentioned there are the times for the feedback session of your group. If none of them fits, send any of the tutors an e-mail so we can try to arrange something.

Your group will have a group folder in Stud.IP under `Documents`. Upload your solutions there to hand them in.

All exercise sheets will use [Jupyter Notebooks](http://jupyter-notebook.readthedocs.org/en/latest/notebook.html). To be able to run these on your system, you will need to install Python and a few packages. We suggest using the latest version of Python 3. In case you are not familiar with it, follow the instructions below ([Assignment 0a)](#a%29-Install-Python)) to get it up and running. [Assignment 0b)](#b%29-Run-Jupyter-Notebooks) on this sheet will provide details on how to run the notebooks with Jupyter.

We will offer an open help session if you need help with installing and getting things to run: On **Thursday, April 14, 2016 between 12:30 and 16:00** you will find some tutors in **93/E42** who will help you.

## Assignment 0: Setup your homework environment

### a) Install Python

To be able to run Jupyter Notebooks you will need Python. Follow this exercise to get everything up and running.

#### UNIX (e.g. Ubuntu)

The following commands will install Python and the components required to build some of the packages we will use.

```sh
sudo apt-get install build-essential python3-dev python3 python3-pip
pip3 install --upgrade pip
pip3 install jupyter numpy matplotlib
```

#### macOS

We recommend using Homebrew to install Python.

```sh
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install python3
pip3 install --upgrade pip
pip3 install jupyter numpy matplotlib
```

#### Windows

Go to [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/) and download the _Latest Python 3 Release_. Install it and make sure that _Add to PATH_ is checked during the installation.

Open your command line (`START` → `cmd.exe`). Type the following commands:

```sh
pip install --upgrade pip
pip install jupyter numpy matplotlib
```

If some of those installations fail, first check whether `pip` itself runs. If you see `'pip' is not recognized as an internal or external command, operable program or batch file.`, rerun the Python installation with _Add to PATH_ checked, or try restarting your computer. Otherwise, the problem is probably that some packages fail to compile. In that case, try to find prebuilt versions on [http://www.lfd.uci.edu/~gohlke/pythonlibs/](http://www.lfd.uci.edu/~gohlke/pythonlibs/), download them, and install them with the following command, replacing `*.whl` with the name of each downloaded file:

```sh
pip install *.whl
```
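
Whichever platform you are on, a quick way to check the result is to ask Python whether it can find the installed packages. The following is only a minimal sketch: `check_packages` is a hypothetical helper name, and it assumes that `pip install jupyter` pulled in the `notebook` package (the one behind `jupyter notebook`).

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages from the installation steps above ('notebook' is an assumption,
# see the text before this snippet).
for name, found in check_packages(['notebook', 'numpy', 'matplotlib']).items():
    print('{:<12}{}'.format(name, 'ok' if found else 'MISSING'))
```

Save this as a file and run it with `python3` (or `python` on Windows). A package reported as `MISSING` was not installed for the Python interpreter you used to run the check.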


### b) Run Jupyter Notebooks

After you have installed Python and Jupyter, verify that you can run the notebook server: open your command line, navigate to the directory where you downloaded `sheet01.ipynb` (e.g. `~/university/ML2016` or `C:\Users\Documents\University\ML2016`), and run Jupyter in that directory.

```sh
cd ~/university/ML2016
jupyter notebook
```

Usually a browser window should open up. If not, open your favorite web browser and navigate to [localhost:8888/tree](http://localhost:8888/tree). (In some browsers, e.g. Google Chrome, there is a small display bug with $\LaTeX$ output: after each equation there will be a trailing `|`.)

You will be presented with a list of files. Choose `sheet01.ipynb`, and you are good to go: you can start working on your homework right away!

If you experience any trouble, remember to stop by the help session on Thursday, April 14, use the Stud.IP forum or send us an e-mail - we are always happy to help.

## Assignment 1: Candidate Elimination (by Hand) [6 Points]

Candidate Elimination is a learning algorithm that, in each step, tries to generate a description which is consistent with all previously observed examples in a training set. That description could hypothetically then be used to classify examples outside the training set.

Consider the following situation:

Earl and Fran have made it their mission to visit as many amusement parks as possible in the coming summer term. To maximize their enjoyment and avoid unnecessary arguments, they make a list of previous park visits and note whether they would go there again. From this list they want to derive a few criteria for deciding whether a park is worth their time.

This is the set of attributes along with their possible values Earl and Fran came up with:

| Attribute           | driving distance | ticket price      | rollercoasters | dinosaurs |
|---------------------|------------------|-------------------|----------------|-----------|
| **Possible Values** | short / far      | cheap / expensive | many / none    | yes / no  |
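
To get a feeling for the size of the problem, the table can be turned into a short enumeration. This is only an illustrative sketch and not part of the assignment code: the variable names are made up here, and the symbols `'?'` (any value is acceptable) and `'0'` (no value is acceptable) follow the notation used for `G` and `S` in Assignment 2.

```python
from itertools import product

# Attribute values as listed in the table above.
attribute_values = (
    ['short', 'far'],
    ['cheap', 'expensive'],
    ['many', 'none'],
    ['yes', 'no'],
)

# Every possible park description (instance): one value per attribute.
instances = list(product(*attribute_values))

# Every syntactically distinct hypothesis: per attribute either a concrete
# value, '?' (any value is acceptable) or '0' (no value is acceptable).
hypotheses = list(product(*[values + ['?', '0'] for values in attribute_values]))

print(len(instances))   # 16
print(len(hypotheses))  # 256
```

With four two-valued attributes there are $2^4 = 16$ different parks and $4^4 = 256$ syntactically distinct hypotheses.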

This is Earl and Fran's accumulated data from previous visits. From this list you can learn which properties a park must have for the two to enjoy a visit.

| Sample No. | driving distance | ticket price | rollercoasters | dinosaurs | go again? |
|------------|------------------|--------------|----------------|-----------|-----------|
| 1          | far              | cheap        | many           | no        | yes       |
| 2          | short            | expensive    | many           | no        | yes       |
| 3          | far              | expensive    | none           | yes       | no        |
| 4          | short            | cheap        | none           | yes       | no        |
| 5          | short            | cheap        | many           | yes       | yes       |

### a)

Apply Candidate Elimination to the samples 1-5 above and provide the version space boundaries $S_n$ and $G_n$ after each new training sample.

### b)

Provide the complete version space bounded by $S_2$ and $G_2$.

### c)

To what kind of amusement park should Earl and Fran go?

## Assignment 2: Candidate Elimination (in Python) [10 Points]

In the following Python code there are four places marked with 

```python
# TODO: ...
``` 

where you have to add some code to make Candidate Elimination work. Finish the code to automate the decision making for Earl and Fran.

```python
# maximally general hypothesis
G = [('?', '?', '?', '?')]
# maximally specific hypothesis
S = [('0', '0', '0', '0')]

# attribute values
AV = (['short', 'far'], ['cheap', 'expensive'], ['many', 'none'], ['yes', 'no'])

# samples
D = [
    {'sample': ('far',   'cheap',     'many', 'no' ), 'positive': True },
    {'sample': ('short', 'expensive', 'many', 'no' ), 'positive': True },
    {'sample': ('far',   'expensive', 'none', 'yes'), 'positive': False},
    {'sample': ('short', 'cheap',     'none', 'yes'), 'positive': False},
    {'sample': ('short', 'cheap',     'many', 'yes'), 'positive': True }
]
```

```python
def consistent(hypothesis, sample):
    """
    Checks if a general hypothesis is consistent with a sample.
    """
    return all([hypothesis[i] == sample[i] or hypothesis[i] == '?' for i in range(len(hypothesis))])
```
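
As a small usage illustration of `consistent`, the snippet below checks a few hand-picked hypotheses against the first sample. The function is repeated so that the snippet runs on its own. The variable `park` and the example hypotheses are chosen here for illustration only and are not part of the solution.

```python
def consistent(hypothesis, sample):
    """Same check as defined above, repeated so this snippet is self-contained."""
    return all([hypothesis[i] == sample[i] or hypothesis[i] == '?' for i in range(len(hypothesis))])


park = ('far', 'cheap', 'many', 'no')  # sample 1 from the table

print(consistent(('?', '?', '?', '?'), park))       # True: '?' accepts every value
print(consistent(('far', '?', 'many', '?'), park))  # True: all fixed values match
print(consistent(('short', '?', '?', '?'), park))   # False: 'short' != 'far'
print(consistent(('0', '0', '0', '0'), park))       # False: '0' accepts no value
```

Note that in this scaffold "consistent" only means that the hypothesis covers the sample. Whether the sample is a positive or a negative example is handled separately in the main loop.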

```python
def more_general(a, b):
    """
    Checks if a is more general than b.
    """
    # TODO: check if a is more general than b
    return True
```

```python
def more_specific(a, b):
    """
    Checks if a is more specific than b.
    """
    # TODO: check if a is more specific than b
    return True
```

```python
for d in D:
    if d['positive']:
        # Remove from G any hypothesis that does not cover the positive sample
        G = [g for g in G if consistent(g, d['sample'])]
        # Iterate over a copy, since S is modified inside the loop
        for s in list(S):
            if not consistent(s, d['sample']):
                S.remove(s)

                # Add to S all minimal generalizations h of s
                # TODO: change h = ('?', '?', '?', '?') such that it assigns the minimal generalization h
                # instead of the most general hypothesis
                h = ('?', '?', '?', '?')
                if consistent(h, d['sample']) and any([more_general(g, h) for g in G]):
                    S.append(h)

                # Remove from S any hypothesis that is more general than another hypothesis in S
                for s2 in list(S):
                    if any([more_general(s2, s3) and not s2 == s3 for s3 in S]):
                        S.remove(s2)

    else:
        # Remove from S any hypothesis that covers the negative sample
        S = [s for s in S if not consistent(s, d['sample'])]
        # Iterate over a copy, since G is modified inside the loop
        for g in list(G):
            if consistent(g, d['sample']):
                G.remove(g)

                # Add to G all minimal specializations h of g
                for ai in range(len(AV)):
                    if g[ai] == '?':
                        h = list(g)
                        # Every attribute has exactly two values: pick the one the sample does not have
                        h[ai] = AV[ai][1 - AV[ai].index(d['sample'][ai])]
                        h = tuple(h)
                        if not consistent(h, d['sample']) and any([more_specific(s, h) for s in S]):
                            G.append(h)

                # Remove from G any hypothesis that is less general than another hypothesis in G
                # TODO: remove the hypotheses as mentioned above:

    print('Sample: {} {}\nG: {}\nS: {}\n'.format('+' if d['positive'] else '-', d['sample'], G, S))
```

## Assignment 3: Inductive Bias [4 Points]

### a) 

What is an inductive bias? Describe in your own words!

### b)

Which of the learning algorithms you heard about in the lecture (Candidate Elimination and Find-S) has the stronger bias?