# Modelling of Cognitive Processes
## Delta learning 
---
Lesson 06   
29/10/2019   
Pieter Huycke   

# Overview

## Theoretical
- Delta learning: quick recap

## Practical
1. Florence + the machine: a Delta learning tutorial
2. Modelling the blocking effect
3. Delta learning: DIY with the iris dataset

# Theory

## The delta rule

In the last theoretical lesson, we considered the mean squared error (MSE) function, which has the following form:

$$E = \frac{1}{n} \sum_{i=1}^{n} (t_{i} - y_{i})^2$$

where $t_{i}$ are the target values provided by the supervisor and $y_{i}$ are the values predicted by the model, for each of the $n$ output units.   
Hence, this type of learning belongs to the category of __supervised learning__.   
Recall what minimizing this function means: we want the predicted values $y_{i}$ to be as close as possible to the 'required' values $t_{i}$.
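As a quick illustration, the MSE can be computed directly with numpy. The target and prediction values below are made-up example numbers (a 4-unit target pattern and a model that outputs 0.5 everywhere), chosen only to show the formula at work.

```python
import numpy as np

def mean_squared_error(t, y):
    """E = (1/n) * sum_i (t_i - y_i)^2 for targets t and predictions y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((t - y) ** 2)

targets = [0.99, 0.99, 0.01, 0.01]     # supervisor values t_i (example)
predictions = [0.5, 0.5, 0.5, 0.5]     # model outputs y_i (example)

E = mean_squared_error(targets, predictions)
print(E)   # every unit is off by 0.49, so E is about 0.49 ** 2 = 0.2401
```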

## The delta rule

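To minimize $E$, we apply gradient descent in weight space: each weight $w_{ij}$ (from input unit $j$ to output unit $i$) is changed in the direction opposite to the gradient of $E$, with a step size set by the learning rate $\beta$: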

$$\Delta w_{ij} = - \beta \frac{\partial E}{\partial w_{ij}}$$

Mind that the notation $\frac{\partial f(x,y)}{\partial y}$ refers to the partial derivative of the function $f(x,y)$ with respect to variable $y$.
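To make the notation concrete, here is a minimal sketch that estimates $\frac{\partial E}{\partial w_{j}}$ numerically for a single linear output unit and then repeatedly applies $\Delta w_{j} = - \beta \frac{\partial E}{\partial w_{j}}$. The input, target, finite-difference step `h` and $\beta = 0.1$ are made-up illustration values, and estimating the gradient with central differences is only a choice for this demo; the delta rule itself uses the analytic derivative.

```python
import numpy as np

def error(w, x, t):
    """Squared error for one linear output unit y = w . x (so n = 1)."""
    y = np.dot(w, x)
    return (t - y) ** 2

def numerical_gradient(w, x, t, h=1e-6):
    """Approximate dE/dw_j for every weight with central differences."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (error(w + step, x, t) - error(w - step, x, t)) / (2 * h)
    return grad

x = np.array([1.0, 0.5])   # hypothetical input pattern
t = 1.0                    # hypothetical target value
w = np.zeros(2)            # start with all weights at zero
beta = 0.1                 # learning rate

for _ in range(50):
    # gradient descent: Delta w = -beta * dE/dw
    w = w - beta * numerical_gradient(w, x, t)

print(w, error(w, x, t))
```

Each step moves the weights downhill on $E$, so the error shrinks towards zero.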

## The delta rule

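Working out this equation algebraically (with net input $in_{i} = \sum_{j} w_{ij} x_{j}$, output $y_{i} = f(in_{i})$ and the chain rule) brings us to the following equation:

$$\Delta w_{ij} = \beta \, x_{j} \, (t_{i} - y_{i}) \frac{\partial}{\partial in_{i}} f(in_{i})$$

where constant factors such as $\frac{2}{n}$ are absorbed into the learning rate $\beta$.   
If we use the linear activation function $f(in_{i}) = in_{i}$, the derivative equals 1, so this simplifies to:

$$\Delta w_{ij} = \beta \, x_{j} \, (t_{i} - y_{i})$$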
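The linear delta rule can be applied to all weights at once: the matrix of updates $\Delta w_{ij}$ is the outer product of the error vector $(t_{i} - y_{i})$ and the input vector $x_{j}$. A minimal sketch, with a made-up input pattern, target pattern and $\beta = 0.1$; `delta_rule_step` is a hypothetical helper name, not part of the course code.

```python
import numpy as np

def delta_rule_step(W, x, t, beta):
    """One delta-rule update for linear output units y = W @ x.

    Implements Delta w_ij = beta * x_j * (t_i - y_i) for all i, j at once.
    """
    y = W @ x
    return W + beta * np.outer(t - y, x)

x = np.array([1.0, 0.0, 1.0])      # hypothetical input pattern (x_j)
t = np.array([1.0, 0.0])           # hypothetical target pattern (t_i)
W = np.zeros((t.size, x.size))     # rows: output units i, columns: input units j

for _ in range(100):
    W = delta_rule_step(W, x, t, beta=0.1)

print(np.round(W @ x, 3))
```

Note that the weights leaving the inactive input unit ($x_{j} = 0$) never change, because every update to them is multiplied by $x_{j}$.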

# Practical

# 1. Florence + the machine: a Delta learning tutorial

## The unknown artist

Imagine you are listening to the radio, and suddenly a song comes up that you really like.   
After the song, the radio host mentions the song 'Stand by me' by 'Florence + the machine'.   
   
You decide to search them online, and you find the following information...

## Florence + the machine

- English indie rock band
- Formed in London in 2007
- Lead singer: Florence Welch ⬇️

![Image of Florence](./florence.jpg)

## Florence: the modelling approach

When you now hear _Stand by me_ again, you will be able to conjure up Florence's picture in your mind.   
In MCP terms: you learned an association between two items.   
Please note that encountering one item (the song) will bring the second item to mind (the mental picture of Florence you saw online).

We have already seen these dynamics in the cat-dog model...

## Florence: comparison with the pet detector

Note that the pet detector also worked with specific features.   

![The pet detector](./cat_dog_model.jpg)

**What do we expect here?**   
```input = np.array([0, 1, 1])```

## Florence: comparison with the pet detector

**Model input**
- Unit 1 is **inactive**: does not bite visitors
- Unit 2 is **active**: has 4 legs
- Unit 3 is **active**: has a picture on Facebook

**Model output**   
![Cat](./cat.jpg)

## Florence: the modelling approach

Mind what happened:
- First, the song was not associated with mental images
- After the Google search, we could picture the singer of this song

How?   
**Learning**

Now, we will represent this learning process in Python 3.   

## Florence: the modelling approach

Our action plan:

1. Open Spyder 🕸️
2. Open **'ch4_florence_delta_solution.py'**
3. Notice that "blocks" (cells) of code are separated by the ```#%%``` marker
4. Run a block of code by clicking inside it and pressing ```shift + enter``` (```Ctrl + enter``` for Mac OS)
5. Look at the output
6. Sit back and listen to my explanation of each block!

```python
# import modules
import ch0_delta_learning as delta_learning
import numpy              as np

# alter print options for numpy: suppress scientific printing 
np.set_printoptions(suppress = True)

image_florence   = [.99, .01, .99, .01, .99, .01]     # represents image
song_stand_by_me = [.99, .99, .01, .01]               # represents song

# define a weight matrix exclusively filled with zeros
weight_matrix = delta_learning.initialise_weights(image_florence, 
                                                  song_stand_by_me, 
                                                  zeros      = True,
                                                  predefined = False, 
                                                  verbose    = True)

# show me what you got 
print('Our original weight matrix, for now filled with zeros:\n', 
      weight_matrix)

# make a copy of the original weight matrix
original_weight_matrix = np.copy(weight_matrix)
```

```
Using zeros to fill the array...

Our original weight matrix, for now filled with zeros:
 [[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
```


## The weight matrix

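Mind that the weight matrix looks different from the one we used in lesson 4.   
Each row corresponds to one of the 4 units of `song_stand_by_me` and each column to one of the 6 units of `image_florence`, so the matrix has 4 rows and 6 columns.   
The picture below illustrates this layout: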

![Weights](./weights.jpg)
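The layout above means the net input of each output unit is a matrix-vector product, $in_{i} = \sum_{j} w_{ij} x_{j}$. A minimal sketch in plain numpy, assuming the helper module applies a logistic activation to this net input (an assumption, but consistent with the 0.5 activations printed below for the all-zero matrix):

```python
import numpy as np

image = np.array([.99, .01, .99, .01, .99, .01])   # 6 input units
song = np.array([.99, .99, .01, .01])              # 4 output units

# row i: output unit i, column j: input unit j
W = np.zeros((song.size, image.size))

net_input = W @ image                        # in_i = sum_j w_ij * x_j
activation = 1 / (1 + np.exp(-net_input))    # logistic activation (assumed)
print(W.shape, activation)
```

With all weights at zero every net input is 0, and the logistic function maps 0 to exactly 0.5.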

```python
# activation associated with the all zero weight matrix
activation_original = delta_learning.internal_input(image_florence,
                                                    weight_matrix)[0]
print('\nActivation levels at output for the original weight matrix:\n', 
      activation_original)
```

```
Activation levels at output for the original weight matrix:
 [0.5, 0.5, 0.5, 0.5]
```


```python
loops = 1000
alpha = 1.5

for _ in range(loops):
    weights_after_learning = delta_learning.weight_change(alpha,
                                                          image_florence,
                                                          song_stand_by_me,
                                                          weight_matrix)
    weight_matrix = weights_after_learning


print('\nOur altered weight matrix after {} trials of delta learning:\n'.format(loops), 
      weight_matrix)
```

```
Our altered weight matrix after 1000 trials of delta learning:
 [[ 1.47000591  0.01417701  1.40351737  0.01356316  1.34274688  0.01301225]
 [ 1.47000591  0.01417701  1.40351737  0.01356316  1.34274688  0.01301225]
 [-1.47000591 -0.01417701 -1.40351737 -0.01356316 -1.34274688 -0.01301225]
 [-1.47000591 -0.01417701 -1.40351737 -0.01356316 -1.34274688 -0.01301225]]
```


```python
# activation associated with this altered weight matrix
activation_after_learning = delta_learning.internal_input(image_florence,
                                                          weight_matrix)[0]
print('\nActivation levels at output after {} trials of delta learning:\n'.format(loops), 
      np.round(activation_after_learning, 3))
```

```
Activation levels at output after 1000 trials of delta learning:
 [0.985 0.985 0.015 0.015]
```
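If you are curious what happens inside such a helper module, here is a standalone sketch of the same experiment. It assumes logistic output units and one delta-rule update per trial, using $f'(in_{i}) = y_{i}(1 - y_{i})$ for the logistic function. `logistic` and `delta_learning_step` are hypothetical names, not functions from `ch0_delta_learning`, so the learned weights need not match the printed matrix exactly.

```python
import numpy as np

def logistic(net):
    """Logistic activation: returns 0.5 for a net input of 0."""
    return 1.0 / (1.0 + np.exp(-net))

def delta_learning_step(beta, x, t, W):
    """Hypothetical helper: one delta-rule update with logistic output units.

    Delta w_ij = beta * x_j * (t_i - y_i) * f'(in_i), with f'(in_i) = y_i * (1 - y_i).
    """
    y = logistic(W @ x)
    delta = (t - y) * y * (1.0 - y)
    return W + beta * np.outer(delta, x)

image_florence = np.array([.99, .01, .99, .01, .99, .01])   # input pattern
song_stand_by_me = np.array([.99, .99, .01, .01])           # target pattern

W = np.zeros((song_stand_by_me.size, image_florence.size))
for _ in range(1000):
    W = delta_learning_step(1.5, image_florence, song_stand_by_me, W)

activation = logistic(W @ image_florence)
print(np.round(activation, 3))
```

As in the course script, the output activations move from 0.5 towards the target pattern.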


# 2. Modelling the blocking effect

## Basic classical conditioning

We consider the following situation:

![CS1 + US](./CS1_US.jpg)

Here, the played sound is the **first conditioned stimulus (CS1)**.   
We pair the sound with an electrical shock, which is referred to as the **unconditioned stimulus (US)**.   
The reaction our subject has to the shock is often referred to as the **unconditioned response (UR)**.

## Basic classical conditioning

After pairing CS1 and US multiple times, the UR (confused screaming) will become the **conditioned response (CR)**.   
Thus, the sound will elicit the screaming even though no shock was administered.

![CR after conditioning](./CS1_conditioned.jpg)

## The blocking effect

Now we add an extra conditioning phase in which CS1 is shown together with a **second conditioned stimulus (CS2)** (e.g. a strong light).   
After conditioning, CS1 + CS2 will also lead to confused screaming:

![CS1 + CS2 + US](./CS1_CS2_US.jpg)

## The blocking effect

Interestingly, when we show CS2 alone, this will not lead to the CR [(Kamin, 1967)](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19680014821.pdf).

![Reaction to CS2](./CS2_alone.jpg)

It appears that a subject is not able to learn that the light also predicts the shock.   
In other words: the learning of the CS2 - US association is _blocked_ because the CS1 - US association already exists...
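The delta rule offers one way to think about this: weights only change when there is a prediction error $(t_{i} - y_{i})$. Below is a minimal sketch with one linear output unit and two input units (sound, light), comparing a blocking group with a control group that only receives the compound phase. The helper `train`, the learning rate and the number of trials are illustrative choices of ours, not the course code.

```python
import numpy as np

def train(w, x, t, beta=0.2, trials=100):
    """Repeated delta-rule updates for one linear output unit (y = w . x)."""
    for _ in range(trials):
        y = w @ x
        w = w + beta * (t - y) * x
    return w

sound = np.array([1.0, 0.0])          # CS1 only
sound_light = np.array([1.0, 1.0])    # CS1 + CS2
light = np.array([0.0, 1.0])          # CS2 only (test trial)
shock = 1.0                           # US present

# Blocking group: CS1 -> US first, then CS1 + CS2 -> US
w_block = train(np.zeros(2), sound, shock)
w_block = train(w_block, sound_light, shock)

# Control group: only the compound phase CS1 + CS2 -> US
w_control = train(np.zeros(2), sound_light, shock)

print('light alone, blocking group:', round(float(w_block @ light), 3))
print('light alone, control group: ', round(float(w_control @ light), 3))
```

In the blocking group the sound already predicts the shock after the first phase, so the error during the compound phase is close to zero and the light weight barely changes. In the control group the two stimuli share the prediction, so the light alone does produce a response.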

## Modelling the blocking effect

Now, we ask you to demonstrate the blocking effect using a model.   
You will have to do this yourself, relying on the code provided for exercise 1.   
The questions asked below might help you out.

* How many units does your model need?
    * Input layer
        * The model can encounter two different stimuli: **only sound** and **sound + light**
        * Sound and light can be seen as two different units: if a unit is switched off, the corresponding stimulus is absent
    * Output layer
        * Only two possible outcomes (aversion or no aversion), so one output unit suffices

## Modelling the blocking effect

* Make a weight matrix to start with
* Use delta learning to learn the outcome associated with **CS1**
* Use delta learning to learn the outcome associated with **CS1 + CS2**
   * Importantly, make sure that you use the weight matrix obtained from the previous step as a starting point
* Does the end result prove blocking? Why / why not?
   * How can you investigate whether blocking occurred or not?

# 3. Delta learning: DIY with the iris dataset

## Why scikit-learn?

Let's take a look at their own definition of what this package is about:

>   sklearn is a Python module integrating classical machine
    learning algorithms in the tightly-knit world of scientific Python
    packages (numpy, scipy, matplotlib).   
    It aims to provide simple and efficient solutions to learning problems
    that are accessible to everybody and reusable in various contexts:
    machine-learning as a versatile tool for science and engineering.
    
Thus, ```scikit-learn``` is about machine learning, and can be used for larger scale problems.   
The drawback is that it solves problems in a 'black box' kind of way: how we get to the solution is not always transparent.   
We will use this module for this lesson and the next practical because it allows us to do modelling, and it is widely used in science.

## Iris dataset?

In this exercise, we will use the iris dataset [(Fisher, 1936)](https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1469-1809.1936.tb02137.x).   
More specifically, we will use this dataset to **predict the species** of the flower **based on the features** of the flower.
   
The dataset consists of 150 rows, where each row holds the measurements of one flower.   
Each flower is different, but they all belong to the same genus: "iris".   
There are 3 different species in the dataset, with 50 flowers for each species.

**The data available** 🌹  
* Features of the flower
    * Sepal width
    * Sepal length
    * Petal width
    * Petal length
* The name of the flower
    * Iris setosa
    * Iris virginica
    * Iris versicolor
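You can check these numbers yourself. A quick sketch using scikit-learn's `load_iris`, the same loader used in the code below; the label order in `target_names` tells you which species corresponds to 0, 1 and 2.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)            # (rows = flowers, columns = features)
print(iris.feature_names)         # names of the four measured features
print(iris.target_names)          # species names, indexed by label 0, 1, 2
print(np.bincount(iris.target))   # number of flowers per species
```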

## An example of the provided features

![iris features](./iris.jpg)

**Our question**   
Based on the provided measurements, which iris species is this _(setosa, virginica or versicolor?)_

```python
# import modules
import pandas  as     pd
from   sklearn import datasets

# import the Iris flower dataset
iris        = datasets.load_iris()
X           = iris.data
y           = iris.target
class_names = iris.target_names

# glue data together
y           = np.reshape(y, 
                         (150, 1)) 
data_shown  = np.concatenate((X, y), 
                             axis = 1)
iris_visual = pd.DataFrame(data_shown)

# make column names
colnames            = ['sep len', 'sep wid', 
                       'pet len', 'pet wid',
                       'family']
iris_visual.columns = colnames
```

```python
# show me the way (first and last 5 rows)
print('First 5 observations:\n', iris_visual[:5])
print('\nLast 5 observations:\n',iris_visual[-5:])
```

```
First 5 observations:
    sep len  sep wid  pet len  pet wid  family
0      5.1      3.5      1.4      0.2     0.0
1      4.9      3.0      1.4      0.2     0.0
2      4.7      3.2      1.3      0.2     0.0
3      4.6      3.1      1.5      0.2     0.0
4      5.0      3.6      1.4      0.2     0.0

Last 5 observations:
      sep len  sep wid  pet len  pet wid  family
145      6.7      3.0      5.2      2.3     2.0
146      6.3      2.5      5.0      1.9     2.0
147      6.5      3.0      5.2      2.0     2.0
148      6.2      3.4      5.4      2.3     2.0
149      5.9      3.0      5.1      1.8     2.0
```


## ...?

Our goal is to predict the family based on the provided features.   
So, if we see the following:

```python
In [9]: X[10,:]
Out[9]: array([5.4, 3.7, 1.5, 0.2])

In [10]:y[10]
Out[10]: 0
```

We know that flower 11 (index 10, since Python starts counting at 0) has a sepal length of 5.4 cm, a sepal width of 3.7 cm, and so on.   
We also know that flower 11 belongs to family 0 (i.e. setosa).

Ideally, our model would be able to predict the family based on the features for every flower.   
So, if we give the model the features for flower 62:

```python
In [16]: X[61,:]
Out[16]: array([5.9, 3. , 4.2, 1.5])
```

we want the output of the model to be equal to 1 (i.e. versicolor), which is the observed family for flower 62.

## The modelling perspective

So, why the iris dataset?   
When doing computational modelling, we might be interested in the processes behind object recognition.   
In that case, we might train a model that is able to recognize flowers based on certain flower characteristics.   
Additionally, we might go even further and model how someone becomes an expert in recognizing flowers, or what happens when we present other objects to a flower expert.   

Now that the reason we use the iris dataset is (hopefully) clear, we move on to the actual exercise.

Our action plan:

1. Open **'ch4_iris_exercise.py'** 
2. Go through the script step by step and fill in the ```...``` spread throughout the code.
    * Load the iris dataset, and select the flower features (named ```X```), and the labels (named ```y```)
    * Binarize the data: all observations that belong to class 1 should be relabeled so that they belong to class 2
        * Think it through: why can we only handle two labels?
    * Let the model learn 100 times
        * Confused about the ```Perceptron``` object? No worries: ask us, or read through [the docs](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html)
    * Train the model by providing the correct arguments to the ```classification_algorithm.fit``` code
    * Check the model accuracy by completing the ```classification_algorithm.predict``` code
    * Print some statistics: how well did your model perform?
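The binarization and the split can be tried out in isolation. This sketch uses a boolean mask instead of `np.where` (both select the same entries) and relies on the default test fraction of `train_test_split` (25%), which is why 38 of the 150 flowers end up in the test set.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
y = y.copy()
y[y == 1] = 2                      # relabel versicolor (1) as virginica (2)
print(np.unique(y, return_counts=True))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=20)
print(len(X_train), len(X_test))
```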

```python
# import: general and scikit-learn specific
import numpy                 as np

from sklearn                 import datasets
from sklearn.linear_model    import Perceptron
from sklearn.metrics         import accuracy_score
from sklearn.model_selection import train_test_split
```

```python
# import the Iris flower dataset
iris        = datasets.load_iris()
X           = iris.data
y           = iris.target

# binarize the data: we relabel 1 to 2
   # thus, the flower is either class 0 or class 2
y[np.where(y == 1)] = 2

# split data in training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state = 20)
```

```python
# define classifier (Perceptron object from scikit-learn)
classification_algorithm = Perceptron(max_iter         = 100,
                                      tol              = 1e-3,
                                      verbose          = 0,
                                      random_state     = 20,
                                      n_iter_no_change = 5)

# fit ('train') classifier to the training data
classification_algorithm.fit(X_train, y_train)

# predict y based on x for the test data
y_pred = classification_algorithm.predict(X_test)
```

```python
# select wrong predictions (absolute vals) and print them
compared       = np.array(y_pred == y_test)
absolute_wrong = (compared == False).sum()
print("Our classification was wrong for {0} out of the {1} cases.".format(absolute_wrong, 
                                                                          len(compared)))


# print accuracy using dedicated function
print('Accuracy percentage: {0:.2f}'.format(accuracy_score(y_test, y_pred) * 100))
```

```
Our classification was wrong for 0 out of the 38 cases.
Accuracy percentage: 100.00
```
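Since `Perceptron` hides the learning step, here is a DIY sketch of delta learning on the same binarized problem. Assumptions (all ours): setosa is coded as target 1 and the other flowers as 0, features are standardized with training statistics, a constant bias input is appended, and $\beta = 0.05$, 200 passes over the data and a 0.5 decision threshold are hand-picked. This is the linear delta rule, not the perceptron rule scikit-learn uses, so results may differ. The same `random_state` is used, so the same flowers land in the test set.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
t = (y == 0).astype(float)       # target 1 for setosa, 0 for the other flowers

X_train, X_test, t_train, t_test = train_test_split(X, t, random_state=20)

# standardize with training statistics and append a constant bias input
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

def prepare(data):
    z = (data - mean) / std
    return np.hstack([z, np.ones((len(z), 1))])

Z_train, Z_test = prepare(X_train), prepare(X_test)

w = np.zeros(Z_train.shape[1])
beta = 0.05
for epoch in range(200):
    for x, target in zip(Z_train, t_train):
        y_out = w @ x                      # linear output unit
        w += beta * (target - y_out) * x   # delta rule: beta * x_j * (t - y)

predictions = (Z_test @ w > 0.5).astype(float)
accuracy = np.mean(predictions == t_test)
print('Accuracy percentage: {0:.2f}'.format(accuracy * 100))
```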
