# Perceptron

In this session we will implement the Perceptron algorithm from scratch.

The data we will use to test our implementation is a collection of movie reviews, each associated with a rating. The data comes in a pre-processed form, with the features extracted, and has the following format:

```
target feature_1:value_1 feature_2:value_2 ...
```

For example:

```
1 0:2.0 3:4.0 123:1.0
```
This means the example's target label is 1, feature 0 is 2.0, feature
3 is 4.0, and feature 123 is 1.0 (all the other features are implicitly
0.0). The features are word counts.

We will start by writing some code to work with this data.


## Exercise 1

We want to work with binary labels (positive vs negative), but we have integer ratings. We will convert the ratings to binary labels using 5 as a threshold: if the rating is higher than 5, the label will be positive; otherwise it will be negative.

Define the function ``binarize``. The function should accept a list of numeric
ratings and return a list of class labels (-1 or 1).


```python
def binarize(y):
    # Your code here
    raise NotImplementedError()

y = list(range(0, 11))
print(binarize(y))
```

Expected output:

```
[-1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
```
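Below is one possible solution, sketched as a single list comprehension. It assumes the threshold is strict, so a rating of exactly 5 maps to -1, which is what the expected output above shows:

```python
def binarize(y):
    """Map numeric ratings to class labels: 1 if the rating is above 5, else -1."""
    return [1 if rating > 5 else -1 for rating in y]

print(binarize(list(range(0, 11))))   # [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
```

The comprehension accepts any iterable of numbers, including the tuple of ratings produced later by `zip(*train)`.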


## Exercise 2

Define the function ``parse_line``, which extracts the target label and
feature values of a training example from a string. Your function should return two values:

- a dictionary mapping features (as ints) to values (as floats)
- the target label (as int)

```python
def parse_line(line):
    # Your code here
    raise NotImplementedError()

line = '1 0:2.0 3:4.0 123:1.0\n'
print(parse_line(line))
```

Expected output:

```
({0: 2.0, 3: 4.0, 123: 1.0}, 1)
```
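Here is a minimal sketch of ``parse_line``. It assumes fields are separated by whitespace and the target is written as an integer, so `int()` can parse it directly:

```python
def parse_line(line):
    """Parse 'target f1:v1 f2:v2 ...' into (features dict, target int)."""
    fields = line.split()          # splits on any whitespace and drops the trailing newline
    target = int(fields[0])
    features = {}
    for item in fields[1:]:
        key, value = item.split(":")
        features[int(key)] = float(value)
    return features, target

print(parse_line('1 0:2.0 3:4.0 123:1.0\n'))   # ({0: 2.0, 3: 4.0, 123: 1.0}, 1)
```

On Python 3.7 and later, the dictionary keeps the features in the order in which they appear in the line. Dictionary equality does not depend on order, so this only affects how the result is printed.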


## Reading the data

Once we have these two functions, we can read our dataset from the input file.

```python
import random

def prepare_data():
    """Read data from input file, shuffle and return a list of training examples."""
    train = []
    with open("sentiment.feat") as inp:
        for line in inp:
            xy = parse_line(line)
            train.append(xy)
    # We will shuffle the examples so that we have a mixture of positive and negative cases
    indexes = list(range(0, len(train)))
    SEED = 4096              # fixed seed, so the shuffle is reproducible
    random.seed(SEED)
    random.shuffle(indexes)
    # Separate the feature dictionaries from the raw ratings, then binarize the ratings
    X, Z = list(zip(*train))
    Y = binarize(Z)
    XY = [ (X[i], Y[i]) for i in indexes ]
    return XY
```

```python
XY = prepare_data()
```

The variable ``XY`` contains the list of examples, where each example is a
tuple with the input (a feature dictionary) in the first position and the target (-1 or 1) in the second position.


## Vector operations

Our inputs are word counts and therefore sparse: most feature values are zero. For this reason we represent these feature vectors as Python dictionaries that record only the non-zero values, treating the zero values as implicit. We now need to implement the vector operations that the Perceptron algorithm requires, using this dictionary representation.
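The sketch below converts between a dense list and the dictionary form, to show how the two representations correspond. The helpers `to_sparse` and `to_dense` are hypothetical illustrations only. The exercises that follow work on the dictionaries directly and never build dense lists:

```python
def to_sparse(dense):
    """Hypothetical helper: keep only the non-zero entries, keyed by position."""
    return {i: value for i, value in enumerate(dense) if value != 0.0}

def to_dense(sparse, size):
    """Hypothetical helper: expand a sparse dict into a list, filling gaps with 0.0."""
    return [sparse.get(i, 0.0) for i in range(size)]

dense = [2.0, 0.0, 0.0, 4.0, 0.0]
sparse = to_sparse(dense)
print(sparse)               # {0: 2.0, 3: 4.0}
print(to_dense(sparse, 5))  # [2.0, 0.0, 0.0, 4.0, 0.0]
```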

## Exercise 3

Define the function ``dot``, which calculates the dot (or inner) product of two
vectors. This function should work on vectors represented as
dictionaries: any key missing from a dictionary is implicitly equal to
0.0. To compute the dot product, multiply the values at the
corresponding keys and sum all the results.
The function can assume that the vector with more non-zero entries
(i.e. the dictionary with more keys) is passed as the first argument. This
is useful for efficiency: any key absent from the smaller dictionary
contributes 0.0 to the sum. It is therefore enough to loop over the keys of
the smaller dictionary and look each one up in the larger one.


```python
def dot(big, small):
    # Your code here
    raise NotImplementedError()

u = {0:0.5, 1:2.0, 2:-2.5}
v = {0:-0.5, 2:2.5, 3:0.5}
print(dot(u, v))
```

Expected output:

```
-6.5
```
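One possible implementation follows. This sketch relies on the argument-order assumption above: it loops over the smaller dictionary and uses `dict.get` with a default of 0.0 for keys missing from the larger one:

```python
def dot(big, small):
    """Dot product of two sparse vectors stored as dicts (missing keys count as 0.0).

    Only the keys of `small` are visited, so the cost grows with len(small).
    """
    return sum(value * big.get(key, 0.0) for key, value in small.items())

u = {0: 0.5, 1: 2.0, 2: -2.5}
v = {0: -0.5, 2: 2.5, 3: 0.5}
print(dot(u, v))   # -6.5, i.e. 0.5 * -0.5 + -2.5 * 2.5
```

Swapping the arguments still gives the correct result. It only changes which dictionary is iterated over.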


## Exercise 4

Define the function ``increment``, which modifies a vector by adding
another vector to it. The two vectors are given as dictionaries:

- `u` - the vector to be modified (as a dictionary)
- `v` - the vector (as a dictionary) which should be added to `u` 

This function should not return anything, but it should modify `u` so
that it contains the union of the keys present in the two vectors. The
value at each key should be the sum of the values at this key in the
two vectors. Remember that if a key is missing from the dictionary
representing the vector, the value is implicitly equal to 0.0.

Note: a function (like ``increment``) which changes one of its inputs is called **destructive**.

```python
def increment(u, v):
    # Your code here
    raise NotImplementedError()

u = {0:0.5, 1:2.0, 2:-2.5}
v = {0:-0.5, 2:2.5, 3:0.5}
increment(u, v)
print(u)
u = {0:0.5, 1:2.0, 2:-2.5}
w = {}
increment(u, w)
print(u)
```

Expected output:

```
{0: 0.0, 1: 2.0, 2: 0.0, 3: 0.5}
{0: 0.5, 1: 2.0, 2: -2.5}
```
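The following sketches one way to write ``increment``, using the parameter names `u` and `v` from the description. It only visits the keys of `v`, and uses `dict.get` so that keys missing from `u` start at 0.0:

```python
def increment(u, v):
    """Destructively add vector v to vector u; keys missing from u count as 0.0."""
    for key, value in v.items():
        u[key] = u.get(key, 0.0) + value
    # Nothing is returned: the caller sees the change through u itself.

u = {0: 0.5, 1: 2.0, 2: -2.5}
increment(u, {0: -0.5, 2: 2.5, 3: 0.5})
print(u)   # {0: 0.0, 1: 2.0, 2: 0.0, 3: 0.5}
```

Values that cancel out (keys 0 and 2 above) stay as explicit 0.0 entries, which matches the expected output. Pruning them would also be valid, but the exercise does not ask for it.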


## Exercise 5

Define the function ``scale``, which takes a vector `u` (as a dictionary)
and a number `n`, and returns a new vector dictionary containing
the values in vector `u` multiplied by `n`. This function should not
modify its arguments, but should return a new dictionary. Note: the function
``increment`` combined with the function ``scale`` can be used to
represent vector subtraction (``decrement``).

```python
def scale(u, n):
    # Your code here
    raise NotImplementedError()

u = {0:0.5, 1:2.0, 2:-2.5}
v = {0:-0.5, 2:2.5, 3:0.5}
n = 2.0
print(scale(u, n))
print(u) # u should be unchanged
u = {0:0.5, 1:2.0, 2:-2.5}
v = {0:-0.5, 2:2.5, 3:0.5}
increment(u, scale(v, -1.0))
print(u)
```

Expected output:

```
{0: 1.0, 1: 4.0, 2: -5.0}
{0: 0.5, 1: 2.0, 2: -2.5}
{0: 1.0, 1: 2.0, 2: -5.0, 3: -0.5}
```
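This sketch of ``scale`` builds a new dictionary with a comprehension, so its input is never modified. ``increment`` is repeated so the cell runs on its own. `decrement` is a hypothetical helper that shows the subtraction mentioned in the note:

```python
def scale(u, n):
    """Return a new dict with every value of u multiplied by n; u is not modified."""
    return {key: value * n for key, value in u.items()}

def increment(u, v):
    """Destructively add v to u; keys missing from u count as 0.0."""
    for key, value in v.items():
        u[key] = u.get(key, 0.0) + value

def decrement(u, v):
    """Hypothetical helper: destructively subtract v from u, as u + (-1.0) * v."""
    increment(u, scale(v, -1.0))

u = {0: 0.5, 1: 2.0, 2: -2.5}
v = {0: -0.5, 2: 2.5, 3: 0.5}
print(scale(u, 2.0))   # {0: 1.0, 1: 4.0, 2: -5.0}
print(u)               # {0: 0.5, 1: 2.0, 2: -2.5}  (unchanged)
decrement(u, v)
print(u)               # {0: 1.0, 1: 2.0, 2: -5.0, 3: -0.5}
```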


## Perceptron 

We will now start implementing the Perceptron algorithm. We will use a dictionary to keep the weights and the bias of the model. We'll initialize the bias to zero, and the weights to a vector of all zeros (represented as an empty dictionary). During training, we will update the values as we learn.


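```python
def initialize():
    """Return a fresh model: zero bias and all-zero weights (an empty dictionary)."""
    return {'b': 0.0, 'w': {}}

model = initialize()
```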

## Exercise 6

Define the function ``predict``, which takes two arguments:

- `model` - the dictionary representation of the perceptron model with
  keys 'w' for weights and 'b' for the bias
- `x` - new input (as a dictionary)

It should return the predicted target for the input `x`: it should
compute the discriminant function `wx + b` and predict 1 if it is
greater than or equal to 0, and -1 otherwise.



```python
def predict(model, x):
    # Your code here
    raise NotImplementedError()

x = {0:0.5, 1:2.0, 2:-2.5}
model = initialize()
print(predict(model, x))
```

Expected output:

```
1
```
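Below is one possible ``predict``. It assumes the model layout described above, with `'w'` holding the weight dictionary and `'b'` the bias. To keep the cell self-contained, it computes the sparse dot product inline rather than calling ``dot``. Using `dot(model['w'], x) + model['b']` should give the same score:

```python
def predict(model, x):
    """Return 1 if w.x + b >= 0, and -1 otherwise."""
    w = model['w']
    # Sparse dot product: loop over the input's features and look up each weight;
    # weights that have never been set count as 0.0.
    score = sum(value * w.get(key, 0.0) for key, value in x.items()) + model['b']
    return 1 if score >= 0 else -1

model = {'b': 0.0, 'w': {}}    # a freshly initialized model: zero bias, empty weights
print(predict(model, {0: 0.5, 1: 2.0, 2: -2.5}))   # 1, since the score is 0.0
```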


## Exercise 7

We will now implement the update step of the perceptron
algorithm. You need to code the function ``update``, which is given a
training example: it first uses the ``predict`` function to guess
the target, then updates the weights and the bias of the model
depending on whether the guess is correct or incorrect, and on the
direction of the mistake. Finally, the function should return the
guess. ``update`` is given two arguments:

- `model` - this is the dictionary with keys 'w' (with weights) and
  'b' (with the bias)
- `xy` - this is the pair `(x,y)` where `x` is the input vector (as a
  dictionary) and `y` is the target (-1 or 1, as an int).

Details of the perceptron update rule are shown in the lecture slides
for Session 4. Hints:

- You can use the function ``predict`` to make the guess.
- When updating the weights, use the function ``increment``
  (possibly in combination with ``scale``) to add the example
  input to (or subtract it from) the model weights.

Note: this function is destructive because it modifies its first argument.
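The standard form of the perceptron rule is sketched below in the notation of Exercise 6, with prediction $\hat{y}$ and label $y \in \{-1, 1\}$. It is consistent with the expected output of the example that follows: a wrong guess of 1 on an example with $y = -1$ leaves $b = -1$ and $\mathbf{w} = -\mathbf{x}$. The lecture slides for Session 4 remain the authoritative statement.

$$
\hat{y} =
\begin{cases}
1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 0 \\
-1 & \text{otherwise}
\end{cases}
$$

If $\hat{y} = y$ the model is left unchanged; otherwise

$$
\mathbf{w} \leftarrow \mathbf{w} + y\,\mathbf{x}, \qquad b \leftarrow b + y .
$$

Because $y = \pm 1$, this rule adds $\mathbf{x}$ to the weights after a false negative and subtracts it after a false positive. That is the "direction of the mistake" referred to above.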

```python
def update(model, xy):
    # Your code here
    raise NotImplementedError()

x = { 0: 7.0, 1: 4.0, 3: 4.0, 4: 2.0, 5: 2.0, 11: 3.0 }
y = -1
model = initialize()
y_pred = update(model, (x,y))
print(y_pred)
print(model)
```

Expected output:

```
1
{'b': -1.0, 'w': {0: -7.0, 1: -4.0, 3: -4.0, 4: -2.0, 5: -2.0, 11: -3.0}}
```
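The next block sketches ``update`` under the standard perceptron rule: on a mistake, add `y * x` to the weights and `y` to the bias. This reproduces the expected output above. The hints suggest calling ``increment`` and ``scale``. This sketch instead inlines the sparse addition and repeats a compact ``predict``, so the cell runs on its own:

```python
def predict(model, x):
    """Sign of w.x + b, with a score of exactly 0 mapped to 1."""
    score = sum(v * model['w'].get(k, 0.0) for k, v in x.items()) + model['b']
    return 1 if score >= 0 else -1

def update(model, xy):
    """One perceptron step: guess, and on a mistake move w and b towards y.

    Destructive: modifies model['w'] and model['b'] in place.
    """
    x, y = xy
    guess = predict(model, x)
    if guess != y:
        w = model['w']
        for key, value in x.items():    # w <- w + y * x  (sparse, in place)
            w[key] = w.get(key, 0.0) + y * value
        model['b'] += y                 # b <- b + y
    return guess

x = {0: 7.0, 1: 4.0, 3: 4.0, 4: 2.0, 5: 2.0, 11: 3.0}
model = {'b': 0.0, 'w': {}}
print(update(model, (x, -1)))   # 1 (a mistake, so the model is updated)
print(model)   # {'b': -1.0, 'w': {0: -7.0, 1: -4.0, 3: -4.0, 4: -2.0, 5: -2.0, 11: -3.0}}
```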


## Exercise 8

Now you'll implement the function ``learn``, which processes each
training example in turn, generates a guess, and makes an update (using the
``update`` function from Exercise 7). Finally, it returns the list of
guesses made. This function is given 2 arguments:

- `model` - the dictionary representing the perceptron model
- `XY` - the list of the training examples, where each example is a
  tuple `(x, y)`, `x` being the input vector dictionary and `y` the
  target (1 or -1)


You can implement this function following these steps:

- Initialize the list of guesses to an empty list
- For each training example `(x,y)`
  
   - get a guess using the ``update`` function with the `model`
   - add this guess to the list of guesses
   
- Return the complete list of guesses.


```python
def learn(model, XY):
    # Your code here
    raise NotImplementedError()
```
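This sketch of ``learn`` follows the steps listed above. A compact perceptron ``update`` is repeated so the cell runs on its own. The four-example data set `toy` is made up purely for illustration:

```python
def update(model, xy):
    """Compact perceptron step: guess, and on a mistake add y*x to w and y to b."""
    x, y = xy
    w = model['w']
    score = sum(v * w.get(k, 0.0) for k, v in x.items()) + model['b']
    guess = 1 if score >= 0 else -1
    if guess != y:
        for k, v in x.items():
            w[k] = w.get(k, 0.0) + y * v
        model['b'] += y
    return guess

def learn(model, XY):
    """Run one pass of perceptron training over XY and return the guesses made."""
    guesses = []
    for xy in XY:
        guesses.append(update(model, xy))
    return guesses

# Made-up data: feature 0 signals a positive example, feature 1 a negative one.
toy = [({0: 1.0}, 1), ({1: 1.0}, -1), ({0: 2.0}, 1), ({1: 3.0}, -1)]
model = {'b': 0.0, 'w': {}}
print(learn(model, toy))   # [1, 1, -1, -1]
print(model)               # {'b': 0.0, 'w': {1: -1.0, 0: 2.0}}
```

The guesses are made *before* each update, so the list records the model's online performance during the pass. This is what the training error rate later in this notebook measures.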
 


## Exercise 9 

To test our algorithm, we will define the function ``evaluate``, which takes a list of true class labels
and a list of predicted class labels, and returns a tuple with
three elements:

- total number of errors
- total number of labels  
- error rate

```python
def evaluate(gold, predicted):
    # Your code here
    raise NotImplementedError()

y_true = [-1,-1,1,1,1]
y_pred = [-1,1,1,1,-1]
print(evaluate(y_true, y_pred))
print(evaluate(y_true, y_true))
```

Expected output:

```
(2, 5, 0.4)
(0, 5, 0.0)
```
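Here is a sketch of ``evaluate``. It assumes `gold` is non-empty, because otherwise the error rate would divide by zero. It also raises `ValueError` if the two lists differ in length, an extra check that the exercise does not require:

```python
def evaluate(gold, predicted):
    """Return (number of errors, number of labels, error rate)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    errors = sum(1 for g, p in zip(gold, predicted) if g != p)
    total = len(gold)
    return errors, total, errors / total

print(evaluate([-1, -1, 1, 1, 1], [-1, 1, 1, 1, -1]))   # (2, 5, 0.4)
print(evaluate([-1, -1, 1, 1, 1], [-1, -1, 1, 1, 1]))   # (0, 5, 0.0)
```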


## Running

Let's do one pass of perceptron learning over the first 20000 examples.

```python
XY_train = XY[:20000]
Y_true = [ xy[1] for xy in XY_train ]
model = initialize()
Y_pred = learn(model, XY_train)
```

Let's check how good the guesses made during training were.

```python
print(evaluate(Y_true, Y_pred))
```

```
(5033, 20000, 0.25165)
```


## Multiple passes

We can run the model over the training data several times (passes) and monitor its performance on the second part of the data (the examples after the first 20000), which is held out as a development set.


```python
def run(XY, passes=1):
    # Train on the first 20000 examples; keep the rest as a development set
    XY_train = XY[:20000]
    XY_dev   = XY[20000:]
    Y_train = [ xy[1] for xy in XY_train ]
    Y_dev   = [ xy[1] for xy in XY_dev ]
    model = initialize()
    print("{:>3s} {:>7s} {:>7s}".format("Pass", "err_tr", "err_dev"))
    for i in range(1, passes+1):
        # One training pass; the guesses give the online training error
        predicted_train = learn(model, XY_train)
        _, _, rate_train = evaluate(Y_train, predicted_train)
        # Development predictions use the model as it stands after this pass
        predicted_dev = [ predict(model, x) for (x,_) in XY_dev ]
        _, _, rate_dev = evaluate(Y_dev, predicted_dev)
        print("{:3d} {:7.3f} {:7.3f}".format(i, rate_train, rate_dev))

run(XY, passes=5)
```

```
Pass  err_tr err_dev
  1   0.252   0.207
  2   0.179   0.239
  3   0.153   0.232
  4   0.139   0.202
  5   0.128   0.143
```
