# Machine Learning (Summer 2018)

## Practice Session 10

June 19th, 2018

Ulf Krumnack

Institute of Cognitive Science
University of Osnabrück

## Plan for the next sessions

* New exercises: Sheet 11
* Classifiers

# Classifiers

* datasets
* k nearest neighbor
* lines
* linear classifiers

## Generating a dataset

A dataset for classification consists of two parts:
* a list of feature vectors, usually denoted by $x$
* a list of corresponding class labels, usually denoted as $c$, $y$, or $t$

Exercise:
1. Generate a 2-dimensional dataset consisting of two classes (positive and negative examples),
   each drawn from a normal distribution (use `np.random.multivariate_normal`). The result should be of shape (N,3), with the last column `data[:,-1]` providing the labels (either 0 or 1).
2. Plot your dataset, showing both classes in different colors.
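
One possible way to build such a dataset is sketched below. The helper name `make_dataset` and the fixed `seed` are assumptions made for illustration; they are not prescribed by the exercise.

```python
import numpy as np

def make_dataset(n0, mean0, cov0, n1, mean1, cov1, seed=0):
    """Return an array of shape (n0+n1, 3): two feature columns plus a 0/1 label column."""
    rng = np.random.RandomState(seed)  # fixed seed (assumption) so the sketch is reproducible
    x0 = rng.multivariate_normal(mean0, cov0, n0)  # class 0 features, shape (n0, 2)
    x1 = rng.multivariate_normal(mean1, cov1, n1)  # class 1 features, shape (n1, 2)
    features = np.vstack([x0, x1])
    labels = np.concatenate([np.zeros(n0), np.ones(n1)])
    return np.column_stack([features, labels])

data = make_dataset(50, [0, 0], [[1, 0], [0, 12]],
                    40, [6, 10], [[1, 0], [0, 12]])
```

Both classes can then be shown in different colors with a single call such as `plt.scatter(data[:,0], data[:,1], c=data[:,2])`.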

In [None]:
# YOUR CODE HERE

In [None]:
import numpy as np
import matplotlib.pyplot as plt

n0 = 50
mean0 = [0, 0]
cov0 = [[1, 0], [0, 12]]

n1 = 40
mean1 = [6, 10]
cov1 = [[1, 0], [0, 12]]

# Create dataset of shape (n0+n1,3)
#data=
# YOUR CODE HERE

assert data.shape == (n0+n1,3), "data has invalid shape {}".format(data.shape)

plt.figure()
plt.axis('equal')
# YOUR CODE HERE
plt.show()

## Nearest Neighbor Classification

*Exercises:*
1. Implement a Euclidean distance function (`euclidean_distance`).
1. Implement a function `nearest_neighbor` that finds the nearest neighbor of a given point in your dataset.
1. Plot your result (indicating the point and the nearest neighbor). Try different coordinates for `p`.
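
A minimal sketch of the first two steps follows. It assumes that `nearest_neighbor` should return the full dataset row (features and label) of the closest sample; returning only the label would be an equally reasonable design.

```python
import numpy as np

def euclidean_distance(p, q):
    """Euclidean distance between two points given as array-likes."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))

def nearest_neighbor(data, predict):
    """Return the row of data (shape (3,)) whose features are closest to predict."""
    distances = [euclidean_distance(row[:2], predict) for row in data]
    return data[int(np.argmin(distances))]

# small hand-made dataset (hypothetical values, for illustration only)
toy = np.array([[0., 0., 0.],
                [1., 1., 0.],
                [5., 5., 1.]])
nn = nearest_neighbor(toy, np.asarray([4, 4]))
```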

In [None]:
import numpy as np

# YOUR CODE HERE

p = np.asarray([1,3])
q = np.asarray([4,7])

# Check your results for the points (1,3) and (4,7) - distance should be 5.
assert np.round(euclidean_distance(p,q), 3) == 5., "distance between {} and {} is wrong: {}".format(p,q,euclidean_distance(p,q))

In [None]:
def nearest_neighbor(data, predict):
    # data is of shape (N,3):
    #   data[i,0:2] are features, data[i,2] is the class label
    # predict is of shape (2,)
    #   the features of a new data point
    # YOUR CODE HERE

In [None]:
p = np.asarray((3,5))
nn = nearest_neighbor(data,p)

plt.figure()
plt.title("new point {} -> nearest neighbor {}".format(p,nn))
plt.axis('equal')
# YOUR CODE HERE

Exercise: Now implement $k$-nearest neighbor.

Hint: you may use a list to collect neighbors and `sorted()` to find the nearest ones.

Question:
* Does increasing $k$ always increase the accuracy?
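
The hint can be turned into code roughly as follows. The helper `majority_label` and the toy data are assumptions added for illustration. Ties in the vote are resolved in favor of the smaller label, which is one arbitrary choice among several.

```python
import numpy as np

def k_nearest_neighbors(data, predict, k=3):
    """Return the k rows of data (shape (k, 3)) closest to predict."""
    predict = np.asarray(predict, dtype=float)
    # sort all rows by their distance to the query point, keep the first k
    by_distance = sorted(data, key=lambda row: np.linalg.norm(row[:2] - predict))
    return np.array(by_distance[:k])

def majority_label(neighbors):
    """Most frequent label among the neighbors (hypothetical helper)."""
    labels = neighbors[:, 2].astype(int)
    return int(np.bincount(labels).argmax())

# three samples of class 0, two samples of class 1 (hypothetical values)
toy = np.array([[0., 0., 0.],
                [1., 0., 0.],
                [0., 1., 0.],
                [5., 5., 1.],
                [6., 5., 1.]])

label_k1 = majority_label(k_nearest_neighbors(toy, [5, 5], k=1))
label_k5 = majority_label(k_nearest_neighbors(toy, [5, 5], k=5))
```

In this toy example the query point coincides with a class 1 sample. With `k=1` it is labeled 1, but with `k=5` all samples vote and the larger class 0 wins, so a larger $k$ does not automatically give a better result.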

In [None]:
def k_nearest_neighbors(data, predict, k=3):
    """
    data of shape (N,3)
    predict of shape (2,)
    k - the number of neighbors
    
    """
    # YOUR CODE HERE

p = np.asarray((3,5))
neighbors = k_nearest_neighbors(data,p,k=5)

plt.figure()
plt.title("new point {} -> nearest neighbors".format(p))
plt.axis('equal')
plt.scatter(data[:,0],data[:,1], c=data[:,2])
plt.plot(*p, '*', c='red')
# YOUR CODE HERE
plt.show()

<a id="lines"></a>
## Lines

Lines (and hyperplanes) play a crucial role in many machine learning approaches (e.g. as linear separatrices). 

In school, lines are usually represented as functions

$$y = m\cdot x + y_0$$

Exercise:
1. Plot a line using matplotlib (on the interval [-10,10])
1. What do the two parameters $m$ and $y_0$ specify?
1. Where does the line intersect with the $x$- and the $y$-axis?
1. How can you check whether a point $\vec{p}=(x,y)$ is on/above/below the line?
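
As a sketch for the last two questions: the line meets the $y$-axis at $(0, y_0)$ and, for $m \neq 0$, the $x$-axis at $(-y_0/m, 0)$. The position of a point relative to the line follows from the sign of $y - (m\cdot x + y_0)$. The function name `side_of_line` is an assumption made for this example.

```python
import numpy as np

m = .5
y0 = 3

def side_of_line(point, m, y0):
    """Return +1 if point is above the line, 0 if on it, -1 if below."""
    x, y = point
    return int(np.sign(y - (m * x + y0)))

y_intercept = (0.0, y0)        # the line at x = 0
x_intercept = (-y0 / m, 0.0)   # solve m*x + y0 = 0; only defined for m != 0
```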

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

m = .5
y0 = 3

# YOUR CODE HERE

YOUR ANSWER HERE

#### A more general description of a line

However, this representation has some disadvantages:
* it cannot express vertical lines
* it is not obvious how to generalize to more dimensions

Hence one uses a more general form:

$$ a\cdot x + b\cdot y + c = 0 $$

Exercises:
1. Draw the line for the given values of $a,b,c$. Also try different values.
1. What parameters do you have to choose for horizontal and vertical lines? Can you draw them with your code?
1. Use the values $m$ and $y_0$ from the previous example to initialize $a,b,c$ to get the same line as in that example.
1. There are many triples $(a,b,c)$ that describe the same line. Can you find two of them? Can you give a criterion to check if two triples are equivalent?
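
The following sketch illustrates the last two exercises. Rewriting $y = m\cdot x + y_0$ as $m\cdot x - y + y_0 = 0$ gives one valid triple, and two valid triples describe the same line exactly when one is a non-zero multiple of the other, which can be tested with the cross product. The function names are assumptions made for this example.

```python
import numpy as np

def from_slope_intercept(m, y0):
    """One triple (a, b, c) for the line y = m*x + y0, i.e. m*x - y + y0 = 0."""
    return np.array([m, -1.0, y0])

def same_line(t1, t2, tol=1e-9):
    """True if the triples are multiples of each other (cross product is zero).

    Assumes both triples are valid lines, i.e. (a, b) != (0, 0).
    """
    return bool(np.all(np.abs(np.cross(t1, t2)) < tol))

abc = from_slope_intercept(.5, 3)       # a = 0.5, b = -1, c = 3
scaled = -2 * abc                       # a different triple for the same line
```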

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

a = 1
b = 2
c = 3

# YOUR CODE HERE

plt.figure()
plt.ylim([-10,10])
plt.plot(x, y)
plt.show()

#### A line specified by a normal vector

Using vector notation with $\vec{n} = (a,b)$ and $\vec{p} = (x,y)$, one can state the equation

$$ a\cdot x + b\cdot y + c = 0 $$

more compactly as

$$\langle \vec{n},\vec{p}\rangle + c = 0$$

where $\langle \_,\_ \rangle$ denotes the inner product (dot product).

Exercises:
1. Show that $\vec{n}$ is a normal vector, i.e., that it is orthogonal to the line.
1. Can you locate the point $\vec{p}_0$ on the line that is closest to the origin?
1. Plot the line and the point $\vec{p}_0$ on the line.
1. What interpretation can be given to the value $c$?
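
A numerical sketch for these exercises, using the same example values as the cell below: a direction vector of the line is $(-b, a)$, which is orthogonal to $\vec{n}$. The closest point to the origin lies along $\vec{n}$, so $\vec{p}_0 = -c\,\vec{n}/\langle\vec{n},\vec{n}\rangle$, and its distance from the origin is $|c|/\|\vec{n}\|$. The variable names `direction` and `distance` are assumptions.

```python
import numpy as np

n = np.asarray([.3, .5])
c = 3

# a vector along the line; its inner product with n vanishes
direction = np.array([-n[1], n[0]])

# point on the line closest to the origin
p0 = -c * n / n.dot(n)

# distance of the line from the origin
distance = abs(c) / np.sqrt(n.dot(n))
```

For a unit-length normal, $c$ is therefore the signed distance of the line from the origin, up to the sign convention.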


YOUR ANSWER HERE

YOUR ANSWER HERE

In [None]:
# ad 3.
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

n = np.asarray([.3,.5])
c = 3

# Compute point p0 = ...
# YOUR CODE HERE

def my_line2(n,c):
    if abs(n[1])> abs(n[0]):
        x = np.linspace(-10,10,2)
        y = -(n[0]*x + c)/n[1]
    else:
        y = np.linspace(-10,10,2)
        x = -(n[1]*y + c)/n[0]
    return x,y

x,y = my_line2(n,c)

plt.figure()
plt.axes().set_aspect('equal')
plt.ylim([-10,10])
plt.plot(x, y)
plt.plot(*p0,'*')
plt.text(*p0,'({:4.2f},{:4.2f})'.format(p0[0],p0[1]))
plt.plot(0,0, '*k') # origin
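plt.annotate('', xy=p0, xytext=(0,0), arrowprops=dict(arrowstyle='<->'))  # text passed positionally: the keyword `s` was removed in newer matplotlib
plt.text(*(.5*p0),'d={:4.2f}'.format(abs(c)/np.sqrt(n.dot(n))))  # distance is |c|/||n||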
plt.show()

YOUR ANSWER HERE

#### A line specified by normal vector and point

Instead of providing the value $c$ one could specify a line by the normal $\vec{n}$ and one point $\vec{p}$ on that line.

Exercises:
1. How can you recover the value $c$ from $\vec{n}$ and $\vec{p}$?
1. Plot the point $\vec{p}$, the normal $\vec{n}$, the line, the origin, and the point $\vec{p}_0$ into one graph.
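
Since $\vec{p}$ lies on the line, $\langle\vec{n},\vec{p}\rangle + c = 0$ must hold, so $c = -\langle\vec{n},\vec{p}\rangle$. A short sketch with the example values of the cell below (the origin `o` and the closest point `p0` are computed here as assumptions about what the plotting code expects):

```python
import numpy as np

p = np.array([1, 3])
n = np.array([1, -3])

c = -n.dot(p)            # recover c from the normal and a point on the line
o = np.zeros(2)          # origin
p0 = -c * n / n.dot(n)   # point on the line closest to the origin
```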

YOUR ANSWER HERE

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

p = np.array([1, 3])
n = np.array([1, -3])

# YOUR CODE HERE

x,y = my_line2(n,c)

plt.figure()
plt.axes().set_aspect('equal')
plt.ylim([-10,10])
plt.plot(*o,'*k')
plt.plot(*p,'or')
plt.arrow(*p, *n, fc='m', ec='m', head_width=.3, head_length=.4)
plt.plot(x, y)
plt.plot(*zip(o,p0),'g')
plt.show()

### The higher dimensional case

* A $D$-dimensional space is separated into two parts by a hyperplane
  (i.e. a $(D-1)$-dimensional subspace)
* A hyperplane can be described by a point and a normal vector.
* In a $2$-dimensional space, a hyperplane is just a $1$-dimensional subspace (i.e. a line).
* In a $3$-dimensional space, a hyperplane is just a $2$-dimensional subspace (i.e. a plane).
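
The point-and-normal description works in any dimension. As a sketch, the side of the hyperplane on which a point $\vec{x}$ lies can be read off the sign of $\langle\vec{n}, \vec{x}-\vec{p}\rangle$. The function name `signed_distance` is an assumption made for this example.

```python
import numpy as np

def signed_distance(x, normal, point):
    """Signed distance of x from the hyperplane through point with the given normal.

    Positive on the side the normal points to, negative on the other side.
    """
    x = np.asarray(x, dtype=float)
    return (x - point).dot(normal) / np.linalg.norm(normal)

point = np.array([1, 2, 3])
normal = np.array([1, 1, 2])

on_plane = signed_distance(point, normal, point)
in_front = signed_distance(point + normal, normal, point)
behind = signed_distance(point - normal, normal, point)
```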

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

point  = np.array([1, 2, 3])
normal = np.array([1, 1, 2])

# a plane is a*x+b*y+c*z+d=0
# [a,b,c] is the normal. Thus, we have to calculate
# d and we're set
d = -point.dot(normal)

# create x,y
xx, yy = np.meshgrid(range(10), range(10))

# calculate corresponding z
z = (-normal[0] * xx - normal[1] * yy - d) * 1. /normal[2]

# plot the surface
plt3d = plt.figure().add_subplot(projection='3d')  # gca(projection=...) is no longer supported in newer matplotlib
plt3d.plot_surface(xx, yy, z)
plt.show()

## Euclidean classifier

*Exercise*:
1. Implement the Euclidean classifier
1. Apply it to your dataset
1. Visualize the result
1. Classify some datapoint and add it to your plot 
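
The sketch below assumes that "Euclidean classifier" means the minimum-distance-to-class-mean classifier: a point is assigned to the class whose mean is closer. Its decision boundary is the line through the midpoint `p` of the two means with normal `w` pointing from the class 0 mean to the class 1 mean. The helper `classify` and the toy data are assumptions made for illustration.

```python
import numpy as np

def euclidean(data):
    """Return normal w and point p of the separating line for data of shape (N, 3)."""
    mu0 = data[data[:, 2] == 0, :2].mean(axis=0)  # mean of class 0
    mu1 = data[data[:, 2] == 1, :2].mean(axis=0)  # mean of class 1
    w = mu1 - mu0            # normal of the decision boundary
    p = (mu0 + mu1) / 2      # midpoint between the class means
    return w, p

def classify(x, w, p):
    """Return 1 if x lies on the side w points to, else 0 (hypothetical helper)."""
    return int(np.dot(w, np.asarray(x, dtype=float) - p) > 0)

# hand-made toy data (hypothetical values)
toy = np.array([[0., 0., 0.],
                [2., 0., 0.],
                [4., 4., 1.],
                [6., 4., 1.]])
w, p = euclidean(toy)
```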

In [None]:
def euclidean(data):
    # YOUR CODE HERE
    return w,p

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

n, p = euclidean(data)

plt.figure()
plt.axes().set_aspect('equal')
plt.axis('equal')

# YOUR CODE HERE
plt.show()

## LDA

*Exercise*:
1. Implement the LDA (ML-09, slide 11). Hint: you may use `np.cov`, `np.linalg.inv`, and `np.dot` (`@`).
1. Apply it to your dataset (make sure your dataset fulfills the conditions).
1. Visualize the result
1. Classify some datapoint and add it to your plot 
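
The slide referenced above is not reproduced here, so the following is only a sketch of the standard Fisher formulation, $\vec{w} = S_W^{-1}(\vec{\mu}_1 - \vec{\mu}_0)$, with the within-class scatter $S_W$ taken as the sum of the two class covariance matrices. Using the midpoint of the class means as the point `p` on the boundary is an additional assumption (it corresponds to equal class priors and equal class covariances). The seeded toy data only serves to run the function.

```python
import numpy as np

def LDA(data):
    """Return normal w and point p of the LDA decision boundary for data of shape (N, 3)."""
    x0 = data[data[:, 2] == 0, :2]
    x1 = data[data[:, 2] == 1, :2]
    mu0 = x0.mean(axis=0)
    mu1 = x1.mean(axis=0)
    # within-class scatter: sum of the per-class covariance matrices
    s_w = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    w = np.linalg.inv(s_w) @ (mu1 - mu0)
    p = (mu0 + mu1) / 2   # assumption: threshold halfway between the means
    return w, p

# two normally distributed classes with equal covariance (seed fixed for reproducibility)
rng = np.random.RandomState(0)
cov = [[1, 0], [0, 12]]
x0 = rng.multivariate_normal([0, 0], cov, 50)
x1 = rng.multivariate_normal([6, 10], cov, 40)
data = np.vstack([np.column_stack([x0, np.zeros(50)]),
                  np.column_stack([x1, np.ones(40)])])

w, p = LDA(data)
```

A new point `x` can then be assigned to class 1 if `w @ (x - p) > 0` and to class 0 otherwise.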

In [None]:
def LDA(data):
    # YOUR CODE HERE

    return w,p

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

n, p = LDA(data)

plt.figure()
plt.axes().set_aspect('equal')
plt.axis('equal')
# YOUR CODE HERE
plt.show()