## Exercises for workshop 4

Although this workshop series does not focus strongly on mathematics, we encourage you to become familiar with the relevant mathematical topics, since machine learning depends heavily on them. If you like, you can use the following materials provided by Stanford University to review or refresh your mathematical knowledge:
- Linear algebra http://cs229.stanford.edu/section/cs229-linalg.pdf
- Probability theory http://cs229.stanford.edu/summer2020/cs229-prob.pdf

In [1]:
import math as m
import numpy as np

We have 10 samples from the cities Berlin, Munich, Stuttgart, Nuremberg, Hamburg, Hannover, Augsburg, Halle, Fürth and Ingolstadt.

To make things easier, we include the bias term in our feature array by prepending a 1 to the latitude and longitude of every sample. We get the following feature, label and weight arrays:

In [2]:
features = np.array([[1, 52.5167, 13.3833], [1, 48.1372, 11.5755], [1, 48.7761, 9.1775], [1, 49.4539, 11.0775], 
                     [1, 53.55, 10], [1, 52.3744, 9.7386], [1, 48.3717, 10.8983], [1, 51.4828, 11.9697], 
                     [1, 49.4783, 10.9903], [1, 48.7636, 11.4261]]) # [biasTerm, lat, lng]

labels = np.array([0, 1, 0, 1, 0, 0, 1, 0, 1, 1]) # is in Bavaria


weights = np.array([1,.5,.5])
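
The bias column in `features` can also be built programmatically instead of being typed by hand. The following is a minimal sketch, assuming the raw coordinates are available as a `[lat, lng]` array; the names `raw`, `with_bias` and `z` are illustrative and not part of the exercise.

```python
import numpy as np

# Raw [lat, lng] coordinates of the first two samples (Berlin, Munich).
raw = np.array([[52.5167, 13.3833],
                [48.1372, 11.5755]])

# Prepend a column of ones so that the first weight acts as the bias term.
with_bias = np.column_stack([np.ones(len(raw)), raw])

weights = np.array([1, .5, .5])

# The dot product now equals bias + w1*lat + w2*lng for every sample.
z = with_bias @ weights
```

With the bias folded into the feature array, a single dot product covers both the bias and the weighted coordinates, which is why `predict` needs no separate bias argument.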

In [4]:
def sigmoid(z):
    return 1.0 /(1 + np.exp(-z))

In [5]:
def predict(features, weights):
    return sigmoid(np.dot(features, weights))

In [6]:
predict(features, weights)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
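
The output of all ones is a sign of saturation rather than of certainty: with the initial weights, every logit is around 30, so the sigmoid lies within rounding distance of 1 and NumPy prints it as `1.`. The sketch below reproduces this for the first sample (Berlin); the variable names `x`, `w`, `z`, `p` and `gap` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

# Berlin's row from the features array and the initial weights.
x = np.array([1, 52.5167, 13.3833])
w = np.array([1, .5, .5])

z = np.dot(x, w)   # 1 + 0.5*52.5167 + 0.5*13.3833 = 33.95
p = sigmoid(z)     # displayed as 1.0 at NumPy's default print precision

# Distance of the probability from 1, computed without subtracting from 1.
gap = np.exp(-z) / (1 + np.exp(-z))
```

Here `gap` is on the order of 1e-15, which is why the predictions look identical even though the underlying logits differ.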

### Task 1:
You are given the parametric function `predict` defined above. Compute the binary cross-entropy loss for every sample.
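
For reference, the per-sample binary cross-entropy can be written as follows. The notation is an assumption made here for illustration: $x_i$ is the feature row of sample $i$ (including the bias entry), $y_i \in \{0, 1\}$ its label, $w$ the weight vector and $\sigma$ the sigmoid function.

$$
\hat{y}_i = \sigma(w^\top x_i), \qquad
\ell_i(w) = -\,y_i \log \hat{y}_i \;-\; (1 - y_i)\,\log\bigl(1 - \hat{y}_i\bigr)
$$

Only one of the two terms is active for a given sample: the first when $y_i = 1$, the second when $y_i = 0$.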

In [7]:
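def BCE(features, weights, labels):
    # Per-sample binary cross-entropy: -y*log(p) - (1-y)*log(1-p)
    predictions = predict(features, weights)
    loss_positive = -labels * np.log(predictions)            # active where label == 1
    loss_negative = -(1 - labels) * np.log(1 - predictions)  # active where label == 0
    return loss_positive + loss_negative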

In [8]:
BCE(features, weights, labels)

array([3.39642118e+01, 3.97459843e-14, 2.99775453e+01, 2.64233080e-14,
       3.27855569e+01, 3.20546693e+01, 4.95159469e-14, 3.27114489e+01,
       2.73114864e-14, 3.13082893e-14])

In [9]:
def EmpiricalLoss(features, weights, labels):
    N = len(features)
    BiCrEn = BCE(features, weights, labels)
    loss = BiCrEn.sum() / N
    return loss

In [10]:
EmpiricalLoss(features, weights, labels)

16.149343221904143

### Task 2:
Compute the gradient of the empirical loss (also called the cost function), using the binary cross-entropy loss function, with respect to the weight vector w.
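
A short derivation of this gradient, under the assumed notation $z_i = w^\top x_i$, $\hat{y}_i = \sigma(z_i)$, per-sample loss $\ell_i$, and $X$ for the feature matrix with rows $x_i$:

$$
\frac{d\sigma}{dz} = \sigma(z)\bigl(1 - \sigma(z)\bigr)
\quad\Longrightarrow\quad
\frac{\partial \ell_i}{\partial z_i} = \hat{y}_i - y_i
$$

$$
\nabla_w L(w)
= \frac{1}{N}\sum_{i=1}^{N} \bigl(\hat{y}_i - y_i\bigr)\, x_i
= \frac{1}{N}\, X^\top \bigl(\sigma(Xw) - y\bigr)
$$

The matrix form on the right is the expression that a vectorized NumPy implementation can evaluate with a single `np.dot`.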

In [11]:
def Gradient(features,weights,labels):
    # Gradient of the mean BCE: X^T (sigmoid(Xw) - y) / N
    gradient = np.dot(features.T, predict(features, weights) - labels)
    return gradient / len(features)

In [12]:
Gradient(features, weights, labels)

array([ 0.5    , 25.87   ,  5.42691])
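
A common way to gain confidence in an analytic gradient is to compare it against a central finite-difference approximation. The sketch below does this on three of the samples; the weight vector `w`, the step size `eps` and all function names are assumptions chosen for illustration, with small weights so that the sigmoid is not saturated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def loss(X, w, y):
    # Mean BCE written as log(1 + exp(z)) - y*z.
    z = X @ w
    return np.mean(np.logaddexp(0, z) - y * z)

def analytic_gradient(X, w, y):
    return X.T @ (sigmoid(X @ w) - y) / len(X)

def numeric_gradient(X, w, y, eps=1e-6):
    # Central differences: perturb one weight at a time.
    grad = np.zeros_like(w, dtype=float)
    for j in range(len(w)):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        grad[j] = (loss(X, w + step, y) - loss(X, w - step, y)) / (2 * eps)
    return grad

X = np.array([[1, 52.5167, 13.3833],
              [1, 48.1372, 11.5755],
              [1, 48.7761, 9.1775]])
y = np.array([0, 1, 0])
w = np.array([0.1, -0.01, 0.02])  # hypothetical, non-saturating weights

matches = np.allclose(analytic_gradient(X, w, y),
                      numeric_gradient(X, w, y), atol=1e-5)
```

If `matches` is `False`, the analytic formula or its implementation most likely contains a sign or scaling error.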

### Task 3:
Implement the gradient descent method and run it with a learning rate of 0.001 for about 3000 iterations.
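
Each iteration moves the weights a small step against the gradient. As a worked example, a single update can be computed from the initial weights and the gradient value printed in Task 2; the variable names below are illustrative.

```python
import numpy as np

weights = np.array([1, .5, .5])
gradient = np.array([0.5, 25.87, 5.42691])  # value printed in Task 2
eta = 0.001                                 # learning rate

# One gradient descent step: w <- w - eta * gradient
new_weights = weights - eta * gradient
# → [0.9995, 0.47413, 0.49457309]
```

The latitude weight changes the most because its gradient component is by far the largest.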

In [13]:
def GradientDescent(eta, features, labels, weights, iterations):
    # Step against the gradient; report the loss every 1000 iterations
    for i in range(iterations):
        weights = weights - eta*Gradient(features, weights, labels)
        if (i % 1000 == 0):
            print("iteration: ", str(i), " loss: ", str(EmpiricalLoss(features, weights, labels)))
    return weights
    

In [14]:
updated_weights = GradientDescent(0.001, features, labels, weights, 3001)
updated_weights

iteration:  0  loss:  15.449124167438217
iteration:  1000  loss:  0.6036136116977191
iteration:  2000  loss:  0.5981876720530922
iteration:  3000  loss:  0.5950049075701982


array([ 1.02499631, -0.15097511,  0.58963827])

### Task 4:
In the following, we want to test how well our small model performs. For that, we need a classifier that decides, based on the probabilities given by our model, whether a city is located in Bavaria.
After implementing the classifier, compute the accuracy on your training data and on the given test data. You can also take a look at the actual probabilities. Did the empirical loss improve?

In [15]:
# test data: Würzburg, Rostock, Trier, Rosenheim, Regensburg

test_features = np.array([[1, 49.7944, 9.9294], [1, 54.0833, 12.1333], [1, 49.7567, 6.6414], [1, 47.8561, 12.1289], 
                    [1, 49.0167, 12.0833]]) # [biasTerm, lat, lng]

test_labels = np.array([1, 0, 0, 1, 1]) # is in Bavaria

In [16]:
def classify(features, weights):
    # Threshold the predicted probability at 0.5: 1 = in Bavaria, 0 = not
    prob = predict(features, weights)
    return np.greater_equal(prob, .5)*1

In [17]:
print("Probabilities: ", predict(features, updated_weights))

print("Classified: ", classify(features, updated_weights))
print("Actual labels: ", labels)

Probabilities:  [0.72861119 0.64171951 0.28341687 0.52258621 0.23806128 0.24232346
 0.53696454 0.57692632 0.50882561 0.59872907]
Classified:  [1 1 0 1 0 0 1 1 1 1]
Actual labels:  [0 1 0 1 0 0 1 0 1 1]


In [18]:
accuracy = sum(labels==classify(features, updated_weights))/len(features)
accuracy

0.8
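
The same accuracy can be obtained with `np.mean` on the boolean comparison, and the comparison also shows which cities are misclassified. The sketch below reuses the city order from the introduction and the values from the "Classified" output; the names `cities`, `classified`, `correct` and `misclassified` are illustrative.

```python
import numpy as np

cities = np.array(["Berlin", "Munich", "Stuttgart", "Nuremberg", "Hamburg",
                   "Hannover", "Augsburg", "Halle", "Fürth", "Ingolstadt"])
labels = np.array([0, 1, 0, 1, 0, 0, 1, 0, 1, 1])

# Predicted classes copied from the "Classified" output.
classified = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 1])

correct = classified == labels
accuracy = np.mean(correct)        # fraction of matching entries → 0.8
misclassified = cities[~correct]   # → ['Berlin', 'Halle']
```

Both misclassified cities lie outside Bavaria but receive a probability above 0.5.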

In [19]:
print(EmpiricalLoss(features, weights, labels))
print(EmpiricalLoss(features, updated_weights, labels))

16.149343221904143
0.5950049075701982


In [20]:
print("Probabilities of test data: ", predict(test_features, updated_weights))

print("Classified test data: ", classify(test_features, updated_weights))
print("Actual labels of test data: ", test_labels)

Probabilities of test data:  [0.34570684 0.50350551 0.07102808 0.72143283 0.67906339]
Classified test data:  [0 1 0 1 1]
Actual labels of test data:  [1 0 0 1 1]


In [21]:
accuracy = sum(test_labels==classify(test_features, updated_weights))/len(test_features)
accuracy

0.6