# Graded Exercise 1

**Date: 25.10.2019**

Welcome to the first graded exercise. In this exercise, you will be tested on three topics you have learned so far: K-means clustering, KNN, and linear regression, along with some general machine learning practices. 

You are asked to fill in the code in a couple of cells throughout the exercise. For each such cell we provide tests that run along with the cell and save your results to a file. The cells are independent of each other and you will receive points for each individual cell. The tests immediately show you whether your code is correct.

Before you finish, please make sure to **upload two files to Moodle**:
* **graded_exercise_1.ipynb**
* **answers_SCIPER.npz (e.g. "answers_280595.npz")**

Good luck! :-)

<br>

**PLEASE ENTER YOUR SCIPER NUMBER IN THE CELL BELOW.**

In [None]:
# Enter your sciper number here
sciper_number = 123456  # e.g. 123456

In [None]:
# Please do not change the seed, we keep the same seed to evaluate the result.
seed = 123

In [None]:
import random
from matplotlib.image import imread
import numpy as np
import matplotlib.pyplot as plt
from helpers.helper import KMEANSHelper

# Unit tests.
import tests.tests as tests

# set matplotlib to display all plots inline with the notebook
%matplotlib inline
%load_ext autoreload
%autoreload 2

## 1 $K$-means Clustering (6 pts)

In this part, you are asked to understand and complete the K-means algorithm including important concepts you have learned from the lecture and exercise. 

In this graded session, you will apply the K-means algorithm to compress an image. Given an RGB image, you can cluster all pixels into $K$ clusters so that the original image is compressed into an image with only $K$ colors. 

### 1.1. Initialization  (1 pt)
First, we will initialize the cluster centers randomly. Fill in the function `init_centers` below. The indices are already shuffled for you. Select the **first K indices** from the shuffled indices and use them to select the centers from the data.
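
The short sketch below is not part of the graded code. It only illustrates, on made-up toy data, why the seed is fixed: re-seeding NumPy before each call to `np.random.permutation` reproduces the same shuffled order. It also shows that an integer index array selects whole rows. The names `toy_data`, `first_shuffle`, `second_shuffle` and `picked_rows` are hypothetical.

```python
import numpy as np

# Toy data: 6 "pixels" with 3 colour channels each (values are made up).
toy_data = np.arange(18).reshape(6, 3)

# Re-seeding before each call makes the shuffled order identical.
np.random.seed(123)
first_shuffle = np.random.permutation(toy_data.shape[0])
np.random.seed(123)
second_shuffle = np.random.permutation(toy_data.shape[0])

print(np.array_equal(first_shuffle, second_shuffle))  # True

# An integer index array selects whole rows of a 2-D array.
picked_rows = toy_data[np.array([4, 0])]
print(picked_rows.shape)  # (2, 3)
```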

In [None]:
def init_centers(data, K):
    """
    Randomly pick K data samples (i.e. pixels) from the input image as starting points 
    for centers.
    
    input: 
        data: ndarray of shape (N, d) where N is the number of pixels, d is number of features.
        K: int, the number of clusters.
    output:
        centers: ndarray of shape (K, d). Initial cluster centers.        
    """    
    np.random.seed(seed)
    random_idx = np.random.permutation(data.shape[0])
    
    # please select the first K indices from the random_idx and use these indices to select centers from data
    ## >>> YOUR CODE HERE
    centers = ...
       
    return centers

tests.test_kmeans_init_centers(locals())


### 1.2. Computing Distance (3 pts)

Please fill in the function `compute_distance()`.
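
For reference, the Euclidean distance between pixel $x_n$ and center $c_k$ can be written as follows. The index names $n$, $k$ and $j$ are notation introduced here for illustration, with $d$ the number of features as in the docstring below:

$$
\text{distance}_{n,k} = \lVert x_n - c_k \rVert_2 = \sqrt{\sum_{j=1}^{d} \left(x_{n,j} - c_{k,j}\right)^2}
$$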

In [None]:
def compute_distance(data, centers, K):
    """
    Compute the Euclidean distance between each datapoint and each center.
    
    input:    
        data: ndarray of shape (N, d) where N is the number of pixels, d is number of features.
        centers: ndarray of shape (K, d). Centers of K clusters.
        K: number of clusters.
        
    output:
        distance: ndarray of shape (N, K).
    """
    distance = np.zeros((data.shape[0], K))
    for k in range(K):
        
        # please compute the Euclidean distances between each pixel and each center
        ## >>> YOUR CODE HERE
        ...
        
    return distance

tests.test_kmeans_compute_distance(locals())

### 1.3. Perform K-Means (2 pts)

Find the closest cluster to your data samples using the distances you just computed. Fill in the `find_closest_cluster()` function.

You can fill in this function even if you haven't completed `compute_distance()` yet, as they are graded individually.

In [None]:
def find_closest_cluster(distance):
    """
    Assign cluster labels to pixels according to minimum input distance.
    
    input:
        distance: ndarray of shape (N, K). 

    output:
        labels: ndarray of shape (N,) where each value means the cluster label that is assigned to the pixel.
    """
    
    # please assign labels to all datapoints 
    ## >>> YOUR CODE HERE
    labels = ...
    return labels

tests.test_kmeans_find_closest_cluster(locals())

You are done with Part 1! Just run the cells below and see the compressed image result.

You can now move on to Part 2: K-NN.

In [None]:
def kmean(data, K, max_iter):
    """
    Main function that combines all the former functions together to build the K-means algorithm.
    
    input: 
        data: ndarray of shape (N, d) where N is the number of pixels, d is number of features.
        K: int, the number of clusters.
        max_iter: int, the maximum number of iterations.
    
    output:
        centers: ndarray of shape (K, d). Final cluster centers.
        labels: ndarray of shape (N,) where each value means the cluster label that is assigned to the data.
    """
    centers = init_centers(data, K)
    for i in range(max_iter):
        old_centers = centers
        distance = compute_distance(data, old_centers, K)
        labels = find_closest_cluster(distance)
        centers = KMEANSHelper.compute_centers(data, labels, K)
        if np.all(old_centers == centers):
            break
    return centers, labels

In [None]:
# we can see how K-Means works for compressing image
img = imread('road.jpg') # jpg
img_size = img.shape
# print ('The shape of input image is: ', img_size)

# Reshape it to be 2-dimensional
X = img.reshape(img_size[0] * img_size[1], img_size[2])
# Run the Kmeans algorithm
K = 10
centers, labels = kmean(X, K, 100)
# print(centers.shape, labels.shape)

# Use the centroids to compress the image, clip it to image range
X_compressed = centers[labels]
X_compressed = np.clip(X_compressed, 0, 255).astype('uint8')

# Reshape X_compressed to have the same dimension as the original image 300 * 400 * 3
X_compressed = X_compressed.reshape(img_size[0], img_size[1], img_size[2])

# Plot the original and the compressed image next to each other
fig, ax = plt.subplots(1, 2, figsize = (12, 8))
ax[0].imshow(img)
ax[0].set_title('Original Image')
ax[1].imshow(X_compressed)
ax[1].set_title('Compressed Image with %d colors'%(K))
for ax in fig.axes:
    ax.axis('off')
plt.tight_layout()

## 2. kNN (9 points)

### 2.1 Weighted $k$-Nearest Neighbors Classifier (5 pts)



Traditional $k$-NN assigns to a given example the most common label among its nearest neighbors. The method is intuitive and can be summarized as:
- Compute the distance between the example to classify and all the training examples.
- Select the closest $k$ training examples.
- Assign to the example the most common label among those neighbors.

However, in **weighted k-NN**, instead of taking a majority vote over the nearest neighbors' labels, we assign each nearest neighbor a weight that is inversely proportional to its distance from the query point. To predict the label of the query point, the weights of the neighbors that share the same label are summed up. The label with the maximum sum is declared the query's label.

Formally, for a query point $x_{q}$, let $(x_{i},y_{i}) \in \text{neighborhood}_{k}(x_q)$ for $i = 1, 2, \dots, k$ be its $k$ nearest neighbours, with labels $y_i \in \{0, \dots, M-1\}$, where $M$ is the number of labels/classes. The predicted label $y_{q}$ with weighted k-NN is then given by

\begin{align}
        \text{scores}(v) &=  \sum_{(x_i,y_i)\in \text{neighborhood}_{k}(x_{q})}  w_{i} \times I(v,y_i)  \\
        w_{i} &= \frac{1.0}{\text{distance}(x_{q},x_{i})}\\
        y_q &= \underset{v}{\operatorname{argmax}} \text{scores}(v) 
\end{align}

where 
$$
I(v,y_i) = \begin{cases} 1, & v = y_{i} \\ 0, & v \neq y_i \end{cases}
\qquad v \in \{0, \dots, M-1\}
$$
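
To make the formulas concrete, here is a small worked example on made-up numbers. It is a sketch, not the graded implementation: it uses `np.bincount` with its `weights` argument rather than an explicit loop, and the names `toy_labels`, `toy_distances`, `toy_weights` and `toy_scores` are hypothetical. It shows a case where the weighted vote and the plain majority vote disagree.

```python
import numpy as np

# Hypothetical neighborhood with k = 3 neighbors and M = 2 classes.
toy_labels = np.array([0, 1, 1])
toy_distances = np.array([0.5, 2.0, 4.0])

# w_i = 1 / distance(x_q, x_i)
toy_weights = 1.0 / toy_distances  # weights are 2.0, 0.5 and 0.25

# scores(v) = sum of the weights of the neighbors whose label is v
toy_scores = np.bincount(toy_labels, weights=toy_weights, minlength=2)

# Plain majority vote counts labels; the weighted vote compares the scores.
majority_vote = np.argmax(np.bincount(toy_labels, minlength=2))
weighted_vote = np.argmax(toy_scores)

print(toy_scores)     # score 2.0 for class 0, score 0.75 for class 1
print(majority_vote)  # 1
print(weighted_vote)  # 0
```

The single close neighbor of class 0 outweighs the two distant neighbors of class 1. Note that a neighbor at distance zero would make its weight infinite; the formula above does not say how to handle that case, so the example avoids it.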

**You are asked to implement the function below to predict the label of a query point given the labels of its k-nearest neighbours and their distances from it.**

In [None]:
def predict_label_with_weighted_distance(neighbor_labels, neighbor_distances, num_classes):
    """
    Input:
      neighbor_labels : labels of k-nearest neighbours with shape:(N,)
      neighbor_distances: distances of k-nearest neighbours from query point with shape:(N,)
      num_classes: num of classes/labels in the dataset, a scalar value

    Output:
      predicted_label: label of the query point predicted using weighted k-NN algorithm, a scalar value 
      w: weights of the neighbors with shape:(N,)
    """

    # accumulate the sum of neighbor weights for each class/label
    scores = np.zeros(num_classes, dtype=np.float32)  
    # save the weight of each neighbor
    w = np.zeros(len(neighbor_labels), dtype=np.float32)

    for j in range(len(neighbor_labels)):
        ## >>> YOUR CODE HERE
        w[j] = ...
        scores[neighbor_labels[j]] = ...   
    
    predicted_label = np.argmax(scores)
    return predicted_label, w

# Test your implementation.
tests.test_knn_weighted(locals())

### 2.2 Leave-one-out cross-validation (1 pt)

K-fold cross-validation (CV) is the technique we use here. We split the data into K parts, use one part for validation and the remaining K-1 parts for training. This process is repeated K times, so that each part serves as the validation set exactly once.

Leave-one-out cross-validation (LOOCV) is a special case of K-fold CV. Please complete the cell below to find the number of folds in LOOCV for the training set.
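
As an illustration of the general K-fold idea only, the sketch below splits six made-up sample indices into three folds without shuffling. The helper `k_fold_indices` is hypothetical and not part of the exercise code.

```python
import numpy as np

def k_fold_indices(num_samples, num_folds):
    """Yield (train_idx, val_idx) pairs for a plain, unshuffled K-fold split."""
    all_idx = np.arange(num_samples)
    folds = np.array_split(all_idx, num_folds)
    for fold in folds:
        # Everything that is not in the current validation fold is used for training.
        train_idx = np.setdiff1d(all_idx, fold)
        yield train_idx, fold

splits = list(k_fold_indices(num_samples=6, num_folds=3))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample index appears in exactly one validation fold, and the training indices of a split never overlap with its validation indices.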

In [None]:
def num_folds_LOOCV(train_data):

    """
    train_data : size (NxD), N data points each of dimension D

    Output:
        num_folds :  return number of folds for Leave-one-out-cross-validation

    """
    num_folds = ...
    return num_folds

# Test your implementation.
tests.test_num_folds_LOOCV(locals())

## 3 Linear Regression (6 points)

### 3.1. Min-max normalization (2 pts)

Write the `min_max_normalization()` function that takes a dataset, a min value and a max value. Also fill in the function `find_min_max_values()` that lets you find the minimum value and maximum value of each **feature**.

Let's recall how the min max normalization works. We have a dataset of $N$ samples and $D$ features. Feature $d$ (where $d=1,...,D$) of data sample $x_i$ (where $i=1,...,N$) is denoted as $x_i^{(d)}$. The normalized feature $d$ of $x_i$ is denoted as $\tilde{x}^{(d)}_i$.

The min-max normalization formula is:
$$\tilde{x}^{(d)}_i = \frac{x^{(d)}_i - x^{(d)}_{min}}{x^{(d)}_{max}-x^{(d)}_{min}}$$

where 

$$ x^{(d)}_{min} = \min_{i=1}^N x^{(d)}_i \\
x^{(d)}_{max} = \max_{i=1}^N x^{(d)}_i 
$$
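
As a worked example with made-up numbers, suppose a single feature takes the values 2, 4 and 10 over three samples, so that $x_{min} = 2$ and $x_{max} = 10$. The normalized values are then:

$$
\tilde{x}_1 = \frac{2 - 2}{10 - 2} = 0, \qquad
\tilde{x}_2 = \frac{4 - 2}{10 - 2} = 0.25, \qquad
\tilde{x}_3 = \frac{10 - 2}{10 - 2} = 1
$$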

Keep in mind that your data does not have a bias feature, so you do not have to account for this.

In [None]:
def min_max_normalization(dataset, min_value, max_value):
    """ Normalizes the dataset by linearly scaling its features
    to range [0, 1].
    
    Args:
        dataset (np.array): Dataset, shape (N, D), N is number of 
            data samples, D is number of features.
        min_value (np.array): Per-feature min value, shape (D, ).
        max_value (np.array): Per-feature max value, shape (D, ).
    
    Returns:
        np.array: Normalized dataset, shape (N, D).
    """

    ### YOUR CODE HERE
    ds_normalized = ...
    return ds_normalized
    ###
    
def find_min_max_values(dataset):
    """ Finds the minimum and maximum value for each feature.
    
    Args:
        dataset (np.array): Dataset, shape (N, D), N is number of 
            data samples, D is number of features.
    
    Returns:
        min_value (np.array): Per-feature min value, shape (D, ).
        max_value (np.array): Per-feature max value, shape (D, ).
    """
    
    ### YOUR CODE HERE
    min_value = ...
    max_value = ...
    return min_value, max_value
    ###

# Test your implementation.
tests.test_normalization(locals())

### 3.2. The Most and the Least Influential Feature (2 pts)

Assume we have normalized our data, added a bias term (a column of ones as the 0th feature), and then trained a linear regression model on it. Let us consider what the weights of our trained model mean. 

Can you tell, by looking at the weights, which feature is the least influential (the feature that affects the label the least) and which is the most influential? Fill in the `find_least_and_most_influential_features()` function to return the feature indices.

**When finding the results, ignore the bias feature. This means that you should never return the bias feature index 0 as a result.**
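
One way to read the weights, sketched here on hypothetical values: after min-max normalization each feature lies in $[0, 1]$, so moving a single feature from 0 to 1 changes the prediction by exactly that feature's weight. The names `toy_w`, `predict`, `base`, `move_f1` and `move_f2` are made up for this illustration.

```python
import numpy as np

# Hypothetical trained weights: index 0 is the bias, 1 and 2 are real features.
toy_w = np.array([1.0, 0.5, -3.0])

def predict(x_with_bias, w):
    """Linear regression prediction for one sample that already has the bias 1 prepended."""
    return float(x_with_bias @ w)

base = np.array([1.0, 0.0, 0.0])     # bias = 1, both features at their minimum
move_f1 = np.array([1.0, 1.0, 0.0])  # feature 1 raised from 0 to 1
move_f2 = np.array([1.0, 0.0, 1.0])  # feature 2 raised from 0 to 1

change_f1 = predict(move_f1, toy_w) - predict(base, toy_w)
change_f2 = predict(move_f2, toy_w) - predict(base, toy_w)
print(change_f1, change_f2)  # 0.5 -3.0
```

The size of each change equals the magnitude of the corresponding weight, and its direction follows the sign of the weight.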

In [None]:
def find_least_and_most_influential_features(w):
    """ Finds the least and most influential feature indices and returns them. 
    
    Args:
        w (np.array): Weights, shape (D, ), where D is number of features.
    
    Returns:
        least_inf (int): Least influential feature index.
        most_inf (int): Most influential feature index.
    """
    
    #YOUR CODE HERE
    least_inf =  ...
    most_inf =  ...
    return least_inf, most_inf

# Test your implementation.
tests.test_influential_features(locals())

### 3.3. Positively Correlated and Negatively Correlated Features (2 pts) 

By looking at the model weights, can you write a function that returns the features which have a positive correlation with the label and the features which have a negative correlation with the label? Return them as numpy arrays. Again, ignore the bias feature.

Hint: `np.nonzero()` or `np.where()` might be useful!
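
As a quick reminder of how these two calls behave, here is a sketch on a made-up array (`toy_values` is hypothetical and unrelated to the model weights). With a single condition, both return a tuple holding one index array per dimension.

```python
import numpy as np

toy_values = np.array([3, -1, 0, 7, -5])

# With a single boolean condition, np.where returns a tuple with one index
# array per dimension; [0] extracts the indices for a 1-D input.
idx_where = np.where(toy_values > 2)[0]

# np.nonzero on the same boolean mask gives the same indices.
idx_nonzero = np.nonzero(toy_values > 2)[0]

print(idx_where)    # [0 3]
print(idx_nonzero)  # [0 3]
```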

In [None]:
def find_positively_correlated_negatively_correlated(w):
    """ Finds indices of features that are positively correlated with the label and
        negatively correlated with the label.
    
    Args:
        w (np.array): Weights, shape (D, ), where D is number of features.
    
    Returns:
        negatively_corr (np.ndarray): Negatively correlated feature indices.
        positively_corr (np.ndarray): Positively correlated feature indices.
    """
    #YOUR CODE HERE
    negatively_corr = ...
    positively_corr = ...
    return negatively_corr, positively_corr

# Test your implementation.
tests.test_correlated_features(locals())


**Don't forget to upload your answers file and this completed jupyter notebook!**