# LAB #2: Numpy

## Introduction
In this laboratory, you will perform some operations on NumPy arrays in order to build your first Machine Learning model. 
In particular, you will build a NumPy-based version of the K-Nearest Neighbors algorithm (a.k.a. KNN).

## 0 Preliminary steps
### 0.1 NumPy
Make sure you have the NumPy library installed: its use is strongly recommended for this laboratory.
NumPy is the fundamental package for scientific computing with Python. You can read more about it on
the official documentation.


In [1]:
! pip install numpy



### 0.2 Iris dataset download 
For this lab, you will need two of the datasets you have already met: Iris and MNIST. Please refer to
Laboratory 1 for a complete description of the datasets.
Iris. You can download it from:
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

In [2]:
# linux users
# !wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -O iris.csv
# windows users
! pip install wget
import wget
wget.download("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", "iris.csv")



'iris (2).csv'

## 1 Exercises 
Note that exercises marked with a ($\star$) are optional: you should focus on completing the other ones first.

### 1.1 Iris Analysis with NumPy
As you might remember from Lab. 1, the Iris dataset collects the measurements of different Iris flowers.
Each data point is characterized by 4 **features** (sepal length, sepal width, petal length, petal width) and is associated with 1 **label** (i.e. an Iris species: Setosa, Versicolor, or Virginica), which in this case is the last element of the row (last column of the csv file). 

1. Load the Iris dataset. You can use the `csv` library that we saw in the last laboratory or read it with the standard `open(filename, mode)`. 
In the second case, remember to split the fields correctly and to strip newline characters. In either case, check for empty lines. 
This time, store the 4 features in a numpy array `x` of shape (n_samples, 4) and the labels in a separate array `y` of shape (n_samples,), converting the 3 species to corresponding numerical values. E.g.,
      - Iris-setosa: 0
      - Iris-versicolor: 1
      - Iris-virginica: 2

In order to check you have correctly loaded the data, print the shape of the two arrays: you should find
(150, 4) for `x` and (150,) for `y`.

In [3]:
import csv
import numpy as np

labels = {
    "Iris-setosa": 0,
    "Iris-versicolor": 1,
    "Iris-virginica": 2
}

x = []
y = []
with open("iris.csv") as f:
    for cols in csv.reader(f):
        if cols:  # skip empty lines
            x.append(cols[:4])
            label_num = labels[cols[4]]
            y.append(label_num)
            
x = np.array(x, dtype=np.float32)
y = np.array(y)

print("x.shape:", x.shape)
print("y.shape:", y.shape)

x.shape: (150, 4)
y.shape: (150,)


2. Compute again the mean and standard deviation of each feature for each class, this time using NumPy functions.

In [4]:
import numpy as np
class_means_std = {}
for label in np.unique(y):
    x_class = x[y == label] # selecting the rows with the current label
    means = x_class.mean(axis=0)
    stds = x_class.std(axis=0)
    class_means_std[label] = {"mean": means,
                              "std": stds}
    
    print("Class:", label)
    print("Means:", means)
    print("Stds:", stds)
    print()

Class: 0
Means: [5.0059996  3.4180002  1.464      0.24399997]
Stds: [0.34894693 0.37719488 0.17176732 0.10613199]

Class: 1
Means: [5.936002  2.77      4.26      1.3259999]
Stds: [0.5109834  0.3106445  0.46518815 0.19576517]

Class: 2
Means: [6.5879993 2.9740002 5.5520005 2.0260003]
Stds: [0.6294887  0.31925544 0.5463478  0.27188972]


3. Compute the distance between two samples (e.g., the $36^{th}$ and the $81^{st}$, the $13^{th}$ and the $15^{th}$) 
by means of the `np.linalg.norm(a-b)` function, which computes the norm of `a-b`, i.e., the Euclidean distance between the features of samples `a` and `b`. 
  - Can you guess whether the two samples in each pair belong to the same species?
  - From the means and standard deviations computed before, can you guess which species? 

In [5]:
def euclidean_distance(a, b):
    return np.linalg.norm(a-b)
print("Distance between 36th and 81th sample:", euclidean_distance(x[35], x[80]))
print("Distance between 13th and 15th sample:", euclidean_distance(x[12], x[14]))


Distance between 36th and 81th sample: 2.9086077
Distance between 13th and 15th sample: 1.4317821


It is more likely that the 13th and 15th samples (indices 12 and 14) belong to the same species, since they are closer to each other than the 36th and 81st samples (indices 35 and 80) are.
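As a quick sanity check, the first distance can be reproduced by hand from the formula $\sqrt{\sum_i (a_i - b_i)^2}$. The sketch below hard-codes the feature values of the samples at indices 35 and 80 so that it runs on its own; the variable names `manual` and `with_norm` are only illustrative.

```python
import numpy as np

# feature values of the samples at indices 35 and 80 of the Iris dataset
a = np.array([5.0, 3.2, 1.2, 0.2])
b = np.array([5.5, 2.4, 3.8, 1.1])

# Euclidean distance written out explicitly...
manual = np.sqrt(np.sum((a - b) ** 2))
# ...and through the NumPy helper used above
with_norm = np.linalg.norm(a - b)

print(manual, with_norm)  # both are approximately 2.9086
```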

In [6]:
print("35th sample:", x[35])
print("80th sample:", x[80])
print("12th sample:", x[12])
print("14th sample:", x[14])


35th sample: [5.  3.2 1.2 0.2]
80th sample: [5.5 2.4 3.8 1.1]
12th sample: [4.8 3.  1.4 0.1]
14th sample: [5.8 4.  1.2 0.2]


The samples at indices 12, 14 and 35 are likely to belong to the first class (Iris-setosa, label 0), since their values are closer to the mean of that class.
The sample at index 80 is likely to belong to the second class (Iris-versicolor, label 1), since its values are closer to the mean of the second class.
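This visual comparison can also be done numerically. The following is a minimal sketch, assuming the class means are copied (rounded) from the output of Exercise 2 and the four samples from the cell above; it assigns each sample to the class whose mean is closest in Euclidean distance. It ignores the standard deviations, so it is only a rough heuristic.

```python
import numpy as np

# per-class feature means, rounded from the output of Exercise 2
class_means = np.array([
    [5.006, 3.418, 1.464, 0.244],  # class 0 (Iris-setosa)
    [5.936, 2.770, 4.260, 1.326],  # class 1 (Iris-versicolor)
    [6.588, 2.974, 5.552, 2.026],  # class 2 (Iris-virginica)
])

# the four samples printed above (indices 35, 80, 12, 14)
samples = np.array([
    [5.0, 3.2, 1.2, 0.2],
    [5.5, 2.4, 3.8, 1.1],
    [4.8, 3.0, 1.4, 0.1],
    [5.8, 4.0, 1.2, 0.2],
])

# distance of every sample from every class mean, shape (4, 3)
dist_to_means = np.linalg.norm(samples[:, None, :] - class_means[None, :, :], axis=-1)
closest_class = dist_to_means.argmin(axis=1)
print(closest_class)  # [0 1 0 0]
```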

4. Find the k nearest neighbors of a sample in the dataset.
    - Define a function `k_nearest_neighbors(x, x_set, k)` that takes as input a sample `x` and a set of samples (i.e., a matrix) `x_set` and returns the indices of the `k` nearest neighbors of `x` in `x_set`.
        - Reuse the `euclidean_distance` function that you defined before to do so. 
        - Remember that the `x_set` is a matrix of shape ($N_{samples}, N_{features}$), so you have to compute the distance between `x` and each row of `x_set`. 
        - In order to find the indices of the `k` nearest neighbors, you can use the `argsort` function that returns the indices that would sort an array
    - Apply the function to the $36^{th}$ sample of the dataset with $k=5$.
    - Print the indices of the $5$ nearest neighbors.
    - Print the labels of the $5$ nearest neighbors. Can you guess the label of the $36^{th}$ sample?

In [7]:
def k_nearest_neighbors(x, x_set, k):
    # compute the distances of x from each point in x_set
    distances = np.linalg.norm(x - x_set, axis=1)
    # argsort returns the indices that sort the distances in ascending order; [:k] keeps the k closest ones
    indices = distances.argsort()[:k]
    return indices

n_neighbours = 5
sample_idx = 35

neighbours_idx = k_nearest_neighbors(x[sample_idx], x, n_neighbours)
print("Indices of the 5 nearest neighbors:", neighbours_idx)
print("Labels of the 5 nearest neighbors:", y[neighbours_idx])

print("Label of the 36th sample:", y[sample_idx])


Indices of the 5 nearest neighbors: [35 49  1  2 40]
Labels of the 5 nearest neighbors: [0 0 0 0 0]
Label of the 36th sample: 0
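
Note that index 35 comes first in the list: the query sample is itself part of `x_set`, so it is its own nearest neighbor at distance 0. A possible variant, sketched here on a toy matrix with a hypothetical helper named `k_nearest_excluding_self`, removes the query index before keeping the first `k` entries.

```python
import numpy as np

def k_nearest_excluding_self(idx, x_set, k):
    # hypothetical helper: neighbors of x_set[idx], ignoring the sample itself
    distances = np.linalg.norm(x_set[idx] - x_set, axis=1)
    order = distances.argsort()
    order = order[order != idx]  # drop the query sample
    return order[:k]

toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
print(k_nearest_excluding_self(0, toy, 2))  # [1 2]
```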


### 1.2 KNN design and implementation
In this exercise, you will implement your own version of the K-Nearest Neighbors (KNN) algorithm, and you will use it to assign an
Iris species (i.e. a label) to flowers whose species is unknown.

The KNN algorithm is straightforward. Suppose that, for a set of samples, some measurements (e.g., the Iris features) and
the corresponding labels (e.g., the Iris species) are known in advance. 

<img src="https://mlarchive.com/wp-content/uploads/2022/09/img2.png" width="800">

Then, whenever we want to label a new sample, we look at the K most similar points (a.k.a. neighbors) and assign a label accordingly. 

<img src="https://mlarchive.com/wp-content/uploads/2022/09/img1-1.png" width="800">


The simplest solution is a majority voting scheme: the new sample receives the label that occurs most often among its neighbors. 
This approach is naive only at first sight: the local similarity assumed by KNN happens to be roughly true, as you have seen in the previous exercises.
Even though this reasoning does not always generalize well, KNN provides a valid baseline for your tasks.
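
A minimal sketch of majority voting on a made-up list of neighbor labels (the values are purely illustrative) could look like this:

```python
import numpy as np

# labels of the 5 nearest neighbors of a hypothetical query point
neighbour_labels = np.array([1, 0, 1, 2, 1])

unique_labels, counts = np.unique(neighbour_labels, return_counts=True)
# unique_labels -> [0 1 2], counts -> [1 3 1]
predicted = unique_labels[counts.argmax()]
print(predicted)  # 1
```

With `argmax`, ties are resolved in favor of the smallest label, because `np.unique` returns the labels sorted and `argmax` returns the first maximum.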


1. Let’s identify a portion of our data for which we will try to guess the species. Randomly select 20%
of the records and store the first four columns (i.e. the features representing each flower) into a
two-dimensional numpy array of shape ($N_{test} \times 4$), where $N_{test}$ is 20% of the total number of samples. You can call it `X_test`.
For the same records, store the label column (i.e. the one with the species values) into another array, namely `y_test`. 
This is the testing data, which will be used to check that your KNN implementation works correctly and to measure its accuracy.

In [21]:
import numpy as np
np.random.seed(0)

indices = np.arange(150)
np.random.shuffle(indices)
test_len = 20*len(indices) // 100
test_idx = indices[:test_len]

x_test = x[test_idx]
y_test = y[test_idx]

print("Number of test samples:", y_test.shape[0])

Number of test samples: 30


2. Store the remaining 80% of the records in the same way. In this case, use the names `X_train` and `y_train` for the arrays.
This is the data that your model will use as ground-truth knowledge (i.e. the training data, from which we extract the knowledge and that we will use for comparison).


In [22]:
train_idx = indices[test_len:]

x_train = x[train_idx]
y_train = y[train_idx]
print("Number of training samples:", y_train.shape[0])


Number of training samples: 120
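
The same split can be sketched with NumPy's newer `Generator` interface instead of `np.random.seed` and `np.random.shuffle`. This is an alternative under that assumption, not the approach used in the cells above, and it will generally select different records. The two assertions check that the test and training indices do not overlap and together cover the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, for reproducibility
n_samples = 150
perm = rng.permutation(n_samples)  # shuffled copy of 0..149

test_len = 20 * n_samples // 100
test_idx, train_idx = perm[:test_len], perm[test_len:]

# the two index sets must not overlap and must cover the whole dataset
assert np.intersect1d(test_idx, train_idx).size == 0
assert test_idx.size + train_idx.size == n_samples
print(test_idx.size, train_idx.size)  # 30 120
```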


3. Focus now on the KNN technique. 
Starting next month, you will use the `scikit-learn` package. Many of its functionalities
are exposed via an object-oriented interface. With this paradigm in mind, implement now the KNN
algorithm and expose it as a Python class. The bare skeleton of your class should look like this (you
are free to add other methods if you want to).

```
class KNearestNeighbors:
    def __init__(self, k):
        """
        Store the value of k in an attribute of the class and initialize other attributes.
        :param k : int, number of neighbors to consider.
        """
        pass # TODO: implement it!
    def fit(self, X, y):
        """
        Store the 'prior knowledge' of your model that will be used
        to predict new labels.
        :param X : input data points, ndarray, shape = (R,C).
        :param y : input labels, ndarray, shape = (R,).
        """
        pass # TODO: implement it!
    
    def predict(self, X):
        """Run the KNN classification on X.
        :param X: input data points, ndarray, shape = (N,C).
        :return: labels : ndarray, shape = (N,).
        """
        pass # TODO: implement it!

```


Implement the `__init__` and `fit` methods first. 
- In the `__init__` method, you should store the value of `k` in a private attribute of the class.
- In the `fit` method you should only store the training data in private attributes of the class. 

In [23]:
class KNearestNeighbors:
    def __init__(self, k):
        """
        Store the value of k in an attribute of the class and initialize other attributes.
        :param k : int, number of neighbors to consider.
        """
        self.k = k
        self.__X_train = None
        self.__y_train = None
        
    def fit(self, X, y):
        """
        Store the 'prior knowledge' of your model that will be used
        to predict new labels.
        :param X : input data points, ndarray, shape = (R,C).
        :param y : input labels, ndarray, shape = (R,).
        """
        self.__X_train = X
        self.__y_train = y


4. Implement the `predict` method. The function receives as input a numpy array with N rows and C
columns, corresponding to N flowers. The method assigns to each row one of the three Iris species 
using the KNN algorithm, and returns the predicted species as a numpy array. 

    - For the actual implementation, you can either re-use the previously defined `k_nearest_neighbors` function or implement a new one exploiting the numpy broadcasting capabilities in order to avoid iterating over the sample matrix `X`.
    - Then, assign the *predicted label* to each sample using a majority voting scheme, i.e., the label that appears most frequently among the k nearest neighbors. To do so you can use the `np.unique(neighbours_labels, return_counts=True)` function that returns the unique labels and their counts. 
    - Finally, return the predicted labels as a numpy array.
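
The broadcasting idea mentioned above can be sketched on random data. The array names `X_query` and `X_known` and the sizes are placeholders chosen for illustration. Inserting a length-1 axis lets NumPy pair every query row with every stored row without an explicit loop.

```python
import numpy as np

rng = np.random.default_rng(0)
X_query = rng.random((3, 4))   # N = 3 samples to classify, C = 4 features
X_known = rng.random((5, 4))   # R = 5 stored training samples

# (N, 1, C) - (R, C) broadcasts to (N, R, C)
diff = X_query[:, None, :] - X_known
# collapsing the feature axis gives one distance per (query, training) pair
distances = np.linalg.norm(diff, axis=-1)   # shape (N, R)

k = 2
nearest = distances.argsort(axis=1)[:, :k]  # shape (N, k)
print(diff.shape, distances.shape, nearest.shape)  # (3, 5, 4) (3, 5) (3, 2)
```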


In [31]:
class KNearestNeighbors(KNearestNeighbors): # this is a trick to extend an already existing class in Python, you simply define a child class with the same name of the parent class 

    def __init__(self, k):
        super().__init__(k)

    # we create a __distance method because later we may want to extend the functionalities of the model,
    # and in that case we will only need to reimplement this method
    def __distance(self, a, b):
        return np.linalg.norm(a-b, axis=-1)
    
    def __nearest_neighbors(self, x):
        distances = self.__distance(x, self.__X_train)
        indices = distances.argsort()[:self.k]
        return indices
    
    def predict(self, X):
        """Run the KNN classification on X.
        :param X: input data points, ndarray, shape = (N,C).
        :return: labels : ndarray, shape = (N,).
        """
        
        # first check that the model has been trained
        if self.__X_train is None or self.__y_train is None:
            raise Exception("The model has not been trained yet!")
        
        # compute distance of X from each point in X_train
        # the distance matrix has to be of shape (N, R) 
        # where N is the number of samples of the test set (30 in Iris) and R is the number of samples of the train set (120 in Iris)
        # because we want to compute the distance of each sample in the test set from each sample in the train set

        # solution 1
        # we reuse the logic of the previously defined k_nearest_neighbors function (here the __nearest_neighbors method)
        # and call it once for each sample in the test set
        indices = []
        for i in range(X.shape[0]):
           indices.append(self.__nearest_neighbors(X[i]))
        indices = np.array(indices)
        
        # # solution 2
        # # we reshape the test set to have shape (N, 1, C) so that we can broadcast it with the train set of shape (R, C) and compute the distance
        # # here C is the number of features (4 in the Iris dataset)
        # X = X.reshape(X.shape[0], 1, X.shape[1])        
        # # we then compute the distance along the axis that we want to collapse, i.e., the axis 2 which corresponds to the features
        # distances = self.__distance(X, self.__X_train)
        # # then we sort the distances and take the first k elements as previously done
        # indices = distances.argsort(axis=1)[:, :self.k]
        
        
        # we then take the labels of the k nearest neighbors for each sample in the test set
        # we use the indices to select the labels of the k nearest neighbors
        # the final matrix has shape (N, k) where N is the number of samples in the test set and k is the number of neighbors

        # solution 1 we iterate over the indices and select the labels
        neighbour_labels = []
        for i in range(indices.shape[0]):
           neighbour_labels.append(self.__y_train[indices[i]])
        neighbour_labels = np.vstack(neighbour_labels)
        
        # # solution 2 we simply use the indices to select the labels. This is an advanced NumPy feature called fancy indexing 
        # # it returns an array of the same shape as the indices array
        # neighbour_labels = self.__y_train[indices]
    
        # we then compute the majority voting
        # we use the unique function to get the unique labels and the counts of each label
        # we then sort the counts in descending order and take the first element
        # the first element is the label that appears the most in the array
        predictions = []
        for i in range(neighbour_labels.shape[0]):
           unique_labels, label_counts = np.unique(neighbour_labels[i], return_counts=True)
           sorted_counts = label_counts.argsort()[::-1][0] # we sort the counts in descending order and select the most frequent label
           predictions.append(unique_labels[sorted_counts])
        predictions = np.array(predictions)    
        
        return predictions
        

5. Now let’s fit the KNN model with the X_train and y_train data. Then, use your KNN model
to predict the species for each record in X_test and store them in a numpy array called y_pred.
As we did in the previous lab, check how many Iris species in the array y_pred have been guessed correctly with respect to the ones in y_test by computing the accuracy. 
    - A prediction is correct if `y_pred[i] == y_test[i]`. The accuracy is the ratio between the number of correct guesses and the total number of guesses. 
    - If all labels are assigned correctly (`(y_pred == y_test).all() == True`), the accuracy of the model is 100%. 
    - Instead, if none of the guessed species corresponds to the real one (`(y_pred == y_test).any() == False`), the accuracy is 0%.
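
As a small worked example on made-up labels (the arrays `y_true` and `y_hat` are illustrative, not taken from the dataset), the accuracy is the mean of the boolean comparison:

```python
import numpy as np

y_true = np.array([0, 1, 2, 1])
y_hat = np.array([0, 1, 1, 1])   # one mistake out of four

correct = y_hat == y_true        # boolean array, True where the guess is right
accuracy = correct.mean()        # fraction of True values
print(accuracy)  # 0.75
```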



In [32]:
knn_model = KNearestNeighbors(k=3)
knn_model.fit(x_train, y_train)
y_pred = knn_model.predict(x_test)
print("Predicted labels:", y_pred)
print("True labels:", y_test)

accuracy = (y_pred == y_test).sum() / y_test.shape[0]
print("Accuracy:", accuracy)


Predicted labels: [2 1 0 2 0 2 0 1 1 1 2 1 1 1 2 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0]
True labels: [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0]
Accuracy: 0.9666666666666667


6. ($\star$) As a software developer, you might want to increase the functionalities of your product and
publish newer versions over time. The better your code is structured and organized, the lower the
effort to release updates.
As such, extend your KNN implementation by adding the parameter `distance_metric`. This has to be one among:
    - Euclidean distance: $ euclidean(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} $
    - Manhattan distance: $ manhattan(p,q) = \sum_{i=1}^n |p_i - q_i|$
    - Cosine distance: $ cosine(p, q) = 1 - \frac{\sum_{i=1}^n p_i q_i}{ \sqrt{\sum^n_{i=1} p^2_i} \cdot \sqrt{\sum^n_{i=1} q_i^2}}$

If any of these distances is not already implemented in `numpy`, implement it yourself.
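
To make the three formulas concrete, here is a minimal sketch that evaluates them on two small vectors. The standalone function names are illustrative and work on single 1D vectors only; they are not the batched version needed inside the class.

```python
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    return np.sum(np.abs(p - q))

def cosine(p, q):
    # undefined when p or q is the zero vector (division by zero)
    return 1 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

p = np.array([3.0, 4.0])
q = np.array([4.0, 3.0])
print(euclidean(p, q))  # sqrt(2), about 1.414
print(manhattan(p, q))  # 2.0
print(cosine(p, q))     # 1 - 24/25, about 0.04
```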

In [15]:
# let's extend the KNearestNeighbors class to also accept the distance metric
class KNearestNeighbors(KNearestNeighbors):
    def __init__(self, k, distance_metric="euclidean"):
        super().__init__(k)
        self.distance_metric = distance_metric
        
    def __distance(self, a, b):
        if self.distance_metric == "euclidean":
            distances = np.linalg.norm(a-b, axis=-1)
        elif self.distance_metric == "manhattan":
            distances = np.linalg.norm(a-b, ord=1, axis=-1)
        elif self.distance_metric == "cosine":
            distances = 1 - np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
        else:
            raise Exception("Unknown distance metric!")
        
        return distances

knn_model = KNearestNeighbors(k=3, distance_metric="cosine")
knn_model.fit(x_train, y_train)
y_pred = knn_model.predict(x_test)
print("Predicted labels:", y_pred)
print("True labels:", y_test)

accuracy = (y_pred == y_test).sum() / y_test.shape[0]
print("Accuracy:", accuracy)

Predicted labels: [2 2 2 1 1 1 0 2 0 0 0 0 1 1 1 2 1 0 0 1 2 0 2 0 2 0 1 0 0 2]
True labels: [2 2 2 1 1 1 0 2 0 0 0 0 1 1 1 2 1 0 0 1 2 0 2 0 2 0 1 0 0 2]
Accuracy: 1.0



7. ($\star$) Again, extend now your KNN implementation by adding the parameter `weights` to the constructor,
as shown below:

```
class KNearestNeighbors:
    def __init__(self, k, distance_metric="euclidean", weights="uniform"):
        self.k = k
        self.distance_metric = distance_metric
        self.weights = weights
```

Change your KNN implementation to accept a new weighting scheme for the labels. If `weights="distance"`,
weight neighbor votes by the inverse of their distance (for the distance, again, use
`distance_metric`). The weight for a neighbor $n$ of the point $p$ is:

$
w(p, n) = \frac{1}{distance\_metric(p, n)}
$

Instead, if the default is chosen (`weights="uniform"`), use the majority voting you already implemented
in the previous exercises.

<img src="https://mlarchive.com/wp-content/uploads/2022/09/img5.png">
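
The effect of the inverse-distance weights can be sketched on a single made-up query point. The small constant `eps` is an assumption added here to avoid a division by zero when a neighbor coincides with the query; it is not part of the formula above.

```python
import numpy as np

labels = np.array([0, 1, 1])           # labels of the k = 3 nearest neighbors
distances = np.array([0.1, 0.5, 0.5])  # their distances from the query point

eps = 1e-12  # assumption: small constant to avoid dividing by a zero distance
weights = 1.0 / (distances + eps)

# np.bincount sums the weights of the neighbors that share a label
votes = np.bincount(labels, weights=weights)
print(votes.argmax())  # 0, although label 1 has the majority
```

Here the single close neighbor (weight of about 10) outweighs the two farther ones (about 2 each), so the weighted vote differs from the uniform one.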


In [17]:
class KNearestNeighbors(KNearestNeighbors):
    def __init__(self, k, distance_metric="euclidean", weights="uniform"):
        super().__init__(k, distance_metric)
        self.weights = weights
        
    
    def predict(self, X):
        """Run the KNN classification on X.
        :param X: input data points, ndarray, shape = (N,C).
        :return: labels : ndarray, shape = (N,).
        """
        
        # first check that the model has been trained
        if self.__X_train is None or self.__y_train is None:
            raise Exception("The model has not been trained yet!")
        
        # compute distance of X from each point in X_train
        # reshape X to (N, 1, C) so that it broadcasts against the train set of shape (R, C)
        X = X.reshape(X.shape[0], 1, X.shape[1])        

        # we then compute the distance along the axis that we want to collapse, i.e., the axis 2 which corresponds to the features
        distances = self.__distance(X, self.__X_train)
        
        # then we sort the distances and take the first k elements
        indices = distances.argsort(axis=1)[:, :self.k]
        
        # we then take the labels of the k nearest neighbors for each sample in the test set
        neighbour_labels = self.__y_train[indices]
    
        # we then compute the majority voting; with weights="distance" each vote counts as the inverse of the neighbor's distance
        predictions = []
        
        if self.weights == "uniform":
            for i in range(neighbour_labels.shape[0]):
                unique_labels, label_counts = np.unique(neighbour_labels[i], return_counts=True)
                sorted_counts = label_counts.argsort()[::-1][0] # we sort the counts in descending order and select the most frequent label
                predictions.append(unique_labels[sorted_counts])
            
        elif self.weights == "distance":
            for i in range(neighbour_labels.shape[0]):
                label_count = {}
                for j in range(neighbour_labels.shape[1]):
                    label = neighbour_labels[i, j]
                    index = indices[i, j]
                    if label not in label_count:
                        label_count[label] = 1 / distances[i, index]
                    else:
                        label_count[label] += 1 / distances[i, index]
                predictions.append(max(label_count, key=label_count.get))
        else:
            raise Exception("Unknown weights!")
        predictions = np.array(predictions)    
        
        
        return predictions 


knn_model = KNearestNeighbors(k=5, distance_metric="cosine", weights="distance")
knn_model.fit(x_train, y_train)
y_pred = knn_model.predict(x_test)
print("Predicted labels:", y_pred)
print("True labels:", y_test)
accuracy = (y_pred == y_test).sum() / y_test.shape[0]
print("Accuracy:", accuracy)


Predicted labels: [2 2 2 1 1 1 0 2 0 0 0 0 1 1 1 2 1 0 0 1 2 0 2 0 2 0 1 0 0 2]
True labels: [2 2 2 1 1 1 0 2 0 0 0 0 1 1 1 2 1 0 0 1 2 0 2 0 2 0 1 0 0 2]
Accuracy: 1.0


8. ($\star$) Test the modularity of the implementation by applying it to a different dataset. Ideally, you should
not change the code of your KNN Python class.
- Download the MNIST dataset and retain only 100 samples per digit. You will end up with a dataset of 1000 samples.
- Define again four numpy arrays as you did in Exercises 1 and 2.
- Apply your KNN as you did for the Iris dataset.
- Evaluate the accuracy on MNIST’s y_test.
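
Keep in mind that, with a 20%/80% split of 1000 samples and 784 features, a broadcast-based distance computation materializes an intermediate array of shape (200, 800, 784), roughly 500 MB in float32. If memory becomes an issue, one possible alternative for the Euclidean case only is the identity $\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b$, sketched below with a hypothetical helper `pairwise_euclidean` on smaller random data. Using it would require changing the class, so treat it as an optional variant.

```python
import numpy as np

def pairwise_euclidean(A, B):
    # hypothetical helper: distances between rows of A (N, C) and rows of B (R, C)
    sq_a = np.sum(A ** 2, axis=1)[:, None]   # (N, 1)
    sq_b = np.sum(B ** 2, axis=1)[None, :]   # (1, R)
    sq_dist = sq_a + sq_b - 2 * A @ B.T      # (N, R)
    # rounding errors can produce tiny negative values, so clip before the root
    return np.sqrt(np.maximum(sq_dist, 0))

rng = np.random.default_rng(0)
A = rng.random((20, 784))
B = rng.random((80, 784))

reference = np.linalg.norm(A[:, None, :] - B, axis=-1)
print(np.allclose(pairwise_euclidean(A, B), reference))  # True
```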

In [19]:
# download MNIST dataset
! pip install wget

import wget
wget.download("https://raw.githubusercontent.com/dbdmg/data-science-lab/master/datasets/mnist_test.csv", "mnist.csv")
#! wget https://raw.githubusercontent.com/dbdmg/data-science-lab/master/datasets/mnist_test.csv -O mnist.csv


'mnist (1).csv'

In [20]:
# extracting MNIST dataset
samples_per_digit = {}
x, y = [], []
with open("mnist.csv", "r") as f:
    for columns in csv.reader(f):

        # skip empty lines (continue instead of break, so that rows after an empty line are still read)
        if len(columns) == 0:
            continue

        # the first element of each MNIST row is the label (i.e., the handwritten digit), the others are the grayscale pixel intensities 
        features = np.array(columns[1:], dtype=np.float32)
        label = int(columns[0])
            
        # sample only 100 points per digit
        if label in samples_per_digit:
            if samples_per_digit[label] >= 100:
                continue
            else:
                samples_per_digit[label] += 1
        else:
            samples_per_digit[label] = 1
        
        x.append(features)
        y.append(label)        
        
x = np.vstack(x) # vstack stacks the 1D feature arrays as rows of a 2D array (number of samples, number of features)
y = np.array(y) # y is simply a list of integers, so we create a 1D array (number of samples,)

print(x.shape)
print(y.shape)
    
print(samples_per_digit)


(1000, 784)
(1000,)
{7: 100, 2: 100, 1: 100, 0: 100, 4: 100, 9: 100, 5: 100, 6: 100, 3: 100, 8: 100}


In [21]:
# define four numpy arrays x_train, y_train, x_test, y_test
indices = np.arange(len(y))
np.random.shuffle(indices)
test_len = 20*len(indices) // 100
test_idx = indices[:test_len]
train_idx = indices[test_len:]

x_test = x[test_idx]
y_test = y[test_idx]

x_train = x[train_idx]
y_train = y[train_idx]

print("Number of training samples", y_train.shape)
print("Number of test samples", y_test.shape)



Number of training samples (800,)
Number of test samples (200,)


In [22]:
# Apply KNN on MNIST
knn_model = KNearestNeighbors(k=3, distance_metric="cosine", weights="distance")
knn_model.fit(x_train, y_train)
y_pred = knn_model.predict(x_test)
print("Predicted labels:", y_pred)
print("True labels:", y_test)
accuracy = (y_pred == y_test).sum() / y_test.shape[0]
print("Accuracy:", accuracy)

Predicted labels: [4 3 3 3 3 3 0 7 1 0 1 1 2 5 8 5 6 9 5 5 0 8 0 1 6 4 9 9 3 8 8 4 9 2 1 9 3
 0 8 0 3 1 4 8 6 3 7 2 2 2 0 0 9 4 7 1 4 9 8 7 5 9 9 7 1 9 2 4 1 6 5 8 7 1
 6 4 3 1 9 8 0 6 0 6 2 2 9 0 2 6 3 6 2 5 8 7 8 4 9 6 9 1 5 1 1 4 7 0 5 6 5
 4 0 2 7 8 0 3 3 1 5 7 7 0 3 3 9 8 0 0 5 9 4 7 8 3 9 8 6 3 4 4 9 7 1 3 9 2
 5 1 7 5 0 8 8 1 5 6 2 1 0 2 0 1 4 4 2 8 8 6 6 7 0 6 0 8 5 2 8 5 9 3 4 1 0
 1 9 4 4 2 9 6 6 4 9 8 4 9 7 6]
True labels: [4 3 3 5 3 8 0 7 1 0 1 1 2 5 8 5 5 9 5 5 0 8 0 1 6 4 9 9 3 2 8 4 9 2 1 9 3
 0 8 0 3 1 9 8 6 3 7 2 2 2 0 0 8 4 7 1 4 4 5 7 0 9 9 7 7 9 2 4 7 6 5 8 2 1
 6 4 3 1 9 8 0 6 0 6 2 2 9 0 2 6 3 6 2 5 8 7 2 4 9 6 9 1 5 1 1 4 7 0 8 6 5
 4 0 2 7 3 0 3 3 1 5 7 7 4 3 3 9 8 6 0 5 7 4 7 8 3 9 8 6 3 4 5 9 7 1 3 8 2
 8 1 7 5 0 8 8 1 5 6 2 1 0 2 6 1 4 4 2 8 8 6 6 7 6 6 0 8 5 2 8 5 9 3 4 1 0
 1 4 4 4 7 9 6 6 4 9 8 4 9 7 6]
Accuracy: 0.875
