### Implement K-Nearest Neighbors
<img src="pics/city-1.jpg" width="800" height="400">
In this article you will learn about K-Nearest Neighbors (KNN) and the machine learning workflow that follows once you have a cleaned dataset.

### Agenda
1. How does it work?
2. Calculating distance.
3. Implement K-Nearest Neighbors on quantitative data.
4. Implement K-Nearest Neighbors on qualitative data.

### 1. How does it work?
The core idea of K-Nearest Neighbors is to take the majority vote of the **K** samples that are **most similar** to the new sample.  
The procedure below shows how the algorithm works.

<img src="pics/knn-1.png" width="1000">

1. **Calculate the distance** between the new sample and every labeled sample.
2. **Sort** the samples by distance, from low to high.
3. **Get the first K samples**, which have the lowest distances.  
4. Make the prediction:  
a) If the target is **quantitative**, **average** the outputs of the K samples.  
b) If the target is **qualitative**, take the **most frequent** output among the K samples.
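
The four steps above can be sketched on a toy dataset as follows. This is a minimal illustration with made-up numbers, assuming Euclidean distance and K = 3. The names `knn_predict`, `samples`, and `targets` are hypothetical and are not part of the exercises in this notebook.

```python
import numpy as np
from collections import Counter

def knn_predict(samples, targets, new_sample, k=3, quantitative=True):
    # Step 1: Euclidean distance from the new sample to every labeled sample
    distances = np.sqrt(np.sum((samples - new_sample) ** 2, axis=1))
    # Steps 2 and 3: sort by distance (low to high) and keep the first k indices
    nearest = np.argsort(distances)[:k]
    nearest_targets = [targets[i] for i in nearest]
    if quantitative:
        # Step 4a: average the outputs of the k nearest samples
        return float(np.mean(nearest_targets))
    # Step 4b: take the most frequent output among the k nearest samples
    return Counter(nearest_targets).most_common(1)[0][0]

# Toy data (assumed for illustration only)
samples = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [8.0, 8.0]])
new_sample = np.array([2.0, 2.5])

price = knn_predict(samples, [100.0, 200.0, 300.0, 900.0], new_sample)
label = knn_predict(samples, ["a", "a", "b", "b"], new_sample, quantitative=False)
print(price, label)
```

The far-away sample at (8, 8) is never among the three nearest neighbors, so it does not influence either prediction.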

### 2. Calculating distance
Two common ways to calculate distance for K-Nearest Neighbors are Manhattan distance and Euclidean distance.  

<img src="pics/calculate_distance.png" width="650">

2.1. Manhattan distance can be calculated as follows:  
#### $$d = \sum_{i=1}^{N} |x_i - y_i|$$
2.2. Euclidean distance can be calculated as follows:
#### $$d = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$
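
As a quick numeric check of both formulas, here is a small worked example on two made-up 3-dimensional points. The vectors `x` and `y` are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Manhattan: sum of absolute coordinate differences, |1-4| + |2-6| + |3-3|
manhattan = np.sum(np.abs(x - y))
# Euclidean: square root of the summed squared differences, sqrt(9 + 16 + 0)
euclidean = np.sqrt(np.sum((x - y) ** 2))

print(manhattan)  # 7.0
print(euclidean)  # 5.0
```
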
Let's implement modules to calculate Manhattan distance and Euclidean distance below.

In [None]:
import numpy as np
from utilities.ManhattanDistance import ManhattanDistance as ExampleManhattanDistance
from utilities.EuclideanDistance import EuclideanDistance as ExampleEuclideanDistance

In [None]:
class ManhattanDistance:
    def __call__(self, samples, new_sample):
        """ 
        samples shape: (sample_nums, feature_nums)
        new_sample shape: (1, feature_nums)
        Return Manhattan distance between new sample point and other sample points with shape (sample_nums, )
        """
        # Put your code here
        pass

In [None]:
# Test ManhattanDistance
samples = np.random.uniform(-10, 10, size=(100, 10))
new_sample = np.random.uniform(-10, 10, size=(1, 10))

# Compare against the reference implementation
example_manhattan = ExampleManhattanDistance()
manhattan = ManhattanDistance()

example_manhattan_distance = example_manhattan(samples, new_sample)
manhattan_distance = manhattan(samples, new_sample)

# Compare element-wise: summing the differences could let errors cancel out
assert np.allclose(example_manhattan_distance, manhattan_distance)
print("pass")

In [None]:
class EuclideanDistance:
    def __call__(self, samples, new_sample):
        """
        samples shape: (sample_nums, feature_nums)
        new_sample shape: (1, feature_nums)
        Return Euclidean distance between new sample point and other sample points with shape (sample_nums, )
        """
        # Put your code here
        pass

In [None]:
# Test EuclideanDistance
samples = np.random.uniform(-10, 10, size=(100, 10))
new_sample = np.random.uniform(-10, 10, size=(1, 10))

# Test Euclidean Distance
example_euclidean = ExampleEuclideanDistance()
euclidean = EuclideanDistance()

example_euclidean_distance = example_euclidean(samples, new_sample)
euclidean_distance = euclidean(samples, new_sample)

# Compare element-wise: summing the differences could let errors cancel out
assert np.allclose(example_euclidean_distance, euclidean_distance)
print("pass")

### 3. Implement K-Nearest Neighbors on quantitative data
In this exercise we are going to use KNN to predict house prices. The dataset has 4600 samples. Each sample consists of the price that we want to predict and various features such as the number of bedrooms, number of bathrooms, number of floors, size, and location. We need to preprocess these features and use them to predict the price of a particular house.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 3.1. Load dataset

In [None]:
# Load dataset
house_data = pd.read_csv("./datasets/housedata/data.csv")

### 3.2. Explore dataset

In [None]:
# Show some samples of the dataset
house_data.tail()

There are 18 columns: 1 column for the target price and 17 columns for features. However, we do not need to use all of them, so let's investigate which ones we should cut.

In [None]:
# date
house_data.date.unique()

The date column is almost useless **in this case** because all samples were gathered within a short period (about 2 months) in 2014, so inflation does not need to be considered in this dataset. Keep in mind, however, that a model that learns only from this dataset is not going to work well when predicting present-day house prices.

In [None]:
# bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, sqft_above, sqft_basement, yr_built, yr_renovated
cols = ["bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors", "waterfront", "view", "condition", "sqft_above", "sqft_basement", "yr_built", "yr_renovated"]
plt.figure(figsize=(24, 12))

for i, column_name in enumerate(cols):
    plt.subplot(3, 4, i + 1)
    plt.scatter(house_data[column_name], house_data.price)
    plt.title(column_name)

plt.show()

There are 3 outlier points that make all the graphs look flat, so let's remove those outliers and plot the graphs again.  
Moreover, yr_built and yr_renovated are repetitive, so let's convert them into house_age.

In [None]:
# Convert yr_built and yr_renovated into house_age
house_age = 2014 - np.maximum(house_data.yr_built, house_data.yr_renovated)
house_age = pd.Series(house_age, name="house_age")
house_data = pd.concat([house_data.drop(columns=["yr_built", "yr_renovated"]), house_age], axis=1)

# Update cols
cols.remove("yr_built")
cols.remove("yr_renovated")
cols.append("house_age")

In [None]:
# bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, sqft_above, sqft_basement, house_age
plt.figure(figsize=(24, 12))

without_outlier = house_data.query("0 < price < 5000000")
for i, column_name in enumerate(cols):
    plt.subplot(3, 4, i + 1)
    plt.scatter(without_outlier[column_name], without_outlier.price)
    plt.title(column_name)

plt.show()

So far, all 11 features above look useful except house_age, whose plot is almost completely flat.

In [None]:
# street, city, statezip, country
print(f"street unique count: {len(house_data.street.unique())}")
print(f"city unique count: {len(house_data.city.unique())}")
print(f"statezip unique count: {len(house_data.statezip.unique())}")
print(f"country unique count: {len(house_data.country.unique())}")

All of these features tell us about the same thing: location. It would be better to turn them into longitude and latitude, but for now let's use them as they are.  
**street** is not really useful in this case because it is almost unique to each sample.  
**city** and **statezip** can be useful, but they are repetitive because both carry location information, although statezip is more precise than city. For this reason, using only one of them is fine.  
**country** is completely useless in this case because it is identical for every sample.

To summarize, we will not use the following columns: **date**, **street**, **country**, **house_age**, and either **city** or **statezip**.

### 3.3. Prepare data

In [None]:
# Remove columns
data = house_data.drop(columns=["date", "street", "country", "house_age", "statezip"])
data.head()

In [None]:
# One-hot encode categorical columns
categorical_cols = ["view", "condition", "city"]

for col in categorical_cols:
    encoded = pd.get_dummies(data[col])
    # Prefix each new column with the name of its source column
    encoded.columns = [col + "_" + str(_col) for _col in encoded.columns]
    data = pd.concat([data.drop(columns=col), encoded], axis=1)
data.head()

In [None]:
# Separate target and features
data_x = data.drop(columns="price")
data_y = data.price
print(f"data_x: {data_x.shape}")
print(f"data_y: {data_y.shape}")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split train/test
# Group y into bins
bins = np.linspace(0, 1500000, 10)
y_binned = np.digitize(data_y, bins)
plt.hist(y_binned)

# Split with stratify
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.2, random_state=42, shuffle=True, stratify=y_binned)
print(f"train_x: {train_x.shape}")
print(f"test_x: {test_x.shape}")
print(f"train_y: {train_y.shape}")
print(f"test_y: {test_y.shape}")

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
# Normalize all features to be in range [0, 1] for training set
scaler = MinMaxScaler()
scaler.fit(train_x)

train_x_scaled = scaler.transform(train_x)
train_x_scaled = pd.DataFrame(train_x_scaled, columns=train_x.columns)
train_x_scaled.describe()

In [None]:
# Scale the test set with the scaler fitted on the training set (values may fall slightly outside [0, 1])
test_x_scaled = scaler.transform(test_x)
test_x_scaled = pd.DataFrame(test_x_scaled, columns=test_x.columns)
test_x_scaled.describe()

In [None]:
# Convert type to numpy array
train_x_scaled = train_x_scaled.to_numpy()
train_x, train_y = train_x.to_numpy(), train_y.to_numpy()
test_x_scaled, test_y = test_x_scaled.to_numpy(), test_y.to_numpy()

### 3.4. Prepare model

In [None]:
from sklearn.neighbors import KNeighborsRegressor

In [None]:
class RegressionKNN:
    def __init__(self, k=5, distance_method="Euclidean"):
        # Put your code here
        pass
    
    def fit(self, x, y):
        # Put your code here
        pass
    
    def predict(self, x):
        # Put your code here
        pass

In [None]:
# Test RegressionKNN
regression_model = RegressionKNN(k=5, distance_method="Euclidean")    # KNeighborsRegressor uses Euclidean distance
ex_regression_model = KNeighborsRegressor(n_neighbors=5)

x = np.random.uniform(low=0, high=5, size=(10, 5))
y = np.random.uniform(low=0, high=1000, size=(10, ))

regression_model.fit(x, y)
ex_regression_model.fit(x, y)

pred = regression_model.predict(x)
ex_pred = ex_regression_model.predict(x)

# Compare floats with a tolerance rather than exact equality
assert np.allclose(pred, ex_pred)
print("Pass")

### 3.5. Fit and evaluate model
Before we start training our model, we must decide how to evaluate it: which metric to use and which data to validate on.
### 3.5.1. Evaluate metrics
Metrics are functions used to measure how good or bad our model is. One of the most popular metrics for quantitative data is **Mean Absolute Error (MAE)**. It measures the average magnitude of the errors between predictions and actual observations by taking the mean of the absolute differences between the two values.
#### $$MAE = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$$
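
To make the formula concrete, here is a small worked example on four made-up prices. The arrays `y_true` and `y_pred` are assumptions for illustration only.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 400.0])

# Absolute errors are 10, 10, 30 and 0, so the mean is 50 / 4
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 12.5
```

Because the errors are not squared, MAE stays in the same unit as the target, which here would be the unit of the house price.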

In [None]:
from tensorflow.keras.losses import MeanAbsoluteError as ExampleMeanAbsoluteError

In [None]:
class MeanAbsoluteError:
    def __call__(self, y_true, y_pred):
        """
        y_true: Target predictions with shape (batch_size, class_nums)
        y_pred: Predictions with shape (batch_size, class_nums)
        Return scalar value of MAE
        """
        # Put your code here
        pass

In [None]:
# Test MeanAbsoluteError
y_true = np.random.uniform(low=0, high=5, size=(10, 5))
y_pred = np.random.uniform(low=0, high=5, size=(10, 5))

metrics_example = ExampleMeanAbsoluteError()
metrics = MeanAbsoluteError()

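mae_example = metrics_example(y_true, y_pred)
mae = metrics(y_true, y_pred)    # Call our own implementation, not the reference one
print(f"mae_example: {mae_example}")
print(f"mae: {mae}")

# The reference returns a tensor, possibly in float32, so compare with a tolerance
assert np.isclose(float(mae_example), mae)
print("Pass")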

### 3.5.2. Validation set
A validation set is a portion of data split from the training set. It is used to evaluate the model during hyperparameter tuning, so that the test set stays untouched and data leakage is avoided.
### 3.5.2.1. Split validation
Hold out part of the training set and use it to evaluate the trained model.

<img src="pics/split_validation.png" width="900">

In [None]:
class SplitValidation:
    def __init__(self, metrics, val_size=0.2, random_state=None, shuffle=True, stratify=None):
        # Initialize properties
        self.metrics = metrics
        self.val_size = val_size
        self.random_state = random_state
        self.shuffle = shuffle
        self.stratify = stratify
        
    def eval(self, model, x, y):
        # Split train/val
        train_x, val_x, train_y, val_y = train_test_split(x, y, test_size=self.val_size, 
                                                          random_state=self.random_state, 
                                                          shuffle=self.shuffle, 
                                                          stratify=self.stratify)
        
        # Normalization
        scaler = MinMaxScaler()
        train_x = scaler.fit_transform(train_x)
        val_x = scaler.transform(val_x)
        
        # Fit training set to model
        model.fit(train_x, train_y)

        # Evaluate model with validation set
        pred_y = model.predict(val_x)
        score = self.metrics(val_y.reshape(-1, 1), pred_y.reshape(-1, 1))
        return score

In [None]:
# Group y into bins
bins = np.linspace(0, 1500000, 10)
y_binned = np.digitize(train_y, bins)

model = RegressionKNN(k=5, distance_method="Euclidean")
metrics = MeanAbsoluteError()
evaluator = SplitValidation(metrics, random_state=42, stratify=y_binned)

score = evaluator.eval(model, train_x, train_y)
print(f"Validation error: {score}")

### Advantages and disadvantages of split validation  
**Advantage**  
- Cheap to compute. 

**Disadvantages**  
- It sacrifices part of the training set.
- The validation set can be biased, especially when it is small (< 10000 samples).
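
The second disadvantage can be seen by repeating the same split with different random seeds. The sketch below uses synthetic data and scikit-learn's `KNeighborsRegressor` rather than the house dataset and the `RegressionKNN` class from this notebook, so the numbers are only illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (an assumption for illustration, not the house dataset)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 3))
y = x.sum(axis=1) + rng.normal(0, 2, size=200)

scores = []
for seed in range(5):
    # Same data and same model, only the random split changes
    tr_x, val_x, tr_y, val_y = train_test_split(x, y, test_size=0.2, random_state=seed)
    model = KNeighborsRegressor(n_neighbors=5).fit(tr_x, tr_y)
    scores.append(mean_absolute_error(val_y, model.predict(val_x)))

print(np.round(scores, 3))
print(f"spread: {max(scores) - min(scores):.3f}")
```

The spread between the lowest and highest score gives a rough idea of how much a single split can mislead on a small dataset.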

### 3.5.2.2. K-fold cross validation
K-fold cross validation usually gives a more reliable evaluation than split validation. It evaluates the model k times, each time with a different training and validation set. The following steps show how it works.  

<img src="pics/k-fold_cross_validation.png" width="1100">

```
Divide training set into k folds
for i in range(k):
    Initialize a new model
    validation_fold = the data at fold i
    training_fold = the rest of the data
    Train model on training_fold
    Evaluate model on validation_fold
    Save evaluation result
Average all evaluation results
```

In [None]:
from sklearn.model_selection import KFold

In [None]:
class CrossValidation:
    def __init__(self, metrics, k_folds=10):
        # Initialize properties
        self.metrics = metrics
        self.k_folds = k_folds
        self.scores = []
        
    def eval(self, model, x, y):
        # Divide training set into k folds
        kf = KFold(n_splits=self.k_folds)
        self.scores = []
        for i, (train_index, val_index) in enumerate(kf.split(x)):
            # Get validation fold
            val_x, val_y = x[val_index], y[val_index]

            # Get training fold
            train_x, train_y = x[train_index], y[train_index]

            # Normalization
            scaler = MinMaxScaler()
            train_x = scaler.fit_transform(train_x)
            val_x = scaler.transform(val_x)

            # Train model on training set
            model.fit(train_x, train_y)

            # Evaluate model on validation set
            pred_y = model.predict(val_x)
            score = self.metrics(val_y.reshape(-1, 1), pred_y.reshape(-1, 1))

            # Save evaluation result
            self.scores.append(score)
        # Average all evaluation results
        mean_score = np.mean(self.scores)
        return mean_score

In [None]:
model = RegressionKNN(k=5, distance_method="Euclidean")
metrics = MeanAbsoluteError()
evaluator = CrossValidation(metrics, k_folds=10)

score = evaluator.eval(model, train_x, train_y)
print(f"Validation errors: {evaluator.scores}")
print(f"Validation mean error: {score}")

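### Advantages and disadvantages of cross validation  
**Advantage**  
- Usually gives a more reliable estimate than split validation, because every sample is used for validation exactly once.

**Disadvantage**  
- Computationally expensive: the model is trained and evaluated k times.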
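
For comparison, scikit-learn offers the same procedure through `cross_val_score`. The sketch below is not the `CrossValidation` class from this notebook. It assumes synthetic data and uses `KNeighborsRegressor` inside a pipeline so that the scaler is refitted on every training fold.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption for illustration, not the house dataset)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 3))
y = x.sum(axis=1) + rng.normal(0, 2, size=200)

# The pipeline refits the scaler inside each training fold,
# so no validation data leaks into the scaling step
pipeline = make_pipeline(MinMaxScaler(), KNeighborsRegressor(n_neighbors=5))

# scikit-learn maximizes scores, so MAE is reported as a negative number
neg_mae = cross_val_score(pipeline, x, y, cv=KFold(n_splits=10),
                          scoring="neg_mean_absolute_error")
fold_errors = -neg_mae

print(fold_errors.shape)
print(fold_errors.mean())
```

Ten folds mean ten fits and ten evaluations, which is where the extra cost comes from.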

### Summary of split validation and cross validation
1. If the dataset is small (< 10000 samples) -> cross validation.
2. If training and validating a model does not take too long -> cross validation.
3. Otherwise -> split validation.

### 3.6. Search for best hyperparameters
There are various ways to search for the best hyperparameters. The two basic ones are grid search and random search.
### 3.6.1. Grid searching
Grid search is simple to implement: we define a set of values for each hyperparameter, evaluate a model for every combination of those values, and keep the best one.
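
To see what "every combination" means in practice, the short sketch below expands a small grid with scikit-learn's `ParameterGrid`. The grid values are made up for illustration and are smaller than the grid searched later in this section.

```python
from sklearn.model_selection import ParameterGrid

# A small illustrative grid (values assumed for this example)
params_grid = {"k": [1, 3, 5], "distance_method": ["Euclidean", "Manhattan"]}
combinations = list(ParameterGrid(params_grid))

# 3 values of k times 2 distance methods gives 6 candidate models
print(len(combinations))
for params in combinations:
    print(params)
```

The number of candidates is the product of the list lengths, so each extra hyperparameter multiplies the search time.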

<img src="pics/GridSearch.png" width="300">

In [None]:
from sklearn.model_selection import ParameterGrid

In [None]:
class GridSearcher:
    def __init__(self, evaluator, criteria="min"):
        assert criteria.lower() == "min" or criteria.lower() == "max", "criteria can be either 'max' or 'min'"
        # Initialize properties
        self.evaluator = evaluator
        self.criteria = criteria.lower()    # Normalize so that "Min" or "MAX" also work
        
    def search(self, model_class, x, y, params_grid, report=True):
        # Get parameters space
        param_space = list(ParameterGrid(params_grid))
        
        # Reset
        best_params = None
        best_score = None
        for params in param_space:
            # Create model with new parameters
            model = model_class(**params)
            
            # Evaluate model
            score = self.evaluator.eval(model, x, y)
            
            # Save best parameters and best score
            if best_params is None and best_score is None:
                best_params = params
                best_score = score
            else:
                if self.criteria == "min":
                    if score < best_score:
                        best_params = params
                        best_score = score
                else:
                    if score > best_score:
                        best_params = params
                        best_score = score
                    
            # Report
            if report:
                print(f"parameters: {params} score: {score}")
        return best_params, best_score

In [None]:
metrics = MeanAbsoluteError()
evaluator = CrossValidation(metrics, k_folds=10)
searcher = GridSearcher(evaluator, criteria="min")

# Search for best parameters
params_grid = {"k": list(range(1, 41)), "distance_method": ["Euclidean", "Manhattan"]}
best_params, score = searcher.search(RegressionKNN, train_x, train_y, params_grid)
print(f"best parameters: {best_params}")
print(f"score: {score}")

### 3.6.2. Random searching
Grid searching, however, has a big problem when the number of hyperparameters gets large: trying out every combination takes a long time. For this reason, instead of trying out all combinations, we randomly try out only some of them. This not only helps **reduce search time** but also **increases the number of search points for each individual hyperparameter**, as shown in the figure below. Therefore, **random searching is recommended** over grid searching when the number of hyperparameters is large or when some hyperparameters are floats.
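
The claim about search points per hyperparameter can be checked with a small sketch. It assumes two hypothetical float hyperparameters in the range [0, 1] and a budget of 9 evaluations. Neither hyperparameter belongs to the KNN models in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid: 3 x 3 = 9 evaluations, but each hyperparameter only takes 3 distinct values
grid_values = np.linspace(0.0, 1.0, 3)
grid_points = np.array([(a, b) for a in grid_values for b in grid_values])

# Random: the same 9 evaluations, each drawing a fresh value for both hyperparameters
random_points = rng.uniform(0.0, 1.0, size=(9, 2))

# Count how many distinct values of the first hyperparameter each strategy tried
print(len(np.unique(grid_points[:, 0])))
print(len(np.unique(random_points[:, 0])))
```

With the same budget, random search probes more distinct values of each hyperparameter, which matters when only one of them has a strong effect on the score.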

<img src="pics/RandomSearch.png" width="300">

In [None]:
class RandomSearcher:
    def __init__(self, evaluator, criteria="min", nums=None):
        assert criteria.lower() == "min" or criteria.lower() == "max", "criteria can be either 'max' or 'min'"
        # Initialize properties
        self.evaluator = evaluator
        self.criteria = criteria.lower()    # Normalize so that "Min" or "MAX" also work
        self.nums = nums
        
    def search(self, model_class, x, y, params_grid, report=True):
        # Get parameters space
        param_space = list(ParameterGrid(params_grid))
        nums = len(param_space) if self.nums is None else self.nums
        assert nums <= len(param_space), f"nums cannot be bigger than {len(param_space)}"
        param_space = np.random.choice(param_space, size=nums, replace=False)
        
        # Reset
        best_params = None
        best_score = None
        for params in param_space:
            # Create model with new parameters
            model = model_class(**params)
            
            # Evaluate model
            score = self.evaluator.eval(model, x, y)
            
            # Save best parameters and best score
            if best_params is None and best_score is None:
                best_params = params
                best_score = score
            else:
                if self.criteria == "min":
                    if score < best_score:
                        best_params = params
                        best_score = score
                else:
                    if score > best_score:
                        best_params = params
                        best_score = score
                    
            # Report
            if report:
                print(f"parameters: {params} score: {score}")
        return best_params, best_score

In [None]:
metrics = MeanAbsoluteError()
evaluator = CrossValidation(metrics, k_folds=10)
searcher = RandomSearcher(evaluator, criteria="min", nums=10)

# Search for best parameters
params_grid = {"k": list(range(1, 41)), "distance_method": ["Euclidean", "Manhattan"]}
best_params, score = searcher.search(RegressionKNN, train_x, train_y, params_grid)
print(f"best parameters: {best_params}")
print(f"score: {score}")

### 3.7. Evaluate on test set

In [None]:
# Define model and metrics
model = RegressionKNN(**best_params)
metrics = MeanAbsoluteError()

# Fit model on training set
model.fit(train_x_scaled, train_y)

# Evaluate on test set
pred_y = model.predict(test_x_scaled)
score = metrics(test_y.reshape(-1, 1), pred_y.reshape(-1, 1))
print(f"Test error: {score}")

### 4. Implement K-Nearest Neighbors on qualitative data
In this exercise we are going to use KNN to classify the species of iris flowers. The only difference here is that our target is now categorical instead of continuous.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 4.1. Load data

In [None]:
# Load dataset
iris_data = pd.read_csv("./datasets/Iris/iris.csv")

### 4.2. Explore dataset

In [None]:
# Show some samples of the dataset
iris_data.tail()

We have very few samples: only 150.

In [None]:
# SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm
cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
plt.figure(figsize=(24, 6))

for i, column_name in enumerate(cols):
    plt.subplot(1, 4, i + 1)
    plt.scatter(iris_data[column_name], iris_data.Species)
    plt.title(column_name)

plt.show()

According to the charts above, there are 3 categories for the target and 4 features that can be used to train the model.

### 4.3. Prepare data

In [None]:
# Remove Id column
data = iris_data.drop(columns=["Id"])
data.head()

In [None]:
# One-hot encode categorical columns
categorical_cols = ["Species"]

for col in categorical_cols:
    encoded = pd.get_dummies(data[col])
    # Prefix each new column with the name of its source column
    encoded.columns = [col + "_" + str(_col) for _col in encoded.columns]
    data = pd.concat([data.drop(columns=col), encoded], axis=1)
data.head()

In [None]:
# Separate target and features
data_x = data.iloc[:, :4]
data_y = data.iloc[:, 4:]
print(f"data_x: {data_x.shape}")
print(f"data_y: {data_y.shape}")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split train/test
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.2, random_state=42, shuffle=True, stratify=data_y)
print(f"train_x: {train_x.shape}")
print(f"test_x: {test_x.shape}")
print(f"train_y: {train_y.shape}")
print(f"test_y: {test_y.shape}")

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
# Normalize all features to be in range [0, 1] for training set
scaler = MinMaxScaler()
scaler.fit(train_x)

train_x_scaled = scaler.transform(train_x)
train_x_scaled = pd.DataFrame(train_x_scaled, columns=train_x.columns)
train_x_scaled.describe()

In [None]:
# Scale the test set with the scaler fitted on the training set (values may fall slightly outside [0, 1])
test_x_scaled = scaler.transform(test_x)
test_x_scaled = pd.DataFrame(test_x_scaled, columns=test_x.columns)
test_x_scaled.describe()

In [None]:
# Convert type to numpy array
train_x_scaled = train_x_scaled.to_numpy()
train_x, train_y = train_x.to_numpy(), train_y.to_numpy()
test_x_scaled, test_y = test_x_scaled.to_numpy(), test_y.to_numpy()

### 4.4. Prepare model
KNN for qualitative data works exactly the same as KNN for quantitative data, but instead of averaging the k closest samples it takes a majority vote to get the final prediction.
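
Since the labels in this section are one-hot encoded, one possible way to take the majority vote is to sum the label rows of the k nearest samples and pick the largest count. The sketch below uses made-up neighbor labels and is only one option. scikit-learn's `KNeighborsClassifier` treats a 2-D target as multi-output and votes per column, so its result may differ in edge cases such as ties.

```python
import numpy as np

# One-hot labels of the k = 5 nearest samples (3 classes), made up for illustration
nearest_labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# Summing the rows counts the votes per class
votes = nearest_labels.sum(axis=0)
# The class with the most votes wins; ties go to the lowest class index here
winner = int(np.argmax(votes))
# Turn the winning class index back into a one-hot vector
prediction = np.eye(nearest_labels.shape[1], dtype=int)[winner]

print(votes)       # [1 3 1]
print(prediction)  # [0 1 0]
```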

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
class ClassifierKNN:
    def __init__(self, k=5, distance_method="Euclidean"):
        # Put your code here
        pass
    
    def fit(self, x, y):
        # Put your code here
        pass
    
    def predict(self, x):
        # Put your code here
        pass

In [None]:
# Test ClassifierKNN
classifier_model = ClassifierKNN(k=5, distance_method="Euclidean")    # KNeighborsClassifier uses Euclidean distance
ex_classifier_model = KNeighborsClassifier(n_neighbors=5)

x = train_x_scaled[:10]
y = train_y[:10]

classifier_model.fit(x, y)
ex_classifier_model.fit(x, y)

pred = classifier_model.predict(x)
ex_pred = ex_classifier_model.predict(x)

assert np.all(np.equal(pred, ex_pred))
print("Pass")

### 4.5. Fit and evaluate model
For qualitative data we usually use accuracy as the metric to evaluate the model.
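
Accuracy is the fraction of predictions that match the true labels. The sketch below shows this on four made-up integer class labels, which are an assumption for illustration and not the one-hot labels used in this section.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2])

# 3 of the 4 predictions match the true labels
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)  # 0.75
```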

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
model = ClassifierKNN(k=5, distance_method="Euclidean")
evaluator = CrossValidation(accuracy_score, k_folds=10)

score = evaluator.eval(model, train_x, train_y)
print(f"Validation accuracy: {evaluator.scores}")
print(f"Validation mean accuracy: {score}")

### 4.6. Search for best hyperparameters

In [None]:
evaluator = CrossValidation(accuracy_score, k_folds=10)
searcher = GridSearcher(evaluator, criteria="max")

# Search for best parameters
params_grid = {"k": list(range(1, 41)), "distance_method": ["Euclidean", "Manhattan"]}
best_params, score = searcher.search(ClassifierKNN, train_x, train_y, params_grid)
print(f"best parameters: {best_params}")
print(f"score: {score}")

### 4.7. Evaluate on test set

In [None]:
# Define model and metrics
model = ClassifierKNN(**best_params)

# Fit model on training set
model.fit(train_x_scaled, train_y)

# Evaluate on test set
pred_y = model.predict(test_x_scaled)
score = accuracy_score(test_y.reshape(-1, 1), pred_y.reshape(-1, 1))
print(f"Test accuracy: {score}")