# SIB - Portfolio of Machine Learning Algorithms

## Exercise 7: Implementing the KNNRegressor with RMSE

### 7.1) 
Add the RMSE (root mean squared error) metric to the "metrics" sub-package by creating a new module named "rmse.py". The module should contain the following function:

  def rmse:
  - arguments:
    - y_true – real values of y
    - y_pred – predicted values of y
  - expected output:
    - A float corresponding to the error between y_true and y_pred
  - algorithm:
    - Calculate the error following the RMSE formula
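
The specification above can be sketched as follows, assuming NumPy is available and using the usual definition RMSE = sqrt(mean((y_true - y_pred)^2)). The conversion of the inputs to float arrays is an addition for robustness and is not part of the exercise statement.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error between real and predicted values."""
    # Accept lists as well as arrays (an assumption, not required by the exercise)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the residuals, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Residuals of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5)
example_error = rmse([0, 0], [3, 4])
```

Because the residuals are squared before averaging, large individual errors weigh more heavily than they would in a mean absolute error.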


### 7.2) 
Add the "KNNRegressor" class to the "models" sub-package. You should create a module named "knn_regressor.py" to implement this class, following the structure below:



#### KNNRegressor Class

In the "models" sub-package, add the module "knn_regressor.py", which should contain the "KNNRegressor" class.

class KNNRegressor(Model):
- parameters:
  - k – the number of nearest examples to consider
  - distance – a function that calculates the distance between a sample and the samples in the training dataset
- estimated parameters:
  - dataset – stores the training dataset
- methods:
  - _fit – stores the training dataset
  - _predict – estimates the label value for a sample based on the k most similar examples
  - _score – calculates the error between the estimated values and the real ones (rmse)
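
As an illustration of the `distance` parameter, here is a minimal sketch of a function with the shape described above: it takes one sample and the matrix of training samples and returns one distance per training sample. The name `euclidean_distance` and the choice of the Euclidean metric are assumptions; the exercise does not prescribe a specific distance.

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Distance between one sample x, shape (n_features,), and each row of y, shape (n_samples, n_features)."""
    # Broadcasting subtracts x from every row of y
    return np.sqrt(((x - y) ** 2).sum(axis=1))


sample = np.array([0.0, 0.0])
training = np.array([[3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])
distances = euclidean_distance(sample, training)  # one value per training sample
```

Any function with the same input and output shapes could be passed instead, for example a Manhattan distance.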

---

KNNRegressor._fit:
- signature:
  - Input: dataset – training dataset
  - Output: self – KNNRegressor
- algorithm:
  - Store the training dataset

---

KNNRegressor._predict:
- signature:
  - Input: dataset – test dataset
  - Output: predictions – an array of predicted values for the testing dataset (y_pred)
- algorithm:
  1. Calculate the distance between each test sample and every sample in the training dataset
  2. Obtain the indexes of the k most similar examples (shortest distance)
  3. Use the previous indexes to retrieve the corresponding values in y
  4. Calculate the average of the values obtained in step 3
  5. Apply steps 1, 2, 3, and 4 to all samples in the testing dataset
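
The five steps can be traced on a toy example. The sketch below uses plain NumPy arrays instead of the course's dataset objects, a hard-coded Euclidean distance, and invented values. All of these are assumptions made for illustration only.

```python
import numpy as np

# Toy data (hypothetical values, for illustration only)
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])
X_test = np.array([[2.2], [9.0]])
k = 2

predictions = []
for sample in X_test:  # step 5: repeat for every test sample
    # step 1: Euclidean distance to every training sample
    distances = np.sqrt(((X_train - sample) ** 2).sum(axis=1))
    # step 2: indexes of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # step 3: target values of those neighbours
    neighbour_values = y_train[nearest]
    # step 4: their average is the prediction
    predictions.append(neighbour_values.mean())

predictions = np.array(predictions)
```

For the first test sample (2.2), the two closest training points are 2.0 and 3.0, so the prediction is the mean of 20 and 30. For the second (9.0), the neighbours are 10.0 and 3.0, giving the mean of 100 and 30.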

---

KNNRegressor._score:
- signature:
  - Input: dataset – test dataset
  - Output: error – error between predictions and actual values
- algorithm:
  1. Get the predictions (y_pred)
  2. Calculate the rmse between actual values and predictions
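
To show how `_fit`, `_predict` and `_score` fit together, here is a compact sketch. It is not the expected solution: `ToyDataset` is a hypothetical stand-in exposing only `X` and `y`, the class does not inherit from `Model`, the distance is fixed to Euclidean instead of being a parameter, and the RMSE is computed inline rather than imported from the "metrics" sub-package.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyDataset:
    """Hypothetical stand-in for the course's dataset object (only X and y)."""
    X: np.ndarray
    y: np.ndarray


class KNNRegressorSketch:
    """Illustrative only; the exercise expects a subclass of Model."""

    def __init__(self, k: int = 1):
        self.k = k
        self.dataset = None

    def _fit(self, dataset: ToyDataset) -> "KNNRegressorSketch":
        self.dataset = dataset  # fitting only stores the training data
        return self

    def _predict(self, dataset: ToyDataset) -> np.ndarray:
        # Pairwise Euclidean distances, shape (n_test, n_train)
        diffs = dataset.X[:, None, :] - self.dataset.X[None, :, :]
        distances = np.sqrt((diffs ** 2).sum(axis=2))
        # Indexes of the k nearest training samples for each test sample
        nearest = np.argsort(distances, axis=1)[:, : self.k]
        return self.dataset.y[nearest].mean(axis=1)

    def _score(self, dataset: ToyDataset) -> float:
        y_pred = self._predict(dataset)  # step 1: get the predictions
        # step 2: RMSE between actual values and predictions
        return float(np.sqrt(np.mean((dataset.y - y_pred) ** 2)))


# Invented values, for illustration only
train = ToyDataset(X=np.array([[1.0], [2.0], [3.0], [10.0]]),
                   y=np.array([10.0, 20.0, 30.0, 100.0]))
test = ToyDataset(X=np.array([[2.2], [9.0]]), y=np.array([24.0, 68.0]))

model = KNNRegressorSketch(k=2)._fit(train)
error = model._score(test)
```

Here the predictions are 25 and 65 against actual values of 24 and 68, so the residuals are -1 and 3 and the score is sqrt((1 + 9) / 2) = sqrt(5).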


### 7.3) 
Test the "KNNRegressor" class using the "cpu.csv" dataset (regression).

In [15]:
import sys
sys.path.append('C:/Users/dases/Desktop/SI/repositorio/si-2/src')

import numpy as np
from si.io.csv_file import read_csv
from si.models.knn_regressor import KNNRegressor
from si.model_selection.split import train_test_split
from si.metrics.rmse import rmse
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load the cpu dataset
cpu = read_csv("../datasets/cpu/cpu.csv", features=True, label=True)


In [17]:
# Split the dataset into training and test sets
train, test = train_test_split(cpu, test_size=0.2, random_state=42)

# Initialize the KNNRegressor with k=3
knn = KNNRegressor(k=3)

# Fit the model to the training data
knn.fit(train)

# Make predictions on the test data
predictions = knn.predict(test)

# Compute the RMSE
error = rmse(test.y, predictions)

# Print the results
print("Train shape:", train.X.shape)
print("Test shape:", test.X.shape)
print("Predictions:", predictions[:5])
print("Actual values:", test.y[:5])
print("RMSE:", error)

Train shape: (168, 6)
Test shape: (41, 6)
Predictions: [140.66666667  29.33333333  35.66666667 701.33333333  18.66666667]
Actual values: [274  30  22 915  16]
RMSE: 81.36259969252635
