# PART 1

## Brief Overview of the Problem

For the first part of our assignment, we have to use k Nearest Neighbor and weighted k Nearest Neighbor algorithm to classify our data. Our dataset contains 60 features with discrete values ranging from -3 to 3. Our final column contains the resulting "Personality" attribute which can be 16 different values. Our task for this part is using the kNN and weighted kNN algorithm, we have to decide the correct class over 100 different variation of data model. This 100 variations are the result of the following equation: 2 algorithms(kNN and w-kNN) * 5-fold cross validation * 2 types of data (with and without feature normalization) * 5 values of k (k= 1, 3, 5, 7, 9). We have to acquire accuracy, precision and recall values for each of the folds and then take the average of our results. 

The following is our methods to help us during our calculations. We will explain each of them shortly.

In [69]:
import pandas as pd
import numpy as np
import random
import copy
from math import sqrt
from sklearn.metrics import confusion_matrix

the method that shuffles our data. Iterates through each row and swaps the row with another row which is determined via a random integer.

In [70]:
def shuffle_data(db):
    for i in range(len(db) - 1, 0, -1):
        j = random.randint(0, len(db) - 1)
        db[[i,j]] = db[[j, i]]

the method that calculates the euclidean distance:

In [73]:
def euclid(row1, row2):
    distance = 0.0
    for i in range(len(row1)-1):
        distance += (row1[i] - row2[i])**2
    return sqrt(distance)

This method checks the euclidean distances and returns the nearest neighbors.

In [76]:
def get_neigh(train, test_row, k):
    distances = list()
    for train_row in train:
        dist = euclid(train_row, test_row)
        distances.append((train_row, dist))
    distances.sort(key= lambda tup: tup[1])
    neighbors = list()
    for i in range(k):
        neighbors.append(distances[i][0])
    return neighbors

the method that determines the resulting class value of our data:

In [77]:
def predict_class(train, test_row, k):
    neighbors = get_neigh(train, test_row, k)
    output = [row[-1] for row in neighbors]
    prediction = max(set(output), key=output.count)
    return prediction

The method that returns nearest neighbors and their weight(1/distance) with them.

In [78]:
def get_weighted_neigh(train, test_row, k):
    distances = list()
    for train_row in train:
        dist = euclid(train_row, test_row)
        distances.append((train_row, dist))
    distances.sort(key= lambda tup: tup[1])
    neighbors = list()
    for i in range(k):
        neighbors.append((distances[i][0], distances[i][1]))
    return neighbors

The method decides our final class value according to highest weight.
We have added 0.001 value in case of zero division if there is a duplicate value in our dataset.

In [79]:
def predict_weighted_class(train, test_row, k):
    neighbors = get_weighted_neigh(train, test_row, k)
    output = dict()
    for i in neighbors:
        if(i[0][-1] in output.keys()):
            output[i[0][-1]] += (1/(i[1] + 0.001))
        else:
            output[i[0][-1]] = (1/(i[1] + 0.001))
    prediction = max(output, key=output.get)
    return prediction

the method that calculates accuracy score:

In [80]:
def accuracy_score(db, r):
    return round(sum(db.diagonal())/np.sum(db), r)

The method that calculates the precision values of classes and then takes the average:

In [81]:
def precision_avg(db, r):
    prec = list()
    tp_index = 0
    for row in db:
        prec.append(round(row[tp_index]/np.sum(row), r))
        tp_index += 1
    return round(sum(prec)/len(prec), r)

The method that calculates the recall values of classes and then takes the average:

In [82]:
def recall_avg(db, r):
    rec = list()
    tp_index = 0
    for column in db.T:
        rec.append(round(column[tp_index]/np.sum(column), r))
        tp_index += 1
    return round(sum(rec)/len(rec), r)

The method that does the feat normalization to our data:

In [83]:
def feat_normal(data,r):
    data_temp = data[:,:r]
    num = 0
    for column in data_temp.T:
        col_min = np.min(column)
        col_max = np.max(column)
        for i in range(len(column)):
            data[i,num] = (data[i,num]-col_min)/(col_max-col_min)
        num += 1
    return data

The method that takes the average of our folds and returns a dataframe:

In [84]:
def avg_df(dictionary, mux):
    d = dict()
    d["average of folds"] = []
    first_key = list(dictionary.keys())[0]
    for i in range(len(dictionary[first_key])):
        sum = 0
        for key in dictionary.keys():
            sum += dictionary[key][i]
        d["average of folds"].append(round(sum/5, 2))
    df_end = pd.DataFrame.from_dict(d, orient='index', columns = mux)
    return df_end

The method that calculates the Mean Absolute Error:

In [85]:
def mae(truth, preds):
    sum = 0
    for i in range(len(truth)):
        sum += abs(truth[i] - preds[i])
    return sum/len(truth)

Firstly, we read the dataset and convert it to numpy. We deleted the first column since it just contains each rows position index.

In [86]:
# read the input file
df = pd.read_csv("subset_16P.csv", encoding='cp1254')
data = df.to_numpy()  # convert it to numpy array
data = np.delete(data, (0), axis=1)  # delete the IDs

Our final column contains strings as our result. We changed them to integers accordingly.

In [87]:
# change the result from string to int
my_dict = {"ESTJ": 0, "ENTJ": 1, "ESFJ": 2, "ENFJ": 3, "ISTJ": 4, "ISFJ": 5, "INTJ": 6, "INFJ": 7,
            "ESTP": 8, "ESFP": 9, "ENTP": 10, "ENFP": 11, "ISTP": 12, "ISFP": 13, "INTP": 14, "INFP": 15}
for i in range(len(data[:,60])):
    data[i, 60] = my_dict.get(data[i, 60])

We shuffle the data.

In [88]:
# shuffle data
shuffle_data(data)

We will construct our results dictionary to store our resulting values:

In [89]:
results = dict()
results["fold0"] = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
results["fold1"] = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
results["fold2"] = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
results["fold3"] = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
results["fold4"] = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

accs = copy.deepcopy(results)
accs_keys = list(accs)

precs = copy.deepcopy(results)
precs_keys = list(precs)

recs = copy.deepcopy(results)
recs_keys = list(recs)

In this part, we will execute our algorithms and get our results.
Firstly, we take our data without feature normalization, then with feature normalization.
Later, we divide our dataset to 5 different subset to apply 5-fold cross validation.
We name each of our folds to fold0, foldd1, ..., fold4 respectively.
Then, we split our fold to get our train and test datas.
Later on, we use kNN and weighted-kNN algorithms for each value of k (1,3,5,7,9)
Finally, we record our results to the relative dictionary.

In [90]:
mismatch = np.empty([16,16])

for selection in [0, 1]:
    if(selection == 0): # without feature normalization
        # apply 5-fold cross validation
        folded = np.array_split(data, 5)
    else: # with feature normalization
        data = feat_normal(data, 60)
        folded = np.array_split(data, 5)

    fold_num = 0
    for fold in folded:
        subfolded = np.array_split(fold, 5)
        test = subfolded[fold_num]
        train = subfolded[4-fold_num]
        if fold_num == 2:
            train = subfolded[0]
        for i in range(len(subfolded)):
            if fold_num == 2 and i == 0:
                continue
            elif i != fold_num and i != 4-fold_num:
                np.concatenate((subfolded[i], train))

        fold_num += 1
        # name each divided subset of data as fold_0, fold_1, ..., fold_4 respectively.
        test_x = test[:, :60]
        test_y = test[:, 60]

        train_x = train[:, :60]
        train_y = train[:, 60]
        train_y = train_y.astype("int")

        for k in [1, 3, 5, 7, 9]:
            preds = list()
            preds_w = list()
            for i in range(len(test_y)):
                preds.append(predict_class(train, test_x[i], k))
            cm = confusion_matrix(test_y.tolist(), preds)
            mismatch = cm
            accs[accs_keys[fold_num-1]][int((k-1)/2)][selection] = accuracy_score(cm, 2)
            precs[precs_keys[fold_num-1]][int((k-1)/2)][selection] = precision_avg(cm, 2)
            recs[recs_keys[fold_num-1]][int((k-1)/2)][selection] = recall_avg(cm, 2)
        
            for i in range(len(test_y)):
                preds_w.append(predict_weighted_class(train, test_x[i], k))
            cmw = confusion_matrix(test_y.tolist(), preds_w)
            accs[accs_keys[fold_num-1]][int((k-1)/2)][selection+2] = accuracy_score(cmw, 2)
            precs[precs_keys[fold_num-1]][int((k-1)/2)][selection+2] = precision_avg(cmw, 2)
            recs[recs_keys[fold_num-1]][int((k-1)/2)][selection+2] = recall_avg(cmw, 2)

for each of the results (accuracy, precision, recall) we create a dataframe from its dictionary and then we display it.

In [91]:
mux = pd.MultiIndex.from_product([['k=1', 'k=3', 'k=5', 'k=7', 'k=9'], ['NN k-NN', 'N k-NN', 'NN w-kNN', 'N w-kNN']])

dict_acc = {"fold0":[],"fold1":[],"fold2":[],"fold3":[],"fold4":[]}
dict_prec = copy.deepcopy(dict_acc)
dict_rec = copy.deepcopy(dict_acc)

# accuracy dataframe
for i in accs:
    for j in accs.get(i):
        dict_acc.get(i).extend(j)
df_res = pd.DataFrame.from_dict(dict_acc, orient='index', columns = mux)
df_acc_avg = avg_df(dict_acc,mux)

# precision dataframe
for i in precs:
    for j in precs.get(i):
        dict_prec.get(i).extend(j)
df_prec = pd.DataFrame.from_dict(dict_prec, orient='index', columns = mux)
df_prec_avg = avg_df(dict_prec,mux)

#recall dataframe
for i in recs:
    for j in recs.get(i):
        dict_rec.get(i).extend(j)
df_rec = pd.DataFrame.from_dict(dict_rec, orient='index', columns = mux)
df_rec_avg = avg_df(dict_rec,mux)

NN: no feature normalization     
N: feature normalization     
k-NN: k nearest neighbor     
w-kNN: weighted k nearest neighbor   

#### Results for accuracy score:

In [92]:
df_res.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
fold0,0.89,0.84,0.89,0.84,0.93,0.89,0.93,0.89,0.93,0.89,0.93,0.91,0.92,0.9,0.93,0.91,0.91,0.88,0.92,0.92
fold1,0.92,0.85,0.92,0.85,0.94,0.86,0.94,0.88,0.93,0.89,0.94,0.9,0.92,0.9,0.94,0.92,0.92,0.9,0.94,0.92
fold2,0.92,0.9,0.92,0.9,0.92,0.89,0.92,0.92,0.92,0.9,0.94,0.93,0.92,0.87,0.94,0.91,0.92,0.9,0.94,0.92
fold3,0.9,0.84,0.9,0.84,0.88,0.83,0.92,0.89,0.9,0.86,0.92,0.9,0.9,0.87,0.92,0.91,0.9,0.86,0.92,0.89
fold4,0.9,0.83,0.9,0.83,0.91,0.86,0.93,0.86,0.92,0.88,0.93,0.9,0.93,0.88,0.94,0.9,0.93,0.9,0.95,0.92


#### Average of each fold for accuracy score:

In [93]:
df_acc_avg.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
average of folds,0.91,0.85,0.91,0.85,0.92,0.87,0.93,0.89,0.92,0.88,0.93,0.91,0.92,0.88,0.93,0.91,0.92,0.89,0.93,0.91


#### Results for precision:

In [94]:
df_prec.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
fold0,0.89,0.84,0.89,0.84,0.92,0.88,0.92,0.89,0.93,0.89,0.93,0.9,0.91,0.89,0.92,0.91,0.91,0.88,0.91,0.91
fold1,0.91,0.85,0.91,0.85,0.94,0.86,0.94,0.88,0.93,0.89,0.94,0.9,0.92,0.9,0.94,0.92,0.92,0.9,0.93,0.92
fold2,0.92,0.9,0.92,0.9,0.92,0.89,0.93,0.92,0.92,0.9,0.94,0.93,0.92,0.87,0.93,0.91,0.91,0.9,0.94,0.92
fold3,0.89,0.83,0.89,0.83,0.88,0.83,0.92,0.89,0.9,0.87,0.93,0.9,0.9,0.87,0.92,0.91,0.91,0.86,0.93,0.89
fold4,0.9,0.83,0.9,0.83,0.92,0.86,0.93,0.86,0.92,0.89,0.93,0.91,0.94,0.89,0.94,0.9,0.94,0.9,0.95,0.92


#### Average of each fold for precision:

In [95]:
df_prec_avg.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
average of folds,0.9,0.85,0.9,0.85,0.92,0.86,0.93,0.89,0.92,0.89,0.93,0.91,0.92,0.88,0.93,0.91,0.92,0.89,0.93,0.91


#### Results for recall:

In [96]:
df_rec.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
fold0,0.9,0.85,0.9,0.85,0.93,0.89,0.93,0.9,0.93,0.9,0.93,0.91,0.92,0.9,0.93,0.91,0.91,0.89,0.92,0.92
fold1,0.92,0.86,0.92,0.86,0.94,0.87,0.94,0.88,0.93,0.9,0.95,0.91,0.92,0.91,0.94,0.92,0.92,0.9,0.93,0.92
fold2,0.92,0.9,0.92,0.9,0.92,0.9,0.93,0.92,0.93,0.91,0.94,0.93,0.93,0.87,0.94,0.91,0.92,0.9,0.94,0.92
fold3,0.9,0.85,0.9,0.85,0.88,0.85,0.92,0.89,0.91,0.88,0.93,0.9,0.9,0.88,0.93,0.91,0.91,0.87,0.92,0.9
fold4,0.91,0.84,0.91,0.84,0.91,0.86,0.93,0.86,0.92,0.88,0.93,0.9,0.93,0.89,0.94,0.9,0.93,0.9,0.95,0.92


#### Average of each fold for recall:

In [97]:
df_rec_avg.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
average of folds,0.91,0.86,0.91,0.86,0.92,0.87,0.93,0.89,0.92,0.89,0.94,0.91,0.92,0.89,0.94,0.91,0.92,0.89,0.93,0.92


## Error Analysis for Classification

Below, you can see a confusion matrix for k-NN (k=9), fold4, with feature normalization:

In [98]:
print(mismatch)

[[21  0  1  0  0  0  0  0  0  0  0  0  0  0  0  2]
 [ 0 28  0  0  0  1  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 23  0  0  0  0  1  0  0  0  0  1  1  0  1]
 [ 0  0  0 15  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 21  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  1 17  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  1  0  3 21  0  0  0  0  1  1  0  1  0]
 [ 0  0  0  0  0  1  0 23  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 24  0  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0  1  1 18  0  0  0  0  0  1]
 [ 0  0  0  1  0  0  0  0  0  0 21  1  1  0  0  0]
 [ 0  0  0  0  0  0  0  1  0  0  0 22  0  0  0  0]
 [ 0  0  0  0  0  0  0  1  0  0  0  0 31  0  0  0]
 [ 1  0  0  0  2  1  0  0  0  0  0  0  0 28  0  0]
 [ 0  0  0  1  1  1  0  0  1  0  1  0  0  0 17  0]
 [ 3  1  0  0  0  1  0  0  1  0  0  0  0  0  1 28]]


As you can see, some predictions do not match the actual class. We can see that in the values that are not in the diagonal and different than zero. There can be several reasons for this mismatch:
-> Because k value is equal to 9 for this confusion matrix, when we choose our neighbours, the radius that we apply for choosing might be unnecessarily big. So big that it might include some irrelevant values as neighbors. Choosing a smaller k value might prevent this issue.
-> This mismatches might occur because of the distribution of our train and test dataset. Since we are using fold4 for this confusion matrix, our test dataset might not include some data that is in the train dataset. Since our model sees this kind of data for the first time, this might result a mismatch. Using different folds might prevent this issue.


As the result of our computations, we can see from the tables above that, our accuracy score is generally in between 0.80 and 0.95. While this does not mean our models are working fine on its own, if we look at the precision and recall values together with the accuracy score, we can see that the three of our results generally is in the same interval and this means our model is working fine.
In terms of feature normalization, if we check our results, we can see that no feature normalization gives better results than feature normalization if our k value is small. This difference is becoming smaller while our k value is increasing. The reason behind that situation is when we apply feature normalization, our data distribution becomes more homogeneous. If our k value is 1, this means get  the closest neighbor. Since our data is more homogeneously distributed, this decision becomes harder. That is why the feature normalization gives slightly less results than no feature normalization if our k value is small.

As for the important system parameters, for example the number of training samples, if we used more training samples, our results might be slightly higher than the results that we found. But this would cause a slower execution time for our code since we are using k-NN algorithm (complexity: O(N^2)). Because we are using 10000 data in this assignment, our code takes approximately 4-5 minutes to execute. If we used more training samples, this number might go up to 16 hours or so.

For the values of k, when we check our resulting tables, we can see that taking k as 1 or 9 gives us slightly less performance than taking k as an intermediate value as 3 or 5. This change in performance occurs because of our neighbor selection. If we take k as 1, we directly select the nearest neighbor. This choice may not be suitable in some edge cases. If we take k value as a high integer (for example: 9), some irrelevant neighbors might be included in our neighbor selection and this would result in a poorly trained learning model as well. It is best to take k as a intermediate value, as 3, 5 or 7.

Decision of our folds matters, too. The train and test data must be evenly distributed to get a successful model. To give an example, if we choose a training data which has only class 3 as result, when we have test data that has class 5 as result, our model would classify that test data as class 3. This would cause a poorly trained model and the majority of the results would be wrong. As you can see in our result tables some folds performed better than other folds. This could be relevant to the data distribution as I explained.

# PART 2

## Brief Overview of the Problem

In this part, we have a  dataset which contains various data about energy efficiency. This dataset contains 768 samples and each of them contains 10 different attributes. Two of these attributes are continuous energy efficiency output rate which will be our result attributes. These attributes are heating load and cooling load. Using this dataset we are going to estimate two different energy efficiency values of different building shapes. We will estimate these values using nearest neighbour algorithm and weighted version of it.

We are reading our .csv file here and converting it to numpy array:

In [99]:
df2 = pd.read_csv("energy_efficiency_data.csv", encoding='cp1254')
data2 = df2.to_numpy() # convert it to numpy array

We are shuffling our data so it may be randomized:

In [100]:
shuffle_data(data2)

We are creating dictionaries using our previously created dictionary template and will be using it to hold resulting data:

In [101]:
mae_heat = copy.deepcopy(results)
mae_heat_keys = list(mae_heat)

mae_cool = copy.deepcopy(results)
mae_cool_keys = list(mae_cool)

In this part we first splitted our data into 5 folds. Then for every fold we splitted it into 5 folds again and called them subfolds. To implement 5-fold cross validation correctly we made one of these subfolds to our test data and the rest to train data. After obtaining our test and train data we splitted test data to three arrays. One contains heating load attribute of all samples, other contains cooling load attribute of all samples and the final part contains remaining attributes of all samples. For train data we splitted it into two parts, one contains all attributes except heating load and other contains all except cooling load. After that for every k number(1,3,5,7,9) we get prediction data with our function and then compute mean absolute error using prediction and actual data of all folds and assign them to their respective folds in our previously created dictionary.

In [102]:
for selection in [0, 1]:
    if(selection == 0): # without feature normalization
        # apply 5-fold cross validation
        folded = np.array_split(data2, 5)
    else: # with feature normalization
        data2 = feat_normal(data2, 8)
        folded = np.array_split(data2, 5)
        
    fold_num = 0
    for fold in folded:
        subfolded = np.array_split(fold, 5)
        test = subfolded[fold_num]
        train = subfolded[4-fold_num]
        if fold_num == 2:
            train = subfolded[0]
        for i in range(len(subfolded)):
            if fold_num == 2 and i == 0:
                continue
            elif i != fold_num and i != 4-fold_num:
                np.concatenate((subfolded[i], train))

        fold_num += 1
        # name each divided subset of data as fold_0, fold_1, ..., fold_4 respectively.
        test_x = test[:, :8]
        test_heat_y = test[:, 8]
        test_cool_y = test[:, 9]

        train_heat = train[:, :9]
        train_cool = np.delete(train, 8, 1)

        for k in [1, 3, 5, 7, 9]:
            preds = list()
            preds_w = list()
            for i in range(len(test_heat_y)):
                preds.append(predict_class(train_heat, test_x[i], k))
            mae_heat[mae_heat_keys[fold_num-1]][int((k-1)/2)][selection] = mae(test_heat_y.tolist(), preds)
            for i in range(len(test_heat_y)):
                preds_w.append(predict_weighted_class(train_heat, test_x[i], k))
            mae_heat[mae_heat_keys[fold_num-1]][int((k-1)/2)][selection+2] = mae(test_heat_y.tolist(), preds_w)

        for k in [1, 3, 5, 7, 9]:
            preds = list()
            preds_w = list()
            for i in range(len(test_cool_y)):
                preds.append(predict_class(train_cool, test_x[i], k))
            mae_cool[mae_cool_keys[fold_num-1]][int((k-1)/2)][selection] = mae(test_cool_y.tolist(), preds)
            for i in range(len(test_cool_y)):
                preds_w.append(predict_weighted_class(train_cool, test_x[i], k))
            mae_cool[mae_cool_keys[fold_num-1]][int((k-1)/2)][selection+2] = mae(test_cool_y.tolist(), preds_w)

In this part we organize our previously populated dictionaries so it may be converted into a dataframe and be shown in tables.

In [103]:
dict_heat = {"fold0":[],"fold1":[],"fold2":[],"fold3":[],"fold4":[]}
dict_cool = {"fold0":[],"fold1":[],"fold2":[],"fold3":[],"fold4":[]}

for i in mae_heat:
    for j in mae_heat.get(i):
        dict_heat.get(i).extend(j)
df_heat = pd.DataFrame.from_dict(dict_heat, orient='index', columns = mux)
df_heat_avg = avg_df(dict_heat,mux)

for i in mae_cool:
    for j in mae_cool.get(i):
        dict_cool.get(i).extend(j)
df_cool = pd.DataFrame.from_dict(dict_cool, orient='index', columns = mux)
df_cool_avg = avg_df(dict_cool,mux)

NN: no feature normalization   
N: feature normalization    
k-NN: k nearest neighbor    
w-kNN: weighted k nearest neighbor   

#### Table of Mean Absolute Error for Heating Load

In [104]:
df_heat.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
fold0,2.953871,3.846774,2.953871,3.846774,2.886129,3.441613,2.953871,3.846774,5.557419,4.746452,2.953871,3.846774,5.935484,4.889355,2.953871,3.982903,6.364194,5.248065,2.953871,3.982903
fold1,2.362581,2.712258,2.362581,2.712258,3.126129,2.594194,2.362581,2.712258,4.545806,4.335806,2.362581,2.712258,5.462581,4.049355,2.362581,2.712258,5.779355,3.992258,2.362581,2.712258
fold2,2.627742,3.967742,2.627742,3.967742,2.687419,4.053548,2.595806,3.967742,3.219677,3.364839,2.595806,4.22871,4.057419,4.640323,2.595806,4.220645,4.155161,4.787742,2.595806,4.206129
fold3,3.829667,4.100333,3.829667,4.100333,3.606667,3.186,3.829667,4.100333,3.506667,3.326,3.829667,4.100333,3.63,3.772333,3.829667,4.100333,3.63,4.003667,3.829667,4.100333
fold4,2.261667,2.847333,2.261667,2.847333,2.628667,3.597667,2.261667,2.847333,3.114,3.033667,2.261667,2.847333,3.262667,3.223333,2.261667,2.847333,3.654333,3.597333,2.261667,2.847333


#### Table of Average Values of Mean Absolute Error for Heating Load

In [105]:
df_heat_avg.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
average of folds,2.81,3.49,2.81,3.49,2.99,3.37,2.8,3.49,3.99,3.76,2.8,3.55,4.47,4.11,2.8,3.57,4.72,4.33,2.8,3.57


#### Table of Mean Absolute Error for Cooling Load

In [106]:
df_cool.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
fold0,3.141613,3.515484,3.141613,3.515484,3.633226,3.224194,3.141613,3.515484,4.910645,3.982581,3.141613,3.515484,5.178387,4.279677,3.141613,3.515484,4.976129,4.330645,3.141613,3.515484
fold1,1.975484,2.412903,1.975484,2.412903,3.009677,2.923871,1.975484,2.412903,3.709677,3.393226,1.975484,2.412903,3.977742,3.438387,1.975484,2.412903,4.078065,3.446129,1.975484,2.412903
fold2,3.328387,4.978387,3.328387,4.978387,3.883871,4.685161,3.328387,4.978387,4.106774,4.005484,3.328387,4.978387,4.464839,4.191935,3.328387,4.978387,4.207097,4.204194,3.328387,4.978387
fold3,4.224,4.961667,4.224,4.961667,3.772,4.085,4.224,4.961667,3.762,3.960333,4.224,4.961667,3.811333,4.163,4.224,4.961667,3.781333,4.286667,4.224,4.961667
fold4,2.476667,3.797,2.476667,3.797,3.456,3.616333,2.476667,3.797,3.446,3.757667,2.476667,3.797,3.681333,3.676333,2.476667,3.797,3.838667,3.792333,2.476667,3.797


#### Table of Average Values of Mean Absolute Error for Cooling Load

In [107]:
df_cool_avg.head()

Unnamed: 0_level_0,k=1,k=1,k=1,k=1,k=3,k=3,k=3,k=3,k=5,k=5,k=5,k=5,k=7,k=7,k=7,k=7,k=9,k=9,k=9,k=9
Unnamed: 0_level_1,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN,NN k-NN,N k-NN,NN w-kNN,N w-kNN
average of folds,3.03,3.93,3.03,3.93,3.55,3.71,3.03,3.93,3.99,3.82,3.03,3.93,4.22,3.95,3.03,3.93,4.18,4.01,3.03,3.93


## Error Analysis for Regression

By applying feature normalization in our data, we make it more homogeneous compared to non normalized data. This means different mean absolute error values for our results. As it can be seen in the tables above, at the beginning where our k is 1 our results for non normalized data are generally smaller than normalized data. And at the end where our k is 9 results for non normalized data are generally larger than normalized data. When our k value is 1, we take the closest neighbor. This situation becomes harder if we apply feature normalization to our dataset because feature normalization makes our distribution more homogeneous. When our k value is larger, neighbor selection becomes more suitable and in the end we get smaller errors.

In this part we had data of 768 samples each having 10 attributes. If our number of samples were larger than this number, our results would have been smaller and by extension closer to our actual values because there would be a lot of samples to validate with. But even if the results would have been better compared to less datasize, the cost and time to validate and get results would have increased. This increase in execution time might be problematic in some scenarios. We should always take the optimal size of data.

In case of choosing the k value, as can be seen in the tables above, we should take the value of k as an intermediate value(3, 5 or 7). If we choose k as a small number, our model just takes the nearest samples as neighbor. This can cause mismatch in some edge cases. If we take k as a large number, even the irrelevant samples might be considered as neighbor by our model. This event might also cause a large number for our mean absolute error as can be seen in the tables above. In the end, it is best to choose an optimal value as k.

Choosing the right fold variation for our model is essential. If we choose a fold that has not an evenly distributed samples as train and test data, our model might be poorly trained. This would cause large numbers for mean absolute error. It is best to choose an evenly distributed fold for our model.