## This notebook contains detailed results for the assignment data

In [1]:
# import libraries
import sys
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline            

### Read data from file and create vectors
The function **get_vectors** takes a filename as input and returns a matrix X in which each row is a datapoint, and a column vector y containing the target labels.

In [2]:
def get_vectors(filename):
    try:
        f = open(filename, 'r')
    except OSError:
        print(f'{filename} could not be opened.\n')
        sys.exit()
        
    # initialize list to store feature and labels for training data
    features = []             
    labels = []
    
    with f:
        line = f.readline()
        while line != '':
            # strip newline and outer parenthesis
            line = line.strip('\n')
            line = line.strip('( )')
            
            # extract label and append to labels list
            single_label = line.split('), ')[-1]
            labels.append(single_label)
            
            # extract features and append to features list
            feat = line.split('), ')[0].split(', ')
            features.append(feat)
            
            # read next line
            line = f.readline()
        
        # convert features list to a 2-D float array (one row per datapoint)
        X = np.array(features, dtype = float, ndmin = 2)
        
        # convert labels list to array
        y = np.array(labels, dtype = str, ndmin = 2)
        
        return X, y.transpose()
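
To make the parsing steps in **get_vectors** concrete, the sketch below applies the same strip and split calls to a single line of the training format. The helper name `parse_labeled_line` is hypothetical and not part of the notebook; it only illustrates what happens to one line.

```python
def parse_labeled_line(line):
    # hypothetical helper: mirrors the per-line steps of get_vectors
    # strip the newline, then the outer parentheses and spaces
    line = line.strip('\n').strip('( )')
    # everything before '), ' holds the features, everything after is the label
    feature_part, label = line.split('), ')
    features = [float(v) for v in feature_part.split(', ')]
    return features, label

sample = '(( 0.085536050586897, 0.03, 0.11915260588651, 2.2104287108141), Metal )\n'
features, label = parse_labeled_line(sample)
print(features, label)
```

Note that `strip('( )')` removes any run of parentheses and spaces from both ends, which is why the inner closing parenthesis survives and can be used as the split point.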

### Distance Calculation

In [3]:
# This function calculates euclidean distance between training datapoints and test data point
def get_cartesian_distance(X_train, p):
    
    # n = total number of datapoints, f_n = total number of features
    n, f_n = X_train.shape
    
    sum_of_squared_diff = np.zeros((n, 1), dtype = float)
    
    # use vectorization to get sum of squared difference
    for i in range(f_n):
        x_vector = X_train[:,i].reshape((n,1))
        sum_of_squared_diff = sum_of_squared_diff + (x_vector - p[i])**2
        
    # take the square root to get the array of euclidean (cartesian) distances
    euc_dist = np.sqrt(sum_of_squared_diff)
    
    return euc_dist

# This function calculates manhattan distance between training datapoints and test data point
def get_manhattan_distance(X_train, p):
    
    # n = total number of datapoints, f_n = total number of features
    n, f_n = X_train.shape
    
    sum_of_abs_diff = np.zeros((n, 1), dtype = float)
    
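    # use vectorization to get sum of absolute differences
    for i in range(f_n):
        x_vector = X_train[:,i].reshape((n,1))
        sum_of_abs_diff = sum_of_abs_diff + abs(x_vector - p[i])
        
    # NOTE: manhattan distance is the plain sum of absolute differences; the square
    # root below is not part of its definition. It is monotonic, so the neighbor
    # ranking is unchanged, and the distances printed in 2 d) are these square-rooted values.
    man_dist = np.sqrt(sum_of_abs_diff)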
    
    return man_dist
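
The per-feature loops above can also be expressed with NumPy broadcasting. The following is a minimal sketch, not the notebook's implementation: the function names `euclidean_broadcast` and `manhattan_broadcast` are hypothetical, and the manhattan variant returns the plain sum of absolute differences without the `np.sqrt` applied in **get_manhattan_distance**.

```python
import numpy as np

def euclidean_broadcast(X_train, p):
    # (n, f) - (f,) broadcasts p across every row of X_train
    diff = X_train - p
    # keepdims=True keeps the (n, 1) column shape used in the notebook
    return np.sqrt((diff ** 2).sum(axis=1, keepdims=True))

def manhattan_broadcast(X_train, p):
    # plain L1 distance: sum of absolute differences, no square root
    return np.abs(X_train - p).sum(axis=1, keepdims=True)

X_demo = np.array([[0.0, 0.0], [3.0, 4.0]])
p_demo = np.array([0.0, 0.0])
print(euclidean_broadcast(X_demo, p_demo).ravel())
print(manhattan_broadcast(X_demo, p_demo).ravel())
```

Both helpers return an (n, 1) array, so they could be concatenated with the label column in the same way as the loop-based versions.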


## 2 b)

### KNN Implementation 

The **predict_class_with_knn** function takes the training data (X_train, y_train), the datapoint to classify (p), the number of neighbors to consider (k) and the distance type used to measure similarity (dist_type) as arguments. It returns the predicted class, chosen as the class with the highest posterior class probability, i.e. argmax_c P(class | data), together with that probability value.

Key notes:
1. The distances and labels of the 'k' nearest neighbors and the posterior class probabilities are printed if verbose = 'y'.
2. Tie breaker: if two or more classes share the highest probability value, the class label of the closest neighbor among them
   is selected.

In [4]:

def predict_class_with_knn(X_train, y_train, p, k, dist_type, verbose = 'n'):
    
    # n = total number of datapoints, f_n = total number of features
    n, f_n = X_train.shape
    
    if dist_type == 'cartesian':
        dist_arr = get_cartesian_distance(X_train, p)
    elif dist_type == 'manhattan':
        dist_arr = get_manhattan_distance(X_train, p)
    else:
        raise ValueError(f"Unknown dist_type: {dist_type!r}")
    
    # sort by numeric distance first: after concatenation with the string labels the
    # whole array becomes strings, and sorting those would compare text, not numbers
    order = dist_arr[:,0].argsort()
    dist_arr = np.concatenate((dist_arr, y_train), axis = 1)[order]
    
    # the first 'k' rows contain distance and labels of the k nearest neighbors
    knn = dist_arr[0:k,:]
    
    # save class labels of the k nearest neighbors as a list so that occurrences can be counted
    knn_labels = list(knn[:,1])
    
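    # calculate class probabilities from the neighbor counts
    # NOTE: counts are divided by n (the training set size), not by k, so the printed
    # values are the k-NN posterior estimates count/k scaled by k/n; the argmax is unaffected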
    class_probabilies = {}
    class_probabilies['Metal'] = float(knn_labels.count('Metal')/n)
    class_probabilies['Ceramic'] = float(knn_labels.count('Ceramic')/n)
    class_probabilies['Plastic'] = float(knn_labels.count('Plastic')/n)
   
    # knn_post_class_prob stores posterior probability of the 'k' nearest neighbors based on their class label
    knn_post_class_prob = []
    
    for idx, item in enumerate(knn_labels):
        knn_post_class_prob.append(class_probabilies[item]) 
    
    max_class_prob_idx = 0
    max_class_prob = float(0)
    
    # get class with highest posterior probability
    for i in range(k):

        # a class is only considered if its posterior probability is strictly greater than the current max value,
        # so that any tie is broken in favor of the closest neighbor
        if knn_post_class_prob[i] > max_class_prob:
            
            # value of highest posterior class probability
            max_class_prob = knn_post_class_prob[i]
            
            # index of neighbor with highest posterior class probability
            max_class_prob_idx = i
            
    # check whether a max posterior probability tie has occurred
    count = 0
    tie = 0
    for key, val in class_probabilies.items():
        
        # count number of max posterior probability occurrences
        count = count + int(max_class_prob == val)
        
        # if the maximum posterior probability value occurs more than once, a tie has occurred
        tie = int(count >= 2)    
            
    # print detailed results        
    if verbose == 'y':
        print(f'\nFor datapoint {p} the {k} nearest neighbor(s) are:')
        print(knn)
        print(f'The posterior class probabilities are:')
        for key, val in class_probabilies.items():
            print(f'{key}: {val:0.6f}')
        
        print(f'Highest posterior probability is {max_class_prob:0.6f} for class {knn_labels[max_class_prob_idx]}.')
        if tie == 1:
            print('Closest neighbor criterion was used to break the tie.')
        
    # return the class with maximum posterior probability and the probability value
    return knn_labels[max_class_prob_idx], max_class_prob
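
The voting and tie-breaking rule described in the key notes can be sketched compactly as follows. This is an illustrative sketch under stated assumptions, not the notebook's code: the name `vote_with_tiebreak` is hypothetical, the input is assumed to be the neighbor labels already sorted nearest first, and the fraction is computed as count / k, whereas the function above divides by n.

```python
from collections import Counter

def vote_with_tiebreak(sorted_labels):
    # sorted_labels: labels of the k nearest neighbors, nearest first (assumption)
    k = len(sorted_labels)
    counts = Counter(sorted_labels)
    best = max(counts.values())
    # walk the neighbors from nearest to farthest; the first label that
    # reaches the top count wins, which breaks ties by closest neighbor
    for label in sorted_labels:
        if counts[label] == best:
            return label, best / k

print(vote_with_tiebreak(['Ceramic', 'Metal', 'Plastic']))
print(vote_with_tiebreak(['Metal', 'Plastic', 'Ceramic', 'Plastic', 'Ceramic']))
```

In the second call Plastic and Ceramic both have two votes, and Plastic is returned because its first occurrence is closer than the first Ceramic neighbor.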


### Leave-one-out evaluation function 

The **leave_one_out_evaluation** function takes the entire feature dataset (X), the correct datapoint labels (y), the number of neighbors to consider (k), the type of distance to use for the nearest neighbor calculation (dist_type) and the verbose preference as arguments and returns the accuracy (total correct predictions / total datapoints).

The datapoint to be left out and tested is chosen according to its index (i.e. the item at index 0 is left out during the first iteration and the item at index n-1 is left out during the last iteration).

The **get_evaluation_results** function takes the entire feature dataset (X), the correct datapoint labels (y), the type of distance to use for the nearest neighbor calculation (dist_type) and the verbose preference as arguments and calls the **leave_one_out_evaluation** function to get accuracy values for different values of k (1, 3 and 5).

In [5]:
def leave_one_out_evaluation(X, y, k, dist_type, verbose):
    
    # get number of training items and number of features
    n, f_n = X.shape
    
    # prediction labels generated by 'predict_class_with_knn' will be stored in this list
    predictions = []
    
    for i in range(n):
        X_train = np.delete(X, i, axis = 0)
        y_train = np.delete(y, i, axis = 0)
        X_test = X[i,:]
        pred, prob = predict_class_with_knn(X_train, y_train, X_test, k, dist_type, verbose)
        predictions.append(pred)
    
    # convert prediction list to numpy array
    predictions = np.array(predictions, dtype = str, ndmin = 2)
    predictions = predictions.reshape(predictions.shape[1], 1)
    
    # return accuracy
    return (np.sum(y == predictions))/n


def get_evaluation_results(X, y, dist_type, verbose):
    # initialize dictionary to store accuracy values for different 'k' values
    accuracy = {}

    # calculate accuracy for various values of 'k'
    accuracy[1] = leave_one_out_evaluation(X, y, 1, dist_type, verbose)
    accuracy[3] = leave_one_out_evaluation(X, y, 3, dist_type, verbose)
    accuracy[5] = leave_one_out_evaluation(X, y, 5, dist_type, verbose)

    print(f'\nFinal result With {dist_type} distance:')
    for key, value in accuracy.items():
        print(f'For k = {key} the accuracy is {value:0.6f}.')
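
The leave-one-out loop can be illustrated on a toy dataset. The sketch below is a simplified stand-in under stated assumptions: `nearest_label` is a hypothetical 1-nearest-neighbor predictor used in place of **predict_class_with_knn**, the labels are a 1-D array rather than a column vector, and a boolean mask is used instead of `np.delete` to drop the held-out row.

```python
import numpy as np

def nearest_label(X_train, y_train, p):
    # hypothetical 1-NN stand-in for predict_class_with_knn
    dists = np.sqrt(((X_train - p) ** 2).sum(axis=1))
    return y_train[dists.argmin()]

def loo_accuracy(X, y):
    n = X.shape[0]
    correct = 0
    for i in range(n):
        # boolean mask that keeps every row except row i
        keep = np.arange(n) != i
        pred = nearest_label(X[keep], y[keep], X[i])
        correct += int(pred == y[i])
    # fraction of held-out points that were classified correctly
    return correct / n

X_toy = np.array([[0.0], [0.1], [1.0], [1.1]])
y_toy = np.array(['a', 'a', 'b', 'b'])
print(loo_accuracy(X_toy, y_toy))
```

Each datapoint is predicted using only the remaining n-1 points, so every point serves as a test point exactly once.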

# RESULTS

### Provide training filename & prediction data filename
The file must contain 1 datapoint per line in format (( height, diameter, weight, hue ), label ) which is similar to the format provided for the assignment. The training and test data for 2 a) were used for this example.

**Sample Training data .txt file format:**<br>
(( 0.087387449858191, 0.060081931648431, 0.31979451078728, 2.8496262373309), Ceramic )<br>
(( 0.10978353617857, 0.091370057029853, 0.47387481406305, 3.6590249754078), Metal )<br>
(( 0.085536050586897, 0.03, 0.11915260588651, 2.2104287108141), Metal )<br>
(( 0.084891424475572, 0.052960702057064, 0.19612629947571, 2.7078102998887), Metal )<br>
(( 0.061070880610981, 0.056668572017189, 0.24657746013871, 4.1755360255283), Ceramic )<br>
(( 0.16354523767183, 0.12624593368025, 0.44889932007996, 3.4711554454503), Plastic )<br>

**Sample test data .txt file format:**<br>
( 0.1267104769925, 0.068040454192177, 0.20859882666808, 3.9587910256346)<br>
( 0.080077827464175, 0.051607165210692, 0.17838065457662, 2.9290972438539)<br>
( 0.10538602217058, 0.12242648735599, 0.74997602386248, 5.6360780933991)<br>
( 0.19908879002915, 0.14817768938662, 0.67289149286103, 4.0833065098118)<br>

In [6]:
# provide training and test filename
fname_train = str(input('Enter file containing training data: '))
fname_test = str(input('Enter file containing test data: '))

# provide 'k' value
k = int(input('Enter value of k: '))

# provide verbose preference
verbose = str(input('Would you like to print detailed results?(y/n): '))
print('\n')

# get training data as vectors
X, y = get_vectors(fname_train)

# read test file to make predictions
with open(fname_test, 'r') as f:
    line = f.readline()
    i = 1
    while line != '':
        line = line.strip('\n')
        line = line.strip('( )')
        values = line.split(', ')
        p = np.array(values, dtype = float)
        predicted_class, prob_value = predict_class_with_knn(X, y, p, k, 'cartesian', verbose)
        print(f"Final Class prediction for datapoint {i} : '{predicted_class}' with a probability of {prob_value:.4f}.")
        i = i + 1
        line = f.readline()
    

Enter file containing training data: 2_a_train.txt
Enter file containing test data: 2_a_test.txt
Enter value of k: 3
Would you like to print detailed results?(y/n): y



For datapoint [0.12671048 0.06804045 0.20859883 3.95879103] the 3 nearest neighbor(s) are:
[['0.22991008654808265' 'Ceramic']
 ['0.40132508774163955' 'Metal']
 ['0.46414161830789835' 'Plastic']]
The posterior class probabilities are:
Metal: 0.083333
Ceramic: 0.083333
Plastic: 0.083333
Highest posterior probability is 0.083333 for class Ceramic.
Closest neighbor criterion was used to break the tie.
Final Class prediction for datapoint 1 : 'Ceramic' with a probability of 0.0833.

For datapoint [0.08007783 0.05160717 0.17838065 2.92909724] the 3 nearest neighbor(s) are:
[['0.1626000364448459' 'Ceramic']
 ['0.22205364725787488' 'Metal']
 ['0.32726564816026826' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.083333
Ceramic: 0.166667
Plastic: 0.000000
Highest posterior probability is 0.166667 for class Ceramic.
F

## 2 c)

### K = 1, 3 and 5 using cartesian distance

### Enter filename for leave-one-out evaluation

**Sample evaluation data .txt file format:**<br>
(( 0.087387449858191, 0.060081931648431, 0.31979451078728, 2.8496262373309), Ceramic )<br>
(( 0.10978353617857, 0.091370057029853, 0.47387481406305, 3.6590249754078), Metal )<br>
(( 0.085536050586897, 0.03, 0.11915260588651, 2.2104287108141), Metal )<br>
(( 0.084891424475572, 0.052960702057064, 0.19612629947571, 2.7078102998887), Metal )<br>
(( 0.061070880610981, 0.056668572017189, 0.24657746013871, 4.1755360255283), Ceramic )<br>
(( 0.16354523767183, 0.12624593368025, 0.44889932007996, 3.4711554454503), Plastic )<br>

In [7]:
fname_loo = str(input('Enter filename for leave-one-out evaluation data: '))

# provide verbose preference
verbose = str(input('Would you like to print detailed results?(y/n): '))
print('\n')

X, y = get_vectors(fname_loo)
get_evaluation_results(X, y, 'cartesian', verbose)

Enter filename for leave-one-out evaluation data: 2_c_d_e_eval.txt
Would you like to print detailed results?(y/n): y



For datapoint [0.10333502 0.07922548 0.18783852 2.78657642] the 1 nearest neighbor(s) are:
[['0.08576818981901319' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.1859912  0.14277373 0.60851886 5.64781922] the 1 nearest neighbor(s) are:
[['0.1537564340952132' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Ceramic.

For datapoint [0.06690281 0.04501028 0.11844696 2.78001842] the 1 nearest neighbor(s) are:
[['0.08576818981901319' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.12899039 0.15       0.433085

Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.13223902 0.14048933 0.44828336 1.06817699] the 1 nearest neighbor(s) are:
[['0.11730477726076742' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.08768161 0.08347548 0.30995075 3.41618716] the 1 nearest neighbor(s) are:
[['0.05152842866764247' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.1258569  0.12308654 0.75       3.52262415] the 1 nearest neighbor(s) are:
[['0.03357445675669022' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Ceramic.

For datapoint [0.05       0.06175543 0.1699843  2.8917648 ] the 1 nearest neighbor(s) are:
[['0.04


For datapoint [0.10528721 0.09241496 0.56819437 4.12577297] the 3 nearest neighbor(s) are:
[['0.0948984569119003' 'Plastic']
 ['0.11379598504347868' 'Plastic']
 ['0.11937277129334962' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.016807
Highest posterior probability is 0.016807 for class Plastic.

For datapoint [0.06966086 0.06406182 0.12343079 2.98528861] the 3 nearest neighbor(s) are:
[['0.04598712630541811' 'Ceramic']
 ['0.08862701937267116' 'Ceramic']
 ['0.10632874409166383' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.016807
Plastic: 0.000000
Highest posterior probability is 0.016807 for class Ceramic.

For datapoint [0.18898394 0.15       0.73118278 2.05005687] the 3 nearest neighbor(s) are:
[['0.10894269577771457' 'Plastic']
 ['0.11705404948138488' 'Ceramic']
 ['0.18950093342004526' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.016807
Highest poster

For datapoint [0.12732012 0.13783518 0.32772094 3.98221693] the 5 nearest neighbor(s) are:
[['0.10207198340253607' 'Ceramic']
 ['0.12503033013862588' 'Metal']
 ['0.14476023692195528' 'Plastic']
 ['0.1960096552342648' 'Plastic']
 ['0.20000271561133257' 'Plastic']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.008403
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.10719759 0.10605922 0.47254658 3.51154245] the 5 nearest neighbor(s) are:
[['0.06338134146668435' 'Metal']
 ['0.09544666270072519' 'Metal']
 ['0.1004715701415987' 'Plastic']
 ['0.1033585435590142' 'Plastic']
 ['0.11790084208649469' 'Plastic']]
The posterior class probabilities are:
Metal: 0.016807
Ceramic: 0.000000
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.08459958 0.06847511 0.37812527 3.16649119] the 5 nearest neighbor(s) are:
[['0.05373265445637085' 'Metal']
 ['0.09551640215849634' 'Ceramic']
 ['0.11126

### Conclusion: k = 1 was found to give the best performance in terms of accuracy.

## 2 d) 

### K = 1, 3 and 5 using manhattan distance

In [8]:
get_evaluation_results(X, y, 'manhattan', verbose)


For datapoint [0.10333502 0.07922548 0.18783852 2.78657642] the 1 nearest neighbor(s) are:
[['0.3828798441008132' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.1859912  0.14277373 0.60851886 5.64781922] the 1 nearest neighbor(s) are:
[['0.47922570591805763' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Ceramic.

For datapoint [0.06690281 0.04501028 0.11844696 2.78001842] the 1 nearest neighbor(s) are:
[['0.3828798441008132' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.12899039 0.15       0.4330857  3.13922991] the 1 nearest neighbor(s) are:
[['0.45622077799245386' 'Ceramic']]
The posterior class probabilities are:

Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.09783754 0.07320937 0.52493785 4.25088916] the 1 nearest neighbor(s) are:
[['0.3537856762155663' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Ceramic.

For datapoint [0.13310722 0.09231551 0.73820108 2.92126713] the 1 nearest neighbor(s) are:
[['0.5269770878674819' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Ceramic.

For datapoint [0.10116074 0.06116333 0.27465542 2.3407889 ] the 1 nearest neighbor(s) are:
[['0.3713224838683825' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.05475686 0.03       0.10898636 2.65280366] the 1 nearest neighbor(s) a

Highest posterior probability is 0.016807 for class Metal.

For datapoint [0.07951285 0.03       0.12402406 4.31034434] the 3 nearest neighbor(s) are:
[['0.38701853254332513' 'Metal']
 ['0.6763384025072199' 'Ceramic']
 ['0.6933015881941155' 'Plastic']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.008403
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Metal.
Closest neighbor criterion was used to break the tie.

For datapoint [0.09750152 0.10323631 0.20279394 6.04898829] the 3 nearest neighbor(s) are:
[['0.9669131706882466' 'Plastic']
 ['1.000220650754871' 'Ceramic']
 ['1.085419636037872' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.016807
Plastic: 0.008403
Highest posterior probability is 0.016807 for class Ceramic.

For datapoint [0.1817643  0.12946164 0.55706555 3.33196741] the 3 nearest neighbor(s) are:
[['0.32105525937934726' 'Plastic']
 ['0.41654361805063134' 'Plastic']
 ['0.4280852261603873' 'Plastic']]
T

Plastic: 0.008403
Highest posterior probability is 0.008403 for class Metal.
Closest neighbor criterion was used to break the tie.

For datapoint [0.1258569  0.12308654 0.75       3.52262415] the 3 nearest neighbor(s) are:
[['0.22101415034316305' 'Ceramic']
 ['0.2266428284443395' 'Ceramic']
 ['0.4733928891246991' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.016807
Plastic: 0.000000
Highest posterior probability is 0.016807 for class Ceramic.

For datapoint [0.05       0.06175543 0.1699843  2.8917648 ] the 3 nearest neighbor(s) are:
[['0.2968393084779592' 'Ceramic']
 ['0.36846827692553' 'Ceramic']
 ['0.40254759364783066' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.016807
Plastic: 0.000000
Highest posterior probability is 0.016807 for class Ceramic.

For datapoint [0.16430498 0.15       0.6332237  3.40106854] the 3 nearest neighbor(s) are:
[['0.3192406522878783' 'Plastic']
 ['0.40897897036273145' 'Plastic']
 ['0.4280852261603

Highest posterior probability is 0.042017 for class Ceramic.

For datapoint [0.10538439 0.08372553 0.35547923 3.8751981 ] the 5 nearest neighbor(s) are:
[['0.31510970888597983' 'Plastic']
 ['0.41157945904491494' 'Ceramic']
 ['0.45241598946374684' 'Plastic']
 ['0.4591541211703287' 'Plastic']
 ['0.49864783019928516' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.008403
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.05230696 0.06504103 0.1        2.42393394] the 5 nearest neighbor(s) are:
[['0.44011900647951335' 'Metal']
 ['0.4653429457643985' 'Metal']
 ['0.5247351840021575' 'Metal']
 ['0.5572539344531008' 'Metal']
 ['0.6059604662705994' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.033613
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.033613 for class Metal.

For datapoint [0.18749085 0.15       0.66070272 3.03202824] the 5 nearest neighbor(s) are:
[['0.5479730223211889' '

For datapoint [0.08549831 0.06371963 0.38378073 3.97640227] the 5 nearest neighbor(s) are:
[['0.41157945904491494' 'Metal']
 ['0.42167738213829076' 'Plastic']
 ['0.4619377348604214' 'Ceramic']
 ['0.48500784528721985' 'Plastic']
 ['0.5167558447382565' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.016807
Plastic: 0.016807
Highest posterior probability is 0.016807 for class Plastic.
Closest neighbor criterion was used to break the tie.

For datapoint [0.08220233 0.08229253 0.28643243 3.62177853] the 5 nearest neighbor(s) are:
[['0.29449258029512193' 'Plastic']
 ['0.4597948765948866' 'Metal']
 ['0.47423403357785465' 'Ceramic']
 ['0.4808391679610284' 'Plastic']
 ['0.4845389044888777' 'Metal']]
The posterior class probabilities are:
Metal: 0.016807
Ceramic: 0.008403
Plastic: 0.016807
Highest posterior probability is 0.016807 for class Plastic.
Closest neighbor criterion was used to break the tie.

For datapoint [0.07402509 0.03       0.19625839 3.44148936] the

The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.025210
Plastic: 0.008403
Highest posterior probability is 0.025210 for class Ceramic.

For datapoint [0.09157231 0.1276855  0.75       3.98746097] the 5 nearest neighbor(s) are:
[['0.4784940045185071' 'Ceramic']
 ['0.604828550202954' 'Ceramic']
 ['0.6075385279903956' 'Ceramic']
 ['0.6566177470176114' 'Ceramic']
 ['0.6688182358717786' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.042017
Plastic: 0.000000
Highest posterior probability is 0.042017 for class Ceramic.

For datapoint [0.11785945 0.05172539 0.42794328 2.10032089] the 5 nearest neighbor(s) are:
[['0.41242366642970163' 'Ceramic']
 ['0.4438714533838675' 'Plastic']
 ['0.4890120374021654' 'Plastic']
 ['0.5627492403507418' 'Metal']
 ['0.6097167870620893' 'Plastic']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.008403
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.10

### Conclusion: Manhattan distance was found to perform better than cartesian distance. Accuracy is highest for the value k = 5.

## 2 e) 

### Prediction after removing 4th attribute (hue) 

In [9]:
# remove 4th attribute from X
X_3 = np.delete(X, 3, axis = 1)

# check shape of X_3
X_3.shape

get_evaluation_results(X_3, y, 'cartesian', verbose)


For datapoint [0.10333502 0.07922548 0.18783852] the 1 nearest neighbor(s) are:
[['0.028419877747671858' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.1859912  0.14277373 0.60851886] the 1 nearest neighbor(s) are:
[['0.013875562741809155' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.06690281 0.04501028 0.11844696] the 1 nearest neighbor(s) are:
[['0.019884832698123127' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.12899039 0.15       0.4330857 ] the 1 nearest neighbor(s) are:
[['0.015723075457364333' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Pla

For datapoint [0.07402509 0.03       0.19625839] the 1 nearest neighbor(s) are:
[['0.026269633015792226' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.12732012 0.13783518 0.32772094] the 1 nearest neighbor(s) are:
[['0.026458345520439117' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.008403 for class Plastic.

For datapoint [0.10719759 0.10605922 0.47254658] the 1 nearest neighbor(s) are:
[['0.00506050629494292' 'Metal']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.008403 for class Metal.

For datapoint [0.08459958 0.06847511 0.37812527] the 1 nearest neighbor(s) are:
[['0.007443562207442778' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.008403
Plastic: 


For datapoint [0.13408468 0.08466501 0.27111709] the 3 nearest neighbor(s) are:
[['0.02226406575317533' 'Plastic']
 ['0.02586785540565964' 'Plastic']
 ['0.03518082395560695' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.05       0.06719063 0.10808586] the 3 nearest neighbor(s) are:
[['0.00867893819175694' 'Metal']
 ['0.025135749179500445' 'Metal']
 ['0.029749372754780422' 'Metal']]
The posterior class probabilities are:
Metal: 0.025210
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.025210 for class Metal.

For datapoint [0.09721779 0.13427563 0.53030239] the 3 nearest neighbor(s) are:
[['0.037353412223397546' 'Metal']
 ['0.03788128222647215' 'Metal']
 ['0.04092576714266953' 'Metal']]
The posterior class probabilities are:
Metal: 0.025210
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.025210 for class Met

Plastic: 0.000000
Highest posterior probability is 0.016807 for class Metal.

For datapoint [0.16841432 0.15       0.508795  ] the 3 nearest neighbor(s) are:
[['0.01623941784975458' 'Plastic']
 ['0.0183826537215948' 'Plastic']
 ['0.021742171323094923' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.10719121 0.10500125 0.48998482] the 3 nearest neighbor(s) are:
[['0.0174703004070175' 'Metal']
 ['0.018769606770900827' 'Metal']
 ['0.02891492752842012' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.016807
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.016807 for class Metal.

For datapoint [0.10137728 0.10190273 0.60563625] the 3 nearest neighbor(s) are:
[['0.01660219230519834' 'Ceramic']
 ['0.038822673364682876' 'Ceramic']
 ['0.039002516479260944' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic

The posterior class probabilities are:
Metal: 0.025210
Ceramic: 0.000000
Plastic: 0.000000
Highest posterior probability is 0.025210 for class Metal.

For datapoint [0.08336829 0.10818619 0.5716158 ] the 3 nearest neighbor(s) are:
[['0.02721904642582771' 'Ceramic']
 ['0.03505197313412408' 'Ceramic']
 ['0.039002516479260944' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.025210
Plastic: 0.000000
Highest posterior probability is 0.025210 for class Ceramic.

For datapoint [0.1260886  0.13831422 0.75      ] the 3 nearest neighbor(s) are:
[['0.00502628383911354' 'Metal']
 ['0.013498572460490021' 'Ceramic']
 ['0.015229437390972636' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.008403
Ceramic: 0.016807
Plastic: 0.000000
Highest posterior probability is 0.016807 for class Ceramic.

For datapoint [0.11717367 0.06095632 0.35909151] the 3 nearest neighbor(s) are:
[['0.020752054068679904' 'Metal']
 ['0.02117204569525836' 'Ceramic']
 ['0.0258934864584806

The posterior class probabilities are:
Metal: 0.033613
Ceramic: 0.008403
Plastic: 0.000000
Highest posterior probability is 0.033613 for class Metal.

For datapoint [0.08233304 0.05248308 0.20700933] the 5 nearest neighbor(s) are:
[['0.026269633015792226' 'Ceramic']
 ['0.0304907430362784' 'Metal']
 ['0.0331249641563747' 'Metal']
 ['0.03903535480152252' 'Plastic']
 ['0.040520867897685935' 'Plastic']]
The posterior class probabilities are:
Metal: 0.016807
Ceramic: 0.008403
Plastic: 0.016807
Highest posterior probability is 0.016807 for class Metal.
Closest neighbor criterion was used to break the tie.

For datapoint [0.05352238 0.03       0.1       ] the 5 nearest neighbor(s) are:
[['0.009070753653936964' 'Metal']
 ['0.014168919810465003' 'Metal']
 ['0.018838516885107355' 'Metal']
 ['0.020532528611989228' 'Plastic']
 ['0.027287992710873468' 'Metal']]
The posterior class probabilities are:
Metal: 0.033613
Ceramic: 0.000000
Plastic: 0.008403
Highest posterior probability is 0.033613 for cl


For datapoint [0.16369468 0.10495649 0.47624154] the 5 nearest neighbor(s) are:
[['0.025926158969198747' 'Plastic']
 ['0.040601616243635404' 'Plastic']
 ['0.04488040615312247' 'Plastic']
 ['0.05182584666095245' 'Plastic']
 ['0.053906803016363844' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.000000
Plastic: 0.042017
Highest posterior probability is 0.042017 for class Plastic.

For datapoint [0.11059118 0.07945932 0.21776576] the 5 nearest neighbor(s) are:
[['0.030795225523612538' 'Plastic']
 ['0.03099741076727617' 'Plastic']
 ['0.035396134153715814' 'Metal']
 ['0.040520867897685935' 'Metal']
 ['0.04514317045667616' 'Plastic']]
The posterior class probabilities are:
Metal: 0.016807
Ceramic: 0.000000
Plastic: 0.025210
Highest posterior probability is 0.025210 for class Plastic.

For datapoint [0.05933457 0.03       0.14635134] the 5 nearest neighbor(s) are:
[['0.007991021719801287' 'Ceramic']
 ['0.023773034688069537' 'Ceramic']
 ['0.02811500079461687' 'Me


For datapoint [0.09397004 0.04364781 0.25422878] the 5 nearest neighbor(s) are:
[['0.026774849019523483' 'Metal']
 ['0.0278522383329496' 'Metal']
 ['0.03766832282138467' 'Plastic']
 ['0.04581283255841928' 'Metal']
 ['0.048265900095806426' 'Ceramic']]
The posterior class probabilities are:
Metal: 0.025210
Ceramic: 0.008403
Plastic: 0.008403
Highest posterior probability is 0.025210 for class Metal.

For datapoint [0.09523558 0.11535584 0.65383858] the 5 nearest neighbor(s) are:
[['0.050419947061926394' 'Ceramic']
 ['0.06503610029979728' 'Ceramic']
 ['0.07997357663623181' 'Plastic']
 ['0.08119248438154508' 'Ceramic']
 ['0.08298080534134196' 'Plastic']]
The posterior class probabilities are:
Metal: 0.000000
Ceramic: 0.025210
Plastic: 0.016807
Highest posterior probability is 0.025210 for class Ceramic.

For datapoint [0.09157231 0.1276855  0.75      ] the 5 nearest neighbor(s) are:
[['0.020716878564420192' 'Ceramic']
 ['0.03250221730422464' 'Ceramic']
 ['0.034591666100096743' 'Ceramic']


### Conclusion: Removing the 4th attribute was found to significantly improve accuracy, with the highest accuracy obtained for k = 3.
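
One possible contributor to this result, offered as an assumption rather than something the notebook verifies, is feature scale: the hue values in the sample data are several times larger than the other three attributes, so they can dominate an unscaled distance. The sketch below uses the first two sample rows from the training-format example to show how much each attribute contributes to the squared cartesian distance between them.

```python
import numpy as np

# two sample rows from the training-file format shown earlier
# columns: height, diameter, weight, hue
a = np.array([0.087387449858191, 0.060081931648431, 0.31979451078728, 2.8496262373309])
b = np.array([0.10978353617857, 0.091370057029853, 0.47387481406305, 3.6590249754078])

# per-attribute squared difference and its share of the total squared distance
squared_diff = (a - b) ** 2
share = squared_diff / squared_diff.sum()

for name, s in zip(['height', 'diameter', 'weight', 'hue'], share):
    print(f'{name}: {s:.3f}')
```

For this pair the hue term accounts for most of the squared distance; whether that explains the accuracy change across the full evaluation set would need to be checked separately.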