### Dataset
This program implements a simple K-NN classification algorithm. It uses the following dataset:
https://www.kaggle.com/shub99/student-marks

The dataset is a txt file with 3 columns: the students' Mid-Semester marks, their End-Semester marks, and a value that indicates whether the student will PASS (1) or FAIL (0).

### What will the program do?

1. Read the data file

2. Prepare the data so that it can be worked with in Python

3. Generate random points to be classified

4. Apply the K-NN model to the generated values and classify them

### What is K-NN?

The k-nearest neighbors (K-NN) algorithm is a simple, easy-to-implement supervised machine learning algorithm.
It can classify new data based on previously classified data. For example, it can categorize clients based on their data for a more accurate marketing strategy.

Imagine that the data is plotted on a 2-dimensional graph. The algorithm uses a distance formula (in this program, the Euclidean distance) to find the k nearest already classified points and assigns a class to the new point based on the majority class among them.
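
To make the idea concrete, here is a minimal sketch of a single K-NN decision. The points in `labelled` and the value of `new_point` are made up for illustration and are not taken from the dataset; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
from collections import Counter
from math import dist

# Hypothetical labelled points: ((mid_semester, end_semester), class)
labelled = [
    ((30.0, 40.0), 0),
    ((35.0, 45.0), 0),
    ((80.0, 85.0), 1),
    ((75.0, 90.0), 1),
    ((70.0, 80.0), 1),
]
new_point = (72.0, 82.0)  # hypothetical point to classify
k = 3

# Sort the labelled points by Euclidean distance to the new point.
by_distance = sorted(labelled, key=lambda item: dist(item[0], new_point))

# Keep the classes of the k nearest points and let them vote.
nearest_classes = [label for _, label in by_distance[:k]]
predicted = Counter(nearest_classes).most_common(1)[0][0]
print(predicted)
```

The three nearest neighbours of `new_point` all belong to class 1, so the new point is assigned class 1.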

### Reading the data

Here the program turns the txt file containing the student marks and their class into a list, so that we can work with it in Python.

In [10]:
data: list = []
with open('marks.txt', 'r') as f:  # the with block closes the file automatically
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        values = [float(v) for v in line.strip().split(',')]
        # Each element of the list is a tuple containing another tuple
        # with the two marks, followed by the value for the class
        data.append(((values[0], values[1]), values[2]))

In [11]:
for e in data: print(e)

((34.62365962451697, 78.0246928153624), 0.0)
((30.28671076822607, 43.89499752400101), 0.0)
((35.84740876993872, 72.90219802708364), 0.0)
((60.18259938620976, 86.30855209546826), 1.0)
((79.0327360507101, 75.3443764369103), 1.0)
((45.08327747668339, 56.3163717815305), 0.0)
((61.10666453684766, 96.51142588489624), 1.0)
((75.02474556738889, 46.55401354116538), 1.0)
((76.09878670226257, 87.42056971926803), 1.0)
((84.43281996120035, 43.53339331072109), 1.0)
((95.86155507093572, 38.22527805795094), 0.0)
((75.01365838958247, 30.60326323428011), 0.0)
((82.30705337399482, 76.48196330235604), 1.0)
((69.36458875970939, 97.71869196188608), 1.0)
((39.53833914367223, 76.03681085115882), 0.0)
((53.9710521485623, 89.20735013750205), 1.0)
((69.07014406283025, 52.74046973016765), 1.0)
((67.94685547711617, 46.67857410673128), 0.0)
((70.66150955499435, 92.92713789364831), 1.0)
((76.97878372747498, 47.57596364975532), 1.0)
((67.37202754570876, 42.83843832029179), 0.0)
((89.6767757507208, 65.79936592745237),
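
As an alternative, the same parsing can be sketched with the standard `csv` module. This is only an illustration: the helper `load_marks` is hypothetical, and the in-memory `sample` stands in for `marks.txt`, assuming each row has the layout `mid,end,class`.

```python
import csv
import io

# Stand-in for marks.txt: two rows in the assumed "mid,end,class" layout.
sample = io.StringIO(
    "34.62365962451697,78.0246928153624,0\n"
    "60.18259938620976,86.30855209546826,1\n"
)

def load_marks(handle):
    """Parse comma-separated rows into ((mid, end), class) tuples."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # csv.reader yields an empty list for blank lines
            continue
        mid, end, label = (float(value) for value in record)
        rows.append(((mid, end), label))
    return rows

parsed = load_marks(sample)
print(parsed)
```

To read the real file, the same function could be called as `load_marks(f)` inside a `with open('marks.txt', newline='') as f:` block.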

### Randomizing test data

We will generate a random dataset of unclassified points to test the program.

In [12]:
from random import random

In [38]:
no_class = []
for i in range(100):
    no_class.append([((random() * 100),(random() * 100)), '']) # the program generates random values from 0 to 100
for e in no_class: print(e)

[(58.51606482453228, 9.473831942278899), '']
[(42.47786760637919, 19.377863558025787), '']
[(38.93367284470456, 10.89648947827071), '']
[(34.4174099710584, 7.115883268500522), '']
[(49.62211823461019, 15.746239653847638), '']
[(38.570785626342676, 26.407411970017392), '']
[(84.60862657693852, 65.11992677097655), '']
[(16.470622907468524, 96.53966793921578), '']
[(87.22826455884496, 94.92756631733312), '']
[(49.25553022083194, 93.18840376283917), '']
[(65.82808618192776, 23.877953380549677), '']
[(16.869941856574723, 53.458161346369934), '']
[(44.86053672754847, 21.873938584600193), '']
[(48.483606294987794, 14.747340468641035), '']
[(0.7650420400863611, 34.35511844035023), '']
[(87.19236335112754, 27.56728330742143), '']
[(41.15642163313754, 28.536915725537515), '']
[(55.76723022343315, 47.31642229530581), '']
[(15.221572305572838, 26.54374967716929), '']
[(1.7965884724828851, 73.06121918894502), '']
[(10.762546083506797, 66.82668555656792), '']
[(44.49488090491236, 93.96978850587074),
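
Because the points are random, every run produces a different test set. If reproducible results are wanted, one option is to seed the generator. The sketch below assumes a hypothetical helper `make_unclassified` that keeps the same `[(x, y), '']` layout as `no_class`.

```python
from random import Random

def make_unclassified(n, seed=None, low=0.0, high=100.0):
    """Return n unclassified points in the [(x, y), ''] layout used above."""
    rng = Random(seed)  # a private generator, so the global state is untouched
    return [[(rng.uniform(low, high), rng.uniform(low, high)), ''] for _ in range(n)]

# The same seed always yields the same points.
sample_a = make_unclassified(5, seed=42)
sample_b = make_unclassified(5, seed=42)
print(sample_a == sample_b)
```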

### Defining Class

Now we will create a Python class that receives both datasets and classifies the 'no_class' list.
By creating a class, we can create an object that is specific to 2 datasets, which helps keep the data organized and avoids mixing datasets.

In [40]:
class MarkClassifier():
    def __init__(self, data: list, no_class: list, k: int):
        self.data: list = data
        self.no_class: list = no_class
        self.k: int = k
            
    def distance_verifier(self, p_data: tuple, p_no_class: tuple) -> float:
        '''Check the distance between 2 points
        p_data: tuple with the values from the classified student
        p_no_class: tuple with the values from the unclassified student
        Returns the distance in float
        '''
        d: float = 0.0
        for i in range(len(p_data)):
            d += (p_data[i] - p_no_class[i]) ** 2
        d = d ** (1 / 2)
        return d
    
    def find_closer_points(self, i: int) -> list:
        '''Find the k points in 'data' closest to the point to be classified
        i: index of the point in the 'no_class' list
        Returns a list with the indexes (in 'data') of the nearest points'''
        data_distances = []
        closer_points = []

        # For each point in the 'data' list, compute the distance between
        # point [i] of 'no_class' and that point
        for point in self.data:  # use the instance's dataset, not the global one
            data_distances.append(self.distance_verifier(self.no_class[i][0], point[0]))
        # Pair each distance with its index in 'data', then sort by distance
        data_distances = list(enumerate(data_distances))
        data_distances.sort(key=lambda elem: elem[1])
        closer_distances = data_distances[:self.k]

        for element in closer_distances:
            closer_points.append(element[0])
        return closer_points
    
    def class_def(self, closer_points) -> int: # Function to define the class of the point
        '''Classify a point by majority vote
        'closer_points': list with the indexes of the nearest points
        Returns an int (0 or 1); a tie is resolved as 1'''
        closer_points_class = []
        for i in closer_points:
            closer_points_class.append(self.data[i][1])
        zeros = closer_points_class.count(0)
        ones = closer_points_class.count(1)
        if zeros > ones:
            p_class = 0
        else:
            p_class = 1
        return p_class
    
    def class_all_points(self) -> list:
        '''Classify all points in the unclassified dataset
        Returns a list with the classified random dataset'''
        for i in range(len(self.no_class)):
            cl_points = self.find_closer_points(i)
            p_class = self.class_def(cl_points)
            self.no_class[i][1] = p_class  # update the instance's list in place
        return self.no_class

### K
The value of K can be changed here to tune the classifier. An odd value avoids ties between the two classes.

In [35]:
k = 5
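
One possible way to compare values of K is leave-one-out evaluation: classify each labelled point using all the other labelled points and count how often the prediction matches the known class. The sketch below is not part of the original program; `knn_predict`, `leave_one_out_accuracy` and the `toy` dataset are hypothetical names and values used only for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, point, k):
    """Predict the class of 'point' by majority vote among its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(labelled, k):
    """Fraction of points classified correctly when each one is held out in turn."""
    hits = 0
    for index, (point, label) in enumerate(labelled):
        rest = labelled[:index] + labelled[index + 1:]  # everything except this point
        if knn_predict(rest, point, k) == label:
            hits += 1
    return hits / len(labelled)

# Hypothetical toy data: two well-separated groups of three points each.
toy = [
    ((30.0, 40.0), 0), ((35.0, 45.0), 0), ((40.0, 35.0), 0),
    ((80.0, 85.0), 1), ((75.0, 90.0), 1), ((70.0, 80.0), 1),
]
for candidate_k in (1, 3, 5):
    print(candidate_k, leave_one_out_accuracy(toy, candidate_k))
```

On this toy data, K = 1 and K = 3 classify every held-out point correctly, while K = 5 gets every point wrong, because the five remaining neighbours always contain three points of the other class. A K that is too large relative to the dataset lets distant points outvote the close ones.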

### Program

In [41]:
classer = MarkClassifier(data, no_class, k)
data_classified = classer.class_all_points()

Now the random dataset is classified! And we can easily apply this algorithm to new datasets.

In [42]:
data_classified

[[(58.51606482453228, 9.473831942278899), 0],
 [(42.47786760637919, 19.377863558025787), 0],
 [(38.93367284470456, 10.89648947827071), 0],
 [(34.4174099710584, 7.115883268500522), 0],
 [(49.62211823461019, 15.746239653847638), 0],
 [(38.570785626342676, 26.407411970017392), 0],
 [(84.60862657693852, 65.11992677097655), 1],
 [(16.470622907468524, 96.53966793921578), 0],
 [(87.22826455884496, 94.92756631733312), 1],
 [(49.25553022083194, 93.18840376283917), 1],
 [(65.82808618192776, 23.877953380549677), 0],
 [(16.869941856574723, 53.458161346369934), 0],
 [(44.86053672754847, 21.873938584600193), 0],
 [(48.483606294987794, 14.747340468641035), 0],
 [(0.7650420400863611, 34.35511844035023), 0],
 [(87.19236335112754, 27.56728330742143), 0],
 [(41.15642163313754, 28.536915725537515), 0],
 [(55.76723022343315, 47.31642229530581), 0],
 [(15.221572305572838, 26.54374967716929), 0],
 [(1.7965884724828851, 73.06121918894502), 0],
 [(10.762546083506797, 66.82668555656792), 0],
 [(44.4948809049123