BY: **RIYA JOSHI**

EMAIL: riya.joshi@somaiya.edu



---


### **Basic idea behind the KNN (K-Nearest Neighbours) algorithm**:

* KNN classifies a data point according to how its nearest neighbours are classified.
* A new case is classified using a similarity measure (typically a distance) computed against the stored cases.
* Concretely, the algorithm finds the k already-classified items closest to the unknown item in feature space and assigns the label that wins a majority vote among them.
* It can be used for regression as well as classification, but it is mostly used for classification problems.
* **Lazy learning algorithm**: KNN is a lazy learner because it has no explicit training phase and simply memorizes the training dataset. All computation is deferred until a prediction is requested.
* KNN is a good choice when the dataset is small, labeled, and largely noise-free.
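
The points above can be illustrated with a minimal, dependency-free sketch. The class name `TinyKNN`, its methods, and the toy data are all hypothetical and only meant to show the idea: `fit` merely stores the data (lazy learning), classification takes a majority vote, and regression averages the neighbours' values.

```python
import math
from collections import Counter


class TinyKNN:
    """Illustrative lazy learner: fit() only stores the data."""

    def __init__(self, k):
        self.k = k

    def fit(self, points, targets):
        # No model is built here; the training set is simply memorized.
        self.points = list(points)
        self.targets = list(targets)
        return self

    def _nearest_targets(self, query):
        # All distance computation is deferred to prediction time.
        order = sorted(
            range(len(self.points)),
            key=lambda i: math.dist(self.points[i], query),
        )
        return [self.targets[i] for i in order[: self.k]]

    def classify(self, query):
        # Classification: majority vote among the k nearest labels.
        return Counter(self._nearest_targets(query)).most_common(1)[0][0]

    def regress(self, query):
        # Regression: average of the k nearest target values.
        nearest = self._nearest_targets(query)
        return sum(nearest) / len(nearest)


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label_for_query = TinyKNN(k=3).fit(points, labels).classify((2, 2))
value_for_query = TinyKNN(k=3).fit(points, values).regress((2, 2))
print(label_for_query, value_for_query)
```

For the query `(2, 2)` the three nearest points all belong to group `A`, so the vote is unanimous and the regression result is the mean of their three values.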





### **Applications of KNN**:
* Text mining
* Agriculture
* Finance
* Medical
* Facial recognition
* Recommendation systems (Amazon, Hulu, Netflix, etc.)




---




In [None]:
# Importing required libraries
import numpy as np
import pandas as pd

from statistics import mode  # only mode() is needed, for majority voting

from sklearn.metrics import accuracy_score,confusion_matrix
from sklearn.preprocessing import LabelEncoder

import warnings
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv('players_20.csv') #importing dataset
df.head() # displaying first five records

Unnamed: 0,sofifa_id,player_url,short_name,long_name,age,dob,height_cm,weight_kg,nationality,club,overall,potential,value_eur,wage_eur,player_positions,preferred_foot,international_reputation,weak_foot,skill_moves,work_rate,body_type,real_face,release_clause_eur,player_tags,team_position,team_jersey_number,loaned_from,joined,contract_valid_until,nation_position,nation_jersey_number,pace,shooting,passing,dribbling,defending,physic,player_traits,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,skill_dribbling,skill_curve,skill_fk_accuracy,skill_long_passing,skill_ball_control,movement_acceleration,movement_sprint_speed,movement_agility,movement_reactions,movement_balance,power_shot_power,power_jumping,power_stamina,power_strength,power_long_shots,mentality_aggression,mentality_interceptions,mentality_positioning,mentality_vision,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,32,24-06-1987,170,72,Argentina,FC Barcelona,94,94,95500000,565000,"RW, CF, ST",Left,5,4,4,Medium/Low,Messi,Yes,195800000.0,"#Dribbler, #Distance Shooter, #Crosser, #FK Sp...",RW,10.0,,01-07-2004,2021.0,,,87.0,92.0,92.0,96.0,39.0,66.0,"Beat Offside Trap, Argues with Officials, Earl...",88,95,70,92,88,97,93,94,92,96,91,84,93,95,95,86,68,75,68,94,48,40,94,94,75,96,33,37,26,6,11,15,14,8
1,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,34,05-02-1985,187,83,Portugal,Juventus,93,93,58500000,405000,"ST, LW",Right,5,4,5,High/Low,C. Ronaldo,Yes,96500000.0,"#Speedster, #Dribbler, #Distance Shooter, #Acr...",LW,7.0,,10-07-2018,2022.0,LS,7.0,90.0,93.0,82.0,89.0,35.0,78.0,"Long Throw-in, Selfish, Argues with Officials,...",84,94,89,83,87,89,81,76,77,92,89,91,87,96,71,95,95,85,78,93,63,29,95,82,85,95,28,32,24,7,11,15,14,11
2,190871,https://sofifa.com/player/190871/neymar-da-sil...,Neymar Jr,Neymar da Silva Santos Junior,27,05-02-1992,175,68,Brazil,Paris Saint-Germain,92,92,105500000,290000,"LW, CAM",Right,5,5,5,High/Medium,Neymar,Yes,195200000.0,"#Speedster, #Dribbler, #Playmaker , #Crosser,...",CAM,10.0,,03-08-2017,2022.0,LW,10.0,91.0,85.0,87.0,95.0,32.0,58.0,"Power Free-Kick, Injury Free, Selfish, Early C...",87,87,62,87,87,96,88,87,81,95,94,89,96,92,84,80,61,81,49,84,51,36,87,90,90,94,27,26,29,9,9,15,15,11
3,200389,https://sofifa.com/player/200389/jan-oblak/20/...,J. Oblak,Jan Oblak,26,07-01-1993,188,87,Slovenia,Atlético Madrid,91,93,77500000,125000,GK,Right,3,3,1,Medium/Medium,Normal,Yes,164700000.0,,GK,13.0,,16-07-2014,2023.0,GK,1.0,,,,,,,"Flair, Acrobatic Clearance",13,11,15,43,13,12,13,14,40,30,43,60,67,88,49,59,78,41,78,12,34,19,11,65,11,68,27,12,18,87,92,78,90,89
4,183277,https://sofifa.com/player/183277/eden-hazard/2...,E. Hazard,Eden Hazard,28,07-01-1991,175,74,Belgium,Real Madrid,91,91,90000000,470000,"LW, CF",Right,4,4,4,High/Medium,Normal,Yes,184500000.0,"#Speedster, #Dribbler, #Acrobat",LW,7.0,,01-07-2019,2024.0,LF,10.0,91.0,83.0,86.0,94.0,35.0,66.0,"Beat Offside Trap, Selfish, Finesse Shot, Spee...",81,84,61,89,83,95,83,79,83,94,94,88,95,90,94,82,56,84,63,80,54,41,87,89,88,91,34,27,22,11,12,6,8,8


In [None]:
# replacing null values with 0
df = df.fillna(value= 0)

In [None]:
# selecting the feature columns and the target; .copy() avoids modifying a view of df
new_df = df[['overall', 'potential', 'shooting', 'pace', 'passing', 'skill_ball_control', 'physic', 'preferred_foot']].copy()
new_df.head()

Unnamed: 0,overall,potential,shooting,pace,passing,skill_ball_control,physic,preferred_foot
0,94,94,92.0,87.0,92.0,96,66.0,Left
1,93,93,93.0,90.0,82.0,92,78.0,Right
2,92,92,85.0,91.0,87.0,95,58.0,Right
3,91,93,0.0,0.0,0.0,30,0.0,Right
4,91,91,83.0,91.0,86.0,94,66.0,Right


In [None]:
# converting type from object to string to be able to apply Label Encoder
new_df['preferred_foot']=new_df['preferred_foot'].astype(dtype='string',copy=True)

In [None]:
# converting categorical column 'preferred_foot' to a numerical column with LabelEncoder
new_df['preferred_foot'] = LabelEncoder().fit_transform(new_df['preferred_foot'])

In [None]:
# changing data type of whole dataset to int
new_df = new_df.astype(int)

In [None]:
# splitting dataset into 70:30 ratio (sequential split: first 70% of rows train, last 30% test, no shuffling)

# Defining train size
train_size = int(0.7 * len(new_df))

# Splitting dataset
train_set = new_df[:train_size]
test_set = new_df[train_size:]
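
The split above takes rows in file order, and the first records shown earlier appear to be sorted by `overall`, so the train and test sets may cover different rating ranges. A common alternative is to shuffle row positions before slicing. The sketch below is only an illustration: the function name `shuffled_split_indices` and the seed are assumptions, and the results reported in this notebook come from the sequential split, not from this variant.

```python
import random


def shuffled_split_indices(n_rows, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) after shuffling row positions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = int(train_fraction * n_rows)
    return indices[:cut], indices[cut:]


# Small demonstration on 10 hypothetical rows.
train_idx, test_idx = shuffled_split_indices(10)
print(len(train_idx), len(test_idx))
```

With pandas, the resulting positions could be applied as `new_df.iloc[train_idx]` and `new_df.iloc[test_idx]`.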

In [None]:
# separating train_set into X and Y
X_train=train_set.drop('preferred_foot', axis=1)
y_train=train_set['preferred_foot']

# separating test_set into X and Y
X_test=test_set.drop('preferred_foot', axis=1)
y_test=test_set['preferred_foot']

# **KNN**

**How to Find the Ideal K?**

1- For each odd value of K, fit a KNN classifier. (Odd values avoid tied votes in a two-class problem.)

2- Generate predictions with each classifier.

3- Evaluate each classifier's performance using the predictions from step 2.

4- Compare the results across classifiers and choose the K with the lowest error.
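
The four steps can be sketched as a simple loop. This is a minimal, dependency-free illustration on synthetic data, not the procedure used for the results in this notebook; the helper names (`knn_predict`, `error_rate`, `blob`), the blob data, and the range of K values are all assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def error_rate(train_x, train_y, val_x, val_y, k):
    """Fraction of validation points that are misclassified."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(val_x, val_y))
    return wrong / len(val_x)


# Hypothetical synthetic data: two Gaussian blobs, one per class.
rng = random.Random(0)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]


train_x = blob(0, 0, 40) + blob(3, 3, 40)
train_y = [0] * 40 + [1] * 40
val_x = blob(0, 0, 20) + blob(3, 3, 20)
val_y = [0] * 20 + [1] * 20

# Steps 1-3: predict and evaluate for each odd k.
errors = {k: error_rate(train_x, train_y, val_x, val_y, k) for k in range(1, 16, 2)}

# Step 4: keep the k with the lowest validation error.
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

When several values of K tie for the lowest error, `min` returns the first one in the dictionary, which here is the smallest K.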

In [108]:
def euclidean_dist(p1, p2):
    # Straight-line (L2) distance between two feature vectors
    dist = np.sqrt(np.sum((p1 - p2) ** 2))
    return dist


def predict(x_train, y_train, x_input, k):
    output_labels = []

    # Loop through the data points to be classified
    for i in x_input:

        # List to store the distance from i to every training point
        point_dist = []

        # Loop through each training point
        for j in range(len(x_train)):
            # Calculating the distance
            distance = euclidean_dist(np.array(x_train[j, :]), i)
            point_dist.append(distance)
        point_dist = np.array(point_dist)

        # Sorting the distances while preserving the indices,
        # then keeping the indices of the k nearest points
        dist = np.argsort(point_dist)[:k]

        # Labels of the k nearest points
        labels = y_train[dist]

        # Majority voting
        lab = mode(labels)
        output_labels.append(lab)

    return output_labels

In [109]:
# Predicting
y_pred = predict(X_train.values,y_train.values,X_test.values , 7)
 
# Calculating accuracy
accuracy_score(y_test.values, y_pred)

0.7541940189642596

In [113]:
confusion_matrix(y_test,y_pred)

array([[ 132, 1029],
       [ 319, 4004]])

**Comparing our model with the scikit-learn model**

In [110]:
from sklearn.neighbors import KNeighborsClassifier  

# Minkowski distance with p=2 is the Euclidean distance, matching our implementation
model = KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=2)
model.fit(X_train,y_train)
preds=model.predict(X_test)

In [111]:
# Calculating accuracy
accuracy_score(y_test, preds)

0.7543763676148797

In [112]:
confusion_matrix(y_test,preds)

array([[ 133, 1028],
       [ 319, 4004]])
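
As a sanity check, the accuracies reported above can be recomputed from the two confusion matrices. The sketch below also derives per-class recall and the share of the majority class. The function name `summarize` is hypothetical; the row/column layout (rows are true classes, columns are predicted classes) is the one scikit-learn's `confusion_matrix` returns.

```python
def summarize(matrix):
    """Accuracy, per-class recall and majority-class share of a confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    recalls = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    majority_share = max(sum(row) for row in matrix) / total
    return correct / total, recalls, majority_share


# Matrices copied from the outputs above.
custom = [[132, 1029], [319, 4004]]
sklearn_knn = [[133, 1028], [319, 4004]]

custom_acc, custom_recalls, baseline = summarize(custom)
sklearn_acc, sklearn_recalls, _ = summarize(sklearn_knn)
print(custom_acc, sklearn_acc, custom_recalls, baseline)
```

The two implementations disagree on a single test row (132 versus 133 correct in class 0), which may come down to how ties are broken. Class 0 accounts for 1161 of the 5484 test rows and only about 11% of them are recovered, so always predicting class 1 would score roughly 78.8%, above both KNN accuracies. If `preferred_foot` contains only `Left` and `Right`, `LabelEncoder` assigns them 0 and 1 respectively, since it orders classes alphabetically.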