### Finding Similar Players Using a Clustering Algorithm

- <b>Problem</b>        : Finding players similar to a given player.
- <b>Algorithm Used</b> : K-means
    - K-means was preferred over agglomerative hierarchical clustering because it gave better results.
    - Number of clusters: 400. The SSE was computed for k = 50, 100, 150, ..., 450, and 400 was chosen as the best value.
- <b>Dataset</b>        : FIFA 20 complete player dataset from Kaggle
- <b>Preprocessing</b>  : Null values filled with 0; categorical columns (preferred_foot, which is binary, and work_rate, which is ordinal) encoded as numbers.
- <b>GUI package</b>    : PySimpleGUI
- <b>Additional Features</b>   : Stats checker using pandas slicing (`.loc`).

### Importing Packages

In [2]:
import pandas as pd
import numpy as np

### Reading data

In [2]:
players = pd.read_csv("D:\\Praxis\\ML\\fifa-20-complete-player-dataset\\players_20.csv")

In [3]:
players.shape

(18278, 104)

### Selecting required columns

In [4]:
players1 = players[['short_name','overall','preferred_foot','weak_foot', 'skill_moves', 'work_rate','pace', 'shooting', 'passing', 
                   'dribbling', 'defending', 'physic', 'gk_diving', 'gk_handling', 'gk_kicking', 'gk_reflexes', 
                   'gk_speed', 'gk_positioning', 'attacking_crossing', 'attacking_finishing', 
                   'attacking_heading_accuracy', 'attacking_short_passing', 'attacking_volleys', 'skill_dribbling', 
                   'skill_curve', 'skill_fk_accuracy', 'skill_long_passing', 'skill_ball_control', 'movement_acceleration', 
                   'movement_sprint_speed', 'movement_agility', 'movement_reactions', 'movement_balance', 'power_shot_power', 
                   'power_jumping', 'power_stamina', 'power_strength', 'power_long_shots', 'mentality_aggression', 
                   'mentality_interceptions', 'mentality_positioning', 'mentality_vision', 'mentality_penalties', 
                   'mentality_composure', 'defending_marking', 'defending_standing_tackle', 'defending_sliding_tackle', 
                   'goalkeeping_diving', 'goalkeeping_handling', 'goalkeeping_kicking', 'goalkeeping_positioning', 
                   'goalkeeping_reflexes']].copy()  # explicit copy avoids SettingWithCopyWarning in later assignments

In [5]:
players1.shape

(18278, 52)

### Checking for null values

In [6]:
#players1.isnull().sum()

In [7]:
players1.fillna(0, inplace = True)

### Converting preferred foot into binary value

In [8]:
players1.preferred_foot.value_counts()

Right    13960
Left      4318
Name: preferred_foot, dtype: int64

In [9]:
players1.loc[players1.preferred_foot == 'Right','preferred_foot'] = 1
players1.loc[players1.preferred_foot == 'Left','preferred_foot'] = 0
players1 = players1.astype({"preferred_foot" : "float64"})

In [10]:
players2 = players1  # alias, not a copy: columns added to players2 also appear in players1

### Splitting work rate into attacking and defensive work rate, then encoding them

In [11]:
attacking_work_rate = []
defensive_work_rate = []

for i in players1['work_rate']:
    x = (i.split('/'))
    attacking_work_rate.append(x[0])
    defensive_work_rate.append(x[1])

players2['attacking_work_rate'] = attacking_work_rate

players2['defensive_work_rate'] = defensive_work_rate

In [12]:
players2.loc[players2.defensive_work_rate == 'Medium','defensive_work_rate'] = 2
players2.loc[players2.defensive_work_rate == 'Low','defensive_work_rate'] = 1
players2.loc[players2.defensive_work_rate == 'High','defensive_work_rate'] = 3

In [13]:
players2.loc[players2.attacking_work_rate == 'Medium','attacking_work_rate'] = 2
players2.loc[players2.attacking_work_rate == 'Low','attacking_work_rate'] = 1
players2.loc[players2.attacking_work_rate == 'High','attacking_work_rate'] = 3
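The cells above encode the categorical columns with one `.loc` assignment per category. Below is a minimal vectorised sketch of the same encoding, assuming the mappings used in this notebook (Right to 1, Left to 0; Low to 1, Medium to 2, High to 3) and `work_rate` strings of the form `'Medium/High'`. The function name `encode_categoricals` and the toy frame are hypothetical.

```python
import pandas as pd

# Ordinal codes assumed from the cells above: Low < Medium < High
WORK_RATE_CODES = {'Low': 1, 'Medium': 2, 'High': 3}

def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with preferred_foot and work_rate encoded as numbers."""
    out = df.copy()
    # Binary encoding of the preferred foot: Right -> 1.0, Left -> 0.0
    out['preferred_foot'] = out['preferred_foot'].map({'Right': 1, 'Left': 0}).astype('float64')
    # 'Medium/High' -> two columns: attacking part, defensive part
    parts = out['work_rate'].str.split('/', expand=True)
    out['attacking_work_rate'] = parts[0].map(WORK_RATE_CODES)
    out['defensive_work_rate'] = parts[1].map(WORK_RATE_CODES)
    return out

# Hypothetical toy rows in the dataset's format
sample = pd.DataFrame({'preferred_foot': ['Right', 'Left'],
                       'work_rate': ['Medium/High', 'High/Low']})
encoded = encode_categoricals(sample)
```

Because `Series.map` with an integer dictionary returns integers directly, the work rate columns come out with an integer dtype rather than `object`.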

In [14]:
players3 = players2

### Dropping categorical columns

In [15]:
players3 = players3.drop('work_rate', axis=1)
players3 = players3.drop('short_name',axis =1)

### Importing packages

In [16]:
import matplotlib.pyplot as plt
import scipy
from scipy.cluster.hierarchy import dendrogram,linkage
from sklearn.preprocessing import StandardScaler
import numpy as np

### Standardising all variables

In [17]:
players3 = players3.apply(lambda x : (x- np.mean(x))/ np.std(x))
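`StandardScaler` is imported above, but the lambda does the scaling by hand. The sketch below, on a hypothetical toy frame, checks that the two agree: both subtract the column mean and divide by the population standard deviation (`np.std` defaults to `ddof=0`, as does `StandardScaler`). The helper name `standardise` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every column with StandardScaler, keeping column names and index."""
    scaled = StandardScaler().fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)

# Toy data standing in for players3 (hypothetical values)
demo = pd.DataFrame({'pace': [70.0, 85.0, 60.0], 'shooting': [55.0, 90.0, 40.0]})
# The notebook's manual z-score
manual = demo.apply(lambda x: (x - np.mean(x)) / np.std(x))
assert np.allclose(standardise(demo), manual)
```

One difference worth knowing: for a constant column the lambda divides by zero and produces `NaN`, whereas `StandardScaler` keeps the scale at 1 and returns zeros.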

### Agglomerative clustering

#### Making linkage object using complete linkage

In [18]:
#### Creating dendrogram
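The code cells for the agglomerative approach are empty in this section. A minimal sketch of complete-linkage clustering with SciPy follows, run on random stand-in data; applying it to a random sample of `players3` rows is an assumption, and the helper name `complete_linkage_clusters` is hypothetical. Sampling matters because `linkage` works on all pairwise distances: for 18,278 players that is about 167 million distances, roughly 1.3 GB as float64.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

def complete_linkage_clusters(X: np.ndarray, n_clusters: int):
    """Complete-linkage agglomerative clustering on standardised features X."""
    # Linkage object: each row records one merge (cluster a, cluster b, distance, size)
    Z = linkage(X, method='complete', metric='euclidean')
    # Cut the tree into at most n_clusters flat clusters (labels start at 1)
    labels = fcluster(Z, t=n_clusters, criterion='maxclust')
    # no_plot=True returns the dendrogram layout without drawing it
    tree = dendrogram(Z, truncate_mode='lastp', p=20, no_plot=True)
    return Z, labels, tree

rng = np.random.default_rng(0)
X_sample = rng.normal(size=(200, 5))  # stand-in for a sample of standardised player rows
Z, labels, tree = complete_linkage_clusters(X_sample, n_clusters=10)
```

Removing `no_plot=True` draws the truncated dendrogram with matplotlib, and the flat `labels` can be compared against the K-means clusters.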

### K-Means clustering

### SSE for different numbers of clusters

In [19]:
from sklearn.cluster import KMeans
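The SSE search described in the introduction (k = 50, 100, ..., 450) has no code in this section. Below is a sketch of that elbow search; it reads the SSE from scikit-learn's `KMeans.inertia_`, while the helper name `sse_by_k`, the fixed `random_state` and the toy data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def sse_by_k(X, ks, random_state=0):
    """Within-cluster sum of squared errors (KMeans.inertia_) for each k in ks."""
    sse = {}
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        km.fit(X)
        sse[k] = km.inertia_
    return sse

# Toy data; on the notebook's data this might be sse_by_k(players3, range(50, 451, 50))
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
sse = sse_by_k(X_demo, range(2, 11, 2))
```

Plotting k against SSE and looking for where the curve flattens is how a value such as 400 would be picked. With 18,278 rows and hundreds of clusters, each fit can take a while.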

In [20]:
players4 = players3

### Fitting the K-Means algorithm

In [21]:
km = KMeans(n_clusters = 400, n_init = 10)
km.fit(players4)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=400, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)

In [22]:
players4['group'] = km.labels_

In [23]:
players4['player_name'] = players2['short_name']
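With 400 clusters over 18,278 players the average cluster holds about 46 players, but K-means does not guarantee balanced cluster sizes, and a player alone in a cluster gets no similar players from the search. A small, hypothetical helper for checking the distribution of labels such as `km.labels_` is sketched below.

```python
import numpy as np

def cluster_size_report(labels):
    """Summarise how many players fall in each cluster label (0..k-1)."""
    sizes = np.bincount(np.asarray(labels))
    non_empty = sizes[sizes > 0]
    return {'n_clusters': int(non_empty.size),
            'smallest': int(non_empty.min()),
            'largest': int(sizes.max()),
            'singletons': int((sizes == 1).sum())}

# Toy labels; with the notebook's model this would be cluster_size_report(km.labels_)
report = cluster_size_report([0, 0, 1, 2, 2, 2])
```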

### Creating GUI application

In [4]:
players4 = pd.read_csv('D:\\Praxis\\ML\\fifa-20-complete-player-dataset\\clustered_players.csv')
# Loads the clustered data saved by the last cell (In [27]) for faster testing.
# Ideally this line should be commented out and the whole notebook run from the start.

import PySimpleGUI as sg
from IPython.display import HTML

sg.theme('DarkBlue8')

from PySimpleGUI import Text, CBox, Input, Button, Window

attributes = ['overall', 'preferred_foot', 'weak_foot', 'skill_moves',
       'work_rate', 'pace', 'shooting', 'passing', 'dribbling', 'defending',
       'physic', 'gk_diving', 'gk_handling', 'gk_kicking', 'gk_reflexes',
       'gk_speed', 'gk_positioning', 'attacking_crossing',
       'attacking_finishing', 'attacking_heading_accuracy',
       'attacking_short_passing', 'attacking_volleys', 'skill_dribbling',
       'skill_curve', 'skill_fk_accuracy', 'skill_long_passing',
       'skill_ball_control', 'movement_acceleration', 'movement_sprint_speed',
       'movement_agility', 'movement_reactions', 'movement_balance',
       'power_shot_power', 'power_jumping', 'power_stamina', 'power_strength',
       'power_long_shots', 'mentality_aggression', 'mentality_interceptions',
       'mentality_positioning', 'mentality_vision', 'mentality_penalties',
       'mentality_composure', 'defending_marking', 'defending_standing_tackle',
       'defending_sliding_tackle','attacking_work_rate', 'defensive_work_rate']

selected_attributes = []

layout = [[sg.Text('Enter player name'),sg.InputText(key = '-Name-')],
              [sg.MLine(key='-ML1-'+sg.WRITE_ONLY_KEY, size=(40,8),do_not_clear=False)],
              [sg.Button('Search'),sg.Button('StatsCheck'),sg.Exit()]]

window1 = sg.Window('Similar Players Finder', layout)
win2_active=False

while True:                             # The Event Loop

    event, values = window1.read()



    if event in (None, 'Exit'):
        window1.Close()
        break

    if event == 'Search':
        output = window1['-ML1-'+sg.WRITE_ONLY_KEY]
        output.Update('')
        name = values['-Name-']
        cluster = players4.loc[players4.player_name == name, 'group']
        if cluster.empty:
            # Unknown name: report it instead of raising IndexError
            output.print(f'Player "{name}" not found', text_color='red')
            continue
        # Top 30 players (by overall rating) in the same cluster, excluding the searched player
        same_cluster = players4[(players4.group == cluster.values[0]) & (players4.player_name != name)]
        similar_players = same_cluster.nlargest(30, 'overall')['player_name']
        # Unpack the names so that sep='\n' prints one name per line
        output.print(*similar_players.values, text_color='green', sep='\n')
    
    if event == 'StatsCheck' and not win2_active:
        window1.Hide()
        win2_active = True
                
        # 8 rows x 6 columns of checkboxes: row i holds attributes i, i+8, ..., i+40
        layout = [[elem for j in range(0, 48, 8)
                   for elem in (Text(f'{attributes[i+j]}. '), CBox('', default=False, key=attributes[i+j]))]
                  for i in range(0, 8)] + \
             [[sg.Text('Enter player name'),sg.InputText(key = '-Name-')],
              [Button('Ok'), Button('Exit')]]

        window2 = Window('Stats Checker', layout)

        while True:
 
            event, values = window2.read()

            if event in (None, 'Exit'):
                win2_active = False
                window2.Close()
                window1.UnHide()
                selected_attributes = []
                break

            if event == 'Ok':
                # Always show the name first, followed by every ticked attribute
                selected_attributes = ['short_name'] + [a for a in attributes if values[a]]
                # players2 holds the unscaled stats, so it must exist in memory (run the notebook from the start)
                stats = players2.loc[players2.short_name == values['-Name-'], selected_attributes]
                if stats.empty:
                    sg.popup(f"Player '{values['-Name-']}' not found")
                else:
                    # Show only the first match, since short names are not unique
                    sg.popup(stats.head(1).to_string(index=False))
                selected_attributes = []

window1.close()
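The Search and StatsCheck handlers mix the lookup logic with GUI code. Below is a sketch of the same two lookups as plain functions, which makes them testable without opening a window. The function names and the toy frame are hypothetical; the column names (`player_name`, `group`, `overall`, `short_name`) are the ones used in the cells above.

```python
import pandas as pd

def find_similar(df: pd.DataFrame, name: str, n: int = 30) -> list:
    """Players in the same K-means cluster as `name`, best overall rating first."""
    match = df.loc[df['player_name'] == name, 'group']
    if match.empty:
        raise KeyError(f'unknown player: {name}')
    # Same cluster, excluding the queried player, before taking the top n
    same = df[(df['group'] == match.iloc[0]) & (df['player_name'] != name)]
    return same.nlargest(n, 'overall')['player_name'].tolist()

def player_stats(df: pd.DataFrame, name: str, attrs: list) -> pd.DataFrame:
    """Rows for `name` restricted to the chosen attributes (the StatsCheck lookup)."""
    return df.loc[df['short_name'] == name, ['short_name'] + list(attrs)]

# Toy frame with hypothetical values, mimicking the notebook's columns
toy = pd.DataFrame({'player_name': ['A', 'B', 'C', 'D'],
                    'short_name': ['A', 'B', 'C', 'D'],
                    'group': [7, 7, 7, 3],
                    'overall': [0.9, 1.5, -0.2, 2.0],
                    'pace': [80, 75, 60, 90]})
print(find_similar(toy, 'A'))  # ['B', 'C']
print(player_stats(toy, 'D', ['pace']))
```

Excluding the queried player before `nlargest` means `find_similar` returns up to `n` other players, and an unknown name raises `KeyError` instead of failing on an empty selection.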

In [27]:
# Saves the clustered data that the GUI cell (In [4]) reloads from disk
players4.to_csv('D:\\Praxis\\ML\\fifa-20-complete-player-dataset\\clustered_players.csv', index=False)