# Feature Engineering

Feature engineering is the process of selecting, transforming, extracting, combining, and manipulating raw data to generate the desired variables for analysis or predictive modeling. It is a crucial step in developing a machine learning model.

#### Importing required libraries

In [1]:
import numpy as np
import pandas as pd

#### Loading the dataset into a DataFrame

In [2]:
df = pd.read_csv('CSGO Player Dataset(Raw).csv')

In [3]:
df.head()

Unnamed: 0,Name,Teams Played In,Number of Maps Played,Number of Rounds Played,Kill Death Difference,K/D Ratio,HLTV Rating
0,ZywOo,"'Vitality', 'aAa'",985,25890,5999,1.38,1.27
1,s1mple,'Natus Vincere',1544,40788,8916,1.34,1.25
2,sh1ro,"'Gambit Youngsters', 'Gambit'",853,22639,5372,1.45,1.22
3,deko,'1WIN',382,10313,2270,1.38,1.22
4,saffee,"'FURIA', 'paiN'",374,9873,1936,1.32,1.21


Apart from the unnamed index column, the dataset currently has 7 columns:

- Name <br>
- Teams Played In <br>
- Number of Maps Played <br>
- Number of Rounds Played <br>
- Kill Death Difference <br>
- K/D Ratio <br>
- HLTV Rating

New features can be derived from the features we already have. We already know that:<br>
- K/D Ratio is the ratio of a player's total kills to the player's total deaths <br>
- Kill Death Difference is the difference between a player's total kills and total deaths

For simplicity, let

####  K = Total Kills of Player
####  D = Total Deaths of Player
####  KDD = Kill Death Difference
####  KDR = Kill Death Ratio

We know that

#### $KDR=\frac{K}{D}$

#### $K = KDR * D$

#### $KDD =  K - D $

We can use the above equations to compute total kills (K) and total deaths (D) for each player:

$K - D = KDD$ 

$K = KDD + D$ 

$KDR * D = KDD + D$ 

$(KDR * D) - D = KDD$ 

$D*(KDR - 1) = KDD$ 

$D = \frac{KDD}{KDR - 1}$
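As a quick check of the derivation, the sketch below plugs in the first row of the dataset shown earlier (ZywOo: KDD = 5999, KDR = 1.38). The variable names are illustrative, and the result is only an estimate because the published ratio is rounded.

```python
# Values taken from the first row of df.head() above
kdd = 5999   # Kill Death Difference
kdr = 1.38   # K/D Ratio as published (rounded)

# D = KDD / (KDR - 1), truncated to a whole number of deaths
d = int(kdd / (kdr - 1))
# K = KDD + D
k = kdd + d

print(d, k)
```

This gives about 15786 deaths and 21785 kills for that player.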

When Kill Death Difference (KDD) > 0 and K/D Ratio > 1, it means that Kills > Deaths.<br>
Similarly, when Kill Death Difference (KDD) < 0 and K/D Ratio < 1, it means that Deaths > Kills.<br><br>
When K/D Ratio = 1, Kills and Deaths should ideally be equal and their difference should be zero. That is not the case in this dataset: for some players the Kill Death Difference is small but nonzero, so the true K/D Ratio is only very close to one and has been rounded to 1.<br>
In such cases we cannot compute Total Kills and Total Deaths, because we do not know the real value of the K/D Ratio and the formula for D would divide by zero.<br>
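
To make the rounding problem concrete, here is a rough sketch that bounds the possible number of deaths. It assumes the published ratio is rounded to two decimals (inferred from the values shown, not documented by the dataset), so the true ratio lies within 0.005 of the published one. `deaths_bounds` is a hypothetical helper, not part of the original notebook.

```python
import math

def deaths_bounds(kdd, kdr, half_step=0.005):
    """Return (low, high) bounds on total deaths for a rounded K/D ratio.

    Assumes the true ratio lies within kdr +/- half_step.
    """
    candidates = []
    for ratio in (kdr - half_step, kdr + half_step):
        gap = ratio - 1
        # A valid death count is positive, so gap must share the sign of kdd
        if gap != 0 and kdd / gap > 0:
            candidates.append(kdd / gap)
        else:
            # The true ratio can get arbitrarily close to 1: no upper bound
            candidates.append(math.inf)
    return min(candidates), max(candidates)

# siuhy: KDD = 43, published ratio 1.0
print(deaths_bounds(43, 1.0))
# ZywOo: KDD = 5999, published ratio 1.38
print(deaths_bounds(5999, 1.38))
```

For a published ratio of 1.0 the upper bound is infinite, which is why these rows cannot be resolved. Under the same assumption, even the 1.38 row carries an uncertainty of a few hundred deaths.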

In [4]:
df[df['K/D Ratio'] == 1]

Unnamed: 0,Name,Teams Played In,Number of Maps Played,Number of Rounds Played,Kill Death Difference,K/D Ratio,HLTV Rating
408,1962,"'Isurus', 'Leviatan'",572,14881,-12,1.0,1.01
417,dazzLe,"'eUnited', 'Rise Nation', 'New Identity'",690,17971,-8,1.0,1.01
423,siuhy,"'MOUZ NXT', 'Izako Boars'",478,12807,43,1.0,1.01
447,leo_drk,"'Sharks', '00NATION'",597,15378,31,1.0,1.0
452,xiaosaGe,"'5POWER', 'Invictus', 'UYA'",847,22333,35,1.0,1.0
456,Djoko,"'HEET', 'DBL PONEY'",459,12166,5,1.0,1.0
460,KILLDREAM,"'AlienTech', 'k1ck', 'Giants'",429,11159,-16,1.0,1.0
462,RoLEX,"'Signature', 'ZIGMA', 'Beyond', 'MiTH'",492,12610,33,1.0,1.0
469,mopoz,'Movistar Riders',1108,29419,-46,1.0,1.0
471,pyth,"'NIP', 'Vexed'",756,19801,-5,1.0,1.0


In [5]:
100 * len(df[df['K/D Ratio'] == 1])/len(df)

3.9800995024875623

Around 4% of the rows have K/D Ratio = 1. For now we will assign them Null values in Total Kills and Total Deaths.<br>
We could also assign them average values or drop them, but for now we keep them as Null. For rows with K/D Ratio != 1 we use the equation for D derived above.
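
The two alternatives mentioned above could look roughly like this. The sketch uses a small made-up frame (`toy`) rather than the real dataset, so the names and values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data: the middle row stands in for a player with K/D Ratio = 1
toy = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Total Deaths": [100.0, np.nan, 300.0],
})

# Option 1: drop rows where Total Deaths could not be computed
dropped = toy.dropna(subset=["Total Deaths"])

# Option 2: fill the gaps with the column mean
filled = toy.fillna({"Total Deaths": toy["Total Deaths"].mean()})

print(dropped)
print(filled)
```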

In [6]:
def deaths(kdd, kdr):
    '''
    function that computes Total Deaths(d) where,
    d -> Total Deaths
    kdd -> Kill Death Difference
    kdr -> KD Ratio
    returns np.nan when kdr is 1, since d = kdd / (kdr - 1) is undefined there
    '''
    # when kdr is 1 the formula divides by zero, so we return a Null value
    if kdr == 1:
        return np.nan
    # otherwise compute d; int() truncates the fractional part of the estimate
    return int(kdd / (kdr - 1))

We create a column __Total Deaths__ in the dataset by applying the deaths function to each row.<br> df['Kill Death Difference'] and df['K/D Ratio'] are passed as arguments to the deaths function to calculate __Total Deaths__

In [7]:
df['Total Deaths'] = df[['Kill Death Difference', 'K/D Ratio']].apply(lambda row: deaths(row['Kill Death Difference'], row['K/D Ratio']), axis=1)
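
The same column can also be computed without a row-wise `apply`. The following is a sketch of a vectorized alternative on a small stand-in frame (`toy`, with values copied from rows shown earlier), not the notebook's own method.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Kill Death Difference": [5999, -12, 2270],
    "K/D Ratio": [1.38, 1.0, 1.38],
})

gap = toy["K/D Ratio"] - 1
# Replace a zero gap with NaN so the division yields NaN instead of inf
safe_gap = gap.where(gap != 0)
# np.trunc drops the fractional part, like int() does in deaths()
toy["Total Deaths"] = np.trunc(toy["Kill Death Difference"] / safe_gap)

print(toy)
```

On large frames this usually runs faster than `apply` with `axis=1`, because the arithmetic is done on whole columns at once.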

We calculate __Total Kills__ from Kill Death Difference and Total Deaths

In [8]:
df['Total Kills'] = df['Kill Death Difference'] + df['Total Deaths']

Hence we have added the new columns __Total Deaths__ and __Total Kills__ to the DataFrame

In [9]:
df.head()

Unnamed: 0,Name,Teams Played In,Number of Maps Played,Number of Rounds Played,Kill Death Difference,K/D Ratio,HLTV Rating,Total Deaths,Total Kills
0,ZywOo,"'Vitality', 'aAa'",985,25890,5999,1.38,1.27,15786.0,21785.0
1,s1mple,'Natus Vincere',1544,40788,8916,1.34,1.25,26223.0,35139.0
2,sh1ro,"'Gambit Youngsters', 'Gambit'",853,22639,5372,1.45,1.22,11937.0,17309.0
3,deko,'1WIN',382,10313,2270,1.38,1.22,5973.0,8243.0
4,saffee,"'FURIA', 'paiN'",374,9873,1936,1.32,1.21,6049.0,7985.0
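
One way to sanity-check the derived columns is to recompute the ratio from them and compare it with the published K/D Ratio. The sketch below does this for the first three rows of the output above; the frame name `check` and the tolerance of 0.005 (half of the assumed two-decimal rounding step) are our choices.

```python
import numpy as np
import pandas as pd

# First three rows of the output above
check = pd.DataFrame({
    "K/D Ratio": [1.38, 1.34, 1.45],
    "Total Deaths": [15786.0, 26223.0, 11937.0],
    "Total Kills": [21785.0, 35139.0, 17309.0],
})

# Kills / Deaths should reproduce the published ratio up to rounding
recomputed = check["Total Kills"] / check["Total Deaths"]
matches = np.isclose(recomputed, check["K/D Ratio"], atol=0.005)

print(matches.all())
```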


Let's save the DataFrame to a CSV file

In [10]:
df.to_csv('CSGO Player Dataset(FE).csv')
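
Note that `to_csv` writes the row index by default, and on reload that index shows up as an `Unnamed: 0` column like the one in the raw dataset. The sketch below illustrates the difference on a small stand-in frame, using an in-memory buffer instead of a file.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "Name": ["ZywOo", "s1mple"],
    "Total Kills": [21785, 35139],
})

# Default: the index is written and comes back as an extra column
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Passing `index=False` when saving, or `index_col=0` when loading, are two common ways to avoid accumulating unnamed index columns.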