## Data Manipulation:
Data manipulation is one of the most important steps of any data analysis project. It covers data pre-processing, data cleaning, data wrangling, and dealing with missing values. It is often the most time-consuming task in a project, and carrying out further data analysis without it is not recommended.

In [1]:
# Importing required packages
import numpy as np
import pandas as pd

In [2]:
# Connecting Drive and Colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [11]:
# Reading the original data file into a variable called data
data = pd.read_csv('/content/drive/MyDrive/Project_Data_Sets/WorldCup2K23data_OriginalData.csv')
data.head()

,PlayerName,TeamName,Continent,Type,1,2,3,4,5,6,...,26,27,28,29,30,31,WicketsTaken,OversBowled,RunsConceded,Economy
0,Subhman Gill,India,Asia,Batter,,,16.0,53.0,26.0,9.0,...,0,0,0,0,0,0,0,2.0,11.0,5.5
1,Ishan Kishan,India,Asia,Batter,0.0,47.0,,,,,...,0,0,0,0,0,0,0,,,
2,Rohit Sharma,India,Asia,Batter,0.0,131.0,86.0,48.0,46.0,87.0,...,0,0,0,1,0,0,1,0.5,7.0,8.4
3,Virat Kohli,India,Asia,Batter,85.0,55.0,16.0,103.0,95.0,0.0,...,0,0,0,1,0,0,1,3.3,15.0,4.55
4,Shreyas Iyer,India,Asia,Batter,0.0,25.0,53.0,19.0,33.0,4.0,...,0,0,0,0,0,0,0,,,


In [12]:
# As observed in the data exploration step, the data contains missing values
missing_values = data.isnull().sum()
print('Missing Values In Each Column:')
print(missing_values)

Missing Values In Each Column:
PlayerName         0
TeamName           0
Continent          0
Type               0
1                 44
2                 42
3                 42
4                 42
5                 42
6                 42
7                 44
8                 42
9                 42
10               108
11               130
TotalRuns          0
MatchesPlayed      0
21                 0
22                 0
23                 0
24                 0
25                 0
26                 0
27                 0
28                 0
29                 0
30                 0
31                 0
WicketsTaken       0
OversBowled       51
RunsConceded      51
Economy           51
dtype: int64
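
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch, assuming a small made-up frame (`toy`) in place of the real data; the name `summary` is only illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration)
toy = pd.DataFrame({
    'PlayerName': ['A', 'B', 'C', 'D'],
    '1': [10.0, np.nan, 5.0, np.nan],
    'OversBowled': [2.0, np.nan, np.nan, np.nan],
})

# Count of missing values and their share of all rows, per column
missing_count = toy.isnull().sum()
missing_pct = toy.isnull().mean() * 100

summary = pd.DataFrame({'missing': missing_count, 'percent': missing_pct})

# Keep only the columns that actually have gaps, worst first
summary = summary[summary['missing'] > 0].sort_values('missing', ascending=False)
print(summary)
```

On the real data the same two calls would be applied to `data` instead of `toy`.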


In [13]:
# Converting some columns to object type for better interpretation
columns = ['1','2','3','4','5','6','7','8','9','10','11',
           '21','22','23','24','25','26','27','28','29','30','31']
data[columns] = data[columns].astype('object')

data.dtypes # The columns listed above are now of object type

PlayerName        object
TeamName          object
Continent         object
Type              object
1                 object
2                 object
3                 object
4                 object
5                 object
6                 object
7                 object
8                 object
9                 object
10                object
11                object
TotalRuns          int64
MatchesPlayed      int64
21                object
22                object
23                object
24                object
25                object
26                object
27                object
28                object
29                object
30                object
31                object
WicketsTaken       int64
OversBowled      float64
RunsConceded     float64
Economy          float64
dtype: object
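
Typing 22 column labels by hand is easy to get wrong. As a hedged alternative, the labels can be generated from ranges. This sketch assumes the labels are the strings '1' to '11' and '21' to '31', as in the output above, and uses a made-up frame (`toy`) rather than the real data.

```python
import pandas as pd

# Build the labels from ranges instead of typing them out
match_cols = [str(i) for i in range(1, 12)]    # '1' .. '11'
extra_cols = [str(i) for i in range(21, 32)]   # '21' .. '31'
columns = match_cols + extra_cols

# Toy frame with the same column labels (values are made up)
toy = pd.DataFrame({c: [0.0, 1.0] for c in columns})

# astype accepts a {column: dtype} mapping and returns a new frame
toy = toy.astype({c: 'object' for c in columns})
print(toy.dtypes.unique())
```

The `extra_cols` list could also be reused later wherever those same labels are needed.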

In [14]:
# The data set is too big (32 columns), and I don't need all of them
# I want to drop the columns from '21' up to '31'
col_drop = ['21','22','23','24','25','26','27','28','29','30','31']
data.drop(columns = col_drop, inplace = True)

data.head()

,PlayerName,TeamName,Continent,Type,1,2,3,4,5,6,...,8,9,10,11,TotalRuns,MatchesPlayed,WicketsTaken,OversBowled,RunsConceded,Economy
0,Subhman Gill,India,Asia,Batter,,,16.0,53.0,26.0,9.0,...,23.0,51.0,80.0,4.0,354,11,0,2.0,11.0,5.5
1,Ishan Kishan,India,Asia,Batter,0.0,47.0,,,,,...,,,,,47,11,0,,,
2,Rohit Sharma,India,Asia,Batter,0.0,131.0,86.0,48.0,46.0,87.0,...,40.0,61.0,47.0,47.0,597,11,1,0.5,7.0,8.4
3,Virat Kohli,India,Asia,Batter,85.0,55.0,16.0,103.0,95.0,0.0,...,101.0,51.0,117.0,54.0,765,11,1,3.3,15.0,4.55
4,Shreyas Iyer,India,Asia,Batter,0.0,25.0,53.0,19.0,33.0,4.0,...,77.0,128.0,105.0,4.0,530,11,0,,,


In [16]:
# Treating the missing values of the data frame
# Replacing them by 0
data.fillna(0, inplace = True)
data.head()

,PlayerName,TeamName,Continent,Type,1,2,3,4,5,6,...,8,9,10,11,TotalRuns,MatchesPlayed,WicketsTaken,OversBowled,RunsConceded,Economy
0,Subhman Gill,India,Asia,Batter,0.0,0.0,16.0,53.0,26.0,9.0,...,23.0,51.0,80.0,4.0,354,11,0,2.0,11.0,5.5
1,Ishan Kishan,India,Asia,Batter,0.0,47.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,47,11,0,0.0,0.0,0.0
2,Rohit Sharma,India,Asia,Batter,0.0,131.0,86.0,48.0,46.0,87.0,...,40.0,61.0,47.0,47.0,597,11,1,0.5,7.0,8.4
3,Virat Kohli,India,Asia,Batter,85.0,55.0,16.0,103.0,95.0,0.0,...,101.0,51.0,117.0,54.0,765,11,1,3.3,15.0,4.55
4,Shreyas Iyer,India,Asia,Batter,0.0,25.0,53.0,19.0,33.0,4.0,...,77.0,128.0,105.0,4.0,530,11,0,0.0,0.0,0.0
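
One caveat with filling by 0: a filled gap then looks the same as a genuine score of 0. The sketch below assumes that a missing score means the player did not bat in that match (the data does not state this) and records that information before filling. The frame `toy` and the column name `InningsRecorded` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): A scored 0 in match 1, B has no score recorded
toy = pd.DataFrame({
    'PlayerName': ['A', 'B'],
    '1': [0.0, np.nan],
    '2': [47.0, 12.0],
})
score_cols = ['1', '2']

# Count the scores actually recorded per player before the gaps are filled
toy['InningsRecorded'] = toy[score_cols].notna().sum(axis=1)

# Fill only the score columns with 0
toy[score_cols] = toy[score_cols].fillna(0)

# Confirm that no missing values remain
remaining = int(toy.isnull().sum().sum())
print(toy)
print('Remaining missing values:', remaining)
```

After filling, both players show 0 for match 1, but `InningsRecorded` still tells them apart.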


In [19]:
# I wish to rename the columns from '1' to '11'
# The column named '1' will be renamed as 'Match1'
# This process will continue till I reach 'Match11'
col_names = {
    '1' : 'Match1',
    '2' : 'Match2',
    '3' : 'Match3',
    '4' : 'Match4',
    '5' : 'Match5',
    '6' : 'Match6',
    '7' : 'Match7',
    '8' : 'Match8',
    '9' : 'Match9',
    '10' : 'Match10',
    '11' : 'Match11'
}

data.rename(columns = col_names, inplace = True)

data.columns

Index(['PlayerName', 'TeamName', 'Continent', 'Type', 'Match1', 'Match2',
       'Match3', 'Match4', 'Match5', 'Match6', 'Match7', 'Match8', 'Match9',
       'Match10', 'Match11', 'TotalRuns', 'MatchesPlayed', 'WicketsTaken',
       'OversBowled', 'RunsConceded', 'Economy'],
      dtype='object')
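
Since the new names follow a fixed pattern, the same mapping can be built with a dict comprehension. This is a minimal sketch on an empty made-up frame (`toy`), not the notebook's actual data.

```python
import pandas as pd

# Same mapping as above, generated from the pattern '<n>' -> 'Match<n>'
col_names = {str(i): f'Match{i}' for i in range(1, 12)}

# Empty toy frame with a few of the same column labels
toy = pd.DataFrame(columns=['PlayerName'] + [str(i) for i in range(1, 12)] + ['TotalRuns'])

# rename only touches the labels present in the mapping
toy = toy.rename(columns=col_names)
print(list(toy.columns))
```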

In [23]:
# Now I want to save a copy of this clean, pre-processed data
# The new file will be named 'WorldCup2K23_WrangledData.csv'
file_path = 'WorldCup2K23_WrangledData.csv'
data.to_csv(file_path)

print('Data is saved in Mentioned Location')

Data is saved in Mentioned Location
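
By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows the difference `index=False` makes. It uses a made-up frame (`toy`) and in-memory buffers instead of real files.

```python
import io
import pandas as pd

# Toy frame (made-up values)
toy = pd.DataFrame({'PlayerName': ['A', 'B'], 'Match1': [10.0, 0.0]})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

Whether to keep the index depends on how the saved file will be read later. Passing `index_col=0` to `pd.read_csv` is another way to handle it.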
