In [240]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [241]:
df=pd.read_csv('matches.csv')
df.head()

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,IPL-2017,Hyderabad,05-04-2017,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,2,IPL-2017,Pune,06-04-2017,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Rising Pune Supergiant,0,7,SPD Smith,Maharashtra Cricket Association Stadium,A Nand Kishore,S Ravi,
2,3,IPL-2017,Rajkot,07-04-2017,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium,Nitin Menon,CK Nandan,
3,4,IPL-2017,Indore,08-04-2017,Rising Pune Supergiant,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
4,5,IPL-2017,Bangalore,08-04-2017,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium,,,


In [242]:
df.shape

(756, 18)

In [243]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 756 entries, 0 to 755
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               756 non-null    int64 
 1   Season           756 non-null    object
 2   city             749 non-null    object
 3   date             756 non-null    object
 4   team1            756 non-null    object
 5   team2            756 non-null    object
 6   toss_winner      756 non-null    object
 7   toss_decision    756 non-null    object
 8   result           756 non-null    object
 9   dl_applied       756 non-null    int64 
 10  winner           752 non-null    object
 11  win_by_runs      756 non-null    int64 
 12  win_by_wickets   756 non-null    int64 
 13  player_of_match  752 non-null    object
 14  venue            756 non-null    object
 15  umpire1          754 non-null    object
 16  umpire2          754 non-null    object
 17  umpire3          119 non-null    object

In [244]:
df.describe()

Unnamed: 0,id,dl_applied,win_by_runs,win_by_wickets
count,756.0,756.0,756.0,756.0
mean,1792.178571,0.025132,13.283069,3.350529
std,3464.478148,0.15663,23.471144,3.387963
min,1.0,0.0,0.0,0.0
25%,189.75,0.0,0.0,0.0
50%,378.5,0.0,0.0,4.0
75%,567.25,0.0,19.0,6.0
max,11415.0,1.0,146.0,10.0


### Listing the features having missing values

In [245]:
# list all the features that contain missing values
feature_nan=[feature for feature in df.columns if df[feature].isnull().sum()>0]
# print the percentage of missing values in each of those features
for feature in feature_nan:
    print(feature ,np.round(df[feature].isnull().mean()*100,4),'%')

city 0.9259 %
winner 0.5291 %
player_of_match 0.5291 %
umpire1 0.2646 %
umpire2 0.2646 %
umpire3 84.2593 %


#### We can make a few important observations from these missing-value percentages.
1. 'city' has approximately 1% missing values.
2. 'winner' and 'player_of_match' each have only about 0.5% missing values.
3. The umpire features are nominal data that this analysis does not use, so we can drop them; 'umpire3' in particular has 84% missing values.
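The per-feature loop above can also be written as one vectorised expression. The following is a minimal sketch on a small made-up frame (the `toy` data is illustrative only and is not taken from matches.csv):

```python
import pandas as pd

# Toy frame standing in for matches.csv (values are illustrative only).
toy = pd.DataFrame({
    "city": ["Pune", None, "Delhi", "Rajkot"],
    "winner": ["MI", "KKR", None, "GL"],
    "venue": ["A", "B", "C", "D"],
})

# Share of missing values per column, in percent.
missing_pct = toy.isnull().mean().mul(100).round(4)

# Keep only the columns that actually have gaps, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Working with the resulting Series makes it easy to filter on a threshold, for example to list every feature with more than 50% missing values.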

## Handling Missing values

### 1. City

In [246]:
df[(df['city'].isnull())][['team1','team2','venue','Season','city']]

Unnamed: 0,team1,team2,venue,Season,city
461,Mumbai Indians,Royal Challengers Bangalore,Dubai International Cricket Stadium,IPL-2014,
462,Kolkata Knight Riders,Delhi Daredevils,Dubai International Cricket Stadium,IPL-2014,
466,Chennai Super Kings,Rajasthan Royals,Dubai International Cricket Stadium,IPL-2014,
468,Sunrisers Hyderabad,Delhi Daredevils,Dubai International Cricket Stadium,IPL-2014,
469,Mumbai Indians,Chennai Super Kings,Dubai International Cricket Stadium,IPL-2014,
474,Royal Challengers Bangalore,Kings XI Punjab,Dubai International Cricket Stadium,IPL-2014,
476,Sunrisers Hyderabad,Mumbai Indians,Dubai International Cricket Stadium,IPL-2014,


### Observations: this table is helpful for handling the missing values of the 'city' feature, because all seven rows come from one season (IPL-2014) and share the same venue.
##### 1. We can either drop these instances, since they make up less than 1% of all instances, or fill in the 'city' feature manually, since it is factual data.
Here we fill it in manually, because this stadium is located in Dubai.

In [247]:
# let's make this change in the original df itself.
dubai_mask=(df['venue']=='Dubai International Cricket Stadium')
df.loc[dubai_mask,'city']='Dubai'
# confirm that the rows for this venue now carry the city
df.loc[dubai_mask,'city']

461    Dubai
462    Dubai
466    Dubai
468    Dubai
469    Dubai
474    Dubai
476    Dubai
Name: city, dtype: object

#### So we have handled the missing values of the feature 'city'.
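A slightly more defensive variant of the fill above touches only the rows where 'city' is actually missing. This is a sketch on toy data; the `venue_to_city` lookup is a hypothetical helper, not something defined in the notebook:

```python
import pandas as pd

# Toy rows (illustrative only): two Dubai matches with no city recorded.
toy = pd.DataFrame({
    "venue": ["Dubai International Cricket Stadium", "Wankhede Stadium",
              "Dubai International Cricket Stadium"],
    "city": [None, "Mumbai", None],
})

# Hypothetical lookup of venue -> city for venues whose city is a known fact.
venue_to_city = {"Dubai International Cricket Stadium": "Dubai"}

# Fill only the missing cities; existing values are left untouched.
toy["city"] = toy["city"].fillna(toy["venue"].map(venue_to_city))
print(toy)
```

The same lookup can be extended if other venues turn out to have a missing city.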

### 2. Winner 
Let's look at these instances once. We could delete them directly, as there are only 4 missing values.

In [248]:
df[(df['winner'].isnull())]

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
300,301,IPL-2011,Delhi,21-05-2011,Delhi Daredevils,Pune Warriors,Delhi Daredevils,bat,no result,0,,0,0,,Feroz Shah Kotla,SS Hazare,RJ Tucker,
545,546,IPL-2015,Bangalore,29-04-2015,Royal Challengers Bangalore,Rajasthan Royals,Rajasthan Royals,field,no result,0,,0,0,,M Chinnaswamy Stadium,JD Cloete,PG Pathak,
570,571,IPL-2015,Bangalore,17-05-2015,Delhi Daredevils,Royal Challengers Bangalore,Royal Challengers Bangalore,field,no result,0,,0,0,,M Chinnaswamy Stadium,HDPK Dharmasena,K Srinivasan,
744,11340,IPL-2019,Bengaluru,30-04-2019,Royal Challengers Bangalore,Rajasthan Royals,Rajasthan Royals,field,no result,0,,0,0,,M. Chinnaswamy Stadium,Nigel Llong,Ulhas Gandhe,Anil Chaudhary


## Observations:
These NaN values are meaningful: there was no result in these matches, which can be due to weather or other uncontrollable reasons. Since there are only 4 such instances, we will drop them.
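Before dropping, it can be worth confirming that a missing winner always coincides with a 'no result' match. The sketch below uses a toy frame with the same column names; the rows are illustrative only:

```python
import pandas as pd

# Toy rows (illustrative only) with the same columns as the match data.
toy = pd.DataFrame({
    "result": ["normal", "no result", "normal", "no result"],
    "winner": ["SRH", None, "MI", None],
})

no_winner = toy["winner"].isnull()
no_result = toy["result"].eq("no result")

# True when every missing winner is a 'no result' match and vice versa.
masks_agree = bool((no_winner == no_result).all())
print(masks_agree)

# Keep only the matches that produced a winner.
decided = toy.loc[~no_winner].copy()
print(decided.shape)
```

If the two masks disagreed on the real data, that would point to rows needing a closer look rather than a blanket drop.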

In [249]:
# let's make a copy of our original DataFrame and then drop these instances.
df_1=df.copy()
# drop the rows where the feature 'winner' is missing
# (a list for subset also works on older pandas versions)
df_1.dropna(subset=['winner'],inplace=True)
# check the shape of the new DataFrame after dropping 4 instances
df_1.shape

(752, 18)

In [250]:
# lets check for null values again
df_1.isnull().sum()

id                   0
Season               0
city                 0
date                 0
team1                0
team2                0
toss_winner          0
toss_decision        0
result               0
dl_applied           0
winner               0
win_by_runs          0
win_by_wickets       0
player_of_match      0
venue                0
umpire1              2
umpire2              2
umpire3            634
dtype: int64

## Observations:
We can clearly see that only the three umpire features still have missing values. We will drop them along with 'id', as they are of no use for this analysis.

In [251]:
df_1.columns

Index(['id', 'Season', 'city', 'date', 'team1', 'team2', 'toss_winner',
       'toss_decision', 'result', 'dl_applied', 'winner', 'win_by_runs',
       'win_by_wickets', 'player_of_match', 'venue', 'umpire1', 'umpire2',
       'umpire3'],
      dtype='object')

In [252]:
# dropping features that are of no use 
df_1.drop(['id','umpire1','umpire2','umpire3'],axis=1,inplace=True)

In [253]:
df_1.shape

(752, 14)

## NOTE: this new df_1 has no missing values, and we have also dropped a few features that are of no use.

In [254]:
df_1.head()

Unnamed: 0,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
0,IPL-2017,Hyderabad,05-04-2017,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal"
1,IPL-2017,Pune,06-04-2017,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Rising Pune Supergiant,0,7,SPD Smith,Maharashtra Cricket Association Stadium
2,IPL-2017,Rajkot,07-04-2017,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium
3,IPL-2017,Indore,08-04-2017,Rising Pune Supergiant,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium
4,IPL-2017,Bangalore,08-04-2017,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium


In [255]:
# let's see the categorical features of the cleaned frame df_1
feature_cat=[feature for feature in df_1.columns if df_1[feature].dtypes=='O']
feature_cat

['Season',
 'city',
 'date',
 'team1',
 'team2',
 'toss_winner',
 'toss_decision',
 'result',
 'winner',
 'player_of_match',
 'venue']

### Let's look at the unique values of each feature and take the necessary steps to organise them

In [256]:
df_1['Season'].unique()

array(['IPL-2017', 'IPL-2008', 'IPL-2009', 'IPL-2010', 'IPL-2011',
       'IPL-2012', 'IPL-2013', 'IPL-2014', 'IPL-2015', 'IPL-2016',
       'IPL-2018', 'IPL-2019'], dtype=object)

In [257]:
df_1.team1.unique()

array(['Sunrisers Hyderabad', 'Mumbai Indians', 'Gujarat Lions',
       'Rising Pune Supergiant', 'Royal Challengers Bangalore',
       'Kolkata Knight Riders', 'Delhi Daredevils', 'Kings XI Punjab',
       'Chennai Super Kings', 'Rajasthan Royals', 'Deccan Chargers',
       'Kochi Tuskers Kerala', 'Pune Warriors', 'Rising Pune Supergiants',
       'Delhi Capitals'], dtype=object)

## Observations:
1. These names are very long; we can convert them to their respective short forms.
2. Some names appear with inconsistent spelling ('Rising Pune Supergiant' and 'Rising Pune Supergiants').

In [258]:
df_1.loc[(df_1['team1']=='Sunrisers Hyderabad'),['team1']]='SRH'
df_1.loc[(df_1['team1']=='Mumbai Indians'),['team1']]='MI'
df_1.loc[(df_1['team1']=='Gujarat Lions'),['team1']]='GL'
df_1.loc[(df_1['team1']=='Rising Pune Supergiant'),['team1']]='RPS'
df_1.loc[(df_1['team1']=='Rising Pune Supergiants'),['team1']]='RPS'
df_1.loc[(df_1['team1']=='Royal Challengers Bangalore'),['team1']]='RCB'
df_1.loc[(df_1['team1']=='Kolkata Knight Riders'),['team1']]='KKR'
df_1.loc[(df_1['team1']=='Delhi Daredevils'),['team1']]='DD'
df_1.loc[(df_1['team1']=='Kings XI Punjab'),['team1']]='KXIP'
df_1.loc[(df_1['team1']=='Chennai Super Kings'),['team1']]='CSK'
df_1.loc[(df_1['team1']=='Rajasthan Royals'),['team1']]='RR'
df_1.loc[(df_1['team1']=='Deccan Chargers'),['team1']]='DCR'
df_1.loc[(df_1['team1']=='Kochi Tuskers Kerala'),['team1']]='KTK'
df_1.loc[(df_1['team1']=='Pune Warriors'),['team1']]='PW'
df_1.loc[(df_1['team1']=='Delhi Capitals'),['team1']]='DC'

In [259]:
df_1.team1.unique()

array(['SRH', 'MI', 'GL', 'RPS', 'RCB', 'KKR', 'DD', 'KXIP', 'CSK', 'RR',
       'DCR', 'KTK', 'PW', 'DC'], dtype=object)

In [260]:
df_1.team2.nunique()

15

In [261]:
df_1.loc[(df_1['team2']=='Sunrisers Hyderabad'),['team2']]='SRH'
df_1.loc[(df_1['team2']=='Mumbai Indians'),['team2']]='MI'
df_1.loc[(df_1['team2']=='Gujarat Lions'),['team2']]='GL'
df_1.loc[(df_1['team2']=='Rising Pune Supergiant'),['team2']]='RPS'
df_1.loc[(df_1['team2']=='Rising Pune Supergiants'),['team2']]='RPS'
df_1.loc[(df_1['team2']=='Royal Challengers Bangalore'),['team2']]='RCB'
df_1.loc[(df_1['team2']=='Kolkata Knight Riders'),['team2']]='KKR'
df_1.loc[(df_1['team2']=='Delhi Daredevils'),['team2']]='DD'
df_1.loc[(df_1['team2']=='Kings XI Punjab'),['team2']]='KXIP'
df_1.loc[(df_1['team2']=='Chennai Super Kings'),['team2']]='CSK'
df_1.loc[(df_1['team2']=='Rajasthan Royals'),['team2']]='RR'
df_1.loc[(df_1['team2']=='Deccan Chargers'),['team2']]='DCR'
df_1.loc[(df_1['team2']=='Kochi Tuskers Kerala'),['team2']]='KTK'
df_1.loc[(df_1['team2']=='Pune Warriors'),['team2']]='PW'
df_1.loc[(df_1['team2']=='Delhi Capitals'),['team2']]='DC'

In [262]:
df_1.team2.nunique()

14

In [263]:
df_1.loc[(df_1['toss_winner']=='Sunrisers Hyderabad'),['toss_winner']]='SRH'
df_1.loc[(df_1['toss_winner']=='Mumbai Indians'),['toss_winner']]='MI'
df_1.loc[(df_1['toss_winner']=='Gujarat Lions'),['toss_winner']]='GL'
df_1.loc[(df_1['toss_winner']=='Rising Pune Supergiant'),['toss_winner']]='RPS'
df_1.loc[(df_1['toss_winner']=='Rising Pune Supergiants'),['toss_winner']]='RPS'
df_1.loc[(df_1['toss_winner']=='Royal Challengers Bangalore'),['toss_winner']]='RCB'
df_1.loc[(df_1['toss_winner']=='Kolkata Knight Riders'),['toss_winner']]='KKR'
df_1.loc[(df_1['toss_winner']=='Delhi Daredevils'),['toss_winner']]='DD'
df_1.loc[(df_1['toss_winner']=='Kings XI Punjab'),['toss_winner']]='KXIP'
df_1.loc[(df_1['toss_winner']=='Chennai Super Kings'),['toss_winner']]='CSK'
df_1.loc[(df_1['toss_winner']=='Rajasthan Royals'),['toss_winner']]='RR'
df_1.loc[(df_1['toss_winner']=='Deccan Chargers'),['toss_winner']]='DCR'
df_1.loc[(df_1['toss_winner']=='Kochi Tuskers Kerala'),['toss_winner']]='KTK'
df_1.loc[(df_1['toss_winner']=='Pune Warriors'),['toss_winner']]='PW'
df_1.loc[(df_1['toss_winner']=='Delhi Capitals'),['toss_winner']]='DC'

In [264]:
df_1.loc[(df_1['winner']=='Sunrisers Hyderabad'),['winner']]='SRH'
df_1.loc[(df_1['winner']=='Mumbai Indians'),['winner']]='MI'
df_1.loc[(df_1['winner']=='Gujarat Lions'),['winner']]='GL'
df_1.loc[(df_1['winner']=='Rising Pune Supergiant'),['winner']]='RPS'
df_1.loc[(df_1['winner']=='Rising Pune Supergiants'),['winner']]='RPS'
df_1.loc[(df_1['winner']=='Royal Challengers Bangalore'),['winner']]='RCB'
df_1.loc[(df_1['winner']=='Kolkata Knight Riders'),['winner']]='KKR'
df_1.loc[(df_1['winner']=='Delhi Daredevils'),['winner']]='DD'
df_1.loc[(df_1['winner']=='Kings XI Punjab'),['winner']]='KXIP'
df_1.loc[(df_1['winner']=='Chennai Super Kings'),['winner']]='CSK'
df_1.loc[(df_1['winner']=='Rajasthan Royals'),['winner']]='RR'
df_1.loc[(df_1['winner']=='Deccan Chargers'),['winner']]='DCR'
df_1.loc[(df_1['winner']=='Kochi Tuskers Kerala'),['winner']]='KTK'
df_1.loc[(df_1['winner']=='Pune Warriors'),['winner']]='PW'
df_1.loc[(df_1['winner']=='Delhi Capitals'),['winner']]='DC'

In [265]:
df_1.head()

Unnamed: 0,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
0,IPL-2017,Hyderabad,05-04-2017,SRH,RCB,RCB,field,normal,0,SRH,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal"
1,IPL-2017,Pune,06-04-2017,MI,RPS,RPS,field,normal,0,RPS,0,7,SPD Smith,Maharashtra Cricket Association Stadium
2,IPL-2017,Rajkot,07-04-2017,GL,KKR,KKR,field,normal,0,KKR,0,10,CA Lynn,Saurashtra Cricket Association Stadium
3,IPL-2017,Indore,08-04-2017,RPS,KXIP,KXIP,field,normal,0,KXIP,0,6,GJ Maxwell,Holkar Cricket Stadium
4,IPL-2017,Bangalore,08-04-2017,RCB,DD,RCB,bat,normal,0,RCB,15,0,KM Jadhav,M Chinnaswamy Stadium


We now have the team names in their short form, which will be handy.
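The four cells above repeat the same fifteen assignments for each column. A more compact alternative, sketched here on toy rows, keeps the abbreviations in one dictionary and applies it to every team column. The abbreviations are copied from the cells above; the `team_short` and `team_cols` names and the toy rows are assumptions made for illustration:

```python
import pandas as pd

# Full name -> abbreviation, copied from the assignments in the cells above.
team_short = {
    'Sunrisers Hyderabad': 'SRH', 'Mumbai Indians': 'MI',
    'Gujarat Lions': 'GL', 'Rising Pune Supergiant': 'RPS',
    'Rising Pune Supergiants': 'RPS', 'Royal Challengers Bangalore': 'RCB',
    'Kolkata Knight Riders': 'KKR', 'Delhi Daredevils': 'DD',
    'Kings XI Punjab': 'KXIP', 'Chennai Super Kings': 'CSK',
    'Rajasthan Royals': 'RR', 'Deccan Chargers': 'DCR',
    'Kochi Tuskers Kerala': 'KTK', 'Pune Warriors': 'PW',
    'Delhi Capitals': 'DC',
}
team_cols = ['team1', 'team2', 'toss_winner', 'winner']

# Toy rows (illustrative only) covering both spellings of the Pune side.
toy = pd.DataFrame({
    'team1': ['Sunrisers Hyderabad', 'Rising Pune Supergiant', 'Rising Pune Supergiants'],
    'team2': ['Royal Challengers Bangalore', 'Mumbai Indians', 'Delhi Capitals'],
    'toss_winner': ['Royal Challengers Bangalore', 'Rising Pune Supergiant', 'Delhi Capitals'],
    'winner': ['Sunrisers Hyderabad', 'Rising Pune Supergiant', 'Rising Pune Supergiants'],
})

# Apply the same mapping to every team column.
for col in team_cols:
    toy[col] = toy[col].replace(team_short)
print(toy)
```

Keeping the mapping in one place means a new or renamed team needs a single dictionary entry instead of four extra lines.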

In [270]:
df_1.venue.unique()

array(['Rajiv Gandhi International Stadium, Uppal',
       'Maharashtra Cricket Association Stadium',
       'Saurashtra Cricket Association Stadium', 'Holkar Cricket Stadium',
       'M Chinnaswamy Stadium', 'Wankhede Stadium', 'Eden Gardens',
       'Feroz Shah Kotla',
       'Punjab Cricket Association IS Bindra Stadium, Mohali',
       'Green Park', 'Punjab Cricket Association Stadium, Mohali',
       'Sawai Mansingh Stadium', 'MA Chidambaram Stadium, Chepauk',
       'Dr DY Patil Sports Academy', 'Newlands', "St George's Park",
       'Kingsmead', 'SuperSport Park', 'Buffalo Park',
       'New Wanderers Stadium', 'De Beers Diamond Oval',
       'OUTsurance Oval', 'Brabourne Stadium',
       'Sardar Patel Stadium, Motera', 'Barabati Stadium',
       'Vidarbha Cricket Association Stadium, Jamtha',
       'Himachal Pradesh Cricket Association Stadium', 'Nehru Stadium',
       'Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium',
       'Subrata Roy Sahara Stadium',
       'Shaheed V

# Spelling mistakes, or the same venue repeated with a different spelling
1. M Chinnaswamy Stadium
2. M. A. Chidambaram Stadium
3. Feroz Shah Kotla Ground
4. Punjab Cricket Association IS Bindra Stadium, Mohali
5. ACA-VDCA Stadium
6. Rajiv Gandhi Intl. Cricket Stadium
7. IS Bindra Stadium

Let's handle these first.

In [271]:
df_1.loc[(df_1['venue']=='M. Chinnaswamy Stadium'),['venue']]='M Chinnaswamy Stadium'

df_1.loc[(df_1['venue']=='M. A. Chidambaram Stadium'),['venue']]='Chepauk Stadium'
df_1.loc[(df_1['venue']=='MA Chidambaram Stadium, Chepauk'),['venue']]='Chepauk Stadium'

df_1.loc[(df_1['venue']=='Feroz Shah Kotla Ground'),['venue']]='Feroz Shah Kotla'

df_1.loc[(df_1['venue']=='Punjab Cricket Association Stadium, Mohali'),['venue']]='Punjab Cricket Association IS Bindra Stadium, Mohali'
df_1.loc[(df_1['venue']=='IS Bindra Stadium'),['venue']]='Punjab Cricket Association IS Bindra Stadium, Mohali'

df_1.loc[(df_1['venue']=='ACA-VDCA Stadium'),['venue']]='Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium'

df_1.loc[(df_1['venue']=='Rajiv Gandhi International Stadium, Uppal'),['venue']]='Rajiv Gandhi Intl. Cricket Stadium'



In [274]:
df_1.venue.unique()

array(['Rajiv Gandhi Intl. Cricket Stadium',
       'Maharashtra Cricket Association Stadium',
       'Saurashtra Cricket Association Stadium', 'Holkar Cricket Stadium',
       'M Chinnaswamy Stadium', 'Wankhede Stadium', 'Eden Gardens',
       'Feroz Shah Kotla',
       'Punjab Cricket Association IS Bindra Stadium, Mohali',
       'Green Park', 'Sawai Mansingh Stadium', 'Chepauk Stadium',
       'Dr DY Patil Sports Academy', 'Newlands', "St George's Park",
       'Kingsmead', 'SuperSport Park', 'Buffalo Park',
       'New Wanderers Stadium', 'De Beers Diamond Oval',
       'OUTsurance Oval', 'Brabourne Stadium',
       'Sardar Patel Stadium, Motera', 'Barabati Stadium',
       'Vidarbha Cricket Association Stadium, Jamtha',
       'Himachal Pradesh Cricket Association Stadium', 'Nehru Stadium',
       'Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium',
       'Subrata Roy Sahara Stadium',
       'Shaheed Veer Narayan Singh International Stadium',
       'JSCA International Stadium

#### We have handled the venue names that were repeated with different spellings. If we find more during visualization, we will make changes accordingly.
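Rather than waiting for the remaining variants to surface during visualization, likely duplicates can be flagged up front with a string-similarity pass. This is a rough sketch using the standard-library `difflib`; the `near_duplicates` helper and the 0.8 cutoff are assumptions chosen for illustration, and the output is only a list of candidates to review by hand:

```python
import difflib

# A few venue spellings taken from the data above.
venues = [
    'M Chinnaswamy Stadium', 'M. Chinnaswamy Stadium',
    'Feroz Shah Kotla', 'Feroz Shah Kotla Ground',
    'Eden Gardens', 'Wankhede Stadium',
]

def near_duplicates(names, cutoff=0.8):
    """Return sorted pairs of distinct names whose similarity is >= cutoff."""
    pairs = set()
    for name in names:
        others = [n for n in names if n != name]
        for match in difflib.get_close_matches(name, others, n=3, cutoff=cutoff):
            # Sort each pair so (a, b) and (b, a) collapse into one entry.
            pairs.add(tuple(sorted((name, match))))
    return sorted(pairs)

suspects = near_duplicates(venues)
for pair in suspects:
    print(pair)
```

A similarity score cannot tell whether two names are the same ground, so every flagged pair still needs a manual check before it is merged.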

In [275]:
df_1.head()

Unnamed: 0,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
0,IPL-2017,Hyderabad,05-04-2017,SRH,RCB,RCB,field,normal,0,SRH,35,0,Yuvraj Singh,Rajiv Gandhi Intl. Cricket Stadium
1,IPL-2017,Pune,06-04-2017,MI,RPS,RPS,field,normal,0,RPS,0,7,SPD Smith,Maharashtra Cricket Association Stadium
2,IPL-2017,Rajkot,07-04-2017,GL,KKR,KKR,field,normal,0,KKR,0,10,CA Lynn,Saurashtra Cricket Association Stadium
3,IPL-2017,Indore,08-04-2017,RPS,KXIP,KXIP,field,normal,0,KXIP,0,6,GJ Maxwell,Holkar Cricket Stadium
4,IPL-2017,Bangalore,08-04-2017,RCB,DD,RCB,bat,normal,0,RCB,15,0,KM Jadhav,M Chinnaswamy Stadium


#### Note: since we have the venues, we can drop the 'city' feature. Venues are more informative than cities, because one city may have more than one venue, as is the case for Mumbai.
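The claim that one city can host several venues can be checked with a group-by count. The sketch below uses a small toy frame with illustrative rows; on the real data the same two lines would run against df_1 before 'city' is dropped:

```python
import pandas as pd

# Toy rows (illustrative only): two Mumbai grounds and one Kolkata ground.
toy = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Kolkata"],
    "venue": ["Wankhede Stadium", "Brabourne Stadium", "Eden Gardens"],
})

# Number of distinct venues recorded for each city.
venues_per_city = toy.groupby("city")["venue"].nunique()

# Cities that map to more than one venue.
multi_venue = venues_per_city[venues_per_city > 1]
print(multi_venue)
```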

In [277]:
df_1.drop('city',inplace=True,axis=1)

In [278]:
df_1.head()

Unnamed: 0,Season,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
0,IPL-2017,05-04-2017,SRH,RCB,RCB,field,normal,0,SRH,35,0,Yuvraj Singh,Rajiv Gandhi Intl. Cricket Stadium
1,IPL-2017,06-04-2017,MI,RPS,RPS,field,normal,0,RPS,0,7,SPD Smith,Maharashtra Cricket Association Stadium
2,IPL-2017,07-04-2017,GL,KKR,KKR,field,normal,0,KKR,0,10,CA Lynn,Saurashtra Cricket Association Stadium
3,IPL-2017,08-04-2017,RPS,KXIP,KXIP,field,normal,0,KXIP,0,6,GJ Maxwell,Holkar Cricket Stadium
4,IPL-2017,08-04-2017,RCB,DD,RCB,bat,normal,0,RCB,15,0,KM Jadhav,M Chinnaswamy Stadium


In [279]:
df_1.shape

(752, 13)

In [281]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 752 entries, 0 to 755
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Season           752 non-null    object
 1   date             752 non-null    object
 2   team1            752 non-null    object
 3   team2            752 non-null    object
 4   toss_winner      752 non-null    object
 5   toss_decision    752 non-null    object
 6   result           752 non-null    object
 7   dl_applied       752 non-null    int64 
 8   winner           752 non-null    object
 9   win_by_runs      752 non-null    int64 
 10  win_by_wickets   752 non-null    int64 
 11  player_of_match  752 non-null    object
 12  venue            752 non-null    object
dtypes: int64(3), object(10)
memory usage: 82.2+ KB


#### NOTE: we have now cleaned our data and can proceed to visualization.

In [285]:
# save the cleaned DataFrame (to_excel needs an Excel writer engine such as openpyxl installed)
df_1.to_excel('MATCHES_clean.xlsx',index=False)
# the visualization part will be done on this xlsx file.
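After saving, a quick round-trip check can confirm that the file reads back unchanged. The sketch below uses an in-memory CSV buffer instead of the xlsx file so that it runs without an Excel engine; the toy frame is illustrative only, and the switch to CSV is an assumption made for the example, not what the notebook does:

```python
import io
import pandas as pd

# Toy frame (illustrative only) standing in for the cleaned df_1.
toy = pd.DataFrame({
    "Season": ["IPL-2017", "IPL-2017"],
    "team1": ["SRH", "MI"],
    "win_by_runs": [35, 0],
})

# Write to an in-memory buffer, then read the same bytes back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# True when shape, column order, dtypes and values all survive the round trip.
round_trip_ok = restored.equals(toy)
print(round_trip_ok)
```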