# IPL 2022 DATA ANALYSIS




## Problem Statement
We have gathered data for every player sold in the IPL 2022 auction, as well as for the players retained by each franchise.


**The data contains information such as:** Matches played, Runs, Wickets, Average, Strike rate, Catches, Run-outs, Stumpings, etc.

## Objective 
Select the best playing 11 from this set of players for the current campaign.

## Steps Involved 
1. Extracting and loading the data
2. Cleaning the data
3. Analyzing the data on different parameters
4. Visualizing the important statistics
5. Forming the best team of 11 players based on the findings, i.e. deciding how many batters, bowlers, and all-rounders to include
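Step 5 can be sketched as a greedy selection: for each role, take the highest-scoring players until that role's quota is filled, while respecting the IPL rule of at most four overseas players in a playing XI. This is a minimal sketch under stated assumptions: the function name `pick_xi`, the role quotas and the `score` column are all hypothetical, since the actual composition and ranking metric are meant to come out of the analysis.

```python
import pandas as pd

# IPL rule: at most four overseas players in a playing XI.
MAX_OVERSEAS = 4


def pick_xi(players: pd.DataFrame, composition: dict, score_col: str,
            max_overseas: int = MAX_OVERSEAS) -> pd.DataFrame:
    """Greedily pick the top-scoring players for each role.

    `composition` maps a Player_Type label to a hypothetical quota for that
    role. Overseas players are skipped once the overseas cap is reached.
    """
    chosen = []
    overseas = 0
    for role, quota in composition.items():
        # Candidates for this role, best score first.
        pool = players[players["Player_Type"] == role].sort_values(
            score_col, ascending=False)
        picked = 0
        for _, row in pool.iterrows():
            if picked == quota:
                break
            if row["Nationality"] == "Overseas":
                if overseas >= max_overseas:
                    continue  # cap reached: move on to the next candidate
                overseas += 1
            chosen.append(row)
            picked += 1
    return pd.DataFrame(chosen)


# Illustrative toy data; "score" stands in for whatever metric the analysis settles on.
players = pd.DataFrame({
    "Player Name": ["A", "B", "C", "D", "E", "F"],
    "Player_Type": ["Batter", "Batter", "Batter", "Bowler", "Bowler", "Bowler"],
    "Nationality": ["Overseas", "Overseas", "Indian", "Overseas", "Indian", "Indian"],
    "score": [90, 80, 70, 85, 60, 50],
})

# With a cap of 2 overseas players, bowler D is skipped in favour of E and F.
xi = pick_xi(players, {"Batter": 2, "Bowler": 2}, "score", max_overseas=2)
print(xi["Player Name"].tolist())  # ['A', 'B', 'E', 'F']
```

Because roles are filled in the order given, earlier roles get first claim on the overseas slots; a different order, or an optimisation solver, could produce a different XI.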

In [11]:
# Importing Libraries 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 

## Loading the Data 

In [12]:
data = pd.read_csv("IPLData.csv")
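Text columns in a CSV export can carry stray whitespace, for example a role label stored as "Bowler " with a trailing space, which then breaks exact string comparisons. A minimal sketch of normalising the text columns right after loading; the inline CSV below is a made-up stand-in for IPLData.csv, and its rows are illustrative only.

```python
import io

import pandas as pd

# Made-up sample standing in for IPLData.csv; "Bowler " has a trailing space.
sample_csv = io.StringIO(
    "Player Name,Team,Nationality,Player_Type,Capped\n"
    "Player A,Team X,Indian,Batter,1\n"
    "Player B,Team Y,Overseas,Bowler ,1\n"
)
data = pd.read_csv(sample_csv)

# Strip leading/trailing whitespace from every non-numeric (text) column.
text_cols = data.select_dtypes(exclude="number").columns
data[text_cols] = data[text_cols].apply(lambda col: col.str.strip())

print(data["Player_Type"].tolist())  # ['Batter', 'Bowler']
```

With this in place, a plain comparison such as `data["Player_Type"] == "Bowler"` matches every bowler regardless of padding.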

In [9]:
# First look at the data: head() returns the first five rows of the dataset.

data.head()

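| | Player Name | Team | Nationality | Player_Type | Capped | Matches_Played | Runs | Average | Strike_Rate | Wickets | Bowling_average | Economy | Bowling_Strike_Rate | Catches | Run_outs | Stumps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shikhar Dhawan | Punjab | Indian | Batter | 1 | 192.0 | 5783.0 | 34.63 | 126.6 | 4.0 | 16.5 | 8.25 | 12.0 | NaN | NaN | NaN |
| 1 | Shreyas Iyer | Kolkata | Indian | Batter | 1 | 87.0 | 2375.0 | 31.67 | 123.96 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Faf Du Plessis | Bangalore | Overseas | Batter | 1 | 100.0 | 2935.0 | 34.94 | 131.09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Manish Pandey | Lucknow | Indian | Batter | 1 | 154.0 | 3560.0 | 30.69 | 121.83 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Shimron Hetmyer | Rajasthan | Overseas | Batter | 1 | 31.0 | 517.0 | 25.85 | 151.17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |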


In [10]:
# describe() returns summary statistics (count, mean, standard deviation, min, quartiles and max) for the numeric columns of the dataset; text columns are skipped by default.

data.describe()

| | Capped | Matches_Played | Runs | Average | Strike_Rate | Wickets | Bowling_average | Economy | Bowling_Strike_Rate | Catches | Run_outs | Stumps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 235.0 | 215.0 | 165.0 | 161.0 | 163.0 | 140.0 | 135.0 | 143.0 | 119.0 | 27.0 | 27.0 | 27.0 |
| mean | 0.838298 | 43.897674 | 840.575758 | 21.792391 | 121.009939 | 31.485714 | 32.907185 | 8.223182 | 24.686134 | 30.962963 | 3.444444 | 6.259259 |
| std | 0.561802 | 48.695302 | 1270.341831 | 11.664156 | 30.739189 | 36.87242 | 18.191441 | 1.223541 | 12.982049 | 34.544822 | 5.010246 | 9.92895 |
| min | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.36 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.5 | 11.5 | 67.0 | 13.8 | 112.635 | 6.0 | 23.025 | 7.39 | 18.495 | 3.5 | 0.0 | 0.0 |
| 50% | 1.0 | 25.0 | 289.0 | 22.41 | 128.63 | 19.5 | 29.07 | 8.19 | 21.75 | 19.0 | 1.0 | 2.0 |
| 75% | 1.0 | 56.0 | 954.0 | 29.3 | 137.55 | 40.5 | 36.03 | 8.785 | 26.19 | 51.5 | 4.0 | 7.0 |
| max | 2.0 | 220.0 | 6283.0 | 58.5 | 190.24 | 167.0 | 153.0 | 13.12 | 108.0 | 126.0 | 21.0 | 39.0 |
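Two details of describe() matter when reading the summary table: it covers numeric columns only unless asked otherwise, and each column's count excludes missing values (which is why Catches, Run_outs and Stumps report a count of 27). A small sketch on an illustrative toy frame:

```python
import pandas as pd

# Toy frame with one text column and one numeric column (illustrative values).
df = pd.DataFrame({
    "Player_Type": ["Batter", "Bowler", "Batter"],
    "Runs": [100.0, 20.0, None],
})

# Default: numeric columns only, and NaN is left out of every statistic.
numeric_summary = df.describe()
print(numeric_summary.columns.tolist())        # ['Runs']
print(numeric_summary.loc["count", "Runs"])    # 2.0

# include="all" adds text columns, summarised by count/unique/top/freq.
full_summary = df.describe(include="all")
print(full_summary.loc["top", "Player_Type"])  # Batter
```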


In [11]:
# Count the missing (null) values in each column with isna().sum().

data.isna().sum()

Player Name              0
Team                     0
Nationality              0
Player_Type              0
Capped                   0
Matches_Played          20
Runs                    70
Average                 74
Strike_Rate             72
Wickets                 95
Bowling_average        100
Economy                 92
Bowling_Strike_Rate    116
Catches                208
Run_outs               208
Stumps                 208
dtype: int64
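Raw counts are easier to compare as a share of the 235 rows; for example, the 208 missing values in Catches, Run_outs and Stumps correspond to about 88.5% of the dataset. A minimal sketch of computing the percentage of missing values per column, on an illustrative toy frame:

```python
import numpy as np
import pandas as pd

# Toy frame with some missing values (illustrative only).
df = pd.DataFrame({
    "Runs": [100.0, np.nan, 30.0, np.nan],
    "Stumps": [np.nan, np.nan, np.nan, 2.0],
})

# isna() gives booleans; their per-column mean is the fraction missing.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)  # Stumps 75.0, Runs 50.0
```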

In [12]:
# info() prints a concise summary of the DataFrame: the number of non-null values and the data type (dtype) of each column.
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 235 entries, 0 to 234
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Player Name          235 non-null    object 
 1   Team                 235 non-null    object 
 2   Nationality          235 non-null    object 
 3   Player_Type          235 non-null    object 
 4   Capped               235 non-null    int64  
 5   Matches_Played       215 non-null    float64
 6   Runs                 165 non-null    float64
 7   Average              161 non-null    float64
 8   Strike_Rate          163 non-null    float64
 9   Wickets              140 non-null    float64
 10  Bowling_average      135 non-null    float64
 11  Economy              143 non-null    float64
 12  Bowling_Strike_Rate  119 non-null    float64
 13  Catches              27 non-null     float64
 14  Run_outs             27 non-null     float64
 15  Stumps               27 non-null     float64
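The describe() summary reports Capped values ranging from 0 to 2, so Capped may not be a strict 0/1 flag, and a `Capped == 1` filter silently drops any other code. Before filtering, value_counts() on the categorical columns shows exactly which labels occur, including near-duplicates caused by whitespace. A sketch on toy data (the values are illustrative, not taken from IPLData.csv):

```python
import pandas as pd

# Illustrative toy data with an unexpected Capped code and a padded label.
df = pd.DataFrame({
    "Player_Type": ["Batter", "Bowler ", "Bowler", "Batter"],
    "Capped": [1, 1, 0, 2],
})

# value_counts(dropna=False) lists every distinct value, including NaN if present.
for col in ["Player_Type", "Capped"]:
    print(df[col].value_counts(dropna=False), end="\n\n")

# Rows that a Capped == 1 filter would exclude.
print(df.loc[df["Capped"] != 1])
```

Here "Bowler" and "Bowler " are counted as two separate labels, which is the kind of inconsistency worth resolving before segregating players by type.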

## Cleaning the Data

In [13]:
# Segregating the data: capped batters.
# Keep only the batting columns so that the analysis focuses on batter-specific statistics.

batters = data.loc[data["Player_Type"] == "Batter"]

batters_new = batters.loc[batters["Capped"] == 1]

Capped_Batters = batters_new[['Player Name',
                            'Team',
                            'Nationality',
                            'Matches_Played',
                            'Runs',
                            'Average',
                            'Strike_Rate']]
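The two-step filter for capped batters (first by Player_Type, then by Capped) can also be written as a single boolean mask combined with `&`. Chaining `.copy()` makes the result an independent DataFrame, so later cleaning steps on it do not modify `data` (and do not trigger the SettingWithCopyWarning that pandas versions without Copy-on-Write can raise). A sketch on illustrative toy data; the derived column `Runs_per_match` is hypothetical:

```python
import pandas as pd

# Illustrative toy data.
data = pd.DataFrame({
    "Player Name": ["P1", "P2", "P3"],
    "Player_Type": ["Batter", "Batter", "Bowler"],
    "Capped": [1, 0, 1],
    "Matches_Played": [10.0, 5.0, 8.0],
    "Runs": [500.0, 40.0, 10.0],
})
cols = ["Player Name", "Matches_Played", "Runs"]

# Each comparison needs its own parentheses because & binds tighter than ==.
mask = (data["Player_Type"] == "Batter") & (data["Capped"] == 1)
capped_batters = data.loc[mask, cols].copy()

# Same rows as the two-step filter.
two_step = data.loc[data["Player_Type"] == "Batter"]
two_step = two_step.loc[two_step["Capped"] == 1, cols]
print(capped_batters.equals(two_step))  # True

# Adding a column to the copy leaves `data` untouched.
capped_batters["Runs_per_match"] = capped_batters["Runs"] / capped_batters["Matches_Played"]
print(capped_batters)
```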

In [14]:
# Filtering on Capped == 1 keeps only the capped batters. In IPL terminology a capped player is one who has played international cricket; IPL experience itself is better reflected by Matches_Played.
Capped_Batters.head()

| | Player Name | Team | Nationality | Matches_Played | Runs | Average | Strike_Rate |
|---|---|---|---|---|---|---|---|
| 0 | Shikhar Dhawan | Punjab | Indian | 192.0 | 5783.0 | 34.63 | 126.6 |
| 1 | Shreyas Iyer | Kolkata | Indian | 87.0 | 2375.0 | 31.67 | 123.96 |
| 2 | Faf Du Plessis | Bangalore | Overseas | 100.0 | 2935.0 | 34.94 | 131.09 |
| 3 | Manish Pandey | Lucknow | Indian | 154.0 | 3560.0 | 30.69 | 121.83 |
| 4 | Shimron Hetmyer | Rajasthan | Overseas | 31.0 | 517.0 | 25.85 | 151.17 |


In [15]:
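# Segregating the data: capped bowlers.
# Keep only the bowling columns so that the analysis focuses on bowler-specific statistics.

# The Player_Type label for bowlers appears to carry a trailing space ("Bowler "),
# so strip whitespace before comparing; this matches both "Bowler" and "Bowler ".
bowlers = data.loc[data["Player_Type"].str.strip() == "Bowler"]

bowlers_new = bowlers.loc[bowlers["Capped"] == 1]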

Capped_Bowlers = bowlers_new[['Player Name',
                            'Team',
                            'Nationality',
                            'Matches_Played',
                            'Wickets',
                            'Bowling_average',
                            'Economy',
                            'Bowling_Strike_Rate']]
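The batter and bowler cells repeat the same filter-then-select pattern. A small helper can capture it once; `capped_subset` is a hypothetical name, and this sketch strips whitespace from Player_Type so that a label such as "Bowler " still matches "Bowler". The toy data is illustrative only.

```python
import pandas as pd


def capped_subset(df: pd.DataFrame, role: str, columns: list) -> pd.DataFrame:
    """Return the capped players of one role, keeping only `columns`.

    Player_Type is stripped before comparing, so padded labels still match.
    """
    mask = (df["Player_Type"].str.strip() == role) & (df["Capped"] == 1)
    return df.loc[mask, columns].copy()


# Illustrative toy data with padded and unpadded bowler labels.
data = pd.DataFrame({
    "Player Name": ["P1", "P2", "P3", "P4"],
    "Player_Type": ["Batter", "Bowler ", "Bowler", "Bowler "],
    "Capped": [1, 1, 0, 1],
    "Wickets": [None, 40.0, 5.0, 12.0],
})

capped_bowlers = capped_subset(data, "Bowler", ["Player Name", "Wickets"])
print(capped_bowlers)  # P2 and P4
```

The same call with `"Batter"` and the batting columns would produce the capped-batters table, so both cells share one tested code path.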

In [17]:
# Filtering on Capped == 1 keeps only the capped bowlers. In IPL terminology a capped player is one who has played international cricket; IPL experience itself is better reflected by Matches_Played.
Capped_Bowlers.head()

| | Player Name | Team | Nationality | Matches_Played | Wickets | Bowling_average | Economy | Bowling_Strike_Rate |
|---|---|---|---|---|---|---|---|---|
| 36 | Kagiso Rabada | Punjab | Overseas | 50.0 | 76.0 | 20.53 | 8.21 | 15.0 |
| 37 | Trent Boult | Rajasthan | Overseas | 62.0 | 76.0 | 26.09 | 8.4 | 18.64 |
| 38 | Mohammad Shami | Gujarat | Indian | 77.0 | 79.0 | 30.41 | 8.63 | 21.14 |
| 39 | T Natarajan | Hyderabad | Indian | 24.0 | 20.0 | 34.4 | 8.24 | 25.05 |
| 40 | Deepak Chahar | Chennai | Indian | 63.0 | 59.0 | 29.19 | 7.8 | 22.44 |
