# INDIAN PREMIER LEAGUE DATA ANALYTICS
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, usually contested between March and May each year by franchise teams representing Indian cities or states (eight teams for most seasons, ten since 2022). The league was founded by the Board of Control for Cricket in India (BCCI) in 2007. The IPL is the most-attended cricket league in the world, and its brand value in 2019 was ₹475 billion (US$6.7 billion).
### Dataset Exploratory Analysis
#### Dataset link: https://www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020


## Exploratory Data Analysis
### Uses the Medallion architecture to understand and transform the raw data and produce the final dataset used to train ML models
#### The Medallion architecture consists of Bronze, Silver, and Gold layers
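A minimal sketch of how the three layers could be chained in pandas. The function names `bronze`, `silver`, and `gold` and the toy CSV rows are hypothetical illustrations, not this notebook's own code: here Bronze ingests the raw file unchanged, Silver parses dates and drops matches without a winner, and Gold aggregates wins per team.

```python
import io
import pandas as pd

# Hypothetical raw extract standing in for matches.csv (Bronze input)
RAW_CSV = """id,date,winner
1,2008-04-18,Kolkata Knight Riders
2,2008-04-19,Chennai Super Kings
3,2008-04-20,
4,2008-04-20,Kolkata Knight Riders
"""

def bronze(source) -> pd.DataFrame:
    """Bronze: ingest the raw data as-is, with no cleaning."""
    return pd.read_csv(source)

def silver(df: pd.DataFrame) -> pd.DataFrame:
    """Silver: clean and standardize (parse dates, drop rows without a winner)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    return out.dropna(subset=["winner"]).reset_index(drop=True)

def gold(df: pd.DataFrame) -> pd.DataFrame:
    """Gold: aggregate into an analysis-ready table (wins per team)."""
    return (
        df.groupby("winner").size()
          .rename("wins")
          .reset_index()
          .sort_values("wins", ascending=False, ignore_index=True)
    )

wins = gold(silver(bronze(io.StringIO(RAW_CSV))))
print(wins)
```

Keeping each layer as a separate function means the raw Bronze frame is never mutated, so a Silver or Gold rule can be changed and rerun without reloading the source files.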


### Bronze layer
#### Load the raw data and the libraries required by the notebook

In [1]:
```python
# Numerical and tabular data handling
import numpy as np
import pandas as pd
import math

# Plotting libraries (Matplotlib, Plotly, Seaborn)
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import plotly
from plotly.offline import plot, iplot
import seaborn as sns

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')
```

In [2]:
```python
# Load data
matches = pd.read_csv('matches.csv')
deliveries = pd.read_csv('deliveries.csv')
```
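Before moving to the Silver layer, it can be worth confirming that the two raw files line up: every `match_id` in `deliveries` should appear as an `id` in `matches`. The sketch below uses small hypothetical frames in place of the real CSVs; on the notebook's data the same expressions would run against `matches` and `deliveries` directly.

```python
import pandas as pd

# Hypothetical stand-ins for the raw matches and deliveries tables
matches = pd.DataFrame({"id": [335982, 335983]})
deliveries = pd.DataFrame({"match_id": [335982, 335982, 335983, 999999]})

# Deliveries whose match_id has no matching row in matches ("orphans")
orphans = deliveries.loc[~deliveries["match_id"].isin(matches["id"])]

# Number of deliveries recorded per match
balls_per_match = deliveries.groupby("match_id").size()

print(f"Orphan deliveries: {len(orphans)}")
print(balls_per_match)
```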

### Silver layer
#### Understand, clean, standardize, and filter the dataset
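A minimal sketch of two Silver-layer standardization steps suggested by the outputs below. The raw `season` column mixes formats (the first match is labelled `2007/08`, while the most frequent value is `2013`), so a numeric season can be derived from the match `date` instead. The 19 distinct team names also include franchises that were later renamed. The `TEAM_RENAMES` mapping, the helper `clean_matches`, and the toy rows are hypothetical; verify the mapping against `matches['team1'].unique()` before applying it.

```python
import pandas as pd

# Assumed mapping from older franchise names to current ones (hypothetical;
# check it against the real unique team names before use)
TEAM_RENAMES = {
    "Delhi Daredevils": "Delhi Capitals",
    "Kings XI Punjab": "Punjab Kings",
    "Royal Challengers Bangalore": "Royal Challengers Bengaluru",
}
TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]

def clean_matches(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Parse dates and derive a numeric season from the calendar year of the match
    out["date"] = pd.to_datetime(out["date"])
    out["season_year"] = out["date"].dt.year
    # Map old franchise names onto a single standardized name in every team column
    out[TEAM_COLUMNS] = out[TEAM_COLUMNS].replace(TEAM_RENAMES)
    return out

# Hypothetical toy rows, not taken from the dataset
raw = pd.DataFrame({
    "season": ["2007/08", "2019"],
    "date": ["2008-04-18", "2019-04-01"],
    "team1": ["Royal Challengers Bangalore", "Team X"],
    "team2": ["Kolkata Knight Riders", "Delhi Daredevils"],
    "toss_winner": ["Royal Challengers Bangalore", "Team X"],
    "winner": ["Kolkata Knight Riders", "Delhi Daredevils"],
})
clean = clean_matches(raw)
print(clean[["season", "season_year", "team1", "winner"]])
```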

#### Describe Data

In [7]:
```python
print(f"Matches DataFrame shape: {matches.shape} (rows, columns)")
print(f"Deliveries DataFrame shape: {deliveries.shape} (rows, columns)")
```

```text
Matches DataFrame shape: (1095, 20) (rows, columns)
Deliveries DataFrame shape: (260920, 17) (rows, columns)
```


In [4]:
```python
matches.head()
```

| | id | season | city | date | match_type | player_of_match | venue | team1 | team2 | toss_winner | toss_decision | winner | result | result_margin | target_runs | target_overs | super_over | method | umpire1 | umpire2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 335982 | 2007/08 | Bangalore | 2008-04-18 | League | BB McCullum | M Chinnaswamy Stadium | Royal Challengers Bangalore | Kolkata Knight Riders | Royal Challengers Bangalore | field | Kolkata Knight Riders | runs | 140.0 | 223.0 | 20.0 | N | | Asad Rauf | RE Koertzen |
| 1 | 335983 | 2007/08 | Chandigarh | 2008-04-19 | League | MEK Hussey | Punjab Cricket Association Stadium, Mohali | Kings XI Punjab | Chennai Super Kings | Chennai Super Kings | bat | Chennai Super Kings | runs | 33.0 | 241.0 | 20.0 | N | | MR Benson | SL Shastri |
| 2 | 335984 | 2007/08 | Delhi | 2008-04-19 | League | MF Maharoof | Feroz Shah Kotla | Delhi Daredevils | Rajasthan Royals | Rajasthan Royals | bat | Delhi Daredevils | wickets | 9.0 | 130.0 | 20.0 | N | | Aleem Dar | GA Pratapkumar |
| 3 | 335985 | 2007/08 | Mumbai | 2008-04-20 | League | MV Boucher | Wankhede Stadium | Mumbai Indians | Royal Challengers Bangalore | Mumbai Indians | bat | Royal Challengers Bangalore | wickets | 5.0 | 166.0 | 20.0 | N | | SJ Davis | DJ Harper |
| 4 | 335986 | 2007/08 | Kolkata | 2008-04-20 | League | DJ Hussey | Eden Gardens | Kolkata Knight Riders | Deccan Chargers | Deccan Chargers | bat | Kolkata Knight Riders | wickets | 5.0 | 111.0 | 20.0 | N | | BF Bowden | K Hariharan |


In [8]:
```python
matches.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1095 entries, 0 to 1094
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   id               1095 non-null   int64  
 1   season           1095 non-null   object 
 2   city             1044 non-null   object 
 3   date             1095 non-null   object 
 4   match_type       1095 non-null   object 
 5   player_of_match  1090 non-null   object 
 6   venue            1095 non-null   object 
 7   team1            1095 non-null   object 
 8   team2            1095 non-null   object 
 9   toss_winner      1095 non-null   object 
 10  toss_decision    1095 non-null   object 
 11  winner           1090 non-null   object 
 12  result           1095 non-null   object 
 13  result_margin    1076 non-null   float64
 14  target_runs      1092 non-null   float64
 15  target_overs     1092 non-null   float64
 16  super_over       1095 non-null   object 
 17  method           21 non-null     object 
 18  umpire1          1095 non-null   object 
 19  umpire2          1095 non-null   object 
dtypes: float64(3), int64(1), object(16)
```
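The Non-Null Count column is easier to scan as a count and percentage of missing values per column. A minimal sketch on a hypothetical four-row slice of `matches`; on the notebook data the same chain would be applied to `matches` or `deliveries` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the matches table with some missing values
df = pd.DataFrame({
    "city": ["Bangalore", None, "Delhi", "Mumbai"],
    "winner": ["Kolkata Knight Riders", "Chennai Super Kings", None, None],
    "method": [np.nan, np.nan, np.nan, "D/L"],
})

# Count and percentage of missing values per column, most-missing first
missing = (
    df.isna().sum()
      .to_frame("missing")
      .assign(pct=lambda t: 100 * t["missing"] / len(df))
      .sort_values("missing", ascending=False)
)
print(missing)
```

On the real `matches` table, `method` would head this list: only 21 of 1,095 values are present (all `D/L`, per `describe(include='object')`), so 1,074 values, about 98.1%, are missing.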

In [5]:
```python
deliveries.head()
```

| | match_id | inning | batting_team | bowling_team | over | ball | batter | bowler | non_striker | batsman_runs | extra_runs | total_runs | extras_type | is_wicket | player_dismissed | dismissal_kind | fielder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 335982 | 1 | Kolkata Knight Riders | Royal Challengers Bangalore | 0 | 1 | SC Ganguly | P Kumar | BB McCullum | 0 | 1 | 1 | legbyes | 0 | | | |
| 1 | 335982 | 1 | Kolkata Knight Riders | Royal Challengers Bangalore | 0 | 2 | BB McCullum | P Kumar | SC Ganguly | 0 | 0 | 0 | | 0 | | | |
| 2 | 335982 | 1 | Kolkata Knight Riders | Royal Challengers Bangalore | 0 | 3 | BB McCullum | P Kumar | SC Ganguly | 0 | 1 | 1 | wides | 0 | | | |
| 3 | 335982 | 1 | Kolkata Knight Riders | Royal Challengers Bangalore | 0 | 4 | BB McCullum | P Kumar | SC Ganguly | 0 | 0 | 0 | | 0 | | | |
| 4 | 335982 | 1 | Kolkata Knight Riders | Royal Challengers Bangalore | 0 | 5 | BB McCullum | P Kumar | SC Ganguly | 0 | 0 | 0 | | 0 | | | |


In [9]:
```python
deliveries.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260920 entries, 0 to 260919
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   match_id          260920 non-null  int64 
 1   inning            260920 non-null  int64 
 2   batting_team      260920 non-null  object
 3   bowling_team      260920 non-null  object
 4   over              260920 non-null  int64 
 5   ball              260920 non-null  int64 
 6   batter            260920 non-null  object
 7   bowler            260920 non-null  object
 8   non_striker       260920 non-null  object
 9   batsman_runs      260920 non-null  int64 
 10  extra_runs        260920 non-null  int64 
 11  total_runs        260920 non-null  int64 
 12  extras_type       14125 non-null   object
 13  is_wicket         260920 non-null  int64 
 14  player_dismissed  12950 non-null   object
 15  dismissal_kind    12950 non-null   object
 16  fielder           9354 non-null    object
dtypes: int64(8), object(9)
```
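The sparse columns are not necessarily data errors: `player_dismissed` and `dismissal_kind` both report 12,950 non-null values, which suggests they are filled only on deliveries where a wicket is recorded. A minimal sketch of checking that relationship against `is_wicket`, on hypothetical rows with placeholder player names:

```python
import pandas as pd

# Hypothetical deliveries: two wickets, two non-wicket balls
deliveries = pd.DataFrame({
    "is_wicket": [0, 1, 0, 1],
    "player_dismissed": [None, "Batter A", None, "Batter B"],
    "dismissal_kind": [None, "caught", None, "bowled"],
})

# A delivery should name a dismissed player exactly when is_wicket == 1
has_dismissal = deliveries["player_dismissed"].notna()
consistent = (has_dismissal == deliveries["is_wicket"].astype(bool)).all()

# Wicket deliveries without a dismissal_kind would point to missing data
missing_kind = deliveries.loc[
    deliveries["is_wicket"].eq(1) & deliveries["dismissal_kind"].isna()
]

print(f"player_dismissed consistent with is_wicket: {consistent}")
print(f"wickets without a dismissal_kind: {len(missing_kind)}")
```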

In [11]:
```python
matches.describe()
```

| | id | result_margin | target_runs | target_overs |
|---|---|---|---|---|
| count | 1095.0 | 1076.0 | 1092.0 | 1092.0 |
| mean | 904828.3 | 17.259294 | 165.684066 | 19.759341 |
| std | 367740.2 | 21.787444 | 33.427048 | 1.581108 |
| min | 335982.0 | 1.0 | 43.0 | 5.0 |
| 25% | 548331.5 | 6.0 | 146.0 | 20.0 |
| 50% | 980961.0 | 8.0 | 166.0 | 20.0 |
| 75% | 1254062.0 | 20.0 | 187.0 | 20.0 |
| max | 1426312.0 | 146.0 | 288.0 | 20.0 |
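`result_margin` mixes two units: a run margin when `result` is `runs` (140.0 in the first match) and a wicket margin when `result` is `wickets` (9.0 and 5.0 in the `matches.head()` rows). The pooled mean of 17.26 is therefore hard to interpret. A minimal sketch that summarises the margin separately per result type, using the five margins shown by `matches.head()`:

```python
import pandas as pd

# result / result_margin pairs from the five rows shown by matches.head()
df = pd.DataFrame({
    "result": ["runs", "runs", "wickets", "wickets", "wickets"],
    "result_margin": [140.0, 33.0, 9.0, 5.0, 5.0],
})

# Summarise the margin separately for wins by runs and wins by wickets
margin_by_result = df.groupby("result")["result_margin"].agg(["count", "mean", "max"])
print(margin_by_result)
```

On the full table, `matches.groupby('result')['result_margin'].describe()` would give the same split with all summary statistics.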


In [13]:
```python
matches.describe(include='object')
```

| | season | city | date | match_type | player_of_match | venue | team1 | team2 | toss_winner | toss_decision | winner | result | super_over | method | umpire1 | umpire2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1095 | 1044 | 1095 | 1095 | 1090 | 1095 | 1095 | 1095 | 1095 | 1095 | 1090 | 1095 | 1095 | 21 | 1095 | 1095 |
| unique | 17 | 36 | 823 | 8 | 291 | 58 | 19 | 19 | 19 | 2 | 19 | 4 | 2 | 1 | 62 | 62 |
| top | 2013 | Mumbai | 2017-04-09 | League | AB de Villiers | Eden Gardens | Royal Challengers Bangalore | Mumbai Indians | Mumbai Indians | field | Mumbai Indians | wickets | N | D/L | AK Chaudhary | S Ravi |
| freq | 76 | 173 | 2 | 1029 | 25 | 77 | 135 | 138 | 143 | 704 | 144 | 578 | 1081 | 21 | 115 | 83 |
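The `freq` row only gives a raw count for the most common value. Shares are easier to compare: `field` was chosen after 704 of 1,095 tosses, and since `toss_decision` has two unique values, the remaining 391 were `bat`. A minimal sketch that rebuilds the column from those counts and uses `value_counts(normalize=True)`; on the real data the call would be `matches['toss_decision'].value_counts(normalize=True)`.

```python
import pandas as pd

# Rebuild toss_decision from the counts above: 704 'field', 1095 - 704 = 391 'bat'
toss_decision = pd.Series(["field"] * 704 + ["bat"] * 391, name="toss_decision")

# Proportion of tosses followed by each decision
shares = toss_decision.value_counts(normalize=True)
print(shares.round(4))
```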


In [12]:
```python
deliveries.describe()
```

| | match_id | inning | over | ball | batsman_runs | extra_runs | total_runs | is_wicket |
|---|---|---|---|---|---|---|---|---|
| count | 260920.0 | 260920.0 | 260920.0 | 260920.0 | 260920.0 | 260920.0 | 260920.0 | 260920.0 |
| mean | 907066.5 | 1.483531 | 9.197677 | 3.624486 | 1.265001 | 0.067806 | 1.332807 | 0.049632 |
| std | 367991.3 | 0.502643 | 5.683484 | 1.81492 | 1.639298 | 0.343265 | 1.626416 | 0.217184 |
| min | 335982.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 548334.0 | 1.0 | 4.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 980967.0 | 1.0 | 9.0 | 4.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 75% | 1254066.0 | 2.0 | 14.0 | 5.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| max | 1426312.0 | 6.0 | 19.0 | 11.0 | 6.0 | 7.0 | 7.0 | 1.0 |
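A few values in this summary need care. `over` is zero-based (0 to 19). `inning` reaches 6 although a T20 match has two innings, so values above 2 presumably belong to super overs. `ball` reaches 11 because wides and no-balls are re-bowled, so an over can hold more than six deliveries. A minimal sketch on hypothetical rows that checks `total_runs = batsman_runs + extra_runs`, keeps the two regular innings, and counts legal balls per over; the `noballs` label for `extras_type` is an assumption (only `wides` and `legbyes` appear in the outputs above).

```python
import pandas as pd

# Hypothetical deliveries: one over containing a wide, plus one super-over ball (inning 3)
deliveries = pd.DataFrame({
    "inning":       [1, 1, 1, 1, 1, 1, 1, 3],
    "over":         [0, 0, 0, 0, 0, 0, 0, 0],
    "ball":         [1, 2, 3, 4, 5, 6, 7, 1],
    "batsman_runs": [0, 4, 0, 1, 0, 6, 0, 2],
    "extra_runs":   [0, 0, 1, 0, 0, 0, 0, 0],
    "total_runs":   [0, 4, 1, 1, 0, 6, 0, 2],
    "extras_type":  [None, None, "wides", None, None, None, None, None],
})

# total_runs should always equal batsman_runs + extra_runs
runs_add_up = (
    deliveries["total_runs"] == deliveries["batsman_runs"] + deliveries["extra_runs"]
).all()

# Keep only the two regular innings (assumption: inning > 2 means a super over)
regular = deliveries[deliveries["inning"] <= 2]

# Legal balls exclude wides and no-balls (assumed extras_type labels)
legal = regular[~regular["extras_type"].isin(["wides", "noballs"])]
legal_per_over = legal.groupby(["inning", "over"]).size()

print(f"runs add up: {runs_add_up}")
print(legal_per_over)
```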


In [14]:
```python
deliveries.describe(include='object')
```

| | batting_team | bowling_team | batter | bowler | non_striker | extras_type | player_dismissed | dismissal_kind | fielder |
|---|---|---|---|---|---|---|---|---|---|
| count | 260920 | 260920 | 260920 | 260920 | 260920 | 14125 | 12950 | 12950 | 9354 |
| unique | 19 | 19 | 673 | 530 | 663 | 5 | 629 | 10 | 607 |
| top | Mumbai Indians | Mumbai Indians | V Kohli | R Ashwin | V Kohli | wides | RG Sharma | caught | MS Dhoni |
| freq | 31437 | 31505 | 6236 | 4679 | 6067 | 8380 | 223 | 8063 | 220 |
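In this table `freq` counts rows, i.e. deliveries: V Kohli's 6,236 is the number of deliveries on which he was the striker, not his run total, and it includes wides, which do not count as balls faced. Runs need a grouped sum over `batsman_runs`. A minimal sketch on hypothetical rows for two placeholder batters:

```python
import pandas as pd

# Hypothetical deliveries for two batters, "A" and "B"
deliveries = pd.DataFrame({
    "batter":       ["A", "A", "A", "B", "B"],
    "batsman_runs": [0, 4, 0, 6, 6],
    "extras_type":  [None, None, "wides", None, None],
})

# Deliveries on strike per batter (what describe()'s freq counts)
on_strike = deliveries["batter"].value_counts()

# Runs off the bat and balls faced (wides excluded from balls faced)
batting = (
    deliveries.assign(faced=deliveries["extras_type"].ne("wides"))
              .groupby("batter")
              .agg(runs=("batsman_runs", "sum"), balls_faced=("faced", "sum"))
)
batting["strike_rate"] = 100 * batting["runs"] / batting["balls_faced"]
print(batting)
```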


#### Explore Data