# Pandas GroupBy Method Explained Using IPL Matches Dataset

## Introduction
 - This notebook will demonstrate the use of the `groupby` method in pandas using the IPL (Indian Premier League) matches dataset. The `groupby` method is a powerful tool for aggregating data and performing various operations on grouped data.


In [3]:

## Importing Libraries

import pandas as pd
import numpy as np


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Loading the Dataset
 - For this notebook, we will use a sample IPL matches dataset. You can download the dataset from Kaggle or any other source that provides IPL match data.

In [4]:
# Load the IPL matches dataset

df = pd.read_csv("/content/drive/MyDrive/kapil/Pandas/Data/IPLMatches.csv")


# Display the first few rows of the dataset
df.head()


Unnamed: 0,ID,City,Date,Season,MatchNumber,Team1,Team2,Venue,TossWinner,TossDecision,SuperOver,WinningTeam,WonBy,Margin,method,Player_of_Match,Team1Players,Team2Players,Umpire1,Umpire2
0,1312200,Ahmedabad,2022-05-29,2022,Final,Rajasthan Royals,Gujarat Titans,"Narendra Modi Stadium, Ahmedabad",Rajasthan Royals,bat,N,Gujarat Titans,Wickets,7.0,,HH Pandya,"['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...","['WP Saha', 'Shubman Gill', 'MS Wade', 'HH Pan...",CB Gaffaney,Nitin Menon
1,1312199,Ahmedabad,2022-05-27,2022,Qualifier 2,Royal Challengers Bangalore,Rajasthan Royals,"Narendra Modi Stadium, Ahmedabad",Rajasthan Royals,field,N,Rajasthan Royals,Wickets,7.0,,JC Buttler,"['V Kohli', 'F du Plessis', 'RM Patidar', 'GJ ...","['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...",CB Gaffaney,Nitin Menon
2,1312198,Kolkata,2022-05-25,2022,Eliminator,Royal Challengers Bangalore,Lucknow Super Giants,"Eden Gardens, Kolkata",Lucknow Super Giants,field,N,Royal Challengers Bangalore,Runs,14.0,,RM Patidar,"['V Kohli', 'F du Plessis', 'RM Patidar', 'GJ ...","['Q de Kock', 'KL Rahul', 'M Vohra', 'DJ Hooda...",J Madanagopal,MA Gough
3,1312197,Kolkata,2022-05-24,2022,Qualifier 1,Rajasthan Royals,Gujarat Titans,"Eden Gardens, Kolkata",Gujarat Titans,field,N,Gujarat Titans,Wickets,7.0,,DA Miller,"['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...","['WP Saha', 'Shubman Gill', 'MS Wade', 'HH Pan...",BNJ Oxenford,VK Sharma
4,1304116,Mumbai,2022-05-22,2022,70,Sunrisers Hyderabad,Punjab Kings,"Wankhede Stadium, Mumbai",Sunrisers Hyderabad,bat,N,Punjab Kings,Wickets,5.0,,Harpreet Brar,"['PK Garg', 'Abhishek Sharma', 'RA Tripathi', ...","['JM Bairstow', 'S Dhawan', 'M Shahrukh Khan',...",AK Chaudhary,NA Patwardhan


In [5]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')

# Understanding the Dataset
- Let's get a basic understanding of the dataset by checking its columns and some basic statistics.

In [6]:
# Display basic information about the dataset
df.info()

# Display basic statistics of the dataset
df.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   ID               950 non-null    int64  
 1   City             899 non-null    object 
 2   Date             950 non-null    object 
 3   Season           950 non-null    object 
 4   MatchNumber      950 non-null    object 
 5   Team1            950 non-null    object 
 6   Team2            950 non-null    object 
 7   Venue            950 non-null    object 
 8   TossWinner       950 non-null    object 
 9   TossDecision     950 non-null    object 
 10  SuperOver        946 non-null    object 
 11  WinningTeam      946 non-null    object 
 12  WonBy            950 non-null    object 
 13  Margin           932 non-null    float64
 14  method           19 non-null     object 
 15  Player_of_Match  946 non-null    object 
 16  Team1Players     950 non-null    object 
 17  Team2Players    

Unnamed: 0,ID,Margin
count,950.0,932.0
mean,830485.2,17.056867
std,337567.8,21.633109
min,335982.0,1.0
25%,501261.2,6.0
50%,829738.0,8.0
75%,1175372.0,19.0
max,1312200.0,146.0


# GroupBy Basics
 - The groupby method is used to split the data into groups based on some criteria. Once the data is split, we can apply various aggregation functions on the groups.
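The split, apply, combine idea can be sketched on a tiny hand-made table before using the real file. The `toy` frame and its values below are made up for illustration and are not taken from the IPL dataset.

```python
import pandas as pd

# Toy data standing in for the IPL file (values are invented for illustration)
toy = pd.DataFrame({
    "Season": ["2021", "2021", "2022", "2022", "2022"],
    "Margin": [5.0, 7.0, 10.0, 20.0, 30.0],
})

# Split the rows by Season, apply two aggregations to Margin,
# and combine the per-group results into one table
summary = toy.groupby("Season")["Margin"].agg(["count", "mean"])
print(summary)
```

Each row of `summary` corresponds to one group, and each column to one aggregation.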

## Example 1.1: Matches per Season
 - Let's group the dataset by the `Season` column and count the number of matches played in each season.

In [7]:
# Group by the 'Season' column; this only builds a GroupBy object, nothing is counted yet
matches_per_season = df.groupby('Season')
# Display the result

matches_per_season


<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7e9a6c760850>

In [8]:
type(matches_per_season)

In [9]:
matches_per_season.groups

{'2007/08': [892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949], '2009': [835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891], '2009/10': [775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834], '2011': [702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717,

In [10]:
for name, group in matches_per_season:
    print(f"Group name: {name}")
    # print(group)

Group name: 2007/08
Group name: 2009
Group name: 2009/10
Group name: 2011
Group name: 2012
Group name: 2013
Group name: 2014
Group name: 2015
Group name: 2016
Group name: 2017
Group name: 2018
Group name: 2019
Group name: 2020/21
Group name: 2021
Group name: 2022
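Besides looping over every group, a single group can be pulled out with `get_group`. The sketch below uses a small invented frame (`toy`) rather than the IPL file, so the team names are placeholders.

```python
import pandas as pd

# Invented rows; "A", "B", "C" are placeholder team names
toy = pd.DataFrame({
    "Season": ["2021", "2022", "2021", "2022", "2022"],
    "WinningTeam": ["A", "B", "C", "A", "B"],
})

grouped = toy.groupby("Season")

# Rows belonging to one group, with their original index labels
season_2022 = grouped.get_group("2022")

# Iterating yields (group name, sub-DataFrame) pairs
group_sizes = {name: len(group) for name, group in grouped}

print(season_2022)
print(group_sizes)
```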


In [11]:
# Group by the 'Season' column and count the rows (matches) in each group
matches_per_season = df.groupby('Season').size()
# Display the result

matches_per_season

Season
2007/08    58
2009       57
2009/10    60
2011       73
2012       74
2013       76
2014       60
2015       59
2016       60
2017       59
2018       60
2019       60
2020/21    60
2021       60
2022       74
dtype: int64

## Example 1.2: Matches per City
 - Group the dataset by the `City` column to find out how many matches were played in each city.

In [12]:
# Group by the 'City' column and count the number of matches in each city
matches_per_city = df.groupby('City').size()
# Display the result
matches_per_city


City
Abu Dhabi          37
Ahmedabad          19
Bangalore          65
Bengaluru          15
Bloemfontein        2
Cape Town           7
Centurion          12
Chandigarh         56
Chennai            67
Cuttack             7
Delhi              78
Dharamsala          9
Dubai              13
Durban             15
East London         3
Hyderabad          64
Indore              9
Jaipur             47
Johannesburg        8
Kanpur              4
Kimberley           3
Kochi               5
Kolkata            79
Mumbai            159
Nagpur              3
Navi Mumbai         9
Port Elizabeth      7
Pune               51
Raipur              6
Rajkot             10
Ranchi              7
Sharjah            10
Visakhapatnam      13
dtype: int64

In [13]:
matches_per_city = df.groupby('City').size().sort_values(ascending=False)
matches_per_city

City
Mumbai            159
Kolkata            79
Delhi              78
Chennai            67
Bangalore          65
Hyderabad          64
Chandigarh         56
Pune               51
Jaipur             47
Abu Dhabi          37
Ahmedabad          19
Durban             15
Bengaluru          15
Visakhapatnam      13
Dubai              13
Centurion          12
Sharjah            10
Rajkot             10
Indore              9
Navi Mumbai         9
Dharamsala          9
Johannesburg        8
Ranchi              7
Port Elizabeth      7
Cape Town           7
Cuttack             7
Raipur              6
Kochi               5
Kanpur              4
Nagpur              3
Kimberley           3
East London         3
Bloemfontein        2
dtype: int64
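`df.info()` earlier reported 899 non-null `City` values out of 950 rows. By default `groupby` drops rows whose key is missing, so those 51 matches do not appear in the counts above. A minimal sketch on invented data, assuming pandas 1.1 or newer (where the `dropna` parameter is available):

```python
import numpy as np
import pandas as pd

# Invented data: two of the five rows have no city recorded
toy = pd.DataFrame({"City": ["Mumbai", np.nan, "Delhi", "Mumbai", np.nan]})

# Default behaviour: rows with a missing key are silently left out
default_counts = toy.groupby("City").size()

# dropna=False keeps a separate group for the missing key
with_missing = toy.groupby("City", dropna=False).size()

print(default_counts)
print(with_missing)
```

The counts above also list `Bangalore` and `Bengaluru` separately, so spelling variants of the same place end up in different groups.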

## Example 1.3: Wins per Team
 - Group the dataset by the `WinningTeam` column to see how many matches each team has won.

In [14]:
# Group by the 'WinningTeam' column and count the number of wins for each team
wins_per_team = df.groupby('WinningTeam').size()
# Display the result
wins_per_team


WinningTeam
Chennai Super Kings            121
Deccan Chargers                 29
Delhi Capitals                  36
Delhi Daredevils                67
Gujarat Lions                   13
Gujarat Titans                  12
Kings XI Punjab                 88
Kochi Tuskers Kerala             6
Kolkata Knight Riders          114
Lucknow Super Giants             9
Mumbai Indians                 131
Pune Warriors                   12
Punjab Kings                    13
Rajasthan Royals                96
Rising Pune Supergiant          10
Rising Pune Supergiants          5
Royal Challengers Bangalore    109
Sunrisers Hyderabad             75
dtype: int64

In [15]:
# Group by the 'WinningTeam' column and keep the GroupBy object so each group can be inspected
wins_per_team = df.groupby('WinningTeam')
# Display the result
wins_per_team

for name, group in wins_per_team:
    print(f"Group name: {name}")
    print(group)

Group name: Chennai Super Kings
          ID         City        Date   Season MatchNumber  \
19   1304101  Navi Mumbai  2022-05-08     2022          55   
28   1304092         Pune  2022-05-01     2022          46   
41   1304079  Navi Mumbai  2022-04-21     2022          33   
52   1304068       Mumbai  2022-04-12     2022          22   
74   1254117        Dubai  2021-10-15     2021       Final   
..       ...          ...         ...      ...         ...   
921   336009        Delhi  2008-05-08  2007/08          28   
935   335996    Bangalore  2008-04-28  2007/08          15   
938   335993      Chennai  2008-04-26  2007/08          11   
942   335989      Chennai  2008-04-23  2007/08           8   
948   335983   Chandigarh  2008-04-19  2007/08           2   

                           Team1                        Team2  \
19           Chennai Super Kings               Delhi Capitals   
28           Chennai Super Kings          Sunrisers Hyderabad   
41                Mumbai Ind

In [16]:
# Group by the 'WinningTeam' column, count the wins for each team, and sort ascending
wins_per_team = df.groupby('WinningTeam').size().sort_values(ascending=True)
# Display the result
wins_per_team


WinningTeam
Rising Pune Supergiants          5
Kochi Tuskers Kerala             6
Lucknow Super Giants             9
Rising Pune Supergiant          10
Gujarat Titans                  12
Pune Warriors                   12
Gujarat Lions                   13
Punjab Kings                    13
Deccan Chargers                 29
Delhi Capitals                  36
Delhi Daredevils                67
Sunrisers Hyderabad             75
Kings XI Punjab                 88
Rajasthan Royals                96
Royal Challengers Bangalore    109
Kolkata Knight Riders          114
Chennai Super Kings            121
Mumbai Indians                 131
dtype: int64
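Some franchises appear under more than one name in the output above (for example `Rising Pune Supergiant` and `Rising Pune Supergiants`), which splits their wins across groups. One possible approach is to map the names to a single label before grouping. The `rename_map` below is an assumption about which names belong together, and the rows are invented.

```python
import pandas as pd

# Invented rows using team names that appear in the dataset
toy = pd.DataFrame({
    "WinningTeam": [
        "Kings XI Punjab", "Punjab Kings", "Delhi Daredevils",
        "Delhi Capitals", "Punjab Kings",
    ]
})

# Assumed mapping from older names to the current franchise name
rename_map = {
    "Kings XI Punjab": "Punjab Kings",
    "Delhi Daredevils": "Delhi Capitals",
    "Rising Pune Supergiants": "Rising Pune Supergiant",
}

# Normalise the names in a new column, then group on that column
toy["Franchise"] = toy["WinningTeam"].replace(rename_map)
wins_per_franchise = toy.groupby("Franchise").size()

print(wins_per_franchise)
```

Keeping the original `WinningTeam` column untouched makes it easy to check the mapping afterwards.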

# Example 2: Grouping by Multiple Columns
 - `groupby` also accepts a list of columns. Each distinct combination of values then forms its own group, and the result is indexed by all of the grouping columns.

## Example 2.1: Matches per Season and City
 - Group the dataset by `Season` and `City` to find out how many matches were played in each city for each season.

In [17]:
# Group by 'Season' and 'City' and count the number of matches in each group
matches_per_season_city = df.groupby(['Season', 'City']).size()
# Display the result
matches_per_season_city.head(20)


Season   City          
2007/08  Bangalore          7
         Chandigarh         7
         Chennai            7
         Delhi              6
         Hyderabad          7
         Jaipur             7
         Kolkata            7
         Mumbai            10
2009     Bloemfontein       2
         Cape Town          7
         Centurion         12
         Durban            15
         East London        3
         Johannesburg       8
         Kimberley          3
         Port Elizabeth     7
2009/10  Ahmedabad          4
         Bangalore          7
         Chandigarh         5
         Chennai            7
dtype: int64
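The result of a two-column `groupby` is a Series with a two-level index. It can be reshaped into a season-by-city table with `unstack`. A small sketch on invented rows:

```python
import pandas as pd

# Invented rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "Season": ["2021", "2021", "2021", "2022", "2022"],
    "City": ["Mumbai", "Mumbai", "Delhi", "Delhi", "Pune"],
})

counts = toy.groupby(["Season", "City"]).size()

# Move the inner index level (City) into columns; absent combinations become 0
table = counts.unstack(fill_value=0)

print(table)
```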

## Example 2.2: Wins per Season and Team
 - Group the dataset by `Season` and `WinningTeam` to see how many matches each team has won in each season.

In [18]:
df.columns


Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')

In [19]:
# Group by 'Season' and 'WinningTeam' and count the number of wins in each group
wins_per_season_team = df.groupby(['Season', 'WinningTeam']).size()
# Display the result
wins_per_season_team


Season   WinningTeam                
2007/08  Chennai Super Kings             9
         Deccan Chargers                 2
         Delhi Daredevils                7
         Kings XI Punjab                10
         Kolkata Knight Riders           6
                                        ..
2022     Mumbai Indians                  4
         Punjab Kings                    7
         Rajasthan Royals               10
         Royal Challengers Bangalore     9
         Sunrisers Hyderabad             6
Length: 126, dtype: int64
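A natural follow-up is to pick the team with the most wins in each season. One way to sketch this, on invented rows with placeholder team names, is to turn the counts into a regular DataFrame and use `idxmax` within each season:

```python
import pandas as pd

# Invented rows; "A" and "B" are placeholder team names
toy = pd.DataFrame({
    "Season": ["2021", "2021", "2021", "2022", "2022", "2022", "2022"],
    "WinningTeam": ["A", "A", "B", "B", "B", "B", "A"],
})

# One row per (Season, WinningTeam) pair with its win count
wins = toy.groupby(["Season", "WinningTeam"]).size().reset_index(name="Wins")

# idxmax gives, for each season, the row label of the largest Wins value;
# on a tie it returns the first such row
top_per_season = wins.loc[wins.groupby("Season")["Wins"].idxmax()]

print(top_per_season)
```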

## For every season, find the final winners

In [20]:
# First select the final of each season,
# then read the winner from those rows

finalMask = df['MatchNumber'] == 'Final'

final_win = df[finalMask]

final_win["WinningTeam"]

0             Gujarat Titans
74       Chennai Super Kings
134           Mumbai Indians
194           Mumbai Indians
254      Chennai Super Kings
314           Mumbai Indians
373      Sunrisers Hyderabad
433           Mumbai Indians
492    Kolkata Knight Riders
552           Mumbai Indians
628    Kolkata Knight Riders
702      Chennai Super Kings
775      Chennai Super Kings
835          Deccan Chargers
892         Rajasthan Royals
Name: WinningTeam, dtype: object

In [21]:

df[finalMask][['Season','WinningTeam']]


Unnamed: 0,Season,WinningTeam
0,2022,Gujarat Titans
74,2021,Chennai Super Kings
134,2020/21,Mumbai Indians
194,2019,Mumbai Indians
254,2018,Chennai Super Kings
314,2017,Mumbai Indians
373,2016,Sunrisers Hyderabad
433,2015,Mumbai Indians
492,2014,Kolkata Knight Riders
552,2013,Mumbai Indians
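The filtered finals can be fed back into `groupby` to count titles per team. The sketch below uses invented rows with placeholder team names; only the column names and the `'Final'` label follow the dataset.

```python
import pandas as pd

# Invented rows; "A", "B", "C" are placeholder team names
toy = pd.DataFrame({
    "Season": ["2020", "2020", "2021", "2021", "2022"],
    "MatchNumber": ["70", "Final", "Final", "12", "Final"],
    "WinningTeam": ["C", "A", "B", "A", "A"],
})

# Keep only the finals, then count how often each team won one
finals = toy[toy["MatchNumber"] == "Final"]
titles = finals.groupby("WinningTeam").size().sort_values(ascending=False)

print(titles)
```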


## Find all the matches that have had super overs

In [22]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')

In [23]:
df

Unnamed: 0,ID,City,Date,Season,MatchNumber,Team1,Team2,Venue,TossWinner,TossDecision,SuperOver,WinningTeam,WonBy,Margin,method,Player_of_Match,Team1Players,Team2Players,Umpire1,Umpire2
0,1312200,Ahmedabad,2022-05-29,2022,Final,Rajasthan Royals,Gujarat Titans,"Narendra Modi Stadium, Ahmedabad",Rajasthan Royals,bat,N,Gujarat Titans,Wickets,7.0,,HH Pandya,"['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...","['WP Saha', 'Shubman Gill', 'MS Wade', 'HH Pan...",CB Gaffaney,Nitin Menon
1,1312199,Ahmedabad,2022-05-27,2022,Qualifier 2,Royal Challengers Bangalore,Rajasthan Royals,"Narendra Modi Stadium, Ahmedabad",Rajasthan Royals,field,N,Rajasthan Royals,Wickets,7.0,,JC Buttler,"['V Kohli', 'F du Plessis', 'RM Patidar', 'GJ ...","['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...",CB Gaffaney,Nitin Menon
2,1312198,Kolkata,2022-05-25,2022,Eliminator,Royal Challengers Bangalore,Lucknow Super Giants,"Eden Gardens, Kolkata",Lucknow Super Giants,field,N,Royal Challengers Bangalore,Runs,14.0,,RM Patidar,"['V Kohli', 'F du Plessis', 'RM Patidar', 'GJ ...","['Q de Kock', 'KL Rahul', 'M Vohra', 'DJ Hooda...",J Madanagopal,MA Gough
3,1312197,Kolkata,2022-05-24,2022,Qualifier 1,Rajasthan Royals,Gujarat Titans,"Eden Gardens, Kolkata",Gujarat Titans,field,N,Gujarat Titans,Wickets,7.0,,DA Miller,"['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...","['WP Saha', 'Shubman Gill', 'MS Wade', 'HH Pan...",BNJ Oxenford,VK Sharma
4,1304116,Mumbai,2022-05-22,2022,70,Sunrisers Hyderabad,Punjab Kings,"Wankhede Stadium, Mumbai",Sunrisers Hyderabad,bat,N,Punjab Kings,Wickets,5.0,,Harpreet Brar,"['PK Garg', 'Abhishek Sharma', 'RA Tripathi', ...","['JM Bairstow', 'S Dhawan', 'M Shahrukh Khan',...",AK Chaudhary,NA Patwardhan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
945,335986,Kolkata,2008-04-20,2007/08,4,Kolkata Knight Riders,Deccan Chargers,Eden Gardens,Deccan Chargers,bat,N,Kolkata Knight Riders,Wickets,5.0,,DJ Hussey,"['WP Saha', 'BB McCullum', 'RT Ponting', 'SC G...","['AC Gilchrist', 'Y Venugopal Rao', 'VVS Laxma...",BF Bowden,K Hariharan
946,335985,Mumbai,2008-04-20,2007/08,5,Mumbai Indians,Royal Challengers Bangalore,Wankhede Stadium,Mumbai Indians,bat,N,Royal Challengers Bangalore,Wickets,5.0,,MV Boucher,"['L Ronchi', 'ST Jayasuriya', 'DJ Thornely', '...","['S Chanderpaul', 'R Dravid', 'LRPL Taylor', '...",SJ Davis,DJ Harper
947,335984,Delhi,2008-04-19,2007/08,3,Delhi Daredevils,Rajasthan Royals,Feroz Shah Kotla,Rajasthan Royals,bat,N,Delhi Daredevils,Wickets,9.0,,MF Maharoof,"['G Gambhir', 'V Sehwag', 'S Dhawan', 'MK Tiwa...","['T Kohli', 'YK Pathan', 'SR Watson', 'M Kaif'...",Aleem Dar,GA Pratapkumar
948,335983,Chandigarh,2008-04-19,2007/08,2,Kings XI Punjab,Chennai Super Kings,"Punjab Cricket Association Stadium, Mohali",Chennai Super Kings,bat,N,Chennai Super Kings,Runs,33.0,,MEK Hussey,"['K Goel', 'JR Hopes', 'KC Sangakkara', 'Yuvra...","['PA Patel', 'ML Hayden', 'MEK Hussey', 'MS Dh...",MR Benson,SL Shastri


In [24]:
df['SuperOver']=='Y'

0      False
1      False
2      False
3      False
4      False
       ...  
945    False
946    False
947    False
948    False
949    False
Name: SuperOver, Length: 950, dtype: bool

In [25]:
df[df['SuperOver']=='Y'].shape[0]

14

In [26]:
len(df[df['SuperOver']=='Y'])

14

In [27]:
(df['SuperOver']=='Y').sum()  # True counts as 1 when summed, so this gives the same answer

14
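The same boolean mask can be combined with `groupby` to count super overs per season. A minimal sketch on invented rows (including one missing `SuperOver` value, as the real column has a few):

```python
import numpy as np
import pandas as pd

# Invented rows; only the column names and the 'Y'/'N' coding follow the dataset
toy = pd.DataFrame({
    "Season": ["2019", "2019", "2020/21", "2020/21", "2020/21"],
    "SuperOver": ["N", "Y", "Y", np.nan, "Y"],
})

# The comparison yields True/False (False for missing values);
# grouping that mask by Season and summing counts the True values per season
super_overs_per_season = (toy["SuperOver"] == "Y").groupby(toy["Season"]).sum()

print(super_overs_per_season)
```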

# How many times did Punjab Kings win in Delhi?

In [28]:
(df['WinningTeam']=="Punjab Kings").sum()

13

In [29]:
df[df['WinningTeam']=="Punjab Kings"]


Unnamed: 0,ID,City,Date,Season,MatchNumber,Team1,Team2,Venue,TossWinner,TossDecision,SuperOver,WinningTeam,WonBy,Margin,method,Player_of_Match,Team1Players,Team2Players,Umpire1,Umpire2
4,1304116,Mumbai,2022-05-22,2022,70,Sunrisers Hyderabad,Punjab Kings,"Wankhede Stadium, Mumbai",Sunrisers Hyderabad,bat,N,Punjab Kings,Wickets,5.0,,Harpreet Brar,"['PK Garg', 'Abhishek Sharma', 'RA Tripathi', ...","['JM Bairstow', 'S Dhawan', 'M Shahrukh Khan',...",AK Chaudhary,NA Patwardhan
14,1304106,Mumbai,2022-05-13,2022,60,Punjab Kings,Royal Challengers Bangalore,"Brabourne Stadium, Mumbai",Royal Challengers Bangalore,field,N,Punjab Kings,Runs,54.0,,JM Bairstow,"['JM Bairstow', 'S Dhawan', 'PBB Rajapaksa', '...","['V Kohli', 'F du Plessis', 'RM Patidar', 'MK ...",J Madanagopal,N Pandit
26,1304094,Navi Mumbai,2022-05-03,2022,48,Gujarat Titans,Punjab Kings,"Dr DY Patil Sports Academy, Mumbai",Gujarat Titans,bat,N,Punjab Kings,Wickets,8.0,,K Rabada,"['WP Saha', 'Shubman Gill', 'B Sai Sudharsan',...","['JM Bairstow', 'S Dhawan', 'PBB Rajapaksa', '...",R Pandit,VK Sharma
36,1304084,Mumbai,2022-04-25,2022,38,Punjab Kings,Chennai Super Kings,"Wankhede Stadium, Mumbai",Chennai Super Kings,field,N,Punjab Kings,Runs,11.0,,S Dhawan,"['MA Agarwal', 'S Dhawan', 'PBB Rajapaksa', 'L...","['RD Gaikwad', 'RV Uthappa', 'MJ Santner', 'S ...",M Erasmus,Tapan Sharma
51,1304069,Pune,2022-04-13,2022,23,Punjab Kings,Mumbai Indians,"Maharashtra Cricket Association Stadium, Pune",Mumbai Indians,field,N,Punjab Kings,Runs,12.0,,MA Agarwal,"['MA Agarwal', 'S Dhawan', 'JM Bairstow', 'LS ...","['RG Sharma', 'Ishan Kishan', 'D Brevis', 'Til...",BNJ Oxenford,UV Gandhe
63,1304057,Mumbai,2022-04-03,2022,11,Punjab Kings,Chennai Super Kings,"Brabourne Stadium, Mumbai",Chennai Super Kings,field,N,Punjab Kings,Runs,54.0,,LS Livingstone,"['MA Agarwal', 'S Dhawan', 'PBB Rajapaksa', 'L...","['RV Uthappa', 'RD Gaikwad', 'MM Ali', 'AT Ray...",RJ Tucker,YC Barde
71,1304049,Mumbai,2022-03-27,2022,3,Royal Challengers Bangalore,Punjab Kings,"Dr DY Patil Sports Academy, Mumbai",Punjab Kings,field,N,Punjab Kings,Wickets,5.0,,OF Smith,"['F du Plessis', 'Anuj Rawat', 'V Kohli', 'KD ...","['MA Agarwal', 'S Dhawan', 'PBB Rajapaksa', 'L...",Nitin Menon,YC Barde
81,1254094,Dubai,2021-10-07,2021,53,Chennai Super Kings,Punjab Kings,Dubai International Cricket Stadium,Punjab Kings,field,N,Punjab Kings,Wickets,6.0,,KL Rahul,"['RD Gaikwad', 'F du Plessis', 'MM Ali', 'RV U...","['KL Rahul', 'MA Agarwal', 'SN Khan', 'M Shahr...",K Srinivasan,RK Illingworth
89,1254102,Dubai,2021-10-01,2021,45,Kolkata Knight Riders,Punjab Kings,Dubai International Cricket Stadium,Punjab Kings,field,N,Punjab Kings,Wickets,5.0,,KL Rahul,"['VR Iyer', 'Shubman Gill', 'RA Tripathi', 'N ...","['KL Rahul', 'MA Agarwal', 'N Pooran', 'AK Mar...",KN Ananthapadmanabhan,RK Illingworth
98,1254107,Sharjah,2021-09-25,2021,37,Punjab Kings,Sunrisers Hyderabad,Sharjah Cricket Stadium,Sunrisers Hyderabad,field,N,Punjab Kings,Runs,5.0,,JO Holder,"['KL Rahul', 'MA Agarwal', 'CH Gayle', 'AK Mar...","['DA Warner', 'WP Saha', 'KS Williamson', 'MK ...",RK Illingworth,YC Barde


In [30]:
pkdf = df[(df['City']=="Delhi") & (df['WinningTeam']=="Punjab Kings")]
pkdf.shape[0]

0

In [31]:

pkdf = df[(df['City']=="Dubai") & (df['WinningTeam']=="Punjab Kings")]
pkdf.shape[0]

2
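The count of 0 for Delhi only covers rows where the winner is recorded exactly as `Punjab Kings`. The wins-per-team output earlier also lists `Kings XI Punjab`, so if both names are meant to count as the same franchise (an assumption here), `isin` can match either. A sketch on invented rows:

```python
import pandas as pd

# Invented rows using names that appear in the dataset
toy = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Dubai", "Delhi"],
    "WinningTeam": ["Kings XI Punjab", "Delhi Capitals", "Punjab Kings", "Kings XI Punjab"],
})

# Assumed list of names that refer to the same franchise
punjab_names = ["Kings XI Punjab", "Punjab Kings"]

mask = (toy["City"] == "Delhi") & toy["WinningTeam"].isin(punjab_names)
wins_in_delhi = int(mask.sum())

print(wins_in_delhi)
```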

# Does winning the toss matter?

In [32]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')

In [33]:
(df[df["TossWinner"] == df["WinningTeam"]].shape[0]/df.shape[0])*100

51.473684210526315
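The percentage above divides by all 950 rows, including the few matches where `WinningTeam` is missing (946 non-null values in `df.info()`). A possible refinement, sketched on invented rows, is to drop those matches first and then break the rate down by `TossDecision` with `groupby`:

```python
import numpy as np
import pandas as pd

# Invented rows; "A", "B", "C" are placeholder team names
toy = pd.DataFrame({
    "TossWinner":   ["A", "B", "A", "C", "B"],
    "WinningTeam":  ["A", "A", np.nan, "C", "A"],
    "TossDecision": ["bat", "field", "bat", "field", "field"],
})

# Keep only matches that actually have a winner
decided = toy.dropna(subset=["WinningTeam"])

# True where the toss winner also won the match
toss_winner_won = decided["TossWinner"] == decided["WinningTeam"]

overall_pct = toss_winner_won.mean() * 100
pct_by_decision = toss_winner_won.groupby(decided["TossDecision"]).mean() * 100

print(overall_pct)
print(pct_by_decision)
```

Taking the mean of a boolean Series gives the share of `True` values, which avoids dividing by `shape[0]` by hand.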

# Does batting first matter?

In [34]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')

In [35]:
# Define the function to determine 'firstbatter'
def determine_firstbatter(row):
    if row['TossDecision'] == 'bat':
        return row['TossWinner']
    else:
        return row['Team1'] if row['TossWinner'] != row['Team1'] else row['Team2']

# Apply the function to each row
df['firstbatter'] = df.apply(determine_firstbatter, axis=1)

df[['Team1', 'Team2', 'TossWinner', 'TossDecision', 'firstbatter']]

Unnamed: 0,Team1,Team2,TossWinner,TossDecision,firstbatter
0,Rajasthan Royals,Gujarat Titans,Rajasthan Royals,bat,Rajasthan Royals
1,Royal Challengers Bangalore,Rajasthan Royals,Rajasthan Royals,field,Royal Challengers Bangalore
2,Royal Challengers Bangalore,Lucknow Super Giants,Lucknow Super Giants,field,Royal Challengers Bangalore
3,Rajasthan Royals,Gujarat Titans,Gujarat Titans,field,Rajasthan Royals
4,Sunrisers Hyderabad,Punjab Kings,Sunrisers Hyderabad,bat,Sunrisers Hyderabad
...,...,...,...,...,...
945,Kolkata Knight Riders,Deccan Chargers,Deccan Chargers,bat,Deccan Chargers
946,Mumbai Indians,Royal Challengers Bangalore,Mumbai Indians,bat,Mumbai Indians
947,Delhi Daredevils,Rajasthan Royals,Rajasthan Royals,bat,Rajasthan Royals
948,Kings XI Punjab,Chennai Super Kings,Chennai Super Kings,bat,Chennai Super Kings


In [36]:
(df["firstbatter"] == df["WinningTeam"]).sum()/df.shape[0]

0.4473684210526316
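The row-wise `apply` used for `firstbatter` can also be written with `np.where`, which works on whole columns at once. The sketch below follows the same rule as `determine_firstbatter` on a few invented rows: the toss winner bats first if it chose to bat, otherwise the other team does.

```python
import numpy as np
import pandas as pd

# Invented rows; "A" to "D" are placeholder team names
toy = pd.DataFrame({
    "Team1":        ["A", "A", "C"],
    "Team2":        ["B", "B", "D"],
    "TossWinner":   ["A", "B", "D"],
    "TossDecision": ["bat", "field", "field"],
})

# The team that did not win the toss
toss_loser = np.where(toy["TossWinner"] == toy["Team1"], toy["Team2"], toy["Team1"])

# Toss winner bats first if it chose 'bat'; otherwise the toss loser bats first
toy["firstbatter"] = np.where(toy["TossDecision"] == "bat", toy["TossWinner"], toss_loser)

print(toy)
```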

## Find the team that won by the highest number of wickets

In [37]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2', 'firstbatter'],
      dtype='object')

In [41]:
wic = df[df['WonBy'] == "Wickets"]
wic['WonBy']

0      Wickets
1      Wickets
3      Wickets
4      Wickets
5      Wickets
        ...   
943    Wickets
944    Wickets
945    Wickets
946    Wickets
947    Wickets
Name: WonBy, Length: 509, dtype: object

In [46]:


wic[wic['Margin']==wic['Margin'].max()].head(3)

Unnamed: 0,ID,City,Date,Season,MatchNumber,Team1,Team2,Venue,TossWinner,TossDecision,...,WinningTeam,WonBy,Margin,method,Player_of_Match,Team1Players,Team2Players,Umpire1,Umpire2,firstbatter
118,1254073,Mumbai,2021-04-22,2021,16,Rajasthan Royals,Royal Challengers Bangalore,"Wankhede Stadium, Mumbai",Royal Challengers Bangalore,field,...,Royal Challengers Bangalore,Wickets,10.0,,D Padikkal,"['JC Buttler', 'M Vohra', 'SV Samson', 'DA Mil...","['V Kohli', 'D Padikkal', 'GJ Maxwell', 'AB de...",J Madanagopal,S Ravi,Rajasthan Royals
138,1216495,,2020-11-03,2020/21,56,Mumbai Indians,Sunrisers Hyderabad,Sharjah Cricket Stadium,Sunrisers Hyderabad,field,...,Sunrisers Hyderabad,Wickets,10.0,,S Nadeem,"['RG Sharma', 'Q de Kock', 'SA Yadav', 'Ishan ...","['DA Warner', 'WP Saha', 'MK Pandey', 'KS Will...",C Shamshuddin,RK Illingworth,Mumbai Indians
153,1216521,,2020-10-23,2020/21,41,Chennai Super Kings,Mumbai Indians,Sharjah Cricket Stadium,Mumbai Indians,field,...,Mumbai Indians,Wickets,10.0,,TA Boult,"['RD Gaikwad', 'F du Plessis', 'AT Rayudu', 'N...","['Q de Kock', 'Ishan Kishan', 'SA Yadav', 'SS ...",C Shamshuddin,VA Kulkarni,Chennai Super Kings
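Since several matches share the maximum wicket margin, `head(3)` shows only some of them. Grouping the maximum-margin rows by `WinningTeam` gives a count per team instead. A sketch on invented rows with placeholder team names:

```python
import pandas as pd

# Invented rows; "A", "B", "C" are placeholder team names
toy = pd.DataFrame({
    "WinningTeam": ["A", "B", "A", "C"],
    "WonBy":       ["Wickets", "Wickets", "Wickets", "Runs"],
    "Margin":      [10.0, 7.0, 10.0, 10.0],
})

# Restrict to wins by wickets, then keep the rows with the largest margin
wic = toy[toy["WonBy"] == "Wickets"]
biggest = wic[wic["Margin"] == wic["Margin"].max()]

# How many such wins each team has
biggest_wins_per_team = biggest.groupby("WinningTeam").size()

print(biggest_wins_per_team)
```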


# Find the team that won by the most runs

In [51]:
df.columns

Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2', 'firstbatter'],
      dtype='object')

In [55]:
runs = df[df['WonBy']=="Runs"]

runs


Unnamed: 0,ID,City,Date,Season,MatchNumber,Team1,Team2,Venue,TossWinner,TossDecision,...,WinningTeam,WonBy,Margin,method,Player_of_Match,Team1Players,Team2Players,Umpire1,Umpire2,firstbatter
2,1312198,Kolkata,2022-05-25,2022,Eliminator,Royal Challengers Bangalore,Lucknow Super Giants,"Eden Gardens, Kolkata",Lucknow Super Giants,field,...,Royal Challengers Bangalore,Runs,14.0,,RM Patidar,"['V Kohli', 'F du Plessis', 'RM Patidar', 'GJ ...","['Q de Kock', 'KL Rahul', 'M Vohra', 'DJ Hooda...",J Madanagopal,MA Gough,Royal Challengers Bangalore
8,1304112,Navi Mumbai,2022-05-18,2022,66,Lucknow Super Giants,Kolkata Knight Riders,"Dr DY Patil Sports Academy, Mumbai",Lucknow Super Giants,bat,...,Lucknow Super Giants,Runs,2.0,,Q de Kock,"['Q de Kock', 'KL Rahul', 'E Lewis', 'DJ Hooda...","['VR Iyer', 'A Tomar', 'N Rana', 'SS Iyer', 'S...",R Pandit,YC Barde,Lucknow Super Giants
9,1304111,Mumbai,2022-05-17,2022,65,Sunrisers Hyderabad,Mumbai Indians,"Wankhede Stadium, Mumbai",Mumbai Indians,field,...,Sunrisers Hyderabad,Runs,3.0,,RA Tripathi,"['Abhishek Sharma', 'PK Garg', 'RA Tripathi', ...","['RG Sharma', 'Ishan Kishan', 'DR Sams', 'Tila...",CB Gaffaney,N Pandit,Sunrisers Hyderabad
10,1304110,Navi Mumbai,2022-05-16,2022,64,Delhi Capitals,Punjab Kings,"Dr DY Patil Sports Academy, Mumbai",Punjab Kings,field,...,Delhi Capitals,Runs,17.0,,SN Thakur,"['DA Warner', 'SN Khan', 'MR Marsh', 'Lalit Ya...","['JM Bairstow', 'S Dhawan', 'PBB Rajapaksa', '...",GR Sadashiv Iyer,Nitin Menon,Delhi Capitals
11,1304109,Mumbai,2022-05-15,2022,63,Rajasthan Royals,Lucknow Super Giants,"Brabourne Stadium, Mumbai",Rajasthan Royals,bat,...,Rajasthan Royals,Runs,24.0,,TA Boult,"['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...","['Q de Kock', 'KL Rahul', 'A Badoni', 'DJ Hood...",PG Pathak,Tapan Sharma,Rajasthan Royals
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
935,335996,Bangalore,2008-04-28,2007/08,15,Royal Challengers Bangalore,Chennai Super Kings,M Chinnaswamy Stadium,Chennai Super Kings,bat,...,Chennai Super Kings,Runs,13.0,,MS Dhoni,"['B Chipli', 'W Jaffer', 'LRPL Taylor', 'JH Ka...","['PA Patel', 'ML Hayden', 'MEK Hussey', 'SK Ra...",BR Doctrove,RB Tiffin,Chennai Super Kings
940,335991,Chandigarh,2008-04-25,2007/08,10,Kings XI Punjab,Mumbai Indians,"Punjab Cricket Association Stadium, Mohali",Mumbai Indians,field,...,Kings XI Punjab,Runs,66.0,,KC Sangakkara,"['K Goel', 'IK Pathan', 'KC Sangakkara', 'Yuvr...","['L Ronchi', 'ST Jayasuriya', 'RV Uthappa', 'D...",Aleem Dar,AM Saheba,Kings XI Punjab
942,335989,Chennai,2008-04-23,2007/08,8,Chennai Super Kings,Mumbai Indians,"MA Chidambaram Stadium, Chepauk",Mumbai Indians,field,...,Chennai Super Kings,Runs,6.0,,ML Hayden,"['PA Patel', 'ML Hayden', 'MEK Hussey', 'SK Ra...","['L Ronchi', 'ST Jayasuriya', 'RV Uthappa', 'S...",DJ Harper,GA Pratapkumar,Chennai Super Kings
948,335983,Chandigarh,2008-04-19,2007/08,2,Kings XI Punjab,Chennai Super Kings,"Punjab Cricket Association Stadium, Mohali",Chennai Super Kings,bat,...,Chennai Super Kings,Runs,33.0,,MEK Hussey,"['K Goel', 'JR Hopes', 'KC Sangakkara', 'Yuvra...","['PA Patel', 'ML Hayden', 'MEK Hussey', 'MS Dh...",MR Benson,SL Shastri,Chennai Super Kings


In [67]:
runs[runs['Margin'] == runs['Margin'].max()][['Team1', 'Team2', 'WinningTeam', 'Margin']]

Unnamed: 0,Team1,Team2,WinningTeam,Margin
2,Royal Challengers Bangalore,Lucknow Super Giants,Royal Challengers Bangalore,146.0
8,Lucknow Super Giants,Kolkata Knight Riders,Lucknow Super Giants,146.0
9,Sunrisers Hyderabad,Mumbai Indians,Sunrisers Hyderabad,146.0
10,Delhi Capitals,Punjab Kings,Delhi Capitals,146.0
11,Rajasthan Royals,Lucknow Super Giants,Rajasthan Royals,146.0
...,...,...,...,...
935,Royal Challengers Bangalore,Chennai Super Kings,Chennai Super Kings,146.0
940,Kings XI Punjab,Mumbai Indians,Kings XI Punjab,146.0
942,Chennai Super Kings,Mumbai Indians,Chennai Super Kings,146.0
948,Kings XI Punjab,Chennai Super Kings,Chennai Super Kings,146.0
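The output above shows a margin of 146.0 on every row, although the same rows had margins such as 14.0 and 2.0 when `runs` was displayed in cell 55. This suggests the `Margin` column of `runs` was overwritten in one of the cells not shown here (the numbering jumps from 55 to 67). A minimal sketch, on invented rows, of selecting the largest run margin from a fresh copy of the filtered data:

```python
import pandas as pd

# Invented rows; "A" to "D" are placeholder team names
toy = pd.DataFrame({
    "WinningTeam": ["A", "B", "C", "D"],
    "WonBy":       ["Runs", "Wickets", "Runs", "Runs"],
    "Margin":      [14.0, 10.0, 146.0, 33.0],
})

# .copy() makes the subset independent of the original frame
runs = toy[toy["WonBy"] == "Runs"].copy()

# Single row with the largest margin (first one if several rows tie)
biggest_win = runs.loc[runs["Margin"].idxmax()]

# All rows that share the largest margin
all_biggest = runs[runs["Margin"] == runs["Margin"].max()]

print(biggest_win["WinningTeam"], biggest_win["Margin"])
print(all_biggest[["WinningTeam", "Margin"]])
```

Rebuilding `runs` from `df` before the comparison keeps the result independent of any earlier cell that modified it.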
