In [None]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# read dataset
matches = pd.read_csv("matches.csv")
deli = pd.read_csv("deliveries.csv")

# Question 1:

In [None]:
# all matches between MI and KXIP
mi_kxip = matches.query("team1.isin(['Mumbai Indians', 'Kings XI Punjab']) and team2.isin(['Mumbai Indians', 'Kings XI Punjab'])")

In [None]:
# victories of the two teams against each other 
mi_kxip.winner.value_counts()
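
The head-to-head filter above can also be written with plain boolean masks, which keeps method calls out of the `query` string. This is a minimal sketch that assumes `matches.csv` has the `team1`, `team2` and `winner` columns used above. The rows below are made-up toy data, named `toy_matches` so the notebook's `matches` is not overwritten.

```python
import pandas as pd

# Toy stand-in for matches.csv (made-up rows, for illustration only)
toy_matches = pd.DataFrame({
    "team1": ["Mumbai Indians", "Kings XI Punjab", "Mumbai Indians", "Chennai Super Kings"],
    "team2": ["Kings XI Punjab", "Mumbai Indians", "Chennai Super Kings", "Kings XI Punjab"],
    "winner": ["Mumbai Indians", "Kings XI Punjab", "Mumbai Indians", "Kings XI Punjab"],
})

teams = ["Mumbai Indians", "Kings XI Punjab"]

# A match is a head-to-head fixture when both sides are in `teams`
# (a team never plays itself, so this selects exactly MI vs KXIP games)
mask = toy_matches["team1"].isin(teams) & toy_matches["team2"].isin(teams)
head_to_head = toy_matches[mask]

# Wins per side; dropna=False keeps no-result matches (NaN winner) visible
wins = head_to_head["winner"].value_counts(dropna=False)
print(wins)
```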

In [None]:
# matches of KXIP
kxip = matches.query("team1 == 'Kings XI Punjab' or team2 == 'Kings XI Punjab'")
len(kxip)

In [None]:
# count of winners in KXIP matches
kxip.winner.value_counts()

In [None]:
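# KXIP win ratio (wins / matches played), computed from the data instead of hard-coded counts
(kxip.winner == 'Kings XI Punjab').sum() / len(kxip)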

In [None]:
# matches of MI
mi = matches.query("team1 == 'Mumbai Indians' or team2 == 'Mumbai Indians'")
len(mi)

In [None]:
# count of winners in MI matches
mi.winner.value_counts()

In [None]:
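# MI win ratio (wins / matches played), computed from the data instead of hard-coded counts
(mi.winner == 'Mumbai Indians').sum() / len(mi)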
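
The two win-ratio cells can be folded into one reusable helper. This is a sketch, not the notebook's own method: `win_ratio` is a hypothetical name, it assumes the `team1`/`team2`/`winner` columns of `matches.csv`, and it counts no-result matches as non-wins (the same convention as dividing by every match played). The data is made-up.

```python
import pandas as pd

def win_ratio(matches: pd.DataFrame, team: str) -> float:
    """Share of `team`'s matches that it won (hypothetical helper; no-results count as non-wins)."""
    played = matches[(matches["team1"] == team) | (matches["team2"] == team)]
    if played.empty:
        return float("nan")
    return (played["winner"] == team).mean()

# Toy data (made-up), standing in for matches.csv; None marks a no-result
toy_matches = pd.DataFrame({
    "team1": ["Mumbai Indians", "Kings XI Punjab", "Mumbai Indians", "Delhi Capitals"],
    "team2": ["Kings XI Punjab", "Mumbai Indians", "Delhi Capitals", "Kings XI Punjab"],
    "winner": ["Mumbai Indians", "Mumbai Indians", None, "Kings XI Punjab"],
})

for team in ["Mumbai Indians", "Kings XI Punjab"]:
    print(team, round(win_ratio(toy_matches, team), 3))
```

In the notebook this would be called as `win_ratio(matches, 'Kings XI Punjab')`.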

## Explanation:

MI have performed better than KXIP over IPL history, winning 109 of their 187 matches (about 58%) against KXIP's 82 of 176 (about 47%). In the current season too, MI top the table while KXIP are at the bottom, and MI won the previous meeting between the two teams this season. Hence, MI can be expected to win tomorrow's game as well.

# Question 2:

In [None]:
# all dismissals of Quinton de Kock
de_kock = deli[deli.player_dismissed == 'Q de Kock']

In [None]:
# spinners that he was dismissed by
spin = ["KV Sharma", "PV Tambe", "YS Chahal", "S Kaushik", "KH Pandya", "PP Chawla", "R Ashwin", "D Short",
       "Kuldeep Yadav", "M Ali", "SP Narine", "Harbhajan Singh", "S Gopal"]

In [None]:
# seamers that he was dismissed by
seamer = ["AB Dinda", "P Kumar", "MM Sharma", "CJ Anderson", "UT Yadav", "DW Steyn", "MA Starc", "IC Pandey",
         "AD Russell", "SR Watson", "MJ McClenaghan", "DS Kulkarni", "MP Stoinis", "MC Henriques",
         "BB Sran", "DJ Bravo", "I Sharma", "Mohammed Shami", "S Kaul", "DL Chahar", "J Archer", "P Krishna",
         "SN Thakur"]

In [None]:
# number of times he was dismissed by a spinner
len(de_kock[de_kock.bowler.isin(spin)])

In [None]:
# number of times he was dismissed by a seamer
len(de_kock[de_kock.bowler.isin(seamer)])

In [None]:
# overs that he has been dismissed in and their frequency
de_kock.over.value_counts().sort_index()
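
The over-by-over counts can be grouped into match phases with `pd.cut`. In this sketch the phase boundaries are an assumption (powerplay 1-6, middle 7-15, death 16-20), and the over numbers are made-up but 1-indexed, as in `deliveries.csv` where Question 3 treats overs 1-6 as the powerplay.

```python
import pandas as pd

# Toy over numbers (1-indexed, made-up) at which a batsman was dismissed
overs = pd.Series([1, 2, 2, 4, 6, 9, 14, 18])

# Assumed phase boundaries; bins are right-closed: (0, 6], (6, 15], (15, 20]
phase = pd.cut(overs, bins=[0, 6, 15, 20], labels=["powerplay", "middle", "death"])

# Dismissals per phase, kept in phase order rather than sorted by count
print(phase.value_counts(sort=False))
```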

## Explanation:

Quinton de Kock plays as an opening batsman, so he can be expected to be dismissed within the first 6 overs most of the time. These are the powerplay overs of a T20 innings, which are generally bowled by seamers, and accordingly he has been dismissed by seamers more often than by spinners. KXIP's two main seamers this year, Mohammed Shami and Sheldon Cottrell, have each dismissed him once: Shami in a previous season and Cottrell in the current one. Thus, given the position he bats in, he can be expected to be dismissed by a seamer.
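
Because `deli.bowler` records whoever bowled the delivery, the spinner and seamer counts above also include run outs, which are not credited to the bowler. The sketch below drops such dismissals and classifies bowlers through a dictionary, so bowlers missing from both lists show up as NaN instead of being silently ignored. The rows and the shortened `spin_sample`/`seam_sample` lists are made-up, and the set of non-bowler dismissal kinds is an assumption based on the usual cricket rules.

```python
import pandas as pd

# Toy dismissals of one batsman (made-up rows standing in for deliveries.csv)
dismissals = pd.DataFrame({
    "bowler": ["UT Yadav", "YS Chahal", "DW Steyn", "R Ashwin", "MM Sharma"],
    "dismissal_kind": ["caught", "lbw", "bowled", "run out", "caught"],
})

# Shortened, illustrative versions of the notebook's `spin` and `seamer` lists
spin_sample = ["YS Chahal", "R Ashwin"]
seam_sample = ["UT Yadav", "DW Steyn", "MM Sharma"]
bowler_type = {**{b: "spin" for b in spin_sample}, **{b: "seam" for b in seam_sample}}

# Assumed non-bowler dismissal kinds: these are not credited to the bowler, so drop them
not_bowler = ["run out", "retired hurt", "obstructing the field"]
credited = dismissals[~dismissals["dismissal_kind"].isin(not_bowler)]

# Map each bowler to a type; bowlers in neither list become NaN and are counted separately
types = credited["bowler"].map(bowler_type)
print(types.value_counts(dropna=False))
```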

# Question 3:

In [None]:
# all deliveries faced by KXIP
kxip_bat = deli.query("batting_team == 'Kings XI Punjab'")

In [None]:
# KXIP deliveries in overs 1-6 (the powerplay)
kxip_1_6 = kxip_bat.query("over <= 6")

In [None]:
# KXIP wickets per match in overs 1-6 (matches with no wicket count as 0)
wickets_ipl = kxip_1_6.groupby("match_id").player_dismissed.count()
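
Counting wickets per match after filtering down to dismissal rows silently drops every match in which no wicket fell, which biases the mean and median upward. The toy sketch below (made-up rows) contrasts filtering first with counting non-null `player_dismissed` values inside each match group, since `count()` skips nulls but still reports groups that contain only nulls.

```python
import pandas as pd

# Toy powerplay deliveries for three matches (made-up); match 2 has no wicket
pp = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2, 3, 3],
    "player_dismissed": ["A", None, "B", None, None, "C", None],
})

# Filtering first: match 2 disappears, so zero-wicket matches are never counted
filtered_first = pp[pp["player_dismissed"].notnull()].groupby("match_id")["player_dismissed"].count()

# Grouping all deliveries and counting non-null values keeps match 2 with a 0
per_match = pp.groupby("match_id")["player_dismissed"].count()

print(filtered_first)
print(per_match)
```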

In [None]:
# histogram
sns.countplot(x=wickets_ipl)
plt.title("Wickets lost per match by KXIP in IPL history in 1-6 overs")
plt.xlabel("No. of wickets")
plt.ylabel("Frequency")
plt.show()

In [None]:
# measures of central tendency
print(wickets_ipl.mean())
print(wickets_ipl.median())

In [None]:
# distribution of KXIP wickets lost in overs 1-6 against MI
kxip_1_6.query("bowling_team == 'Mumbai Indians'").groupby("match_id").player_dismissed.count().value_counts()

## Explanation:

Throughout this IPL season, KXIP have lost 0-1 wickets in the first 6 overs when batting first, but 2-3 wickets when batting second. Data from previous seasons show that against MI, KXIP usually lose a single wicket in this phase. Thus, whether KXIP bat first (following this season's trend) or second (assuming they have worked on this weakness), they can be expected to lose 0-1 wickets in the first 6 overs of today's game.
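
The batting-first versus batting-second comparison can be computed directly by grouping on the innings as well as the match. This sketch assumes the `inning` column of `deliveries.csv` is 1 for the side batting first and 2 for the chasing side; the rows are made-up.

```python
import pandas as pd

# Toy powerplay deliveries (made-up); assumed: inning 1 = batting first, 2 = batting second
pp = pd.DataFrame({
    "match_id": [1, 1, 2, 2, 3, 3],
    "inning":   [1, 1, 2, 2, 1, 1],
    "player_dismissed": [None, None, "X", "Y", "Z", None],
})

# Wickets per (inning, match), keeping matches with zero wickets
per_match = pp.groupby(["inning", "match_id"])["player_dismissed"].count()

# Median powerplay wickets when batting first vs batting second
print(per_match.groupby(level="inning").median())
```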

# Question 4:

In [None]:
# all deliveries faced by KL Rahul
rahul = deli[deli.batsman == "KL Rahul"]
# select his dismissals by who was out, not by who was on strike
rahul_out = deli[deli.player_dismissed == "KL Rahul"]

In [None]:
# barplot
sns.countplot(x=rahul_out.dismissal_kind)
plt.title("Dismissals of KL Rahul")
plt.xticks(rotation=90)
plt.show()

In [None]:
# barplot
sns.countplot(x=rahul_out.query("bowling_team == 'Mumbai Indians'").dismissal_kind)
plt.title("Dismissals of KL Rahul against MI")
plt.xticks(rotation=90)
plt.show()

In [None]:
# barplot of his 20 most recent dismissals (assumes rows are in chronological order)
sns.countplot(x=rahul_out.dismissal_kind.tail(20))
plt.title("Last 20 dismissals of KL Rahul")
plt.xticks(rotation=90)
plt.show()

## Explanation:

KL Rahul is currently the highest run scorer of this IPL season. His team's middle order has not performed well this season, which has left KXIP at the bottom of the points table. Hence, also taking his role as captain into account, KL Rahul is likely to attack in order to score as many runs as possible, and so to be caught out in the process.
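
Filtering on `batsman` picks up every wicket that falls while Rahul is on strike, including his partner being run out, and misses Rahul being run out as the non-striker. Selecting by `player_dismissed` avoids both, and `value_counts(normalize=True)` turns the counts into shares, a more direct check of the claim that being caught is his likeliest dismissal. The rows below are made-up.

```python
import pandas as pd

# Toy deliveries (made-up). Row 2: the non-striker is run out while KL Rahul is on strike;
# row 3: KL Rahul is run out as the non-striker
deliveries = pd.DataFrame({
    "batsman": ["KL Rahul", "KL Rahul", "KL Rahul", "MA Agarwal"],
    "player_dismissed": ["KL Rahul", "KL Rahul", "MA Agarwal", "KL Rahul"],
    "dismissal_kind": ["caught", "caught", "run out", "run out"],
})

# Select dismissals by who was out, not by who was on strike
rahul_out_toy = deliveries[deliveries["player_dismissed"] == "KL Rahul"]

# Proportion of each dismissal kind
shares = rahul_out_toy["dismissal_kind"].value_counts(normalize=True)
print(shares)
```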

# Question 5:

In [None]:
# wickets lost per match by KXIP (matches with no wicket lost count as 0)
wickets_ipl_kxip = kxip_bat.groupby("match_id").player_dismissed.count()

In [None]:
# histogram
plt.hist(wickets_ipl_kxip, bins=[0, 3, 6, 9, 10]);
plt.title("Wickets lost per match by KXIP in IPL history")
plt.xlabel("No. of wickets")
plt.ylabel("Frequency")
plt.show()
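
`plt.hist` bins values with the same rules as `np.histogram`: every bin is half-open except the last, which also includes its right edge. With the edges `[0, 3, 6, 9, 10]`, the first three bars each cover three wicket counts, while the last bar covers only 9 and 10 wickets. The values below are made-up.

```python
import numpy as np

# Toy wickets-per-match values (made-up)
wickets = np.array([0, 2, 3, 5, 6, 9, 10])

# Same edges as the plots here: [0, 3), [3, 6), [6, 9), [9, 10]
counts, edges = np.histogram(wickets, bins=[0, 3, 6, 9, 10])
print(counts)  # [2 2 1 2]
```

Passing `bins=np.arange(0, 12)` would instead give one bar per wicket count from 0 to 10.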

In [None]:
# measures of central tendency
print(wickets_ipl_kxip.mean())
print(wickets_ipl_kxip.median())

In [None]:
# all deliveries faced by MI
mi_bat = deli.query("batting_team == 'Mumbai Indians'")

In [None]:
# wickets lost per match by MI (matches with no wicket lost count as 0)
wickets_ipl_mi = mi_bat.groupby("match_id").player_dismissed.count()

In [None]:
# histogram
plt.hist(wickets_ipl_mi, bins=[0, 3, 6, 9, 10]);
plt.title("Wickets lost per match by MI in IPL history")
plt.xlabel("No. of wickets")
plt.ylabel("Frequency")
plt.show()

In [None]:
# measures of central tendency
print(wickets_ipl_mi.mean())
print(wickets_ipl_mi.median())

In [None]:
# all deliveries in MI vs KXIP matches
mi_kxip_deli = deli.query("batting_team.isin(['Mumbai Indians', 'Kings XI Punjab']) and bowling_team.isin(['Mumbai Indians', 'Kings XI Punjab'])")

In [None]:
# total wickets per match (both innings)
wickets_mi_kxip = mi_kxip_deli.groupby("match_id").player_dismissed.count()

In [None]:
# histogram
plt.hist(wickets_mi_kxip, bins=[0, 5, 10, 15, 20]);
plt.title("Total wickets per MI vs KXIP match in IPL history")
plt.xlabel("No. of wickets")
plt.ylabel("Frequency")
plt.show()

In [None]:
# measures of central tendency
print(wickets_mi_kxip.mean())
print(wickets_mi_kxip.median())

## Explanation:

Individually, both teams tend to lose about 6 wickets in an IPL match. The same holds for MI vs KXIP matches, where about 12 wickets fall in total per match. Considering that the Dubai venue offers a bowling-friendly pitch, 11-15 wickets can be expected to fall in total in today's match.
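
One way to check how often the predicted 11-15 range has occurred before is the share of past head-to-head matches whose total fell inside it. This sketch uses a made-up Series named `toy_totals` in place of `wickets_mi_kxip`, and it does not account for venue effects such as the Dubai pitch.

```python
import pandas as pd

# Toy total wickets per MI vs KXIP match (made-up), standing in for wickets_mi_kxip
toy_totals = pd.Series([9, 11, 12, 12, 14, 16, 13, 10])

# Share of past matches whose total fell in the predicted 11-15 range
in_range = toy_totals.between(11, 15)  # inclusive on both ends by default
print(f"{in_range.mean():.2f} of matches had 11-15 wickets")
```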