# IPL Dataset Analysis

## Problem Statement
An IPL match raises several questions, especially for someone with limited knowledge of cricket, the game on which it is based. This analysis explores which factors led one team to win the match and how much they mattered.

## About the Dataset
The Indian Premier League (IPL) is a professional T20 cricket league in India contested during April-May of every year by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all the sports leagues. It has teams with players from around the world and is very competitive and entertaining with a lot of close matches between teams.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself; exploring new sources of data is one of the more interesting things a data scientist gets to do.

### Analysing data with basic Python operations

## Read the data from the .yaml file

In [1]:
import yaml

In [2]:
# Open the file with a context manager and parse the YAML into Python objects
with open('../data/ipl_match.yaml') as f:
    data = yaml.load(f, Loader=yaml.FullLoader)

In [3]:
import os
os.path.abspath('../')

'/home/sahuja4/Work/Learning/git_repos/Basecamp/Day_2_IPL Dataset analysis using basic python constructs'

In [4]:
data
type(data)

dict

Now let's find answers to some preliminary questions.

### Can you guess the data type you're working with?

In [5]:
type(data)

dict
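
Since `data` is a dictionary, one way to get a feel for it is to list its keys along with the type of each value. The following is a minimal sketch; the `summarize` helper and the miniature `sample` dictionary are illustrative assumptions, not part of the real file, which has many more keys.

```python
def summarize(node, depth=0, max_depth=2):
    """Return indented lines describing the keys and value types of nested dicts."""
    lines = []
    if isinstance(node, dict) and depth < max_depth:
        for key, value in node.items():
            lines.append(f"{'  ' * depth}{key}: {type(value).__name__}")
            # Recurse only into nested dictionaries, up to max_depth levels
            lines.extend(summarize(value, depth + 1, max_depth))
    return lines

# Hypothetical miniature of the match dictionary, for illustration only
sample = {
    'info': {
        'city': 'Bangalore',
        'teams': ['Royal Challengers Bangalore', 'Kolkata Knight Riders'],
    },
    'innings': [],
}

print('\n'.join(summarize(sample)))
```

Calling `summarize(data)` on the loaded match should give a similar outline of the real structure before you start indexing into it.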

### In which city was the match played, and at which venue?

In [15]:
data.keys()
print(f"City of the match played is {data['info']['city']}")

City of the match played is Bangalore


In [14]:
print(f"Venue of the match played is {data['info']['venue']}")

Venue of the match played is M Chinnaswamy Stadium


### Which teams played in the match? How many teams participated in total?

In [19]:
data['info']['teams']
team1, team2 = data['info']['teams']
print(team2)

Kolkata Knight Riders


In [20]:
print(f'Teams that participated: {team1} and {team2}')
print(f"A total of {len(data['info']['teams'])} teams played in the match")

Teams that participated: Royal Challengers Bangalore and Kolkata Knight Riders
A total of 2 teams played in the match


### Which team won the toss, and what did the toss winner decide?

In [22]:
data['info']['toss']

{'decision': 'field', 'winner': 'Royal Challengers Bangalore'}

In [24]:
print(f"{data['info']['toss']['winner']} won the toss and decided to {data['info']['toss']['decision']} first.")

Royal Challengers Bangalore won the toss and decided to field first.


### Who bowled the first ball of the first innings, and which batsman faced it?

In [45]:
bat = data['innings'][0]['1st innings']['deliveries'][0][0.1]['batsman']
bowl = data['innings'][0]['1st innings']['deliveries'][0][0.1]['bowler']

In [46]:
print(f'The first ball of the match was bowled by {bowl} and faced by {bat}')

The first ball of the match was bowled by P Kumar and faced by SC Ganguly
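
The lookup above relies on knowing that the first delivery is keyed by the float `0.1`. Each delivery is a dictionary with a single key, so it can also be unpacked without hard-coding that key. The sketch below assumes this one-key layout; `unpack_delivery` is a hypothetical helper name, and `first` reproduces the first delivery of this match.

```python
def unpack_delivery(delivery):
    """Split a single-key delivery dict into (ball_number, details)."""
    ball, details = next(iter(delivery.items()))
    return ball, details

# First delivery of the first innings of this match
first = {
    0.1: {
        'batsman': 'SC Ganguly',
        'bowler': 'P Kumar',
        'extras': {'legbyes': 1},
        'non_striker': 'BB McCullum',
        'runs': {'batsman': 0, 'extras': 1, 'total': 1},
    }
}

ball, details = unpack_delivery(first)
print(ball, details['bowler'], details['batsman'])
```

The same helper works for any delivery in the list, which makes it handy inside loops.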


### How many deliveries were bowled in the first innings?

In [50]:
dely = len(data['innings'][0]['1st innings']['deliveries'])

In [51]:
# Does this come out to more than 120? If yes, why?
print(f'The first innings consisted of {dely} deliveries.')

The first innings consisted of 124 deliveries.
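
A T20 innings has at most 20 overs of 6 legal balls each, i.e. 120 legal deliveries. Wides and no-balls do not count as legal deliveries and have to be bowled again, which is one way the count can exceed 120. The sketch below counts only legal deliveries. It assumes that no-balls appear under the key `'noballs'` in `extras` (no no-ball is visible in this notebook, so that key name is an assumption), and `sample_over` is modelled on the first over of this match, trimmed to the `extras` field.

```python
def count_legal(deliveries):
    """Count deliveries that are neither wides nor no-balls."""
    legal = 0
    for delivery in deliveries:
        # Each delivery is a one-key dict: {ball_number: details}
        details = next(iter(delivery.values()))
        extras = details.get('extras', {})
        # 'noballs' is an assumed key name; 'wides' appears in this match's data
        if 'wides' not in extras and 'noballs' not in extras:
            legal += 1
    return legal

# First over of the match, keeping only the 'extras' information
sample_over = [
    {0.1: {'extras': {'legbyes': 1}}},
    {0.2: {}},
    {0.3: {'extras': {'wides': 1}}},
    {0.4: {}},
    {0.5: {}},
    {0.6: {}},
    {0.7: {'extras': {'legbyes': 1}}},
]

print(len(sample_over), count_legal(sample_over))
```

Here seven deliveries were bowled but only six were legal, because the wide had to be re-bowled. Running `count_legal` on the full innings list would show how many of its deliveries were legal.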


### How many deliveries were bowled in the second innings?

In [56]:
dely2 = len(data['innings'][1]['2nd innings']['deliveries'])

In [57]:
# Is this less than or greater than 120? What do you think explains it?
print(f'The second innings consisted of {dely2} deliveries.')

The second innings consisted of 101 deliveries.
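
An innings can end before 120 legal deliveries when the batting side loses all ten wickets or when the chasing side passes the target. One way to investigate is to count how many wickets fell. The sketch below assumes that a dismissal is recorded under a `'wicket'` key inside the delivery details; that key is not shown anywhere in this notebook, so treat it as an assumption to verify against the data. The sample deliveries and player names are hypothetical.

```python
def count_wickets(deliveries):
    """Count deliveries whose details contain a 'wicket' entry (assumed key)."""
    return sum(1 for d in deliveries if 'wicket' in next(iter(d.values())))

# Hypothetical deliveries, for illustration only
sample = [
    {0.1: {'batsman': 'Batter A', 'bowler': 'Bowler X'}},
    {0.2: {'batsman': 'Batter A', 'bowler': 'Bowler X',
           'wicket': {'kind': 'bowled', 'player_out': 'Batter A'}}},
    {0.3: {'batsman': 'Batter B', 'bowler': 'Bowler X'}},
]

print(count_wickets(sample))
```

If `count_wickets` returns 10 for the second innings, the side was bowled out, which would explain the short innings.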


### Which team won, and by what margin?


In [70]:
data['info']['outcome']
data['info']['outcome']['by'].keys()

dict_keys(['runs'])

In [72]:
# Check whether the students' guess was right: did they infer the result correctly?
winner = data['info']['outcome']['winner']
# 'by' holds a single key naming the margin type (here 'runs')
by = next(iter(data['info']['outcome']['by']))
val = data['info']['outcome']['by'][by]
print(f"{winner} won the match by {val} {by}")

Kolkata Knight Riders won the match by 140 runs
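
The same logic can be wrapped in a small function so it also copes with matches decided by a different margin type or with no winner at all. This is a sketch under assumptions: `describe_outcome` is a hypothetical helper, and the `'wickets'` margin and the `'result'` key for ties or abandoned matches are assumed field names that do not appear in this match's data.

```python
def describe_outcome(outcome):
    """Build a one-line summary from an 'outcome' dictionary."""
    winner = outcome.get('winner')
    by = outcome.get('by')
    if winner is None or not by:
        # Assumed: matches without a winner carry a 'result' entry instead
        return f"No winner: {outcome.get('result', 'unknown result')}"
    # 'by' is expected to hold one entry, e.g. {'runs': 140}
    margin_type, margin = next(iter(by.items()))
    return f"{winner} won the match by {margin} {margin_type}"

# Outcome of this match
print(describe_outcome({'by': {'runs': 140}, 'winner': 'Kolkata Knight Riders'}))
```

A win by runs means the side batting first won; a win by wickets would mean the chasing side reached the target.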


### Homework: find the total extras and the extra runs per extra category

In [133]:
deliveries_extras_cat_1 = []
deliveries_extras_cat_2 = []
deliveries_extras_cat = []
total_extras_runs_1 = 0
total_extras_runs_2 = 0
total_extras_runs = 0

# Get Innings 1 Deliveries
deliveries_1 = data['innings'][0]['1st innings']['deliveries']
# Get Innings 2 Deliveries
deliveries_2 = data['innings'][1]['2nd innings']['deliveries']

# Get Innings 1 Extras (each delivery is a one-key dict, so take its only value)
deliveries_1_extras = [next(iter(d.values()))['extras'] for d in deliveries_1 if 'extras' in next(iter(d.values()))]
# Get Innings 2 Extras
deliveries_2_extras = [next(iter(d.values()))['extras'] for d in deliveries_2 if 'extras' in next(iter(d.values()))]

# Get Innings 1 extras categories into deliveries_extras_cat_1
# And sum of extra runs to total_extras_runs_1
for ex in deliveries_1_extras:
    deliveries_extras_cat_1.extend(list(ex.keys()))
    total_extras_runs_1 = total_extras_runs_1 + sum(list(ex.values()))
    
# Get Innings 2 extras categories into deliveries_extras_cat_2
# And sum of extra runs to total_extras_runs_2
for ex in deliveries_2_extras:
    deliveries_extras_cat_2.extend(list(ex.keys()))
    total_extras_runs_2 = total_extras_runs_2 + sum(list(ex.values()))

# Create Innings 1 Extras Unique Category Dictionary deliveries_extras_cat_d1 from deliveries_extras_cat_1
deliveries_extras_cat_d1 = dict.fromkeys(sorted(deliveries_extras_cat_1))

# Create Innings 2 Extras Unique Category Dictionary deliveries_extras_cat_d2 from deliveries_extras_cat_2
deliveries_extras_cat_d2 = dict.fromkeys(sorted(deliveries_extras_cat_2))

# Merge Innings 1 and 2 extras category to deliveries_extras_cat
deliveries_extras_cat.extend(deliveries_extras_cat_1)
deliveries_extras_cat.extend(deliveries_extras_cat_2)

# Create Combined Unique Category Dictionary from deliveries_extras_cat
deliveries_extras_cat_d = dict.fromkeys(sorted(deliveries_extras_cat))

# Sum of total extras runs from both innings
total_extras_runs = total_extras_runs_1 + total_extras_runs_2

# Add the runs against each category for Innings 1 (and the combined totals)
for ex in deliveries_1_extras:
    for k in list(ex.keys()):
        deliveries_extras_cat_d1[k] = (0 if deliveries_extras_cat_d1[k] is None else deliveries_extras_cat_d1[k]) + ex[k]
        deliveries_extras_cat_d[k] = (0 if deliveries_extras_cat_d[k] is None else deliveries_extras_cat_d[k]) + ex[k]

# Add the runs against each category for Innings 2 (and the combined totals)
for ex in deliveries_2_extras:
    for k in list(ex.keys()):
        deliveries_extras_cat_d2[k] = (0 if deliveries_extras_cat_d2[k] is None else deliveries_extras_cat_d2[k]) + ex[k]
        deliveries_extras_cat_d[k] = (0 if deliveries_extras_cat_d[k] is None else deliveries_extras_cat_d[k]) + ex[k]

# Print Output
print('Innings 1 Extras:')
print('-'*50)
for k, v in deliveries_extras_cat_d1.items():
    print(f'{k}: {v}')
print('-'*50)
print(f'Total: {total_extras_runs_1}')

print('\n')

print('Innings 2 Extras:')
print('-'*50)
for k, v in deliveries_extras_cat_d2.items():
    print(f'{k}: {v}')
print('-'*50)
print(f'Total: {total_extras_runs_2}')

print('\n')

print('Total Extras:')
print('-'*50)
for k, v in deliveries_extras_cat_d.items():
    print(f'{k}: {v}')
print('-'*50)
print(f'Total: {total_extras_runs}')

Innings 1 Extras:
--------------------------------------------------
byes: 4
legbyes: 4
wides: 9
--------------------------------------------------
Total: 17


Innings 2 Extras:
--------------------------------------------------
legbyes: 8
wides: 11
--------------------------------------------------
Total: 19


Total Extras:
--------------------------------------------------
byes: 4
legbyes: 12
wides: 20
--------------------------------------------------
Total: 36
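
The same per-category totals can be computed more compactly with `collections.Counter` from the standard library. This is an alternative sketch, not the solution used above; `extras_by_category` is a hypothetical helper, and `first_over` is the first over of this match trimmed to its `extras` field.

```python
from collections import Counter

def extras_by_category(deliveries):
    """Sum extra runs per category over a list of one-key delivery dicts."""
    totals = Counter()
    for delivery in deliveries:
        details = next(iter(delivery.values()))
        # Counter.update adds the runs of each category to the running total
        totals.update(details.get('extras', {}))
    return totals

# First over of the match, keeping only the 'extras' information
first_over = [
    {0.1: {'extras': {'legbyes': 1}}},
    {0.2: {}},
    {0.3: {'extras': {'wides': 1}}},
    {0.4: {}},
    {0.5: {}},
    {0.6: {}},
    {0.7: {'extras': {'legbyes': 1}}},
]

totals = extras_by_category(first_over)
for category, runs in sorted(totals.items()):
    print(f'{category}: {runs}')
print(f'Total: {sum(totals.values())}')
```

Because `Counter` objects support addition, the combined figure for both innings would be `extras_by_category(deliveries_1) + extras_by_category(deliveries_2)`.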


In [140]:
for idx, delivery in enumerate(data['innings'][0]['1st innings']['deliveries']):
    print(idx, delivery)

0 {0.1: {'batsman': 'SC Ganguly', 'bowler': 'P Kumar', 'extras': {'legbyes': 1}, 'non_striker': 'BB McCullum', 'runs': {'batsman': 0, 'extras': 1, 'total': 1}}}
1 {0.2: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}
2 {0.3: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'extras': {'wides': 1}, 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extras': 1, 'total': 1}}}
3 {0.4: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}
4 {0.5: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}
5 {0.6: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}
6 {0.7: {'batsman': 'BB McCullum', 'bowler': 'P Kumar', 'extras': {'legbyes': 1}, 'non_striker': 'SC Ganguly', 'runs': {'batsman': 0, 'extra