# IPL Dataset Analysis

## Problem Statement
We want to understand what happens during an IPL match, which raises several questions for anyone with only limited knowledge of cricket, the game it is based on. This analysis looks at which factors led a team to win and how much each of them mattered.

## About the IPL
The Indian Premier League (IPL) is a professional T20 cricket league in India, contested every year in April and May by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all sports leagues. Its teams include players from around the world, and the league is competitive and entertaining, with many close matches.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself: exploring new sources of data is one of the more interesting parts of a data scientist's work.

## About the dataset
The dataset you will be working on has 1452 data points (rows) and 23 features (columns):

|Feature|Description|
|-----|-----|
|match_code|Code identifying the individual match|
|date|Date on which the match was played|
|city|City where the match was played|
|team1|First team in the match|
|team2|Second team in the match|
|toss_winner|Team that won the toss|
|toss_decision|Decision taken by the toss winner (bat or field)|
|winner|Team that won the match|
|win_type|How the team won (by wickets, runs, etc.)|
|win_margin|Margin of victory (number of wickets or runs)|
|inning|Innings number (1st or 2nd)|
|delivery|Delivery number within the innings, written as over.ball (e.g. 3.2)|
|batting_team|Team currently batting|
|batsman|Batsman on strike|
|non_striker|Batsman at the non-striker's end|
|bowler|Current bowler|
|runs|Runs scored off the bat on that delivery|
|extras|Extra runs on that delivery|
|total|Total runs on that delivery (runs + extras)|
|extras_type|Type of extra (wides, no-balls, leg-byes, etc.)|
|player_out|Player who got out on that delivery|
|wicket_kind|How the player got out (caught, bowled, lbw, run out, etc.)|
|wicket_fielders|Fielder who took the catch (for caught dismissals)|
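The code cells below refer to columns by position, for example column 11 for `delivery` and column 21 for `wicket_kind`. A minimal sketch, assuming the CSV columns appear in the same order as the table above, builds a name-to-index lookup once so that a slice like `data_ipl[:, COL["runs"]]` states which column it means. `FEATURES` and `COL` are hypothetical helper names, not part of the dataset.

```python
# Hypothetical helper: column names in the order listed in the feature table
# (assumed to match the column order of the CSV file).
FEATURES = [
    "match_code", "date", "city", "team1", "team2", "toss_winner",
    "toss_decision", "winner", "win_type", "win_margin", "inning",
    "delivery", "batting_team", "batsman", "non_striker", "bowler",
    "runs", "extras", "total", "extras_type", "player_out",
    "wicket_kind", "wicket_fielders",
]

# Map each column name to its positional index, e.g. COL["runs"] is 16.
COL = {name: idx for idx, name in enumerate(FEATURES)}

print(len(FEATURES))                                            # 23
print(COL["delivery"], COL["player_out"], COL["wicket_kind"])   # 11 20 21
```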


### Analysing the data using NumPy

### Read the data using NumPy

```python
import numpy as np

# Not every dataset comes as a CSV loaded with pandas, so it helps to know more
# than one way to read a file. np.genfromtxt reads delimited text (CSV here)
# directly into a NumPy array.
path = './data/ipl_matches_small.csv'

# skip_header=1 drops the header row; dtype=str keeps every field as text.
data_ipl = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=str)
data_ipl
```

```
array([['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ['392203', '2009-05-01', 'East London', ..., '', '', ''],
       ...,
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', ''],
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', ''],
       ['335987', '2008-04-21', 'Jaipur', ..., '', '', '']], dtype='<U21')
```
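If a file has a header row, `np.genfromtxt` can keep the column names instead of skipping them: `names=True` reads the field names from the header and `dtype=None` lets NumPy infer a type for each column, so numeric columns need no `astype()`. A minimal sketch on a tiny, made-up in-memory CSV (five of the columns, three rows) rather than the real file:

```python
import io

import numpy as np

# Made-up CSV with a header row, standing in for ipl_matches_small.csv.
csv_text = """match_code,city,team1,delivery,runs
392203,East London,Kolkata Knight Riders,0.1,1
392203,East London,Kolkata Knight Riders,0.2,4
335987,Jaipur,Rajasthan Royals,0.1,6
"""

# names=True takes field names from the header; dtype=None infers a type per column.
matches = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                        names=True, dtype=None, encoding="utf-8")

print(matches.dtype.names)    # ('match_code', 'city', 'team1', 'delivery', 'runs')
print(matches["runs"].sum())  # runs was inferred as an integer column
print(matches["city"][0])     # East London
```

The result is a structured array, so fields are accessed by name (`matches["city"]`) rather than by a column slice.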

```python
# Column 3 holds team1
data_ipl[:, 3]
```

```
array(['Kolkata Knight Riders', 'Kolkata Knight Riders',
       'Kolkata Knight Riders', ..., 'Rajasthan Royals',
       'Rajasthan Royals', 'Rajasthan Royals'], dtype='<U21')
```

### Calculate the number of unique matches in the dataset

```python
# Count the matches first, so that later statistics can be read in that context.

# Key keyword arguments of np.unique: return_index, return_inverse, return_counts, axis

match_code = data_ipl[:, 0]
unique_code = np.unique(match_code)

print(unique_code)
print('Number of unique matches:', len(unique_code))
```

```
['335987' '392197' '392203' '392212' '501226' '729297']
Number of unique matches: 6
```
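Each row of the dataset is one delivery, so passing `return_counts=True` to `np.unique` on the match codes also tells you how many deliveries were recorded per match. A minimal sketch on a made-up stand-in for `data_ipl[:, 0]`:

```python
import numpy as np

# Made-up stand-in for data_ipl[:, 0]: one match code per delivery (row).
match_code = np.array(["392203", "392203", "335987", "392203", "335987"])

# codes are sorted; deliveries[i] is how many rows carry codes[i].
codes, deliveries = np.unique(match_code, return_counts=True)
for code, n in zip(codes, deliveries):
    print(code, n)
# 335987 2
# 392203 3
```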


### Find the set of all unique teams that played in the matches in the data set.

```python
# Find out which teams played in the matches in this dataset.

# Columns 3 and 4 hold team1 and team2; np.unique flattens both into one sorted set.
teams = data_ipl[:, 3:5]

unique_teams = np.unique(teams)
print(unique_teams)
print('Number of unique teams:', len(unique_teams))
```

```
['Chennai Super Kings' 'Deccan Chargers' 'Kings XI Punjab'
 'Kolkata Knight Riders' 'Mumbai Indians' 'Pune Warriors'
 'Rajasthan Royals']
Number of unique teams: 7
```
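`np.unique` flattens its 2-D input, which is why slicing the `team1` and `team2` columns together works. An equivalent for exactly two columns is `np.union1d`, which returns the sorted unique values found in either array. A sketch on made-up team columns:

```python
import numpy as np

# Made-up stand-ins for data_ipl[:, 3] (team1) and data_ipl[:, 4] (team2).
team1 = np.array(["Kolkata Knight Riders", "Mumbai Indians", "Rajasthan Royals"])
team2 = np.array(["Mumbai Indians", "Deccan Chargers", "Kings XI Punjab"])

# union1d returns the sorted unique values present in either input.
teams = np.union1d(team1, team2)
print(teams)
print(len(teams))  # 5

# Same result as passing both columns to np.unique, which flattens them.
print(np.array_equal(teams, np.unique(np.stack([team1, team2], axis=1))))  # True
```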


### Find the sum of all extras across all deliveries in the dataset

```python
# Practise indexing and slicing: column 17 holds the extras on each delivery.
extras = data_ipl[:, 17]

# The array was loaded as strings, so convert to integers before summing.
extras_int = extras.astype(int)
extras_int.sum()
```

```
88
```
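To break the total down by match, one option is to combine `np.unique(..., return_inverse=True)`, which labels each row with its match's position in the sorted codes, with `np.bincount`, which sums a weight per label. A minimal sketch on made-up stand-ins for the `match_code` and `extras` columns:

```python
import numpy as np

# Made-up stand-ins for data_ipl[:, 0] (match_code) and data_ipl[:, 17] (extras).
match_code = np.array(["392203", "392203", "335987", "392203", "335987"])
extras = np.array(["1", "0", "2", "4", "0"]).astype(int)

# inverse[i] is the index of match_code[i] within the sorted `codes`.
codes, inverse = np.unique(match_code, return_inverse=True)

# bincount adds up the extras that fall into each match group (returns floats).
extras_per_match = np.bincount(inverse, weights=extras).astype(int)
print(dict(zip(codes.tolist(), extras_per_match.tolist())))
# {'335987': 2, '392203': 5}
```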

### Get the array of delivery numbers on which a player got out, along with the wicket type

```python
# Columns 11, 20 and 21: delivery, player_out, wicket_kind
subset = data_ipl[:, [11, 20, 21]]
print(subset)

# A wicket fell wherever wicket_kind is not blank
cond = subset[:, 2] != ''
print(cond)

subset[cond]

# Next step: frequency distribution of how players got out (caught, bowled, lbw, ...)
```

```
[['0.1' '' '']
 ['0.2' '' '']
 ['0.3' '' '']
 ...
 ['17.6' '' '']
 ['17.7' '' '']
 ['18.1' '' '']]
[False False False ... False False False]

array([['3.2', 'ST Jayasuriya', 'caught'],
       ['5.5', 'Harbhajan Singh', 'caught'],
       ['7.6', 'SR Tendulkar', 'caught'],
       ['11.4', 'AM Nayar', 'bowled'],
       ['15.6', 'GR Napier', 'caught'],
       ['18.6', 'AM Rahane', 'caught'],
       ['0.4', 'SC Ganguly', 'bowled'],
       ['2.2', 'CH Gayle', 'bowled'],
       ['14.5', 'MN van Wyk', 'caught'],
       ['17.2', 'LR Shukla', 'bowled'],
       ['18.6', 'BJ Hodge', 'run out'],
       ['19.3', 'BB McCullum', 'caught'],
       ['12.2', 'SR Tendulkar', 'lbw'],
       ['13.5', 'Harbhajan Singh', 'caught'],
       ['14.4', 'ST Jayasuriya', 'caught'],
       ['15.1', 'AM Nayar', 'run out'],
       ['16.6', 'DJ Bravo', 'caught'],
       ['18.5', 'S Dhawan', 'caught'],
       ['1.7', 'BB McCullum', 'caught'],
       ['2.7', 'CH Gayle', 'caught'],
       ['10.2', 'BJ Hodge', 'bowled'],
       ['12.1', 'SC Ganguly', 'caught'],
       ['12.3', 'AN Ghosh', 'caught'],
       ['13.2', 'Yashpal Singh', 'caught'],
       ...
```
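The comment at the end of the cell above asks for a frequency distribution of how players got out. One way is to run `np.unique` with `return_counts=True` on the `wicket_kind` column of the dismissal rows. A sketch using four rows copied from the output above plus one made-up delivery on which no wicket fell:

```python
import numpy as np

# Rows in the same layout as `subset`: delivery, player_out, wicket_kind
# ('' when no wicket fell). The first row is made up; the rest come from the output above.
subset = np.array([
    ["0.1", "", ""],
    ["3.2", "ST Jayasuriya", "caught"],
    ["11.4", "AM Nayar", "bowled"],
    ["12.2", "SR Tendulkar", "lbw"],
    ["15.6", "GR Napier", "caught"],
])

# Keep only deliveries on which a wicket fell.
dismissals = subset[subset[:, 2] != ""]

# Count each dismissal type.
kinds, counts = np.unique(dismissals[:, 2], return_counts=True)
for kind, n in zip(kinds, counts):
    print(kind, n)
# bowled 1
# caught 2
# lbw 1
```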

### In how many matches did `Mumbai Indians` win the toss?

```python
# Statistics for one particular team: rows where toss_winner (column 5) is Mumbai Indians
toss_cond = data_ipl[:, 5] == 'Mumbai Indians'
subset = data_ipl[toss_cond]    # subset the rows
print(toss_cond)
print(subset)

# Each match spans many rows, so count the unique match codes
unique_matches = np.unique(subset[:, 0])
print('Unique matches:', unique_matches)
print('Number of tosses won by MI:', len(unique_matches))
```

```
[False  True  True ... False False False]
[['392203' '2009-05-01' 'East London' ... '' '' '']
 ['392203' '2009-05-01' 'East London' ... '' '' '']
 ['392203' '2009-05-01' 'East London' ... '' '' '']
 ...
 ['392197' '2009-04-27' 'Port Elizabeth' ... '' '' '']
 ['392197' '2009-04-27' 'Port Elizabeth' ... 'BAW Mendis' 'bowled' '']
 ['392197' '2009-04-27' 'Port Elizabeth' ... 'AB Dinda' 'bowled' '']]
Unique matches: ['392197' '392203']
Number of tosses won by MI: 2
```
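The same approach can count tosses for every team at once. Because each match has exactly one toss winner repeated on all of its rows, a sketch can keep one row per match (via `return_index=True`) and then count toss winners. The rows below are made up, in a two-column (`match_code`, `toss_winner`) layout:

```python
import numpy as np

# Made-up rows: match_code, toss_winner (repeated on every delivery of a match).
rows = np.array([
    ["392203", "Mumbai Indians"],
    ["392203", "Mumbai Indians"],
    ["392197", "Mumbai Indians"],
    ["335987", "Rajasthan Royals"],
    ["335987", "Rajasthan Royals"],
])

# Index of the first row of each match, so every toss is counted once.
_, first_idx = np.unique(rows[:, 0], return_index=True)
toss_winners = rows[first_idx, 1]

teams, tosses_won = np.unique(toss_winners, return_counts=True)
print(dict(zip(teams.tolist(), tosses_won.tolist())))
# {'Mumbai Indians': 2, 'Rajasthan Royals': 1}
```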


### Filter the records where the batsman scored 6 runs, and find who hit the most sixes overall

This exercise identifies the most aggressive batsman, measured by the number of sixes hit.

```python
# Column 16 holds the runs off the bat; keep only deliveries hit for six
cond = data_ipl[:, 16].astype(int) == 6
sixes = data_ipl[cond]

# Count sixes per batsman (column 13)
unique_player, no_of_sixes = np.unique(sixes[:, 13], return_counts=True)

print(unique_player)
print(no_of_sixes)
print("-" * 50)

# argmax() would return only the first maximum, but SR Tendulkar and SR Watson
# are tied on 6 here, so list every player at the maximum.
max_sixes = no_of_sixes.max()
print('Most sixes by one player:', max_sixes)
print('Player(s) with the most sixes:', unique_player[no_of_sixes == max_sixes])
```

```
['AC Gilchrist' 'BJ Hodge' 'CH Gayle' 'DR Smith' 'DS Kulkarni'
 'Harbhajan Singh' 'IK Pathan' 'JD Ryder' 'JP Duminy' 'K Goel'
 'KC Sangakkara' 'Kamran Akmal' 'M Manhas' 'M Vijay' 'MEK Hussey'
 'MS Dhoni' 'PR Shah' 'R Bhatia' 'RA Jadeja' 'RG Sharma' 'RR Raje'
 'S Badrinath' 'SC Ganguly' 'SR Tendulkar' 'SR Watson' 'ST Jayasuriya'
 'SV Samson' 'TL Suman' 'Y Venugopal Rao' 'Yuvraj Singh']
[1 3 1 4 2 2 1 1 4 2 1 1 1 1 1 1 2 1 1 1 1 1 1 6 6 4 2 2 1 3]
--------------------------------------------------
Most sixes by one player: 6
Player(s) with the most sixes: ['SR Tendulkar' 'SR Watson']
```
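A pure-Python alternative for the count uses `collections.Counter`. Its `most_common()` method lists (name, count) pairs by descending count but does not flag ties, so they still need handling. A minimal sketch on a made-up list standing in for `sixes[:, 13]`:

```python
from collections import Counter

# Made-up stand-in for sixes[:, 13]: the batsman on each delivery hit for six.
six_hitters = ["SR Tendulkar", "SR Watson", "SR Tendulkar", "CH Gayle", "SR Watson"]

counts = Counter(six_hitters)
top = max(counts.values())

# Collect every batsman tied at the maximum instead of picking just one.
leaders = sorted(name for name, n in counts.items() if n == top)
print(counts.most_common(3))
print(top, leaders)  # 2 ['SR Tendulkar', 'SR Watson']
```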
