# IPL Dataset Analysis

## Problem Statement
We want to understand what happens during an IPL match. With only limited knowledge of cricket, the game on which the IPL is based, a match raises several questions. This analysis looks at which factors led a team to win and how much each of them mattered.

## About the Dataset
The Indian Premier League (IPL) is a professional T20 cricket league in India, contested every year during April and May by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all sports leagues. Its teams field players from around the world, and the league is highly competitive and entertaining, with many close matches.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself; finding and exploring new data sources is one of the more interesting parts of a data scientist's work.

## Dataset Snapshot
The data you will be working on has 1452 data points and 23 features:

|Feature|Description|
|-----|-----|
|match_code|Code identifying an individual match|
|date|Date on which the match was played|
|city|City where the match was played|
|team1|First team in the match|
|team2|Second team in the match|
|toss_winner|Team that won the toss|
|toss_decision|Decision taken by the toss winner (bat or field)|
|winner|Team that won the match|
|win_type|How the team won (by wickets, by runs, etc.)|
|win_margin|Margin of victory (in wickets or runs)|
|inning|Innings number (1st or 2nd)|
|delivery|Ball number within the innings, written as over.ball (e.g. 3.2)|
|batting_team|Team currently batting|
|batsman|Batsman on strike|
|non_striker|Batsman at the non-striker's end|
|bowler|Current bowler|
|runs|Runs scored off the bat on that delivery|
|extras|Extra runs conceded on that delivery|
|total|Total runs on that delivery (runs + extras)|
|extras_type|Type of extras (wides, no-balls, leg byes, etc.)|
|player_out|Player dismissed on that delivery|
|wicket_kind|How the player was dismissed|
|wicket_fielders|Fielder who took the catch that dismissed the player|
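The cells that follow select columns by position (3 for team1, 17 for extras, 21 for wicket_kind, and so on). A small name-to-index lookup built from the table above can make those positions easier to read. This is a minimal sketch that assumes the CSV columns appear in the same order as the table; `FEATURES` and `COL` are hypothetical names, not part of the original notebook.

```python
# Hypothetical helper: the 23 column names, ASSUMED to be in the same order
# as the feature table (and therefore as the CSV columns).
FEATURES = [
    "match_code", "date", "city", "team1", "team2", "toss_winner",
    "toss_decision", "winner", "win_type", "win_margin", "inning",
    "delivery", "batting_team", "batsman", "non_striker", "bowler",
    "runs", "extras", "total", "extras_type", "player_out",
    "wicket_kind", "wicket_fielders",
]

# Map each column name to its position so data_ipl[:, COL["extras"]] can replace data_ipl[:, 17]
COL = {name: idx for idx, name in enumerate(FEATURES)}

print(len(FEATURES))                                      # 23
print(COL["team1"], COL["extras"], COL["wicket_kind"])    # 3 17 21
```

These positions match the indices used in the cells below (for example 13 for batsman and 16 for runs).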


### Analysing the data using NumPy

### Read the data using NumPy

```python
import numpy as np

# Not all data arrives in a form pandas reads directly; genfromtxt reads any
# delimited text file, so the same approach carries over to other formats.
path = './data/ipl_matches_small.csv'
# skip_header=1 drops the header row; dtype=str loads every value as a string
data_ipl = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=str)
```

```python
# Column 3 holds team1
data_ipl[:, 3]
```

```
array(['Kolkata Knight Riders', 'Kolkata Knight Riders',
       'Kolkata Knight Riders', ..., 'Rajasthan Royals',
       'Rajasthan Royals', 'Rajasthan Royals'], dtype='<U21')
```

### Calculate the number of unique matches in the dataset

```python
# Column 0 holds match_code; each distinct code is one match
unique = np.unique(data_ipl[:, 0])

print("The unique no. of matches in the provided dataset is {}.".format(len(unique)))
```

```
The unique no. of matches in the provided dataset is 6.
```
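Since the data has one row per delivery, `np.unique` with `return_counts=True` also reports how many deliveries each match contributes. The sketch below uses a small toy array standing in for `data_ipl[:, 0]`; the match codes and counts are illustrative only, not taken from the real file.

```python
import numpy as np

# Toy stand-in for data_ipl[:, 0] (match_code); one entry per delivery
match_codes = np.array(["335987", "335987", "392197", "392197", "392197", "729297"])

# return_counts=True returns, for each unique code, the number of rows carrying it
codes, deliveries_per_match = np.unique(match_codes, return_counts=True)
for code, n in zip(codes, deliveries_per_match):
    print(code, n)
# 335987 2
# 392197 3
# 729297 1

print("Unique matches:", len(codes))  # Unique matches: 3
```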


### Find the set of all unique teams that played in the matches in the dataset

```python
# List each match (match_code, column 0) with the two teams that played it
# (team1 and team2, columns 3 and 4); axis=0 keeps one copy of each distinct row.
unique_teams = np.unique(data_ipl[:, [0, 3, 4]], axis=0)
print(unique_teams)
```

```
[['335987' 'Rajasthan Royals' 'Kings XI Punjab']
 ['392197' 'Kolkata Knight Riders' 'Mumbai Indians']
 ['392203' 'Kolkata Knight Riders' 'Mumbai Indians']
 ['392212' 'Deccan Chargers' 'Mumbai Indians']
 ['501226' 'Chennai Super Kings' 'Pune Warriors']
 ['729297' 'Rajasthan Royals' 'Chennai Super Kings']]
```

The six matches involve seven distinct teams: Chennai Super Kings, Deccan Chargers, Kings XI Punjab, Kolkata Knight Riders, Mumbai Indians, Pune Warriors and Rajasthan Royals.
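The cell above lists the fixtures rather than the set of teams itself. Because a team can appear as either team1 or team2, one way to get the set directly is the sorted union of both columns with `np.union1d`. This is a sketch on a toy array standing in for `data_ipl[:, [3, 4]]`; on the real data, `np.unique(data_ipl[:, [3, 4]])` (which flattens both columns) gives the same result.

```python
import numpy as np

# Toy stand-in for data_ipl[:, [3, 4]] (team1, team2); values are illustrative
teams = np.array([
    ["Rajasthan Royals", "Kings XI Punjab"],
    ["Kolkata Knight Riders", "Mumbai Indians"],
    ["Deccan Chargers", "Mumbai Indians"],
])

# union1d returns the sorted unique values found in either column
unique_teams = np.union1d(teams[:, 0], teams[:, 1])
print(unique_teams)
print(len(unique_teams))  # 5
```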


### Find the sum of all extras across all deliveries in the dataset

```python
# Practice with indexing and slicing: column 17 holds the extras on each delivery.
# int8 holds at most 127, so convert with the default integer type and let
# NumPy's sum() accumulate the total.
extras = data_ipl[:, 17].astype(int)
print("The sum of all extras in all deliveries in all matches in the dataset is {}.".format(extras.sum()))
```

```
The sum of all extras in all deliveries in all matches in the dataset is 88.
```
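The same column can be broken down by `extras_type` (column 19) by grouping: `np.unique(..., return_inverse=True)` maps every row to a group index, and `np.bincount` with `weights` adds up the extras in each group. This is a sketch on toy arrays; the extras type names and the use of an empty string for "no extras" are assumptions for illustration, not checked against the real file.

```python
import numpy as np

# Toy stand-ins for data_ipl[:, 17] (extras) and data_ipl[:, 19] (extras_type).
# ASSUMPTION: an empty string marks a delivery with no extras.
extras = np.array(["0", "1", "0", "4", "1", "1"])
extras_type = np.array(["", "wides", "", "legbyes", "noballs", "wides"])

# Convert with a wide integer type so the running total cannot overflow
extras_int = extras.astype(np.int64)
print("Total extras:", extras_int.sum())  # Total extras: 7

# return_inverse gives each row's group index; bincount sums the weights per group
types, group_idx = np.unique(extras_type, return_inverse=True)
per_type = np.bincount(group_idx, weights=extras_int)
for t, total in zip(types, per_type):
    if t:  # skip the empty "no extras" group
        print(t, int(total))
# legbyes 4
# noballs 1
# wides 2
```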


### Get the array of all delivery numbers on which a player got out, along with the wicket type

```python
# Column 11 holds the delivery (over.ball) and column 21 the wicket kind
deliveries = data_ipl[:, [11, 21]]
# Deliveries without a dismissal have an empty wicket_kind, so keep only the non-empty ones
wicket_deliveries = deliveries[deliveries[:, 1] != '']
print(wicket_deliveries)
print(len(wicket_deliveries))
```

```
[['3.2' 'caught']
 ['5.5' 'caught']
 ['7.6' 'caught']
 ['11.4' 'bowled']
 ['15.6' 'caught']
 ['18.6' 'caught']
 ['0.4' 'bowled']
 ['2.2' 'bowled']
 ['14.5' 'caught']
 ['17.2' 'bowled']
 ['18.6' 'run out']
 ['19.3' 'caught']
 ['12.2' 'lbw']
 ['13.5' 'caught']
 ['14.4' 'caught']
 ['15.1' 'run out']
 ['16.6' 'caught']
 ['18.5' 'caught']
 ['1.7' 'caught']
 ['2.7' 'caught']
 ['10.2' 'bowled']
 ['12.1' 'caught']
 ['12.3' 'caught']
 ['13.2' 'caught']
 ['14.5' 'caught']
 ['15.1' 'bowled']
 ['15.2' 'bowled']
 ['1.5' 'caught']
 ['5.3' 'caught']
 ['9.4' 'bowled']
 ['12.6' 'bowled']
 ['17.1' 'caught']
 ['19.1' 'run out']
 ['1.4' 'caught']
 ['1.5' 'bowled']
 ['8.5' 'caught']
 ['14.1' 'caught']
 ['15.5' 'bowled']
 ['15.6' 'bowled']
 ['17.1' 'caught']
 ['17.3' 'stumped']
 ['5.3' 'caught']
 ['7.2' 'caught']
 ['8.2' 'caught']
 ['10.1' 'run out']
 ['11.1' 'caught']
 ['14.5' 'caught']
 ['1.3' 'run out']
 ['5.2' 'caught']
 ['6.4' 'caught']
 ['6.5' 'caught and bowled']
 ['10.5' 'caught']
 ['12.6' 'caught']
```
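The cell above lists every dismissal. To restrict it to one given player, one option is to filter on the player_out column (20) first. The helper below is a hypothetical sketch; its column defaults follow the feature table, and the toy array and player names are placeholders rather than real data.

```python
import numpy as np

def dismissals_of(data, player, delivery_col=11, player_out_col=20, wicket_kind_col=21):
    """Hypothetical helper: return [delivery, wicket_kind] rows on which `player` got out.

    Column defaults follow the feature table (delivery = 11, player_out = 20,
    wicket_kind = 21).
    """
    mask = data[:, player_out_col] == player
    return data[mask][:, [delivery_col, wicket_kind_col]]

# Toy 23-column array standing in for data_ipl; only the three columns used are filled in
toy = np.full((3, 23), "", dtype="<U20")
toy[:, 11] = ["3.2", "5.5", "7.6"]
toy[:, 20] = ["Player A", "", "Player B"]
toy[:, 21] = ["caught", "", "bowled"]

print(dismissals_of(toy, "Player B"))  # [['7.6' 'bowled']]
```

A player who was never dismissed simply yields an empty array of shape `(0, 2)`.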

### In how many matches did `Mumbai Indians` win the toss?

```python
# Statistics for one particular team: reduce the (match_code, toss_winner) columns
# to one row per match, then keep the rows where Mumbai Indians won the toss
toss_winners = data_ipl[:, [0, 5]]
toss_winners_unique = np.unique(toss_winners, axis=0)
mumbai_toss = toss_winners_unique[toss_winners_unique[:, 1] == "Mumbai Indians"]
print("Mumbai Indians won the toss {} times.".format(len(mumbai_toss)))
```

```
Mumbai Indians won the toss 2 times.
```
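The same two steps generalise to every team at once: after collapsing to one (match_code, toss_winner) row per match, `np.unique(..., return_counts=True)` on the toss_winner column counts the tosses each team won. This is a sketch on a toy array standing in for `data_ipl[:, [0, 5]]`; the rows are illustrative.

```python
import numpy as np

# Toy stand-in for data_ipl[:, [0, 5]] (match_code, toss_winner): one row per
# delivery, so the same (match, toss winner) pair repeats many times
rows = np.array([
    ["392197", "Mumbai Indians"],
    ["392197", "Mumbai Indians"],
    ["392203", "Kolkata Knight Riders"],
    ["392212", "Mumbai Indians"],
    ["392212", "Mumbai Indians"],
])

# Collapse to one row per match, then count how often each team won the toss
per_match = np.unique(rows, axis=0)
teams, toss_wins = np.unique(per_match[:, 1], return_counts=True)
print(dict(zip(teams.tolist(), toss_wins.tolist())))
# {'Kolkata Knight Riders': 1, 'Mumbai Indians': 2}
```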


### Filter the records where the batsman scored 6 runs, and find who has hit the most sixes overall

```python
from collections import Counter

# Find the most aggressive (six-hitting) batsman.
# Columns 13 and 16 hold the batsman and the runs scored off the bat
batsman_and_runs = data_ipl[:, [13, 16]]
# Keep only the deliveries on which the batsman hit a six
batsman_and_sixes = batsman_and_runs[batsman_and_runs[:, 1] == "6"]
print(batsman_and_sixes)

# Count sixes per batsman and keep everyone tied for the highest count
six_counts = Counter(batsman_and_sixes[:, 0])
max_sixes = max(six_counts.values())
top_hitters = [name for name, count in six_counts.items() if count == max_sixes]
print("The batsman with the most no. of sixes are {}.".format(" and ".join(top_hitters)))
```

[['SR Tendulkar' '6']
 ['SR Tendulkar' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['JP Duminy' '6']
 ['BJ Hodge' '6']
 ['BJ Hodge' '6']
 ['BJ Hodge' '6']
 ['SR Tendulkar' '6']
 ['SR Tendulkar' '6']
 ['ST Jayasuriya' '6']
 ['ST Jayasuriya' '6']
 ['SR Tendulkar' '6']
 ['ST Jayasuriya' '6']
 ['ST Jayasuriya' '6']
 ['SR Tendulkar' '6']
 ['Harbhajan Singh' '6']
 ['Harbhajan Singh' '6']
 ['CH Gayle' '6']
 ['SC Ganguly' '6']
 ['TL Suman' '6']
 ['TL Suman' '6']
 ['AC Gilchrist' '6']
 ['RG Sharma' '6']
 ['DR Smith' '6']
 ['Y Venugopal Rao' '6']
 ['PR Shah' '6']
 ['PR Shah' '6']
 ['RR Raje' '6']
 ['DR Smith' '6']
 ['DR Smith' '6']
 ['DR Smith' '6']
 ['SV Samson' '6']
 ['SV Samson' '6']
 ['SR Watson' '6']
 ['R Bhatia' '6']
 ['DS Kulkarni' '6']
 ['DS Kulkarni' '6']
 ['MEK Hussey' '6']
 ['M Vijay' '6']
 ['MS Dhoni' '6']
 ['S Badrinath' '6']
 ['JD Ryder' '6']
 ['M Manhas' '6']
 ['K Goel' '6']
 ['K Goel' '6']
 ['KC Sangakkara' '6']
 ['Yuvraj Singh' '6']
 ['Yuvraj Singh' '6']
 ['Yu