## Test Cricket Bowlers Analysis
### Dataset source - https://stats.espncricinfo.com/ci/content/records/93276.html

#### Importing the libraries

In [1]:
import pandas as pd
import numpy as np

#### Reading the 'wickets' sheet from 'test_cricket.xlsx'

In [2]:
df = pd.read_excel("test_cricket.xlsx", sheet_name='wickets')

#### First 10 rows of the dataset

In [3]:
display(df.head(10))

| | Player | Span | Mat | Inns | Balls | Runs | Wkts | BBI | BBM | Ave | Econ | SR | 5 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M Muralitharan (ICC/SL) | 1992-2010 | 133 | 230 | 44039 | 18180 | 800 | 1951-09-01 00:00:00 | 16/220 | 22.72 | 2.47 | 55.0 | 67 | 22 |
| 1 | SK Warne (AUS) | 1992-2007 | 145 | 273 | 40705 | 17995 | 708 | 1971-08-01 00:00:00 | 12/128 | 25.41 | 2.65 | 57.4 | 37 | 10 |
| 2 | A Kumble (INDIA) | 1990-2008 | 132 | 236 | 40850 | 18355 | 619 | 1974-10-01 00:00:00 | 14/149 | 29.65 | 2.69 | 65.9 | 35 | 8 |
| 3 | JM Anderson (ENG) | 2003-2021 | 162 | 301 | 34791 | 16457 | 617 | 1942-07-01 00:00:00 | 1971-11-01 00:00:00 | 26.67 | 2.83 | 56.3 | 30 | 3 |
| 4 | GD McGrath (AUS) | 1993-2007 | 124 | 243 | 29248 | 12186 | 563 | 2021-08-24 00:00:00 | 2021-10-27 00:00:00 | 21.64 | 2.49 | 51.9 | 29 | 3 |
| 5 | SCJ Broad (ENG) | 2007-2021 | 148 | 272 | 29713 | 14502 | 523 | 2021-08-15 00:00:00 | 11/121 | 27.72 | 2.92 | 56.8 | 18 | 3 |
| 6 | CA Walsh (WI) | 1984-2001 | 132 | 242 | 30019 | 12688 | 519 | 1937-07-01 00:00:00 | 13/55 | 24.44 | 2.53 | 57.8 | 22 | 3 |
| 7 | DW Steyn (SA) | 2004-2019 | 93 | 171 | 18608 | 10077 | 439 | 1951-07-01 00:00:00 | 1960-11-01 00:00:00 | 22.95 | 3.24 | 42.3 | 26 | 5 |
| 8 | N Kapil Dev (INDIA) | 1978-1994 | 131 | 227 | 27740 | 12867 | 434 | 1983-09-01 00:00:00 | 11/146 | 29.64 | 2.78 | 63.9 | 23 | 2 |
| 9 | HMRKB Herath (SL) | 1999-2018 | 93 | 170 | 25993 | 12157 | 433 | 9/127 | 14/184 | 28.07 | 2.8 | 60.0 | 34 | 9 |


#### Number of rows and columns present

In [4]:
print("Number of rows -",df.shape[0])
print("Number of columns -",df.shape[1])

Number of rows - 79
Number of columns - 14


#### Columns present and their meaning

In [5]:
print(df.columns)

Index(['Player',   'Span',    'Mat',   'Inns',  'Balls',   'Runs',   'Wkts',
          'BBI',    'BBM',    'Ave',   'Econ',     'SR',        5,       10],
      dtype='object')


##### Column names
1. **Player** - Name of the player, followed by the team(s) represented in parentheses
2. **Span** - Career span of the player (first year to last year)
3. **Mat** - Number of matches played
4. **Inns** - Total innings played by the player
5. **Balls** - Total number of balls bowled by the player
6. **Runs** - The number of runs conceded by the player
7. **Wkts** - Total number of wickets taken by the player
8. **BBI** - Best bowling figures in an innings (wickets/runs)
9. **BBM** - Best bowling figures in a match (wickets/runs)
10. **Ave** - The average number of runs conceded per wicket taken. (Runs/Wkts)
11. **Econ** - The average number of runs conceded per over. (Runs/Overs bowled)
12. **SR** - The average number of balls bowled per wicket taken. (Balls/Wkts)
13. **5** - The number of innings in which the bowler took at least five wickets.
14. **10** - The number of matches in which the bowler took at least ten wickets.
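
The three derived columns can be recomputed from the raw counts as a sanity check on the definitions above. This is only a sketch: the `sample` frame holds the first three rows of the table typed in by hand, the `*_calc` column names are illustrative and not part of the notebook, and the conversion from balls to overs assumes six-ball overs.

```python
import pandas as pd

# First three rows of the dataset, typed in by hand for illustration.
sample = pd.DataFrame({
    "Player": ["M Muralitharan (ICC/SL)", "SK Warne (AUS)", "A Kumble (INDIA)"],
    "Balls": [44039, 40705, 40850],
    "Runs": [18180, 17995, 18355],
    "Wkts": [800, 708, 619],
    "Ave": [22.72, 25.41, 29.65],
    "Econ": [2.47, 2.65, 2.69],
    "SR": [55.0, 57.4, 65.9],
})

# Assumption: every over consists of six balls.
overs = sample["Balls"] / 6

sample["Ave_calc"] = sample["Runs"] / sample["Wkts"]    # runs per wicket
sample["Econ_calc"] = sample["Runs"] / overs            # runs per over
sample["SR_calc"] = sample["Balls"] / sample["Wkts"]    # balls per wicket

print(sample[["Player", "Ave", "Ave_calc", "Econ", "Econ_calc", "SR", "SR_calc"]])
```

For these three rows the recomputed values agree with the published ones up to the last printed digit. The small differences that remain look like truncation rather than rounding in the published figures, but that is an observation from three rows, not a documented rule.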

#### Statistical measures from the dataset

In [6]:
display(df.describe())

| | Mat | Inns | Balls | Runs | Wkts | Ave | Econ | SR | 5 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 | 79.0 |
| mean | 80.101266 | 144.797468 | 18630.303797 | 8595.506329 | 317.101266 | 27.466456 | 2.806582 | 59.187342 | 16.35443 | 2.797468 |
| std | 28.537692 | 51.04231 | 7190.036515 | 3080.256645 | 121.731587 | 3.657561 | 0.351666 | 9.349337 | 9.642372 | 3.235935 |
| min | 37.0 | 67.0 | 8785.0 | 4846.0 | 200.0 | 20.94 | 1.98 | 41.2 | 3.0 | 0.0 |
| 25% | 60.5 | 110.0 | 13580.0 | 6456.5 | 229.0 | 24.425 | 2.6 | 53.3 | 9.5 | 1.0 |
| 50% | 71.0 | 129.0 | 16498.0 | 7742.0 | 266.0 | 28.0 | 2.82 | 57.4 | 14.0 | 2.0 |
| 75% | 93.0 | 169.0 | 21742.5 | 9756.0 | 374.5 | 29.87 | 3.08 | 63.95 | 20.5 | 3.5 |
| max | 166.0 | 301.0 | 44039.0 | 18355.0 | 800.0 | 34.79 | 3.46 | 91.9 | 67.0 | 22.0 |


##### Statistical measures
- count - total number of observations
- mean - average of the values
- std - standard deviation
- min - minimum value in the column
- 25% - first quartile: 25% of the values in the column are at or below this value
- 50% - median: 50% of the values in the column are at or below this value
- 75% - third quartile: 75% of the values in the column are at or below this value
- max - maximum value in the column
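
To make the percentile rows concrete, the sketch below compares `describe()` with `quantile()` and `median()` on a single column. It uses only the `Mat` values of the ten rows shown earlier, typed in by hand, so its numbers differ from the full-dataset output; the variable names are illustrative.

```python
import pandas as pd

# 'Mat' values of the first ten rows shown earlier, typed in by hand.
mat = pd.Series([133, 145, 132, 162, 124, 148, 132, 93, 131, 93], name="Mat")

summary = mat.describe()

# The percentile rows of describe() are the same numbers quantile() returns.
q25, q50, q75 = mat.quantile([0.25, 0.50, 0.75])

print(summary)
print("median:", mat.median(), "| 50% row:", summary["50%"])
print("25% row equals quantile(0.25):", summary["25%"] == q25)
```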

##### Observations
- On average, a player in the dataset has played 80 matches
- The fewest matches played by any player is 37
- 25% of players have played 60 matches or fewer
- 50% of players have played 71 matches or fewer
- 75% of players have played 93 matches or fewer
- The most matches played by any player is 166
- The most wickets taken by any player is 800
- The fewest innings played by any player is 67

Many other preliminary insights can be drawn from the statistical measures.
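
`describe()` reports the extreme values but not which player they belong to. A minimal sketch of how to look that up with `idxmax()`, using the first five rows of the table typed in by hand (the `sample` frame is illustrative and covers only those rows):

```python
import pandas as pd

# First five rows of the dataset (selected columns), typed in by hand.
sample = pd.DataFrame({
    "Player": ["M Muralitharan (ICC/SL)", "SK Warne (AUS)", "A Kumble (INDIA)",
               "JM Anderson (ENG)", "GD McGrath (AUS)"],
    "Mat": [133, 145, 132, 162, 124],
    "Wkts": [800, 708, 619, 617, 563],
})

# idxmax() returns the index label of the row holding the maximum value.
top_wickets = sample.loc[sample["Wkts"].idxmax(), "Player"]

print("Most wickets:", top_wickets, "-", sample["Wkts"].max())
```

The same pattern with `idxmin()` gives the player behind a column's minimum.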

#### Renaming the columns

In [7]:
df = df.rename(columns={'Mat':'Match','Inns':'Innings','Wkts':'Wickets','BBI':'Best Bowling(Innings)','BBM':'Best Bowling(Match)','Ave':'Bowling avg','Econ':'Economy rate','SR':'Strike rate',5:'5 wickets',10:'10 wickets'})
display(df.head(3))

| | Player | Span | Match | Innings | Balls | Runs | Wickets | Best Bowling(Innings) | Best Bowling(Match) | Bowling avg | Economy rate | Strike rate | 5 wickets | 10 wickets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M Muralitharan (ICC/SL) | 1992-2010 | 133 | 230 | 44039 | 18180 | 800 | 1951-09-01 00:00:00 | 16/220 | 22.72 | 2.47 | 55.0 | 67 | 22 |
| 1 | SK Warne (AUS) | 1992-2007 | 145 | 273 | 40705 | 17995 | 708 | 1971-08-01 00:00:00 | 12/128 | 25.41 | 2.65 | 57.4 | 37 | 10 |
| 2 | A Kumble (INDIA) | 1990-2008 | 132 | 236 | 40850 | 18355 | 619 | 1974-10-01 00:00:00 | 14/149 | 29.65 | 2.69 | 65.9 | 35 | 8 |
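
Note that the `rename` call uses the integers `5` and `10` as keys, matching the integer labels shown in the `df.columns` output. The toy example below (the `toy` frame is hypothetical, not part of the notebook) sketches why string keys would not work: by default `rename()` silently ignores keys that do not match any label.

```python
import pandas as pd

# Toy frame mimicking the mixed column labels: one string and two integers.
toy = pd.DataFrame([[800, 67, 22]], columns=["Wkts", 5, 10])

# String keys do not match the integer labels, so nothing is renamed.
unchanged = toy.rename(columns={"5": "5 wickets", "10": "10 wickets"})

# Integer keys match the actual labels.
renamed = toy.rename(columns={5: "5 wickets", 10: "10 wickets"})

print(list(unchanged.columns))
print(list(renamed.columns))
```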


#### Datatypes of the variables in the columns

In [8]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79 entries, 0 to 78
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Player                 79 non-null     object 
 1   Span                   79 non-null     object 
 2   Match                  79 non-null     int64  
 3   Innings                79 non-null     int64  
 4   Balls                  79 non-null     int64  
 5   Runs                   79 non-null     int64  
 6   Wickets                79 non-null     int64  
 7   Best Bowling(Innings)  79 non-null     object 
 8   Best Bowling(Match)    79 non-null     object 
 9   Bowling avg            79 non-null     float64
 10  Economy rate           79 non-null     float64
 11  Strike rate            79 non-null     float64
 12  5 wickets              79 non-null     int64  
 13  10 wickets             79 non-null     int64  
dtypes: float64(3), int64(7), object(4)
memory usage: 8.8+ KB
None

##### Observations:

> The dataset has 79 entries (rows) and 14 columns.<br><br>
There are
> - 3 float type variables - Bowling avg, Economy rate, Strike rate<br>
> - 4 object (string) type variables - Best Bowling(Innings), Best Bowling(Match), Player, Span<br>
> - 7 integer type variables - Match, Innings, Balls, Runs, Wickets, 5 wickets (in an innings), 10 wickets (in a match)

> From the Non-Null Count column, we can see that there are **no missing values** as of now.<br>
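
The same checks can be done programmatically instead of reading them off the `info()` output. A minimal sketch on three hand-typed rows with the renamed column labels (the `sample` frame is illustrative and holds only a few of the columns):

```python
import pandas as pd

# Three rows of the renamed dataset (selected columns), typed in by hand.
sample = pd.DataFrame({
    "Player": ["M Muralitharan (ICC/SL)", "SK Warne (AUS)", "A Kumble (INDIA)"],
    "Match": [133, 145, 132],
    "Bowling avg": [22.72, 25.41, 29.65],
})

# Missing values per column; all zeros means nothing is missing.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Total number of missing cells in the frame.
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)

# How many columns there are of each dtype.
print(sample.dtypes.value_counts())
```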

#### Dropping/removing the Best Bowling in Innings column

Some values in the BBI and BBM columns have been stored as dates instead of bowling figures in the 'xx/xxx' format.<br>
Removing the BBI column for now.

In [9]:
df.drop('Best Bowling(Innings)', axis=1, inplace=True)
display(df.head(5))

| | Player | Span | Match | Innings | Balls | Runs | Wickets | Best Bowling(Match) | Bowling avg | Economy rate | Strike rate | 5 wickets | 10 wickets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M Muralitharan (ICC/SL) | 1992-2010 | 133 | 230 | 44039 | 18180 | 800 | 16/220 | 22.72 | 2.47 | 55.0 | 67 | 22 |
| 1 | SK Warne (AUS) | 1992-2007 | 145 | 273 | 40705 | 17995 | 708 | 12/128 | 25.41 | 2.65 | 57.4 | 37 | 10 |
| 2 | A Kumble (INDIA) | 1990-2008 | 132 | 236 | 40850 | 18355 | 619 | 14/149 | 29.65 | 2.69 | 65.9 | 35 | 8 |
| 3 | JM Anderson (ENG) | 2003-2021 | 162 | 301 | 34791 | 16457 | 617 | 1971-11-01 00:00:00 | 26.67 | 2.83 | 56.3 | 30 | 3 |
| 4 | GD McGrath (AUS) | 1993-2007 | 124 | 243 | 29248 | 12186 | 563 | 2021-10-27 00:00:00 | 21.64 | 2.49 | 51.9 | 29 | 3 |
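
Rows 3 and 4 show that the remaining Best Bowling(Match) column still contains date-like values. One possible way to flag such cells is sketched below. It uses the five values above typed in by hand and assumes that the affected cells arrive as `datetime` objects, as the output suggests; the `bbm` and `is_date` names are illustrative.

```python
import datetime

import pandas as pd

# 'Best Bowling(Match)' values of the five rows above, typed in by hand.
# Assumption: the date-like cells are datetime objects, not strings.
bbm = pd.Series(
    ["16/220", "12/128", "14/149",
     datetime.datetime(1971, 11, 1), datetime.datetime(2021, 10, 27)],
    dtype=object,
    name="Best Bowling(Match)",
)

# True where the cell holds a datetime instead of a 'wickets/runs' string.
is_date = bbm.map(lambda value: isinstance(value, datetime.datetime))

print(bbm[is_date])
print("Date-like cells:", int(is_date.sum()))
```

The resulting mask only identifies the affected rows. Recovering the original figures from the dates is a separate question that this sketch does not attempt.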
