# Analysis of Pakistan T20 Players' Performance

This Jupyter Notebook analyzes the performance of Pakistan T20 players using ball-by-ball data. The workflow reads data from CSV files, cleans and merges it, and produces summary tables for both batters and bowlers. Each step is documented below.

# Reading Files Data

We start by importing pandas and reading the data from three CSV files:
- `Pakistan_T20_Players_Innings_Info.csv`
- `T20_Matches_Info.csv`
- `T20_Players_Name.csv`
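
A minimal sketch of a defensive read, assuming it is useful to fail early when an expected column is absent. The helper name `read_with_check` and the inline sample data are hypothetical; the notebook itself calls `pd.read_csv` directly on the three files.

```python
import io

import pandas as pd


def read_with_check(source, required_columns):
    """Read a CSV and raise if any expected column is missing."""
    frame = pd.read_csv(source)
    missing = [col for col in required_columns if col not in frame.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return frame


# Toy stand-in for Pakistan_T20_Players_Innings_Info.csv (values are invented).
sample_csv = io.StringIO(
    "Team,Batter,Bowler,Runs_Batter,match_type_number\n"
    "Pakistan,Babar Azam,MA Starc,1,1\n"
    "Pakistan,Fakhar Zaman,MA Starc,4,1\n"
)

innings = read_with_check(sample_csv, ["Team", "Batter", "Runs_Batter"])
print(innings.shape)
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.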

# Data Cleaning and Preparation

The data is cleaned and prepared by:
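- Converting data types in the `T20_Matches` dataframe: run, over, and wicket columns to `int16`, and team and player-name columns to `category`.
- Filling missing values with defaults: `'Unknown'` for `fielder_name`, and `'Not Out'` for `kind` and `player_out`.
- Merging the `T20_Matches` dataframe with `matches_info` on the `match_type_number` column, using a left join.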
- Filtering the data to include only matches after January 1, 2018.
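
The steps above can be sketched on a few invented rows. This is an illustration under assumptions, not the notebook's exact code: the toy frames `balls` and `matches` stand in for `T20_Matches` and `matches_info`, and the `validate="many_to_one"` argument is an addition that makes the merge fail if a match number appears more than once in the match table.

```python
import pandas as pd

# Invented ball-by-ball rows and match metadata.
balls = pd.DataFrame({
    "Batter": ["Babar Azam", "Babar Azam", "Fakhar Zaman"],
    "Runs_Batter": [0, 1, 4],
    "kind": [None, None, "caught"],
    "match_type_number": [1, 1, 2],
})
matches = pd.DataFrame({
    "match_type_number": [1, 2],
    "dates": ["2017-06-18", "2019-11-05"],
})

# Downcast numeric columns and fill missing dismissal kinds.
balls["Runs_Batter"] = balls["Runs_Batter"].astype("int16")
balls["kind"] = balls["kind"].fillna("Not Out").astype("category")
balls["Batter"] = balls["Batter"].astype("category")

# Left join keeps every delivery even if match metadata is missing.
merged = balls.merge(
    matches, on="match_type_number", how="left", validate="many_to_one"
)
merged["dates"] = pd.to_datetime(merged["dates"])

# Keep only deliveries from matches played after 1 January 2018.
recent = merged[merged["dates"] > "2018-01-01"]
print(recent[["Batter", "Runs_Batter", "dates"]])
```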

# Top Batters Analysis

We analyze the top batters by:
- Extracting relevant columns (`Batter` and `Runs_Batter`) and creating new columns for the number of fours and sixes.
- Grouping the data by `Batter` and calculating the total runs, fours, sixes, and balls faced.
- Calculating the strike rate for each batter.
- Selecting the top 7 batters based on total runs and saving the results to a CSV file.
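
As a rough sketch of the same aggregation, the example below uses pandas named aggregation on invented data. The batter labels `A` and `B` are placeholders, and every delivery row is counted as a ball faced, which is an assumption carried over from how the notebook counts rows.

```python
import pandas as pd

# Invented deliveries for two placeholder batters.
balls = pd.DataFrame({
    "Batter": ["A", "A", "A", "B", "B"],
    "Runs_Batter": [4, 6, 0, 1, 4],
})

summary = (
    balls.assign(
        Four=balls["Runs_Batter"].eq(4),  # True where the ball went for four
        Six=balls["Runs_Batter"].eq(6),   # True where the ball went for six
    )
    .groupby("Batter")
    .agg(
        Runs_Batter=("Runs_Batter", "sum"),
        Four=("Four", "sum"),
        Six=("Six", "sum"),
        Ball_Faced=("Runs_Batter", "size"),  # one row per delivery
    )
    .sort_values("Runs_Batter", ascending=False)
)

# Strike rate = runs per 100 balls faced.
summary["Strike_Rate"] = (
    summary["Runs_Batter"] / summary["Ball_Faced"] * 100
).round(2)

top = summary.reset_index().head(7)
print(top)
```

The same formula reproduces the figures in the notebook's output: 3704 runs from 2924 balls gives a strike rate of 126.68.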

# Top Bowlers Analysis

We analyze the top bowlers by:
- Keeping only deliveries where the batting team (`Team`) is not Pakistan, so that the remaining bowlers are Pakistan's.
- Extracting relevant columns (`Bowler`, `Wickets`, `Runs_Total`, and `Runs_Extras`).
- Grouping the data by `Bowler` and calculating the total wickets, runs conceded, and extras.
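- Calculating an economy figure for each bowler: `Runs_Total` plus `Runs_Extras`, divided by the number of deliveries bowled (a per-ball figure, not the conventional runs per over).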
- Selecting the top 4 bowlers based on total wickets and saving the results to a CSV file.
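
Economy rate is conventionally quoted as runs conceded per over. The sketch below shows that variant on invented data, under two stated assumptions: `Runs_Total` already includes extras, and every row counts as one ball. Neither is confirmed by the source data, so treat the result as illustrative.

```python
import pandas as pd

# Invented deliveries; Team is the batting side, as in the notebook.
balls = pd.DataFrame({
    "Team": ["Australia"] * 6 + ["Pakistan"] * 2,
    "Bowler": ["Haris Rauf"] * 6 + ["MA Starc"] * 2,
    "Wickets": [0, 1, 0, 0, 0, 0, 0, 0],
    "Runs_Total": [1, 0, 4, 1, 0, 2, 1, 1],
})

# Deliveries bowled by Pakistan are those where Pakistan is not batting.
pak_bowling = balls[balls["Team"] != "Pakistan"]

summary = (
    pak_bowling.groupby("Bowler")
    .agg(
        Wickets=("Wickets", "sum"),
        Runs_Total=("Runs_Total", "sum"),
        Ball=("Wickets", "size"),  # one row per delivery
    )
    .sort_values("Wickets", ascending=False)
)

# Runs per over: divide by the number of six-ball overs bowled.
summary["Economy"] = (summary["Runs_Total"] / (summary["Ball"] / 6)).round(2)
print(summary)
```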

# Adding Images

We add images for the top batters and bowlers by:
- Defining file paths for the images.
- Reading the top batters and bowlers data from the CSV files.
- Adding a new column for the image paths in the respective dataframes.
- Saving the updated dataframes back to the CSV files.
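
One possible variation, sketched below, attaches image paths by player name rather than by list position, so a change in ranking cannot silently pair a player with the wrong picture. The folder name `images` and the name-to-file mapping are hypothetical; the file names are taken from the notebook's lists.

```python
from pathlib import Path

import pandas as pd

image_dir = Path("images")  # hypothetical folder, not the notebook's path

batters = pd.DataFrame({
    "Batter": ["Babar Azam", "Mohammad Rizwan"],
    "Runs_Batter": [3704, 3280],
})

# Map each player to an image file name.
image_files = {
    "Babar Azam": "babar1.jpeg",
    "Mohammad Rizwan": "rizwan.jpeg",
}

file_names = batters["Batter"].map(image_files)
if file_names.isna().any():
    # Fail loudly rather than writing an empty path to the CSV.
    raise ValueError("no image configured for some players")

# as_posix() gives forward slashes on every platform.
batters["Pictures"] = [(image_dir / name).as_posix() for name in file_names]
print(batters)
```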

# Summary

This notebook analyzes the performance of Pakistan T20 players in matches played after January 1, 2018, identifying the top 7 batters by total runs and the top 4 bowlers by total wickets. The results, including image paths, are saved to CSV files for further use.

In [12]:
import pandas as pd


# Reading Files Data

In [13]:
T20_Matches = pd.read_csv('Pakistan_T20_Players_Innings_Info.csv')
matches_info = pd.read_csv('T20_Matches_Info.csv')
players_Name = pd.read_csv('T20_Players_Name.csv')

T20_Matches.head(5)

In [17]:
T20_Matches['Over'] = T20_Matches['Over'].astype('int16')
T20_Matches['Runs_Batter'] = T20_Matches['Runs_Batter'].astype('int16')
T20_Matches['Runs_Extras'] = T20_Matches['Runs_Extras'].astype('int16')
T20_Matches['Runs_Total'] = T20_Matches['Runs_Total'].astype('int16')
T20_Matches['Wickets'] = T20_Matches['Wickets'].astype('int16')
T20_Matches['Team'] = T20_Matches['Team'].astype('category')
T20_Matches['Batter'] = T20_Matches['Batter'].astype('category')
T20_Matches['Bowler'] = T20_Matches['Bowler'].astype('category')
T20_Matches['Non_Striker'] = T20_Matches['Non_Striker'].astype('category')
T20_Matches['fielder_name'] = T20_Matches['fielder_name'].fillna('Unknown')
T20_Matches['fielder_name'] = T20_Matches['fielder_name'].astype('category')
T20_Matches['kind'] = T20_Matches['kind'].fillna('Not Out')
T20_Matches['kind'] = T20_Matches['kind'].astype('category')
T20_Matches['player_out'] = T20_Matches['player_out'].fillna('Not Out')
T20_Matches['player_out'] = T20_Matches['player_out'].astype('category')
T20_Matches['match_type_number'] = pd.to_numeric(T20_Matches['match_type_number'], errors='coerce')
T20_Matches['match_type_number'] = T20_Matches['match_type_number'].fillna(T20_Matches['match_type_number'].mean())

In [18]:
T20_Matches.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56234 entries, 0 to 56233
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Team               56234 non-null  category
 1   Over               56234 non-null  int16   
 2   Batter             56234 non-null  category
 3   Bowler             56234 non-null  category
 4   Non_Striker        56234 non-null  category
 5   Runs_Batter        56234 non-null  int16   
 6   Runs_Extras        56234 non-null  int16   
 7   Runs_Total         56234 non-null  int16   
 8   Wickets            56234 non-null  int16   
 9   fielder_name       56234 non-null  category
 10  kind               56234 non-null  category
 11  player_out         56234 non-null  category
 12  match_type_number  56234 non-null  int64   
dtypes: category(7), int16(5), int64(1)
memory usage: 1.7 MB


In [19]:
players_Name.head(5)

Unnamed: 0,name,team,match_type_number,match_type,venue
0,Sharjeel Khan,Pakistan,568,T20,Dubai International Cricket Stadium
1,Khalid Latif,Pakistan,568,T20,Dubai International Cricket Stadium
2,Babar Azam,Pakistan,568,T20,Dubai International Cricket Stadium
3,Shoaib Malik,Pakistan,568,T20,Dubai International Cricket Stadium
4,Umar Akmal,Pakistan,568,T20,Dubai International Cricket Stadium


In [20]:
T20_Matches = T20_Matches.merge(matches_info, on='match_type_number', how='left')

In [21]:
T20_Matches['dates'] = pd.to_datetime(T20_Matches['dates'])

In [22]:
T20_Matches = T20_Matches[T20_Matches['dates'] > '2018-01-01']

In [23]:
T20_Matches 

Unnamed: 0,Team,Over,Batter,Bowler,Non_Striker,Runs_Batter,Runs_Extras,Runs_Total,Wickets,fielder_name,...,winner,winner_by,overs,season,team_type,team1,team2,toss_winner,toss_decision,venue
0,Pakistan,0,Babar Azam,MA Starc,Fakhar Zaman,0,0,0,0,Unknown,...,Australia,{'wickets': 7},20,2019/20,international,Australia,Pakistan,Pakistan,bat,Manuka Oval
1,Pakistan,0,Babar Azam,MA Starc,Fakhar Zaman,1,0,1,0,Unknown,...,Australia,{'wickets': 7},20,2019/20,international,Australia,Pakistan,Pakistan,bat,Manuka Oval
2,Pakistan,0,Fakhar Zaman,MA Starc,Babar Azam,0,0,0,0,Unknown,...,Australia,{'wickets': 7},20,2019/20,international,Australia,Pakistan,Pakistan,bat,Manuka Oval
3,Pakistan,0,Fakhar Zaman,MA Starc,Babar Azam,0,0,0,0,Unknown,...,Australia,{'wickets': 7},20,2019/20,international,Australia,Pakistan,Pakistan,bat,Manuka Oval
4,Pakistan,0,Fakhar Zaman,MA Starc,Babar Azam,2,0,2,0,Unknown,...,Australia,{'wickets': 7},20,2019/20,international,Australia,Pakistan,Pakistan,bat,Manuka Oval
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
56229,Pakistan,18,Sarfraz Ahmed,B Muzarabani,Shoaib Malik,2,0,2,0,Unknown,...,Pakistan,{'wickets': 7},20,2018,international,Zimbabwe,Pakistan,Pakistan,field,Harare Sports Club
56230,Pakistan,18,Sarfraz Ahmed,B Muzarabani,Shoaib Malik,4,0,4,0,Unknown,...,Pakistan,{'wickets': 7},20,2018,international,Zimbabwe,Pakistan,Pakistan,field,Harare Sports Club
56231,Pakistan,18,Sarfraz Ahmed,B Muzarabani,Shoaib Malik,1,0,1,0,Unknown,...,Pakistan,{'wickets': 7},20,2018,international,Zimbabwe,Pakistan,Pakistan,field,Harare Sports Club
56232,Pakistan,18,Shoaib Malik,B Muzarabani,Sarfraz Ahmed,0,0,0,0,Unknown,...,Pakistan,{'wickets': 7},20,2018,international,Zimbabwe,Pakistan,Pakistan,field,Harare Sports Club


In [24]:
Top_Batters = T20_Matches.loc[:,['Batter','Runs_Batter']]

In [25]:
# Flag boundaries with vectorized comparisons: 1 for a four or six, else 0
Top_Batters['Four'] = (Top_Batters['Runs_Batter'] == 4).astype(int)
Top_Batters['Six'] = (Top_Batters['Runs_Batter'] == 6).astype(int)

In [26]:
Top_Batters = Top_Batters.groupby('Batter').agg({'Runs_Batter':'sum','Four':'sum','Six':'sum','Batter':'count'}).sort_values(by='Runs_Batter',ascending=False)


In [27]:
Top_Batters = Top_Batters.rename(columns={'Batter':'Ball_Faced'})

In [28]:
Top_Batters['Strike_Rate'] = ((Top_Batters['Runs_Batter']/Top_Batters['Ball_Faced'])*100).round(2)

In [29]:
Top_Batters = Top_Batters.reset_index().head(7)

In [30]:
Top_Batters.to_csv('Top_7_Batters.csv',index=False)

# Determining criteria for Bowlers

In [31]:
Top_Bowlers =  T20_Matches.loc[T20_Matches['Team'] != 'Pakistan']

In [32]:
Top_Bowlers = Top_Bowlers.loc[:,['Bowler','Wickets','Runs_Total','Runs_Extras']]

In [33]:
Top_Bowlers = Top_Bowlers.groupby('Bowler').agg({'Wickets':'sum','Runs_Total':'sum','Runs_Extras':'sum','Bowler':'count'}).sort_values(by='Wickets',ascending=False).head(4)


In [34]:
Top_Bowlers = Top_Bowlers.rename(columns={'Bowler':'Ball'})

In [35]:
Top_Bowlers = Top_Bowlers.reset_index()

In [36]:
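# This is runs conceded per delivery, not per over; multiply by 6 for a per-over figure.
# If Runs_Total already includes extras, adding Runs_Extras counts them twice.
Top_Bowlers['Economy'] = ((Top_Bowlers['Runs_Total'] + Top_Bowlers['Runs_Extras'])/Top_Bowlers['Ball']).round(2)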

In [37]:
Top_Bowlers.to_csv('Top_4_Bowlers.csv',index=False)

# Adding Images 

In [38]:
Top_Batters

Unnamed: 0,Batter,Runs_Batter,Four,Six,Ball_Faced,Strike_Rate
0,Babar Azam,3704,399,67,2924,126.68
1,Mohammad Rizwan,3280,275,93,2656,123.49
2,Fakhar Zaman,1683,157,73,1278,131.69
3,Iftikhar Ahmed,941,66,42,746,126.14
4,Mohammad Hafeez,846,72,37,604,140.07
5,Shoaib Malik,583,42,25,380,153.42
6,Shadab Khan,542,33,30,398,136.18


In [39]:
Top_Bowlers

Unnamed: 0,Bowler,Wickets,Runs_Total,Runs_Extras,Ball,Economy
0,Haris Rauf,116,2328,128,1708,1.44
1,Shaheen Shah Afridi,107,2178,159,1680,1.39
2,Shadab Khan,100,2202,50,1791,1.26
3,Imad Wasim,45,994,43,944,1.1


In [32]:
Batters = [
    'C:\\Users\\4A\\Documents\\projects\\Further work\\babar1.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\rizwan.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\fakhar.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\iftikhar.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\hafeez.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\malik.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\shadab.jpeg'
]
Bowlers = [
    'C:\\Users\\4A\\Documents\\projects\\Further work\\haris.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\shaheen.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\shadab.jpeg', 
    'C:\\Users\\4A\\Documents\\projects\\Further work\\imad.jpeg'
]

In [33]:
import pandas as pd
batsman = pd.read_csv('Top_7_Batters.csv')
bowlers = pd.read_csv('Top_4_Bowlers.csv')
batsman['Pictures'] = Batters
bowlers['Pictures'] = Bowlers

In [34]:
batsman.to_csv('Top_7_Batters.csv',index=False)
bowlers.to_csv('Top_4_Bowlers.csv',index=False)