# IPL 2022 Player Statistics
Dataset link: https://www.kaggle.com/datasets/vora1011/ipl-2022-player-statistics

## About Dataset
### Context

Many of us have watched the movie Moneyball. Its central idea is that a great team can be built through careful scouting and trust in player statistics. That kind of analysis, however, depends on a good dataset that exposes each player's strengths and weaknesses.
With the new IPL season starting soon and the squads finalized, this concise dataset collects statistics for all the players. Use it to analyze the players and assemble your dream team, which can also help in the fantasy leagues coming your way.

### Content

This dataset covers all players in the list. It contains each player's all-time batting, bowling, and fielding figures in IPL and T20 matches.
- File IPLData.csv contains details of all players with their all-time IPL stats.
- File T20Data.csv contains details of all players with their all-time T20 stats, either international or domestic, apart from IPL.


## Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

## Importing dataset

In [None]:
df=pd.read_csv('ipl_2022_dataset.csv')
df.head(5)

In [None]:
df.shape

#### So there are a total of 633 players in this auction.

In [None]:
df.columns

## Data Preprocessing

In [None]:
# the file has an unnecessary column called 'Unnamed: 0', so remove it
df.drop('Unnamed: 0',axis=1,inplace=True)
df.head(2)

In [None]:
df.info()

In [None]:
# finding out the null values
df.isnull().sum()

In [None]:
# let's check why there are so many null values in Cost column

df[df['COST IN ₹ (CR.)'].isnull()]

#### The cost column has many NaN values because those players all went unsold in the 2022 IPL auction. Hence we can fill these NaN values with 0.

There is also no need for the cost-in-dollars column, so we drop it.

In [None]:
# Renaming 'COST IN ₹ (CR.)' to something a bit simpler
df.rename(columns = {'COST IN ₹ (CR.)':'COST IN CR'}, inplace = True)

In [None]:
df.drop('Cost IN $ (000)', axis=1, inplace=True)

In [None]:
df['COST IN CR']=df['COST IN CR'].fillna(0)
df.isnull().sum()

In [None]:
# Now let's check why there are so many null values in 2021 Squad
df[df['2021 Squad'].isnull()]

#### So these players are new entrants in 2022 or remained unsold in the 2021 auction. Hence we will fill those NaN values with 'Not Participated'.

In [None]:
df['2021 Squad']=df['2021 Squad'].fillna('Not Participated')
df.isnull().sum()
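
The two fills above can also be done in a single call. The following is a minimal sketch on a made-up toy frame (the players and values are illustrative, not taken from the dataset), assuming the column names used in this notebook; `fillna` accepts a mapping from column name to replacement value.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the auction data (values are made up for illustration).
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'COST IN CR': [2.0, np.nan, 0.5],
    '2021 Squad': ['CSK', np.nan, np.nan],
})

# One fillna call with a per-column mapping of replacement values.
filled = toy.fillna({'COST IN CR': 0, '2021 Squad': 'Not Participated'})

# Total number of missing values left in the frame.
print(filled.isnull().sum().sum())
```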

## Exploratory Data Analysis

### Let's create a column that records each player's status as sold or unsold.

In [None]:
# storing the names of all teams that paid for at least one player in a variable called teams
teams = df[df['COST IN CR']>0]['Team'].unique()
teams

In [None]:
# making another column just to check the status of player i.e, whether the player is sold or unsold
df['Status']= df['Team'].replace(teams,'sold')
df.head(5)
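
An alternative way to derive the same column is sketched below on a made-up toy frame. It assumes, as the later cells in this notebook do, that unsold players carry the value 'Unsold' in the Team column; the team assignments shown are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the auction data (rows are made up for illustration).
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Team': ['Gujarat Titans', 'Unsold', 'Mumbai Indians'],
})

# Any Team value other than 'Unsold' is treated as a sale.
toy['Status'] = np.where(toy['Team'] == 'Unsold', 'Unsold', 'sold')
print(toy)
```

This variant does not need the list of team names, so it keeps working if a team is missing from that list.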

In [None]:
df[df['Player'].duplicated(keep=False)]

### How many players participated in the 2022 auction?

In [None]:
df.shape[0]

### How are the participating players distributed by role?

In [None]:
# storing this value inside a variable
types=df['TYPE'].value_counts()
types.reset_index()

In [None]:
# making a pie chart to visualize the roles and show their percentage distribution
plt.pie(types.values, labels=types.index, labeldistance=1.2, autopct='%1.2f%%', shadow=True, startangle=60)
plt.title('Role of players participated', fontsize=15)

In [None]:
# Players sold and unsold using a bar graph

plt.figure(figsize=(10,5))
# pass the column as x=; recent seaborn releases accept only data positionally
fig=sns.countplot(x=df['Status'],palette=['orange','blue'])
plt.xlabel('Sold or unsold')
plt.ylabel('No. of players')
plt.title('Sold vs unsold', fontsize=15)

# display the count above each bar
for p in fig.patches:
    fig.annotate(format(p.get_height(), '.0f'),(p.get_x()+p.get_width()/2., p.get_height()), ha='center', va='center',
                 xytext=(0,4),textcoords='offset points')

In [None]:
df.groupby('Status')['Player'].count()
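
The annotation loop used for the bar labels can be replaced by Matplotlib's `Axes.bar_label`, available from Matplotlib 3.4. The sketch below uses plain Matplotlib bars on a made-up Status series rather than `sns.countplot`, so it is an illustration of the labelling technique and not a drop-in replacement for the cell above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy Status column (made-up values for illustration).
status = pd.Series(['sold', 'sold', 'sold', 'Unsold', 'Unsold'], name='Status')
counts = status.value_counts()

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(counts.index.tolist(), counts.values, color=['orange', 'blue'])

# bar_label writes each bar's height above it; padding is in points.
labels = ax.bar_label(bars, padding=4)

ax.set_xlabel('Sold or unsold')
ax.set_ylabel('No. of players')
ax.set_title('Sold vs unsold', fontsize=15)
```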

### Total no. of players bought by each team

In [None]:
plt.figure(figsize=(20,10))
fig = sns.countplot(x=df[df['Team']!='Unsold']['Team'])
plt.xlabel('Team Names')
plt.ylabel('Number of Players')
plt.title('Players bought by each team', fontsize=12)
plt.xticks(rotation=70)

# display the count above each bar
for p in fig.patches:
    fig.annotate(format(p.get_height(), '.0f'),(p.get_x()+p.get_width()/2., p.get_height()), ha='center', va='center',
                 xytext=(0,4),textcoords='offset points')

In [None]:
df['Base Price'].unique()

In [None]:
# new column that marks every auction base price as 'From Auction';
# other values (such as 'Retained') are kept as they are
df['Retention']=df['Base Price'].replace(['2 Cr', '40 Lakh', '20 Lakh', '1 Cr', '75 Lakh',
       '50 Lakh', '30 Lakh', '1.5 Cr'], 'From Auction')

In [None]:
# assign the result back instead of calling replace(inplace=True) on a selected column
df['Base Price']=df['Base Price'].replace('Draft Pick', 0)
df.head(5)

In [None]:
# making 2 more columns to separate the base price from its unit
df['base_price_unit']=df['Base Price'].apply(lambda x:str(x).split(' ')[-1])
df['base_price']=df['Base Price'].apply(lambda x:str(x).split(' ')[0])

In [None]:
df['base_price']=df['base_price'].replace('Retained',0)
df.head(5)
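
Once the amount and unit are separated, the base prices can be put on a common scale. The sketch below converts a base-price string to crore, using the fact that 1 crore equals 100 lakh. The helper name `to_crore` is hypothetical and not part of the original notebook; it assumes the value formats seen in `df['Base Price'].unique()` above and treats anything without a recognized unit as 0.

```python
def to_crore(base_price):
    """Convert strings such as '40 Lakh' or '1.5 Cr' to a float in crore.

    Hypothetical helper: values without a recognized unit
    ('Retained', 'Draft Pick', 0) are mapped to 0.0.
    """
    parts = str(base_price).split(' ')
    if len(parts) != 2 or parts[1] not in ('Cr', 'Lakh'):
        return 0.0
    amount = float(parts[0])
    # 1 crore = 100 lakh
    return amount / 100 if parts[1] == 'Lakh' else amount


for value in ['2 Cr', '40 Lakh', '1.5 Cr', 'Retained', 0]:
    print(value, '->', to_crore(value))
```

It could then be applied with `df['Base Price'].apply(to_crore)` and stored in a new column of your choosing.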

In [None]:
# Total players retained and bought by each team
rt=df.groupby(['Team','Retention'])['Retention'].count()
rt

In [None]:
plt.figure(figsize=(20,10))
# filter once so that x and hue come from the same rows
fig = sns.countplot(data=df[df['Team']!='Unsold'],x='Team',hue='TYPE')
plt.title('Player in each team',fontsize=15)
plt.xlabel('Team Names')
plt.ylabel('Number of Players')
plt.xticks(rotation=50)

### Highest amount spent on a single player by each team

In [None]:
# exclude unsold players explicitly instead of slicing off the last group
df[(df['Retention']=='From Auction') & (df['Team']!='Unsold')].groupby(['Team'])['COST IN CR'].max().sort_values(ascending=False)
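
The result above gives only the amounts. To see which player each amount belongs to, one option is `idxmax`, sketched here on a made-up toy frame (the team names, players, and prices are illustrative, not taken from the dataset).

```python
import pandas as pd

# Toy frame standing in for the auction data (values are made up for illustration).
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Team': ['Team X', 'Team X', 'Team Y', 'Team Y'],
    'COST IN CR': [2.0, 8.5, 10.0, 0.2],
})

# idxmax returns, per team, the row label of the most expensive player.
top_rows = toy.loc[toy.groupby('Team')['COST IN CR'].idxmax()]
top_buys = top_rows[['Team', 'Player', 'COST IN CR']].sort_values('COST IN CR', ascending=False)
print(top_buys)
```

If two players in a team share the maximum price, `idxmax` keeps only the first one.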

### Retained players, highest price first

In [None]:
df[df['Retention']=='Retained'].sort_values(by='COST IN CR',ascending =False)

### Top 5 most expensive bowlers bought at auction

In [None]:
df[(df['Retention']=='From Auction') & (df['TYPE']=='BOWLER')].sort_values(by = 'COST IN CR', ascending=False).head(5)

### Top 5 most expensive batters bought at auction

In [None]:
df[(df['Retention']=='From Auction') & (df['TYPE']=='BATTER')].sort_values(by = 'COST IN CR', ascending=False).head(5)

### Top 5 most expensive all-rounders bought at auction

In [None]:
df[(df['Retention']=='From Auction') & (df['TYPE']=='ALL-ROUNDER')].sort_values(by = 'COST IN CR', ascending=False).head(5)

### Players from 2021 squads who remained unsold in the 2022 IPL auction

In [None]:
unsold_players=df[(df['2021 Squad'] != 'Not Participated') & (df.Team=='Unsold')][['Player','2021 Squad']]
unsold_players