<h1 style="text-align:center;">IPL 2023 Auction Analytics</h1>
<br>
<img src="https://purneauniversity.org/wp-content/uploads/2022/10/IPL-2023-Auction.png" >

The IPL 2023 player auction was held in Kochi on 23rd December 2022, with 405 players going under the hammer. A total of 991 cricketers had initially registered, and the final list was trimmed to 405: 273 Indian players and 132 overseas players. Across the ten franchises, 87 slots were up for grabs, of which 30 were reserved for overseas players. Of the 87 available slots, 80 were filled and the rest remained vacant.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv('/kaggle/input/ipl-2023-auction-dataset/ipl_2023_dataset.csv')
df.head()

In [None]:
df.shape

In [None]:
#renaming the columns for easy access
df.columns = ['name','base_price','type','cost_rs','cost_usd','2022_s','2023_s']

In [None]:
df.head()

In [None]:
df.info()

More than 50% of the values in the cost columns and in 2022_s are null, so these columns need to be analyzed first.
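To put a number on the "more than 50%" claim, the share of nulls per column can be computed directly. The sketch below runs on a small hypothetical frame (`toy` and its values are made up for illustration, only the column names follow this notebook); on the real data the same expression would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative, not from the dataset.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'cost_rs': [1.5, np.nan, np.nan, np.nan],
    '2022_s': ['MI', None, None, 'CSK'],
    '2023_s': ['MI', 'Unsold', 'Unsold', 'Unsold'],
})

# isna() gives a boolean frame; its column mean is the fraction of nulls.
null_pct = toy.isna().mean().mul(100).round(1)
print(null_pct.sort_values(ascending=False))
```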

In [None]:
df.isna().sum()

In [None]:
df[(df['2023_s']=='Unsold') & (df.cost_rs.isna())]

So all the Unsold players have null values for cost_rs and cost_usd.
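The filter above only lists unsold rows that have a null cost. A stricter check, sketched here on hypothetical data (`toy` is an assumption, not the real dataset), confirms both directions: every unsold row lacks a cost, and no sold row does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'cost_rs': [2.0, np.nan, np.nan],
    'cost_usd': [250.0, np.nan, np.nan],
    '2023_s': ['MI', 'Unsold', 'Unsold'],
})

unsold = toy['2023_s'] == 'Unsold'

# True only if both cost columns are null for every unsold player.
all_unsold_missing = toy.loc[unsold, ['cost_rs', 'cost_usd']].isna().all().all()

# True only if every sold or retained player has a cost.
no_sold_missing = toy.loc[~unsold, 'cost_rs'].notna().all()

print(all_unsold_missing, no_sold_missing)
```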

In [None]:
df.nunique()

The name column has 3 repeated entries. We assume these rows merely share a player name and are not true duplicates.
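Rather than assuming, the repeated names can be listed and compared side by side. This is a minimal sketch on a hypothetical frame (`toy` and its rows are invented for illustration): `duplicated(keep=False)` marks every occurrence of a repeated name, while `duplicated()` on the whole frame counts rows that are identical in all columns.

```python
import pandas as pd

# Hypothetical toy frame; 'A' appears twice with different attributes.
toy = pd.DataFrame({
    'name': ['A', 'B', 'A', 'C'],
    'type': ['BOWLER', 'BATSMAN', 'BATSMAN', 'BOWLER'],
    'base_price': [0.2, 0.5, 0.2, 0.2],
})

# All rows whose name occurs more than once, grouped together for inspection.
repeated_names = toy[toy['name'].duplicated(keep=False)].sort_values('name')
print(repeated_names)

# Rows that are exact copies of an earlier row across every column.
exact_dupes = toy.duplicated().sum()
print(exact_dupes)
```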

In [None]:
df.duplicated().sum()

In [None]:
df.info()

There are no row duplicates in the dataset

### Univariate Analysis
<hr>


In [None]:
print(df.base_price.value_counts())
print('converting "Retained" to 0')
df.base_price = np.where(df.base_price=='Retained',0,df.base_price).astype('float')
#convert base_price to Crores as we have cost also in crores
df.base_price = df.base_price/10000000.0
print(df.base_price.value_counts())

In [None]:
df.base_price.value_counts().plot(kind='bar')
plt.xlabel('Base Price')
plt.show()

There are 270 players with a base price of 0.2 Cr, 163 'Retained' players (encoded as 0 above) and 61 players with a base price of 0.5 Cr.

In [None]:
df.type.value_counts()

ALL-ROUNDERs form the largest group in the IPL auction, followed by BOWLERs and then BATSMEN.

In [None]:
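#histogram of cost in Cr; unsold players have no cost, so drop the NaN values first
plt.hist(df.cost_rs.dropna(),bins=18,color = "skyblue",ec='blue')
plt.xlabel('Cost (Cr)')
plt.show()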

As the graph above shows, a large share of the players were sold for less than 2.5 Cr.
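The histogram gives a visual impression; the exact share below a threshold can be computed as well. The sketch below uses a hypothetical `cost` series (the values are made up) and the 2.5 Cr threshold from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical costs in Cr; NaN stands for unsold players.
cost = pd.Series([0.2, 0.5, 1.0, 18.5, np.nan, np.nan], name='cost_rs')

# Keep only players that actually have a cost.
sold = cost.dropna()

# The mean of a boolean mask is the fraction of True values.
share_below = (sold < 2.5).mean()
print(f'{share_below:.0%} of players with a cost went for under 2.5 Cr')
```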

In [None]:
print("INR/USD factor")
print((df.cost_rs/df.cost_usd).value_counts())

Because the INR and USD costs differ only by a constant factor, they carry the same information, so we can remove cost_usd from the scope of our analysis.
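Reading `value_counts()` of a float ratio can be misleading when rounding produces several near-identical values. A tolerance-based check is one alternative, sketched here on hypothetical numbers (`toy` and its conversion factor are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical toy costs; the last row mimics an unsold player.
toy = pd.DataFrame({
    'cost_rs': [1.0, 2.0, 0.5, np.nan],
    'cost_usd': [120.0, 240.0, 60.0, np.nan],
})

# Ratio per player, ignoring rows where either cost is missing.
ratio = (toy['cost_rs'] / toy['cost_usd']).dropna()

# np.allclose compares every ratio with the first one within a small tolerance.
is_constant = np.allclose(ratio, ratio.iloc[0])
print(is_constant)
```

If `is_constant` is True, one of the two columns is redundant and can be dropped without losing information.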

In [None]:
df.drop('cost_usd',axis=1,inplace=True)

In [None]:
#2022 Squads
df['2022_s'].value_counts()

In [None]:
df['2023_s'].value_counts()

This year 325 players went unsold.

In [None]:
print(f"Year 2023 has {df.loc[df['2023_s']!='Unsold','2023_s'].count()} players and year 2022 had {df['2022_s'].count()} players")

### Bivariate Analysis
<hr>

In [None]:
#lets analyse base price by type
print('\nTOTAL SUM/ PLAYER TYPE')
print(df.groupby('type').agg({'base_price':'sum'}))
print('\nAVG BASE/ PLAYER TYPE')
print(df.groupby('type').agg({'base_price':'mean'}))
print('\nSTD DEV./ PLAYER TYPE')
print(df.groupby('type').agg({'base_price':'std'}))
print('\nCALCULATE Coeff. of Variation')
print(df.groupby('type').agg({'base_price':'std'})/df.groupby('type').agg({'base_price':'mean'}))

<b>ALL-ROUNDERs have the highest aggregate base_price (58 Cr), followed by BOWLERs (48.35 Cr). Looking at the mean and the variability, however, BATSMEN and WICKETKEEPERs come out ahead on base price.<br>
Combined with the earlier analysis of type, this means that although ALL-ROUNDERs and BOWLERs are the highest in number, BATSMEN and WICKETKEEPERs get a better base price in the auction.</b>
<br><br>
Note: let's see whether the same trend shows up in the cost of the sold players.
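The sum, mean, standard deviation and coefficient of variation (std divided by mean) can also be collected in one table, which makes the comparison across player types easier to read. This is a sketch on a hypothetical frame (`toy` and its prices are invented); the column names follow this notebook.

```python
import pandas as pd

# Hypothetical toy base prices in Cr.
toy = pd.DataFrame({
    'type': ['BATSMAN', 'BATSMAN', 'BOWLER', 'BOWLER'],
    'base_price': [1.0, 2.0, 0.2, 0.2],
})

# One groupby pass producing all three statistics per player type.
stats = toy.groupby('type')['base_price'].agg(['sum', 'mean', 'std'])

# Coefficient of variation: spread relative to the mean, unitless.
stats['cv'] = stats['std'] / stats['mean']
print(stats)
```

A higher `cv` means prices within that type are more spread out relative to their average.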

In [None]:
df.groupby('type').agg({'cost_rs':'count'})

<b>ALL-ROUNDERs and BOWLERs are the most-sold player types in 2023.</b>

In [None]:
#lets analyse cost price by type

print('\nFor Year 2023\n\n')
print('\nTOTAL SUM COST/ PLAYER TYPE')
print(df.groupby('type').agg({'cost_rs':'sum'}))
print('\nAVG COST/ PLAYER TYPE')
print(df.groupby('type').agg({'cost_rs':'mean'}))
print('\nSTD DEV. COST/ PLAYER TYPE')
print(df.groupby('type').agg({'cost_rs':'std'}))
print('\nCALCULATE Coeff. of Variation')
print(df.groupby('type').agg({'cost_rs':'std'})/df.groupby('type').agg({'cost_rs':'mean'}))

<b>Observations</b>
<ol>
    <li>BOWLERs are sold in larger numbers, but their overall cost is very low, with an average cost of 0.37 Cr.</li>
    <li>In 2023, ALL-ROUNDERs are sold at a better price, with an average of 0.81 Cr, but the variability is quite high.</li>
    <li>In 2023, BATSMEN are the most expensive type of player sold, as per the statistics above.</li>
    </ol>

In [None]:
#correlate the numeric columns only; name, type and squad columns are text
df.select_dtypes('number').corr()

There is a strong positive correlation between a player's base price and cost in 2023. Because the two are so closely related, conclusions drawn from one of these prices largely carry over to the other.
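The pairwise correlation between the two price columns can also be computed on its own. The sketch below uses a hypothetical frame (`toy` and its values are invented); `Series.corr` computes Pearson correlation by default and skips pairs where either value is missing, so unsold players do not need to be filtered out by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices in Cr; the last player is unsold (no cost).
toy = pd.DataFrame({
    'name': list('ABCD'),
    'base_price': [0.2, 0.5, 1.0, 2.0],
    'cost_rs': [0.3, 1.0, 2.5, np.nan],
})

# Pearson correlation over rows where both values are present.
r = toy['base_price'].corr(toy['cost_rs'])

# Number of complete pairs actually used in the calculation.
n_pairs = toy[['base_price', 'cost_rs']].dropna().shape[0]
print(round(r, 3), n_pairs)
```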

In [None]:
#setting for subplots
fig, axes = plt.subplots(1,2, figsize=(15,5))
fig.suptitle('Players bought by teams per player types')

tempgroup1 = df.groupby(['type','2022_s']).agg(count=('2022_s','count')).reset_index()
df1 = pd.pivot_table(tempgroup1,values='count',columns='type',index='2022_s')
sns.barplot(ax=axes[0],data=tempgroup1, x='2022_s', y='count', hue='type')
axes[0].set_title('YEAR 2022')


tempgroup2 = df.loc[df['2023_s']!='Unsold'].groupby(['type','2023_s']).agg(count=('2023_s','count')).reset_index()
df2 = pd.pivot_table(tempgroup2,values='count',columns='type',index='2023_s')
sns.barplot(ax=axes[1],data=tempgroup2, x='2023_s', y='count', hue='type')
axes[1].set_title('YEAR 2023')
plt.show()




We can see that Mumbai Indians have consistently picked more BATSMEN over these 2 years, and LSG has picked more BATSMEN than last year.

In [None]:
#Let's analyze which teams spent more on players this year
#setting for subplots
fig, axes = plt.subplots(1,2, figsize=(15,5))
fig.suptitle('Cost spent by teams per player types')

tempgroup1 = df.groupby(['type','2022_s']).agg(cost=('cost_rs','sum')).reset_index()
sns.barplot(ax=axes[0],data=tempgroup1, x='2022_s', y='cost', hue='type')
axes[0].set_title('YEAR 2022')


tempgroup2 = df.loc[df['2023_s']!='Unsold'].groupby(['type','2023_s']).agg(cost=('cost_rs','sum')).reset_index()
sns.barplot(ax=axes[1],data=tempgroup2, x='2023_s', y='cost', hue='type')
axes[1].set_title('YEAR 2023')
plt.show()




This year teams have bought more batsmen and all-rounders than last year. <br>
The totals come to 48 Cr for the 2022 squads and 167 Cr for 2023. Note that cost_rs is the only cost column in the dataset, so the 2022 panel shows that cost grouped by each player's 2022 squad.