## Problem Statement:
You have been hired as a data analyst by a sports management company. They are interested
in forming a new team for the upcoming IPL Season 2024 and want your expertise to suggest
players that will maximize their chances of winning matches. Your task is to analyze the IPL
dataset and recommend the top-performing players in various positions to include in the new
team.


## Dataset:
https://www.kaggle.com/datasets/anandkumarsahu09/ipl-player-stats-20162022

# Tasks for Player Selection and Analysis:


## 1. Data Loading and Inspection:

### ● Load the IPL dataset into your programming environment.

### ● Print the first few rows to understand the structure and content of the data.

### ● Check the dimensions of the dataset.

### ● Identify the different variables/columns available in the dataset and their meanings.

In [1]:
# Import the basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")


In [12]:
# First, read all of the IPL match datasets.
df1 = pd.read_csv("ipl 2017 data/FACT_BALL_BY_BALL.csv")
df2 = pd.read_excel("ipl 2017 data/DIM_MATCH.xlsx")
df3 = pd.read_csv("ipl 2017 data/DIM_PLAYER.csv",encoding= 'unicode_escape')
df4 = pd.read_csv("ipl 2017 data/DIM_PLAYER_MATCH.csv",encoding= 'unicode_escape')
df5 = pd.read_csv("ipl 2017 data/DIM_TEAM.csv")
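
The two player files are read with an explicit `encoding`, presumably because they contain bytes that are not valid UTF-8. As a minimal sketch of how the `encoding` argument affects parsing, the snippet below reads a made-up in-memory sample (not taken from the dataset) as Latin-1. Whether the real files are Latin-1 is an assumption that the source does not confirm.

```python
import io

import pandas as pd

# Made-up sample: a header plus one name containing the Latin-1 byte 0xE9 ("é").
raw = b"Player_Name\nJos\xe9 Example\n"

# Decode the bytes as Latin-1 while parsing.
sample = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
names = sample["Player_Name"].tolist()
print(names)
```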

In [17]:
df1.head()  # The first dataset holds the ball-by-ball details of each match.

| | Ball_key | MatcH_id | Over_id | Ball_id | Innings_No | Team_Batting | Team_Bowling | Striker_Batting_Position | StrikerKey | NonStrikerKey | ... | NONStriker_SK | Fielder_match_SK | Fielder_SK | Bowler_match_SK | BOWLER_SK | PlayerOut_match_SK | BattingTeam_SK | BowlingTeam_SK | Keeper_Catch | Player_out_sk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3359870010101 | 335987 | 1 | 1 | 1 | 1 | 2 | 1.0 | 33598700001 | 33598700002 | ... | 1 | -1 | -1 | 12702 | 13 | -1 | 0 | 1 | 0 | |
| 1 | 3359870010102 | 335987 | 1 | 2 | 1 | 1 | 2 | 2.0 | 33598700002 | 33598700001 | ... | 0 | -1 | -1 | 12702 | 13 | -1 | 0 | 1 | 0 | |
| 2 | 3359870010103 | 335987 | 1 | 3 | 1 | 1 | 2 | 2.0 | 33598700002 | 33598700001 | ... | 0 | -1 | -1 | 12702 | 13 | -1 | 0 | 1 | 0 | |
| 3 | 3359870010104 | 335987 | 1 | 4 | 1 | 1 | 2 | 2.0 | 33598700002 | 33598700001 | ... | 0 | -1 | -1 | 12702 | 13 | -1 | 0 | 1 | 0 | |
| 4 | 3359870010105 | 335987 | 1 | 5 | 1 | 1 | 2 | 2.0 | 33598700002 | 33598700001 | ... | 0 | -1 | -1 | 12702 | 13 | -1 | 0 | 1 | 0 | |


In [16]:
print(df1.columns)
print(df1.shape)

Index(['Ball_key', 'MatcH_id', 'Over_id', 'Ball_id', 'Innings_No',
       'Team_Batting', 'Team_Bowling', 'Striker_Batting_Position',
       'StrikerKey', 'NonStrikerKey', 'BowlerKey', 'PlayerOutKey',
       'FeilderKey', 'Extra_Type', 'Runs_Scored', 'Extra_runs', 'Wides',
       'Legbyes', 'Byes', 'Noballs', 'Penalty', 'Bowler_Extras', 'Out_type',
       'Caught', 'Bowled', 'Run_out', 'LBW', 'Retired_hurt', 'Stumped',
       'caught_and_bowled', 'hit_wicket', 'ObstructingFeild', 'Bowler_Wicket',
       'Match_Date', 'Season', 'Striker', 'Non_Striker', 'Bowler',
       'Player_Out', 'Fielders', 'Striker_match_SK', 'StrikerSK',
       'NonStriker_match_SK', 'NONStriker_SK', 'Fielder_match_SK',
       'Fielder_SK', 'Bowler_match_SK', 'BOWLER_SK', 'PlayerOut_match_SK',
       'BattingTeam_SK', 'BowlingTeam_SK', 'Keeper_Catch', 'Player_out_sk'],
      dtype='object')
(150451, 53)


In [18]:
df2.head()

| | Match_SK | match_id | Team1 | Team2 | match_date | Season_Year | Venue_Name | City_Name | Country_Name | Toss_Winner | match_winner | Toss_Name | Win_Type | Outcome_Type | ManOfMach | Win_Margin | Country_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 546 | 980964 | Royal Challengers Bangalore | Kolkata Knight Riders | 2016-05-02 | 2016 | M Chinnaswamy Stadium | Bangalore | India | Kolkata Knight Riders | Kolkata Knight Riders | field | wickets | Result | AD Russell | 5.0 | 1 |
| 1 | 547 | 980966 | Gujarat Lions | Delhi Daredevils | 2016-05-03 | 2016 | Saurashtra Cricket Association Stadium | Rajkot | India | Delhi Daredevils | Delhi Daredevils | field | wickets | Result | RR Pant | 8.0 | 1 |
| 2 | 548 | 980968 | Kolkata Knight Riders | Kings XI Punjab | 2016-05-04 | 2016 | Eden Gardens | Kolkata | India | Kings XI Punjab | Kolkata Knight Riders | field | runs | Result | AD Russell | 7.0 | 1 |
| 3 | 549 | 980970 | Delhi Daredevils | Rising Pune Supergiants | 2016-05-05 | 2016 | Feroz Shah Kotla | Delhi | India | Rising Pune Supergiants | Rising Pune Supergiants | field | wickets | Result | AM Rahane | 7.0 | 1 |
| 4 | 550 | 980972 | Sunrisers Hyderabad | Gujarat Lions | 2016-05-06 | 2016 | Rajiv Gandhi International Stadium, Uppal | Hyderabad | India | Sunrisers Hyderabad | Sunrisers Hyderabad | field | wickets | Result | B Kumar | 5.0 | 1 |


In [19]:
df2.shape

(637, 17)

In [20]:
df3.head()  # Dataset 3 holds the personal details of each player.

| | PLAYER_SK | Player_Id | Player_Name | DOB | Batting_hand | Bowling_skill | Country_Name |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | SC Ganguly | 1972-07-08 | Left-hand bat | Right-arm medium | India |
| 1 | 1 | 2 | BB McCullum | 1981-09-27 | Right-hand bat | Right-arm medium | New Zealand |
| 2 | 2 | 3 | RT Ponting | 1974-12-19 | Right-hand bat | Right-arm medium | Australia |
| 3 | 3 | 4 | DJ Hussey | 1977-07-15 | Right-hand bat | Right-arm offbreak | Australia |
| 4 | 4 | 5 | Mohammad Hafeez | 1980-10-17 | Right-hand bat | Right-arm offbreak | Pakistan |


In [21]:
df3.shape

(497, 7)

In [22]:
df4.head()

| | Player_match_SK | PlayerMatch_key | Match_Id | Player_Id | Player_Name | DOB | Batting_hand | Bowling_skill | Country_Name | Role_Desc | ... | Season_year | is_manofThematch | Age_As_on_match | IsPlayers_Team_won | Batting_Status | Bowling_Status | Player_Captain | Opposit_captain | Player_keeper | Opposit_keeper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1 | -1 | -1 | -1 | | | | | | | ... | | | | | | | | | | |
| 1 | 12694 | 33598700006 | 335987 | 6 | R Dravid | 1973-01-11 | Right-hand bat | Right-arm offbreak | India | Captain | ... | 2008.0 | 0.0 | 35.0 | 0.0 | | | R Dravid | SC Ganguly | MV Boucher | WP Saha |
| 2 | 12695 | 33598700007 | 335987 | 7 | W Jaffer | 1978-02-16 | Right-hand bat | Right-arm offbreak | India | Player | ... | 2008.0 | 0.0 | 30.0 | 0.0 | | | R Dravid | SC Ganguly | MV Boucher | WP Saha |
| 3 | 12696 | 33598700008 | 335987 | 8 | V Kohli | 1988-11-05 | Right-hand bat | Right-arm medium | India | Player | ... | 2008.0 | 0.0 | 20.0 | 0.0 | | | R Dravid | SC Ganguly | MV Boucher | WP Saha |
| 4 | 12697 | 33598700009 | 335987 | 9 | JH Kallis | 1975-10-16 | Right-hand bat | Right-arm fast-medium | South Africa | Player | ... | 2008.0 | 0.0 | 33.0 | 0.0 | | | R Dravid | SC Ganguly | MV Boucher | WP Saha |


In [23]:
df4.shape

(13993, 22)

In [24]:
df5.head()  # The fifth dataset lists the IPL team names.

| | Team_SK | Team_Id | Team_Name |
|---|---|---|---|
| 0 | 0 | 1 | Kolkata Knight Riders |
| 1 | 1 | 2 | Royal Challengers Bangalore |
| 2 | 2 | 3 | Chennai Super Kings |
| 3 | 3 | 4 | Kings XI Punjab |
| 4 | 4 | 5 | Rajasthan Royals |


In [25]:
df5.shape

(13, 3)
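
The separate `head()` and `shape` calls above could also be collapsed into one summary loop. The following is a minimal sketch, not part of the original notebook: `summarize` is a hypothetical helper, and the two small frames are stand-ins so the snippet runs without the data files. With the real data, the dictionary would hold `df1` through `df5`.

```python
import pandas as pd


def summarize(frames):
    """Return one row per table: row count, column count, total missing cells."""
    rows = []
    for name, frame in frames.items():
        rows.append({
            "table": name,
            "rows": frame.shape[0],
            "columns": frame.shape[1],
            "missing_cells": int(frame.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("table")


# Stand-in frames (illustrative only); replace with {"df1": df1, ..., "df5": df5}.
demo_frames = {
    "teams": pd.DataFrame({
        "Team_SK": [0, 1],
        "Team_Id": [1, 2],
        "Team_Name": ["Kolkata Knight Riders", "Royal Challengers Bangalore"],
    }),
    "players": pd.DataFrame({
        "PLAYER_SK": [0, 1, 2],
        "Player_Name": ["SC Ganguly", "BB McCullum", None],  # one missing value on purpose
    }),
}

summary = summarize(demo_frames)
print(summary)
```

A single summary table makes it easier to spot which files carry missing values before the cleaning step.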

## 2. Data Cleaning and Preparation:

### ● Handle missing values appropriately (e.g., fill or drop missing values).

### ● Remove irrelevant columns that are not necessary for player analysis.

### ● Convert data types if required (e.g., converting string dates to datetime objects).
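
The three cleaning steps listed above might look roughly like the sketch below. It uses small stand-in frames with a few column names from the inspection output (`Runs_Scored`, `Match_Date`, `Player_out_sk`, `Player_Id`, `Season_year`); the values are illustrative, and the ISO date format and the choice to fill missing runs with 0 are assumptions, not facts about the dataset. The `-1` placeholder row mirrors the first row seen in `df4.head()`.

```python
import numpy as np
import pandas as pd

# Stand-in for a few ball-by-ball columns (illustrative values only).
balls = pd.DataFrame({
    "MatcH_id": [335987, 335987, 335987],
    "Runs_Scored": [0, 4, np.nan],
    "Match_Date": ["2008-04-18", "2008-04-18", "2008-04-18"],  # assumed ISO format
    "Player_out_sk": [np.nan, np.nan, np.nan],
})

# 1. Drop columns that contain no data at all.
cleaned = balls.dropna(axis=1, how="all").copy()

# 2. Fill remaining gaps; treating a missing run count as 0 is an assumption.
cleaned["Runs_Scored"] = cleaned["Runs_Scored"].fillna(0)

# 3. Convert string dates to datetime; unparseable values become NaT.
cleaned["Match_Date"] = pd.to_datetime(cleaned["Match_Date"], errors="coerce")

# Stand-in for the player-match table, including a -1 placeholder row.
player_match = pd.DataFrame({
    "Player_Id": [-1, 6, 7],
    "Player_Name": [None, "R Dravid", "W Jaffer"],
    "Season_year": [np.nan, 2008.0, 2008.0],
})

# Remove the placeholder row, then store the season as an integer.
player_match = player_match[player_match["Player_Id"] != -1].copy()
player_match["Season_year"] = player_match["Season_year"].astype(int)

print(cleaned.dtypes)
print(player_match)
```

Dropping the placeholder row before casting matters here, because a column that still contains missing values cannot be converted with `astype(int)`.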
