# **Exploratory Data Analysis of Teamwise Home and Away matches**

### **In this step we will be analysing the Teamwise Home and Away matches data**

In [2]:
# Importing the Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
# Reading the csv file
team_df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Exploratory Data Analysis/3. IPL/5. Team Matches Wins/teamwise_home_and_away.csv')

# **Step 1: Skimming over the Dataset and Analysing the Data**

### **In this step we will be analysing the dataset**

In [4]:
# Shape of the dataset
team_df.shape

(14, 7)

In [5]:
# Columns in the dataset
team_df.columns

Index(['team', 'home_wins', 'away_wins', 'home_matches', 'away_matches',
       'home_win_percentage', 'away_win_percentage'],
      dtype='object')

In [6]:
# Info of the dataset
team_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   team                 14 non-null     object 
 1   home_wins            14 non-null     int64  
 2   away_wins            14 non-null     int64  
 3   home_matches         14 non-null     int64  
 4   away_matches         14 non-null     int64  
 5   home_win_percentage  14 non-null     float64
 6   away_win_percentage  14 non-null     float64
dtypes: float64(2), int64(4), object(1)
memory usage: 912.0+ bytes


In [7]:
# Summary statistics of the numeric columns (describe is a method, so it needs parentheses)
team_df.describe()

In [8]:
team_df

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Rising Pune Supergiant | 5 | 5 | 8 | 8 | 62.5 | 62.5 |
| 1 | Mumbai Indians | 58 | 51 | 101 | 86 | 57.425743 | 59.302326 |
| 2 | Chennai Super Kings | 51 | 49 | 89 | 75 | 57.303371 | 65.333333 |
| 3 | Delhi Capitals | 3 | 7 | 6 | 10 | 50.0 | 70.0 |
| 4 | Sunrisers Hyderabad | 30 | 28 | 63 | 45 | 47.619048 | 62.222222 |
| 5 | Rajasthan Royals | 29 | 46 | 67 | 80 | 43.283582 | 57.5 |
| 6 | Deccan Chargers | 18 | 11 | 43 | 32 | 41.860465 | 34.375 |
| 7 | Kings XI Punjab | 38 | 44 | 91 | 85 | 41.758242 | 51.764706 |
| 8 | Royal Challengers Bangalore | 35 | 49 | 85 | 95 | 41.176471 | 51.578947 |
| 9 | Kolkata Knight Riders | 34 | 58 | 83 | 95 | 40.963855 | 61.052632 |


In [9]:
# Data cleaning and updating is required: the 'Delhi Daredevils' rows should be merged into 'Delhi Capitals', and the 'Pune Warriors' rows into 'Rising Pune Supergiant'

# **Step 2: Data Cleaning and Updating Columns**

### **In this step we will be cleaning the data (e.g. removing null values and empty columns, if any) and merging the Delhi and Pune team rows**
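The `info()` output above already shows 14 non-null values in every column. As a minimal sketch of the checks this step refers to, `isnull().sum()` counts missing values per column and `dropna()` removes empty columns or incomplete rows. The frame below reuses a few values from the dataset; the missing `home_matches` entry and the all-empty `notes` column are hypothetical, added only so the checks have something to catch:

```python
import numpy as np
import pandas as pd

# Illustrative frame: real team names and wins, but the NaN in 'home_matches'
# and the empty 'notes' column are hypothetical
sample_df = pd.DataFrame({
    'team': ['Mumbai Indians', 'Chennai Super Kings', 'Gujarat Lions'],
    'home_wins': [58, 51, 1],
    'home_matches': [101, 89, np.nan],
    'notes': [np.nan, np.nan, np.nan],
})

# Count missing values in each column
null_counts = sample_df.isnull().sum()
print(null_counts)

# Drop columns in which every value is missing
cleaned_df = sample_df.dropna(axis=1, how='all')

# Drop rows that still contain at least one missing value
cleaned_df = cleaned_df.dropna(axis=0, how='any')
print(cleaned_df)
```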

In [10]:
# Select the Delhi and Pune rows that need to be merged
delhi_teams_df = team_df.loc[(team_df['team'] == 'Delhi Daredevils') | (team_df['team'] == 'Delhi Capitals')].reset_index()
pune_teams_df = team_df.loc[(team_df['team'] == 'Rising Pune Supergiant') | (team_df['team'] == 'Pune Warriors')].reset_index()

In [11]:
# Preview without the 'index' column; without inplace = True, drop() returns a copy and both frames keep the column
delhi_teams_df.drop(columns = ['index'])
pune_teams_df.drop(columns = ['index'])

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Rising Pune Supergiant | 5 | 5 | 8 | 8 | 62.5 | 62.5 |
| 1 | Pune Warriors | 6 | 6 | 20 | 26 | 30.0 | 23.076923 |
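Cell 11 has no lasting effect because `drop()` without `inplace = True` returns a new DataFrame and leaves the original untouched, which is why the `index` column still appears in the next output. A small sketch of the difference, using two illustrative rows shaped like `pune_teams_df` (passing `drop=True` to `reset_index()` would avoid creating the column in the first place):

```python
import pandas as pd

# Two rows shaped like pune_teams_df after reset_index(), which adds an 'index' column
frame = pd.DataFrame({
    'index': [0, 11],
    'team': ['Rising Pune Supergiant', 'Pune Warriors'],
    'home_wins': [5, 6],
})

# drop() without inplace returns a new DataFrame and leaves `frame` unchanged
preview = frame.drop(columns=['index'])
print('index' in frame.columns)    # True: the original still has the column
print('index' in preview.columns)  # False: only the returned copy lost it

# To keep the change, reassign the result (or pass inplace=True)
frame = frame.drop(columns=['index'])
print(list(frame.columns))         # ['team', 'home_wins']
```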


In [12]:
delhi_teams_df

|   | index | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Delhi Capitals | 3 | 7 | 6 | 10 | 50.0 | 70.0 |
| 1 | 10 | Delhi Daredevils | 25 | 42 | 72 | 89 | 34.722222 | 47.191011 |


In [13]:
# Let's update the 'Delhi Capitals' data by adding the 'Delhi Daredevils' totals (row 1) to row 0
# .loc[row, column] writes straight into delhi_teams_df and avoids chained assignment (SettingWithCopyWarning)
delhi_teams_df.loc[0, 'home_wins'] += delhi_teams_df.loc[1, 'home_wins']
delhi_teams_df.loc[0, 'away_wins'] += delhi_teams_df.loc[1, 'away_wins']

delhi_teams_df.loc[0, 'home_matches'] += delhi_teams_df.loc[1, 'home_matches']
delhi_teams_df.loc[0, 'away_matches'] += delhi_teams_df.loc[1, 'away_matches']


In [14]:
# Let's update the 'Rising Pune Supergiant' data by adding the 'Pune Warriors' totals (row 1) to row 0
# .loc[row, column] writes straight into pune_teams_df and avoids chained assignment (SettingWithCopyWarning)
pune_teams_df.loc[0, 'home_wins'] += pune_teams_df.loc[1, 'home_wins']
pune_teams_df.loc[0, 'away_wins'] += pune_teams_df.loc[1, 'away_wins']

pune_teams_df.loc[0, 'home_matches'] += pune_teams_df.loc[1, 'home_matches']
pune_teams_df.loc[0, 'away_matches'] += pune_teams_df.loc[1, 'away_matches']


In [15]:
# Update the percentage values of both the tables (win percentage = wins / matches * 100)
delhi_teams_df.loc[0, 'home_win_percentage'] = delhi_teams_df.loc[0, 'home_wins'] / delhi_teams_df.loc[0, 'home_matches'] * 100
delhi_teams_df.loc[0, 'away_win_percentage'] = delhi_teams_df.loc[0, 'away_wins'] / delhi_teams_df.loc[0, 'away_matches'] * 100

pune_teams_df.loc[0, 'home_win_percentage'] = pune_teams_df.loc[0, 'home_wins'] / pune_teams_df.loc[0, 'home_matches'] * 100
pune_teams_df.loc[0, 'away_win_percentage'] = pune_teams_df.loc[0, 'away_wins'] / pune_teams_df.loc[0, 'away_matches'] * 100


In [16]:
# Drop extra column
delhi_teams_df.drop(columns = ['index'], inplace = True)
pune_teams_df.drop(columns = ['index'], inplace = True)

In [17]:
# Let's drop all the Delhi and Pune rows from team_df; the merged rows are added back afterwards
team_df.drop(team_df.index[(team_df["team"] == "Delhi Daredevils")], inplace = True)
team_df.drop(team_df.index[(team_df["team"] == "Delhi Capitals")], inplace = True)

team_df.drop(team_df.index[(team_df["team"] == "Rising Pune Supergiant")], inplace = True)
team_df.drop(team_df.index[(team_df["team"] == "Pune Warriors")], inplace = True)
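The four `drop` calls can also be collapsed into a single boolean mask built with `Series.isin`. A minimal sketch on a small stand-in for `team_df` (only a few of its rows and columns are reproduced; `merged_teams` is an illustrative name):

```python
import pandas as pd

# Small stand-in for team_df with a handful of rows from the dataset
team_df = pd.DataFrame({
    'team': ['Rising Pune Supergiant', 'Mumbai Indians', 'Delhi Capitals',
             'Delhi Daredevils', 'Pune Warriors', 'Gujarat Lions'],
    'home_wins': [5, 58, 3, 25, 6, 1],
})

# Teams whose rows are merged separately and added back later
merged_teams = ['Delhi Daredevils', 'Delhi Capitals',
                'Rising Pune Supergiant', 'Pune Warriors']

# Keep only rows whose team is NOT in merged_teams (~ negates the boolean mask)
team_df = team_df[~team_df['team'].isin(merged_teams)]
print(team_df)
```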

In [18]:
team_df

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 1 | Mumbai Indians | 58 | 51 | 101 | 86 | 57.425743 | 59.302326 |
| 2 | Chennai Super Kings | 51 | 49 | 89 | 75 | 57.303371 | 65.333333 |
| 4 | Sunrisers Hyderabad | 30 | 28 | 63 | 45 | 47.619048 | 62.222222 |
| 5 | Rajasthan Royals | 29 | 46 | 67 | 80 | 43.283582 | 57.5 |
| 6 | Deccan Chargers | 18 | 11 | 43 | 32 | 41.860465 | 34.375 |
| 7 | Kings XI Punjab | 38 | 44 | 91 | 85 | 41.758242 | 51.764706 |
| 8 | Royal Challengers Bangalore | 35 | 49 | 85 | 95 | 41.176471 | 51.578947 |
| 9 | Kolkata Knight Riders | 34 | 58 | 83 | 95 | 40.963855 | 61.052632 |
| 12 | Kochi Tuskers Kerala | 2 | 4 | 7 | 7 | 28.571429 | 57.142857 |
| 13 | Gujarat Lions | 1 | 12 | 14 | 16 | 7.142857 | 75.0 |


In [19]:
delhi_teams_df.head(1)

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Delhi Capitals | 28 | 49 | 78 | 99 | 35.897436 | 49.494949 |


In [20]:
pune_teams_df.head(1)

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Rising Pune Supergiant | 11 | 11 | 28 | 34 | 39.285714 | 32.352941 |
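As a quick arithmetic check of the two merged rows, the counts of each pair of records are added element-wise and the percentages recomputed as wins / matches * 100. This sketch uses plain Python and the values printed earlier in this step; `merge` and `win_percentages` are illustrative helper names:

```python
# Counts copied from the outputs above: (home_wins, away_wins, home_matches, away_matches)
delhi_capitals = (3, 7, 6, 10)
delhi_daredevils = (25, 42, 72, 89)
rising_pune = (5, 5, 8, 8)
pune_warriors = (6, 6, 20, 26)


def merge(a, b):
    """Add two count tuples element-wise."""
    return tuple(x + y for x, y in zip(a, b))


def win_percentages(counts):
    """Return (home %, away %) from a (home_wins, away_wins, home_matches, away_matches) tuple."""
    home_wins, away_wins, home_matches, away_matches = counts
    return home_wins / home_matches * 100, away_wins / away_matches * 100


delhi = merge(delhi_capitals, delhi_daredevils)
pune = merge(rising_pune, pune_warriors)
print(delhi, win_percentages(delhi))  # (28, 49, 78, 99) and roughly (35.90, 49.49)
print(pune, win_percentages(pune))    # (11, 11, 28, 34) and roughly (39.29, 32.35)
```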


In [21]:
# Let's add the updated rows
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
team_df = pd.concat([team_df, delhi_teams_df.head(1), pune_teams_df.head(1)], ignore_index = True)

In [22]:
team_df

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Mumbai Indians | 58 | 51 | 101 | 86 | 57.425743 | 59.302326 |
| 1 | Chennai Super Kings | 51 | 49 | 89 | 75 | 57.303371 | 65.333333 |
| 2 | Sunrisers Hyderabad | 30 | 28 | 63 | 45 | 47.619048 | 62.222222 |
| 3 | Rajasthan Royals | 29 | 46 | 67 | 80 | 43.283582 | 57.5 |
| 4 | Deccan Chargers | 18 | 11 | 43 | 32 | 41.860465 | 34.375 |
| 5 | Kings XI Punjab | 38 | 44 | 91 | 85 | 41.758242 | 51.764706 |
| 6 | Royal Challengers Bangalore | 35 | 49 | 85 | 95 | 41.176471 | 51.578947 |
| 7 | Kolkata Knight Riders | 34 | 58 | 83 | 95 | 40.963855 | 61.052632 |
| 8 | Kochi Tuskers Kerala | 2 | 4 | 7 | 7 | 28.571429 | 57.142857 |
| 9 | Gujarat Lions | 1 | 12 | 14 | 16 | 7.142857 | 75.0 |
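Step 2 can also be written without the helper frames: map the old names onto the names the notebook keeps, sum the count columns with `groupby`, then recompute both percentage columns for every row at once. This is an alternative sketch, not the method used above; `rename_map`, `raw_df` and `merged_df` are illustrative names, and only five rows of the raw data are reproduced:

```python
import pandas as pd

# Fold each older record into the team name the notebook keeps
rename_map = {
    'Delhi Daredevils': 'Delhi Capitals',
    'Pune Warriors': 'Rising Pune Supergiant',
}

# Subset of the raw data (values copied from the outputs above)
raw_df = pd.DataFrame({
    'team': ['Rising Pune Supergiant', 'Mumbai Indians', 'Delhi Capitals',
             'Delhi Daredevils', 'Pune Warriors'],
    'home_wins': [5, 58, 3, 25, 6],
    'away_wins': [5, 51, 7, 42, 6],
    'home_matches': [8, 101, 6, 72, 20],
    'away_matches': [8, 86, 10, 89, 26],
})

count_cols = ['home_wins', 'away_wins', 'home_matches', 'away_matches']

# Rename, then sum the count columns of rows that now share a team name
merged_df = (
    raw_df.assign(team=raw_df['team'].replace(rename_map))
          .groupby('team', as_index=False)[count_cols]
          .sum()
)

# Recompute both percentages for every row from the merged counts
merged_df['home_win_percentage'] = merged_df['home_wins'] / merged_df['home_matches'] * 100
merged_df['away_win_percentage'] = merged_df['away_wins'] / merged_df['away_matches'] * 100
print(merged_df)
```

Because the percentages are recomputed from the summed counts rather than averaged, merged teams get a match-weighted win rate.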


# **Step 3: Exploratory Data Analysis**

### **In this step we will be analysing the data**

# **Objective - Home and Away Matches Dataset**
**1. Team comparison: home wins vs away wins**

**2. Team comparison: home win percentage vs away win percentage**

**3. Final conclusion on the home and away wins DataFrame**

## **Task-1 Team comparison of home wins and away wins**

### **In this task we will find the teams with the most home wins and the most away wins**

In [23]:
# Let's sort the data according to home wins
team_df.sort_values(by = 'home_wins', ascending = False, inplace = True)

In [24]:
# Print the team df
team_df

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Mumbai Indians | 58 | 51 | 101 | 86 | 57.425743 | 59.302326 |
| 1 | Chennai Super Kings | 51 | 49 | 89 | 75 | 57.303371 | 65.333333 |
| 5 | Kings XI Punjab | 38 | 44 | 91 | 85 | 41.758242 | 51.764706 |
| 6 | Royal Challengers Bangalore | 35 | 49 | 85 | 95 | 41.176471 | 51.578947 |
| 7 | Kolkata Knight Riders | 34 | 58 | 83 | 95 | 40.963855 | 61.052632 |
| 2 | Sunrisers Hyderabad | 30 | 28 | 63 | 45 | 47.619048 | 62.222222 |
| 3 | Rajasthan Royals | 29 | 46 | 67 | 80 | 43.283582 | 57.5 |
| 10 | Delhi Capitals | 28 | 49 | 78 | 99 | 35.897436 | 49.494949 |
| 4 | Deccan Chargers | 18 | 11 | 43 | 32 | 41.860465 | 34.375 |
| 11 | Rising Pune Supergiant | 11 | 11 | 28 | 34 | 39.285714 | 32.352941 |


In [26]:
# Let's plot the graph to get a better idea
import plotly.graph_objects as go

home_wins = go.Bar(x = team_df['team'], y = team_df['home_wins'], name = 'Home Wins', marker = dict(color = '#00D9C0'))
away_wins = go.Bar(x = team_df['team'], y = team_df['away_wins'], name = 'Away Wins', marker = dict(color = '#FFC43D'))

data = [home_wins, away_wins]

In [27]:
from plotly.offline import iplot

layout = go.Layout(title = 'Team Home Wins and Away Wins', xaxis = dict(title = 'team'), yaxis = dict(title = 'Total'), bargap = 0.2, bargroupgap = 0.1)

figure = go.Figure(data = data, layout = layout)
iplot(figure)

## **Conclusion**

**Mumbai Indians have won the most matches at home (58), followed by Chennai Super Kings (51)**

**Kolkata Knight Riders have won the most matches away from home (58), followed by Mumbai Indians (51)**
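The same two leaders can be read off directly with `DataFrame.nlargest`, without sorting the whole frame. A minimal sketch on the ten teams printed in the sorted table above (Kochi Tuskers Kerala and Gujarat Lions, cut off in that output, have fewer home and away wins than the top two); `wins_df` is an illustrative name:

```python
import pandas as pd

# Home and away wins copied from the sorted table above
wins_df = pd.DataFrame({
    'team': ['Mumbai Indians', 'Chennai Super Kings', 'Kings XI Punjab',
             'Royal Challengers Bangalore', 'Kolkata Knight Riders',
             'Sunrisers Hyderabad', 'Rajasthan Royals', 'Delhi Capitals',
             'Deccan Chargers', 'Rising Pune Supergiant'],
    'home_wins': [58, 51, 38, 35, 34, 30, 29, 28, 18, 11],
    'away_wins': [51, 49, 44, 49, 58, 28, 46, 49, 11, 11],
})

# nlargest(n, column) returns the n rows with the largest values in that column
top_home = wins_df.nlargest(2, 'home_wins')[['team', 'home_wins']]
top_away = wins_df.nlargest(2, 'away_wins')[['team', 'away_wins']]
print(top_home)
print(top_away)
```

With `n=3`, the away ranking runs into a three-way tie at 49 wins; `nlargest` accepts `keep='all'` to return every tied row instead of cutting the tie.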

## **Task-2 Team comparison of home win percentage and away win percentage**

### **In this task we will find the teams with the highest home and away win percentages**

In [28]:
# Let's sort the data according to home win percentage
team_df.sort_values(by = 'home_win_percentage', ascending = False, inplace = True)

In [29]:
# Print df
team_df

|   | team | home_wins | away_wins | home_matches | away_matches | home_win_percentage | away_win_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Mumbai Indians | 58 | 51 | 101 | 86 | 57.425743 | 59.302326 |
| 1 | Chennai Super Kings | 51 | 49 | 89 | 75 | 57.303371 | 65.333333 |
| 2 | Sunrisers Hyderabad | 30 | 28 | 63 | 45 | 47.619048 | 62.222222 |
| 3 | Rajasthan Royals | 29 | 46 | 67 | 80 | 43.283582 | 57.5 |
| 4 | Deccan Chargers | 18 | 11 | 43 | 32 | 41.860465 | 34.375 |
| 5 | Kings XI Punjab | 38 | 44 | 91 | 85 | 41.758242 | 51.764706 |
| 6 | Royal Challengers Bangalore | 35 | 49 | 85 | 95 | 41.176471 | 51.578947 |
| 7 | Kolkata Knight Riders | 34 | 58 | 83 | 95 | 40.963855 | 61.052632 |
| 11 | Rising Pune Supergiant | 11 | 11 | 28 | 34 | 39.285714 | 32.352941 |
| 10 | Delhi Capitals | 28 | 49 | 78 | 99 | 35.897436 | 49.494949 |


In [30]:
# Let's plot the graph to get a better idea
home_wins = go.Bar(x = team_df['team'], y = team_df['home_win_percentage'], name = 'Home Wins Percentage', marker = dict(color = '#00D9C0'))
away_wins = go.Bar(x = team_df['team'], y = team_df['away_win_percentage'], name = 'Away Wins Percentage', marker = dict(color = '#FFC43D'))

data = [home_wins, away_wins]

In [31]:
layout = go.Layout(title = 'Team Home Wins and Away Wins Percentage', xaxis = dict(title = 'team'), yaxis = dict(title = 'Win percentage (%)'), bargap = 0.2, bargroupgap = 0.1)

figure = go.Figure(data = data, layout = layout)
iplot(figure)

## **Conclusion**

**Mumbai Indians and Chennai Super Kings have the highest home win percentages, each winning about 57% of their home matches, followed by Sunrisers Hyderabad (about 47.6%)**

**Gujarat Lions have the highest away win percentage, 75% (12 wins in 16 away matches), followed by Chennai Super Kings (about 65%)**
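Gujarat Lions' 75% comes from only 16 away matches, far fewer than Chennai Super Kings' 75, so a ranking that keeps the match count visible, optionally with a minimum-match filter, gives useful context. A sketch on a few rows of the away data; the 30-match threshold `MIN_AWAY_MATCHES` is an arbitrary, hypothetical cut-off, and `away_df`, `ranked` and `ranked_min` are illustrative names:

```python
import pandas as pd

# Away records copied from the tables above
away_df = pd.DataFrame({
    'team': ['Gujarat Lions', 'Chennai Super Kings', 'Sunrisers Hyderabad',
             'Kolkata Knight Riders', 'Mumbai Indians'],
    'away_wins': [12, 49, 28, 58, 51],
    'away_matches': [16, 75, 45, 95, 86],
})
away_df['away_win_percentage'] = away_df['away_wins'] / away_df['away_matches'] * 100

# Rank by away win percentage while keeping the match count visible
ranked = away_df.sort_values('away_win_percentage', ascending=False)
print(ranked)

# Hypothetical threshold: ignore teams with fewer than 30 away matches
MIN_AWAY_MATCHES = 30
ranked_min = ranked[ranked['away_matches'] >= MIN_AWAY_MATCHES]
print(ranked_min.head(3))
```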

# **Conclusion of Home and Away wins DataFrame**

**Data Cleaning**
1. There are no null values in this dataset
2. We merged the 'Delhi Daredevils' records into 'Delhi Capitals' and the 'Pune Warriors' records into 'Rising Pune Supergiant', then recomputed their home and away win percentages

## **Exploratory Data Analysis of Teamwise Home and Away wins DataFrame**

**Home wins and Away wins Analysis**
1. Teams that won the most home matches: Mumbai Indians (58), Chennai Super Kings (51) and Kings XI Punjab (38)
2. Teams that won the most away matches: Kolkata Knight Riders (58) and Mumbai Indians (51), followed by Delhi Capitals, Chennai Super Kings and Royal Challengers Bangalore (49 each)
3. Teams with the highest home win percentage: Mumbai Indians (57.4%), Chennai Super Kings (57.3%) and Sunrisers Hyderabad (47.6%)
4. Teams with the highest away win percentage: Gujarat Lions (75%), Chennai Super Kings (65.3%) and Sunrisers Hyderabad (62.2%)

**The data above shows that Chennai Super Kings and Mumbai Indians are the only two teams that won a majority (more than 50%) of their matches both at home and away from home**
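The closing claim can be checked with a boolean filter over both percentage columns. A minimal sketch using the cleaned percentages of all 12 teams, copied from the outputs above; `pct_df` and `majority_both` are illustrative names:

```python
import pandas as pd

# Cleaned win percentages of all 12 teams, copied from the outputs above
pct_df = pd.DataFrame({
    'team': ['Mumbai Indians', 'Chennai Super Kings', 'Sunrisers Hyderabad',
             'Rajasthan Royals', 'Deccan Chargers', 'Kings XI Punjab',
             'Royal Challengers Bangalore', 'Kolkata Knight Riders',
             'Kochi Tuskers Kerala', 'Gujarat Lions', 'Delhi Capitals',
             'Rising Pune Supergiant'],
    'home_win_percentage': [57.425743, 57.303371, 47.619048, 43.283582, 41.860465,
                            41.758242, 41.176471, 40.963855, 28.571429, 7.142857,
                            35.897436, 39.285714],
    'away_win_percentage': [59.302326, 65.333333, 62.222222, 57.5, 34.375,
                            51.764706, 51.578947, 61.052632, 57.142857, 75.0,
                            49.494949, 32.352941],
})

# Teams that won more than half of their matches both at home and away
majority_both = pct_df[(pct_df['home_win_percentage'] > 50)
                       & (pct_df['away_win_percentage'] > 50)]
print(majority_both['team'].tolist())
```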