# **EDA_IPL Data Analysis**
# **By Amit Kharche**
**Follow me** on [LinkedIn](https://www.linkedin.com/in/amit-kharche) and [Medium](https://medium.com/@amitkharche14) for more insights on **Data Science** and **AI**.

---
# **Table of Contents**
---

1. [**Introduction**](#Section1)<br>
2. [**Problem Statement**](#Section2)<br>
3. [**Installing & Importing Libraries**](#Section3)<br>
  3.1 [**Installing Libraries**](#Section31)<br>
  3.2 [**Upgrading Libraries**](#Section32)<br>
  3.3 [**Importing Libraries**](#Section33)<br>
4. [**Data Acquisition & Description**](#Section4)<br>
5. [**Data Pre-Profiling**](#Section5)<br>
6. [**Data Pre-Processing**](#Section6)<br>
7. [**Data Post-Profiling**](#Section7)<br>
8. [**Exploratory Data Analysis**](#Section8)<br>
9. [**Summarization**](#Section9)<br>
  9.1 [**Conclusion**](#Section91)<br>
  9.2 [**Actionable Insights**](#Section92)<br>

---

---
<a name = Section1></a>
# **1. Introduction**
---

Cricket is deeply rooted in Indian culture, and the **Indian Premier League (IPL)** has emerged as the most thrilling and widely followed format. This project is centered around analyzing **IPL match data** from **2008 to 2020**. The objective is to derive **valuable insights** and pinpoint the **key factors** that influence a team’s success.

To achieve this, I’ll be using **Python libraries** such as **`pandas`** and **`numpy`** for **data manipulation**, and **`matplotlib`** and **`seaborn`** for **data visualization**. These tools enable efficient handling of large datasets and the creation of insightful visual representations.

I acquired these skills through a **free online course** offered by **Jovian.ml** titled *Data Analysis with Python: Zero to Pandas*. This project is part of the course’s **hands-on learning** approach, allowing me to apply theoretical concepts to a real-world dataset while exploring a sport I’m passionate about.



---
<a name = Section2></a>
# **2. Problem Statement**
---

Cricket is more than just a game in India—it's a national passion. Among its various formats, the **Indian Premier League (IPL)** stands out as the most dynamic and widely followed. With each season, teams invest heavily in players, strategies, and support staff, yet the **factors that consistently lead to victory** remain unclear.

Although IPL data is publicly available, it is often underutilized. There is a growing interest in using **data analytics** to uncover **patterns and trends** that can help teams make smarter decisions. This project aims to analyze IPL match data from **2008 to 2020** to identify the **key elements** that influence match outcomes, such as toss results, player performance, and venue conditions.

## 🏏 Scenario

Imagine a group of passionate data science learners working on a capstone project as part of their training. They are exploring real-world datasets to apply their newly acquired skills in **data analysis and visualization**.

One of the learners chooses the IPL dataset, intrigued by the idea of combining sports and analytics. With access to match data from over a decade, the goal is to uncover **insights that could help teams improve performance** and **predict match outcomes** more accurately.

This project is not just an academic exercise—it’s a step toward understanding how **data-driven decisions** can transform the way cricket is played and analyzed.

---
<div align="center">
  <img src="https://i.pinimg.com/736x/bf/97/ab/bf97ab38490d4be1ef4cd42aee1aa986.jpg" alt="IPL Image">
</div>

---
<a id = Section3></a>
# **3. Installing & Importing Libraries**
---

- This section covers installing and importing the libraries required for the analysis.

<a name = Section31></a>
### **Installing Libraries**

In [None]:
!pip install -q datascience                                         # Package that is required by pandas profiling
!pip install -q pandas-profiling                                    # Library to generate basic statistics about data

<a name = Section32></a>
### **Upgrading Libraries**

- **After upgrading** the libraries, **restart the runtime** so that the upgraded versions are loaded.

- After restarting the runtime, do not re-run the cells under Installing Libraries and Upgrading Libraries.

In [None]:
#!pip install -q --upgrade datascience                               # Package that is required by pandas profiling
#!pip install -q --upgrade pandas-profiling                          # Library to generate basic statistics about data
#! pip install ydata_profiling

<a name = Section33></a>
### **Importing Libraries**

In [None]:
#-------------------------------------------------------------------------------------------------------------------------------
import pandas as pd                                           # Importing for panel data analysis
import numpy as np                                            # Importing for numerical computing
pd.set_option('display.max_columns', None)                    # Show all columns instead of truncating wide frames
pd.set_option('display.max_colwidth', None)                   # Show full cell contents for better clarity
pd.set_option('display.max_rows', None)                       # Show all rows instead of truncating long frames
pd.set_option('mode.chained_assignment', None)                # Silence the warning raised on chained assignment
pd.set_option('display.float_format', lambda x: '%.5f' % x)   # Suppress scientific notation; show five decimals
#-------------------------------------------------------------------------------------------------------------------------------
from collections import Counter    # For counting hashable objects
#-------------------------------------------------------------------------------------------------------------------------------
import matplotlib.pyplot as plt   # Importing pyplot interface using matplotlib
import plotly.graph_objs as go    # For Plotly interfaced graphs
import seaborn as sns             # Importing seaborn library for statistical visualization
%matplotlib inline
#-------------------------------------------------------------------------------------------------------------------------------
import warnings                     # Importing warnings to control runtime warnings
warnings.filterwarnings("ignore")   # Suppress all warnings

---
<a name = Section4></a>
# **4. Data Acquisition & Description**
---

- This section focuses on acquiring the data and obtaining some descriptive information from it.

- You could either scrape the data and then continue, or load it from a direct link (generally preferred in most cases).

- You will be working with a direct link so you can get started right away.

- Before going further, you should have a good idea of the features of the data set:

|Id|Feature|Description|
|:--|:--|:--|
|01|fixed acidity|most acids involved with wine are fixed or nonvolatile (do not evaporate readily).|
|02|volatile acidity|the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste.|
|03|citric acid|found in small quantities, citric acid can add ‘freshness’ and flavor to wines.|
|04|residual sugar|the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.|
|05|chlorides|the amount of salt in the wine.|
|06|free sulfur dioxide|the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.|
|07|total sulfur dioxide|amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.|
|08|density|the density of wine is close to that of water, depending on the percent alcohol and sugar content.|
|09|pH|describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale.|
|10|sulphates|a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.|
|11|alcohol|the percent alcohol content of the wine.|
|12|quality|score between 0 and 10.|


In [2]:
data = pd.read_csv(filepath_or_buffer = 'https://raw.githubusercontent.com/insaid2018/Term-1/master/Data/Projects/winequality.csv')
print('Data Shape:', data.shape)
data.head()

Data Shape: (6497, 12)


| |fixed acidity|volatile acidity|citric acid|residual sugar|chlorides|free sulfur dioxide|total sulfur dioxide|density|pH|sulphates|alcohol|quality|
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|0|7.4|0.7|0.0|1.9|0.076|11.0|34.0|0.9978|3.51|0.56|9.4|5|
|1|7.8|0.88|0.0|2.6|0.098|25.0|67.0|0.9968|3.2|0.68|9.8|5|
|2|7.8|0.76|0.04|2.3|0.092|15.0|54.0|0.997|3.26|0.65|9.8|5|
|3|11.2|0.28|0.56|1.9|0.075|17.0|60.0|0.998|3.16|0.58|9.8|6|
|4|7.4|0.7|0.0|1.9|0.076|11.0|34.0|0.9978|3.51|0.56|9.4|5|


### **Data Description**

- To get a quick statistical summary of the data, you can use the `describe` method defined in the pandas library.

In [None]:
# Insert your code here...
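
One possible way to fill this cell is sketched below. To stay runnable without network access, it rebuilds three columns from the five rows printed by `data.head()` in Section 4 (the name `sample` is only illustrative); in the notebook you would call `describe()` on `data` directly.

```python
import pandas as pd

# Three columns from the five rows shown in the head() output of Section 4.
sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    "quality": [5, 5, 5, 6, 5],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = sample.describe()
print(summary)
```

Comparing the mean with the median (the `50%` row) and the `max` with the `75%` row is a quick first check for skew and potential outliers.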

### **Data Information**

In [None]:
# Insert your code here...
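
A minimal sketch for this cell, again using an illustrative `sample` frame built from the five rows shown in Section 4. In the notebook, `data.info()` on its own is enough; capturing the text in a buffer is only done here so the result can be inspected programmatically.

```python
import io
import pandas as pd

# Three columns from the five rows shown in the head() output of Section 4.
sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    "quality": [5, 5, 5, 6, 5],
})

# info() prints to stdout by default; passing buf captures the report as text.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# Missing values per column, a common companion to info().
missing_per_column = sample.isna().sum()
print(missing_per_column)
```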

---
<a name = Section5></a>
# **5. Data Pre-Profiling**
---

- This section focuses on generating a profiling report of the raw data.

- You need to run pandas profiling and record your observations from the report.

In [None]:
# Insert your code here...
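
The sketch below shows one way to approach this step. `quick_profile` is a hypothetical helper that uses only pandas, and `build_profile_report` assumes the optional `ydata-profiling` package (the renamed `pandas-profiling`) is installed; it is defined but not called here. The `sample` frame reuses the five rows shown in Section 4.

```python
import pandas as pd

def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, missing count, missing percentage, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": df.isna().mean() * 100,
        "unique": df.nunique(),
    })

def build_profile_report(df: pd.DataFrame, path: str = "pre_profiling_report.html") -> None:
    """Write a full HTML report; assumes ydata-profiling is installed."""
    from ydata_profiling import ProfileReport  # imported lazily, optional dependency
    ProfileReport(df, title="Pre-Profiling Report", minimal=True).to_file(path)

# Three columns from the five rows shown in the head() output of Section 4.
sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "citric acid": [0.0, 0.0, 0.04, 0.56, 0.0],
    "quality": [5, 5, 5, 6, 5],
})

overview = quick_profile(sample)
duplicate_rows = int(sample.duplicated().sum())
print(overview)
print("Duplicate rows:", duplicate_rows)
# In the notebook: build_profile_report(data)
```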

---
<a name = Section6></a>
# **6. Data Pre-Processing**
---

- This section focuses on cleaning and reshaping the raw data for further processing and analysis.

- To turn raw data into structured, analysis-ready data, you need to verify and restore its integrity by:
  - Handling missing data,

  - Handling redundant data,

  - Handling inconsistent data,

  - Handling outliers,

  - Handling typos

In [None]:
# Insert your code here...
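
A minimal sketch of three of the steps listed above (redundant rows, missing values, outliers), under stated assumptions: `preprocess` is a hypothetical helper, median filling and clipping to the 1.5 × IQR fences are just one reasonable choice among several, and one value in the sample has been blanked out on purpose to exercise the missing-value step. With only five rows the fences are not statistically meaningful; the point is the mechanics.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Drop duplicate rows, median-fill missing values, clip outliers to IQR fences."""
    out = df.drop_duplicates().reset_index(drop=True)              # redundant data
    out[features] = out[features].fillna(out[features].median())   # missing data
    q1 = out[features].quantile(0.25)
    q3 = out[features].quantile(0.75)
    iqr = q3 - q1
    # Outliers: values beyond the fences are pulled back to the fence.
    out[features] = out[features].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
    return out

# Rows from the head() output of Section 4; one value replaced by NaN deliberately.
raw = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, np.nan, 11.2, 7.4],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    "quality": [5, 5, 5, 6, 5],
})

cleaned = preprocess(raw, features=["fixed acidity", "alcohol"])
print(cleaned)
```

The target column `quality` is left out of `features` so that the label itself is never imputed or clipped.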

---
<a name = Section7></a>
# **7. Data Post-Profiling**
---

- This section focuses on generating a profiling report after the data manipulation.

- You may observe changes compared with the pre-profiling report, so check them carefully and record accurate observations.

In [None]:
# Insert your code here...
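
One lightweight way to check what changed is a before/after comparison, sketched below. `health_check` is a hypothetical helper, and dropping duplicates stands in for whatever cleaning was actually applied in Section 6; the `raw` frame reuses the five rows shown in Section 4.

```python
import pandas as pd

def health_check(df: pd.DataFrame) -> pd.Series:
    """Row count, total missing cells and duplicate rows for one frame."""
    return pd.Series({
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    })

# Three columns from the five rows shown in the head() output of Section 4.
raw = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "citric acid": [0.0, 0.0, 0.04, 0.56, 0.0],
    "quality": [5, 5, 5, 6, 5],
})
cleaned = raw.drop_duplicates().reset_index(drop=True)  # stand-in for the real cleaning

comparison = pd.DataFrame({"before": health_check(raw), "after": health_check(cleaned)})
print(comparison)
```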

---
<a name = Section8></a>
# **8. Exploratory Data Analysis**
---

- This section focuses on asking the right questions and answering them through analysis of the data.

- There is no limit to how deep you can go, but make sure you do not drift away from the questions you set out to answer.

In [None]:
# Insert your code here...
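
As a starting point, the sketch below asks three simple questions of the data: how the quality scores are distributed, how feature averages differ by score, and which features move with the score. It uses the five rows shown in Section 4 as an illustrative `sample`, so the numbers say nothing about the full data set; in the notebook the same calls would run on `data`.

```python
import pandas as pd

# Three columns from the five rows shown in the head() output of Section 4.
sample = pd.DataFrame({
    "volatile acidity": [0.7, 0.88, 0.76, 0.28, 0.7],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    "quality": [5, 5, 5, 6, 5],
})

# Q1: how many records fall under each quality score?
quality_counts = sample["quality"].value_counts().sort_index()

# Q2: how do the feature averages differ between quality scores?
mean_by_quality = sample.groupby("quality")[["volatile acidity", "alcohol"]].mean()

# Q3: which features are most correlated with quality?
corr_with_quality = sample.corr()["quality"].drop("quality").sort_values()

print(quality_counts)
print(mean_by_quality)
print(corr_with_quality)
# In the notebook, mean_by_quality.plot(kind="bar") or sns.heatmap(data.corr())
# would turn these tables into charts.
```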

---
<a name = Section9></a>
# **9. Summarization**
---

<a name = Section91></a>
### **9.1 Conclusion**

- In this part you need to provide a conclusion about your overall analysis.

- Write down some short points that you have observed so far.

<a name = Section92></a>
### **9.2 Actionable Insights**

- This is a very crucial part where you will present your actionable insights.
- You need to give suggestions about what could be applied and what not.
- Make sure that these suggestions are short and to the point; ultimately, they are a catalyst for your business.