In [1]:
import pandas as pd


In [1]:
!pip install ydata-profiling

Collecting ydata-profiling
  Downloading ydata_profiling-4.18.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting scipy<1.17,>=1.8 (from ydata-profiling)
  Downloading scipy-1.16.3-cp313-cp313-win_amd64.whl.metadata (60 kB)
Collecting pandas!=1.4.0,<3.0,>1.5 (from ydata-profiling)
  Downloading pandas-2.3.3-cp313-cp313-win_amd64.whl.metadata (19 kB)
Collecting matplotlib<=3.10,>=3.5 (from ydata-profiling)
  Downloading matplotlib-3.10.0-cp313-cp313-win_amd64.whl.metadata (11 kB)
Collecting PyYAML<6.1,>=6.0.3 (from ydata-profiling)
  Downloading pyyaml-6.0.3-cp313-cp313-win_amd64.whl.metadata (2.4 kB)
Collecting jinja2<3.2,>=3.1.6 (from ydata-profiling)
  Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting visions<0.8.2,>=0.7.5 (from visions[type_image_path]<0.8.2,>=0.7.5->ydata-profiling)
  Downloading visions-0.8.1-py3-none-any.whl.metadata (11 kB)
Collecting numpy<2.4,>=1.22 (from ydata-profiling)
  Downloading numpy-2.3.5-cp313-cp313-win_amd64.whl.metadata (60 kB)

In [2]:
from ydata_profiling import ProfileReport



# 📊 Automated EDA using Pandas Profiling (ProfileReport)

---

## 🔹 Introduction

In previous videos, we learned:

1. Basic Questions in EDA  
2. Univariate Analysis  
3. Bivariate & Multivariate Analysis  

In this video, we learn about a powerful tool that **automates most of the EDA work**.

Tool name:
👉 Pandas Profiling (now published as `ydata-profiling`)

Given a DataFrame, it generates a full EDA report with a single `ProfileReport` call.

---

# 🔹 Installation

Install the library using:

```bash
pip install ydata-profiling
```

(The package was formerly published as `pandas-profiling`; older tutorials may still use that name.)

---

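# 🔹 Basic Usage

```python
import pandas as pd
from ydata_profiling import ProfileReport  # formerly: from pandas_profiling import ProfileReport

# Load the dataset to profile
df = pd.read_csv("titanic.csv")

# Build the report and write it to a standalone HTML file
profile = ProfileReport(df)
profile.to_file("output.html")
```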

After running:
- An HTML report (`output.html`) is written to the working directory
- Open it in a browser
- The full EDA report is available there

---

# 🔹 Main Sections of the Report

The report is divided into major sections:

1. Overview
2. Variables
3. Interactions
4. Correlations
5. Missing Values
6. Sample Data

---

# 1️⃣ Overview Section

Shows basic dataset information:

- Number of rows (observations)
- Number of columns (features)
- Missing values count
- Duplicate rows
- Memory usage
- Variable types (Numeric, Categorical, etc.)
- Alerts, called warnings in older versions (high cardinality, high missing %, etc.)

✔ Covers the same ground as asking the basic EDA questions manually.
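
To see what the Overview section is summarizing, the same figures can be computed by hand with plain pandas. This is a minimal sketch, not the library's internal code; the toy DataFrame and the `overview` dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset such as titanic.csv
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 53.10],
    "sex": ["male", "female", "female", "female", "female"],
})

# Manual versions of the headline numbers shown in the Overview section
overview = {
    "n_rows": df.shape[0],
    "n_columns": df.shape[1],
    "missing_cells": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
    "memory_bytes": int(df.memory_usage(deep=True).sum()),
    "dtypes": df.dtypes.astype(str).to_dict(),
}

for name, value in overview.items():
    print(f"{name}: {value}")
```

The report gathers these numbers in one place, which is why it is a convenient first look at a new dataset.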

---

# 2️⃣ Variables Section

Detailed analysis for each column.

### For Categorical Columns:
- Unique values count
- Missing values
- Frequency table
- Bar chart / Pie chart
- Sample values

### For Numerical Columns:
- Mean
- Median
- Min, Max
- Standard Deviation
- Variance
- Percentiles (5%, 25%, 50%, 75%, 95%)
- IQR
- Histogram
- Extreme values (the smallest and largest observations, useful for spotting outliers)

✔ Automatically generates distribution plots.
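
The per-column statistics listed above can be reproduced with a few pandas calls, which helps when checking a number in the report. The sketch below uses two hypothetical columns (`fare` and `embarked`) with made-up values; it is only an approximation of what the report computes.

```python
import pandas as pd

# Hypothetical values for one numeric and one categorical column
fare = pd.Series([10.0, 20.0, 30.0, 40.0, 100.0], name="fare")
embarked = pd.Series(["S", "C", "S", "Q", "S"], name="embarked")

# Numerical column: central tendency, spread, and quartile-based range
q1 = fare.quantile(0.25)
q3 = fare.quantile(0.75)
numeric_summary = {
    "mean": fare.mean(),
    "median": fare.median(),
    "min": fare.min(),
    "max": fare.max(),
    "std": fare.std(),        # sample standard deviation (ddof=1)
    "variance": fare.var(),   # sample variance (ddof=1)
    "iqr": q3 - q1,
}

# Categorical column: number of distinct values and a frequency table
n_unique = embarked.nunique()
frequency_table = embarked.value_counts()

print(numeric_summary)
print(n_unique)
print(frequency_table)
```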

---

# 3️⃣ Interactions Section

- Shows relationship between two variables
- Scatter plots for numerical variables
- Helps in bivariate analysis

---

# 4️⃣ Correlation Section

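Shows correlation matrices between variables:

- Pearson correlation
- Spearman correlation
- Kendall correlation (if selected)

Which coefficients appear depends on the library version and the report configuration.

Range:
- -1 → Perfect negative relationship
- 0 → No linear (Pearson) or monotonic (Spearman, Kendall) relationship
- +1 → Perfect positive relationship

✔ Helps detect multicollinearity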
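
The difference between the coefficients is easier to see on a small example. The sketch below uses pandas' own `DataFrame.corr` on made-up columns `x`, `y`, and `z`; it illustrates the coefficients themselves and is not the code the report runs.

```python
import pandas as pd

# Hypothetical columns chosen to make the coefficients easy to interpret
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 25],  # increases with x, but not along a straight line
    "z": [5, 4, 3, 2, 1],    # exact negative linear relation with x
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

Here `x` and `z` reach the lower end of the range under both methods, while `x` and `y` reach +1 only under Spearman, because their relationship is monotonic but curved.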

---

# 5️⃣ Missing Values Section

- Count of missing values per column
- Percentage of missing data
- Missing value heatmap
- Matrix visualization

✔ Helps decide whether to:
- Drop the column
- Impute the missing values
- Leave them as they are
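
The per-column counts and percentages behind this section are simple to compute directly, and they can feed a first-pass decision. In the sketch below, the toy data, the `DROP_THRESHOLD` cut-off, and the `suggest_action` helper are all hypothetical; the right threshold depends on the dataset and the problem.

```python
import pandas as pd

# Hypothetical toy data with different amounts of missingness per column
df = pd.DataFrame({
    "age": [22.0, None, 26.0, 35.0],
    "cabin": [None, "C85", None, None],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Count and percentage of missing values per column
missing = pd.DataFrame({
    "count": df.isna().sum(),
    "percent": df.isna().mean() * 100,
})

DROP_THRESHOLD = 50.0  # hypothetical cut-off, not a rule from the library


def suggest_action(percent_missing):
    """Return a rough first suggestion based only on the missing percentage."""
    if percent_missing == 0:
        return "ignore"
    if percent_missing > DROP_THRESHOLD:
        return "consider dropping"
    return "consider imputing"


missing["suggestion"] = missing["percent"].apply(suggest_action)
print(missing)
```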

---

# 6️⃣ Sample Data

- First rows of the dataset (10 by default)
- Last rows of the dataset (10 by default)

✔ Quick view of dataset structure

---

# 🔹 Why Use Pandas Profiling?

- Saves hours of manual EDA
- Automatically detects:
  - High cardinality
  - High missing %
  - Constant columns
  - Duplicate rows
- Generates a professional-looking report
- Very useful before building an ML model
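
To make the automatic checks listed above concrete, here is a rough manual version of three of them. The toy data and the `HIGH_CARDINALITY_RATIO` threshold are assumptions for illustration; the library applies its own configurable rules, which may differ.

```python
import pandas as pd

# Hypothetical toy data: an ID-like column, a constant column, a numeric column
df = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "country": ["IN", "IN", "IN", "IN"],
    "score": [1.0, 2.0, 2.0, 3.0],
})

HIGH_CARDINALITY_RATIO = 0.9  # hypothetical threshold, not the library's rule

# Constant columns: exactly one distinct value (counting NaN as a value)
constant_columns = [c for c in df.columns if df[c].nunique(dropna=False) == 1]

# High cardinality: non-numeric columns where almost every row is distinct
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
high_cardinality = [
    c for c in non_numeric
    if df[c].nunique() / len(df) > HIGH_CARDINALITY_RATIO
]

# Duplicate rows: rows identical to an earlier row across all columns
n_duplicates = int(df.duplicated().sum())

print(constant_columns, high_cardinality, n_duplicates)
```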

---

# 🔹 Important Note

This tool is powerful, but:

- Always read the report carefully
- Do not trust it blindly
- Make sure you understand the insights it reports
- Practice with multiple datasets

---

# 🎯 Best Practice

For any new dataset:

1. Run Pandas Profiling
2. Study the report carefully
3. Write observations
4. Then start Feature Engineering

---

# 🚀 Final Takeaway

Pandas Profiling is a powerful automated EDA tool that:

- Summarizes data
- Detects problems
- Shows relationships
- Speeds up the ML workflow

Next step:
👉 Feature Engineering

---


In [3]:
import pandas as pd
from ydata_profiling import ProfileReport
df = pd.read_csv(r"C:\Users\chhij\Downloads\iris (2).csv")
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("iris_report.html")


100%|██████████| 5/5 [00:00<00:00, 5657.28it/s]
Summarize dataset: 100%|██████████| 30/30 [00:01<00:00, 25.01it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  1.22it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00,  5.31it/s]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 228.39it/s]
