# Day 22: Exploratory Data Analysis (EDA) - Multivariate and Automated Profiling

This folder explores multivariate EDA (analyzing relationships between multiple variables) and introduces automated data profiling tools.

## Key Concepts Covered

- Loading Data:
  - Using pandas to read the dataset (train.csv).
- Automated Data Profiling:
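  - Using ydata-profiling (the renamed successor to pandas-profiling) to generate a detailed report about the dataset. The older pandas-profiling 3.2.0 release fails to import in this notebook, so ydata-profiling is the package actually used.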
  - The report includes statistics, correlations, missing values, and visualizations.
- Saving Reports:
  - Exporting the profiling report to an HTML file for easy sharing and review.
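
As a rough idea of what the report automates, the sketch below computes a few of the same summaries by hand with pandas. It re-types a subset of the two rows shown by `df.head(2)` in the notebook rather than reading `train.csv`, and the names `df_head` and `summary` are illustrative only.

```python
import pandas as pd

# A subset of the two rows printed by df.head(2) in the notebook, re-typed by hand.
df_head = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
    "Fare": [7.25, 71.2833],
    "Cabin": [None, "C85"],   # the first passenger has no Cabin value
    "Embarked": ["S", "C"],
})

# Per-column overview: data type, number of missing values, number of distinct values.
summary = pd.DataFrame({
    "dtype": df_head.dtypes.astype(str),
    "missing": df_head.isna().sum(),
    "unique": df_head.nunique(),
})
print(summary)

# Descriptive statistics for the numeric columns.
print(df_head.describe())
```

A profiling report presents this kind of information for every column, together with plots, so the manual version is mainly useful for checking a single detail.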

## Libraries Used

- pandas: For data loading and manipulation.
- ydata-profiling (formerly pandas-profiling): For automated EDA reports.

## Why This is Important

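Multivariate EDA shows how variables interact (for example, how survival varies with both `Sex` and `Pclass`), which informs feature engineering and modeling. Automated profiling produces per-column statistics, missing-value counts, and correlations in a single pass, so you get a broad overview of the data before writing any custom analysis code.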
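
To make "how variables interact" concrete, here is a minimal sketch of two common multivariate summaries in pandas: a grouped rate across two categorical variables and a correlation matrix. The rows in `toy` are invented for illustration and are not taken from `train.csv`; only the column names mirror the dataset.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from train.csv.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Fare":     [7.0, 71.0, 8.0, 8.5, 53.0, 13.0],
})

# Mean of Survived for every Sex x Pclass combination (NaN where a combination is absent).
rate_by_group = toy.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
)

# Pairwise Pearson correlation between the numeric columns.
corr = toy[["Survived", "Pclass", "Fare"]].corr()

print(rate_by_group)
print(corr.round(2))
```

On the real data, the same two calls applied to `df` give a quick first look at which variable pairs deserve a closer plot.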

The notebook below loads `train.csv` and generates an HTML profiling report for it.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("train.csv")
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [3]:
!pip install pandas-profiling

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas-profiling
  Using cached pandas_profiling-3.2.0-py2.py3-none-any.whl.metadata (21 kB)
Using cached pandas_profiling-3.2.0-py2.py3-none-any.whl (262 kB)
Installing collected packages: pandas-profiling
Successfully installed pandas-profiling-3.2.0


In [4]:
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)
profile.to_file("output.html")

PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.10/migration/#basesettings-has-moved-to-pydantic-settings for more details.

For further information visit https://errors.pydantic.dev/2.10/u/import-error
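
The error message points at a pydantic 2.x migration change, which suggests that the installed pydantic is newer than what pandas-profiling 3.2.0 expects. A quick way to check which versions are present is sketched below, using only the standard library; `installed_version` is a hypothetical helper name, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Print the versions relevant to the import error above.
for name in ("pydantic", "pandas-profiling", "ydata-profiling"):
    print(f"{name}: {installed_version(name)}")
```

If pydantic reports a 2.x version alongside pandas-profiling 3.2.0, switching to ydata-profiling, as the next cells do, is the likely fix.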

In [5]:
pip uninstall pandas-profiling -y

Found existing installation: pandas-profiling 3.2.0
Uninstalling pandas-profiling-3.2.0:
  Successfully uninstalled pandas-profiling-3.2.0
Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install ydata-profiling

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.

Collecting ydata-profiling
  Using cached ydata_profiling-4.16.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting visions<0.8.2,>=0.7.5 (from visions[type_image_path]<0.8.2,>=0.7.5->ydata-profiling)
  Using cached visions-0.8.1-py3-none-any.whl.metadata (11 kB)
Collecting multimethod<2,>=1.4 (from ydata-profiling)
  Using cached multimethod-1.12-py3-none-any.whl.metadata (9.6 kB)
Collecting typeguard<5,>=3 (from ydata-profiling)
  Using cached typeguard-4.4.4-py3-none-any.whl.metadata (3.3 kB)
Collecting imagehash==4.3.1 (from ydata-profiling)
  Using cached ImageHash-4.3.1-py2.py3-none-any.whl.metadata (8.0 kB)
Collecting wordcloud>=1.9.3 (from ydata-profiling)
  Using cached wordcloud-1.9.4-cp312-cp312-win_amd64.whl.metadata (3.5 kB)
Collecting dacite>=1.8 (from ydata-profiling)
  Using cached dacite-1.9.2-py3-none-any.whl.metadata (17 kB)
Co

   -------------------- -------------------  5/10 [numba]

In [7]:
from ydata_profiling import ProfileReport

In [8]:
prof = ProfileReport(df)
prof.to_file(output_file='output.html')

Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]

100%|██████████| 12/12 [00:13<00:00,  1.11s/it]


Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]

Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]

Export report to file:   0%|          | 0/1 [00:00<?, ?it/s]