# ydata-profiling (formerly pandas-profiling)
A package to support exploratory data analysis (EDA) of tabular data, built on top of pandas. This notebook illustrates the capabilities and limitations of the package.

Documentation: https://docs.profiling.ydata.ai/latest/getting-started/concepts/

## Prerequisites

`ydata-profiling` is incompatible with the latest version of `seaborn`. Hence a separate conda environment is created (see the file `profiling.yml`).

1. Deactivate the current environment: `conda deactivate`
2. Create the new environment from the file: `conda env create -f profiling.yml`
3. Activate the new environment: `conda activate profiling`
4. Start `jupyter lab`

Make sure the environment is created and activated before running this notebook:  
- `conda env create -f profiling.yml` (only needed once per machine)
- `conda activate profiling`
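
To confirm from inside the notebook that the kernel really runs in the `profiling` environment, one option is to read the `CONDA_DEFAULT_ENV` variable that `conda activate` exports and to look up the installed package versions. This is a minimal sketch; the helper names `installed_version` and `check_environment` are hypothetical, and the expected environment name `profiling` is taken from the steps above.

```python
import os
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def check_environment(expected_env: str = "profiling") -> dict:
    """Collect the active conda environment name and the relevant package versions."""
    return {
        # `conda activate <name>` exports the active environment name here
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "expected_env": expected_env,
        "ydata-profiling": installed_version("ydata-profiling"),
        "seaborn": installed_version("seaborn"),
        "pandas": installed_version("pandas"),
    }


report = check_environment()
if report["conda_env"] != report["expected_env"]:
    print(f"Warning: active environment is {report['conda_env']!r}, expected 'profiling'")
print(report)
```

A `None` for `ydata-profiling` is a strong hint that the notebook was started from the wrong environment.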

## Imports and Environment Variables

In [1]:
import os
import pandas as pd
from ydata_profiling import ProfileReport

from dotenv import load_dotenv

load_dotenv()

True

Check that the environment variable points to the correct folder:

In [2]:
os.getenv('DATA_FOLDER')

'/Users/schiba/data/vdss'

In [3]:
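output_folder = os.getenv('OUTPUT_FOLDER')
assert output_folder, 'OUTPUT_FOLDER is not set; add it to your .env file'
work_path = os.path.join(output_folder, '01_setup', '01_4_profiling')
os.makedirs(work_path, exist_ok=True)  # create the output folder if it does not exist yet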
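
`load_dotenv()` copies the key-value pairs from a `.env` file (here presumably containing `DATA_FOLDER` and `OUTPUT_FOLDER`) into the process environment. Because `os.getenv` silently returns `None` for a missing variable, a typo in `.env` otherwise only surfaces later as a confusing `TypeError` inside `os.path.join`. A small helper can fail early with a clearer message. This is a sketch; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; add it to your .env file")
    return value
```

In the notebook it would replace the bare lookups, e.g. `path_to_data_folder = require_env('DATA_FOLDER')`.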

## Load the data

In [4]:
path_to_data_folder = os.getenv('DATA_FOLDER')
path_to_file = os.path.join(path_to_data_folder, 'motortrends', 'mtcars.csv')
df = pd.read_csv(path_to_file, delimiter=',')
df.head()

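| Unnamed: 0 | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |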
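
The `Unnamed: 0` column appears to be a row index that was written into `mtcars.csv` without a header name. pandas reads it back as an ordinary column, so the profile would treat it as one more numeric variable. Passing `index_col=0` to `pd.read_csv` turns it back into the index. The sketch below uses an inline copy of the first two rows shown above instead of the real file so that it runs on its own.

```python
import io

import pandas as pd

# First rows of mtcars.csv as shown above; the empty first header field
# is what pandas turns into the "Unnamed: 0" column.
csv_text = """,model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
"""

# Default behaviour: the unnamed first column becomes a regular column
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns[0])  # Unnamed: 0

# index_col=0 uses the first column as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns[0])  # model
```

Applied to the notebook, the load step would become `df = pd.read_csv(path_to_file, index_col=0)`.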


## Create Profile Report

In [5]:

profile = ProfileReport(df)
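
`ProfileReport` also accepts keyword arguments that shape the report: `title` sets the report heading, and `minimal=True` switches off some of the more expensive computations (such as correlations), which may help on larger data sets. A sketch assuming these two parameters; the wrapper `make_profile` and its default title are illustrative only.

```python
import pandas as pd


def make_profile(df: pd.DataFrame, title: str = "mtcars profile", minimal: bool = False):
    """Build a ydata-profiling report with an explicit title and optional minimal mode."""
    # Imported lazily so the helper can be defined even where ydata-profiling is absent
    from ydata_profiling import ProfileReport

    return ProfileReport(df, title=title, minimal=minimal)
```

For example, `profile = make_profile(df, title="Motor Trend cars", minimal=True)`. Creating the report object is cheap; the statistics are computed when the report is first rendered or exported.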

Display as widgets inside the Jupyter notebook:

In [6]:
profile.to_widgets()

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)
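
The widget view most likely sends more kernel-to-browser messages than the default limit allows, which is what triggers the IOPub warning above. Two workarounds: restart Jupyter with a higher limit, as the message suggests (e.g. `jupyter lab --ServerApp.iopub_msg_rate_limit=1.0e10`), or render the report as one embedded HTML iframe with `profile.to_notebook_iframe()`. The sketch below defaults to the iframe on the assumption that a single HTML output stays under the limit; that is plausible but not verified here. `show_report` is a hypothetical helper.

```python
def show_report(profile, use_widgets: bool = False) -> None:
    """Display a ProfileReport inline, defaulting to the single-iframe renderer."""
    if use_widgets:
        # Interactive ipywidgets view; may exceed the IOPub message rate limit
        profile.to_widgets()
    else:
        # Embeds the full HTML report in a single iframe output
        profile.to_notebook_iframe()
```

Usage: `show_report(profile)` in place of `profile.to_widgets()`.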

Save the report as an HTML file:

In [7]:
profile.to_file(os.path.join(work_path, 'profile_mpg.html'))

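Besides HTML, `to_file` can write the report as JSON when the file name ends in `.json`, which is handy for comparing profiles programmatically. A sketch that writes both formats into the work folder; `export_profile` and its `stem` default are hypothetical.

```python
import os


def export_profile(profile, folder: str, stem: str = "profile_mpg") -> list:
    """Write a ProfileReport as both HTML and JSON into `folder` and return the paths."""
    os.makedirs(folder, exist_ok=True)  # make sure the target folder exists
    paths = [os.path.join(folder, f"{stem}.{ext}") for ext in ("html", "json")]
    for path in paths:
        # ydata-profiling picks HTML or JSON output from the file extension
        profile.to_file(path)
    return paths
```

In this notebook: `export_profile(profile, work_path)`.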