# ⚙️ Installation

To use `ydata-profiling`, install the package with `pip`.

First, create a virtual/conda environment:

```
conda create -n profiling-env python=3.10
conda activate profiling-env
```

Then install `ydata-profiling`. Don't forget to include the `[notebook]` extra, which adds support for rendering the report in Jupyter notebook widgets.

```
pip install -U "ydata-profiling[notebook]"
```
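
To confirm that the package landed in the active environment, one option is to query the installed distribution metadata. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of `ydata-profiling`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("ydata-profiling"))
```

If this prints `None`, the most likely cause is that `pip` ran in a different environment than the one the notebook kernel uses.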

# ▶️ Quickstart
Once installed, you just need to `import` the module. Then, using `ydata-profiling` is a simple two-step process:

1. Create a `ProfileReport`
2. Call the `to_notebook_iframe()` method to render the report. You can also save the report to an **HTML** or **JSON** file.

Let's get started by importing ydata-profiling and pandas, and loading the NIST "MA" dataset, which we will use for this notebook:


In [21]:
# Necessary imports
import pandas as pd
from ydata_profiling import ProfileReport

# Read the data (reading NIST "MA" dataset)
df = pd.read_csv('../data/ma2019.csv')

In [22]:
df.head()

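| Unnamed: 0 | PUMA | AGEP | SEX | MSP | HISP | RAC1P | NOC | NPF | HOUSING_TYPE | OWN_RENT | ... | PINCP | PINCP_DECILE | POVPIP | DVET | DREM | DPHY | DEYE | DEAR | PWGTP | WGTP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25-00503 | 18 | 1 | 6 | 0 | 1 | N | N | 3 | 0 | ... | 5000.0 | 1 | N | N | 2 | 2 | 2 | 2 | 72 | 0 |
| 1 | 25-00703 | 21 | 2 | 6 | 0 | 1 | N | N | 3 | 0 | ... | 0.0 | 0 | N | N | 2 | 2 | 2 | 2 | 6 | 0 |
| 2 | 25-00503 | 22 | 2 | 6 | 0 | 6 | N | N | 3 | 0 | ... | 18000.0 | 3 | N | N | 2 | 2 | 2 | 2 | 80 | 0 |
| 3 | 25-01300 | 58 | 1 | 6 | 0 | 1 | N | N | 2 | 0 | ... | 0.0 | 0 | N | N | 1 | 2 | 2 | 2 | 57 | 0 |
| 4 | 25-00703 | 18 | 2 | 6 | 0 | 1 | N | N | 3 | 0 | ... | 3300.0 | 1 | N | N | 2 | 2 | 2 | 2 | 24 | 0 |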
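
The `Unnamed: 0` column in the preview is what pandas typically produces when a CSV was written together with its index. Assuming that is the case here (the source file is not shown, so this is a guess), the sketch below uses a tiny stand-in CSV to show how `index_col=0` keeps that column out of the data, and therefore out of the profiling report.

```python
import io

import pandas as pd

# Tiny stand-in CSV echoing two preview rows. The empty first header cell
# mimics a file written by DataFrame.to_csv() with its index included.
csv_text = ",PUMA,AGEP\n0,25-00503,18\n1,25-00703,21\n"

# Default read: the unnamed first column becomes a regular data column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))  # ['Unnamed: 0', 'PUMA', 'AGEP']
print(list(clean.columns))       # ['PUMA', 'AGEP']
```

For the notebook's file, the equivalent call would be `pd.read_csv('../data/ma2019.csv', index_col=0)`, provided the column really is a leftover index.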


# 💾 Generate and save the profiling report (.html)

In [23]:
# Generate the data profiling report and save it as HTML
ma_report = ProfileReport(df, title='MA Dataset - Original Data')
ma_report.to_file("ma_original_report.html")

Summarize dataset: 100%|█████████████| 49/49 [00:02<00:00, 18.05it/s, Completed]
Generate report structure: 100%|██████████████████| 1/1 [00:04<00:00,  4.15s/it]
Render HTML: 100%|████████████████████████████████| 1/1 [00:00<00:00,  1.53it/s]
Export report to file: 100%|█████████████████████| 1/1 [00:00<00:00, 372.83it/s]
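
The quickstart mentions JSON output as well as HTML. The sketch below wraps both exports in one small helper; `export_report` is a hypothetical name, and it only assumes that the report object exposes `to_file(path)` and picks the output format from the file extension, as `ProfileReport.to_file` does for `.html` and `.json`.

```python
from pathlib import Path


def export_report(report, stem, formats=("html", "json")):
    """Write `report` once per format and return the paths that were written.

    `report` is assumed to expose `to_file(path)`, with the output format
    inferred from the file extension.
    """
    written = []
    for ext in formats:
        path = Path(f"{stem}.{ext}")
        report.to_file(path)
        written.append(path)
    return written
```

In this notebook, `export_report(ma_report, "ma_original_report")` would write `ma_original_report.html` and `ma_original_report.json` side by side.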


In [None]:
ma_report.to_notebook_iframe()

# 🔍 Next step: investigate the data

- What types of features are in the data (numeric/categorical)?
- What do the features correspond to?
- Does the data contain missing values?
- What preprocessing steps may the data require?
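
The report answers these questions interactively, but a quick pandas pass can serve as a cross-check. This is a minimal sketch: `summarize_columns` is a hypothetical helper and the toy frame is made up for illustration, not taken from the MA data.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: dtype, missing count, and distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(dropna=True),
        }
    )


# Hypothetical toy frame; in the notebook you would pass `df` instead.
toy = pd.DataFrame({"AGEP": [18, 21, None], "NOC": ["N", "N", "2"]})
print(summarize_columns(toy))
```

Note that codes such as `N` in the preview are ordinary strings to pandas, so they are not counted as missing. Whether they stand for "not applicable" is something to check against the dataset's documentation before deciding on preprocessing.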