# Pandas Profiling with Ydata-profiling

[![Open in Colab](https://lab.aef.me/files/assets/colab-badge.svg)](https://colab.research.google.com/github/adamelliotfields/lab/blob/main/files/profiling.ipynb)
[![Open in Kaggle](https://lab.aef.me/files/assets/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/adamelliotfields/lab/blob/main/files/profiling.ipynb)
[![Render nbviewer](https://lab.aef.me/files/assets/nbviewer_badge.svg)](https://nbviewer.org/github/adamelliotfields/lab/blob/main/files/profiling.ipynb)

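A demonstration of [Ydata-profiling](https://github.com/ydataai/ydata-profiling) on the diabetes dataset from the [Least Angle Regression (LARS)](https://hastie.su.domains/Papers/LARS/) paper: 442 patients, ten baseline features, and a disease-progression target.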

In [1]:
import os

import pandas as pd

from IPython.display import HTML
from ydata_profiling import ProfileReport

In [2]:
# opt out of ydata-profiling's usage analytics
os.environ["YDATA_PROFILING_NO_ANALYTICS"] = "True"

In [3]:
diabetes_df = pd.read_csv("https://lab.aef.me/files/data/diabetes.csv")

# replace the file's headers with short names (order must match the CSV's columns)
diabetes_df.columns = [
    "age",
    "sex",
    "bmi",
    "bp",
    "tc",
    "ldl",
    "hdl",
    "tch",
    "ltg",
    "glu",
    "target",
]
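The cell above reads the file's own header and then overwrites it. Assuming the CSV has exactly one header row, the same result can be had in one step by passing `header=0` (consume the existing header) together with `names=` (the replacement labels) to `pd.read_csv`. A minimal sketch on a tiny in-memory CSV; the `AGE,SEX,...` header and the values are hypothetical, made up for illustration:

```python
import io

import pandas as pd

COLUMNS = ["age", "sex", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu", "target"]

# Hypothetical stand-in for the real file: one header row, two made-up data rows.
csv_text = (
    "AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6,Y\n"
    "50,1,25.0,90,180,100.0,50,4.0,4.5,90,150\n"
    "40,2,22.0,80,170,95.0,60,3.0,4.0,85,100\n"
)

# header=0 consumes the existing header row; names= supplies the new labels.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=COLUMNS)

print(df.columns.tolist() == COLUMNS)  # True
print(df.shape)  # (2, 11)
```

The original approach also fails loudly on a mismatch: assigning a list with the wrong number of labels to `df.columns` raises a `ValueError`.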

In [4]:
if not os.path.exists("ydata_profiling/index.html"):
    profile = ProfileReport(
        diabetes_df,
        explorative=True,
        progress_bar=False,
        title="Diabetes Profiling Report",
        dataset={
            "description": "Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline. Note: each of these 10 feature variables has been mean-centered and scaled by the standard deviation times the square root of `n_samples` (i.e., the sum of squares of each column totals 1).",
            "url": "https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html",
        },
        variables={
            "descriptions": {
                "age": "Age in years",
                "sex": "Sex",
                "bmi": "Body mass index",
                "bp": "Average blood pressure",
                "tc": "Total serum cholesterol",
                "ldl": "Low-density lipoproteins",
                "hdl": "High-density lipoproteins",
                "tch": "Total cholesterol / HDL ratio",
                "ltg": "Log of serum triglycerides level",
                "glu": "Blood sugar level",
                "target": "Measure of disease progression one year after baseline",
            }
        },
        type_schema={"sex": "categorical"},
        correlations={"pearson": {"calculate": True}},
        html={
            "minify_html": False,
            "style": {"theme": "united", "full_width": True},
        },
    )
    # create the directory the report is written to (it must exist before to_file)
    os.makedirs("ydata_profiling", exist_ok=True)
    profile.to_file("ydata_profiling/index.html")
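The `description` above says each feature was mean-centered and divided by its standard deviation times the square root of `n_samples`, so every column's sum of squares is 1 (whether the CSV used here is the raw or the standardized version isn't shown). A minimal sketch of that transform, assuming the population standard deviation (`ddof=0`), which is what makes the identity exact. It also checks that the Pearson correlations enabled via `correlations` are unaffected, since Pearson's r is invariant to centering and positive rescaling. The data is random and illustrative, and `scale_like_description` is a hypothetical helper, not part of ydata-profiling or pandas.

```python
import numpy as np
import pandas as pd


def scale_like_description(df: pd.DataFrame) -> pd.DataFrame:
    """Center each column, then divide by std * sqrt(n) using the population std.

    With c = x - mean and std**2 = sum(c**2) / n, the result has
    sum((c / (std * sqrt(n)))**2) = sum(c**2) / (n * std**2) = 1.
    """
    n = len(df)
    centered = df - df.mean()
    return centered / (df.std(ddof=0) * np.sqrt(n))


# Illustrative random data, not the diabetes dataset.
rng = np.random.default_rng(0)
raw = pd.DataFrame({"bmi": rng.normal(26, 4, 100), "bp": rng.normal(95, 14, 100)})
raw["bp"] += 2 * raw["bmi"]  # make the two columns correlated

scaled = scale_like_description(raw)

# Each column now has mean ~0 and sum of squares ~1.
print((scaled**2).sum().round(6).tolist())  # [1.0, 1.0]

# Pearson correlation is unchanged by centering and positive scaling.
print(np.allclose(raw.corr(method="pearson"), scaled.corr(method="pearson")))  # True
```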

In [5]:
# display as ipywidgets (needs `profile`, which only exists if the cell above built the report)
# profile.to_widgets()

# render the saved HTML report inline in the notebook
# HTML(filename="ydata_profiling/index.html")

# or serve the saved report from a terminal at the repository root, then open http://localhost:8000
# python -m http.server 8000 -d files/ydata_profiling
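As an alternative to the terminal command, the saved report can be served from inside the notebook with the standard library's `http.server`. A minimal sketch, assuming Python 3.7+ (for `ThreadingHTTPServer` and the `directory` argument of `SimpleHTTPRequestHandler`); `serve_directory` is a hypothetical helper. Running the server on a daemon thread keeps the kernel responsive.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve static files from `directory` on a background thread.

    Pass port=0 to let the OS pick a free port; the chosen port is in
    server.server_address[1]. Stop with server.shutdown(), then
    server.server_close() to release the socket.
    """
    # Bind the directory into the handler, as `python -m http.server -d` does.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: it will not keep the interpreter alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Usage: `server = serve_directory("ydata_profiling")`, open `http://127.0.0.1:8000/`, and call `server.shutdown()` followed by `server.server_close()` when done.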