## Pandas Profiling: NASA Meteorites example
Source of data: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh

The `autoreload` extension reloads imported modules automatically before each cell runs, which helps when packages are upgraded, as in the next cell.

In [1]:
%load_ext autoreload
%autoreload 2

Make sure that we have the latest version of ydata-profiling, the current name of pandas-profiling and the package imported below.

In [3]:
import sys

!{sys.executable} -m pip install -U 'ydata-profiling[notebook]'
!jupyter nbextension enable --py widgetsnbextension

Restart the kernel now so that the upgraded package is picked up.
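To confirm which version the restarted kernel actually sees, the installed distribution can be queried with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered as `ydata-profiling`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("ydata-profiling:", installed_version("ydata-profiling"))
```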

### Import libraries

In [None]:
import numpy as np
import pandas as pd

# Importing ydata_profiling is what makes df.profile_report(...) available below
import ydata_profiling
from ydata_profiling.utils.cache import cache_file
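The bare `import ydata_profiling` matters because, on import, the package attaches a `profile_report` function to `pandas.DataFrame` (as far as we know, by assigning it directly on the class). The sketch below shows that monkeypatching pattern with a hypothetical `missing_summary` method; it illustrates the mechanism only and is not the library's code.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of missing values per column."""
    return df.isna().sum()


# Assigning a plain function to the class makes it callable as a method;
# the DataFrame instance is passed in as the first argument.
pd.DataFrame.missing_summary = missing_summary

frame = pd.DataFrame({"mass": [1.0, np.nan], "name": ["Aachen", None]})
print(frame.missing_summary())
```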

### Load and prepare example dataset
We add some synthetic variables to illustrate the profiler's capabilities.

In [None]:
file_name = cache_file(
    "meteorites.csv",
    "https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv?accessType=DOWNLOAD",
)

df = pd.read_csv(file_name)

# Note: pandas' default nanosecond-resolution timestamps cannot represent dates before
# 21 September 1677; errors="coerce" turns such years into NaT, so they are ignored here
df["year"] = pd.to_datetime(df["year"], errors="coerce")

# Example: Constant variable
df["source"] = "NASA"

# Example: Boolean variable
df["boolean"] = np.random.choice([True, False], df.shape[0])

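# Example: Mixed with base types. An object array keeps 1 as an int next to "A";
# a plain list [1, "A"] would be converted by NumPy to the strings "1" and "A".
df["mixed"] = np.random.choice(np.array([1, "A"], dtype=object), df.shape[0])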

# Example: Highly correlated variables
df["reclat_city"] = df["reclat"] + np.random.normal(scale=5, size=len(df))

# Example: Duplicate observations (copies of the first 10 rows, identical except for "name")
duplicates_to_add = df.iloc[0:10].copy()
duplicates_to_add["name"] = duplicates_to_add["name"] + " copy"

df = pd.concat([df, duplicates_to_add], ignore_index=True)
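The injected columns can be sanity-checked without downloading the dataset. The sketch below repeats the same steps on a small synthetic frame (random values, not NASA data; the names `toy`, `naive` and `other_cols` are only for illustration) and checks the date coercion, the mixed types, the correlation and the near-duplicates. Whether the year 860 becomes NaT depends on the datetime resolution your pandas version uses, so that check is conditional.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the checks are reproducible
n = 1_000

# Toy stand-in for the meteorite table (synthetic values only).
toy = pd.DataFrame(
    {
        "name": [f"row{i}" for i in range(n)],
        "reclat": rng.uniform(-90, 90, n),
        "year": ["1880-01-01"] * (n - 1) + ["0860-01-01"],
    }
)

# With nanosecond resolution, year 860 is out of range and is coerced to NaT.
toy["year"] = pd.to_datetime(toy["year"], errors="coerce")

# A plain list is converted to a NumPy string array ('U' dtype kind);
# an object array keeps a Python int and a Python str side by side.
naive = np.random.choice([1, "A"], 5)
toy["mixed"] = rng.choice(np.array([1, "A"], dtype=object), n)

# Correlated copy: the original column plus Gaussian noise with scale 5.
toy["reclat_city"] = toy["reclat"] + rng.normal(scale=5, size=n)

# Near-duplicates: the first 10 rows again, with only "name" changed.
extra = toy.iloc[0:10].copy()
extra["name"] = extra["name"] + " copy"
toy = pd.concat([toy, extra], ignore_index=True)

other_cols = [c for c in toy.columns if c != "name"]
print(naive.dtype.kind)                          # 'U': every value became a string
print({type(v).__name__ for v in toy["mixed"]})  # both int and str survive
print(toy["reclat"].corr(toy["reclat_city"]))    # close to 1
print(toy.duplicated().sum(), toy.duplicated(subset=other_cols).sum())
```

Because the copies differ in `name`, they only count as duplicates when that column is excluded from the comparison.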

### Inline report without saving object

In [None]:
report = df.profile_report(
    sort=None, html={"style": {"full_width": True}}, progress_bar=False
)
report

### Save report to file

In [None]:
profile_report = df.profile_report(html={"style": {"full_width": True}})
profile_report.to_file("/tmp/example.html")
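The path `/tmp/example.html` only exists on POSIX systems. A more portable sketch builds the output path from the platform's temporary directory with `tempfile` and `pathlib`. It relies only on the `to_file` method used above, which we assume accepts a `pathlib.Path` as well as a string; the helper name `save_report` is hypothetical.

```python
import tempfile
from pathlib import Path


def save_report(report, file_name: str = "example.html") -> Path:
    """Hypothetical helper: write a report to the system temp dir and return the path.

    `report` can be any object with a `to_file(path)` method, such as a ProfileReport.
    """
    output_path = Path(tempfile.gettempdir()) / file_name
    report.to_file(output_path)
    return output_path


# Usage in this notebook:
# saved_path = save_report(profile_report)
```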

### Explorative analysis (Unicode) and displaying an existing ProfileReport inline

In [None]:
profile_report = df.profile_report(
    explorative=True, html={"style": {"full_width": True}}
)
profile_report
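`explorative=True` switches on extra analyses, among them character-level statistics for text columns such as Unicode categories and scripts. The sketch below is not the library's implementation; it only illustrates the idea of counting characters by Unicode general category, using the standard-library `unicodedata` module and a hypothetical `category_counts` helper.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Hypothetical helper: count characters by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


# 'Lu' = uppercase letter, 'Ll' = lowercase letter, 'Zs' = space, 'Nd' = decimal digit
print(category_counts("Aachen 1"))
```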

### Notebook Widgets

In [None]:
profile_report.to_widgets()