# `ydata-profiling`

`ydata-profiling` is another Python package for data exploration that you may be interested in trying.

## What we will accomplish

In this notebook we will:
- Introduce the `ydata-profiling` package,
- Mention the installation instructions,
- Demonstrate some of the functionality of this package, and
- Point to where we can learn more about the package.

In [None]:
import pandas as pd
import numpy as np

## The iris data set

For this notebook we will look at the iris data set. For each of 150 irises, this data set records the:
- Iris type,
- Petal length,
- Petal width,
- Sepal length, and
- Sepal width.

It is stored in the `data` folder, but a version of the data set, along with a data description, can also be found here, <a href="https://archive.ics.uci.edu/ml/datasets/iris">https://archive.ics.uci.edu/ml/datasets/iris</a>. Let's load the data now and show the first five rows.

<i>Note: I will quickly clean part of the data so that the `target` column provides human readable values for the iris type instead of integers.</i>

In [None]:
df = pd.read_csv("../data/iris.csv")

## Replace the integer codes with readable names in one step;
## this avoids writing strings into an integer column piece by piece
df['target'] = df['target'].map({0: "setosa", 1: "versicolor", 2: "virginica"})

df = df.rename(columns = {'target':'iris_type'})


df.head()

## `ydata-profiling` installation and version

`ydata-profiling` is not a package that most people already have installed, so you will likely need to install it before proceeding. Installation instructions can be found here, <a href="https://ydata-profiling.ydata.ai/docs/master/pages/getting_started/installation.html">https://ydata-profiling.ydata.ai/docs/master/pages/getting_started/installation.html</a>. If you are unsure how to install a Python package, you should also examine the Python package installation guide on the Erd&#337;s Institute website.

Once you think you have successfully installed `ydata-profiling`, try running the following code.

In [None]:
## The package must be imported before its version can be checked
import ydata_profiling

## This notebook was written with version 4.1.1
print(ydata_profiling.__version__)
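If the import itself fails, it can help to check whether the package is installed at all. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and is not part of `ydata-profiling`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "ydata-profiling" is the name used when installing the package with pip
found = installed_version("ydata-profiling")
if found is None:
    print("ydata-profiling is not installed yet")
else:
    print(f"ydata-profiling version {found} is installed")
```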

## Generating exploratory data analysis <i>profiles</i>

A common step in data analysis or data science is exploratory data analysis (EDA). EDA involves calculating basic statistics and plotting variables from your data in order to explore possible relationships or patterns in the data.

While most common EDA steps can be accomplished with `pandas` or `numpy`, it can take a fair amount of code to perform those steps in a way that is quickly interpretable. The `ydata-profiling` package offers a quick and easy way to generate and investigate some of the most common EDA steps.
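To make the comparison concrete, here is a minimal sketch of a few manual EDA steps in `pandas`. The tiny data frame, its column names, and its values are made up purely for illustration; they are not the iris data.

```python
import pandas as pd

# Toy data, invented for illustration only (not the iris data)
toy = pd.DataFrame(
    {
        "length": [1.4, 4.7, 6.0, 1.3, 4.5, 5.1],
        "width": [0.2, 1.4, 2.5, 0.2, 1.5, 1.9],
        "kind": ["a", "b", "c", "a", "b", "c"],
    }
)

# Summary statistics for the numeric columns
summary = toy.describe()

# Number of missing values in each column
missing = toy.isna().sum()

# Number of rows in each category
kind_counts = toy["kind"].value_counts()

# Pairwise Pearson correlations between the numeric columns
correlations = toy[["length", "width"]].corr()

print(summary)
print(missing)
print(kind_counts)
print(correlations)
```

Each step is short on its own, but the results arrive as separate tables that still need to be read and plotted one at a time.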

`ydata-profiling` enables us to create a <i>profile report</i> of a `DataFrame` in a couple of lines of code.

In [None]:
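## Importing the ProfileReport Class
from ydata_profiling import ProfileReport

In [None]:
## You create a profile report by providing a data frame
## and optionally a title
## (the title string below is only an example)
profile = ProfileReport(df, title="Iris Data Profile Report")

In [None]:
## To display the profile report in a jupyter notebook
## run profile.to_notebook_iframe()
profile.to_notebook_iframe()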

As we can see, the profile report provides a number of useful EDA summaries in an easy-to-read format. One useful feature, not present in standard `pandas` or `numpy`, is the "Alerts" tab. This tab provides alerts when your data exhibits certain behavior that could be troublesome for standard analysis. For example, in the iris data many columns are highly correlated, which can cause problems for regression algorithms. For a full list of possible alerts see this page of the documentation, <a href="https://ydata-profiling.ydata.ai/docs/master/pages/getting_started/concepts.html#data-quality-alerts">https://ydata-profiling.ydata.ai/docs/master/pages/getting_started/concepts.html#data-quality-alerts</a>.
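To build intuition for the high-correlation alert, the sketch below flags strongly correlated column pairs by hand with `pandas`. This is not how `ydata-profiling` computes its alerts; the helper name `highly_correlated_pairs`, the 0.9 threshold, and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def highly_correlated_pairs(frame, threshold=0.9):
    """Return (column_a, column_b, correlation) for numeric pairs with |r| >= threshold."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()
    columns = list(corr.columns)
    pairs = []
    for i, first in enumerate(columns):
        # Only look at columns after `first` so each pair is reported once
        for second in columns[i + 1:]:
            r = corr.loc[first, second]
            if abs(r) >= threshold:
                pairs.append((first, second, float(r)))
    return pairs


# Toy data invented for illustration: "b" is exactly 2 * "a",
# while "c" does not track "a" closely
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 4.0, 6.0, 8.0, 10.0],
        "c": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
)

for first, second, r in highly_correlated_pairs(toy):
    print(f"{first} and {second} are highly correlated (r = {r:.2f})")
```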

### Exporting reports

In addition to viewing your report inside a Jupyter notebook, you can also export it to an HTML file with `profile.to_file("filename.html")`. This may be preferable if you want to share the report with someone who does not use Jupyter notebooks.

## Learning more

This is only a basic introduction to the package. To learn more about what you can do with it, check out the documentation page here, <a href="https://ydata-profiling.ydata.ai/docs/master/index.html">https://ydata-profiling.ydata.ai/docs/master/index.html</a>. There you can find examples and explanations of what else is possible with `ydata-profiling`.

--------------------------

This notebook was written for the Erd&#337;s Institute C&#337;de Data Science Boot Camp by Matthew Osborne, Ph. D., 2023.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).