## Historical Profilers

The **HistoricalProfiler** class provides a data object for tracking profiles taken of the same dataset over time.

It gives the user a suite of new capabilities, primarily for measuring how data changes over time.

Follow the cells in this notebook to see a basic example of the **HistoricalProfiler** in action.

### Importing Required Libraries & Data

In [None]:
import os
import json
import dataprofiler as dp
import numpy as np
import pandas as pd
import tensorflow as tf
# Only show TensorFlow log messages at ERROR level or above
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

In [None]:
data_path = "../dataprofiler/tests/data/csv/ny_climate.csv"

df = pd.read_csv(data_path)
df

In [None]:
df.sort_values(by="YEAR", axis=0, inplace=True)
df

In [None]:
years = df["YEAR"].unique().tolist()
years.reverse()
years

In [None]:
individualDataframes = []
for year in years:
    current_year_df = df.loc[df["YEAR"]==year]
    current_year_df = current_year_df.drop("YEAR", axis=1)
    individualDataframes.append(current_year_df)
individualDataframes[0]

Next, create a DataProfiler `Profiler` object for each of these DataFrames.

In [None]:
profilerObjs = []
for data in individualDataframes:
    profilerObjs.append(dp.Profiler(data))
profilerObjs[0]

### Create

##### Instantiate new HistoricalProfiler()

In [None]:
historical_profiler = dp.HistoricalProfiler(profilerObjs[1:])
historical_profiler

In [None]:
historical_profiler.historical_profile

##### Append new Profile

In [None]:
historical_profiler.append(profilerObjs[0])
historical_profiler.historical_profile
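
The cells in this notebook suggest an ordering in which index `0` holds the most recent profile and the last index holds the oldest one (see also the note under Supplementary Reports). The sketch below illustrates that convention with a toy stand-in. It is not the `HistoricalProfiler` implementation: the class `ToyHistory` and its methods are hypothetical, and the assumption that an appended report lands at index `0` is inferred from the notebook rather than confirmed.

```python
class ToyHistory:
    """Hypothetical stand-in that keeps reports ordered most recent first."""

    def __init__(self, reports):
        # `reports` is assumed to already be ordered most recent first.
        self._reports = list(reports)

    def append(self, report):
        # Assumption: an appended report is the newest, so it goes to index 0.
        self._reports.insert(0, report)

    def most_recent(self):
        return self._reports[0]

    def oldest(self):
        return self._reports[-1]

    def __len__(self):
        return len(self._reports)


# Strings stand in for profile reports; the labels are illustrative only.
history = ToyHistory(["2021", "2020", "2019"])
history.append("2022")
print(history.most_recent(), history.oldest(), len(history))  # 2022 2019 4
```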


### Read

##### Get Most Recent

In [None]:
historical_profiler.get_most_recent_profile_report()

##### Get Oldest Report

In [None]:
historical_profiler.get_oldest_profile_report()

##### Get Report From Index

In [None]:
historical_profiler.get_profile_report_by_index(3)

##### Get Full Report

In [None]:
historical_profiler.report()

OR

In [None]:
historical_profiler.historical_profile

##### Get Length

In [None]:
len(historical_profiler)

### Update

##### Update Profile Report at Index

Previous most recent report:

In [None]:
historical_profiler.get_most_recent_profile_report()

In [None]:
historical_profiler.update_profile_report_at_index(profile=profilerObjs[2], index=0)

New most recent report:

In [None]:
historical_profiler.get_most_recent_profile_report()

### Delete

##### Delete Report at Index

In [None]:
len(historical_profiler)

In [None]:
historical_profiler.delete_profile_report_at_index(0)

In [None]:
len(historical_profiler)

### Supplementary Reports

##### Consecutive Profiles Diff Report

The consecutive profiles diff report returns a report in which the value of every key is a list of `number_of_profiles - 1` values, each denoting the change between two consecutive profiles in this historical profiler.

For instance, for the `min` key in the `statistics` dict of `column_one` in a consecutive diff report, we may see a value of the form `[70, -10, 25, 20]`. Reading the list from its last element to its first, this denotes the following:
- the difference in `min` between `profile_4` and `profile_3` was *+20*
    - `+` indicates that the value increased as time moved forward
- the difference in `min` between `profile_3` and `profile_2` was *+25*
- the difference in `min` between `profile_2` and `profile_1` was *-10*
    - `-` indicates that the value decreased as time moved forward
- the difference in `min` between `profile_1` and `profile_0` was *+70*

**Note:** Here, `profile_0` refers to the most recent profile in time and `profile_4` refers to the oldest profile in time.
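
The arithmetic behind the example list can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function `consecutive_diffs` and the five `min` values are hypothetical, chosen so that the result matches the example above.

```python
def consecutive_diffs(values_most_recent_first):
    """Return newer-minus-older differences for neighbouring profiles.

    Element i of the result compares profile_i with profile_(i+1), so a
    positive number means the value increased as time moved forward.
    """
    values = values_most_recent_first
    return [newer - older for newer, older in zip(values, values[1:])]


# Hypothetical `min` values for profile_0 (most recent) through profile_4 (oldest).
min_per_profile = [105, 35, 45, 20, 0]
diffs = consecutive_diffs(min_per_profile)
print(diffs)  # [70, -10, 25, 20]
```

Five profiles yield four differences, which is the `number of profiles - 1` length described above.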


In [None]:
historical_profiler.get_consecutive_diffs_report()

##### Consecutive Diffs Min and Max Report

This report gives the *global* minimum and maximum of the consecutive differences for each key, taken across all profile reports.

For example, if the consecutive diff report value for `min` of `column_one` was `[70, -10, 25, 20]`, the reported minimum would be `-10` and the reported maximum would be `70`.

This report is returned as a dictionary of the following structure:

```
{
    "global_stats": {
        ...,
        "row_count": (x, y)
    },
    "data_stats": [
        {
            ...,
            "statistics": {
                ...,
                "min": (x, y),
                ...,
            }
        },
        ...
    ]
}
```
Where `x` and `y` denote the minimum and maximum difference for each key, respectively.
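
One way to picture how such a structure could be derived is to walk a nested diff report and collapse every list of numbers into a `(min, max)` tuple. The sketch below is an assumption-laden illustration, not the code behind `get_diff_min_and_max_report()`: the helper `diff_min_max` and the toy report (including its `row_count` values and `column_name` key) are hypothetical.

```python
def diff_min_max(node):
    """Collapse every list of numbers in a nested report into (min, max)."""
    if isinstance(node, dict):
        return {key: diff_min_max(value) for key, value in node.items()}
    if isinstance(node, list):
        # A non-empty list made only of numbers is treated as a list of diffs.
        if node and all(isinstance(v, (int, float)) for v in node):
            return (min(node), max(node))
        # Any other list (for example, the per-column entries) is walked.
        return [diff_min_max(item) for item in node]
    return node  # Leave non-list leaves such as column names untouched.


# Toy consecutive-diff report with made-up values.
toy_diff_report = {
    "global_stats": {"row_count": [5, -2, 0, 3]},
    "data_stats": [
        {
            "column_name": "column_one",
            "statistics": {"min": [70, -10, 25, 20]},
        }
    ],
}

summary = diff_min_max(toy_diff_report)
print(summary["data_stats"][0]["statistics"]["min"])  # (-10, 70)
```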

In [None]:
historical_profiler.get_diff_min_and_max_report()