## Historical Profilers

The **HistoricalProfiler** class provides a data object for tracking profiles taken on the same dataset over time.

This gives the user a suite of new capabilities, primarily for measuring how data changes over time.

Follow the cells in this notebook to see a basic example of the **HistoricalProfiler** in action.

### Importing Required Libraries & Data

In [1]:
import os
import json
import dataprofiler as dp
import numpy as np
import pandas as pd
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)



In [2]:
data_path = "../dataprofiler/tests/data/csv/ny_climate.csv"

df = pd.read_csv(data_path)
df

Unnamed: 0,YEAR,AWND,MonthlyDaysWithGT001Precip,MonthlyDepartureFromNormalAverageTemperature,MonthlyDepartureFromNormalCoolingDegreeDays,MonthlyDepartureFromNormalHeatingDegreeDays,MonthlyDepartureFromNormalMaximumTemperature,MonthlyGreatestPrecip,MonthlyGreatestPrecipDate,WindEquipmentChangeDate
0,2015,8.9,10,-2.9,0,85,-1.7,0.79,03-04,2006-09-08
1,2015,8.7,13,-13.2,0,367,-11.6,0.88,02-02,2006-09-08
2,2015,9.4,10,-5.2,0,158,-5.4,0.78,26-26,2006-09-08
3,2015,9.4,11,0.0,-4,-4,0.4,0.59,20-21,2006-09-08
4,2015,8.3,6,7.3,92,-134,9.0,0.46,30-31,2006-09-08
...,...,...,...,...,...,...,...,...,...,...
82,2022,8.7,10,-2.9,0,83,-1.6,0.66,16-17,2006-09-08
83,2022,10.3,10,1.6,0,-46,3.0,1.25,03-04,2006-09-08
84,2022,9.2,14,2.3,0,-72,2.8,1.01,31-31,2006-09-08
85,2022,9.6,14,0.0,-4,-5,0.1,2.24,07-08,2006-09-08


In [3]:
df.sort_values(by="YEAR", axis=0, inplace=True)
df

Unnamed: 0,YEAR,AWND,MonthlyDaysWithGT001Precip,MonthlyDepartureFromNormalAverageTemperature,MonthlyDepartureFromNormalCoolingDegreeDays,MonthlyDepartureFromNormalHeatingDegreeDays,MonthlyDepartureFromNormalMaximumTemperature,MonthlyGreatestPrecip,MonthlyGreatestPrecipDate,WindEquipmentChangeDate
0,2015,8.9,10,-2.9,0,85,-1.7,0.79,03-04,2006-09-08
11,2015,7.6,13,13.3,0,-413,13.5,1.02,28-29,2006-09-08
10,2015,7.4,7,5.8,0,-176,7.8,0.96,10-11,2006-09-08
9,2015,6.7,10,0.1,-4,-6,0.0,1.86,28-29,2006-09-08
7,2015,5.8,7,3.0,73,-18,3.5,1.84,11-11,2006-09-08
...,...,...,...,...,...,...,...,...,...,...
85,2022,9.6,14,0.0,-4,-5,0.1,2.24,07-08,2006-09-08
82,2022,8.7,10,-2.9,0,83,-1.6,0.66,16-17,2006-09-08
83,2022,10.3,10,1.6,0,-46,3.0,1.25,03-04,2006-09-08
84,2022,9.2,14,2.3,0,-72,2.8,1.01,31-31,2006-09-08


In [4]:
years = df["YEAR"].unique().tolist()
years.reverse()
years

[2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015]

In [5]:
individualDataframes = []
for year in years:
    current_year_df = df.loc[df["YEAR"]==year]
    current_year_df = current_year_df.drop("YEAR", axis=1)
    individualDataframes.append(current_year_df)
individualDataframes[0]

Unnamed: 0,AWND,MonthlyDaysWithGT001Precip,MonthlyDepartureFromNormalAverageTemperature,MonthlyDepartureFromNormalCoolingDegreeDays,MonthlyDepartureFromNormalHeatingDegreeDays,MonthlyDepartureFromNormalMaximumTemperature,MonthlyGreatestPrecip,MonthlyGreatestPrecipDate,WindEquipmentChangeDate
85,9.6,14,0.0,-4,-5,0.1,2.24,07-08,2006-09-08
82,8.7,10,-2.9,0,83,-1.6,0.66,16-17,2006-09-08
83,10.3,10,1.6,0,-46,3.0,1.25,03-04,2006-09-08
84,9.2,14,2.3,0,-72,2.8,1.01,31-31,2006-09-08
86,7.2,10,5.0,58,-98,5.7,0.52,16-17,2006-09-08


Next, we create a DataProfiler `Profiler` object for each of these DataFrames.

In [6]:
profilerObjs = []
for data in individualDataframes:
    profilerObjs.append(dp.Profiler(data))
profilerObjs[0]

2022-07-29 16:45:16.866347: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


INFO:DataProfiler.profilers.profile_builder: Finding the Null values in the columns... 


  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
  df_series = df_series.loc[true_sample_list]
100%|██████████| 9/9 [00:00<00:00, 295.81it/s]


INFO:DataProfiler.profilers.profile_builder: Calculating the statistics...  (with 4 processes)


100%|██████████| 9/9 [00:03<00:00,  2.49it/s]



<dataprofiler.profilers.profile_builder.StructuredProfiler at 0x1778db1f0>

### Create

##### Instantiate new HistoricalProfiler()

In [7]:
historical_profiler = dp.HistoricalProfiler(profilerObjs[1:])
historical_profiler

<dataprofiler.profilers.historical_profiler.HistoricalProfiler at 0x17c3a69d0>

In [8]:
historical_profiler.historical_profile

{'global_stats': {'samples_used': [12, 12, 10, 12, 12, 12, 12],
  'column_count': [9, 9, 9, 9, 9, 9, 9],
  'row_count': [12, 12, 10, 12, 12, 12, 12],
  'row_has_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'row_is_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'unique_row_ratio': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
  'duplicate_row_count': [0, 0, 0, 0, 0, 0, 0],
  'file_type': ["<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>"],
  'encoding': [None, None, None, None, None, None, None],
  'correlation_matrix': [None, None, None, None, None, None, None],
  'chi2_matrix': [[[1.0,
     0.155027781767463,
     nan,
     0.11943496941705767,
     nan,
     nan,
     nan,
     nan,
     0.007600390681066993],
    [0.15502778176746

##### Append new Profile

In [9]:
historical_profiler.append(profilerObjs[0])
historical_profiler.historical_profile


{'global_stats': {'samples_used': [5, 12, 12, 10, 12, 12, 12, 12],
  'column_count': [9, 9, 9, 9, 9, 9, 9, 9],
  'row_count': [5, 12, 12, 10, 12, 12, 12, 12],
  'row_has_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'row_is_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'unique_row_ratio': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
  'duplicate_row_count': [0, 0, 0, 0, 0, 0, 0, 0],
  'file_type': ["<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>"],
  'encoding': [None, None, None, None, None, None, None, None],
  'correlation_matrix': [None, None, None, None, None, None, None, None],
  'chi2_matrix': [[[1.0,
     0.12465201948308113,
     0.3504852123233613,
     0.18857346

### Read

##### Get Most Recent

In [10]:
historical_profiler.get_most_recent_profile_report()

{'global_stats': {'samples_used': 5,
  'column_count': 9,
  'row_count': 5,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.12465201948308113,
    0.3504852123233613,
    0.18857346751344994,
    0.3504852123233613,
    0.3504852123233613,
    0.3504852123233613,
    0.2650259152973615,
    0.07523524614651222],
   [0.12465201948308113,
    1.0,
    0.12465201948308113,
    0.04042768199451274,
    0.12465201948308113,
    0.12465201948308113,
    0.12465201948308113,
    0.07523524614651222,
    0.006737946999085476],
   [0.3504852123233613,
    0.12465201948308113,
    1.0,
    0.18857346751344994,
    0.3504852123233613,
    0.3504852123233613,
    0.3504852123233613,
    0.2650259152973615,
    0.07523524614651222],
   [0.18857346751344994,
    0.04042768199451274,
    0.18857346751344

##### Get Oldest Report

In [11]:
historical_profiler.get_oldest_profile_report()

{'global_stats': {'samples_used': 12,
  'column_count': 9,
  'row_count': 12,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.0895044968401758,
    nan,
    0.0895044968401758,
    nan,
    nan,
    nan,
    nan,
    0.007600390681066993],
   [0.0895044968401758,
    1.0,
    nan,
    0.031130059512438857,
    nan,
    nan,
    nan,
    nan,
    0.0011393511789474786],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.0895044968401758,
    0.031130059512438857,
    nan,
    1.0,
    nan,
    nan,
    nan,
    nan,
    0.0011393511789474786],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.007600390681066993,
    0.00113935117894

##### Get Report From Index

In [12]:
historical_profiler.get_profile_report_by_index(3)

{'global_stats': {'samples_used': 10,
  'column_count': 9,
  'row_count': 10,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.22022064660169904,
    0.33281967875071916,
    0.1719326893766009,
    0.33281967875071916,
    0.33281967875071916,
    0.33281967875071916,
    0.33281967875071916,
    0.01791240452984333],
   [0.22022064660169904,
    1.0,
    0.27422926710794715,
    0.13014142088248304,
    0.27422926710794715,
    0.27422926710794715,
    0.27422926710794715,
    0.27422926710794715,
    0.01033605067592569],
   [0.33281967875071916,
    0.27422926710794715,
    1.0,
    0.22022064660169904,
    0.39457818208600104,
    0.5987138355230368,
    0.39457818208600104,
    0.39457818208600104,
    0.02925268807696113],
   [0.1719326893766009,
    0.13014142088248304,
    0.220220

##### Get Full Report

In [13]:
historical_profiler.report()

{'global_stats': {'samples_used': [5, 12, 12, 10, 12, 12, 12, 12],
  'column_count': [9, 9, 9, 9, 9, 9, 9, 9],
  'row_count': [5, 12, 12, 10, 12, 12, 12, 12],
  'row_has_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'row_is_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'unique_row_ratio': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
  'duplicate_row_count': [0, 0, 0, 0, 0, 0, 0, 0],
  'file_type': ["<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>"],
  'encoding': [None, None, None, None, None, None, None, None],
  'correlation_matrix': [None, None, None, None, None, None, None, None],
  'chi2_matrix': [[[1.0,
     0.12465201948308113,
     0.3504852123233613,
     0.18857346

OR

In [14]:
historical_profiler.historical_profile

{'global_stats': {'samples_used': [5, 12, 12, 10, 12, 12, 12, 12],
  'column_count': [9, 9, 9, 9, 9, 9, 9, 9],
  'row_count': [5, 12, 12, 10, 12, 12, 12, 12],
  'row_has_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'row_is_null_ratio': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
  'unique_row_ratio': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
  'duplicate_row_count': [0, 0, 0, 0, 0, 0, 0, 0],
  'file_type': ["<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>",
   "<class 'pandas.core.frame.DataFrame'>"],
  'encoding': [None, None, None, None, None, None, None, None],
  'correlation_matrix': [None, None, None, None, None, None, None, None],
  'chi2_matrix': [[[1.0,
     0.12465201948308113,
     0.3504852123233613,
     0.18857346

##### Get Length

In [15]:
len(historical_profiler)

8

### Update

##### Update Profile Report at Index

Previous most recent report:

In [16]:
historical_profiler.get_most_recent_profile_report()

{'global_stats': {'samples_used': 5,
  'column_count': 9,
  'row_count': 5,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.12465201948308113,
    0.3504852123233613,
    0.18857346751344994,
    0.3504852123233613,
    0.3504852123233613,
    0.3504852123233613,
    0.2650259152973615,
    0.07523524614651222],
   [0.12465201948308113,
    1.0,
    0.12465201948308113,
    0.04042768199451274,
    0.12465201948308113,
    0.12465201948308113,
    0.12465201948308113,
    0.07523524614651222,
    0.006737946999085476],
   [0.3504852123233613,
    0.12465201948308113,
    1.0,
    0.18857346751344994,
    0.3504852123233613,
    0.3504852123233613,
    0.3504852123233613,
    0.2650259152973615,
    0.07523524614651222],
   [0.18857346751344994,
    0.04042768199451274,
    0.18857346751344

In [17]:
historical_profiler.update_profile_report_at_index(profile=profilerObjs[2], index=0)

New most recent report:

In [18]:
historical_profiler.get_most_recent_profile_report()

{'global_stats': {'samples_used': 12,
  'column_count': 9,
  'row_count': 12,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.0895044968401758,
    nan,
    0.0650934863988305,
    nan,
    nan,
    0.155027781767463,
    0.155027781767463,
    0.004301310843500827],
   [0.0895044968401758,
    1.0,
    nan,
    0.045822306888651076,
    nan,
    nan,
    0.11943496941705767,
    0.11943496941705767,
    0.002291791207791438],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.0650934863988305,
    0.045822306888651076,
    nan,
    1.0,
    nan,
    nan,
    0.0895044968401758,
    0.0895044968401758,
    0.0011393511789474786],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.155027781767463,
    0.11943496941705767,
    nan,


### Delete

##### Delete Report at Index

In [19]:
len(historical_profiler)

8

In [20]:
historical_profiler.delete_profile_report_at_index(0)

{'global_stats': {'samples_used': 12,
  'column_count': 9,
  'row_count': 12,
  'row_has_null_ratio': 0.0,
  'row_is_null_ratio': 0.0,
  'unique_row_ratio': 1.0,
  'duplicate_row_count': 0,
  'file_type': "<class 'pandas.core.frame.DataFrame'>",
  'encoding': None,
  'correlation_matrix': None,
  'chi2_matrix': [[1.0,
    0.0895044968401758,
    nan,
    0.0650934863988305,
    nan,
    nan,
    0.155027781767463,
    0.155027781767463,
    0.004301310843500827],
   [0.0895044968401758,
    1.0,
    nan,
    0.045822306888651076,
    nan,
    nan,
    0.11943496941705767,
    0.11943496941705767,
    0.002291791207791438],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.0650934863988305,
    0.045822306888651076,
    nan,
    1.0,
    nan,
    nan,
    0.0895044968401758,
    0.0895044968401758,
    0.0011393511789474786],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [nan, nan, nan, nan, nan, nan, nan, nan, nan],
   [0.155027781767463,
    0.11943496941705767,
    nan,


In [21]:
len(historical_profiler)

7
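The outputs in the Create through Delete sections suggest that reports are kept newest first: the appended profile shows up at index 0, and index 0 is what the "most recent" getter returns. The sketch below mimics that ordering with a plain list. `NewestFirstHistory` and its method names are hypothetical stand-ins for illustration only, not the `HistoricalProfiler` implementation, and each "report" is reduced to a single row count.

```python
class NewestFirstHistory:
    """Hypothetical newest-first container; index 0 is the most recent entry."""

    def __init__(self, reports):
        # Keep the given order, assuming the caller passes newest first.
        self._reports = list(reports)

    def append(self, report):
        # A newly appended report becomes the most recent one (index 0).
        self._reports.insert(0, report)

    def most_recent(self):
        return self._reports[0]

    def oldest(self):
        return self._reports[-1]

    def by_index(self, index):
        return self._reports[index]

    def update_at(self, report, index):
        self._reports[index] = report

    def delete_at(self, index):
        # Return the removed report, as the delete cell above appears to do.
        return self._reports.pop(index)

    def __len__(self):
        return len(self._reports)


# Row counts taken from the 'row_count' lists printed above.
history = NewestFirstHistory([12, 12, 10, 12, 12, 12, 12])
history.append(5)
print(history.most_recent(), history.by_index(3), len(history))  # 5 10 8
history.update_at(12, index=0)
removed = history.delete_at(0)
print(removed, len(history))  # 12 7
```

Storing the newest entry at index 0 keeps "most recent" lookups trivial, at the cost of an O(n) insert at the front of the list, which is negligible for a handful of profiles.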

### Supplementary Reports

##### Consecutive Profiles Diff Report

The consecutive profiles diff report returns a report in which the value of every key is a list of `number_of_profiles - 1` values, each denoting the change between two consecutive profiles in this historical profiler.

For instance, for the `min` key in the `statistics` dict of `column_one` in a consecutive diff report, we may see a value of the form `[70, -10, 25, 20]`, denoting the following:
- the difference in `min` between `profile_1` and `profile_0` was *+70*
    - `+` indicates that the value increased as time moved forward
- the difference in `min` between `profile_2` and `profile_1` was *-10*
    - `-` indicates that the value decreased as time moved forward
- the difference in `min` between `profile_3` and `profile_2` was *+25*
- the difference in `min` between `profile_4` and `profile_3` was *+20*

**Note:** Here, `profile_0` refers to the most recent profile in time and `profile_4` refers to the oldest profile in time.
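
The sign convention above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: `consecutive_diffs` is a hypothetical helper, and the input values are made up so that they reproduce the example list. It assumes values are ordered newest first and that a zero difference is reported as `'unchanged'`, as the report output in this notebook suggests.

```python
def consecutive_diffs(values):
    """Return newer-minus-older differences for a newest-first list of values."""
    diffs = []
    # Pair each value with the next (older) one: (p0, p1), (p1, p2), ...
    for newer, older in zip(values, values[1:]):
        delta = newer - older
        # Assumption: zero change is shown as the string 'unchanged'.
        diffs.append("unchanged" if delta == 0 else delta)
    return diffs


# Made-up `min` values for profile_0 (newest) through profile_4 (oldest).
min_values = [110, 40, 50, 25, 5]
print(consecutive_diffs(min_values))  # [70, -10, 25, 20]
```

A list of five values yields four differences, which is the `number_of_profiles - 1` length described above. A positive entry means the newer profile has the larger value.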


In [22]:
historical_profiler.get_consecutive_diffs_report()

{'global_stats': {'samples_used': ['unchanged',
   2,
   -2,
   'unchanged',
   'unchanged',
   'unchanged'],
  'column_count': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'row_count': ['unchanged', 2, -2, 'unchanged', 'unchanged', 'unchanged'],
  'row_has_null_ratio': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'row_is_null_ratio': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'unique_row_ratio': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'duplicate_row_count': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'file_type': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'encoding': ['unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged',
   'unchanged'],
  'corre

##### Consecutive Diffs Min and Max Report

This report finds the *global* minimum and maximum of the consecutive differences for each key across all profile reports.

For example, if the consecutive diff report value for `min` of `column_one` was equal to `[70, -10, 25, 20]`, `min` would equal `-10` and `max` would equal `70`.

This report is returned as a dictionary of the following structure:

```
{
    "global_stats": {
        ...,
        "row_count": (x, y)
    },
    "data_stats": [
        {
            ...,
            "statistics": {
                ...,
                "min": (x, y),
                ...,
            }
        },
        ...
    ]
}
```
Here, `x` and `y` denote the `min` and `max` for each key, respectively.
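
As a rough sketch of how one such `(x, y)` pair could be derived from a single consecutive-diff list, consider the hypothetical helper below. It is not the library's code. It assumes that `'unchanged'` entries count as a difference of `0`, which is consistent with the `(0, 0)` pairs in the report output.

```python
def diff_min_and_max(diffs):
    """Return (min, max) of a consecutive-diff list, counting 'unchanged' as 0."""
    # Assumption: the string marker 'unchanged' stands for a zero difference.
    numeric = [0 if d == "unchanged" else d for d in diffs]
    return (min(numeric), max(numeric))


print(diff_min_and_max([70, -10, 25, 20]))  # (-10, 70)
print(diff_min_and_max(["unchanged", 2, -2, "unchanged", "unchanged", "unchanged"]))  # (-2, 2)
```

The second call uses a diff list of the same shape as the `row_count` entry in the consecutive diffs report.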

In [23]:
historical_profiler.get_diff_min_and_max_report()


{'global_stats': {'samples_used': (-2, 2),
  'column_count': (0, 0),
  'row_count': (-2, 2),
  'row_has_null_ratio': (0, 0),
  'row_is_null_ratio': (0, 0),
  'unique_row_ratio': (0, 0),
  'duplicate_row_count': (0, 0),
  'file_type': (0, 0),
  'encoding': (0, 0),
  'correlation_matrix': (0, 0),
  'chi2_matrix': None,
  'profile_schema': [{},
   {'AWND': 'unchanged',
    'MonthlyDaysWithGT001Precip': 'unchanged',
    'MonthlyDepartureFromNormalAverageTemperature': 'unchanged',
    'MonthlyDepartureFromNormalCoolingDegreeDays': 'unchanged',
    'MonthlyDepartureFromNormalHeatingDegreeDays': 'unchanged',
    'MonthlyDepartureFromNormalMaximumTemperature': 'unchanged',
    'MonthlyGreatestPrecip': 'unchanged',
    'MonthlyGreatestPrecipDate': 'unchanged',
    'WindEquipmentChangeDate': 'unchanged'},
   {}],
  'times': {'row_stats': (-0.00039124488830566406, 0.0004589557647705078)}},
 'data_stats': [{'column_name': (0, 0),
   'data_type': (0, 0),
   'data_label': (0, 0),
   'categorical': (