# Distributed Profiling of Model Features with Whylogs & Fugue

It is common practice in the Machine Learning world to log incoming model inference requests and outgoing predictions. These logs are processed and aggregated later for monitoring and drift detection. However, consuming this raw data presents several pain points:
+ Machine Learning models vary widely in the number and nature of features and predictions. Some have 10 features and emit probability scores while others may have 30 features and emit a ranking. 
+ They also differ significantly in the type of features, with some having more categorical features and others having more numerical features.

We need a uniform way of processing these logs; we cannot maintain separate monitoring logic for each model.

In this tutorial we show how to use [Whylogs](https://whylabs.ai/whylogs) to profile the features and predictions and extract only the essential metrics from these profiles, regardless of the scale of the data.

The purposes of profiling are:
+ To normalize and compress metric data while retaining as much information as possible.
+ To unify data from totally different models so that the following step can process them with the same pipeline.
+ To ensure the subsequent steps only need to handle purely numerical time series.
+ To significantly reduce the scale of the problem, so the compute can be more efficient and cost-effective.
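
To make these goals concrete, here is a minimal sketch in plain Python (not the Whylogs implementation) of how columns from very different models collapse into the same fixed set of numeric metrics. The `summarize` helper and the sample values are hypothetical and chosen only for illustration.

```python
import statistics

def summarize(values):
    """Reduce a column of any length to a fixed set of numeric metrics."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

# Hypothetical inputs: probability scores from one model, ranks from another.
model_a_scores = [0.1, 0.4, 0.35, 0.8]
model_b_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

summary_a = summarize(model_a_scores)
summary_b = summarize(model_b_ranks)

# Both summaries expose the same keys, so one downstream pipeline can handle both.
print(sorted(summary_a) == sorted(summary_b))
```

However many rows go in, the summary stays the same size, which is what lets the later steps work on small numerical time series instead of raw logs.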

We also use the open source framework [Fugue](https://fugue-tutorials.readthedocs.io/index.html) for its excellent abstraction layer, which unifies the computing logic over Pandas, Spark, Ray or Dask. One of Fugue's most popular features is the ability to use a simple Python function call to distribute logic across many partitions of a larger dataframe. Users provide functions with type-annotated inputs and outputs, and Fugue converts the data based on those annotations. This keeps the custom logic independent of Fugue and Spark, so writing it requires no prior knowledge of either framework.

![](images/scale-up-ad.png)
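
The sketch below illustrates the pattern with plain Pandas: a type-annotated function that knows nothing about any distributed engine, applied once per partition. The function name `count_rows`, the sample data, and the local `groupby` loop are assumptions made for illustration; in this tutorial the per-partition dispatch is done by Fugue's `transform` rather than by a loop.

```python
import pandas as pd

def count_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Unit logic for one partition: keep the partition keys, add a row count."""
    out = df[["ds", "hour"]].head(1).copy()
    return out.assign(sample_records=len(df))

# Hypothetical log rows, shaped loosely like the data used later on.
logs = pd.DataFrame({
    "ds": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-02"],
    "hour": [0, 0, 1, 0],
    "predictions": [15.8, 16.4, 19.6, 32.1],
})

# Local stand-in for distributed execution: call the function once per partition.
parts = [count_rows(part) for _, part in logs.groupby(["ds", "hour"])]
result = pd.concat(parts, ignore_index=True)
print(result)
```

Because `count_rows` only sees a Pandas dataframe, the same function can be tested locally on a single partition before it is handed to a distributed engine.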

In [1]:
import seaborn as sns
from matplotlib import pyplot as plt

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# this allows plots to appear directly in the notebook
%matplotlib inline
%config InlineBackend.figure_format = 'retina'


In [2]:
import pandas as pd

pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)

In [5]:
demo_df = pd.read_parquet('addemo23/demo_raw_data_20.parquet')

In [7]:
demo_df.shape

(333263, 5)

In [8]:
demo_df.head(5)

Unnamed: 0,occurred_at,model_name,version,predictions,features
0,2023-01-06 20:07:09,demo_model,1.0.1,15.838747,"{""feature_5"":55.0,""feature_6"":-5.1932850153,""feature_1"":1.0,""feature_3"":-16.5737745071,""feature_..."
1,2023-02-05 06:15:19,demo_model,1.0.1,16.47332,"{""feature_5"":33.0,""feature_6"":-3.1535183613,""feature_1"":1.0,""feature_3"":-13.291135202,""feature_2..."
2,2023-03-14 19:20:53,demo_model,1.0.1,19.68442,"{""feature_5"":66.0,""feature_6"":-0.6214204458,""feature_1"":1.0,""feature_3"":16.5737745071,""feature_2..."
3,2023-01-28 07:54:25,demo_model,1.0.1,32.174606,"{""feature_5"":3168.0,""feature_6"":3.2646410148,""feature_1"":0.0,""feature_3"":-13.291135202,""feature_..."
4,2023-02-24 18:21:44,demo_model,1.0.1,21.86408,"{""feature_5"":44.0,""feature_6"":2.9850439286,""feature_1"":1.0,""feature_3"":-16.5737745071,""feature_2..."


## Load Model Feature and Prediction Logs

### Extract Features and Predictions from model logs

In [11]:
import json
import pandas as pd

def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    # Join the per-row JSON objects into one JSON array so a single parse handles them all.
    json_str = "[" + (",".join(df.features)) + "]"
    feature_df = pd.DataFrame(json.loads(json_str))
    # Sort the columns so the feature order is deterministic.
    return feature_df[sorted(feature_df.columns)]

In [12]:
extracted_features_df = extract_features(demo_df)

In [13]:
extracted_features_df.head(5)

Unnamed: 0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
0,1.0,2.777778,-16.573775,-232.159268,55.0,-5.193285
1,1.0,0.333333,-13.291135,-356.840737,33.0,-3.153518
2,1.0,2.5,16.573775,-110.94075,66.0,-0.62142
3,0.0,0.0,-13.291135,-290.004783,3168.0,3.264641
4,1.0,0.583333,-16.573775,-356.05467,44.0,2.985044


In [14]:
extracted_features_df.shape, demo_df.shape

((333263, 6), (333263, 5))

In [15]:
pd.concat([demo_df, extracted_features_df], axis=1, ignore_index=True).head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,2023-01-06 20:07:09,demo_model,1.0.1,15.838747,"{""feature_5"":55.0,""feature_6"":-5.1932850153,""feature_1"":1.0,""feature_3"":-16.5737745071,""feature_...",1.0,2.777778,-16.573775,-232.159268,55.0,-5.193285
1,2023-02-05 06:15:19,demo_model,1.0.1,16.47332,"{""feature_5"":33.0,""feature_6"":-3.1535183613,""feature_1"":1.0,""feature_3"":-13.291135202,""feature_2...",1.0,0.333333,-13.291135,-356.840737,33.0,-3.153518
2,2023-03-14 19:20:53,demo_model,1.0.1,19.68442,"{""feature_5"":66.0,""feature_6"":-0.6214204458,""feature_1"":1.0,""feature_3"":16.5737745071,""feature_2...",1.0,2.5,16.573775,-110.94075,66.0,-0.62142
3,2023-01-28 07:54:25,demo_model,1.0.1,32.174606,"{""feature_5"":3168.0,""feature_6"":3.2646410148,""feature_1"":0.0,""feature_3"":-13.291135202,""feature_...",0.0,0.0,-13.291135,-290.004783,3168.0,3.264641
4,2023-02-24 18:21:44,demo_model,1.0.1,21.86408,"{""feature_5"":44.0,""feature_6"":2.9850439286,""feature_1"":1.0,""feature_3"":-16.5737745071,""feature_2...",1.0,0.583333,-16.573775,-356.05467,44.0,2.985044
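
The JSON-joining step in `extract_features` can be sketched without Pandas. Joining the per-row JSON objects into one JSON array lets a single `json.loads` call parse them all, and it yields the same records as parsing row by row. The sample strings below are hypothetical, shaped like the `features` column.

```python
import json

# Hypothetical rows shaped like the `features` column: one JSON object per row.
feature_strings = [
    '{"feature_2": 2.5, "feature_1": 1.0}',
    '{"feature_2": 0.0, "feature_1": 0.0}',
]

# One parse call over a joined JSON array.
joined = json.loads("[" + ",".join(feature_strings) + "]")

# Row-by-row parsing, for comparison.
row_by_row = [json.loads(s) for s in feature_strings]

# Sort the keys so every record lists its features in the same order.
records = [{key: rec[key] for key in sorted(rec)} for rec in joined]
print(joined == row_by_row, list(records[0]))
```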


### A unit function to work on a partition of data

In [17]:
import json
import pandas as pd

def extract_features(model_logs_df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy with a fresh 0..n-1 index: the caller's frame is not mutated,
    # and the column-wise concat below lines up even when the input is a partition
    # of a larger frame with a non-contiguous index.
    model_logs_df = model_logs_df.reset_index(drop=True)
    # Parse all JSON feature strings in one pass.
    json_str = "[" + (",".join(model_logs_df.features)) + "]"
    extracted_features_df = pd.DataFrame(json.loads(json_str))
    extracted_features_df = extracted_features_df[sorted(extracted_features_df.columns)]
    # Truncate timestamps to whole seconds and derive the date and hour partition keys.
    model_logs_df['occurred_at'] = model_logs_df['occurred_at'].apply(lambda x: x.replace(microsecond=0))
    model_logs_df['ds'] = model_logs_df['occurred_at'].apply(lambda x: x.strftime("%Y-%m-%d"))
    model_logs_df['hour'] = model_logs_df['occurred_at'].apply(lambda x: x.hour)
    meta_cols = ['occurred_at', 'ds', 'hour', 'model_name', 'version', 'predictions']
    # Keep the original column names instead of hard-coding the feature list.
    return pd.concat([model_logs_df[meta_cols], extracted_features_df], axis=1)

In [20]:
features_df = extract_features(demo_df)

In [21]:
features_df.shape

(333263, 12)

In [22]:
features_df.head(5)

Unnamed: 0,occurred_at,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
0,2023-01-06 20:07:09,2023-01-06,20,demo_model,1.0.1,15.838747,1.0,2.777778,-16.573775,-232.159268,55.0,-5.193285
1,2023-02-05 06:15:19,2023-02-05,6,demo_model,1.0.1,16.47332,1.0,0.333333,-13.291135,-356.840737,33.0,-3.153518
2,2023-03-14 19:20:53,2023-03-14,19,demo_model,1.0.1,19.68442,1.0,2.5,16.573775,-110.94075,66.0,-0.62142
3,2023-01-28 07:54:25,2023-01-28,7,demo_model,1.0.1,32.174606,0.0,0.0,-13.291135,-290.004783,3168.0,3.264641
4,2023-02-24 18:21:44,2023-02-24,18,demo_model,1.0.1,21.86408,1.0,0.583333,-16.573775,-356.05467,44.0,2.985044


In [23]:
features_df.tail(5)

Unnamed: 0,occurred_at,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
333258,2023-02-23 16:40:46,2023-02-23,16,demo_model,1.0.1,53.05978,1.0,6.25,-7.376024,-367.645223,6699.0,5.656609
333259,2023-01-27 03:02:18,2023-01-27,3,demo_model,1.0.1,39.292686,1.0,1.944444,-7.376024,-263.743379,2475.0,-5.085476
333260,2023-02-03 17:54:28,2023-02-03,17,demo_model,1.0.1,47.992615,0.0,5.961538,-16.573775,-367.57484,5390.0,3.935311
333261,2023-03-03 04:31:40,2023-03-03,4,demo_model,1.0.1,30.156523,0.0,6.185185,-7.376024,-222.633205,33.0,-0.899799
333262,2023-01-14 03:55:25,2023-01-14,3,demo_model,1.0.1,56.554688,0.0,0.0,-16.573775,-242.219296,6358.0,-2.291863


In [24]:
features_df.dtypes

occurred_at    datetime64[ns]
ds                     object
hour                    int64
model_name             object
version                object
predictions           float32
feature_1             float64
feature_2             float64
feature_3             float64
feature_4             float64
feature_5             float64
feature_6             float64
dtype: object

In [26]:
len(features_df.ds.unique())

88

In [27]:
features_df.hour.unique()

array([20,  6, 19,  7, 18, 22, 12,  5, 16, 15,  1,  3, 14,  4,  9, 13,  0,
        2, 11,  8, 23, 10, 17, 21])

In [28]:
features_df[(features_df['ds'] == '2023-02-10') & (features_df['hour'] == 5)].head(5)

Unnamed: 0,occurred_at,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
1077,2023-02-10 05:42:33,2023-02-10,5,demo_model,1.0.1,71.940903,1.0,1.486486,-7.376024,-356.228413,14388.0,-4.583076
1107,2023-02-10 05:40:30,2023-02-10,5,demo_model,1.0.1,22.455576,0.0,0.3125,-7.376024,-356.292225,407.0,-4.115045
1163,2023-02-10 05:01:49,2023-02-10,5,demo_model,1.0.1,35.436371,0.0,500000000.0,-7.376024,-269.816818,2046.0,-1.90541
1862,2023-02-10 05:01:05,2023-02-10,5,demo_model,1.0.1,29.995705,0.0,3.611111,-7.376024,-355.86944,220.0,-5.041671
2472,2023-02-10 05:02:14,2023-02-10,5,demo_model,1.0.1,63.379929,0.0,0.7407407,-7.376024,-346.756512,10142.0,-5.340054


### Generate Whylogs Profiles for One Hour in February and in March

In [29]:
import json
import numpy as np

import whylogs as why
from whylogs import DatasetProfileView

In [30]:
feb_test_df = features_df[(features_df['ds'] == '2023-02-10') & (features_df['hour'] == 5)]

In [31]:
feb_test_df.head(5)

Unnamed: 0,occurred_at,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
1077,2023-02-10 05:42:33,2023-02-10,5,demo_model,1.0.1,71.940903,1.0,1.486486,-7.376024,-356.228413,14388.0,-4.583076
1107,2023-02-10 05:40:30,2023-02-10,5,demo_model,1.0.1,22.455576,0.0,0.3125,-7.376024,-356.292225,407.0,-4.115045
1163,2023-02-10 05:01:49,2023-02-10,5,demo_model,1.0.1,35.436371,0.0,500000000.0,-7.376024,-269.816818,2046.0,-1.90541
1862,2023-02-10 05:01:05,2023-02-10,5,demo_model,1.0.1,29.995705,0.0,3.611111,-7.376024,-355.86944,220.0,-5.041671
2472,2023-02-10 05:02:14,2023-02-10,5,demo_model,1.0.1,63.379929,0.0,0.7407407,-7.376024,-346.756512,10142.0,-5.340054


In [32]:
feb_whylogs_prof = why.log(feb_test_df[['feature_5', 'feature_6']]).view()

In [33]:
mar_test_df = features_df[(features_df['ds'] == '2023-03-10') & (features_df['hour'] == 5)]

In [34]:
mar_test_df.head(5)

Unnamed: 0,occurred_at,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6
2305,2023-03-10 05:15:10,2023-03-10,5,demo_model,1.0.1,27.194382,0.0,0.714286,-7.376024,-263.800903,22.0,-1.390994
4828,2023-03-10 05:00:02,2023-03-10,5,demo_model,1.0.1,15.076751,1.0,18.125,-7.376024,-222.363082,33.0,-0.03111
10811,2023-03-10 05:45:28,2023-03-10,5,demo_model,1.0.1,29.559851,0.0,1.290323,-7.376024,-263.718682,11.0,-0.466324
20547,2023-03-10 05:23:59,2023-03-10,5,demo_model,1.0.1,19.866926,1.0,7.0,-7.376024,-355.643119,33.0,-4.511189
26701,2023-03-10 05:38:55,2023-03-10,5,demo_model,1.0.1,10.803627,1.0,25.0,-16.573775,-222.363082,33.0,1.176789


In [35]:
mar_whylogs_prof = why.log(mar_test_df[['feature_5', 'feature_6']]).view()

In [36]:
feb_whylogs_prof.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,123.000037,123.0,123.006179,0,192,0,0,16654.0,2167.859375,572.0,11.0,192,11.0,33.0,33.0,55.0,3509.0,6534.0,9218.0,14388.0,3188.816752,SummaryType.COLUMN,0,192,0,0,0,0
feature_6,122.000037,122.0,122.006128,0,192,0,0,1.785209,-1.978073,-1.755073,-6.338692,192,-5.360618,-4.907968,-4.724481,-4.1657,0.248833,1.207462,1.360468,1.785209,2.278944,SummaryType.COLUMN,0,192,0,0,0,0


In [37]:
mar_whylogs_prof.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,9.0,9.0,9.00045,0,51,0,0,121.0,41.843137,33.0,11.0,51,11.0,22.0,22.0,22.0,44.0,55.0,110.0,121.0,24.892065,SummaryType.COLUMN,0,51,0,0,0,0
feature_6,43.000004,43.0,43.002151,0,51,0,0,1.512827,-1.37844,-0.466324,-5.019625,51,-5.019625,-4.770901,-4.606864,-3.961216,0.838042,1.238112,1.512827,1.512827,2.395114,SummaryType.COLUMN,0,51,0,0,0,0


### Visualize Whylogs Profiles

In [38]:
from whylogs.viz import NotebookProfileVisualizer

from whylogs.viz.utils.histogram_calculations import histogram_from_view
from whylogs.viz.utils.frequent_items_calculations import frequent_items_from_view

In [39]:
visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=feb_whylogs_prof, reference_profile_view=mar_whylogs_prof)

In [40]:
visualization.double_histogram(feature_name="feature_6")

### Serialize Whylogs Profiles

In [41]:
feb_whylogs_prof.serialize()[0:100]

b'WHY1\x00\xc1\x02\n\x0e \xfa\xe0\xc6\xe8\xfb0(\xfa\xe0\xc6\xe8\xfb0\x12\x10\n\tfeature_5\x12\x03\n\x01\x00\x12\x11\n\tfeature_6\x12\x04\n\x02\xb6\x11 \xe8"*\x0c\x08\x02\x12\x08counts/n*\x15\x08\x07\x12\x11distribution/mean'

### Generate Hourly Profiles using Fugue

The unit function below profiles one partition of data and serializes the resulting profiles in a single step.

### A unit function to work on a partition of data

In [42]:
import json
import pandas as pd

def profile_features(features_df: pd.DataFrame) -> pd.DataFrame:
    feature_cols = ['feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5', 'feature_6']
    # Profile the features and the predictions separately, then serialize each profile to bytes.
    features_buf = why.log(features_df[feature_cols]).view().serialize()
    predictions_buf = why.log(features_df[['predictions']]).view().serialize()
    # Emit one row per partition: reuse the first row for the partition keys
    # (ds, hour, model_name, version) and attach the profiles and the row count.
    profiled_features = features_df.head(1).drop(columns=['occurred_at'])
    return profiled_features.assign(
        features_profile=features_buf,
        predictions_profile=predictions_buf,
        sample_records=len(features_df),
    )

In [43]:
feb_test_df.shape

(192, 12)

In [44]:
profile_features(feb_test_df[(feb_test_df['ds'] == '2023-02-10') & (feb_test_df['hour'] == 5)])

Unnamed: 0,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
1077,2023-02-10,5,demo_model,1.0.1,71.940903,1.0,1.486486,-7.376024,-356.228413,14388.0,-4.583076,b'WHY1\x00\x8d\x03\n\x0e \xb4\xd4\xc7\xe8\xfb0(\xb4\xd4\xc7\xe8\xfb0\x12\x10\n\tfeature_1\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x8b\xd5\xc7\xe8\xfb0(\x8b\xd5\xc7\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,192


### Scale up the unit function to work on all partitions of data (takes about 3 minutes)

In [45]:
from fugue import transform

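hourly_feature_profile_df = transform(
    df=features_df,
    using=profile_features,
    # Output schema: the input columns minus occurred_at, plus the three new columns.
    schema="*-occurred_at+features_profile:binary,predictions_profile:binary,sample_records:long",
    # profile_features is called once per (ds, hour, model_name, version) group.
    partition=dict(by=['ds', 'hour', 'model_name', 'version']),
    # engine=None uses Fugue's default engine (local Pandas unless configured otherwise).
    engine=None
)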

In [46]:
hourly_feature_profile_df

Unnamed: 0,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
0,2023-01-01,0,demo_model,1.0.1,27.501650,1.0,0.000000,-13.291135,-240.505562,154.0,-6.731355,b'WHY1\x00\x8d\x03\n\x0e \xeb\x82\xc8\xe8\xfb0(\xeb\x82\xc8\xe8\xfb0\x12\x11\n\tfeature_3\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xb3\x83\xc8\xe8\xfb0(\xb3\x83\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,202
1,2023-01-01,1,demo_model,1.0.1,46.351963,1.0,1.666667,-13.291135,-254.243683,2948.0,-5.481842,b'WHY1\x00\x8d\x03\n\x0e \xd5\x83\xc8\xe8\xfb0(\xd5\x83\xc8\xe8\xfb0\x12\x11\n\tfeature_4\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xa0\x84\xc8\xe8\xfb0(\xa0\x84\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,209
2,2023-01-01,2,demo_model,1.0.1,17.805542,0.0,0.384615,-13.291135,-241.290781,33.0,-4.997483,b'WHY1\x00\x8d\x03\n\x0e \xc1\x84\xc8\xe8\xfb0(\xc1\x84\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x93\x85\xc8\xe8\xfb0(\x93\x85\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,201
3,2023-01-01,3,demo_model,1.0.1,3.677644,1.0,6.666667,-13.291135,-240.622560,22.0,-2.467815,b'WHY1\x00\x8d\x03\n\x0e \xb7\x85\xc8\xe8\xfb0(\xb7\x85\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xf4\x85\xc8\xe8\xfb0(\xf4\x85\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,194
4,2023-01-01,4,demo_model,1.0.1,35.517563,0.0,0.000000,-13.291135,-289.579238,1782.0,-3.645520,b'WHY1\x00\x8d\x03\n\x0e \x8f\x86\xc8\xe8\xfb0(\x8f\x86\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xcc\x86\xc8\xe8\xfb0(\xcc\x86\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2091,2023-03-29,3,demo_model,1.0.1,31.919691,0.0,3.200000,16.573775,-356.444525,33.0,-5.945606,b'WHY1\x00\x8d\x03\n\x0e \xbb\x9d\xd3\xe8\xfb0(\xbb\x9d\xd3\xe8\xfb0\x12\x11\n\tfeature_5\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xed\x9d\xd3\xe8\xfb0(\xed\x9d\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,42
2092,2023-03-29,4,demo_model,1.0.1,10.014977,1.0,21.875000,16.573775,-368.380424,33.0,-4.414140,b'WHY1\x00\x8d\x03\n\x0e \x86\x9e\xd3\xe8\xfb0(\x86\x9e\xd3\xe8\xfb0\x12\x11\n\tfeature_5\x12\x0...,"b""WHY1\x00\xb0\x02\n\x0e \xb7\x9e\xd3\xe8\xfb0(\xb7\x9e\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...",25
2093,2023-03-29,5,demo_model,1.0.1,9.073092,1.0,85.576923,7.376024,-222.609995,33.0,3.402142,b'WHY1\x00\x8d\x03\n\x0e \xd0\x9e\xd3\xe8\xfb0(\xd0\x9e\xd3\xe8\xfb0\x12\x11\n\tfeature_6\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x81\x9f\xd3\xe8\xfb0(\x81\x9f\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,14
2094,2023-03-29,6,demo_model,1.0.1,25.611717,1.0,33.809524,16.573775,-368.432825,33.0,-1.268737,b'WHY1\x00\x8d\x03\n\x0e \x9a\x9f\xd3\xe8\xfb0(\x9a\x9f\xd3\xe8\xfb0\x12\x11\n\tfeature_6\x12\x0...,"b""WHY1\x00\xb0\x02\n\x0e \xcb\x9f\xd3\xe8\xfb0(\xcb\x9f\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...",19


### Merge Whylogs Profiles

We already have the hourly profiles. Can we reuse them to get the daily profiles? Can profiles be merged incrementally?

In [47]:
type(feb_whylogs_prof)

whylogs.core.view.dataset_profile_view.DatasetProfileView

In [48]:
feb_whylogs_prof.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,123.000037,123.0,123.006179,0,192,0,0,16654.0,2167.859375,572.0,11.0,192,11.0,33.0,33.0,55.0,3509.0,6534.0,9218.0,14388.0,3188.816752,SummaryType.COLUMN,0,192,0,0,0,0
feature_6,122.000037,122.0,122.006128,0,192,0,0,1.785209,-1.978073,-1.755073,-6.338692,192,-5.360618,-4.907968,-4.724481,-4.1657,0.248833,1.207462,1.360468,1.785209,2.278944,SummaryType.COLUMN,0,192,0,0,0,0


In [49]:
mar_whylogs_prof.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,9.0,9.0,9.00045,0,51,0,0,121.0,41.843137,33.0,11.0,51,11.0,22.0,22.0,22.0,44.0,55.0,110.0,121.0,24.892065,SummaryType.COLUMN,0,51,0,0,0,0
feature_6,43.000004,43.0,43.002151,0,51,0,0,1.512827,-1.37844,-0.466324,-5.019625,51,-5.019625,-4.770901,-4.606864,-3.961216,0.838042,1.238112,1.512827,1.512827,2.395114,SummaryType.COLUMN,0,51,0,0,0,0


In [50]:
merged_prof_view = feb_whylogs_prof.merge(mar_whylogs_prof)
merged_prof_view.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,123.000037,123.0,123.006179,0,243,0,0,16654.0,1721.658436,220.0,11.0,243,11.0,22.0,33.0,44.0,2079.0,5896.0,8382.0,13233.0,2962.828606,SummaryType.COLUMN,0,243,0,0,0,0
feature_6,140.000048,140.0,140.007038,0,243,0,0,1.785209,-1.852224,-1.664465,-6.338692,243,-5.340054,-4.907968,-4.701136,-4.140412,0.342084,1.238112,1.390994,1.785209,2.311749,SummaryType.COLUMN,0,243,0,0,0,0


In [51]:
merge_test_df = features_df[((features_df['ds'] == '2023-02-10') | (features_df['ds'] == '2023-03-10')) & (features_df['hour'] == 5)]

In [52]:
merge_test_df['ds'].unique()

array(['2023-02-10', '2023-03-10'], dtype=object)

In [53]:
merged_whylogs_prof = why.log(merge_test_df[['feature_5', 'feature_6']]).view()

In [54]:
merged_whylogs_prof.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor
feature_5,123.000037,123.0,123.006179,0,243,0,0,16654.0,1721.658436,220.0,11.0,243,11.0,22.0,33.0,44.0,2079.0,5896.0,8382.0,13233.0,2962.828606,SummaryType.COLUMN,0,243,0,0,0,0
feature_6,140.000048,140.0,140.007038,0,243,0,0,1.785209,-1.852224,-1.664465,-6.338692,243,-5.340054,-4.907968,-4.701136,-4.140412,0.342084,1.238112,1.390994,1.785209,2.311749,SummaryType.COLUMN,0,243,0,0,0,0
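
The merged profile and the profile computed directly on the combined rows agree because the underlying statistics are mergeable: combining two summaries gives the same answer as summarizing the combined data. The sketch below shows the idea for count, mean, min, max and standard deviation in plain Python. It is an illustration under stated assumptions, not the Whylogs implementation; the `Summary` class and the sample values are hypothetical. Metrics that rely on approximate sketches, such as quantiles and cardinality, are estimates, so exact agreement should not be assumed for them in general.

```python
from dataclasses import dataclass
import statistics

@dataclass
class Summary:
    n: int
    mean: float
    m2: float  # sum of squared deviations from the mean
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        mean = statistics.fmean(values)
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(len(values), mean, m2, min(values), max(values))

    def merge(self, other):
        # Pairwise update of count, mean and squared deviations; no raw rows needed.
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.n * other.n / n
        return Summary(n, mean, m2,
                       min(self.minimum, other.minimum),
                       max(self.maximum, other.maximum))

    @property
    def stddev(self):
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

# Hypothetical values for two separate hours.
hour_a = [11.0, 33.0, 55.0, 572.0]
hour_b = [22.0, 44.0, 121.0]

merged = Summary.from_values(hour_a).merge(Summary.from_values(hour_b))
direct = Summary.from_values(hour_a + hour_b)
print(merged.n, direct.n)
```

This property is what makes it possible to build daily profiles from hourly ones without going back to the raw logs.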


### Generate Daily Profiles

### A unit function to work on a partition of data

In [55]:
from functools import reduce

def profile_reduce(hourly_profiles_df: pd.DataFrame) -> pd.DataFrame:
    # Deserialize every hourly feature profile and fold them into one daily profile.
    features_buf = reduce(
        lambda acc, x: acc.merge(x),
        hourly_profiles_df.features_profile.apply(DatasetProfileView.deserialize),
    ).serialize()
    # Same fold for the prediction profiles.
    predictions_buf = reduce(
        lambda acc, x: acc.merge(x),
        hourly_profiles_df.predictions_profile.apply(DatasetProfileView.deserialize),
    ).serialize()
    # The daily record count is the sum of the hourly counts.
    records = hourly_profiles_df.sample_records.sum()
    # Emit one row per day: reuse the first row for the keys and drop the hour column.
    daily_profiles_df = hourly_profiles_df.head(1).drop(columns=['hour'])
    return daily_profiles_df.assign(
        features_profile=features_buf,
        predictions_profile=predictions_buf,
        sample_records=records,
    )

In [56]:
hourly_feature_profile_df

Unnamed: 0,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
0,2023-01-01,0,demo_model,1.0.1,27.501650,1.0,0.000000,-13.291135,-240.505562,154.0,-6.731355,b'WHY1\x00\x8d\x03\n\x0e \xeb\x82\xc8\xe8\xfb0(\xeb\x82\xc8\xe8\xfb0\x12\x11\n\tfeature_3\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xb3\x83\xc8\xe8\xfb0(\xb3\x83\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,202
1,2023-01-01,1,demo_model,1.0.1,46.351963,1.0,1.666667,-13.291135,-254.243683,2948.0,-5.481842,b'WHY1\x00\x8d\x03\n\x0e \xd5\x83\xc8\xe8\xfb0(\xd5\x83\xc8\xe8\xfb0\x12\x11\n\tfeature_4\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xa0\x84\xc8\xe8\xfb0(\xa0\x84\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,209
2,2023-01-01,2,demo_model,1.0.1,17.805542,0.0,0.384615,-13.291135,-241.290781,33.0,-4.997483,b'WHY1\x00\x8d\x03\n\x0e \xc1\x84\xc8\xe8\xfb0(\xc1\x84\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x93\x85\xc8\xe8\xfb0(\x93\x85\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,201
3,2023-01-01,3,demo_model,1.0.1,3.677644,1.0,6.666667,-13.291135,-240.622560,22.0,-2.467815,b'WHY1\x00\x8d\x03\n\x0e \xb7\x85\xc8\xe8\xfb0(\xb7\x85\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xf4\x85\xc8\xe8\xfb0(\xf4\x85\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,194
4,2023-01-01,4,demo_model,1.0.1,35.517563,0.0,0.000000,-13.291135,-289.579238,1782.0,-3.645520,b'WHY1\x00\x8d\x03\n\x0e \x8f\x86\xc8\xe8\xfb0(\x8f\x86\xc8\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xcc\x86\xc8\xe8\xfb0(\xcc\x86\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2091,2023-03-29,3,demo_model,1.0.1,31.919691,0.0,3.200000,16.573775,-356.444525,33.0,-5.945606,b'WHY1\x00\x8d\x03\n\x0e \xbb\x9d\xd3\xe8\xfb0(\xbb\x9d\xd3\xe8\xfb0\x12\x11\n\tfeature_5\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xed\x9d\xd3\xe8\xfb0(\xed\x9d\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,42
2092,2023-03-29,4,demo_model,1.0.1,10.014977,1.0,21.875000,16.573775,-368.380424,33.0,-4.414140,b'WHY1\x00\x8d\x03\n\x0e \x86\x9e\xd3\xe8\xfb0(\x86\x9e\xd3\xe8\xfb0\x12\x11\n\tfeature_5\x12\x0...,"b""WHY1\x00\xb0\x02\n\x0e \xb7\x9e\xd3\xe8\xfb0(\xb7\x9e\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...",25
2093,2023-03-29,5,demo_model,1.0.1,9.073092,1.0,85.576923,7.376024,-222.609995,33.0,3.402142,b'WHY1\x00\x8d\x03\n\x0e \xd0\x9e\xd3\xe8\xfb0(\xd0\x9e\xd3\xe8\xfb0\x12\x11\n\tfeature_6\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x81\x9f\xd3\xe8\xfb0(\x81\x9f\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,14
2094,2023-03-29,6,demo_model,1.0.1,25.611717,1.0,33.809524,16.573775,-368.432825,33.0,-1.268737,b'WHY1\x00\x8d\x03\n\x0e \x9a\x9f\xd3\xe8\xfb0(\x9a\x9f\xd3\xe8\xfb0\x12\x11\n\tfeature_6\x12\x0...,"b""WHY1\x00\xb0\x02\n\x0e \xcb\x9f\xd3\xe8\xfb0(\xcb\x9f\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...",19


In [57]:
profile_reduce(hourly_feature_profile_df[hourly_feature_profile_df['ds'] == '2023-01-01'])

Unnamed: 0,ds,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
0,2023-01-01,demo_model,1.0.1,27.50165,1.0,0.0,-13.291135,-240.505562,154.0,-6.731355,b'WHY1\x00\x93\x03\n\x0e \xeb\x82\xc8\xe8\xfb0(\xeb\x82\xc8\xe8\xfb0\x12\x12\n\tfeature_6\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \xb3\x83\xc8\xe8\xfb0(\xb3\x83\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4955


In [58]:
from fugue import transform

daily_feature_profile_df = transform(
    df=hourly_feature_profile_df, 
    using=profile_reduce, 
    schema="*-hour",
    partition=dict(by=['ds', 'model_name', 'version']), 
    engine=None
)

In [59]:
daily_feature_profile_df

Unnamed: 0,ds,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
0,2023-01-01,demo_model,1.0.1,27.501650,1.0,0.000000,-13.291135,-240.505562,154.0,-6.731355,b'WHY1\x00\x93\x03\n\x0e \xeb\x82\xc8\xe8\xfb0(\xeb\x82\xc8\xe8\xfb0\x12\x12\n\tfeature_2\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \xb3\x83\xc8\xe8\xfb0(\xb3\x83\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4955
1,2023-01-02,demo_model,1.0.1,12.510367,1.0,21.309524,-0.000000,-222.309847,253.0,-6.251062,b'WHY1\x00\x92\x03\n\x0e \xdb\x93\xc8\xe8\xfb0(\xdb\x93\xc8\xe8\xfb0\x12\x10\n\tfeature_1\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \x96\x94\xc8\xe8\xfb0(\x96\x94\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4729
2,2023-01-03,demo_model,1.0.1,12.701192,0.0,6.153846,13.291135,-287.052503,33.0,-6.967658,b'WHY1\x00\x92\x03\n\x0e \xee\xa3\xc8\xe8\xfb0(\xee\xa3\xc8\xe8\xfb0\x12\x12\n\tfeature_6\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \xa8\xa4\xc8\xe8\xfb0(\xa8\xa4\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4792
3,2023-01-04,demo_model,1.0.1,10.353567,1.0,5.714286,16.573775,-245.199055,33.0,-6.236039,b'WHY1\x00\x92\x03\n\x0e \xa3\xb5\xc8\xe8\xfb0(\xa3\xb5\xc8\xe8\xfb0\x12\x12\n\tfeature_4\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \xdd\xb5\xc8\xe8\xfb0(\xdd\xb5\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4776
4,2023-01-05,demo_model,1.0.1,12.300185,0.0,2.156863,7.376024,-368.564221,22.0,-6.622412,b'WHY1\x00\x92\x03\n\x0e \xc4\xc5\xc8\xe8\xfb0(\xc4\xc5\xc8\xe8\xfb0\x12\x10\n\tfeature_1\x12\x0...,b'WHY1\x00\xb1\x02\n\x0e \xff\xc5\xc8\xe8\xfb0(\xff\xc5\xc8\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,4695
...,...,...,...,...,...,...,...,...,...,...,...,...,...
83,2023-03-25,demo_model,1.0.1,16.031528,0.0,1.280000,-16.573775,-263.922188,44.0,-6.741547,b'WHY1\x00\x91\x03\n\x0e \xd5\xe1\xd2\xe8\xfb0(\xd5\xe1\xd2\xe8\xfb0\x12\x12\n\tfeature_6\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x87\xe2\xd2\xe8\xfb0(\x87\xe2\xd2\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,818
84,2023-03-26,demo_model,1.0.1,13.260884,1.0,1.923077,-13.291135,-264.100678,22.0,-6.800002,b'WHY1\x00\x91\x03\n\x0e \xf0\xef\xd2\xe8\xfb0(\xf0\xef\xd2\xe8\xfb0\x12\x11\n\tfeature_2\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xa3\xf0\xd2\xe8\xfb0(\xa3\xf0\xd2\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,761
85,2023-03-27,demo_model,1.0.1,27.206446,1.0,2.234043,-0.000000,-222.435779,33.0,-6.030039,b'WHY1\x00\x91\x03\n\x0e \x83\xfe\xd2\xe8\xfb0(\x83\xfe\xd2\xe8\xfb0\x12\x11\n\tfeature_3\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xb4\xfe\xd2\xe8\xfb0(\xb4\xfe\xd2\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,776
86,2023-03-28,demo_model,1.0.1,11.736825,1.0,8.364486,13.291135,-222.363082,33.0,-5.541051,b'WHY1\x00\x90\x03\n\x0e \xba\x8d\xd3\xe8\xfb0(\xba\x8d\xd3\xe8\xfb0\x12\x12\n\tfeature_5\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xec\x8d\xd3\xe8\xfb0(\xec\x8d\xd3\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,1073



### Scaling up with Fugue & Dask

#### Dask

In [60]:
from fugue import transform

hourly_feature_profile_df = transform(
    df=features_df, 
    using=profile_features, 
    schema="*-occurred_at+features_profile:binary,predictions_profile:binary,sample_records:long",
    partition=dict(by=['ds', 'hour', 'model_name', 'version']), 
    engine="dask"
)

<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)


In [61]:
hourly_feature_profile_df.head(5)

This may cause some slowdown.
Consider scattering data ahead of time and using futures.


Unnamed: 0,ds,hour,model_name,version,predictions,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,features_profile,predictions_profile,sample_records
0,2023-01-01,2,demo_model,1.0.1,7.47996,0.0,3.770492,-13.291135,-368.58654,44.0,-7.051683,b'WHY1\x00\x8d\x03\n\x0e \xeb\xd4\xda\xe8\xfb0(\xeb\xd4\xda\xe8\xfb0\x12\x10\n\tfeature_1\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x88\xd6\xda\xe8\xfb0(\x88\xd6\xda\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,201
1,2023-01-02,9,demo_model,1.0.1,18.254972,1.0,1.296296,13.291135,-222.721609,374.0,6.610821,b'WHY1\x00\x8d\x03\n\x0e \xd2\xd6\xda\xe8\xfb0(\xd2\xd6\xda\xe8\xfb0\x12\x11\n\tfeature_3\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xb1\xd7\xda\xe8\xfb0(\xb1\xd7\xda\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,212
2,2023-01-02,22,demo_model,1.0.1,42.016945,0.0,1.25,13.291135,-270.799231,4103.0,-6.761548,b'WHY1\x00\x8d\x03\n\x0e \xe4\xd7\xda\xe8\xfb0(\xe4\xd7\xda\xe8\xfb0\x12\x11\n\tfeature_3\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xc6\xd8\xda\xe8\xfb0(\xc6\xd8\xda\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,204
3,2023-01-03,2,demo_model,1.0.1,19.434961,1.0,0.416667,13.291135,-369.323376,55.0,-7.080578,b'WHY1\x00\x8d\x03\n\x0e \x86\xd9\xda\xe8\xfb0(\x86\xd9\xda\xe8\xfb0\x12\x10\n\tfeature_1\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \x8d\xda\xda\xe8\xfb0(\x8d\xda\xda\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,207
4,2023-01-04,10,demo_model,1.0.1,28.43408,1.0,0.0,7.376024,-250.414699,121.0,6.895038,b'WHY1\x00\x8d\x03\n\x0e \xc6\xda\xda\xe8\xfb0(\xc6\xda\xda\xe8\xfb0\x12\x11\n\tfeature_6\x12\x0...,b'WHY1\x00\xb0\x02\n\x0e \xac\xdb\xda\xe8\xfb0(\xac\xdb\xda\xe8\xfb0\x12\x12\n\x0bpredictions\x1...,189


Similarly, we can pass `engine="ray"` or `engine="spark"` to scale the same logic with `Ray` or `Spark` as the backend, without changing the function itself.