How to use featuretools at the test time? It seems featuretools' feature definitions do not store train time statistics to accurately apply primitives like 'PERCENTILE' at the test time #2697

nitinmnsn · 2024-03-23T22:23:04Z

Creating a GitHub issue for better visibility. I have also asked the same question on StackOverflow.

I will demonstrate the issue with an example that uses the 'PERCENTILE' primitive.

Imports:

import pandas as pd
import featuretools as ft

For training (create a simple dataset with one column and let featuretools compute a percentile feature on top of it):

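# Toy training data: an index column and a single numeric column
df_train = pd.DataFrame({'index': [1, 2, 3, 4, 5], 'val': [1, 2, 3, 4, 5]})
es_train = ft.EntitySet("es_train")
es_train.add_dataframe(dataframe=df_train, dataframe_name='df', index='index')
# fm is the feature matrix, fl is the list of feature definitions
fm, fl = ft.dfs(entityset=es_train, trans_primitives=['percentile'], agg_primitives=[], target_dataframe_name='df')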

output:

print(fm)
       val  PERCENTILE(val)
index                      
1        1              0.2
2        2              0.4
3        3              0.6
4        4              0.8
5        5              1.0

So far, everything is as expected.

Now suppose that at test time I get an example with the value 3. I would want it translated to 0.6, as per the training data. But that is not what happens:

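# A single test row whose value (3) also appears in the training data
df_test = pd.DataFrame({'index': [1], 'val': [3]})
es_test = ft.EntitySet("es_test")
es_test.add_dataframe(dataframe=df_test, dataframe_name='df', index='index')
# Reuse the feature definitions (fl) produced by ft.dfs on the training data
ft.calculate_feature_matrix(features=fl, entityset=es_test)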

output:

       val  PERCENTILE(val)
index                      
1        3              1.0
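
The 1.0 is what a rank computed over the test rows alone would give. A minimal pandas sketch of that, assuming the primitive behaves like a within-column percentile rank (I have not confirmed this against the featuretools source):

```python
import pandas as pd

# Assumption: PERCENTILE acts like a percentile rank over whatever rows it is given.
train = pd.Series([1, 2, 3, 4, 5])
test = pd.Series([3])

# Ranked within the training data, the value 3 sits at 0.6.
train_pct = train.rank(pct=True)

# Ranked within the test data alone, the single row is trivially the largest.
test_pct = test.rank(pct=True)

print(train_pct.tolist())
print(test_pct.tolist())
```

If that assumption holds, the result depends only on the rows present in the entity set at calculation time.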

So, the feature definitions in fl (the output of ft.dfs) do not store the training-time statistics needed to compute the features at test time. A model trained on the training-time percentiles would then receive inconsistent feature values at test time.

What is the canonical way to apply featuretools at test time?
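
To make the behavior I am after concrete, here is a minimal sketch in plain Python. `TrainPercentile` is a hypothetical helper of my own, not part of featuretools: it stores the sorted training values at fit time and, at test time, ranks each new value against them.

```python
from bisect import bisect_right


class TrainPercentile:
    """Hypothetical helper (not a featuretools API): percentile rank
    of new values relative to values seen at training time."""

    def fit(self, values):
        # Keep the sorted training values as the "train-time statistics".
        self.sorted_values = sorted(values)
        return self

    def transform(self, values):
        n = len(self.sorted_values)
        # Fraction of training values less than or equal to each new value.
        return [bisect_right(self.sorted_values, v) / n for v in values]


percentile = TrainPercentile().fit([1, 2, 3, 4, 5])
result = percentile.transform([3])
print(result)
```

For the toy data above this maps 3 to 0.6. With tied training values it would assign the highest rank in the tie, which may differ from how the primitive handles ties.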
