I've been testing Featuretools extensively. I do not have a specific bug report, but I would still like to report the performance issues we have encountered with Featuretools.
Our typical use case looks like this: we usually aggregate data from the transaction level to the application level.
In our specific use case, we usually compute the following features:
a) simple aggregations
b) simple aggregations over time windows
c) ratio of aggregations - different aggregations
d) ratio of aggregations - different time windows
e) time since first/last event
All of these are sometimes segmented by other features.
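To illustrate what segmentation means here: the WHERE features in the list below can, assuming a single table with no cutoff times, be produced for every segment value at once with one pandas `groupby` plus `unstack`, rather than one filtered pass per WHERE clause. This is a minimal sketch with made-up data, not the Featuretools API; `APPLICATION_ID` is a hypothetical parent key. `groupby` drops rows whose segment value is missing by default, so the `= nan` features would need separate handling (for example, filling NaN with a placeholder first).

```python
import pandas as pd

# Made-up transactions; APPLICATION_ID is a hypothetical parent key, the other
# column names mirror the feature list in this issue.
transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 1, 2, 2],
    "TRANSACTION_PLACE": ["Rataje", "Rataje", "Skalice", "Skalice", "Rataje"],
    "TRANSACTION_AMOUNT": [100.0, 300.0, 50.0, 20.0, 70.0],
})

def segmented_aggregates(df, value_col, segment_col, aggs=("max", "min", "mean")):
    """Compute AGG(value_col WHERE segment_col = v) for every segment value v
    in one groupby pass instead of one filtered pass per WHERE clause."""
    grouped = (
        df.groupby(["APPLICATION_ID", segment_col])[value_col]
        .agg(list(aggs))
        .unstack(segment_col)  # columns become (aggregation, segment value)
    )
    # Flatten the column index into names resembling the feature list.
    grouped.columns = [
        f"{agg.upper()}({value_col} WHERE {segment_col} = {seg})"
        for agg, seg in grouped.columns
    ]
    return grouped

features = segmented_aggregates(transactions, "TRANSACTION_AMOUNT", "TRANSACTION_PLACE")
print(features)
```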
Of these feature types, the default Featuretools primitives handle a) and e) well.
If we consider the TREND primitive, we can also calculate d).
If we write custom primitives, use cases b) and c) are also manageable.
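For reference, the ratio features c) and d) are simple once the underlying aggregates exist. The sketch below computes them directly in pandas, assuming one cutoff shared by all applications; the data, `APPLICATION_ID`, and the names `MAX_TO_MEAN` and `MEAN_30D_TO_MEAN_90D` are hypothetical labels, not Featuretools primitives.

```python
import pandas as pd

# Assumption: a single cutoff shared by every application.
cutoff = pd.Timestamp("2019-07-01")
transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 1, 2, 2],
    "TIME": pd.to_datetime(["2019-06-20", "2019-05-15", "2019-04-10",
                            "2019-06-25", "2019-04-20"]),
    "TRANSACTION_AMOUNT": [100.0, 300.0, 50.0, 40.0, 20.0],
})
past = transactions[transactions["TIME"] <= cutoff]  # ignore future rows
ids = past["APPLICATION_ID"].unique()

def window_mean(df, days):
    """MEAN over transactions in (cutoff - days, cutoff], per application."""
    in_window = df["TIME"] > cutoff - pd.Timedelta(days=days)
    return df[in_window].groupby("APPLICATION_ID")["TRANSACTION_AMOUNT"].mean()

mean_30d = window_mean(past, 30).reindex(ids)  # NaN if no rows in the window
mean_90d = window_mean(past, 90).reindex(ids)
all_time = past.groupby("APPLICATION_ID")["TRANSACTION_AMOUNT"].agg(["max", "mean"])

features = pd.DataFrame({
    # c) ratio of different aggregations over the same rows
    "MAX_TO_MEAN(TRANSACTION_AMOUNT)": all_time["max"] / all_time["mean"],
    # d) ratio of the same aggregation over different time windows
    "MEAN_30D_TO_MEAN_90D(TRANSACTION_AMOUNT)": mean_30d / mean_90d,
})
print(features)
```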
However, this does not solve the problem that the calculation is slow, and the slowness is amplified when we use custom primitives (which, FYI, use numpy, not pandas).
These are the features to be calculated (~220 features):
[<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT)>,
<Feature: MAX(transactions.TRANSACTION_FEE)>,
<Feature: MODE(transactions.HOME_PLACE)>,
<Feature: MODE(transactions.TRANSACTION_CLASS)>,
<Feature: MODE(transactions.TRANSACTION_PLACE)>,
<Feature: MODE(transactions.TRANSACTION_PURPOSE)>,
<Feature: MODE(transactions.TRANSACTION_TYPE)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT)>,
<Feature: MIN(transactions.TRANSACTION_FEE)>,
<Feature: MEAN(transactions.TRANSACTION_AMOUNT)>,
<Feature: MEAN(transactions.TRANSACTION_FEE)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME)>,
<Feature: STD(transactions.TRANSACTION_AMOUNT)>,
<Feature: STD(transactions.TRANSACTION_FEE)>,
<Feature: TIME_SINCE_LAST(transactions.TIME)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = nan)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = nan)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = nan)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PURPOSE = charity)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Úžice)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Ledečko)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Talmberk)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PURPOSE = hazard)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Skalice)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = nan)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Rataje)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Sázava)>,
<Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PURPOSE = charity)>]
Calculating these features takes 2 hours and 30 minutes.
If we calculate similar features using pandas (I wrote a specialised tool for this use case), calculating the same features plus additional ratio features (450 features in total) takes 10 seconds on a single core.
However, even if we omit the custom features (leaving 97 features), the estimated calculation time is still measured in hours.
This is not the only use case with this issue. We have encountered this problem on several occasions throughout the company (different data, different setups, different features).
It is true that Featuretools can be parallelised, but it then becomes very resource-hungry, and even 100 cores would not bring the computation time down to seconds.
Feel free to contact me if you need more details about this problem.
P.S. I am using FT 0.9.1, Python 3.7.4, pandas 0.25rc, numpy 1.17
The issue is probably that FT has to check the time index column against cutoff_date for every row over which you aggregate. Moreover, for custom features these expensive calculations are done independently for each custom feature you calculate.
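The cost pattern described above can be illustrated with a naive per-instance loop in plain pandas. This is only an illustration of the commenter's hypothesis, not Featuretools' actual implementation; the per-application cutoffs, `APPLICATION_ID`, and the data are hypothetical. Each feature computed this way rescans the child table once per instance, so the work grows roughly with features × instances × rows.

```python
import pandas as pd

# Hypothetical per-application cutoff times (the "cutoff_date" in the comment).
cutoffs = pd.DataFrame({
    "APPLICATION_ID": [1, 2],
    "CUTOFF": pd.to_datetime(["2019-07-01", "2019-06-01"]),
})
transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 2, 2, 2],
    "TIME": pd.to_datetime(["2019-06-20", "2019-07-05", "2019-05-10",
                            "2019-05-30", "2019-06-15"]),
    "TRANSACTION_AMOUNT": [100.0, 999.0, 30.0, 50.0, 999.0],
})

def naive_max_before_cutoff(transactions, cutoffs):
    """Per-instance loop: scan the whole child table once per application,
    keep only its rows up to its own cutoff, then aggregate."""
    result = {}
    for app_id, cutoff in zip(cutoffs["APPLICATION_ID"], cutoffs["CUTOFF"]):
        mask = (transactions["APPLICATION_ID"] == app_id) & (transactions["TIME"] <= cutoff)
        result[app_id] = transactions.loc[mask, "TRANSACTION_AMOUNT"].max()
    return pd.Series(result, name="MAX(TRANSACTION_AMOUNT)")

naive = naive_max_before_cutoff(transactions, cutoffs)
print(naive)
```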
When you use your tool, you probably drop the future information and the information outside the training window "manually" and then do the aggregations, which is obviously much faster.
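A vectorised version of that "manual" approach might look like the sketch below: attach each application's cutoff to its transactions with one merge, mask future rows and rows outside the training window once, and compute several aggregations in a single `groupby`. The per-application cutoffs, the 30-day window, `APPLICATION_ID`, and the data are assumptions for illustration; this is not the reporter's tool.

```python
import pandas as pd

cutoffs = pd.DataFrame({
    "APPLICATION_ID": [1, 2],
    "CUTOFF": pd.to_datetime(["2019-07-01", "2019-06-01"]),
})
transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 2, 2, 2],
    "TIME": pd.to_datetime(["2019-06-20", "2019-07-05", "2019-05-10",
                            "2019-05-30", "2019-06-15"]),
    "TRANSACTION_AMOUNT": [100.0, 999.0, 30.0, 50.0, 999.0],
})

def vectorised_features(transactions, cutoffs, window_days=30):
    """Drop future rows and rows outside the training window in one pass,
    then compute several aggregations with one groupby."""
    df = transactions.merge(cutoffs, on="APPLICATION_ID", how="inner")
    start = df["CUTOFF"] - pd.Timedelta(days=window_days)
    keep = (df["TIME"] <= df["CUTOFF"]) & (df["TIME"] > start)
    agg = df[keep].groupby("APPLICATION_ID")["TRANSACTION_AMOUNT"].agg(["max", "min", "mean"])
    agg.columns = [f"{name.upper()}_{window_days}D(TRANSACTION_AMOUNT)" for name in agg.columns]
    # Applications with no rows in the window get NaN instead of disappearing.
    return agg.reindex(cutoffs["APPLICATION_ID"])

feats = vectorised_features(transactions, cutoffs)
print(feats)
```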
When it comes to checking for future information, you could theoretically drop it manually before running the calculation and somehow instruct FT not to check whether cutoff_date is prior to the time of the event. As far as I know, FT does not offer such an option (maybe the approximate option of the dfs function can be used for this?).
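If all applications share one cutoff, the future-information check collapses to a single filter applied before any feature is computed, as in the sketch below; after that, aggregations and time-since features need no per-row comparison against the cutoff. The shared `CUTOFF`, `APPLICATION_ID`, and the data are assumptions, and this is plain pandas, not a Featuretools option. Time-since values are returned in seconds here.

```python
import pandas as pd

# Assumption: one cutoff shared by every application, so future rows can be
# dropped from the child table once, before any feature is computed.
CUTOFF = pd.Timestamp("2019-07-01")

transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 1, 2, 2],
    "TIME": pd.to_datetime(["2019-06-21", "2019-03-03", "2019-07-10",
                            "2019-06-01", "2019-06-30"]),
    "TRANSACTION_AMOUNT": [10.0, 20.0, 500.0, 5.0, 15.0],
})

past = transactions[transactions["TIME"] <= CUTOFF]  # the only time check
g = past.groupby("APPLICATION_ID")
features = pd.DataFrame({
    "MAX(TRANSACTION_AMOUNT)": g["TRANSACTION_AMOUNT"].max(),
    # Seconds between the cutoff and the most recent / earliest past event.
    "TIME_SINCE_LAST(TIME)": (CUTOFF - g["TIME"].max()).dt.total_seconds(),
    "TIME_SINCE_FIRST(TIME)": (CUTOFF - g["TIME"].min()).dt.total_seconds(),
})
print(features)
```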
The training window check is a bit trickier; a very inelegant solution would be to drop the information outside the training window and then run the calculation independently for every time window you want to consider.
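That per-window workaround can be sketched in pandas as a loop: for each window, keep only rows in (cutoff − window, cutoff], aggregate, and join the per-window results side by side. The shared `CUTOFF`, the 30/90-day windows, `APPLICATION_ID`, and the data are assumptions; application 3 has no recent transactions, so its windowed features come out as NaN.

```python
import pandas as pd

CUTOFF = pd.Timestamp("2019-07-01")  # assumed single cutoff for all applications

transactions = pd.DataFrame({
    "APPLICATION_ID": [1, 1, 1, 2, 2, 3],
    "TIME": pd.to_datetime(["2019-06-20", "2019-05-15", "2019-04-10",
                            "2019-06-25", "2019-03-01", "2019-01-01"]),
    "TRANSACTION_AMOUNT": [100.0, 300.0, 50.0, 40.0, 80.0, 10.0],
})

def windowed_features(df, windows=(30, 90), aggs=("max", "min", "mean")):
    """For each window, drop rows outside (CUTOFF - window, CUTOFF], compute
    the aggregations independently, then join the per-window results."""
    ids = pd.Index(df["APPLICATION_ID"].unique(), name="APPLICATION_ID")
    parts = []
    for days in windows:
        start = CUTOFF - pd.Timedelta(days=days)
        in_window = df[(df["TIME"] > start) & (df["TIME"] <= CUTOFF)]
        part = in_window.groupby("APPLICATION_ID")["TRANSACTION_AMOUNT"].agg(list(aggs))
        part.columns = [f"{a.upper()}_{days}D(TRANSACTION_AMOUNT)" for a in part.columns]
        parts.append(part.reindex(ids))  # keep applications with no rows in this window
    return pd.concat(parts, axis=1)

feats = windowed_features(transactions)
print(feats)
```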
This could be solved by adding the following features to FT (which I believe would be very helpful in general):
an option not to perform the check for "future information" (when cutoff_dates are specified)
@johnnyheineken I realize this issue is old, but are you able to run this on the latest version of featuretools? If so, we'd be curious to see how it runs now. Based on that we can come up with some actions that we can take if things aren't fixed.
I'm going to close this issue but feel free to open a new one if you have a chance to run! Thanks