Overall slow performance of feature calculation #701

Closed
johnnyheineken opened this issue Aug 8, 2019 · 2 comments

@johnnyheineken

johnnyheineken commented Aug 8, 2019

Hi,

I've been testing FeatureTools extensively. I do not have a specific bug to report, but I would still like to report the performance issues we have encountered with it.

Our typical use case looks like this: we usually just aggregate data from the transaction level to the application level.

Entity set + transaction entity
Entityset: clients
  Entities:
    transactions [Rows: 297432, Columns: 11]
    applications [Rows: 17901, Columns: 1]
  Relationships:
    transactions.CUSTOMER_ID -> applications.CUSTOMER_ID

and

Entity: transactions
  Variables:
    ID_TRANSACTION (dtype: index)
    CUSTOMER_ID (dtype: id)
    TIME (dtype: datetime_time_index)
    HOME_PLACE (dtype: categorical)
    TRANSACTION_AMOUNT (dtype: numeric)
    TRANSACTION_CLASS (dtype: categorical)
    TRANSACTION_FEE (dtype: numeric)
    TRANSACTION_PLACE (dtype: categorical)
    TRANSACTION_PURPOSE (dtype: categorical)
    TRANSACTION_TIME (dtype: datetime)
    TRANSACTION_TYPE (dtype: categorical)
  Shape:
    (Rows: 297432, Columns: 11)

In our specific use case, we usually compute the following features:

a) simple aggregations
b) simple aggregations over time windows
c) ratio of aggregations - different aggregations
d) ratio of aggregations - different time windows
e) time since first/last event

All of these are sometimes segmented by other features.
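To make the feature types a) to e) concrete, here is a minimal pandas sketch on made-up data. It is only an illustration, not the specialised tool mentioned in this issue and not how FeatureTools computes anything. The column names follow the transactions entity above; the values, the `as_of` reference date, and the `window_agg` helper are assumptions introduced for the example.

```python
import pandas as pd

# Toy transactions table; column names follow the entity above, values are made up.
tx = pd.DataFrame({
    "CUSTOMER_ID": [1, 1, 1, 2, 2],
    "TIME": pd.to_datetime(
        ["2019-01-05", "2019-05-20", "2019-06-25", "2019-03-01", "2019-06-28"]
    ),
    "TRANSACTION_AMOUNT": [100.0, 40.0, 60.0, 10.0, 30.0],
})
as_of = pd.Timestamp("2019-07-01")  # hypothetical reference date


def window_agg(df, days, func):
    """Aggregate TRANSACTION_AMOUNT per customer over the last `days` days before as_of."""
    start = as_of - pd.Timedelta(days=days)
    recent = df[(df["TIME"] > start) & (df["TIME"] <= as_of)]
    return recent.groupby("CUSTOMER_ID")["TRANSACTION_AMOUNT"].agg(func)


by_customer = tx.groupby("CUSTOMER_ID")
customer_ids = pd.Index(sorted(tx["CUSTOMER_ID"].unique()), name="CUSTOMER_ID")
features = pd.DataFrame(index=customer_ids)

# a) simple aggregation
features["MEAN"] = by_customer["TRANSACTION_AMOUNT"].mean()
# b) simple aggregations over time windows
features["MEAN_30D"] = window_agg(tx, 30, "mean")
features["MEAN_90D"] = window_agg(tx, 90, "mean")
features["MAX_30D"] = window_agg(tx, 30, "max")
# c) ratio of different aggregations over the same window
features["MAX_OVER_MEAN_30D"] = features["MAX_30D"] / features["MEAN_30D"]
# d) ratio of the same aggregation over different windows
features["MEAN_30D_OVER_90D"] = features["MEAN_30D"] / features["MEAN_90D"]
# e) time since the last event, in days
features["DAYS_SINCE_LAST"] = (as_of - by_customer["TIME"].max()).dt.days

print(features)
```

Segmentation (the `WHERE` variants in the feature list) would amount to filtering `tx` on a categorical column before calling the same aggregations.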

Of these, the default FeatureTools primitives work well for use cases a) and e).
If we consider the 'TREND' primitive, we can also calculate d).
If we write custom primitives, use cases b) and c) are also manageable.

However, this does not solve the problem that the calculation is slow, and the slowness is amplified when we use custom primitives (which, FYI, use numpy, not pandas).

These are the features to be calculated (~220 features):
[<Feature: MAX_30D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MAX(transactions.TRANSACTION_FEE)>,
 <Feature: MODE(transactions.HOME_PLACE)>,
 <Feature: MODE(transactions.TRANSACTION_CLASS)>,
 <Feature: MODE(transactions.TRANSACTION_PLACE)>,
 <Feature: MODE(transactions.TRANSACTION_PURPOSE)>,
 <Feature: MODE(transactions.TRANSACTION_TYPE)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MIN(transactions.TRANSACTION_FEE)>,
 <Feature: MEAN(transactions.TRANSACTION_AMOUNT)>,
 <Feature: MEAN(transactions.TRANSACTION_FEE)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME)>,
 <Feature: STD(transactions.TRANSACTION_AMOUNT)>,
 <Feature: STD(transactions.TRANSACTION_FEE)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MEAN_30D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MEAN_90D(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MAX(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN(transactions.TRANSACTION_AMOUNT WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = nan)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: MIN(transactions.TRANSACTION_FEE WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = nan)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: TREND(transactions.TRANSACTION_AMOUNT, TIME WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = nan)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: TREND(transactions.TRANSACTION_FEE, TIME WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = nan)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: TIME_SINCE_LAST(transactions.TIME WHERE TRANSACTION_PURPOSE = charity)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Úžice)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Ledečko)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Talmberk)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PURPOSE = hazard)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Skalice)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Mrchojedy)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = nan)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Rataje)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PLACE = Sázava)>,
 <Feature: TIME_SINCE_FIRST(transactions.TIME WHERE TRANSACTION_PURPOSE = charity)>]

Calculation of these features takes 2 hours, 30 minutes.

If we calculate similar features using pandas (I wrote a specialised tool for this use case), the same calculation takes 10 seconds on a single core, and that includes additional ratio features (450 features in total).

However, even if we omit the custom features, the estimated calculation time in FeatureTools is still in hours (97 features).

This is not the only use case with this issue. We have encountered the problem on several occasions throughout the company (different data, different setups, different features).

It is true that FeatureTools can be parallelised, but it then becomes very resource hungry, and even 100 cores will not bring the computation time down to seconds.

Feel free to contact me if you need more details about this problem.

P.S. I am using FT 0.9.1, Python 3.7.4, pandas 0.25rc, numpy 1.17

@VasekSvo

The issue is probably that FT has to check the time index column against the cutoff_date for every row over which you aggregate. Moreover, for custom features, these expensive calculations are done independently for each custom feature you calculate.

When you use your own tool, you probably "manually" drop the future information and the data outside the training window and then do the aggregations, which is obviously much faster.

When it comes to the check for future information, you could theoretically drop that data manually before running the calculation and somehow instruct FT not to check whether the cutoff_date is prior to the time of the event. As far as I know, FT does not offer such an option (maybe the approximate option in the dfs function can be used for this?).

The training window check is a bit trickier. A very inelegant solution would be to drop the data outside the training window and then run the calculation independently for each time window you want to consider.

This could be solved by adding the following features to FT (which I believe would be very helpful in general):

  1. an option to skip the check for "future information" (when cutoff_dates are specified)
  2. support for multiple time windows
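The idea of dropping future information once and then reusing the filtered data for several time windows can be sketched in plain pandas as follows. This is only an illustration under stated assumptions: it is not FeatureTools code and does not use any FeatureTools option, and the `cutoffs` table, the `AGE_DAYS` helper column, and all values are made up for the example.

```python
import pandas as pd

# Toy data; column names follow the issue, values are made up.
tx = pd.DataFrame({
    "CUSTOMER_ID": [1, 1, 1, 2, 2],
    "TIME": pd.to_datetime(
        ["2019-05-01", "2019-06-20", "2019-07-15", "2019-06-01", "2019-06-29"]
    ),
    "TRANSACTION_AMOUNT": [100.0, 40.0, 999.0, 10.0, 30.0],
})
# One cutoff per customer (hypothetical values).
cutoffs = pd.DataFrame({
    "CUSTOMER_ID": [1, 2],
    "CUTOFF": pd.to_datetime(["2019-07-01", "2019-06-30"]),
})

# Step 1: attach each customer's cutoff and drop future rows exactly once.
joined = tx.merge(cutoffs, on="CUSTOMER_ID")
past = joined[joined["TIME"] <= joined["CUTOFF"]].copy()
past["AGE_DAYS"] = (past["CUTOFF"] - past["TIME"]).dt.days

# Step 2: reuse the already filtered frame for every time window.
windows = [30, 90]
parts = []
for days in windows:
    in_window = past[past["AGE_DAYS"] < days]
    agg = in_window.groupby("CUSTOMER_ID")["TRANSACTION_AMOUNT"].agg(["max", "mean"])
    agg.columns = [f"{name.upper()}_{days}D" for name in agg.columns]
    parts.append(agg)

features = pd.concat(parts, axis=1)
print(features)
```

The cutoff comparison happens once for the whole table rather than once per feature, and each additional window only costs one more filter and groupby on the reduced frame.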

@kmax12
Contributor

kmax12 commented Jun 4, 2020

@johnnyheineken I realize this issue is old, but are you able to run this on the latest version of featuretools? If so, we'd be curious to see how it runs now. Based on that we can come up with some actions that we can take if things aren't fixed.

I'm going to close this issue, but feel free to open a new one if you have a chance to run it! Thanks

kmax12 closed this as completed Jun 4, 2020