FFT takes enormous amount of time for high number of features #2

amritbhanu · 2018-03-11T01:26:22Z

If the number of features grows to around 1000, building 32 trees takes forever.

amritbhanu · 2018-03-11T01:28:10Z

@dichen001: is there any way to make it faster? Where is there room to improve?

dichen001 · 2018-03-12T03:17:40Z

Yeah, I know the speed is an issue for the current implementation.
At each level, FFT will iterate through every single feature to get the median and evaluate the split. One quick thing to do is to cache FFT results for each level.

E.g., when building FFT0, we already iterate over and evaluate all features on the WHOLE dataset.
While building FFT1, FFT2, ..., FFTN, we repeat exactly the same evaluation on the WHOLE dataset.

# 1st level
"All data":
   feature1: median, metrics
   feature2: median, metrics
   ...

# 2nd level
"feature1 < mid1:"  # this one is added when it is selected as the feature for the split in the 1st level.
   feature2: median, metrics
   feature3: median, metrics

"featureX < midX:"  # this one is added when it is selected as the feature for the split in the 1st level.
   feature1: median, metrics
   feature2: median, metrics

So if we cache these intermediate results and reuse them across trees, building the trees should be much faster.
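A minimal sketch of the caching idea, not the repository's actual code: the names `SplitCache`, `stats`, `score_fn`, and `accuracy` are hypothetical, rows are assumed to be a non-empty list of dicts mapping feature names to numbers, and "metrics" is reduced to a single score for brevity. Each data subset is identified by the path of split conditions that produced it (the empty path means "All data"), and the per-feature `(median, score)` pairs are computed only the first time that path is requested.

```python
from statistics import median


def accuracy(predicted, actual):
    """Fraction of predictions that match the labels (stand-in metric)."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


class SplitCache:
    """Memoise per-feature (median, score) for each data subset.

    A subset is keyed by its path: a tuple of (feature, threshold, op)
    conditions, where op is "<" or ">=". The empty tuple is "All data".
    """

    def __init__(self, rows, labels, score_fn):
        self.rows = rows            # list of dicts: feature name -> value
        self.labels = labels        # list of booleans, one per row
        self.score_fn = score_fn
        self._cache = {}
        self.evaluations = 0        # number of non-cached feature evaluations

    def _indices(self, path):
        """Row indices that satisfy every condition on the path."""
        idx = list(range(len(self.rows)))
        for feature, threshold, op in path:
            if op == "<":
                idx = [i for i in idx if self.rows[i][feature] < threshold]
            else:
                idx = [i for i in idx if self.rows[i][feature] >= threshold]
        return idx

    def stats(self, path):
        """Return {feature: (median, score)} for the subset, computing it once."""
        path = tuple(path)
        if path not in self._cache:
            idx = self._indices(path)
            used = {feature for feature, _, _ in path}
            result = {}
            for feature in self.rows[0]:
                if feature in used or not idx:
                    continue
                mid = median(self.rows[i][feature] for i in idx)
                predicted = [self.rows[i][feature] < mid for i in idx]
                actual = [self.labels[i] for i in idx]
                result[feature] = (mid, self.score_fn(predicted, actual))
                self.evaluations += 1
            self._cache[path] = result
        return self._cache[path]


rows = [
    {"f1": 1, "f2": 10},
    {"f1": 2, "f2": 20},
    {"f1": 3, "f2": 30},
    {"f1": 4, "f2": 40},
]
labels = [True, True, False, False]

cache = SplitCache(rows, labels, accuracy)
first = cache.stats(())                      # 1st level: evaluates f1 and f2
again = cache.stats(())                      # same subset: served from cache
second = cache.stats((("f1", 2.5, "<"),))    # 2nd level under "f1 < 2.5"
print(cache.evaluations)                     # 3 evaluations instead of 5
```

With this layout, every tree that starts from the whole dataset shares the `()` entry, and trees that agree on the first split also share the second-level entry, so the per-feature scan over ~1000 features runs once per distinct subset rather than once per tree.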

amritbhanu added the enhancement label Mar 11, 2018

amritbhanu assigned dichen001 Mar 11, 2018
