Optimizes mean calculation routine in treeinterpreter/treeinterpreter.py #24
I tried to use treeinterpreter to calculate feature contribution components for a large dataset consisting of 55K rows with ~15K features per row, and even though I parallelized the computation using Spark, I was not able to run the code successfully.
One of the issues I faced was the tremendous amount of memory required by treeinterpreter for each run. It turned out that in my case most of the memory was used by _predict_forest to assemble lists of biases, contributions and predictions, which are later used to calculate the corresponding mean vectors.
To improve the memory usage and runtime of the code, I propose using an iterative method for computing averages, as summarized in http://www.heikohoffmann.de/htmlthesis/node134.html
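To illustrate, here is a minimal sketch of that incremental update applied to a stream of per-tree result arrays (the function name and shapes are hypothetical, not taken from treeinterpreter itself). Instead of appending every tree's output to a list and averaging at the end, the running mean is updated in place as `mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k`, so only one array of the result's shape is held in memory:

```python
import numpy as np

def iterative_mean(arrays):
    """Mean of a stream of equal-shaped arrays without storing them all.

    Uses the incremental update from Hoffmann's thesis:
        mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
    which is algebraically equivalent to summing and dividing,
    but needs O(1) arrays of memory instead of O(n).
    """
    mean = None
    for k, x in enumerate(arrays, start=1):
        x = np.asarray(x, dtype=float)
        if mean is None:
            mean = x.copy()  # first element initializes the running mean
        else:
            mean += (x - mean) / k  # in-place incremental update
    return mean
```

Applied inside `_predict_forest`, the same update could replace the lists of biases, contributions and predictions with three running-mean arrays, one per quantity being averaged.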