Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: Use NULL::integer[] when no continuous features #249

Closed
wants to merge 1 commit into from

Conversation

iyerr3
Copy link
Contributor

@iyerr3 iyerr3 commented Mar 26, 2018

JIRA: MADLIB-1219

When variable importance is enabled, to compute importance score,
distribution of the categorical and continuous features are computed.
For continuous features, this function requires initializing a vector of
length = number of continuous features. When there are no continuous
features, this initialization fails.

This commit fixes the issue by inputing a NULL::integer[] vector when
there no continuous features.

Closes #249

@asfgit
Copy link

asfgit commented Mar 26, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/402/

@asfgit
Copy link

asfgit commented Mar 26, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/403/

@fmcquillan99
Copy link

See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my tests.

LGTM

JIRA: MADLIB-1219

When variable importance is enabled, to compute importance score,
distribution of the categorical and continuous features are computed.
For continuous features, this function requires initializing a vector of
length = number of continuous features. When there are no continuous
features, this initialization fails.

This commit fixes the issue by inputing a NULL::integer[] vector when
there no continuous features.

Closes apache#249
@iyerr3 iyerr3 force-pushed the bugfix/rf_var_imp_no_continuous branch from 2a088d9 to 4d9863d Compare March 26, 2018 17:19
@asfgit
Copy link

asfgit commented Mar 26, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/406/

@kaknikhil
Copy link
Contributor

The changes look good. +1 for adding an install check test.

I noticed that con_features is populated by decision_tree.py:_classify_features which does something like con_types = ['real', 'float8', 'double precision', 'numeric']. Is it better to call the utilities function is_psql_numeric_type ?

@iyerr3
Copy link
Contributor Author

iyerr3 commented Mar 27, 2018

@kaknikhil Thanks for the suggestion. It makes sense to use that function with integer types excluded (since DT does not treat integer as continuous). I'll add that as a separate commit since it's unrelated to this PR.

@asfgit asfgit closed this in 67e7cb2 Mar 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants