DT: Don't use NULL value to get dep_var type #268
Refer to this link for build results (access rights to CI server needed):
d8f3e5c
to
ad0f6e7
Compare
JIRA: MADLIB-1233
Function `_is_dep_categorical` is used to obtain the type of the dependent variable expression. It fetches an arbitrary value using `LIMIT 1` and checks the type of that value in Python, without filtering out NULL values. The row returned by `LIMIT 1` can therefore hold a NULL, which arrives in Python as `None` and leads to incorrect results. This commit updates the type extraction by checking the type in the database instead of in Python and also filters out NULL values. Additionally, it checks that at least one non-NULL value is obtained and otherwise throws an appropriate error.
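A minimal sketch of the approach described in this commit message, not the actual patch. The function name `get_expr_type` and the `execute` parameter are hypothetical; `execute` stands in for `plpy.execute` (assumed to take a SQL string and return a list of dict-like rows) so the logic can be exercised outside the database. `pg_typeof` is the PostgreSQL function that reports a value's type.

```python
def get_expr_type(expr, tbl, execute):
    """Return the lower-cased database type name of `expr` in table `tbl`.

    `execute` is a hypothetical stand-in for plpy.execute: it takes a SQL
    string and returns a list of dict-like rows.
    """
    # Ask the database for the type and skip rows where the expression is
    # NULL, instead of inspecting a Python value that could be None.
    query = """
        SELECT pg_typeof({0}) AS expr_type
        FROM {1}
        WHERE ({0}) IS NOT NULL
        LIMIT 1
    """.format(expr, tbl)
    rows = execute(query)
    if not rows:
        # No non-NULL value exists, so the type cannot be determined.
        raise ValueError(
            "Failed to get the type of expression {0}: table {1} does not "
            "contain any valid tuples".format(expr, tbl))
    return str(rows[0]["expr_type"]).lower()
```

Inside a PL/Python function the call would presumably look like `get_expr_type(dep_var, source_table, plpy.execute)`, with `plpy.error` in place of the `ValueError`.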
JIRA: MADLIB-1236 If a cat_feature is dropped (due to just a single level), that feature should not be included in the summary table list, since tree_predict uses the features in summary table while reading source table. This commit ensures the right features are populated in the summary table. Closes apache#268
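The rule in MADLIB-1236 can be illustrated with a short sketch. The helper name `features_for_summary` and the `distinct_levels` mapping are assumptions made for illustration and do not come from the MADlib source; the sketch only shows the idea of excluding single-level categorical features before the summary table list is built.

```python
def features_for_summary(cat_features, distinct_levels):
    """Keep only categorical features that still have more than one level.

    `distinct_levels` is a hypothetical mapping from feature name to the
    list of distinct levels observed for that feature.
    """
    kept = []
    for feature in cat_features:
        levels = distinct_levels.get(feature, [])
        # A feature with a single level is dropped during training, so it
        # should not be listed for prediction either.
        if len(levels) > 1:
            kept.append(feature)
    return kept
```

With features `["a", "b"]` where `b` has only one level, the function returns `["a"]`, so only `a` would be listed in the summary table.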
c6f8fca
to
d77c35c
Compare
LGTM
jenkins, please retest
return expr_type.lower()
""".format(expr, tbl))
if not expr_type:
plpy.error("Table {0} does not contain any valid tuples".format(tbl))
Maybe the error message should also mention that we failed to get the expression type of column foo in table bar. What do you think?
Sure, I'll add the text and merge the PR.
JIRA: MADLIB-1233
Function `_is_dep_categorical` is used to obtain the type of the dependent variable expression. This function gets a random value using `LIMIT 1` and checks the type of the corresponding value in Python. Further this does not filter out NULL values. Since NULL values are not filtered out, it's possible the `LIMIT 1` returns a "None" type in Python, leading to incorrect results.
This commit updates the type extraction by checking the type in the database instead of in Python and also filters out NULL values. Additionally it checks if at least one non-NULL value is obtained, else throws an appropriate error.