Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DT: Don't use NULL value to get dep_var type #268

Closed
wants to merge 4 commits into from

Conversation

iyerr3
Copy link
Contributor

@iyerr3 iyerr3 commented May 1, 2018

JIRA: MADLIB-1233

Function _is_dep_categorical is used to obtain the type of the
dependent variable expression. This function gets a random value using
LIMIT 1 and checks the type of the corresponding value in Python.
Further this does not filter out NULL values.
Since NULL values are not filtered out,
it's possible the LIMIT 1 returns a "None" type in Python, leading to
incorrect results.

This commit updates the type extraction by checking the type in the
database instead of in Python and also filters out NULL values.
Additionally it checks if at least one non-NULL value is obtained, else
throws an appropriate error.

@asfgit
Copy link

asfgit commented May 1, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/463/

@iyerr3 iyerr3 force-pushed the bugfix/dt_dep_var_type_null branch from d8f3e5c to ad0f6e7 Compare May 2, 2018 11:47
@asfgit
Copy link

asfgit commented May 2, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/465/

@asfgit
Copy link

asfgit commented May 2, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/466/

@asfgit
Copy link

asfgit commented May 2, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/467/

iyerr3 added 2 commits May 3, 2018 14:27
JIRA: MADLIB-1233

Function `_is_dep_categorical` is used to obtain the type of the
dependent variable expression. This function gets a random value using
`LIMIT 1` and checks the type of the corresponding value in Python.
Further this does not filter out NULL values.
Since NULL values are not filtered out,
it's possible the `LIMIT 1` returns a "None" type in Python, leading to
incorrect results.

This commit updates the type extraction by checking the type in the
database instead of in Python and also filters out NULL values.
Additionally it checks if at least one non-NULL value is obtained, else
throws an appropriate error.
JIRA: MADLIB-1236

If a cat_feature is dropped (due to just a single level), that feature
should not be included in the summary table list, since tree_predict
uses the features in summary table while reading source table. This
commit ensures the right features are populated in the summary table.

Closes apache#268
@iyerr3 iyerr3 force-pushed the bugfix/dt_dep_var_type_null branch from c6f8fca to d77c35c Compare May 3, 2018 21:32
@asfgit
Copy link

asfgit commented May 3, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/468/

Copy link
Contributor

@orhankislal orhankislal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iyerr3
Copy link
Contributor Author

iyerr3 commented May 10, 2018

jenkins, please retest

@asfgit
Copy link

asfgit commented May 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/475/

@asfgit
Copy link

asfgit commented May 14, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/477/

return expr_type.lower()
""".format(expr, tbl))
if not expr_type:
plpy.error("Table {0} does not contain any valid tuples".format(tbl))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the error message should also mention that we failed to get the expression type of column foo in table bar. what do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add the text and merge the PR.

@asfgit asfgit closed this in ef52d87 Jun 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants