Skip to content

Enable grouping for minibatch preprocessing#254

Closed
kaknikhil wants to merge 3 commits intoapache:masterfrom
madlib:feature/minibatch-preprocessing-grouping
Closed

Enable grouping for minibatch preprocessing#254
kaknikhil wants to merge 3 commits intoapache:masterfrom
madlib:feature/minibatch-preprocessing-grouping

Conversation

@kaknikhil
Copy link
Contributor

This PR enables grouping for the minibatch preprocessor module.

Other changes

  1. Added install check test for special chars.
  2. Improved error messages and created a reusable function for
    testing column dimension in install check.
  3. Added an optional flag to utils_ind_var_scales_grouping so as to
    create a persistent x_mean table that will be reused as the
    standardization table by the preprocessor module.
  4. Added unittests for input_tbl_valid and output_tbl_valid in validate_args.py_in
  5. Raise custom exception for mocked plpy error.

Co-authored-by: Jingyi Mei jmei@pivotal.io

@asfgit
Copy link

asfgit commented Apr 2, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/418/

Copy link
Contributor

@njayaram2 njayaram2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but for a couple of very minor comments.

set_zero_std_to_one (optional, default is False. If set to true
0.0 standard deviation values will be set to 1.0)
create_temp_table If set to true, create a persistent instead of a temp
table, else create a temp table for x_mean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this comment say create temp table when true, and a persistent table when set to false?

grouping_cols,
x_mean_table,
set_zero_std_to_one = True,
create_temp_table = False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please correct the indentation here?

kaknikhil and others added 3 commits April 3, 2018 15:00
This commit enables grouping for the minibatch preprocessor module.

Other changes
1. Added install check test for special chars.
2. Improved error messages and created a reusable function for
testing column dimension in install check.
3. Add a new optional flag to utils_ind_var_scales_grouping so as to
create a persistent x_mean table that will be reused as the
standardization table by the preprocessor module.

Co-authored-by: Jingyi Mei <jmei@pivotal.io>
This commit adds a new unittest file for the validate_args python file.
The only two functions tested right now are input_tbl_valid and
output_tbl_valid.
Before this commit, all the unit tests that wanted to assert that
plpy.error was called had to assert that an Exception was raised. This
was too generic and did not distinguish between an exception coming from
the plpy mock class vs any other exception.
With this commit, we now raise a custom plpy exception so that we don't
need to assert for the equality of the error messages. Asserting for the
exception is proof enough that plpy.error was called.
@jingyimei jingyimei force-pushed the feature/minibatch-preprocessing-grouping branch from a4d8b69 to 76a3b9b Compare April 3, 2018 22:04
@asfgit
Copy link

asfgit commented Apr 3, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/421/

@asfgit asfgit closed this in 63261df Apr 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants