Minibatch: Add one-hot encoding option for int #259

Closed
wants to merge 2 commits into from

@iyerr3 (Contributor) commented Apr 10, 2018

JIRA: MADLIB-1226

Integer dependent variables can be used in either regression or
classification. To be used in classification, they need to be one-hot
encoded. This commit adds an option that lets users choose whether an
integer dependent variable should be one-hot encoded. The flag is ignored
if the variable is not of integer type.

Other changes include adding an appropriate test in install-check,
code cleanup and PEP8 conformance.
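
The encoding rule described above can be sketched in Python. This is a minimal, hypothetical illustration, not the MADlib implementation: the names `encode_dependent`, `is_integer_column`, and `one_hot_encode_int`, the assumption that text values are always one-hot encoded, and the sorted ordering of the encoded levels are all choices made for this sketch.

```python
from typing import List, Sequence, Union

Scalar = Union[int, float, str]


def is_integer_column(values: Sequence[Scalar]) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


def encode_dependent(values: Sequence[Scalar],
                     one_hot_encode_int: bool = False) -> List[List[float]]:
    """Return one encoded row per input value.

    Hypothetical rule based on the PR description:
      * text values are always one-hot encoded (assumption);
      * integer values are one-hot encoded only if the flag is set;
      * floats are passed through unchanged, and the flag is ignored.
    """
    is_text = all(isinstance(v, str) for v in values)
    is_int = is_integer_column(values)
    # The flag only matters for integer columns; it is ignored otherwise.
    encode = is_text or (is_int and one_hot_encode_int)
    if not encode:
        # Regression-style output: a single numeric column.
        return [[float(v)] for v in values]
    # Sorted distinct levels fix the column order of the encoding.
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0.0] * len(levels)
        row[index[v]] = 1.0
        rows.append(row)
    return rows
```

With the flag off, `encode_dependent([3, 1, 3])` keeps the integers as a single numeric column; with `one_hot_encode_int=True` each distinct integer becomes its own indicator column. Passing floats with the flag set leaves them unchanged, which matches the statement that the flag is ignored for non-integer types.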

asfgit commented Apr 10, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/431/

@njayaram2 (Contributor) left a comment

This has to be rebased over master to ensure dependent_vartype is included in the summary table. Tested it after rebasing over master locally (with minor changes), and that looks good.

preprocessor whether you want to encode them or not. In the case that you have
already encoded the dependent variable yourself, you can ignore this parameter.
Also, if you want to encode float values for some reason, cast them to text
first.

+1 for the explanation.

@iyerr3 force-pushed the feature/minibatch_one_hot_encode branch from 4729973 to d9381e2 on April 11, 2018 04:02
@iyerr3 (Contributor, Author) commented Apr 11, 2018

I've force-pushed after the rebase. This should now reflect the dependent_vartype change from the previous PR.

asfgit commented Apr 11, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/436/

asfgit commented Apr 12, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/444/

@asfgit asfgit closed this in feeb8a5 Apr 13, 2018