Bugfix: Elastic net gives inconsistent result #126

Closed

njayaram2
Contributor

JIRA: MADLIB-1092

  • Elastic net previously used the total number of rows in the table
    as the row count even when grouping was used. This fix uses the
    number of rows in each group when computing IGD.
  • Elastic net previously computed the mean and standard deviation of
    both the independent and dependent variables over the entire table
    even when grouping was used. These are now computed per group and
    used to compute the scaled data when standardize=TRUE for
    Gaussian IGD.
  • One approximation remains. During gradient computation (C++), the
    mean subtracted from every value of the independent variable (for
    each dimension) is computed over the entire table, not per group.
    This approximation was adopted because passing group-specific mean
    values for every row in the table to the C++ layer is messy.
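
To illustrate the difference the points above describe, here is a minimal Python sketch. It is not MADlib code: the function names `group_stats` and `table_stats`, the toy rows, and the use of the population standard deviation are all assumptions made for illustration. It contrasts whole-table statistics with per-group statistics, and shows how far the whole-table mean sits from each group mean, which is the size of the centering offset left by the remaining approximation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_stats(rows):
    """Return {group: (n_rows, mean_x, std_x)} computed within each group.

    `rows` is an iterable of (group_key, x) pairs. The population standard
    deviation is an assumption here, chosen only to keep the example simple.
    """
    by_group = defaultdict(list)
    for group, x in rows:
        by_group[group].append(x)
    return {g: (len(xs), mean(xs), pstdev(xs)) for g, xs in by_group.items()}


def table_stats(rows):
    """Return (n_rows, mean_x, std_x) computed over the whole table."""
    xs = [x for _, x in rows]
    return len(xs), mean(xs), pstdev(xs)


# Hypothetical toy data: one independent variable, two groups.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0), ("b", 18.0)]

per_group = group_stats(rows)
n_all, mean_all, std_all = table_stats(rows)

# Offset introduced when a value is centered with the whole-table mean
# instead of its own group's mean.
centering_gap = {g: mean_all - m for g, (_, m, _) in per_group.items()}

for g, (n, m, s) in sorted(per_group.items()):
    print(f"group {g}: n={n}, mean={m:.3f}, std={s:.3f}, "
          f"gap to table mean={centering_gap[g]:.3f}")
print(f"whole table: n={n_all}, mean={mean_all:.3f}, std={std_all:.3f}")
```

With grouped data like this, the row count, mean, and standard deviation each differ between a group and the full table, so using table-wide values for a group changes both the scaling and the effective step taken per row.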

@iyerr3

asfbot commented Apr 27, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/51/

@asfgit asfgit closed this in 0ff829a Apr 29, 2017
asfbot commented Apr 29, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/55/

2 participants