
Infinity splits #12

Merged · 43 commits · May 30, 2017

Conversation

@aldro61 (Owner) commented Apr 18, 2017

Changed the predicted value for the case where the function only contains one type of bound.

  • Case -- only lower bounds: the predicted value is the position of the first breakpoint to the left of the minimum pointer (i.e., the largest lower bound + margin).

  • Case -- only upper bounds: the predicted value is the position of the minimum pointer (i.e., the smallest upper bound - margin).

Also updated the unit tests to account for this change.
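The two cases above can be sketched as follows. This is a minimal illustration, not the repo's API: the function name `predicted_value` and the list-based representation of the bounds are assumptions for the example.

```python
def predicted_value(lower_bounds, upper_bounds, margin):
    """Predicted value when the function contains only one type of bound.

    Hypothetical sketch of the rule described in the PR description.
    """
    if lower_bounds and not upper_bounds:
        # Only lower bounds: first breakpoint left of the minimum pointer,
        # i.e., the largest lower bound + margin.
        return max(lower_bounds) + margin
    if upper_bounds and not lower_bounds:
        # Only upper bounds: position of the minimum pointer,
        # i.e., the smallest upper bound - margin.
        return min(upper_bounds) - margin
    raise ValueError("both bound types present; use the full minimization")

print(predicted_value([1.0, 3.0], [], margin=0.5))  # 3.5
print(predicted_value([], [2.0, 4.0], margin=0.5))  # 1.5
```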

@aldro61 aldro61 requested a review from tdhock April 18, 2017 20:19
@tdhock (Collaborator) left a comment

looks good to me

@@ -62,7 +62,7 @@ plotMargin(1.5)
 out1.5 <- compute_optimal_costs(
   target.mat, margin=1.5, loss="hinge")
 test_that("margin=1.5 yields cost 0,1", {
-  expect_equal(out1.5$pred, c(Inf, 1, 0))
+  expect_equal(out1.5$pred, c(0.5, 1, 0))
Collaborator:

thanks for updating the R unit tests


    def test_dummy_dataset_1(self):
        """
        learning on dummy dataset #1
Collaborator:
maybe "noiseless" instead of dummy? That would be more specific.

@tdhock (Collaborator) commented Apr 20, 2017

another thing I was thinking -- in many data sets, some input features are monotonic transforms of others, e.g. the first column is x and the second column is log(x). Do you think that would confuse the learning algo? Do you think we should add a pre-processing step that filters out any features that are monotonic transforms of other features? We could do that by comparing the sort-order indices of the features...
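The filtering idea suggested above can be sketched as follows. This is an assumption about how the comparison might work, not code from the repo: two features that are increasing monotonic transforms of one another have identical sort-order indices, so keeping one feature per distinct ordering removes the redundancy. (A decreasing transform reverses the order and would need a separate check.)

```python
import math

def argsort(column):
    """Indices that would sort the column, as a hashable tuple."""
    return tuple(sorted(range(len(column)), key=lambda i: column[i]))

def filter_monotone_duplicates(columns):
    """Keep only features whose sort order has not been seen before.

    Hypothetical pre-processing step; an increasing monotonic transform
    of an earlier feature produces the same argsort and is dropped.
    """
    seen, kept = set(), []
    for col in columns:
        order = argsort(col)
        if order not in seen:
            seen.add(order)
            kept.append(col)
    return kept

x = [1.0, 10.0, 100.0, 1000.0]
log_x = [math.log(v) for v in x]   # monotonic transform of x: same ordering
y = [3.0, 1.0, 4.0, 2.0]           # unrelated feature: different ordering
print(len(filter_monotone_duplicates([x, log_x, y])))  # 2 (log_x dropped)
```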

@@ -165,7 +165,7 @@ int PiecewiseFunction::insert_point(double y, bool is_upper_bound) {
 }

 // Breakpoint info
-  float b = y - s * this->margin;
+  double b = y - s * this->margin;
@aldro61 (Owner, Author) commented Apr 25, 2017

@tdhock I'm pretty sure this is why we had really poor precision on the breakpoint positions! The breakpoint positions were floats and were getting converted to double each time a comparison was made, since all the other real-valued variables have type double.

With this fix, I was able to increase the tolerance to super small values (e.g., 1e-20) and didn't get any failing tests. For now, I set the tolerance to 1e-9.
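The precision loss described above can be illustrated without the C++ code. This sketch (an assumption for illustration, not the repo's code) round-trips a double through IEEE-754 single precision, as a `float b` field would, and shows the resulting error already exceeds a 1e-9 tolerance:

```python
import struct

def to_float32(x):
    """Round-trip a Python double through IEEE-754 single precision,
    simulating storage in a C++ `float` field."""
    return struct.unpack('f', struct.pack('f', x))[0]

b = 0.1                 # a breakpoint position computed in double precision
b32 = to_float32(b)     # what a `float b` member would actually store
error = abs(b - b32)
print(error > 1e-9)     # True: the float32 rounding error alone
                        # exceeds the 1e-9 test tolerance
```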

Collaborator:
wow, sorry I didn't catch that -- in general I never use float -- that should have raised a red flag in my head -- I completely agree, change to using double all the time.

Collaborator:
although I would not expect that it would make a big difference in terms of test error...

 * that value + some offset.
 */
-  return get_breakpoint_position(std::prev(this->min_ptr));
+  return get_breakpoint_position(std::prev(this->min_ptr)) + 1e-4; // Add a small value because the previous breakpoint is not included in the minimum segment ]lower, upper]
Owner (Author):
@tdhock I made this change because b_{t, i-1} is not included in the minimum segment, which is defined over the range ]b_{t, i-1}, b_{t, i}]. Should not change much in practice, but at least it's consistent. I updated the unit tests in R and Python.
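The reasoning above can be shown with a small sketch (an assumption for illustration, not the repo's code): the minimum segment is the half-open interval ]lower, upper], so the left breakpoint itself is excluded, and predicting exactly at `lower` would fall outside the segment; the small offset moves the prediction strictly inside it.

```python
EPS = 1e-4  # same small offset as in the diff above

def predict_in_segment(lower, upper):
    """Return a prediction guaranteed to lie inside ]lower, upper].

    Hypothetical helper: the left endpoint is excluded from the
    half-open segment, so we step a small EPS to its right.
    """
    assert lower + EPS <= upper, "segment narrower than the offset"
    return lower + EPS

lower, upper = 2.0, 5.0
p = predict_in_segment(lower, upper)
print(lower < p <= upper)  # True: strictly right of the excluded endpoint
```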

@tdhock (Collaborator) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017 via email

@tdhock (Collaborator) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017

My code review is done and I'm ready to merge into master. Ok for you @tdhock?

@tdhock (Collaborator) commented May 30, 2017

great, please merge +1

@aldro61 aldro61 merged commit f3a047b into master May 30, 2017
@aldro61 aldro61 deleted the infinity-splits branch May 30, 2017 20:43
parismita pushed a commit that referenced this pull request May 29, 2019
Update Travis in this branch