
Infinity splits #12

Merged · 43 commits · May 30, 2017

Conversation

@aldro61 (Owner) commented Apr 18, 2017

Changed the predicted value for the case where the function only contains one type of bound.

  • Case -- only lower bounds: the predicted value is the position of the first breakpoint to the left of the minimum pointer (i.e., the largest lower bound + margin).

  • Case -- only upper bounds: the predicted value is the position of the minimum pointer (i.e., the smallest upper bound - margin).

Also updated the unit tests to account for this change.
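The two cases above can be sketched as follows. This is a minimal illustration, not the repo's API: the function name `predicted_value` and the list-based representation of the bounds are assumptions for the example.

```python
def predicted_value(lower_bounds, upper_bounds, margin):
    """Predicted value when the function contains only one type of bound.

    Hypothetical sketch of the rule described in the PR description.
    """
    if lower_bounds and not upper_bounds:
        # Only lower bounds: first breakpoint left of the minimum pointer,
        # i.e., the largest lower bound + margin.
        return max(lower_bounds) + margin
    if upper_bounds and not lower_bounds:
        # Only upper bounds: position of the minimum pointer,
        # i.e., the smallest upper bound - margin.
        return min(upper_bounds) - margin
    raise ValueError("both bound types present; use the full minimization")

print(predicted_value([1.0, 3.0], [], margin=0.5))  # 3.5
print(predicted_value([], [2.0, 4.0], margin=0.5))  # 1.5
```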

@aldro61 aldro61 requested a review from tdhock April 18, 2017 20:19
@tdhock (Collaborator) left a comment

looks good to me

@@ -62,7 +62,7 @@ plotMargin(1.5)
 out1.5 <- compute_optimal_costs(
   target.mat, margin=1.5, loss="hinge")
 test_that("margin=1.5 yields cost 0,1", {
-  expect_equal(out1.5$pred, c(Inf, 1, 0))
+  expect_equal(out1.5$pred, c(0.5, 1, 0))
Collaborator:

thanks for updating the R unit tests


    def test_dummy_dataset_1(self):
        """
        learning on dummy dataset #1
Collaborator:
maybe "noiseless" instead of dummy? That would be more specific.

@tdhock (Collaborator) commented Apr 20, 2017

another thing I was thinking -- in many data sets, some input features are monotonic transforms of others, e.g. the first column is x and the second column is log(x). Do you think that would confuse the learning algo? Do you think we should add a pre-processing step that filters out any features that are monotonic transforms of other features? We could do that by comparing the sort-order indices of the features...
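The filtering idea suggested above can be sketched as follows. This is an assumption about how the comparison might work, not code from the repo: two features that are increasing monotonic transforms of one another have identical sort-order indices, so keeping one feature per distinct ordering removes the redundancy. (A decreasing transform reverses the order and would need a separate check.)

```python
import math

def argsort(column):
    """Indices that would sort the column, as a hashable tuple."""
    return tuple(sorted(range(len(column)), key=lambda i: column[i]))

def filter_monotone_duplicates(columns):
    """Keep only features whose sort order has not been seen before.

    Hypothetical pre-processing step; an increasing monotonic transform
    of an earlier feature produces the same argsort and is dropped.
    """
    seen, kept = set(), []
    for col in columns:
        order = argsort(col)
        if order not in seen:
            seen.add(order)
            kept.append(col)
    return kept

x = [1.0, 10.0, 100.0, 1000.0]
log_x = [math.log(v) for v in x]   # monotonic transform of x: same ordering
y = [3.0, 1.0, 4.0, 2.0]           # unrelated feature: different ordering
print(len(filter_monotone_duplicates([x, log_x, y])))  # 2 (log_x dropped)
```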

@@ -165,7 +165,7 @@ int PiecewiseFunction::insert_point(double y, bool is_upper_bound) {
 }

 // Breakpoint info
-  float b = y - s * this->margin;
+  double b = y - s * this->margin;
@aldro61 (Owner, Author) commented Apr 25, 2017

@tdhock I'm pretty sure this is why we had really poor precision on the breakpoint positions! The breakpoint positions were floats and were getting converted to double each time a comparison was made, since all the other real-valued variables have type double.

With this fix, I was able to increase the tolerance to super small values (e.g., 1e-20) and didn't get any failing tests. For now, I set the tolerance to 1e-9.
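The precision loss described above can be illustrated without the C++ code. This sketch (an assumption for illustration, not the repo's code) round-trips a double through IEEE-754 single precision, as a `float b` field would, and shows the resulting error already exceeds a 1e-9 tolerance:

```python
import struct

def to_float32(x):
    """Round-trip a Python double through IEEE-754 single precision,
    simulating storage in a C++ `float` field."""
    return struct.unpack('f', struct.pack('f', x))[0]

b = 0.1                 # a breakpoint position computed in double precision
b32 = to_float32(b)     # what a `float b` member would actually store
error = abs(b - b32)
print(error > 1e-9)     # True: the float32 rounding error alone
                        # exceeds the 1e-9 test tolerance
```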

Collaborator:
wow, sorry I didn't catch that -- in general I never use float -- that should have raised a red flag in my head -- I completely agree, change to using double all the time.

Collaborator:
although I would not expect that it would make a big difference in terms of test error...

 * that value + some offset.
 */
-  return get_breakpoint_position(std::prev(this->min_ptr));
+  return get_breakpoint_position(std::prev(this->min_ptr)) + 1e-4; // Add a small value because the previous breakpoint is not included in the minimum segment ]lower, upper]
Owner (Author):
@tdhock I made this change because b_{t, i-1} is not included in the minimum segment, which is defined over the range ]b_{t, i-1}, b_{t, i}]. Should not change much in practice, but at least it's consistent. I updated the unit tests in R and Python.
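The reasoning above can be shown with a small sketch (an assumption for illustration, not the repo's code): the minimum segment is the half-open interval ]lower, upper], so the left breakpoint itself is excluded, and predicting exactly at `lower` would fall outside the segment; the small offset moves the prediction strictly inside it.

```python
EPS = 1e-4  # same small offset as in the diff above

def predict_in_segment(lower, upper):
    """Return a prediction guaranteed to lie inside ]lower, upper].

    Hypothetical helper: the left endpoint is excluded from the
    half-open segment, so we step a small EPS to its right.
    """
    assert lower + EPS <= upper, "segment narrower than the offset"
    return lower + EPS

lower, upper = 2.0, 5.0
p = predict_in_segment(lower, upper)
print(lower < p <= upper)  # True: strictly right of the excluded endpoint
```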

@tdhock (Collaborator) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017 via email

@tdhock (Collaborator) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017 via email

@aldro61 (Owner, Author) commented May 30, 2017

My code review is done and I'm ready to merge into master. Ok for you @tdhock?

@tdhock (Collaborator) commented May 30, 2017

great, please merge +1

@aldro61 aldro61 merged commit f3a047b into master May 30, 2017
@aldro61 aldro61 deleted the infinity-splits branch May 30, 2017 20:43
parismita pushed a commit that referenced this pull request May 29, 2019
Update Travis in this branch