Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing the lower and upper limits V.2 #41

Merged
merged 12 commits into from
Apr 17, 2018

Conversation

eric-valente
Copy link
Contributor

Trying to get this PR merged. I forked and fixed the remaining issues.

#19

"either input an array or a single value of lower and upper limits
check that lower_limits <= 0 and upper_limits >= 0"

Copy link
Contributor

@stephen-hoover stephen-hoover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviving this PR! I noticed some things which the previous review failed to catch. The comments apply to both logistic and linear algorithms.

glmnet/linear.py Outdated
if any(self.lower_limits) > 0 if isinstance(self.lower_limits, np.ndarray) else self.lower_limits > 0:
raise ValueError("lower_limits must be non-positive")

if any(self.upper_limits) < 0 if isinstance(self.upper_limits, np.ndarray) else self.upper_limits < 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parentheses around the any call need to be after the 0, here and above.

 any(self.lower_limits > 0)

Another suggestion: Cast the inputs with an np.asarray if they fail an np.isscalar check; that will allow lists or tuples to be valid inputs. There should also be a ValueError raised if there's an array input with the wrong number of elements.

Please also put these checks in fit, maybe in their own helper function. The scikit-learn API requires that __init__ do nothing other than set the input parameters.

m = ElasticNet(lower_limits=lower_limits, upper_limits=upper_limits, random_state=5934)
m = m.fit(x, y)
assert(np.all(m.coef_) >= 0)
assert(np.all(m.coef_) <= 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as all coefficients are non-zero, these tests reduce to assert True >= 0 and assert True <= 1. The closing parenthesis on the np.all needs to be after the integer.

def test_coef_limits(self):
x, y = self.inputs[0]
lower_limits = 0
upper_limits = np.repeat(1,x.shape[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test fails to catch the bug in the bounds checking, how about you try re-writing it with lower limits of np.repeat(-1, x.shape[1]) and upper limits of 0.

@eric-valente
Copy link
Contributor Author

Think I covered all your points except this: There should also be a ValueError raised if there's an array input with the wrong number of elements.

Can you give me a little guidance? Assuming if the array > number of features throw ValueError?

@stephen-hoover
Copy link
Contributor

The limits must be either a scalar, or else a list or array with length equal to the number of features. In the if not np.isscalar(self.lower_limits) blocks, you can put a check like if len(...) != X.shape[1]: raise ValueError. You should move the X, y = check_X_y(X, y, accept_sparse='csr', ensure_min_samples=2) line to the top of the function so that you can be sure that X is an array at that point.

@eric-valente
Copy link
Contributor Author

Okay - made the request changed, will check if it passes ci.

@eric-valente
Copy link
Contributor Author

AssertionError on the logistic tests, not linear. Thoughts?

Traceback (most recent call last):
File "/src/python-glmnet/glmnet/tests/test_logistic.py", line 101, in test_coef_limits
assert(np.all(m.coef_ >= 0))
AssertionError

@stephen-hoover
Copy link
Contributor

I'm bothered by the fact that the linear tests don't fail. In each test, you've set lower limits of -1 and upper limits of 0. The tests are testing that the coefficients are between 0 and 1. The tests should be changed to check that coefficients are between -1 and 0. If the linear tests are passing, then it must be that all coefficients are 0.

Try setting alpha = 0 in both the ElasticNet and LogitNet constructors in the new tests. It defaults to 1, which is lasso. That will push coefficients to equal 0. If you make that change in the linear tests before switching the limits, then that test should start failing.

@eric-valente
Copy link
Contributor Author

No luck w/ alpha=0

======================================================================
FAIL: test_coef_limits (test_linear.TestElasticNet)

Traceback (most recent call last):
File "/src/python-glmnet/glmnet/tests/test_linear.py", line 111, in test_coef_limits
assert(np.all(m.coef_ >= 0))
AssertionError

======================================================================
FAIL: test_coef_limits (test_logistic.TestLogitNet)

Traceback (most recent call last):
File "/src/python-glmnet/glmnet/tests/test_logistic.py", line 101, in test_coef_limits
assert(np.all(m.coef_ >= 0))
AssertionError

@eric-valente
Copy link
Contributor Author

I think the code is okay but we are setting lower_limits to -1, it makes sense that this test fails. We set lower_limits to -1 so it will trigger this: assert(np.all(m.coef_ >= 0))

Would this make more sense?

       assert(np.all(m.coef_ >= -1))
        assert(np.all(m.coef_ <= 0))     

@stephen-hoover
Copy link
Contributor

Yes, I'm happy to see that both tests are now failing when you set alpha=0. That's good. If you switch the asserts like you suggest, then the tests should pass.

@eric-valente
Copy link
Contributor Author

Changed the bounds of the test, but does the same thing. It passed. Anything else?

        x, y = self.inputs[0]
        lower_limits = 0
        upper_limits = np.repeat(1, x.shape[1])
        m = ElasticNet(lower_limits=lower_limits, upper_limits=upper_limits, random_state=5934, alpha=0)
        m = m.fit(x, y)
        assert(np.all(m.coef_ >= 0))
        assert(np.all(m.coef_ <= 1))     

@stephen-hoover
Copy link
Contributor

I'd previously requested that you change the limits to -1 to 0 because the original tests, with limits from 0 to 1, failed to expose a bug in the bounds checking. I'd prefer that these tests use a lower limit of -1 and an upper limit of 0.

@eric-valente
Copy link
Contributor Author

Okay put it back and changed this:

        assert(np.all(m.coef_ >= -1))
        assert(np.all(m.coef_ <= 0))    

@eric-valente
Copy link
Contributor Author

Okay looks good! Let me know.

Copy link
Contributor

@stephen-hoover stephen-hoover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for your contribution!

@stephen-hoover stephen-hoover merged commit de9b92b into civisanalytics:master Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants