New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ConvSCCS model + refactoring of SCCS-related stuff (preprocessing, simulations, etc.) #158
Conversation
@MaryanMorel : good job on this PR. I've had a first quick review, and it seems that there is no example in the documentation, please provide one (on simulations at least, later on a real dataset), so that we can understand how to use the preprocessing tools and the ConvSCCS learner |
Indeed, I haven't thought of putting an example. I'll add the one corresponding to the 'example notebook', it takes some time to run though (~2min). Is it ok for the doc or should I accelerate things up? |
We don't care if it takes extra 2 minutes for the doc build. The example must appear in the examples of the doc of tick, not in unittests of course (see the tick/examples folder), and maybe refer to it in the documentation. |
If it is possible to accelerate it might worth it though! When working on the doc it's more convenient if things go fast. But yes, if it cannot be accelerated, it's better to have a slow example rather than no example at all! |
doc/modules/survival.rst
Outdated
@@ -41,6 +41,7 @@ be used for now with a stochastic solver, while we can with :class:`ModelSCCS <t | |||
|
|||
ModelCoxRegPartialLik | |||
ModelSCCS | |||
ConvSCCS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConvSCCS is a learner right, not a model ? This should go in the 1. Learner section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! But GitHub does not seem to understand the change has been done. Do I need to perform a specific action?
@MaryanMorel : Where are we with this ? |
Ooops, I broke one test, it should be fixed now. I got an example running quite "quickly" (regarding we need a bootstrap do compute confidence intervals). |
594b316
to
2444f5f
Compare
@stephanegaiffas the build is passing, do you still need to review some code ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly things about :
- Attribute and option names
- tests in setters
- an attributed allowing to get the intensity and the CI for the intensity...
- Some better explanations in docstring about the attributes. We need to explain precisely how coeffs is organized... I'll help if you want for this and the doc
tick/survival/convolutional_sccs.py
Outdated
|
||
def __init__(self, n_lags: np.array, | ||
penalized_features: np.array, | ||
strength_tv=None, strength_group_l1=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm annoyed by passing strengths to solvers... we should use C_tv (which corresponds to 1 / strength) and C_group_l1 instead, in a more sciait-like manner... (this is what we do in other learners for linear models for instance...)
tick/survival/convolutional_sccs.py
Outdated
`n_lags` time intervals. `n_lags` values must be between 0 and | ||
`n_intervals` - 1. | ||
|
||
penalized_features : `numpy.ndarray`, shape=(n_features,), dtype="bool" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what don't we penalize all features by default ? Can't we put a default to this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default is not straightforward, as we only get the number of features during the _prefit
tick/survival/convolutional_sccs.py
Outdated
verbose : `bool`, default=False | ||
If `True`, solver verboses history, otherwise nothing is | ||
displayed. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print_every and record_every at not documented (use the docstring from solvers :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
tick/survival/convolutional_sccs.py
Outdated
n_coeffs : `int` (read-only) | ||
Total number of coefficients of the model | ||
|
||
coeffs : `numpy.ndarray`, shape=(n_coeffs,), dtype="float64" (read-only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should explain how the coeffs are organised, or output in this learning a 2D matrix (n_features, n_lags) or something like this... We need to explain how the coeffs can be used :)
tick/survival/convolutional_sccs.py
Outdated
|
||
bootstrap_coeffs : `Bootstrap_CI` (read-only) | ||
Bootstrap coefficients and confidence intervals of the model. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an attribute (using a property) that computes the intensity of each feature or something like this ?
tick/survival/convolutional_sccs.py
Outdated
def _construct_solver_obj(step, max_iter, tol, print_every, | ||
record_every, verbose, seed): | ||
# seed cannot be None in SVRG | ||
solver_obj = SVRG(step=step, max_iter=max_iter, tol=tol, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment : # TODO: we might want to use SAGA also later... (might be faster here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
tick/survival/convolutional_sccs.py
Outdated
|
||
@step.setter | ||
def step(self, value): | ||
self._set('_step_size', value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check that it's > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
tick/survival/convolutional_sccs.py
Outdated
self._solver_obj.step = value | ||
|
||
@property | ||
def _strengths(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather like a separate C_tv and C_group_l1 with getters and setter, where we check that there are > 0, and where we update private _strenght_tv and _strengh_group_l1 (if zero we put None). This is what we do in linear learners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this property is meant to be private, i.e. used only internally to avoid having many .set() everywhere, C_tv and C_group_l1 have public getters and setters of their own. I can remove it if you find it ugly
|
||
@n_lags.setter | ||
def n_lags(self, value): | ||
offsets = [0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must check here that it's >= 0 and <= n_intervals - 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for >= 0
, but we only get to know n_intervals during _prefit
. This model does not fit very well with sklearn fit / predict framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally put the check on n_lags <= n_intervals - 1
in _prefit
rather than the setter
tick/survival/convolutional_sccs.py
Outdated
Coefficients of the model. | ||
|
||
bootstrap_coeffs : `Bootstrap_CI` (read-only) | ||
Bootstrap coefficients and confidence intervals of the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a Reference section with the paper describing the method
6f06f43
to
a3d2c98
Compare
@stephanegaiffas Are you ok to merge? (after rebasing) |
@stephanegaiffas the branch has been rebased, ok to merge? |
Done ! |
Thanks ! |
Add ConvSCCS model + refactoring of SCCS-related stuff (preprocessing, simulations, etc.) according to the paper
Feature products are not shipped as a learner option for now, but it can be used independently.
Most objects are not picklable for now, this should be fixed in another task to allow parallel CV and bootstrap.