Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ConvSCCS model + refactoring of SCCS-related stuff (preprocessing, simulations, etc.) #158

Merged
merged 2 commits into from Mar 22, 2018

Conversation

MaryanMorel
Copy link
Member

Add ConvSCCS model + refactoring of SCCS-related stuff (preprocessing, simulations, etc.) according to the paper

Feature products are not shipped as a learner option for now, but it can be used independently.

Most objects are not picklable for now, this should be fixed in another task to allow parallel CV and bootstrap.

@stephanegaiffas stephanegaiffas changed the title Tick 361 Add ConvSCCS model + refactoring of SCCS-related stuff (preprocessing, simulations, etc.) Jan 31, 2018
@stephanegaiffas
Copy link
Collaborator

@MaryanMorel : good job on this PR. I've had a first quick review, and it seems that there is no example in the documentation, please provide one (on simulations at least, later on a real dataset), so that we can understand how to use the preprocessing tools and the ConvSCCS learner

@MaryanMorel
Copy link
Member Author

Indeed, I haven't thought of putting an example. I'll add the one corresponding to the 'example notebook', it takes some time to run though (~2min). Is it ok for the doc or should I accelerate things up?

@stephanegaiffas
Copy link
Collaborator

We don't care if it takes extra 2 minutes for the doc build. The example must appear in the examples of the doc of tick, not in unittests of course (see the tick/examples folder), and maybe refer to it in the documentation.

@Mbompr
Copy link
Contributor

Mbompr commented Jan 31, 2018

If it is possible to accelerate it might worth it though! When working on the doc it's more convenient if things go fast. But yes, if it cannot be accelerated, it's better to have a slow example rather than no example at all!

@@ -41,6 +41,7 @@ be used for now with a stochastic solver, while we can with :class:`ModelSCCS <t

ModelCoxRegPartialLik
ModelSCCS
ConvSCCS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConvSCCS is a learner right, not a model ? This should go in the 1. Learner section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! But GitHub does not seem to understand the change has been done. Do I need to perform a specific action?

@stephanegaiffas
Copy link
Collaborator

@MaryanMorel : Where are we with this ?

@MaryanMorel
Copy link
Member Author

Ooops, I broke one test, it should be fixed now. I got an example running quite "quickly" (regarding we need a bootstrap do compute confidence intervals).

@MaryanMorel MaryanMorel force-pushed the TICK-361 branch 2 times, most recently from 594b316 to 2444f5f Compare February 7, 2018 16:27
@MaryanMorel
Copy link
Member Author

@stephanegaiffas the build is passing, do you still need to review some code ?

Copy link
Collaborator

@stephanegaiffas stephanegaiffas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly things about :

  • Attribute and option names
  • tests in setters
  • an attributed allowing to get the intensity and the CI for the intensity...
  • Some better explanations in docstring about the attributes. We need to explain precisely how coeffs is organized... I'll help if you want for this and the doc


def __init__(self, n_lags: np.array,
penalized_features: np.array,
strength_tv=None, strength_group_l1=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm annoyed by passing strengths to solvers... we should use C_tv (which corresponds to 1 / strength) and C_group_l1 instead, in a more sciait-like manner... (this is what we do in other learners for linear models for instance...)

`n_lags` time intervals. `n_lags` values must be between 0 and
`n_intervals` - 1.

penalized_features : `numpy.ndarray`, shape=(n_features,), dtype="bool"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what don't we penalize all features by default ? Can't we put a default to this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default is not straightforward, as we only get the number of features during the _prefit

verbose : `bool`, default=False
If `True`, solver verboses history, otherwise nothing is
displayed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print_every and record_every at not documented (use the docstring from solvers :))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

n_coeffs : `int` (read-only)
Total number of coefficients of the model

coeffs : `numpy.ndarray`, shape=(n_coeffs,), dtype="float64" (read-only)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should explain how the coeffs are organised, or output in this learning a 2D matrix (n_features, n_lags) or something like this... We need to explain how the coeffs can be used :)


bootstrap_coeffs : `Bootstrap_CI` (read-only)
Bootstrap coefficients and confidence intervals of the model.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an attribute (using a property) that computes the intensity of each feature or something like this ?

def _construct_solver_obj(step, max_iter, tol, print_every,
record_every, verbose, seed):
# seed cannot be None in SVRG
solver_obj = SVRG(step=step, max_iter=max_iter, tol=tol,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment : # TODO: we might want to use SAGA also later... (might be faster here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


@step.setter
def step(self, value):
self._set('_step_size', value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that it's > 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

self._solver_obj.step = value

@property
def _strengths(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather like a separate C_tv and C_group_l1 with getters and setter, where we check that there are > 0, and where we update private _strenght_tv and _strengh_group_l1 (if zero we put None). This is what we do in linear learners

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this property is meant to be private, i.e. used only internally to avoid having many .set() everywhere, C_tv and C_group_l1 have public getters and setters of their own. I can remove it if you find it ugly


@n_lags.setter
def n_lags(self, value):
offsets = [0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must check here that it's >= 0 and <= n_intervals - 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for >= 0, but we only get to know n_intervals during _prefit. This model does not fit very well with sklearn fit / predict framework

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally put the check on n_lags <= n_intervals - 1 in _prefit rather than the setter

Coefficients of the model.

bootstrap_coeffs : `Bootstrap_CI` (read-only)
Bootstrap coefficients and confidence intervals of the model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a Reference section with the paper describing the method

@Mbompr Mbompr mentioned this pull request Feb 13, 2018
@MaryanMorel MaryanMorel force-pushed the TICK-361 branch 3 times, most recently from 6f06f43 to a3d2c98 Compare February 22, 2018 15:51
@MaryanMorel
Copy link
Member Author

MaryanMorel commented Mar 12, 2018

@stephanegaiffas Are you ok to merge? (after rebasing)

@MaryanMorel
Copy link
Member Author

@stephanegaiffas the branch has been rebased, ok to merge?

@stephanegaiffas
Copy link
Collaborator

Done !

@MaryanMorel
Copy link
Member Author

Thanks !

@MaryanMorel MaryanMorel merged commit 088393b into master Mar 22, 2018
@Mbompr Mbompr deleted the TICK-361 branch September 28, 2018 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants