Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grand flattening and reorganization of tick #124

Closed
stephanegaiffas opened this issue Oct 26, 2017 · 12 comments
Closed

Grand flattening and reorganization of tick #124

stephanegaiffas opened this issue Oct 26, 2017 · 12 comments
Assignees
Milestone

Comments

@stephanegaiffas
Copy link
Collaborator

stephanegaiffas commented Oct 26, 2017

Grand reorganization of tick

A lot of features have been added and now tick is much more than a library that fits point processes.
The main issue is that the tick.inference module contains a lot of methods that have nothing to do together.
I suggest that we reorganize the things in big families of methods, putting together inference, simulation and models together in such families.

This is a big change, that will pose problems with C++ includes. Several base class must be included in a tick.base module as soon as they are used in several modules. An example is ModelFirstOrder python class, or the Model C++ class, that are parent of almost all models in tick. Such classes will have to be in a tick.base.

> TODO: list of all the things that go in `tick.base` and `tick.*.base` modules.

Here is a first attempt of grand reorganization of tick. I've also renamed a lot of classes, so that inference and simulation classes names match...

> TODO: Discuss and give your opinion about this here

1. tick.linear_model

Inference, simulation and models all belong to a common namespace

  • tick.linear_model.LinearRegression
  • tick.linear_model.LogisticRegression
  • tick.linear_model.PoissonRegression
  • tick.linear_model.SimuLinearRegression
  • tick.linear_model.SimuLogisticRegression
  • tick.linear_model.SimuPoissonRegression
  • tick.linear_model.ModelLeastSquares
  • tick.linear_model.ModelLogistic
  • tick.linear_model.ModelHinge
  • tick.linear_model.ModelSmoothedHinge
  • tick.linear_model.ModelQuadraticHinge
  • tick.linear_model.ModelPoisson

2. tick.survival

  • tick.survival.nelson_aalen
  • tick.survival.kaplan_meier
  • tick.survival.CoxRegression
  • tick.survival.SimuCoxRegression
  • tick.survival.ModelCoxRegPartialLik
  • tick.survival.ModelSCCS

We will put extra things here : Cox regression with the full likelihood, the SCCS model from Maryan, etc.

3. tick.point_process

  • tick.point_process.HawkesExpKern
  • tick.point_process.HawkesSumExpKern
  • tick.point_process.HawkesEM
  • tick.point_process.HawkesADM4
  • tick.point_process.HawkesBasisKernels
  • tick.point_process.HawkesSumGaussians
  • tick.point_process.HawkesConditionalLaw
  • tick.point_process.SimuPoissonProcess
  • tick.point_process.SimuInhomogeneousPoisson
  • tick.point_process.SimuHawkes
  • tick.point_process.SimuHawkesExpKernels
  • tick.point_process.SimuHawkesSumExpKernels
  • tick.point_process.SimuHawkesMulti
  • tick.point_process.HawkesKernelExp
  • tick.point_process.HawkesKernelSumExp
  • tick.point_process.HawkesKernelPowerLaw
  • tick.point_process.HawkesKernelTimeFunc
  • tick.point_process.TimeFunction
  • tick.point_process.ModelHawkesFixedExpKernLogLik
  • tick.point_process.ModelHawkesFixedExpKernLeastSq
  • tick.point_process.ModelHawkesFixedSumExpKernLogLik
  • tick.point_process.ModelHawkesFixedSumExpKernLeastSq

4. tick.robust

Linear methods with a focus towards robust estimation and outliers detection

  • tick.robust.RobustLinearRegression
  • tick.robust.std_mad
  • tick.robust.std_iqr
  • tick.robust.ModelLinRegWithIntercepts
  • tick.robust.ModelHuber
  • tick.robust.ModelModifiedHuber
  • tick.robust.ModelAbsolute
  • tick.robust.ModelEpsilonInsensitive

5. tick.prox

This module contains all the proximal operators available in tick.

  • tick.prox.ProxZero
  • tick.prox.ProxL1
  • tick.prox.ProxL1w
  • tick.prox.ProxElasticNet
  • tick.prox.ProxL2Sq
  • tick.prox.ProxL2
  • tick.prox.ProxMulti
  • tick.prox.ProxNuclear
  • tick.prox.ProxPositive
  • tick.prox.ProxEquality
  • tick.prox.ProxSlope
  • tick.prox.ProxTV
  • tick.prox.ProxBinarsity
  • tick.prox.ProxGroupL1

6. tick.solver

This module contains all the solvers available in tick.

  • tick.solver.GD
  • tick.solver.AGD
  • tick.solver.BFGS
  • tick.solver.GFB
  • tick.solver.SCPG
  • tick.solver.SGD
  • tick.solver.AdaGrad
  • tick.solver.SVRG
  • tick.solver.SAGA
  • tick.solver.SDCA
  • tick.solver.History

7. tick.plot

This remains unchanged

  • tick.plot.plot_history
  • tick.plot.plot_hawkes_kernels
  • tick.plot.plot_hawkes_kernel_norms
  • tick.plot.plot_basis_kernels
  • tick.plot.plot_timefunction
  • tick.plot.plot_point_process
  • tick.plot.stems

But we can move some of them elsewhere ? Since some plots are dedicated only to specific objects (such as Hawkes things...)

8. tick.preprocessing

This remains unchanged

  • tick.preprocessing.FeaturesBinarizer
  • tick.preprocessing.LongitudinalFeaturesProduct
  • tick.preprocessing.LongitudinalFeaturesLagger

9. tick.metrics

This remains unchanged

  • tick.metrics.support_fdp
  • tick.metrics.support_recall

We'll add specific metrics for survival analysis here

10. tick.simulation

About this one : I don't know where to put these, now that all model simulation is moved in other modules.
This could be base_simulation or tools, what do you think ?

  • tick.simulation.features_normal_cov_uniform
  • tick.simulation.features_normal_cov_toeplitz
  • tick.simulation.weights_sparse_exp
  • tick.simulation.weights_sparse_gauss

11. tick.datasets

Remains unchanged

  • tick.dataset.fetch_tick_dataset
  • tick.dataset.fetch_hawkes_bund_data

12. tick.model_base

  • tick.model_base.model
  • tick.model_base.model_first_order
  • tick.model_base.model_generalized_linear
  • tick.model_base.model_generalized_linear_with_intercept
  • tick.model_base.model_labels_features
  • tick.model_base.model_lipschitz
  • tick.model_base.model_self_concordant
  • tick.model_base.model_second_order

Proposal for the reorganization of the lib directory

lib
├── CMakeLists.txt
├── Doxyfile
├── cpp
│   ├── array
│   │   ├── CMakeLists.txt
│   │   └── alloc.cpp
│   ├── array_test
│   │   ├── CMakeLists.txt
│   │   ├── array_test.cpp
│   │   ├── performance_test.cpp
│   │   ├── sbasearray_container.cpp
│   │   ├── typemap_test.cpp
│   │   └── varraycontainer.cpp
│   ├── base
│   │   ├── CMakeLists.txt
│   │   ├── exceptions_test.cpp
│   │   ├── interruption.cpp
│   │   ├── math
│   │   │   ├── normal_distribution.cpp
│   │   │   └── t2exp.cpp
│   │   ├── model
│   │   │   ├── CMakeLists.txt
│   │   │   ├── model_generalized_linear.cpp
│   │   │   ├── model_labels_features.cpp
│   │   │   └── model_lipschitz.cpp
│   │   └── time_func.cpp
│   ├── linear_model
│   │   ├── CMakeLists.txt
│   │   ├── linreg.cpp
│   │   ├── logreg.cpp
│   │   ├── model_hinge.cpp
│   │   ├── model_quadratic_hinge.cpp
│   │   ├── model_smoothed_hinge.cpp
│   │   └── poisreg.cpp
│   ├── hawkes
│   │   ├── CMakeLists.txt
│   │   ├── model
│   │   │   ├── base
│   │   │   │   ├── hawkes_fixed_kern_loglik.cpp
│   │   │   │   ├── hawkes_list.cpp
│   │   │   │   ├── hawkes_model.cpp
│   │   │   │   └── hawkes_single.cpp
│   │   │   └── variants
│   │   │       ├── hawkes_fixed_expkern_leastsq_list.cpp
│   │   │       ├── hawkes_fixed_expkern_loglik_list.cpp
│   │   │       ├── hawkes_fixed_kern_loglik_list.cpp
│   │   │       ├── hawkes_fixed_sumexpkern_leastsq_list.cpp
│   │   │       ├── hawkes_fixed_sumexpkern_loglik_list.cpp
│   │   │       └── hawkes_leastsq_list.cpp
│   │   │   ├── hawkes_utils.cpp
│   │   │   ├── hawkes_fixed_expkern_leastsq.cpp
│   │   │   ├── hawkes_fixed_expkern_loglik.cpp
│   │   │   ├── hawkes_fixed_sumexpkern_leastsq.cpp
│   │   │   ├── hawkes_fixed_sumexpkern_loglik.cpp
│   │   ├── inference
│   │   │   ├── hawkes_adm4.cpp
│   │   │   ├── hawkes_basis_kernels.cpp
│   │   │   ├── hawkes_conditional_law.cpp
│   │   │   ├── hawkes_em.cpp
│   │   │   ├── hawkes_sumgaussians.cpp
│   │   ├── simulation
│   │   │   ├── hawkes_kernels
│   │   │   │   ├── hawkes_kernel.cpp
│   │   │   │   ├── hawkes_kernel_exp.cpp
│   │   │   │   ├── hawkes_kernel_power_law.cpp
│   │   │   │   ├── hawkes_kernel_sum_exp.cpp
│   │   │   │   └── hawkes_kernel_time_func.cpp
│   │   │   ├── hawkes_baselines
│   │   │   │   ├── constant_baseline.cpp
│   │   │   │   └── timefunction_baseline.cpp
│   │   │   ├── hawkes.cpp
│   │   │   ├── inhomogeneous_poisson.cpp
│   │   │   ├── poisson.cpp
│   │   │   ├── pp.cpp
│   ├── preprocessing
│   │   ├── CMakeLists.txt
│   │   ├── longitudinal_features_lagger.cpp
│   │   └── sparse_longitudinal_features_product.cpp
│   ├── prox
│   │   ├── CMakeLists.txt
│   │   ├── prox.cpp
│   │   ├── prox_binarsity.cpp
│   │   ├── prox_elasticnet.cpp
│   │   ├── prox_equality.cpp
│   │   ├── prox_group_l1.cpp
│   │   ├── prox_l1.cpp
│   │   ├── prox_l1w.cpp
│   │   ├── prox_l2.cpp
│   │   ├── prox_l2sq.cpp
│   │   ├── prox_multi.cpp
│   │   ├── prox_positive.cpp
│   │   ├── prox_separable.cpp
│   │   ├── prox_slope.cpp
│   │   ├── prox_sorted_l1.cpp
│   │   ├── prox_tv.cpp
│   │   ├── prox_with_groups.cpp
│   │   └── prox_zero.cpp
│   ├── random
│   │   ├── CMakeLists.txt
│   │   ├── rand.cpp
│   │   └── test_rand.cpp
│   ├── robust
│   │   ├── model_absolute_regression.cpp
│   │   ├── model_epsilon_insensitive.cpp
│   │   ├── model_generalized_linear_with_intercepts.cpp
│   │   ├── model_huber.cpp
│   │   ├── model_linreg_with_intercepts.cpp
│   │   └── model_modified_huber.cpp
│   ├── solver
│   │   ├── CMakeLists.txt
│   │   ├── adagrad.cpp
│   │   ├── saga.cpp
│   │   ├── sdca.cpp
│   │   ├── sgd.cpp
│   │   ├── sto_solver.cpp
│   │   └── svrg.cpp
│   └── survival
│       ├── coxreg_partial_lik.cpp
│       └── sccs.cpp

cpp-test, include and swig follow cpp structure

@stephanegaiffas
Copy link
Collaborator Author

@Mbompr @MaryanMorel @vinther @dekken Don't hesitate to comment about this here... it's kind of a big deal :) We can also put in the loop everyone from the team (Youcef, Marcello, etc...)

@MaryanMorel
Copy link
Member

Isn't that weird to mix models and learners? It might be quite confusing for the end users.

@Mbompr
Copy link
Contributor

Mbompr commented Oct 26, 2017

We also need to have a module with all model abstract classes

12. tick.base_model

  • model
  • model_first_order
  • model_generalized_linear
  • model_generalized_linear_with_intercept
  • model_labels_features
  • model_lipschitz
  • model_self_concordant
  • model_second_order

That would be linked to other model modules in order to avoir circular dependencies.

@stephanegaiffas
Copy link
Collaborator Author

Yes I agree, but what I want to do with this is to show more of what we can do with tick : you can use if you want many more loss for supervised learning than what is done in the current inference things.
We can maybe put the models in a tick.*.model submodule, or something like that... And the fact that there are inference classes and models is very clean from the naming of the class, and in the documentation...

@stephanegaiffas
Copy link
Collaborator Author

stephanegaiffas commented Oct 26, 2017

@Mbompr : yes, you can edit the Issue text including all the base classes :) Same goes for learners...

@MaryanMorel
Copy link
Member

So maybe have a tick.base.{model, learner, etc.}. We don't need to have a flat base as end users are not likely to fool around with base classes.

The difference between a model and a learner is clear to us, but I don't know if it'd be for John Does. Besides, what we call Learner is called model in scikit-learn... Maybe we should rename our models "loss" or something else

@stephanegaiffas
Copy link
Collaborator Author

stephanegaiffas commented Oct 26, 2017

I agree with this. We could use submodules in tick.base.
Although I agree with the use of 'loss' terminology for "linear models", what about Hawkes and other models where there is no "loss" but only a global goodness of fit ?

@PhilipDeegan
Copy link
Member

PhilipDeegan commented Oct 27, 2017

I've done some re-organising which can be viewed here: https://github.com/Dekken/tick/tree/re-org

I've separated the includes and sources, and changed all "#include " paths to be complete relative to the directory ./include

Now there is only a single include directory for all cpp files in all modules (excluding swig cpp creation)

This should make things easier to move around.

Edit: meant to hit "update comment" not "close issue" >_<

@Mbompr
Copy link
Contributor

Mbompr commented Dec 5, 2017

Concerning point_process, I think it is a bit messy to mix inference / simulation / model objects in the c++ parts. Those are very differents objects that don't interract with each other.

@stephanegaiffas
Copy link
Collaborator Author

Yes alright we can reorganize into subfolders, but I'm pretty sure that the problem comes also from naming of files. All model classes should start with model_* , simulation classes with simulation_*

@PhilipDeegan
Copy link
Member

Are there are outstanding items we should consider before closing this?

@stephanegaiffas
Copy link
Collaborator Author

No it's alright, I'm closing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants