Regressor generalization #101

Merged
merged 9 commits into from
Jun 14, 2016
Conversation

dwysocki
Member

This resolves issue #90, and the regressor-specific portion of #91.

It removes the hand-chosen regressor objects, such as --regressor LassoCV, and replaces them with the ability to specify any regressor. So the above call would need to be more verbose: --regressor sklearn.linear_model.LassoCV, but in exchange for some verbosity, we gain complete generality. This is implemented with plotypus.utils.import_name.

This also removes the --max-iter and --regularization-cv options, which were passed to specific regressors, and instead implements a catch-all --regressor-options KEY VALUE ... KEY VALUE option. This allows the user to provide any option to their regressor, provided VALUE is a simple Python literal (parsable by the standard library function ast.literal_eval) or a string. Value parsing is implemented with plotypus.utils.literal_eval_str, and conversion to a dict is done with plotypus.utils.strlist_to_dict.
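A minimal sketch of how such KEY VALUE parsing might work (the actual plotypus helpers may differ in details):

```python
import ast

def literal_eval_str(value):
    """Parse a string as a Python literal, falling back to the raw string."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value

def strlist_to_dict(pairs):
    """Convert a flat [key1, val1, key2, val2, ...] list into a dict."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected an even number of KEY VALUE arguments")
    return {k: literal_eval_str(v) for k, v in zip(pairs[0::2], pairs[1::2])}

# e.g. from: --regressor-options max_iter 1000 fit_intercept False
options = strlist_to_dict(["max_iter", "1000", "fit_intercept", "False"])
```

Here `"1000"` and `"False"` parse as the literals `1000` and `False`, while a value like `"lasso"` would fall back to a plain string.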

Because of these changes, we could no longer assume that fit_intercept=False on the regressors. In fact, it is most likely that fit_intercept=True, as that is the default for every scikit-learn regressor that I know of. For this reason, I have removed the ones column from preprocessing.Fourier.design_matrix, and get_lightcurve now takes the A_0 term from the regressor's intercept_ attribute.
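To illustrate the idea (using a made-up one-harmonic signal, not plotypus code): with no ones column in the design matrix, the mean magnitude A_0 comes out of the regressor's fitted intercept instead of a coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# design matrix with no ones column: just sin/cos of one harmonic
phase = np.linspace(0, 1, 50, endpoint=False)
X = np.column_stack([np.sin(2 * np.pi * phase), np.cos(2 * np.pi * phase)])
y = 10.0 + 0.5 * np.sin(2 * np.pi * phase) + 0.2 * np.cos(2 * np.pi * phase)

reg = LinearRegression()  # fit_intercept=True is the sklearn default
reg.fit(X, y)
A_0 = reg.intercept_      # the A_0 term, recovered from intercept_
coeffs = reg.coef_        # the Fourier coefficients
```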

There is a lot of coupling when it comes to phased/unphased data, so I
had to make a lot of changes in this one commit.

The main point of this commit was to deal with temporal-space data
instead of phase-space data wherever possible. This was especially
important in the `Fourier` class, to prepare for multiperiodic signal
support. Now the period needs to be provided to `Fourier` before
`Fourier.fit` or `Fourier.transform` is called. To allow for setting the
`period` parameter properly when `Fourier` is nested within another
estimator, I had to make `Fourier` a subclass of scikit-learn's
`BaseEstimator`, which we should have done long ago.
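The pattern looks roughly like this (a sketch under assumed parameter names, not the actual plotypus implementation): subclassing `BaseEstimator` gives `Fourier` working `get_params`/`set_params`, which is what lets the `period` be set from outside even when it is nested inside another estimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class Fourier(BaseEstimator, TransformerMixin):
    """Sketch of a Fourier feature transformer: BaseEstimator supplies
    get_params/set_params based on the __init__ signature."""

    def __init__(self, degree=3, period=None):
        self.degree = degree
        self.period = period

    def fit(self, X, y=None):
        if self.period is None:
            raise ValueError("period must be set before fitting")
        return self

    def transform(self, X):
        # times -> sin/cos features for each harmonic; no ones column
        t = np.asarray(X, dtype=float).ravel()
        omega = 2 * np.pi / self.period
        k = np.arange(1, self.degree + 1)
        angles = np.outer(t, k * omega)
        return np.hstack([np.sin(angles), np.cos(angles)])

fourier = Fourier(degree=2)
fourier.set_params(period=1.0)   # works because of BaseEstimator
features = fourier.fit([0.0, 0.25, 0.5]).transform([0.0, 0.25, 0.5])
```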

Most of the changes in `get_lightcurve` deal with replacing `phase` with
`time`. There are also some changes which deal with providing the
correct data when dealing with masked arrays. There is also a new piece
of code which passes the period into the `Fourier` object, which
required a little bit of hackery (see the comments before the call to
`predictor.set_params`).
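The `set_params` call relies on scikit-learn's double-underscore routing for nested estimators. A small demonstration (using `PolynomialFeatures` as a stand-in for the Fourier transformer; the pattern is identical because both subclass `BaseEstimator`):

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# PolynomialFeatures stands in for plotypus's Fourier transformer here
predictor = Pipeline([("fourier", PolynomialFeatures()),
                      ("regressor", LinearRegression())])

# "<step name>__<parameter>" routes the value to the nested estimator,
# e.g. this would be fourier__period in the real pipeline
predictor.set_params(fourier__degree=5)
```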

The `rephase` function in `plotypus.periodogram` was also modified, such
that masked values are still rephased.
Instead of multiplying each index `i` by `2 * pi`, and then dividing
every element of the matrix by `period`, I instead define `omega = 2 *
pi / period` (as we usually do), and multiply each index `i` by that.
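A sketch of the rephasing idea (assumed signature, not the exact plotypus function): operating on the underlying `.data` array ensures masked values are rephased too, while the mask itself is preserved.

```python
import numpy as np

def rephase(data, period, col=0):
    """Fold the time column of `data` into phases in [0, 1).
    Writing through .data rephases masked values as well."""
    rephased = np.ma.array(data, copy=True)
    rephased.data[:, col] = (rephased.data[:, col] / period) % 1.0
    return rephased

data = np.ma.array([[0.3, 10.1], [1.8, 10.4]],
                   mask=[[False, False], [True, False]])
folded = rephase(data, period=1.5)
```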
Added a new command line option, `--regressor-options key1 val1 ... keyN valN`, which allows the
user to pass arbitrary options to the regressor. The old regressor options (such as `--max-iter`)
are now ignored and will be removed soon. For now, `fit_intercept False` must be passed to retain
the previous behavior, though this will change soon.
All CLI regressor options are now encompassed by the
`--regressor-options` option. The old options have been removed
completely.

Also, a help string has been added to `--regressor-options`.
Now the Fourier design matrix no longer provides the intercept column,
and relies on the regressor object to fit the intercept separately. This
is already the default behavior of all sklearn regressors (that I'm
aware of), and so this simply saves the user from needing to provide
`fit_intercept=False` every time.
Since it takes a series of key-value pairs, I made the metavar "KEY
VALUE". This makes it look like there are two metavars (desired) when
it's really just one with a space in between.
Instead of having the `--regressor` option take one of a few hand-chosen
values, it now takes the name of _any arbitrary regressor object_, such
as the default, `--regressor sklearn.linear_model.LassoLarsIC`. This
requires a little more user-knowledge, but opens a world of
possibilities.

This is accomplished through the new `plotypus.utils.import_name`
function, which imports an object from its full name and returns it,
without affecting the enclosing namespace.
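A minimal sketch of such a function (the real `plotypus.utils.import_name` may differ):

```python
import importlib

def import_name(name):
    """Import an object (e.g. 'sklearn.linear_model.LassoLarsIC') from its
    fully qualified name and return it, without binding anything in the
    caller's namespace."""
    module_name, _, attr = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

sqrt = import_name("math.sqrt")
```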
@dwysocki dwysocki merged commit b63d61f into develop Jun 14, 2016
@dwysocki dwysocki deleted the regressor-generalization branch June 14, 2016 22:47