Estimator serialization - before or after fitting - work with `param` #208

PeterDSteinberg · 2017-10-13T17:21:05Z

Elm PR #192 adds elm.mldataset.serialize_mixin, which uses dill to serialize models that have been initialized and/or fit. We also want to provide plain-text (YAML for now) serialization of estimators (and Pipelines) so that we have:

- Some parts of Elm that are usable by non-Python users
- An easier time passing JSON text model specifications between UI and backend in later UI-related milestones
- A mapping to param data structures. param provides input validation and structures inputs for UIs in Bokeh and other tools, as needed in the later UI-related work of Phase I. Feel free to split this param spec work into separate issues/PRs as the work gets done.

TODO:

- Implement to_spec and from_spec methods for each estimator, where to/from spec means returning/reading a text specification of the estimator (YAML format by default).
- For an estimator, a spec consists of a dict with three keys:
  - func: Callable such as elm.pipeline.steps.linear_model.LinearRegression
  - args: List of positional arguments to func
  - kwargs: Keyword arguments to func, such as fit_intercept as in the example, i.e. kwargs that can go to set_params or __init__ of func
- Implement this once for classes that are BaseEstimator-like in inheritance and separately for those that are BaseComposition-like. Most estimators/transformers then share a common base implementation of to/from spec, while special cases like EaSearchCV, Pipeline, and others are handled separately.
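
As a rough illustration of the func/args/kwargs spec described above, here is a minimal sketch. It assumes func is stored as a dotted import path, and it uses JSON from the standard library for the text form so that it runs without extra dependencies; a YAML version would have the same shape. The free functions resolve, to_spec, and from_spec are hypothetical helpers (not Elm's API), and datetime.timedelta is only a stand-in for an estimator class.

```python
import json
from datetime import timedelta
from importlib import import_module


def resolve(dotted_path):
    """Import the object named by a dotted path such as 'pkg.mod.Name'."""
    module_name, _, attr = dotted_path.rpartition('.')
    return getattr(import_module(module_name), attr)


def to_spec(func, args=(), kwargs=None):
    """Serialize a callable plus its arguments as a three-key text spec."""
    spec = {
        'func': '{}.{}'.format(func.__module__, func.__qualname__),
        'args': list(args),
        'kwargs': dict(kwargs or {}),
    }
    return json.dumps(spec, sort_keys=True)


def from_spec(text):
    """Read a text spec and call func(*args, **kwargs)."""
    spec = json.loads(text)
    return resolve(spec['func'])(*spec['args'], **spec['kwargs'])


# Stand-in callable (assumption): an estimator class would be passed the same way
text = to_spec(timedelta, args=[1], kwargs={'hours': 2})
rebuilt = from_spec(text)
```

Storing func as a string keeps the spec readable by non-Python tools; only from_spec needs to import anything.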


PeterDSteinberg · 2017-10-20T17:44:27Z

This is a rough draft idea of spec for a Pipeline or estimator/transformer:

from importlib import import_module


class SpecMixinBaseEstimator:

    _root = 'elm.pipeline.steps.{}'

    @property
    def spec(self):
        # Wrapper classes may point at the wrapped class through _cls
        _cls = getattr(self, '_cls', None)
        if not _cls:
            _cls = self.__class__
        # Second component of the dotted module path, e.g. 'linear_model'
        module = _cls.__module__.split('.')[1]
        return dict(name=_cls.__name__,
                    module=self._root.format(module),
                    params=self.get_params())

    @classmethod
    def from_spec(cls, spec):
        module, name, params = spec['module'], spec['name'], spec['params']
        # import_module returns the leaf module itself, unlike __import__,
        # which returns the top-level package
        return getattr(import_module(module), name)(**params)


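class PipelineSpecMixin(SpecMixinBaseEstimator):

    @property
    def spec(self):
        spec = super(PipelineSpecMixin, self).spec
        # Steps are estimator objects, so store their specs instead and
        # keep only the shallow, non-step params
        params = self.get_params(deep=False)
        params.pop('steps', None)
        spec['params'] = params
        spec['steps'] = [[name, step.spec] for name, step in self.steps]
        return spec

    @classmethod
    def from_spec(cls, spec):
        spec = spec.copy()
        from_spec = super(PipelineSpecMixin, cls).from_spec
        steps = [(name, from_spec(step_spec))
                 for name, step_spec in spec.pop('steps')]
        # Hand the rebuilt steps to the constructor through params
        spec['params'] = dict(spec['params'], steps=steps)
        return from_spec(spec)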

Those mixins would be used on elm.pipeline.steps.*.* classes and Pipeline, respectively.
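
To make the nested shape of a Pipeline spec concrete, here is a small self-contained sketch of the round trip. Everything in it is a stand-in: Step, Scaler, and Regressor are hypothetical classes that only mimic get_params, and the REGISTRY dict replaces the module lookup so the example runs without Elm or scikit-learn installed. JSON is used for the text form here; YAML would carry the same structure.

```python
import json


class Step:
    """Stand-in for an estimator exposing a scikit-learn style get_params."""

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        return dict(self._params)


class Scaler(Step):
    pass


class Regressor(Step):
    pass


# Hypothetical lookup table used instead of importing by module path
REGISTRY = {'Scaler': Scaler, 'Regressor': Regressor}


def step_spec(step):
    """Spec of a single step: class name plus constructor params."""
    return {'name': type(step).__name__, 'params': step.get_params()}


def pipeline_spec(steps):
    """Spec of a pipeline: a list of [label, step spec] pairs."""
    return {'name': 'Pipeline',
            'steps': [[label, step_spec(step)] for label, step in steps]}


def pipeline_from_spec(spec):
    """Rebuild the (label, step) pairs from a pipeline spec."""
    return [(label, REGISTRY[s['name']](**s['params']))
            for label, s in spec['steps']]


steps = [('scale', Scaler(with_mean=True)),
         ('lr', Regressor(fit_intercept=False))]
text = json.dumps(pipeline_spec(steps))  # text form passed to a UI or file
spec = json.loads(text)
rebuilt = pipeline_from_spec(spec)
```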

PeterDSteinberg · 2017-10-20T17:46:26Z

Also I wrote this wiki page about the idea (let's adapt that over time or transition it to main docs for Elm as the work is completed).

PeterDSteinberg assigned gbrener Oct 13, 2017

PeterDSteinberg added this to the Preliminary Web-Based Map User Interface for ELM milestone Oct 13, 2017

PeterDSteinberg mentioned this issue Nov 1, 2017

Martin's thoughts summary intake/intake#1

Closed
