Estimator serialization - before or after fitting - work with param #208

Open
6 tasks
PeterDSteinberg opened this issue Oct 13, 2017 · 2 comments

@PeterDSteinberg
Contributor

Elm PR #192 adds elm.mldataset.serialize_mixin for serializing models with dill once they have been initialized and/or fit. We also want to provide a means of plain-text (YAML for now) serialization of estimators (and Pipelines) so that we have:

  • Some parts of Elm that are usable by non-Python users
  • An easier way to pass JSON text model specifications between the UI and the backend in later UI-related milestones
  • A mapping to param data structures. param allows input validation and structuring of inputs for UIs in Bokeh and other tools, as needed in the later UI-related work of Phase I. Feel free to split this param spec work into separate issues/PRs as the work gets done.

TODO:

  • Implement to_spec and from_spec methods for each estimator, where to_spec returns, and from_spec reads, a text specification of the estimator (YAML format by default)
  • For an estimator, a spec is a dict with three keys:
    • func: a callable such as elm.pipeline.steps.linear_model.LinearRegression
    • args: a list of positional arguments to func
    • kwargs: keyword arguments to func, such as fit_intercept in the LinearRegression example; these are kwargs that can be passed to set_params or to the __init__ of func
  • Implement this generically for BaseEstimator-like classes and separately for BaseComposition-like classes, i.e. most estimators/transformers share a common base implementation of to/from spec, while special cases such as EaSearchCV, Pipeline, and others are handled separately.

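A minimal sketch of the func/args/kwargs layout from the TODO list, assuming estimators expose scikit-learn's get_params(deep=False) and are defined at module top level so their dotted path can be re-imported. The helper names (resolve, to_spec, from_spec, spec_to_text, spec_from_text) and the ToyRegression class are hypothetical, not existing elm API. JSON stands in for YAML here only to keep the example stdlib-only; datetime.timedelta is used to exercise from_spec because it is importable everywhere.

```python
import importlib
import json


def resolve(path):
    """Import a dotted path such as 'pkg.module.Name' and return Name."""
    module_name, _, attr = path.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


def to_spec(estimator, args=()):
    """Build a {func, args, kwargs} spec for a BaseEstimator-like object.

    Assumes get_params(deep=False) returns the __init__ keyword arguments,
    as it does for scikit-learn estimators.
    """
    cls = type(estimator)
    return {
        'func': '{}.{}'.format(cls.__module__, cls.__name__),
        'args': list(args),
        'kwargs': estimator.get_params(deep=False),
    }


def from_spec(spec):
    """Re-create an object by calling spec['func'](*args, **kwargs)."""
    func = resolve(spec['func'])
    return func(*spec.get('args', []), **spec.get('kwargs', {}))


def spec_to_text(spec):
    # JSON keeps this stdlib-only; the issue proposes YAML as the default
    return json.dumps(spec, sort_keys=True)


def spec_from_text(text):
    return json.loads(text)


class ToyRegression:
    """Hypothetical stand-in for an estimator such as LinearRegression."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def get_params(self, deep=False):
        return {'fit_intercept': self.fit_intercept}


# Usage: round-trip a spec through plain text, then rebuild a callable
spec = to_spec(ToyRegression(fit_intercept=False))
text = spec_to_text(spec)        # text that can cross the UI/backend boundary
restored = spec_from_text(text)  # equal to spec
delta = from_spec({'func': 'datetime.timedelta', 'args': [1],
                   'kwargs': {'hours': 2}})
```

Because kwargs are exactly what set_params or __init__ accept, from_spec needs no per-estimator logic; only BaseComposition-like classes (Pipeline, EaSearchCV) need special handling for their nested estimators.
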
@PeterDSteinberg
Contributor Author

This is a rough draft of spec mixins for a Pipeline and for an estimator/transformer:

import importlib


class SpecMixinBaseEstimator:

    # Package where elm's wrappers of scikit-learn estimators live
    _root = 'elm.pipeline.steps.{}'

    @property
    def spec(self):
        # Prefer the wrapped scikit-learn class (_cls) if there is one
        _cls = getattr(self, '_cls', None) or self.__class__
        # e.g. 'sklearn.linear_model.base' -> 'linear_model'
        module = _cls.__module__.split('.')[1]
        # deep=False keeps only __init__ arguments (no 'step__param' keys)
        return dict(name=_cls.__name__,
                    module=self._root.format(module),
                    params=self.get_params(deep=False))

    @classmethod
    def from_spec(cls, spec):
        # Import e.g. 'elm.pipeline.steps.linear_model', then look up the class
        module = importlib.import_module(spec['module'])
        return getattr(module, spec['name'])(**spec['params'])


class PipelineSpecMixin(SpecMixinBaseEstimator):

    @property
    def spec(self):
        spec = super(PipelineSpecMixin, self).spec
        # 'steps' holds estimator objects; replace it with nested specs
        spec['params'] = {k: v for k, v in spec['params'].items()
                          if k != 'steps'}
        spec['steps'] = [[name, step.spec] for name, step in self.steps]
        return spec

    @classmethod
    def from_spec(cls, spec):
        spec = dict(spec)
        from_spec = super(PipelineSpecMixin, cls).from_spec
        # Rebuild each step, then pass the steps back in as a Pipeline param
        steps = [(name, from_spec(step_spec))
                 for name, step_spec in spec.pop('steps')]
        spec['params'] = dict(spec['params'], steps=steps)
        return from_spec(spec)

Those mixins would be used on elm.pipeline.steps.*.* classes and Pipeline, respectively.
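The nested structure those mixins produce for a Pipeline can be sketched as plain data. In this hypothetical example the module paths are what the draft's `_root` pattern ('elm.pipeline.steps.{}') would generate and are assumptions, not confirmed importable elm modules; params are abbreviated, since real get_params() output has more keys. PIPELINE_SPEC, referenced_classes, and spec_to_text are illustrative names only.

```python
import json

# Hypothetical spec for a two-step Pipeline (paths and params are assumptions)
PIPELINE_SPEC = {
    'name': 'Pipeline',
    'module': 'elm.pipeline.steps.pipeline',
    'params': {'memory': None},
    'steps': [
        ['scaler', {'name': 'StandardScaler',
                    'module': 'elm.pipeline.steps.preprocessing',
                    'params': {'with_mean': True, 'with_std': True}}],
        ['reg', {'name': 'LinearRegression',
                 'module': 'elm.pipeline.steps.linear_model',
                 'params': {'fit_intercept': True}}],
    ],
}


def referenced_classes(spec):
    """Yield (module, name) for every class from_spec would need to import."""
    yield spec['module'], spec['name']
    for _step_name, step_spec in spec.get('steps', []):
        # Recurse so that a Pipeline nested as a step is covered too
        yield from referenced_classes(step_spec)


def spec_to_text(spec):
    # Stdlib stand-in for the YAML dump proposed in the issue
    return json.dumps(spec, indent=2, sort_keys=True)
```

Storing each step as a [name, spec] list rather than a tuple means the spec survives a JSON round trip unchanged, since json.loads always returns lists; walking referenced_classes before calling from_spec is one way to check that every module in a received spec is importable.
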

@PeterDSteinberg
Contributor Author

I also wrote a wiki page about the idea (let's adapt it over time, or move it into the main Elm docs as the work is completed).
