Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

too much magic - what would it take to have an object oriented interface? #193

Closed
rueberger opened this issue Jul 19, 2017 · 28 comments
Closed
Labels

Comments

@rueberger
Copy link
Contributor

Hello sacred authors! Great work - I'm just starting to embrace sacred and am feeling very excited about the potential.

So far I've been finding it just a bit too magical for my tastes though. I think that the magic has it's place in reducing the boilerplate to almost zero for a quick experiment - but I've been finding it to be somewhat of an activation barrier in getting started for someone not familiar with the library.

The issue is that sacred breaks completely out of the standard python programming paradigm, which makes it difficult to reason about code behavior.

Again, I do think that this has it's place for quick scripting - but for production workflows that prioritize extensibility and maintainability over line count, I think an object-oriented interface would be more appropriate.

My reasoning is that:

  • the object hierarchy (which I feel pretty confused about right now) would be explicitly defined in the code, it's clear exactly an experiment is made up of
  • easier to integrate with existing code. I know how classes behave, how inheritance works, how to work them into module structure. It's unclear to me how to effectively use the magic interface to scared within a larger project and not as a one-off thing.
  • much easier to get less engineer-minded lab members to used sacred when they don't have to learn a whole new paradigm. They just have to fill in a template

Any estimates how challenging this would be? Recommendations on where to start?

@nimrand
Copy link

nimrand commented Jul 21, 2017

I think your request would be more clear if you elaborated on what you see as being magical. Aside from captured functions, I don't see much magic in Sacred (though its still quite neat).

When you talk about object-oriented interface and inheritance, I think you may be referring to the fact that Sacred takes the design approach that an experiment is an object, not a class. By that I mean, to create a new experiment, you create a new Experiment object, and then you use it to decorate methods that define how the experiment is configured and run. Code sharing between experiments is achieved through composition of ingredients.

An approach that may be more familiar to some OO programmers is to define a new experiment by deriving a new class from Experiment, and override a run method, or something like that. Code
sharing between experiments is achieved by making abstract base classes derived from Experiment, and then deriving concrete Experiment classes from those.

I'm not sure if that would be a better approach, though. There would be more boilerplate, for one. Also, there is some theoretical argument for Experiments being objects in that in you make Experiments classes, you'd only ever instantiate each one once, so you're basically just defining a bunch of singleton classes. It may be possible to support both, though. I guess we'll see what the Qwlouse has to say.

@Qwlouse
Copy link
Collaborator

Qwlouse commented Jul 21, 2017

Hi @rueberger, hi @nimrand.
Thanks a lot for your feedback on this. I'm very happy to see an issue that talks about larger design choices as opposed to a bug or small feature! The issue of having "too much magic" is one that had me thinking for a long while, and so I'd like to provide a multi-part answer:

  1. Config Scopes: The functions decorated with @ex.config are IMHO the place with the most magic in Sacred: Extracting locals without any obvious mechanism, blocking statements in response to updates, and introducing a seed variable. This is something that I've recently decided, that I'd like to change. I think it would be much better to have a prepare_config function that is injected with a special dict-like config object, and assigns values to that. The config object would support attribute access to its keys and be able to block assignments as needed. Example:

    @ex.config
    def prepare_config(cfg):
        cfg.learning_rate = 0.1
        cfg.network.hidden_size = 100
        cfg.network.nr_layers = 4

    This would only be minimally more effort to write and much more obvious.

  2. Experiments as objects: I agree with @nimrand there, in that I prefer the instance + decorate workflow over inheritance. This helps to support a scripting workflow with minimal boilerplate, and while config injection is something you have to get used to, it is not that magical in my opinion.
    That being said, I'd be happy to support both approaches, but I'm not sure yet how much work this would be. I'll have to think more about that. But this also brings me to my next point:

  3. Ingredients: I'm very unhappy with ingredients. They are meant to provide a convenient means for modularizing an experiment. But thus far they have never worked to my satisfaction. They are too inflexible and break in unexpected ways. And this problem I relate to the concerns/confusion that @rueberger mentioned. So one thing that I would like to do is to completely overhaul how ingredients work. And so far the best way that I could think of was to have them be classes deriving from Ingredient, that would be instantiated inside the config. Something along these lines:

    class Dataset(Ingredient):
        def __init__(self, cfg):
            super().__init__(cfg)
            self.path = '/home/user/datasets`
            self.normalize = False
    
    @ex.config
    def prepare_config(cfg):
        cfg.data = Dataset(cfg)

    This would solve many flexibility issues (e.g. using an ingredient multiple times, or using dynamically different ingredients based on some other config value). I haven't fully thought this through, but it seems doable with only mild magic. This would mean that ingredients have to be implemented as classes. Which I think is ok, given that you probably need to do some engineering anyways when you hit the point that you need ingredients. But it also bothers me to break the symmetry between experiment and ingredient. So that would be an argument for experiments as classes....

Ok. To sum up: I don't know a good solution to these issues yet. But I'd be very interested if these are the same "too much magic" issues that you had in mind, and to hear your thoughts on my suggestions.

@pchalasani
Copy link

I'd like to add some comments here. I've "sacredized" my entire PyTorch experiment workflow (in Jupyter notebooks), but then hit upon 2 problems:

  • as mentioned above, it makes it hard for now folks to understand the code (arguments of captured functions are "magically" filled in by sacred from configs)
  • more importantly, debugging is a pain because the captured functions are not runnable on their own. I don't see why this is not allowed by the framework.

@Qwlouse
Copy link
Collaborator

Qwlouse commented Jul 21, 2017

Hi @pchalasani. Thanks for the feedback! Your second point is important and already on my list of things to fix (see #171). You are right, there is no reason for not allowing this. It is just an oversight.

Considering your first point: I understand, and that is indeed a drawback. So far I wasn't able to come up with a way of getting the benefits of config injection without this hurdle. I really want to avoid having a global config variable, and passing everything around manually also seems like a bad deal.

@trickmeyer
Copy link

trickmeyer commented Aug 9, 2017

Hey everyone! Great work so far.

I can see everyone's sides to this issue, but my style errs more towards that of @rueberger.

That being said, I do think that it's likely possible wrap some light OO around what already exists. A simple example would be:

from sacred import Experiment

ex = Experiment('test')

class SimpleExperiment(object):
    three = 3

    @staticmethod
    @ex.config
    def cfg():
        one = 1
        two =2

    @staticmethod
    @ex.automain
    def run(one, two):
        return one + two + SimpleExperiment.three

I think the way to make this more powerful while minimizing (and potentially reducing or improving) boilerplate is via metaclasses like Django, Luigi, etc.

Here's how Django uses metaclasses to populate a DB with instance values that are later created, similar to Sacred:

from django.db import models

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

The parallel is that the Question class attributes are the config of the experiment. In my opinion, this is much more natural than writing a decorated config function, and also more explicit. Yes, you have to create a class, but all it does is inherit from an interface that makes its role explicit. In terms of lines of code, I think it'll end up being equal or less, but it does swap one form of magic (DI) for another (metaclasses).

I could see an experiment looking something like:

from sacred.oo import ExperimentModel

class Question(ExperimentModel):
    # these will be config defaults
    question_text = "Am I asking a stupid question?"
    pub_date = datetime.date(2017-08-09)

    @staticmethod
    @ex.automain
    def run():
        return '{} was published on {}'.format(self.question_text, self.pub_date)

You could likely get the @staticmethod and @ex.automain decorators into ExperimentModel.run().

Sorry I haven't had time to do all that work, but if you think it's promising and haven't tried the same path only to hit a blocker, let me know and I'll run with it. I really want the same OO aspects as @rueberger for a lot of reasons, so I'll likely give this a go on a fork regardless.

Thanks again for the great work!

@Qwlouse
Copy link
Collaborator

Qwlouse commented Aug 22, 2017

I agree with you (@trickmeyer and @rueberger) that an OO interface would be nice. Let's brainstorm on what that could look like. Like you suggested, I think the basis should be an experiment base class that each individual experiment inherit from. Let's stick with ExperimentModel for now.

The user provided subclass would take the role of the current experiment object instance, by collecting functions, configurations, and some other settings. The main method (or commands in general) would be methods, and configuration entries could be accessible through self.

But departing from your suggesting I would have instantiations of that class take the role of the current run objects, by holding the (modified) configuration and settings, being executable, capturing stdout, and firing events for the observers.

That means config updates should be evaluated during instantiation of the class. That would suggest putting the configuration code into the __init__ method. ExperimentModel could either provide a self.config or change the way __dict__ behaves, such that config updates block assignments.
Putting the config code into the class definition (with or without metaclass support) seems suboptimal, because that code is executed when creating the subclass, not when creating the instance

So to a first approximation, it might look like this:

from sacred.oo import ExperimentModel, main

class MyExperiment(ExperimentModel):
    def __init__(self, command=None, config_updates=None, named_configs=None, options=None):
        super().__init__(command, config_updates, named_configs, options)
        self.single = 1
        self.double = self.one * 2

    @main  # serves as a marker for the entry point
    def mymain(self):
        print("double of {} is {}".format(self.single, self.double))

if __name__ == '__main__':
    run = MyExperiment(config_updates={'single': 5})
    run()  # calls run.mymain() with stdout capturing and event firing

One potential problem with this is that the namespace of this object is used for many different things: the methods of the experiment, the methods of the run, the configuration, and the custom methods. But other than that I quite like it. Integrating this would take quite some effort, but is certainly doable. What do you guys think? Is this what you are looking for?

@rueberger
Copy link
Contributor Author

Thanks all for the great discussion!

@nimrand @Qwlouse in retrospect, yes I think that the config scopes are to me the most magical part of sacred. I was thrown off in the beginning by experiments being objects, but I've grown to appreciate that design choice. I think it makes sense for the current approach of defining an experiment in a single module.

However, my intuition for how to use the experiment object breaks down for larger projects where one would want to keep different components in separate modules. It's this use case that I think an OO interface would be useful for.

In the context of @Qwlouse being unhappy about the design of ingredients, I wonder if it would make sense to use the current interface as a 'light' interface to sacred without support for ingredients, and use the OO interface as the full interface, including ingredients. My thoughts being that ingredients seem to map naturally onto a OO structure.

Aside: recently I've been looking into options for building reproducible data pipelines. Sacred has enabled me to keep the model half of my projects on lockdown, but the data half is still the wild west. pachyderm is the closest thing I've found to what I'm looking for, but seems rather more complicated than is warranted. All I want is a linear pipeline that uses a hash list of configs to enforce reproducibility and lazily builds data. Here's a draft of the base class skeleton. I mention this because it's right up sacred's alley, and if ingredients are getting redesigned anyway... This should probably be its own issue though.

@Qwlouse, that looks good to me. If namespace pollution is an issue, why not store it under self.config or something? Similar to your 1st suggestion in this thread. I wonder if named configs are redundant with an OO interface - could be accomplished by subclassing the primary experiment object and overriding __init__.

Pragmatically, @Qwlouse, would you mind giving a rough overview of what the required changes would be?

@trickmeyer
Copy link

@Qwlouse I think your draft looks good. Let us know if we can contribute to help.

@ahundt
Copy link

ahundt commented Sep 2, 2017

I very much like this idea! Some comments:

  1. Keras Callbacks might be a good source of inspiration for how ingredients and extending sacred could work. The utilities in sanctuary, a keras + sacred integration library, might also be useful for ideas. @bzamecnik perhaps you're interested in commenting?
  2. Command line generation for the new class setup might be done with python-fire.
  3. Xonsh shell has xontrib extensions, events and a history backend which might be another good source of design ideas, that's aside from the possibility of direct xonsh integration I mentioned in xonsh shell #203.

For my perspective on what constitutes "magic", I found two aspects confusing: (1) the magic setting of variables, and (2) the way an Experiment and an Ingredient look basically the same but do different things, #202 and #151 might be illustrative. I also wrote a dataset loader with sacred, when I submitted it as a PR to a large project it was turned down in part due to the sacred API being inconsistent with Pythonic design expectations.

I think making Ingredient into something you subclass and setting things up so you can have multiple Experiment instances at once would be very helpful. I'd also appreciate if things were designed so that it is easy to integrate with an existing dataset download/management, and training code classes which have been implemented before sacred was incorporated.

@Qwlouse in your example I'd really like if the calls could be done in a way that is totally obvious even to a novice who hasn't seen the docs. I think the invisible remapping of the call from run to mymain would be a source of confusion. See my suggestion in the changes in __main__ below:

from sacred.oo import ExperimentModel, main

class MyExperiment(ExperimentModel):
    def __init__(self, command=None, config_updates=None, named_configs=None, options=None):
        super().__init__(command, config_updates, named_configs, options)
        self.single = 1
        self.double = self.one * 2

    @main  # serves as a marker for the entry point
    def mymain(self):
        print("double of {} is {}".format(self.single, self.double))

if __name__ == '__main__':
    # behavior that might be easier to understand and use:
    run = MyExperiment(config_updates={'single': 5})
    run.mymain() # calls run.mymain() with stdout capturing and event firing

    # previous example code
    # run = MyExperiment(config_updates={'single': 5})
    # run()  # calls run.mymain() with stdout capturing and event firing

@rueberger my concern with the preprocessor skeleton is it might not work well for certain use cases, like if you're loading datasets in tensorflow via input tensors.

@leezu
Copy link

leezu commented Oct 4, 2017

I would like to bring the ipython traitlets project to your attention. They handle the "configuration part" using an object oriented approach based on traits. It could be a good project to base sacred on, or at least may be inspiration for the OO interface design.

@Qwlouse
Copy link
Collaborator

Qwlouse commented Nov 21, 2017

@leezu: I like traitlets. They don't quite fit my requirements for the config system, but I am thinking about using them (or something similar) for the --option=value part of sacred.

@rueberger , @trickmeyer , @ahundt : Sorry for being unresponsive recently. Unfortunately my time for sacred development has been a rather limited recently, so I've egoistically concentrated on fixes and features that I needed. But I have thought a lot about refactoring/redesigning sacred, and I hope to share these ideas with you guys soon. If you are still interested, should I maybe create a google-group or some other mailing-list for discussions?

@trickmeyer
Copy link

trickmeyer commented Nov 21, 2017 via email

@ahundt
Copy link

ahundt commented Nov 23, 2017

No problem! That's part of life. I actually prefer discussions within github issues, especially since github emails you like a mailing list anyway and makes it pretty easy to enable/disable subscription on a per-thread basis right on the website.

I'd love to hear your thoughts!

@kirk86
Copy link

kirk86 commented Feb 13, 2019

Hi folks,
regarding the OOP approach and adding capture in classes, I've noticed the following weird things and I was wondering if anyone could shed some light.

MWE:
we have 2 files, src.py where it contains a bunch of computations and main.py which is the entry point to our experiments conducted in src.py.

The src.py file looks like this

import sacred                                                                                                                                                               
                                                                                                                                                                            
sacred_ingredient = sacred.Ingredient('default-params')                                                                                                                     
                                                                                                                                                                            
sacred_ingredient.add_config('./params.yaml')                                                                                                                                                                                                                                                                              
                                                                                                                                                                            
@sacred_ingredient.capture                                                                                                                                                  
class TestClass(object):                                                                                                                                                    
    """Documentation for TestClass                                                                                                                                          
                                                                                                                                                                            
    """                                                                                                                                                                     
    def __init__(self):                                                                                                                                                     
        super().__init__()                                                                                                                                                  
                                                                                                                                                                            
    @sacred_ingredient.capture                                                                                                                                              
    def __call__(self, params):                                                                                                                                             
        return params['batch'] * params['epochs'] + params['devices'] 

Notice that decorating hte class with sacredIngredient.capture doesn't do anything we'll have to decorate each individual method.

As it is the above code works but if I change the call to super from super().__init__() to super(TestClass, self).__init__() then I get the following error:

Traceback (most recent calls WITHOUT Sacred internals):
  File "main.py", line 14, in main
    obj = TestClass()
  File "/home/user/sacredtest/src.py", line 21, in __init__
    super(TestClass, self).__init__()
TypeError: super() argument 1 must be type, not FunctionWrapper

Here's the main entry point of the experiment main.py:

import sacred                                                                                                                                                               
from src import TestClass, sacred_ingredient                                                                                                                                
                                                                                                                                                                            
                                                                                                                                                                            
experiment = sacred.Experiment('some-random-expr',                                                                                                                          
                               ingredients=[sacred_ingredient])                                                                                                                                                                                                                                                             
                                                                                                                                                                            
                                                                                                                                                                            
@experiment.automain                                                                                                                                                        
def main(seed):                                                                                                                                                             
    print(seed)                                                                                                                                                             
    obj = TestClass()                                                                                                                                                       
    result = obj()                                                                                                                                                          
    print(result) 

Also I'd like to point that it seems quite difficult to capture params in the main.py even if I decorate def main() with experiment.capture or sacred_ingredient.capture for that matter. Since I'm not an expert I was wondering if it's possible to achieve that?

@Qwlouse
Copy link
Collaborator

Qwlouse commented Feb 19, 2019

Notice that decorating hte class with sacredIngredient.capture doesn't do anything we'll have to decorate each individual method.

Decorating the class does make its __init__ a captured function. So you could use injected config values in the init method.

As it is the above code works but if I change the call to super from super().__init__() to super(TestClass, self).__init__() then I get the following error:

Oh, dang. This is a gotcha I wasn't aware of. It seems that wrapping the class, precludes it from being used within super. It would be good to fix this, but I do not currently know how.

As for the parameter capture in main.py I believe I have answered your question in #416. Let me know if there are questions left.

@kirk86
Copy link

kirk86 commented Feb 19, 2019

@Qwlouse
Thanks a lot for the clarifications. Those help a lot in figuring things out which are not in the docs.
Cheers ;)

@stale
Copy link

stale bot commented May 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 4, 2019
@JarnoRFB
Copy link
Collaborator

JarnoRFB commented May 8, 2019

unstale

@stale stale bot removed the stale label May 8, 2019
@rueberger
Copy link
Contributor Author

rueberger commented Jun 5, 2019

Lately I've really been straining against the limits of sacred abstractions.

In particular, I've been setting up hyperparameter optimization for my experiment with the excellent ray.tune. It's possible to get the basics running, but not at all trivial due to some difficulties arising from experiments not being serializeable.

You have to do something like this (note that the experiment is not imported until train_func is executed):

def train_func_factory(named_config):
    """ Create train func with experiment bound in its scope
    Workaround to calling main directly, experiments are not pickle-able

    Args:
       named_configs - [str]

    Returns:
      train_func
    """

    def train_func(config_updates, reporter):
        """ Wraps experiment into format expected by ray

        Args:
          config_updates
          report: reporter method passed by ray
        """
        from foo.experiment import foo_experiment

        info = {
            'reporter': reporter,
        }

        foo_experiment.run(
            config_updates=config_updates,
            named_configs=[named_config],
            info=info  # NOTE: see https://github.com/IDSIA/sacred/issues/480
        )

    return train_func

This is just a small annoyance, but would not be necessary with an 'OOP' interface to sacred.

The real problem is that it's currently impossible to implement tune's class-based API with sacred - and you need the class-based API to take full advantage of tune (can't do checkpointing without it).

One reason that it is currently impossible to implement tune's class-based API is that you need to be able to stop/resume experiments.

Another even thornier reason is that we need to be able support config mutating over time to able to use PBT. And branching for experiments too? PBT is a nightmare for the abstraction, but so powerful we should seriously think about how it could conceivably be supported.

I've been using sacred for all of my work for several years now, and I love it to death and am ever grateful to Klauss and the other maintainers. But I often feel encumbered by the current abstraction - heck, it's been almost two years since I opened this issue! And I get the sense that I generally don't use config in the way it's intended (#476)/ take full advantage of its features (never used ingredients).

So when I am faced with such a drastic departure from the current abstraction, as is required for PBT, the temptation to rip out the complexity I don't use in config to prepare sacred for the kind of open heart surgery it would take to support PBT is strong. Because it is just the core engine I'm after, the observers and all the associated goodies that help me maintain reproducibility. Perhaps this would be a good opportunity to consider refactoring the core engine into it's own project and have the config goodies as a layer on top of that?

Looking forward to hearing your thoughts.

@Qwlouse
Copy link
Collaborator

Qwlouse commented Jun 20, 2019

@rueberger : I hear you and fully agree with your assessment. Going forward, Sacred should be modularized much better, such that all the individual features such as config management, commandline interface, observers, etc. can be easily used in isolation and swappend on demand. I love the idea of PBT and figuring out a good way to support resuming and branching would be extremely valuable.

That being said, I currently have to focus on writing up my thesis and unfortunately do not have the bandwidth to tackle these issue. I'd be happy to share my thoughts and support any efforts in that direction, but designing and implementing these are large endeavors, that require more time and effort than I am currently willing to invest.

@gabrieldemarmiesse
Copy link
Collaborator

We're looking for feedback: here is the proposal to have a Config object #623

Comments are welcome!

@flukeskywalker
Copy link

I'm also a big fan of ray tune, so @rueberger's suggestions will be very helpful indeed.

@stale
Copy link

stale bot commented Dec 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 15, 2019
@Roxxx-yyy
Copy link

Magic Johnson

@stale stale bot removed the stale label Dec 16, 2019
@frthjf
Copy link

frthjf commented Jan 1, 2020

Hi @rueberger, I've just released https://github.com/machinable-org/machinable that has a similiar feature list as sacred but uses a fairly different object-oriented API. The configuration is specified in a project-wide configuration file while functionality is implemented in subclasses that can overwrite events like 'on_create' etc. The documentation provides a short overview here. Perhaps that suits your needs?

@frthjf
Copy link

frthjf commented Jan 1, 2020

@flukeskywalker machinable also integrates with Ray and Ray Tune as well, see here and here.

@stale
Copy link

stale bot commented Mar 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added stale and removed stale labels Mar 31, 2020
@stale
Copy link

stale bot commented Jul 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests