WIP: Error messages #348

thequilo · 2018-08-23T04:59:09Z

This is a WIP pull request that addresses the error messages as described in #239. I finally got time to prepare a pull request for this.

The code is currently very ugly and should be seen more as a proof of concept that these things can work.

Below are lists of the things that are already done and those that still need to be done.
Any suggestions for further improvement of the error messages (or the code, since some parts of it are not well structured and untested) are welcome.

Done

Use iterate_ingredients for gathering commands and named configs

This causes gather_commands and gather_named_configs to raise a CircularDependencyError instead of a RecursionError, which makes it much clearer what is causing the error.
In addition, any future gather_something function only needs to override one method: the error handling and the path filtering for experiments are both done in iterate_ingredients.

Track Ingredients that cause circular dependencies

The CircularDependencyError is caught in iterate_ingredients and the current ingredient is added to a list CircularDependencyError.__ingredients__ to keep track of which ingredients caused the circular dependency.

An example error:

Traceback (most recent call last):
  File "error_messages.py", line 24, in <module>
    @ex.automain
  File ".../sacred/experiment.py", line 141, in automain
    self.run_commandline()
  File ".../sacred/experiment.py", line 248, in run_commandline
    short_usage, usage, internal_usage = self.get_usage()
  File ".../sacred/experiment.py", line 173, in get_usage
    commands = OrderedDict(self.gather_commands())
  File ".../sacred/experiment.py", line 394, in _gather
    for ingredient, _ in self.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 357, in traverse_ingredients
    raise CircularDependencyError(ingredients=[self])
sacred.exception.CircularDependencyError: ing->ing2->ing
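
For readers who want to see the tracking idea in isolation, here is a minimal sketch. It is not the code from this PR: the classes are simplified stand-ins, and the `_is_traversing` flag and the `ingredients` attribute are assumptions made for the illustration.

```python
class CircularDependencyError(Exception):
    """Simplified stand-in for the exception discussed above."""

    def __init__(self, ingredients=None):
        super().__init__()
        # Filled innermost-first while the error travels up the call chain.
        self.ingredients = list(ingredients or [])

    def __str__(self):
        # Reverse so the chain reads from the outermost ingredient inwards.
        return "->".join(i.path for i in reversed(self.ingredients))


class Ingredient:
    """Simplified stand-in; `_is_traversing` is an assumed cycle guard."""

    def __init__(self, path):
        self.path = path
        self.ingredients = []
        self._is_traversing = False

    def traverse_ingredients(self):
        if self._is_traversing:
            # We reached ourselves again: start the chain with this ingredient.
            raise CircularDependencyError(ingredients=[self])
        self._is_traversing = True
        try:
            yield self, 0
            for ingredient in self.ingredients:
                for ingred, depth in ingredient.traverse_ingredients():
                    yield ingred, depth + 1
        except CircularDependencyError as e:
            # Record this ingredient on the way up, then keep propagating.
            e.ingredients.append(self)
            raise
        finally:
            self._is_traversing = False
```

With two ingredients that depend on each other, exhausting the generator raises the error and its message lists the ingredients on the cycle.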

Track sources of configuration entries

This code is still very ugly, but it makes it possible to track the sources of configuration values.
This works at different resolutions:

for a ConfigScope, we can find the wrapped function and get the place of definition of this function (file + line of the signature line)
for a configuration file we can find the file that defines the configuration values. It would be very difficult to get the line of the config value inside of the file.
for a dict config, we can use inspect.stack to find the line in which the dict configuration value was added.
for configuration defined in the command line, we can say that it was defined in the command line options
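
As an illustration of the dict config case only, the following sketch records the caller's file and line with inspect.stack. The function `add_config` and the two dictionaries are hypothetical names for this example and are not part of sacred.

```python
import inspect
import os


def add_config(store, sources, **entries):
    """Store config entries and remember where the call came from.

    Hypothetical helper: `store` maps keys to values, `sources` maps keys
    to a "file:line" string describing the calling line.
    """
    # stack()[0] is this function; stack()[1] is whoever called add_config.
    caller = inspect.stack()[1]
    location = "{}:{}".format(os.path.basename(caller.filename), caller.lineno)
    for key, value in entries.items():
        store[key] = value
        sources[key] = location


config, config_sources = {}, {}
add_config(config, config_sources, a=234)
```

After the call, `config_sources["a"]` holds the file name and line number of the `add_config(...)` call, which is the kind of information shown in the "defined in" lines of the error messages.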

See the InvalidConfigError for examples.

Add a base class `SacredError` for future exceptions that is pretty-printed in `experiment.run_commandline`

The init definition looks like this:

def __init__(self, *args, print_traceback=True,
                 filter_traceback=None, print_usage=False):
    # ...

It provides the following additional arguments (that are handled in experiment.run_commandline):

print_traceback: if True, traceback is printed according to filter_traceback. If False, no traceback is printed (except for the Exception itself)
filter_traceback: If True, the traceback is filtered (WITHOUT sacred internals), if False, it is not filtered and if None, it falls back to the previous behaviour (filter if not raised within sacred)
print_usage: The short usage is printed when this is set to True.
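
To make the interplay of the three flags concrete, here is a small sketch. The constructor mirrors the signature above; `plan_output` is a hypothetical helper invented for this example and does not reproduce what experiment.run_commandline actually does.

```python
class SacredError(Exception):
    """Sketch of the base class with only the three flags described above."""

    def __init__(self, *args, print_traceback=True,
                 filter_traceback=None, print_usage=False):
        super().__init__(*args)
        self.print_traceback = print_traceback
        self.filter_traceback = filter_traceback
        self.print_usage = print_usage


def plan_output(error, raised_within_sacred):
    """Hypothetical helper: decide what an error handler should print.

    Returns a tuple (show_traceback, filter_internals, show_usage).
    """
    if not error.print_traceback:
        return False, False, error.print_usage
    filter_internals = error.filter_traceback
    if filter_internals is None:
        # Previous behaviour: filter unless the error was raised within sacred.
        filter_internals = not raised_within_sacred
    return True, filter_internals, error.print_usage
```

Keeping the decision in one place like this would let each exception subclass pick its defaults simply by passing different keyword arguments to the base constructor.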

Add an `InvalidConfigError` that can be raised in user code

Added an InvalidConfigError that prints the conflicting configuration values.

Example:

ex = Experiment()

@ex.config
def config():
    config1 = 123
    config2 = dict(a=234)

@ex.automain
def main(config1, config2):
    if type(config1) != type(config2['a']):
        raise InvalidConfigError('Must have same type', conflicting_configs=('config1', 'config2.a'))

$ python error_messages.py with config1=abcde

WARNING - root - Changed type of config entry "config1" from int to str
WARNING - error_messages - No observers have been added to this run
INFO - error_messages - Running command 'main'
INFO - error_messages - Started
ERROR - error_messages - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File ".../wrapt/wrappers.py", line 523, in __call__
    args, kwargs)
  File "error_messages.py", line 27, in main
    raise InvalidConfigError('Must have same type', conflicting_configs=('config1', 'config2.a'))
sacred.exception.InvalidConfigError: Must have same type
Conflicting configuration values:
  config1=abcde
    defined in command line config "config1=abcde"
  config2.a=234
    defined in "error_messages.py:20"

MissingConfigError

Prints missing configuration values. Prints the filtered stack trace by default, so that the function call that is missing values can be found.
It also prints the name of the ingredient that captured the function and the file in which the captured function is defined.

Example error:

Traceback (most recent calls WITHOUT Sacred internals):
  File ".../wrapt/wrappers.py", line 523, in __call__
    args, kwargs)
sacred.exception.MissingConfigError: main is missing value(s) for ['config3']
Function that caused the exception: <function main at 0x0F7A0780> captured by the experiment "error_messages" at "error_messages.py:24"

NamedConfigNotFoundError

Raise a NamedConfigNotFoundError instead of KeyError, and don't print traceback.

TODO

print list of available named configs
give suggestion based on levenshtein distance

ConfigAddedError

Raise a ConfigAddedError when a config value is added that is not used anywhere. This is a subclass of ConfigError and prints the source where the new configuration value is defined:

Traceback (most recent call last):
sacred.utils.ConfigAddedError: Added new config entry "unused" that is not used anywhere
Conflicting configuration values:
  unused=3
    defined in command line config "unused=3"
Did you mean "config1" instead of "unused"

TODO

print suggestions based on levenshtein distance

TODO

print suggestions for ConfigAddedError
(colored exception output?)
make source tracking optional in SETTINGS
improve resolution of source tracking (line of config file, line in a config scope maybe using inspect.stack)
CommandNotFoundError (?)
Error when parameter is not present for config scope
tests

- shortens code
- raises CircularDependencyError instead of RecursionError when there is a circular dependency

…essages

thequilo · 2018-08-23T06:59:16Z

Oh, I just noticed that I used some 3.6 syntax. That must be removed.

Qwlouse · 2018-08-30T08:50:17Z

Hey! This looks amazing. Thanks a lot for all the effort!
I didn't have the time yet to properly go through the code, but I'll try to do so over the weekend.

For the "did you mean" suggestions: difflib is part of the python standard library and provides a function to get close matches.
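
A minimal sketch of how that could look, using difflib.get_close_matches from the standard library. The wrapper `suggest` and its message format are made up for this example.

```python
import difflib


def suggest(name, candidates, cutoff=0.6):
    """Return a "Did you mean" hint for `name`, or None if nothing is close.

    Hypothetical helper; `cutoff` is passed through to difflib unchanged.
    """
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    return 'Did you mean "{}" instead of "{}"?'.format(matches[0], name)


hint = suggest("confg1", ["config1", "config2", "seed"])
```

Note that get_close_matches ranks by a similarity ratio rather than by Levenshtein distance, so the suggestions may differ slightly from a distance-based approach.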

Qwlouse · 2018-09-16T16:38:05Z

sacred/experiment.py

-        for ingred in self.ingredients:
-            for cmd_name, cmd in ingred.gather_commands():
-                yield cmd_name, cmd
+    def _gather(self, func):


I like the idea of a generic gather function a lot. This removes some of the code duplication and as you said, it raises the more descriptive CircularDependencyError.

Some minor aesthetics: the usage of this function with the inline lambda expressions is a bit unwieldy. How about using it as a decorator? Something like this:

@self._gather_from_ingredients
@staticmethod
def gather_commands(ingredient):
    for command_name, command in ingredient.commands.items():
        yield join_paths(ingredient.path, command_name), command

Not entirely sure if that works, because of the interaction with @staticmethod, but it feels cleaner and more readable. Alternatively having a separate get_commands method that is used inside the gather_commands method instead of the lambda expression could be good too.

I like the idea of the decorator. self is unfortunately not defined outside of the function. We can define a decorating function outside of the class body:

def gather_from_ingredients(f):
    def wrapped(self):
        for ingredient, _ in self.traverse_ingredients():
            for item in f(ingredient):
                yield item
    return wrapped

class Ingredient(object):
    # ...
    @gather_from_ingredients
    def gather_commands(ingredient):
        for command_name, command in ingredient.commands.items():
            yield join_paths(ingredient.path, command_name), command

Qwlouse · 2018-09-16T16:41:12Z

sacred/experiment.py

+        for ingredient, _ in self.traverse_ingredients():
+            for name, item in func(ingredient):
+                if ingredient == self:
+                    name = name[len(self.path) + 1:]


Not sure I understand: what is the purpose of this if clause?

The command names of the experiment itself should not be prefixed with the experiment name. traverse_ingredients returns all ingredients and the gathering function returns the full names. I kept the previous behavior by stripping the experiment's own path prefix from the names of its own commands.
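
A toy version of the prefix handling, to show the effect of the if clause. The classes and the `join_paths` stand-in are simplified assumptions for this example and do not match sacred's real implementation.

```python
def join_paths(*parts):
    """Join non-empty parts with dots (stand-in for the helper used above)."""
    return ".".join(p for p in parts if p)


class Ingredient:
    """Toy ingredient with a path, command names and sub-ingredients."""

    def __init__(self, path, commands=(), ingredients=()):
        self.path = path
        self.commands = list(commands)
        self.ingredients = list(ingredients)

    def traverse_ingredients(self):
        yield self, 0
        for ingredient in self.ingredients:
            for ingred, depth in ingredient.traverse_ingredients():
                yield ingred, depth + 1

    def gather_commands(self):
        for ingredient, _ in self.traverse_ingredients():
            for command_name in ingredient.commands:
                name = join_paths(ingredient.path, command_name)
                if ingredient is self:
                    # Drop "<own path>." so own commands stay unprefixed.
                    name = name[len(self.path) + 1:]
                yield name


ex = Ingredient("ex", ["main"], [Ingredient("ing", ["cmd"])])
names = list(ex.gather_commands())
```

Here the experiment's own command is gathered as "main" while the command of the sub-ingredient keeps its full name "ing.cmd".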

Qwlouse · 2018-09-16T16:55:44Z

sacred/ingredient.py

+            for ingredient in self.ingredients:
+                for ingred, depth in ingredient.traverse_ingredients():
+                    yield ingred, depth + 1
+        except CircularDependencyError as e:


You use this pattern a lot. Basically:

try:
    # stuff
except SomeCustomError as e:
    # special handling (i.e. adding information) of that error

I think it might be nicer to provide a context manager to provide that special handling without cluttering the code too much. It could even be part of the exception itself. Something like:

with CircularDependencyError.track(self):  # not sure about the name
    # stuff

Shouldn't be hard to implement and would improve readability IMHO.

I already thought about doing this. This should be easy to implement and improve readability a lot.
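
One possible shape for such a context manager, as a sketch under the assumption that the exception keeps its ingredients in a plain list attribute. The name `track` is taken from the suggestion above; everything else is illustrative.

```python
from contextlib import contextmanager


class CircularDependencyError(Exception):
    """Simplified stand-in that only stores the ingredients on the cycle."""

    def __init__(self, ingredients=None):
        super().__init__()
        self.ingredients = list(ingredients or [])

    @classmethod
    @contextmanager
    def track(cls, ingredient):
        """Append `ingredient` to any error of this class passing through."""
        try:
            yield
        except cls as e:
            e.ingredients.append(ingredient)
            raise


def demo():
    """Raise inside two nested track blocks and return what was recorded."""
    try:
        with CircularDependencyError.track("outer"):
            with CircularDependencyError.track("inner"):
                raise CircularDependencyError(ingredients=["start"])
    except CircularDependencyError as e:
        return e.ingredients
```

The calling code then only needs the `with` line, and the bookkeeping lives next to the exception it belongs to.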

Qwlouse · 2018-09-16T16:57:05Z

sacred/exception.py

@@ -0,0 +1,223 @@
+import inspect


Having a separate exception.py file is a good idea. But I think you forgot to remove the Exceptions from utils.py ;-)

Oh, I didn't want to push `exception.py`. At the moment, the exceptions depend on `utils.py`, and moving the exceptions without an import in `utils.py` would be incompatible with the current version.

Qwlouse · 2018-09-16T16:58:52Z

sacred/exception.py

+
+from sacred.config.config_sources import ConfigSource
+
+if colored_exception_output:


This variable is not defined anywhere. You import it from utils later, but also there it is not defined.
Also, you probably should import BLUE, GREEN, etc. in either case. ;-)

Should this maybe be in settings? Maybe alongside the colors? Or did you plan on auto-detecting if the console supports color? In that case it might be worth it to have a separate colored_output file that takes care of that logic and defines the colors. Aren't there some good libraries to handle this?
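
For the auto-detection route, a rough sketch of what a small colored_output helper might contain. This is only a heuristic based on isatty; the function names are invented for the example, and some terminals need more than this check.

```python
import sys

# ANSI escape sequences; the names follow the BLUE, GREEN, ... mentioned above.
BLUE = "\033[94m"
GREEN = "\033[92m"
ENDC = "\033[0m"


def supports_color(stream=None):
    """Heuristic: only use color when the stream is an interactive terminal."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def colorize(text, color, stream=None):
    """Wrap `text` in `color` if the stream seems to support it."""
    if supports_color(stream):
        return color + text + ENDC
    return text
```

When the output is redirected to a file or a pipe, `colorize` returns the text unchanged, so logs stay free of escape sequences.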

Qwlouse · 2018-09-16T17:34:51Z

sacred/experiment.py

-                raise
+        except Exception as e:
+            if not self.current_run or not self.current_run.debug \
+                    or not self.current_run.pdb:


Confusing if-statement. Are you sure this is what you want? The only case in which this will not execute is if there is a current_run and debug is true and pdb is true.
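
The claim can be checked by enumerating all combinations. The function names below are made up; the three booleans stand for "there is a current_run", "debug" and "pdb".

```python
from itertools import product


def original(has_run, debug, pdb):
    """The condition as written in the diff."""
    return not has_run or not debug or not pdb


def rewritten(has_run, debug, pdb):
    """Equivalent form by De Morgan's law."""
    return not (has_run and debug and pdb)


# Collect every combination for which the branch is skipped.
skipped = [combo for combo in product([False, True], repeat=3)
           if not original(*combo)]
```

The only combination that skips the branch is the one where all three values are true, and the rewritten form states that more directly.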

Qwlouse · 2018-09-16T18:52:03Z

sacred/initialize.py

            set_by_dotted_path(config_updates,
                               join_paths(scaff.path, ncfg_key),
                               value)

    distribute_config_updates(prefixes, scaffolding, config_updates)

+    distribute_config_sources(prefixes, scaffolding, config_sources)
+


Wow, this piece of code got even more complicated. If this works as intended, I am impressed. Even before your additions I was having difficulties mentally keeping track of what happens. This is not good (but not your fault). I think a serious refactoring might be needed here.
If I understand correctly, you are capturing where a particular config entry originated from and passing this information around. Maybe there is a way to somehow encapsulate this into the config entries inside the configuration process. I'll have to think about it. (This is just me thinking out loud, not an instruction to you).

This piece of code is very ugly and I'm not sure if it works in all cases. It worked for all scenarios I tested, but it may fail for others. I think I should rethink and rewrite this part, including tests.

One (again very rough) idea to make it more structured and clear (I may be missing some important points and maybe it is not possible like this):

We could add these configuration origins to the ConfigSummary class and use the ConfigSummary throughout the whole configuration process, and not just at the end to track the changes. This would e.g. simplify the interface of chain_evaluate_config_scopes to just return a ConfigSummary and we could combine this with the config updates using a simple ConfigSummary.update_from (or similar). The config sources for config scopes could then be added inside the config scopes, and the config scope's __call__ would also return a ConfigSummary. We could store everything hierarchically (so, every sub-dict would be a ConfigSummary, same for lists) and create an empty structure based on the ingredient hierarchy before starting the configuration process to get rid of these calls to distribute_config_updates, because we could directly write to the sub-ConfigSummary of the scaffolding using `recursive_update` or set_by_dotted_path.

Qwlouse · 2018-09-16T19:14:30Z

Hi,
I just had some time to look over your PR. As you said, it is a bit rough, but I like where this is going. Many very nice improvements for the error reporting. This has the potential to significantly improve the user-experience. So again: thanks a lot for doing this!

I left a few thoughts in the code, but let me also comment on a high level:

iterate_ingredients: very nice!
circular dependency errors: very nice!
sources of config entries: Very useful feature! But the code is indeed ugly. Doing this nicely might require some more thorough refactoring, so I am not sure what the best way to proceed is. I won't have time to do any refactoring on that scale in the next 2 months. We can use ugly code, but then it definitely needs testing.
SacredError baseclass: Having a baseclass for all Sacred errors is a good idea. The properties make sense to me, but AFAICT you are not actually using them yet.
InvalidConfigError: Having a go-to exception for user-code is a good idea. This might also provide the basis for some config-validation convenience functions or features.
MissingConfigError: very nice!
NamedConfigError: very nice!
ConfigAddedError: nice! Not sure about calling them "conflicting" though. Maybe "unexpected"?

In general: this is a very large PR, which will make it hard to test and review. That is not a show-stopper, but if you can split it into smaller chunks, I think it will get into master sooner.

thequilo · 2018-09-18T10:06:51Z

Thanks for your feedback!

I agree that this is a very large PR and splitting it into smaller ones seems appropriate. This is a first rough suggestion of how to split it:

generic gathering function (without CircularDependencyError)
base class for Exceptions and handling of additional args
Error classes without source tracking, maybe split into multiple PRs?
Source tracking, maybe including some refactoring/rewriting of the initialize.py code
"did you mean" suggestions

thequilo · 2018-10-16T10:10:28Z

The basic exceptions are now merged with #367. Missing points are:

suggestions (I'm working on that), and
track where the configuration values were set

For the latter part, some refactoring of initialize.py would be very helpful / required. @Qwlouse are you going to work on this in the near future?

Qwlouse · 2018-10-16T11:34:06Z

Great! 🎉

Oxt weekend I might be able to spend a day on refactoring initialize.py. Not sure that is going to be enough time, but hopefully I'll be able to lay the foundation for tracking the origin of config values. I like your idea of using ConfigSummaries, so I guess that is what I'll work towards.

Does that sound good to you? Any further thoughts?

thequilo · 2018-10-16T13:08:19Z

Yes, that sounds great! I'm glad that you like the idea of using ConfigSummaries. My suggested "hierarchical structure" that can be updated by recursive dict updates might be a lot more complicated than I thought, especially when the same Ingredient appears in different places. But I think you have a much better overview of sacred and the configuration process and may come up with better ideas.

I first had to check what "oxt weekend" means, but I find that a great idea. I am always confused by the phrase "next weekend"...

I'll wait until you have made some progress (hopefully oxt weekend) and then start to implement the "tracking of origin of config values" feature based on your changes.

What do you think about colored output for the error messages, e.g. highlight all config keys with the same color?

Qwlouse · 2018-10-27T20:32:21Z

Hey @thequilo.
So the good news is that I spent quite a while today thinking and coding. But while trying to refactor the ugly initialize code I went down a rabbit hole of changes that I'd like to implement, and unfortunately I couldn't get to a usable state yet. So the bad news is that I am not sure when I'll be able to finish this. The next three weeks are rather intense for me.

If you are interested in the current state you can find the code in the config_refactor branch. But sadly this is more of a sketch of my intentions than an actual refactoring.

Some highlights of what I am trying to do:

Divide the configuration process into several stages, where config entries are frozen after each stage.
- Stage 0: initialization sets the initial seed
- Stage 1: config updates incorporates the commandline updates
- Stage 2: runs through named configurations
- Stage 3: is the regular configurations
Get rid of the Scaffold objects and possibly migrate a lot of the logic into the Run object. (moving towards something like the suggestion of @johny-c here)
introduce an explicit Path object that unifies handling of paths like "foo.bar.a" including non-string keys
Have containers that save meta information alongside the entries, and that support attribute access.

Regarding colored output messages: I think that is a good idea and there are probably several opportunities for helpful highlighting. Since this concerns error messages, we might need to be careful about not breaking the output in terminals that don't support color (Though, I am not sure if that is even a real problem).

thequilo · 2018-11-08T09:16:26Z

It looks like you have already made some progress that looks promising. I like the changes that you are trying to make. I'll just hold off on my modifications until you have finished implementing them.

I found that the 'got unexpected kwarg(s)' exception is currently not handled by the SacredError. I'll prepare a PR for that.

thequilo · 2019-01-07T21:36:04Z

First, happy new year!

How's the progress on the config_refactor branch?

Qwlouse · 2019-02-21T20:10:37Z

Hi @thequilo,
a belated happy new year to you too 🎆, and sorry for the lengthy delay. I plan to get back to the config refactor sometime next week once I have caught up with the other issues and PRs. I'll keep you updated.

stale · 2019-05-04T19:33:29Z