Conversation

ioparaskev
Contributor

  • Extends the validation of the mandatory_dict_keys in config by
    checking that they also have a value. There is no point in having checks
    for mandatory keys if they are empty (this would mean that they are not
    really mandatory)
  • Extracts _parse_config method to utilize for loading configs
  • Moves the self._config check from _verify_dict method to the beginning
    of initialization. If the main config file is wrong, there is no need
    to proceed any further
  • Extracts plugin loading to _load_plugins method
  • Fixes Reading config fails if no microservcies are configured #184 by setting empty list as default value for plugin configs
    in case a specific plugin config is unset thus allowing MICRO_SERVICES
    plugin to be empty (instead of deleting the key)
  • Adds test cases for invalid internal attributes and empty
    MICRO_SERVICES plugin
  • Adds test case for invalid conf although since the exception thrown is
    generic for other config cases as well, the only way to verify this
    is by checking the coverage (subclassing the config errors is probably a
    better alternative)
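The MICRO_SERVICES default described in the bullets above can be sketched roughly as follows (a hypothetical helper for illustration, not the PR's actual code):

```python
# Sketch of the described behaviour: an unset or explicitly null plugin key
# falls back to an empty list, instead of requiring the key to be deleted.
def plugin_configs(config, key):
    return config.get(key) or []

assert plugin_configs({"MICRO_SERVICES": None}, "MICRO_SERVICES") == []
assert plugin_configs({}, "MICRO_SERVICES") == []
assert plugin_configs({"MICRO_SERVICES": ["a"]}, "MICRO_SERVICES") == ["a"]
```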

All Submissions:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what problem you are trying to solve with this PR?
  • Have you added information on what your changes do and why you chose this as your solution?
  • Have you written new tests for your changes?
  • Does your submission pass tests?
  • This project follows PEP8 style guide. Have you run your code against the 'flake8' linter?

- Extends the validation of the mandatory_dict_keys in config by
checking that they also have a value. There is no point in having checks
for mandatory keys if they are empty (this would mean that they are not
really mandatory)

Signed-off-by: John Paraskevopoulos <jpr@dialectics.space>
- Extracts `_parse_config` method to utilize for loading configs
- Moves the `self._config` check from `_verify_dict` method to the beginning
of initialization. If the main config file is wrong, there's no need
to proceed any further
- Extracts plugin loading to `_load_plugins` method
- Fixes IdentityPython#184 by setting empty list as default value for plugin configs
in case a specific plugin config is unset thus allowing MICRO_SERVICES
plugin to be empty (instead of deleting the key)
- Adds test cases for invalid internal attributes and empty
MICRO_SERVICES plugin
- Adds test case for invalid conf although since the exception thrown is
generic for other config cases as well, the only way to verify this
is by checking the coverage (subclassing the config errors is probably a
better alternative)

Signed-off-by: John Paraskevopoulos <jpr@dialectics.space>
),
),
None,
)
Member

I would write this, as:

value = next(
    (
        conf
        for parser in [self._load_dict, self._load_yaml]
        for conf in [parser(config)]
        if conf
    ),
    None,
)
return value

it is cleaner, and more "pythonic"

Contributor Author

For "pythonic" (whatever this vague term means) I agree. For cleaner I think it's more subjective. I always found nested comprehensions to be less clear than nested maps (especially when nesting > 2)

Member

@peppelinux peppelinux Jul 11, 2019

If I had to put a debugger there, I'm sure there would be problems.
Pythonic is elegant, compact, winking and whatever, but a simplified and debuggable approach would probably be better. Just put import pdb; pdb.set_trace() before that code, before choosing.

Contributor Author

How would you put the debugger inside the comprehension to debug it? On the contrary, I think it's easier to debug filter & map, since each map call can be debugged on its own, whereas a nested comprehension must be run fully or rewritten in smaller parts to be debugged.

Member

You simply can't! This is why it would be better to avoid that kind of generalization. This is just a personal view; I like other people's code styles, and I imitate them if needed, with curiosity. I'm just getting older, that's all. Because of this I'd prefer simple and very debuggable code. I got into this thread to exchange some words about my personal view; now I've done that, you can simply ignore me and go ahead 👍

Contributor Author

If you really need to drop a debugger, it is very easy to convert the comprehension to a regular for-loop and place it under the right scope, while you need to think more to convert the map/filter style into a regular for-loop.

You don't have to convert anything in the map/filter case; you simply evaluate each part (the map part and/or the filter part). There is no need to write any for loop.

Map/filter's power is composition, which cannot be achieved with comprehensions. Try a four-times-nested comprehension (for complex constructs) to see how readable it is (I'm not even sure it is achievable). Map/filter, on the other hand, does not change at all: you compose each map as an argument to the next map. List comprehensions have a limit since they are a syntactic construct, whereas map is a higher-order function that can be composed with far fewer limitations.
As I said, I personally always found nested list comprehensions (especially crowded with if statements) more difficult to process (due to the change in order), and I've come to prefer map/filter exactly because I can compose them one part at a time and join them without having to think about the order of the for statements like in comprehensions.

Readability, for this specific discussion, depends on the experience one has with map/filter usage and concepts, and it is a matter of personal taste as well. If we had this conversation 10 years ago, mutating everything like crazy and having for loops nested 10 levels deep could be a normal thing that one would consider "readable". There is a reason concepts like map/filter and other functional concepts are being introduced and used more and more today (even if Guido rants about how hard he finds map/filter/reduce and prefers comprehensions). The same thing can be said about the usage of next. One may be more familiar with a for loop with a break statement.

Member

you simply evaluate each part

you cannot evaluate each part, as the two are nested. I would understand this if the code was in the form

intermediate_result = map(foo, bar)
final_result = filter(baz, intermediate_result)

which directly translates to comprehensions

intermediate_result = (foo(x) for x in bar)
final_result = (x for x in intermediate_result if baz(x))

Map/filter power is composition

we agree on that.

which cannot be achieved with comprehensions

I can't see why and I'm interested in understanding this.

join them without having to think the order of the for statement

what is the order of the for statement?


For this specific snippet, there is nothing being composed. Creating lambdas on the fly is not composition. You would be composing if these were functions that were being reused to achieve a greater effect. Instead, this is just hardcoded behaviour in a map/filter construct. Just because someone is using map/filter/reduce, it does not mean that they're composing.

Contributor Author

@ioparaskev ioparaskev Jul 12, 2019

you simply evaluate each part

you cannot evaluate each part, as the two are nested. I would understand this if the code was in the form

in order to debug the following

(
        conf
        for parser in [self._load_dict, self._load_yaml]
        for conf in [parser(config)]
        if conf
)

you have to evaluate its parts. To evaluate them, you have to write a for loop.

in order to evaluate this:

filter(lambda conf: conf is not None, map(lambda parser: parser(config), (self._load_dict, self._load_yaml)))

you evaluate map(lambda parser: parser(config), (self._load_dict, self._load_yaml)) and then you evaluate filter(lambda conf: conf is not None, ...) on the result. There is no need to think of any for loop; that's what I'm saying.

Map/filter power is composition

we agree on that.

which cannot be achieved with comprehensions

I can't see why and I'm interested in understanding this.

I actually meant functional decomposition (splitting a problem into a set of functions). Composition is another feature that is not built into Python (you have to write a function that creates the composite function, whereas other languages have an operator for that).
Let me rephrase. You cannot use them the way you use map/filter to process things. If you want to create list comprehensions with multiple checks and processing, you have to write a new nested comprehension with extra variables.

For example:

def remainder_of_two(x):
    return int(x) % 2

def add_star(x):
    return "*{}".format(x)

I want to flatten a list of integers a=[[1,2],[3,4]], transform them to string, get those that are odd and add a * character to them. I have the above functions.
With map/reduce/filter, I feed each operation into the other:

from functools import reduce
from operator import add

list(map(add_star, filter(remainder_of_two, map(str, reduce(add, a)))))

Reading from right to left: I flatten the list, then get the string representation of each element, then filter out those that are even, then add the star character.

With list comprehensions and if statements:

[add_star(y) for y in [x for b in a for x in b if remainder_of_two(str(x)) != 0]]

Now how do I read this to describe what is done? Can I read from right to left? No. Can I read from left to right? Probably, let's try: I add a star character to a variable that is contained in a list of other variables, where the string representation of that sub-variable, when the remainder_of_two function is applied to it, returns != 0. Is this narrative correct? Not exactly, because I never mentioned the flattening of the list of lists explicitly. That kind of processing with list comprehensions needs extra variables; map/filter/reduce are higher-order functions and don't need those (unless you pass a lambda). (Sidenote: until Python 3, list comprehensions leaked the scope of their variables; that is, they accessed outside variables instead of creating a local scope.)

join them without having to think the order of the for statement

what is the order of the for statement?

As I said before, narrating the operation in one's head, feeding results in functions is simpler than having to keep variables in brain memory. In our case:
I want to apply a list of functions in a config, and if their result is truthish, I want to get the result:

(
        conf
        for parser in [self._load_dict, self._load_yaml]
        for conf in [parser(config)]
        if conf
)

How do we express this above? I have a result called conf that -skip to line 3- when applied to a variable called parser -skip to line 2- where parser is a loop over a list of functions, -skip to line 4- if conf is truthish. What I mean by this extremely naive narration is that with comprehensions you have to read the whole expression to understand what's happening, whereas with map/filter in our case:

filter(lambda conf: conf is not None, map(lambda parser: parser(config), (self._load_dict, self._load_yaml)))

Reading from right to left: I have a list of functions, and I apply them to a config; then I feed the results to a check to see if they are truthish, and keep those that are.

For this specific snippet, there is nothing being composed. Creating lambdas on the fly is not composition. You would be composing if these were functions that were being reused to achieve a greater effect. Instead, this is just hardcoded behaviour in a map/filter construct. Just because someone is using map/filter/reduce, it does not mean that they're composing.

Indeed there is no composition here since there aren't any composite functions

Member

@c00kiemon5ter c00kiemon5ter Jul 13, 2019

in order to debug the following [...] you have to evaluate its parts.

yes.

to evaluate them you have to write a for loop.

no. As with map/filter you evaluate the different parts.

in order to evaluate this: [...]
you evaluate map()
and you evaluate filter() on the result

Which means that you evaluate its parts.
And you can only do that if these are written separately, not nested.

I think that you are assuming that the way map/filter work in Python is the same as in most other languages, but this is not the case. The map() and filter() functions return map and filter objects respectively. These objects are actually iterators that yield items (they work just like generators). This means that map and filter are not separately evaluated.

When a function pulls an item (pull-functions are next(), list(), etc.), the filter-fn retrieves the next item from the given iterable; this iterable is the map-iterator. To get the next item of the map-iterator, the map-fn retrieves the next item from its own given iterable, applies its callable to the item, saves its state and returns a single result. Then the filter-fn applies its callable to the item it got from the map-iterator, saves its state and returns a single result. The map/filter-fn never evaluate completely (or anything), if a pull-fn does not ask them to.

This works exactly as comprehensions work, and it is the reason you need to wrap the outermost function (filter in this case) with a pull-function, as you do with comprehensions.

In most other languages, map and filter do not return an iterator, but a complete result. That used to be the case for python2; but for python3 the effect is the same as with comprehensions.
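The lazy behaviour described above can be demonstrated directly (a small sketch, not code from this PR):

```python
# Sketch: map/filter in Python 3 return lazy iterators; nothing runs until a
# pull-function (next, list, ...) asks for an item.
calls = []

def record(x):
    calls.append(x)
    return x * 2

it = filter(lambda v: v > 2, map(record, [1, 2, 3]))
assert calls == []      # nothing has been evaluated yet

first = next(it)        # pulls items one at a time through map, then filter
assert first == 4
assert calls == [1, 2]  # map ran only as far as needed
```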


there is no need to think of any for loop, that's what I'm saying

yes. The reason I mentioned the transformation to the for loop was to be able to invoke a debugger inside the loop.

If you're saying there is no reason to look at the for-keyword of the comprehension, then you can hide it behind a function, just like map/filter do. Just because you do not see the for keyword, it does not mean that there is no loop.

The only good thing with map/reduce is that you can replace them with versions that can parallelize the operation safely (which is not needed here, nor in the majority of cases).


I want to

  • flatten a list of integers a=[[1,2],[3,4]],
  • transform them to string,
  • get those that are odd
  • and add a * character to them.

With map/reduce/filter, I feed each operation into the other:

list(
    map(                       # add a * character
        add_star,
        filter(                # get odds
            remainder_of_two, 
            map(
                str,           # convert to str
                reduce(add, a) # flatten
            )
        )
    )
)

reading from right to left ...

Reading from right to left is the opposite of what people do. To be precise, this reads inside-out (or bottom-top if you will).

With list comprehensions and if statements:

[
    add_star(y)
    for y in [
        x
        for b in a 
        for x in b 
        if remainder_of_two(
            str(x)
        )
    ]
]

There is no need for this nesting. This should be written as

list(
    add_star(s)                        # add a * character
    for sublist in a for x in sublist  # flatten
    for s in [str(x)]                  # convert to str
    if remainder_of_two(s)             # check for odd
)

or even

list(
    v
    for sublist in a for x in sublist  # flatten
    for s in [str(x)]                  # convert to str
    if remainder_of_two(s)             # check for odd
    for v in [add_star(s)]             # add a * character
)

Reading is fine, from top to bottom.


How do we express this above? I have a result called conf that -skip to line 3- ...

What you're doing here is the same as this:

In order to read this:

filter(lambda conf: conf is not None, map(lambda parser: parser(config), (self._load_dict, self._load_yaml)))

you have to find the innermost function, then find the leftmost argument, then apply it to each item of the remaining arguments, then imagine the results, then skip to the next outer function, then find the leftmost argument, then apply it to the remaining arguments including the one you imagined before, then imagine the results, and so on.

(which, again, is not true, because in practice map and filter return iterators.)

Likewise, I can read the comprehension as:

for each parser, apply it to config and generate a conf, if that is truthish keep it

I doubt that you find it easier to locate the inner-most function, and immediately reason about it, while you find it hard to read that comprehension.

In general, comprehensions read from top to bottom, left to right, and accumulate results in the very first line. Instead map/filter read inside-out, and demand understanding of the arguments.
I can read both, but I can read comprehensions easier, because the order of the statements is natural. Badly written code will read bad no matter if it is within a comprehension or a map/filter.


Having said all this, I do not think that there is any other place where you would nest functions one inside the other like you do with map and filter. You would not do:

command3(command2(command1(arg1, arg2)))

instead you would split this into multiple lines that feed results into each other.

res1 = command1(arg1, arg2)
res2 = command2(res1)
res3 = command3(res2)

so, I cannot understand how this changes with map and filter, and how it is helpful to anyone to read it nested like that. Do people really write nested code like that with map/filter/reduce? Do they try to convert their whole programs into a one-liner? Where does this come from?

That kind of processing with list comprehensions needs extra variables

The only reason you do not have variables is because of this nesting. Variables are not a bad thing that we should avoid. Instead, variables are pretty helpful for documenting intermediate results in the code.

Contributor Author

@ioparaskev ioparaskev Jul 15, 2019

In most other languages, map and filter do not return an iterator, but a complete result. That used to be the case for python2; but for python3 the effect is the same as with comprehensions.

Well, it depends. In Clojure, for example, maps are lazy (realized in chunks, to be exact), whereas in Haskell they are not (they are non-strict, which is not the same), although they act much the same.

there is no need to think of any for loop, that's what I'm saying

yes. The reason I mentioned the transformation to the for loop, was to be able to invoke a debugger inside the loop.

If you're saying there is not reason to look at the for-keyword of the comprehension, then you can hide it behind a function, just like map/filter do. Just because you do not see the for keyword, it does not mean that there is no loop.

I said it to point out the fact that you simply evaluate each part without rewriting it, whereas with list comprehensions you have to rewrite it if it has many parts (e.g. a nested comprehension with an if statement)

How do we express this above? I have a result called conf that -skip to line 3- ...

What you're doing here is the same as this:

In order to read this:

filter(lambda conf: conf is not None, map(lambda parser: parser(config), (self._load_dict, self._load_yaml)))

you have to find the inner most function, then find the leftmost argument, then apply it to each item of the remaining arguments, then imagine the results, then skip to next outer function, then find the leftmost argument, then apply it to the remaining arguments including the one you imagined before, then imagine the results, and so on.

(which, again, is not true, because in practice map and filter return iterators.)

Likewise, I can read the comprehension as:

for each parser, apply it to config and generate a conf, if that is truthish keep it

I doubt that you find it easier to locate the inner-most function, and immediately reason about it, while you find it hard to read that comprehension.

In general, comprehensions read from top to bottom, left to right, and accumulate results in the very first line. Instead map/filter read inside-out, and demand understanding of the arguments.
I can read both, but I can read comprehensions easier, because the order of the statements is natural. Badly written code will read bad no matter if it is within a comprehension or a map/filter.

Well, from top to bottom: in the example I read a v, and the declaration of what v is sits at the bottom of the statement. That's what I meant when I said that with comprehensions you use goto in your mind, whereas with map/filter/reduce you read inside-out and chain the parts together (you read "I first do this, then that, then that", whereas with comprehensions you have to remember variable names and jump around inside the comprehension). So comprehensions don't exactly read from top to bottom, because if you read them top to bottom you have to remember what v is. You choose either goto (and not top-to-bottom reading) or remembering that the variable v in the first line will be defined 3-4 lines later.

Having said all this, I do not think that there is any other place where you would nest functions one inside the other like you do with map and filter. You would not do:

so, I cannot understand how this changes with map and filter, and how it is helpful to anyone to read it nested like that. Do people really write nested code like that with map/filter/reduce? Do they try to convert their whole programs into a one-liner? Where does this come from?

They don't. I'm simply mentioning this to underline why I think map/filter is more powerful: it chains better

That kind of processing with list comprehensions needs extra variables

The only reason you do not have variables is because of this nesting. Variables is not a bad thing that we should avoid. Instead, variables are pretty helpful to document in-code the intermediate results.

Variables are not a bad thing indeed. I simply mentioned that you have to remember variables when reading a comprehension

I think we're beating a dead horse here; we will probably not reach an agreement. I think this comes down to personal taste. People using mostly Python might find comprehensions more intuitive because they are more exposed to them and the community & BDFL support them. I started this discussion to underline the fact that "clearer" and "pythonic" are not objective terms and should be taken with a grain of salt. Especially "pythonic" can lead to assumptions that are counter-intuitive in other languages (e.g. dicts being ordered by default, assignment expressions as in PEP 572, etc.)

if not self._config:
raise SATOSAConfigurationError(
"Missing configuration or unknown format"
)
Member

Should this be part of _parse_config?

Contributor Author

Nope. This statement used to be at the beginning of _verify_dict, which was passed self._config as an argument. It was not used as a generic error, only to state the failure of the initial config parsing

Member

Let me rephrase the question.
Should the check if not self._config: raise error() be inside _parse_config?

Contributor Author

Nope. _parse_config parses the argument config. It could as well be a staticmethod or a function if _load_dict and _load_yaml were not instance methods (and they could as well be external functions; there is no need for them to be coupled inside the SATOSAConfig object)

Member

@c00kiemon5ter c00kiemon5ter Jul 11, 2019

This is not answering what I am asking. Whether those are functions or methods is irrelevant to the question. What I am asking is who is responsible for checking if the result of an action is an error.

You do not need to think about this particular case at all; see the example below. You have two choices. The first is what is being done now: the caller checks for errors

def foo(x):
    value = bar(x)
    if has_error(value):
        generate_error()
    do_something_with(value)

def bar(x):
    value = get_value(x)
    return value

and a second choice, where the callee is checking for the error

def foo(x):
    value = bar(x)
    do_something_with(value)

def bar(x):
    value = get_value(x)
    if has_error(value):
        generate_error()
    return value

How can we compare those two and choose one over the other?

Contributor Author

It depends is the answer. It depends on what kind of error this is. Is it an evaluation error or a logic error? bar could throw errors that have to do with the processing of x, but if there is a logical error with the processing of x, should bar know the logic of foo and throw the error? I would say no

Member

OK, so what we do here is parse the config. And parsing fails: none of the parsers can parse the given thing. By what you said, _parse_config should then throw the error, because it failed to do its job. The return value of None is not a logical error, it is a parsing error; a return-value sentinel that signifies the failure of the operation. For this specific operation, an evaluation error would be a config that was parsed but has inconsistent values for the different options it holds, i.e. it would be a validation error.

Contributor Author

Indeed in this case, it is a parsing error and the parsing function should throw this
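The conclusion above could be sketched roughly like this (hypothetical names; the real parsers are the `_load_dict` and `_load_yaml` methods, and the message mirrors the one in the snippet):

```python
class SATOSAConfigurationError(Exception):
    pass

# Sketch: the parsing function itself raises when no parser can handle the
# input, instead of returning None and making every caller check for it.
def parse_config(config, parsers):
    parsed = next(
        (conf for parser in parsers for conf in [parser(config)] if conf),
        None,
    )
    if parsed is None:
        raise SATOSAConfigurationError("Missing configuration or unknown format")
    return parsed
```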

"Failed to load plugin config '{}'".format(config)
)
else:
return plugin_config
Member

We should try to have a straight codepath. else is not needed here.

Contributor Author

I thought that "explicit is better than implicit" applied for the eye here, but sure

Member

@c00kiemon5ter c00kiemon5ter Jul 11, 2019

if/else says that at this point there is one of two paths you can follow. This is not the case here. What we do here is just check and handle an error case. To make this clearer, forget that the else part is a return statement. return is just a command, and it is only by chance that there is only one command here. If you imagine we had 4 commands, would you still nest them under the else branch?

def foo(x):
    value = command1(x)
    if has_error(value):
        generate_error()
    else:
        command2(value)
        command3(value)
        command4(value)

or

def foo(x):
    value = command1(x)
    if has_error(value):
        generate_error()
    command2(value)
    command3(value)
    command4(value)

Contributor Author

It depends is the answer again. I would probably have a validation function call throw the error and hide the if from foo, so as to keep one logical exit path in the main view

Member

to keep in main view one logical exit path

I feel we're saying the same thing. To keep one logical path you would not put the rest of the code under the else branch.


if not self._config:
raise SATOSAConfigurationError(
"Missing configuration or unknown format"
Member

How can we know which of the two it is?

Contributor Author

We didn't and we don't

Member

How can we provide this information?

Contributor Author

Unknown format means parsing failed or returned an empty dict, whereas missing configuration could mean that either an empty config argument was given to the constructor or the argument was a file that couldn't be opened. The first case was/is only indirectly checked. The second will show up in the logs (if _load_yaml cannot open the file, for example). I'm not saying that this is the way it should be done; I'm just saying how it is currently implemented

Member

I'm not saying that this is the way it should be done, I'm just saying how it is currently implemented

Yes, I know; and what I'm saying is that we should make clear what went wrong. How we make it clearer is what we are discussing.

Ideas to improve this:

  • Provide config as part of the message and let the user figure out which of the two it is implicitly
  • Make more fine grained checks and provide the exact message
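The first idea could look roughly like this (a sketch; the exception name matches the snippet above, the rest is assumed):

```python
class SATOSAConfigurationError(Exception):
    pass

# Sketch of idea 1: include the offending config in the message, so the user
# can tell whether it was missing or merely unparsable.
def fail_with_context(config):
    raise SATOSAConfigurationError(
        "Missing configuration or unknown format: {!r}".format(config)
    )
```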

Contributor Author

These should probably be written up in a new issue; otherwise, in a PR, they will be lost

plugin_config = self._parse_config(config)
if not plugin_config:
raise SATOSAConfigurationError(
"Failed to load plugin config '{}'".format(config)
Member

Why is this message different than the one on line 38?

lambda x: load_plugin_config(x),
self._config.get(key, []) or [],
)
)
Member

it is cleaner with a comprehension

# generate
pconfs = {
    ptype: [
        load_plugin_config(pconf)
        for pconf in self._config.get(ptype) or []
    ]
    for ptype in ["BACKEND_MODULES", "FRONTEND_MODULES", "MICRO_SERVICES"]
}

# mutate
self._config.update(pconfs)

if not self._config["INTERNAL_ATTRIBUTES"]:
    raise SATOSAConfigurationError(
        "Could not load attribute mapping from 'INTERNAL_ATTRIBUTES'."
    )
Member

A third different message for the same action. Can we make those messages similar?


for key in SATOSAConfig.mandatory_dict_keys:
    if key not in conf:
        raise SATOSAConfigurationError("Missing key '%s' in config" % key)
    if not conf.get(key, None):
Member

In .get(x, None), the None is not needed — it is already the default.
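A quick demonstration of the point (illustrative keys, not the real config):

```python
conf = {"BASE": "https://example.com"}

# The second argument to dict.get defaults to None, so these are equivalent:
assert conf.get("MISSING") is None
assert conf.get("MISSING", None) is None
assert conf.get("BASE") == "https://example.com"
```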

"Missing key {key} or value for {key} in config".format(
key=key
)
)
Member

If I have three keys missing, should I have to try three times to get the complete information?
Instead, can we turn this into an actual validator?
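Such a validator could be sketched like this — a hedged illustration with hypothetical names (the real SATOSAConfig keeps its keys in a mandatory_dict_keys class attribute), collecting every missing or empty key before failing once:

```python
class SATOSAConfigurationError(Exception):
    pass

# Hypothetical subset of mandatory keys, for illustration only
MANDATORY_DICT_KEYS = ["BASE", "INTERNAL_ATTRIBUTES", "BACKEND_MODULES"]

def validate_mandatory_keys(conf, mandatory=tuple(MANDATORY_DICT_KEYS)):
    # Collect every missing or empty mandatory key in one pass
    errors = [
        "Missing key or value for '{}' in config".format(key)
        for key in mandatory
        if not conf.get(key)
    ]
    if errors:
        # One exception carrying all the problems at once
        raise SATOSAConfigurationError("; ".join(errors))
```

With this shape, three missing keys are reported in a single run instead of over three fix-and-retry cycles.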

@c00kiemon5ter
Member

There are 7 bullets describing changes, but only two commits. Can we balance this in the future?

@c00kiemon5ter
Member

If the main config file is wrong, there's no need to proceed any more

If there are different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.

@c00kiemon5ter
Member

subclassing the config errors is probably a better alternative

Yes, we should do this. Are there any other alternatives?

@c00kiemon5ter
Member

  • Extracts _parse_config method to utilize for loading configs
  • Extracts plugin loading to _load_plugins method

This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?

Ideally, this should be part of the commit message that introduces the respective change.

@ioparaskev
Contributor Author

ioparaskev commented Jul 11, 2019

There are 7 bullets describing changes, but only two commits. Can we balance this in the future?

Sure, I can rebase and split them to more commits

If the main config file is wrong, there's no need to proceed any more
If there are different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.

The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.

subclassing the config errors is probably a better alternative

Yes, we should do this. Are there any other alternatives?

Sure, there are alternatives to subclassing (e.g. carrying the erroneous config as a class variable could be one), but I'm not sure we need this.

This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?

Initially this PR was about #184. Reading through the code I realized that the __init__ method was too big (for my taste) and also had some deep nesting (which resulted in a cyclomatic complexity of 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose, hiding the details in other methods. I think it is kind of obvious why this extraction is useful. Otherwise the _verify_dict and the load_yaml and load_dict methods would be inside the constructor as well.

@c00kiemon5ter
Member

If the main config file is wrong, there's no need to proceed any more
If there are different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.

The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.

We do not change things just because they can be changed. Refactoring is not about moving things around. The refactor should have a goal, which should be stated, and it should answer how it makes things "better". In this process it is definitely desirable to group things differently (prepare for the changes) and then change how they behave (make things better), to make the transition clear and understandable (it would also be reflected by the commits). But still, the goal is to change things to make them better.
How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.

subclassing the config errors is probably a better alternative

Yes, we should do this. Are there any other alternatives?

Sure, there are alternatives to subclassing (e.g. carrying the erroneous config as a class variable could be one), but I'm not sure we need this.

We don't need what? Do you mean we do not need alternatives?
How does "carrying the erroneous config as a class variable" help?

Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?

This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?

Initially this PR was about #184. Reading through the code I realized that the __init__ method was too big (for my taste) and also had some deep nesting (which resulted in a cyclomatic complexity of 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose, hiding the details in other methods. I think it is kind of obvious why this extraction is useful. Otherwise the _verify_dict and the load_yaml and load_dict methods would be inside the constructor as well.

It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.

@ioparaskev
Contributor Author

If the main config file is wrong, there's no need to proceed any more
If there are different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.

The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.

We do not change things just because they can be changed. Refactoring is not about moving things around. The refactor should have a goal, which should be stated, and it should answer how it makes things "better". In this process it is definitely desirable to group things differently (prepare for the changes) and then change how they behave (make things better), to make the transition clear and understandable (it would also be reflected by the commits). But still, the goal is to change things to make them better.

As I said, the point of this refactoring (which came as a side effect of #184) was to improve readability by reducing the nested for loops, reducing complexity (in the same way), and extracting operations with reusability in mind (without pushing reusability so far that it ends up in coupling). There doesn't always have to be a behavioral change or feature to justify a refactor, although the real value of refactoring is restructuring the code in a way that is easier to maintain, extend and change. So with that mindset, I believe we should change things because they can be changed, to provide the aforementioned benefits.

How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.

I really believe this PR discussion has gone way off track and the PR has lost its value.

subclassing the config errors is probably a better alternative

Yes, we should do this. Are there any other alternatives?

Sure, there are alternatives to subclassing (e.g. carrying the erroneous config as a class variable could be one), but I'm not sure we need this.

We don't need what? Do you mean we do not need alternatives?
How does "carrying the erroneous config as a class variable" help?

Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?

Carrying the erroneous config would help to discern which config was erroneous, not trace the line/caller the exception came from. I wrote what I wrote because this exception class was used everywhere, and the only way to check that the correct one was thrown was to check the error message, which I consider wrong. If for some reason (what would that be?) one wants to trace the function call that resulted in the exception, they could just as well use the traceback module. It's like trying to trace the exit point of a function with multiple exit points.
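Both ideas could even be combined: subclass the configuration error per concern, and attach the offending config, so callers assert on the exception type rather than the message. A minimal sketch — all names besides SATOSAConfigurationError are hypothetical, not the project's actual hierarchy:

```python
class SATOSAConfigurationError(Exception):
    """Base class; carries the config that triggered the error."""
    def __init__(self, message, config=None):
        super().__init__(message)
        # the erroneous config, available for inspection in tests and logs
        self.config = config

class PluginConfigurationError(SATOSAConfigurationError):
    """Hypothetical subclass for plugin-config failures."""
    pass

def load_plugin_config(config):
    # Illustrative loader: fails loudly, with type and context, on empty input
    if not config:
        raise PluginConfigurationError("Failed to load plugin config", config=config)
    return config
```

A test can then write `pytest.raises(PluginConfigurationError)` instead of matching an error-message string.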

This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?

Initially this PR was about #184. Reading through the code I realized that the __init__ method was too big (for my taste) and also had some deep nesting (which resulted in a cyclomatic complexity of 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose, hiding the details in other methods. I think it is kind of obvious why this extraction is useful. Otherwise the _verify_dict and the load_yaml and load_dict methods would be inside the constructor as well.

It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.

I'm still not sure what explanation to write to justify extracting a method. Splitting functionality would need an explanation (if, for example, the plugin loading was split instead of being in a for loop), but I consider this too minor a refactoring to have to explain and justify the extraction and the rewrite of a 4-level nested for loop.

@ioparaskev
Contributor Author

I prefer to close this PR. This started as a simple and, imho, harmless refactoring to improve readability and reduce complexity on the track of solving a minor issue, and ended up in discussions about behavioral changes, functional paradigms, readability and exception-subclassing alternatives. What started as a simple change request ended up as the longest PR discussion for this repo. I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened explaining what needs to be done.

@ioparaskev ioparaskev closed this Jul 11, 2019
@c00kiemon5ter
Member

How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.

I really believe this PR discussion has gone way off track and the PR has lost its value.

The value is still here, both in the code and the discussion. And I actually believe that you've probably already learnt new things from these discussions. I did, others reading this did. There is no reason to dismiss this.


subclassing the config errors is probably a better alternative

Yes, we should do this. Are there any other alternatives?

Sure, there are alternatives to subclassing (e.g. carrying the erroneous config as a class variable could be one), but I'm not sure we need this.

We don't need what? Do you mean we do not need alternatives?
How does "carrying the erroneous config as a class variable" help?
Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?

Carrying the erroneous config would help to discern which config was erroneous, not trace the line/caller the exception came from.

Yes, we do not care about the line; we care about the handler in the sense you describe - not the handler as a caller, but the operation it was doing that will help us understand what went wrong.

I wrote what I wrote because this exception class was used everywhere, and the only way to check that the correct one was thrown was to check the error message, which I consider wrong.

Right! We should not check the error message; and even if we did, it could be the same, so this is a no-go.

So, this is a super interesting topic, and I believe it is at the heart of complexity in software. It is also why monads are so useful (at least, for my understanding of monads). What we see here is the need to propagate the state of the operation to produce a meaningful error message. The same thing is needed in either direction of the flow. This propagation of state is useful both to stop and analyse the program, and also to progress the program and make decisions.
There are simple ways to make this happen (save the information in some context), and this is what we will do for a start. But modelling the generic mechanism to get that information without having to redo the tedious plumbing is very interesting, and leads to a different paradigm; it is not about types, purity, or what-not, it is about information accumulation and decisions.
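The "save the information in some context" idea can be illustrated with a toy runner — this is a sketch of the general mechanism, not the project's design, and every name here is hypothetical. Each step returns a (value, error) pair, and the runner records a trail of step names so the final error says what the operation was doing when it failed:

```python
def run_steps(value, steps):
    """Apply (name, step) pairs in order, accumulating a trail of names."""
    trail = []
    for name, step in steps:
        trail.append(name)
        value, error = step(value)
        if error:
            # The error carries both the failing step and the whole trail,
            # so the caller knows which operation produced it
            return None, "failed at '{}' (trail: {}): {}".format(name, trail, error)
    return value, None

# Toy steps: parse then validate a config-like value
steps = [
    ("parse", lambda v: (v, None) if isinstance(v, dict) else (None, "not a mapping")),
    ("validate", lambda v: (v, None) if v else (None, "empty config")),
]
```

This is the tedious plumbing version; the interesting question raised above is how to model the mechanism once so it doesn't have to be redone by hand at every call site.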


This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?

Initially this PR was about #184. Reading through the code I realized that the __init__ method was too big (for my taste) and also had some deep nesting (which resulted in a cyclomatic complexity of 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose, hiding the details in other methods. I think it is kind of obvious why this extraction is useful. Otherwise the _verify_dict and the load_yaml and load_dict methods would be inside the constructor as well.

It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.

I'm still not sure what explanation to write to justify extracting a method. Splitting functionality would need an explanation (if, for example, the plugin loading was split instead of being in a for loop), but I consider this too minor a refactoring to have to explain and justify the extraction and the rewrite of a 4-level nested for loop.

You've already written that you thought it was verbose, too big and complex. These are good reasons, but you're basing them on "taste" or some number that is supposed to indicate complexity. Still, you know that there is something bothering you, and it is not the number of lines, nor the CC, because you have seen other code that may be longer or more complex but still felt fine.
The hard question is what makes anyone judge something as verbose, big or complex. In a previous PR you wrote something related and very important, and you wrote a similar thing elsewhere in this PR. Let me quote:

Now there are 3 functions: one for the creation of the metadata objects, one for the writing the metadata objects to file(s), and one for composing those two to a final result

So, what drives the split is exactly what you write: you want to separate concerns, and then you want to compose them to achieve a bigger effect. There is more to it than composition, but separating concerns and composing them is a first step that turns this feeling that was bothering us, and that we attributed to taste, into an actual guide for making such decisions. The way you separate concerns is by figuring out the levels of detail that describe an operation. Finer details are encapsulated and split into separate functions, so that each function is concerned (not only with a single aspect, but) with details of the same abstraction level.
The next step is orchestration: every function is an orchestrator of data and effects. Values are derived by transforming data, where transformations are compositions of functions. Business results are derived by chaining effects that are built on top of those compositions. Orchestration provides a higher-level way to think about and deal with the concerns of business requirements. It allows us to understand the abstraction details of business operations (which are implemented as steps of function compositions).

@c00kiemon5ter
Member

I prefer to close this PR. This started as a simple and, imho, harmless refactoring to improve readability and reduce complexity on the track of solving a minor issue, and ended up in discussions about behavioral changes, functional paradigms, readability and exception-subclassing alternatives.

This is a discussion. We explore what can be done better. It is the only way to design things, and I know you are capable of doing that. (I am also pretty sure you want to do that.)
All the things you describe above, are rooted to deeper questions, sometimes annoyingly abstract, and these are the questions I try to bring out. I dedicate time to this, because I know you can contribute to these questions and I know that many of these are things that have been bothering you, too.

What started as a simple change request ended up as the longest PR discussion for this repo.

The length of the PR discussion is irrelevant.

I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened explaining what needs to be done.

What needs to be done is what we will decide by discussing the different tradeoffs of the alternative choices we have. The reason to have multiple comments and not a big one, is so that we can conclude for each point separately.

Are you interested in doing this with me?

@ioparaskev
Contributor Author

#251 (comment) I totally agree

I prefer to close this PR. This started as a simple and, imho, harmless refactoring to improve readability and reduce complexity on the track of solving a minor issue, and ended up in discussions about behavioral changes, functional paradigms, readability and exception-subclassing alternatives.

This is a discussion. We explore what can be done better. It is the only way to design things, and I know you are capable of doing that. (I am also pretty sure you want to do that.)
All the things you describe above, are rooted to deeper questions, sometimes annoyingly abstract, and these are the questions I try to bring out. I dedicate time to this, because I know you can contribute to these questions and I know that many of these are things that have been bothering you, too.

What started as a simple change request ended up as the longest PR discussion for this repo.

The length of the PR discussion is irrelevant.

I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened explaining what needs to be done.

What needs to be done is what we will decide by discussing the different tradeoffs of the alternative choices we have. The reason to have multiple comments and not a big one, is so that we can conclude for each point separately.

Are you interested in doing this with me?

I'm more than interested in trying to discover possible answers to these questions. These questions are the real beauty of programming, and the journey to find the answers is what makes the effort worthwhile, not just getting a working result.

My point was that this PR started as a simple refactoring to initiate a possible future bigger refactoring/rewrite of things, in order to improve maintainability, extensibility and pluggability (someone could say it is the same as extensibility). So I think the discussion should be about whether this refactoring is paving the aforementioned road or not, not try to define every detail of that road here in comments. Designing the details of that road should (imho) be analyzed and discussed separately, so as to have the bigger picture in mind instead of discussing every detail (because details are a part of the bigger picture). In this case, since this refactoring raised concerns about the way the configuration is read, validated and handled, a separate issue should be opened to discuss how we want these things to change, for a new PR to follow. Discussing these in this PR loses the bigger picture (since we're talking about specific snippets of code) and in a way implies (at least in my mind) that this PR is following that path.

@c00kiemon5ter
Member

I'm more than interested in trying to discover possible answers to these questions.

cool; so I'm reopening this 🙂


My point was that this PR started as a simple refactoring to initiate a possible future bigger refactoring/rewrite of things
[we] should discuss if this refactoring is paving the aforementioned road or not

yes and no. For the local context of the mechanisms that handle the configurations, yes; it is fixing parts of the existing way of doing things, which is good. But, in the long run, the config format itself should be revisited. In other words, this PR is accepting that the current data source is correct; while I'm saying that to make this right we should start by modelling the data we get first.

I haven't mentioned that in the rest of the discussion, and it's not what we should focus on now. That's because there are things that will be the same regardless of the config format - it's not like everything needs to be written from scratch. These are the things I'm trying to keep this discussion focused on. Not the details of the handling and parsing, but the structure of the logic that we need in place.
These discussions are not tied to the config itself, but apply elsewhere too.

To be more precise, the questions about the difference of the error messages, who checks for the error, keeping a straight code-path, how to make clear what happened, and getting errors at once about the whole input, are all things that should be rewritten with a data-first approach. (We will see what this means in practice.)

Designing the details of that road should (imho) be analyzed and discussed in a separate discussion so as to have the bigger picture in mind instead of discussing every detail

again, yes and no. Discussions need to start somewhere. Once we figure what we need to change or how, we can write a clean description of what we discussed and to what we arrived. This cannot be done the other way around. Discussion needs to take place first. Then we can summarize and state the tradeoffs and the decision. Then make changes.
If we knew what should be done there would be no need to discuss it. So, I think it is good to have those discussions. Some will lead nowhere, some will lead to immediate changes, and others will unwind until we have a better understanding - at which point we can summarize in a separate email, issue or PR.

@c00kiemon5ter c00kiemon5ter reopened this Jul 13, 2019
@peppelinux
Member

Yesterday I coded this incredible (pythonic?!?) bunch of things:
https://github.com/peppelinux/pyLDAP/blob/master/client.py#L46

Today when I read it, my thoughts came to you, here... I just want to say that it happened. Should I leave it there? I dunno, but I think my son shouldn't read it; he should never code this way. But it exists and it works, and I'll never use a debugger there. Sorry, I just want to say that things happen and sometimes we should let them exist :')

@ioparaskev
Contributor Author

Yesterday I coded this incredible (pythonic?!?) bunch of things:
https://github.com/peppelinux/pyLDAP/blob/master/client.py#L46

Today when I read it, my thoughts came to you, here... I just want to say that it happened. Should I leave it there? I dunno, but I think my son shouldn't read it; he should never code this way. But it exists and it works, and I'll never use a debugger there. Sorry, I just want to say that things happen and sometimes we should let them exist :')

If it makes you feel any better, that line cannot be written as a one-liner with map/filter. But even if you choose to do it in 2 lines, the result is still bad. So it would be better to refactor it, unless you want to explain this to your son 😛

@peppelinux
Member

Great! I just want to say that everything can be refactored, but the ideas are the most important thing. Thank you for your suggestion, I'll get back to that!


Successfully merging this pull request may close these issues.

Reading config fails if no microservcies are configured
3 participants