[OUTDATED] Major python refactoring #108

jmhvandoorn · 2022-10-01T23:49:32Z

I want to propose a somewhat bigger refactoring to the python files regarding:

Improve logic in file and function names
Improve import consistency
Prepare for adding/testing additional strategies
Remove redundant benchmark files
Get rid of having the static solver in four different places.

Since it might be quite a change, I am very curious what you think about it.

jmhvandoorn · 2022-10-01T23:57:06Z

The commit 8958291, that moves the solver into a function that returns the result, raised some issues. Possibly related to #95

Euro-NeurIPS-2022/benchmark.py

Lines 42 to 45 in 386f07e

    
           res = static.solve_static(instance, seed=seed, **kwargs) 
        
           best = res.get_best_found() 
        
           routes = [route for route in best.get_routes() if route]

"solve_dynamic.py", line 29, in solve_epoch
    routes = [route for route in best.get_routes() if route]
RuntimeError: Could not allocate list object!

Somehow it seems that we cannot always access this value anymore after the function containing the solver gets out of scope.

jmhvandoorn · 2022-10-02T08:01:53Z

I will continue on this refactoring later today or tomorrow.
Made the pull request mainly because I hoped getting help on why this code snippet does not work:

import tools

hgspy = tools.get_hgspy_module()


def solve_static(
    instance,
    *,
    seed=1,
    max_runtime=None,
    max_iterations=None,
    initial_solutions=(),
    excl_ops=(),
    solver_config=None,
    **kwargs
):
    config = hgspy.Config(
        seed=seed, **solver_config if solver_config is not None else {}
    )
    params = hgspy.Params(config, **tools.inst_to_vars(instance))

    rng = hgspy.XorShift128(seed=seed)
    pop = hgspy.Population(params, rng)

    for sol in initial_solutions:
        pop.add_individual(hgspy.Individual(params, sol))

    ls = hgspy.LocalSearch(params, rng)

    node_ops = [
        hgspy.operators.Exchange10(params),
        hgspy.operators.Exchange11(params),
        hgspy.operators.Exchange20(params),
        hgspy.operators.MoveTwoClientsReversed(params),
        hgspy.operators.Exchange21(params),
        hgspy.operators.Exchange22(params),
        hgspy.operators.TwoOpt(params),
    ]

    for op in node_ops:
        if not any(isinstance(op, excl) for excl in excl_ops):
            ls.add_node_operator(op)

    route_ops = [
        hgspy.operators.RelocateStar(params),
        hgspy.operators.SwapStar(params),
    ]

    for op in route_ops:
        if not any(isinstance(op, excl) for excl in excl_ops):
            ls.add_route_operator(op)

    algo = hgspy.GeneticAlgorithm(params, rng, pop, ls)

    crossover_ops = [
        hgspy.crossover.broken_pairs_exchange,
        hgspy.crossover.selective_route_exchange,
    ]

    for op in crossover_ops:
        if op not in excl_ops:
            algo.add_crossover_operator(op)

    if max_runtime is not None:
        stop = hgspy.stop.MaxRuntime(max_runtime)
    else:
        stop = hgspy.stop.MaxIterations(max_iterations)

    return algo.run(stop)

instance = tools.read_vrplib("instances/ORTEC-VRPTW-ASYM-5fa16580-d1-n329-k25.txt")
res = solve_static(instance, max_iterations=25)
best = res.get_best_found()
routes = [route for route in best.get_routes() if route]
cost = best.cost()

print(cost, routes)

And raises a memory allocation error for routes (see prev comment)

leonlan

This is a much needed re-organization of the code base. Overall I think the proposed changes are in the right direction. I think one particular point that we need to discuss in more detail is how to manage all the different solver-configs and operators, so that it can also be tuned more easily.

leonlan · 2022-10-02T08:02:26Z

static/solver.py

+def solve_static(
+    instance,
+    *,
+    seed=1,
+    max_runtime=None,
+    max_iterations=None,
+    initial_solutions=(),
+    excl_ops=(),
+    solver_config=None,
+    **kwargs
+):


Let's discuss what we need for solving static instances. In the bigger picture, there are 3 static problems that need to be solved:

The static instance (3-15 minutes)

The dynamic epoch instance (1-2 minutes)

The rollout simulation instance (0-1 seconds, or max iterations)

I suspect that all three will have different configs, different operators. Maybe the first two can be using the same solver, but for the last one we really need a different one.

See comment below

leonlan · 2022-10-02T08:10:23Z

dynamic/strategies/__init__.py

+from .baselines import greedy, random, lazy
+from .rollout import rollout_count


Suggestion: I think it works easier to have dynamic/baselines and dynamic/rollout. The strategies layer is redundant because we only have two strategies.

The main reason to introduce it, is as a namespace and a generic way to lookup strategies.

See the part getattr(dynamic.strategies, "name")

We could also do that without the namespace, but then we should add a list of strings and check if the strategy exist. To prevent that you could accidently call any other methods as strategy. That's why I proposed this as a cleaner way

We could also do that without the namespace, but then we should add a list of strings and check if the strategy exist. To prevent that you could accidently call any other methods as strategy. That's why I proposed this as a cleaner way

There are other ways. In particular, I am a big fan of explicitly grouping things as follows in module-level init files:

import s1 import s2 ... STRATEGIES = { "name1": s1, "name2": s2, ... }

Then all that's needed outside this module is

from module import STRATEGIES

The added benefit of this over getattr on an imported module is that it explicitly decouples external strategy names from their implementation name, and does not expose all sorts of unrelated fields on the module object which getattr could access. With a dict, we can also use STRATEGIES.keys() as the only allowed choices for the argument parser option. That would completely remove the need for a try/except around the strategy.

Sounds good, will do it like that!

leonlan · 2022-10-02T08:18:52Z

dynamic/__init__.py

+from . import strategies
+from . import utils
+
+from .solver import solve_dynamic, solve_epoch, run_oracle


Suggestion: having solve_dynamic and run_oracle in separate files like before makes it easier to manage changes and is more clear in intent.

Not sure what's best here. I would not prefer having a single file per function in general, if things are logical to group.

But we could indeed put it in separate files and just group them in the init file, like the strategies.

@N-Wouda what would you suggest?

I normally do one public (=to be used by code outside this file) definition per file. So one class, or one function. If the function needs helpers that are not exported, then they can exist in the same file.

If I have multiple public functions that are logically grouped, I'd put them in a package and organise the imports nicely in the package's __init__.py, so that the rest of the code can import from the package as if it were a single module.

One definition per file has some precedent in other languages:

Java requires one class per file. Java might be the only language that's this strict about it.

C/C++ people often put one function or class in a single file because then it's in its own translation unit, and the compiler can compile it faster in parallel. The reasoning here's more about pragmatism than style, I think, although I know some people that take this so far that every class method is in its own file, too (so if a class has 10 methods, there'll be ten separate source files for it). But that's the exception, not the rule.

I think the folks over at Node.js also do it this way, but my experience with backend JS is very limited.
But in Python there does not seem to be consensus: some people do it this way, others bundle everything up in one file.

I personally care more about files and functions not being too large and logically split, and modules and packages exposing sane APIs, than that I do about there being more than one function in a single file. But that's a bit inconsistent of me, because I would for example never accept more than one class in a single file :-).

See #113 and let's see if we can get to some structure everyone agrees on

run_oracle and solve_dynamic belong to the module dynamic/ but not to the same file in my opinion. A single solver file with multiple dynamic solvers is not clear enough: what solvers are available? My general logic for this is that a single file provides 1 function(ality), with some exceptions such as utils.py.

In that case, we don't even have to group them in the init file but can just expose directly:

from .solve_dynamic import solve_dynamic from .run_oracle import run_oracle

etc.

leonlan · 2022-10-02T08:25:12Z

dynamic/strategies/rollout/__init__.py

+with open("dynamic/strategies/rollout/config.yml") as file:
+    config = SimpleNamespace(**yaml.safe_load(file))
+
+from .algorithms import rollout_count


Suggestion: for each rollout variant, it's easier to manage if we make a separate files instead of packing it into algorithms.

Not sure what's best here. I would not prefer having a single file per function in general, if things are logical to group.

But we could indeed put it in separate files and just group them in the init file, like the strategies.

@N-Wouda what would you suggest?

I guess the same holds here

leonlan · 2022-10-02T08:44:12Z

FYI: flake8 is now also added to the CI. So in addition to running black, it's now required to run flake8 on the codebase and to address these errors to get a pass on the CI.

jmhvandoorn · 2022-10-02T10:00:21Z

This is a much needed re-organization of the code base. Overall I think the proposed changes are in the right direction. I think one particular point that we need to discuss in more detail is how to manage all the different solver-configs and operators, so that it can also be tuned more easily.

Agreed. I have some suggestion for that. I will elaborate on that later. To summarize, I would suggest a config file for each of the three cases you mention. + the possibility to pass a different config file name as argument for testing / tuning

N-Wouda · 2022-10-02T10:01:22Z

The commit 8958291, that moves the solver into a function that returns the result, raised some issues. Possibly related to #95

Euro-NeurIPS-2022/benchmark.py

Lines 42 to 45 in 386f07e

res = static.solve_static(instance, seed=seed, **kwargs)

best = res.get_best_found()

routes = [route for route in best.get_routes() if route]
"solve_dynamic.py", line 29, in solve_epoch
    routes = [route for route in best.get_routes() if route]
RuntimeError: Could not allocate list object!
Somehow it seems that we cannot always access this value anymore after the function containing the solver gets out of scope.

The good news is that this is unrelated to #95. This is just how cpp object lifetimes work. Result takes a reference to the best individual. That reference is by definition not owning - the individual remains owned by Population. When the Population goes out of scope, it deletes all its owned individuals, causing the reference to become invalid. Thus accessing data from that now-meaningless individual does not work anymore.

We cannot transfer ownership, so the only way around this is to either copy the individual into Result, or keep the population in scope when accessing the solver results. I'm leaning towards copying, because that's easy to do.

N-Wouda · 2022-10-02T10:13:45Z

Since it might be quite a change, I am very curious what you think about it.

My initial take on this is that I'm happy we're removing most of the other baseline strategies. I'm unhappy about the lack of cohesion in this PR - it's doing too many things at once. From what I can tell, those things include (at least): removing baselines, changing config management, and removing various duplications. All those should have been their own PRs.

I will continue on this refactoring later today or tomorrow.

This PR is big as-is. Can we work towards getting this thing merged in (possibly in reduced form), rather than adding additional changes here? A sequence of more focused PRs is much more manageable for me (and, I think, Leon) to think about and review.

leonlan · 2022-10-02T10:40:14Z

I agree with @N-Wouda. I suggest to split this PR into the parts that Niels suggested:

removing baselines, changing config management, and removing various duplications.

In general, it would also be very helpful if you make issues for each of the proposed changes before you open a PR. That way we can all discuss about what needs to be done and have more effective reviews. Ideally, the issues should be the place for discussions about what and how to change, and then in the PR we can focus on specific implementation details.

N-Wouda · 2022-10-02T10:17:54Z

benchmark_dynamic.py

+    if args.strategy.startswith("rollout"):
+        from dynamic.strategies.rollout import constants


Question: are we going to have more than one rollout? I suspect we'll just be changing the existing version, rather than write multiple different ones. Similar to how we're altering the static solver (of which we also do not have multiple versions).

We still need to experiment a lot, so having the option to include multiple rollout versions is something I would prefer.

Agreed with Leon. At least I have one ready on the shelf I want to test

N-Wouda · 2022-10-02T10:24:45Z

dynamic/strategies/__init__.py

+from .baselines import greedy, random, lazy
+from .rollout import rollout_count


We could also do that without the namespace, but then we should add a list of strings and check if the strategy exist. To prevent that you could accidently call any other methods as strategy. That's why I proposed this as a cleaner way

There are other ways. In particular, I am a big fan of explicitly grouping things as follows in module-level init files:

import s1 import s2 ... STRATEGIES = { "name1": s1, "name2": s2, ... }

Then all that's needed outside this module is

from module import STRATEGIES

The added benefit of this over getattr on an imported module is that it explicitly decouples external strategy names from their implementation name, and does not expose all sorts of unrelated fields on the module object which getattr could access. With a dict, we can also use STRATEGIES.keys() as the only allowed choices for the argument parser option. That would completely remove the need for a try/except around the strategy.

N-Wouda · 2022-10-02T10:30:54Z

dynamic/strategies/rollout/__init__.py

+import yaml
+
+with open("dynamic/strategies/rollout/config.yml") as file:
+    config = SimpleNamespace(**yaml.safe_load(file))


Question: can we keep config management in Python, e.g. via dicts? It's much easier to tune without a dependency on a config file in another format.

We certainly could. Main reason to not use python files as config files in general is that any code that's put in there will be executed and thus it's very unsafe. While yaml files are basically (nested) dictionaries in text format. But for this project, I am fine with python files as well.

Funnily enough, that's exactly how I feel about yaml files and its insane standard that's much, much more than just dictionaries. I like toml files a lot more!

Totally fine with toml files as well!

N-Wouda · 2022-10-02T10:33:08Z

dynamic/strategies/rollout/algorithms.py



-def rollout(info, obs, rng):
+def rollout_count(info, obs, rng):


Refactor: here, and in other places, can we inject the config into this function? That would make tuning much easier.

N-Wouda · 2022-10-02T10:37:29Z

static/solver.py

+def solve_static(
+    instance,
+    *,
+    seed=1,
+    max_runtime=None,
+    max_iterations=None,
+    initial_solutions=(),
+    excl_ops=(),
+    solver_config=None,
+    **kwargs
+):


About this signature:

Can we write out exclude_operators in full? It's currently the only abbreviated argument in the whole parameter list.

kwargs and solver_config seems to me to be the same thing (and this function does not currently appear to use kwargs). Can we remove solver_config, and add a docstring stating something like "any additional keyword arguments are passed to the static solver configuration"?

Sure

The reason for that is the following: We currently have some places where we parse the input arguments and pass them as kwargs to all other functions, meaning functions also get kwargs that they do not use. However, putting unknown arguments to the solver raises issues. That's why there is a seperate variable for the solver configuration, and **kwargs just "catches" all other arguments that are passed but not used. Agreed that this is all not very clear, but at that moment the only way to get things to work properly

N-Wouda · 2022-10-02T10:39:30Z

static/solver.py

+        if op not in excl_ops:
+            algo.add_crossover_operator(op)


Issue?: should this also use the isinstance stuff from above?

I don't think so. The crossover operators are functions, that you could just check like this.
isinstance is needed, because the other operators are objects of a certain class

N-Wouda · 2022-10-02T10:43:25Z

static/solver.py

+    node_ops = [
+        hgspy.operators.Exchange10(params),
+        hgspy.operators.Exchange11(params),
+        hgspy.operators.Exchange20(params),
+        hgspy.operators.MoveTwoClientsReversed(params),
+        hgspy.operators.Exchange21(params),
+        hgspy.operators.Exchange22(params),
+        hgspy.operators.TwoOpt(params),
+    ]
+
+    for op in node_ops:
+        if not any(isinstance(op, excl) for excl in excl_ops):
+            ls.add_node_operator(op)
+
+    route_ops = [
+        hgspy.operators.RelocateStar(params),
+        hgspy.operators.SwapStar(params),
+    ]
+
+    for op in route_ops:
+        if not any(isinstance(op, excl) for excl in excl_ops):
+            ls.add_route_operator(op)
+
+    algo = hgspy.GeneticAlgorithm(params, rng, pop, ls)
+
+    crossover_ops = [
+        hgspy.crossover.broken_pairs_exchange,
+        hgspy.crossover.selective_route_exchange,
+    ]
+
+    for op in crossover_ops:
+        if op not in excl_ops:
+            algo.add_crossover_operator(op)


This particular implementation allows selective filtering away of operators, but not changing their ordering. If we allow passing these in as node_ops = (), route_ops = (), and crossover_ops = () arguments, we can always set our defaults when these lists are empty. Something like:

if not node_ops: node_ops = [ hgspy.operators.Exchange10(params), hgspy.operators.Exchange11(params), hgspy.operators.Exchange20(params), hgspy.operators.MoveTwoClientsReversed(params), hgspy.operators.Exchange21(params), hgspy.operators.Exchange22(params), hgspy.operators.TwoOpt(params), ]

etc. That leaves full control to the call site in case something else is needed.

See #114 for further discussion

jmhvandoorn · 2022-10-03T09:13:32Z

I agree with @N-Wouda. I suggest to split this PR into the parts that Niels suggested:

removing baselines, changing config management, and removing various duplications.

In general, it would also be very helpful if you make issues for each of the proposed changes before you open a PR. That way we can all discuss about what needs to be done and have more effective reviews. Ideally, the issues should be the place for discussions about what and how to change, and then in the PR we can focus on specific implementation details.

Agreed, sorry for that. Has something todo with this being my first ever project using the full scope of possibilities in github 😅

I would propose the following for now:

Keep this PR for the discussions until now.
I will make an issue for the things we need to discuss further.
I will make seperate branches & pr's to get them in one by one.

Do you agree that is the right way forward now?

leonlan · 2022-10-03T10:02:21Z

Do you agree that is the right way forward now?

Sounds good! :-)

jmhvandoorn added 8 commits October 1, 2022 23:16

Remove unused dqn and supervised benchmark strategies

d870108

Move stategies to generic submodule

55104d3

Refactor imports & strategy lookup

e20a07b

Refactor more file names and inputs

fc0371f

Move constants to config file

1046b92

Remove duplicate static solvers into single function

8958291

Run black

f81b6f6

Rename static.solver and dynamic.solver

386f07e

jmhvandoorn requested review from N-Wouda and leonlan October 1, 2022 23:49

jmhvandoorn changed the title ~~Major python refactoring~~ [WIP] Major python refactoring Oct 1, 2022

Move run_oracle into dynamic.solver

4bd197f

leonlan suggested changes Oct 2, 2022

View reviewed changes

N-Wouda reviewed Oct 2, 2022

View reviewed changes

This was referenced Oct 3, 2022

TODOs in code #82

Closed

Improve rollout dispatching criteria #101

Closed

Handle config loading in tools function

de2c5f1

N-Wouda mentioned this pull request Oct 3, 2022

[Refactoring step 2] Introduce generic static solver #117

Closed

jmhvandoorn changed the title ~~[WIP] Major python refactoring~~ [OUTDATED] Major python refactoring Oct 5, 2022

jmhvandoorn closed this Oct 5, 2022

jmhvandoorn deleted the major-python-refactoring branch October 5, 2022 12:41

		from .baselines import greedy, random, lazy
		from .rollout import rollout_count

		if args.strategy.startswith("rollout"):
		from dynamic.strategies.rollout import constants



		def rollout(info, obs, rng):
		def rollout_count(info, obs, rng):

[OUTDATED] Major python refactoring #108

[OUTDATED] Major python refactoring #108

Conversation

jmhvandoorn commented Oct 1, 2022 • edited Loading

jmhvandoorn commented Oct 1, 2022 • edited Loading

jmhvandoorn commented Oct 2, 2022

leonlan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

N-Wouda Oct 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonlan Oct 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonlan commented Oct 2, 2022

jmhvandoorn commented Oct 2, 2022

N-Wouda commented Oct 2, 2022 • edited Loading

N-Wouda commented Oct 2, 2022

leonlan commented Oct 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

N-Wouda Oct 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jmhvandoorn commented Oct 3, 2022

leonlan commented Oct 3, 2022

jmhvandoorn commented Oct 1, 2022 •

edited

Loading

jmhvandoorn commented Oct 1, 2022 •

edited

Loading

N-Wouda Oct 2, 2022 •

edited

Loading

leonlan Oct 2, 2022 •

edited

Loading

N-Wouda commented Oct 2, 2022 •

edited

Loading

leonlan commented Oct 2, 2022 •

edited

Loading

N-Wouda Oct 2, 2022 •

edited

Loading