Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update workflow to better align with euro-calliope #16

Merged
merged 4 commits into from
Apr 15, 2021

Conversation

brynpickering
Copy link
Member

The idea here is to bring this workflow up-to-date with processes used in euro-calliope, such that making this a submodule of euro-calliope will be more straightforward.

Still to-do:

  1. Split out all python/shell script rules so that the PYTHON identifier is no longer needed
  2. Update dependencies
  3. Run entire workflow on cluster to ensure it all still works!

Copy link
Member

@timtroendle timtroendle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a large change. Did you run it to see if the workflow still works?

Everything related to config is great!

I don't like that this change mixes the old mechanism and the new mechanism. I also don't like that we call the Python package "scripts" as this is confusing and may lead to wrong assumptions of users.

Fixing these things, however, would make this PR even larger and even more complex. So I guess it's fine to leave things as they are but we should, as a next step (in the next PR), introduce the library mechanism of euro-calliope too. All reused functions and definitions should then move to the library so that we minimise (better: get rid of) dependencies between "scripts". Then, we should also remove init.py and end up with an actual folder of scripts.

As soon as you've handled my comments, I'm fine with merging this.

Snakefile Show resolved Hide resolved
Snakefile Outdated Show resolved Hide resolved
Snakefile Outdated Show resolved Hide resolved
rules/capacityfactors.smk Outdated Show resolved Hide resolved
rules/capacityfactors.smk Outdated Show resolved Hide resolved
scripts/shared_coast.py Show resolved Hide resolved
scripts/technically_eligible_capacityfactor.py Outdated Show resolved Hide resolved
rules/potential.smk Outdated Show resolved Hide resolved
scripts/administrative_borders.py Show resolved Hide resolved
config/default.yaml Show resolved Hide resolved
@brynpickering
Copy link
Member Author

This is a large change. Did you run it to see if the workflow still works?

See my PR message, it's one of the to-dos!

I don't like that this change mixes the old mechanism and the new mechanism.

Also one of the to-dos, to separate out the few rules that currently call python from within a shell script, so that the python functions can be called with script: directly, then all old mechanism can go.

I also don't like that we call the Python package "scripts" as this is confusing and may lead to wrong assumptions of users.

Is there any real reason to have it be a Python package? To my mind the scripts within scripts are just that. Each rule calls a single file. There are occasional cross dependencies and there is an __init__.py, but it's more confusing to me to consider it a package than a collection of scripts accessed by the snakemake workflow.

Fixing these things, however, would make this PR even larger and even more complex. So I guess it's fine to leave things as they are but we should, as a next step (in the next PR), introduce the library mechanism of euro-calliope too. All reused functions and definitions should then move to the library so that we minimise (better: get rid of) dependencies between "scripts". Then, we should also remove init.py and end up with an actual folder of scripts.

This answers my comment above really, I like the sound of this approach in a new PR.

@timtroendle
Copy link
Member

RE scripts/Python package: I think we are on the same line here. As long as there are dependencies between the "scripts", we need the Python package. But it should go as soon as we have the solarandwindpotentialslib.

Btw, maybe that lib (and the repo?) should be renamed to solarandwindandhydroandbiofuelspotentialslib at one point?

@brynpickering
Copy link
Member Author

@timtroendle I just noticed that population.py seems to be unconnected from any rule - should the whole file be removed?

@timtroendle
Copy link
Member

I removed the rule using this file in 5a77847 -- when cutting this repo out of my Home made study. That makes sense: population was only relevant for estimating electricity demand -- which is not part of this repo.

@brynpickering
Copy link
Member Author

@timtroendle this is now ready to go. It builds every file required by all as well as the ninja input files. However, tests won't run. This is due to the current directory structure not fitting with the snakemake script functionality. I would handle this in another PR.

@brynpickering
Copy link
Member Author

Btw, maybe that lib (and the repo?) should be renamed to solarandwindandhydroandbiofuelspotentialslib at one point?

Sounds perfect, if only there was a readily agreed upon group name for these technologies...

@brynpickering
Copy link
Member Author

(btw, I know you said this could be merged once comments were addressed. The request for re-review is just to confirm that this can be merged with the test directory to be fixed at a later stage)

@timtroendle
Copy link
Member

Sounds perfect, if only there was a readily agreed upon group name for these technologies...

technologieswhichdontrequirefossilfuelsandthereforedontemitgreenhousegasemissionslib?

@timtroendle
Copy link
Member

(btw, I know you said this could be merged once comments were addressed. The request for re-review is just to confirm that this can be merged with the test directory to be fixed at a later stage)

Not very comfortable with that, but if you promise to tackle this soon, let's do it this way.

@brynpickering
Copy link
Member Author

I think some more restructuring is needed to get tests working again, so I'd prefer not to bloat this PR further with it. Will sort soon, promise!

@brynpickering brynpickering merged commit 3ca461b into main Apr 15, 2021
Copy link
Member

@timtroendle timtroendle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very comfortable with tests not running, but if you promise to tackle this soon, let's do it this way.

One thing I think should be changed is that you give Snakemake objects as inputs into almost all functions. Can you change that as I requested? (This is also in line with the way we do it in euro-calliope).

@@ -45,45 +29,45 @@ def potentials(path_to_units, path_to_eez, path_to_shared_coast,
* allocate the offshore potentials to exclusive economic zones (EEZ),
* allocate the offshore potential of EEZ to units based on the fraction of shared coast.
"""
with rasterio.open(path_to_eligibility_categories, "r") as src:
with rasterio.open(str(path_to_eligibility_categories), "r") as src:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this too much because this is a Snakemake detail. Rather solve it beofre calling this function.

snakemake.input.capacity_pv[0]

ids = f_ids.read(1)
meta = f_ids.meta
averages = map_id_to_average_capacity_factor(ids, path_to_timeseries, meta["nodata"])
meta["dtype"] = DTYPE
meta["nodata"] = NODATA
with rasterio.open(path_to_output, "w", **meta) as f_avg:
with rasterio.open(str(path_to_output), "w", **meta) as f_avg:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@timtroendle
Copy link
Member

Haha shit, seems I was too slow.

@timtroendle
Copy link
Member

Can you mabye tackle this when you fix tests?

@brynpickering
Copy link
Member Author

haha, will clean now! I actually preferred dealing with it at the final point, since there was otherwise mixing of when [0] would crop up. Sometimes it's in the snakefile, sometime's it's in the arguments being passed to the function in each python file. There should be some agreement on when we point to the first element of a 1-element snakemake NamedList object. Preferences?

@timtroendle
Copy link
Member

True! Actually the cleanest is to have it in the Snakefile, no?

@brynpickering
Copy link
Member Author

Sadly, I don't think you can do that in the snakefile unless you're pointing directly to another rule (i.e. it doesn't work with any outputs and it doesn't work with inputs that are defined as strings). The next available option is to put them all in the function call.

@timtroendle
Copy link
Member

But all of this only happens when you use rule.xx.output, no? Otherwise inputs are strings in the first place.

@timtroendle
Copy link
Member

It seems to me that euro-calliope handles this consistently (or almost consistently) wthin the Snakefiles (for inputs).

@brynpickering
Copy link
Member Author

ok yeah. Maybe the cleanest option is to always name outputs. That way you can refer to that name when you use a rule output as an input and you can refer to that name when you refer to an output in the python files.

@timtroendle
Copy link
Member

Maybe, yes. I don't have a strong opinion when it comes to naming or (int)-indexing.

@timtroendle timtroendle deleted the align-to-euro-calliope branch April 15, 2021 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants