-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API updates to specify components #88
Conversation
…d to automatically run all if no options specified
Also, I don't love how the warnings show up currently - you end up with a large block of text mixed in with all the Ploomber stuff and the unhelpful |
Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run |
Ah thanks for the heads up, I didn't realize how far along the timeseries generation work was! I think there are several different ways we could integrate these features...potentially even something like including the timeseries generation code for each component under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally approve these updates; but yes, let's talk more about the ts
integration tomorrow too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noted a handful of small changes that would help clean things up a bit, and I also have a bigger request: could we create subdirectories in examples/nblibrary
to match the new compute_notebook
keys and move the appropriate notebooks in? This would potentially lead to some minor tweaks in the file name itself... So
index.ipynb -> infrastructure/index.ipynb
adf_quick_run.ipynb -> atmosphere/quick_run.ipynb
ocean_surface.ipynb -> surface_plots.ipynb
land_comparison.ipynb -> comparison_plots.ipynb
seaice.ipynb -> seaice/[something descriptive].ipynb
We would need JupyterBook to respect this directory structure; maybe infrastructure/index.ipynb -> index.html
is a special case, but we'd want ocean/surface_plots.html
in case other components also want to have a surface_plots.ipynb
notebook.
|
||
# get toc files; ignore glob expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented out this whole block and think it would be fine to remove it - it seems like it's mostly additional checking that's handled fine by Jupyter Book itself, but I want some second opinions before I actually do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also connects to more generally cleaning up setup_book as per #36, but I'm going to save the rest of that for a different PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sure looks like the commented out code is un-necessary -- I vote for removing it altogether (if we realize we need it later, we can grab it from an old commit)
endyr1: 305 | ||
begyr2: 245 | ||
endyr2: 305 | ||
nyears: 25 | ||
|
||
|
||
########### JUPYTER BOOK CONFIG ########### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the Jupyter book section, I went with explicitly specifying the path with the new folder for each one, rather than building in official handling for the "chapter caption" to match the directory structure. I think this keeps a bit more flexibility and makes it more explicit what the Jupyter book is doing (+ avoids an additional layer of hard-codey custom stuff), but we could choose to go the other way if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work on this, @rmshkv ! I ran through the notebooks and generated a jupyter book, and this all runs smoothly as far as I can tell.
I think the explicit path specification in the Jupyter Book Config seems clear, and that seems fine to me to remove that commented block that you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make it very far through the code, and looking at my calendar I might not have time to come back to this until Tuesday... but I'll try to add more comments sooner than that
cupid/run.py
Outdated
if True not in [atmosphere, ocean, land, seaice, landice]: | ||
all = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a --all
option if we are setting all
here? What do we want to have happen if a user runs cupid-run --all --atmosphere config.yml
? If the answer is "a user shouldn't run with those options together", it makes more sense to drop all
from click
/ the input argument list and just set
all = (True not in [atmosphere, ocean, land, seaice, landice])
cupid/run.py
Outdated
if all: | ||
for comp_name, comp_nbs in control["compute_notebooks"].items(): | ||
for nb, info in comp_nbs.items(): | ||
all_nbs[nb] = info | ||
all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name | ||
all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be in an if all
block? Could we do something like
for comp_name, comp_bool in component_options.items():
if comp_name in control['compute_notebooks']) and (comp_bool or all):
for nb, info in control['compute_notebooks'][comp_name].items():
all_nbs[nb] = info
all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name
all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name
elif comp_bool:
warnings.warn(f"No notebooks for {comp_name} component specified in config file.")
to encompass both the if all
and else
portions of the existing code? I think this snippet behaves exactly like what you've written, but there's only one loop over control['compute_notebooks'][comp_name]
to update if we change the config.yml
schema again.
I haven't tested the above block, so you might need a small tweak but the logic in the if
/ else
block should be sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm second-guessing the logic. I think we really want
if comp_name in control['compute_notebooks']) and comp_bool:
...
elif comp_bool and not all:
...
So if comp_bool
is true and the component has notebooks defined in control structure, add those to all_nbs
. If comp_bool and not all
then the user specified --comp
but we're in the else block so the component name does not have notebooks defined in the control structure and that's when we want to warn about requesting a specific component that isn't in the config file.
cupid/run.py
Outdated
if True not in [atmosphere, ocean, land, seaice, landice]: | ||
all = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks we have an identical block of code in 142 / 143, but that's in an if 'compute_notebooks' in control:
block and this is in if "compute_scripts" in control:
. Could we move the first instance of this out of the if statement (maybe directly after defining component_options
) and then remove this occurence?
cupid/run.py
Outdated
if True not in [atmosphere, ocean, land, seaice, landice]: | ||
all = True | ||
|
||
if all: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as the compute_notebooks
block - let's reformulate the if checks and only have a single loop through control['compute_scripts'][comp_name]
Most of my comments on this pass-through are related to reducing the amount of duplicated code. There is still a lot of similarities between |
I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in. |
On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to. |
If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :) |
I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself. |
…names to match timeseries, removed -all arg, other code cleanup
All right, I think that addresses everything requested! Feel free to run some tests again and let me know what you think. |
Just as an FYI, the time series code isn't super dependent on how the components are named; There's a comment in line 285 that would need to be updated, as well as just the names in lines 329, 52, and 232. I'm fine with keeping the short names for now, though. |
All the tests I've run are looking good! I do also agree that updated the logic regarding 'all' is improved when the assumption is that 'all' is the default unless one component is specified, and a flag is not needed to specify 'all'. I think that the lack of a flag clarifies that 'all' is default instead of potential confusion with a default that can also be specified with a flag. |
The code in I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath. I really like how this turned out:
(And |
My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now! |
After updating to f114b83 I needed to rerun |
Addresses #63.
This adds a layer of keys in
config.yml
that groups notebooks (and scripts) by ESM component, e.g.atmosphere
,ocean
, etc. These keys are referenced by flags that can be passed in tocupid-run
, e.g.-atm
,-ocn
, etc., as specified in the README. If no component flags are passed in, all components are run. This can also be done explicitly by using--all
or-a
.This update also prompted changes to how we're managing the environment (kernel) checking step that was being done by
util.get_control_dict()
called bycupid-run
. We still check the existence of all environments specified inconfig.yml
(regardless of whether that component is turned on by flags), but just raise a warning if the environment does not exist. If that notebook is specified to be run, another warning is raised and that notebook is not run, but the others still are. Also, if neither adefault_kernel_name
nor a notebook-specifickernel_name
is provided,cupid-analysis
is assumed and another warning is raised.