Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API updates to specify components #88

Merged
merged 14 commits into from
May 3, 2024
Merged

Conversation

rmshkv
Copy link
Contributor

@rmshkv rmshkv commented Mar 19, 2024

Addresses #63.

This adds a layer of keys in config.yml that groups notebooks (and scripts) by ESM component, e.g. atmosphere, ocean, etc. These keys are referenced by flags that can be passed in to cupid-run, e.g. -atm, -ocn, etc., as specified in the README. If no component flags are passed in, all components are run. This can also be done explicitly by using --all or -a.

This update also prompted changes to how we're managing the environment (kernel) checking step that was being done by util.get_control_dict() called by cupid-run. We still check the existence of all environments specified in config.yml (regardless of whether that component is turned on by flags), but just raise a warning if the environment does not exist. If that notebook is specified to be run, another warning is raised and that notebook is not run, but the others still are. Also, if neither a default_kernel_name nor a notebook-specific kernel_name is provided, cupid-analysis is assumed and another warning is raised.

@rmshkv
Copy link
Contributor Author

rmshkv commented Mar 19, 2024

Also, I don't love how the warnings show up currently - you end up with a large block of text mixed in with all the Ploomber stuff and the unhelpful warnings.warn()... code used to generate the warnings gets printed out too. Please feel free to propose a better way if you know one!

@TeaganKing
Copy link
Collaborator

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

@rmshkv
Copy link
Contributor Author

rmshkv commented Mar 19, 2024

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

Ah thanks for the heads up, I didn't realize how far along the timeseries generation work was! I think there are several different ways we could integrate these features...potentially even something like including the timeseries generation code for each component under compute_scripts, which already get turned on or off by the component flags but could then also be affected by the -ts flag? Maybe we can chat about it at the meeting tomorrow.

Copy link
Collaborator

@TeaganKing TeaganKing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally approve these updates; but yes, let's talk more about the ts integration tomorrow too!

cupid/run.py Outdated Show resolved Hide resolved
cupid/run.py Show resolved Hide resolved
examples/coupled_model/config.yml Show resolved Hide resolved
@rmshkv rmshkv marked this pull request as ready for review March 21, 2024 19:01
Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noted a handful of small changes that would help clean things up a bit, and I also have a bigger request: could we create subdirectories in examples/nblibrary to match the new compute_notebook keys and move the appropriate notebooks in? This would potentially lead to some minor tweaks in the file name itself... So

index.ipynb -> infrastructure/index.ipynb
adf_quick_run.ipynb -> atmosphere/quick_run.ipynb
ocean_surface.ipynb -> surface_plots.ipynb
land_comparison.ipynb -> comparison_plots.ipynb
seaice.ipynb -> seaice/[something descriptive].ipynb

We would need JupyterBook to respect this directory structure; maybe infrastructure/index.ipynb -> index.html is a special case, but we'd want ocean/surface_plots.html in case other components also want to have a surface_plots.ipynb notebook.

examples/coupled_model/config.yml Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved
cupid/util.py Outdated Show resolved Hide resolved

# get toc files; ignore glob expressions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented out this whole block and think it would be fine to remove it - it seems like it's mostly additional checking that's handled fine by Jupyter Book itself, but I want some second opinions before I actually do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also connects to more generally cleaning up setup_book as per #36, but I'm going to save the rest of that for a different PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sure looks like the commented out code is un-necessary -- I vote for removing it altogether (if we realize we need it later, we can grab it from an old commit)

endyr1: 305
begyr2: 245
endyr2: 305
nyears: 25


########### JUPYTER BOOK CONFIG ###########
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Jupyter book section, I went with explicitly specifying the path with the new folder for each one, rather than building in official handling for the "chapter caption" to match the directory structure. I think this keeps a bit more flexibility and makes it more explicit what the Jupyter book is doing (+ avoids an additional layer of hard-codey custom stuff), but we could choose to go the other way if we want.

Copy link
Collaborator

@TeaganKing TeaganKing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work on this, @rmshkv ! I ran through the notebooks and generated a jupyter book, and this all runs smoothly as far as I can tell.

I think the explicit path specification in the Jupyter Book Config seems clear, and that seems fine to me to remove that commented block that you mentioned.

Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make it very far through the code, and looking at my calendar I might not have time to come back to this until Tuesday... but I'll try to add more comments sooner than that

cupid/run.py Outdated
Comment on lines 142 to 143
if True not in [atmosphere, ocean, land, seaice, landice]:
all = True
Copy link
Collaborator

@mnlevy1981 mnlevy1981 Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a --all option if we are setting all here? What do we want to have happen if a user runs cupid-run --all --atmosphere config.yml? If the answer is "a user shouldn't run with those options together", it makes more sense to drop all from click / the input argument list and just set

all = (True not in [atmosphere, ocean, land, seaice, landice])

cupid/run.py Outdated
Comment on lines 145 to 150
if all:
for comp_name, comp_nbs in control["compute_notebooks"].items():
for nb, info in comp_nbs.items():
all_nbs[nb] = info
all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name
all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be in an if all block? Could we do something like

for comp_name, comp_bool in component_options.items():
  if comp_name in control['compute_notebooks']) and (comp_bool or all):
    for nb, info in control['compute_notebooks'][comp_name].items():
      all_nbs[nb] = info
      all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name
      all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name
  elif comp_bool:
        warnings.warn(f"No notebooks for {comp_name} component specified in config file.")

to encompass both the if all and else portions of the existing code? I think this snippet behaves exactly like what you've written, but there's only one loop over control['compute_notebooks'][comp_name] to update if we change the config.yml schema again.

I haven't tested the above block, so you might need a small tweak but the logic in the if / else block should be sound.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm second-guessing the logic. I think we really want

if comp_name in control['compute_notebooks']) and comp_bool:
  ...
elif comp_bool and not all:
  ...

So if comp_bool is true and the component has notebooks defined in control structure, add those to all_nbs. If comp_bool and not all then the user specified --comp but we're in the else block so the component name does not have notebooks defined in the control structure and that's when we want to warn about requesting a specific component that isn't in the config file.

cupid/run.py Outdated
Comment on lines 186 to 187
if True not in [atmosphere, ocean, land, seaice, landice]:
all = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks we have an identical block of code in 142 / 143, but that's in an if 'compute_notebooks' in control: block and this is in if "compute_scripts" in control:. Could we move the first instance of this out of the if statement (maybe directly after defining component_options) and then remove this occurence?

cupid/run.py Outdated
if True not in [atmosphere, ocean, land, seaice, landice]:
all = True

if all:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as the compute_notebooks block - let's reformulate the if checks and only have a single loop through control['compute_scripts'][comp_name]

@mnlevy1981
Copy link
Collaborator

Most of my comments on this pass-through are related to reducing the amount of duplicated code. There is still a lot of similarities between compute_notebooks and compute_scripts sections, but cleaning that up is probably best left for a separate issue

@rmshkv
Copy link
Contributor Author

rmshkv commented May 2, 2024

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

@rmshkv
Copy link
Contributor Author

rmshkv commented May 2, 2024

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

@mnlevy1981
Copy link
Collaborator

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

@rmshkv
Copy link
Contributor Author

rmshkv commented May 3, 2024

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself.

…names to match timeseries, removed -all arg, other code cleanup
@rmshkv
Copy link
Contributor Author

rmshkv commented May 3, 2024

All right, I think that addresses everything requested! Feel free to run some tests again and let me know what you think.

@TeaganKing
Copy link
Collaborator

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself.

Just as an FYI, the time series code isn't super dependent on how the components are named; There's a comment in line 285 that would need to be updated, as well as just the names in lines 329, 52, and 232. I'm fine with keeping the short names for now, though.

@TeaganKing
Copy link
Collaborator

All the tests I've run are looking good! I do also agree that updated the logic regarding 'all' is improved when the assumption is that 'all' is the default unless one component is specified, and a flag is not needed to specify 'all'. I think that the lack of a flag clarifies that 'all' is default instead of potential confusion with a default that can also be specified with a flag.

@mnlevy1981
Copy link
Collaborator

mnlevy1981 commented May 3, 2024

The code in run.py looks great and cupid-run config.yml generates output as expected, but cupid-build isn't getting the sidebar right anymore:

cupid book

I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath.

I really like how this turned out:

(cupid-dev) examples/coupled_model$ cupid-run -atm -glc config.yml
cupid/run.py:152: UserWarning: No notebooks for glc component specified in config file.

(And adf_quick_run.ipynb was still executed).

@rmshkv
Copy link
Contributor Author

rmshkv commented May 3, 2024

The code in run.py looks great and cupid-run config.yml generates output as expected, but cupid-build isn't getting the sidebar right anymore:

cupid book

I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath.

I really like how this turned out:

(cupid-dev) examples/coupled_model$ cupid-run -atm -glc config.yml
cupid/run.py:152: UserWarning: No notebooks for glc component specified in config file.

(And adf_quick_run.ipynb was still executed).

My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now!

@mnlevy1981
Copy link
Collaborator

My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now!

After updating to f114b83 I needed to rerun cupid-run and cupid-build, but the page looks great now!

cupid book 2

@mnlevy1981 mnlevy1981 merged commit d14ee72 into NCAR:main May 3, 2024
@rmshkv rmshkv deleted the component-api branch May 6, 2024 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants