API updates to specify components #88

rmshkv · 2024-03-19T22:21:03Z

Addresses #63.

This adds a layer of keys in config.yml that groups notebooks (and scripts) by ESM component, e.g. atmosphere, ocean, etc. These keys are referenced by flags that can be passed in to cupid-run, e.g. -atm, -ocn, etc., as specified in the README. If no component flags are passed in, all components are run. This can also be done explicitly by using --all or -a.
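To illustrate the "no flags means run everything" rule described above, here is a minimal sketch. The function name `components_to_run` and the `flags` dictionary are hypothetical and are not taken from cupid/run.py; the real command receives its flags through the command-line interface.

```python
def components_to_run(flags):
    """Return the names of the components to run.

    `flags` maps a component name to the boolean from its command-line flag.
    If no flag was set, every component is selected.
    """
    if not any(flags.values()):
        return list(flags)
    return [name for name, enabled in flags.items() if enabled]


# Hypothetical flag values, as if the user had passed only the atmosphere flag.
flags = {
    "atmosphere": True,
    "ocean": False,
    "land": False,
    "seaice": False,
    "landice": False,
}
print(components_to_run(flags))
```

Deriving the "all" case from the absence of flags keeps the default and the explicit behavior from disagreeing with each other.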

This update also prompted changes to how we're managing the environment (kernel) checking step that was being done by util.get_control_dict() called by cupid-run. We still check the existence of all environments specified in config.yml (regardless of whether that component is turned on by flags), but just raise a warning if the environment does not exist. If that notebook is specified to be run, another warning is raised and that notebook is not run, but the others still are. Also, if neither a default_kernel_name nor a notebook-specific kernel_name is provided, cupid-analysis is assumed and another warning is raised.
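The kernel handling described above could be sketched roughly as follows. This is an assumption-laden illustration, not the code in util.get_control_dict(): the helper name `resolve_kernel`, its arguments, and the `available_kernels` set are hypothetical, and the real code discovers installed environments on its own.

```python
import warnings

FALLBACK_KERNEL = "cupid-analysis"  # fallback named in the description above


def resolve_kernel(nb_name, info, default_kernel_name, available_kernels):
    """Return the kernel to use for a notebook, or None if it must be skipped.

    `info` is the notebook's config entry (hypothetical shape) and
    `available_kernels` is a set of installed kernel names (assumed input).
    """
    kernel = info.get("kernel_name") or default_kernel_name
    if kernel is None:
        # Neither a notebook-specific nor a default kernel was given.
        warnings.warn(f"No kernel specified for {nb_name}; assuming {FALLBACK_KERNEL}.")
        kernel = FALLBACK_KERNEL
    if kernel not in available_kernels:
        # Warn instead of raising so the remaining notebooks can still run.
        warnings.warn(f"Environment {kernel} not found; {nb_name} will not be run.")
        return None
    return kernel


print(resolve_kernel("adf_quick_run", {}, None, {"cupid-analysis"}))
```

Returning None rather than raising is what lets one missing environment skip a single notebook while the others still run.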

rmshkv · 2024-03-19T22:31:13Z

Also, I don't love how the warnings show up currently - you end up with a large block of text mixed in with all the Ploomber stuff and the unhelpful warnings.warn()... code used to generate the warnings gets printed out too. Please feel free to propose a better way if you know one!
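One possible way to tidy the warning output, offered only as a sketch: Python's standard `warnings` module lets the module-level `warnings.formatwarning` function be replaced, which drops the echoed source line. The function name `short_formatwarning` and the message text are illustrative, and this does not address how the output interleaves with Ploomber's own logging.

```python
import warnings


def short_formatwarning(message, category, filename, lineno, line=None):
    """Format a warning as a single line, omitting the echoed source code."""
    return f"{category.__name__}: {message}\n"


# The default showwarning() implementation calls warnings.formatwarning,
# so replacing it changes how every subsequent warning is printed.
warnings.formatwarning = short_formatwarning

warnings.warn("Environment 'example-env' not found; notebook will be skipped.")
```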

TeaganKing · 2024-03-19T22:35:32Z

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

rmshkv · 2024-03-19T22:49:43Z

Ah thanks for the heads up, I didn't realize how far along the timeseries generation work was! I think there are several different ways we could integrate these features...potentially even something like including the timeseries generation code for each component under compute_scripts, which already get turned on or off by the component flags but could then also be affected by the -ts flag? Maybe we can chat about it at the meeting tomorrow.

TeaganKing

I generally approve these updates; but yes, let's talk more about the ts integration tomorrow too!

mnlevy1981

I've noted a handful of small changes that would help clean things up a bit, and I also have a bigger request: could we create subdirectories in examples/nblibrary to match the new compute_notebooks keys and move the appropriate notebooks in? This would potentially lead to some minor tweaks in the file names themselves. So:

index.ipynb -> infrastructure/index.ipynb
adf_quick_run.ipynb -> atmosphere/quick_run.ipynb
ocean_surface.ipynb -> ocean/surface_plots.ipynb
land_comparison.ipynb -> land/comparison_plots.ipynb
seaice.ipynb -> seaice/[something descriptive].ipynb

We would need JupyterBook to respect this directory structure; maybe infrastructure/index.ipynb -> index.html is a special case, but we'd want ocean/surface_plots.html in case other components also want to have a surface_plots.ipynb notebook.

rmshkv · 2024-04-16T17:24:49Z

cupid/util.py


-    # get toc files; ignore glob expressions


I commented out this whole block and think it would be fine to remove it - it seems like it's mostly additional checking that's handled fine by Jupyter Book itself, but I want some second opinions before I actually do that

This also connects to more generally cleaning up setup_book as per #36, but I'm going to save the rest of that for a different PR

It sure looks like the commented-out code is unnecessary -- I vote for removing it altogether (if we realize we need it later, we can grab it from an old commit).

rmshkv · 2024-04-16T17:31:45Z

examples/coupled_model/config.yml

+            endyr1: 305
+            begyr2: 245
+            endyr2: 305
+            nyears: 25


 ########### JUPYTER BOOK CONFIG ###########


For the Jupyter book section, I went with explicitly specifying the path with the new folder for each one, rather than building in official handling for the "chapter caption" to match the directory structure. I think this keeps a bit more flexibility and makes it more explicit what the Jupyter book is doing (+ avoids an additional layer of hard-codey custom stuff), but we could choose to go the other way if we want.

TeaganKing

Thanks for all your work on this, @rmshkv ! I ran through the notebooks and generated a jupyter book, and this all runs smoothly as far as I can tell.

I think the explicit path specification in the Jupyter Book config is clear, and it seems fine to me to remove the commented-out block that you mentioned.

mnlevy1981

I didn't make it very far through the code, and looking at my calendar I might not have time to come back to this until Tuesday... but I'll try to add more comments sooner than that

mnlevy1981 · 2024-04-25T22:45:05Z

cupid/run.py

+        if True not in [atmosphere, ocean, land, seaice, landice]:
+            all = True


Do we need a --all option if we are setting all here? What do we want to have happen if a user runs cupid-run --all --atmosphere config.yml? If the answer is "a user shouldn't run with those options together", it makes more sense to drop all from click / the input argument list and just set

all = (True not in [atmosphere, ocean, land, seaice, landice])

mnlevy1981 · 2024-04-26T16:26:30Z

cupid/run.py

+        if all:
+            for comp_name, comp_nbs in control["compute_notebooks"].items():
+                for nb, info in comp_nbs.items():
+                    all_nbs[nb] = info
+                    all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name
+                    all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name


Does this need to be in an if all block? Could we do something like

for comp_name, comp_bool in component_options.items():
    if (comp_name in control['compute_notebooks']) and (comp_bool or all):
        for nb, info in control['compute_notebooks'][comp_name].items():
            all_nbs[nb] = info
            all_nbs[nb]['nb_path_root'] = nb_path_root + '/' + comp_name
            all_nbs[nb]['output_dir'] = output_dir + '/' + comp_name
    elif comp_bool:
        warnings.warn(f"No notebooks for {comp_name} component specified in config file.")

to encompass both the if all and else portions of the existing code? I think this snippet behaves exactly like what you've written, but there's only one loop over control['compute_notebooks'][comp_name] to update if we change the config.yml schema again.

I haven't tested the above block, so you might need a small tweak but the logic in the if / else block should be sound.

Now I'm second-guessing the logic. I think we really want

if (comp_name in control['compute_notebooks']) and comp_bool:
    ...
elif comp_bool and not all:
    ...

So if comp_bool is true and the component has notebooks defined in control structure, add those to all_nbs. If comp_bool and not all then the user specified --comp but we're in the else block so the component name does not have notebooks defined in the control structure and that's when we want to warn about requesting a specific component that isn't in the config file.
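A self-contained sketch of the single-loop idea discussed in this comment is given below. It is not the merged code: the function name `collect_notebooks` and its arguments are hypothetical, the "run everything" case is derived from the absence of flags rather than from a separate option, and each notebook entry is copied instead of modified in place.

```python
import warnings


def collect_notebooks(control, component_options, nb_path_root, output_dir):
    """Gather the notebooks to run with a single loop over the components."""
    # No component flag set means every component is run.
    run_all = not any(component_options.values())
    all_nbs = {}
    for comp_name, comp_bool in component_options.items():
        if comp_name in control["compute_notebooks"] and (comp_bool or run_all):
            for nb, info in control["compute_notebooks"][comp_name].items():
                all_nbs[nb] = dict(info)  # copy so the config is left untouched
                all_nbs[nb]["nb_path_root"] = nb_path_root + "/" + comp_name
                all_nbs[nb]["output_dir"] = output_dir + "/" + comp_name
        elif comp_bool:
            # The user asked for a component that has no notebooks configured.
            warnings.warn(
                f"No notebooks for {comp_name} component specified in config file."
            )
    return all_nbs


# Hypothetical config: only one component has a notebook defined.
control = {"compute_notebooks": {"atm": {"adf_quick_run": {}}}}
options = {"atm": True, "glc": True}
print(collect_notebooks(control, options, "nblibrary", "computed_notebooks"))
```

Because `run_all` is only true when every flag is false, the `elif comp_bool` branch can only fire for a component the user explicitly requested, so the warning is never raised in the run-everything case.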

mnlevy1981 · 2024-04-26T16:30:01Z

cupid/run.py

+        if True not in [atmosphere, ocean, land, seaice, landice]:
+            all = True


It looks like we have an identical block of code on lines 142 / 143, but that's in an if 'compute_notebooks' in control: block and this is in if "compute_scripts" in control:. Could we move the first instance of this out of the if statement (maybe directly after defining component_options) and then remove this occurrence?

mnlevy1981 · 2024-04-26T16:31:05Z

cupid/run.py

+        if True not in [atmosphere, ocean, land, seaice, landice]:
+            all = True
+
+        if all:


Same comment as the compute_notebooks block - let's reformulate the if checks and only have a single loop through control['compute_scripts'][comp_name]

mnlevy1981 · 2024-04-26T16:34:47Z

Most of my comments on this pass-through are related to reducing the amount of duplicated code. There are still a lot of similarities between the compute_notebooks and compute_scripts sections, but cleaning that up is probably best left for a separate issue.

rmshkv · 2024-05-02T23:52:19Z

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

rmshkv · 2024-05-02T23:56:38Z

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

mnlevy1981 · 2024-05-03T00:01:34Z

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

rmshkv · 2024-05-03T00:08:23Z

I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself.

rmshkv · 2024-05-03T00:31:09Z

All right, I think that addresses everything requested! Feel free to run some tests again and let me know what you think.

TeaganKing · 2024-05-03T17:42:28Z

Just as an FYI, the time series code isn't super dependent on how the components are named; there's a comment in line 285 that would need to be updated, as well as the names in lines 329, 52, and 232. I'm fine with keeping the short names for now, though.

TeaganKing · 2024-05-03T17:59:32Z

All the tests I've run are looking good! I also agree that the updated logic regarding 'all' is an improvement: 'all' is now the default unless at least one component is specified, and no flag is needed to request it. I think that the lack of a flag makes it clear that 'all' is the default, and avoids the potential confusion of a default that can also be specified with a flag.

mnlevy1981 · 2024-05-03T19:43:10Z

The code in run.py looks great and cupid-run config.yml generates output as expected, but cupid-build isn't getting the sidebar right anymore:

I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath.

I really like how this turned out:

(cupid-dev) examples/coupled_model$ cupid-run -atm -glc config.yml
cupid/run.py:152: UserWarning: No notebooks for glc component specified in config file.

(And adf_quick_run.ipynb was still executed).

rmshkv · 2024-05-03T20:33:57Z

My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now!

mnlevy1981 · 2024-05-03T21:45:00Z

After updating to f114b83 I needed to rerun cupid-run and cupid-build, but the page looks great now!

rmshkv added 6 commits March 14, 2024 20:24

First pass at component-specific flags

9adba1c

Cleaned up preliminary flag code

4ac7485

Added minimal description of new options to documentation, and updated to automatically run all if no options specified

b650c35

Added cleaner environment checking functionality and related warnings

b97c394

Added more detail about flags to README and cleaned up config.yml

829c8f1

Small readme changes

5456990

rmshkv requested review from mnlevy1981 and TeaganKing March 19, 2024 22:21

TeaganKing reviewed Mar 19, 2024

This was referenced Mar 19, 2024

Add flags to run particular component diagnostics #4

Closed

Add single-variable time series file generation function from ADF #78

Merged

TeaganKing mentioned this pull request Mar 21, 2024

Improvements to single-variable time series generation #86

Open

10 tasks

Added message to check README per Teagan's recommendation

676773c

rmshkv marked this pull request as ready for review March 21, 2024 19:01

mnlevy1981 requested changes Mar 27, 2024

rmshkv added 4 commits April 11, 2024 16:30

Made minor code changes recommended by @mnlevy1981, still need to implement the subdirectory handling

ee14136

Forgot a bit of logic so compute_notebooks is optional

211fbd4

Implemented handling for new component directory structure

55d8030

Cleaning up some additional files

daa7908

rmshkv commented Apr 16, 2024

Addressed merge conflicts

bdfd551

TeaganKing approved these changes Apr 25, 2024

mnlevy1981 requested changes Apr 25, 2024

mnlevy1981 reviewed Apr 26, 2024

Made component flags apply to timeseries, renamed components to short names to match timeseries, removed -all arg, other code cleanup

d0b47e6

Fixed jupyter book paths with short component names

f114b83

mnlevy1981 approved these changes May 3, 2024

mnlevy1981 merged commit d14ee72 into NCAR:main May 3, 2024

rmshkv deleted the component-api branch May 6, 2024 18:04
