Skip to content

Conversation

@milenaveneziani
Copy link
Collaborator

This adds the oceanSubBasinsRegions feature file, by combining existent features into a North Atlantic (subpolar and subtropical gyre), a South Atlantic, North Pacific and South Pacific. Southern Ocean is left as is, and the Arctic Ocean feature is also created.

This is intended to be used for regional Hovmoeller diagrams, where more granularity in region definition is necessary.

@pep8speaks
Copy link

pep8speaks commented Oct 9, 2020

Hello @milenaveneziani! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-23 10:04:35 UTC

@milenaveneziani
Copy link
Collaborator Author

@xylar: I am trying to test this into MPAS-Analysis, but not having much luck at the moment. I did create a mask file for it and put it in the region_masks directory on compy, but it's still complaining about something.

One more thing: I noticed some seems in the subpolar North Atlantic and Pacific. Hope they are OK. Not sure why they form. Let me know if you have an easy way to get rid of them.

@milenaveneziani
Copy link
Collaborator Author

Here is a plot of the created features:
oceanSubBasins_cyl

@milenaveneziani milenaveneziani requested a review from xylar October 9, 2020 05:17
@milenaveneziani milenaveneziani self-assigned this Oct 9, 2020
@milenaveneziani
Copy link
Collaborator Author

The new subBasins do work in MPAS-Analysis, but I had forgotten that the new Hovmoeller plots show total Temperature, for example, not the anomaly like we do for the Global Ocean. Still, I think it is worthwhile to have these subBasins.

@xylar
Copy link
Collaborator

xylar commented Oct 9, 2020

@milenaveneziani, this is great work! I'm not worried about the little holes in some regions. I tried fixing them at some point but it's just too difficult.

I'd like to discuss with you the overall strategy for creating derived features like these for MPAS-Analysis and making masks from them. I'm not sure if geometric_features is the right place for these (I think feature_creation_scripts might not be the right folder either way, since that's for making the original features that end up in geometric_data, rather than derived features). But the work here will definitely fine a home somewhere.

We currently have similar functionality in COMPASS:
https://github.com/MPAS-Dev/MPAS-Model/blob/ocean/develop/testing_and_setup/compass/ocean/global_ocean/scripts/create_E3SM_coupling_files.py#L281-L334
But I don't think that's the right place either. It might make sense to move both this work and that part of COMPASS into MPAS-Tools.

Shall we chat more about this next week? In the meantime, feel free to add this new geojson to the LCRC server.

@milenaveneziani
Copy link
Collaborator Author

milenaveneziani commented Oct 9, 2020

There is the RGMA meeting next week, so not sure about being able to find a time for chatting, but let me throw a few ideas around. I feel like defining regions and geojson features definitely belongs to geometric_features: I'm open to how things are organized though, in terms of folders and so on so forth. As for getting the masking files, not sure. What I can tell you is that I was expecting for MPAS-Analysis to handle it because I think it now handles the extraction of the southern boundary for the MOC calculation, so I just assumed it would do the same for regional masks. But then I saw that the config file was asking for an already computed mask file ($meshname_$regionMaskSuffix.nc) so I figured it wouldn't do it on the fly.
So that could be one option.

At the moment, I computed the associated mask file for the EC30to60 r2 mesh and I have put it in the diagnostics directory on compy, so that we can use it for our v2 tuning/evaluation process.

I could move both that and the geojson file to the public folder.

@xylar
Copy link
Collaborator

xylar commented Oct 9, 2020

I feel like defining regions and geojson features definitely belongs to geometric_features: I'm open to how things are organized though, in terms of folders and so on so forth.

So the original idea was definitely thatt geometric_features would only have the basic building blocks. Also, anything that is on within geometric_featuers or geometric_data directories doesn't make it into the conda packages and is really only in the repo as documentation or provenance.

As for getting the masking files, not sure. What I can tell you is that I was expecting for MPAS-Analysis to handle it because I think it now handles the extraction of the southern boundary for the MOC calculation, so I just assumed it would do the same for regional masks. But then I saw that the config file was asking for an already computed mask file ($meshname_$regionMaskSuffix.nc) so I figured it wouldn't do it on the fly.
So that could be one option.

This works, but it is definitely not the preferred option. For every mesh that we consider to be "accepted" in E3SM or another project, we should create permanent versions of the masks and put them on the LCRC server with a date. This saves both a substantial amount of computation time every time the analysis runs and ensures that the mask is the same for every run of MPAS-Analysis. I have tried (successfully, I think) to make things robust enough that this isn't necessary but it's definitely the preferred approach.

At the moment, I computed the associated mask file for the EC30to60 r2 mesh and I have put it in the diagnostics directory on compy, so that we can use it for our v2 tuning/evaluation process.

I don't think putting files in the diagnostics directories is a good idea. It's fine to create a custom diagnostics directory that you use on compy but putting files in diagonstics on compy but not on LCRC/Anvil means I have a lot of trouble syncing the data later on. The proper way is to put it on LCRC and then sync it to all the machines, including Anvil itself so we have consistent copies everywhere. These kinds of differences between machines have caused problems on Cori recently -- I had to move aside a lot of the diagnostics and mpas_standalone files and resync them from Anvil to make sure they weren't glitchy.

could move both that and the geojson file to the public folder.

I think that's fine in this case but probably would wait in the future until the PR gets approved and merged (wherever that happens). Otherwise, we have to re-sync if any changes get made.

Again, this is precisely what a custom diagnostics folder in MPAS-Analysis is supposed to be for.

Over all, I would rally benefit from having a discussion about this. I don't have a problem with this becoming a new "example" here but I do strongly disagree that geometric_features, at least as currently envisioned, is the place for this script to live if it's going to be actively used to create geojson files that go to the LCRC server.

@milenaveneziani
Copy link
Collaborator Author

I can move the files out of the diagnostics directory, no problem.

sure, we can talk: week after next would be best. And we can leave this PR open for the time being.

@xylar
Copy link
Collaborator

xylar commented Oct 22, 2020

@milenaveneziani, we didn't get a chance to discuss this PR this week. Next week will be hard, obviously. We could chat the week after.

But let's keep the online discussion going a bit.

So I could see the benefit of having script for provenance for creating geojson files that will get put on the LCRC server.
Maybe we can create a new directory feature_aggregation_scripts for these? It would have a distinct purpose from feature_creation_scripts but would otherwise be pretty similar.

I still think it's not a good idea to include the geojson file itself in the repo both because it's very big and because it's redundant. We can either recreate it or we can get it off the LCRC server (particularly if the script comments on the URL where it can be found).

@milenaveneziani
Copy link
Collaborator Author

Yes, good idea to keep the online discussion going. Here are my 2c.

Maybe we can create a new directory feature_aggregation_scripts for these?

Sure, that sounds reasonable to me.

I still think it's not a good idea to include the geojson file itself in the repo both because it's very big and because it's redundant.

I tend not to agree with this one for a few reasons. We need to have an easy way to compute region masks that we deem significant for our purposes. The ones that I can think of at the moment are: 1) Southern Ocean and Arctic regions; 2) ocean basins and subbasins; 3) MOC related regions and transects; and 4) El Nino regions. I think it is great that MPAS-Analysis can now handle the create-a-mask-from-geojson-feature step, but then, for things to work nicely, we need for MPAS-Analysis to access those geojson files. Considering that those feature files are not mesh-dependent and they are not enormous (tens of Mb), I think it would be handy to keep them in the public repo.
And about the redundancy: I agree that we don't want that, so we should really consolidate around the 4 features that I mentioned above, perhaps combining 2) and 4). Considering that we have 3) already covered, this would really mean finalizing the first two big feature files.

@xylar
Copy link
Collaborator

xylar commented Oct 22, 2020

Considering that those feature files are not mesh-dependent and they are not enormous (tens of Mb), I think it would be handy to keep them in the public repo.

So the problem is that they would only be available to MPAS-Analysis and MPAS-Tools if they become part of the conda package, which is part of every conda environment that I install, so it really does add up to a lot of data. We would have to be sure that the time saved would be worth the space they take up.

If this is the route you want to go, we need to put them somewhere within geometric_data.

we need for MPAS-Analysis to access those geojson files.

MPAS-Analysis already needs a lot of data from the LCRC server, so it still seems to me that having the geojson files next to the mesh-specific mask files on that server makes sense.

Alternatively, if your script goes somewhere within geometric_features (e.g. geometric_fetures.aggregate.ocean.subbasins), MPAS-Analysis could call the function when it needs the fc and just reconstruct it on the fly from the individual features. Again, it's a question of how long this takes and whether that's really a major concern. My guess is it's in the noise compared with creating masks from the geojson file. Versioning of geometric_features is a more robust way of making sure you get the features you want, compared to a datestamp on a file name as well.

@milenaveneziani
Copy link
Collaborator Author

Alternatively, if your script goes somewhere within geometric_features (e.g. geometric_fetures.aggregate.ocean.subbasins), MPAS-Analysis could call the function when it needs the fc and just reconstruct it on the fly from the individual features.

Yes, I agree that this would be a good alternative. This way, if we decide to update or add some regions to the feature file, we don't have to update the repo everywhere, just the script in geometric_feature.

@milenaveneziani
Copy link
Collaborator Author

And yes, I think that the time it takes to make the feature is negligible. The making of the mask file is more substantial in terms of computational time for large meshes.

@milenaveneziani
Copy link
Collaborator Author

@xylar: I have rearranged the new script file as per our discussion.

@xylar
Copy link
Collaborator

xylar commented Oct 22, 2020

@milenaveneziani, if you want to go the route I suggested last, the script will need to become a module and it will need to move to a subdirectory within geometric_features. I'm going to make several comments on the code for how to do that.

The suggestion to move it to feature_aggregation_scripts was an alternative to including it in the conda package. Nothing outside of geometric_features and geometric_data gets included in the conda package, including the current location of the script.

@milenaveneziani milenaveneziani force-pushed the add_oceanSubBasins_script branch from d8ac788 to 8eab6c7 Compare October 22, 2020 22:47
@milenaveneziani milenaveneziani force-pushed the add_oceanSubBasins_script branch from c9ed895 to 786d80f Compare October 22, 2020 22:55
@milenaveneziani
Copy link
Collaborator Author

sorry, a rebase to squash commits didn't go perfectly, but now I'm almost done. Just need to make that last change.

@xylar
Copy link
Collaborator

xylar commented Oct 22, 2020

Okay, so there would be one last step: adding this to the documentation. If you want to do that, you would edit:
https://github.com/MPAS-Dev/geometric_features/blob/master/docs/api.rst
to add a section on Aggregation and point to the new function. Then, you could also add a new aggregation.rst file and point to it in index.rst. The new file should just describe what the aggregation module is for and mention ocean.subbasins as the only aggregation function currently available.

@xylar
Copy link
Collaborator

xylar commented Oct 22, 2020

If you want, I can do the last few steps tomorrow to get this over the finish line.

@milenaveneziani
Copy link
Collaborator Author

ok, I'll make these last two steps later this evening. Thanks @xylar!

@milenaveneziani
Copy link
Collaborator Author

I'll give it a try and if you could check it tomorrow that'd be great.

@milenaveneziani
Copy link
Collaborator Author

@xylar: I followed all the steps you mentioned to update the documentation, except for the change of api.rst, as I am not sure how to do it (I see the sections that are already in there, but it is not immediately clear to me what exactly I should add for the Aggregation section). Would you mind adding that last thing? Thanks.

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

Yep, I added an Aggregation section to the API. I also slightly altered the main documentation on aggregation and fixed a few other small things. Give my changes a look and I'll merge once you're happy.

@xylar xylar force-pushed the add_oceanSubBasins_script branch from 5215980 to 5394c24 Compare October 23, 2020 10:04
@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

I needed to rebase because I made some conflicting changes in #152. I removed the unneeded and problematic imports. I added an image to the documentation to hopefully give it a little more pep.

@milenaveneziani
Copy link
Collaborator Author

Great, thanks @xylar! Looks great.

@xylar xylar merged commit 165ebfb into MPAS-Dev:master Oct 23, 2020
@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

Thank you for this hard work, @milenaveneziani

@milenaveneziani milenaveneziani deleted the add_oceanSubBasins_script branch October 23, 2020 14:39
@milenaveneziani
Copy link
Collaborator Author

Thanks @xylar!

Now, basically the thing to do is to tell MPAS-Analysis to use this module for subbasin regions?

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

Well, we would need to do a new release of geometric_features and update the conda-forge package before MPAS-Analysis can even see these changes.

Which analysis task did you have in mind would use these new features?

@milenaveneziani
Copy link
Collaborator Author

Which analysis task did you have in mind would use these new features?

Originally, I was thinking of the regional Hovmoeller diagrams, but then I remembered that at the moment they show evolution of the absolute T and S, while I was thinking at regional evolution of the anomalies. So, one intermediate step would be to add a choice to plot Hovmoeller diagrams for anomalies (basically similarly to what we do for the global T and S, but without using the layerWeightedAvg AM). I could look into this.

The other obvious place(s) where these could be used is the regional profiles and/or regional T/S diagrams.

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

Yep, I was looking into that ( regional profiles and regional T/S diagrams).

So your original idea of generating a geojson file with a date stamp is obviously a lot more compatible with how mask creation is done right now. We definitely still need a way of creating masks in COMPASS along with new meshes so they can be cached on the LCRC server and populated to the supported machines without people having to recompute them each time.

We will need to think of how to make that work with this new approach to aggregating features from geometric_features. Basically, we would need some code somewhere for creating a given feature collection (either loading it from a file or getting it from geometric_features) and that would need to be tied to a suffix for creating a mask file (or reading in an existing one).

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

Regarding Hovmoller plots, we really need a new analysis task for that because the existing one is intended for computing climatologies and the Hovmoller plots are just a side effect. The time bounds aren't really what you would typically want. And, as you point out, you would at least want the option for them to be anomalies (or to plot both the full and anomaly time series).

@milenaveneziani
Copy link
Collaborator Author

Basically, we would need some code somewhere for creating a given feature collection (either loading it from a file or getting it from geometric_features) and that would need to be tied to a suffix for creating a mask file (or reading in an existing one).

One option is to keep the regionMaskSuffix and then have something like: does the related feature exist? Yes, use it. No, can you create it by aggregation? Yes, do it. No, exit task with error.
Does it sound too complicated?

@milenaveneziani
Copy link
Collaborator Author

About the Hovmoeller plots: I wonder if they could fit better under the regional time series task? We would 'only' have to load the full 3d fields instead of a z range.

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

One option is to keep the regionMaskSuffix and then have something like: does the related feature exist? Yes, use it. No, can you create it by aggregation? Yes, do it. No, exit task with error.
Does it sound too complicated?

That sounds about right. I'll have to think about the details -- how we know which stuff maps to which geometric features aggregation function.

@xylar
Copy link
Collaborator

xylar commented Oct 23, 2020

About the Hovmoeller plots: I wonder if they could fit better under the regional time series task? We would 'only' have to load the full 3d fields instead of a z range.

That's a really great option I somehow hadn't thought of. It will still take some work but should indeed be doable.

@xylar
Copy link
Collaborator

xylar commented Nov 10, 2020

@milenaveneziani, just so you know, I haven't forgotten about this, just haven't got to it yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants