-
Notifications
You must be signed in to change notification settings - Fork 70
Add an extractor for the southern boundary of MOCs #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an extractor for the southern boundary of MOCs #147
Conversation
TestingI ran the $ cd geometric_features
$ ./driver_scripts/setup_ocean_basins.py
$ cd -
$ ./MpasMaskCreator.x mesh.nc MOCBasins.nc -f geometric_features/MOCBasins.geojson
$ ./moc_southern_boundary_extractor.py -f MOCBasins.nc -m mesh.nc -o MOCBasinsWithSouthernBoundaries.nc
$ ./paraview_vtk_field_extractor/paraview
_vtk_field_extractor.py -f MOCBasinsWithSouthernBoundary.nc -m mesh.nc -v all -c -o moc_boundary_vtk/ -d maxEdges= TWO= vertexDegree= maxEdges2= nRegions=: nTransects=:Here are some example results showing the edge sign for each of the 5 transects: |
|
@milenaveneziani, @DieMuhkuh and @mark-petersen, I am happy to test these results using the MOC calculation code if someone can provide me with careful instructions on what I need to do (and point me to model results where the results are known). You can try it yourselves, of course, but the turnaround time for debugging will be a lot longer if it's not working as expected. |
|
@pwolfram, I'm assuming you're not more familiar with the MOC code than I am. What I'd appreciate from you is a quick look at the python code. I ran it through flake8 and got all the problems it noticed taken care of. I realize I need an author and date in the docstring so I'll add that now. |
1761299 to
c8a16a1
Compare
|
@xylar: this looks good. I can do my testing on the 60to30 case @vanroekel and I have been using so far. And here is the code to do the offline calculation (adapted from Mark's script), but you will need specific output variables in order to apply it: |
|
@milenaveneziani, okay, that looks like enough to get me started, assuming these two files are accessible on turquoise: With those files, should I expect to get results similar to these? I probably won't have any time over the weekend to work on this, but I'll put it on my list of early next week. |
|
Trying that script now. |
|
@xylar: the plot I get is still weird.. I am starting to think there is more to it than just the southern boundary definition. ps: the transects I get with the paraview extractor look fine, similar to your plots for the 240km mesh (mine use the EC60to30). |
|
@milenaveneziani, one possibility is that I don't have the edge signs right. I didn't have a good way to verify that I'd done them correctly, and having them wrong would definitely mess up the MOC calculation. I tried my best to do the same thing as the mask creator (by the way, are we sure it's computing the sign correctly?). @mark-petersen, do you know what the definition of the sign of an edge in MPAS is? The mask creator (and my script) assumes it's related to the order of vertices on edge but I would have expected it to have something to do with cells on edge. Anyway, I think I've done all I can on this until I have some more information either about how the southern boundary might be wrong, how the edge sign might be wrong, or what else might be helpful to fix the MOC unrelated to the southern boundary. |
|
@mark-petersen and @milenaveneziani, I found the mesh specs on one of Doug's branches: It says that the u velocity through a cell is defined by We're using vHat to determine the edge sign but the last cross product means that knowing the sign of vHat is exqivalent to knowing the sign of uHat: So I'd say I'm reasonably confident in the edge sign at this point, though it would be good to do some kind of a test to make sure it's correct. |
pwolfram
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level feedback:
- refractor to have small functions accomplishing one particular task if possible
- vectorization via increasing "pythonic" syntax may improve performance if necessary
- typically I use
__main__as just a place to call functions following use of argparse and I think this will help make the code clearer in terms of its logical flow
I have not verified the code for accuracy via the examples in this review.
| the contents of the MOC mask file, adding transects that mark the southern | ||
| boundary of each region in a file indicated with the -o flag. The transect | ||
| is applied only to vertices and edges, not cells, because the need for southern | ||
| boundary transect data on cells is not foreseen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great description.
| returns a list of edges | ||
| ''' | ||
|
|
||
| boundaryEdgesOnEdge = -numpy.ones((nEdges, 2), int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, I may be missing something simple here but where is nEdges defined? I'm guessing if you run this function there will be an error because this looks like an uninitialized global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't work as a standalone function. In response to other comments, this function as become a sub-function of another function, and nEdges is defined in the main function.
| boundaryEdges = numpy.arange(nEdges)[isBoundaryEdge] | ||
| nBoundaryEdges = len(boundaryEdges) | ||
|
|
||
| for index in range(nBoundaryEdges): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the numpy version of arange is better in terms of performance, e.g., http://stackoverflow.com/questions/10698858/built-in-range-or-numpy-arange-which-is-more-efficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also do a zip-like operation here, which may be more pythonic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I think you misread the results from the link you sent. range seems to outperform numpy.arange in all test on that page. It also seems to be the standard way people use it.
I doubt this array can become made into a zip-like operation -- it's not that simple. I also find code written with zip to be more confusing than helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to suggest specific changes to how this syntax could be made more compact, I'm willing to hear it.
| for index in range(nBoundaryEdges): | ||
| iEdge = boundaryEdges[index] | ||
| eIndex = 0 | ||
| for vIndex in range(2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably will be faster to just have the two cases here, but less clear potentially
| boundaryEdgesOnEdge[iEdge, eIndex] = otherEdge | ||
| eIndex += 1 | ||
| # make sure we found one and only one edge per vertex | ||
| assert(eIndex == vIndex+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
|
||
| nRegions = moc.dims['nRegions'] | ||
| vertexDegree = mesh.dims['vertexDegree'] | ||
| assert(moc.dims['nCells'] == nCells) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good check.
| moc['transectVertexGlobalIDs'] = (('nTransects', 'maxVerticesInTransect'), | ||
| transectVertexGlobalIDs) | ||
|
|
||
| moc.to_netcdf(args.out_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main loop is pretty long and hard to follow and should be split into smaller chunks contained in functions to make this code clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I've replaced the script body with two functions that do the heavy lifting.
| indices = numpy.mod(numpy.arange(endIndices[longest], | ||
| startIndices[longest] + | ||
| len(edgeSequence)), | ||
| len(edgeSequence)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor white space issue here (missing tab?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this is supposed to be handled but I think the white space is correct here, even though it looks strange. The argument is the second one to numpy.mod, so my understanding is that it should be aligned with the first argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I broke each of these into 2 steps to make them clearer and to avoid the white space issue.
|
|
||
| maxVerticesInTransect = \ | ||
| numpy.maximum(maxVerticesInTransect, | ||
| len(southernBoundaryVertices[iTransect])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use a list comprehension above to make this clearer, e.g., remove explicit for loop to make this more pythonic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make this change in the update. I wasn't sure how to do this with a max but I figured it out.
| belowToAbove = \ | ||
| numpy.logical_and(numpy.logical_not(aboveSouthernBoundary[0:-1]), | ||
| aboveSouthernBoundary[1:]) | ||
| startIndices = numpy.arange(1, len(edgeSequence))[belowToAbove] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably move this to right before the assert and loop below for clarity if possible.
| axis=2), 1))) | ||
|
|
||
|
|
||
| for iBoundary in range(nBoundaryEdges): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I put a lot of effort into vectorizing what was previously a loop over edges and vertices here. However, I haven't figured out a way to avoid a loop in this case.
What I want to do is the following: for each boundary edge and each vertex on that edge, I simply want to find the one edge where otherEdgeMask is True. (I have verified with the above assert that it is True for one and only one edge on each vertex.) Perhaps the way to handle this would be to do a loop over vertexDegree instead. That would certainly be more compact and easier to read. I'll try that.
| signs = (-1, 1) | ||
| vIndex = 1 | ||
| nextEdge = boundaryEdgesOnEdge[iEdge, vIndex] | ||
| while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, as I said previously, I don't think this loop will be easy to make more "pythonic" but I would welcome your specific suggestions. I don't think it's a performance bottleneck, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, I think the change below will help simplify this and agree this is good.
| if boundaryEdgesOnEdge[nextEdge, 0] == iEdge: | ||
| vIndex = 1 | ||
| else: | ||
| vIndex = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
vIndex = int(boundaryEdgesOnEdge[nextEdge, 0] == iEdge)here for shorter code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes the code shorter but it's confusing. I prefer the more robust code currently there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. The only problem is that this likely comes at the cost of performance because of non-vectorized if/else statements, but I doubt this is going to be a performance bottleneck so this is probably a non-issue. I concur clarity should come first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I went ahead and did it this way (with a comment saying this was a trick to find the index of the vertex on cell).
In general, my understanding is that people worry way too much about if/else statements because of how compilers and interpreters used to work. My understanding is that branch prediction basically makes that way of thinking obsolete and that one can write really nasty, hard-to-understand code in the name of optimization (without actually improving performance at all). As a community, let's stop worrying about if statements in loops unless they're provably a problem.
I profiled the code and this whole loop takes 0.027 seconds of the 1.5 seconds of the total runtime of the code, so nowhere here is a bottleneck.
|
|
||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, was there any reason __main__ went away? I think this makes the code clearer and if I recall correctly you had this in the previous version of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put it back. To me the if __name__ == "__main__": implies that the functions could be used on their own, as opposed to running the script. But given that the functions could be run on their own (though I doubt that would be useful), I have not real beef with putting it back.
In general, I find it bothersome to use a statement like if __name__ == "__main__": as a way of saying, "actually this is the main function". It would be one thing if this block of code really were intended to be optional, but my intention is for it to be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, this make sense and I agree that if may be better to not do that for this script because it really isn't a module. I'm ok either way but think you viewpoint is better than mine on reflection.
| nVertices = mesh.dims['nVertices'] | ||
|
|
||
| maxEdgesInTransect = numpy.amax([len(southernBoundaryEdges[iTransect]) | ||
| for iTransect in range(nTransects)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
83e6f8b to
80bc829
Compare
| boundaryEdgesOnEdge = -numpy.ones((nEdges, 2), int) | ||
| boundaryEdgesOnEdge[boundaryEdges[edgeIndices], voeIndices] = \ | ||
| edgesOnVerticesOnBoundaryEdge[edgeIndices, voeIndices, eovIndices] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to eliminate the looping entirely up to here by using the numpy.nonzero call above.
Also, I did some profiling and the majority of the time (by far) is spent in xarray. I suspect the same would still be true for a bigger problem (I'm using QU_240km for now) but can't really check on my laptop.
|
@pwolfram, I was able to address all your comments I think. Let's not worry about anything performance related right now. We're not even sure yet that this tool does what it needs to do (see @milenaveneziani's comments) so pointless to work too hard to optimize it. But I do appreciate your help improving my python writing. |
|
Note, @xylar, I did a pure code review at the high-level and have not verified this for correctness via explicit testing but am happy to do so if this is helpful. |
|
@milenaveneziani or @mark-petersen, I'm trying to work on this now. The file: would have been deleted when |
|
@xylar I had it all on gpfs archive. I copied that whole directory back to: Let me know if you need anything else. |
|
@mark-petersen, I can list the directory but don't have permission to read the files. Could change either the group or the world read permissions? |
|
@xylar try again. |
|
Got it, thanks! |
80bc829 to
6caee44
Compare
|
@milenaveneziani, comparing with the file you're currently using for the southern boundary, I was able to figure out that I had the sign flipped here compared with the transect @mark-petersen provided. I have fixed it now. Here's the MOC with the old transect: Obviously, there are some differences but that should be expected given that the old transect wasn't really following the southern boundary of the mask region. |
|
@milenaveneziani, I was also able to create regions and transects for the QU240 mesh. I will pull out only the Atlantic from that file and test that I can get the MOC code to run for that mesh. This would make testing much easier. |
|
Having trouble with the Atlantic at QU240 resolution. Could be that the resolution is just too coarse to produce a reasonable MOC but I think it's more likely that it indicates a problem. |
|
It's great that you are getting so much progress on this, @xylar. |
|
@milenaveneziani, the QU240 results really look weird. See below. They're from the first 2 years, which is all I have, so they might just be early in the simulation but I really think something is likely to be wrong. |
It's hard (impossible?) to have a continuous transect and avoid following land. I detect points that are within 3 degrees latitude of the southern-most point so some of the land boundaries come along. This extra computation is negligible, trust me. In fact, I'm pretty sure the whole computation of transport through the transect is negligible compared to the rest of the MOC. Plus, I had a look and there's definitely room for optimizing with some vectorization, but that's something for later. |
|
I'd say we shouldn't use the QU240 to try to debug the MOC computations, since we don't have any clue what it should look like. It's helpful just in terms of making sure the scripts don't crash, though. |
|
If the QU240 started from rest, it is way too early for the deep water formation to look reasonable. Indeed, the surface and upper ocean cell looks better. So, there might be anything wrong at all with the transect. But yes, let's test it with all the other EC60to30 simulations we have. |
|
@xylar: how about we try to fulfill this after the v0.2 update is over? It would allow us to do a few things:
|
|
@milenaveneziani, agreed. As far as I'm concerned, it seems to work and is ready to merge. So at your convenience. |
|
I'll squash the 2 "debugging" commits. |
The new script takes a NetCDF file with MOC region masks and another (or possibly the same) file with mesh data. The script prodcues a NetCDF file that includes the MOC regions as well as a transect for the southern boundary of each region.
6caee44 to
d82cb81
Compare
|
@xylar, @mark-petersen: I tested this (and the whole workflow for creating the MOC regions) on the EC60to30v3 meshfile. I then modified the postprocessing MOC script that we have in MPAS-Analysis to use the resulting region mask file properly, and applied it to the beta1 run. Here are results from the old Atlantic region and southern transect: and below are results from the new Atlantic_MOC region and transect extracted with the tool in this PR: What do you think? Are they too different? Pinging @maltrud as well. I can't decide if the differences are just due to the different transect or if it's something more substantial. If somebody wants to test this on their own, the new region/transect file is this one, on edison: |
If you take a look at the weird zig-zag formed by @mark-petersen's "southern boundary", I think it's much more likely the differences arise from the old transect not being the correct boundary than that the new transect isn't correct. I've used this tool to create the southern boundaries of all grids other than the EC60to30v1 (on both Edison and LANL IC) so we've been using these transects already for most MOC calculations already. |
|
I guess I thought the particular shape of the transect wouldn't make such a visible difference, but instead it must change the initial mass transport at the southern boundary enough to see a difference in the northern part of the domain. I have always used the old transect and we have the old regional mask defined in the config files. Also, the meridional_overturning_circulation routine needs to be slightly changed, because the new transect contains the various MOC regions in a certain order, while the old transect assumes that there is only one region, the Atlantic. Once I create all the new MOC region mask files, I will submit a PR with the changes. |
|
Wow! I didn't realize my old transect made a Z. I thought I'd looked, but maybe it was a different resolution. With a mismatching region, I can believe that it would change the whole MOC diagram by several Sverdrups. Anything westward in the Agulhas current would be counted as "north" on the first zig, so the southern boundary condition of the MOC would be artificially large. If the region mask is missing the zig-zag, it doesn't make up for the difference by 15 S. |
|
@xylar: do you mind adding a .gitignore in grid_gen/mesh_conversion_tools that includes the *.x files? |
|
@milenaveneziani, I'm not sure adding that |
|
good point. |
|
@xylar: as a reminder, this should be re-opened/fixed because things were not working on the high-res grids. I will open an issue about it. |














The new script takes a NetCDF file with MOC region masks and another
(or possibly the same) file with mesh data. The script produces
a NetCDF file that includes the MOC regions as well as a transect
for the southern boundary of each region.