collaborate with xmitgcm? #6
There is also the xecco project, https://github.com/xecco/gcmplots, which is in a very similar space. |
Ryan - the original motivation of eccov4-py is to make reading and working with the netcdf output files of the ECCO 'production' releases easier. That was challenging because of the way the netcdf output files were designed. I think we now have all the code we need to do i/o from mds files in compact format and then to convert those arrays between representations as compact format, 5 'facets', and '13 tiles'. What I'd like to do next is add your connectivity layer from the xgcm package to make global calculations easier. Maybe one way forward is to make sure that our different representations of the arrays can be recognized by plotting routines, say whatever you have cooking in gcmplots. |
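Since the thread keeps coming back to the compact / facet / tile representations, here is a rough numpy sketch of the compact-to-13-tiles reshape for an llc grid of size n (e.g. n=90 for llc90). The function name and the exact tile ordering within facets 4 and 5 are my assumptions for illustration, not code from either package:

```python
import numpy as np

def llc_compact_to_tiles(compact, n):
    """Split an llc 'compact' array of shape (13*n, n) into 13 (n, n) tiles.

    Illustrative sketch only; the tile ordering within facets 4 and 5 is an
    assumption, not the actual eccov4-py / xmitgcm behavior.
    """
    tiles = []
    # facets 1 and 2 each occupy a (3n, n) block of rows; the Arctic cap
    # (facet 3) occupies a single (n, n) block
    for start, nrows in [(0, 3), (3, 3), (6, 1)]:
        block = compact[start * n:(start + nrows) * n, :]
        for t in range(nrows):
            tiles.append(block[t * n:(t + 1) * n, :])
    # facets 4 and 5 are logically (n, 3n) but stored row-major in the
    # compact file, so reshape each block before slicing into (n, n) tiles
    for start in (7, 10):
        facet = compact[start * n:(start + 3) * n, :].reshape(n, 3 * n)
        for t in range(3):
            tiles.append(facet[:, t * n:(t + 1) * n])
    return np.stack(tiles)  # shape (13, n, n)
```

The inverse (tiles back to compact) follows by running the same bookkeeping in reverse.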
I'm just pointing out that we also have code that does the same thing in xmitgcm, and have been working on it for several years. The code is here. Myself, @raphaeldussin, and others have been working hard to make sure this code is well documented, well tested, and broadly useful for the mitgcm community. I am wondering why you saw a need to rewrite those functions. Two possibilities:
|
Btw, sorry if this is coming off the wrong way... I'm really happy you are working on python stuff for ecco and would genuinely like to collaborate. The frustration comes from the feeling that we in my group are working hard to develop this stuff that we hope is broadly useful, but somehow we are failing to connect with others in the community. That makes me a bit sad and makes me feel like we've wasted our time. |
May I add that we also have xesmf-based regridding for llc configurations which, combined with the newly implemented routines in xmitgcm to convert from faces to compact format and write binaries, makes it easy to create initial conditions for llc configurations, regrid observational datasets, etc. The regridding code (and example notebooks) lives here, until it finds its forever home: https://github.com/raphaeldussin/MITgcm-recipes. Also completely open for collaboration. The more the merrier! |
Hey all, I think there is no reason not to collaborate. I have become pretty familiar with xmitgcm, and I have recently found that it can work pretty much seamlessly with ECCOv4-py, at least for plotting. I think streamlining and reducing redundancy is optimal, especially considering all the work that has gone into these repos.

I'll just jump in and say I have started to prepare scripts for the upcoming ECCO Summer School which use (so far) xmitgcm routines for I/O, ECCOv4-py routines for plotting and general tutorials (this is essentially how I picked up a lot of python tools), and (tbd) xgcm routines for computing standard quantities (e.g. barotropic streamfunction). The main idea is to recreate the ECCO standard analysis, except in a python-based jupyter notebook environment, which can be accessed by the students remotely using servers running at TACC - pangeo style... I started making these in a private repo while I'm just working out kinks, but I'm curious if you all think these types of scripts should go in ECCOv4-py or in a separate repo dedicated to performing the standard analysis. The latter was my thought, but perhaps that's too scattered.

While I've been working on these scripts, I've noticed a couple of redundancies, and I can open up a couple of PRs which follow this idea of ECCOv4-py as a sort of plotting utility and wrapper, with xmitgcm and xgcm doing the I/O and computation.

Thanks @rabernat for getting this conversation going again! PS https://github.com/xecco/gcmplots is an old repo that a few of us started before we realized how much work had already been done, and it should be deleted. |
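As a toy illustration of the "standard quantities" idea (a barotropic streamfunction from vertically integrated zonal transport), here is a minimal xarray sketch on a single rectangular face. The variable names (UVELMASS, drF, dyG) follow MITgcm conventions, but all values are fabricated, and the sign/boundary conventions are simplified; this is not the real xgcm-based computation:

```python
import numpy as np
import xarray as xr

# toy dataset with MITgcm-style variable names; values are fabricated
ds = xr.Dataset({
    'UVELMASS': (('k', 'j', 'i_g'), np.ones((2, 3, 3))),   # zonal velocity
    'drF':      (('k',), np.array([10.0, 20.0])),          # layer thicknesses (m)
    'dyG':      (('j', 'i_g'), np.full((3, 3), 1000.0)),   # cell edge lengths (m)
})

# vertically integrated zonal transport per cell, then a cumulative sum
# in j gives a simple barotropic streamfunction on this single face
transport = (ds['UVELMASS'] * ds['drF'] * ds['dyG']).sum('k')
psi = -transport.cumsum('j')
print(psi.shape)  # (3, 3)
```

On the real llc grid the cumulative sum would have to respect the face connectivity, which is exactly the bookkeeping xgcm's connectivity layer is meant to handle.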
Hi there,
Yes, thanks Ryan for bringing this up again. I think we have several excellent developments, Ian’s tutorials in python, along with Ryan’s and Raphael’s (+others') xmitgcm contributions. It’d be nice to merge some of these efforts where feasible and useful, especially at this still relatively early stage.
The ECCO summer school provides a great opportunity to make students familiar with these tools, and hence a good timeline/deadline to attempt streamlining some of this. Thanks to Tim who is happy & keen to take some of this on! And of course in consultation with others.
Perhaps worth a telecon with those interested (Tim, Ian, Ryan, Raphael, others…?) to discuss some of this - if needed?
-Patrick
On Mar 21, 2019, at 12:25 AM, Timothy Smith ***@***.***> wrote:
|
@heimbach +1 I'd be happy to participate in a telecon or any other coordination effort. |
@rabernat I am not sure that even now, after looking at xmitgcm and other packages, I'm fully aware of their capabilities! It is hard to keep track of the different efforts (so many different names and activities!). It's true that the xmitgcm page and example notebooks could use some additional demos (the single example notebook in github (MITgcm_global_budgets.ipynb) doesn't showcase its mds i/o capabilities at all). The html documentation of xmitgcm is certainly more complete, but even there many of the flags seem to be undocumented (swap_dims?, shape?). I could not even figure out how to read a single .data file -- the code block I tried did not work.
That aside, the last few ECCO releases were composed of directories of netcdf files, with each variable broken down into one file per tile (see, for example, ftp://ecco.jpl.nasa.gov/Version4/Release3/nctiles_monthly/THETA/ ). The original purpose of the eccov4-py package was to have routines to read those netcdf files along with the llc grid information (which was also provided as 13 netcdf tile files) and to reconstruct those files into data structures consistent with the llc geometry. (The next release will also be provided as netcdf files, this time CF-compliant.) xmitgcm is excellent for reading mds files and the llc grid info and packaging that output into a useful data structure. One idea is to make sure that the data structures created by xmitgcm when pointed at output of the mitgcm configured for ECCO (say v4r3) end up looking the same as the data structures created by reading in the ECCO 'release' netcdf files. If we want to do that, I think we have to confront two small issues: the names of the variable indices and the name of the 13 arrays into which we reorganize the llc compact file. xmitgcm uses the i_g and j_g indices in ways that don't make too much sense to me. Specifically, in xmitgcm, u variables are given (i_g, j) indices, v variables (i, j_g) indices, and zeta variables (i_g, j_g) indices. I think your idea was that by using (i_g, j) for u variables you were indicating that the 'column location' of the u variable is distinct from the 'column location' of the tracer variable with index (i, j) and that it has the same 'row location'. I think we should be more explicit and instead use indices like (i_u, j_u) for u variables; (i_v, j_v) for v variables; and (i_z, j_z) for zeta variables. Mixing i_g with j and i with j_g is confusing because the u variables are not 'at' (i_g, j). My vote is to be super explicit about what kind of variable we have (tracer, u, v, or zeta).
The second small issue has to do with the name 'face' or 'tile' or 'facet'. I tend to think 'tile' makes more sense than 'face' for the 13 NX x NY arrays into which we put the compact format files. We more often say that the llc grid has 5 'faces' or 'facets', the latter itself just a Frenchified version of face. The word 'tile' appears in the caption of figure 8.3 of the mitgcm documentation. |
I really appreciate you taking the time to write such a long and thoughtful response.
Thanks for this valuable feedback. We will really make an effort to improve the documentation and add more examples in response to the points you raise.
I think a big source of confusion is that the user-facing part of xmitgcm is designed not to read just individual MDS .data files but the whole directory of MITgcm output files; the first argument to open_mdsdataset is the data directory, as stated in the documentation. Did you try running through the quick start, downloading the example data, and running the commands? You can also see it in action reading MDS files from an ECCO LLC90 run. There are also many low-level utilities that are used internally by xmitgcm, but I would not recommend using them unless you are trying to do something really customized. It should be easy to read all possible MDS output via the main open_mdsdataset function. For your specific example, I would try:

import xmitgcm

data_dir = '/Users/ifenty/incoming/tmp/global_oce_llc90'

# open all variables mitgcm can find - might not work depending on what's there
ds_llc = xmitgcm.open_mdsdataset(data_dir, geometry='llc')

# or be explicit about just opening a specific file and iteration number
ds_S = xmitgcm.open_mdsdataset(data_dir, geometry='llc', iters=[9], prefix=['S'])
I see the point you are making here, and it is definitely worthy of discussion. (A very similar one was raised in xgcm/xgcm#108 in reference to ROMS outputs.) The choice made in xmitgcm about how to index the different c-grid positions was guided by the goal of allowing xarray to correctly apply broadcasting by dimension name (I strongly encourage you to review that documentation if the concept is not familiar). Xarray will let me directly multiply or add together two variables that share a dimension (e.g. 'i_g').

The MITgcm manual (e.g. the section on the second-order centered scheme) and the code itself make it clear that U and THETA are always considered to be at the same j location. When constructing the second-order centered flux in the i-direction, we have to interpolate THETA from cell center to cell face along the i dimension (xgcm provides this functionality), but not along the j direction. So, according to xarray's broadcasting rules, U and THETA have the same j dimension. That's why I decided that U should have the dimensions (i_g, j).

MITgcm's own netcdf files from the MNC package follow this convention (although they use different names for the coordinates). So do most other ocean GCMs (e.g. MOM, NEMO). POP makes the strange choice of using the same dimension names for all variables, regardless of their grid cell location, a horrible choice IMHO. (Some code exists to fix POP netCDF files.)
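The broadcasting behavior described above can be seen with two toy DataArrays (the values here are fabricated; only the dimension names matter):

```python
import numpy as np
import xarray as xr

# toy arrays following the xmitgcm convention described above:
# THETA lives at cell centers (j, i); U at the west cell face (j, i_g)
theta = xr.DataArray(np.full((4, 4), 2.0), dims=['j', 'i'])
u = xr.DataArray(np.full((4, 4), 3.0), dims=['j', 'i_g'])

# the shared 'j' dimension aligns element-wise, while the distinct 'i' and
# 'i_g' dimensions broadcast against each other, producing a 3-dimensional
# result -- a loud signal that the two variables are NOT at the same
# i position and need interpolation (e.g. via xgcm) before combining
prod = theta * u
print(prod.dims)
```

If U instead carried a plain 'i' dimension, the product would silently come out 2-dimensional, hiding the staggering, which is exactly the failure mode the naming convention guards against.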
This is what some versions of ROMS do (see xgcm/xgcm#108). It causes unnecessary complications in xarray-based analysis code, and it is currently impossible for xgcm to work with data labeled in this way. In my view, the position is specified uniquely and precisely by the combination of dimension names, not by the individual dimension names themselves. And all of this is distinct from geographical position, which should instead be encoded in the CF coordinates attributes. I respect your view here, and I recognize that this is not an obvious or trivial issue. My opinions on this are based on about four years of experience doing tons of xarray-based analysis of finite-volume GCM datasets from many different models. If the MITgcm community decides your convention is preferable, we can adopt it here; effort will be required to adapt xgcm to work with such data. Laying out the pros and cons of different choices is an important discussion.
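The claim that the combination of dimension names uniquely specifies the c-grid position can be made concrete with a small lookup. This helper is hypothetical, written only to illustrate the convention as described in this thread; it is not an actual xmitgcm or xgcm function:

```python
def cgrid_position(dims):
    """Infer the c-grid position of a variable from the *combination* of its
    horizontal dimension names, following the xmitgcm convention discussed
    in this thread. Hypothetical sketch, not a real package API."""
    horiz = frozenset(d for d in dims if d in {'i', 'i_g', 'j', 'j_g'})
    positions = {
        frozenset({'i', 'j'}): 'tracer point (cell center)',
        frozenset({'i_g', 'j'}): 'u point (west face)',
        frozenset({'i', 'j_g'}): 'v point (south face)',
        frozenset({'i_g', 'j_g'}): 'vorticity (zeta) point (corner)',
    }
    return positions[horiz]

# extra dimensions like time or k do not affect the horizontal position
print(cgrid_position(('time', 'k', 'j', 'i_g')))
```

Under the alternative (i_u, j_u) proposal, each individual name would carry the position instead, which is the trade-off being debated.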
Fortunately this is a lot easier to change. 😄 If one wants a different name, you can just use xarray:

ds = ds.rename({'face': 'tile'})

My view is that no analysis software should ever assume any specific names for variables. What matters are the metadata (attributes). |
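The metadata-over-names point can be sketched as follows. The find_by_standard_name helper is hypothetical, written only to illustrate the idea; 'sea_water_potential_temperature' is the CF standard name for potential temperature:

```python
import numpy as np
import xarray as xr

# a tiny dataset whose variable carries a CF standard_name attribute
ds = xr.Dataset({'THETA': (('j', 'i'), np.zeros((2, 2)))})
ds['THETA'].attrs['standard_name'] = 'sea_water_potential_temperature'

def find_by_standard_name(ds, standard_name):
    """Locate variables by CF metadata rather than hard-coded names.
    Hypothetical helper for illustration only."""
    return [v for v in ds.data_vars
            if ds[v].attrs.get('standard_name') == standard_name]

print(find_by_standard_name(ds, 'sea_water_potential_temperature'))
```

Code written this way keeps working whether the maintainers call the variable THETA, temp, or anything else, because only the attributes are consulted.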
I am not sure I agree with your assessment about how the MITgcm manual and the code deal with indices. As far as I can tell, the model uses integer i,j indices exclusively, while the documentation uses integer and half-integer i,j values, such as when defining the delta notation.
They have the same dimension, of course, because there is, for example, one U and one V for every THETA. I think the real issue, and I appreciate the discussion, is how to label properly for broadcasting.
Agreed, when calculating centered second order fluxes we interpolate THETA in the i-direction and not the j direction, but don't we still have to label the
Of course |
@rabernat All that said, I can get behind your idea that it is the combination of (i_x, j_x) that indicates the class of c-grid variable. So long as we are very explicit about it in the documentation and explain our reasoning, the chances for confusion will be far reduced. Going forward, eccov4-py will continue using the indexing convention that I carried over from you. That way I can construct netcdf files for ECCO releases that, when read using xarray, will look more or less the same as those read using xmitgcm. I made an attempt at routines to 'reorient' (I hesitate to say rotate) pairs of u/v variables so that the u variables are all approximately aligned zonally, and vice versa for the v variables. Is this capability also present in another package? |
@ifenty - This isn't quite the same as the reorienting routines which you've written, because I think then those velocities would live on cell edges. However, I was able to rewrite the gcmfaces routine like this:

# first interpolate velocity to cell centers
vel_c = grid.interp_2d_vector({'X': ds['UVELMASS'], 'Y': ds['VVELMASS']}, boundary='fill')

# compute East and North components using cos() and sin()
u_east = vel_c['X'] * ds['CS'] - vel_c['Y'] * ds['SN']
v_north = vel_c['X'] * ds['SN'] + vel_c['Y'] * ds['CS']

where CS and SN are the grid-angle cosine and sine. |
The latitude of the u point is not, in general, the same as the latitude of the tracer point. (Although it can be in many setups.) But MITgcm does not solve the equations in lat/lon coordinates. It uses orthogonal curvilinear coordinates. In the local orthogonal basis, the u point is at the same j-axis location as the tracer point. That is, in fact, the whole point of the orthogonal coordinate system. So our choice about dimension naming is fundamentally consistent with the model numerics.
In local orthogonal coordinates, "east/west" corresponds with the i-axis and "north/south" with the j-axis. What I mean is that U and THETA are at the same position with respect to the orthogonal j-axis. That's why they both share j as their dimension along that axis. |
Hi All,
Did any consensus emerge on i,j,k axis labeling/conventions? Be nice to converge on a common practice.....
Thanks,
Chris
P.S. On (1), FWIW my current bias (today) is for two separate indices. That allows N, N+1 sizing for staggered points, which seems to be easier for human comprehension/reasoning. It creates some headaches for dask, so it depends a bit how much the goal is dask performance vs. more digestible explanations. In MITgcm core we have favored performance, but it's not clear that the same balance makes sense for diagnostic work, where it may be more important to have abstractions that seem more intuitive to more people. Don't want to put my thumb on any scales too much though!!!
|
My impression is that we all agree that any convention could work in principle, as long as we document it clearly. There is no right or wrong, and this is not a value judgement. Given that situation, I vote for the practical choice of using the xmitgcm convention because: |
I'm just a student here, but I agree with @rabernat. Being able to leverage the tools in xgcm is nice because a lot of the heavy lifting has been done in a general way, making a lot of standard routines (e.g. exchanging U/V fields) easy. Also I think this is good for ECCO because users familiar with xgcm from working with other ocean models would have an easier time analyzing and using ECCO output. |
@ifenty any thoughts/comments? It could be kind of nice to be able to mix and match tools from various places without having to jump through hoops (or explain to people that this tool does X and that tool does Y)? |
@christophernhill I agree with what you say and we will be re-generating the official 'ecco' output using the indexing convention used by xmitgcm. |
Perfect - I will follow the same for my bits. Closing this issue to reduce clutter. We can open new issues of specific details as needed. |
Hi ECCOv4-py folks!
I just stumbled upon this repository again. I see from your commit history you are hard at work!
It looks like many of the things you are doing, like reading mds files, converting llc flat binary files to tiles, and extracting data from mitgrid files are essentially identical to things we are doing in xmitgcm (e.g. MITgcm/xmitgcm#114).
Are there ways we could be working together, to save everyone precious time and effort?