Contributor guide? #28

Open
1 task
JiaweiZhuang opened this issue Aug 20, 2018 · 5 comments

Comments

@JiaweiZhuang
Owner

JiaweiZhuang commented Aug 20, 2018

So far I've been developing xESMF on my own. There are sporadic PRs (#23, #27), but I am actually not sure how to best handle them. Given the increasing community interest, it would be useful to talk about how people can contribute to xESMF.

Contributing to xarray is a good reference on the software engineering side (style, testing, documentation, bug fixes...). Here I am thinking more about the science & usability & algorithm sides that are specific to xESMF.

There are several types of contributions I can think of:

1. Contribution to examples and tutorials

This is the easiest one and is highly welcome. I am very interested in how people regrid all kinds of data in different Earth science disciplines (environmental, atmospheric, oceanic, land, remote sensing...). I often see xESMF being successfully used to deal with grid meshes that I haven't seen before (e.g. the "tri-polar grid" #14).

An example can be just a Jupyter notebook, focusing on one or more of the following aspects:

  • A specific scientific application (e.g. converting CMIP5 data from multiple models' grids to a common grid for comparison; see "xesmf included in pangeo.pydata.org?", pangeo-data/pangeo#309)
  • The type of grid mesh (e.g. WRF's lambert conformal projection, MITgcm's lat-lon-cap, or even the Yin-Yang grid that most people have little experience with. Grid meshes are fun!)
  • The choice of algorithm (e.g. while bilinear and conservative are most common, the nearest neighbor method is actually great for categorical data such as land type index)
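
To illustrate the last point, here is a minimal numpy sketch of why nearest neighbor suits categorical data. It is not xESMF's API: the 1-D grid, the category codes, and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical 1-D source grid with land-type indices (codes are invented).
src_x = np.array([0.0, 1.0, 2.0, 3.0])
land_type = np.array([1, 1, 9, 5])

# Destination points that fall between the source points.
dst_x = np.array([0.25, 1.4, 1.75, 2.6])

# Linear interpolation blends neighboring codes, so it can return
# numbers that are not valid categories at all.
linear = np.interp(dst_x, src_x, land_type)

# Nearest neighbor copies the code of the closest source point,
# so every output value is one of the original categories.
nearest_idx = np.abs(dst_x[:, None] - src_x[None, :]).argmin(axis=1)
nearest = land_type[nearest_idx]

print(linear)   # contains blended values such as 4.2 that match no land type
print(nearest)  # [1 1 9 5]
```

The same reasoning carries over to 2-D grids: any averaging method mixes category codes, while nearest neighbor only ever copies them.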

Guideline for tutorials/examples:

  • Full reproducibility is required. NCL has a wonderful page of regridding examples, but I have trouble running most scripts due to missing data. For the xESMF docs, I only use data from xr.tutorial.load_dataset() or data computed on the fly. Small data used in the example (say less than 20 MB?) can be submitted and added to an xESMF-data repo, just like the xarray-data repo. For large data, a stable link must be provided. The Pangeo platform on GCP/AWS seems like a good place for hosting large data.
  • A brief introduction to the scientific problem and why regridding is needed would be very useful.
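
As a sketch of what "data computed on the fly" can look like, the snippet below builds a smooth analytic field on a global lat-lon grid using only numpy. The helper name `make_test_field`, the formula, and the grid resolution are assumptions chosen for illustration, not something the xESMF docs prescribe.

```python
import numpy as np

def make_test_field(lat_deg, lon_deg):
    """Smooth analytic field on a lat-lon grid (hypothetical helper).

    lat_deg, lon_deg: 1-D arrays of cell-center coordinates in degrees.
    Returns a 2-D array of shape (len(lat_deg), len(lon_deg)).
    """
    lat = np.deg2rad(lat_deg)
    lon = np.deg2rad(lon_deg)
    lon2d, lat2d = np.meshgrid(lon, lat)
    return 2.0 + np.cos(lat2d) ** 2 * np.cos(2.0 * lon2d)

# Cell centers of a 4 x 5 degree global grid (resolution chosen arbitrarily).
lat = np.arange(-88.0, 90.0, 4.0)
lon = np.arange(-177.5, 180.0, 5.0)
field = make_test_field(lat, lon)

print(field.shape)  # (45, 72)
```

Because the field is generated from a formula, the example needs no data download and anyone can rerun it exactly.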

2. Contribution to standalone, small utilities

Many issues on GitHub belong to "small utilities" (e.g. #15, #16). They do not have a large impact on the core regridding, but are crucial for usability and user experience. Developing those small utilities is much easier than hacking the regridding core, and they often do not require ESMF/ESMPy knowledge. It is also much easier for me to handle dependencies.

General principles:

  • The functionality should be closely related to regridding. An example is computing cell area, which is useful for checking mass conservation before/after conservative regridding. Code that computes a particular grid mesh belongs in examples, not utilities, unless the mesh is extremely common (like regular lat-lon).
  • Avoid complicated data structures. Stick to xarray.Dataset and numpy whenever possible. Compatibility with pure numpy arrays is encouraged.
  • Minimize dependency on other functions, especially other "small utilities". xESMF is still young and the code base is in flux.
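
The cell-area example from the first principle could look roughly like the sketch below, which works on pure numpy arrays. The function name `cell_area`, its signature, and the Earth radius constant are assumptions for illustration. The 2x2 block average is only a stand-in for a real conservative regridder, used here to show the conservation check.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def cell_area(lat_bounds_deg, lon_bounds_deg, radius=EARTH_RADIUS_KM):
    """Cell areas of a regular lat-lon grid, shape (n_lat, n_lon).

    Hypothetical helper: takes 1-D arrays of cell boundaries in degrees.
    """
    lat_b = np.deg2rad(np.asarray(lat_bounds_deg, dtype=float))
    lon_b = np.deg2rad(np.asarray(lon_bounds_deg, dtype=float))
    # Exact area of a lat-lon box on a sphere:
    # R^2 * (lon_east - lon_west) * (sin(lat_north) - sin(lat_south))
    return radius ** 2 * np.outer(np.diff(np.sin(lat_b)), np.diff(lon_b))

# Fine grid: 2 x 2.5 degree cells, described by their boundaries.
lat_b = np.linspace(-90.0, 90.0, 91)
lon_b = np.linspace(-180.0, 180.0, 145)
fine_area = cell_area(lat_b, lon_b)

# Synthetic field at cell centers, computed on the fly.
lat_c = np.deg2rad(0.5 * (lat_b[:-1] + lat_b[1:]))
lon_c = np.deg2rad(0.5 * (lon_b[:-1] + lon_b[1:]))
fine_field = 2.0 + np.outer(np.cos(lat_c) ** 2, np.cos(2.0 * lon_c))

# Stand-in for conservative regridding: area-weighted average of 2x2 blocks.
coarse_area = cell_area(lat_b[::2], lon_b[::2])
block_mass = (fine_field * fine_area).reshape(45, 2, 72, 2).sum(axis=(1, 3))
coarse_field = block_mass / coarse_area

# The area-weighted global totals should agree to rounding error.
total_fine = np.sum(fine_field * fine_area)
total_coarse = np.sum(coarse_field * coarse_area)
print(np.isclose(total_fine, total_coarse))  # True
```

Taking plain boundary arrays rather than a custom grid object keeps the utility independent of the rest of the code base, in line with the principles above.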

3. Contribution to core functionalities

Hard-core xarray/dask/ESMF/Pangeo developers are very welcome to tackle some of the most challenging problems. For example:

For those big questions, it is better to discuss on GitHub before starting serious coding.

When & Where to start

I am still planning a significant refactor of the code base to better support critical features, notably dask support #3, accept Dataset #5, and retrieve weights #11. (It is moving slowly because xESMF is my personal, unfunded side project 😐. I have a lot of other projects in hand. My life would probably be easier if I wrote a GMD/JORS paper on it, so it could count towards my PhD...) At this stage, hacking the core might not be the best choice, because it is very likely to change (I am talking about internal code, not the user API). Contributing examples, tutorials, and use cases is the safe bet.

TODO:

  • Add a Contributor Guide to online docs
@rabernat

@shoyer

shoyer commented Aug 20, 2018

For xESMF doc, I only use data from xr.tutorial.load_dataset() or data computed on the fly

Note that we would be very happy to add more examples to xarray-data to round out our current set (which is quite limited).

@bekozi

bekozi commented Aug 22, 2018

This is a great idea! As a near-future xESMF user and potential contributor, I would like to see this process work for everyone. It is worth a nod to @rabernat for making sure the importance of xESMF did not go unmentioned. The ESMF team is of course appreciative of your efforts to bring ESMPy into the xarray and Pangeo community.

I am pleased to see parallelism and unstructured grids as planned features. I am aiming to resurrect the spatial chunking PR and migrate it to a Pangeo use case using xESMF. I was able to spend some more time with ESMF and dask at the Pangeo workshop and am excited by the prospects.

On a practical note, I would like to achieve xESMF maintainer status eventually to help you manage the repo and minimize any associated stress. I definitely want you to keep creative control (e.g. final PR approval, interface design) and am thinking more along the lines of day-to-day stuff like CI management, random support, documentation, etc. I expect you will be able to find other willing developers. Hopefully the route to maintainer will be part of your contributor guide.

It's a good time to start using the "help wanted" label...

@JiaweiZhuang
Owner Author

On a practical note, I would like to achieve xESMF maintainer status eventually to help you manage the repo and minimize any associated stress.

Thanks very much, I really appreciate that!

@jhamman

jhamman commented Aug 23, 2018

Something that may be of interest on the xESMF roadmap is to move the repo to another namespace. xarray-contrib comes to mind as an obvious option. This may help increase the likelihood of gaining outside contributors and gives the package a more elevated platform. This is really just semantics at this point but something to think about.


5 participants