Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change workflow to mamba #693

Merged
merged 6 commits into from
Jun 19, 2023
Merged

Change workflow to mamba #693

merged 6 commits into from
Jun 19, 2023

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Jun 15, 2023

@forsyth2 forsyth2 added documentation Files in `docs` modified DevOps CI/CD, configuration, etc. labels Jun 15, 2023
Copy link
Contributor

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with zppy and zstash, this looks good to me.

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang I tried to port the same changes from zppy's E3SM-Project/zppy#429 & E3SM-Project/zppy#428, as I did for zstash's E3SM-Project/zstash#271.

However, the docs and build files for zppy and zstash match up more closely with each other than e3sm_diags. So, hopefully these changes work.

@xylar @tomvothecoder Let me know if you see any issues, thanks.

@forsyth2
Copy link
Collaborator Author

Remaining instance of pip (excluding those in reference to piping output):

.github/workflows/build_workflow.yml:95:          python -m pip install .
.gitignore:35:pip-log.txt
.gitignore:36:pip-delete-this-directory.txt
analysis_data_preprocess/obs_data.yml:75:  - pip=19.1=py27_0
analysis_data_preprocess/obs_data.yml:99:  - pip:
conda-env/ci.yml:12:  - pip
conda-env/dev.yml:10:  - pip=22.3.1
conda-env/docs.yml:10:  - pip
conda-env/prod.yml:10:  - pip=22.3.1
docs/source/dev_guide/adding-new-diags-sets.rst:456:``pip install .`` **as explained in the "Putting It All Together And Running On Cori At NERSC" section.**
docs/source/dev_guide/adding-new-diags-sets.rst:700:        pip install .
docs/source/dev_guide/testing.rst:11:        pip install . # Install your changes
docs/source/dev_guide/testing.rst:80:        pip install . # Install your changes
docs/source/install.rst:167:Instead, the developer will ``pip install .`` to build ``e3sm-diags`` with changes (see step 6 below).
docs/source/install.rst:252:        pip install .
tests/integration/test_diags.py:17:# pip install /global/homes/f/<username>/e3sm_diags/

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this Ryan. I had this on my backlog for quite sometime (#642).

I linked that issue to this PR. LGTM.

Copy link
Contributor

@chengzhuzhang chengzhuzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Thank you for the PR @forsyth2 !

@chengzhuzhang
Copy link
Contributor

The checks won't run, any insights? @forsyth2 @tomvothecoder @forsyth2

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang I'm running into issues with the checks on all the mamba PRs. For this, it looks like the commit status needs to meet some requirement? Am I missing a new requirement on E3SM Diags?

For reference:
zstash is having issues with sphinx-multiversion on E3SM-Project/zstash#271:

  Looking for: ['pip=22.2.2', 'python=3.9.13', 'six=1.16.0', 'globus-sdk=3.2.1', 'fair-research-login=0.2.0', 'black=22.8.0', 'flake8=5.0.4', 'flake8-isort=4.2.0', 'mypy=0.982', 'pre-commit=2.20.0', 'tbump=6.9.0', "jinja2[version='<3.1']", 'sphinx=5.2.3', 'sphinx_rtd_theme=1.0.0', 'sphinx-multiversion=0.2.4', 'docutils=0.16']
  
  
  Could not solve for environment specs
  The following package could not be installed
  └─ sphinx-multiversion 0.2.4**  does not exist (perhaps a typo or a missing channel).

Error: The process '/usr/share/miniconda3/condabin/mamba' failed with exit code 1

And I thought zppy's PR was fine, but I'm seeing a syntax error on https://github.com/E3SM-Project/zppy/actions/runs/5273105403/workflow (after E3SM-Project/zppy#429 merged), even though the syntax looks fine.

Check failure on line 66 in .github/workflows/build_workflow.yml

GitHub Actions
/ .github/workflows/build_workflow.yml
Invalid workflow file
You have an error in your yaml syntax on line 66

@forsyth2
Copy link
Collaborator Author

And I thought zppy's PR was fine

Ah, I just made E3SM-Project/zppy#434, and now zppy has the same sphinx-multiversion error as zstash.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

@forsyth2 and @chengzhuzhang, I think the sphinx-multiversion problem is the same as in zppy and zstash, namely that you can't use the flag use-only-tar-bz2: true. At this point, conda-forge creates only .conda package files as far as I know, not .tar.bz2 files. And I believe the issue with not being able to cache conda packages that are .conda files has been fixed so the comment about caching only working with this flag is out of date.

Besides that one flag, I think E3SM_Diags is in better shape than zppy and zstash because it is already using a conda environment to build the docs (not just to test the package). That seems to have been the other major problem for zppy and probably for zstash, too.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

There's at least 1 other problem: someone's editor is putting tab characters into the workflow yaml files, which is making CI not run.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

Tests seem to be failing for reasons that may not have to do with this PR:

Error: -16 09:55:28,102 [ERROR]: e3sm_diags_driver.py(run_diag:296) >> Error in e3sm_diags.driver.lat_lon_driver
Traceback (most recent call last):
  File "/home/runner/work/e3sm_diags/e3sm_diags/e3sm_diags/driver/lat_lon_driver.py", line 134, in run_diag
    land_frac = test_data.get_climo_variable("LANDFRAC", season)
  File "/home/runner/work/e3sm_diags/e3sm_diags/e3sm_diags/driver/utils/dataset.py", line 160, in get_climo_variable
    filename = self.get_test_filename_climo(season)
  File "/home/runner/work/e3sm_diags/e3sm_diags/e3sm_diags/driver/utils/dataset.py", line 273, in get_test_filename_climo
    return self._get_climo_filename(path, data_name, season)
  File "/home/runner/work/e3sm_diags/e3sm_diags/e3sm_diags/driver/utils/dataset.py", line 300, in _get_climo_filename
    fnm = self._find_climo_file(path, data_name, season)
  File "/home/runner/work/e3sm_diags/e3sm_diags/e3sm_diags/driver/utils/dataset.py", line 318, in _find_climo_file
    dir_files = sorted(os.listdir(path_name))
FileNotFoundError: [Errno 2] No such file or directory: ''
...

Update: Hmm, that doesn't seem to be a new error -- it also shows up in recent CI that passed. There are 3 failing tests but I don't understand them well enough to know why this PR might have caused them.

@chengzhuzhang
Copy link
Contributor

@xylar thank you so much for troubleshooting and made the CI/CD tests running again...I will take over to look at the failed tests.

@chengzhuzhang
Copy link
Contributor

The trackback include errors with module 'numpy' has no attribute 'float'. from cdms2, which I recall that we have fixed. This led me to think that the new workflow introduced old version of packages in ci. I will perhaps leave this PR for next release candidates.

  File "/usr/share/miniconda/envs/e3sm_diags_ci/lib/python3.10/site-packages/cdms2/variable.py", line 74, in __init__
    self._numericType_ = numpy.float
  File "/usr/share/miniconda/envs/e3sm_diags_ci/lib/python3.10/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'cfloat'?

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

So in general the difficulty with a workflow where you pin ever package version like it:
https://github.com/E3SM-Project/e3sm_diags/blob/main/conda-env/dev.yml
is that you don't pick up changes to packages like the fix to cdms2 made in:
conda-forge/cdms2-feedstock#81
The exact reason why that fix isn't getting picked up would take some detective work but if you were to switch to a workflow where you use a test file to determine approximate package versions, rather than a yaml file to pin exact package versions, I think you would actually find that your workflow remains robust while requiring lest intervention.

@tomvothecoder and @mahf708, what are your thoughts here?

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

@chengzhuzhang, what is the general procedure for updating package versions in https://github.com/E3SM-Project/e3sm_diags/blob/main/conda-env/dev.yml? Because they seem generally pretty out of date.

@mahf708
Copy link
Contributor

mahf708 commented Jun 16, 2023

The exact reason why that fix isn't getting picked up would take some detective work but if you were to switch to a workflow where you use a test file to determine approximate package versions, rather than a yaml file to pin exact package versions, I think you would actually find that your workflow remains robust while requiring lest intervention.

Agree; from a quick look, the linked yaml is definitely stricter than needed and would generally lead to headaches like the one here (the detective work would be quite unwieldy). An alternative (and much more robust approach) is using conda-lock lockfiles with a more relaxed env yaml files, which would be refreshed often (either manually or automatically). I can link to some robust examples of this can be automated and we could consider here in the future (e.g., Pangea notebook docker images)

@mahf708
Copy link
Contributor

mahf708 commented Jun 16, 2023

At the very least, python, numpy and spicy shouldn't be that strictly pinned to the patch (major.minor.patch) unless for a really good reason. These three packages are very stable and conda-forge maintainers go to extreme lengths to maintain them in healthy harmony.

@mahf708
Copy link
Contributor

mahf708 commented Jun 16, 2023

The exact reason why that fix isn't getting

We can target the exact build number while we wait for answers (see below) though I think something like this is a recipe for more problems (if not now, then down the road).

diff --git a/conda-env/dev.yml b/conda-env/dev.yml
--- conda-env/dev.yml
+++ conda-env/dev.yml
@@ -11,9 +11,9 @@
   - beautifulsoup4=4.11.1
   - cartopy=0.21.1
   - cartopy_offlinedata=0.21.1
   - cdp=1.7.0
-  - cdms2=3.1.5
+  - cdms2>=3.1.5==*_15
   - cdutil=8.2.1
   - dask=2022.12.0
   - genutil=8.2.1
   - lxml=4.9.2

@chengzhuzhang
Copy link
Contributor

@xylar and @mahf708 thank you for chiming in ( in wired weekend time from your time zones).
I agree we need more robust dev.yml by not strictly pinning package verisons. @tomvothecoder has a workflow to update the ymls I think.
But I thought that this fix is not working because of the conda-env/ci.yml (not pinned) picked up an older versions of cdms2 (at least) somehow? I tried to revert some changes in the build/release workflow for CI, but still don't have a clue.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

Ah, I had missed how many different yaml files there are. By the way, this should be fixed:
https://github.com/E3SM-Project/e3sm_diags/blob/main/.github/workflows/build_workflow.yml#L74
It should be ci.yml not dev.yml and I think that threw me off.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

I think, in contrast to dev.yml which is too strict:
https://github.com/E3SM-Project/e3sm_diags/blob/main/conda-env/ci.yml
is too loose and need to pin versions that are used in production. To be honest, I think you all would be best off with a single yaml file for everything so that there aren't so many places to have to constrain dependencies. It also makes things quite confusing on conda-forge when I don't quite know which yaml file to trust here.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

I believe you need the latest versions of most cdat packages and earlier version are too old to work. So those need to be constrained much like they are in E3SM-Unified:
https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml#L57-L68
(This range also captures several non-CDAT packages, which you can ignore).

It's quite important to have a sense for which packages need to be constrained (i.e. e3sm_diags won't work with earlier or later versions) and which are being over-constrained (e3sm_diags works with a broad range of versions). This is a tedious part of the package management process but it is also a giant headache when it isn't done right and things like failing tests result from it.

@xylar
Copy link
Contributor

xylar commented Jun 16, 2023

I will try increasing constraints on packages in ci.yml tomorrow (and then relaxing others in dev.yml) to get to a better match between these two.

@chengzhuzhang
Copy link
Contributor

@xylar All PRs for the 2.9.0rc are in, except for this one. But it seems like it is worthwhile to spend some time on sorting out the yaml files, before a release?

@xylar
Copy link
Contributor

xylar commented Jun 17, 2023

@chengzhuzhang, the problem is that I missed a bunch of numpy.float and numpy.int references when I patched cdms2. I have a PR to patch what I missed here:
conda-forge/cdms2-feedstock#85

You are noticing that here because you were previously testing with numpy <= 1.20 but you now are getting the latest version where these are no longer available. So there's nothing wrong with the testing, it's a problem with cdms2 itself.

@xylar
Copy link
Contributor

xylar commented Jun 17, 2023

I successfully fixed cdms2 but there are remaining issues in genutil: conda-forge/genutil-feedstock#25

@xylar
Copy link
Contributor

xylar commented Jun 17, 2023

Once I finished fixing cdsm2 and genutil, it turns out that e3sm_diags also has some numpy.int references that need to be fixed. I have the commit to fix this issue here for testing but I think it's better to keep it in a separate PR, #694.

@xylar
Copy link
Contributor

xylar commented Jun 17, 2023

We can target the exact build number while we wait for answers (see below) though I think something like this is a recipe for more problems (if not now, then down the road).

@mahf708, Fortunately, it doesn't seem like the problem was with picking up the desired build after all, but rather that a lot less had been fixed than I thought. I agree that pinning a particular build is a crutch we could use if we need to but fortunately it doesn't seem to have been needed in this case.

@xylar
Copy link
Contributor

xylar commented Jun 17, 2023

@forsyth2 and @chengzhuzhang, this seems to work but please merge #694 first and then this one only after a rebase to remove the redundant commit.

Please also see the new issue #695

@tomvothecoder
Copy link
Collaborator

To be honest, I think you all would be best off with a single yaml file for everything so that there aren't so many places to have to constrain dependencies. It also makes things quite confusing on conda-forge when I don't quite know which yaml file to trust here.

Originally the GH Actions build for this repo was using conda to build the environment. As we all know, conda takes a long time to resolve environments (especially those with a lot of dependencies). Certain tasks in the build only require a minimal number of dependencies (e.g,. building docs), so build time would be wasted reproducing this single large environment per task. This is why I setup individual yml files early on.

However, we recently switched to mamba and have caching setup (I think). I don't think build time is a major issue anymore. I am working on PR #696 which addresses simplification of the conda env yml files.

Also the e3sm_diags integration tests take like 10-20 minutes anyways so I don't think build time is big deal if we can handle that wait 🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DevOps CI/CD, configuration, etc. documentation Files in `docs` modified
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up conda environment build step in GitHub actions by using Mamba
5 participants