Add missing packages to Pangeo workspace #729

Closed
anilnatha opened this issue May 25, 2023 · 11 comments
Labels: ADE (Algorithm Development Environment Subsystem), Enhancement (New feature or request), JPL (JPL related issues)

Comments

@anilnatha

anilnatha commented May 25, 2023

While discussing missing packages with @wildintellect, he shared that geopandas is missing and that we should start by adding it to our Pangeo workspace, where we recently added other packages that were missing.

If all works well, we will later add it, and the other missing packages that were added to Pangeo, to our other workspaces.

NOTE (2023-12-4): Updated link to package reconciliation list.

@anilnatha anilnatha self-assigned this May 25, 2023
@anilnatha anilnatha changed the title Add geopandas to Pangeo workspace Add missing packages to Pangeo workspace May 25, 2023
@anilnatha
Author

@wildintellect If there are other packages we are missing, please document them here and we'll get them integrated as part of this ticket.

@pahbs

pahbs commented Jun 19, 2023

Below is a list of other Python packages that are part of the boreal biomass geoprocessing workflow that we've prototyped on MAAP. I think these should be part of the Pangeo env by default, since they could be used across many MAAP projects.

These are derived from:
https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/build_command_main.sh
which installs this:
https://github.com/lauraduncanson/icesat2_boreal/blob/master/dps/env_main.yaml

which provides the packages that run our DPS jobs and our notebooks up until this point.

Note: it is not clear to me whether the specific package versions listed in that yaml are required. For example, the latest version of Geopandas should probably be deployed on Pangeo (unless @wildintellect indicates otherwise).

@pahbs

pahbs commented Jun 19, 2023

I think these packages will require careful install and testing due to sensitive dependencies:
rio_tiler
rio_cogeo
cogeo_mosaic

For example, when I ran this:
!pip install -U rio_tiler

I saw this:
Installing collected packages: rio_tiler
  Attempting uninstall: rio_tiler
    Found existing installation: rio-tiler 4.1.12
    Uninstalling rio-tiler-4.1.12:
      Successfully uninstalled rio-tiler-4.1.12
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cogeo-mosaic 5.1.1 requires morecantile<4.0,>=3.1, but you have morecantile 4.2.0 which is incompatible.
cogeo-mosaic 5.1.1 requires rio-tiler<5.0,>=4.0.0a0, but you have rio-tiler 5.0.0 which is incompatible.
Successfully installed rio_tiler-5.0.0
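
To make conflicts like the ones in this output easier to track while testing different versions, the "requires ..., but you have ..." lines can be pulled out of pip's output programmatically. The following is only a minimal sketch: `parse_pip_conflicts` and `CONFLICT_RE` are hypothetical names, and the pattern assumes the simple message shape seen above (it would not handle requirements that contain spaces, such as environment markers).

```python
import re

# Matches pip conflict messages of the form seen in the output above:
# "<pkg> <ver> requires <spec>, but you have <pkg> <ver> which is incompatible."
# Simplification (assumption): the requirement spec contains no spaces.
CONFLICT_RE = re.compile(
    r"(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<installed>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\."
)


def parse_pip_conflicts(pip_output):
    """Return one dict per dependency conflict reported in pip's output."""
    return [m.groupdict() for m in CONFLICT_RE.finditer(pip_output)]


# The two conflict lines copied from the output quoted in this comment.
pip_output = (
    "cogeo-mosaic 5.1.1 requires morecantile<4.0,>=3.1, "
    "but you have morecantile 4.2.0 which is incompatible.\n"
    "cogeo-mosaic 5.1.1 requires rio-tiler<5.0,>=4.0.0a0, "
    "but you have rio-tiler 5.0.0 which is incompatible.\n"
    "Successfully installed rio_tiler-5.0.0\n"
)

conflicts = parse_pip_conflicts(pip_output)
for c in conflicts:
    print(
        f"{c['package']} {c['version']} needs {c['requirement']}, "
        f"found {c['installed']} {c['installed_version']}"
    )
```

Both reported conflicts come from cogeo-mosaic 5.1.1, which pins rio-tiler below 5.0 and morecantile below 4.0.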

This issue, for us, is the highest priority, because without it we cannot assess the DPS results that we need to complete in July 2023.

One test (from my workspace) could look like this:
!python /projects/code/icesat2_boreal/lib/build_tindex_master.py -t Topo -y 2023 -m '06' --user 'montesano' --maap_version master -alg_name 'do_topo_stack_3-1-5'

@wildintellect
Collaborator

@pahbs let me look into the rio* related packages. Since most of those are not in conda-forge, we might just need to change versions. cc: @vincentsarago

@wildintellect
Collaborator

@anilnatha can you dump a list of packages to compare against https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml ?
@pahbs can you please look at this ^^^ to see which key packages from this list should be included?

@anilnatha
Author

@wildintellect Can you clarify which source I should dump the list from? Is it the Pangeo workspace, so that we can see all the dependencies that are installed from its environment.yml?

@wildintellect
Collaborator

@anilnatha if you dump the MAAP Pangeo package list you can diff it against the Pangeo official environment.yml to see what is not currently included.
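
One way to do that diff, shown here only as a rough sketch: read the package names from both environment.yml files and take the set difference. `package_names` is a hypothetical helper, the two YAML snippets and their version pins are made-up stand-ins for the real files, and the line-based parsing assumes the usual flat `dependencies:` list rather than full YAML.

```python
import re


def package_names(environment_yml_text):
    """Collect package names listed under the top-level `dependencies:` key.

    Simplified, line-based parsing (assumption): no YAML library is used, and
    version pins, channel prefixes, and trailing comments are stripped.
    """
    names = set()
    section = None
    for line in environment_yml_text.splitlines():
        top_level_key = re.match(r"([A-Za-z_]+):", line)
        if top_level_key:
            section = top_level_key.group(1)
            continue
        stripped = line.strip()
        if section != "dependencies" or not stripped.startswith("- "):
            continue
        spec = stripped[2:].split("#", 1)[0].strip()
        if not spec or spec.endswith(":"):  # e.g. the nested "- pip:" header
            continue
        spec = spec.split("::")[-1]  # drop a "channel::" prefix if present
        name = re.split(r"[\s=<>!~]", spec, maxsplit=1)[0]
        names.add(name.lower())
    return names


# Illustrative stand-ins for the two files; contents and versions are made up.
official_yml = """
name: pangeo-notebook
channels:
  - conda-forge
dependencies:
  - xarray
  - geopandas
  - cf_xarray
  - fsspec=2023.6.0
"""

maap_yml = """
channels:
  - conda-forge
dependencies:
  - xarray=2023.5.0
  - geopandas
"""

missing = sorted(package_names(official_yml) - package_names(maap_yml))
print(missing)
```

Comparing names only, rather than full specs, keeps a version pin on one side from showing up as a false difference.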

@wildintellect
Collaborator

For comparison, here is the VEDA lockfile for their Pangeo instances: https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/conda-linux-64.lock

@gchang gchang modified the milestones: 3.1.1, 3.1.2 Oct 12, 2023
@anilnatha anilnatha added the Enhancement New feature or request label Oct 26, 2023
@gchang gchang removed this from the 3.1.2 milestone Oct 26, 2023
@anilnatha anilnatha added this to the 3.1.4 milestone Jan 10, 2024
@anilnatha anilnatha added JPL JPL related issues ADE Algorithm Development Environment Subsystem labels Jan 22, 2024
@grallewellyn
Collaborator

@wildintellect Our current pangeo package list is here.
These are the packages in the official Pangeo environment.yml that are not in the MAAP Pangeo environment.yml:

adlfs
argopy
black
ciso
cmocean
cdsapi
cf_xarray
dask-ml
fastjmd95
fsspec
gcsfs
gh
gh-scoped-creds
git-lfs
gsw
line_profiler
memory_profiler
metpy
nb_conda_kernels
nbstripout
numbagg
numcodecs
python-graphviz
xarray-datatree
xarray_leaflet
xarray-spatial
xbatcher
xcape
xclim
xgboost
xgcm
xhistogram
xmip
xmitgcm
xpublish
xrft
xskillscore

@pahbs Are there any key packages from this list that you think we should add to the MAAP pangeo environment.yml in our next release?

@wildintellect
Collaborator

cf_xarray
fsspec
xarray-datatree
xarray_leaflet
xarray-spatial

@grallewellyn is there a reason we can't add all of them? We are trying to reach parity with the Pangeo-notebook images so that they are interchangeable (VEDA uses Pangeo-notebook).

@grallewellyn
Collaborator

grallewellyn commented Jan 23, 2024

Yes, I made a ticket to add all packages from the diff list I sent above: #898

@wildintellect wildintellect unpinned this issue Feb 5, 2024

7 participants