Potential package name confusion breaks conda-lock #253

Closed
ozancaglayan opened this issue Oct 4, 2022 · 14 comments

@ozancaglayan

ozancaglayan commented Oct 4, 2022

Hi,

I spent a couple of hours today trying to create an environment.yml file that pulls in PyTorch packages from the pytorch channel:

dependencies:
  - conda-lock
  - cmake
  - pip
  - python=3.10
  - cudatoolkit==11.6
  - pytorch::ignite==0.4.9
  - pytorch::pytorch==1.12.1
  - pytorch::torchvision==0.13.1
  - pytorch::torchaudio==0.12.1

This creates a conda-lock.yml file without any issue. But if I add a pip dependencies section and list a pip package that depends on the torch package on PyPI, e.g.:

dependencies:
  - conda-lock
  - cmake
  - pip
  - python=3.10
  - cudatoolkit==11.6
  - pytorch::ignite==0.4.9
  - pytorch::pytorch==1.12.1
  - pytorch::torchvision==0.13.1
  - pytorch::torchaudio==0.12.1
  - pip:
    - pytorch_revgrad==0.2.0

I get the following exception:

  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 1178, in lock
    lock_func(
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 948, in run_lock
    make_lock_files(
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 388, in make_lock_files
    lock_content = lock_content | create_lockfile_from_spec(
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 736, in create_lockfile_from_spec
    deps = _solve_for_arch(
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 702, in _solve_for_arch
    pip_deps = solve_pypi(
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/pypi_solver.py", line 310, in solve_pypi
    src_parser._apply_categories(requested=pip_specs, planned=planned)
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 297, in _apply_categories
    for dep in seperator_munge_get(planned, item).dependencies
  File "/data/ozan/mambaforge/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 288, in seperator_munge_get
    return d[key.replace("_", "-")]
KeyError: 'pytorch'

I believe this is caused by a naming mismatch:

  • On PyPI, the PyTorch package is called torch, and pytorch_revgrad lists torch as a dependency.
  • In the pytorch Anaconda channel, however, the package is called pytorch.
@ozancaglayan
Author

There is actually logic that fulfills deps through a conda<->pip name mapping, and it seems to work internally. Maybe that logic breaks somewhere later in the parser: the key above should have been converted to torch at some point, or the dictionary should have contained both torch and pytorch.

@ozancaglayan
Author

OK, it seems to me that the item = name assignment gets overwritten by the if todo block. I don't know if it's intentional, but item becomes the dependency in this case (pytorch) instead of the package name itself(?)

image

@ozancaglayan
Author

So I simply removed the reassignment of item mentioned above. Although the exception is gone and the lock file is produced, the deps listed for a pip package in the lock file are not correct; they are actually the deps of another pip package. So clearly there's a more important issue here.

@ozancaglayan
Author

Okay, I think I've found the culprit: it's the _apply_categories function, which accesses the .dependencies attribute of the LockedDependency objects. Although solve_pypi registers each package in the dictionary under its PyPI name, the values still use conda names, so .dependencies pulls in the pytorch package, which breaks the whole logic.
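
The mismatch can be reproduced in isolation. The following is a minimal sketch, not conda-lock's actual code: `FakeDep`, `munge_get` and `walk_dependencies` are hypothetical stand-ins, and only the `key.replace("_", "-")` lookup is taken from the traceback above. It assumes the planned dict is keyed by PyPI names while the dependency names stored in each value are still conda names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDep:
    """Hypothetical stand-in for a locked dependency (not conda-lock's class)."""
    name: str
    dependencies: dict = field(default_factory=dict)


def munge_get(d, key):
    # Same normalisation as the last frame of the traceback: underscores
    # become hyphens, but conda names are NOT translated to PyPI names.
    return d[key.replace("_", "-")]


def walk_dependencies(planned, root):
    """Collect every package reachable from `root` via .dependencies."""
    seen, todo = [], [root]
    while todo:
        item = todo.pop()
        if item in seen:
            continue
        seen.append(item)
        # Iterating a dict yields its keys, i.e. the dependency names.
        todo.extend(munge_get(planned, item).dependencies)
    return seen


# Assumed shape: keys use PyPI names ("torch"), but the dependency names
# stored in the values still use the conda name ("pytorch").
planned = {
    "pytorch-revgrad": FakeDep("pytorch-revgrad", {"pytorch": "*"}),
    "torch": FakeDep("torch"),
}

try:
    walk_dependencies(planned, "pytorch_revgrad")
    error = None
except KeyError as exc:
    error = exc.args[0]

print(error)  # the missing key, matching the KeyError in the report
```

Under these assumptions the walk finds pytorch_revgrad, follows its dependency named pytorch, and fails because the dict only has an entry for torch.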

@bstadlbauer
Contributor

Also just ran into this, although the issue was a bit flaky for me (it would only show up on some runs; it seemed like a race condition).
The minimal reproducible example I used was:

dependencies:
  - matplotlib
  - pip:
      - matplotlib

channels:
  - conda-forge

which failed with KeyError: 'matplotlib-base' on the same line.

@romain-intel
Contributor

I ran into a similar issue. I believe this happens when a single package on PyPI maps to more than one package in conda. In my case, an explicitly listed pip package needed dask as a dependency, which maps to both dask-core and dask in conda; I think that is what causes the issue.

@romain-intel
Contributor

A few observations (I was digging into this using the following tiny reproducible example, which is similar to the matplotlib one):

dependencies:
  - dask
  - pip
  - pip:
    - dask
channels:
  - conda-forge

What seems to be happening is that:

  • When resolving the conda portion of the environment, two conda packages are needed: dask and dask-core (dask depends on dask-core).
  • In pypi_solver.py, after the solve, when it builds the planned structure, it maps conda package names to pip package names.
  • When it sees the dask conda package, it maps it to dask on the PyPI side. This mapping is NOT extracted from the YAML mapping file; it is the default mapping.
  • When it sees the dask-core conda package, that also maps to the dask PyPI name (this time because that mapping is present in the YAML file).
  • Depending on the order of the two steps above, things can go wrong in several ways. In my case, dask-core was mapped FIRST and was therefore overwritten by the entry for dask.
  • In my case, therefore, when looking up the dependencies of the dask conda package, we find dask-core, which does not exist in the planned dict (it was mapped to the name dask and overwritten anyway).
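
The overwrite described in the list above can be sketched as follows. This is an illustration under assumptions, not the code in pypi_solver.py: `CONDA_TO_PYPI`, `to_pypi_name` and `build_planned` are hypothetical, and the only mapping entry assumed is dask-core → dask, with every other conda name falling back to itself.

```python
# Hypothetical excerpt of a conda -> PyPI name mapping; only the entry
# mentioned in this thread is included.
CONDA_TO_PYPI = {"dask-core": "dask"}


def to_pypi_name(conda_name):
    # Default mapping: a conda package with no explicit entry keeps its name.
    return CONDA_TO_PYPI.get(conda_name, conda_name)


def build_planned(conda_packages):
    """Key each conda package by its PyPI name; later entries win."""
    planned = {}
    for name, deps in conda_packages:
        planned[to_pypi_name(name)] = {"conda_name": name, "dependencies": deps}
    return planned


# dask-core first, then dask: dask overwrites dask-core under the key "dask".
core_first = build_planned([("dask-core", []), ("dask", ["dask-core"])])
# Reversed order: dask-core overwrites dask instead.
dask_first = build_planned([("dask", ["dask-core"]), ("dask-core", [])])

print(core_first["dask"]["conda_name"])
print(dask_first["dask"]["conda_name"])
# The surviving dask entry depends on "dask-core", which is not a key.
print("dask-core" in core_first)
```

Because both conda packages collapse onto one key, which entry survives depends purely on processing order, which would be consistent with the flaky behaviour reported earlier in this thread.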

Still thinking of ways of working around this/fixing it.

@romain-intel
Contributor

I think I found a fix. I'll open a PR. Not sure it's the correct one but it solves the issue for me.

@pmiam

pmiam commented Jan 24, 2023

I'm rooting for your PR.

@romain-intel
Contributor

@PanayotisManganaris : thanks. I don't know how to move it forward much, though. No maintainer seems to have picked it up, and it is now out of date (I can probably update it easily, though). Hopefully your mention of it will get it back on the radar. I'm not a very heavy user, but I have been using this patched version internally and it works :) (so far).

maresb pushed a commit to romain-intel/conda-lock that referenced this issue Feb 25, 2023
Before this patch, the following would fail:
```
channels:
  - conda-forge
dependencies:
  - pip
  - dask
  - pip:
    - dask
```
The core issue seems to be that two Conda packages (`dask` and `dask-core`)
both map to the `dask` pypi package. This was causing issues later on when
assigning categories. This patch properly deals with multiple Conda packages
per pip package and addresses the naming issue.

I have verified that the above package specification works as well as the others
listed in conda#253.
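
One way to keep several conda packages per PyPI name is to group them instead of overwriting. The sketch below only illustrates that idea and is not the patch itself, whose contents are not shown in this thread; `CONDA_TO_PYPI`, `group_by_pypi_name` and `resolve` are hypothetical names, and the single mapping entry is an assumption.

```python
from collections import defaultdict

# Hypothetical excerpt of a conda -> PyPI name mapping.
CONDA_TO_PYPI = {"dask-core": "dask"}


def group_by_pypi_name(conda_names):
    """Map each PyPI name to the list of conda packages that provide it."""
    groups = defaultdict(list)
    for conda_name in conda_names:
        pypi_name = CONDA_TO_PYPI.get(conda_name, conda_name)
        groups[pypi_name].append(conda_name)
    return dict(groups)


def resolve(groups, name):
    """Look a dependency up by PyPI name or by any conda name in a group."""
    if name in groups:
        return groups[name]
    for members in groups.values():
        if name in members:
            return members
    raise KeyError(name)


groups = group_by_pypi_name(["dask-core", "dask", "pip"])
print(groups)
print(resolve(groups, "dask-core"))
```

With this shape, a lookup for dask-core no longer raises, and neither conda package is lost regardless of the order in which they are processed.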
@steffen-fissler

Hi @romain-intel ,

I confirm that it works for me (see #387 )

Thanks!

Best regards
Steffen

@lesteve
Contributor

lesteve commented Apr 17, 2023

For completeness, I believe there is some overlap with some of the issues mentioned in #179

@addisonklinke

addisonklinke commented Jan 4, 2024

The following env.yml was giving me a similar key error for python-tzdata using conda-lock==1.4. Upgrading to 2.5.1 resolved it - thanks @mariusvniekerk for the fix!

name: test
channels:
  - conda-forge
dependencies:
  - pandas
  - pip:
    - sklearn-pandas==1.7.0

NOTE: earlier edits of this comment that said otherwise were caused by a $PATH issue: it was still pointing to the 1.4 install.

@mariusvniekerk
Collaborator

mariusvniekerk commented Jan 4, 2024

This package is very old at this point, and the pandas functionality has been rolled into scikit-learn for a while now, so this env should really not need to be created.

Glad it does work to fix it at least :)
