Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #253: package confusion between conda and pip #290

Merged
merged 13 commits into from
May 18, 2023

Conversation

romain-intel
Copy link
Contributor

Before this patch, the following would fail:

channels:
  - conda-forge
dependencies:
  - pip
  - dask
  - pip:
    - dask

The core issue seems to be that two Conda packages (dask and dask-core) both map to the dask pypi package. This was causing issues later on when assigning categories. This patch properly deals with multiple Conda packages per pip package and addresses the naming issue.

I have verified that the above package specification works as well as the others listed in #253.

Before this patch, the following would fail:
```
channels:
  - conda-forge
dependencies:
  - pip
  - dask
  - pip:
    - dask
```
The core issue seems to be that two Conda packages (`dask` and `dask-core`)
both map to the `dask` pypi package. This was causing issues later on when
assigning categories. This patch properly deals with multiple Conda packages
per pip package and addresses the naming issue.

I have verified that the above package specification works as well as the others
listed in conda#253.
@netlify
Copy link

netlify bot commented Dec 1, 2022

Deploy Preview for conda-lock ready!

Name Link
🔨 Latest commit ffcb2d5
🔍 Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/63892ea559623f0008997a35
😎 Deploy Preview https://deploy-preview-290--conda-lock.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@romain-intel
Copy link
Contributor Author

I think there is some test flakiness given that it passes on other Python versions/platforms. Let me know if it is something else though.

@romain-intel romain-intel requested a review from a team as a code owner February 9, 2023 22:35
@netlify
Copy link

netlify bot commented Feb 9, 2023

Deploy Preview for conda-lock ready!

Name Link
🔨 Latest commit 452ff6b
🔍 Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/6464fba34ca0800007ae8c40
😎 Deploy Preview https://deploy-preview-290--conda-lock.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@maresb
Copy link
Contributor

maresb commented Feb 25, 2023

pre-commit.ci autofix

@maresb maresb force-pushed the fix/pypi-conda-pkg-confusion branch from a84fe26 to 36f0985 Compare February 25, 2023 17:54
@romain-intel
Copy link
Contributor Author

Thanks for taking a look. I'll fix the pre-commit thing (I just need to extract a function out -- I had it in my first version and then undid it because I had forgotten why -- it was to remove the too complex function warning). I'll check the type issues too. May have missed something with the merge.

@maresb
Copy link
Contributor

maresb commented Feb 25, 2023

Awesome, thanks!!! I really want to try and get this in soon, but I still need to understand it first. Be warned there could be further merge conflicts.

@maresb
Copy link
Contributor

maresb commented Feb 25, 2023

Any thoughts on #322?

@maresb
Copy link
Contributor

maresb commented Mar 2, 2023

Closes #380

@maresb maresb linked an issue Mar 2, 2023 that may be closed by this pull request
2 tasks
@romain-intel
Copy link
Contributor Author

Sorry, I have been traveling -- I haven't forgotten. I'll try to get to it tonight or over the weekend.

@romain-intel
Copy link
Contributor Author

I will see if I can add a test as well.

@romain-intel
Copy link
Contributor Author

Any thoughts on #322?

I must say that I am not fully well versed in the mapping rules. My gut feeling though is that @PanayotisManganaris is probably right in saying that there are probably cases where the mapping done will go wrong and it may be nice to offer users a "way out". There is currently no option to change to a different lookup table and/or to add to it. I could see it becoming quite possibly complicated though:

  • different lookup tables for different channels
  • ability for users to specify their own yaml lookup (that should be fairly straightforward to enable and may cover a large majority of these cases
  • ability for users to specify overrides in the yaml file to resolve (that may also cover a good variety of use cases).

@romain-intel
Copy link
Contributor Author

OK, I added a simple test (nothing fancy) but hopefully this should be good to go. Apologies again for the delay -- it's been a very hectic few weeks but let me know if you need anything else. Would love to get this merged too (and released -- it's been a while since the last release :) ).

@maresb
Copy link
Contributor

maresb commented Mar 10, 2023

Thank you so much for this!!!

I really want to get this merged too, and I agree that we're due for a big release soon.

I am also getting somewhat lost with how the mapping is supposed to work. Now that you have your head wrapped around this part of the logic, would you have any time to add a few comments? I find the code in this vicinity really difficult to read, not just this PR. Alternatively, maybe I could try an figure it out myself, and you could review what I write to see if it's accurate.

Some of the things I'm getting confused about: what is the planned mapping? In particular, what exactly are the keys, and why are the values lists of LockedDependencys? What is the general iteration strategy with extract_planned_items? And it what sense are they "extracted"? (Maybe we should be using a different word here?)

@romain-intel
Copy link
Contributor Author

Let me see what I can add. I can't say I have a brilliant understanding of the code but can try to clarify some of the portions I understood. Hopefully with both of us we can come up with something sensible :).

@maresb
Copy link
Contributor

maresb commented Apr 3, 2023

I'm extremely time-constrained at the moment, and I still understand neither the old nor the new logic.

However, there seem to be a lot of people running into #253, and this patch seems to be extremely successful in resolving it. This is also not a radical change. @mariusvniekerk, I feel like this should be safe to merge, and then on the next iteration we can work on documentation and simplification.

Complementary to this effort, I do understand the upstream PyPI mapping quite a bit better, and I recently made a bunch of improvements. So in case the complications in the current code are to make up for previous insufficiencies in the PyPI mapping, this may allow for some significant simplifications.

@romain-intel
Copy link
Contributor Author

hello -- apologies. I said I would add some comments and then the end of the quarter came. Now that it is done, your message came at the perfect time so I made some time to put at least some comments. I can't say that I fully understand why it is doing this category assignation and what the role of it is but hopefully it clarifies how it is doing it. Let me know if this helps and if you need something else.

@leeleavitt
Copy link

I composed 3 manual tests,

pyproject.toml
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "tester"
version = "0.1.0"
description = "tester project"
requires-python = ">=3.9"

# Conda lock environment specs
[tool.conda-lock]
channels = [
    "bioconda",
    "conda-forge"
]
platforms = [
    "linux-64",
    "osx-64"
]

[tool.poetry.dependencies]
python = "3.9.12"
pip = "*"
pybedtools = "0.9.0"
miniwdl = "1.2.3"
pytest-wdl = {version = "*", source = "pypi"}
zenvironment-pytest-wdl.yaml
name: python
platforms:
  - linux-64
  - osx-64
channels:
  - bioconda
  - conda-forge
dependencies:
  - python=3.9.12
  - pip
  - miniwdl=1.2.3
  - pip:
    - pytest-wdl
zenvironment-python-tzdata.yaml
name: python
platforms:
  - linux-64
  - osx-64
channels:
  - bioconda
  - conda-forge
dependencies:
  - python=3.9.12
  - pybedtools
  - pip
  - pip:
    - pandas==1.3.5

I first tested merging your branch into mine for #398, i did not get the errors. So then i removed my fix and tested my examples on your branch and i did not get the errors.

I went one step further and removed _separator_munge_get to simply,

def _seperator_munge_get(
   d: Mapping[str, Union[List[LockedDependency], LockedDependency]], key: str
) -> Union[List[LockedDependency], LockedDependency]:
   # since separators are not consistent across managers (or even within) we need to do some double attempts here
   # try:
   return d[key]

And once again i did not get any error. Overall i believe this fixes my issues and i would love to see this get merged into main.

@maresb
Copy link
Contributor

maresb commented Apr 21, 2023

I went one step further and removed _separator_munge_get to simply...

Awesome, this is really great info @leeleavitt! It would be great to be able to remove it since it's so confusing.

I also really appreciate the effort to design those tests. Did you implement them in PyTest? If so, would you be willing to put them in a PR? That would certainly help me with gathering the courage to merge this.

@leeleavitt
Copy link

I went one step further and removed _separator_munge_get to simply...

Awesome, this is really great info @leeleavitt! It would be great to be able to remove it since it's so confusing.

I also really appreciate the effort to design those tests. Did you implement them in PyTest? If so, would you be willing to put them in a PR? That would certainly help me with gathering the courage to merge this.

I was looking into the testing organization, but i could not see how to add these as tests. Do you have any pointers for where and how i could add these as tests? Since these would require network interaction i'm assuming I would need to create some cassettes?

@romain-intel
Copy link
Contributor Author

Re _separator_munge_get: I did not change the function in my PR because I wasn't sure if there was a reason for it (chesterton's fence). I believe that it is there to make names that have _ or - the same as they are sometimes used interchangeably.

In terms of tests, I think you can add the .yml files just like I added the one test file for the one I ran. Is that what you are asking?

@lesteve
Copy link
Contributor

lesteve commented Apr 28, 2023

I believe that it is there to make names that have _ or - the same as they are sometimes used interchangeably.

In my experience pip does not care about - or _ (or . by the way) e.g. pip install scikit_learn or pip install scikit-learn (see normalization PyPA doc also). I believe this kind of caveats should be supported in conda-lock as well so _separator_munge_get should stay ...

It would be really great to merge this, this would get rid of a lot of issues with mixed conda/pip dependencies.

@mariusvniekerk
Copy link
Collaborator

@maresb Added these tests in. Will merge if they pass

@maresb
Copy link
Contributor

maresb commented May 17, 2023

Excellent, thanks a lot @mariusvniekerk for moving this forward!

@mariusvniekerk
Copy link
Collaborator

mariusvniekerk commented May 17, 2023

Probably worthwhile consolidating the various simple "solvability tests" at some point, but out of scope here

@maresb
Copy link
Contributor

maresb commented May 17, 2023

@mariusvniekerk I'd really like to get your thoughts on name canonicalization. For PyPI there is a canonicalize_name function, so I think if we canonicalize PyPI names everywhere then there should no longer be a problem with those.

Then the only question is Conda, which to the best of my knowledge is trickier. I patched a bunch of the upstream Grayskull mapping stuff, and I'm wondering if there is any know example where seperator_munge_get actually does something?

@romain-intel
Copy link
Contributor Author

Whoohoo! Thanks for moving this forward! If the tests don't pass, I'll take a look and see if there is something I can do.

@@ -228,6 +228,9 @@ def solve_pypi(
for locked_dep in conda_locked.values():
if locked_dep.name.startswith("__"):
continue
# ignore packages that don't depend on Python
if locked_dep.manager != "pip" and "python" not in locked_dep.dependencies:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty certain this change alone solves basically everything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intriguing!!! When does it trigger?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an example of a non-Python pkg which causes trouble?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tzdata is the classic one, so adding in the contraint causes pip to never attempt to assume it has tzadata already installed since it doesn't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the failing tests on Windows...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failures between the last two runs were disjoint. Maybe we just got unlucky with transient errors. Trying a third time.

@mariusvniekerk
Copy link
Collaborator

Then the only question is Conda, which to the best of my knowledge is trickier. I patched a bunch of the upstream Grayskull mapping stuff, and I'm wondering if there is any know example where seperator_munge_get actually does something?

Sadly conda packages have no canonical name form (practically many things are lowercase kebab, but it's not a hard rule and some things even use BOTH)

@mariusvniekerk mariusvniekerk merged commit 4a667bd into conda:main May 18, 2023
11 checks passed
@romain-intel
Copy link
Contributor Author

WHOOHOO!! Thanks for merging this!! Much appreciated! Now I can get off my custom build :)

@lesteve
Copy link
Contributor

lesteve commented May 19, 2023

Great to see this merged, thanks to everyone involved, this will get rid of a lot of issues with mixed conda/pip dependencies!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fails to resolve for dependency with a dash (-) in name
5 participants