uv fails to compile apache-airflow[all] (performance resolution issue) #1560
Interesting, thanks for the details.
Related discussion of resolver heuristics in #1398
When uv can resolve …
I was distinguishing the two based on the fact that #1398 was just "slow", whereas this one ends in a failure. Although in both cases the high-level issue is that uv backtracks too far on a transitive dependency, the symptom, and possibly the solution, are different. Any guidance on how to file resolution issues in the future that would best fit the uv team's workflow would be appreciated. I still have a lot more scenarios to try out from real-world issues reported to pip, to see if uv can handle them.
Feel free to open a new issue for each case. I'm cross-linking them for my own sake, but we also have a label. The absolute best thing one can do is write scenarios reproducing the problem, so we have a clear statement of the problem and test coverage for the solution.
Yeah, I am starting to take a look at packse as I would like to add it to the pip test suite (pypa/pip#12526). However, it seems like one has to build scenarios manually? Compared to, say, pip-resolver-benchmarks, which can build a scenario automatically (though the JSONs are huge). But I'll take that up on the packse repo.
I just assumed this was true, but I tried it right now and it didn't resolve for me:
Is this a separate issue? This resolves fine:
Hey - Airflow maintainer here, and "the CI/dev tool Airflow guy". Apache Beam is notoriously complex to get right. It is by far the biggest problem contributor to Airflow (including the fact that it's the only provider we have to disable for upcoming Python 3.12 support). The problem you likely have there is - I guess - that the old version of apache-beam does not have enough metadata, so you want to install it to get the metadata, and this really, really old version does not have a binary wheel at all (https://pypi.org/project/apache-beam/2.2.0/#files) and fails to compile - both cases should likely simply lead to skipping that version entirely in your resolution process.

BTW, it's a little tangential, @zanieb and other maintainers, but speaking of uv: we are not yet switching to uv for production kinds of usage, and not recommending it to our users (yet) - but I am quite impressed so far, especially with the speed improvements, and I will definitely keep an eye on it and adopt (and maybe even some day contribute to) the packaging tooling out there. Finally, it seems that packaging tooling is getting close to providing a lot of the things we've been doing in custom ways in Airflow (because of our size and setup and complexity we had to do a lot on our own) - and we can slowly replace and improve pieces of our CI/development tooling with more standard solutions (I love when I can do it with confidence). You can see the mailing list discussion: https://lists.apache.org/thread/8gyzqc1d1lxs5z7g9h2lh2lcoksy2xf9

BTW, I will be at PyCon US in Pittsburgh and hope to meet a lot of the packaging team and Astral team there! I signed up for the packaging summit and hope to see you and talk to you there.
@potiuk - Thanks so much for chiming in and for the kind words -- that PR is so, so cool! I'll be at PyCon US too and also signed up for the packaging summit.
FYI I created a separate issue for this: #2003
FYI, my non-scientific comparison (after >24 hrs of switching to uv) is that Airflow's workflow is getting a HUGE boost. We use our CI image also to make sure our … In my case I got the worst case (a full rebuild, with an upgrade to the latest dependencies and the Docker cache disabled altogether) down from 12 minutes with …

https://lists.apache.org/thread/sq70ch6lllryv4cr5q0xjt6b9z5n0vd8

Thanks again for this one. I hope my "First time contributor's workshop" for PyCon will get accepted in the …
Btw, do you know if any of those optimizations are still required for pip? Or if you've had any specific resolution issues in the last ~6 months? I know uv is a lot faster, but I only have a handful of examples where it resolves "better" (i.e., visits fewer packages to resolve), and I have a PR on the pip side which fixes all of those. So any more examples would be appreciated. And of course this issue is an example where pip chooses a better resolution path than uv.
It's mostly for the design of caching by Docker layers and for addressing several cases - when users in CI upgrade only dependencies, or when they upgrade to conflicting dependencies - not the resolution itself. What is left from the "resolution helper" is the list of dependencies I add to address the …
That one, however, was based on pure guesses - whenever …
PR running here to remove it: apache/airflow#37745
Well, uv can certainly suffer from the same issues: #1398. It will be interesting to see if uv is performant enough for you when it also goes backtracking in the wild.
Hard to say - those extra requirements tend to solve themselves over time. I periodically removed them and added new ones when we got into backtracking issues; what also (obviously) helps is regular bumping of lower limits in some dependencies. This change is fine (I had to handle an edge case where the extra requirements are empty) - the changes generated by removal of those extra requirements work fine:
No excessive backtracking (the images were built in 3 minutes).
For context, boto3/botocore are notoriously hard due to their large number of releases, where we have to backtrack through every single one: https://pypi.org/project/boto3/#history, https://pypi.org/project/botocore/#history. We're planning on improving the situation around boto specifically.
Oh absolutely - the approach of boto3/botocore is particularly difficult for package managers and resolution. When backtracking happens, the first thing I try is limiting botocore/boto3.
FYI, the error output is different now, but the issue of uv backtracking too far back on apache-beam for Python 3.11 with the latest version of Airflow still exists as of today:
I don't know if pubgrub-rs is flexible enough, but I still strongly suggest that, when backtracking and choosing between two packages, uv try as much as possible to avoid choosing one that involves having to compile sdists. I recently discussed this idea further on the pip side (pypa/pip#12035) and plan to eventually create a PR for pip.
This is somewhat related to what I reported today in #2821 - similarly to the other issue, backtracking is not only considering some really old versions of dependencies (apache-beam 2.2.0 was released in 2017 (!)), but installation also fails in case those considered candidates have some serious issues with metadata. I think that while the "failing on a non-installable candidate" part has an easy solution, some extra heuristics for candidate selection should indeed speed up the resolution even further. Hard to say what heuristics, though.

I also think there is something wrong with the current candidate selection. Currently (and likely for quite some time) we limit apache-beam in … We could potentially - of course - in future versions of Airflow say, for example, …

Of course I know it's an NP-complete problem and choosing some heuristics will cause problems for some other cases - and I do not know the details of how …

Just a "lame" proposal - without knowing the internals - it might be - again - stating the obvious thing that already happens (or maybe I am missing something that I am not aware of), so apologies if that's the case :)
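To make the "easy solution" part concrete, here is a minimal Python sketch of treating a candidate whose sdist fails to build as unavailable rather than aborting the whole resolve. This is an assumption about one possible fix, not uv's actual code; `get_metadata`, `BuildError`, and the failure condition are hypothetical stand-ins:

```python
# Toy sketch of "skip uninstallable candidates": a build failure makes the
# resolver move on to the next version instead of failing the whole resolve.
class BuildError(Exception):
    pass

def get_metadata(package, version):
    # Hypothetical: pretend apache-beam 2.2.0's sdist fails to build.
    if (package, version) == ("apache-beam", "2.2.0"):
        raise BuildError("setup.py exited with an error")
    return {"requires_dist": []}

def usable_candidates(package, versions):
    for version in versions:  # assumed ordered newest-first
        try:
            yield version, get_metadata(package, version)
        except BuildError:
            continue  # treat the candidate as unavailable and keep going

for version, _ in usable_candidates("apache-beam", ["2.3.0", "2.2.0"]):
    print(version)  # prints 2.3.0; the broken 2.2.0 is skipped, not fatal
```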
Our package selection heuristic is currently just going through packages in the order we first see them in the requirements of another package - there's definitely room for improvement.
Lower bounds would be very helpful! My heuristic is that the lower bound should be the lowest version that still works (passes tests with …). The conflicts we encounter are often of a specific shape: say we have two dependencies a and b. We see that for the latest versions, a==5 and b==5, a==5 wants c==2 and d==3 while b==5 wants c==3 and d==2. We can start iterating either over a's or over b's past releases. With a lower bound on a, say a>=4, we try a few versions until we determine that no version of a works and we have to reject b==5 instead. I've confirmed that we resolve …
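A tiny toy illustration of why the lower bound helps in that shape (this is an added sketch, not uv's resolver): every version of a conflicts with b==5, the resolver walks a's releases newest-first, and the bound caps how far back that walk can go before a is rejected in favor of rejecting b==5.

```python
# Count how many of a's releases get visited before a is given up on.
def visited_before_rejecting_a(a_versions, lower_bound=None):
    candidates = [v for v in sorted(a_versions, reverse=True)
                  if lower_bound is None or v >= lower_bound]
    return len(candidates)  # all of them conflict, so all get visited

a_versions = range(1, 6)  # a==1 .. a==5

print(visited_before_rejecting_a(a_versions))                 # 5: walks back to a==1
print(visited_before_rejecting_a(a_versions, lower_bound=4))  # 2: then b==5 is rejected
```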
Is there a reason you haven't taken pip's approach and prioritized requirements using certain heuristics? It's probably why pip can resolve this requirement and uv cannot; the relevant heuristics here would be direct, pinned, and inferred depth.
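For reference, here is a simplified sketch of the kind of preference key pip computes when deciding which requirement to resolve next. This is my paraphrase of the idea, not pip's actual code (pip's real key has more components); the packages and flags below are illustrative:

```python
# Smaller keys sort first: direct requirements beat transitive ones, pinned
# (==) requirements beat unpinned ones, and shallower dependencies beat deeper.
def preference(name, is_direct, is_pinned, inferred_depth):
    return (not is_direct, not is_pinned, inferred_depth, name)

requirements = [
    ("apache-beam", False, False, 3),    # transitive, unpinned, deep
    ("apache-airflow", True, False, 1),  # direct user requirement
    ("dill", False, True, 2),            # transitive but pinned with ==
]
for name, *meta in sorted(requirements, key=lambda r: preference(*r)):
    print(name)  # apache-airflow, then dill, then apache-beam
```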
The problem is that a library must support multiple versions of Python; let's say the lower bound for a dependency differs depending on the Python version. Your heuristic is fine, but for library authors supporting wide ranges of Python it doesn't actually help that much.
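One way to express Python-version-specific lower bounds, rather than a single floor dictated by the oldest supported Python, is PEP 508 environment markers. A hypothetical example (the package and version numbers are made up, not Airflow's real metadata):

```python
# setup.py for a hypothetical library supporting a wide range of Pythons:
# the effective lower bound on numpy rises with the Python version, so a
# resolver never has to consider ancient numpy releases on new Pythons.
from setuptools import setup

setup(
    name="example-lib",
    python_requires=">=3.8",
    install_requires=[
        'numpy>=1.22; python_version < "3.11"',
        'numpy>=1.24; python_version >= "3.11"',
    ],
)
```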
Yes, or you can do this: #1560 (comment). That's why I titled this a performance issue, not a bug in uv: uv does not do as good a job of limiting the number of candidates it checks compared to pip.
I think the problem and its solution are sufficiently understood to close this issue.

@potiuk You're enforcing a lower version bound on apache-airflow-providers-apache-beam in https://github.com/apache/airflow/blob/5fa80b6aea60f93cdada66f160e2b54f723865ca/airflow/providers/apache/beam/__init__.py#L37-L42; if you move that to the package metadata you should be able to drop the check there. I unfortunately don't understand enough of your build system to change that myself.

@notatallshaw Closing in favor of #1398. Please feel free to open a new issue if other performance (or any other kind of) problems with apache-airflow should arise.
@konstin sorry I don't understand why you've closed this issue:
Unless I'm missing something, such as some root cause analysis, could you please reopen this ticket?
Haven't been paying close attention to this issue, but this heuristic has some downsides. For one, the output resolution for a given set of inputs could change despite no changes in the available versions on PyPI, just by way of wheels being uploaded for existing versions.
I don't think so, except in the case where it is a transitive dependency that is optional, in the sense that a solution can be found without it. In which case, how significantly negative is this? But I think uv should first try pip's priorities (#1560 (comment)), or similar; they seem to be strong enough here to find a solution while avoiding old versions of apache-beam, at least for pip, which can install this requirement without issue.
I mean, it's definitely true that changing the order in which you visit packages will change the output resolution in some cases. So whenever you change the prioritization, you will change the output resolution in some cases. And my point here is that you're now changing the resolution based on subtle properties that don't map to how users think about their requirements. Regardless, we should try out some of pip's priorities; I'd be happy to see a separate issue for that.
That's true, but I think the word "some" is doing a lot of heavy lifting. I think you would be hard pressed to find a real-world example (though I'm sure one could easily construct an artificial one) where it would change the resolved solution (beyond allowing a solution to actually be found, as in cases like this). But I understand if you consider the chance of this weird behavior appearing in unusual edge cases to be an extremely negative property of a resolution algorithm.
I am a long way off sufficient Rust knowledge to contribute to these projects, probably for a year or two, otherwise I would have been happy to make a PR to try this out. Regardless of the solution, though, the fact remains that my original issue as posted is still a case where uv fails to resolve and pip does not. #1398 was a wall-clock performance issue that didn't cause any failures, and that was solved. This issue is about the performance of uv's resolution in the sense that it visits packages that are too old (for some sense of "old" that means they won't compile), which causes real-world failures, and it is not solved.
There seems to be some communication issue here:
As such I've opened this as a new issue that is much more focused on the problem: #3078
I don't know why @konstin closed the issue. I'll just re-open until we can resolve this case. My prior comment was only meant to signal that I thought the idea of using different rules for prioritization was sensible.
Sorry, now I feel like I'm doing the wrong thing by re-opening given the new issue, so I'll re-close and we'll continue from #3078.
I don't mind which issue is kept open, I just don't want this to drop off as a known issue. This issue was the original, but the other is way more focused and so doesn't have the baggage of this thread ¯\_(ツ)_/¯.
Thanks, #3078 is much more actionable.
This is for Linux Python 3.11.6 with uv 0.1.3:
The exception from building apache-beam==2.2.0 is fine; the issue is that uv is backtracking all the way to apache-beam 2.2.0, which is too old to compile in my Python 3.11 environment.
Pip does not suffer from this issue and would install apache-beam 2.54.0:
Interestingly, rip has a very similar issue: prefix-dev/rip#174. I have speculated there that a good heuristic when resolving is to first resolve requirements that have wheels over sdists, or at least, when a requirement has newer package versions with wheels and older package versions with sdists, to prefer other requirements once all the wheels' metadata has been collected by the resolution process and further collection (of metadata) requires building sdists.
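As a rough sketch of that heuristic (my speculation about one possible implementation, not rip's or uv's code), the resolver could sort the packages it still has to work on so that any package whose next untried candidate is sdist-only goes last. The candidate table and rejected-version sets below are hypothetical:

```python
# Hypothetical candidate table: version -> True if a binary wheel exists.
CANDIDATES = {
    "apache-beam": {"2.54.0": True, "2.2.0": False},
    "dill": {"0.3.1.1": True},
}
# Suppose apache-beam 2.54.0 was already rejected during backtracking.
REJECTED = {"apache-beam": {"2.54.0"}, "dill": set()}

def next_has_wheel(package):
    """True if the newest not-yet-rejected candidate ships a wheel."""
    untried = [v for v in CANDIDATES[package] if v not in REJECTED[package]]
    newest = max(untried, key=lambda v: tuple(map(int, v.split("."))))
    return CANDIDATES[package][newest]

# Wheel-backed work first; packages whose next candidate is an sdist go last.
queue = ["apache-beam", "dill"]
queue.sort(key=lambda p: not next_has_wheel(p))
print(queue)  # ['dill', 'apache-beam']
```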
I have suggested the same thing for pip (and recently developed a loose idea of how to actually implement it), but I haven't tried doing it yet: pypa/pip#12035 (it's at the bottom of my list of things to try to improve about resolution performance in pip).
You of course may have very different ideas on how to solve this issue! I would be interested to follow any improvements you make here.