Optimize dependency resolution algorithm? #652
Comments
FWIW, I just hit a case where it took approx. 12 hours. As dub grows and gets more heavily used, this problem will probably only get bigger and more common.
Sounds like a bug. Any idea what the bound is? I take from your comment that it's IO (registry) bound?
Appears to be CPU. I happen to have it running right now (without […]); I've done it before with […].
FWIW, the program I'm hitting this issue with (it's not on GitHub right now) involves 17 packages total, including both direct and indirect dependencies; 19 if you count sub-packages separately.
The issue is that the current (hopefully exhaustive) algorithm has a worst-case exponential runtime. There are a number of quick paths that make the process linear in the usual case, but some dependency configurations contain pathological cases. There are basically two options for solving this: either keep improving the exhaustive search, or switch to a non-exhaustive, heuristic algorithm.
The latter would only be possible by making some assumptions, such as that a newer version of a package will have dependency specifications that are newer than or equal to those of the older versions. Since this will not always be true, some dependency trees may yield a sub-optimal set of version selections or, in the worst case, will fail to find one at all.
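For illustration, a minimal sketch of what such a non-exhaustive strategy could look like, in D (all types and names here are hypothetical, not dub's actual API). Under the stated assumption — newer versions never carry older dependency specifications — it is safe to try candidates newest-first and commit without backtracking:

```d
import std.algorithm : sort;
import std.exception : enforce;

// Hypothetical version/range types, for illustration only.
struct Version
{
    int major, minor, patch;

    int opCmp(const Version o) const
    {
        if (major != o.major) return major - o.major;
        if (minor != o.minor) return minor - o.minor;
        return patch - o.patch;
    }
}

struct VersionRange
{
    Version low, high; // half-open interval, e.g. "~>3.0.4" => [3.0.4, 3.1.0)

    bool matches(Version v) const { return low <= v && v < high; }
}

// Greedy, non-backtracking selection: walk the packages once and commit to
// the newest version that satisfies the constraints gathered so far.
// Under the assumption above this runs in O(n * v); if the assumption is
// violated it can fail even though a valid selection exists.
Version[string] resolveGreedy(Version[][string] available,
                              VersionRange[string] constraints)
{
    Version[string] selection;
    foreach (pkg, versions; available)
    {
        sort!"a > b"(versions); // newest first
        bool found = false;
        foreach (v; versions)
        {
            auto c = pkg in constraints;
            if (c is null || c.matches(v))
            {
                selection[pkg] = v; // commit; never revisited
                found = true;
                break;
            }
        }
        enforce(found, "no acceptable version of " ~ pkg);
    }
    return selection;
}
```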
BTW, the problem in general is NP-complete, so there is no known algorithm faster than exponential (unless P == NP, of course). We'd really have to change the problem formulation to be able to fix this completely.
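For reference, the hardness claim follows from the well-known encoding of 3-SAT into version constraints (a standard construction, not something specific to dub). Sketched in LaTeX:

```latex
% Standard 3-SAT encoding (well known; not dub-specific):
% each boolean variable x_i becomes a package X_i with two versions,
%   X_i 1.0.0 (= x_i true)  and  X_i 2.0.0 (= x_i false);
% each clause c_j = (l_1 \lor l_2 \lor l_3) becomes a package C_j with
%   three versions, where version k depends on the X_i version that
%   makes literal l_k true; the root depends on every C_j.
\varphi \text{ is satisfiable} \iff \text{a consistent version selection exists}
```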
Is there any way to at least just bypass the dependency resolution step? I tried […]
It seems that dub doesn't use the dependencies of the earlier packages to narrow down the versions tried for later packages (see the sketch below).
And if no dependencies are specified, we should always try to use the latest version of a package in the allowed range.
And when your package specifies two dependencies without version constraints and the latest versions of those have contradicting dependencies, then it's fine to fail; i.e., you need to select compatible versions when they form a diamond dependency.
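The narrowing mentioned above is essentially forward checking. A minimal sketch in D, using a hypothetical model (versions as integers, implied constraints as predicates; this is not dub's code):

```d
import std.algorithm : filter;
import std.array : array;
import std.exception : enforce;

// Hypothetical forward-checking step: before recursing on the next package,
// intersect each remaining package's candidate list with the constraints
// already implied by the versions chosen so far, and fail fast on an empty
// candidate set instead of discovering the conflict deep in the search tree.
int[][string] narrow(int[][string] candidates,
                     bool delegate(int)[string] implied)
{
    int[][string] narrowed;
    foreach (pkg, versions; candidates)
    {
        auto p = pkg in implied;
        narrowed[pkg] = p is null ? versions
                                  : versions.filter!(v => (*p)(v)).array;
        enforce(narrowed[pkg].length > 0,
                "conflict detected before recursing: " ~ pkg);
    }
    return narrowed;
}
```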
Ran across this with a barebones hello world that has a single dependencies line: "gfm": "~>3.0.4". Just in case you're looking for test cases.
Yes, I also found this via gfm.
Is there any easy way to avoid hitting the slow cases here? I've been waiting for 30 minutes for what should be a simple build...
I don't know of a way to work around this other than putting indirect dependencies into the root package, but a good way to avoid the most common slow case is to merge #733 (reviewers welcome ;)
I'm not familiar with the relevant section of dub's code (I had trouble wrapping my head around what/how it was doing back when I did look), but I was just giving a little thought to how it might be done, and I'm starting to wonder: are we really sure the algorithm's scaling complexity is the cause of these particular slowdowns after all?

A little back-of-napkin calculation: suppose we have a root project with 64 direct and indirect dependencies total. (I know the number of indirect dependencies can vary depending on the dependency versions selected, but let's say all the combinations average out to 64 dependencies.) Now suppose each of those packages has, on average, 64 version tags available. I know for a fact that I've hit this problem on a project with considerably less than that, more like 32x32, if even that. (Furthermore, I would imagine that for any remotely realistic project, the list of all versions available for all potentially-used packages, and the deps list for each version, would easily fit in memory. So all of that could be cached if it isn't already, meaning there's no additional nested algorithmic complexity stemming from redundant loading of deps/version info.)

That amounts to a worst-case scenario, with a no-shortcuts, brute-force "check every possibility and select the best one found" algorithm, of needing to check 64 * 64 == 4,096 combinations. Unless I'm overlooking something, I'd imagine that once you have a combination to check, the check itself would be computationally very simple. So suppose it takes a full second to check each combination (maybe I'm missing something, but I don't see how each check would realistically need to take that long, even including the time spent calculating the next combination to be checked). Then that's 4,096 seconds, or 4k/60 == ~68 minutes, to resolve dependencies. And I've hit cases of around half an hour, maybe more, on projects with dependency graphs no more than a quarter the size of this example.

I'm thinking we may be looking at a slowdown in the implementation, not the algorithm. Maybe some sort of overlooked nested complexity that doesn't need to be there, or something that could be cached but isn't.
It's not 32*32 but 32^^32!

#733 changes this to add an "invalid" version only for dependencies that are actually only referenced as optional, and it appends instead of prepending to the list, which means that the "invalid" version is tried last instead of first. We'll have to see how it turns out, but I hope that this will give us more time to come up with an algorithm with a better worst-case run time. Ideally, I'd also like to take that opportunity to let the algorithm resolve the build configurations in addition to the versions; they are currently calculated separately, which could lead to undesired results in pathological cases.
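Spelling that out: with $n$ packages and $v$ candidate versions each, the choices multiply rather than add, so a brute-force search faces

```latex
v^{\,n}\ \text{combinations, not}\ n \cdot v;
\qquad 32^{32} = 2^{160} \approx 1.5 \times 10^{48},
\quad\text{whereas}\quad 32 \cdot 32 = 1024 .
```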
"It's not 32*32 but 32^^32!" Oops, yea, that's right. |
Moved to backlog and scheduled for the next few weeks when I'm done with fixing imports and symbol visibility. |
@MartinNowak ping me when you have a concept. I had a few ideas, but haven't fully thought them through yet. The algorithm should be able to handle version and configuration resolution at the same time (which should automatically work if it uses a generic key type like the current […]).
I'll take #652 (comment) as a starting point; dependency resolution and conservative partial upgrades work really well with Bundler.
Right, 04648ee solved the most pressing 2^N issue with gfm, which was caused by prepending invalid configurations (the old method to support optional deps).
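A toy illustration of why the ordering matters (hypothetical code, not the actual commit): with the marker prepended, a depth-first resolver tries the "dependency unused" branch first at every one of the N optional nodes and can end up enumerating all 2^N subsets before finding a real solution; appended, the marker is only reached after every real version has failed.

```d
// Hypothetical candidate-list construction. "invalid" marks the
// "dependency not used" case that the old optional-deps support relied on.
string[] candidatesPrepended(string[] realVersions)
{
    // Tried FIRST by a depth-first resolver: the pathological ordering.
    return "invalid" ~ realVersions;
}

string[] candidatesAppended(string[] realVersions)
{
    // Tried LAST: real versions are explored before giving up on the dep.
    return realVersions ~ "invalid";
}
```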
Looking at other dependency resolvers, we can improve the following.

Terminology
[…]

Improvement Ideas
[…]
Think we should close this. The urgent problem has been fixed and the resolver is fairly good/fast already. |
Is it possible to adjust the dependency resolution algorithm to scale better? I've hit situations (not easy to reproduce so far, AFAICT) where the dependency resolution step can take several minutes, or even half an hour or more, with no output (unless --vverbose is used) indicating what it's doing, and no indication that, no, it has not actually locked up.