Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dub: Cache PackageManager across dub{Package,Dependant} targets #281

Merged
merged 1 commit into from Feb 28, 2024

Conversation

kinke
Copy link
Contributor

@kinke kinke commented Feb 20, 2024

No description provided.

// With this cache here, this means that we only support that
// dir for the first root project, and apply it to all other
// root projects.
cachedPM = super.makePackageManager();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Geod24: Any suggestions? Maybe constructing directly here and feeding it NativePath.init as root package path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Geod24: Looking at https://github.com/dlang/dub/blob/0d4858eb473a31a058132c8692d499effdb1cb28/source/dub/packagemanager.d#L116-L124, it doesn't look like that special <root pkg path>/.dub/packages/ repo can be disabled. Is there a way?

Copy link
Contributor Author

@kinke kinke Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And AFAICT, I cannot use the reggae project or build directory here either (for a somewhat consistent local .dub/packages repo, not depending on the order of dubPackage targets), as each dub root project/config gets a reggae Options struct with projectPath and workingDir both overridden to the dub root project directory. And the original projectPath is a bunch of calls up:

projectPath = reggaeOptions.projectPath;
const path = buildPath(projectPath, dubPath.value);
subOptions = reggaeOptions.dup;
subOptions.projectPath = path;
subOptions.workingDir = path;
subOptions.dubConfig = dubPath.config.value;
// dubInfos in this case returns an associative array but there's
// only really one key.
auto output = () @trusted { return stdout; }();
dubInfo = dubInfos(output, subOptions)[dubPath.config.value];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PackageManager is not designed to handle changing Location, and was originally designed with 3 hardcoded Locations, making any kind of extensibility hard (e.g. the PlacementLocation enum).
Caching the PackageManager is an option but seems the most hacky one. Another option is to only instantiate one Dub instance (might require some changes to Dub), or to make package loading lazy. Per and myself have made some progress in the lazy loading, I don't know the Reggae codebase well enough to say how much Dub needs to change to allow for what it does. So ATM I don't have a satisfying answer, sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay nevermind, I found a way, and am using the original reggae project directory now.

@kinke
Copy link
Contributor Author

kinke commented Feb 20, 2024

After some quick tests, this seems to work. On my box, a buildgen runtime (for a reggaefile.d with ~23 dub root projects) of previously 6.2 seconds was reduced to 5.7 secs; with an almost empty ~/.dub/packages cache. I hope it's more noticeable with a populated cache (on Windows, the original buildgen runtime was 23 seconds, and after killing %LOCALAPPDATA%\dub\packages, it was cut in half to 11.5 seconds - but that was before this PR).

@kinke
Copy link
Contributor Author

kinke commented Feb 20, 2024

I hope it's more noticeable with a populated cache

Seems like it, nice. I've repopulated my dub packages cache on Windows, fetching 111 packages. Then the buildgen runtime of master is ~13.5 secs, and with this PR down to ~10 secs (best of 3 each). That's quite a lot, as 111 dub packages in the cache isn't a lot, and a good portion of that runtime is caused by preGenerateCommands (edit: which are unaffected).

Edit: I've also diffed the produced build.ninja files (1.4 MB!) - identical.

Copy link

@PetarKirov PetarKirov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I defer to @Geod24 for the final decision.

@Geod24
Copy link
Contributor

Geod24 commented Feb 21, 2024

Why does Reggae need to instantiate more than one Dub instance though ?

@kinke
Copy link
Contributor Author

kinke commented Feb 21, 2024

Why does Reggae need to instantiate more than one Dub instance though ?

As said, a reggaefile.d might represent an orchestrated build of dozens of dub root projects/configs. The Dub class ctors AFAICT don't really look as if a single instance could be used for all of them, switching from one root project to the next. So currently, if reggaefile.d contains N dubPackage targets, it creates N Dub instances (and used to create N PackageManager instances along with it).

@Geod24
Copy link
Contributor

Geod24 commented Feb 21, 2024

Then I think caching the PackageManager makes sense. I've been looking into removing the scanning behavior. It was reintroduced at some point because of a corner case that I didn't know how to handle at the time, but can now. But even without the scanning, you'd end up loading the same packages multiple times, which is wasteful.

In the long run, I would like to look into why it needs multiple root packages. The only thing that root package has on a regular package is the dependency resolution / config. Dub can obviously build dependencies without them being root packages so I would expect Reggae to be able to do so as well (conceptually, the API for it is probably terrible).

@kinke
Copy link
Contributor Author

kinke commented Feb 21, 2024

One unittest fails because it changes a dub.sdl that it previously built, and rightfully expects reggae to re-generate its configuration after the .sdl change. But caching the PackageManager (for the whole unittest process - per thread) leads to the .sdl not being reloaded.

auto lines = runIt("-d myvar=foo");
srcLine.should.not.be in lines;
cfgLine.should.be in lines;
// we cache dub's PackageManager per-thread; use a new thread to make sure the changed .sdl is reloaded
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the test needs to know waaay too much about the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the test needs to reparse a changed .sdl, something a real reggae process never does, and I thought this is the least invasive method to make that test still work.

@atilaneves
Copy link
Owner

I don't know enough about dub to review this. @Geod24 ?

@kinke
Copy link
Contributor Author

kinke commented Feb 22, 2024

Caching the PackageManager is an option but seems the most hacky one. Another option is to only instantiate one Dub instance (might require some changes to Dub), or to make package loading lazy. Per and myself have made some progress in the lazy loading, I don't know the Reggae codebase well enough to say how much Dub needs to change to allow for what it does.

I'm more than happy to revise this after according dub API changes; this performance problem is the only real blocker for switching our SIL build to lots of dubPackage targets in a central reggaefile.d, so I have to make do with what dub currently offers. What takes 10 secs in my Windows VM with this cache here currently takes 40 secs on our Windows CI runners (with an older reggae version missing some other existing optimizations, and a way more populated dub packages cache).

Wrt. the Dub class - what stands in reggae's way for using a single instance is that it's tightly coupled to a single root project: https://github.com/dlang/dub/blob/12c78d972680924446f205480c61167b82318286/source/dub/dub.d#L112-L129. With reggae, we might have a pretty large number of dub targets, specified by a pair of <path to dub pkg>, <dub config name>. So e.g. something like 3 configs for one package (executable, unittest runner, integration test runner), and one config (unittest runner only) for a library-only other package.

Lazy loading would be super-nice of course, so that the dub packages cache size hopefully wouldn't affect dub performance anymore.

@Geod24
Copy link
Contributor

Geod24 commented Feb 22, 2024

With reggae, we might have a pretty large number of dub targets, specified by a pair of , . So e.g. something like 3 configs for one package (executable, unittest runner, integration test runner), and one config (unittest runner only) for a library-only other package.

The one-config-per-root-project approach also doesn't work if we want to solve dlang/dub#1706 and dlang/dub#1217 .

@kinke
Copy link
Contributor Author

kinke commented Feb 22, 2024

Well reggae's dubPackage targets can come with custom options (incl. dub build type, and/or extra dflags applied to the entire dub tree for that specific root-pkg+config target etc. etc.). E.g., the integration tests config is usually coupled with the unittest dub build type (not the default debug one), which is the implicit one for the special unittest config. So I doubt there's much to gain if we tried to bundle all configs of a specific root project and process them as a bundle.

@atilaneves atilaneves merged commit b5ec502 into atilaneves:master Feb 28, 2024
20 checks passed
@kinke kinke deleted the cache_packagemanager branch February 28, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants