Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce ratchet memory usage for multi-module projects #1426

Merged
merged 5 commits into from Dec 31, 2022

Conversation

laurentgo
Copy link
Contributor

Ratchet creates a cache of Git repository per project but each Git repository is a new FileRepository instance even if two projects share the same repository on disk.

Since FileRepository is thread-safe, change GitRatchet to cache FileRepository instances per git repository

Please DO NOT FORCE PUSH. Don't worry about messy history, it's easier to do code review if we can tell what happened after the review, and force pushing breaks that.

Please make sure that your PR allows edits from maintainers. Sometimes its faster for us to just fix something than it is to describe how to fix it.

Allow edits from maintainers

After creating the PR, please add a commit that adds a bullet-point under the [Unreleased] section of CHANGES.md, plugin-gradle/CHANGES.md, and plugin-maven/CHANGES.md which includes:

  • a summary of the change
  • either
    • a link to the issue you are resolving (for small changes)
    • a link to the PR you just created (for big changes likely to have discussion)

If your change only affects a build plugin, and not the lib, then you only need to update the plugin-foo/CHANGES.md for that plugin.

If your change affects lib in an end-user-visible way (fixing a bug, updating a version) then you need to update CHANGES.md for both the lib and all build plugins. Users of a build plugin shouldn't have to refer to lib to see changes that affect them.

This makes it easier for the maintainers to quickly release your changes :)

laurentgo and others added 3 commits December 15, 2022 15:01
Ratchet creates a cache of Git repository per project but each Git
repository is a new FileRepository instance even if two projects share
the same repository on disk.

Since FileRepository is thread-safe, change GitRatchet to cache
FileRepository instances per git repository
@nedtwigg
Copy link
Member

Did you profile this (System.out.println would work) to confirm that it's actually creating fewer Repository instances? They are supposed to be cached already thanks to the gitRoots map

protected Repository repositoryFor(Project project) throws IOException {
Repository repo = gitRoots.get(project);
if (repo == null) {
if (isGitRoot(getDir(project))) {
repo = createRepo(getDir(project));
} else {
Project parentProj = getParent(project);
if (parentProj == null) {
repo = traverseParentsUntil(getDir(project).getParentFile(), null);
if (repo == null) {
throw new IllegalArgumentException("Cannot find git repository in any parent directory");
}
} else {
repo = traverseParentsUntil(getDir(project).getParentFile(), getDir(parentProj));
if (repo == null) {
repo = repositoryFor(parentProj);
}
}
}
gitRoots.put(project, repo);
}
return repo;

But maybe there's a mistake.

@laurentgo
Copy link
Contributor Author

Did you profile this (System.out.println would work) to confirm that it's actually creating fewer Repository instances? They are supposed to be cached already thanks to the gitRoots map

protected Repository repositoryFor(Project project) throws IOException {
Repository repo = gitRoots.get(project);
if (repo == null) {
if (isGitRoot(getDir(project))) {
repo = createRepo(getDir(project));
} else {
Project parentProj = getParent(project);
if (parentProj == null) {
repo = traverseParentsUntil(getDir(project).getParentFile(), null);
if (repo == null) {
throw new IllegalArgumentException("Cannot find git repository in any parent directory");
}
} else {
repo = traverseParentsUntil(getDir(project).getParentFile(), getDir(parentProj));
if (repo == null) {
repo = repositoryFor(parentProj);
}
}
}
gitRoots.put(project, repo);
}
return repo;

But maybe there's a mistake.

I actually profiled the whole build and inspected the heap before and after. Before my change, there was a different Repository instance per project and I guess it's because traverseParentsUntil doesn't use the cache at all?

Tbh I was not sure what was the full purpose of gitRoots (was the intent also to cache repositories directly in it?) but based on your comment, it seems it was, and I may have an idea on how to directly back this into gitRoot vs introducing a level of indirection.

@laurentgo
Copy link
Contributor Author

laurentgo commented Dec 19, 2022

After reading back my previous comment, I realized that something was amiss and so I re-did a debugging session.

The main issue is around the usage of isGitRoot: the name is actually misleading a bit and the method returns true if the provided File is located inside a git repository (vs returning true if the provided argument is the root of a git repository). This leads to repositoryFor method to always take the first branch of this test and never attempting to traverse the parents.

I still believe there's another issue with the second branch: if a project has no parent or if a child project is not located under its parent, then cache is not being used although those projects could share the same git root.

My change which uses git root location has the cache key actually addresses both issues.

Change GitRatchet cache logic to use git root instead of projects as a
caching key for FileRepository.
@laurentgo
Copy link
Contributor Author

@nedtwigg are you okay with the approach?

@nedtwigg
Copy link
Member

So I think the old code was trying to say:

  • finding a repository is a little bit expensive
  • and it's cheap to determine the relationship between projects
  • so let's take advantage of that so that we don't search for repositories more than we have to

But it had some subtle bugs, so it wasn't worth it. Thanks for cutting through the knot!

@nedtwigg nedtwigg merged commit 80f82d3 into diffplug:main Dec 31, 2022
@nedtwigg
Copy link
Member

nedtwigg commented Jan 2, 2023

Released in plugin-gradle 6.12.1 and plugin-maven 2.29.0.

@laurentgo laurentgo deleted the laurentgo/git-ratchet-memory-fix branch January 9, 2023 18:55
benkard added a commit to benkard/mulkcms2 that referenced this pull request Apr 2, 2023
…1.0 (mulk/mulkcms2!12)

This MR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [com.diffplug.spotless:spotless-maven-plugin](https://github.com/diffplug/spotless) | build | minor | `2.30.0` -> `2.31.0` |

---

### Release Notes

<details>
<summary>diffplug/spotless</summary>

### [`v2.31.0`](https://github.com/diffplug/spotless/blob/HEAD/CHANGES.md#&#8203;2310---2022-11-24)

##### Added

-   `importOrder` now support groups of imports without blank lines ([#&#8203;1401](diffplug/spotless#1401))

##### Fixed

-   Don't treat `@Value` as a type annotation [#&#8203;1367](diffplug/spotless#1367)
-   Support `ktlint_disabled_rules` in `ktlint` 0.47.x [#&#8203;1378](diffplug/spotless#1378)
-   Share git repositories across projects when using ratchet ([#&#8203;1426](diffplug/spotless#1426))

##### Changes

-   Bump default `ktfmt` version to latest `0.40` -> `0.41` ([#&#8203;1340](diffplug/spotless#1340))
-   Bump default `scalafmt` version to latest `3.5.9` -> `3.6.1` ([#&#8203;1373](diffplug/spotless#1373))
-   Bump default `diktat` version to latest `1.2.3` -> `1.2.4.2` ([#&#8203;1393](diffplug/spotless#1393))
-   Bump default `palantir-java-format` version to latest `2.10` -> `2.28` ([#&#8203;1393](diffplug/spotless#1393))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNC4yNC4wIiwidXBkYXRlZEluVmVyIjoiMzQuMjQuMCJ9-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants