Gradle plugin: avoid printing duplicate dependency paths#2188
Gradle plugin: avoid printing duplicate dependency paths#2188
Conversation
adce0c0 to
dc80a47
Compare
dc80a47 to
d38c8db
Compare
| } | ||
| } else { | ||
| stack.add(child); | ||
| recordDependencyPaths(output, stack, targetCoordinates, checkedCircularDependency); |
There was a problem hiding this comment.
Before this PR, the tree traversal was depth-first manner.
| // or transitive dependencies. | ||
| Set<ResolvedComponentResult> nodesDependOnTarget = new HashSet<>(); | ||
|
|
||
| while (!queue.isEmpty()) { |
There was a problem hiding this comment.
With this PR, the tree traversal is done by breath-first manner (level-order)
| } | ||
| } else { | ||
| ResolvedComponentResultNode childNode = new ResolvedComponentResultNode(child, node); | ||
| queue.add(childNode); |
There was a problem hiding this comment.
This is the core of the level-order traversal (breadth-first). The children of the node are added to the queue.
| ListMultimap<String, String> dependencyPaths = | ||
| groupCoordinatesToDependencyPaths(componentResult, targetCoordinates); |
There was a problem hiding this comment.
The caller of the method recordDependencyPaths, which is now groupCoordinatesToDependencyPaths is now much cleaner.
gradle-plugin/src/main/java/com/google/cloud/tools/dependencies/gradle/LinkageCheckTask.java
Show resolved
Hide resolved
|
This still causes my test case. Edit: it worked with |
| if (nodesDependOnTarget.contains(item)) { | ||
| // Do not show duplicate dependency paths. If we know that this node contains dependency | ||
| // having targetCoordinates in its direct/transitive dependencies, there is no need to | ||
| // print the dependency paths again. | ||
| String dependencyPath = node.pathFromRoot() + " (omitted for duplicate)"; | ||
| coordinatesToDependencyPaths.put(targetCoordinates, dependencyPath); |
There was a problem hiding this comment.
This is the main purpose of this PR.
| // Ensure the node closest to the root is printed | ||
| result.output.contains("g:test-123:0.1.0-SNAPSHOT / io.grpc:grpc-core:1.29.0") | ||
| // Ensure that the relationship between grpc-netty-shaded to grpc-core is only printed once | ||
| result.output.count("io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0") == 1 |
There was a problem hiding this comment.
This is the effect of this PR.
gradle-plugin/src/main/java/com/google/cloud/tools/dependencies/gradle/LinkageCheckTask.java
Show resolved
Hide resolved
elefeint
left a comment
There was a problem hiding this comment.
Looks great overall, but I'd love it if it were possible to split the BFS + dependency-extraction-and-validation logic into separate pieces. BFS part should be trivial to read, while the individual dependency methods would allow us to focus on only a particular piece at a time (like "enumerate all dependencies of this node" or "does this node cause a circular dependency"?
It's funny that you are actually writing complicated algorithmic code that we all got interviewed on! A rarity.
| if (componentResult.equals(other)) { | ||
| return true; | ||
| } | ||
| if (parent == null) { | ||
| return false; | ||
| } | ||
| return parent.hasParent(other); |
There was a problem hiding this comment.
You could also skip one extra recursive call by comparing the current node's parent to the passed-in parameter.
Something like this:
if (parent == null) {
return false;
} else if (parent == other) {
return true;
}
return parent.hasParent(other);
(this was a comment from the start of review, but now that I got to the end, I am no longer sure I had a point there) Would isDescendantOf() be more or less confusing as a method name?
There was a problem hiding this comment.
Updated to isDescendantOf.
| List<String> dependencyPathElementsReversed = new ArrayList<>(); | ||
| for (ResolvedComponentResultNode iter = this; iter != null; iter = iter.parent) { | ||
| dependencyPathElementsReversed.add(formatComponentResult(iter.componentResult)); | ||
| } | ||
| Collections.reverse(dependencyPathElementsReversed); |
There was a problem hiding this comment.
You can use ArrayDeque to avoid reversing the list -- it can be used as both stack and queue; you can insert like in stack (reverse) and retrieve like a queue (proper order). Good for both, readability and performance; it's amortized constant time for everything useful.
There was a problem hiding this comment.
That's a nice idea. Updating.
There was a problem hiding this comment.
Updated to use ArrayDequeue.
| ListMultimap<String, String> coordinatesToDependencyPaths = | ||
| MultimapBuilder.hashKeys().arrayListValues().build(); | ||
| // No need to print the same circular dependency multiple times | ||
| Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>(); |
There was a problem hiding this comment.
How about a name like seen or circularDependencyCandidates? Technically, at the point nodes are added to this set, they have not yet been checked.
There was a problem hiding this comment.
Actually an item is added to this collection only when it's a circular dependency.
There was a problem hiding this comment.
Added source code comment.
| Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>(); | ||
|
|
||
| for (String targetCoordinates : targetCoordinatesSet) { | ||
| // Queue of dependnecy nodes. Each node knows its parent. |
|
|
||
| // Mapping to omit duplicate dependency paths in the output. This is a mapping from Maven | ||
| // coordinates to dependency nodes that has the dependency of the coordinate in their direct | ||
| // or transitive dependencies. |
There was a problem hiding this comment.
This comment sounds like a javadoc for a helper method. Would it make sense to extract BFS for dependencies into some kind of findDependencies() helper?
There was a problem hiding this comment.
This comment was outdated. Updated as
// A set to omit duplicate dependency paths in the output. When a node is found to be in this
// set while traversing the graph, we do not need to check the children, because we know that
// the dependency paths from that node to the targetCoordinates are already added to
// coordinatesToDependencyPaths.
Set<ResolvedComponentResult> nodesDependOnTarget = new HashSet<>();
There was a problem hiding this comment.
I didn't think of a such findDependencies() that would improve code here. But newly added getDependencies should give more clarity on the code.
| * @param targetCoordinatesSet The Maven coordinates to check their dependency paths | ||
| */ | ||
| private ListMultimap<String, String> groupCoordinatesToDependencyPaths( | ||
| ResolvedComponentResult componentResult, Set<String> targetCoordinatesSet) { |
There was a problem hiding this comment.
Is componentResult expected to be the root of dependency tree? Then maybe we could name it this way. Or I could be confused...
There was a problem hiding this comment.
Updated to rootProject.
| } | ||
| } | ||
|
|
||
| if (nodesDependOnTarget.contains(item)) { |
There was a problem hiding this comment.
Should this if block go above the if (targetCoordinates.equals(coordinates)) {? What if a transitive dependency on targetCoordinates is found before the direct dependency? Or is it a "this should never happen in a gradle resolution" type of situation?
There was a problem hiding this comment.
If nodesDependOnTarget.contains(item) is true, then targetCoordinates (the coordinates of item) is not equal to coordinates.
What if a transitive dependency on targetCoordinates is found before the direct dependency?
Would you explain an example?
There was a problem hiding this comment.
I was thinking it's possible to have a duplicate if both if (targetCoordinates.equals(coordinates)) from ~10 lines above and this current if statement are both true.
There was a problem hiding this comment.
The 2 if-statements don't become true at the same time. I just updated this to the below to deliver a more clear meaning.
if (targetCoordinates.equals(coordinates)) {
...
} else if (nodesDependOnTarget.contains(item)) {
...
} else {
queue.addAll(getDependencies(node));
}
| String dependencyPath = node.pathFromRoot() + " (omitted for duplicate)"; | ||
| coordinatesToDependencyPaths.put(targetCoordinates, dependencyPath); | ||
|
|
||
| for (ResolvedComponentResultNode iter = node; iter != null; iter = iter.parent) { |
There was a problem hiding this comment.
Since these 3 lines turn up twice (for a direct and all transitive dependencies), could you extract them into a helper method? addParents() or nodesDependOnTarget.addAll(getAllParents()) or some such.
There was a problem hiding this comment.
Updated to rootToNode() and using addAll.
| if (dependencyResult instanceof ResolvedDependencyResult) { | ||
| ResolvedDependencyResult resolvedDependencyResult = | ||
| (ResolvedDependencyResult) dependencyResult; | ||
| ResolvedComponentResult child = resolvedDependencyResult.getSelected(); | ||
|
|
||
| if (node.hasParent(child)) { | ||
| // Circular dependency check | ||
| if (checkedCircularDependency.add(child)) { | ||
| getLogger() | ||
| .error( | ||
| "Circular dependency for: " | ||
| + resolvedDependencyResult | ||
| + "\n The stack is: " | ||
| + node.pathFromRoot()); | ||
| } |
There was a problem hiding this comment.
I'd try to extract this into a causesCircularDependency() or similarly named helper. Just generally, the overall BFS structure will be easier to read if the actually useful pieces of functionality were in helper methods.
There was a problem hiding this comment.
Extracted the bigger for-loop into getDependencies.
There was a problem hiding this comment.
Renamed getDependencies to findDependencies.
suztomo
left a comment
There was a problem hiding this comment.
Thank you for review. I'll update the code.
| List<String> dependencyPathElementsReversed = new ArrayList<>(); | ||
| for (ResolvedComponentResultNode iter = this; iter != null; iter = iter.parent) { | ||
| dependencyPathElementsReversed.add(formatComponentResult(iter.componentResult)); | ||
| } | ||
| Collections.reverse(dependencyPathElementsReversed); |
There was a problem hiding this comment.
That's a nice idea. Updating.
| ListMultimap<String, String> coordinatesToDependencyPaths = | ||
| MultimapBuilder.hashKeys().arrayListValues().build(); | ||
| // No need to print the same circular dependency multiple times | ||
| Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>(); |
There was a problem hiding this comment.
Actually an item is added to this collection only when it's a circular dependency.
| if (componentResult.equals(other)) { | ||
| return true; | ||
| } | ||
| if (parent == null) { | ||
| return false; | ||
| } | ||
| return parent.hasParent(other); |
There was a problem hiding this comment.
Updated to isDescendantOf.
| List<String> dependencyPathElementsReversed = new ArrayList<>(); | ||
| for (ResolvedComponentResultNode iter = this; iter != null; iter = iter.parent) { | ||
| dependencyPathElementsReversed.add(formatComponentResult(iter.componentResult)); | ||
| } | ||
| Collections.reverse(dependencyPathElementsReversed); |
There was a problem hiding this comment.
Updated to use ArrayDequeue.
| * @param targetCoordinatesSet The Maven coordinates to check their dependency paths | ||
| */ | ||
| private ListMultimap<String, String> groupCoordinatesToDependencyPaths( | ||
| ResolvedComponentResult componentResult, Set<String> targetCoordinatesSet) { |
There was a problem hiding this comment.
Updated to rootProject.
| ListMultimap<String, String> coordinatesToDependencyPaths = | ||
| MultimapBuilder.hashKeys().arrayListValues().build(); | ||
| // No need to print the same circular dependency multiple times | ||
| Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>(); |
There was a problem hiding this comment.
Added source code comment.
| } | ||
| } | ||
|
|
||
| if (nodesDependOnTarget.contains(item)) { |
There was a problem hiding this comment.
If nodesDependOnTarget.contains(item) is true, then targetCoordinates (the coordinates of item) is not equal to coordinates.
What if a transitive dependency on targetCoordinates is found before the direct dependency?
Would you explain an example?
| String dependencyPath = node.pathFromRoot() + " (omitted for duplicate)"; | ||
| coordinatesToDependencyPaths.put(targetCoordinates, dependencyPath); | ||
|
|
||
| for (ResolvedComponentResultNode iter = node; iter != null; iter = iter.parent) { |
There was a problem hiding this comment.
Updated to rootToNode() and using addAll.
| if (dependencyResult instanceof ResolvedDependencyResult) { | ||
| ResolvedDependencyResult resolvedDependencyResult = | ||
| (ResolvedDependencyResult) dependencyResult; | ||
| ResolvedComponentResult child = resolvedDependencyResult.getSelected(); | ||
|
|
||
| if (node.hasParent(child)) { | ||
| // Circular dependency check | ||
| if (checkedCircularDependency.add(child)) { | ||
| getLogger() | ||
| .error( | ||
| "Circular dependency for: " | ||
| + resolvedDependencyResult | ||
| + "\n The stack is: " | ||
| + node.pathFromRoot()); | ||
| } |
There was a problem hiding this comment.
Extracted the bigger for-loop into getDependencies.
| Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>(); | ||
|
|
||
| for (String targetCoordinates : targetCoordinatesSet) { | ||
| // Queue of dependnecy nodes. Each node knows its parent. |
|
|
||
| // Mapping to omit duplicate dependency paths in the output. This is a mapping from Maven | ||
| // coordinates to dependency nodes that has the dependency of the coordinate in their direct | ||
| // or transitive dependencies. |
There was a problem hiding this comment.
This comment was outdated. Updated as
// A set to omit duplicate dependency paths in the output. When a node is found to be in this
// set while traversing the graph, we do not need to check the children, because we know that
// the dependency paths from that node to the targetCoordinates are already added to
// coordinatesToDependencyPaths.
Set<ResolvedComponentResult> nodesDependOnTarget = new HashSet<>();
|
|
||
| // Mapping to omit duplicate dependency paths in the output. This is a mapping from Maven | ||
| // coordinates to dependency nodes that has the dependency of the coordinate in their direct | ||
| // or transitive dependencies. |
There was a problem hiding this comment.
I didn't think of a such findDependencies() that would improve code here. But newly added getDependencies should give more clarity on the code.
Avoid printing duplicate dependency paths. For a large dependency graph, the previous approach of not omitting duplicate paths would cause OutOfMemoryError.
Before this change
https://gist.github.com/suztomo/31a67085163a8a99a171cb22345a188b
Notice
com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0is appearing twice (1st and 5th lines).This was bad because the number of paths would become much larger. Suppose there are 5 paths from artifact A to artifact B, 3 paths from artifact B to artifact C, and 6 paths from C to D. Then the previous implementation would print 5 x 3 x 6 = 90 dependency paths.
After this PR
https://gist.github.com/suztomo/202c001c682238ee0fe922383d9a2e4d
io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0appears only once, and it now shows "omitted for duplicate" message.