Gradle plugin: avoid printing duplicate dependency paths#2188

Merged
suztomo merged 8 commits into master from path_explosion on Aug 17, 2021

Contributor

@suztomo suztomo commented Aug 13, 2021

Avoid printing duplicate dependency paths. For a large dependency graph, the previous approach, which printed every path without omitting duplicates, would cause an OutOfMemoryError.

Before this change

https://gist.github.com/suztomo/31a67085163a8a99a171cb22345a188b

io.grpc:grpc-core:1.29.0 is at:
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-grpclb:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-grpclb:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / com.google.cloud:google-cloud-logging:1.101.1 / io.grpc:grpc-core:1.29.0
  :gradle-project:unspecified / io.grpc:grpc-core:1.29.0

Notice that com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0 appears twice (1st and 5th lines).

This was a problem because the number of paths multiplies along the graph. Suppose there are 5 paths from artifact A to artifact B, 3 paths from artifact B to artifact C, and 6 paths from C to D. Then the previous implementation would print 5 x 3 x 6 = 90 dependency paths.
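
The multiplication can be checked with a small sketch. Everything here is hypothetical and not part of the plugin: the class `PathCount`, its methods, and the graph, which places distinct intermediate nodes between A, B, C, and D so that the path counts are 5, 3, and 6.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathCount {
  // Links 'from' to 'to' through 'routes' distinct intermediate nodes,
  // which gives exactly 'routes' paths between the two.
  static void connect(Map<String, List<String>> graph, String from, String to, int routes) {
    for (int i = 0; i < routes; i++) {
      String middle = from + to + i; // e.g. "AB0"
      graph.computeIfAbsent(from, k -> new ArrayList<>()).add(middle);
      graph.computeIfAbsent(middle, k -> new ArrayList<>()).add(to);
    }
  }

  // Hypothetical graph: 5 paths A -> B, 3 paths B -> C, 6 paths C -> D.
  static Map<String, List<String>> exampleGraph() {
    Map<String, List<String>> graph = new HashMap<>();
    connect(graph, "A", "B", 5);
    connect(graph, "B", "C", 3);
    connect(graph, "C", "D", 6);
    return graph;
  }

  // Counts every path from 'node' to 'target' by exhaustive depth-first traversal.
  static long countPaths(Map<String, List<String>> graph, String node, String target) {
    if (node.equals(target)) {
      return 1;
    }
    long total = 0;
    for (String child : graph.getOrDefault(node, Collections.emptyList())) {
      total += countPaths(graph, child, target);
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(countPaths(exampleGraph(), "A", "D")); // prints 90
  }
}
```

Only 14 edges leave A, B, and C combined, yet an exhaustive traversal reports 90 paths, which is why printing every path does not scale.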

After this PR

https://gist.github.com/suztomo/202c001c682238ee0fe922383d9a2e4d

io.grpc:grpc-core:1.29.0 is at:
  g:test-123:0.1.0-SNAPSHOT / io.grpc:grpc-core:1.29.0
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / io.grpc:grpc-core:1.29.0
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / io.grpc:grpc-core:1.29.0
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-core:1.29.0
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-netty-shaded:1.28.1 (omitted for duplicate)
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-netty-shaded:1.28.1 (omitted for duplicate)
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.cloud:google-cloud-core-grpc:1.93.4 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 (omitted for duplicate)
  g:test-123:0.1.0-SNAPSHOT / com.google.cloud:google-cloud-logging:1.101.1 / com.google.api:gax-grpc:1.56.0 / io.grpc:grpc-alts:1.28.1 / io.grpc:grpc-grpclb:1.28.1 / io.grpc:grpc-core:1.29.0

io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0 now appears only once, and the other paths that reach grpc-netty-shaded show an "(omitted for duplicate)" message instead.

@suztomo suztomo changed the title from "Level-order to traverse Gradle dependency tree" to "Gradle plugin: avoid printing duplicate dependency paths" on Aug 13, 2021
}
} else {
stack.add(child);
recordDependencyPaths(output, stack, targetCoordinates, checkedCircularDependency);
Contributor Author

Before this PR, the tree was traversed depth-first.

// or transitive dependencies.
Set<ResolvedComponentResult> nodesDependOnTarget = new HashSet<>();

while (!queue.isEmpty()) {
Contributor Author

With this PR, the tree is traversed breadth-first (level-order).

}
} else {
ResolvedComponentResultNode childNode = new ResolvedComponentResultNode(child, node);
queue.add(childNode);
Contributor Author

This is the core of the level-order traversal (breadth-first). The children of the node are added to the queue.
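
As a minimal illustration of level-order traversal with a queue, the sketch below uses plain strings and a hypothetical `LevelOrder` class rather than the plugin's `ResolvedComponentResultNode`. It assumes an acyclic graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class LevelOrder {
  // Returns node names in breadth-first (level-order) visiting order from 'root'.
  // Assumes the graph is acyclic; a node reachable by several paths is visited
  // once per path.
  static List<String> visitOrder(Map<String, List<String>> graph, String root) {
    List<String> visited = new ArrayList<>();
    Queue<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String node = queue.remove();
      visited.add(node);
      // Children go to the back of the queue, so every node at depth d is
      // processed before any node at depth d + 1.
      queue.addAll(graph.getOrDefault(node, Collections.emptyList()));
    }
    return visited;
  }
}
```

Because shallower nodes are dequeued first, the shortest dependency path to a target is the first one found.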

Comment on lines +373 to +374
ListMultimap<String, String> dependencyPaths =
groupCoordinatesToDependencyPaths(componentResult, targetCoordinates);
Contributor Author

The caller of recordDependencyPaths (now renamed to groupCoordinatesToDependencyPaths) is much cleaner.

Contributor Author

suztomo commented Aug 13, 2021

This still causes an OutOfMemoryError in my test case.

Edit: it worked with org.gradle.jvmargs=-Xmx4g in gradle.properties

@suztomo suztomo closed this Aug 13, 2021
@suztomo suztomo reopened this Aug 13, 2021
@suztomo suztomo requested a review from a team August 13, 2021 21:29
Comment on lines +312 to +317
if (nodesDependOnTarget.contains(item)) {
  // Do not show duplicate dependency paths. If we know that this node contains dependency
  // having targetCoordinates in its direct/transitive dependencies, there is no need to
  // print the dependency paths again.
  String dependencyPath = node.pathFromRoot() + " (omitted for duplicate)";
  coordinatesToDependencyPaths.put(targetCoordinates, dependencyPath);
Contributor Author

This is the main purpose of this PR.
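
The idea can be sketched in a self-contained form. This is a simplified illustration, not the plugin's code: it uses strings instead of Gradle's `ResolvedComponentResult`, the `DuplicateOmission` class and its example graph are hypothetical, and it assumes the graph has no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class DuplicateOmission {
  // A traversal node that remembers the parent it was reached from.
  static final class Node {
    final String name;
    final Node parent;

    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }

    String pathFromRoot() {
      return parent == null ? name : parent.pathFromRoot() + " / " + name;
    }
  }

  // Breadth-first search for 'target', recording each path in discovery order.
  static List<String> findPaths(Map<String, List<String>> graph, String root, String target) {
    List<String> paths = new ArrayList<>();
    Set<String> nodesDependOnTarget = new HashSet<>();
    Queue<Node> queue = new ArrayDeque<>();
    queue.add(new Node(root, null));
    while (!queue.isEmpty()) {
      Node node = queue.remove();
      if (node.name.equals(target)) {
        paths.add(node.pathFromRoot());
        // Every node on this path is now known to lead to the target.
        for (Node iter = node; iter != null; iter = iter.parent) {
          nodesDependOnTarget.add(iter.name);
        }
      } else if (nodesDependOnTarget.contains(node.name)) {
        // A path through this node was already recorded; do not expand it again.
        paths.add(node.pathFromRoot() + " (omitted for duplicate)");
      } else {
        for (String child : graph.getOrDefault(node.name, Collections.emptyList())) {
          queue.add(new Node(child, node));
        }
      }
    }
    return paths;
  }

  // Hypothetical graph: "shared" is reachable directly and through "b".
  static Map<String, List<String>> exampleGraph() {
    Map<String, List<String>> graph = new HashMap<>();
    graph.put("app", Arrays.asList("shared", "b"));
    graph.put("b", Arrays.asList("shared"));
    graph.put("shared", Arrays.asList("target"));
    return graph;
  }
}
```

For the example graph, `findPaths(exampleGraph(), "app", "target")` yields `app / shared / target` followed by `app / b / shared (omitted for duplicate)`. In this sketch the omission only applies to nodes dequeued after a path through them has been recorded, so two equally deep routes through the same node can still both be printed in full.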

// Ensure the node closest to the root is printed
result.output.contains("g:test-123:0.1.0-SNAPSHOT / io.grpc:grpc-core:1.29.0")
// Ensure that the relationship between grpc-netty-shaded to grpc-core is only printed once
result.output.count("io.grpc:grpc-netty-shaded:1.28.1 / io.grpc:grpc-core:1.29.0") == 1
Contributor Author

This is the effect of this PR.

Contributor

@elefeint elefeint left a comment

Looks great overall, but I'd love it if it were possible to split the BFS and the dependency-extraction-and-validation logic into separate pieces. The BFS part should be trivial to read, while the individual dependency methods would allow us to focus on only a particular piece at a time (like "enumerate all dependencies of this node" or "does this node cause a circular dependency?").

It's funny that you are actually writing complicated algorithmic code that we all got interviewed on! A rarity.

Comment on lines +218 to +224
if (componentResult.equals(other)) {
  return true;
}
if (parent == null) {
  return false;
}
return parent.hasParent(other);
Contributor

You could also skip one extra recursive call by comparing the current node's parent to the passed-in parameter.

Something like this:

if (parent == null) {
  return false;
} else if (parent == other) {
  return true;
}
return parent.hasParent(other);

(this was a comment from the start of review, but now that I got to the end, I am no longer sure I had a point there) Would isDescendantOf() be more or less confusing as a method name?

Contributor Author

Updated to isDescendantOf.
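
For illustration, an ancestor check of this kind can also be written as a loop over the parent chain. The sketch below is an assumption-laden stand-in, not the merged method: `AncestorCheck.Node` is a hypothetical class that stores a plain name, and, like the `hasParent` snippet quoted above, it treats the node itself as a match.

```java
class AncestorCheck {
  static final class Node {
    final String name;
    final Node parent; // null for the root

    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }

    // Returns true if 'other' names this node or any of its ancestors.
    boolean isDescendantOf(String other) {
      for (Node iter = this; iter != null; iter = iter.parent) {
        if (iter.name.equals(other)) {
          return true;
        }
      }
      return false;
    }
  }
}
```

The loop avoids recursion, so the check uses constant stack space regardless of how deep the dependency path is.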

Comment on lines +228 to +232
List<String> dependencyPathElementsReversed = new ArrayList<>();
for (ResolvedComponentResultNode iter = this; iter != null; iter = iter.parent) {
  dependencyPathElementsReversed.add(formatComponentResult(iter.componentResult));
}
Collections.reverse(dependencyPathElementsReversed);
Contributor

You can use ArrayDeque to avoid reversing the list: it can be used as both a stack and a queue, so you can insert like a stack (reverse) and retrieve like a queue (proper order). Good for both readability and performance; it's amortized constant time for everything useful.

Contributor Author

That's a nice idea. Updating.

Contributor Author

Updated to use ArrayDeque.
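
A minimal sketch of the ArrayDeque suggestion follows. The `PathFormatter` class and its string-based `Node` are hypothetical stand-ins for the plugin's node type; only `java.util.ArrayDeque` and `String.join` are real APIs here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PathFormatter {
  static final class Node {
    final String name;
    final Node parent; // null for the root

    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  // Walks from the node up to the root, pushing each name at the front,
  // so the deque ends up in root-to-node order without a reverse step.
  static String pathFromRoot(Node node) {
    Deque<String> elements = new ArrayDeque<>();
    for (Node iter = node; iter != null; iter = iter.parent) {
      elements.addFirst(iter.name);
    }
    return String.join(" / ", elements);
  }
}
```

Iterating an `ArrayDeque` goes from first to last element, which is why `String.join` sees the root first.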

ListMultimap<String, String> coordinatesToDependencyPaths =
MultimapBuilder.hashKeys().arrayListValues().build();
// No need to print the same circular dependency multiple times
Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>();
Contributor

How about a name like seen or circularDependencyCandidates? Technically, at the point nodes are added to this set, they have not yet been checked.

Contributor Author

Actually an item is added to this collection only when it's a circular dependency.

Contributor Author

Added source code comment.
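
The "report only once" behavior relies on `Set.add` returning true only when the element was not already present. The sketch below isolates that pattern; `CircularDependencyLog` is a hypothetical class that collects messages in a list instead of calling the Gradle logger.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CircularDependencyLog {
  // Dependencies already reported as circular.
  private final Set<String> checkedCircularDependency = new HashSet<>();
  // Stand-in for the logger output.
  final List<String> messages = new ArrayList<>();

  // Records a message only the first time a given dependency is reported.
  void report(String child, String pathFromRoot) {
    // Set.add returns false when the element is already in the set.
    if (checkedCircularDependency.add(child)) {
      messages.add("Circular dependency for: " + child + "\n The stack is: " + pathFromRoot);
    }
  }
}
```

Reporting the same dependency from two different paths produces a single message, which keeps the log short on graphs where a cycle is reachable many ways.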

Set<ResolvedComponentResult> checkedCircularDependency = new HashSet<>();

for (String targetCoordinates : targetCoordinatesSet) {
// Queue of dependnecy nodes. Each node knows its parent.
Contributor

s/dependnecy/dependency

Contributor Author

Fixed.


// Mapping to omit duplicate dependency paths in the output. This is a mapping from Maven
// coordinates to dependency nodes that has the dependency of the coordinate in their direct
// or transitive dependencies.
Contributor

This comment sounds like a javadoc for a helper method. Would it make sense to extract BFS for dependencies into some kind of findDependencies() helper?

Contributor Author

This comment was outdated. Updated to:

      // A set to omit duplicate dependency paths in the output. When a node is found to be in this
      // set while traversing the graph, we do not need to check the children, because we know that
      // the dependency paths from that node to the targetCoordinates are already added to
      // coordinatesToDependencyPaths.
      Set<ResolvedComponentResult> nodesDependOnTarget = new HashSet<>();

Contributor Author

I couldn't think of a findDependencies() helper that would improve the code here, but the newly added getDependencies should make the code clearer.

* @param targetCoordinatesSet The Maven coordinates to check their dependency paths
*/
private ListMultimap<String, String> groupCoordinatesToDependencyPaths(
ResolvedComponentResult componentResult, Set<String> targetCoordinatesSet) {
Contributor

Is componentResult expected to be the root of dependency tree? Then maybe we could name it this way. Or I could be confused...

Contributor Author

Updated to rootProject.

}
}

if (nodesDependOnTarget.contains(item)) {
Contributor

Should this if block go above the if (targetCoordinates.equals(coordinates)) {? What if a transitive dependency on targetCoordinates is found before the direct dependency? Or is it a "this should never happen in a gradle resolution" type of situation?

Contributor Author

If nodesDependOnTarget.contains(item) is true, then targetCoordinates (the coordinates of item) is not equal to coordinates.

"What if a transitive dependency on targetCoordinates is found before the direct dependency?"

Would you explain an example?

Contributor

I was thinking it's possible to have a duplicate if both if (targetCoordinates.equals(coordinates)) from ~10 lines above and this current if statement are both true.

Contributor Author

The two if-statements are never true at the same time. I just updated the code to the following to make that clearer.

        if (targetCoordinates.equals(coordinates)) {
          ...
        } else if (nodesDependOnTarget.contains(item)) {
          ...
        } else {
          queue.addAll(getDependencies(node));
        }

String dependencyPath = node.pathFromRoot() + " (omitted for duplicate)";
coordinatesToDependencyPaths.put(targetCoordinates, dependencyPath);

for (ResolvedComponentResultNode iter = node; iter != null; iter = iter.parent) {
Contributor

Since these 3 lines turn up twice (for a direct and all transitive dependencies), could you extract them into a helper method? addParents() or nodesDependOnTarget.addAll(getAllParents()) or some such.

Contributor Author

Updated to use rootToNode() with addAll.

Comment on lines +322 to +336
if (dependencyResult instanceof ResolvedDependencyResult) {
  ResolvedDependencyResult resolvedDependencyResult =
      (ResolvedDependencyResult) dependencyResult;
  ResolvedComponentResult child = resolvedDependencyResult.getSelected();

  if (node.hasParent(child)) {
    // Circular dependency check
    if (checkedCircularDependency.add(child)) {
      getLogger()
          .error(
              "Circular dependency for: "
                  + resolvedDependencyResult
                  + "\n The stack is: "
                  + node.pathFromRoot());
    }
Contributor

I'd try to extract this into a causesCircularDependency() or similarly named helper. Just generally, the overall BFS structure will be easier to read if the actually useful pieces of functionality were in helper methods.

Contributor Author

Extracted the bigger for-loop into getDependencies.

Contributor Author

Renamed getDependencies to findDependencies.

Contributor Author

@suztomo suztomo left a comment

Thank you for the review. I'll update the code.

Contributor Author

@suztomo suztomo left a comment

@elefeint PTAL.

Contributor

@elefeint elefeint left a comment

Looks great!

@suztomo suztomo merged commit 062331a into master Aug 17, 2021
2 participants