Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OOM while collecting transitive dependencies #2580

Closed
wants to merge 1 commit into from
Closed

Conversation

MIOB
Copy link
Contributor

@MIOB MIOB commented Nov 11, 2022

Hi,

I have faced an issue with coursier in my project. I tried to fetch our company's internal artifacts with the '--json-output-file` flag, but the coursier failed with OOM. Actually, the problem arose quite a long ago but giving it ~ 50Gb heap solved the issue (actually building JSON report took about 10 minutes with such settings, but it was acceptable), so I preferred to do nothing. Now the artifact tree has become ever more complex and this does not work anymore.

Unfortunately, I can't provide a reproducible example because mentioned artifacts are part of our closed-source project and are available only in an internal repo.

I have made some investigation and have found that the issue is caused by building a JSON report, more specifically it is in calculating all transitive dependencies of each particular artifact. Imagine that you have artifact A which depends on artifacts B and C. And artifact B depends on artifact C too. In order to calculate all transitive dependencies of A the current algorithm first goes to B and tries to calculate all transitive dependencies of B, then it goes to C (because B depends on C) and does the same procedure. Then it goes to C (because A depends on C) and does the same procedure for C again instead of using the already calculated value. In the usual artifacts tree, it is not a problem because usually, you do not have a lot of complex dependencies with a lot of dependents. But in our project, we have, for example, some util artifact which has about ~100 non-unique dependencies, and almost any artifact in the project depends on util. Because of that, the current algorithm visited all ~100 utils' dependencies hundreds of thousands of times...

I'm not a scala developer and not good at functional programming (really, I have no idea what Cofree is and how it is supposed to work :) ), so I was not able to fix the issue without rewriting functional code to imperative style. Thus, I just replaced the current algorithm with plain old DFS and it fixed the issue (the JSON report now is built with the default memory setting for about 4 seconds). I do not expect my code to be merged into the master (I created PR just for highlighting the problematic code) because I believe there should be a much more elegant solution in a scala-way functional style. But I would appreciate it if you find time to fix the issue.

Thanks in advance!

@mzuehlke
Copy link

The problem of the exzessive memory usage should be fixed with a recently merged PR #2533.
Have you tried a new version ?

@MIOB
Copy link
Contributor Author

MIOB commented Nov 11, 2022

Yes, I reproduced the problem yesterday in master.

@alexarchambault
Copy link
Member

alexarchambault commented Nov 14, 2022

Thanks @MIOB!

I opened #2588, to make the code more "functional", and fix formatting, as I can't push on the branch of this PR.

The CI errors look legit. The two failing tests can be run locally with

$ ./mill -i -w cli.test "coursier.cli.util.JsonReportTests.JsonReport should prevent walking a tree with cycles"
$ ./mill -i -w cli.test "coursier.cli.util.JsonReportTests.JsonReport should prevent walking a tree in which a dependency depends on itself"

It seems it has to see with cycles. (Either test fixtures that need to be updated, or cycles handling that need to be slightly tweaked.)

@nkoroste
Copy link

+1 I'm facing the same exact issue at my company. Even though #2533 significantly helped, I still see the issue on v2.1.0-RC2

@nkoroste
Copy link

nkoroste commented Apr 3, 2023

Any updates on this? @alexarchambault do you think it's hard to fix those tests? I just tried your change in our repo and it not only that it fixed OOM but also improved execution time from 15 min to like 20 seconds

@mauriciogg
Copy link
Contributor

First time seeing scala so I'm not sure what Im I doing, but isn't the fix just avoid traversing seen nodes?
With the following changes all tests (./mill -i -w cli.test) pass:

private def transitiveOf[A: Eq: Show](
   elem: A,
   fetchChildren: A => Seq[A]
 ): Eval[Map[A, Set[String]]] = {
   val knownElems = mutable.HashMap[String, mutable.Set[String]]()
   def collectDeps[A: Eq: Show](
     elem: A,
     fetchChildren: A => Seq[A],
     seen: mutable.Set[String]
   ): mutable.Set[String] = {
     val key = elem.show
     if (seen.contains(key)) {
       return seen
     }
     seen.add(key)
     knownElems.get(key).getOrElse {
       val deps = mutable.HashSet[String]()
       knownElems.put(key, deps)
       for (child <- fetchChildren(elem)) {
         if (!seen.contains(child.show)) {
           deps.add(child.show)
           val childDeps = collectDeps(child, fetchChildren, seen)
           for (dep <- childDeps)
             deps.add(dep)
         }
       }
       deps
     }
   }
   val result = collectDeps(elem, fetchChildren, mutable.HashSet[String]()).toSet[String]
   Eval.now(Map(elem -> result))
 }

alexarchambault added a commit to alexarchambault/coursier that referenced this pull request Apr 17, 2023
Co-Authored-By: Mauricio Galindo <mauriciogg@users.noreply.github.com>
alexarchambault added a commit that referenced this pull request Apr 18, 2023
* Redone transitive deps collection for json report

* Adjust code style and formatting

* Add mauriciogg's suggestion from #2580

---------

Co-authored-by: obidin <mr.santak@gmail.com>
Co-authored-by: Mauricio Galindo <mauriciogg@users.noreply.github.com>
@alexarchambault
Copy link
Member

I took and tweaked a bit @mauriciogg's suggestion in #2588 (thanks for it, @mauriciogg!), that's just been merged. So closing this one, thanks for having opened it, @MIOB!

mauriciogg pushed a commit to mauriciogg/rules_jvm_external that referenced this pull request Jun 21, 2023
This includes the official fix for OOM issues we were seeing
coursier/coursier#2580
which we had patched internally here
https://github.sc-corp.net/Snapchat/coursier/pull/1/files
And has been merged
coursier/coursier#2588

see https://jira.sc-corp.net/browse/BUILD-4230
---
Automatic squash commit from https://github.sc-corp.net/Snapchat/gcs_bazel_rules_jvm_external/pull/52
Cooled by mgalindo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants