Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting large collections and streams to strings can be more efficient #3123

Closed
etellman opened this issue Jul 25, 2023 · 3 comments
Closed

Comments

@etellman
Copy link
Contributor

Describe the bug
Converting a large collection with a time-consuming string conversion for each element takes longer than it needs to because AssertJ converts all the elements in the collection to strings but only uses the first and last 500 strings.

This is related to #3065, but for Streams and Iterables instead of arrays.

  • assertj core version: 3.25.0-SNAPSHOT
  • java version: openjdk 17.0.7
  • test framework version: JUnit5
  • os (if relevant): OSX, but probably not relevant

Test case reproducing the bug

Converting this collection a string takes 27 minutes on my MacBook:

    int elementsPerArray = 1000;
    List<int[]> numbers = new ArrayList<>();
    for (int i = 0; i < 1 << 20; i++) {
      numbers.add(new int[elementsPerArray]);
    }

It's also possible to get OOM error with more elements in the list and smaller arrays in each element.

A possible fix would be to narrow the stream down to the elements are actually needed and then just convert those to strings. I think this can be done by changing StandardRepresentation.representElements() to:

  private List<String> representElements(Stream<?> elements, String start, String end, String elementSeparator,
                                         String indentation, Object root) {
    // new
   final PrintingAccumulator accumulator = new PrintingAccumulator(maxElementsForPrinting);
   elements.forEach(accumulator::add);

   // same, but uses narrowed down list of elements from the accumulator instead of the original stream
   return accumulator.toList().stream().map(element -> safeStringOf(element, start, end, elementSeparator, indentation, root))
                  .collect(toList());
  }

PrintingAccumulator only retains the elements that will be used in the final string conversion and discards everything in the middle of the collection.

I gave this a quick try and it produced the correct string in around 2 seconds. I can clean this up a bit and attach a PR with the proposed solution.

@etellman
Copy link
Contributor Author

I just noticed this very trivial thing to fix...I failed to follow the test class name convention in my last PR, leaving out the underscore before 'Test'. You fixed the class names before committing, but the file names were still off, so this PR makes the file names match the class names: #3124

I would have attached to the previous issue, but that one is already closed and it didn't seem worth re-opening just for this.

etellman added a commit to etellman/assertj that referenced this issue Jul 27, 2023
Instead of converting all the elements to strings and then narrowing the
strings down to the head and tail of the list, narrow the elements down
first.

see: assertj#3123
@etellman
Copy link
Contributor Author

this should fix it: #3123

@etellman
Copy link
Contributor Author

etellman commented Aug 1, 2023

I created a new PR with a much smaller test, since the previous one causes OOM for the automated build system: #3133

This one takes about 10 seconds with the original implementation and around 250 ms with the new approach, on my laptop.

I also added a timeout, since the point of the test is that the code produces the correct result in a reasonable time, and not just that it produces the correct result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant