Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New html formatter produces very large report files #62

Closed
ben-nc2 opened this issue Oct 30, 2020 · 23 comments
Closed

New html formatter produces very large report files #62

ben-nc2 opened this issue Oct 30, 2020 · 23 comments

Comments

@ben-nc2
Copy link

ben-nc2 commented Oct 30, 2020

The new html formatter produces very large report files. Please consider reducing the size or providing the old html formatter as an option to switch back to for 6.x.x

We have been using cucumber 4 for a number of years and recently made the changes to first upgrade to 5.7.0 and then on to 6.8.1. Our tests are running fine and although the new html formatter is much easier to read and use for debugging of test failures for our build (which currently runs 423 scenarios that invoke selenium for ui testing).

Our 423 scenarios produce an html report file of 314MB which when we are storing this test artifact uses up a lot of space fast.

@ben-nc2
Copy link
Author

ben-nc2 commented Nov 3, 2020

Currently this is a blocker to use moving to 6.x as we need the test reports with screenshots for helping debug build failures but don't want to have to handle the large report size. Any suggestions for a workaround also welcome.

@aslakhellesoy
Copy link

Hi @ben-nc2 - we're aware that the large size can be a problem.

How big were your HTML reports prior to upgrading to v6?

Screenshot sizes should be identical between v5 and v6, so the increase in size might come from elsewhere. I think it might have to do with the large number of duplicated stepDefinition messages on Cucumber-JVM, as well as the inclusion of more JavaScript.

Some things we can try to minimize the size:

More space-efficient message format

Some options:

  • Use binary representation and use a more compact protobuf library
  • Keep using NDJSON, but don't include messages.js + the protobuf library in the bundle
  • Use msgpack

More space efficient JavaScript

  • Tree shaking
  • Minification

Fewer messages

  • Cucumber-JVM outputs all the step definition messages once for every single scenario. It only needs to be outputted once.

@ben-nc2
Copy link
Author

ben-nc2 commented Nov 4, 2020

@aslakhellesoy running with 5.7.0 html reports were 7MB in size where all tests have passed. The same tests running with 6.8.1 create html reports of 310MB in size. If it would be helpful to share both I can check with my employers if the information is sensitive or not.

@aslakhellesoy
Copy link

Hi @ben-nc2 having some example reports would be very helpful! If you're allowed to share, you can email them to aslak hellesoy smartbear com (add dots and @)

@ben-nc2
Copy link
Author

ben-nc2 commented Nov 5, 2020

I hope the example reports I emailed help you find ways to reduce the overall size of the report files.

I'm very willing to test any alpha/beta releases against the same scenarios to provide a direct comparison if that is at all helpful.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Nov 5, 2020

I can think of a few options:

a. Fix Cucumber JVM to only emit step definition once.
b. Externalize attachments when creating the html report
c. Use a different encoding then Base64 to reduce the overhead and make zipping more efficient
d. Filter out messages that are not used by the html report.

None of these are ideal.

a. Is a significant amount of work.
b. Removes the SPA nature.
c. Assumes the attachments are the problem

However after running a suite of ~550 scenarios without any screenshots:

$ ls -l --block-size=M cucumber.ndjson
142M nov  5 22:48 cucumber.ndjson
$ cat cucumber.ndjson | cut -d \" -f 2 | sort | uniq -c 
   8934 attachment
    112 gherkinDocument
  10584 hook
      1 meta
    469 pickle
    112 source
 380583 stepDefinition
    441 testCase
    441 testCaseFinished
    441 testCaseStarted
      1 testRunFinished
      1 testRunStarted
  11399 testStepFinished
  11399 testStepStarted
$ sed '/${"stepDefinition/d' cucumber.ndjson > stripped.ndjson
$ ls -l --block-size=M stripped.ndjson
-rw-rw-r-- 1 mpkorstanje mpkorstanje  15M nov  5 22:31 stripped.ndjson

So originally the file was 142M and after removing stepDefinition messages only 15M remains. So that removes about 90%. So the attachments are not the problem.

While d. requires that the MessagesToHtmlWriter knows the implementation details of cucumber/react it would be a pretty significant optimization overall at this point in time. If the html report does use the step definitions in the future then we have the same problem back though.

@aslakhellesoy for the short term could do something like cucumber/common#1236. And it would be even better if react could output a list of messages it consumes because then the filter could be generated in all implementations.

@mpkorstanje
Copy link
Contributor

@ben-nc2 if you don't mind building snapshots you could add the following to the HtmlFormatter in cucumber-jvm.

    private void write(Envelope event) {
        if (event.hasStepDefinition() || event.hasHook()) {
            return;
        }

@aslakhellesoy
Copy link

a. Is a significant amount of work.

Why do you think it's a significant amount of work @mpkorstanje?

I think I understand why we need a new instance of each step definition for each scenario. Can't we emit step definition messages once without changing that?

@aslakhellesoy
Copy link

@ben-nc2 if you don't mind building snapshots you could add the following to the HtmlFormatter in cucumber-jvm.

    private void write(Envelope event) {
        if (event.hasStepDefinition() || event.hasHook()) {
            return;
        }

I see you have proposed this over in cucumber/common#1236 - I think we should fix this differently.

@mpkorstanje
Copy link
Contributor

Fixing this in Cucumber JVM is a problem because the glue is currently not immutable. So there must be a distinct set for each runner/thread.

Additionally because parameter types influence the the regexs in the glue, and because Cucumber JVM supports scenario scoped glue we are forced to use a parameter type registry per scenario. This in turn means we generate and new set of regexs for each scenario. And thus emit a new set of step definitions.

Because each step definition has a unique identifier all must be emitted. Otherwise it's not possible to link back.

Fixing this is possible but it requires a redesign of how glue is discovered and shared between threads.

Once that is in place it should be fairly easy to determine if a step definition can be reused or not.

@DougMaisells
Copy link

I, for one, like the new test summary output at the top of the formatted html output so please keep that. Anyone know how to get pretty_inspect method to print output to the html file. That is no longer working and instead prints a #.

@mpkorstanje
Copy link
Contributor

Cucumber JVM 6.9.0 has been released and will be available in Maven Central soonish. This includes a workaround to reduce the size of the html report.

@ben-nc2
Copy link
Author

ben-nc2 commented Nov 17, 2020

Thanks guys, have got it running locally and it does make a sizeable difference in our reports.

@stale
Copy link

stale bot commented Sep 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two months if no further activity occurs.

@epragtbeamtree
Copy link

Our current HTML reports are around 10MB, but they mostly contain ignored tests, since we execute every Cucumber test separately (so we can parallelise them easier on our build servers). This results in 100s of 10MB files, with mostly ignored information.

Is there a way to not have these ignored tests in the HTML output, so we can make these reports significantly smaller?

@aurelien-reeves
Copy link
Contributor

Did we already consider the possibility to optionally not embed attachments data into the formatters but storing those as dedicated file beside the report file?

Or do we prefer let users managing that option by their own?

@mattwynne mattwynne transferred this issue from cucumber/common Apr 1, 2022
@mattwynne
Copy link
Member

It seems to me that a key problem with the report size is that the overall architecture currently assumes that all the messages have to be on the browser page for the HTML to be rendered.

I realise that this makes the formatter easy to install (since you don't need a JavaScript runtime), but could we consider changing the way this all hangs together so that the rendering happens outside of the browser, and the generated HTML is just a static page?

@luke-hill
Copy link
Contributor

Given the original issue has been fixed. I propose we close this and if future issues come up (In the years since the original report there have been a lot of major releases of the formatter / flavours), then we have better reproduction scenarios

@mattwynne
Copy link
Member

I still think we have the same underlying architectural problem with, at least in cucumber-ruby and cucumber-js, still sending all the messages to the browser for rendering, don't we?

@davidjgoss
Copy link
Contributor

davidjgoss commented Dec 9, 2023

I still think we have the same underlying architectural problem with, at least in cucumber-ruby and cucumber-js, still sending all the messages to the browser for rendering, don't we?

That situation is unchanged since this was raised, yeah. I think for cucumber-jvm users, omitting the step definition messages took most of the edge off.

[earlier] but could we consider changing the way this all hangs together so that the rendering happens outside of the browser, and the generated HTML is just a static page?

I don't think that would meaningfully change the story - either way, all of the relevant data is still inside of the generated HTML file, whether it's in JSON or HTML form. Externalising the data would be one approach, but I think the report being a self-contained file is too useful to lose. Maybe this a way Cucumber Reports could differentiate itself, if we develop it a bit more.

For the formatter, I think effort would be well-spent trying to reduce the volume of data we need for a useful report and reducing the size of the generated file. A couple of thoughts:

  1. As you hinted before, cucumber-jvm has some code to filter out messages that the formatter doesn't actually need - we could apply this to all languages (and actually shift it left into this library, ideally) - this won't move the needle much but it's worth doing
  2. With Proposal: Switch to a test case-centric model #283, I think we could omit gherkin documents and sources, and even all pickles except those actually realised as test cases
  3. We should also work to ensure the baseline size is as small as possible i.e. the bundle size of the React app (BTW, this was just cut in half by Replace search engine and improve UX react-components#337).

I'm happy to raise new issues for 1 and 3 and work on these.

@mattwynne
Copy link
Member

I don't think that would meaningfully change the story - either way, all of the relevant data is still inside of the generated HTML file, whether it's in JSON or HTML form.

My sense was that, by carrying all the messages into the document, we have a lot of irrelevant data as well as the relevant stuff.

I hear you that keeping the data inside the file offers interesting options, and the opposite (rendering React compontents outside the page) would be complicated to implement in non-JS cucumbers.

So if we accept that we will have some kind of message data in the page, we:

  • re-imagine the data structure of messages passed into the page in a more consumer-driven way, since a lot of the messages in the overall "message protocol" are about execution rather than for rendering results. They're also very de-normalized which can make consuming them awkward. Filtering out some of the messages is one way, but there might be other things we could do, such as adding another results-focussed protocol on top, and building a thing that can transform a stream of existing messages into a results-focussed stream.
  • compress the data somehow, maybe using https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API?

@davidjgoss
Copy link
Contributor

re-imagine the data structure of messages passed into the page in a more consumer-driven way

I think this is a good thread to pull on! Again with #283, we could build new components that define their own data structures for props rather than working directly with messages. Then, we can see what that conversion from messages looks like, how much smaller it makes the data, and whether we can shift that left from the React app to the runtime formatter. What do you think?

@mattwynne
Copy link
Member

mattwynne commented Dec 10, 2023

we could build new components that define their own data structures for props rather than working directly with messages. Then, we can see what that conversion from messages looks like, how much smaller it makes the data, and whether we can shift that left from the React app to the runtime formatter. What do you think?

💯

The current protocol, for me, covers three quite distinct sub-domains:

  • parsing and compiling a test suite from gherkin documents and into test cases
  • automating and executing that test suite (using step definitions)
  • reporting results

Although we need all three to do any meaningful work in a Cucumber, I think it's worth reminding people of the context that the original vision of that protocol was to allow for a shared binary (cucumber-engine) that could reduce the amount of duplication across the different implementations. We abandoned that effort after learning that deploying binaries was challenging in some environments (although I sometimes wonder whether Rust/WASM could make this a more viable idea again). So the context has changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants