New html formatter produces very large report files #62

ben-nc2 · 2020-10-30T10:20:24Z

The new html formatter produces very large report files. Please consider reducing the size or providing the old html formatter as an option to switch back to for 6.x.x

We have been using cucumber 4 for a number of years and recently made the changes to first upgrade to 5.7.0 and then on to 6.8.1. Our tests are running fine and although the new html formatter is much easier to read and use for debugging of test failures for our build (which currently runs 423 scenarios that invoke selenium for ui testing).

Our 423 scenarios produce an html report file of 314MB which when we are storing this test artifact uses up a lot of space fast.

ben-nc2 · 2020-11-03T14:36:20Z

Currently this is a blocker to use moving to 6.x as we need the test reports with screenshots for helping debug build failures but don't want to have to handle the large report size. Any suggestions for a workaround also welcome.

aslakhellesoy · 2020-11-04T10:22:47Z

Hi @ben-nc2 - we're aware that the large size can be a problem.

How big were your HTML reports prior to upgrading to v6?

Screenshot sizes should be identical between v5 and v6, so the increase in size might come from elsewhere. I think it might have to do with the large number of duplicated stepDefinition messages on Cucumber-JVM, as well as the inclusion of more JavaScript.

Some things we can try to minimize the size:

More space-efficient message format

Some options:

Use binary representation and use a more compact protobuf library
Keep using NDJSON, but don't include messages.js + the protobuf library in the bundle
Use msgpack

More space efficient JavaScript

Tree shaking
Minification

Fewer messages

Cucumber-JVM outputs all the step definition messages once for every single scenario. It only needs to be outputted once.

ben-nc2 · 2020-11-04T10:44:54Z

@aslakhellesoy running with 5.7.0 html reports were 7MB in size where all tests have passed. The same tests running with 6.8.1 create html reports of 310MB in size. If it would be helpful to share both I can check with my employers if the information is sensitive or not.

aslakhellesoy · 2020-11-04T12:31:17Z

Hi @ben-nc2 having some example reports would be very helpful! If you're allowed to share, you can email them to aslak hellesoy smartbear com (add dots and @)

ben-nc2 · 2020-11-05T09:23:24Z

I hope the example reports I emailed help you find ways to reduce the overall size of the report files.

I'm very willing to test any alpha/beta releases against the same scenarios to provide a direct comparison if that is at all helpful.

mpkorstanje · 2020-11-05T22:32:49Z

I can think of a few options:

a. Fix Cucumber JVM to only emit step definition once.
b. Externalize attachments when creating the html report
c. Use a different encoding then Base64 to reduce the overhead and make zipping more efficient
d. Filter out messages that are not used by the html report.

None of these are ideal.

a. Is a significant amount of work.
b. Removes the SPA nature.
c. Assumes the attachments are the problem

However after running a suite of ~550 scenarios without any screenshots:

$ ls -l --block-size=M cucumber.ndjson
142M nov  5 22:48 cucumber.ndjson
$ cat cucumber.ndjson | cut -d \" -f 2 | sort | uniq -c 
   8934 attachment
    112 gherkinDocument
  10584 hook
      1 meta
    469 pickle
    112 source
 380583 stepDefinition
    441 testCase
    441 testCaseFinished
    441 testCaseStarted
      1 testRunFinished
      1 testRunStarted
  11399 testStepFinished
  11399 testStepStarted
$ sed '/${"stepDefinition/d' cucumber.ndjson > stripped.ndjson
$ ls -l --block-size=M stripped.ndjson
-rw-rw-r-- 1 mpkorstanje mpkorstanje  15M nov  5 22:31 stripped.ndjson

So originally the file was 142M and after removing stepDefinition messages only 15M remains. So that removes about 90%. So the attachments are not the problem.

While d. requires that the MessagesToHtmlWriter knows the implementation details of cucumber/react it would be a pretty significant optimization overall at this point in time. If the html report does use the step definitions in the future then we have the same problem back though.

@aslakhellesoy for the short term could do something like cucumber/common#1236. And it would be even better if react could output a list of messages it consumes because then the filter could be generated in all implementations.

mpkorstanje · 2020-11-05T22:59:33Z

@ben-nc2 if you don't mind building snapshots you could add the following to the HtmlFormatter in cucumber-jvm.

    private void write(Envelope event) {
        if (event.hasStepDefinition() || event.hasHook()) {
            return;
        }

aslakhellesoy · 2020-11-05T23:24:56Z

a. Is a significant amount of work.

Why do you think it's a significant amount of work @mpkorstanje?

I think I understand why we need a new instance of each step definition for each scenario. Can't we emit step definition messages once without changing that?

aslakhellesoy · 2020-11-05T23:29:48Z

@ben-nc2 if you don't mind building snapshots you could add the following to the HtmlFormatter in cucumber-jvm.
    private void write(Envelope event) {
        if (event.hasStepDefinition() || event.hasHook()) {
            return;
        }

I see you have proposed this over in cucumber/common#1236 - I think we should fix this differently.

mpkorstanje · 2020-11-05T23:59:22Z

Fixing this in Cucumber JVM is a problem because the glue is currently not immutable. So there must be a distinct set for each runner/thread.

Additionally because parameter types influence the the regexs in the glue, and because Cucumber JVM supports scenario scoped glue we are forced to use a parameter type registry per scenario. This in turn means we generate and new set of regexs for each scenario. And thus emit a new set of step definitions.

Because each step definition has a unique identifier all must be emitted. Otherwise it's not possible to link back.

Fixing this is possible but it requires a redesign of how glue is discovered and shared between threads.

Once that is in place it should be fairly easy to determine if a step definition can be reused or not.

DougMaisells · 2020-11-06T15:56:11Z

I, for one, like the new test summary output at the top of the formatted html output so please keep that. Anyone know how to get pretty_inspect method to print output to the html file. That is no longer working and instead prints a #.

mpkorstanje · 2020-11-12T18:01:14Z

Cucumber JVM 6.9.0 has been released and will be available in Maven Central soonish. This includes a workaround to reduce the size of the html report.

ben-nc2 · 2020-11-17T09:26:25Z

Thanks guys, have got it running locally and it does make a sizeable difference in our reports.

stale · 2021-09-21T20:54:06Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two months if no further activity occurs.

epragtbeamtree · 2021-10-07T02:08:08Z

Our current HTML reports are around 10MB, but they mostly contain ignored tests, since we execute every Cucumber test separately (so we can parallelise them easier on our build servers). This results in 100s of 10MB files, with mostly ignored information.

Is there a way to not have these ignored tests in the HTML output, so we can make these reports significantly smaller?

aurelien-reeves · 2021-10-25T09:05:57Z

Did we already consider the possibility to optionally not embed attachments data into the formatters but storing those as dedicated file beside the report file?

Or do we prefer let users managing that option by their own?

mattwynne · 2022-06-15T17:49:49Z

It seems to me that a key problem with the report size is that the overall architecture currently assumes that all the messages have to be on the browser page for the HTML to be rendered.

I realise that this makes the formatter easy to install (since you don't need a JavaScript runtime), but could we consider changing the way this all hangs together so that the rendering happens outside of the browser, and the generated HTML is just a static page?

luke-hill · 2023-12-08T16:50:34Z

Given the original issue has been fixed. I propose we close this and if future issues come up (In the years since the original report there have been a lot of major releases of the formatter / flavours), then we have better reproduction scenarios

mattwynne · 2023-12-08T18:05:27Z

I still think we have the same underlying architectural problem with, at least in cucumber-ruby and cucumber-js, still sending all the messages to the browser for rendering, don't we?

davidjgoss · 2023-12-09T14:42:20Z

I still think we have the same underlying architectural problem with, at least in cucumber-ruby and cucumber-js, still sending all the messages to the browser for rendering, don't we?

That situation is unchanged since this was raised, yeah. I think for cucumber-jvm users, omitting the step definition messages took most of the edge off.

[earlier] but could we consider changing the way this all hangs together so that the rendering happens outside of the browser, and the generated HTML is just a static page?

I don't think that would meaningfully change the story - either way, all of the relevant data is still inside of the generated HTML file, whether it's in JSON or HTML form. Externalising the data would be one approach, but I think the report being a self-contained file is too useful to lose. Maybe this a way Cucumber Reports could differentiate itself, if we develop it a bit more.

For the formatter, I think effort would be well-spent trying to reduce the volume of data we need for a useful report and reducing the size of the generated file. A couple of thoughts:

As you hinted before, cucumber-jvm has some code to filter out messages that the formatter doesn't actually need - we could apply this to all languages (and actually shift it left into this library, ideally) - this won't move the needle much but it's worth doing
With Proposal: Switch to a test case-centric model #283, I think we could omit gherkin documents and sources, and even all pickles except those actually realised as test cases
We should also work to ensure the baseline size is as small as possible i.e. the bundle size of the React app (BTW, this was just cut in half by Replace search engine and improve UX react-components#337).

I'm happy to raise new issues for 1 and 3 and work on these.

mattwynne · 2023-12-09T15:49:51Z

I don't think that would meaningfully change the story - either way, all of the relevant data is still inside of the generated HTML file, whether it's in JSON or HTML form.

My sense was that, by carrying all the messages into the document, we have a lot of irrelevant data as well as the relevant stuff.

I hear you that keeping the data inside the file offers interesting options, and the opposite (rendering React compontents outside the page) would be complicated to implement in non-JS cucumbers.

So if we accept that we will have some kind of message data in the page, we:

re-imagine the data structure of messages passed into the page in a more consumer-driven way, since a lot of the messages in the overall "message protocol" are about execution rather than for rendering results. They're also very de-normalized which can make consuming them awkward. Filtering out some of the messages is one way, but there might be other things we could do, such as adding another results-focussed protocol on top, and building a thing that can transform a stream of existing messages into a results-focussed stream.
compress the data somehow, maybe using https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API?

davidjgoss · 2023-12-09T19:36:43Z

re-imagine the data structure of messages passed into the page in a more consumer-driven way

I think this is a good thread to pull on! Again with #283, we could build new components that define their own data structures for props rather than working directly with messages. Then, we can see what that conversion from messages looks like, how much smaller it makes the data, and whether we can shift that left from the React app to the runtime formatter. What do you think?

mattwynne · 2023-12-10T00:20:50Z

we could build new components that define their own data structures for props rather than working directly with messages. Then, we can see what that conversion from messages looks like, how much smaller it makes the data, and whether we can shift that left from the React app to the runtime formatter. What do you think?

💯

The current protocol, for me, covers three quite distinct sub-domains:

parsing and compiling a test suite from gherkin documents and into test cases
automating and executing that test suite (using step definitions)
reporting results

Although we need all three to do any meaningful work in a Cucumber, I think it's worth reminding people of the context that the original vision of that protocol was to allow for a shared binary (cucumber-engine) that could reduce the amount of duplication across the different implementations. We abandoned that effort after learning that deploying binaries was challenging in some environments (although I sometimes wonder whether Rust/WASM could make this a more viable idea again). So the context has changed.

mpkorstanje assigned mpkorstanje and aslakhellesoy and unassigned mpkorstanje Oct 30, 2020

aslakhellesoy transferred this issue from cucumber/cucumber-jvm Nov 4, 2020

aslakhellesoy added library: html-formatter 🐛 bug library: cucumber-react labels Nov 4, 2020

mpkorstanje mentioned this issue Nov 5, 2020

cucumber-html: Reduce size of generated file cucumber/common#1236

Closed

12 tasks

mpkorstanje mentioned this issue Nov 12, 2020

[Core] Significantly reduce the size of the html report cucumber/cucumber-jvm#2172

Merged

natfatykhova mentioned this issue Nov 13, 2020

java.lang.OutOfMemoryError at Courgette.java:54 prashant-ramcharan/courgette-jvm#214

Closed

ben-nc2 closed this as completed Nov 17, 2020

mpkorstanje reopened this Nov 17, 2020

aurelien-reeves added the 🧷 pinned label Sep 22, 2021

aurelien-reeves mentioned this issue Oct 12, 2021

Formatters: HTML - Is there a --format-options to ignore unknow results? cucumber/cucumber-js#1816

Closed

mattwynne transferred this issue from cucumber/common Apr 1, 2022

mattwynne mentioned this issue Jun 16, 2022

Incremental rendering of HTML report #121

Draft

7 tasks

luke-hill closed this as completed Dec 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New html formatter produces very large report files #62

New html formatter produces very large report files #62

ben-nc2 commented Oct 30, 2020

ben-nc2 commented Nov 3, 2020

aslakhellesoy commented Nov 4, 2020

ben-nc2 commented Nov 4, 2020

aslakhellesoy commented Nov 4, 2020

ben-nc2 commented Nov 5, 2020

mpkorstanje commented Nov 5, 2020 •

edited

mpkorstanje commented Nov 5, 2020

aslakhellesoy commented Nov 5, 2020

aslakhellesoy commented Nov 5, 2020

mpkorstanje commented Nov 5, 2020

DougMaisells commented Nov 6, 2020

mpkorstanje commented Nov 12, 2020

ben-nc2 commented Nov 17, 2020

stale bot commented Sep 21, 2021

epragtbeamtree commented Oct 7, 2021

aurelien-reeves commented Oct 25, 2021

mattwynne commented Jun 15, 2022

luke-hill commented Dec 8, 2023

mattwynne commented Dec 8, 2023

davidjgoss commented Dec 9, 2023 •

edited

mattwynne commented Dec 9, 2023

davidjgoss commented Dec 9, 2023

mattwynne commented Dec 10, 2023 •

edited

New html formatter produces very large report files #62

New html formatter produces very large report files #62

Comments

ben-nc2 commented Oct 30, 2020

ben-nc2 commented Nov 3, 2020

aslakhellesoy commented Nov 4, 2020

More space-efficient message format

More space efficient JavaScript

Fewer messages

ben-nc2 commented Nov 4, 2020

aslakhellesoy commented Nov 4, 2020

ben-nc2 commented Nov 5, 2020

mpkorstanje commented Nov 5, 2020 • edited

mpkorstanje commented Nov 5, 2020

aslakhellesoy commented Nov 5, 2020

aslakhellesoy commented Nov 5, 2020

mpkorstanje commented Nov 5, 2020

DougMaisells commented Nov 6, 2020

mpkorstanje commented Nov 12, 2020

ben-nc2 commented Nov 17, 2020

stale bot commented Sep 21, 2021

epragtbeamtree commented Oct 7, 2021

aurelien-reeves commented Oct 25, 2021

mattwynne commented Jun 15, 2022

luke-hill commented Dec 8, 2023

mattwynne commented Dec 8, 2023

davidjgoss commented Dec 9, 2023 • edited

mattwynne commented Dec 9, 2023

davidjgoss commented Dec 9, 2023

mattwynne commented Dec 10, 2023 • edited

mpkorstanje commented Nov 5, 2020 •

edited

davidjgoss commented Dec 9, 2023 •

edited

mattwynne commented Dec 10, 2023 •

edited