[BEAM-9779] Patch HL7v2IOWriteIT Flakiness #11450

jaketf · 2020-04-17T19:20:57Z

Use TestPipeline in ITs
Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
Make HL7v2MessageCoder constructor public

chamikaramj · 2020-04-17T19:25:18Z

Run Java PostCommit

...oogle-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java

chamikaramj · 2020-04-17T20:49:35Z

Run Java PostCommit

chamikaramj · 2020-04-17T20:49:48Z

Run Java PostCommit

jaketf · 2020-04-17T23:36:43Z

This seemed to pass in at least this PostCommit.
For a more maintainable solution, perhaps we should add the sleep to the end of FhirIOTestUtils.writeMessages so that others do not forget it in future tests.

jaketf · 2020-04-17T23:50:53Z

I also have a small tweak, based on customer feedback, that fixes an issue we can guard against in the future with an additional IT. I'm happy to split this into two smaller PRs, or I can add it here.

jaketf · 2020-04-20T16:24:13Z

R: @chamikaramj

chamikaramj

Thanks!

chamikaramj · 2020-04-22T19:22:23Z

...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java

+    PCollection<Long> numReadMessages =
+        readResult.getMessages().setCoder(new HL7v2MessageCoder()).apply(Count.globally());
+    PAssert.thatSingleton(numReadMessages).isEqualTo((long) MESSAGES.size());
+    PAssert.that(readResult.getFailedReads()).empty();


Seems like the read and write sections are unrelated. Should these be two different tests?

Alternatively, we can convert this to a single "write and then read" pipeline, where the data needed for the read step is generated in the write step (and remove data generation in the setUp method).

The sections are actually related.
The write section starts with readResult.getMessages(), which returns the output PCollection of the read.

I specifically wanted to add this test because, in testing with a customer, we noticed that (before the changes in this PR that set only data and labels) if you ran a read and went straight to a write, you would get errors on ingest because the messages would carry fields that should be output only.

This test will help ensure we don't introduce a regression in the future for this read-to-write case.

For context, we already have "just read" and "just write" integration tests for this connector.
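
To illustrate the read-to-write issue described above, here is a minimal, self-contained sketch of copying only the client-supplied fields before ingest. The `Message` class, its field names, and `toIngestable` are hypothetical stand-ins for illustration, not the connector's actual types; the thread only confirms that schematized data is output only and that the fix sets just data and labels.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch: strip server-populated fields from a message that was read, before writing it elsewhere. */
final class IngestSketch {

  /** Hypothetical stand-in for an HL7v2 message as returned by a read. */
  static final class Message {
    final String name;            // assumed to be server-assigned in this sketch
    final String createTime;      // assumed to be server-assigned in this sketch
    final String schematizedData; // output only, per the PR description
    final String data;            // raw payload supplied by the client
    final Map<String, String> labels;

    Message(
        String name,
        String createTime,
        String schematizedData,
        String data,
        Map<String, String> labels) {
      this.name = name;
      this.createTime = createTime;
      this.schematizedData = schematizedData;
      this.data = data;
      this.labels = labels;
    }
  }

  /** Returns a copy that carries only data and labels; every other field is left unset. */
  static Message toIngestable(Message read) {
    Map<String, String> labels =
        read.labels == null ? Collections.<String, String>emptyMap() : new HashMap<>(read.labels);
    return new Message(null, null, null, read.data, labels);
  }

  public static void main(String[] args) {
    Map<String, String> labels = new HashMap<>();
    labels.put("source", "input-store");
    Message read = new Message("messages/1", "2020-04-17T19:20:57Z", "{}", "TVNIfF4=", labels);
    Message out = toIngestable(read);
    System.out.println(out.data + " " + out.labels + " " + out.schematizedData);
  }
}
```

Building a fresh object rather than nulling fields on the read result keeps the original element untouched, which matters if the same PCollection feeds other transforms.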

...ogle-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java

...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java

* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results
* add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779
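
The first item above, emitting results as pages arrive instead of after pagination finishes, can be sketched without any Beam or Healthcare API dependency. Everything named here (`Page`, `listEagerly`, the page-token convention) is a hypothetical illustration of the idea, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Sketch: hand each page's items downstream as soon as that page is fetched. */
final class EagerListingSketch {

  /** Hypothetical page of results: items plus a token for the next page (null when done). */
  static final class Page {
    final List<String> items;
    final String nextPageToken;

    Page(List<String> items, String nextPageToken) {
      this.items = items;
      this.nextPageToken = nextPageToken;
    }
  }

  /**
   * Fetches pages one at a time and emits every item of a page before requesting the next one.
   * Returns the number of items emitted.
   */
  static int listEagerly(Function<String, Page> fetchPage, Consumer<String> output) {
    int emitted = 0;
    String token = null; // in this sketch, a null token requests the first page
    do {
      Page page = fetchPage.apply(token);
      for (String item : page.items) {
        output.accept(item); // downstream sees this item before later pages are fetched
        emitted++;
      }
      token = page.nextPageToken;
    } while (token != null);
    return emitted;
  }

  public static void main(String[] args) {
    List<String> seen = new ArrayList<>();
    int count =
        listEagerly(
            token ->
                token == null
                    ? new Page(Arrays.asList("msg-1", "msg-2"), "page-2")
                    : new Page(Arrays.asList("msg-3"), null),
            seen::add);
    System.out.println(count + " " + seen);
  }
}
```

The alternative, collecting every page into one list and emitting at the end, delays all downstream work until the last page returns, which is what the commit message says the change avoids.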

chamikaramj

Thanks.

chamikaramj · 2020-04-27T15:25:41Z

...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java

@@ -127,6 +114,7 @@ public ListMessagesResponse makeHL7v2ListRequest(
            .messages()
            .list(hl7v2Store)
            .set("view", "full")
+            .setPageSize(1000)


This is a pretty large sleep for a single test. The Java post-commit test suite runs regularly with limited Jenkins resources, so I suggest adding a separate test suite for HL7v2 tests and removing this and any other tests that need large sleeps from the general Java post-commit test suite.

This is not a sleep.
I think this is a comment on the wrong line.

Currently, the approach I've taken is to retry listing of HL7v2 messages with EBO (exponential backoff) until the desired number of messages is returned, with an overall timeout of 10 minutes.
This is very different from a 10-minute sleep, as it's expected to succeed in well under 10 minutes.
I'm fairly sure this timeout is extreme overkill for indexing 3 messages.
I've asked the internal team about stats we have on this async indexing process to increase confidence here.
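
For readers following the thread, a minimal sketch of the retry-with-backoff idea described above, as opposed to a fixed sleep. The helper name `waitForCount`, its parameters, and the sleep cap are assumptions made for illustration; the actual test utility in this PR may look different.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Sketch: poll until a count reaches an expected value, backing off exponentially. */
final class BackoffPollSketch {

  // Upper bound on a single sleep; the value is an arbitrary choice for this sketch.
  private static final long MAX_SLEEP_MILLIS = 30_000L;

  /**
   * Returns true as soon as countSupplier reports at least {@code expected} items, or false if
   * the overall timeout elapses (or the thread is interrupted) first.
   */
  static boolean waitForCount(
      IntSupplier countSupplier, int expected, Duration initialBackoff, Duration timeout) {
    long deadlineNanos = System.nanoTime() + timeout.toNanos();
    long sleepMillis = Math.max(1L, initialBackoff.toMillis());
    while (true) {
      if (countSupplier.getAsInt() >= expected) {
        return true; // success returns immediately; the timeout is only a ceiling
      }
      long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
      if (remainingMillis <= 0) {
        return false;
      }
      try {
        Thread.sleep(Math.min(sleepMillis, remainingMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      sleepMillis = Math.min(sleepMillis * 2, MAX_SLEEP_MILLIS); // double the wait each round
    }
  }

  public static void main(String[] args) {
    // Fake store that has "indexed" one more message on every poll.
    int[] indexed = {0};
    boolean ready =
        waitForCount(() -> ++indexed[0], 3, Duration.ofMillis(10), Duration.ofSeconds(5));
    System.out.println("ready=" + ready + " after " + indexed[0] + " polls");
  }
}
```

With this shape, the 10-minute figure discussed in the review is a worst case: a run where indexing finishes after a few seconds costs only those few seconds.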

I'm not sure how to move this out of the post commit test suite.

So I have some questions:

What would an acceptable timeout be to keep this in the post commit?

If I were to run this test 1000x on a VM in the same region as the Jenkins VMs with the contents of this PR to prove that it fixes the flakiness, are there additional stats (beyond 1000/1000 runs passing) you'd find helpful (e.g. the distribution of total runtime for this test)?

How would I move this to a "sick bay" or other test suite? Does one already exist in the Beam codebase?

Thanks for clarifying. I agree that waiting until completion with a timeout is much better than a fixed sleep.

There's some information on adding new test suites here.
https://beam.apache.org/documentation/io/testing/

chamikaramj · 2020-04-27T15:29:53Z

...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java

+          client,
+          healthcareDataset + "/hl7V2Stores/" + OUTPUT_HL7V2_STORE_NAME,
+          MESSAGES.size(),
+          Duration.standardMinutes(10));


Ditto. I suggest moving tests that require a large sleep to a new test suite.

...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java

jaketf · 2020-04-27T16:48:04Z

...google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadIT.java

@@ -86,36 +86,6 @@ public void tearDown() throws Exception {
    deleteAllHL7v2Messages(this.client, healthcareDataset + "/hl7V2Stores/" + HL7V2_STORE_NAME);
  }

-  @Test


These tests are actually redundant with the non-deleted testHL7v2IO_ListHL7v2Messages and testHL7v2IO_ListHL7v2Messages_filtered.

chamikaramj · 2020-04-27T18:00:53Z

Thanks. This looks much cleaner. I think we can get this in to try to stabilize the post-commit test suite.

chamikaramj · 2020-04-27T18:01:00Z

Retest this please

chamikaramj · 2020-04-27T18:01:19Z

Retest this please

chamikaramj · 2020-04-27T18:02:00Z

Run Java PreCommit

jaketf · 2020-04-27T18:28:52Z

FYI moved unrelated improvements to #11538

chamikaramj · 2020-04-27T18:49:11Z

Retest this please

chamikaramj · 2020-04-27T20:43:09Z

Run Java PreCommit

* Patches for HL7v2IO
* Use TestPipeline in ITs
* Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
* Make HL7v2MessageCoder constructor public
* block on run
* add sleep to avoid flakiness due to asyncronous HL7v2 indexing
* E2E integration test
* fix merge issue
* reconcile double sleeping
* improve error hanlding
* improve error handling
* fix docs typo
* add latency distribution metrics
* remove unused imports
* ingest only data and labels
* fix comment
* call spliterator directly, use page size 1000
* output elements more eagerly in ListHL72MessageFn
* eagerly emit data from early pages
* Optimization of Listing and Stablization of ITs
* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results
* add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779
* revert unrelated changes
* add back test
* Add constant for HL7v2 indexing timeout minutes
* Add constant for HL7v2 indexing timeout minutes
* fix checkstyle

Patches for HL7v2IO

f47a977

* Use TestPipeline in ITs
* Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
* Make HL7v2MessageCoder constructor public

probot-autolabeler bot added gcp io java labels Apr 17, 2020

block on run

55ab181

jaketf commented Apr 17, 2020

jaketf changed the title ~~[BEAM-9468] Hl7v2 io patches~~ [BEAM-9779] Hl7v2 io patches Apr 17, 2020

add sleep to avoid flakiness due to asyncronous HL7v2 indexing

96d8e93

Jacob Ferriero added 3 commits April 17, 2020 16:51

E2E integration test

1bbaea6

fix merge issue

6ede0a8

reconcile double sleeping

40a23ef

Jacob Ferriero added 3 commits April 20, 2020 16:42

improve error hanlding

194cf3e

improve error handling

7af97dd

fix docs typo

6f86186

chamikaramj reviewed Apr 22, 2020

Jacob Ferriero added 9 commits April 22, 2020 13:02

add latency distribution metrics

766e6fd

Merge branch 'metrics/HL7v2IO' into patch/HL7v2IO

e989ec4

remove unused imports

16399f1

Merge branch 'metrics/HL7v2IO' into patch/HL7v2IO

9ef3a33

ingest only data and labels

c8b5766

fix comment

50ba3d8

call spliterator directly, use page size 1000

461b7cd

output elements more eagerly in ListHL72MessageFn

b2602b4

eagerly emit data from early pages

1503fd4

Optimization of Listing and Stablization of ITs

5f9bad7

* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results
* add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779

chamikaramj reviewed Apr 27, 2020

Jacob Ferriero added 2 commits April 27, 2020 09:40

revert unrelated changes

ee771ba

add back test

d0b3349

jaketf commented Apr 27, 2020

Jacob Ferriero added 3 commits April 27, 2020 09:55

Add constant for HL7v2 indexing timeout minutes

92af300

Add constant for HL7v2 indexing timeout minutes

a01e7d0

fix checkstyle

d80107e

jaketf changed the title ~~[BEAM-9779] Hl7v2 io patches~~ [BEAM-9779] Patch HL7v2IOWriteIT Flakiness Apr 27, 2020

chamikaramj merged commit 81d7cbe into apache:master Apr 27, 2020
