Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9779] Patch HL7v2IOWriteIT Flakiness #11450

Merged
merged 24 commits into from Apr 27, 2020

Conversation

jaketf
Copy link

@jaketf jaketf commented Apr 17, 2020

  • Use TestPipeline in ITs
  • Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
  • Make HL7v2MessageCoder constructor public

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

* Use TestPipeline in ITs
* Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
* Make HL7v2MessageCoder constructor public
@chamikaramj
Copy link
Contributor

Run Java PostCommit

@jaketf jaketf changed the title [BEAM-9468] Hl7v2 io patches [BEAM-9779] Hl7v2 io patches Apr 17, 2020
@chamikaramj
Copy link
Contributor

Run Java PostCommit

1 similar comment
@chamikaramj
Copy link
Contributor

Run Java PostCommit

@jaketf
Copy link
Author

jaketf commented Apr 17, 2020

This seemed to pass in at least this PostCommit.
In order to have a more maintainable solution perhaps we should add the sleep to the end of FhirIOTestUtils.writeMessages so others do not forget this in future tests.

@jaketf
Copy link
Author

jaketf commented Apr 17, 2020

I also have a small tweak based on customer feedback and to solve an issue we can avoid in the future with an additional IT. If we'd like to split into two smaller PRs that's fine or I can add here.

@jaketf
Copy link
Author

jaketf commented Apr 20, 2020

R: @chamikaramj

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

PCollection<Long> numReadMessages =
readResult.getMessages().setCoder(new HL7v2MessageCoder()).apply(Count.globally());
PAssert.thatSingleton(numReadMessages).isEqualTo((long) MESSAGES.size());
PAssert.that(readResult.getFailedReads()).empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like read and write sections are unrelated. Should these be two different tests ?

Alternatively we can convert this to a single "write and then read pipeline" where data needed for the read step are generated in the write step (and remove data generation in the setUp method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sections are actually related.
the write section starts with readResult.getMessages() (which is returns the output PCollection of the read).

I specifically wanted to add this test because in testing w/ customer we noticed that (before the changes in this PR that set only data and labels) if you ran read and went straight to write you would get errors on ingest because our messages would have fields that should be output only.

This test will help us ensure we don't run introduce a regression in the future for this read to write case.

For context, we already have "just read" and "just write" integration tests for this connector.

* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results
* add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779
Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -127,6 +114,7 @@ public ListMessagesResponse makeHL7v2ListRequest(
.messages()
.list(hl7v2Store)
.set("view", "full")
.setPageSize(1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty large sleep for a single test. Java post commit test suite is run pretty regularly with limited Jenkins resources so I suggest adding a separate test suite for HI7v2 tests and removing this and any other tests that need large sleeps from general Java post-commit test suite.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a sleep.
I think this is a comment on the wrong line.

Currently the approach I've taken is to retry listing of HL7v2 messages until the desired number of messages is returned with EBO and an over all timeout of 10 minutes.
This is very different than a 10 minute sleep as it's expected to succeed well under 10 mins.
I'm pretty sure this is an extremely over kill timeout for indexing 3 messages.
I've asked the internal team about stats we have on this async indexing process to increase confidence here.

I'm not sure how to move this out of the post commit test suite.

So I have some questions:

  1. What would an acceptable timeout be to keep this in the post commit?
  2. If I were to run this test 1000x on a VM in the same region as the jenkins VMs with the contents of this PR to prove that it fixes the flakiness, is there additional stats (beyond 1000/1000 runs pass) you'd find helpful (e.g. distribution of total runtime for this test)?
  3. How to move this to a "sick bay" or other test suite? Does this already exist in beam code base?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. I agree waiting till completion with a timeout much better than waiting.

There's some information on adding new test suites here.
https://beam.apache.org/documentation/io/testing/

client,
healthcareDataset + "/hl7V2Stores/" + OUTPUT_HL7V2_STORE_NAME,
MESSAGES.size(),
Duration.standardMinutes(10));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. I suggest moving tests that require a large sleep to a new test suite.

@@ -86,36 +86,6 @@ public void tearDown() throws Exception {
deleteAllHL7v2Messages(this.client, healthcareDataset + "/hl7V2Stores/" + HL7V2_STORE_NAME);
}

@Test
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are actually redundant with the non-deleted testHL7v2IO_ListHL7v2Messages and testHL7v2IO_ListHL7v2Messages_filtered

@chamikaramj
Copy link
Contributor

Thanks. This looks much cleaner. I think we can get this in to try to stabilize post-commit test suite.

@chamikaramj
Copy link
Contributor

Retest this please

1 similar comment
@chamikaramj
Copy link
Contributor

Retest this please

@chamikaramj
Copy link
Contributor

Run Java PreCommit

@jaketf
Copy link
Author

jaketf commented Apr 27, 2020

FYI moved unrelated improvements to #11538

@jaketf jaketf changed the title [BEAM-9779] Hl7v2 io patches [BEAM-9779] Patch HL7v2IOWriteIT Flakiness Apr 27, 2020
@chamikaramj
Copy link
Contributor

Retest this please

@chamikaramj
Copy link
Contributor

Run Java PreCommit

@chamikaramj chamikaramj merged commit 81d7cbe into apache:master Apr 27, 2020
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
* Patches for HL7v2IO

* Use TestPipeline in ITs
* Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores
* Make HL7v2MessageCoder constructor public

* block on run

* add sleep to avoid flakiness due to asyncronous HL7v2 indexing

* E2E integration test

* fix merge issue

* reconcile double sleeping

* improve error hanlding

* improve error handling

* fix docs typo

* add latency distribution metrics

* remove unused imports

* ingest only data and labels

* fix comment

* call spliterator directly, use page size 1000

* output elements more eagerly in ListHL72MessageFn

* eagerly emit data from early pages

* Optimization of Listing and Stablization of ITs

* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results
* add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779

* revert unrelated changes

* add back test

* Add constant for HL7v2 indexing timeout minutes

* Add constant for HL7v2 indexing timeout minutes

* fix checkstyle
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants