New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-9779] Patch HL7v2IOWriteIT Flakiness #11450
Conversation
* Use TestPipeline in ITs * Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores * Make HL7v2MessageCoder constructor public
Run Java PostCommit |
...oogle-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOWriteIT.java
Show resolved
Hide resolved
Run Java PostCommit |
1 similar comment
Run Java PostCommit |
This seemed to pass in at least this PostCommit. |
I also have a small tweak based on customer feedback and to solve an issue we can avoid in the future with an additional IT. If we'd like to split into two smaller PRs that's fine or I can add here. |
R: @chamikaramj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
PCollection<Long> numReadMessages = | ||
readResult.getMessages().setCoder(new HL7v2MessageCoder()).apply(Count.globally()); | ||
PAssert.thatSingleton(numReadMessages).isEqualTo((long) MESSAGES.size()); | ||
PAssert.that(readResult.getFailedReads()).empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like read and write sections are unrelated. Should these be two different tests ?
Alternatively we can convert this to a single "write and then read pipeline" where data needed for the read step are generated in the write step (and remove data generation in the setUp method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sections are actually related.
the write section starts with readResult.getMessages() (which is returns the output PCollection of the read).
I specifically wanted to add this test because in testing w/ customer we noticed that (before the changes in this PR that set only data and labels) if you ran read and went straight to write you would get errors on ingest because our messages would have fields that should be output only.
This test will help us ensure we don't run introduce a regression in the future for this read to write case.
For context, we already have "just read" and "just write" integration tests for this connector.
...ogle-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOTestUtil.java
Outdated
Show resolved
Hide resolved
...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java
Outdated
Show resolved
Hide resolved
* allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results * add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@@ -127,6 +114,7 @@ public ListMessagesResponse makeHL7v2ListRequest( | |||
.messages() | |||
.list(hl7v2Store) | |||
.set("view", "full") | |||
.setPageSize(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty large sleep for a single test. Java post commit test suite is run pretty regularly with limited Jenkins resources so I suggest adding a separate test suite for HI7v2 tests and removing this and any other tests that need large sleeps from general Java post-commit test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a sleep.
I think this is a comment on the wrong line.
Currently the approach I've taken is to retry listing of HL7v2 messages until the desired number of messages is returned with EBO and an over all timeout of 10 minutes.
This is very different than a 10 minute sleep as it's expected to succeed well under 10 mins.
I'm pretty sure this is an extremely over kill timeout for indexing 3 messages.
I've asked the internal team about stats we have on this async indexing process to increase confidence here.
I'm not sure how to move this out of the post commit test suite.
So I have some questions:
- What would an acceptable timeout be to keep this in the post commit?
- If I were to run this test 1000x on a VM in the same region as the jenkins VMs with the contents of this PR to prove that it fixes the flakiness, is there additional stats (beyond 1000/1000 runs pass) you'd find helpful (e.g. distribution of total runtime for this test)?
- How to move this to a "sick bay" or other test suite? Does this already exist in beam code base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I agree waiting till completion with a timeout much better than waiting.
There's some information on adding new test suites here.
https://beam.apache.org/documentation/io/testing/
client, | ||
healthcareDataset + "/hl7V2Stores/" + OUTPUT_HL7V2_STORE_NAME, | ||
MESSAGES.size(), | ||
Duration.standardMinutes(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. I suggest moving tests that require a large sleep to a new test suite.
...e-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IOReadWriteIT.java
Outdated
Show resolved
Hide resolved
@@ -86,36 +86,6 @@ public void tearDown() throws Exception { | |||
deleteAllHL7v2Messages(this.client, healthcareDataset + "/hl7V2Stores/" + HL7V2_STORE_NAME); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are actually redundant with the non-deleted testHL7v2IO_ListHL7v2Messages
and testHL7v2IO_ListHL7v2Messages_filtered
Thanks. This looks much cleaner. I think we can get this in to try to stabilize post-commit test suite. |
Retest this please |
1 similar comment
Retest this please |
Run Java PreCommit |
FYI moved unrelated improvements to #11538 |
Retest this please |
Run Java PreCommit |
* Patches for HL7v2IO * Use TestPipeline in ITs * Drop schematized data before calling message ingest (should be output only) to help pipelines that read/write from/to two HL7v2 stores * Make HL7v2MessageCoder constructor public * block on run * add sleep to avoid flakiness due to asyncronous HL7v2 indexing * E2E integration test * fix merge issue * reconcile double sleeping * improve error hanlding * improve error handling * fix docs typo * add latency distribution metrics * remove unused imports * ingest only data and labels * fix comment * call spliterator directly, use page size 1000 * output elements more eagerly in ListHL72MessageFn * eagerly emit data from early pages * Optimization of Listing and Stablization of ITs * allow HL7v2 Message listing to emit early panes rather than waiting on pagination of all list results * add EBO on HL7v2 Message listing reaching a certain expected length in ITs to account for async indexing BEAM-9779 * revert unrelated changes * add back test * Add constant for HL7v2 indexing timeout minutes * Add constant for HL7v2 indexing timeout minutes * fix checkstyle
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.