New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-3060] Add Compressed TextIOIT #4149
Conversation
retest this please |
@chamikaramj (this message is a kind reminder) :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Sorry about the delay.
.getPerDestinationOutputFilenames().apply(Values.<String>create()); | ||
/** IO IT with no compression. */ | ||
@RunWith(JUnit4.class) | ||
public static class UncompressedTextIOIT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does perfkitbenchmarker-based execution (#4120) still work with these changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works but runs all the 4 tests that are there in the file. But now I think this is probably not what we want. This won't be a problem as you suggested an even better solution in the comment below.
.apply("Write content to files", TextIO.write().to(filenamePrefix).withOutputFilenames()) | ||
.getPerDestinationOutputFilenames().apply(Values.<String>create()); | ||
/** IO IT with no compression. */ | ||
@RunWith(JUnit4.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that this test will be picked up by all test suites (including Java pre-commit), isn't it ? Not sure if we want to do that due to the size of this test. Adding to post-commit tests should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double-checked that by running the preCommit job on my machine - those are not fired in PreCommit phase. Also, out of curiosity 😄 I investigated a little bit the project's mvn structure:
besides the @RunWith(JUnit.class)
annotation that is required by JUnit, we have two mvn plugins that look (scan) for tests:
- surefire (looks for unit tests and searches for classes with *Test suffix)
- failsafe (looks for integration tests and searches for classes with *IT suffix)
As failsafe is not fired in the PreCommit phase, the tests are not invoked. Please look at io parent pom, where failsafe plugin is activated only when io-it profile is active.
PCollection<String> consolidatedHashcode = testFilenames | ||
.apply("Read all files", TextIO.readAll()) | ||
.apply("Calculate hashcode", Combine.globally(new HashingFn())); | ||
PCollection<String> testFilenames = pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and uncompressed version have the same pipeline. Can't we share to code between tests (and keep the same test class TextIOIT) and add "compression type" as a parameter to the test (a Maven -D parameter for the perfkit based runs) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's hard to do right now without modifying perfkit's code. As we checked, perfkit ignores -D parameters because builds the mvn verify command by itself from the parameters passed . I think this could be done in some future contribution. We will file a bug report in perfkit soon.
I think the best solution (at least for now) is to leave the compression type in pipeline options. We pass them to perfkit either way (through beam_it_options
) and, what imo is more important, compressionType is very test specific (same as numberOfRecords). WDYT?
@chamikaramj thanks for the review! Here's another batch of changes, as commented above. |
LGTM |
Merged. Closing. |
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue.mvn clean verify
to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.This is a parametrized test for Compressed TextIO. Only the Java code - @DariuszAniszewski is working on Perfkit support and Dataflow runner support on his separate branch. As ZIP compression type is unsupported, I skipped it in the test.
@chamikaramj could you take a look?