Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-3060] Add Compressed TextIOIT #4149

Closed
wants to merge 2 commits into from

Conversation

lgajowy
Copy link
Contributor

@lgajowy lgajowy commented Nov 20, 2017

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

This is a parametrized test for Compressed TextIO. Only the Java code - @DariuszAniszewski is working on Perfkit support and Dataflow runner support on his separate branch. As ZIP compression type is unsupported, I skipped it in the test.

@chamikaramj could you take a look?

@lgajowy
Copy link
Contributor Author

lgajowy commented Nov 22, 2017

retest this please

@lgajowy
Copy link
Contributor Author

lgajowy commented Nov 23, 2017

@chamikaramj (this message is a kind reminder) :)

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Sorry about the delay.

.getPerDestinationOutputFilenames().apply(Values.<String>create());
/** IO IT with no compression. */
@RunWith(JUnit4.class)
public static class UncompressedTextIOIT {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does perfkitbenchmarker-based execution (#4120) still work with these changes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works but runs all the 4 tests that are there in the file. But now I think this is probably not what we want. This won't be a problem as you suggested an even better solution in the comment below.

.apply("Write content to files", TextIO.write().to(filenamePrefix).withOutputFilenames())
.getPerDestinationOutputFilenames().apply(Values.<String>create());
/** IO IT with no compression. */
@RunWith(JUnit4.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that this test will be picked up by all test suites (including Java pre-commit), isn't it ? Not sure if we want to do that due to the size of this test. Adding to post-commit tests should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked that by running the preCommit job on my machine - those are not fired in PreCommit phase. Also, out of curiosity 😄 I investigated a little bit the project's mvn structure:

besides the @RunWith(JUnit.class) annotation that is required by JUnit, we have two mvn plugins that look (scan) for tests:

  • surefire (looks for unit tests and searches for classes with *Test suffix)
  • failsafe (looks for integration tests and searches for classes with *IT suffix)

As failsafe is not fired in the PreCommit phase, the tests are not invoked. Please look at io parent pom, where failsafe plugin is activated only when io-it profile is active.

PCollection<String> consolidatedHashcode = testFilenames
.apply("Read all files", TextIO.readAll())
.apply("Calculate hashcode", Combine.globally(new HashingFn()));
PCollection<String> testFilenames = pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and uncompressed version have the same pipeline. Can't we share to code between tests (and keep the same test class TextIOIT) and add "compression type" as a parameter to the test (a Maven -D parameter for the perfkit based runs) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's hard to do right now without modifying perfkit's code. As we checked, perfkit ignores -D parameters because builds the mvn verify command by itself from the parameters passed . I think this could be done in some future contribution. We will file a bug report in perfkit soon.

I think the best solution (at least for now) is to leave the compression type in pipeline options. We pass them to perfkit either way (through beam_it_options) and, what imo is more important, compressionType is very test specific (same as numberOfRecords). WDYT?

@lgajowy
Copy link
Contributor Author

lgajowy commented Nov 24, 2017

@chamikaramj thanks for the review! Here's another batch of changes, as commented above.

@chamikaramj
Copy link
Contributor

LGTM

@chamikaramj
Copy link
Contributor

Merged. Closing.

@lgajowy lgajowy deleted the compressed-text-io-test branch March 14, 2018 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants