[BEAM-747] Improve FileChecksumMatcher That Inconsistent With Filesystem #1189

Closed

markflyhigh
Contributor

@markflyhigh markflyhigh commented Oct 25, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

Add retries to FileChecksumMatcher when any of the following conditions occurs:

  • An IOException is raised by the filesystem.
  • No files are found in the output directory.
  • The number of files found on the filesystem does not equal the expected number, which is parsed from the shard name using a name template. The default template "SSS-of-NNN" is used when no template is specified.

By default, reads are retried up to 4 times, with a 10s sleep between retries.
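
The retry conditions listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: `RetrySketch`, `readWithRetries`, and the `Callable`-based lister are hypothetical names, and a fixed sleep stands in for whatever backoff utility the real matcher uses.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.concurrent.Callable;

/** Hypothetical sketch of retrying a file listing; not the PR's implementation. */
final class RetrySketch {
  static final int MAX_READ_RETRIES = 4;            // default stated in the PR description
  static final long DEFAULT_SLEEP_MILLIS = 10_000L; // 10s, default stated in the PR description

  /**
   * Calls {@code lister} until it returns a non-empty collection of the expected size,
   * retrying up to {@code maxRetries} times after the first attempt.
   */
  static Collection<String> readWithRetries(
      Callable<Collection<String>> lister, int expectedCount, int maxRetries, long sleepMillis)
      throws Exception {
    IOException lastFailure = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        Collection<String> files = lister.call();
        if (!files.isEmpty() && files.size() == expectedCount) {
          return files;
        }
        // Covers both "no file found" and "wrong number of files".
        lastFailure =
            new IOException("Expected " + expectedCount + " file(s) but found " + files.size());
      } catch (IOException e) {
        lastFailure = e; // the filesystem may not be consistent yet
      }
      if (attempt < maxRetries) {
        Thread.sleep(sleepMillis);
      }
    }
    throw new IOException("Gave up after " + (maxRetries + 1) + " attempts", lastFailure);
  }
}
```

A caller would pass `RetrySketch.MAX_READ_RETRIES` and `RetrySketch.DEFAULT_SLEEP_MILLIS` to get the defaults described in the PR; tests can pass a much shorter sleep.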

@markflyhigh
Contributor Author

+R: @jasonkuster

private String actualChecksum;

public FileChecksumMatcher(String checksum, String filePath) {
this(checksum, filePath, null);
Contributor

Instead of passing in null here, probably clearer to pass in DEFAULT_SHARD_TEMPLATE. Leave the null-handling in the constructor below of course.

Contributor Author

done

IOChannelUtils.resolve(tmpFolder.getRoot().getPath(), "*"));

assertThat(pResult, matcher);
}

@Test
public void testReadWithRetriesFailsWhenTemplateIncorrect()
Contributor

You have a test here that verifies incorrect template behavior, but no test which verifies any template other than the default template. Probably a good idea to add one?

Contributor Author

done


/**
* Check if total number of files is correct by comparing with the number that
* is parsed from shard name using a name template. If no template are specified,
Contributor

If no template is specified

@markflyhigh
Contributor Author

PTAL @jasonkuster

@markflyhigh markflyhigh changed the title [BEAM-747] Fix FileChecksumMatcher That Inconsistent With Filesystem [BEAM-747] Improve FileChecksumMatcher That Inconsistent With Filesystem Nov 9, 2016
@markflyhigh
Contributor Author

@kennknowles

@kennknowles
Member

@markflyhigh can you rebase to kick the tests until we get a clean execution? (or until you uncover a real bug)

@markflyhigh markflyhigh force-pushed the file-matcher-read-retry branch 2 times, most recently from 5385c10 to 65a09aa Compare November 14, 2016 19:27
@markflyhigh
Contributor Author

Rebased on the latest master, and all integration tests passed. The Jenkins build failed because of known existing bugs that are unrelated to this PR.

PTAL: @kennknowles @jasonkuster

@jasonkuster
Contributor

LGTM, assuming Jenkins failure is transient. Suggest rebasing and kicking tests off again.

@asfbot

asfbot commented Nov 21, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_Build/23/
--none--

@markflyhigh
Contributor Author

friendly ping @kennknowles

*/
@VisibleForTesting
boolean checkTotalNumOfFiles(Collection<String> files) {
for (String filePath : files.toArray(new String[0])) {
Member

Why do you need toArray here? Should be able to just use for (String filePath : files) { ... }. If not, please comment.

Contributor Author

right, should just use the simple way to iterate collections. done
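
For illustration, the shard-count check with plain for-each iteration might look like the sketch below. The class name `ShardCountSketch` is hypothetical, the pattern string is the one shown in the diff, and the method body is an assumption pieced together from the fragments quoted in this review rather than the merged code.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the file-count check; the body is assumed, not the merged code. */
final class ShardCountSketch {
  // Pattern string taken from the diff; group 1 is the total number of shards.
  static final Pattern SHARD_TEMPLATE = Pattern.compile("\\S*\\d+-of-(\\d+)$");

  /** Returns true if a file name matches the template and its shard total equals files.size(). */
  static boolean checkTotalNumOfFiles(Collection<String> files) {
    for (String filePath : files) { // a Collection is Iterable, so toArray is unnecessary
      Matcher matcher = SHARD_TEMPLATE.matcher(filePath);
      if (!matcher.matches()) {
        continue; // skip names that do not follow the template
      }
      // On the first match, compare the parsed shard total with the number of files found.
      return files.size() == Integer.parseInt(matcher.group(1));
    }
    return false; // no name matched the template
  }
}
```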

.withInitialBackoff(DEFAULT_SLEEP_DURATION)
.withMaxRetries(MAX_READ_RETRIES);

private static final String DEFAULT_SHARD_TEMPLATE = "\\S*\\d+-of-(\\d+)$";
Member

This should have type Pattern unless there is a special reason not to compile it right away.

Contributor Author

done

.withInitialBackoff(DEFAULT_SLEEP_DURATION)
.withMaxRetries(MAX_READ_RETRIES);

private static final String DEFAULT_SHARD_TEMPLATE = "\\S*\\d+-of-(\\d+)$";
Member

I recommend using named groups and allowing space via (?x) for greater clarity. Here I've just demonstrated adding also a capture for shardnum.

// Whitespace is ignored
Pattern.compile("(?x) \\S* (?<shardnum> \\d+) -of- (?<numshards> \\d+)")

This will allow your later use of .group(1) to be .group("numshards").
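
A small, self-contained sketch of this suggestion is shown here. `ShardTemplateSketch` and `numShards` are hypothetical names chosen for illustration; only `java.util.regex` is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a shard template using named groups and comments mode. */
final class ShardTemplateSketch {
  // (?x) turns on comments mode, so unescaped whitespace in the pattern is ignored.
  static final Pattern SHARD_TEMPLATE =
      Pattern.compile("(?x) \\S* (?<shardnum> \\d+) -of- (?<numshards> \\d+)");

  /** Returns the total shard count encoded in the name, or -1 if the name does not match. */
  static int numShards(String fileName) {
    Matcher matcher = SHARD_TEMPLATE.matcher(fileName);
    return matcher.matches() ? Integer.parseInt(matcher.group("numshards")) : -1;
  }
}
```

Because `\S*` is greedy, `shardnum` in this particular pattern captures only the last digit of the shard index; `numshards` is unaffected, which is the group the count check relies on.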

Contributor Author

thanks! done

this(checksum, filePath, DEFAULT_SHARD_TEMPLATE);
}

public FileChecksumMatcher(String checksum, String filePath, String shardTemplate) {
Member

Should take a Pattern unless there is a special reason not to.

Contributor Author

done

@@ -66,48 +99,105 @@ public FileChecksumMatcher(String checksum, String filePath) {

this.expectedChecksum = checksum;
this.filePath = filePath;
this.shardTemplate =
Pattern.compile(shardTemplate == null ? DEFAULT_SHARD_TEMPLATE : shardTemplate);
Member

This shouldn't check for null but should require the user to call the other constructor.

Contributor Author

I'll add a precondition check to make sure it's non-null.
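
One way the constructor pair could look with a non-null precondition is sketched below. `MatcherConstructorSketch` is a hypothetical stand-in for the real class, and `java.util.Objects.requireNonNull` is used here only to keep the example dependency-free; the PR may well use a different precondition helper.

```java
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the matcher's constructors; not the merged code. */
final class MatcherConstructorSketch {
  // Pattern string taken from the diff, compiled once.
  static final Pattern DEFAULT_SHARD_TEMPLATE = Pattern.compile("\\S*\\d+-of-(\\d+)$");

  final String expectedChecksum;
  final String filePath;
  final Pattern shardTemplate;

  /** Uses the default shard template; callers never pass null to select the default. */
  MatcherConstructorSketch(String checksum, String filePath) {
    this(checksum, filePath, DEFAULT_SHARD_TEMPLATE);
  }

  /** Requires an explicit, non-null template instead of silently falling back to the default. */
  MatcherConstructorSketch(String checksum, String filePath, Pattern shardTemplate) {
    this.expectedChecksum = Objects.requireNonNull(checksum, "checksum must not be null");
    this.filePath = Objects.requireNonNull(filePath, "filePath must not be null");
    this.shardTemplate =
        Objects.requireNonNull(shardTemplate, "shardTemplate must not be null");
  }
}
```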

LOG.info("Generated checksum for output data: {}", actualChecksum);
// Verify outputs. Checksum is computed using SHA-1 algorithm
actualChecksum = computeHash(outputs);
LOG.info("Generated checksum: {}", actualChecksum);
Member

Silence this or LOG.debug. If there is a mismatch the error thrown should contain this information.
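
As a rough illustration of putting the checksum into the failure message rather than the info log, consider the sketch below. Everything in it is an assumption made for the example: the class name `ChecksumMismatchSketch`, the use of `MessageDigest` for SHA-1, sorting the lines before hashing, and the wording of the message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: compute a SHA-1 checksum and report it only on mismatch. */
final class ChecksumMismatchSketch {
  /** Hex-encoded SHA-1 over the sorted lines (sorting is an assumption of this sketch). */
  static String computeHash(List<String> lines) {
    List<String> sorted = new ArrayList<>(lines);
    Collections.sort(sorted);
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      for (String line : sorted) {
        digest.update(line.getBytes(StandardCharsets.UTF_8));
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
    }
  }

  /** Message for the failure path, so the actual checksum need not be logged at info level. */
  static String describeMismatch(String expected, String actual) {
    return "expected checksum " + expected + " but was " + actual;
  }
}
```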

Contributor Author

done

LOG.info(
"[{} of {}] Read {} lines from file: {}", i, files.size() - 1, lines.size(), file);
"[{} of {}] Read {} lines from file: {}", i, files.size(), lines.size(), file);
Member

Silence, or LOG.debug.

try {
// Match inputPath which may contains glob
Collection<String> files = factory.match(filePath);
LOG.info("Found {} file(s) by matching the path: {}", files.size(), filePath);
Member

Silence or LOG.debug

// once match, extract total number of shards and compare to file list
return files.size() == Integer.parseInt(matcher.group(1));
}
LOG.warn("No name matches the shard template: {}", shardTemplate);
Member

I believe this should either be an error or remain silent. When is this expected to be OK and when is it not OK?

continue;
}
// once match, extract total number of shards and compare to file list
return files.size() == Integer.parseInt(matcher.group(1));
Member

See above how to make this more self-explanatory as .group("numshards"). Either way, requires a brief comment about the expectations of the shard template.

Contributor Author

thanks! I'll add javadoc to explain more for customized shard template.

@markflyhigh
Contributor Author

PTAL @kennknowles

Member

@kennknowles kennknowles left a comment

Nice. I'll merge. Thanks!

@asfgit asfgit closed this in b75a764 Nov 30, 2016
4 participants