-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-1715] Fix Python WordCount on Dataflow Mismatch #2244
[BEAM-1715] Fix Python WordCount on Dataflow Mismatch #2244
Conversation
+R: @aaltay |
@patch.object(ChannelFactory, 'glob') | ||
def test_file_checksum_matcher_read_failed(self, mock_glob): | ||
def test_file_checksum_matcher_read_failed(self, mock_glob, _): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is the right order and not mock_time, mock_glob?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, otherwise mock_glob will be wrong and following asserts must failed.
for path in matched_path: | ||
with ChannelFactory.open(path, 'r') as f: | ||
for line in f: | ||
read_lines.append(line) | ||
return read_lines | ||
|
||
def _matches(self, _): | ||
# Wait to have output file ready on FS | ||
wait_time = 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can yo make this configurable, I would assume that not every FS will need such a wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense. I can move it to object constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is problem coming from GCS, I think it may better to be configured by test writer in IT when they use GCS as output storage. I'd rather leave default sleep to 0.
Refer to this link for build results (access rights to CI server needed): |
Run Python PostCommit |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Do you know why the postcommit test failed? |
From the log in BEAM-1715, the first time when matcher read from gcs only got 3179 lines which is less than expected (4784 lines). I guess the output file is not fully staged at that time, so waiting for few seconds will help. |
Run Python PostCommit |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Run Python PostCommit |
Refer to this link for build results (access rights to CI server needed): |
PTAL @aaltay |
@@ -43,9 +43,11 @@ def test_wordcount_it(self): | |||
output = '/'.join([test_pipeline.get_option('output'), | |||
test_pipeline.get_option('job_name'), | |||
'results']) | |||
sleep_secs = int(test_pipeline.get_option('sleep_secs')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean this sleep_secs
to be an optional argument or a required one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep_secs
should be optional for wordcount_it_test
. You are right, this will break test if no sleep_secs
specified. Will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Run Python PostCommit |
Refer to this link for build results (access rights to CI server needed): Failed Tests: 1beam_PreCommit_Java_MavenInstall/org.apache.beam:beam-runners-spark: 1
--none-- |
Refer to this link for build results (access rights to CI server needed): |
PTAL @aaltay |
@@ -43,9 +43,12 @@ def test_wordcount_it(self): | |||
output = '/'.join([test_pipeline.get_option('output'), | |||
test_pipeline.get_option('job_name'), | |||
'results']) | |||
arg_sleep_secs = test_pipeline.get_option('sleep_secs') | |||
sleep_time = int(arg_sleep_secs) if arg_sleep_secs is not None else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep_time
-> sleep_secs
for consistency.
self.sleep_secs = sleep_secs | ||
else: | ||
raise ValueError('Sleep seconds, if received, must be int. ' | ||
'But received: %r' % sleep_secs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding the type information here, might be useful.
PTAL @aaltay |
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE[...truncated 580.74 KB...] at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:336) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: org.apache.maven.plugin.MojoFailureException: You have 1 Checkstyle violation. at org.apache.maven.plugin.checkstyle.CheckstyleViolationCheckMojo.execute(CheckstyleViolationCheckMojo.java:588) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) ... 31 more2017-03-22T22:57:40.976 [ERROR] 2017-03-22T22:57:40.976 [ERROR] Re-run Maven using the -X switch to enable full debug logging.2017-03-22T22:57:40.976 [ERROR] 2017-03-22T22:57:40.976 [ERROR] For more information about the errors and possible solutions, please read the following articles:2017-03-22T22:57:40.976 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException2017-03-22T22:57:40.976 [ERROR] 2017-03-22T22:57:40.976 [ERROR] After correcting the problems, you can resume the build with the command2017-03-22T22:57:40.976 [ERROR] mvn -rf :beam-runners-core-javachannel stoppedSetting status of 0104081 to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8679/ and message: 'Build finished. 'Using context: Jenkins: Maven clean install--none-- |
retest this please |
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE[...truncated 826.43 KB...] at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:336) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: org.apache.maven.plugin.MojoFailureException: You have 1 Checkstyle violation. at org.apache.maven.plugin.checkstyle.CheckstyleViolationCheckMojo.execute(CheckstyleViolationCheckMojo.java:588) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) ... 31 more2017-03-22T23:20:20.140 [ERROR] 2017-03-22T23:20:20.140 [ERROR] Re-run Maven using the -X switch to enable full debug logging.2017-03-22T23:20:20.140 [ERROR] 2017-03-22T23:20:20.140 [ERROR] For more information about the errors and possible solutions, please read the following articles:2017-03-22T23:20:20.140 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException2017-03-22T23:20:20.140 [ERROR] 2017-03-22T23:20:20.140 [ERROR] After correcting the problems, you can resume the build with the command2017-03-22T23:20:20.141 [ERROR] mvn -rf :beam-runners-core-javachannel stoppedSetting status of 0104081 to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8681/ and message: 'Build finished. 'Using context: Jenkins: Maven clean install--none-- |
0104081
to
3eaf3b4
Compare
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
LGTM |
Refer to this link for build results (access rights to CI server needed): |
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull request
mvn clean verify
. (Even better, enableTravis-CI on your fork and ensure the whole test matrix passes).
<Jira issue #>
in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.