[BEAM-5442] Store duplicate unknown options in a list argument #6600

Merged
merged 1 commit into from Oct 10, 2018

5 participants
@mxm
Contributor

commented Oct 8, 2018

As of BEAM-5442, we parse unknown pipeline options to pass them on to the actual
Runner. If the same unknown option appeared multiple times, it would be
registered a second time and throw an exception.

This change instead collects the values of a duplicated option into a list.
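
To illustrate the behavior described above, here is a minimal sketch. It is not the code from this PR. It assumes the unknown options are registered with Python's standard `argparse` parser, which rejects a second `add_argument` call for the same option string with a "conflicting option string" error. The helper name `parse_unknown_options` is hypothetical, and the sketch only covers options given in the `--name=value` form.

```python
import argparse


def parse_unknown_options(flags):
    """Hypothetical helper: collect options the parser does not know.

    Repeated options are accumulated into a list instead of being
    registered twice, which argparse would reject as a conflict.
    """
    parser = argparse.ArgumentParser()
    # Nothing is registered yet, so every flag comes back as unknown.
    _, unknown = parser.parse_known_args(flags)

    registered = set()
    for arg in unknown:
        if not arg.startswith('--'):
            continue
        name = arg.split('=', 1)[0]
        if name not in registered:
            # action='append' gathers every occurrence into one list.
            parser.add_argument(name, action='append')
            registered.add(name)

    parsed = vars(parser.parse_args(flags))
    # Unwrap options that appeared only once back to a plain string.
    return {key: values[0] if len(values) == 1 else values
            for key, values in parsed.items()}
```

With this sketch, `parse_unknown_options(['--foo=1', '--foo=2', '--bar=x'])` yields a dict equal to `{'foo': ['1', '2'], 'bar': 'x'}`. Registering each name once and using `action='append'` is one way to avoid the duplicate-registration error; the actual change may differ in detail.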

@mxm

Contributor Author

commented Oct 8, 2018

Run Python PostCommit

@mxm requested a review from tweise on Oct 8, 2018

@mxm

Contributor Author

commented Oct 8, 2018

@mxm

Contributor Author

commented Oct 8, 2018

PostCommit appears to be flaky. Can't spot any errors related to this PR here: https://builds.apache.org/job/beam_PostCommit_Python_Verify_PR/138/

@mxm

Contributor Author

commented Oct 8, 2018

Run Python PostCommit

@mxm

Contributor Author

commented Oct 8, 2018

Has anybody seen this before?

NotFound: 404 Not found: Table apache-beam-testing:game_stats_it_dataset1539000789.game_stats_sessions (POST https://www.googleapis.com/bigquery/v2/projects/apache-beam-testing/queries)

All failures:

======================================================================
ERROR: test_leader_board_it (apache_beam.examples.complete.game.leader_board_it_test.LeaderBoardIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py", line 161, in test_leader_board_it
    self.test_pipeline.get_full_options_as_args(**extra_opts))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/leader_board.py", line 345, in run
    'total_score': 'INTEGER',
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 423, in __exit__
    self.run().wait_until_finish()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 403, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py", line 43, in assert_that
    _assert_match(actual=arg1, matcher=arg2, reason=arg3)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py", line 49, in _assert_match
    if not matcher.matches(actual):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
    if not matcher.matches(item):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py", line 28, in matches
    match_result = self._matches(item)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py", line 81, in _matches
    response = self._query_with_retry(bigquery_client)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 197, in wrapper
    raise_with_traceback(exn, exn_traceback)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py", line 98, in _query_with_retry
    query.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/google/cloud/bigquery/query.py", line 381, in run
    method='POST', path=path, data=self._build_resource())
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/google/cloud/_http.py", line 303, in api_request
    error_info=method + ' ' + url)
NotFound: 404 Not found: Table apache-beam-testing:leader_board_it_dataset1539000788.leader_board_users (POST https://www.googleapis.com/bigquery/v2/projects/apache-beam-testing/queries)
-------------------- >> begin captured stdout << ---------------------
Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_05_13_15-11047745058497888751?project=apache-beam-testing.

--------------------- >> end captured stdout << ----------------------

======================================================================
ERROR: test_leader_board_it (apache_beam.examples.complete.game.leader_board_it_test.LeaderBoardIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/utils.py", line 61, in delete_bq_table
    (table, project, dataset))
GcpTestIOError: Failed to cleanup. Bigquery table leader_board_teams doesn't exist in project apache-beam-testing, dataset leader_board_it_dataset1539000788.

======================================================================
ERROR: test_leader_board_it (apache_beam.examples.complete.game.leader_board_it_test.LeaderBoardIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/utils.py", line 61, in delete_bq_table
    (table, project, dataset))
GcpTestIOError: Failed to cleanup. Bigquery table leader_board_users doesn't exist in project apache-beam-testing, dataset leader_board_it_dataset1539000788.

======================================================================
ERROR: test_game_stats_it (apache_beam.examples.complete.game.game_stats_it_test.GameStatsIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py", line 152, in test_game_stats_it
    self.test_pipeline.get_full_options_as_args(**extra_opts))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/game_stats.py", line 390, in run
    'mean_duration': 'FLOAT',
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 423, in __exit__
    self.run().wait_until_finish()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 403, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py", line 43, in assert_that
    _assert_match(actual=arg1, matcher=arg2, reason=arg3)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py", line 49, in _assert_match
    if not matcher.matches(actual):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
    if not matcher.matches(item):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py", line 28, in matches
    match_result = self._matches(item)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py", line 81, in _matches
    response = self._query_with_retry(bigquery_client)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 197, in wrapper
    raise_with_traceback(exn, exn_traceback)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py", line 98, in _query_with_retry
    query.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/google/cloud/bigquery/query.py", line 381, in run
    method='POST', path=path, data=self._build_resource())
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/google/cloud/_http.py", line 303, in api_request
    error_info=method + ' ' + url)
NotFound: 404 Not found: Table apache-beam-testing:game_stats_it_dataset1539000789.game_stats_sessions (POST https://www.googleapis.com/bigquery/v2/projects/apache-beam-testing/queries)
-------------------- >> begin captured stdout << ---------------------
Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_05_13_15-15161388551688112062?project=apache-beam-testing.

--------------------- >> end captured stdout << ----------------------

======================================================================
ERROR: test_game_stats_it (apache_beam.examples.complete.game.game_stats_it_test.GameStatsIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/utils.py", line 61, in delete_bq_table
    (table, project, dataset))
GcpTestIOError: Failed to cleanup. Bigquery table game_stats_teams doesn't exist in project apache-beam-testing, dataset game_stats_it_dataset1539000789.

======================================================================
ERROR: test_game_stats_it (apache_beam.examples.complete.game.game_stats_it_test.GameStatsIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/utils.py", line 61, in delete_bq_table
    (table, project, dataset))
GcpTestIOError: Failed to cleanup. Bigquery table game_stats_sessions doesn't exist in project apache-beam-testing, dataset game_stats_it_dataset1539000789.

======================================================================
ERROR: test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/nose/plugins/multiprocess.py", line 812, in run
    test(orig)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/nose/case.py", line 45, in __call__
    return self.run(*arg, **kwarg)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/nose/case.py", line 133, in run
    self.runTest(result)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/nose/case.py", line 151, in runTest
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_it_test.py", line 79, in test_wordcount_fnapi_it
    on_success_matcher=PipelineStateMatcher()))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_fnapi.py", line 125, in run
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 403, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 68, in run_pipeline
    self.result.cancel()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1188, in cancel
    return self.state
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1128, in state
    self._update_job()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1084, in _update_job
    self._job = self._runner.dataflow_client.get_job(self.job_id())
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 629, in get_job
    response = self._client.projects_locations_jobs.Get(request)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 604, in Get
    config, request, global_params=global_params)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 720, in _RunMethod
    http, http_request, **opts)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 346, in MakeRequest
    check_response_func=check_response_func)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 396, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1694, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1434, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1390, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 756, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 643, in read
    v = self._sslobj.read(len)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/nose/plugins/multiprocess.py", line 276, in signalhandler
    raise TimedOutException()
TimedOutException: 'test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT)'

======================================================================
FAIL: test_streaming_wordcount_it (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", line 104, in test_streaming_wordcount_it
    self.test_pipeline.get_full_options_as_args(**extra_opts))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", line 101, in run
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 500 messages.)
     but: Expected 500 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [('403: 1', 1), ('481: 1', 1), ('207: 1', 1), ('187: 1', 1), ('401: 1', 1), ('356: 1', 1), ('15: 1', 1), ('297: 1', 1), ('28: 1', 1), ('426: 1', 1), ('3: 1', 1), ('125: 1', 1), ('469: 1', 1), ('436: 1', 1), ('126: 1', 1), ('57: 1', 1), ('477: 1', 1), ('454: 1', 1), ('458: 1', 1), ('180: 1', 1), ('100: 1', 1), ('161: 1', 1), ('99: 1', 1), ('326: 1', 1), ('96: 1', 1), ('233: 1', 1), ('434: 1', 1), ('61: 1', 1), ('21: 1', 1), ('433: 1', 1), ('149: 1', 1), ('324: 1', 1), ('384: 1', 1), ('470: 1', 1), ('289: 1', 1), ('347: 1', 1), ('0: 1', 1), ('371: 1', 1), ('491: 1', 1), ('36: 1', 1), ('26: 1', 1), ('213: 1', 1), ('87: 1', 1), ('9: 1', 1), ('75: 1', 1), ('278: 1', 1), ('398: 1', 1), ('80: 1', 1), ('317: 1', 1), ('298: 1', 1), ('45: 1', 1), ('238: 1', 1), ('242: 1', 1), ('360: 1', 1), ('420: 1', 1), ('443: 1', 1), ('336: 1', 1), ('362: 1', 1), ('396: 1', 1), ('206: 1', 1), ('40: 1', 1), ('244: 1', 1), ('441: 1', 1), ('389: 1', 1), ('190: 1', 1), ('223: 1', 1), ('170: 1', 1), ('115: 1', 1), ('273: 1', 1), ('373: 1', 1), ('456: 1', 1), ('164: 1', 1), ('224: 1', 1), ('374: 1', 1), ('59: 1', 1), ('267: 1', 1), ('12: 1', 1), ('229: 1', 1), ('154: 1', 1), ('400: 1', 1), ('235: 1', 1), ('63: 1', 1), ('258: 1', 1), ('460: 1', 1), ('480: 1', 1), ('200: 1', 1), ('494: 1', 1), ('214: 1', 1), ('300: 1', 1), ('216: 1', 1), ('127: 1', 1), ('287: 1', 1), ('285: 1', 1), ('23: 1', 1), ('414: 1', 1), ('34: 1', 1), ('121: 1', 1), ('106: 1', 1), ('130: 1', 1), ('315: 1', 1), ('101: 1', 1), ('378: 1', 1), ('131: 1', 1), ('181: 1', 1), ('142: 1', 1), ('257: 1', 1), ('338: 1', 1), ('325: 1', 1), ('176: 1', 1), ('117: 1', 1), ('305: 1', 1), ('97: 1', 1), ('380: 1', 1), ('411: 1', 1), ('72: 1', 1), ('153: 1', 1), ('151: 1', 1), ('43: 1', 1), ('209: 1', 1), ('357: 1', 1), ('471: 1', 1), ('166: 1', 1), ('8: 1', 1), ('340: 1', 1), ('299: 1', 1), ('486: 1', 1), ('44: 1', 1), ('54: 1', 1), ('140: 1', 1), ('6: 1', 1), ('218: 1', 1), ('342: 1', 1), ('318: 1', 1), ('394: 
1', 1), ('195: 1', 1), ('440: 1', 1), ('416: 1', 1), ('418: 1', 1), ('361: 1', 1), ('496: 1', 1), ('435: 1', 1), ('118: 1', 1), ('392: 1', 1), ('197: 1', 1), ('85: 1', 1), ('309: 1', 1), ('263: 1', 1), ('352: 1', 1), ('14: 1', 1), ('252: 1', 1), ('69: 1', 1), ('331: 1', 1), ('457: 1', 1), ('228: 1', 1), ('465: 1', 1), ('29: 1', 1), ('13: 1', 1), ('91: 1', 1), ('221: 1', 1), ('382: 1', 1), ('259: 1', 1), ('62: 1', 1), ('215: 1', 1), ('241: 1', 1), ('359: 1', 1), ('355: 1', 1), ('286: 1', 1), ('73: 1', 1), ('296: 1', 1), ('372: 1', 1), ('70: 1', 1), ('35: 1', 1), ('328: 1', 1), ('312: 1', 1), ('314: 1', 1), ('406: 1', 1), ('367: 1', 1), ('51: 1', 1), ('275: 1', 1), ('141: 1', 1), ('276: 1', 1), ('302: 1', 1), ('186: 1', 1), ('104: 1', 1), ('261: 1', 1), ('160: 1', 1), ('116: 1', 1), ('294: 1', 1), ('333: 1', 1), ('167: 1', 1), ('293: 1', 1), ('184: 1', 1), ('152: 1', 1), ('124: 1', 1), ('16: 1', 1), ('250: 1', 1), ('169: 1', 1), ('19: 1', 1), ('231: 1', 1), ('476: 1', 1), ('343: 1', 1), ('490: 1', 1), ('147: 1', 1), ('98: 1', 1), ('22: 1', 1), ('58: 1', 1), ('138: 1', 1), ('219: 1', 1), ('474: 1', 1), ('323: 1', 1), ('280: 1', 1), ('196: 1', 1), ('379: 1', 1), ('417: 1', 1), ('499: 1', 1), ('421: 1', 1), ('135: 1', 1), ('129: 1', 1), ('84: 1', 1), ('388: 1', 1), ('449: 1', 1), ('447: 1', 1), ('368: 1', 1), ('430: 1', 1), ('466: 1', 1), ('432: 1', 1), ('172: 1', 1), ('31: 1', 1), ('47: 1', 1), ('220: 1', 1), ('245: 1', 1), ('383: 1', 1), ('423: 1', 1), ('158: 1', 1), ('234: 1', 1), ('424: 1', 1), ('397: 1', 1), ('71: 1', 1), ('358: 1', 1), ('354: 1', 1), ('482: 1', 1), ('77: 1', 1), ('143: 1', 1), ('204: 1', 1), ('78: 1', 1), ('277: 1', 1), ('377: 1', 1), ('303: 1', 1), ('375: 1', 1), ('55: 1', 1), ('271: 1', 1), ('198: 1', 1), ('82: 1', 1), ('185: 1', 1), ('256: 1', 1), ('105: 1', 1), ('120: 1', 1), ('260: 1', 1), ('445: 1', 1), ('334: 1', 1), ('177: 1', 1), ('251: 1', 1), ('349: 1', 1), ('332: 1', 1), ('428: 1', 1), ('64: 1', 1), ('212: 1', 1), ('95: 1', 1), ('329: 
1', 1), ('67: 1', 1), ('468: 1', 1), ('475: 1', 1), ('162: 1', 1), ('226: 1', 1), ('247: 1', 1), ('455: 1', 1), ('487: 1', 1), ('203: 1', 1), ('24: 1', 1), ('5: 1', 1), ('136: 1', 1), ('201: 1', 1), ('134: 1', 1), ('42: 1', 1), ('407: 1', 1), ('236: 1', 1), ('295: 1', 1), ('132: 1', 1), ('27: 1', 1), ('410: 1', 1), ('369: 1', 1), ('492: 1', 1), ('119: 1', 1), ('210: 1', 1), ('191: 1', 1), ('412: 1', 1), ('345: 1', 1), ('448: 1', 1), ('431: 1', 1), ('269: 1', 1), ('110: 1', 1), ('155: 1', 1), ('248: 1', 1), ('113: 1', 1), ('103: 1', 1), ('319: 1', 1), ('122: 1', 1), ('316: 1', 1), ('459: 1', 1), ('393: 1', 1), ('402: 1', 1), ('17: 1', 1), ('139: 1', 1), ('90: 1', 1), ('461: 1', 1), ('322: 1', 1), ('175: 1', 1), ('50: 1', 1), ('353: 1', 1), ('4: 1', 1), ('79: 1', 1), ('188: 1', 1), ('351: 1', 1), ('409: 1', 1), ('405: 1', 1), ('376: 1', 1), ('485: 1', 1), ('339: 1', 1), ('366: 1', 1), ('422: 1', 1), ('7: 1', 1), ('32: 1', 1), ('478: 1', 1), ('327: 1', 1), ('193: 1', 1), ('442: 1', 1), ('304: 1', 1), ('444: 1', 1), ('308: 1', 1), ('66: 1', 1), ('437: 1', 1), ('450: 1', 1), ('385: 1', 1), ('211: 1', 1), ('38: 1', 1), ('60: 1', 1), ('386: 1', 1), ('452: 1', 1), ('254: 1', 1), ('89: 1', 1), ('391: 1', 1), ('230: 1', 1), ('163: 1', 1), ('246: 1', 1), ('364: 1', 1), ('463: 1', 1), ('237: 1', 1), ('497: 1', 1), ('363: 1', 1), ('2: 1', 1), ('183: 1', 1), ('202: 1', 1), ('274: 1', 1), ('320: 1', 1), ('11: 1', 1), ('49: 1', 1), ('92: 1', 1), ('239: 1', 1), ('46: 1', 1), ('52: 1', 1), ('451: 1', 1), ('281: 1', 1), ('25: 1', 1), ('413: 1', 1), ('493: 1', 1), ('217: 1', 1), ('128: 1', 1), ('268: 1', 1), ('473: 1', 1), ('168: 1', 1), ('112: 1', 1), ('290: 1', 1), ('370: 1', 1), ('313: 1', 1), ('292: 1', 1), ('18: 1', 1), ('429: 1', 1), ('266: 1', 1), ('156: 1', 1), ('86: 1', 1), ('265: 1', 1), ('114: 1', 1), ('311: 1', 1), ('137: 1', 1), ('279: 1', 1), ('173: 1', 1), ('123: 1', 1), ('283: 1', 1), ('306: 1', 1), ('174: 1', 1), ('284: 1', 1), ('438: 1', 1), ('20: 1', 1), ('498: 1', 
1), ('37: 1', 1), ('222: 1', 1), ('179: 1', 1), ('350: 1', 1), ('165: 1', 1), ('108: 1', 1), ('330: 1', 1), ('483: 1', 1), ('150: 1', 1), ('208: 1', 1), ('484: 1', 1), ('144: 1', 1), ('146: 1', 1), ('408: 1', 1), ('344: 1', 1), ('404: 1', 1), ('489: 1', 1), ('495: 1', 1), ('76: 1', 1), ('194: 1', 1), ('83: 1', 1), ('192: 1', 1), ('288: 1', 1), ('387: 1', 1), ('427: 1', 1), ('453: 1', 1), ('425: 1', 1), ('205: 1', 1), ('189: 1', 1), ('107: 1', 1), ('381: 1', 1), ('39: 1', 1), ('255: 1', 1), ('94: 1', 1), ('270: 1', 1), ('390: 1', 1), ('88: 1', 1), ('341: 1', 1), ('464: 1', 1), ('227: 1', 1), ('93: 1', 1), ('419: 1', 1), ('462: 1', 1), ('56: 1', 1), ('159: 1', 1), ('321: 1', 1), ('479: 1', 1), ('53: 1', 1), ('232: 1', 1), ('148: 1', 1), ('264: 1', 1), ('65: 1', 1), ('310: 1', 1), ('346: 1', 1), ('348: 1', 1), ('291: 1', 1), ('365: 1', 1), ('262: 1', 1), ('439: 1', 1), ('74: 1', 1), ('249: 1', 1), ('133: 1', 1), ('301: 1', 1), ('102: 1', 1), ('307: 1', 1), ('415: 1', 1), ('178: 1', 1), ('335: 1', 1), ('182: 1', 1), ('81: 1', 1), ('399: 1', 1), ('30: 1', 1), ('33: 1', 1), ('41: 1', 1), ('243: 1', 1), ('157: 1', 1), ('337: 1', 1), ('10: 1', 1), ('68: 1', 1), ('171: 1', 1), ('253: 1', 1), ('240: 1', 1), ('272: 1', 1), ('446: 1', 1), ('109: 1', 1), ('395: 1', 1), ('48: 1', 1), ('145: 1', 1), ('111: 1', 1), ('488: 1', 1), ('1: 1', 1), ('472: 1', 1), ('225: 1', 1), ('199: 1', 1), ('282: 1', 1), ('467: 1', 1)]
  Unexpected: []

-------------------- >> begin captured stdout << ---------------------
Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_05_19_45-7228284334284390878?project=apache-beam-testing.

--------------------- >> end captured stdout << ----------------------

======================================================================
FAIL: test_streaming_data_only (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 171, in test_streaming_data_only
    self._test_streaming(with_attributes=False)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
    timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 messages.)
     but: Expected 2 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [('data002-seen', 1), ('data001-seen', 1)]
  Unexpected: []

-------------------- >> begin captured stdout << ---------------------
Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_05_28_21-2642470300035457157?project=apache-beam-testing.

--------------------- >> end captured stdout << ----------------------

======================================================================
FAIL: test_streaming_with_attributes (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
    self._test_streaming(with_attributes=True)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
    timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
AssertionError: 
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 messages.)
     but: Expected 2 messages. Got 0 messages. Diffs (item, count):
  Expected but not in actual: [(PubsubMessage(data001-seen, {'processed': 'IT'}), 1), (PubsubMessage(data002-seen, {'timestamp_out': '2018-07-11T02:02:50.149000Z', 'processed': 'IT'}), 1)]
  Unexpected: []
  Stripped attributes: ['id', 'timestamp']

-------------------- >> begin captured stdout << ---------------------
Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_05_36_45-10021422568373532354?project=apache-beam-testing.

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
XML: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/nosetests.xml
----------------------------------------------------------------------
Ran 17 tests in 3470.875s

FAILED (errors=7, failures=3)
@mxm

Contributor Author

commented Oct 8, 2018

Run Python PostCommit

@mxm mxm force-pushed the mxm:BEAM-5442 branch from e22a416 to 709673d Oct 10, 2018

[BEAM-5442] Store duplicate unknown options in a list argument
As of BEAM-5442, we parse unknown pipeline options to pass them on to the actual
Runner. If the same unknown option appeared multiple times, it would be
registered a second time and throw an exception.

This change converts duplicate items to a value list.
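
For illustration only, a minimal sketch of the failure described above, using plain argparse rather than the actual Beam code. The flag name `--jar` is a made-up example, and the assumption is that each unknown flag gets registered with the parser as it is encountered:

```python
import argparse

parser = argparse.ArgumentParser()

# Hypothetical unknown flags as seen on the command line; "--jar" appears twice.
unknown_flags = ['--jar', '--jar']

error = None
for flag in unknown_flags:
    try:
        # Registering the same option string a second time is rejected by argparse.
        parser.add_argument(flag)
    except argparse.ArgumentError as exc:
        error = exc

print(error)
```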

@mxm mxm force-pushed the mxm:BEAM-5442 branch from 709673d to f274757 Oct 10, 2018

@mxm

Contributor Author

commented Oct 10, 2018

Run Python PostCommit

@tweise tweise merged commit 3f803f1 into apache:master Oct 10, 2018

2 checks passed

Python ("Run Python PreCommit") SUCCESS
Python SDK PostCommit Tests SUCCESS

@mxm mxm deleted the mxm:BEAM-5442 branch Oct 10, 2018

@charlesccychen

Contributor

commented Oct 12, 2018

Is there an alternative to this approach of inferring a list from duplicates and automatically populating these options into the parser? It makes certain things brittle: for example, every time we access such an option, we now need to branch on whether it was passed once or more than once.

Can we do this parsing on the runner side instead, where instead of inferring this behavior, we pass a list of unparsed options? That way, the runner could parse them as it wishes, and we would not need this "magic" behavior.
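
To make the branching concern concrete, here is a rough sketch. The helper names `infer_unknown_options` and `as_list` are hypothetical and not part of Beam, and the sketch assumes every flag has the form `--name value`:

```python
def infer_unknown_options(argv):
    """Collect "--name value" pairs, turning repeated names into lists."""
    options = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        name = flag.lstrip('-')
        if name not in options:
            options[name] = value                    # first occurrence: plain string
        elif isinstance(options[name], list):
            options[name].append(value)              # third or later occurrence
        else:
            options[name] = [options[name], value]   # second occurrence: becomes a list
    return options


def as_list(value):
    """The branch every consumer now needs before using such an option."""
    return value if isinstance(value, list) else [value]


once = infer_unknown_options(['--jar', 'a.jar'])
twice = infer_unknown_options(['--jar', 'a.jar', '--jar', 'b.jar'])
print(type(once['jar']).__name__, type(twice['jar']).__name__)
```

The type of the same option depends on how many times it was passed, which is the special casing mentioned above.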

CC: @robertwb @aaltay

@robertwb

Contributor

commented Oct 12, 2018

Personally, I think the options should be parsed at most once, by the initial SDK, and from then on be passed around in a more structured format that doesn't involve re-parsing. Options and flag parsing are currently way too coupled in both Java and Python.

@robertwb

Contributor

commented Oct 12, 2018

(I suppose this doesn't handle the "SDK doesn't know the option, but the runner does, and the SDK wants to take this option as a command line flag" case. But this allows for no validation. Perhaps we should have a single "runner options" flag that has nested structure to handle this.)
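
One possible shape for such a flag, sketched with plain argparse. The flag name `--runner_options`, the JSON encoding, and the keys `parallelism` and `jar` are all assumptions for illustration, not an existing Beam option:

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('--runner', default='DirectRunner')
# Hypothetical flag: a single JSON document carrying all runner-specific options.
parser.add_argument('--runner_options', type=json.loads, default={})

options = parser.parse_args([
    '--runner', 'FlinkRunner',
    '--runner_options', '{"parallelism": 4, "jar": ["a.jar", "b.jar"]}',
])
print(options.runner_options)
```

Lists are explicit in the nested value, so nothing has to be inferred from repeated flags.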

@mxm

Contributor Author

commented Oct 12, 2018

@charlesccychen Can you elaborate on what is problematic about this approach? The parsing is cheap and no different from the parsing of the other options. We only parse options which are not recognized by the SDK, in order to pass them on to the Runner. We don't change any builtin options. Handling list options became a necessity. If that is somehow problematic, we could also ignore list options and only allow single-value options.

My first approach was to have a "Runner" option which stores options unrecognized by the SDK, but IMHO it is cleaner to pass them as top-level options because that is what they are. If we had a separate flag, the Runner would merge them all together again. Perhaps we could have an "SDK options" flag.

I'm open to suggestions to improve on the current approach.

@charlesccychen

Contributor

commented Oct 12, 2018

Thanks. My concern wasn't about the runtime cost. It introduces an inconsistency where singly and multiply passed options are treated differently (which requires special casing when using the value), and it also promotes the use of the "magical" behavior as opposed to explicit definition of pipeline options. We should not have users depend on this, since it would discourage explicit definition of pipeline options in the user pipeline. I would therefore suggest passing "unused options" which can be parsed by the runner using a (potentially runner-specific) explicitly-defined parser. What do you think?

@charlesccychen

Contributor

commented Oct 12, 2018

To clarify again, I believe that with the current approach, a user no longer needs to explicitly define a pipeline option to use it--they can just pass it (as --myparam abc) and it will be "magically" available for use (as options.myparam). This is not good for backwards compatibility, since the user should not rely on this implementation detail, and it will become problematic if we decide to change this after the user starts using it. I would therefore prefer to isolate these options so that they are at least not user-visible.
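
A rough sketch of the effect, using plain argparse and assuming unknown flags are registered with the parser automatically. The function name `parse_with_auto_registration` is hypothetical and this is not the actual Beam implementation:

```python
import argparse


def parse_with_auto_registration(argv):
    """Parse argv, registering any flag the parser does not already know."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--runner', default='DirectRunner')  # an explicitly defined option
    _, unknown = parser.parse_known_args(argv)
    for arg in unknown:
        if arg.startswith('--'):
            # The user never defined this option, yet it becomes a parser argument.
            parser.add_argument(arg.split('=', 1)[0])
    return parser.parse_args(argv)


options = parse_with_auto_registration(['--myparam', 'abc'])
print(options.myparam)
```

Nothing in the user's pipeline declares `myparam`, but it is readable as `options.myparam` all the same.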

@charlesccychen

Contributor

commented Oct 12, 2018

CC: @tweise

@aaltay

Contributor

commented Oct 12, 2018

Should we revert this change (and the 2 related changes before it) on the release branch? I think we should address @charlesccychen's concerns before we release with these changes. (Perhaps a mailing list discussion would help.)

@mxm

Contributor Author

commented Oct 12, 2018

@charlesccychen Fair points. Let's fix this more programmatically then. Builtin SDK options and user-defined options should be the only top-level options. "Unknown" options should only be available to the Runners via a separate option list which is transmitted through the Proto alongside the regular options.
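
Roughly what that separation could look like, sketched with plain argparse. The function names and the `--jar` flag are hypothetical, and the transport through the Proto is left out:

```python
import argparse


def split_options(argv):
    """SDK side: parse known options once, keep the rest as an untouched list."""
    sdk_parser = argparse.ArgumentParser()
    sdk_parser.add_argument('--runner', default='DirectRunner')
    known, unparsed = sdk_parser.parse_known_args(argv)
    return vars(known), unparsed


def parse_runner_options(unparsed):
    """Runner side: an explicitly defined parser decides which flags are lists."""
    runner_parser = argparse.ArgumentParser()
    runner_parser.add_argument('--jar', action='append', default=[])
    return runner_parser.parse_args(unparsed)


sdk_options, unparsed = split_options(
    ['--runner', 'FlinkRunner', '--jar', 'a.jar', '--jar', 'b.jar'])
runner_options = parse_runner_options(unparsed)
print(sdk_options, unparsed, runner_options.jar)
```

The unknown flags never appear as top-level SDK options, and the runner's own parser declares that `--jar` may repeat.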

@aaltay +1 Would make sense to revert this on the release branch.

charlesccychen added a commit to charlesccychen/beam that referenced this pull request Oct 12, 2018

Revert "Merge pull request apache#6600: [BEAM-5442] Store duplicate u…
…nknown options in a list argument"

This reverts commit 3f803f1, reversing
changes made to 1dca49f.

@charlesccychen charlesccychen referenced this pull request Oct 12, 2018

Merged

Revert PRs #6557 #6589 #6600 #6675

0 of 2 tasks complete

charlesccychen added a commit that referenced this pull request Oct 12, 2018

charlesccychen added a commit that referenced this pull request Oct 13, 2018

tweise added a commit that referenced this pull request Oct 13, 2018

chamikaramj added a commit to chamikaramj/beam that referenced this pull request Nov 21, 2018

chamikaramj added a commit to chamikaramj/beam that referenced this pull request Nov 22, 2018

chamikaramj added a commit to chamikaramj/beam that referenced this pull request Nov 22, 2018

charlesccychen added a commit that referenced this pull request Nov 22, 2018

Merge pull request #7119 from chamikaramj/revert_pr_6683_branch_2
[BEAM-6085] Revert "[BEAM-5442] Revert #6675 "Revert PRs #6557 #6589 #6600""

mxm added a commit to mxm/beam that referenced this pull request Jan 18, 2019

mxm added a commit to mxm/beam that referenced this pull request Jan 18, 2019
