Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Failing Test]: beam_PreCommit_Python_Cron failing TestWriteBigTable.test_write_metrics #26673

Closed
2 of 15 tasks
Abacn opened this issue May 11, 2023 · 19 comments · Fixed by #27085
Closed
2 of 15 tasks

[Failing Test]: beam_PreCommit_Python_Cron failing TestWriteBigTable.test_write_metrics #26673

Abacn opened this issue May 11, 2023 · 19 comments · Fixed by #27085
Assignees
Labels
dependencies Pull requests that update a dependency file failing test io P2 python tests

Comments

@Abacn
Copy link
Contributor

Abacn commented May 11, 2023

What happened?

Permared since May 10.

Error message: https://ci-beam.apache.org/view/PostCommit/job/beam_PreCommit_Python_Cron/lastCompletedBuild/testReport/apache_beam.io.gcp.bigtableio_test/TestWriteBigTable/test_write_metrics/

apache_beam/io/gcp/bigtableio_test.py:84: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apache_beam/io/gcp/bigtableio_test.py:138: in verify_write_call_metric
    self.assertTrue(
E   AssertionError: False is not true : Did not find write call metric with status: already_exists

No code change involved. Also, checkout a snapshot prior to May 10 and run this test locally:

pytest -v apache_beam/io/gcp/bigtableio_test.py::TestWriteBigTable::test_write_metric

it fails for the same reason.

Likely due to Bigtable service change

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Abacn
Copy link
Contributor Author

Abacn commented May 11, 2023

Change to P1 because this is a precommit permared

@Abacn
Copy link
Contributor Author

Abacn commented May 11, 2023

Found the cause: google-cloud-bigtable==2.18.0 breaks the test:

To reproduce: in sdks/python run command:

pip install .[gcp]
pytest -v apache_beam/io/gcp/bigtableio_test.py::TestWriteBigTable::test_write_metrics
apache_beam/io/gcp/bigtableio_test.py::TestWriteBigTable::test_write_metrics FAILED               [100%]
pip install google-cloud-bigtable==2.17.0
pytest -v apache_beam/io/gcp/bigtableio_test.py::TestWriteBigTable::test_write_metrics
apache_beam/io/gcp/bigtableio_test.py::TestWriteBigTable::test_write_metrics PASSED               [100%]

@Abacn
Copy link
Contributor Author

Abacn commented May 11, 2023

With google-cloud-bigtable=2.18.0, query runner api monitoring infos returns an empty dict here:

https://github.com/apache/beam/blob/2209da6c084c224462bea947c428942c2f6efcce/sdks/python/apache_beam/io/gcp/bigtableio_test.py#LL116C54-L116C54

WIth google-cloud-bigtable=2.17.0, MetricsEnvironment.process_wide_container().to_runner_api_monitoring_infos(None) returns a dict like this:

{frozenset({('SERVICE', 'BigTable'), 'beam:metric:io:api_request_count:v1', ('INSTANCE_ID', 'python-test-6aa8fa26'), ('STATUS', 'already_exists'), ('TABLE_ID', 'python-test-d77bc18a'), ('METHOD', 'google.bigtable.v2.MutateRows'), ('RESOURCE', '//bigtable.googleapis.com/projects/python-test-8b2a433b/instances/python-test-6aa8fa26/tables/python-test-d77bc18a'), ('BIGTABLE_PROJECT_ID', 'python-test-8b2a433b')}): urn: "beam:metric:io:api_request_count:v1"
type: "beam:metrics:sum_int64:v1"
payload: "\002"
labels {
  key: "BIGTABLE_PROJECT_ID"
  value: "python-test-8b2a433b"
}
labels {
  key: "INSTANCE_ID"
  value: "python-test-6aa8fa26"
}
labels {
  key: "METHOD"
  value: "google.bigtable.v2.MutateRows"
}
labels {
  key: "RESOURCE"
  value: "//bigtable.googleapis.com/projects/python-test-8b2a433b/instances/python-test-6aa8fa26/tables/python-test-d77bc18a"
}
labels {
  key: "SERVICE"
  value: "BigTable"
}
labels {
  key: "STATUS"
  value: "already_exists"
}
labels {
  key: "TABLE_ID"
  value: "python-test-d77bc18a"
}
start_time {
  seconds: 1683832726
  nanos: 768536000
}
, frozenset({('SERVICE', 'BigTable'), ('STATUS', 'ok'), 'beam:metric:io:api_request_count:v1', ('INSTANCE_ID', 'python-test-6aa8fa26'), ('TABLE_ID', 'python-test-d77bc18a'), ('METHOD', 'google.bigtable.v2.MutateRows'), ('RESOURCE', '//bigtable.googleapis.com/projects/python-test-8b2a433b/instances/python-test-6aa8fa26/tables/python-test-d77bc18a'), ('BIGTABLE_PROJECT_ID', 'python-test-8b2a433b')}): urn: "beam:metric:io:api_request_count:v1"
type: "beam:metrics:sum_int64:v1"
payload: "\002"
labels {
  key: "BIGTABLE_PROJECT_ID"
  value: "python-test-8b2a433b"
}
labels {
  key: "INSTANCE_ID"
  value: "python-test-6aa8fa26"
}
labels {
  key: "METHOD"
  value: "google.bigtable.v2.MutateRows"
}
labels {
  key: "RESOURCE"
  value: "//bigtable.googleapis.com/projects/python-test-8b2a433b/instances/python-test-6aa8fa26/tables/python-test-d77bc18a"
}
labels {
  key: "SERVICE"
  value: "BigTable"
}
labels {
  key: "STATUS"
  value: "ok"
}
labels {
  key: "TABLE_ID"
  value: "python-test-d77bc18a"
}
start_time {
  seconds: 1683832726
  nanos: 768579000
}
}

@Abacn
Copy link
Contributor Author

Abacn commented May 12, 2023

mitigated by excluding the certain version. Remain open to investigate the root cause (either a client bug or intended client change that can be solved by changing the test).

@tvalentyn
Copy link
Contributor

looks like this error reappeared this morning.

@tvalentyn
Copy link
Contributor

tvalentyn commented May 16, 2023

Caused by google-cloud-bigquery==2.18.1

@tvalentyn
Copy link
Contributor

fwiw we should be able to git bisect this to a particular google-cloud-bigquery commit assuming we can install that dep from sources

@tvalentyn tvalentyn added P1 and removed P2 labels May 16, 2023
@Abacn
Copy link
Contributor Author

Abacn commented May 16, 2023

For now we should pin google-cloud-bigtable<2.18.0 for now. I just excluded it as google-cloud-bigtable!=2.18.0. I suggested to investigate the cause googleapis/python-bigtable#775 (comment) but appearently new release still has the same issue

@Abacn
Copy link
Contributor Author

Abacn commented May 16, 2023

@tvalentyn the cause is known due to the commit that got reverted. Reverting that PR the beam test passes. The new tag 2.18.1 is identical to 2.18.0

It contains the two commit:

  • Revert "fix: Revert "Feat: Threaded MutationsBatcher" (#773)" (#775)
  • fix: Revert "Feat: Threaded MutationsBatcher" (#773)

Possible some miscommunication

@Abacn
Copy link
Contributor Author

Abacn commented May 16, 2023

opened #26717

@Abacn Abacn added P2 dependencies Pull requests that update a dependency file and removed P1 labels May 16, 2023
@tvalentyn
Copy link
Contributor

thanks a lot, do we need to open an issue in https://github.com/googleapis/python-bigtable/issues?

@Abacn
Copy link
Contributor Author

Abacn commented May 18, 2023

Yeah, I planned to do. Also notified bigtable maintainers. I wished to extract a minimum working example showing what previously worked now break, from the unit test, just haven't get into it

@Abacn
Copy link
Contributor Author

Abacn commented May 23, 2023

The cause is the member MutationsBatcher.rows changed its name to MutationsBatcher._rows in googleapis/python-bigtable#722 , so

if len(self.rows) != 0:

is always false.

In detail, Beam extends the client MutationsBatcher as a _MutationsBatcher, which overwrites its member rows. In newer version, this member is renamed as private (_rows) so iiuc current design would no longer work. Python BigtableIO is broken in google-cloud-bigtable>=2.18.0

@tvalentyn
Copy link
Contributor

given that we pre install this IO dep in containers, would our users be affected?

@tvalentyn
Copy link
Contributor

tvalentyn commented May 23, 2023

in other words, does this matter also at job submission or only at runtime?

@Abacn
Copy link
Contributor Author

Abacn commented May 23, 2023

Users using our published containers won't get affected (e.g. typical Dataflow use case) as it pre-installed google-cloud-bigtable < 2.18

Users build custom container after May 10 will be affected (google-cloud-bigtable 2.18 added to the container)

Should only affects run time. _MutationsBatcher is constructed in start_bundle (pipeline run time) so submission time dependency does not affect it

@mutianf
Copy link
Contributor

mutianf commented May 25, 2023

Thanks @Abacn for looking into this. We're not sure why bigtableio.py overrides MutationsBatcher. I wonder if using the batcher from python client here:

self.batcher = _MutationsBatcher(self.table)
will fix the issue. We'll look into it and work on a fix.

@Abacn
Copy link
Contributor Author

Abacn commented May 25, 2023

Yeah, deriving from client MutationsBatcher sounds not a good idea to me either, especially in Python where class is too flexible

@mutianf
Copy link
Contributor

mutianf commented May 25, 2023

I think the _MutationBatcher wrapper was created because of 2 reasons:

  • in batcher client implementation < 2.18.0, if some mutations failed and the RPC succeeded, batcher will not return any errors. _MutationBatcher wrapper addresses this issue so mutations don't silently fail.
  • add a callback for metrics.

The first issue is fixed in the 2.18.0 client. So we shouldn't need to wrap the MutationBatcher anymore. We can get the exception count from https://github.com/googleapis/python-bigtable/blob/main/google/cloud/bigtable/batcher.py#L382 and update the metrics in finishBundle.

I'll work on a fix soon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file failing test io P2 python tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants