[Bug][Python Bigtable Cross-language]: Connector mishandles records that don't explicitly set a timestamp #28632

Closed
2 of 15 tasks
ahmedabu98 opened this issue Sep 23, 2023 · 0 comments · Fixed by #28624


ahmedabu98 (Contributor) commented Sep 23, 2023

What happened?

When users don't explicitly set a timestamp on their records, the Python Bigtable client defaults the timestamp to -1, which Bigtable interprets as "use the current server time at ingestion." The connector mishandles these rows: instead of sending the -1 timestamp over to the underlying Java IO, it drops it. The Java IO therefore sees no explicit timestamp. Unlike the Python client, the Java Bigtable client defaults timestamps to 0, which Bigtable stores as-is, i.e. the Unix epoch (1970-01-01).
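A minimal, self-contained sketch of the conversion step described above. It does not use the real Beam or Bigtable libraries: `SetCell`, `to_beam_row_buggy`, `to_beam_row_fixed`, and `java_side_timestamp` are hypothetical stand-ins, assuming the Python side hands each mutation to the Java transform as a dict and the Java side falls back to 0 when `timestamp_micros` is absent.

```python
from dataclasses import dataclass

# Sentinel the Python Bigtable client uses when no timestamp is set;
# Bigtable replaces it with server time at ingestion.
SERVER_TIME = -1


@dataclass
class SetCell:
    """Hypothetical stand-in for a Python-side SetCell mutation."""
    family_name: str
    column_qualifier: bytes
    value: bytes
    timestamp_micros: int = SERVER_TIME


def _base_row(m: SetCell) -> dict:
    # Fields forwarded to the (hypothetical) Java-side row in both variants.
    return {
        "type": b"SetCell",
        "family_name": m.family_name.encode("utf-8"),
        "column_qualifier": m.column_qualifier,
        "value": m.value,
    }


def to_beam_row_buggy(m: SetCell) -> dict:
    """Mirrors the reported bug: the -1 sentinel is silently dropped."""
    row = _base_row(m)
    if m.timestamp_micros > SERVER_TIME:
        row["timestamp_micros"] = m.timestamp_micros
    return row


def to_beam_row_fixed(m: SetCell) -> dict:
    """Always forwards the timestamp, including the -1 sentinel."""
    row = _base_row(m)
    row["timestamp_micros"] = m.timestamp_micros
    return row


def java_side_timestamp(row: dict) -> int:
    """Hypothetical Java-side view: a missing timestamp defaults to 0."""
    return row.get("timestamp_micros", 0)


cell = SetCell("cf", b"col", b"v")  # no explicit timestamp
print(java_side_timestamp(to_beam_row_buggy(cell)))  # 0: epoch time
print(java_side_timestamp(to_beam_row_fixed(cell)))  # -1: server time
```

Records that do set an explicit timestamp pass through both variants unchanged; only the -1 sentinel is affected.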

As a result, instead of the current time, every such cell is stamped with the epoch time.
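To make the difference concrete, here is a small model of how the two default values resolve at ingestion. `resolve_timestamp` and `to_date` are hypothetical helpers written for illustration, and the fixed "now" is an assumed server time, not anything Bigtable exposes.

```python
from datetime import date, datetime, timezone

SERVER_TIME = -1  # "use server time" sentinel


def resolve_timestamp(timestamp_micros: int, now_micros: int) -> int:
    """Model of ingestion: -1 becomes server time, anything else is kept."""
    return now_micros if timestamp_micros == SERVER_TIME else timestamp_micros


def to_date(micros: int) -> date:
    """Convert a microsecond Unix timestamp to a UTC calendar date."""
    return datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc).date()


# Pretend "now" on the server is the day the issue was filed.
NOW = int(datetime(2023, 9, 23, tzinfo=timezone.utc).timestamp()) * 1_000_000

print(to_date(resolve_timestamp(0, NOW)))   # 1970-01-01 (Java default, bug)
print(to_date(resolve_timestamp(-1, NOW)))  # 2023-09-23 (Python default)
```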

This can affect users in two ways:

  1. Users can set a garbage collection policy that cleans up old data in their table. Cells written without an explicit timestamp appear to be very old (1970-01-01), so they are garbage collected.
  2. Bigtable keeps the history of a cell as multiple timestamped versions. When users write to the same cell multiple times, every write uses the same timestamp (epoch time), so each write overwrites the previous version and the cell history is lost.
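Both effects can be reproduced with a toy in-memory model. `ToyCell` and its `gc_max_age` method are hypothetical and only approximate a Bigtable cell with a max-age garbage collection rule; the real service applies GC asynchronously and with its own cutoff semantics.

```python
from typing import Dict, List, Tuple

MICROS_PER_DAY = 86_400 * 1_000_000


class ToyCell:
    """Hypothetical in-memory model of one Bigtable cell's versions.

    Versions are keyed by timestamp, so a second write at an existing
    timestamp replaces that version instead of adding a new one.
    """

    def __init__(self) -> None:
        self.versions: Dict[int, bytes] = {}

    def write(self, timestamp_micros: int, value: bytes) -> None:
        self.versions[timestamp_micros] = value

    def history(self) -> List[Tuple[int, bytes]]:
        # Newest version first.
        return sorted(self.versions.items(), reverse=True)

    def gc_max_age(self, now_micros: int, max_age_days: int) -> None:
        """Drop versions older than max_age_days (a max-age style GC rule)."""
        cutoff = now_micros - max_age_days * MICROS_PER_DAY
        self.versions = {ts: v for ts, v in self.versions.items() if ts >= cutoff}


NOW = 1_695_427_200_000_000  # 2023-09-23T00:00:00Z in microseconds

buggy = ToyCell()  # both writes land on timestamp 0 (epoch)
buggy.write(0, b"v1")
buggy.write(0, b"v2")

fixed = ToyCell()  # server assigns a distinct current time to each write
fixed.write(NOW, b"v1")
fixed.write(NOW + 1_000, b"v2")

print(buggy.history())  # [(0, b'v2')]: v1 was overwritten
print(len(fixed.history()))  # 2: full history kept

buggy.gc_max_age(NOW, max_age_days=30)
fixed.gc_max_age(NOW + 1_000, max_age_days=30)
print(buggy.history())  # []: epoch-stamped cell garbage collected
print(len(fixed.history()))  # 2
```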

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner