Closed
Description
What needs to happen?
Using SpannerIO.ReadAll in a streaming pipeline has negative side effects and is probably not what the customer wants to do:
- Streaming pipelines run effectively forever (unless manually stopped).
- SpannerIO.ReadAll creates a session and a ReadOnlyTransaction at pipeline startup and uses them for the rest of the pipeline's lifetime.
- As a result, all data read from Spanner is stale: it reflects the timestamp at which the pipeline first started.
- If no reads occur for more than an hour, the Spanner server closes the session and transaction, causing the pipeline to fail.
Add warnings and documentation to SpannerIO.ReadAll about using it in a streaming pipeline and these negative side effects.
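To make the two failure modes concrete, here is a minimal plain-Java sketch of the behaviour described above. It does not use Beam or the Spanner client. `VersionedStore` and `PinnedTransaction` are hypothetical names invented for illustration, time is modelled as integer minutes, and the 60-minute idle limit is simply the "more than an hour" figure from this issue. It is a toy model of the reported symptoms, not the actual SpannerIO implementation.

```java
import java.util.Map;
import java.util.TreeMap;

public class StaleSnapshotSketch {

    /** Hypothetical toy store: each write is kept under its commit time (in minutes). */
    static final class VersionedStore {
        private final TreeMap<Long, String> versions = new TreeMap<>();

        void write(long commitMinute, String value) {
            versions.put(commitMinute, value);
        }

        /** Returns the latest value committed at or before readMinute, or null if none. */
        String readAt(long readMinute) {
            Map.Entry<Long, String> entry = versions.floorEntry(readMinute);
            return entry == null ? null : entry.getValue();
        }
    }

    /** Hypothetical model of a read-only transaction created once at pipeline startup. */
    static final class PinnedTransaction {
        private final VersionedStore store;
        private final long snapshotMinute;      // fixed when the pipeline starts
        private final long idleTimeoutMinutes;  // assumed server-side idle limit
        private long lastUsedMinute;

        PinnedTransaction(VersionedStore store, long nowMinute, long idleTimeoutMinutes) {
            this.store = store;
            this.snapshotMinute = nowMinute;
            this.idleTimeoutMinutes = idleTimeoutMinutes;
            this.lastUsedMinute = nowMinute;
        }

        /** Reads at the pinned snapshot; fails if the transaction sat idle too long. */
        String read(long nowMinute) {
            if (nowMinute - lastUsedMinute > idleTimeoutMinutes) {
                throw new IllegalStateException("transaction closed after idle timeout");
            }
            lastUsedMinute = nowMinute;
            return store.readAt(snapshotMinute);
        }
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.write(0, "v1");

        // The "pipeline" starts at minute 0 and pins its transaction there.
        PinnedTransaction txn = new PinnedTransaction(store, 0, 60);

        // The row is updated while the pipeline is running.
        store.write(30, "v2");

        System.out.println("pinned read: " + txn.read(40));     // still sees v1
        System.out.println("fresh read:  " + store.readAt(40)); // sees v2

        // No reads between minute 40 and minute 120: the idle limit is exceeded.
        try {
            txn.read(120);
        } catch (IllegalStateException e) {
            System.out.println("pipeline would fail: " + e.getMessage());
        }
    }
}
```

In this model the pinned read keeps returning the value from pipeline start even after the row changes, and the read after an 80-minute gap throws, which mirrors the two symptoms listed in the description.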
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner