Description
What would you like to happen?
As far as I can tell, there is no way to access the Pub/Sub message publish time (publish_time) in ReadFromPubSub in order to pass it down to output messages/records.
I'd see this as a new config field named publish_time_field, used as follows, so that my_publish_time becomes a new field on each output record:
type: ReadFromPubSub
config:
  topic: "topic"
  subscription: "subscription"
  format: "format"
  schema: schema
  attributes:
  - "attribute"
  - "attribute"
  - ...
  attributes_map: "attributes_map"
  id_attribute: "id_attribute"
  publish_time_field: "my_publish_time"
  timestamp_attribute: "timestamp_attribute"
  error_handling:
    output: "output"
With docs:
publish_time_field string (Optional) : Name of a field to add to each output record, containing the message publish time. If None, no such field is added.
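To make the proposed semantics concrete, here is a minimal sketch in plain Python of what the option might do per message. It does not use Beam: DecodedMessage and to_output_record are hypothetical names, and refusing to overwrite an existing payload field is an assumed design choice that the proposal does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class DecodedMessage:
    """Hypothetical stand-in for a Pub/Sub message after payload decoding."""
    record: Dict[str, Any]     # payload already parsed per `format`/`schema`
    publish_time: datetime     # publish time assigned by the Pub/Sub service
    attributes: Dict[str, str] = field(default_factory=dict)


def to_output_record(msg: DecodedMessage,
                     publish_time_field: Optional[str] = None) -> Dict[str, Any]:
    """Builds the output record, optionally adding the publish time as a field."""
    out = dict(msg.record)  # copy so the input message is not mutated
    if publish_time_field is None:
        return out  # option unset: no extra field, matching the proposed docs
    if publish_time_field in out:
        # Assumed behavior: refuse to silently overwrite a payload field.
        raise ValueError(f"field {publish_time_field!r} already exists in the record")
    out[publish_time_field] = msg.publish_time
    return out
```

With publish_time_field: "my_publish_time", a payload {"id": 1} would come out as {"id": 1, "my_publish_time": <publish time>}; with the option unset, the record passes through unchanged.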
The use case is to store the publish time in the final record for diagnostics, e.g., comparing the time a record was written to BigQuery with the time it was originally published to Pub/Sub.
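Once the publish time is on the record, the diagnostic itself reduces to a timestamp difference. Here is a small sketch, assuming both the publish time and the write time (for example, a hypothetical written_at column populated at insert time) are available as timezone-aware datetimes. The function name is illustrative only.

```python
from datetime import datetime, timezone


def publish_to_write_lag_seconds(publish_time: datetime, written_at: datetime) -> float:
    """Seconds between Pub/Sub publish and the downstream (e.g. BigQuery) write."""
    # Subtracting a naive datetime from an aware one raises TypeError, and naive
    # values are ambiguous anyway, so require explicit time zones up front.
    if publish_time.tzinfo is None or written_at.tzinfo is None:
        raise ValueError("both timestamps must be timezone-aware")
    return (written_at - publish_time).total_seconds()
```

For example, a message published at 12:00:00 UTC and written at 12:00:02.5 UTC yields a lag of 2.5 seconds.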
A possible workaround is to have the message publisher add an attribute containing the current time at the moment the message is published to Pub/Sub.
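The publisher-side workaround could look roughly like the sketch below. The attribute name client_publish_ts and the helper are hypothetical. The publish callable is injected so the sketch runs without credentials. With the google-cloud-pubsub client, you would pass PublisherClient().publish, which accepts message attributes as string keyword arguments.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical attribute name; the pipeline must read the same key.
CLIENT_PUBLISH_TS_ATTRIBUTE = "client_publish_ts"


def publish_with_client_timestamp(
    publish: Callable[..., object],
    topic: str,
    data: bytes,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> object:
    """Publishes `data` with an attribute holding the client-side send time."""
    # Pub/Sub attribute values must be strings; use an RFC 3339 string in UTC.
    attrs = {CLIENT_PUBLISH_TS_ATTRIBUTE: now().isoformat()}
    return publish(topic, data, **attrs)
```

A call might look like publish_with_client_timestamp(pubsub_v1.PublisherClient().publish, "projects/my-project/topics/my-topic", b"payload"), where the project and topic names are placeholders. On the pipeline side, the key could then presumably be listed under the existing attributes option. This records the publisher's clock rather than the server-assigned publish_time, so the two can differ by network latency and clock skew.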
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner