Python API: PTransform should be immutable and reusable #19675

@damccorm

Description

While the Java API seems fine, the Python API is counterintuitive, to say the least.

Consider the following example:


import apache_beam as beam

p1 = beam.Pipeline()
p2 = beam.Pipeline()
# One labeled transform that we try to apply to two different pipelines
node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
p1 | node
p2 | node  # fails here

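The code above fails because node somehow remembers that it was already attached to p1. In fact, unlike in Java, the | (apply) operator is defined on the PTransform, so the transform itself takes part in the application instead of being passed to the pipeline as a plain value.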

If anything should be mutable here, it is the pipeline object only.
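
To make the request concrete, here is a minimal sketch of the behavior I would expect. It does not use Beam at all: `ToyPipeline` and `ReadText` are hypothetical stand-ins invented for illustration, and the label check is only an assumption about the kind of bookkeeping a pipeline might do. The point is that the transform is a frozen value holding configuration, while all state changes happen on the pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReadText:
    """Hypothetical immutable transform: holds configuration only."""
    label: str
    path: str


@dataclass
class ToyPipeline:
    """Hypothetical pipeline: the only object that records applications."""
    name: str
    applied: list = field(default_factory=list)

    def __or__(self, transform):
        # All bookkeeping happens on the pipeline; the transform is untouched.
        if any(t.label == transform.label for t in self.applied):
            raise ValueError(
                f"label {transform.label!r} already used in {self.name}")
        self.applied.append(transform)
        return self


p1 = ToyPipeline("p1")
p2 = ToyPipeline("p2")
node = ReadText(label="ReadTrainData", path="/tmp/aaa.txt")

p1 | node
p2 | node  # works: node carries no reference to p1
```

With this split, reusing `node` across pipelines is safe by construction, and a duplicate label is still rejected, but only within a single pipeline.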

Imported from Jira BEAM-8140. Original Jira may contain additional context.
Reported by: chris_suchanek.
