Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions website/src/get-started/quickstart-py.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,17 +176,20 @@ This runner is not yet available for the Python SDK.

{:.runner-flink-local}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Flink requires a full download of the Beam source code.
See https://beam.apache.org/roadmap/portability/#python-on-flink for more information.
```

{:.runner-flink-cluster}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Flink requires a full download of the Beam source code.
See https://beam.apache.org/documentation/runners/flink/ for more information.
```

{:.runner-spark}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Spark requires a full download of the Beam source code.
See https://beam.apache.org/roadmap/portability/#python-on-spark for more information.
```

{:.runner-dataflow}
Expand Down
9 changes: 6 additions & 3 deletions website/src/get-started/wordcount-example.md
Original file line number Diff line number Diff line change
Expand Up @@ -407,17 +407,20 @@ This runner is not yet available for the Python SDK.

{:.runner-flink-local}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Flink requires a full download of the Beam source code.
See https://beam.apache.org/roadmap/portability/#python-on-flink for more information.
```

{:.runner-flink-cluster}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Flink requires a full download of the Beam source code.
See https://beam.apache.org/documentation/runners/flink/ for more information.
```

{:.runner-spark}
```
This runner is not yet available for the Python SDK.
Currently, running wordcount.py on Spark requires a full download of the Beam source code.
See https://beam.apache.org/roadmap/portability/#python-on-spark for more information.
```

{:.runner-dataflow}
Expand Down
29 changes: 19 additions & 10 deletions website/src/roadmap/portability.md
Original file line number Diff line number Diff line change
Expand Up @@ -144,25 +144,34 @@ their respective components.

MVP, and FeatureCompletness nearly done (missing SDF, timers) for
SDKs, Python ULR, and shared java runners library.
Flink is the first runner to fully leverage this, with focus moving to
Performance.
Currently, the Flink and Spark runners support portable pipeline execution.
See the
[Portability support table](https://s.apache.org/apache-beam-portability-support-table)
for details.

### Running Python wordcount on Flink or Spark {#python-on-flink}
### Running Python wordcount on Flink {#python-on-flink}

Currently, the Flink and Spark runners support portable pipeline execution.
To run a basic Python wordcount (in batch mode) with embedded Flink or Spark:
To run a basic Python wordcount (in batch mode) with embedded Flink:

1. Run once to build the SDK harness container: `./gradlew :sdks:python:container:docker`
2. Choose one:
* Start the Flink portable JobService endpoint: `./gradlew :runners:flink:1.5:job-server:runShadow`
* Or start the Spark portable JobService endpoint: `./gradlew :runners:spark:job-server:runShadow`
3. Submit the wordcount pipeline to above endpoint: `./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -PenvironmentType=LOOPBACK`
2. Start the Flink portable JobService endpoint: `./gradlew :runners:flink:1.5:job-server:runShadow`
3. In a new terminal, submit the wordcount pipeline to above endpoint: `./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -PenvironmentType=LOOPBACK`

To run the pipeline in streaming mode (currently only supported on Flink): `./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -Pstreaming`
To run the pipeline in streaming mode: `./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -Pstreaming`

Please see the [Flink Runner page]({{ site.baseurl }}/documentation/runners/flink/) for more information on
how to run portable pipelines on top of Flink.

### Running Python wordcount on Spark {#python-on-spark}

To run a basic Python wordcount (in batch mode) with embedded Spark:

1. Run once to build the SDK harness container: `./gradlew :sdks:python:container:docker`
2. Start the Spark portable JobService endpoint: `./gradlew :runners:spark:job-server:runShadow`
3. In a new terminal, submit the wordcount pipeline to above endpoint: `./gradlew :sdks:python:portableWordCount -PjobEndpoint=localhost:8099 -PenvironmentType=LOOPBACK`

Python streaming mode is not yet supported on Spark.

Please see the [Spark Runner page]({{ site.baseurl }}/documentation/runners/spark/) for more information on
how to run portable pipelines on top of Spark.