[BEAM-2360] Add Beam Runner Guide #251

Merged
asfgit merged 2 commits into apache:asf-site from kennknowles:runner-guide on Jun 9, 2017

Conversation

@kennknowles
Member

I did a quick port of the doc at https://s.apache.org/beam-runner-guide.

I wanted to open this early, because there are technical issues to address:

  • The language toggles are inappropriate, as I'd like to include Java and Python on the same page, so a runner author can read about them both. Similarly, it is not really useful to have a big label indicating which language is being shown.
  • The toggles actually totally break the page if I try to use syntax highlighting on the protobufs (I've left it in the state it should probably end up in to demonstrate).

Beyond the draft doc, I've added a ton of links to API docs and code. There may still be comments on the GDoc and I'll incorporate them here.

asfbot commented May 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/500/

Jenkins built the site at commit id b0095ec with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

asfbot commented May 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/501/

Jenkins built the site at commit id 4174838 with Jekyll and staged it here. Happy reviewing.

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/503/

Jenkins built the site at commit id 3a9b3d9 with Jekyll and staged it here. Happy reviewing.

@kennknowles
Member Author

Solved the code toggle issue and revised a little.

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/504/

Jenkins built the site at commit id a516294 with Jekyll and staged it here. Happy reviewing.

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/505/

Jenkins built the site at commit id d12dcb6 with Jekyll and staged it here. Happy reviewing.

kennknowles force-pushed the runner-guide branch 2 times, most recently from 40f48d1 to df61b70 on May 25, 2017 04:12

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/506/

Jenkins built the site at commit id 40f48d1 with Jekyll and staged it here. Happy reviewing.

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/507/

Jenkins built the site at commit id df61b70 with Jekyll and staged it here. Happy reviewing.

asfbot commented May 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/508/

Jenkins built the site at commit id 6692cd1 with Jekyll and staged it here. Happy reviewing.

@kennknowles
Member Author

R: @melap if/when you have time, I wonder if you might make a pass over this.

asfgit commented Jun 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/520/

Jenkins built the site at commit id cc3143b with Jekyll and staged it here. Happy reviewing.

@melap left a comment

Took a high level look and left a few comments... I'll do a full edit pass after merge and open a PR with the changes.

permalink: /contribute/runner-guide/
---

# Runner Guide

wondering if calling it Runner Implementation Guide or Runner Authoring Guide (or something similar) would be good just for this page title? (I know it wouldn't fit well in the top bar menus) it took me a minute to determine what this was (runner authoring as opposed to how to use existing runners)

Member Author

Done.

You have correctly surmised that I was trying to make it fit well. But actually, when rendered, it seems to fit fine with the longer, clearer title.


# Runner Guide

This document is aimed at someone who has a data processing system and wants

on similar note, what about initial sentence at the start here...
This guide walks through how to implement a new runner. It is aimed at someone who has a data processing system and wants to use it to execute a Beam pipeline. The guide starts ...

Member Author

Done.

Took your words exactly.

### Pipeline

A pipeline in Beam is a graph of PTransforms operating on PCollections. In the
picture, the boxes are PTransforms and the arrows represent the contents of the

what picture is this referring to?

Member Author

I was going to paste in our very intro picture, but forgot. I don't think it adds much. Elsewhere pictures might be useful, but I don't imagine that a runner author needs to be reminded that it is a graph of transforms.


Every element in a PCollection has a timestamp associated with it.

Sources of data read elements from the outside world with timestamps provided.

I struggled with this first sentence. is this still accurate?
I/O transforms read data elements with timestamps from the outside world.

Member Author

Your phrasing is appropriate for pipeline authors, I think. For a runner author, what they need to know is that the Read.from(Source) primitive is responsible for figuring out the timestamps, and the runner's job is to propagate them. This section exists just to cement the idea that there is no such thing as data without a timestamp.

I've rephrased this now - what do you think?

#### The DoFn Lifecycle

While each language's SDK is free to make different decisions, the Python and
Java SDK's share an API with the following stages of a DoFn's lifecycle.

SDKs

Member Author

Done

graph of `PTransforms`.

The entry point for this in Java is
[`Pipeline.traverseTopologically`](https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/Pipeline.html#traverseTopologically-org.apache.beam.sdk.Pipeline.PipelineVisitor-)

general comment for all the javadoc/pydoc links -- should this use the latest version variable instead of 2.0.0?

Member Author

That seems reasonable...

I just tried it, but it seems that we redirect https://beam.apache.org/documentation/sdks/javadoc/ to https://beam.apache.org/documentation/sdks/javadoc/2.0.0 at the root but it doesn't work for deeper links.

I also tried latest which is pretty common, but that is also a 404.

So if we set something like that up, I would definitely use it.


### Traversing a pipeline

Something you will certain do is to traverse a pipeline, probably to

certain -> likely

Member Author

Done

### The Runner API

The Runner API is an SDK-independent schema for a pipeline along with RPC
interfaces (TBD) for launching a pipeline and checking the status of a job.

what's this TBD?

Member Author

The "RPC" bit has not been implemented. I rephrased a little bit. I could also just drop that part or put it elsewhere.

API](https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto)
refers to a specific manifestation of the concepts in the Beam model, as a
protocol buffers schema. Even though you should not manipulate these messages
direclty, it can be helpful to know the canonical data that makes up a

typo - directly

Member Author

Done

@kennknowles left a comment

Thanks for the review, @melap!

@kennknowles
Member Author

R: @davorbonaci I think now that we have had a pass over this we should get this up and improve it as things change and through feedback from runner authors.

asfgit commented Jun 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/522/

Jenkins built the site at commit id 065e5a6 with Jekyll and staged it here. Happy reviewing.

@davorbonaci
Member

LGTM. Please merge! This is exciting!

@kennknowles
Member Author

retest this please

asfgit merged commit 15175f7 into apache:asf-site on Jun 9, 2017
asfgit pushed a commit that referenced this pull request Jun 9, 2017
  Add Beam Runner Guide
  Add support for "no-toggle" code snippets
asfgit commented Jun 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/523/

Jenkins built the site at commit id 15175f7 with Jekyll and staged it here. Happy reviewing.

asfgit commented Jun 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/524/

Jenkins built the site at commit id 15175f7 with Jekyll and staged it here. Happy reviewing.

robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
  Add Beam Runner Guide
  Add support for "no-toggle" code snippets
melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018
  Add Beam Runner Guide
  Add support for "no-toggle" code snippets