-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-1013, BEAM-1353] Website fixups after PTransform Style Guide changes #243
Conversation
Refer to this link for build results (access rights to CI server needed): Jenkins built the site at commit id 001f20e with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again. |
Refer to this link for build results (access rights to CI server needed): Jenkins built the site at commit id 98b9b70 with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -24,7 +24,7 @@ Apex was built as stateful stream processor from the ground up. Operators [check | |||
|
|||
## Translation to Apex DAG | |||
|
|||
A Beam runner needs to implement the translation from the Beam model to the underlying frameworks execution model. In the case of Apex, the runner will translate the pipeline into the [native (compositional, low level) DAG API](https://www.datatorrent.com/blog/tracing-dags-from-specification-to-execution/) (which is also the base for a number of other API that are available to specify applications that run on Apex). The DAG consists of operators (functional building blocks that are connected with streams. The runner provides the execution layer. In the case of Apex it is distributed stream processing, operators process data event by event. The minimum set of operators covers Beam’s primitive transforms: `ParDo.Bound`, `ParDo.BoundMulti`, `Read.Unbounded`, `Read.Bounded`, `GroupByKey`, `Flatten.FlattenPCollectionList` etc. | |||
A Beam runner needs to implement the translation from the Beam model to the underlying frameworks execution model. In the case of Apex, the runner will translate the pipeline into the [native (compositional, low level) DAG API](https://www.datatorrent.com/blog/tracing-dags-from-specification-to-execution/) (which is also the base for a number of other API that are available to specify applications that run on Apex). The DAG consists of operators (functional building blocks that are connected with streams. The runner provides the execution layer. In the case of Apex it is distributed stream processing, operators process data event by event. The minimum set of operators covers Beam’s primitive transforms: `ParDo.SingleOutput`, `ParDo.MultiOutput`, `Read.Unbounded`, `Read.Bounded`, `GroupByKey`, `Flatten.PCollections` etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert; we don't change blog posts -- they are dated and should be accurate as the date of publication.
If warranted, we can add a note at the top of the blog that it may be outdated -- but, I wouldn't change the text.
@@ -420,8 +420,9 @@ PCollection<KV<UserId, Event>> events = ... | |||
PCollectionView<Map<UserId, Model>> userModels = events | |||
|
|||
// A tradeoff between latency and cost | |||
.apply(Window.triggering( | |||
AfterProcessingTime.pastFirstElementInPane(Duration.standardSeconds(1))) | |||
.apply(Window.configure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(leave this one as is)
Fixes usages of PTransforms affected by changes as part of https://issues.apache.org/jira/browse/BEAM-1353
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes!
@@ -1227,7 +1206,7 @@ Each pipeline object has a `CoderRegistry`. The `CoderRegistry` represents a map | |||
The Beam SDK for Python has a `CoderRegistry` that represents a mapping of Python types to the default coder that should be used for `PCollection`s of each type. | |||
|
|||
{:.language-java} | |||
By default, the Beam SDK for Java automatically infers the `Coder` for the elements of an output `PCollection` using the type parameter from the transform's function object, such as `DoFn`. In the case of `ParDo`, for example, a `DoFn<Integer, String>function` object accepts an input element of type `Integer` and produces an output element of type `String`. In such a case, the SDK for Java will automatically infer the default `Coder` for the output `PCollection<String>` (in the default pipeline `CoderRegistry`, this is `StringUtf8Coder`). | |||
By default, the Beam SDK for Java automatically infers the `Coder` for the elements of a `PCollection` produced by a `PTransform` using the type parameter from the transform's function object, such as `DoFn`. In the case of `ParDo`, for example, a `DoFn<Integer, String>` function object accepts an input element of type `Integer` and produces an output element of type `String`. In such a case, the SDK for Java will automatically infer the default `Coder` for the output `PCollection<String>` (in the default pipeline `CoderRegistry`, this is `StringUtf8Coder`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest adding a "by" here: "produced by a PTransform
by using the type parameter"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(adding the second by in that snippet, I realize upon re-reading there's another by already there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't notice this before merging. Maybe include this in the next change whatever it is?.. probably doesn't warrant a separate PR.
Refer to this link for build results (access rights to CI server needed): Jenkins built the site at commit id 2a55166 with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again. |
R: @davorbonaci
CC: @melap