[BEAM-505] Fill in the documentation/runners/direct portion of the website #76
Refer to this link for build results (access rights to CI server needed):
retest this please
Jenkins built the site at commit id 38e0f07 with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.
Beautiful! Just a few minor comments ;-)
* enforcing immutability of elements
* enforcing encodability of elements
* elements are processed in an arbitrary order at all points
* serialization of user Fns (DoFn, CombineFn, etc.)
Perhaps Fns -> functions?
Perhaps code formatting for DoFn, CombineFn?
Done.
Using the direct runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.
Here are some resources with information about how to test your pipelines.
* [Testing Unbounded Pipelines in Apache Beam](https://beam.incubator.apache.org/blog/2016/10/20/test-stream.html) talks about the use of Java classes [PAssert]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and [TestStream]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) to test your pipelines.
Please don't hardcode the full URL -- it is fragile and guaranteed to break when we go TLP. Please link to `baseurl/blog...`.
Done.
* The [Apache Beam WordCount Example](http://beam.incubator.apache.org/get-started/wordcount-example/) contains an example of logging and testing a pipeline with `PAssert`.
Same here, no full URLs.
Done.
For example, if you are using Maven for development and want to use the SDK for Java with [DirectRunner]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/direct/DirectRunner.html), add the following dependency to your `pom.xml` file:
```
<dependency>
```

Can you abstract the text to be SDK-independent and use language toggles for the code snippet?

Yeah, it's a bit troublesome. If I put the text that specifies it's for Maven/Java outside of the language toggle, it looks strange if the user toggles python... but it also looks strange when I put the text inside the Java toggle because it forces everything inside to be code formatted, along with syntax highlighting. I will see how it looks with the Maven/Java specific text in a comment inside the Java toggle. Other ideas welcome!

```
</dependency>
```
## Pipeline options for the direct runner
I think there are mentions of "Direct Runner", "direct runner" and "Direct runner". I think it would be good to be consistent.
I'll change everything to Direct Runner to be consistent with other runners.
When executing your pipeline from the command-line, set `runner` to `direct` or `directrunner`. The default values for the pipeline options are generally sufficient.
for the other pipeline options are generally sufficient.
No need to mention `directrunner`, perhaps leave just `direct`.
Done.
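For reference, here is a minimal sketch of the two ways to select the Direct Runner in Java. The first passes a flag on the command line: the text above uses `--runner=direct`, and `--runner=DirectRunner` is the spelling used elsewhere in the Beam docs. The second sets the runner in code with `setRunner`. The sketch assumes a recent Beam Java SDK and the `beam-runners-direct-java` artifact on the classpath. The class and method names are hypothetical, and the `Create.of(...)` input only exists so the pipeline has something to execute.

```java
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

// Hypothetical example class; not part of the Beam SDK.
class DirectRunnerOptionsExample {

  // Option 1: choose the runner from command-line flags, e.g. --runner=DirectRunner.
  static PipelineResult.State runFromArgs(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    return buildAndRun(options);
  }

  // Option 2: choose the runner in code; every other option keeps its default value.
  static PipelineResult.State runProgrammatically() {
    PipelineOptions options = PipelineOptionsFactory.create();
    options.setRunner(DirectRunner.class);
    return buildAndRun(options);
  }

  // A trivial pipeline: three in-memory strings and no sink.
  private static PipelineResult.State buildAndRun(PipelineOptions options) {
    Pipeline p = Pipeline.create(options);
    p.apply(Create.of("a", "b", "c"));
    // The Direct Runner executes in-process; waitUntilFinish blocks until the run ends.
    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(runFromArgs(args));
  }
}
```

Pipeline construction lives in a single helper, so the same pipeline code runs unchanged however the runner is chosen. That is what makes local Direct Runner runs a useful check before moving to a cluster.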
## Additional information and caveats
Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a Create transform, or you can use a Read transform to work with small local or remote files.
Perhaps code formatting for `Create` and `Read` along with Javadoc/Py links to them?
Done.
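Here is a minimal sketch of the small-data approach described above. `Create` builds a bounded, in-memory `PCollection`, and `PAssert` checks the result when the pipeline runs on the Direct Runner. It assumes the same classpath as any Beam Java pipeline plus the Direct Runner artifact. The word list and the `SmallInMemoryPipeline` class are hypothetical. In JUnit tests, `PAssert` is more usually paired with `TestPipeline`; a plain `Pipeline` is used here only to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; not part of the Beam SDK.
class SmallInMemoryPipeline {
  // Hypothetical input, small enough to fit comfortably in local memory.
  static final List<String> WORDS = Arrays.asList("apple", "banana", "apple");

  static PipelineResult.State run() {
    PipelineOptions options = PipelineOptionsFactory.create();
    options.setRunner(DirectRunner.class);
    Pipeline p = Pipeline.create(options);

    // Create turns the in-memory list into a bounded PCollection<String>.
    PCollection<KV<String, Long>> counts =
        p.apply(Create.of(WORDS).withCoder(StringUtf8Coder.of()))
            .apply(Count.<String>perElement());

    // PAssert adds a verification step to the pipeline; a mismatch fails the run.
    PAssert.that(counts).containsInAnyOrder(KV.of("apple", 2L), KV.of("banana", 1L));

    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

If you change the expected counts, for example to `KV.of("apple", 3L)`, the `run()` call should throw an exception instead of returning `DONE`.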
Jenkins built the site at commit id 8157e4e with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.
LGTM. Good to go!
Here are some resources with information about how to test your pipelines.
* [Testing Unbounded Pipelines in Apache Beam]({{ site.baseurl }}/blog/2016/10/20/test-stream.html) talks about the use of Java classes [`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and [`TestStream`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) to test your pipelines.
* The [Apache Beam WordCount Example]({{ site.baseurl }}/get-started/wordcount-example/) contains an example of logging and testing a pipeline with Java [`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html).
This will probably be language-independent. I'd remove the word "Java" before `PAssert`.
Done.
```java
// If you use Maven and the SDK for Java, add the following dependency to your pom.xml file:

<dependency>
```

FYI: unfortunately this is trying to apply Java formatting to an XML snippet. Sad, but orthogonal to this PR.

You must specify your dependency on the Direct Runner.

```java
// If you use Maven and the SDK for Java, add the following dependency to your pom.xml file:
```

I'd remove this comment: `//` isn't a comment here. It's XML, so we'd need something like `<!--`.
Done
LGTM. Merging.
R: @davorbonaci @francesperry