[BEAM-505] Fill in the documentation/runners/direct portion of the website #76

Closed
melap wants to merge 3 commits

@melap melap commented Nov 11, 2016

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/93/
--none--

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/48/

@davorbonaci
Member

retest this please

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/50/

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/95/

Jenkins built the site at commit id 38e0f07 with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

Member

@davorbonaci davorbonaci left a comment

Beautiful! Just a few minor comments ;-)

* enforcing immutability of elements
* enforcing encodability of elements
* elements are processed in an arbitrary order at all points
* serialization of user Fns (DoFn, CombineFn, etc.)

Member

Perhaps Fns -> functions?
Perhaps code formatting for DoFn, CombineFn?

Author

Done.
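
As a rough illustration of two of the checks listed in the hunk above (serialization of user functions and immutability of elements), here is a plain-Java sketch of how such checks could work in principle. It is not the Direct Runner's implementation: `LocalChecksSketch`, `encode`, `roundTrip`, and `mutates` are hypothetical names, and Java serialization is only a stand-in for a Beam coder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical stand-ins for two local checks; these are not Beam classes. */
public class LocalChecksSketch {

  /** Encodes a value with Java serialization (assumption: stands in for a coder). */
  public static byte[] encode(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      // Includes NotSerializableException for values that cannot be encoded.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Round-trips a value through bytes, as if it were shipped to a remote worker. */
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T roundTrip(T value) {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(encode(value)))) {
      return (T) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("value did not survive serialization", e);
    }
  }

  /** Returns true if userCode changed the encoded form of the element. */
  public static <T extends Serializable> boolean mutates(T element, Consumer<T> userCode) {
    byte[] before = encode(element);
    userCode.accept(element);
    return !Arrays.equals(before, encode(element));
  }

  public static void main(String[] args) {
    ArrayList<String> element = new ArrayList<>(Arrays.asList("a", "b"));
    System.out.println("round trip equal: " + roundTrip(element).equals(element));
    System.out.println("size() mutates: " + mutates(element, e -> e.size()));
    System.out.println("add() mutates: " + mutates(element, e -> e.add("c")));
  }
}
```

Comparing encoded bytes before and after the user code runs catches in-place mutation without requiring elements to implement a meaningful `equals`, at the cost of encoding every element twice.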

Using the direct runner for testing and development helps ensure that pipelines are robust across different Beam runners. In addition, debugging failed runs can be a non-trivial task when a pipeline executes on a remote cluster. Instead, it is often faster and simpler to perform local unit testing on your pipeline code. Unit testing your pipeline locally also allows you to use your preferred local debugging tools.

Here are some resources with information about how to test your pipelines.
* [Testing Unbounded Pipelines in Apache Beam](https://beam.incubator.apache.org/blog/2016/10/20/test-stream.html) talks about the use of Java classes [PAssert]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and [TestStream]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) to test your pipelines.

Member

Please don't hardcode the full URL -- it is fragile and guaranteed to break when we go TLP. Please link to baseurl/blog....

Author

Done.


Here are some resources with information about how to test your pipelines.
* [Testing Unbounded Pipelines in Apache Beam](https://beam.incubator.apache.org/blog/2016/10/20/test-stream.html) talks about the use of Java classes [PAssert]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and [TestStream]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) to test your pipelines.
* The [Apache Beam WordCount Example](http://beam.incubator.apache.org/get-started/wordcount-example/) contains an example of logging and testing a pipeline with `PAssert`.

Member

Same here, no full URLs.

Author

Done.
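
Because the Direct Runner processes elements in an arbitrary order, assertions on a pipeline's output generally need to ignore ordering, which is the kind of comparison `PAssert` is used for in the resources quoted above. The following plain-Java sketch shows only the underlying idea of an order-insensitive comparison. It does not use or reproduce Beam's API; `AnyOrderSketch`, `counts`, and `sameElementsInAnyOrder` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating an order-insensitive check; not part of Beam. */
public class AnyOrderSketch {

  /** Counts how many times each element occurs. */
  public static <T> Map<T, Integer> counts(Iterable<T> elements) {
    Map<T, Integer> counts = new HashMap<>();
    for (T element : elements) {
      counts.merge(element, 1, Integer::sum);
    }
    return counts;
  }

  /** True if both inputs hold the same elements with the same multiplicities. */
  public static <T> boolean sameElementsInAnyOrder(Iterable<T> actual, Iterable<T> expected) {
    return counts(actual).equals(counts(expected));
  }

  public static void main(String[] args) {
    List<String> output = Arrays.asList("b", "a", "b");
    // Same elements, different order.
    System.out.println(sameElementsInAnyOrder(output, Arrays.asList("a", "b", "b")));
    // One "b" is missing from the expected side.
    System.out.println(sameElementsInAnyOrder(output, Arrays.asList("a", "b")));
  }
}
```

Comparing per-element counts rather than sorted lists keeps the check independent of whether the element type has a natural ordering.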

For example, if you are using Maven for development and want to use the SDK for Java with [DirectRunner]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/direct/DirectRunner.html), add the following dependency to your `pom.xml` file:

```
<dependency>
```

Member

Can you abstract the text to be SDK-independent and use language toggles for the code snippet?

Author

Yeah, it's a bit troublesome. If I put the text that specifies it's for Maven/Java outside of the language toggle, it looks strange if the user toggles python... but it also looks strange when I put the text inside the Java toggle because it forces everything inside to be code formatted, along with syntax highlighting. I will see how it looks with the Maven/Java specific text in a comment inside the Java toggle. Other ideas welcome!

```
</dependency>
```

## Pipeline options for the direct runner

Member

I think there are mentions of "Direct Runner", "direct runner" and "Direct runner". I think it would be good to be consistent.

Author

I'll change everything to Direct Runner to be consistent with other runners.


## Pipeline options for the direct runner

When executing your pipeline from the command-line, set `runner` to `direct` or `directrunner`. The default values for the pipeline options are generally sufficient.

Member

Suggested wording: "The default values for the other pipeline options are generally sufficient."

No need to mention `directrunner`; perhaps leave just `direct`.

Author

Done.


## Additional information and caveats

Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a Create transform, or you can use a Read transform to work with small local or remote files.

Member

Perhaps code formatting for Create and Read along with Javadoc/Py links to them?

Author

Done.

asfbot commented Nov 14, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/69/
--none--

asfbot commented Nov 14, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/113/

Jenkins built the site at commit id 8157e4e with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

Member

@davorbonaci davorbonaci left a comment

LGTM. Good to go!


Here are some resources with information about how to test your pipelines.
* [Testing Unbounded Pipelines in Apache Beam]({{ site.baseurl }}/blog/2016/10/20/test-stream.html) talks about the use of Java classes [`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and [`TestStream`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) to test your pipelines.
* The [Apache Beam WordCount Example]({{ site.baseurl }}/get-started/wordcount-example/) contains an example of logging and testing a pipeline with Java [`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html).

Member

This will probably be language-independent. I'd remove the word "Java" before `PAssert`.

Author

Done.

```java
// If you use Maven and the SDK for Java, add the following dependency to your pom.xml file:

<dependency>
```

Member

FYI: unfortunately this is trying to apply Java formatting to an XML snippet. Sad, but orthogonal to this PR.

You must specify your dependency on the Direct Runner.

```java
// If you use Maven and the SDK for Java, add the following dependency to your pom.xml file:
```

Member

I'd remove this comment: `//` isn't a comment here. It's XML, so we'd need something like `<!--`.

Author

Done

@davorbonaci
Member

LGTM. Merging.

@asfgit asfgit closed this in a82a0f3 Nov 15, 2016
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018