[BEAM-508] Fill in the documentation/runners/dataflow portion of the website #77

Closed

melap wants to merge 4 commits into apache:asf-site from melap:dataflow

melap commented Nov 11, 2016

R: @davorbonaci @francesperry


          [BEAM-508] Fill in the documentation/runners/dataflow portion of the …

c84b49e

…website

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/94/

Jenkins built the site at commit id c84b49e with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

asfbot commented Nov 11, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/49/

This was referenced Nov 13, 2016

[BEAM-899] Add Flink Instructions to quickstart.md #79

Merged

[BEAM-845] Update Apex runner info after merge to master. #78

Merged

davorbonaci requested changes

Member

davorbonaci left a comment

Beautiful! Just a few minor comments ;-)

src/documentation/runners/dataflow.md


		The Google Cloud Dataflow runner uses the [Cloud Dataflow managed service](https://cloud.google.com/dataflow/service/dataflow-service-desc). When you run your pipeline with the Cloud Dataflow service, the runner uploads your executable code and dependencies to a Google Cloud Storage bucket and creates a Cloud Dataflow job, which executes your pipeline on managed resources in Google Cloud Platform.

		The Cloud Dataflow runner and service is suitable for large scale continuous jobs, and provides:

Member

davorbonaci Nov 13, 2016

is -> are?
large scale, continuous (add comma)?

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md

+              The Cloud Dataflow runner and service is suitable for large scale continuous jobs, and provides:
+              * a fully managed service
+              * [autoscaling](https://cloud.google.com/dataflow/service/dataflow-service-desc#autoscaling) of the number of VMs throughout the lifetime of the job

Member

davorbonaci Nov 13, 2016

VMs -> workers

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md


		2. Enable billing for your project.

		3. Enable APIs: Cloud Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Cloud Storage JSON, BigQuery, Cloud Pub/Sub, and Cloud Datastore.

Member

davorbonaci Nov 13, 2016

BigQuery, PubSub, and Datastore are optional, I think. Perhaps you can say: You may need to enable other APIs, such as (these three), if you use them in your pipeline code.

Author

melap Nov 15, 2016

done. also changed the logging API name to match the name that shows up in the console.

src/documentation/runners/dataflow.md


		3. Enable APIs: Cloud Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Cloud Storage JSON, BigQuery, Cloud Pub/Sub, and Cloud Datastore.

		4. Install the Cloud SDK.

Member

davorbonaci Nov 13, 2016

Google Cloud SDK.

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md

+4. Install the Cloud SDK.
+5. Create a Cloud Storage bucket.
+                  * In the Cloud Platform Console, go to the Cloud Storage browser.

Member

davorbonaci Nov 13, 2016

Google Cloud Platform Console -- I think we should use the full name on the first reference to the term, but not afterwards.

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md


		## Pipeline options for the Cloud Dataflow runner

		When executing your pipeline from the command-line, set these pipeline options.

Member

davorbonaci Nov 13, 2016

This is needed even if not executing from the command line.

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md

+              <tr>
+                <td><code>project</code></td>
+                <td>The project ID for your Google Cloud Project.</td>
+                <td>If not set, defaults to the default project of the current user.</td>

Member

davorbonaci Nov 13, 2016

I think there's no such thing as a default project of the current user. I think there's a default project in the current environment, set via gcloud.

Author

melap Nov 15, 2016

done

src/documentation/runners/dataflow.md

+              </tr>
+              <tr>
+                <td><code>streaming</code></td>
+                <td>Whether streaming mode is enabled or disabled; <code>true</code> if enabled.</td>

Member

davorbonaci Nov 13, 2016

Set to true if running pipelines with unbounded PCollections?

Author

melap Nov 15, 2016

Done. I followed the style of the programming guide, though it looks a bit strange because the code block font is so different from the non-code font. Other possible options would be "PCollection objects" or just "collections", if the visual ickiness is too much.

src/documentation/runners/dataflow.md

+              </tr>
+              <tr>
+                <td><code>stagingLocation</code></td>
+                <td>Cloud Storage bucket path for staging your binary and any temporary files. Must be a valid Cloud Storage URL that begins with <code>gs://</code>.</td>

Member

davorbonaci Nov 13, 2016

Optional.

Author

melap Nov 15, 2016

done
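
For readers following the `stagingLocation` row above, here is a minimal sketch of the "must begin with `gs://`" rule as a standalone Java check. The class and method names (`GcsPathCheck`, `looksLikeGcsUrl`) are hypothetical and not part of the Beam SDK or the Dataflow runner, and the non-empty bucket requirement is an assumption added for illustration; the runner's own validation may be stricter.

```java
/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class GcsPathCheck {
    private static final String GCS_SCHEME = "gs://";

    private GcsPathCheck() {}

    /**
     * Returns true if the value starts with gs:// and names a non-empty bucket.
     * The non-empty bucket check is an assumption; the doc only requires the prefix.
     */
    static boolean looksLikeGcsUrl(String value) {
        if (value == null || !value.startsWith(GCS_SCHEME)) {
            return false;
        }
        String rest = value.substring(GCS_SCHEME.length());
        int slash = rest.indexOf('/');
        String bucket = slash < 0 ? rest : rest.substring(0, slash);
        return !bucket.isEmpty();
    }

    public static void main(String[] args) {
        // A bucket path is accepted; a local path is rejected.
        System.out.println(looksLikeGcsUrl("gs://my-bucket/staging"));
        System.out.println(looksLikeGcsUrl("/tmp/staging"));
    }
}
```

A check like this could run before job submission so that a mistyped path fails early, though that placement is a design suggestion rather than something the PR specifies.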

src/documentation/runners/dataflow.md


		### Blocking Execution

		To connect to your job and block until it is completed, call `waitToFinish` on the `PipelineResult` returned from `pipeline.run()`. The Cloud Dataflow runner prints job status updates and console messages while it waits. While the result is connected to the active job, note that typing Ctrl+C from the command line does not cancel your job. To cancel the job, you can use the [Dataflow Monitoring Interface](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf) or the [Dataflow Command-line Interface](https://cloud.google.com/dataflow/pipelines/dataflow-command-line-intf).

Member

davorbonaci Nov 13, 2016

typing -> pressing

Author

melap Nov 15, 2016

done


          Addressing code review comments, round 1

664eded

davorbonaci approved these changes

Member

davorbonaci left a comment

LGTM

src/documentation/runners/dataflow.md

+              <tr>
+                <td><code>tempLocation</code></td>
+                <td>Optional. Path for temporary files. If set to a valid Google Cloud Storage URL that begins with <code>gs://</code>, <code>tempLocation</code> is used as the default value for <code>gcpTempLocation</code>.</td>
+                <td>No default value</td>

Member

davorbonaci Nov 15, 2016

. (dot) at the end of the sentence.

Author

melap Nov 15, 2016

done
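
The `tempLocation` row above describes a fallback: when `tempLocation` is a `gs://` URL, it serves as the default for `gcpTempLocation`. The sketch below models that rule in plain Java, assuming that an explicitly set `gcpTempLocation` always takes precedence. `TempLocationDefaults` and `resolveGcpTempLocation` are hypothetical names used only for illustration; this is not the runner's actual implementation.

```java
import java.util.Optional;

/** Hypothetical model of the documented fallback; not the runner's actual code. */
final class TempLocationDefaults {
    private TempLocationDefaults() {}

    /**
     * Resolves the effective gcpTempLocation.
     * Assumption: an explicit gcpTempLocation wins; otherwise tempLocation is
     * used only if it begins with gs://; otherwise there is no default value.
     */
    static Optional<String> resolveGcpTempLocation(String gcpTempLocation, String tempLocation) {
        if (gcpTempLocation != null && !gcpTempLocation.isEmpty()) {
            return Optional.of(gcpTempLocation);
        }
        if (tempLocation != null && tempLocation.startsWith("gs://")) {
            return Optional.of(tempLocation);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Falls back to tempLocation because it is a Cloud Storage URL.
        System.out.println(resolveGcpTempLocation(null, "gs://my-bucket/tmp"));
        // A local tempLocation yields no default for gcpTempLocation.
        System.out.println(resolveGcpTempLocation(null, "/tmp/beam"));
    }
}
```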

src/documentation/runners/dataflow.md

+              <dependency>
+                <groupId>org.apache.beam</groupId>
+                <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
+                <version>0.3.0-incubating</version>

Member

davorbonaci Nov 15, 2016

can you use jekyll variable here?
(sorry for forgetting that on a previous one. Update too?)

Author

melap Nov 15, 2016

Added latest version variable. Will also update direct runner with this.

melap added 2 commits

November 14, 2016 16:51


          Addressing code review comments, round 2

a8dec88


          Remove renegade site version line

a19af36

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/119/

Jenkins built the site at commit id 664eded with Jekyll and staged it here. Happy reviewing.

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/74/

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/76/

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/77/

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/121/

Jenkins built the site at commit id a8dec88 with Jekyll and staged it here. Happy reviewing.

Member

davorbonaci commented Nov 15, 2016

Merging.

asfgit closed this in

d5b722e

asfbot commented Nov 15, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/122/

Jenkins built the site at commit id a19af36 with Jekyll and staged it here. Happy reviewing.

melap deleted the dataflow branch

November 15, 2016 23:33

robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request


          This closes apache/beam-site#77

d0558c0

melap pushed a commit to apache/beam that referenced this pull request


          This closes apache/beam-site#77

01e12a5
