[BEAM-2839, BEAM-2838] Add MapReduce runner to Beam asf-site. #313

Closed
wants to merge 4 commits into from

Conversation

@peihe peihe commented Sep 7, 2017

No description provided.

@asfgit
asfgit commented Sep 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/670/

Jenkins built the site at commit id 7b6a866 with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

@asfgit
asfgit commented Sep 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/671/

Jenkins built the site at commit id 6a3304c with Jekyll and staged it here. Happy reviewing.

@asfgit
asfgit commented Sep 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/672/

Jenkins built the site at commit id 7b6a866 with Jekyll and staged it here. Happy reviewing.

@peihe changed the title from "[BEAM-2666] Add JStorm runner to Beam asf-site." to "[Ignore] Add runner to Beam asf-site." on Sep 8, 2017
@peihe changed the title from "[Ignore] Add runner to Beam asf-site." to "WIP [BEAM-2839, BEAM-2838] Add MapReduce runner to Beam asf-site." on Sep 8, 2017
@asfgit
asfgit commented Sep 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/681/

Jenkins built the site at commit id 04a5d2d with Jekyll and staged it here. Happy reviewing.

@asfgit
asfgit commented Sep 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/689/

Jenkins built the site at commit id 444c04d with Jekyll and staged it here. Happy reviewing.

@peihe changed the title from "WIP [BEAM-2839, BEAM-2838] Add MapReduce runner to Beam asf-site." to "[BEAM-2839, BEAM-2838] Add MapReduce runner to Beam asf-site." on Sep 11, 2017
@peihe
Author

peihe commented Sep 11, 2017

R: @kennknowles

@asfgit
asfgit commented Sep 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/690/

Jenkins built the site at commit id 5fead25 with Jekyll and staged it here. Happy reviewing.

@kennknowles
Member

R: @melap

On the overview page, we just list the runners that are on master.

@@ -11,6 +11,8 @@ columns:
name: Apache Apex
- class: gearpump
name: Apache Gearpump
- class: mapreduce
name: MapReduce
Member

Apache Hadoop MapReduce

Author

done

@melap melap left a comment

made some minor edit suggestions below.

one general comment along the same lines as @kennknowles -- are we adding runners not in master to the main menu pulldown? I'm not sure. I ask as header.html will need a line item for this in the runners section, otherwise I don't think people can get to it. Or alternately, an additional column in the table on https://beam.apache.org/contribute/work-in-progress/ for links to the feature branch runner's pages?


The Apache Hadoop MapReduce runner currently supports Apache Hadoop 2.8.1 version.

You can add a dependency on the latest version of the Apache Hadoop MapReduce runner by adding to your pom.xml the following:

by adding the following to your pom.xml:

Author

done

## Apache Hadoop MapReduce Runner prerequisites and setup
You need to have an Apache Hadoop environment with either [Single Node Setup](https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html) or [Cluster Setup](https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html)

The Apache Hadoop MapReduce runner currently supports Apache Hadoop 2.8.1 version.

Apache Hadoop version 2.8.1.

Author

done

The Apache Hadoop MapReduce runner currently supports Apache Hadoop 2.8.1 version.

You can add a dependency on the latest version of the Apache Hadoop MapReduce runner by adding to your pom.xml the following:
```java

I'd recommend making this a regular block with just ``` ... sometimes the language toggling can be strange and it might be blank if the user chose python SDK on another page.

Author

done

```

## Deploying Apache Hadoop MapReduce with your application
To execute in a local hadoop environment, use this command:

capitalize Hadoop (multiple places on the page)

Author

done

--fileOutputDir=<directory for intermediate outputs>"
```

To execute in a hadoop cluster, you need to package your program along will all dependencies in a so-called fat jar.

can remove "you need to"
will -> with
remove "so-called"

Author

done

<tr>
<td><code>runner</code></td>
<td>The pipeline runner to use. This option allows you to determine the pipeline runner at runtime.</td>
<td>Set to <code>MapReduceRunner</code> to run using the Apache Hadoop MapReduce.</td>

remove "the" after using

Author

done

</tr>
<tr>
<td><code>fileOutputDir</code></td>
<td>The directory for files output.</td>

files output -> output files

Author

done


To execute in a hadoop cluster, you need to package your program along will all dependencies in a so-called fat jar.

If you follow along the [Beam Quickstart]({{ site.baseurl }}/get-started/quickstart/) this is the command that you can run:

this quickstart link is a redirect to the Java page.. perhaps link directly to the Java version and text something like this?

If you are using through the [Beam Java SDK Quickstart]({{ site.baseurl }}/get-started/quickstart-java/), you can run this command:

Author

Done with a minor change to "following through"

@@ -36,6 +36,8 @@ Beam currently supports Runners that work with the following distributed process
alt="Apache Flink">
* Apache Gearpump (incubating) <img src="{{ site.baseurl }}/images/logos/runners/gearpump.png"
alt="Apache Gearpump">
* Apache Hadoop MapReduce <img src="{{ site.baseurl }}/images/logos/runners/mapreduce.png"

this image is very big compared to the others on the page... may need to specify dimensions or use a smaller image. more a comment for later, as from Kenn's comment, it sounds like this runner shouldn't be listed here yet.

Author

done

Author

@peihe peihe left a comment

PTAL

I have to update header.html in a follow-up PR. Otherwise, it will complain about dead link.

@asfgit
asfgit commented Sep 12, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/693/

Jenkins built the site at commit id f95c84b with Jekyll and staged it here. Happy reviewing.

- class: mapreduce
l1: 'Yes'
l2: fully supported
l3: ''
Member

Normally, I'd want one sentence about the MapReduce feature used to implement them, but I think with the upcoming revamp it is not worth filling it out. We should have this page revised before the runner is on master.

Author

ack

@@ -505,6 +575,10 @@ categories:
l1: 'No'
l2: ''
l3: ''
- class: mapreduce
l1: 'Yes'
Member

I think the most useful thing for users here is to say "No" and indicate that it is a batch-only runner.

Author

done
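
For illustration only, a batch-only entry along the lines suggested in this thread might look roughly like the sketch below. The keys `class`, `l1`, `l2`, and `l3` are taken from the diff context; the `l2` wording is an assumption and is not the text that was merged.

```yaml
# Hypothetical capability-matrix entry for a batch-only runner.
# Keys (class, l1, l2, l3) come from the diff context in this thread;
# the l2 text is an assumed wording, not the merged content.
- class: mapreduce
  l1: 'No'
  l2: batch-only runner
  l3: ''
```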

@@ -532,6 +606,10 @@ categories:
l1: 'Yes'
l2: fully supported
l3: ''
- class: mapreduce
l1: 'Yes'
Member

Once you say 'No' to triggering, you can leave these blank or just say 'No' without the explanation. It is a nice explanation, though, so you could also leave it.

Author

done

@@ -668,6 +762,10 @@ categories:
l1: 'Yes'
l2: fully supported
l3: ''
- class: mapreduce
l1: 'Yes'
Member

This stuff is another area where it doesn't really make sense for a batch-only runner to have any answer.

Author

done

layout: default
title: "Apache Hadoop MapReduce Runner"
permalink: /documentation/runners/mapreduce/
redirect_from: /learn/runners/mapreduce/
Member

Shouldn't need this, since there's no past URL.

Author

done

---
layout: default
title: "Apache Hadoop MapReduce Runner"
permalink: /documentation/runners/mapreduce/
Member

I think since we will not link from the overview and will not link from the header, you should follow this up by adding a link from the work-in-progress page.

Author

done

Author

@peihe peihe left a comment

PTAL

@asfgit
asfgit commented Sep 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/695/

Jenkins built the site at commit id f2ae6e6 with Jekyll and staged it here. Happy reviewing.

@kennknowles
Member

Looks mostly good. When I expand details on the capability matrix, the formatting is a bit weird. There are some boxes that start with a :

@peihe
Author

peihe commented Sep 13, 2017

PTAL

Added "No" back, and it should fix the ':' issue

@asfgit
asfgit commented Sep 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/696/

Jenkins built the site at commit id fa0c7a1 with Jekyll and staged it here. Happy reviewing.

@melap
melap commented Sep 14, 2017

LGTM from a text perspective

@melap
melap commented Sep 14, 2017

Actually, one thing -- does the capability matrix need to note somewhere that this runner is in development and isn't part of the release yet?

@peihe
Author

peihe commented Sep 18, 2017

We had gearpump in the capability matrix before merging it to master.

PTAL @kennknowles

@kennknowles
Member

It would be nice to also call it out on the capability matrix page, but I don't think that is so critical. Mostly the header and overview are where we should stick to released runners. We need a major revamp there anyhow, and that will give a good opportunity to add verbiage about in-progress runners. Let's get this added to the site so that anyone dropping in might see it.

@kennknowles
Member

@asfgit merge

@asfgit asfgit closed this in 9bc6678 Sep 19, 2017
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018

4 participants