Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-899] Add Flink Instructions to quickstart.md #79

Merged
merged 1 commit into from
Nov 16, 2016
Merged

[BEAM-899] Add Flink Instructions to quickstart.md #79

merged 1 commit into from
Nov 16, 2016

Conversation

aljoscha
Copy link

I added a section about runner specific dependencies in the Quickstart.

R: @francesperry

@asfbot
Copy link

asfbot commented Nov 12, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/52/

@asfbot
Copy link

asfbot commented Nov 12, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/97/

Jenkins built the site at commit id a31273c with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

Copy link
Member

@davorbonaci davorbonaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aljoscha! Great -- just a few comments.

@@ -53,6 +53,20 @@ MinimalWordCount.java WordCount.java

For a detailed introduction to the Beam concepts used in these examples, see the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example). Here, we'll just focus on executing `WordCount.java`.

## Runner-Specific Dependencies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be needed. The quickstart is using examples/WordCount archetype, which already has (or, should have) this set up correctly. So, no action needed from the user.

I think this generally falls under Ensure you've done any runner-specific setup bullet below -- we should put it on the runner page. See #76 and #77 for good examples, IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, if you follow the Quickstart steps you will end up with a pom that only has the direct runner and dataflow runner as dependencies. (This is tracked in https://issues.apache.org/jira/browse/BEAM-905) I think we either have to make BEAM-905 a blocker for this or add in specific instructions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already done in examples; I think I already saw a PR for archetypes too. So, I'd go with the assumption this will be done very shortly. Sure, we'd need a release, but there are other rough edges too -- perhaps we can point to nightlies at the top of the page.

```
TODO BEAM-899
```
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we teach the users here how to run on Flink locally and on a remote cluster?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of putting that in the Flink Runner-specific page to not overload this because it requires some more involved steps.

If you want, I can also put that here, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be a simple pipeline option with hostname/port only to point to an already configured cluster? (How to set up such a cluster would be overload for sure.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, the Flink Runner does not automatically pick up class files and jars like the Dataflow runner does. You need to create a jar that contains all the dependencies (which a mvn package does because we have maven shade in the examples archetype) and then run it as you would run a Flink job, i.e. something like bin/flink run -c my.WordCountClass my-jarfile.jar -my-parameters.

Also, there is a bug right now which requires users to place the jar into the lib folder of the Flink installation. Something with Flink not using the proper Classloader in a certain place.

I can still put that in, though. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Perhaps a separate tab for the time being? Flink-local and Flink-remote?

(but, let's make sure we improve the user experience over time)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, do we already have some javascript/css magic for that in place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Javascript doesn't hardcode anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, what do you mean by that?

Also, should we maybe put all of the runner specific instructions into tabs that are labeled with the runner? As it is now, we just have four consecutive boxes of shell commands that are not labeled. I imagine the confusion to only get bigger once we add more runners to this.

```
TODO BEAM-899
```
$ ls counts*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some UI that users can see if running locally/remotely?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do as you see fit.

From my perspective, I'd point the users to it, but not teach them how to set up a cluster.

@@ -147,6 +147,7 @@ <h1 id="apache-beam-java-sdk-quickstart">Apache Beam Java SDK Quickstart</h1>
<ul id="markdown-toc">
<li><a href="#set-up-your-development-environment" id="markdown-toc-set-up-your-development-environment">Set up your Development Environment</a></li>
<li><a href="#get-the-wordcount-code" id="markdown-toc-get-the-wordcount-code">Get the WordCount Code</a></li>
<li><a href="#runner-specific-dependencies" id="markdown-toc-runner-specific-dependencies">Runner-Specific Dependencies</a></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI only -- we are trying the practice of avoiding content/* in the pull requests, and generate new content when merging PR. Much easier to handle. (Of course, works either way.)

@aljoscha
Copy link
Author

@davorbonaci I updated this to have tabs for Flink-local and Flink-cluster. Also, I'm now using the -Pflink-runner profile which doesn't exist yet in the archetype but which is also used for the apex instructions. This will only work once we update the archetypes.

@asfbot
Copy link

asfbot commented Nov 16, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/90/
--none--

@asfbot
Copy link

asfbot commented Nov 16, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/134/

Jenkins built the site at commit id a53e4d8 with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

@asfgit asfgit merged commit a53e4d8 into apache:asf-site Nov 16, 2016
asfgit pushed a commit that referenced this pull request Nov 16, 2016
@davorbonaci
Copy link
Member

LGTM. A clear improvement, so merging. Thanks @aljoscha.

  • Test failure is due to a broken link to the Apex feature branch. It's being fixed elsewhere.
  • Instead of --runner=org.apache.beam.runners.flink.FlinkRunner, --runner=flink should be enough, no?

@aljoscha
Copy link
Author

For running it locally yes, for the bin/flink invocation no, because the class path is (probably) not setup correctly and it doesn't seem to pick up the runners so these short names don't work.

@aljoscha aljoscha deleted the flink-quickstart branch November 21, 2016 09:24
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants