-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-899] Add Flink Instructions to quickstart.md #79
Conversation
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): Jenkins built the site at commit id a31273c with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aljoscha! Great -- just a few comments.
@@ -53,6 +53,20 @@ MinimalWordCount.java WordCount.java | |||
|
|||
For a detailed introduction to the Beam concepts used in these examples, see the [WordCount Example Walkthrough]({{ site.baseurl }}/get-started/wordcount-example). Here, we'll just focus on executing `WordCount.java`. | |||
|
|||
## Runner-Specific Dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be needed. The quickstart is using examples/WordCount archetype, which already has (or, should have) this set up correctly. So, no action needed from the user.
I think this generally falls under Ensure you've done any runner-specific setup
bullet below -- we should put it on the runner page. See #76 and #77 for good examples, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, if you follow the Quickstart steps you will end up with a pom that only has the direct runner and dataflow runner as dependencies. (This is tracked in https://issues.apache.org/jira/browse/BEAM-905) I think we either have to make BEAM-905 a blocker for this or add in specific instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already done in examples
; I think I already saw a PR for archetypes too. So, I'd go with the assumption this will be done very shortly. Sure, we'd need a release, but there are other rough edges too -- perhaps we can point to nightlies at the top of the page.
``` | ||
TODO BEAM-899 | ||
``` | ||
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we teach the users here how to run on Flink locally and on a remote cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of putting that in the Flink Runner-specific page to not overload this because it requires some more involved steps.
If you want, I can also put that here, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be a simple pipeline option with hostname/port only to point to an already configured cluster? (How to set up such a cluster would be overload for sure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not, the Flink Runner does not automatically pick up class files and jars like the Dataflow runner does. You need to create a jar that contains all the dependencies (which a mvn package
does because we have maven shade in the examples archetype) and then run it as you would run a Flink job, i.e. something like bin/flink run -c my.WordCountClass my-jarfile.jar -my-parameters
.
Also, there is a bug right now which requires users to place the jar into the lib folder of the Flink installation. Something with Flink not using the proper Classloader in a certain place.
I can still put that in, though. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Perhaps a separate tab for the time being? Flink-local
and Flink-remote
?
(but, let's make sure we improve the user experience over time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, do we already have some javascript/css magic for that in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Javascript doesn't hardcode anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, what do you mean by that?
Also, should we maybe put all of the runner specific instructions into tabs that are labeled with the runner? As it is now, we just have four consecutive boxes of shell commands that are not labeled. I imagine the confusion to only get bigger once we add more runners to this.
``` | ||
TODO BEAM-899 | ||
``` | ||
$ ls counts* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some UI that users can see if running locally/remotely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do as you see fit.
From my perspective, I'd point the users to it, but not teach them how to set up a cluster.
@@ -147,6 +147,7 @@ <h1 id="apache-beam-java-sdk-quickstart">Apache Beam Java SDK Quickstart</h1> | |||
<ul id="markdown-toc"> | |||
<li><a href="#set-up-your-development-environment" id="markdown-toc-set-up-your-development-environment">Set up your Development Environment</a></li> | |||
<li><a href="#get-the-wordcount-code" id="markdown-toc-get-the-wordcount-code">Get the WordCount Code</a></li> | |||
<li><a href="#runner-specific-dependencies" id="markdown-toc-runner-specific-dependencies">Runner-Specific Dependencies</a></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI only -- we are trying the practice of avoiding content/*
in the pull requests, and generate new content when merging PR. Much easier to handle. (Of course, works either way.)
@davorbonaci I updated this to have tabs for |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): Jenkins built the site at commit id a53e4d8 with Jekyll and staged it here. Happy reviewing. Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again. |
LGTM. A clear improvement, so merging. Thanks @aljoscha.
|
For running it locally yes, for the |
I added a section about runner specific dependencies in the Quickstart.
R: @francesperry