[FLINK-4118] The docker-flink image is outdated (1.0.2) and can be slimmed down #2176

Closed
wants to merge 9 commits into from

iemejia
Member

@iemejia iemejia commented Jun 28, 2016

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General
    • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation
    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build
    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

Some of the changes include:

- Remove unneeded dependencies (nano, wget)
- Remove apt lists to reduce image size
- Reduce number of layers on the docker image (best docker practice)
- Remove useless variables and base the code on generic ones, e.g. FLINK_HOME
- Change the default JDK from oracle to openjdk-8-jre-headless, for two reasons:

1. You cannot legally repackage the oracle jdk in docker images
2. The open-jdk headless is more appropriate for a server image (no GUI stuff)

- Restore the port assignments to the standard Flink ones:

Variable: docker-flink -> flink

taskmanager.rpc.port: 6121 -> 6122
taskmanager.data.port: 6122 -> 6121
jobmanager.web.port: 8080 -> 8081
@iemejia
Member Author

iemejia commented Jun 28, 2016

The docker images script was simplified and the image size was reduced.

Previous image:
flink latest 6475add651c7 24 minutes ago 711.6 MB

Image after FLINK-4118
flink latest 555e60f24c10 20 seconds ago 252.5 MB

@iemejia iemejia changed the title from "Flink 4118" to "[FLINK-4118] The docker-flink image is outdated (1.0.2) and can be slimmed down" Jun 28, 2016
@iemejia
Member Author

iemejia commented Jun 28, 2016

Sorry I had to rebase my previous PR but this is the definitive one.

We don't use the conf folder anymore for the docker image.

parallelization.degree.default: %parallelism%
# general configuration
sed -i -e "s/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: `grep -c ^processor /proc/cpuinfo`/g" $FLINK_HOME/conf/flink-conf.yaml
Contributor

This should only be executed on the JobManager, since on a TaskManager it will print `taskmanager_1 | sed: can't create temp file '/usr/local/flink/conf/flink-conf.yamlXXXXXX': Read-only file system`, which might worry some users.
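
A minimal sketch of that suggestion, assuming the container's start script knows which role it is running as. The function name `configure_task_slots` and its parameters are hypothetical and are not taken from the scripts in this PR:

```shell
# Hypothetical helper, not the actual script from this PR: rewrite the slot
# count only when the container runs as the JobManager.
configure_task_slots() {
    role="$1"    # assumed to be "jobmanager" or "taskmanager"
    conf="$2"    # path to flink-conf.yaml
    # Default to the CPU count, obtained the same way as in the line under review
    slots="${3:-$(grep -c ^processor /proc/cpuinfo)}"

    if [ "$role" = "jobmanager" ]; then
        sed -i -e "s/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: ${slots}/g" "$conf"
    fi
}

# Possible usage from a start script:
#   configure_task_slots "$1" "$FLINK_HOME/conf/flink-conf.yaml"
```

With a guard like this, a TaskManager container never attempts to write to the configuration file, so the read-only warning should not appear.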

@aljoscha
Contributor

Nice work! I didn't know how to use docker but I managed to set it up and use the new version on OS X without a problem. So it seems to work well, and the code is a lot simpler and the image is smaller.

LGTM minus the one comment I had about the config file.

@iemejia
Member Author

iemejia commented Jun 30, 2016

Nice, I just fixed it as you suggested. I have three questions:

  1. This container is based on the Java JRE (to keep it small). Does any part of Flink do some magic that requires a full JDK (like live recompiles)? If not, I think this is almost perfect now.
  2. Are you aware of any other Flink dependency that uses a native OS library? I ask because any such library must be added to the container. I did this for snappy because I found the issue while testing a Beam pipeline, but I don't know if there are others.
  3. I left supervisor in the docker image because I didn't find an easy way to start Flink in normal mode: the scripts that start the taskmanager and the jobmanager go into daemon mode immediately. Is there something that can be done to change this? It would reduce the image by 40 more MB. I can work on that in a different PR.


Images are based on Ubuntu Trusty 14.04 and run Supervisord to stay alive when running containers.
docker build -t "flink" flink
Contributor

On my machine this doesn't work, but this does: docker build -t "flink" . Is this maybe a leftover from the earlier version where there was a flink directory?

(Earlier I was using sh build.sh, that's why I didn't spot the problem.)

Member Author

Nice catch, just fixed :)

@aljoscha
Contributor

Re 1. I think this should be fine. There is some dynamic code generation but this uses Janino as a library so that shouldn't be a problem.

Re 2. I'm not aware of any but it should be easy to add them if we find any in the future.

Re 3. This is not easily possible since the & is hardcoded in flink-daemon.sh:

$JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 2>&1 < /dev/null &

I'd be very happy if you'd like to work on adding a flag or setting to make the daemons start non-daemonized. It should not be too hard to add that, IMHO.
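
One way such a switch could look, as a rough sketch only. The function `start_process` and its `foreground`/`daemon` mode argument are hypothetical and do not exist in flink-daemon.sh:

```shell
# Hypothetical sketch: choose between foreground and background start.
# Usage: start_process <foreground|daemon> <logfile> <command> [args...]
start_process() {
    mode="$1"
    out="$2"
    shift 2
    if [ "$mode" = "foreground" ]; then
        # Stay attached: the caller blocks until the process exits and the
        # output goes to stdout/stderr, which suits a container entrypoint.
        "$@"
    else
        # Current behaviour: detach, send output to the log file, print the PID.
        "$@" > "$out" 2>&1 < /dev/null &
        echo $!
    fi
}
```

In foreground mode the container would stay alive for as long as the Flink process runs, which is the job supervisor currently does in the image.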

@iemejia
Member Author

iemejia commented Jun 30, 2016

Awesome, thanks @aljoscha, let's merge!

@aljoscha
Contributor

One last thing I would like to try is running a job from an existing Flink installation using $FLINK_HOME/bin/flink run -m <jobmanager:port> <your_jar> as described in the README.

I suspect it has something to do with setting the right IP or network setting because Akka is very particular about the IP to which it is bound. Did you get this to work? I'm only managing to access the Web Dashboard.

@iemejia
Member Author

iemejia commented Jul 1, 2016

Hi, I tested it with the basic word count and with the beam pipeline example that @ecesena put for his flink/beam demo.

I don't know if it does not work because you are running on docker for mac, but check two things:

  1. docker assigns an internal IP to each running container (the one corresponding to the jobmanager is the one to put in the -m argument).
  2. The path you pass in the examples must be accessible from the container, e.g. you must copy the file to all the nodes (as I mention in the README), or mount it with a volume (-v on docker).
    e.g.
$FLINK_HOME/bin/flink run -m 172.18.0.2:6123 examples/batch/WordCount.jar --input file:///tmp/kinglear.txt

The kinglear file must be in all the nodes. I copy those like this:

# Copy the input file into every running Flink container
for i in $(docker ps --filter name=flink --format '{{.ID}}'); do
    docker cp ~/Desktop/flink-beam/kinglear.txt "$i":/tmp/
done

I hadn't noticed before that you can even execute that example without arguments:

bin/flink run -m 172.18.0.2:6123 examples/batch/WordCount.jar
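
For the first point, the internal IP can be read with `docker inspect`. A small sketch, assuming the JobManager container is attached to a single network; the helper `jobmanager_address` and the container name `flink_jobmanager_1` are hypothetical, so check `docker ps` for the real name:

```shell
# Hypothetical helper: print <ip>:<port> for the JobManager container.
# Assumes the container is attached to exactly one docker network.
jobmanager_address() {
    container="$1"
    port="${2:-6123}"   # 6123 is the port used in the example above
    ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$container")
    echo "${ip}:${port}"
}

# Possible usage (container name is an assumption):
#   $FLINK_HOME/bin/flink run -m "$(jobmanager_address flink_jobmanager_1)" examples/batch/WordCount.jar
```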

@aljoscha
Contributor

aljoscha commented Jul 1, 2016

Yes, this is exactly what I was trying on OS X. I'm quickly setting up a ubuntu VM to see if it works there.

@aljoscha
Contributor

aljoscha commented Jul 1, 2016

You were right, I did exactly the same thing I did on OS X on a new Ubuntu 16.04 installation and it worked. 😃


- Upload a jar to the cluster

`scp -P 220 <your_jar> root@localhost:/<your_path>`
for i in $(docker ps --filter name=flink --format={{.ID}}); do
Contributor

The jar only needs to be uploaded to the JobManager container, so something like this should suffice:

docker cp <your_jar> $(docker ps --filter name=flink_jobmanager --format={{.ID}}):/<your_path>

Member Author

Nice, I didn't know that Flink took care of this. I'll fix it in a few minutes.

Contributor

Yep, the TaskManagers pull it from the JobManager which keeps it in a component called BlobManager.

@aljoscha
Contributor

aljoscha commented Jul 1, 2016

I had two more comments about the README but after that it should be good to merge.

@iemejia
Member Author

iemejia commented Jul 1, 2016

This should be ok now. In further PRs I expect to fix the daemon thing and maybe add an HA version of the docker-compose file using ZooKeeper.
One more question, aljoscha: I intend to add the Beam Flink runner and contribute a similar version to Beam, but I don't know what the best approach is. I naively tried putting the jars in $FLINK_HOME/lib but it didn't work. Any ideas?

@aljoscha
Contributor

aljoscha commented Jul 1, 2016

That's great to hear! I'll write something on the Beam ML thread.

@iemejia
Member Author

iemejia commented Jul 1, 2016

Great, thanks for your review.

@asfgit asfgit closed this in ffaf10d Jul 4, 2016
@aljoscha
Contributor

aljoscha commented Jul 4, 2016

I merged it, thanks again for your work!

@iemejia iemejia deleted the FLINK-4118 branch July 4, 2016 11:35
3 participants