Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-4208] Support Running Flink processes in foreground mode #2239

Closed
wants to merge 2 commits into from

Conversation

iemejia
Copy link
Member

@iemejia iemejia commented Jul 13, 2016

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General
    • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation
    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build
    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

nohup ignores the HUP terminal signals, so the process is still alive
even after the terminal ends
@iemejia
Copy link
Member Author

iemejia commented Jul 13, 2016

@aljoscha As discussed in FLINK-4118 I am doing this to support the use case of running flink as a docker container and eventually remove the dependency on python and supervisord from the docker image.

@iemejia
Copy link
Member Author

iemejia commented Jul 13, 2016

I hope this gets into 1.1.0 so I can push the changes to the docker image once this is available in the official binary distribution.

@greghogan
Copy link
Contributor

Hi @iemejia, is the situation with Docker that if the Flink processes are started as daemons and the script returns that Docker assumes the process has terminated?

Skipping the pid file might work fine for a container where one wouldn't start multiple TaskManagers but would cause logfile issues otherwise. We do need to add locking to the pid file.

@iemejia
Copy link
Member Author

iemejia commented Jul 13, 2016

Hi, Yes, this is exactly the situation, in a previous pull request I was optimizing the flink docker image, however I found that the image used supervisord to catch and keep alive those daemons, so I wanted to remove this dependency (because it adds around 40MB to the image + python and some extra stuff).
And this changes go in that direction.

Can you give me some hints on the best way to address this ? Or how can I improve my current approach (notice that I took the start-foreground idea from zookeeper).

echo "Starting $DAEMON daemon on host $HOSTNAME."
$JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 2>&1 < /dev/null &
nohup $JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 2>&1 < /dev/null &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances is the nohup necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because I was trying to grasp what makes a daemon a daemon and I found a reference that convinced me that nohup was missing:
https://stackoverflow.com/questions/3430330/best-way-to-make-a-shell-script-daemon

Additionally when I looked for inspiration for my changes (the start-foreground name), I look
at how they started the daemon in zookeeper and I noticed they use nohup
too.
https://github.com/apache/zookeeper/blob/trunk/bin/zkServer.sh#L219

This is an extra thing and not the core of the Pull Request, if you don't agree
I can rebase and remove that commit, but I think it is worth the addition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent documentation. It looks like (from a grep daemon through the code) that Flink is daemonizing its threads internally. Perhaps @StephanEwen can look at this.

@greghogan
Copy link
Contributor

What if instead of changing how we start the daemon (so continue to always start as a background process), we instead add a wait after the PID file has been updated when starting a foreground process?

If FLINK-4212 is accepted we would also need to release the file lock before calling wait.

@iemejia
Copy link
Member Author

iemejia commented Jul 14, 2016

I tried not to change the current daemon behavior, that's the reason why I took
the decision to add an additional option. I am not sure if using wait may work for what I want but if it does, perfect, can you give me hints of how to test this ? I naively did this and it does not seem to work.

        if [[ ${mypid} =~ ${IS_NUMBER} ]] && kill -0 $mypid > /dev/null 2>&1 ; then
            echo $mypid >> $pid
+           wait $mypid # I also tried with $pid and it does not work either
        else

@greghogan
Copy link
Contributor

Using wait $mypid or just wait works for me if I ./bin/jobmanager.sh start cluster (jobmanager starts in foreground), then in another terminal ./bin/jobmanager.sh stop and both terminals are now at the command prompt.

@iemejia
Copy link
Member Author

iemejia commented Jul 15, 2016

Thanks Greg, I was probably too tired last night because I put the wait in a weird place, I just tried now and everything is working, it is still not 'real' foreground because Ctrl-C gets captured by the 'wait', but it fixes the issue for the docker image so I am going to close this pull request and the issue.
If you or anybody else is still interested I will keep it, but I am going to close it if you don't mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants