Multiple bringup jobs cause multiple roscores leading to high CPU usage #16

Closed
dlaz opened this issue Feb 10, 2015 · 4 comments

Comments

@dlaz

dlaz commented Feb 10, 2015

I think this is multiple bugs (at least some of which are upstream of robot_upstart). Here's what happens:
I have two upstart jobs (install script for reference: https://github.com/OSUrobotics/wheelchair-automation/blob/cleanup/wheelchair_bringup/scripts/install). Both of them try to start roscores. I'm not sure why this happens, since roslaunch is only supposed to start roscore if it isn't already running. One of them can't start because port 11311 is already in use. That second roscore process sits there without exiting. Over time, this drives the CPU usage of the first roscore up to 100%, requiring a hard reboot of the i3 machine on the robot.
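
To confirm that a master is already holding port 11311 before a second job starts, a quick check along these lines may help. This is only a sketch: the function name `port_in_use` and the 0.5 s timeout are assumptions for illustration, not part of roslaunch or robot_upstart, and it only tests whether something accepts TCP connections on the port, not whether that something is a healthy roscore.

```python
import socket

ROS_MASTER_PORT = 11311  # default roscore port mentioned in this report


def port_in_use(host="127.0.0.1", port=ROS_MASTER_PORT, timeout=0.5):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not a ROS API.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()
```

Calling `port_in_use()` on the robot while the first job is up should return `True` if a roscore is listening on the default port.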

I found a workaround by editing the startup script at /usr/sbin/wheelchair-logging-start, and adding --wait to the roslaunch invocation. This fixes the problem. However, if the network interface goes down and comes back up (like wifi briefly goes down), another instance of roscore tries to start for the job without --wait.

tl;dr: Multiple jobs -> multiple roscores -> high CPU usage
This is hydro on Ubuntu 12.04.

@mikepurvis
Member

The rewrite version has two changes which will make your life easier:

  • If you use the new Python interface to set up your jobs (example), you can set the roslaunch_wait member to true, and --wait will be appended to the roslaunch line. This is not properly documented, but I would definitely accept a PR to the Sphinx docs to add it.
  • The default launch trigger now is local-filesystems rather than a network interface, so ROS will come up and stay up.
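
As a rough illustration of the first point, a job set up through the Python interface might look like the sketch below. The job name `wheelchair-logging` and package `wheelchair_bringup` are taken from the paths in this thread; the launch file path `launch/logging.launch` and the wrapper function `build_logging_job` are hypothetical. It assumes the indigo-devel version of robot_upstart is importable.

```python
def build_logging_job(name="wheelchair-logging",
                      package="wheelchair_bringup",
                      launch_file="launch/logging.launch"):
    """Build (but do not install) a job whose roslaunch line gets --wait.

    The launch file path is a placeholder; substitute the real one.
    """
    import robot_upstart  # imported lazily; needs a ROS environment

    job = robot_upstart.Job(name=name)
    # Per the comment above, this member makes the generated start
    # script append --wait to its roslaunch invocation.
    job.roslaunch_wait = True
    job.add(package=package, filename=launch_file)
    return job
```

Calling `install()` on the returned job is what actually writes the job files, so the function above is safe to run for inspection first.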

It's only released into Indigo, but you can use the indigo-devel branch on Hydro from source. Alternatively, if you want to persist with the Hydro deb package, you can hack in the --wait as you are already doing, and set it to trigger on an interface with a static IP rather than the wifi.

Hope that helps.

@mikepurvis
Member

Having looked at your setup script, is there a reason you're making this two separate jobs? Part of the idea of robot_upstart is that you can glob together multiple launchfiles at runtime and start them up with the same roslaunch invocation, with no --wait required.
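
A single combined job could be sketched roughly as follows. The job name `wheelchair`, the glob `launch/*.launch`, and the wrapper function `build_combined_job` are assumptions for illustration; only the package name `wheelchair_bringup` comes from the install script linked in this thread.

```python
def build_combined_job(name="wheelchair",
                       package="wheelchair_bringup",
                       pattern="launch/*.launch"):
    """Build (but do not install) one job covering several launch files.

    The job name and glob pattern are placeholders for illustration.
    """
    import robot_upstart  # imported lazily; needs a ROS environment

    job = robot_upstart.Job(name=name)
    # Every launch file matching the glob is started by the same
    # roslaunch invocation, so only one process brings up a master.
    job.add(package=package, glob=pattern)
    return job
```

The trade-off is that everything matched by the glob starts and stops together as one job.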

@dlaz
Author

dlaz commented Feb 10, 2015

Thanks for the suggestions - the reason it's two separate jobs is because I wanted the ability to stop logging on its own, though as we begin to use the system, that hasn't been necessary for the most part. I think for the time being, I'm going to trigger things on the loopback interface which shouldn't be susceptible to the whims of the wifi network.

My real concern is why multiple instances of roscore are trying to run, and why roscore is pegging the CPU, but that is almost certainly an upstream issue.

@mikepurvis
Member

Closing this due to upstream issue.
