Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new initial command stage to mutex all reads of the installspace. #391

Merged
merged 4 commits into from
Jan 28, 2017

Conversation

mikepurvis
Copy link
Member

@mikepurvis mikepurvis commented Jul 23, 2016

This is the solution to #378 proposed in #378 (comment).

There's probably a small perf hit here which could be mitigated by going to a readers-writer lock, but it doesn't seem Python has one natively (example implementation), and the cost of switching the locked_resource scheme to support such a thing would increase complexity. Given how fast the loadenv stage is relative to the rest of the build, I don't feel it's worth it.

@jbohren @wjwwood


My validation methodology for this is as described in #378 (comment), with one small change— I've duplicated the test workspace and am running two instances of it in parallel, in the hopes of increasing overall entropy on the box and triggering problems.

As of July 24th, I've built ros_base ~2000 times across the two workspaces, with no failed builds. Previous tests with catkin_tools 0.4.2 get failures of the ros_base workspace every 100-200 builds (we see them much more frequently than this on our bundle builds, but those are much larger and more heavily parallelized than the ros_base workspace is).


Upon switching back to the master branch (4f5fa9d), I got the following failures in my two test ros_base workspaces. The error outputs below are manifestations of the underlying concurrency problem which this PR corrects.

Run 11 in the first workspace:

Errors     << shape_msgs:cmake /home/mpurvis/catkin_tools_fail/logs/shape_msgs/build.cmake.000.log
CMake Warning at /opt/clearpath/2.1devel/sdk/share/catkin/cmake/catkinConfig.cmake:76 (find_package):
  Could not find a package configuration file provided by "geometry_msgs"
  with any of the following names:

    geometry_msgsConfig.cmake
    geometry_msgs-config.cmake

  Add the installation prefix of "geometry_msgs" to CMAKE_PREFIX_PATH or set
  "geometry_msgs_DIR" to a directory containing one of the above files.  If
  "geometry_msgs" provides a separate development package or SDK, be sure it
  has been installed.

Run 107 in the second workspace:

Errors     << sensor_msgs:cmake /home/mpurvis/catkin_tools_fail2/logs/sensor_msgs/build.cmake.000.log
Traceback (most recent call last):
  File "/home/mpurvis/catkin_tools_fail2/build/sensor_msgs/catkin_generated/generate_cached_setup.py", line 20, in <module>
    from catkin.environment_cache import generate_environment_script
ImportError: No module named catkin.environment_cache

Next failure on the 22nd run:

/usr/bin/python: can't open file '/home/mpurvis/catkin_tools_fail2/install/_setup_util.py': [Errno 2] No such file or directory
Finished  <<< mk                                   [ 2.0 seconds ]
Starting  >>> geometry_msgs
Starting  >>> rosgraph_msgs
______________________________________________________________________________________
Errors     << diagnostic_msgs:cmake /home/mpurvis/catkin_tools_fail2/logs/diagnostic_msgs/build.cmake.000.log
Traceback (most recent call last):
  File "/home/mpurvis/catkin_tools_fail2/build/diagnostic_msgs/catkin_generated/generate_cached_setup.py", line 20, in <module>
    from catkin.environment_cache import generate_environment_script
ImportError: No module named catkin.environment_cache

Then run 59:

Starting  >>> sensor_msgs
Starting  >>> shape_msgs
Starting  >>> trajectory_msgs
Starting  >>> visualization_msgs
sh: 0: Can't open /home/mpurvis/catkin_tools_fail/install/env.sh
WARNING: Sourced environment from `/home/mpurvis/catkin_tools_fail/install/env.sh` has no environment variables. Something is wrong.
Finished  <<< rosconsole                           [ 8.7 seconds ]
Starting  >>> pluginlib
Starting  >>> rosconsole_bridge
Starting  >>> roscpp
Finished  <<< roslisp                              [ 1.8 seconds ]
______________________________________________________________________________________
Errors     << trajectory_msgs:cmake /home/mpurvis/catkin_tools_fail/logs/trajectory_msgs/build.cmake.000.log
CMake Warning at /opt/clearpath/2.1devel/sdk/share/catkin/cmake/catkinConfig.cmake:76 (find_package):
  Could not find a package configuration file provided by
  "message_generation" with any of the following names:

    message_generationConfig.cmake
    message_generation-config.cmake


for env_loader_path in env_loader_paths:
if logger:
logger.out('Loading environment from: {}'.format(env_loader_path))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger.out could just be dropped if we don't like it; the conditional is because of the print_build_env invocation, which doesn't have a logger to pass it.

@wjwwood
Copy link
Member

wjwwood commented Jul 25, 2016

The code changes and approach seem ok to me, though I share your concern about the performance impact. I wonder how the impact on performance changes as the workspace size increases.

I'd like a once over by @jbohren before merging this though, as he's more familiar with the env loading code.

@mikepurvis
Copy link
Member Author

mikepurvis commented Jul 25, 2016

The perf impact will be proportional to parallelism rather than number of packages, I think, since the impact will be having a bunch of packages which all need to start at once having to take turns using the installspace to set up their environment.

That said, if there's any measurable impact as a result of this, it's us you can expect a fix from. :)


Edit to add: Would like to get this merged and released as soon as possible, but as of this morning we're running it from source in production— will report back if there are any anomalies with it.

@mikepurvis
Copy link
Member Author

mikepurvis commented Jul 29, 2016

We've been running this for the last couple days for our bundle builds, and the spurious failures have completely disappeared. I have a report from a ros-install-osx user who hit the problem on a multi-core Mac build, so I am keen to see this merged and released.

I'd appreciate input particularly on the business of running the env setup for create_catkin_clean_job and create_cmake_clean_job. My instinct is that it shouldn't ever be necessary, given that those jobs are mostly composed of FunctionStage instances which don't use the environment anyway, and the only CommandStage is to run make clean, which only uses the cwd.

@jbohren Have you had a chance to look at this at all?

This better aligns the function name with other similar
helpers in catkin_tools.jobs.utils.
@mikepurvis
Copy link
Member Author

@wjwwood @jbohren Would be delighted for you to look at this soon so that we can get this merged and released— it does resolve an actual bug in the wild which affects large parallel builds.

I have now renamed the new function to loadenv, made the FunctionStage name match, and also dropped this stage from clean jobs. I've tested with building, partially and fully cleaning, and re-building workspaces using this change, and have had no problems. This simplifies the overall diff and avoids some superstitious functionality.

@mikepurvis
Copy link
Member Author

mikepurvis commented Aug 16, 2016

Small fix at the behest of flake8; got a clean build now.

@mikepurvis
Copy link
Member Author

ping

@NikolausDemmel
Copy link
Member

I haven't looked at the implementation in detail, but I have been running with this patch on my dev machine for two weeks without any issues.

So +1 from me.

@mikepurvis
Copy link
Member Author

ping

1 similar comment
@mikepurvis
Copy link
Member Author

ping

@wjwwood
Copy link
Member

wjwwood commented Sep 28, 2016

I'm sorry @mikepurvis, I'm not likely to have time to review and merge this until after ROSCon. I'll do my best to have a look at it soon. Also, I don't think @jbohren ever got around to skimming it.

@mikepurvis
Copy link
Member Author

Just wanting to make sure it doesn't fall off the radar altogether. It's pretty inconvenient for us not having these changes in the native python-catkin-tools deb package (we have to pip install it from the master branch tarball inside debian/rules), so I'm definitely keen to see this merged and released at some point.

I could just release a fork under a different name, or patch in the fix with quilt, but those are their own headaches. And I'd like to continue making contributions upstream, which will become considerably harder to justify if I'm also maintaining a fork.

@wjwwood
Copy link
Member

wjwwood commented Nov 1, 2016

Sorry @mikepurvis, I'm trying to balance my time between this, rviz, and robot_model-ish stuff. I've recently spent some time on the others, but I haven't had a slice to spend on catkin tools yet. Hopefully I can do that this week.

We still don't have any feedback from @jbohren on this.

@mikepurvis
Copy link
Member Author

mikepurvis commented Nov 17, 2016

For anyone else interested, I've set up a PPA and released a version of python-catkin-tools with this patch applied. You can find it here:

https://launchpad.net/~mikepurvis/+archive/ubuntu/catkin

Process was:

dget http://packages.ros.org/ros/ubuntu/pool/main/p/python-catkin-tools/python-catkin-tools_0.4.2-1.dsc
wget https://github.com/catkin/catkin_tools/pull/391.diff
cd python-catkin-tools-0.4.2
quilt import ../391.diff
dch  # Must remove previous changelog entry so that the original archive is include in .changes file.
dpkg-buildpackage -S
dput ppa:mikepurvis/catkin ../python-catkin-tools_0.4.2-1mpu1_source.changes

@mikepurvis
Copy link
Member Author

@wjwwood ping

@mikepurvis
Copy link
Member Author

Since 0.4.3 got released (b538dea) without this fix included, I've updated my PPA to a new patch release:

https://launchpad.net/~mikepurvis/+archive/ubuntu/catkin/+packages

@wjwwood
Copy link
Member

wjwwood commented Jan 28, 2017

Ok, I've just tested this for a few hours today myself. Given a second code review from me (+1) and @mikepurvis's long term usage of catkin tools with this patch, I think I'm going to merge this now, without feedback from @jbohren.

Sorry it took so long @mikepurvis. I'll try to avoid delays like this in the future. I'd also like to look into sharing ownership with others in the community so things don't get hung up waiting on the maintainers again (currently me and @jbohren have push rights).

I want to try and include some more changes before doing a release, but I will avoid delaying too long before making one.

@wjwwood wjwwood merged commit 5262dec into catkin:master Jan 28, 2017
@wjwwood
Copy link
Member

wjwwood commented Feb 8, 2017

Just released 0.4.4.

@davetcoleman
Copy link
Contributor

Thanks @mikepurvis !

@emielsteerneman
Copy link

I am not sure if this is the correct place to ask this, but I still get this error.

-- catkin 0.7.8
Traceback (most recent call last):
    File "/media/emiel/HDD/roboteam/workspace/src/roboteam_input/cmake-build-debug/catkin_generated/generate_cached_setup.py", line 20, in <module>
        from catkin.environment_cache import generate_environment_script
ImportError: No module named catkin.environment_cache
CMake Error at /opt/ros/kinetic/share/catkin/cmake/safe_execute_process.cmake:11 (message):
    execute_process(/usr/bin/python
    "/media/emiel/HDD/roboteam/workspace/src/roboteam_input/cmake-build-debug/catkin_generated/generate_cached_setup.py")
    returned error code 1
Call Stack (most recent call first):
    /opt/ros/kinetic/share/catkin/cmake/all.cmake:186 (safe_execute_process)
    /opt/ros/kinetic/share/catkin/cmake/catkinConfig.cmake:20 (include)
    CMakeLists.txt:10 (find_package)

I am not really at home with CMake, so I have no idea where I would have to start fixing this. Could someone point me in the right direction? Thanks

@wjwwood
Copy link
Member

wjwwood commented Mar 20, 2018

@emielsteerneman you might want to make a new issue since this cannot be reopened (as pull requests cannot be opened after merging).

Also, you should include more information, like which version of ROS you're using and what version of catkin tools you're using.

@emielsteerneman
Copy link

Thanks for your response Wjwwood, now I'll know what to next time I run into such an issue. Fortunately, I managed to fix my current issue. My editor didn't source all the required files, such as setup.bash.

@rarestg
Copy link

rarestg commented Nov 13, 2018

Thanks for your response Wjwwood, now I'll know what to next time I run into such an issue. Fortunately, I managed to fix my current issue. My editor didn't source all the required files, such as setup.bash.

Hi! I'm trying to get Clion working with ROS, so I can step through code, and I'm getting the same error you were getting. Do you remember how/where you changed your editor to source setup.bash and other files? Thanks!

@emielsteerneman
Copy link

Hi! I'm trying to get Clion working with ROS, so I can step through code, and I'm getting the same error you were getting. Do you remember how/where you changed your editor to source setup.bash and other files? Thanks!

Hey rarestg! I did not modify the editor. The trick is to open CLion via the terminal. It is explained here : https://www.comkieffer.com/clion-ros.html . Good luck!

@FransGH
Copy link

FransGH commented Nov 19, 2019

Old thread but had this error too ("No module named catkin.environment_cache") with CLion and full remote development (clion on mac, target on linux-vm). Solution was to set CMAKE_PREFIX_PATH in CMakeLists.txt:

list(APPEND CMAKE_PREFIX_PATH "/opt/ros/kinetic")
list(APPEND CMAKE_PREFIX_PATH "/opt/ros/kinetic/share/catkin/cmake")
find_package(catkin REQUIRED COMPONENTS
        roscpp
        rospy
        std_msgs
        geometry_msgs
        sensor_msgs
        cv_bridge
        image_transport
        message_generation
)

@zhouwuxiong
Copy link

@FransGH so crazy,it helps a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants