Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better python package identification #215

Closed

Conversation

rotu
Copy link
Contributor

@rotu rotu commented Aug 27, 2019

@rotu rotu force-pushed the better_package_identification branch from 09eb1d8 to e17f0ff Compare August 27, 2019 23:48
@dirk-thomas dirk-thomas added enhancement New feature or request in progress Actively being worked on (Kanban column) labels Aug 28, 2019
@codecov-io
Copy link

codecov-io commented Aug 28, 2019

Codecov Report

Merging #215 into master will decrease coverage by <.01%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #215      +/-   ##
==========================================
- Coverage   80.24%   80.23%   -0.01%     
==========================================
  Files          54       55       +1     
  Lines        3098     3097       -1     
  Branches      512      511       -1     
==========================================
- Hits         2486     2485       -1     
  Misses        576      576              
  Partials       36       36
Impacted Files Coverage Δ
colcon_core/task/python/build.py 33.56% <0%> (ø) ⬆️
colcon_core/task/python/test/__init__.py 26.31% <0%> (ø) ⬆️
colcon_core/run_setup_py.py 37.5% <37.5%> (ø)
colcon_core/package_identification/python.py 93.05% <88.46%> (+5.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a75968f...3f670a8. Read the comment docs.

@rotu
Copy link
Contributor Author

rotu commented Aug 28, 2019

@dirk-thomas, Finally got it to pass CI! This was a bear - for some reason colcon doesn't seem to work properly when installed in editable (developer) mode. Shall I squash before or after you take a first review pass?

@rotu rotu changed the title WIP: Better package identification Better python package identification Aug 28, 2019
Copy link
Member

@dirk-thomas dirk-thomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason colcon doesn't seem to work properly when installed in editable (developer) mode.

I would be very surprised if that is the case. A symlink install / develop mode is my primary work environment and it works just fine for me. Please feel free to create a separate ticket with specifics if there is something which needs fixing.

Shall I squash before or after you take a first review pass?

No need to. I will take a look at it right away...

colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
test/test_package_identification_python.py Outdated Show resolved Hide resolved
test/test_package_identification_python.py Outdated Show resolved Hide resolved
colcon_core/package_identification/python.py Outdated Show resolved Hide resolved
@dirk-thomas
Copy link
Member

Please don't change the line limit and revert 0c94427.

@rotu
Copy link
Contributor Author

rotu commented Aug 29, 2019

Please don't change the line limit and revert 0c94427.

so not this? https://index.ros.org//doc/ros2/Contributing/Developer-Guide/#id4
What should the line limit be?
Edit: seems this uses the flake8 default of 79 instead of ament_flake8's 99. I'll switch back to 79 but I'm not gonna like it!

@dirk-thomas
Copy link
Member

seems this uses the flake8 default of 79

Yeah, colcon uses plain pep8 / flake8 without any ROS specific customizations.

@rotu rotu force-pushed the better_package_identification branch 2 times, most recently from 71433fd to d3ca897 Compare August 29, 2019 18:32
@rotu
Copy link
Contributor Author

rotu commented Aug 29, 2019

It successfully builds the entire ROS2 source tree now!

@dirk-thomas
Copy link
Member

dirk-thomas commented Sep 5, 2019

I think you introduced some syntax errors here.

Indeed - fixed in ba17901 and 8dad4a0.

@dirk-thomas
Copy link
Member

I dislike the use of format_locals in this way since (1) it prevents static analysis tools from being able to catch mistakes in variable/property names (2) it prevents tools from being able to detect unused local variables.

Both of these arguments are limitations in tooling. Those should go away when we move f-strings (see next comment).

On the plus side it resembles f-string format, but I’d rather use .format()

That is exactly the intention. Once we can drop Python 3.5 support all .format_map(locals()) cases will be converted to f-strings.

@dirk-thomas
Copy link
Member

Sorry for the numerous small edits. I thought I could address a few nitpicks through the GitHub UI but then it went down a rabit hole. I should have edited the files locally and pushed the changes to this branch instead...

For the moment I have stopped iterating on the patch since I am unable to bootstrap the colcon packages using this. The package identification of colcon-argcomplete fails due to a PicklingError:

Can't pickle <class 'CustomSdistDebCommand'>: attribute lookup CustomSdistDebCommand on builtins failed

The problem seems to be due to the switch to the multiprocess module and trying to pass Python data back and forth. To resolve this the logic will likely need to be change to be similar to what colcon-python-setup-py does: the subprocess outputs the data in Python syntax which then is being interpreted (see https://github.com/colcon/colcon-python-setup-py/blob/4d9c1c4bb2e8652ec4099be0cd9b1cf31cf73ac2/colcon_python_setup_py/package_identification/python_setup_py.py#L121).

That being said maybe it would be better to:

  • keep the Python package identification extension in this repo as is (only taking setup.cfg files into account, enough to perform the bootstrapping) and
  • update the Python package identification extension in the colcon-python-setup-py package to use the approach from this PR

What do you think?

@rotu
Copy link
Contributor Author

rotu commented Sep 5, 2019

Sorry for the numerous small edits. I thought I could address a few nitpicks through the GitHub UI but then it went down a rabit hole.

It's all good. I basically live down a fractal rabbit hole.

For the moment I have stopped iterating on the patch since I am unable to bootstrap the colcon packages using this. The package identification of colcon-argcomplete fails due to a PicklingError

I'll take a look. That particular issue looks as simple as adding 'cmdclass' to the blacklist here https://github.com/RoverRobotics-forks/colcon-core/blob/fbaa13384f97740b968bdeaf975f751dab298a29/colcon_core/package_identification/python.py#L154-L162

It's possible to just not use multiprocessing and run setup.py in the colcon process, but I figured isolating package setup.py's from each other is a desirable feature.

I do like this approach (I'm sure in part because I've sunk so much time and effort into it) and I think it's going to do the right thing more often than either the setup.py or setup.cfg approach. I'd like to fix it and ship it 🚀

@dirk-thomas
Copy link
Member

That particular issue looks as simple as adding 'cmdclass' to the blacklist

That sounds like a viable option to me.

It's possible to just not use multiprocessing and run setup.py in the colcon process

I wasn't suggesting that. Even the colcon-python-setup-py does it in a separate process - it just uses subprocess instead of multiprocessing. But lets go with the blacklisting for now.

@rotu rotu force-pushed the better_package_identification branch from d9f7c39 to 53915a2 Compare September 5, 2019 18:29
@dirk-thomas
Copy link
Member

Please rebase your branch on top of the latest state of master. Otherwise it is difficult to try the bootstrapping since other packages already depend on changes available on master.

@rotu rotu force-pushed the better_package_identification branch from 53915a2 to 63ba56b Compare September 5, 2019 18:43
@rotu
Copy link
Contributor Author

rotu commented Sep 5, 2019

Rebased.

@dirk-thomas
Copy link
Member

While the CI tests pass one of the tests fails when running colcon test --packages-select colcon-core. In the failing test the function distutils.core.run_setup even returns the information about the colcon-core package even when pointing to a setup.py file which is empty (https://github.com/colcon/colcon-core/pull/215/files#diff-381a95d67d06d3fbe2e095a1b1d3644bR38).

@rotu rotu force-pushed the better_package_identification branch 3 times, most recently from 6a43078 to d300e75 Compare September 7, 2019 03:35
@dirk-thomas
Copy link
Member

Why is the change in 1cf7b4f necessary? Could this be deferred to a separate PR instead to keep the diff minimal?

@rotu rotu force-pushed the better_package_identification branch from 94a420b to 23f0605 Compare September 13, 2019 23:46
@dirk-thomas
Copy link
Member

Please move the improved test into a separate PR with the goal to merge that one first.

This is waiting for #229, right?

@rotu
Copy link
Contributor Author

rotu commented Sep 19, 2019

This is waiting for #229, right?

Yes.

@rotu rotu force-pushed the better_package_identification branch from 59bf510 to 92af5f1 Compare September 19, 2019 20:54
@dirk-thomas
Copy link
Member

While #229 should land first it would be good to see this PR pass CI and then show the coverage impact of both PRs (since this one includes the other one). Let me know once that is the case so I can re-review. Thanks.

rotu and others added 4 commits September 19, 2019 17:36
Previously, we either monkey-patched setuptools and harvested the arguments passed to setuptools.setup or we parsed setup.cfg
Now we run the setup.py script with distutils.core.run_setup to return properties from the resulting Distribution object.
@rotu rotu force-pushed the better_package_identification branch from 92af5f1 to 1864386 Compare September 19, 2019 22:37
@rotu
Copy link
Contributor Author

rotu commented Sep 19, 2019

While #229 should land first it would be good to see this PR pass CI and then show the coverage impact of both PRs (since this one includes the other one). Let me know once that is the case so I can re-review. Thanks.

Done. In retrospect, splitting this up into two PRs has seemed to be a mistake :-(

@dirk-thomas
Copy link
Member

Overall what this patch does looks good to me.

The only thing making me a bit worry is the introduced complexity into colcon-core with the usage of multiprocessing. I just recall pass problems which have recently come up again (e.g. ros-infrastructure/catkin_pkg#266). And we don't strictly need this functionality to bootstrap colcon-core since all colcon packages use setup.cfg files.

I brought up the alternative above and would still be interested what you think about moving the functionality in colcon-python-setup-py where it would actually replace the current approach to "interrogate" the setup function:

That being said maybe it would be better to:

  • keep the Python package identification extension in this repo as is (only taking setup.cfg files into account, enough to perform the bootstrapping) and
  • update the Python package identification extension in the colcon-python-setup-py package to use the approach from this PR

What do you think?

@rotu
Copy link
Contributor Author

rotu commented Sep 26, 2019

I don't like the idea of having two implementations of the same logical operation, but you gave me an idea...

@rotu rotu force-pushed the better_package_identification branch 3 times, most recently from 65bfdeb to b9fd456 Compare September 26, 2019 23:54
@rotu
Copy link
Contributor Author

rotu commented Sep 26, 2019

Alright. I made it not use subprocesses when bootstrapping. I think it's silly and unnecessary - building C++ and Python projects necessarily involves spawning subprocesses; no need to fear them for package identification.

@rotu rotu force-pushed the better_package_identification branch 3 times, most recently from 971d916 to 192fcda Compare September 28, 2019 04:36
It turns out most of what I was doing with multiprocessing.Process was reinventing Pool.apply.
This also naturally throttles the number of simultaneous processes, which was a possible concern.
Move setup.py-specific stuff to run_setup_py. I expect to reuse this for python build, plus isolating it makes the subprocess workload minimal.
@rotu rotu force-pushed the better_package_identification branch from 192fcda to 3f670a8 Compare September 28, 2019 04:43
@rotu
Copy link
Contributor Author

rotu commented Sep 28, 2019

@dirk-thomas, I think you're gonna like this. I switched to a process pool to reduce the number of simultaneous processes (by default = number of CPU cores) and the code is even cleaner. Still passes all the tests :-).

@rotu
Copy link
Contributor Author

rotu commented Sep 30, 2019

Oh and setuptools read_configuration is not even threadsafe on simple cfg files. It uses os.chdir.

@rotu
Copy link
Contributor Author

rotu commented Oct 1, 2019

@dirk-thomas So now that it uses a process pool, this will not create arbitrarily many parallel processes which averts the Windows issue you mentioned.

As for the complexity, I’d rather have one, more complex and correct code path than two partial ones. Bugs hide in poorly-exercised code paths. If you'd rather move python discovery and building out of this package entirely, that sounds like a great task for a follow-on PR.

Is there something I’m missing or can this move forward as-is?

@dirk-thomas
Copy link
Member

I’d rather have one, more complex and correct code path than two partial ones.

I thought about this for a while and my main goal is to keep the complexity in colcon-core as low as possible. That is why colcon-core only offers the simple setup.cfg file extraction and colcon-python-setup-py the more complex subprocess-based setup.py / setup() evaluation.

Certainly the former needed fixing to not handle cases it didn't actually support (see colcon/colcon-ros#65).

If you'd rather move python discovery and building out of this package entirely, that sounds like a great task for a follow-on PR.

Python package identification can't be moved out of the core since this package wouldn't be able to bootstrap itself anymore (see https://colcon.readthedocs.io/en/released/developer/bootstrap.html).

Can you take a look at colcon/colcon-python-setup-py#22 and see if that works for your use case, too.

@dirk-thomas
Copy link
Member

Closing in favor of colcon/colcon-python-setup-py#22.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request in progress Actively being worked on (Kanban column)
Development

Successfully merging this pull request may close these issues.

3 participants