Python bindings naming convention and packaging #5224

Open
patricksnape opened this issue Aug 20, 2015 · 9 comments

Comments

@patricksnape
Contributor

I've been working on creating automated builds (using conda) of OpenCV2 and OpenCV3. In particular, I am only really interested in the Python bindings. However, I have a couple of questions about the design decisions for OpenCV's Python bindings:

cv2.so is just dropped inside the Python site-packages folder

I don't think that this is good practice. It goes against standard Python packaging principles and dirties the contents of the site-packages folder. However, I also have a concrete complaint: on Windows, due to the way shared library loading is achieved, we need to drop all the opencv*.dll next to the cv2.so module. However, if another poorly packaged project also requires being dropped in the root of site-packages, and binds to opencv using C-extensions, it will also need these DLLs. As I mentioned, I'm trying to make automated builds, and this becomes very difficult if every project needs to start dumping files into the same folder, some of which may overwrite each other. Not to mention the fact that removing a package may involve cleaning up dependencies, which would include the aforementioned DLLs (breaking other packages that might rely on them). Therefore, why not place the cv2.so inside a small Python package that just exposes all the same top-level modules/methods? This would have the advantage of isolating all the dependencies and would also allow easy shipping of companion pure Python code with OpenCV, if this was ever needed.
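
A minimal sketch of what the `__init__.py` of such a wrapper package could do, assuming the compiled module were moved inside the package. The helper name `reexport_public_names` is hypothetical and not part of OpenCV; the standard-library module `math` stands in for the compiled extension so the snippet runs without OpenCV installed.

```python
import importlib


def reexport_public_names(extension_name, namespace):
    """Copy the public attributes of an extension module into `namespace`.

    Hypothetical helper: in a wrapper package it would be called from
    __init__.py with the package's own globals().
    """
    extension = importlib.import_module(extension_name)
    # Honour __all__ when the extension defines it, else skip private names.
    names = getattr(extension, "__all__", None)
    if names is None:
        names = [n for n in dir(extension) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(extension, name)
    return sorted(names)


# Stand-in demo: "math" plays the role of the compiled extension.
demo_namespace = {}
exported = reexport_public_names("math", demo_namespace)
print(len(exported), "names re-exported, e.g.", exported[:3])
```

With this layout the shared libraries could live inside the package directory instead of the root of site-packages, which is the isolation argued for above.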

OpenCV2 and OpenCV3 both use cv2 as their namespace

This one seems particularly crazy to me. Since OpenCV3 deprecated the cv2.cv module, which some people require, OpenCV3 is not a drop-in replacement for OpenCV2. Which is fine, as long as the two are namespaced separately... but they are not. Why was OpenCV3 not given a new package name, such as cv3? At the moment, if someone installs OpenCV3 over OpenCV2, they will lose their OpenCV2! This seems very strange to me. If the whole point of the major version change was to denote breaking, non-backwards-compatible changes, why would you keep the same namespace? This is further aggravated by the problem mentioned above, in that the cv2.so will literally get overwritten if OpenCV3 is installed after OpenCV2 and vice versa.

As I said, I mostly just wanted to discuss the rationale behind these points. I understand that I do not represent a core contributor, but OpenCV is an amazing project and a massive boon to the Python community. Therefore, I would love to help steer it in a direction that made it more accessible to even more Python developers by providing easy to install automated builds.

@alalek
Member

alalek commented Aug 20, 2015

  1. Dependency on all the opencv*.dll:

    Try to build static library (cmake -DBUILD_SHARED_LIBS=OFF ...). You can also check INSTALL_CREATE_DISTRIB=ON parameter. This should help.

  2. Python package conflicts.

    Did you try virtualenv? It is the native Python approach to building isolated environments with different package configurations.

  3. Usage of both OpenCV 2.x and 3.x:

    OpenCV 3.x still uses the module name cv2; the same issue exists for the C++ includes (opencv2/...) and the MacOSX opencv2 framework. Main reason is to avoid large changes in code: in OpenCV code and in developers' code due to migration to 3.0. The second reason is rationality, most developers don't use both versions of OpenCV in the same project.
    However, the issue with installing into the same place remains, and no clear decision has been made on it. So the current workaround for avoiding conflicts is to install different OpenCV versions into different places.
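
The "different places" workaround can be sketched as follows, assuming each OpenCV version has its Python module installed under its own directory. The helper names `prefer_install` and `import_from`, and the path in the usage comment, are hypothetical. Swapping between two builds of a compiled extension inside one running process is generally not safe, so this is only meant for choosing an install at start-up.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def prefer_install(site_dir):
    """Temporarily put one install location first on sys.path."""
    sys.path.insert(0, site_dir)
    try:
        yield
    finally:
        sys.path.remove(site_dir)


def import_from(site_dir, module_name="cv2"):
    """Import `module_name`, resolving it against `site_dir` first."""
    # Drop any cached copy so the import is looked up again on disk.
    sys.modules.pop(module_name, None)
    importlib.invalidate_caches()
    with prefer_install(site_dir):
        return importlib.import_module(module_name)


# Usage (hypothetical path, requires an OpenCV build installed there):
# cv2 = import_from("/opt/opencv3/lib/python/site-packages")
```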

@patricksnape
Contributor Author

Thanks for responding so quickly!

Try to build static library (cmake -DBUILD_SHARED_LIBS=OFF ...).

This is certainly an option, however, it does have the downside that I will have to run the build twice, once to create the DLLs and once to create a cv2.so that is statically linked.

You can also check INSTALL_CREATE_DISTRIB=ON parameter. This should help.

I looked at the usages of INSTALL_CREATE_DISTRIB and it didn't seem very useful for me. It's just going to put the built Python .so in some strange place, and force me to need to move it back afterwards.

Anyway, this doesn't really speak to the strange non-standard usage of cv2.so as a Python extension. I was more concerned about the lack of isolation to be honest. Although, on Windows I may enable static builds to get around this (if MSVC is happy to do it). Thanks for that suggestion!

Did you try virtualenv?

I am well versed in the ins and outs of Python packaging and environments. Hence why I am such a firm believer in conda, as it is the only solution for properly distributing binary packages such as OpenCV. virtualenv does not help at all here, as I want to distribute OpenCV to developers who don't want to have to set up a compilation environment. Neither virtualenv nor pip/wheels are helpful in the case of something like OpenCV.

Main reason is to avoid large changes in code

Really? I'm surprised at this attitude coming from such a large software package! Why bother with a major version change if you really mean to just replace OpenCV2? Why not make the latest release just 2.5? I imagine the answer is because you wanted everyone to know that this release was full of breaking changes and a change in philosophy regarding things like non-free code and the opencv-contrib split. But, by maintaining the namespace, you have effectively just created OpenCV 2.5, as it is impossible to install them side-by-side. Plus, it's called OpenCV3, and you import the namespace cv2, which also seems very strange!

The second reason is rationality, most developers don't use both versions of OpenCV in the same project.

I think that this is a naive assumption. I think many people will have lots of code written using cv2.cv that means they either can't use OpenCV3, or must re-write their code. What if I want to use some of the new features added in OpenCV3, but I want to maintain my existing code? At the moment, this attitude says "too bad, re-write it all if you want to use OpenCV3". This seems crazy coming from a package that must have tens of thousands of developers.
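
For code that has to run against either release, one option is to detect the legacy `cv2.cv` submodule at runtime. The sketch below is illustrative only: the helper name `frame_width_property` is hypothetical, the constant names are, to the best of my knowledge, the ones used by the 2.x and 3.x bindings, and stand-in objects replace the real `cv2` module so it runs without OpenCV installed.

```python
from types import SimpleNamespace


def frame_width_property(cv2_module):
    """Return the capture-property id for frame width on either API."""
    legacy = getattr(cv2_module, "cv", None)
    if legacy is not None:
        # OpenCV 2.x layout: constants live under cv2.cv with a CV_ prefix.
        return legacy.CV_CAP_PROP_FRAME_WIDTH
    # OpenCV 3.x layout: constants are exposed directly on cv2.
    return cv2_module.CAP_PROP_FRAME_WIDTH


# Stand-in objects so the sketch runs without OpenCV installed.
fake_cv2_v2 = SimpleNamespace(cv=SimpleNamespace(CV_CAP_PROP_FRAME_WIDTH=3))
fake_cv2_v3 = SimpleNamespace(CAP_PROP_FRAME_WIDTH=3)

print(frame_width_property(fake_cv2_v2), frame_width_property(fake_cv2_v3))
```

A shim like this only covers renamed constants and functions; it does not let both versions be installed side by side, which is the complaint above.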

@nehalecky

👍

I want to distribute OpenCV to developers who don't want to have to set up a compilation environment.

I am one of those developers, and as a community user trying to engage with OpenCV, I'd like to commend @patricksnape for his efforts in seeing OpenCV made available under package management. His decision to use conda is the right one for all of the reasons he listed, including this reality:

Neither virtualenv nor pip/wheels are helpful in the case of something like OpenCV.

As Python continues to grow as a main language within scientific and data tools, conda is quickly becoming the de facto package management system. Proper packaging and distribution of OpenCV via conda would allow for a much broader adoption of OpenCV, which would be a great thing for this project, and CV in general.

I am still trying to experience all of the awesomeness in Python support with the OpenCV 3.0 cut (having already spent a significant amount of time resolving dependencies), and I'm eager to see these advancements occur. For this, I'd recommend that the OpenCV team fully engage the issues brought forward (they are relevant to the distribution effort) and see them through to resolution so we can all benefit from this great community effort. :)

@patricksnape
Contributor Author

@nehalecky Thanks for the words of encouragement! I'm glad that other people also feel this way. Please try installing the conda packages (conda install -c menpo opencv3) and let me know if you have any issues over on the repository. I hope everything 'just works'!

@roger-

roger- commented Sep 1, 2015

Just wanted to say that reusing the cv2 namespace immediately struck me as mind-boggling and counterintuitive. CV3 is incompatible in many cases, so why confuse people?! If someone's porting code then it's reasonable to make them run s/cv2/cv3/ themselves.

Better yet use a version-less namespace like opencv, just like most packages do.
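
A version-less name can be approximated on the user side by registering an alias in `sys.modules`. This is only a sketch: `alias_module` is a hypothetical helper, not anything OpenCV ships, and the standard-library `json` module stands in for `cv2` so the snippet runs anywhere.

```python
import importlib
import sys


def alias_module(existing_name, alias):
    """Make `import alias` return the module known as `existing_name`."""
    module = importlib.import_module(existing_name)
    sys.modules[alias] = module
    return module


# Stand-in demo: "json" plays the role of cv2, "opencv_demo" the new name.
alias_module("json", "opencv_demo")
import opencv_demo

print(opencv_demo is sys.modules["json"])
```

An alias like this does nothing for the file-level conflict between the two installs; it only changes the name used in import statements.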

@rpep

rpep commented Sep 1, 2015

Definitely agree that reusing the cv2 namespace is a problem. I see you replied that the reasoning was to avoid large changes in code, but reusing it is actually worse, because it's now ambiguous whether someone is using v2 or v3 unless they clearly state it (which, as I'm sure most people know, isn't always done by people asking for help on sites like StackExchange). It's also illogical given that the namespace previously changed from cv to cv2: it would be good practice either to stick with a single namespace, like cv, or to change it with every version, but not to do both.
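
Because the import name no longer says which release is in use, a question or bug report can state it explicitly from the module's version string. A small sketch, assuming the string has the usual dotted form that `cv2.__version__` reports; the helper names `opencv_major` and `describe` are hypothetical.

```python
def opencv_major(version_string):
    """Return the major version from a string such as '3.4.2'."""
    return int(version_string.split(".")[0])


def describe(version_string):
    """Build a one-line description suitable for a bug report."""
    return "OpenCV {} (major series {}.x)".format(
        version_string, opencv_major(version_string)
    )


# With OpenCV installed this would be: describe(cv2.__version__)
print(describe("3.4.2"))
```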

With regards to installing on Windows, it was a huge pain to get working (although I don't normally develop on it, so it was probably partly my fault too). Some clearer instructions would probably be really helpful to newcomers.

@hmaarrfk
Contributor

I don't know if there has been any effort to reduce the dependency on all the dynamic libraries at runtime.

This really causes pain when trying to compile OpenCV with other heavy dependencies (such as Qt or ML libraries).

I'm not sure if this is still on the roadmap, but I would definitely be interested in knowing if it is!

cc: @mingwandroid

@mingwandroid

mingwandroid commented Jan 21, 2019

Hey @hmaarrfk.

TL;DR: The easy part only saves 124KB and the hard part is really hard (and goes against what upstream does and documents).

I did some research here. There are 46 libopencv DSOs, and the Python extension links to 36 of them. The ten it doesn't link to are:

libopencv_datasets.so.3.4.2
libopencv_dnn_objdetect.so.3.4.2
libopencv_dpm.so.3.4.2
libopencv_hdf.so.3.4.2
libopencv_line_descriptor.so.3.4.2
libopencv_phase_unwrapping.so.3.4.2
libopencv_stereo.so.3.4.2
libopencv_superres.so.3.4.2
libopencv_videostab.so.3.4.2
libopencv_xobjdetect.so.3.4.2

The total size of all OpenCV DSOs is 121MB.
The saving we would make by excluding these DSOs is 124KB.

So from the perspective of disk space, there's no point in splitting things up further here.

We pass -Wl,--as-needed to ensure that the static linker doesn't pull in DSOs that are not needed. I expect that's the reason we're eliding these 10 libraries (though it could also be explicit in the CMake build files).
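
The comparison above (all installed DSOs versus those the extension actually links to) can be reproduced roughly as follows. This is a sketch under assumptions: the input is text in the style printed by `ldd`, libraries are matched by the name before `.so`, and the helper names `linked_stems` and `unlinked` are hypothetical. It is not the script used for the numbers in this comment.

```python
import re


def linked_stems(ldd_output):
    """Return stems such as 'libopencv_core' found in ldd-style output."""
    return set(re.findall(r"(libopencv_\w+)\.so", ldd_output))


def unlinked(all_dsos, ldd_output):
    """List the DSO file names whose stem never appears in the ldd output."""
    linked = linked_stems(ldd_output)
    return sorted(
        name for name in all_dsos if name.split(".so")[0] not in linked
    )


# Illustrative input only; the paths and addresses are made up.
sample_ldd = (
    "\tlibopencv_core.so.3.4 => /opt/lib/libopencv_core.so.3.4.2 (0x0001)\n"
    "\tlibc.so.6 => /lib/libc.so.6 (0x0002)\n"
)
sample_dsos = ["libopencv_core.so.3.4.2", "libopencv_hdf.so.3.4.2"]
print(unlinked(sample_dsos, sample_ldd))
```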

If you wanted to split things up so that the large libraries become optional, you'd need to do a lot of reprogramming to change how the OpenCV Python bindings work, splitting cv2.so into multiple DSOs. Then you'd need to educate people about how our OpenCV Python bindings have deviated from upstream.

Also, given we use hardlinks when we can, the cost of the OpenCV DSOs is only felt once.
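
The effect of hardlinks on disk usage can be checked by counting each underlying file only once. A minimal sketch, assuming a filesystem that reports device and inode numbers through `os.stat`; the helper name `unique_disk_usage` is hypothetical.

```python
import os


def unique_disk_usage(paths):
    """Sum file sizes, counting each underlying inode only once."""
    seen = set()
    total = 0
    for path in paths:
        info = os.stat(path)
        # Hardlinked paths share the same (device, inode) pair.
        key = (info.st_dev, info.st_ino)
        if key not in seen:
            seen.add(key)
            total += info.st_size
    return total
```

Passing the same DSO as it appears in several environments would return its size once if the copies are hardlinks, and once per copy if they are not.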

IMHO none of this is worth the effort.

@hmaarrfk
Contributor

I see, thanks for the explanation!
