Our current monolithic approach to Python packaging isn't likely to be sustainable long-term.
At a high level, I would propose a structure like this:
We can maintain the semantic appearance of a single `pyarrow` package by having thin API modules that would look like
Obviously, this is more difficult to build and package:
- CMake and `setup.py` files must be refactored a bit so that we can reuse code between the parent and child packages.
- Separate conda and wheel packages must be produced. With conda this seems more straightforward, but since the child wheels depend on the parent core wheel, the build process seems more complicated.
In any case, I don't think these challenges are insurmountable. This will have several benefits:
- Smaller installation footprint for simple use cases (though note we are STILL duplicating shared libraries in the wheels, which is quite bad).
- Less developer anxiety about expanding the scope of what Python code is shipped from apache/arrow. If in 5 years we are shipping 5 different Python wheels with each Apache Arrow release, that sounds completely fine to me.
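
To make the "thin API module" idea above more concrete, here is a minimal sketch of one way such a module could forward to a separately installed child package. This is only an illustration under stated assumptions, not the layout proposed in this issue: the helper `reexport_optional` and the names `pyarrow_flight` and `pyarrow-flight` used in the note below are hypothetical.

```python
import importlib


def reexport_optional(child_name, target_globals, install_hint):
    """Copy the public names of an optional child package into a thin module.

    `child_name` is the import name of the child package (hypothetical here),
    `target_globals` is the globals() dict of the thin module, and
    `install_hint` is shown to the user when the child package is missing.
    """
    try:
        child = importlib.import_module(child_name)
    except ImportError as exc:
        module_name = target_globals.get("__name__", "this module")
        raise ImportError(
            f"{module_name} requires the optional package {child_name!r}; "
            f"try: {install_hint}"
        ) from exc

    # Prefer an explicit __all__; otherwise fall back to non-underscore names.
    public = getattr(child, "__all__", None)
    if public is None:
        public = [name for name in vars(child) if not name.startswith("_")]

    for name in public:
        target_globals[name] = getattr(child, name)
    return list(public)
```

Under these assumptions, a thin `pyarrow/flight.py` would contain little more than a call such as `reexport_optional("pyarrow_flight", globals(), "pip install pyarrow-flight")`, so that `import pyarrow.flight` keeps working when the child wheel is installed and fails with an actionable message when it is not.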
Reporter: Wes McKinney / @wesm
PRs and other links:
Note: This issue was originally created as ARROW-8518. Please see the migration documentation for further details.