-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-6858: [C++] Simplify transitive build option dependencies #14224
ARROW-6858: [C++] Simplify transitive build option dependencies #14224
Conversation
|
This approach adds "depends" information to each (bool) build options and use it to resolve transitive build option dependencies automatically. This approach implements topological sort in CMake to resolve transitive dependencies. Another approach proposed in the associated Jira issue: It creates a Python script that generates a CMake code (.cmake file) that handles transitive dependencies. Dependencies information are written in the Python script. I think that this approach is better than the another approach because: * We can put option definitions and their dependencies into the same place. (We don't need to put them into .cmake and .py.) * We don't need to regenerate a .cmake file when we update option dependencies. * We can specify dependencies information with a simple way. (We can just add "DEPENDS ARROW_XXX ARROW_YYY ..." to an option defintion.) Here are downsides of this approach: * We need to maintain topological sort implementation in CMake. Because CMake doesn't provide a topological sort feature that is used in CMake internally. But topological sort algorithm is well-known (Tarjan's algorithm is published at 1976) and it's implementation in this approach has only 20+ CMake lines. I think that we can maintain it. * This can't support complex conditions such as "ARROW_X AND NOT ARROW_Y". But we don't have any complex condition for now.
443a594
to
de04fca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me. The topological sort in CMake makes me feel a little uneasy but it's at least not complicated :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks neat, thank you.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Benchmark runs are scheduled for baseline = 44bbf9c and contender = 53ac2a0. 53ac2a0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
if(ARROW_PYTHON) | ||
set(ARROW_COMPUTE ON) | ||
set(ARROW_CSV ON) | ||
set(ARROW_DATASET ON) | ||
set(ARROW_FILESYSTEM ON) | ||
set(ARROW_HDFS ON) | ||
set(ARROW_JSON ON) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this broke my local development because of no longer including those (although I should probably start using presets ..)
Now, it's probably fine to remove this now Python C++ has moved, but we do assume that some C++ modules are built on the pyarrow side (eg we assume that CSV is always built, while with the above change you need to ensure manually that this is done in your cmake call).
In any case we should update the documentation at https://arrow.apache.org/docs/dev/developers/python.html#build-and-test to indicate that there are a few components required to be able to build pyarrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the info.
OK. I add ARROW_PYTHON
again but I make it a deprecated option. We'll remove ARROW_PYTHON
eventually.
See also: ARROW-17868
…he#14224) This approach adds "depends" information to each (bool) build options and use it to resolve transitive build option dependencies automatically. This approach implements topological sort in CMake to resolve transitive dependencies. Another approach proposed in the associated Jira issue: It creates a Python script that generates a CMake code (.cmake file) that handles transitive dependencies. Dependencies information are written in the Python script. I think that this approach is better than the another approach because: * We can put option definitions and their dependencies into the same place. (We don't need to put them into .cmake and .py.) * We don't need to regenerate a .cmake file when we update option dependencies. * We can specify dependencies information with a simple way. (We can just add "DEPENDS ARROW_XXX ARROW_YYY ..." to an option definition.) Here are downsides of this approach: * We need to maintain topological sort implementation in CMake. Because CMake doesn't provide a topological sort feature that is used in CMake internally. But topological sort algorithm is well-known (Tarjan's algorithm was published at 1976) and its implementation in this approach has only 20+ CMake lines. I think that we can maintain it. * This can't support complex conditions such as "ARROW_X AND NOT ARROW_Y". But we don't have any complex condition for now. Lead-authored-by: Sutou Kouhei <kou@clear-code.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
This approach adds "depends" information to each (bool) build options and use it to resolve transitive build option dependencies automatically. This approach implements topological sort in CMake to resolve transitive dependencies.
Another approach proposed in the associated Jira issue: It creates a Python script that generates a CMake code (.cmake file) that handles transitive dependencies. Dependencies information are written in the Python script.
I think that this approach is better than the another approach because:
Here are downsides of this approach: