Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6858: [C++] Simplify transitive build option dependencies #14224

Merged
merged 4 commits into from
Sep 27, 2022

Conversation

kou
Copy link
Member

@kou kou commented Sep 23, 2022

This approach adds "depends" information to each (bool) build options and use it to resolve transitive build option dependencies automatically. This approach implements topological sort in CMake to resolve transitive dependencies.

Another approach proposed in the associated Jira issue: It creates a Python script that generates a CMake code (.cmake file) that handles transitive dependencies. Dependencies information are written in the Python script.

I think that this approach is better than the another approach because:

  • We can put option definitions and their dependencies into the same place. (We don't need to put them into .cmake and .py.)
  • We don't need to regenerate a .cmake file when we update option dependencies.
  • We can specify dependencies information with a simple way. (We can just add "DEPENDS ARROW_XXX ARROW_YYY ..." to an option definition.)

Here are downsides of this approach:

  • We need to maintain topological sort implementation in CMake. Because CMake doesn't provide a topological sort feature that is used in CMake internally. But topological sort algorithm is well-known (Tarjan's algorithm was published at 1976) and its implementation in this approach has only 20+ CMake lines. I think that we can maintain it.
  • This can't support complex conditions such as "ARROW_X AND NOT ARROW_Y". But we don't have any complex condition for now.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

This approach adds "depends" information to each (bool) build options
and use it to resolve transitive build option dependencies
automatically. This approach implements topological sort in CMake to
resolve transitive dependencies.

Another approach proposed in the associated Jira issue: It creates a
Python script that generates a CMake code (.cmake file) that handles
transitive dependencies. Dependencies information are written in the
Python script.

I think that this approach is better than the another approach
because:

* We can put option definitions and their dependencies into the same
  place. (We don't need to put them into .cmake and .py.)
* We don't need to regenerate a .cmake file when we update option
  dependencies.
* We can specify dependencies information with a simple way. (We can
  just add "DEPENDS ARROW_XXX ARROW_YYY ..." to an option defintion.)

Here are downsides of this approach:

* We need to maintain topological sort implementation in
  CMake. Because CMake doesn't provide a topological sort feature that
  is used in CMake internally. But topological sort algorithm is
  well-known (Tarjan's algorithm is published at 1976) and it's
  implementation in this approach has only 20+ CMake lines. I think
  that we can maintain it.
* This can't support complex conditions such as "ARROW_X AND NOT
  ARROW_Y". But we don't have any complex condition for now.
@kou kou force-pushed the cpp-cmake-simplify-option-dependencies branch from 443a594 to de04fca Compare September 23, 2022 19:19
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me. The topological sort in CMake makes me feel a little uneasy but it's at least not complicated :)

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nits

cpp/cmake_modules/DefineOptions.cmake Outdated Show resolved Hide resolved
cpp/cmake_modules/DefineOptions.cmake Show resolved Hide resolved
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks neat, thank you.

cpp/cmake_modules/DefineOptions.cmake Outdated Show resolved Hide resolved
kou and others added 3 commits September 27, 2022 06:02
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@cyb70289 cyb70289 merged commit 53ac2a0 into apache:master Sep 27, 2022
@kou kou deleted the cpp-cmake-simplify-option-dependencies branch September 27, 2022 05:42
@ursabot
Copy link

ursabot commented Sep 27, 2022

Benchmark runs are scheduled for baseline = 44bbf9c and contender = 53ac2a0. 53ac2a0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.2% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.28% ⬆️0.25%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 53ac2a00 ec2-t3-xlarge-us-east-2
[Finished] 53ac2a00 test-mac-arm
[Failed] 53ac2a00 ursa-i9-9960x
[Finished] 53ac2a00 ursa-thinkcentre-m75q
[Finished] 44bbf9cd ec2-t3-xlarge-us-east-2
[Finished] 44bbf9cd test-mac-arm
[Failed] 44bbf9cd ursa-i9-9960x
[Finished] 44bbf9cd ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Comment on lines -412 to -419
if(ARROW_PYTHON)
set(ARROW_COMPUTE ON)
set(ARROW_CSV ON)
set(ARROW_DATASET ON)
set(ARROW_FILESYSTEM ON)
set(ARROW_HDFS ON)
set(ARROW_JSON ON)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this broke my local development because of no longer including those (although I should probably start using presets ..)

Now, it's probably fine to remove this now Python C++ has moved, but we do assume that some C++ modules are built on the pyarrow side (eg we assume that CSV is always built, while with the above change you need to ensure manually that this is done in your cmake call).
In any case we should update the documentation at https://arrow.apache.org/docs/dev/developers/python.html#build-and-test to indicate that there are a few components required to be able to build pyarrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the info.

OK. I add ARROW_PYTHON again but I make it a deprecated option. We'll remove ARROW_PYTHON eventually.

See also: ARROW-17868

fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
…he#14224)

This approach adds "depends" information to each (bool) build options and use it to resolve transitive build option dependencies automatically. This approach implements topological sort in CMake to resolve transitive dependencies.

Another approach proposed in the associated Jira issue: It creates a Python script that generates a CMake code (.cmake file) that handles transitive dependencies. Dependencies information are written in the Python script.

I think that this approach is better than the another approach because:

* We can put option definitions and their dependencies into the same place. (We don't need to put them into .cmake and .py.)
* We don't need to regenerate a .cmake file when we update option dependencies.
* We can specify dependencies information with a simple way. (We can just add "DEPENDS ARROW_XXX ARROW_YYY ..." to an option definition.)

Here are downsides of this approach:

* We need to maintain topological sort implementation in CMake. Because CMake doesn't provide a topological sort feature that is used in CMake internally. But topological sort algorithm is well-known (Tarjan's algorithm was published at 1976) and its implementation in this approach has only 20+ CMake lines. I think that we can maintain it.
* This can't support complex conditions such as "ARROW_X AND NOT ARROW_Y". But we don't have any complex condition for now.

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants