Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow: Refactor auto options #23163

Merged

Conversation

danimtb
Copy link
Member

@danimtb danimtb commented Mar 18, 2024

The aim of this PR is to solve recurrent issues with the options in the Arrow recipe. We have received many PRs trying to fix this situation but instead of making the recipe simpler, it has become complex this the management of "auto" options.

Making the recipe too smart is a way to make the recipe more likely to fail and in the end users may face bad experiences at cosume time.

Hope this helps to improve the situation!

Related to #22508 (comment)


@ericLemanissierBot
Copy link

ericLemanissierBot commented Mar 18, 2024

I detected other pull requests that are modifying arrow/all recipe:

This message is automatically generated by https://github.com/ericLemanissier/conan-center-conflicting-prs so don't hesitate to report issues/improvements there.

@conan-center-bot

This comment has been minimized.

Copy link
Contributor

@valgur valgur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a much saner approach. Thanks!

recipes/arrow/all/conanfile.py Show resolved Hide resolved
recipes/arrow/all/conanfile.py Show resolved Hide resolved
@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@danimtb
Copy link
Member Author

danimtb commented Mar 21, 2024

Waiting for changes at #23185

@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@danimtb
Copy link
Member Author

danimtb commented Mar 22, 2024

@valgur did not have much luck activating the parquet option although I think it makes sense. However, I just want to push the refactor of options management in this recipe, so let's leave the option value tweaks for a future PR

@danimtb danimtb marked this pull request as ready for review March 22, 2024 12:20
@conan-center-bot

This comment has been minimized.

@conan-center-bot

This comment has been minimized.

@danimtb
Copy link
Member Author

danimtb commented Apr 2, 2024

After discussing the changes of this PR internally, we would like to avoid throwing unnecessary warnings to the user. However, some of the auto logic on options are unknown to us and would like to ask about them upstream in the arrow project. Especially the one regarding jemalloc and BSD platform.

@danimtb
Copy link
Member Author

danimtb commented Apr 3, 2024

I have opened an issue in the arrow project to ask for some collaboration apache/arrow#40979

@ericLemanissierBot ericLemanissierBot mentioned this pull request Apr 4, 2024
3 tasks
@kou
Copy link
Contributor

kou commented Apr 5, 2024

I'm an Apache Arrow developer. What should I answer to complete this PR?

@danimtb
Copy link
Member Author

danimtb commented Apr 5, 2024

Hi @kou and thanks again for your response.

As you can see in the changes of the PR, we are trying to simplify the management of "auto" options (CMake variables) in the Arrow recipe. As some option values affected other options the UX was quite ricky for the user consuming this recipe and the maintenance of the recipe was harder at different points.

We reviewed the logic and finally reduced most of the option default values to False, however, there are some of them that were quite dependant on their "auto" behavior to other ones and we added some warnings in the recipe:

        # Warn/error on some option values
        if not self.options.with_boost:
            if self.options.gandiva:
                self.output.warning("Option 'with_boost=True' could be required when option 'gandiva=True'")
        if not self.options.with_re2:
            if self.options.gandiva or self.options.parquet:
                self.output.warning("Option 'with_re2=True' could be required when option 'gandiva=True' or 'parquet=True'")
            if self.options.compute or self.options.dataset_modules:
                self.output.warning("Option 'with_re2=True' could be required when 'compute=True' or 'dataset_modules=True'")
        if not self.options.with_jemalloc:
            if "BSD" in self.settings.os:
                self.output.warning("Option 'with_jemalloc=True' could be required for BSD in settings.os")

We would like to simplify this more but we are not familiar with the intrinsics of the Arrow build, so basically we would need to answer these questions:

  • Should with_jemalloc (CMake's ARROW_JEMALLOC) option be True in BSD platforms? Are there any other platforms where jemalloc should be used mandatory?
  • Should Boost be required whenever the gandiva (ARROW_GANDIVA) option is activated?
  • Should re2 dependency be required whenever gandiva, parquet (ARROW_PARQUET), compute (ARROW_COMPUTE) or dataset_modules (ARROW_DATASET) are used?

Also, we noticed that this recipe was somehow used in the https://github.com/apache/arrow repository, so we want to know if there would be any issue with these changes.

Thanks a lot! 😄

@kou
Copy link
Contributor

kou commented Apr 5, 2024

  • Should with_jemalloc (CMake's ARROW_JEMALLOC) option be True in BSD platforms? Are there any other platforms where jemalloc should be used mandatory?

We don't need to use True for with_jemalloc on BSD platforms. There are no platforms that require jemalloc.

(In general, jemalloc is better than system allocator. So we recommend jemalloc but it's not required.)

  • Should Boost be required whenever the gandiva (ARROW_GANDIVA) option is activated?

Yes.

  • Should re2 dependency be required whenever gandiva, parquet (ARROW_PARQUET), compute (ARROW_COMPUTE) or dataset_modules (ARROW_DATASET) are used?

No. re2 is required only for gandiva.

Also, we noticed that this recipe was somehow used in the https://github.com/apache/arrow repository, so we want to know if there would be any issue with these changes.

We use this recipe only for CI. We're testing this recipe with unreleased Apache Arrow. It's for smooth release process. We want to avoid that downstreams including this recipe have patches that fixes problems in released Apache Arrow. If we can detect problems before release a new version, downstreams don't need to have patches. :-)

But maintaining this recipe (keeping this recipe buildable) in apache/arrow difficult... It's broken recently...

@danimtb
Copy link
Member Author

danimtb commented Apr 9, 2024

Thanks @kou for the answers. I will update the recipe accordingly! 😄

@conan-center-bot
Copy link
Collaborator

Conan v1 pipeline ✔️

All green in build 15 (b25d3f56f9623a7e6dcc790bd7ec2fe9ad80708b):

  • arrow/14.0.2:
    All packages built successfully! (All logs)

  • arrow/15.0.0:
    All packages built successfully! (All logs)

  • arrow/13.0.0:
    All packages built successfully! (All logs)

  • arrow/12.0.1:
    All packages built successfully! (All logs)

  • arrow/8.0.0:
    All packages built successfully! (All logs)

  • arrow/7.0.0:
    All packages built successfully! (All logs)

  • arrow/14.0.1:
    All packages built successfully! (All logs)

  • arrow/8.0.1:
    All packages built successfully! (All logs)

  • arrow/14.0.0:
    All packages built successfully! (All logs)

  • arrow/12.0.0:
    All packages built successfully! (All logs)

  • arrow/11.0.0:
    All packages built successfully! (All logs)

  • arrow/10.0.1:
    All packages built successfully! (All logs)

  • arrow/10.0.0:
    All packages built successfully! (All logs)


Conan v2 pipeline ✔️

Note: Conan v2 builds are now mandatory. Please read our discussion about it.

All green in build 14 (b25d3f56f9623a7e6dcc790bd7ec2fe9ad80708b):

  • arrow/14.0.2:
    All packages built successfully! (All logs)

  • arrow/15.0.0:
    All packages built successfully! (All logs)

  • arrow/10.0.0:
    All packages built successfully! (All logs)

  • arrow/14.0.0:
    All packages built successfully! (All logs)

  • arrow/14.0.1:
    All packages built successfully! (All logs)

  • arrow/13.0.0:
    All packages built successfully! (All logs)

  • arrow/12.0.0:
    All packages built successfully! (All logs)

  • arrow/11.0.0:
    All packages built successfully! (All logs)

  • arrow/12.0.1:
    All packages built successfully! (All logs)

  • arrow/7.0.0:
    All packages built successfully! (All logs)

  • arrow/10.0.1:
    All packages built successfully! (All logs)

  • arrow/8.0.0:
    All packages built successfully! (All logs)

  • arrow/8.0.1:
    All packages built successfully! (All logs)

@@ -332,8 +200,7 @@ def requirements(self):
self.requires("lz4/1.9.4")
if self.options.with_snappy:
self.requires("snappy/1.1.9")
if Version(self.version) >= "6.0.0" and \
self.options.get_safe("simd_level") != None or \
if self.options.get_safe("simd_level") != None or \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.options.get_safe("simd_level") != None or \
if self.options.get_safe("simd_level") is not None or \

@@ -332,8 +200,7 @@ def requirements(self):
self.requires("lz4/1.9.4")
if self.options.with_snappy:
self.requires("snappy/1.1.9")
if Version(self.version) >= "6.0.0" and \
self.options.get_safe("simd_level") != None or \
if self.options.get_safe("simd_level") != None or \
self.options.get_safe("runtime_simd_level") != None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.options.get_safe("runtime_simd_level") != None:
self.options.get_safe("runtime_simd_level") is not None:

Comment on lines +488 to +489
# FIXME: only headers components is used
self.cpp_info.components["libarrow"].requires.append("boost::boost")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# FIXME: only headers components is used
self.cpp_info.components["libarrow"].requires.append("boost::boost")
self.cpp_info.components["libarrow"].requires.append("boost::headers")

@wgtmac
Copy link
Contributor

wgtmac commented Apr 10, 2024

Happy to see this gets resolved. Thanks!

BTW, I port this to the Apache Arrow repo and let's see how it goes: apache/arrow#39729

@wgtmac
Copy link
Contributor

wgtmac commented Apr 11, 2024

I have reported an issue with the new recipe: #23475. Could you please take a look? @danimtb

dvirtz added a commit to dvirtz/vscode-parquet-viewer that referenced this pull request Apr 12, 2024
yhsng pushed a commit to yhsng/conan-center-index that referenced this pull request Apr 12, 2024
* Arrow: Refactor auto options

* add todo

* make boost compulsory in msvc

* enable parquet and thrift by default

* bump boost version to avoid conflicts with thrift's

* try with re2

* bool compute

* back to previous default values

* avoid config

* fix

* Arrow: remove dead code (conan-io#6)

* remove dead code

* fix indentation

* Update recipes/arrow/all/conanfile.py

* remove warnings

---------

Co-authored-by: ericLemanissier <ericLemanissier@users.noreply.github.com>
dvirtz added a commit to dvirtz/vscode-parquet-viewer that referenced this pull request Apr 12, 2024
dvirtz added a commit to dvirtz/vscode-parquet-viewer that referenced this pull request Apr 12, 2024
ericLemanissier added a commit to ericLemanissier/conan-center-index that referenced this pull request Apr 16, 2024
* Arrow: Refactor auto options

* add todo

* make boost compulsory in msvc

* enable parquet and thrift by default

* bump boost version to avoid conflicts with thrift's

* try with re2

* bool compute

* back to previous default values

* avoid config

* fix

* Arrow: remove dead code (#6)

* remove dead code

* fix indentation

* Update recipes/arrow/all/conanfile.py

* remove warnings

---------

Co-authored-by: ericLemanissier <ericLemanissier@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants