[C++] Move to C++17 #32415

Closed
2 of 3 tasks
asfimport opened this issue Jul 18, 2022 · 35 comments

asfimport commented Jul 18, 2022

The upcoming Abseil release has dropped support for C++11, so, eventually, Arrow will have to follow. More details here.

Relatedly, when I tried to switch Abseil to a newer C++ version on Windows, things apparently broke in Arrow's CI. This is because Abseil's ABI is sensitive to the C++ standard used to compile it, and Google only supports compiling all artefacts in a stack with a single, homogeneous standard version. This creates some friction with conda-forge, where the compilers are generally much newer than what Arrow might be willing to impose. For now, things seem to have worked out with Arrow specifying C++11 while conda-forge moved to C++17, at least on Unix, but Windows was not so lucky.
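To make the ABI point concrete, here is a minimal sketch of the kind of pattern that makes a library's ABI depend on the `-std` flag. Abseil can make types such as `absl::string_view` aliases of their `std::` counterparts when compiled as C++17 (configured in `absl/base/options.h`, e.g. `ABSL_OPTION_USE_STD_STRING_VIEW`). The namespace `mylib` and everything in it are hypothetical stand-ins, not Abseil's actual code.

```cpp
// Hypothetical stand-in for the Abseil pattern: the same public name refers to
// different types depending on the C++ standard the header is compiled with.
#include <cstddef>
#if __cplusplus >= 201703L
#include <string_view>
#endif

namespace mylib {

#if __cplusplus >= 201703L
// C++17 build: the library type is just an alias for the standard one.
using string_view = std::string_view;
constexpr bool kUsesStdStringView = true;
#else
// Pre-C++17 build: the library ships its own, distinct class.
class string_view {
 public:
  constexpr string_view(const char* data, std::size_t size)
      : data_(data), size_(size) {}
  constexpr const char* data() const { return data_; }
  constexpr std::size_t size() const { return size_; }

 private:
  const char* data_;
  std::size_t size_;
};
constexpr bool kUsesStdStringView = false;
#endif

// If this were an exported, non-inline function in a compiled library, its
// mangled symbol would refer to std::string_view in a C++17 build and to
// mylib::string_view otherwise, so objects built with different -std flags
// would not link against each other. (constexpr keeps the sketch header-only.)
constexpr std::size_t Length(string_view s) { return s.size(); }

}  // namespace mylib
```

This is why conda-forge and Arrow have to agree on one standard for everything that touches Abseil's headers: the mismatch shows up as missing symbols or, worse, silently incompatible types.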

Perhaps people would therefore also be interested in collaborating on (or at least commenting on) this issue, which should permit more flexibility by making it possible to opt into specific standard versions from conda-forge as well.

Update:

It was voted on the dev ML to move to C++17:

Reporter: H. Vetinari

Note: This issue was originally created as ARROW-17110. Please see the migration documentation for further details.

David Li / @lidavidm:
We still have to support down to GCC 4.8 (for some older R versions at least), so this will be a while coming. And even then I think C++14 will be the highest attainable version.

H. Vetinari:

We still have to support down to GCC 4.8 (for some older R versions at least) 

Where is the lower bound of R support defined (and how/why)? I tried looking but couldn't find anything. I think it'd be a good idea to define some sort of support policy (note: we did this in SciPy over the last year or so, which allowed us to move from GCC 4.8 to 6.x and now to 8.x).

And even then I think C++14 will be the highest attainable version

Barring some progress on the lower bounds for compilers, you'll then be unable to upgrade Abseil beyond 20220623.0 (and that already works more or less by accident, since 20211102 on Unix is compiled with C++17 in conda-forge). Unless we introduce multiple builds per C++ standard version in conda-forge, this problem will only get worse, because once vc142 becomes the minimum toolchain in the not-too-distant future, conda-forge can move to C++17 globally on Windows as well.

Neal Richardson / @nealrichardson:
Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, where it is the default compiler. We do happen to get a number of bug reports from R users on CentOS 7, though.

R 3.6 on Windows used an odd gcc 4.9 mingw compiler, and that's the main source of "R requires an old compiler". But we already disable many features on R < 4.0 on Windows, and conditionally disabling more is not a problem. (GCS filesystem support, which uses abseil, is one of those already.) We could drop support for R 3.6 now, but since we can just disable features on the build, we haven't been forced to do so yet.

CRAN checks are now all running gcc 8 or newer: https://cran.r-project.org/web/checks/check_flavors.html

We have CI that builds Arrow with C++17 (and maybe also C++14?). I think Homebrew also bumped up to building Arrow with C++17 to match Abseil (or maybe that's still only in the copy of the formula we test in apache/arrow). We also have an open PR to add Azure Blob Storage, which will require C++14: https://github.com/apache/arrow/pull/12914/files#r899724290. So maybe the solution for the Abseil issue is to require the newer C++ standard when using an Abseil built with it?

David Li / @lidavidm:
Thanks Neal - yeah I realize I read that backwards now. If we just need to build with 17 on conda-forge/when using newer Abseil that shouldn't be a problem (we'd have to update various pipelines/scripts), we just can't raise our minimum supported version.

Antoine Pitrou / @pitrou:
See also ARROW-12816.

I agree that for now conda-forge can simply build using C++17. Just because the minimum version for Arrow is C++11 doesn't mean you are forbidden to use a newer one :-D

H. Vetinari:

Slight correction: GCC 4.8 is not an R requirement. It comes from CentOS 7, where it is the default compiler. We do happen to get a number of bug reports from R users on CentOS 7, though.

CentOS 7 has the devtoolset backports up to GCC 11 (except on aarch64, where it's GCC 10), though... These are obviously available and in use for the manylinux images, and I think they're a very acceptable requirement for users on such old platforms.

I agree that for now conda-forge can simply build using C++17. Just before [because?] the minimum version for Arrow is C++11 doesn't mean you are forbidden to use a newer one :-D

Well, I would like to avoid breaking your CI if possible. :)
And as I tried to explain, if conda-forge switches to C++17 (especially on Windows) while you still try to compile with C++11, breakage is all but guaranteed.

PS. I hate the JIRA text parser mangler with a burning passion :O

David Li / @lidavidm:
Thanks. So let's make this ticket, "Make all conda-forge based CI pipelines specify C++17"?

Weston Pace / @westonpace:
It probably makes sense for this ticket to focus on conda-forge, but are the devtoolset backports a workable solution? If so, it would be nice to update the minimum version as well.

Antoine Pitrou / @pitrou:
The devtoolset backport won't do anything for the gcc 4.9 requirement on R Windows builds, I'm afraid.

H. Vetinari:

The devtoolset backport won't do anything for the gcc 4.9 requirement on R Windows builds, I'm afraid.

What shape does this requirement take? The defining feature of the devtoolset backports is that they're fully ABI-compatible with the default compiler (i.e. 4.8), and I doubt R hard-depends on the presence of specific bugs in GCC 4.x that were fixed in later versions.

David Li / @lidavidm:
The R Windows builds use a distribution of MinGW, not Centos: https://cran.r-project.org/bin/windows/Rtools/history.html

Kouhei Sutou / @kou:
Can we avoid depending on Abseil ABI by removing Abseil use in cpp/src/arrow/filesystem/gcsfs_internal.cc and cpp/src/arrow/filesystem/gcsfs_test.cc?

Antoine Pitrou / @pitrou:

Can we avoid depending on Abseil ABI by removing Abseil use in cpp/src/arrow/filesystem/gcsfs_internal.cc and cpp/src/arrow/filesystem/gcsfs_test.cc?

AFAIU, the problem is that the GCS C++ headers use Abseil, so we depend on the ABI no matter what. @coryan Is my understanding right?

H. Vetinari:

The R Windows builds use a distribution of MinGW, not Centos: https://cran.r-project.org/bin/windows/Rtools/history.html

Sorry, I read too quickly; should have been obvious that CentOS has nothing to do with windows.

However, the good news is that as soon as you drop support for R<4, the lower bound should be able to move up to GCC 8 (in rtools40; for R 4.0 & 4.1) resp. GCC 10 (in rtools42; for R>=4.2).

Carlos O'Ryan / @coryan:

AFAIU, the problem is that the GCS C++ headers use Abseil, so we depend on the ABI no matter what. @coryan Is my understanding right?

That is correct, google-cloud-cpp uses Abseil in headers.

FWIW, gRPC now requires C++ >= 14:

https://github.com/grpc/proposal/blob/master/L98-requiring-cpp14.md
https://github.com/grpc/grpc/releases/tag/v1.47.0

so does google-cloud-cpp:

googleapis/google-cloud-cpp#8740
https://github.com/googleapis/google-cloud-cpp/releases/tag/v2.0.0

I expect that Protobuf will follow suit sooner rather than later.

The following policy, while not yet adopted by any of these projects, may be informative:

https://opensource.google/documentation/policies/cplusplus-support

Kouhei Sutou / @kou:
Thanks. I understand.

Weston Pace / @westonpace:
Just for consideration, is the following policy possible?

"We do not release new versions for R < 4 but we will consider backporting critical security issues"

I'm not sure if that would be more or less work than sprinkling more ifdef/checks throughout our code base.

Neal Richardson / @nealrichardson:
As I said before, we could drop support for R 3.6 now, but since we can just disable features on the build, we haven't been forced to do so yet. It's also possible for us to still test on and support older R versions but just not on Windows.

That said, IMO the real issue holding us to C++11 isn't R or Windows but rather CentOS 7 and its default compilers. And I also don't think that Abseil or any other optional dependency should determine whether the core Arrow library requires a newer C++ standard; that decision should come from the Arrow developer community.

H. Vetinari:

That said, IMO the real issue holding us to C++11 isn't R or Windows but rather CentOS 7 and its default compilers.

This sounds like the opposite of what Antoine was saying above (which I tend to agree with, if the R requirements aren't lifted as you describe). Isn't it harder to change stuff on windows (especially when there's an ABI-compatible GCC backport for CentOS)?

Neal Richardson / @nealrichardson:
Reiterating again: we could drop support for R 3.6 now (and thus the funky mingw gcc 4.9), but since we can just disable features on the build, we haven't been forced to do so yet.

H. Vetinari:

Reiterating again: we could drop support for R 3.6 now (and thus the funky mingw gcc 4.9), but since we can just disable features on the build, we haven't been forced to do so yet. 

Apologies if I worded this badly (because relatively) - what I meant was: are the default compilers on CentOS really such a hard constraint when an os-native devtoolset install is just a CLI invocation away? Arguably that's the very reason these exist in the first place - because ultra-LTS OSes and up-to-date software don't mix well otherwise.

Pavel Solodovnikov / @ManManson:
So, can we just execute all the usual testing tasks related to Arrow on the CentOS 7 executor with "devtoolset-11" to provide GCC 11, for example? Or maybe it should be another question instead: is there a CentOS 7 executor at all? (I am not familiar with Arrow's CI/CD infrastructure and procedures yet, so this may be a silly question)

If yes, then I think getting a green light after running the complete test suite will be a sign that we can safely advise Arrow users to build with devtoolset on CentOS and drop the dependency on GCC 4.8.

H. Vetinari:
This is how manylinux does it (with the caveat that the devtoolset only goes up to 10 there, because CentOS never published the builds for 11 on aarch64; see e.g. pypa/manylinux#1266). See also the discussion in pypa/manylinux#1012.

Kouhei Sutou / @kou:
If we use devtoolset on CentOS 7 and build our CentOS 7 packages with _GLIBCXX_USE_CXX11_ABI=0, can our packages be linked with code built by the default g++ (4.8.5)?

H. Vetinari:
That's the idea behind the devtoolset, they use the same ABI as the original compiler. In slightly more detail, as summarized by Isuru (who some of you will likely know):

In devtoolset, libstdc++.so is a linker script that points to a static libstdc++ for the new parts and the shared libstdc++ for the older parts, and the C++ ABI is set to the GCC-4 ABI.
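For readers unfamiliar with the dual ABI: libstdc++ (since GCC 5) selects between its old GCC-4 and new C++11 `std::string`/`std::list` layouts via the `_GLIBCXX_USE_CXX11_ABI` macro, and devtoolset keeps it at the old ABI so its output stays compatible with the system g++ 4.8.5. The following is a minimal sketch of checking that macro at compile time; the `mylib` names are hypothetical.

```cpp
// Reports which libstdc++ dual-ABI mode this translation unit uses.
// Any standard header pulls in libstdc++'s configuration header, which
// defines _GLIBCXX_USE_CXX11_ABI on GCC 5 and newer.
#include <string>

namespace mylib {

enum class StdStringAbi { kUnknown, kGcc4, kCxx11 };

constexpr StdStringAbi CurrentStringAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  // 0: old copy-on-write std::string, as used by the CentOS 7 system g++.
  // 1: new small-string-optimized std::string (std::__cxx11 namespace).
  return _GLIBCXX_USE_CXX11_ABI ? StdStringAbi::kCxx11 : StdStringAbi::kGcc4;
#else
  // Macro not defined: libstdc++ older than GCC 5 (old ABI only), or a
  // different standard library such as libc++ or the MSVC STL.
  return StdStringAbi::kUnknown;
#endif
}

}  // namespace mylib
```

On a modern libstdc++, compiling the same file with `g++ -D_GLIBCXX_USE_CXX11_ABI=0` flips the result to `kGcc4`, which is the mode both sides must share when mixing devtoolset-built and system-g++-built binaries.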

Antoine Pitrou / @pitrou:

If we use devtoolset on CentOS 7 and build our CentOS 7 packages with _GLIBCXX_USE_CXX11_ABI=0, can our packages be linked with code built by the default g++ (4.8.5)?

When we decide to switch to C++14 or C++17, our header files will probably not be valid C++11 anymore, so it's not just a linker issue.

Either we want Arrow C++ to be compatible with the default g++ (why?) and then we need to still limit ourselves to C++11, or we are happy telling users they need to use the devtoolset, and then we can switch to C++17 (assuming the R team is OK with dropping R 3.6 on Windows :-) ).
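To illustrate why this is a compile-time rather than a link-time problem, here is a hypothetical public header (`mylib` is a made-up name, not Arrow code) using a few C++17 additions: an inline variable, `std::optional` and `std::string_view`. A compiler limited to C++11, such as the CentOS 7 default g++ 4.8, cannot compile it at all, long before any ABI question arises.

```cpp
// Sketch of a public header that has adopted C++17 features.
#if __cplusplus < 201703L
#error "this header requires C++17"
#endif

#include <cstdint>
#include <optional>
#include <string_view>

namespace mylib {

// C++17 inline variable: defined in the header, one definition program-wide.
inline constexpr std::int64_t kDefaultChunkSize = 64 * 1024;

// std::optional and std::string_view are C++17 library types.
// Parses a non-negative decimal integer; no overflow check (sketch only).
constexpr std::optional<std::int64_t> ParseChunkSize(std::string_view text) {
  if (text.empty()) return std::nullopt;
  std::int64_t value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  return value;
}

}  // namespace mylib
```

Any consumer that includes such a header inherits the C++17 requirement, which is why the choice is between staying on C++11 everywhere or asking CentOS 7 users to switch to the devtoolset compiler.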

H. Vetinari:
Just FYI, conda-forge now provides static C++11/C++14 builds of Abseil as "escape hatches" for packages that cannot yet use the C++17 dynamic libs. This takes the heat off a little bit (it allows packages to move at their own speed w.r.t. C++, as opposed to forcing a conda-forge-wide choice for Abseil), but note that the next Abseil version will still drop C++11 compatibility, so a move to at least C++14 will still be necessary in the near-ish future.

Antoine Pitrou / @pitrou:
@h-vetinari The Abseil discussion is not very interesting IMHO, because it's possible to require C++17 only for GCS-enabled builds.

The important issue is about moving away from C++11 for the whole codebase, i.e. adopting C++17 features in Arrow C++ itself, not just having an optional dependency which requires it.
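Requiring C++17 only for GCS-enabled builds could, for example, be enforced with a preprocessor guard in a header. The following is a minimal sketch; `MYLIB_WITH_GCS` and the other `MYLIB_*` names are hypothetical, not Arrow's actual build macros. It reads `_MSVC_LANG` on MSVC because that compiler reports `__cplusplus` as `199711L` unless `/Zc:__cplusplus` is passed.

```cpp
// Effective language standard of this translation unit. MSVC leaves
// __cplusplus at 199711L unless /Zc:__cplusplus is given, but sets
// _MSVC_LANG to the real value.
#if defined(_MSVC_LANG)
#define MYLIB_CPLUSPLUS _MSVC_LANG
#else
#define MYLIB_CPLUSPLUS __cplusplus
#endif

// The core library only needs C++11 ...
#if MYLIB_CPLUSPLUS < 201103L
#error "mylib requires at least C++11"
#endif

// ... but the optional GCS component depends on an Abseil that, in this
// hypothetical setup, was compiled as C++17, so it must use the same standard.
#if defined(MYLIB_WITH_GCS) && MYLIB_CPLUSPLUS < 201703L
#error "MYLIB_WITH_GCS requires C++17 to match the Abseil ABI"
#endif

namespace mylib {

constexpr long kCompiledStandard = MYLIB_CPLUSPLUS;

constexpr bool GcsSupportedByStandard() { return kCompiledStandard >= 201703L; }

}  // namespace mylib
```

With this guard, building with `-DMYLIB_WITH_GCS -std=c++14` fails with a readable error instead of producing objects with a mismatched ABI, while builds without the optional component keep the C++11 minimum.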

H. Vetinari:
Sure, I was only commenting from the POV of abseil; I was not aware how deeply enmeshed (or not) this is with the rest of arrow. If you can move the parts depending on abseil to C++14 separately (and presumably not build them for various older runtimes), then there's less urgency.

Kouhei Sutou / @kou:
I think that we can switch to C++14 or C++17, because it seems that we can mix a binary built with the default g++ and a binary built with the devtoolset's g++ in the same process on CentOS 7.

I considered the following 2 cases:

  1. Build Arrow with the devtoolset's g++ and use the built Arrow as a library for a C++ program that is built with the default g++.

  2. dlopen() Arrow built with the devtoolset's g++ and a library built with the default g++ in the same process.

Case 1 is meaningless for us, as Antoine said. Sorry.

Case 2 may happen with Ruby, for example ruby -r red-arrow -r unf_ext -e 'nil' (unf_ext is one of the Ruby libraries that use C++).

It seems that case 2 works too. So I think that we can switch to C++14 or C++17.

Antoine Pitrou / @pitrou:
@paleolimbot @nealrichardson Can you confirm that R would not block a move to C++17 and that we can launch a ML discussion about it?

Dewey Dunnington / @paleolimbot:
I don't have much to add to Neal's ML note, although I will add some background on why "the last 5 versions" is the support matrix used by many. The situations where we hear about old compilers and the arrow R package are primarily from large organizations that use Windows and whose IT departments are slow to upgrade anything (based on my brief experience in government, R 3.6 on Windows will still be around for a long time to come), and from people using compute clusters that run CentOS 7. Some of those environments prevent installing binaries from anywhere but CRAN, which may make backporting bugfixes difficult.

I'm not qualified to comment on moving Arrow's C++ code to anything for maintainability purposes...I just wanted to add some context to "we haven't done this yet because R": there really are users out there who want to use Arrow and may have to look elsewhere.

Neal Richardson / @nealrichardson:
Not to muddy this further, but IMO we wouldn't put Depends: R >= 4.0 in the package when we upgrade to C++17 because (1) the package itself works fine with older R, just not the toolchain on Windows for older R; and (2) CRAN doesn't build and host binaries for R 3.6 anymore.

Also, in practice, I'm not concerned about arrow R package users on Windows with R 3.6. The parts of the package we're actively developing (datasets/Acero) are already disabled in the old Windows builds because of threading issues. Windows R 3.6 users may still use arrow to read Parquet and Feather files, but for that they can use an older version of the arrow package just fine.

Antoine Pitrou / @pitrou:
Thanks for the insight @nealrichardson @paleolimbot. It seems that nothing is blocking the move on the R side, which is good.

kou added a commit that referenced this issue May 3, 2023
Closes #33804 

### Rationale for this change

At some point, it would be useful to support the new C++ ABI `_GLIBCXX_USE_CXX11_ABI=1` in pyarrow wheels, especially when moving to C++17:

- #32415

I wanted to create a pyarrow wheel that supported the above ABI and adapted the existing CentOS 7 manylinux2014 Dockerfile/wheel to produce an AlmaLinux 8 manylinux_2_28 Dockerfile/wheel.

Publishing wheels with a new ABI needs [careful consideration](https://pypackaging-native.github.io/key-issues/native-dependencies/cpp_deps/) so I think this is low priority, but I thought I'd provide this manylinux_2_28 implementation in case it was useful for current/future adoption. 

### What changes are included in this PR?

A manylinux_2_28 Dockerfile, adapted from the existing manylinux2014 Dockerfile

### Are these changes tested?

Manually tested at present

### Are there any user-facing changes?

Yes, there's a major ABI change, as pyarrow will be compiled with `_GLIBCXX_USE_CXX11_ABI=1`
* Closes: #33804

Supersedes:
* #33805
* Closes: #33804

Lead-authored-by: Simon Perkins <simon.perkins@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>

pitrou commented Jul 4, 2023

This was done some time ago, closing.

pitrou closed this as not planned on Jul 4, 2023