Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15678: [C++][CI] a crossbow job with MinRelSize enabled #12928

Closed
wants to merge 19 commits into from

Conversation

jonkeane
Copy link
Member

@jonkeane jonkeane commented Apr 19, 2022

In addition to the CI job that tests this, it includes for the wrong SIMD instructions on some platforms

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-r-minsizerel

This should fail, it is before the fix is added

@github-actions
Copy link

Revision: 44bfc59

Submitted crossbow builds: ursacomputing/crossbow @ actions-1876

Task Status
test-r-minsizerel Github Actions

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-r-minsizerel

After @westonpace 's patch

@github-actions
Copy link

Revision: eb91ce0

Submitted crossbow builds: ursacomputing/crossbow @ actions-1877

Task Status
test-r-minsizerel Github Actions

@@ -78,96 +79,24 @@ class BitmapWriter {
int64_t byte_offset_;
};

class FirstTimeBitmapWriter {
class ARROW_EXPORT FirstTimeBitmapWriter {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolved the windows CI failures, though not sure if I should include ARROW_EXPORT for other classes as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the other classes are fully defined in the header you should not have to. Technically it wouldn't hurt either but best to be minimal with changes probably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@pitrou
Copy link
Member

pitrou commented Apr 20, 2022

@jonkeane Can you point to the Windows CI failures?

@jonkeane
Copy link
Member Author

@pitrou
Copy link
Member

pitrou commented Apr 20, 2022

Well, un-inlining the FirstTimeBitmapWriter method definitions could very well decrease performance, so we shouldn't do that. I'm also sure there's a way to avoid the linking failures, perhaps @kou can help?

@pitrou
Copy link
Member

pitrou commented Apr 20, 2022

@ursabot please benchmark lang=C++

@ursabot
Copy link

ursabot commented Apr 20, 2022

Benchmark runs are scheduled for baseline = 4544f95 and contender = 24a1764. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Finished ⬇️2.85% ⬆️0.25%] test-mac-arm
[Skipped ⚠️ Only ['JavaScript', 'Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️2.13% ⬆️0.3%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/537| 24a17640 test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/546| 24a17640 ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/544| 4544f953 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/531| 4544f953 test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/529| 4544f953 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/541| 4544f953 ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jonkeane
Copy link
Member Author

There definitely are regressions in throughput on those C++ benchmarks.

Interestingly, it looks like there was a pretty big regression (larger than here, even!) in some of these already around #12634 which is odd since that's a java + documentation only ticket. Though it does look like our compiler version on the benchmarking machine went from 9.4.0 to 10.3.0 coincidentally at that commit, which might be the real source of that regression.

@pitrou
Copy link
Member

pitrou commented Apr 20, 2022

Yes, you can see there are regressions in kernels that write an output bitmap (such as is_in or substring matching), which definitely relates to FirstTimeBitmapWriter.

@kou
Copy link
Member

kou commented Apr 21, 2022

Can we use ARROW_FORCE_INLINE instead of hiding implementation to .cc here?

We can use __attribute__((always_inline)) with g++. There are many force inline macros in our code base such as FORCEINLINE, FORCE_INLINE and GDV_FORCE_INLINE.

@pitrou
Copy link
Member

pitrou commented Apr 21, 2022

I started a Crossbow job here: https://github.com/ursacomputing/crossbow/actions/runs/2203242917

@westonpace
Copy link
Member

westonpace commented Apr 22, 2022

I played around with this a bit more. I can reproduce it locally by building with SSE4_2:

cmake .. -DARROW_PARQUET=ON -DARROW_SIMD_LEVEL=SSE4_2 -DARROW_RUNTIME_SIMD_LEVEL=MAX -DARROW_BUILD_TESTS=ON

From there it's easiest to verify by just manually checking to see which version ends up in libparquet.so:

objdump --disassemble=_ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml -S ./minsizerel/libparquet.so.800.0.0

If the output contains shlx then you've reproduced the bug. If it only contains shl then it picked the correct default symbol. If the method is entirely inlined you get no output.

  • The symbol is inlined with -DCMAKE_BUILD_TYPE=Release
  • The symbol is not inlined with -DCMAKE_BUILD_TYPE=MinSizeRel
    • However, on my system, in all cases, the libparquet.so file chooses the correct version unless...
    • I can get an invalid .so file if I switch the order the object files are passed to the linker: /usr/bin/clang++-13 ... level_conversion_bmi2.cc.o ... level_conversion.cc.o ...
  • The symbol is inlined, even with MinSizeRel is I try @kou's fix (__attribute__((always_inline))).
    • This seems like the easiest "spot fix" if we wanted to include something as part of 8.0.0

If you really want to reproduce the issue, I found a tool sde64 which will work if you have an Intel processor. It allows you to simulate older Intel processors and so you can pretend to have an Ivy Bridge processor (which does not have AVX2/BMI2 support):

sde64 -ivb -- ./minsizerel/parquet-arrow-test --gtest_filter=TestParquetIO/0.SingleNullableListNullableColumnReadWrite
Running main() from ../googletest/src/gtest_main.cc
Note: Google Test filter = TestParquetIO/0.SingleNullableListNullableColumnReadWrite
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestParquetIO/0, where TypeParam = arrow::BooleanType
[ RUN      ] TestParquetIO/0.SingleNullableListNullableColumnReadWrite
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (IVYBRIDGE): 0x7f80c95b96b3: shlx rax, rbx, rax
Image: /home/pace/dev/arrow/cpp/sse4.2-min-build/minsizerel/libparquet.so.800+0x1666b3
Function: _ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml
Instruction bytes are: c4 e2 f9 f7 c3 

@pitrou has also posted a suggestion on the Jira using pragmas. I had to include the arch specifier so it generated __attribute__((target("arch=haswell,avx2"))) and this appears to do what is expected. The function is compiled with avx2 but the nested call is not.

@kszucs
Copy link
Member

kszucs commented Apr 24, 2022

Thanks @westonpace for digging deeper! Since the release is approaching we should choose one solution to unblock the release.

@pitrou
Copy link
Member

pitrou commented Apr 25, 2022

IMHO the temporary solution should simply be to disallow dynamic dispatch on Homebrew builds. This should be doable by passing -DARROW_RUNTIME_SIMD_LEVEL=NONE to CMake.

@jonkeane
Copy link
Member Author

If we're going to leave this as is, we should also document somewhere that compiling with MinSizeRel also needs to disable SIMD in the same way.

@pitrou
Copy link
Member

pitrou commented Apr 25, 2022

This is not due to MinSizeRel specifically, so it shouldn't be worded like that. Also, this is really a bug that needs fixing, it's just that it's not reasonably possible to fix it properly for the release.

@westonpace
Copy link
Member

I agree with @pitrou that it doesn't make sense to rush this while we investigate a longer term fix for dynamic dispatch. The forced inline is more of a band-aid and there is no real guarantee it would work in all situations.

Agreed this isn't related to MinSizeRel. That configuration triggers this bug on this particular system. However, it is entirely possible some other build on some other system runs into this exact same bug without using MinSizeRel. As far as the compiler's are concerned we are in the domain of "undefined behavior" so any change to compilation flags or compiler versions could change what results we get.

@jonkeane
Copy link
Member Author

Ok, if we want to test this in our CI before we upstream the change to homebrew, I would remove this line

# https://github.com/Homebrew/homebrew-core/issues/94724
# https://issues.apache.org/jira/browse/ARROW-15664
ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"
, add the compiler flag below, and then run the crossbow job homebrew-r-brew.

@westonpace westonpace removed their request for review July 11, 2022 14:36
@kszucs
Copy link
Member

kszucs commented Jul 25, 2022

@jonkeane @nealrichardson what's the status of it? Do we need this for the next homebrew release?

@jonkeane
Copy link
Member Author

What's here only adds the CI that will replicate the failure long-discussed in ARROW-15678. I'm inclined to close it until we have a champion who can resolve the issue — the code is here if they need it (or we could merge it and have a crossbow job that always fails until we have someone who can champion the fix...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants