-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aarch64 support #1620
aarch64 support #1620
Conversation
aarch64 timer is available to userspace via arch register.
outline-atomics compilation flag changes behaviour of builtin_atomics, by adding runtime detection of LSE atomics. If these are supported, they will be used. This gains LSE atomics use without hurting compatibility with older aarch64 machines.
|
Regarding aarch64 architecture timer, being a ASM can I assume it will work under FreeBSD aarch64? So |
|
Both look good to me. |
|
Looking back |
|
Minor, but for completeness could you add a routine.cycles id to https://github.com/MariaDB/server/blob/10.5/mysys/my_rdtsc.c#L361? Its only used by unittest/mysys/my_rdtsc-t.c. Could you include the unit test results for the cycles timer in the commit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me. But, I think that we should do this in MariaDB Server 10.4 already. It was the first major version to extensively use std::atomic, for which -moutline-atomics could make a difference. It also introduced the use of the clang function __builtin_readcyclecounter(), which is claimed to cause SIGILL on some ARMv8 implementations.
It might be interesting to work on a follow-up change that would specify the GCC function attribute target_clones for some code that is making heavy use of std::atomic. I am mainly thinking of the InnoDB rw_lock_t functions, that is, provide both LSE and non-LSE versions of rw_lock_s_lock() and similar functions.
| ulonglong result; | ||
| __asm __volatile("mrs %0, CNTVCT_EL0" : "=&r" (result)); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this instruction available on all ARMv8 implementations? MDEV-23249 claims that the FreeBSD executables of MariaDB 10.3 or later are triggering SIGILL on the Raspberry Pi 3. In my understanding, that ought to be due to clang’s __builtin_readcyclecounter(). What code would that function generate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this register guaranteed to be nonzero? @mysqlonarm posted an experiment on MDEV-23249 that suggests that 0 is returned on his hardware. I posted a follow-up comment with some analysis on the impact of a constant return value. I think that some InnoDB performance benchmarks could be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approach using __builtin_readcyclecounter returns 0 but the cntvct_el0 return expected values.
bash$ cat timer.cc
#include
using namespace std;
int main()
{
unsigned long long result;
__asm __volatile("mrs %0, CNTVCT_EL0" : "=&r" (result));
cout << result << endl;
cout << __builtin_readcyclecounter() << endl;
}
bash$ clang++-8 timer.cc
bash$ ./a.out
829795946130603
0
bash$ ./a.out
829796087735762
0
bash$ ./a.out
829796206079031
0
bash$ ./a.out
829796414853453
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like he was reading PMCCNTR_EL0 - which is the cycle counter that belongs to the PMU. PMU could be set to e.g. only measure cycles in some execution states. It seems like a clang bug to use that reg for builtin_readcyclecounter. I'll dig.
I'm using CNTVCT_EL0 which is part of the architecture-timer. Always on, always non-decreasing.
I actually got to this patch because of severe performance issues. Without it, I got the constant "return 0" which was returned by get_rnd_value as you explain, which made my system crush as it couldn't get locks. With it, mariadb works well under load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that __builtin_readcyclecounter() was emitting code to read pmccntr_el0 and not cntvct_el0. The inline assembler code appears to work for @mysqlonarm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first #if in that row is # if __has_builtin(__builtin_readcyclecounter), which means clang will always go to it's own (wrong) implementation. I could test for aarch64 first, and not test for GNUC. It will fix mariadb on clang but code will look a little worse. Should I go this route?
In parallel, I'll dig into clang's __builtin_readcyclecounter and see if it could be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__GNUC__ is defined (as 4) on clang possibly from the beginning. Thus, it might be sufficient to disable that __builtin_readcyclecounter on ARM and fall back to the asm code. It would indeed be nice if the wrong implementation were corrected in clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To note, 10.3 is not affected by this, as it does not contain this code, which was introduced in 10.4.7.
I have combined this patch along with testing that the target is not aarch64 in the test for __builtin_readcyclecounter. A complete patch that combines them can be found in FreeBSD PR #248160. I've mentioned in the mentioned JIRA report that this patch allows MariaDB 10.5 to run on my Raspberry Pi 3. Since 10.4's code is identical for this file, it should fix things there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CyberBotX, thank you for confirming that this will address MDEV-23249.
@tsahee, can you please apply the ideas and rebase this pull request to 10.4? Please also refer to MDEV-23249 in the commit message. In that way, I can apply this pull request in such a way that GitHub identifies it as merged, not rejected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #1635 for 10.4. Please let me know if anything is missing.
addressed in the new pull request: #1635 |
|
This was closed by #1635 in 10.4, which will be merged to 10.5 some time before the 10.5.5 release. |
This pull request addresses two aarch64 issues:
This pull request is released under BSD license