New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault on v3.20.2 and Ryzen 5 5500U #379
Comments
I'll need to know where in the code it's crashing. Please add --debug and if you are familiar with gdb a backtrace would be helpfull. All testing should be done using the default build, and please provide some more details about the faulting system, like the amount of RAM and any differences from the working system. Edit: also since the issue is not thread related testing would be better with only one miner thread. |
Alright, I think I'm going to exclusively use Output with
gdb output of the same:
I don't think this is an algo issue—allium, x11, neoscrypt, scryptn2, and any other algorithm I try gives the same output. |
Just remembered benchmark mode existed and tested with it, doesn't seem to be an algo issue:
|
You're solo mining. I don't think it's the issue if it works on the 3500U but it gives another opportunity to narrow the crash location. The only thing after the last message is calling thread_init which does nothing for most algos, then enters the loop and starts looking for work to hash. The next expected log is a new block report from GBT, stratum generates a different report and If you could test with stratum & benchmark the code would take different paths looking for work and might change the symptoms. Beyond that some additional debug messages can be added to zoom in on the exact code that's causing the crash. However, from a higher level, the fact that it works on the other system indicates a problem specific to the one system. Let me know if you're comfortable enough with code to add more debug messages with some coaching. Edit: adding -P will produce protocol logs and may tell us if it's even trying to connect to the server. Edit: I was starting to think it's an issue with solo mining. It's not well tested or mainained. You could try Tpruvot cpuminer-multi and/or Pooler cpuminer so see if either of them work. But that theory is shot down by the fact cpuminer-opt works on another system. |
I think I'm going to try stratum as a starting point since that's better maintained. I think I could stumble through adding some debug messages to the code if you point me in the right direction, but would prefer not to start with that. As for the miner and the os, I've tried downloading cpuminer-opt several times (even the version before the latest version), and they showed similar issues. Going to also try it on a new Live USB to see if it's the OS. Adding -P produces this. It looks normal to me up to the segfault, but hopefully you can make more sense of this than I can.
Thanks for all the help so far! |
From the protocol logs I can tell that the server sent work and the miner crashed trying to decode it. I have no idea why that would happen on one system but not another. It's also crashing in GBT code so your stratum test might produce different results. The focus for the GBT crash is on cpu-miner.c:get_upstream_work. That function sends the getblocktemplate request and procceses the result by calling gbt_work_decode then producing the new block log. This is the window where it crashes, and the place to put some debug printf as checkpoints to help narrow it down further. I'l wait for the stratum test results, if it's reproduceable using stratum it will make troubleshooting easier. |
Stratum seems to work, I let it run for a while and it was stable:
Going to try to narrow down the point where it crashes now. |
I'm starting to suspect an issue with the wallet, do both systems have their own wallets? If they're on the same network try mining on the other's wallet. |
They were originally on different wallets. I tried mining with the 3500u wallet, 5500u wallet, and a newly created wallet on both systems, but the 3500u system always worked and the 5500u system always gave a segfault. Now trying to narrow down the point of the crash. |
As you mentioned, I was able to confirm that it first crashes here. Going further into the code, it crashes here. I was able to track it to this for loop, where it looked like the program looped through it a few times, then crashed. This is where the mystery deepens. I changed the for loop (with no other changes to the code) to:
And it began solving blocks. The hashrate seems to match with what I was seeing in benchmarks. and the miner was indistinguishable from the working system apart from the junk output. Could there be some sort of race condition here?
|
Holy shit, good work. Can you get the loop counter and ARRAY_SIZE? Edit: It's silly code, ARRAY_SIZE controls the loop but hard coded 7 is used inside, but they should match. The target is just the 256 bit hash expressed as a uint32 array. I don't like that ARRAY_SIZE macro, might as well hard code it to 8 since the array's size is assumed inside the loop anyway. Edit2: I realize the stupidity of my first question. Capturing the loop counter makes the problem go away so it will always be 8. I suspect the compiler is building that section of code differently when you add the printf. The loop is more likely to be I'm not sure we'll get to the root cause but getting rid of ARRAY_SIZE macro might be a good start. I'm not a C expert so I'm not sure if its implementation is correct., On the surface I don't see a problem with it. |
It's a bit of a challenge as placing a
It generates output like this:
Any ideas on how I could output |
We're getting some crosstalk, I'm getting caught up with what you just wrote, I made some further comments above. Edit: as I suspected might happen. Compiler optimization is playing factor but hard coding the array's size might solve the problem. (meaning the crash) Edit: be32dec is a byte swap function used to convert from Little Endian to Big Endian. It's written to be agnostic, it will return |
I'm even less of a C expert, but I gave it a shot. I started every try with a fresh clone of the repo. Just curious, how did you guess that compiler optimisation was playing a factor in this? Past compiler horror stories? |
I'm not sure at what level loop unrolling occurs but vectorization is possible on a fixed sized loop and needs -O3. The entire array can be byte swapped in one shot using AVX2. With a printf in the loop vectorization isn't possible but loop unrolling still is. |
It's definitely vectorization. |
The only possibilities are wrong loop size, bad target pointer, or data is misaligned for AVX2. I've seen misaligned data before in hand coded vector instructions. but here the compiler is deciding to vectorize so it should check alignment before doing so. Also the data is defined with 64 bit aligment, which is more than enough for AVX2. Array size error seems more likely especially if it looped a couple of time before crashing. That's a classic buffer overflow. Capturing ARRAY_SIZE( work->target ) is critical. Displaying it before the for loop should still allow the loop to be vectorized and crash. Or just hard code the loop to 8 and see if the crash goes away. I think I'll get rid of the macro. It's used mostly for target and hash who's size is fixed. Using the macro is unnecessary. Edit: I think I've found part of the problem, misalignment is a possibility for the source target, I was only thinking of the destination work->target. This still involves a compiler bug because it should have detected the misalignment before vectorizing. Here's a look at the definitions with alignment added where necessary:
|
Gives array size as 8 before the loop starts. Additionally, if I hardcode
Unfortunately, I later realised that I wasn't sure if it looped before crashing. I originally had it print the iteration count at the end of each iteration and saw it increase, but that was before I realised it would fix the issue. Hence, I'm not sure now if the loop completes any iterations before crashing. |
Stay tuned I think I've found it!!! |
Damn, code formatting never works for me. I think I've found part of the problem, misalignment is a possibility for the source target, I was only thinking of the destination work->target. This still involves a compiler bug because it should have detected the misalignment before vectorizing. Here's a look at the definitions with alignment added where necessary: static bool gbt_work_decode( const json_t *val, struct work *work ) I don't know why attribute was in bold but it helps identify the three lines that need to be changed. |
I'm probably going to go to sleep soon, but do let me know if you need any help testing! I don't understand vectorization enough to guess—do you have a guess as to why this problem only shows up on some systems? |
Can you do a quick test with alignment added since you can reproduce the crash, I can't. I'm pretty confident now but Agree on the sleep, we must be in the same time zone. If this works I'll sleep well tonight. |
Are you able to put this up on pastebin so I can download and test it? I think GitHub may have removed some of the underscores. Edit: nevermind, I got it. |
You're right. Two leading and 2 trailing undescrores in attribute. There are many examples in the code if you grep -r attribute. |
Oops. I did it with this and it segfaulted again.
Edit: I did figure out the minor mystery of |
Oh well, maybe have to sleep on it. |
Some thoughts to sleep on... The crash is indeed reported as a segfault. A misaligned address should throw a processor exception in the same way a divide by zero or invalid instruction does. A segfault should allways be an invalid pointer address. At least that's the way it works on some processor architectures I'm more familiar with. Counterpoint, it only crashes when greater-than-default data alignment is required, that is when the loop is vectorized. We need to see the work->target & target pointers. |
Some last tests before sleep. Code:
WIth the
Without the patch:
Unfortunately, I don't have a very good understanding of pointers so I'm mostly lost here. |
You're one up on me, I wasn't aware of %p. Both those pointers are properly aligned. The low 6 address bits are zero which provides 64 byte alignment, more than requested. There's something else going on that seems to be specific to your CPU. This copy loop is used frequently in stratum code and has never crashed before. It also doesn't crash on your other CPU with every other controllable variable the same. I assume the same vectorization occurs on that CPU. The only difference architecturally is the addition of VAES in Ryzen 5000. That will result in kernel changes as well as any AES related code. Much of the affected code is in cpuminer-opt but only in the hashing code and definitely not anywhere near where it's crashing. At this point it looks like one or both properly aligned and apparently valid pointers is causing a segfault when the optimiser auto-vectorizes a loop. But if auto-vectorization is disabled in the compiler there is no segfault. It occurs persistently on one particular CPU and never on a very similar CPU with identical OS, compiler and source code. The crash itself is a mystery, from all the data available it shouldn't crash. That it doesn't crash on the Ryzen 3500U, or apparently anywhere else, makes it even more mysterious. That it only crashes when the code is auto-vectorized, well... I'm stumped. |
Finally did some testing from a live usb of 22.04. Still segfaults, so I
don't think it's the system. -O2 worked as expected.
With the patch:
0x7f1878002150 0x7f187f5dece0
Without the patch:
0x7fba58002150 0x7fba5f1d7ce0
…On Sun., Aug. 28, 2022, 01:25 JayDDee, ***@***.***> wrote:
You're one up on me, I wasn't aware of %p.
Both those pointers are properly aligned. The low 6 address bits are zero
which provides 64 byte alignment, more than requested.
Both pointers also look good. I don't know the memory mapping but both are
within 4 GB of each other.
There's something else going on that seems to be specific to your CPU.
This copy loop is used frequently in stratum code and has never crashed
before. It also doesn't crash on your other CPU with every other
controllable variable the same.
I assume the same vectorization occurs on that CPU.
The only difference architecturally is the addition of VAES in Ryzen 5000.
That will result in kernel changes as well as any AES related code. Much of
the affected code is in cpuminer-opt but only in the hashing code and
definitely not anywhere near where it's crashing.
At this point it looks like one or both properly aligned and apparently
valid pointers is causing a segfault when the optimiser auto-vectorizes a
loop. But if auto-vectorization is disabled in the compiler there is no
segfault. It occurs persistently on one particular CPU and never on a very
similar CPU with identical OS, compiler and source code.
The crash itself is a mystery, from all the data available it shouldn't
crash. That it doesn't crash on the Ryzen 3500U, or apparently anywhere
else, makes it even more mysterious. That it only crashes when the code is
auto-vectorized, well...
I'm stumped.
—
Reply to this email directly, view it on GitHub
<#379 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHOXLYE3KLTTDAXQRHZBX6LV3LZ5PANCNFSM57ZRU27Q>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
That test has narrowed the problem to a misaligned access fault when writing the byte-swapped 256 bit vector back to memory. You can confirm by removing the "u" to force an aligned store to see if the segfault comes back. You can easilly toggle back and forth and prove the CPU is improperly faulting an aligned access. You can do the same test BTW loadu/storeu just splits the memory access into multiple smaller chunks to avoid alignement issues, at significant performance penalty. |
__m256i x = mm256_bswap_32_test( _mm256_loadu_si256( (__m256i*)target ) );
_mm256_storeu_si256( ( (__m256i*)(work->target)), x); Gives __m256i x = mm256_bswap_32_test( _mm256_loadu_si256( (__m256i*)target ) );
_mm256_store_si256( ( (__m256i*)(work->target)), x); Gives You were right, the segfault returns when I remove the Incidentally, a warranty ticket for my 5500U laptop I put in a while ago has finally been processed. A usb port is busted, so they're going to replace the motherboard sometime. After that, I'm going to test to see if it also happens on another cpu of the same model. |
Hash function vectorization operates on multiple parallel data streams so each lane is like a seperate thread. Compiler is limited to simpler stuff like fixed iteration loops with no dependencies and data copying. I was surprised the compiler was able to vectorize the bswap loop, but I guess it was looking specifically for inverting arrays. |
I was reading a bit about AMD64 architecure, they actual had 64 bit before Intel, and how alignment actually works. Editted to remove the reference to zen2 architecture. |
This also gives AMD an opportunity to reproduce this problem on the very same CPU. |
Just a correction, the 3500U is actually based on Zen+, not Zen 2. AMD naming conventions will never cease to surprise me 😅. I'm going to try to link your ticket to mine—you opened it with AMD, right? |
I used the online support to fire off a question as a teaser to see if a human would pick it up. They sent me an email with a ticket number but no link to it. I'll let you know if I hear anything back. I think my bit counting of alignment was wrong, The source pointer (target) is not aligned to 32 bytes but the destination work->target is. It doesn't matter much because the fault is on the destination pointer. |
Got a reply from AMD, it's being escalated to an "expert". |
I just had a thought about this. AFAIK Zen+ has a different AVX2 implementation, the same as Zen (1). AVX2 (256 bit ) operations are executed as two AVX (128 bit) operations. Zen2 implemented full 256 bit wide execution units. This could effectively reduce the required data alignment for AVX2 on Zen+ and could partially explain the different behaviour on the two CPUs. This is just speculation, I look forward to the AMD experts explaining what's really happening. |
Interesting...I don't have any experience with AMD's support system but I hope their "experts" are better than Apple's "geniuses". |
Reply from AMD. They want a service request from you. Let me know if you want any help with the information requested. Dear Jay, Your service request : SR #{ticketno:[8201225820]} has been reviewed and updated. Response and Service Request History: Thank you for your email. We'd be happy to investigate this issue, however it would be easier to work with the user affected directly. Please could you ask the user to open a service request here: https://www.amd.com/en/support/contact-email-form Please provide the following information in the service request:
Once we have that information, we will work with the user directly and investigate the issue that is seen with a segfault. In order to update this service request, please respond without deleting or modifying the service request reference number in the email subject or in the email correspondence below. Please Note: This service request will automatically close if we do not receive a response within 10 days and cannot be reopened. If it is not feasible to respond within 10 days, feel free to open a new service request and reference this ticket for continued support. Best regards, Matt AMD Global Customer Care |
Thanks for letting me know, I'll put one in! Do you know how I should
describe the issue more technically? I don't think "segfault" is going to
cut it.
…On Mon, Sep 5, 2022 at 8:05 AM JayDDee ***@***.***> wrote:
Reply from AMD. They want a service request from you. Let me know if you
want any help with the information requested.
You can also keep me in the loop as I will be able to better answer their
questions about cpuminer-opt.
_Dear Jay,
Your service request : SR #{ticketno:[8201225820]} has been reviewed and
updated.
Response and Service Request History:
Thank you for your email.
We'd be happy to investigate this issue, however it would be easier to
work with the user affected directly.
Please could you ask the user to open a service request here:
https://www.amd.com/en/support/contact-email-form
Please provide the following information in the service request:
Description of the issue and a link to the Github page
Full System Specs - Including BIOS version
OS/Distribution Version/Kernel etc
System Name/Model (eg if a laptop what is the model and where was it purchased from)
dmesg log and similar logs from OS
Once we have that information, we will work with the user directly and
investigate the issue that is seen with a segfault.
In order to update this service request, please respond without deleting
or modifying the service request reference number in the email subject or
in the email correspondence below.
Please Note: This service request will automatically close if we do not
receive a response within 10 days and cannot be reopened.
If it is not feasible to respond within 10 days, feel free to open a new
service request and reference this ticket for continued support.
Best regards,
Matt
AMD Global Customer Care_
—
Reply to this email directly, view it on GitHub
<#379 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHOXLYCVILP7JUEFDSVOYKLV4XO2LANCNFSM57ZRU27Q>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
"Segfault" is a good start because that is how the OS is reporting it. You can expand by explaining where the fault is occurring and how it was determined to actually be a misaligned fault and that the faulting address is in fact aligned to 32 bytes |
Submitted—could you please let them know that my ticket is 8201227170?
…On Mon, Sep 5, 2022 at 11:16 AM JayDDee ***@***.***> wrote:
"Segfault" is a good start because that is how the OS is reporting it. You
can expand by explaining where the fault is occurring and how it was
determined to actually be a misaligned fault and that the faulting address
is in fact aligned to 32 bytes
—
Reply to this email directly, view it on GitHub
<#379 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHOXLYG664PQUHFDBT44LSLV4YFGLANCNFSM57ZRU27Q>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
Done. That should close the loop so we are all informed. I'm reopening this issue since it's still active. |
For reference here is a summary of the main points as I understand them at this point.
|
Update on the warranty situation: the entire motherboard was replaced, but
the same error occurs.
…On Mon, Sep 5, 2022 at 11:17 PM JayDDee ***@***.***> wrote:
For reference here is a summary of the main points as I understand them at
this point.
- Two test laptop PCs, similar except for CPU generation. Target has
5500U, control has 3500U.
- Testing uses same OS, Ubuntu-22.04, same compiler version, same
compile options, same application source code, same application options.
- Subject source code is a looped copy and byte order reversal of a
256 bit array composed of 8 32 bit integers.
- Control never crashes.
- Target crashes when compiled with auto-vectorization, otherwise
works correctly.
- Target crashes when array byte swap source code is replaced with AV2
intrinsics using aligned store _mm256_store_si256
- Target does not crash and works correctly when using AVX2
intrinsincs with unaligned store _mm256_storeu_si256.
- Displaying the faulting pointer with printf or gdb shows it always
aligned to 32 bytes as required by AVX2.
—
Reply to this email directly, view it on GitHub
<#379 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHOXLYCEAU3NEIGSD6YDX3DV42ZWJANCNFSM57ZRU27Q>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
AMD is closing my ticket saying issue is resolved but wil work with you to find root cause of your issue. Pleas let me know what they find, if they find anything. |
Just to let you know, they haven't responded to my ticket (8201227170) since they sent me an automated email saying it had been opened. Are you able to ask them what the status is from your closed ticket? |
They told me to open a new ticket if I wanted further support so they'd likely ignore any queries about the old one. I think no news is good news. That your ticket hasn't been closed yet is a good sign. There is always pressure to close tickets quickly to improve metrics. AMD techs are probably waiting to get their hands on the laptop. I expect a reply soon after because it will be a hot potato. What's in that reply will be the interesting part. If you don't have a contact for your ticket you could use TECH.SUPPORT@amd.com. That was used by the "experts" for my ticket and I was able to reply. |
Dear Anthony,
Your service request : SR #{ticketno:[8201227170]} has been reviewed and
updated.
Response and Service Request History:
Sorry for the late response.
I have checked and there are no open issues with Segfault.
I cannot comment on opensource software however, I would recommend that the
developer register on the AMD Developer site <https://developer.amd.com/>
and work directly with AMD developers or post details of the issue on the AMD
Developer Communit
<https://community.amd.com/t5/newcomers-start-here/bd-p/newcomer-forum>y (you
will need to ask to be whitelisted).
…On Sat, Sep 17, 2022 at 12:39 PM JayDDee ***@***.***> wrote:
They told me to open a new ticket if I wanted further support so they'd
likely ignore any queries about the old one.
I think no news is good news. That your ticket hasn't been closed yet is a
good sign. There is always pressure to close tickets quickly to improve
metrics. AMD techs are probably waiting to get their hands on the laptop. I
expect a reply soon after because it will be a hot potato. What's in that
reply will be the interesting part.
If you don't have a contact for your ticket you could use
***@***.*** That was used by the "experts" for my ticket and I
was able to reply.
—
Reply to this email directly, view it on GitHub
<#379 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHOXLYEKF5CLW7ISJCJT2OTV6XX2PANCNFSM57ZRU27Q>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
Disappointing but not entirely unexpected. Unfortunately AMD took the easy way out and blamed the software, ignoring the evidence to the contrary. There's nothing I can do because I don't own the CPU, or type of CPU, and can't reproduce the problem. |
There has been another report of the same problem, #389, this time with an Intel CPU. This eliminates the CPU as the problem. |
Interesting. I will see if I can test it on a newer version of GCC sometime this week to see if they've fixed it since then. |
I'm thinking of adding some debug code just for this issue. The code will be inserted just before the loop that crashes and will test the alignment of the target pointers before the crash. It will be compiled whenever AVX2 is present regardless of compiler optimization and is activated at run time with the --debug option. I suggest adding it for testing. Feel free to make any modifications.
|
Some statistical observations: There is a 50% random chance that any address will be aligned to 32 bytes or better. The absence of a crash is not conclusive. Test results need to be interpreted carefully. I hated statistics in school, so much uncertainty. |
I tried to compile the latest version of cpuminer-opt on Ubuntu 22.04 x86_64 with GCC 11.2.0.
-march=native -Wall
-O3 -march=znver2 -mvaes -Wall
-O3 --march=znver2 -Wall
-O3 --march=znver1 -Wall
-O3 --march=znver3 -Wall
All of them gave the following output when run:
Changing the thread count didn't help. I was trying to solo mine dogecoin as an experiment with
--algo=scrypt
. I later tried the same setup on a Ryzen 5 3500U, and everything worked.The text was updated successfully, but these errors were encountered: