ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499

joshua-warburton · 2023-12-08T11:33:46Z

Describe the bug
When running various applications, including fleetbench, we run into the following assert:

Trace invariant failure in T1909 at ref # 19484 (957 instrs since timestamp 13345924302819833): Too many read records

This failing test compares a predicted number of reads with the actual number of reads recorded by dynamorio.

The above case is failing at this point, when the second read is considered:

       19482       15318:        1909 ifetch       4 byte(s) @ 0x0000000000405b3c a54a4681   ld1w   (%x20,%x10,lsl #2) %p1/z -> %z1.s
       19483       15318:        1909 read         0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c
       19484       15318:        1909 read         0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c
       19485       15318:        1909 read         0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c
       19486       15318:        1909 read         0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c
       19487       15318:        1909 read         0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c
       19488       15318:        1909 read         0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c
       19489       15318:        1909 read         0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c
       19490       15318:        1909 read         0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c
       19491       15319:        1909 ifetch       4 byte(s) @ 0x0000000000405b4

As this is run on a machine with a 256 bit vector length, we see 8 reads as expected for this instruction.

The invariant checker predicts the reads using the following function

uint
instr_num_memory_read_access(instr_t *instr)
{
    int i;
    opnd_t curop;
    int count = 0;
    const int opc = instr_get_opcode(instr);
    if (opc_is_not_a_real_memory_load(opc))
        return 0;
    for (i = 0; i < instr_num_srcs(instr); i++) {
        curop = instr_get_src(instr, i);
        if (opnd_is_memory_reference(curop)) {
            ++count;
        }
    }
    return count;
}

This means that we can only ever expect one read per memory accessing operand (effectively one per instruction).

Should we expect sve instructions to "roll up" the memory accesses into a single entry, or do we have to do a better job at predicting the number of reads and writes?

Versions

This is run against the HEAD

The text was updated successfully, but these errors were encountered:

derekbruening · 2023-12-08T17:09:54Z

Side question: why do the read records in the trace snippet all have 0 bytes? Something looks off there.

Main question: this is definitely an issue with the invariant checker. One solution is to mark scatter/gather instructions as predicated since with their masks for each load/store being optional they are dynamically variable in whether they do each load/store. We marked x86 rep string instructions this way: #1556, #6284. (We would then also mark x86 scatter/gather as well.). The invariant checker gives up on this check for predicated instructions.

If there are aspects that can be checked by the invariant checker, we should try to add them: e.g., checking the sizes of any loads/stores -- that's statically known from the decoding? That would then report an error on these 0-byte loads.

joshua-warburton · 2023-12-08T18:05:07Z

For the side question, I suspect it may be due to the predicate being all zeros.

We can probably know the number of loads and stores by looking at the destination register and the vector length, we're also likely to know the sizes of the loads and stores as that's usually directly related to the element size.

derekbruening · 2023-12-08T20:30:10Z

For the side question, I suspect it may be due to the predicate being all zeros.

If a predicate bit is zero, then no load is done, so there should be no read record in the trace for that element. If the scatter/gather expansion is leaving an app load instr in place for a zero predicate, that is incorrect: that load needs to be skipped.

joshua-warburton · 2023-12-12T17:39:58Z

I've been looking into it and considering your previous comments the obvious solution is to update instr_is_predicated to correctly detect predicated AArch64 instructions.

I had a basic go at this and it does seem to resolve the problem of the assert in question but it doesn't remove all of the related "too many writes" assert.

This is due to using this method to detect if something is predicated

for (int i = 0; i < instr_num_srcs(instr); i++) {
        opnd = instr_get_src(instr, i);
        if (opnd_is_predicate_zero(opnd) || opnd_is_predicate_merge(opnd))
            return true;
    }

This will catch most predicated instructions.

AArch64 has several ways of using predicates, and to have the correct return for instr_is_predicated we have to account for all of them.

The simplest is what is caught by the above snippet:

LD1W    { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]
FMAXNMP <Zdn>.<Ts>, <Pg>/M, <Zdn>.<Ts>, <Zm>.<Ts>
et al

These are predicates with modes which directly imply that these are governing predicates
and are thus predicated instructions.

We can also reasonably assume that predicates in ld/st instructions like

ST1D    { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>{, #<simm>, MUL VL}]

imply that they are predicated instructions.

The problem comes with instructions like

FADDQV  <Vd>.<Ts>, <Pg>, <Zn>.<Tb>

which have governing predicates, and are thus predicated instructions, but have no
features in the current version of the codec that distinguish them from instructions like

SEL     <Zd>.<Ts>, <Pv>, <Zn>.<Ts>, <Zm>.<Ts>

where the predicates are merely being manipulated rather than modifying the reads or writes of the instruction. (There are a very limited number of this type of instruction).

If we want to make sure that instr_is_predicated always returns the correct we will either have to:

Add something to the codec to mark predicated instructions with a flag or similar when decodeing/being created
Use a set of general rules based off the properties of the operand to determine if something is a predicated or not, with a list of exceptions to check against.
Something else

Do you have any opinion on how we should proceed with this?

derekbruening · 2023-12-12T20:12:24Z

For x86 we have a flag in the codec/table for predicates of various types, which turns into a DR_PRED_* instr_t-level flag. I would vote for a similar approach for aarch64 as well.

derekbruening · 2023-12-16T01:34:56Z

Filed #6510 for x86.

joshua-warburton · 2024-01-02T13:56:16Z

Hey @derekbruening, I ended up modifying the codec so that properties can be added to operands with a dot notation. For example p10_lo.gov uses p10_lo to decode the operand but it is then marked as a governing operand.

Instructions with at least one governing operand are then marked as predicated by modifying the decode function generator to apply the DR_PRED_GOVERNING predicate value. This means we can then skip those instructions during the invariant checker and both the instructions which are predicated and the governing predicate are marked in the API for anyone who is interested in that information

derekbruening · 2024-01-02T15:57:51Z

For the aarch64-internal codec that sounds fine. For the API-visible part, since the x86 and SVE concepts are so similar, we should try to share the interface to make it easier on clients. I'll add such a discussion to PR #6535.

When debugging i#6499 we noticed that drcachesim was producing 0 byte read/write records for some SVE load/store instructions: ``` ifetch 4 byte(s) @ 0x0000000000405b3c a54a4681 ld1w (%x20,%x10,lsl #2) %p1/z -> %z1.s read 0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c ifetch 4 byte(s) @ 0x0000000000405b4 ``` This turned out to be due to drdecode being linked into drcachesim twice: once into the drcachesim executable, once into libdynamorio. drdecode uses a global variable to store the SVE vector length to use when decoding so we end up with two copies of that variable and only one was being initialized. To fix this properly we would need to refactor the libraries so that there is only one copy of the sve_veclen global variable, or change the way that the decoder gets the vector length so its no longer stored in a global variable. In the mean time we have a workaround which makes sure both copies of the variable get initialized and drcachesim produces correct results. With that workaround in place however, the results were still wrong. For expanded scatter/gather instructions when you are using an offline trace, raw2trace doesn't have access to the load/store instructions from the expansion, only the original app scatter/gather instruction. It has to create the read/write records using only information from the original scatter/gather instruction and it uses the size of the memory operand to determine the size of each read/write. This works for x86 because the x86 IR uses the per-element data size as for the memory operand of scatter/gather instructions. This doesn't work for AArch64 because the AArch64 codec uses the maximum data transferred (per-element data size * number of elements) like other SIMD load/store instructions. We plan to make the AArch64 IR consistent with x86 by changing it to use the same convention as x86 for scatter/gather instructions but in the mean time we can work around the inconsistency by fixing the size in raw2trace based on the instruction's opcode. Issues: #6499, #5365

…predicated (#6535) This patch adds the DR_PRED_MASKED flag to instructions containing any governing predicate registers as well as tagging the governing register with DR_OPND_IS_MASKED. To do this, it extends the codec functionality such that operands may have "flags" which allow us to add some special code generation when they are encountered. In this case we amend the instr decode/encode functions to both tag the governing predicate and the instr itself. issue: #6499

joshua-warburton · 2024-01-09T11:47:00Z

With the above merged patch, this issue is no longer occurring in our tests

When debugging i#6499 we noticed that drcachesim was producing 0 byte read/write records for some SVE load/store instructions: ``` ifetch 4 byte(s) @ 0x0000000000405b3c a54a4681 ld1w (%x20,%x10,lsl #2) %p1/z -> %z1.s read 0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c read 0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c ifetch 4 byte(s) @ 0x0000000000405b4 ``` This turned out to be due to drdecode being linked into drcachesim twice: once into the drcachesim executable, once into libdynamorio. drdecode uses a global variable to store the SVE vector length to use when decoding so we end up with two copies of that variable and only one was being initialized. To fix this properly we would need to refactor the libraries so that there is only one copy of the sve_veclen global variable, or change the way that the decoder gets the vector length so its no longer stored in a global variable. In the mean time we have a workaround which makes sure both copies of the variable get initialized and drcachesim produces correct results. With that workaround in place however, the results were still wrong. For expanded scatter/gather instructions when you are using an offline trace, raw2trace doesn't have access to the load/store instructions from the expansion, only the original app scatter/gather instruction. It has to create the read/write records using only information from the original scatter/gather instruction and it uses the size of the memory operand to determine the size of each read/write. This works for x86 because the x86 IR uses the per-element data size as for the memory operand of scatter/gather instructions. This doesn't work for AArch64 because the AArch64 codec uses the maximum data transferred (per-element data size * number of elements) like other SIMD load/store instructions. We plan to make the AArch64 IR consistent with x86 by changing it to use the same convention as x86 for scatter/gather instructions but in the mean time we can work around the inconsistency by fixing the size in raw2trace based on the instruction's opcode. Issues: #6499, #5365, #5036

derekbruening mentioned this issue Dec 16, 2023

Mark x86 SIMD operations with masks as predicated #6510

Open

joshua-warburton mentioned this issue Jan 2, 2024

i#6499: AArch64 codec: mark instructions with governing predicate as predicated #6535

Merged

jackgallagher-arm mentioned this issue Jan 8, 2024

i#5365 AArch64: Fix 0 size read/write records in drmemtrace #6544

Merged

joshua-warburton mentioned this issue Jan 9, 2024

Invariant checker fails on sampled runs, but not on full runs of fleetbench #6547

Open

joshua-warburton closed this as completed Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499

ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499

joshua-warburton commented Dec 8, 2023 •

edited

Loading

derekbruening commented Dec 8, 2023

joshua-warburton commented Dec 8, 2023

derekbruening commented Dec 8, 2023

joshua-warburton commented Dec 12, 2023

derekbruening commented Dec 12, 2023

derekbruening commented Dec 16, 2023

joshua-warburton commented Jan 2, 2024

derekbruening commented Jan 2, 2024

joshua-warburton commented Jan 9, 2024 •

edited

Loading

ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499

ASSERT: Invariant test failure: Too many reads (AArch64+sve) #6499

Comments

joshua-warburton commented Dec 8, 2023 • edited Loading

derekbruening commented Dec 8, 2023

joshua-warburton commented Dec 8, 2023

derekbruening commented Dec 8, 2023

joshua-warburton commented Dec 12, 2023

derekbruening commented Dec 12, 2023

derekbruening commented Dec 16, 2023

joshua-warburton commented Jan 2, 2024

derekbruening commented Jan 2, 2024

joshua-warburton commented Jan 9, 2024 • edited Loading

joshua-warburton commented Dec 8, 2023 •

edited

Loading

joshua-warburton commented Jan 9, 2024 •

edited

Loading