DCCM load/store conflict situation #36

taddevrp · 2020-02-19T17:53:55Z

DCCM data outputs are forced to 0 in the testbench.
Does this mean the DCCM is not supposed to be used?

And if the DCCM is in a working state, I have a question.
Does the LSU resolve conflicts when store and load operations target the same address? Or should such situations be resolved on the software side?

aprnath · 2020-02-19T22:01:27Z

Hi,
The DCCM is fully functional. The simple demo TB we put out does not support it, obviously.
We will be releasing a new testbench shortly which will run with the DCCM. Please stay tuned.

Resolution of resource conflicts is handled by the hardware.

aprnath · 2020-02-20T03:16:01Z

Please try out the latest release, and reopen if needed.

taddevrp · 2020-02-20T11:59:03Z

I've pulled the newest version of the project, but it doesn't help.

I'll describe the issue I've found.
When a store and a load to the same target address are issued back to back, it's not guaranteed that they'll be ordered correctly.

In the case I explored, there is a sequence of store operations, followed by a store and a load to the same address. As I understand it, the load and store units work in parallel and are not synchronized. Because of the earlier stores, the store unit is still busy when the new accesses arrive, so it postpones the new memory write. The load, however, is executed immediately, because the load unit is idle at that time.
This causes the store to happen later than the load to the same address, even though the load comes later in the assembly code.

This error is reproducible, so I've created a small test case to illustrate it. Here is the assembler file (dccm_test.s.txt) and its disassembled version (dccm_test.dS.txt).

The last two instructions use the same address (sp+12 or 0x7_FFCC). As you can see on the waveforms, the store to 0x7_FFCC happens a few cycles later than the load from it, but in the assembly program and the disassembled code the order is reversed.

So it causes X's to be read from this address.

aprnath · 2020-02-20T13:18:38Z

Reopening this issue.

The way the DCCM works, for alignment and ECC handling, is that it does a read-modify-write for non-word stores or unaligned stores, because the ECC has to be calculated before updating the memory. If the memories are not initialized at reset, this could result in X propagation in simulation for such cases. Word stores that are aligned do not do a read-modify-write.
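The read-modify-write behavior described above can be illustrated with a toy Python model. This is only a sketch of the stated rule, not the SweRV RTL: the class name `EccWordMemory`, its methods, and the use of `None` for a simulator X are all assumptions made for illustration.

```python
X = None  # stands in for an unknown (uninitialized) value in simulation


class EccWordMemory:
    """Toy word-organized memory whose ECC covers a whole 32-bit word."""

    def __init__(self, num_words):
        # Nothing is initialized at reset, so every word starts as X.
        self.words = [X] * num_words

    def store_word(self, addr, value):
        # Aligned word store: the whole word is overwritten, no read needed.
        assert addr % 4 == 0, "this model only handles aligned word stores"
        self.words[addr // 4] = value & 0xFFFFFFFF

    def store_byte(self, addr, value):
        # Byte store: read-modify-write of the containing word.
        old = self.words[addr // 4]
        if old is X:
            # The merged word depends on unknown bytes, so it stays X.
            return
        shift = 8 * (addr % 4)
        cleared = old & ~(0xFF << shift) & 0xFFFFFFFF
        self.words[addr // 4] = cleared | ((value & 0xFF) << shift)

    def load_word(self, addr):
        return self.words[addr // 4]


mem = EccWordMemory(4)
mem.store_byte(1, 0xAB)       # byte store into an uninitialized word
print(mem.load_word(0))       # still X (None)
mem.store_word(0, 0)          # initialize with an aligned word store
mem.store_byte(1, 0xAB)       # now the read-modify-write has known data
print(hex(mem.load_word(0)))
```

In this model a byte store into an untouched word leaves the word unknown, while the same byte store after an aligned word store produces a defined value.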

agrobman · 2020-02-20T17:15:15Z

I didn't understand how you run your test. Please provide the run commands, test sources, linker file and exec.log from the simulation.

FYI:

The TB sets the reset address to 0, while your code starts from 0x80000000.
Your stack is set to 0x28_0000, while the DCCM base address is 0xf0040000.

You need to be aware that the external memories are only 64KB and ignore the upper 16 bits of the address. Also, the $readmemh tasks that load the RAMs will not load bytes whose address is higher than 0xffff.

Please take a look at the test code and linker files of the provided examples to get hints on how to deal with DCCM and ICCM code and data. These are a bit tricky.
(All this complexity is because our Verilator version doesn't support SV dynamic data types; things could be much simpler if we could use associative arrays to model the CPU memories.)

The TB setup has changed since 1.4.

exec.log helps to understand what the processor did during a test simulation (although it doesn't show register updates for external loads).
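As a rough illustration of the address truncation mentioned above, here is a small Python sketch of how a 64KB memory that ignores the upper 16 address bits would map a 32-bit address. The names `EXT_MEM_SIZE` and `ext_mem_index` are hypothetical and do not come from the testbench.

```python
EXT_MEM_SIZE = 0x10000  # 64 KB, as stated in the comment above


def ext_mem_index(addr):
    """Return the offset a 64 KB memory sees for a 32-bit address."""
    # Only the low 16 bits select a byte; the upper 16 bits are ignored.
    return addr & (EXT_MEM_SIZE - 1)


for addr in (0x00000000, 0x80000000, 0x80001234, 0x00280000):
    print(hex(addr), "->", hex(ext_mem_index(addr)))
```

Under this assumption, addresses that differ only in their upper 16 bits (for example 0x00000000 and 0x80000000) alias to the same location.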

taddevrp · 2020-02-20T23:19:42Z

I'm using a custom testbench and configuration, so the addresses differ from the default ones.
In my test case, the initialization part of the program (init section) is located in the ROM (at offset 0x80000000), which is why 0x80000000 is the start address. The main program (text section) is located in the ICCM at offset 0x00100000.
In this test case the DCCM has offset 0x00200000 and size 512 KB. The stack is located at the end of the memory, so the default value of the stack pointer is 0x28_0000.

For compilation and linking of the program I used these commands:

${GCC_PREFIX}-as -march=rv32imc dccm_test.s -o dccm_test.o
${GCC_PREFIX}-as -march=rv32imc ctrl0.s -o ctrl0.o
${GCC_PREFIX}-ld -m elf32lriscv --discard-none -Tdccm_test.ld -o dccm_test.elf dccm_test.o ctrl0.o

Here are links to the initial and main parts of the program.
Here is the linker script.
Here are the resulting disassembled code and exec.log.

In general, both the ICCM and the DCCM work fine except for this "store-load" case.
As an example, I've changed the test program I shared earlier to make it work. I just put 20 NOP instructions between the store and the load, and this resolved the coherency problem. Here is the link to this program and here is its disassembled version.

As you can see on the waveforms, the load now happens later than the store, and the data is read correctly.

So it seems there is a problem in the coherency mechanism of the processor.

agrobman · 2020-02-21T00:10:17Z

OK, this is the effect of uninitialized memory with ECC protection combined with store-to-load bypassing in the store buffer. You first need to write the DCCM locations you are going to use; then you can read them in any order. The problem happens if you write to an uninitialized location and read from it immediately.

Theoretically, memory with ECC protection should first be "initialized" with word-aligned, word-sized stores, and only then can it be freely accessed. The crt0 can zero the whole application data region, and only then can the C-coded main use it as data, stack or heap ...
Also, I need to remind you that any uninitialized variable in C is assumed to have zero value.
There is another reason for memory initialization: misaligned accesses and accesses to variables smaller than a word.
The CPU always performs read/modify/write operations for byte and halfword stores. These will fail in simulation if the memory was not initialized, or cause random ECC errors/exceptions in real life.
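To make the scope of such an initialization concrete, the following Python sketch enumerates the aligned word addresses a crt0-style zeroing loop would have to write. It uses the custom configuration described earlier in this thread (DCCM at 0x00200000, 512 KB); the helper name `word_init_addresses` is hypothetical, and a real crt0 would issue word stores in assembly rather than build a list.

```python
WORD_BYTES = 4


def word_init_addresses(base, size_bytes):
    """Return the aligned word addresses covering [base, base + size_bytes)."""
    if base % WORD_BYTES or size_bytes % WORD_BYTES:
        raise ValueError("region must be word aligned and a whole number of words")
    # One aligned, word-sized store per address initializes data and ECC.
    return range(base, base + size_bytes, WORD_BYTES)


DCCM_BASE = 0x00200000      # value from the custom configuration in this thread
DCCM_SIZE = 512 * 1024      # 512 KB

addrs = word_init_addresses(DCCM_BASE, DCCM_SIZE)
print(len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

With these values the loop covers 131072 words, ending at 0x27FFFC, just below the initial stack pointer of 0x28_0000.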

taddevrp · 2020-02-21T19:32:25Z

Initializing the stack area with zeros helped to avoid the coherency problem.
Thanks!

But what will happen in this case in a real-life situation?
The DCCM has unknown values after reset, so the ECC will inevitably be incorrect at the very first read. And as there is always a read before a write, the first stack write will cause ECC errors.

BTW, is it possible to remove ECC checking from the ICCM and DCCM with some option, as is done for the I-Cache?

aprnath · 2020-02-21T19:36:47Z

The PRM has this verbiage in section 3.4:
Note: Memories with parity or ECC protection must be initialized with correct parity or ECC. Otherwise, a read access to an uninitialized memory may report an error. The method of initialization depends on the organization and capabilities of the memory. Initialization might be performed by a memory self-test or depend on firmware to overwrite the entire memory range (e.g., via DMA accesses).

We will update it to be explicit about the initialization mechanisms, adding the following:

Note: If the DCCM is uninitialized, a load following a store to the same address may get incorrect data. If firmware initializes the DCCM, aligned word-sized stores should be used (because they don’t check ECC), followed by a fence, before any load instructions to DCCM addresses are executed.

agrobman · 2020-02-21T19:37:45Z

mfdc[8] disables ECC checking/correction for all swerv memories.
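As a small illustration of what setting that bit means for the register value, here is a Python sketch that computes the new value for a read-modify-write of mfdc. Only the bit position comes from the comment above; the names `MFDC_ECC_DISABLE_BIT` and `with_ecc_checking_disabled` are hypothetical, and the actual CSR access would be done with CSR instructions on the core.

```python
MFDC_ECC_DISABLE_BIT = 8  # bit position taken from the comment above


def with_ecc_checking_disabled(mfdc_value):
    """Return mfdc_value with bit 8 set, leaving all other bits unchanged."""
    return mfdc_value | (1 << MFDC_ECC_DISABLE_BIT)


print(hex(with_ecc_checking_disabled(0x0)))
print(hex(with_ecc_checking_disabled(0x3)))
```

Using an OR rather than writing a constant keeps whatever other feature-disable bits were already set.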

agrobman · 2020-02-21T19:40:12Z

But you may still get simulation problems if your application reads uninitialized memory locations.

taddevrp · 2020-02-24T13:47:13Z

@aprnath
Thanks for the clarification!

@agrobman
Yes, I know about this bit. What I'd like to know is whether the DCCM ECC checking mechanism can be removed from the design entirely by some HDL parameter.

And as for the store followed by a load in my test case: if store-to-load bypassing happens in the store buffer, why does the LSU read the DCCM at all? It is an aligned word-sized store, so generally we don't care about the data in the memory, as it will be fully overwritten.

agrobman · 2020-03-03T16:10:24Z

We don't have plans to create a design parameter to remove ECC protection from the memories.
You are welcome to create your local version without ECC logic.
Regarding the DCCM store buffer behavior: the ECC logic is executed in parallel, regardless of access size, to speed up CPU operation.

Again, the CCMs were designed on the assumption that the memories have to be initialized before use.

taddevrp · 2020-03-04T10:44:08Z

Ok, it's clear.
Thanks!

aprnath closed this as completed Feb 20, 2020

taddevrp mentioned this issue Feb 20, 2020

DCCM coherncy issue #38

Closed

aprnath reopened this Feb 20, 2020

taddevrp closed this as completed Mar 4, 2020

aprnath mentioned this issue Sep 2, 2020

dccm initialization #71

Closed
