[Cider 2] Handle memory input/output #1968

EclecticGriffin · 2024-03-11T19:05:22Z

Currently, Cider 2.0 cannot print the contents of memory in a way that is compatible with the JSON tooling we use for snapshot testing. The old infrastructure for this is somewhat tangled and tortured in both directions, since it involves some base64 encodings and a nightmare fud Python script. Rather than hooking back into this, it could be worthwhile to do something different for the new version, specifically a more "stupid" binary encoding.

@sampsyo said:

I had suggested, in the spirit of trying to do the simplest possible thing here, that it might not be too hard to load/dump "raw bits," i.e., the exact bit-level contents of the memories. This is more or less what the RTL simulators already do (they are using hex-encoded text, but same difference). We could then pre-/post-process these files into our JSON format externally, alleviating the need for any serde hacking on Cider's side.

This of course omits all the non-memory results that Cider 1.0 can already produce. But I believe this is fine: all we really need to check correctness is those memory dumps.

There are a few things to pin down for this since we can have memories contain values of arbitrary bit width.

little-endian or big-endian
padding
structure validation

To that end I propose the following:

we use little-endian
values are padded up to the next byte boundary, with the padding always being zeros (which will be discarded/ignored by Cider)
multidimensional memories are flattened with row-major order
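As a rough illustration of the three choices above, here is a minimal sketch of what encoding could look like. The function names are hypothetical, and two things are assumptions not settled in this thread: values are capped at 64 bits for simplicity, and each value is padded individually rather than the memory as a whole.

```rust
/// Number of bytes needed to hold one value of `width` bits.
fn bytes_per_value(width: u32) -> usize {
    (width as usize + 7) / 8
}

/// Encode `value` as `width` bits: little-endian, padded up to a whole byte.
/// Bits above `width` are masked off, so the padding is always zero.
/// Assumption: this sketch only handles widths of 1..=64 bits.
fn encode_value(value: u64, width: u32) -> Vec<u8> {
    assert!(width >= 1 && width <= 64, "sketch only handles 1..=64 bits");
    let masked = if width == 64 {
        value
    } else {
        value & ((1u64 << width) - 1)
    };
    masked.to_le_bytes()[..bytes_per_value(width)].to_vec()
}

/// Serialize a 2-D memory by walking it row by row (row-major order)
/// and concatenating the encoded values.
fn encode_memory_2d(rows: &[Vec<u64>], width: u32) -> Vec<u8> {
    rows.iter()
        .flat_map(|row| row.iter())
        .flat_map(|&v| encode_value(v, width))
        .collect()
}
```

For example, a 12-bit value `0xABC` takes two bytes, `0xBC` then `0x0A`, with the top four bits of the second byte left as zero padding.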

The non-obvious part is what we do about structure and validation. We could make the dump as dumb as possible, i.e., just bits with no embedded information about memory names or structure. But that requires a data file to be available in order to deserialize the dump. It also means that, when loading memory, we just have to assume that the data file was created with the memory information laid out correctly and in the exact order in which the memory instances are defined in the main component. To that end, it may be worth having a preamble which contains the names of the memories, their dimensions, and the definition order and have a tool binary, in the fud2 philosophy, which can convert json to memory dumps and vice-versa without needing to retain a data file (or construct a dummy one for the cases in which we run a program without providing data and still wish to observe the output).

EclecticGriffin · 2024-03-12T18:56:30Z

I am currently leaning toward having a simple preamble with this info since I think it will make things simpler in the long run, but I am open to other ideas. CC: @sampsyo

sampsyo · 2024-03-12T19:58:32Z

All sounds awesome. I agree with the decisions you summarized briefly:

little endian
pad to bytes
row-major for multi-dimensional memories

And very good reasoning about the metadata that describes what's in the file(s), which of course is necessary for producing any other format from such a binary dump. I actually think this dovetails nicely with two other things going on in the ecosystem.

First:

it may be worth having a preamble which contains the names of the memories, their dimensions, and the definition order

This is pretty much the goal of the "YXI" interface definition format created by @nathanielnrn for AXI purposes! Check it out:

$ calyx -b yxi examples/futil/dot-product.futil
{
  "toplevel": "main",
  "memories": [
    {
      "name": "A0",
      "width": 32,
      "size": 8
    },
    {
      "name": "B0",
      "width": 32,
      "size": 8
    },
    {
      "name": "v0",
      "width": 32,
      "size": 1
    }
  ]
}

That is, YXI is just a JSON document that includes the names and dimensions of the exposed top-level memories (using @external or ref). It's meant to be a comprehensive description of the "external interface" to a Calyx program. So maybe it is exactly what we want here?

So, Cider 2.0 could dump "just the bytes" and rely on a separate YXI file to interpret it. Or, it could produce a single file that consists of the YXI data (presumably serialized to some other format) followed by all the bytes. Either way, maybe it would be cool to standardize on YXI being the way to describe this stuff, extending it if necessary to address this use case (as opposed to inventing a new/different format with similar-but-not-quite-identical contents)?
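To make the "just the bytes plus a separate YXI file" option concrete, here is a minimal sketch of interpreting a raw dump from a YXI-like description. `MemDesc` and `decode_dump` are hypothetical names that only mirror the `name`/`width`/`size` fields shown above. The sketch assumes memories appear in the dump in description order, that each value is little-endian and byte-padded, and that widths are at most 64 bits.

```rust
/// Mirrors one entry of the YXI "memories" array (hypothetical type).
struct MemDesc {
    name: &'static str,
    width: u32,  // bits per element
    size: usize, // number of elements
}

/// Split a raw dump into per-memory values using the description alone.
/// Returns None if the dump is too short, too long, or a width is unsupported.
fn decode_dump(descs: &[MemDesc], dump: &[u8]) -> Option<Vec<(&'static str, Vec<u64>)>> {
    let mut offset = 0usize;
    let mut out = Vec::new();
    for d in descs {
        if d.width == 0 || d.width > 64 {
            return None;
        }
        // Each element occupies a whole number of bytes.
        let stride = (d.width as usize + 7) / 8;
        let end = offset + stride * d.size;
        let bytes = dump.get(offset..end)?;
        let values: Vec<u64> = bytes
            .chunks(stride)
            .map(|chunk| {
                // Widen the little-endian chunk to 8 bytes before converting.
                let mut buf = [0u8; 8];
                buf[..chunk.len()].copy_from_slice(chunk);
                u64::from_le_bytes(buf)
            })
            .collect();
        out.push((d.name, values));
        offset = end;
    }
    // Reject trailing bytes that the description does not account for.
    if offset == dump.len() {
        Some(out)
    } else {
        None
    }
}
```

The length check at the end is the only validation a headerless dump allows: a dump of the right total size but with memories in the wrong order would still decode without error, which is the concern raised earlier in the thread.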

and have a tool binary, in the fud2 philosophy, which can convert json to memory dumps and vice-versa

I don't think I've broadcasted this too broadly, but @bcarlet and I have recently started working with @Angelica-Schell to do something like this!! That is, we are starting small, but we are hoping to build a standalone data converter tool (as you say, taking the fud2 approach) that can convert between many different data formats. Including Verilog-simulator-friendly binary files, OG fud-style JSON, anything else we can think of. And Cider's preferred format could be wrapped up into that!

Anyway, this is just to say that we should totally build such a thing, and it should probably be a command-line flag tacked onto what @Angelica-Schell is beginning to construct now.

EclecticGriffin · 2024-03-12T20:10:50Z

I don't think I've broadcasted this too broadly, but @bcarlet and I have recently started working with @Angelica-Schell to do something like this!! That is, we are starting small, but we are hoping to build a standalone data converter tool (as you say, taking the fud2 approach) that can convert between many different data formats. Including Verilog-simulator-friendly binary files, OG fud-style JSON, anything else we can think of. And Cider's preferred format could be wrapped up into that!

Ah brilliant, that's exactly what I was thinking about. Happy to help out if needed.

This is pretty much the goal of the "YXI" interface definition format created by @nathanielnrn for AXI purposes!

Amazing! How does this look for multidimensional memories with the size definition?

Essentially, what I was imagining was the YXI interface info (serialized into a binary format) followed by the raw binary data for all the memories, one after another. I think that is better than the version where we have the header for a single memory followed by the data for that memory and so on, since we can easily extract the header info from that without needing to parse the entire file.
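One way the header-first layout could work is sketched below. The framing is purely an assumption for illustration (an 8-byte little-endian length prefix, then the serialized header, then the data), and `write_dump` and `split_dump` are hypothetical names; the thread does not fix a concrete format.

```rust
use std::convert::TryFrom;

/// Build a dump: 8-byte little-endian header length, header bytes, then data.
fn write_dump(header: &[u8], data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + header.len() + data.len());
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(header);
    out.extend_from_slice(data);
    out
}

/// Split a dump into (header, data) by reading only the length prefix.
/// The memory data itself is never inspected.
fn split_dump(file: &[u8]) -> Option<(&[u8], &[u8])> {
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(file.get(..8)?);
    let header_len = usize::try_from(u64::from_le_bytes(len_bytes)).ok()?;
    let rest = &file[8..];
    if header_len > rest.len() {
        return None;
    }
    Some(rest.split_at(header_len))
}
```

With a single header up front, a tool that only wants the memory names and sizes can stop reading after `8 + header_len` bytes, whereas interleaved per-memory headers would force it to skip through every data section.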

sampsyo · 2024-03-12T20:18:55Z

I don't remember off the top of my head where we landed on multi-dimensional memories, but IIRC we either (a) don't handle them at all, or (b) just report the total size (i.e., the product of the dimensions). And glancing quickly at the code, I think it's (b), as in, it uses get_mem_info:

calyx/calyx-ir/src/utils.rs

Line 39 in 23cd0a1

fn get_mem_info(&self) -> Vec<MemInfo> {
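If it is indeed option (b), the reported size and the row-major layout relate as in the sketch below. These helpers are hypothetical and are not the actual `get_mem_info` logic; they only show the arithmetic for an arbitrary number of dimensions.

```rust
/// Total number of elements: the product of the dimensions.
fn total_size(dims: &[usize]) -> usize {
    dims.iter().product()
}

/// Row-major flat index of `idx` in a memory with dimensions `dims`.
/// Returns None if the ranks differ or any index is out of bounds.
fn flatten_index(dims: &[usize], idx: &[usize]) -> Option<usize> {
    if dims.len() != idx.len() {
        return None;
    }
    let mut flat = 0;
    for (&d, &i) in dims.iter().zip(idx) {
        if i >= d {
            return None;
        }
        // The last dimension varies fastest.
        flat = flat * d + i;
    }
    Some(flat)
}
```

Note that the product alone loses the shape: a 2x12 memory and a 4x6 memory both report a size of 24, so recovering a multi-dimensional JSON layout from a dump would need the individual dimensions recorded somewhere.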

I think that is better than the version where we have the header for a single memory followed by the data for that memory and so on, since we can easily extract the header info from that without needing to parse the entire file.

Yeah, totally makes sense to me.

EclecticGriffin mentioned this issue Mar 11, 2024

Cider 2.0 Tracker Issue #1913

Closed

22 tasks

EclecticGriffin changed the title ~~Does not generate an output JSON compatible with our testing infrastructure~~ [Cider 2] Handle memory input/output Mar 11, 2024

EclecticGriffin added this to the Cider 2.0 milestone Mar 15, 2024

EclecticGriffin mentioned this issue Mar 27, 2024

[Cider 2] Memory data dump format & serialization/deserialization #1988

Merged

EclecticGriffin mentioned this issue May 14, 2024

[Cider 2] Memory loading & dumps #2041

Merged

EclecticGriffin closed this as completed in #2041 May 14, 2024

EclecticGriffin mentioned this issue May 16, 2024

[Cider 2] Connect to fud2 and the existing test suites #2044

Closed

2 tasks
