[Cider 2] Handle memory input/output #1968

Closed
Tracked by #1913
EclecticGriffin opened this issue Mar 11, 2024 · 4 comments · Fixed by #2041

@EclecticGriffin
Collaborator

EclecticGriffin commented Mar 11, 2024

Currently Cider 2.0 cannot print out the contents of memory in a way that is compatible with the json tooling we use for snapshot testing. The old infrastructure for this is somewhat tangled and tortured in both directions since it involves some base64 encodings and a nightmare fud python script. Rather than hooking back into this, it could be worthwhile to do something different for the new version, specifically a more "stupid" binary encoding.

@sampsyo said:

I had suggested, in the spirit of trying to do the simplest possible thing here, that it might not be too hard to load/dump "raw bits," i.e., the exact bit-level contents of the memories. This is more or less what the RTL simulators already do (they are using hex-encoded text, but same difference). We could then pre-/post-process these files into our JSON format externally, alleviating the need for any serde hacking on Cider's side.

This of course omits all the non-memory results that Cider 1.0 can already produce. But I believe this is fine: all we really need to check correctness is those memory dumps.

There are a few things to pin down for this since we can have memories contain values of arbitrary bit width.

  • little-endian or big-endian
  • padding
  • structure validation

To that end I propose the following:

  • we use little-endian
  • values are padded up to a whole number of bytes, with the padding always being zeros (which will be discarded/ignored by Cider)
  • multidimensional memories are flattened with row-major order
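
A minimal Rust sketch of the encoding proposed above (little-endian, zero padding up to a whole byte). It assumes values fit in 64 bits purely for brevity; real memories can be wider and would need a bignum type. The names `bytes_per_value`, `encode_value`, and `decode_value` are hypothetical and are not taken from Cider.

```rust
/// Number of bytes needed to hold a value of `width` bits, rounded up.
fn bytes_per_value(width: u32) -> usize {
    ((width + 7) / 8) as usize
}

/// Encode `value` as little-endian bytes, zero-padded to a whole byte.
/// Assumption: `width` is in 1..=64 (a simplification for this sketch).
fn encode_value(value: u64, width: u32) -> Vec<u8> {
    assert!(width >= 1 && width <= 64, "sketch only handles 1..=64 bits");
    // Mask off anything above `width` so the padding bits are always zero.
    let masked = if width == 64 {
        value
    } else {
        value & ((1u64 << width) - 1)
    };
    masked.to_le_bytes()[..bytes_per_value(width)].to_vec()
}

/// Decode one value from its little-endian bytes, discarding padding bits.
fn decode_value(bytes: &[u8], width: u32) -> u64 {
    assert!(width >= 1 && width <= 64, "sketch only handles 1..=64 bits");
    assert_eq!(bytes.len(), bytes_per_value(width));
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    let raw = u64::from_le_bytes(buf);
    if width == 64 {
        raw
    } else {
        raw & ((1u64 << width) - 1)
    }
}
```

With this scheme a 12-bit value occupies two bytes, and the upper four bits of the second byte are padding that the decoder ignores.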

The non-obvious part is what we do about structure and validation. We could make the dump be as dumb as possible, i.e., just bits with no embedded information about memory names or structures. But that requires us to have a data file available to deserialize the dump, and it means that when loading memory we just have to assume that the data file was created with the memory information laid out correctly and in the exact order in which the memory instances are defined in the main component. To that end, it may be worth having a preamble which contains the names of the memories, their dimensions, and the definition order and have a tool binary, in the fud2 philosophy, which can convert json to memory dumps and vice-versa without needing to retain a data file (or to construct a dummy one for the cases in which we run a program without providing data and still wish to observe the output).

@EclecticGriffin changed the title from "Does not generate an output JSON compatible with our testing infrastructure" to "[Cider 2] Handle memory input/output" on Mar 11, 2024
@EclecticGriffin
Collaborator Author

I am currently leaning toward having a simple preamble with this info since I think it will make things simpler in the long run, but I am open to other ideas. CC: @sampsyo

@sampsyo
Contributor

sampsyo commented Mar 12, 2024

All sounds awesome. I agree with the decisions you summarized briefly:

  • little endian
  • pad to bytes
  • row-major for multi-dimensional memories

And very good reasoning about the metadata that describes what's in the file(s), which of course is necessary for producing any other format from such a binary dump. I actually think this dovetails nicely with two other things going on in the ecosystem.

First:

it may be worth having a preamble which contains the names of the memories, their dimensions, and the definition order

This is pretty much the goal of the "YXI" interface definition format created by @nathanielnrn for AXI purposes! Check it out:

$ calyx -b yxi examples/futil/dot-product.futil
{
  "toplevel": "main",
  "memories": [
    {
      "name": "A0",
      "width": 32,
      "size": 8
    },
    {
      "name": "B0",
      "width": 32,
      "size": 8
    },
    {
      "name": "v0",
      "width": 32,
      "size": 1
    }
  ]
}

That is, YXI is just a JSON document that includes the names and dimensions of the exposed top-level memories (using @external or ref). It's meant to be a comprehensive description of the "external interface" to a Calyx program. So maybe it is exactly what we want here?

So, Cider 2.0 could dump "just the bytes" and rely on a separate YXI file to interpret it. Or, it could produce a single file that consists of the YXI data (presumably serialized to some other format) followed by all the bytes. Either way, maybe it would be cool to standardize on YXI being the way to describe this stuff, extending it if necessary to address this use case (as opposed to inventing a new/different format with similar-but-not-quite-identical contents)?
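
To illustrate the "just the bytes plus a separate YXI file" option, here is a rough Rust sketch that computes where each memory would sit in a headerless dump. It assumes the memories are stored back to back in the order the YXI file lists them, with each element padded to whole bytes; the thread proposes this but does not fix it. `MemEntry`, `byte_ranges`, and `dot_product_mems` are hypothetical names, and the entries are hard-coded from the YXI output shown above rather than parsed from JSON.

```rust
/// Mirrors one entry of the YXI "memories" array.
struct MemEntry {
    name: &'static str,
    width: u32,  // bits per element
    size: usize, // number of elements
}

/// Byte range `(name, start, end)` of each memory in a headerless dump.
/// Assumption: memories are concatenated in YXI order and every element
/// is padded up to a whole number of bytes.
fn byte_ranges(mems: &[MemEntry]) -> Vec<(&'static str, usize, usize)> {
    let mut offset = 0usize;
    let mut out = Vec::new();
    for m in mems {
        let bytes_per_elem = (m.width as usize + 7) / 8;
        let len = bytes_per_elem * m.size;
        out.push((m.name, offset, offset + len));
        offset += len;
    }
    out
}

/// The three memories from the dot-product YXI output.
fn dot_product_mems() -> Vec<MemEntry> {
    vec![
        MemEntry { name: "A0", width: 32, size: 8 },
        MemEntry { name: "B0", width: 32, size: 8 },
        MemEntry { name: "v0", width: 32, size: 1 },
    ]
}
```

Under these assumptions the dot-product dump would be 68 bytes long: 32 bytes each for `A0` and `B0`, then 4 bytes for `v0`.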

and have a tool binary, in the fud2 philosophy, which can convert json to memory dumps and vice-versa

I don't think I've broadcasted this too broadly, but @bcarlet and I have recently started working with @Angelica-Schell to do something like this!! That is, we are starting small, but we are hoping to build a standalone data converter tool (as you say, taking the fud2 approach) that can convert between many different data formats. Including Verilog-simulator-friendly binary files, OG fud-style JSON, anything else we can think of. And Cider's preferred format could be wrapped up into that!

Anyway, this is just to say that we should totally build such a thing, and it should probably be a command-line flag tacked onto what @Angelica-Schell is beginning to construct now.

@EclecticGriffin
Collaborator Author

I don't think I've broadcasted this too broadly, but @bcarlet and I have recently started working with @Angelica-Schell to do something like this!! That is, we are starting small, but we are hoping to build a standalone data converter tool (as you say, taking the fud2 approach) that can convert between many different data formats. Including Verilog-simulator-friendly binary files, OG fud-style JSON, anything else we can think of. And Cider's preferred format could be wrapped up into that!

Ah brilliant, that's exactly what I was thinking about. Happy to help out if needed.

This is pretty much the goal of the "YXI" interface definition format created by @nathanielnrn for AXI purposes!

Amazing! How does this look for multidimensional memories with the size definition?

Essentially what I was imagining was basically the yxi interface info (serialized into a binary format) followed by the raw binary data for all the memories one after another. I think that is better than the version where we have the header for a single memory followed by the data for that memory and so on, since we can easily extract the header info from that without needing to parse the entire file.
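
One possible shape for the header-first layout, sketched in Rust. The 4-byte little-endian length prefix is an assumption of this sketch, not something decided in the thread, and the header is treated as opaque bytes because the binary serialization of the YXI info is left open here. `write_dump` and `read_header` are hypothetical names.

```rust
use std::io::{self, Read, Write};

/// Write `[header length: u32 LE][header bytes][all memory data]`.
/// Assumption: a 4-byte length prefix; the header format itself is opaque.
fn write_dump<W: Write>(out: &mut W, header: &[u8], data: &[u8]) -> io::Result<()> {
    if header.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "header too large"));
    }
    out.write_all(&(header.len() as u32).to_le_bytes())?;
    out.write_all(header)?;
    out.write_all(data)
}

/// Read only the header, leaving the memory data unread in `input`.
fn read_header<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    let mut header = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    input.read_exact(&mut header)?;
    Ok(header)
}
```

Because the header comes first and carries its own length, a reader can stop after `read_header` without touching the memory contents, which is the advantage over interleaving a small header before each memory.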

@sampsyo
Contributor

sampsyo commented Mar 12, 2024

I don't remember off the top of my head where we landed on multi-dimensional memories, but IIRC we either (a) don't handle them at all, or (b) just report the total size (i.e., the product of the dimensions). And glancing quickly at the code, I think it's (b), as in, it uses get_mem_info:

fn get_mem_info(&self) -> Vec<MemInfo> {
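
For reference, option (b) combined with the row-major flattening agreed on earlier could look roughly like the Rust sketch below. This is only an illustration of the arithmetic, not the body of `get_mem_info` (which is not shown in this thread); `total_size` and `row_major_index` are hypothetical helper names.

```rust
/// Total element count of a memory: the product of its dimensions.
fn total_size(dims: &[usize]) -> usize {
    dims.iter().product()
}

/// Flat index of `idx` in a row-major layout (last dimension varies fastest).
/// Returns `None` if the rank does not match or an index is out of bounds.
fn row_major_index(dims: &[usize], idx: &[usize]) -> Option<usize> {
    if dims.len() != idx.len() {
        return None;
    }
    let mut flat = 0usize;
    for (&d, &i) in dims.iter().zip(idx) {
        if i >= d {
            return None;
        }
        flat = flat * d + i;
    }
    Some(flat)
}
```

Reporting only the total size is enough to locate a memory's bytes in a dump, but recovering the element at a given multi-dimensional index needs the individual dimensions as well.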

I think that is better than the version where we have the header for a single memory followed by the data for that memory and so on, since we can easily extract the header info from that without needing to parse the entire file.

Yeah, totally makes sense to me.
