Create an AXI-interface generator implemented in Calyx #1733

Open
17 of 30 tasks
nathanielnrn opened this issue Oct 4, 2023 · 11 comments
Labels
C: FPGA Changes for the FPGA backend Type: Tracker Track various tasks

Comments

@nathanielnrn (Contributor) commented Oct 4, 2023

This issue is intended to track progress on Phase 2 of Calyx Meets the Real World. This writeup gives great overarching context and what we are working towards.

Currently, we can run a limited number of programs on real FPGAs using fud. We accomplish this by generating Verilog AXI wrappers.

Unfortunately, the current state of the AXI wrappers is less than ideal. Much of the generation code is hardcoded, and Verilog is in general not a fun language to work with. To that end, we are trying to build a generator that takes in a .yxi file and outputs an AXI interface -- in Calyx. The hope is that by using calyx-py we will be able to avoid some of the issues we've faced in the past (see #1071) and more easily create a more generalizable wrapper.

For reference, a dot-product.yxi (meaning the yxi-backend output of a dot-product.futil program) looks like this:

{
  "toplevel": "main",
  "memories": [
    {
      "name": "A0",
      "width": 32,
      "size": 8
    },
    {
      "name": "B0",
      "width": 32,
      "size": 8
    },
    {
      "name": "v0",
      "width": 32,
      "size": 1
    }
  ]
}
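As a rough illustration of how a generator might consume such a file, here is a minimal sketch that parses the .yxi JSON and derives the address width each memory's controller would need. The `parse_yxi` helper and `idx_width` field are hypothetical names, not part of any existing tool; the JSON field names follow the dot-product example above.

```python
import json
import math

def parse_yxi(text):
    """Parse a .yxi description and derive, for each memory, the
    address width (clog2 of its size) a controller would need."""
    spec = json.loads(text)
    mems = []
    for mem in spec["memories"]:
        # clog2(size), with a 1-bit floor for single-element memories
        idx_width = max(1, math.ceil(math.log2(mem["size"])))
        mems.append({**mem, "idx_width": idx_width})
    return spec["toplevel"], mems

# Using the dot-product.yxi example from above:
yxi = """{"toplevel": "main", "memories": [
  {"name": "A0", "width": 32, "size": 8},
  {"name": "B0", "width": 32, "size": 8},
  {"name": "v0", "width": 32, "size": 1}]}"""
toplevel, mems = parse_yxi(yxi)
# A0 and B0 need 3 address bits; v0 needs 1
```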

The current plan is to have a separate AXI controller for each memory, similar to the current Verilog implementation.

Currently, both @evanmwilliams and I are working on getting acquainted with calyx-py. After that it probably makes sense to get together and formalize some next incremental steps, as a full AXI interface seems a bit daunting to tackle all in one go.

At that point we can list and track completion of subtasks here!

Update Nov 20 2023:
Both @evanmwilliams and I have familiarized ourselves with calyx-py a bit. Work has also gone into manually creating a version of a Calyx AXI wrapper. Based on in-person discussions, it seems like the next step is to create a testbench that uses cocotb to check the correctness of said wrapper, similar to what we've done in the past. The goal is to start with just the read portion of an AXI wrapper. The code we are trying to target lives in the branch axi-calyx-gen.

Update Jan 2024:
I've broken up work into a bunch of smaller tasks both in case we onboard someone to help work on this and also to give a clear game plan as we all get busy as the semester starts. There is a lot here but I think by chipping away at things we can make good progress.

Tasks to be completed, in order:

Some offshoot ideas that have sprung up:

  • Working on Zero-cycle transitions from dynamic to static control #1828 to make AXI implementations (and interface implementations in general) easier.
  • Write a standalone data converter for generating byte arrays, with things like *_to_byte and byte_to_* functions. I believe this currently exists in a number of places in the repo. I believe some of this is done in fud currently? But I have a vague recollection of it being duplicated in some places? Perhaps the old Verilog AXI cocotb testbench?
  • For the further future, readmemh/writememh wrappers based on yxi.
  • Possibly: Add IDX_SIZE information to .yxi outputs. #1751, but it seems like we are moving in a direction that requires memories to be well-formed in the sense that IDX_SIZE must match the expected width based on SIZE of a memory and that multi-dimension memories be flattened to seq_mem_d1 memories.
@sampsyo (Contributor) commented Oct 4, 2023

Excellent! Sounds like a plan!

@rachitnigam rachitnigam added Type: Tracker Track various tasks C: FPGA Changes for the FPGA backend labels Oct 4, 2023
@sampsyo (Contributor) commented Oct 16, 2023

Expanding a little bit on the imaginary code I wrote above for how the AXI "wrapper" code might work, I think we should really use Calyx's ref cells to thread through the memories we want to expose.

That is, imagine that we have our main Calyx design, called main, that we intend to wrap:

component main() -> () {
  cells {
    @external input_mem = std_mem_d1(...);
    @external output_mem = std_mem_d1(...);
  }
}

We should first rewrite main to use ref cells instead of @external:

ref input_mem = std_mem_d1(...);
ref output_mem = std_mem_d1(...);

(In fact, we have elsewhere occasionally discussed getting rid of the @external attribute altogether and replacing it with ref. Since @external can only appear in top-level components anyway, ref would behave identically to @external in top-level components. But that's for another day; for now we can imagine that we have to do this preprocessing ourselves.)

Then, our job in this work is to generate a new top-level component, called axi_wrapper. It will "own" the memories, declaring them as "real" (non-ref) subcells:

component axi_wrapper(...) -> (...) {
  cells {
    the_main = main();
    main_input_mem = std_mem_d1(...);
    main_output_mem = std_mem_d1(...);
  }
}

The control for axi_wrapper can then use an invoke to run main, like this:

invoke the_main[input_mem=main_input_mem, output_mem=main_output_mem]();

Therefore, we can think of axi_wrapper's control program as embodying this rough "to-do" list:

  1. Receive input data from the host, putting them in my own main_input_mem.
  2. invoke the_main, as above. It has access to main_input_mem and main_output_mem during its execution.
  3. Send output data from my main_output_mem back to the host.
  4. Tell the host we are done!

…which can hopefully be implemented as a big seq that steps through those various phases!

(One minor note: the axi_wrapper thingy I'm envisioning here may also want to have subcells for individual, per-memory AXI controller components. Maybe? In which case we would define an axi_controller component, which would also have a ref cell for the memory it needs to interact with. And then axi_wrapper would use invoke axi_controller[mem=something](...) to tell it to send/receive data or whatever.)
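Putting the snippets above together, the wrapper might look something like the sketch below. This is only an illustration of the idea: the axi_controller component and the read_ctrl/write_ctrl cell names are hypothetical placeholders, and all ports are elided.

```
component axi_wrapper(...) -> (...) {
  cells {
    the_main = main();
    main_input_mem = std_mem_d1(...);
    main_output_mem = std_mem_d1(...);
    // Hypothetical per-memory AXI controllers, each taking its memory as a ref cell.
    read_ctrl = axi_controller();
    write_ctrl = axi_controller();
  }
  control {
    seq {
      // 1. Receive input data from the host into main_input_mem.
      invoke read_ctrl[mem=main_input_mem]();
      // 2. Run the wrapped design; it sees the memories via its ref cells.
      invoke the_main[input_mem=main_input_mem, output_mem=main_output_mem]();
      // 3. Send output data from main_output_mem back to the host.
      invoke write_ctrl[mem=main_output_mem]();
      // 4. Telling the host we are done would go here as well.
    }
  }
}
```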

@rachitnigam (Contributor)

I like this idea! It is in the spirit of #1603. The idea is that Calyx is purely responsible for defining the computational interface of the component and something else can come in and provide the memory interface.

Spitballing a little more: one can imagine that once we #1261 and have a standard memory interface that has read and write done signal, the Calyx kernel can directly be connected to the AXI manager. Going a step further, this AXI module can instantiate things like memory coalescers, caches, reuse buffers etc. and transparently improve the performance of the module. This kind of compute-memory decoupling might also be interesting to @andrewb1999 and @matth2k.

@andrewb1999 (Collaborator)

One question I have here is how the AXI interfaces will be implemented. I know the current AXI interface reads all input values into on-chip memory and then launches the kernel. My general suggestion is that, by default, external memories should be fully off-chip, i.e., every time we want to read an address we need to use the AXI interface to fetch the value from DRAM. If we want to buffer values on chip, this should be made explicit in the Calyx somewhere (either the main module or the AXI wrapper module).

@rachitnigam (Contributor)

Yeah, seconded! The goal of this project (if I understand correctly) is to express as much as possible of the logic needed to move data around within Calyx itself. This includes the logic needed to "externalize" memory interfaces.

@sampsyo (Contributor) commented Oct 20, 2023

Thanks for the feedback, both of y'all!

I know currently the AXI interfaces reads all inputs values to on-chip memory and then launches the kernel. My general suggestion is that by default external memories should be fully off-chip, aka every time we want to read an address value we need to use the AXI interface to read a value from DRAM.

Yes, it is in scope in our original proposal to go beyond the "one-size-fits-all" data flow we have now. That is, aside from just changing the default (from buffer-everything to buffer-nothing/directly access host memory), it seems like there are many intermediate points you'd want to generate. For example, streaming data "blockwise" instead of requesting it on demand "wordwise" would be in scope, and would put things like AXI bursts behind the ref std_seq_mem abstraction layer.

So anyway, the overall trajectory here is (1) recreate exactly what we currently have (the buffer-everything-on-chip policy) in Calyx land, and then (2) use our new, awesome, flexible, hackable, debuggable AXI generator to add new features/interface styles.

@rachitnigam (Contributor)

Fly-by comment but there is something unsaid about the expressive power of ref in all of this. It's enabling us to do some cool things so we should eventually spend some more time thinking about extensions or other use cases.

@nathanielnrn (Contributor, Author) commented Dec 21, 2023

There has been substantial progress on getting the read portion of the AXI interface to work (#1820). There is also some updated tracking in the original comment.

@sampsyo (Contributor) commented Jan 11, 2024

Given @nathanielnrn's awesome recent progress in #1842, I found myself mapping out a few granular steps for the medium-term future (aside from the aforementioned next step of converting this fixed-function implementation into a suitably parameterized generator). In no particular order:

  • Add the subordinate "control" interface, in which the host (the manager for this relationship) tells us when to start.
  • Evolve the cocotb testbench to have the same configurability. This probably means making the cocotb testbench load up the yxi JSON file itself so it knows which memories to expose to the simulated hardware.
  • Consider writing a fud (or fud2) harness to make the whole thing work end-to-end. That is, doing something like fud2 something.futil -s sim.data=stuff.json --to dat --through axi-cocotb should (1) compile the Calyx program normally, (2) emit the yxi JSON file, (3) generate the AXI wrapper from the yxi spec, and (4) run the combined design using cocotb. This would ideally allow broad differential testing with "normal" (readmemh/writememh) simulation.
  • Consider adding a mode to the Calyx compiler that omits the go/done interface for the @toplevel component. Something like this will be important for when we hand this stuff off to the Xilinx toolchain, which of course will not know that it needs to do this. Morally speaking, the AXI control interface takes the place of the Calyx go/done interface, so it makes sense to omit one and keep the other.

And there are three "offshoot" ideas that are not that important but are kind of adjacent, to consider returning to "someday":

  • Chasing down Zero-cycle transitions from dynamic to static control #1828 to make the AXI implementations easier.
  • Consider writing a better standalone data converter for purposes like this… stuff like int_to_bytes and bytes_to_int is surprisingly subtle and not actually all that AXI-specific. A standalone tool for generating the byte arrays necessary here would make this important functionality more reusable and testable.
  • Return to the idea of writing a readmemh/writememh wrapper based on yxi. This is very much a sidetrack from our current efforts but would be really satisfying because it would let us delete some weirdly special-purpose stuff from the core Calyx compiler.
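The int_to_bytes/bytes_to_int helpers mentioned above might look something like the sketch below. The function names come from the comment; the little-endian byte order is an assumption (it should be picked to match the AXI data bus and host convention).

```python
def int_to_bytes(value: int, width_bits: int) -> list[int]:
    """Split an unsigned integer into width_bits // 8 bytes.
    Little-endian order is an assumption, not a settled choice."""
    return list(value.to_bytes(width_bits // 8, byteorder="little"))

def bytes_to_int(data: list[int]) -> int:
    """Inverse of int_to_bytes: reassemble bytes into an integer."""
    return int.from_bytes(bytes(data), byteorder="little")

# e.g. splitting a 32-bit word into four bytes and back:
word = int_to_bytes(0x12345678, 32)   # [0x78, 0x56, 0x34, 0x12]
assert bytes_to_int(word) == 0x12345678
```

Keeping these in one standalone, unit-tested module (rather than duplicated across fud and the testbenches) is the reusability point made above.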

@nathanielnrn (Contributor, Author) commented Jan 19, 2024

As the semester is coming up, this seems like a good point to stop, more concretely consider next steps, and take stock of where we are with things.

Some good progress has been made w.r.t. creating a parameterized version of our AXI implementation:

  1. Parameterized address channels (AR and AW) have an outstanding PR: Add calyx-py AXI generator address channels #1855.
  2. Parameterized read channels have an outstanding PR: Add calyx-py AXI generator read channel #1856.

Things left to be done for the parameterized generator:

  1. Create parameterized write channels
  2. Create parameterized write-response channels. (Worth noting this one should be especially easy, as we don't currently do much with this channel)

It should be noted that all 4 of the above are blocked by #1850, which is what I will be working on most immediately.


Once the generator is done, I think it makes sense to tackle things in the following order (see comment for more detail about specific tasks):

  1. Modify the cocotb testbench to take in .yxi files. We can likely look to the verilog cocotb testbench for some inspiration in this respect.
  2. Write a fud2 harness that works end to end. We want the harness to
    a. Compile the calyx program normally.
    b. Emit the program's interface as a .yxi file.
    c. Generate the AXI wrapper from the yxi spec.
    d. Run the wrapped design using cocotb.
  3. Make the existing cocotb testbench work with runt and CI.
  4. Expand unit tests to include things like:
    • Multiple transactions
    • 0 and non-zero base addresses
    • Large (>256 transfers) data sets
  5. Work on the subordinate control interface (hardcoded first and then generated? Maybe we can skip straight to the generator) in order to interface with XRT.
  6. Add a pass to the compiler that omits the go/done interface and replaces it with an ap_start/ap_done interface for toplevel components. This will likely be necessary for XRT interfacing to work. It is worth noting that there may be another option to target user-managed control instead, but that seems to miss some of the point of creating a generalizable interface for FPGAs that gives us the benefits of using XRT.

The current offshoot ideas that are adjacent to this work, that we can continue returning to someday are:

  • Working on Zero-cycle transitions from dynamic to static control #1828 to make AXI implementations (and interface implementations in general) easier.
  • Write a standalone data converter for generating byte arrays, with things like *_to_byte and byte_to_* functions. I believe this currently exists in a number of places in the repo. I believe some of this is done in fud currently? But I have a vague recollection of it being duplicated in some places? Perhaps the old verilog AXI cocotb testbench?
  • For the further future, readmemh/writememh wrappers based on yxi.
  • Possibly: Add IDX_SIZE information to .yxi outputs. #1751, but it seems like we are moving in a direction that requires memories to be well-formed in the sense that IDX_SIZE must match the expected width based on SIZE of a memory and that multi-dimension memories be flattened to seq_mem_d1 memories.

The tracking for these has been updated above.

@sampsyo (Contributor) commented Jan 22, 2024

This all sounds great!!! Just one small note on the compiler hacking:

Add a pass to the compiler that omits the go/done interface and replaces it with an ap_start/ap_done interface for toplevel components.

The heart of the matter here may not actually be a new pass, nor even a new backend: I think all we need is a compiler option that omits the go/done signals on the top-level component. Then we can provide our own control interface in our wrapper, without worrying about anyone else mucking it up.
