Adding vector dialect support to Calyx? #1777

medbzkst · 2023-11-15T12:39:59Z

I am targeting the lowering of MLIR codes that contain functions that receive memrefs as input, transform them into vectors (using load, maskedload, or gather operations), perform several operations using vector dialect, and then spit out a memref as an output (using store, maskedstore, or scatter operation). The thing is that CIRCT does not support lowering vector dialect operations to Calyx. However, vector dialect is an abstraction that can be directly concretised into hardware. One way to see that Calyx is suitable for that is the Dahlia and the systolic array generator frontends.

So what about generating hardware that interfaces with memories (memrefs as inputs and outputs) and performs all the required vector operations as they are instructed in vector operations? Maybe a conversion pass vector-to-calyx?

The text was updated successfully, but these errors were encountered:

sampsyo · 2023-11-17T17:14:19Z

Hi, @medbzkst—thanks for the super interesting suggestion! I will freely admit I'm no expert on either vector nor the SCFToCalyx pass infrastructure. However, the prospect of lowering data-parallel MLIR ops to hardware seems totally tractable.

The main question I'd have is: can this be accomplished (perhaps in a lower-performance way) with existing MLIR passes? For example, there is VectorToSCF. Does it suffice, for correctness only, to use this and then lower the resulting vector-free code to Calyx? I'd be super interested to know if this causes any problems.

I expect that to generate not-great hardware, but it would be extremely useful as a baseline. It's not clear to me whether a better lowering would go directly from vector to Calyx or if some intermediate point would make more sense, but I'd be willing to believe that the direct route is good.

sampsyo · 2023-11-17T17:15:17Z

It occurs to me to mention @jiahanxie353, who has been ramping up on the CIRCT-related Calyx infrastructure. Is this something you'd have an opinion about, or even be interested in contributing to?

jiahanxie353 · 2023-11-18T17:05:37Z

Hi @medbzkst thanks for bringing up this super interesting conversion!

While I'm not an expert on vector either, my current understanding is that lowering from Vector to SCF and then from SCF to Calyx seems more reasonable. The reason is SCF's if and for are very useful for Vector transferring operations, such as loading/storing to/from the memory. This is particularly when we have high dimensional Vectors, where the Vectors need to be unpacked, the bounds need to be checked, etc. It seems to me that SCF is more suitable for these aspects.

or even be interested in contributing to?

I'm definitely interested! I can delve more into it and return to the thread with some insights once I've formulated some thoughts.

medbzkst · 2023-11-19T10:32:03Z

Thank you for the feedback @sampsyo and @jiahanxie353.

Indeed the first idea is to try to use the available infrastructure, and intuitively it would be a great baseline for correctness and performance. However, I came across some problems while I was trying to do so.

The VectorToScf pass seems to be not as generic as it sounds. From the top of the source code file you can see the comment "This file implements lowering of vector transfer operations to SCF.". So it is precisely there to lower the transfer operations. Even if I give it a run on a very simple MLIR module that receives 2 vectors as input, sum them up, and return the output, it does not behave as expected in the sense that it does not convert that to an SCF for or while loop even by setting the -full-unroll option.

You can give it a try if you want on this MLIR module:

module {
    func.func @testing(%vec1 : vector<8xf64>, %vec2 : vector<8xf64>) -> (vector<8xf64>) {
        %res = arith.addf %vec1, %vec2 : vector<8xf64>
        return %res : vector<8xf64>
    }
}

By running mlir-opt --convert-scf-to-vector filename.mlir or mlir-opt --convert-scf-to-vector="full-unroll=1" filename.mlir.

Now the idea was, let's convert that manually (by an external script as a temporary solution before implementing a pass for that) to the desired loop implementation in SCF. The problem happened with those operations that iterate and check a condition, i.e. the masked operations. For example a gather operation if thought of as a loop, it would require checking whether to grab an element from memory or skip it and use a pass-through element. Even less sophisticated than that, a maskedload operation would do the same. The problem here is that the loop will have more than one control block which MLIR does not seem to be happy with using the while loop (perhaps that's why there is no pass available that does that?). Using the for loop it would also be problematic because those operations yield nothing, but at least it is at the level of CIRCT here.

Finally, the last chance was to lower everything to CF and lower from there to Calyx as it will be just about a bunch of blocks with some controls. However, this means having backedges in the CFG since we have loops, which is not lowerable so far.

For those reasons, I believe there should be a way to have a VectorToCalyx pass. Most especially when dealing with the masked operations, it would be probably more optimal to decide on how to connect components at the pass level and have a direct Calyx code optimised for that.

sampsyo · 2023-11-22T17:09:20Z

Intriguing. I think it might be useful, if we were to go down the "direct lowering" road, to try hand-writing some Calyx programs that perform data-parallel vector-ish operations and see how those look…

rachitnigam · 2023-11-23T14:21:07Z

@medbzkst another possible place to look is #1585. @xerpi implemented a Halide-to-Calyx compiler that implements lowering for vector operations to Calyx but it wasn't merged into CIRCT. Perhaps that's a good place to start?

Also, if you have questions about Calyx, we can add you to the Calyx slack and directly answer questions there as well. If you'd like to join, please send me an email.

jiahanxie353 · 2023-11-23T16:10:11Z

Thanks @medbzkst for the detailed issue description!

Even if I give it a run on a very simple MLIR module that receives 2 vectors as input, sum them up, and return the output, it does not behave as expected in the sense that it does not convert that to an SCF for or while loop even by setting the -full-unroll option.

This is surprising...

Anyway, I think we should still keep VectorToSCF pass. Because I think structured control flow is just very important for vector operations. And I hope VectorToSCF is still under development, so they will eventually support full functionalities? Otherwise it's so surprising that they couldn't event support vectors add...

Now the idea was, let's convert that manually (by an external script as a temporary solution before implementing a pass for that) to the desired loop implementation in SCF.

With this manual temporary solution of basic case of maskload, I would propose using scf.for and then lower to Calyx. It's true that we currently don't support that, but I can try to implement it.

At the end of the day, I would imagine we can have code like:

func.func @maskedload(%base_vector: memref<?xf32>, %mask: memref<8xi1>, %pass_thru: memref<8xf32>) -> memref<8xf32> {
  %result = scf.for %index = 0 to %size step 1 {
    %mask_val = load %mask[%index] 
    %cond = cmpi eq %mask_val, %c1
    %result_next = scf.if %cond {
      %val = load %base_vector[%index] 
      scf.yield %val
    } else {
      %val = load %pass_thru[%index]
      scf.yield %val
    }
  scf.yield %result_next
  }
  return %result
}

So as long as we support lowering scf.for to Calyx, we are good for now.

Most especially when dealing with the masked operations, it would be probably more optimal to decide on how to connect components at the pass level and have a direct Calyx code optimised for that.

I think it might be useful, if we were to go down the "direct lowering" road, to try hand-writing some Calyx programs that perform data-parallel vector-ish operations and see how those look…

Agree with these comments. Given that vector is already pretty low level, we can expose vectors to Calyx directly and exploit parallelisms there.

Summing up, the solution I come up with is using a mix of SCF and Calyx, where SCF is in charge of the control flow, and Calyx to exploit "data-parallel vector-ish operations". And I can

Support scf.for (with non-empty yield) lowering to Calyx;
I will also explore some direct VectorToCalyx opportunities, especially with mask operations.

Please let me know what you think

rachitnigam · 2023-12-12T23:16:38Z

Hey @medbzkst! I'm going to close this issue but please feel free to re-open this or a new issue if you'd like to discuss another problem/need help with something!

rachitnigam added Status: Needs Triage Issue needs some thinking C: CIRCT Changes for the CIRCT backend labels Nov 23, 2023

rachitnigam closed this as completed Dec 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding vector dialect support to Calyx? #1777

Adding vector dialect support to Calyx? #1777

medbzkst commented Nov 15, 2023

sampsyo commented Nov 17, 2023

sampsyo commented Nov 17, 2023

jiahanxie353 commented Nov 18, 2023

medbzkst commented Nov 19, 2023 •

edited

Loading

sampsyo commented Nov 22, 2023

rachitnigam commented Nov 23, 2023

jiahanxie353 commented Nov 23, 2023 •

edited

Loading

rachitnigam commented Dec 12, 2023

Adding vector dialect support to Calyx? #1777

Adding vector dialect support to Calyx? #1777

Comments

medbzkst commented Nov 15, 2023

sampsyo commented Nov 17, 2023

sampsyo commented Nov 17, 2023

jiahanxie353 commented Nov 18, 2023

medbzkst commented Nov 19, 2023 • edited Loading

sampsyo commented Nov 22, 2023

rachitnigam commented Nov 23, 2023

jiahanxie353 commented Nov 23, 2023 • edited Loading

rachitnigam commented Dec 12, 2023

medbzkst commented Nov 19, 2023 •

edited

Loading

jiahanxie353 commented Nov 23, 2023 •

edited

Loading