Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add true, dual-port blockram #1726

Merged
merged 1 commit into from
Oct 27, 2021
Merged

Add true, dual-port blockram #1726

merged 1 commit into from
Oct 27, 2021

Conversation

martijnbastiaan
Copy link
Member

@martijnbastiaan martijnbastiaan commented Mar 26, 2021

TODO:

  • VHDL primitive
  • SystemVerilog primitive (same as Verilog?)
  • Test with clock B+ C
  • Test conflict behavior

@christiaanb
Copy link
Member

christiaanb commented Mar 31, 2021

@martijnbastiaan @leonschoorl the distinction between slow and fast seems off, at least, veryUnsafeSynchronizer doesn't have a notion of slow and fast:

veryUnsafeSynchronizer
  :: Int
  -- ^ Period of clock belonging to @dom1@
  -> Int
  -- ^ Period of clock belonging to @dom2@
  -> Signal dom1 a
  -> Signal dom2 a
veryUnsafeSynchronizer t1 t2
  -- this case is just an optimization for when the periods are the same
  | t1 == t2 = same

  | otherwise = go 0

  where
  same :: Signal dom1 a -> Signal dom2 a
  same (s :- ss) = s :- same ss

  go :: Int -> Signal dom1 a -> Signal dom2 a
  go relativeTime (a :- s)
    | relativeTime <= 0 = a :- go (relativeTime + t2) (a :- s)
    | otherwise = go (relativeTime - t1) s

so when t1 is larger than t2 (i.e. t1 is slower than t2) then you need to add t2 multiple times to relativeTime to catch up to the subtraction of one t1 from relativeTime. Whereas when t2 is larger than t1 (i.e. t2 is slower than t1) then you need to subtract t1 multiple times from relativeTime to catch up to the addition of one t2 to relativeTime.

I guess what I'm trying to say is, given that the dual port ram model takes after veryUnsafeSynchronizer:

  1. Is this concept of slow and fast clocks something we really need? or
  2. Can we just make the portA and portB behavior symmetric? i.e. it behaves correctly regardless of whether portA is faster than portB or partA is faster than portB

@martijnbastiaan
Copy link
Member Author

martijnbastiaan commented Mar 31, 2021

The purpose of the slow/fast clocks is simply because we know the slow clock will consume exactly one cycle from one of its streams. I.e., the next iteration of go will be spent in the section handling the fast clock. This allows us to set the "current conflict" back to Nothing:

(as2, bs2) = go Nothing ram1 (relativeTime + tA) as1 bs0

Without switching the clocks at the very start you'd need to check explicitly whether the "next" tick is going to be spent in the "other" part of go (or precalculate how many cycles you spend, but that's ugly IMO as you need to do it again and again).

@christiaanb
Copy link
Member

Alright, that makes sense.


-> Clock domB
-- ^ Clock for port B
-> Signal domB (Maybe a)
Copy link
Member

@rowanG077 rowanG077 Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have a new type for this or an Either. I assume if you pass in a Just here then it means you want to write and if you pass Nothing then a value is only read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rowanG077 What do you think of this new API?
2c0411e
It's isomorphic with Either but a bit more descriptive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leonschoorl Thx! This is exactly what I meant. There can be no misunderstanding now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be no misunderstanding now

Famous last words

@christiaanb
Copy link
Member

christiaanb commented Oct 22, 2021

For a next PR/Issue:

  • Investigate whether Vivado and Quartus still infer TDP RAM when you switch to same-port read-first (and perhaps no-change) behavior
    • If this doesn't work: create new vendor-specific primitives
  • Investigate whether Vivado and Quartus still infer TDP RAM when you add port-enables
    • If this doesn't work: create new vendor-specific primitives
  • Create configurable Haskell models so you can switch between same-port ReadFirst, WriteFirst (and perhaps NoChange)
  • If we need to create vendor-specific primitives: created Haskell TDP RAM with a configurable Vendor

@christiaanb christiaanb marked this pull request as ready for review October 25, 2021 07:08
@lmbollen lmbollen force-pushed the dual-port-blockram branch 3 times, most recently from dae7e0c to 38a9b76 Compare October 25, 2021 08:45
christiaanb
christiaanb previously approved these changes Oct 25, 2021
Copy link
Member

@christiaanb christiaanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some minor comments

clash-prelude/src/Clash/Explicit/BlockRam.hs Outdated Show resolved Hide resolved
tests/shouldwork/Signal/DualBlockRam0.hs Outdated Show resolved Hide resolved
clash-prelude/src/Clash/Explicit/BlockRam.hs Show resolved Hide resolved
@christiaanb christiaanb dismissed their stale review October 25, 2021 11:18

I want an implicitly clocked version as well.

Copy link
Member

@christiaanb christiaanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see an implicitly same-clocked counterpart of Clash.Explicit.BlockRam.trueDualPortBlockRam in Clash.Prelude.BlockRam

Copy link
Member

@christiaanb christiaanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the minor comments.


{-# LANGUAGE Trustworthy #-}

{-# OPTIONS_GHC -fplugin GHC.TypeLits.KnownNat.Solver #-}
{-# OPTIONS_HADDOCK show-extensions #-}

--Prevent generation of eta port names for trueDualPortBlockRam
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--Prevent generation of eta port names for trueDualPortBlockRam
-- Prevent generation of eta port names for trueDualPortBlockRam

Comment on lines 840 to 854
trueDualPortBlockRam ::
forall nAddrs dom a .
( HasCallStack
, KnownNat nAddrs
, HiddenClock dom
, NFDataX a
)
=> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port A
-> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port B
-> (Signal dom a, Signal dom a)
-- ^ Outputs data on /next/ cycle. When writing, the data written
-- will be echoed. When reading, the read data is returned.
trueDualPortBlockRam inA inB = E.trueDualPortBlockRam hasClock hasClock inA inB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trueDualPortBlockRam ::
forall nAddrs dom a .
( HasCallStack
, KnownNat nAddrs
, HiddenClock dom
, NFDataX a
)
=> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port A
-> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port B
-> (Signal dom a, Signal dom a)
-- ^ Outputs data on /next/ cycle. When writing, the data written
-- will be echoed. When reading, the read data is returned.
trueDualPortBlockRam inA inB = E.trueDualPortBlockRam hasClock hasClock inA inB
trueDualPortBlockRam ::
#ifdef CLASH_MULTIPLE_HIDDEN
forall nAddrs dom1 dom2 a .
( HasCallStack
, KnownNat nAddrs
, HiddenClock dom
, NFDataX a
)
=> Signal dom1 (E.RamOp nAddrs a)
-- ^ ram operation for port A
-> Signal dom2 (E.RamOp nAddrs a)
-- ^ ram operation for port B
-> (Signal dom1 a, Signal dom2 a)
-- ^ Outputs data on /next/ cycle. When writing, the data written
-- will be echoed. When reading, the read data is returned.
trueDualPortBlockRam inA inB = E.trueDualPortBlockRam (hasClock @dom1) (hasClock @dom2) inA inB
#else
forall nAddrs dom a .
( HasCallStack
, KnownNat nAddrs
, HiddenClock dom
, NFDataX a
)
=> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port A
-> Signal dom (E.RamOp nAddrs a)
-- ^ ram operation for port B
-> (Signal dom a, Signal dom a)
-- ^ Outputs data on /next/ cycle. When writing, the data written
-- will be echoed. When reading, the read data is returned.
trueDualPortBlockRam inA inB = E.trueDualPortBlockRam hasClock hasClock inA inB
#endif

@@ -0,0 +1 @@
ADDED: Added basic support for true dual port block ram in WriteFirst mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ADDED: Added basic support for true dual port block ram in WriteFirst mode.
ADDED: Added support for true dual ported block ram:
* Implicitly clocked: `Clash.Prelude.BlockRam.trueDualPortBlockRam` and
* Explicitly clocked: `Clash.Explicit.BlockRam.trueDualPortBlockRam`
Any values that's being written on a particular port is also the value that will be read on that port, i.e. the same-port read/write behavior is: WriteFirst. For mixed port read/write, when both ports have the same address, when there is a write on the port A, the output of port B is undefined, and visa versa.

Comment on lines 1124 to 1126
-- | Produces vendor-agnostic HDL that will be inferred as a true, dual port
-- block ram. It has write-first (or "new data") behavior, no clock enables,
-- no byte enables, and is symmetric in its data ports.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- | Produces vendor-agnostic HDL that will be inferred as a true, dual port
-- block ram. It has write-first (or "new data") behavior, no clock enables,
-- no byte enables, and is symmetric in its data ports.
-- | Produces vendor-agnostic HDL that will be inferred as a true, dual port
-- block ram. Any values that's being written on a particular port is also the
-- value that will be read on that port, i.e. the same-port read/write behavior
-- is: WriteFirst. For mixed port read/write, when both ports have the same
-- address, when there is a write on the port A, the output of port B is
-- undefined, and visa versa.

clash-prelude/src/Clash/Prelude/BlockRam.hs Show resolved Hide resolved
@alex-mckenna
Copy link
Contributor

@lmbollen You've set off the end of line whitespace check in these places:

./clash-prelude/src/Clash/Explicit/BlockRam.hs:389:{-# OPTIONS_GHC -fno-do-lambda-eta-expansion #-} 
./tests/shouldwork/Signal/DualBlockRamDefinitions.hs:25:

@lmbollen lmbollen force-pushed the dual-port-blockram branch 4 times, most recently from 77748ac to 1cb4ac6 Compare October 26, 2021 11:33
Add SystemVerilog / Verilog / VHDL primitives

dualport bram: Move fast/slow domain ordering into the the simulation model

This way the domain swapping never ends up in the HDL.
Because the HDL primitives don't care about which domain is faster or
slower.

Add NOINLINEd wrapper around trueDualPortBlockRam#

Vivado doesn't like to see bitslices in the path to the data input.

Allow non-power-of-two sized rams

Added read / write tests, conflict tests and non multiple clock frequency tests.
@christiaanb christiaanb merged commit fdfef78 into master Oct 27, 2021
@christiaanb christiaanb deleted the dual-port-blockram branch October 27, 2021 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants