Merge pull request #14 from David-Durst/throughputRefactor
Refactor ThroughputModifications, with responses to Kayvon's suggestions
David-Durst committed Aug 6, 2018
2 parents 99e3eb4 + c6a3295 commit 93f4e4f
Showing 11 changed files with 565 additions and 332 deletions.
14 changes: 8 additions & 6 deletions src/Core/Aetherling/Analysis/Latency.hs
@@ -1,6 +1,8 @@
{-|
Module: Aetherling.Analysis.Latency
Description: Determines the initial latency for how long it takes for a
Description: Compute latency of Aetherling ops
Determines the initial latency for how long it takes for a
pipelined module to receive input, and the max combinational path
for the highest latency, single cycle part of the circuit.
-}
@@ -54,11 +56,11 @@ initialLatency (ArrayReshape _ _) = 1
initialLatency (DuplicateOutputs _ _) = 1

initialLatency (MapOp _ op) = initialLatency op
initialLatency (ReduceOp par numComb op) | par `mod` numComb == 0 && isComb op = 1
initialLatency (ReduceOp par numComb op) | par `mod` numComb == 0 = initialLatency op * (ceilLog par)
initialLatency (ReduceOp par numComb op) =
initialLatency (ReduceOp numTokens par op) | par == numTokens && isComb op = 1
initialLatency (ReduceOp numTokens par op) | par == numTokens = initialLatency op * (ceilLog par)
initialLatency (ReduceOp numTokens par op) =
-- pipelining means only need to wait on latency of tree first time
reduceTreeInitialLatency + (numComb `ceilDiv` par) * (initialLatency op + registerInitialLatency)
reduceTreeInitialLatency + (numTokens `ceilDiv` par) * (initialLatency op + registerInitialLatency)
where
reduceTreeInitialLatency = initialLatency (ReduceOp par par op)
-- op adds nothing if it's combinational, its CPS otherwise
@@ -127,7 +129,7 @@ maxCombPath (MapOp _ op) = maxCombPath op
maxCombPath (ReduceOp par _ op) | isComb op = maxCombPath op * ceilLog par
-- since connecting each op to a copy, and all are duplicates,
-- maxCombPath is either internal to each op, or from combining two of them
maxCombPath (ReduceOp par numComb op) = max (maxCombPath op) maxCombPathFromOutputToInput
maxCombPath (ReduceOp numTokens par op) = max (maxCombPath op) maxCombPathFromOutputToInput
where
-- since same output goes to both inputs, just take max of input comb path
-- plus output path as that is max path
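
The ReduceOp latency cases in the hunk above can be exercised in isolation. The following is a minimal sketch, not the Aetherling code: the Op type is cut down to two hypothetical leaves (Comb and Seq), ceilDiv and ceilLog are assumed to be ceiling division and ceiling log base 2, and registerInitialLatency = 1 is an assumption, since none of those definitions appear in this diff. It uses the new `ReduceOp numTokens par op` argument order.

```haskell
-- Simplified stand-in for Aetherling's Op; only what this sketch needs.
data Op
  = Comb                 -- hypothetical combinational leaf op
  | Seq Int              -- hypothetical non-combinational leaf with a given latency
  | ReduceOp Int Int Op  -- numTokens, par, reduced op (new argument order)

isComb :: Op -> Bool
isComb Comb = True
isComb (Seq _) = False
isComb (ReduceOp numTokens par op) = par == numTokens && isComb op

-- Assumed helper: integer division rounding up.
ceilDiv :: Int -> Int -> Int
ceilDiv n d = (n + d - 1) `div` d

-- Assumed helper: ceiling of log base 2, i.e. the depth of a binary tree
-- with n leaves.
ceilLog :: Int -> Int
ceilLog n = length (takeWhile (< n) (iterate (* 2) 1))

initialLatency :: Op -> Int
initialLatency Comb = 1
initialLatency (Seq l) = l
initialLatency (ReduceOp numTokens par op)
  -- fully parallel, combinational op: the whole tree fits in one clock
  | par == numTokens && isComb op = 1
  -- fully parallel, non-combinational op: one op latency per tree level
  | par == numTokens = initialLatency op * ceilLog par
  -- partially parallel: tree latency once, then one op-plus-register
  -- step for each group of par tokens
  | otherwise =
      treeLatency
        + (numTokens `ceilDiv` par) * (initialLatency op + registerInitialLatency)
  where
    treeLatency = initialLatency (ReduceOp par par op)
    registerInitialLatency = 1 -- assumption, not taken from the diff
```

Loaded in GHCi, `initialLatency (ReduceOp 8 4 Comb)` evaluates to 5 under these assumptions: 1 for the fully parallel tree over 4 inputs, plus 2 groups of 4 tokens at (1 + 1) clocks each.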
3 changes: 1 addition & 2 deletions src/Core/Aetherling/Analysis/Metrics.hs
@@ -1,7 +1,6 @@
{-|
Module: Aetherling.Analysis.Metrics
Description: This provides helper functions and metrics like space used in the
analysis of ops
Description: Types and helper functions for analyzing Aetherling ops
-}
module Aetherling.Analysis.Metrics where
import Aetherling.Operations.Types
18 changes: 10 additions & 8 deletions src/Core/Aetherling/Analysis/PortsAndThroughput.hs
@@ -1,6 +1,8 @@
{-|
Module: Aetherling.Analysis.PortsAndThroughput
Description: Determines the input and output ports of an op, the clocks per
Description: Compute interfaces of Aetherling ops
Determines the input and output ports of an op, the clocks per
sequence used to process the inputs on those ports, and the resulting throughput
-}
module Aetherling.Analysis.PortsAndThroughput where
@@ -53,12 +55,12 @@ inPorts (DuplicateOutputs _ op) = inPorts op

inPorts (MapOp par op) = renamePorts "I" $ liftPortsTypes par (inPorts op)
-- take the first port of the op and duplicate it par times, don't duplicate both
-- ports of reducer as reducing numComb things in total, not per port
inPorts (ReduceOp par numComb op) = renamePorts "I" $ map scaleSeqLen $
-- ports of reducer as reducing numTokens things in total, not per port
inPorts (ReduceOp numTokens par op) = renamePorts "I" $ map scaleSeqLen $
liftPortsTypes par $ portToDuplicate $ inPorts op
where
scaleSeqLen (T_Port name origSLen tType pct) =
T_Port name (origSLen * (numComb `ceilDiv` par)) tType pct
T_Port name (origSLen * (numTokens `ceilDiv` par)) tType pct
portToDuplicate ((T_Port name sLen tType pct):_) = [T_Port name sLen tType pct]
portToDuplicate [] = []

@@ -236,16 +238,16 @@ clocksPerSequence (MapOp _ op) = cps op
-- reduce needs to get a complete sequence. If less than parallel,
-- need to write to register all but last, if fully parallel or more,
-- reduce is combinational
clocksPerSequence (ReduceOp par numComb op) |
isComb op = combinationalCPS * (numComb `ceilDiv` par)
clocksPerSequence (ReduceOp numTokens par op) |
isComb op = combinationalCPS * (numTokens `ceilDiv` par)
-- Why not including tree height? Because can always pipeline.
-- Putting inputs in every clock where can accept inputs.
-- Just reset register every numComb/par if not fully parallel.
-- Just reset register every numTokens/par if not fully parallel.
-- What does it mean to reduce a linebuffer?
-- can't. Can't reduce anything with a warmup as this will create
-- an asymmetry between inputs and outputs leading to horrific tree
-- structure
clocksPerSequence (ReduceOp par numComb op) = cps op * (numComb `ceilDiv` par)
clocksPerSequence (ReduceOp numTokens par op) = cps op * (numTokens `ceilDiv` par)

clocksPerSequence (NoOp _) = combinationalCPS
clocksPerSequence (Underutil denom op) = denom * cps op
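
Both the port scaling in inPorts and the ReduceOp case of clocksPerSequence rest on the same factor, numTokens `ceilDiv` par. A minimal sketch of that arithmetic follows. The function names reduceCPS and scaledSeqLen are hypothetical, and ceilDiv is assumed to be integer division rounding up, since its definition is not part of this diff.

```haskell
-- Assumed helper: integer division rounding up.
ceilDiv :: Int -> Int -> Int
ceilDiv n d = (n + d - 1) `div` d

-- Clocks a reduce needs to consume one full sequence, given the clocks per
-- sequence of the reduced op (opCPS), the sequence length (numTokens), and
-- the number of tokens accepted per clock (par).
reduceCPS :: Int -> Int -> Int -> Int
reduceCPS opCPS numTokens par = opCPS * (numTokens `ceilDiv` par)

-- Sequence length seen on each input port, scaled the way scaleSeqLen does.
scaledSeqLen :: Int -> Int -> Int -> Int
scaledSeqLen origSLen numTokens par = origSLen * (numTokens `ceilDiv` par)
```

For example, `reduceCPS 1 10 4` is 3: ten tokens at four per clock need three clocks, with the last clock only partly used.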
10 changes: 5 additions & 5 deletions src/Core/Aetherling/Analysis/Space.hs
@@ -69,16 +69,16 @@ space (MapOp par op) = (space op) |* par
-- area of reduce is area of reduce tree, with area for register for partial
-- results and counter for tracking iteration time if input is sequence of more
-- than what is passed in one clock
space (ReduceOp par numComb op) | par == numComb = (space op) |* (par - 1)
space rOp@(ReduceOp par numComb op) =
space (ReduceOp numTokens par op) | par == numTokens = (space op) |* (par - 1)
space rOp@(ReduceOp numTokens par op) =
reduceTreeSpace |+| (space op) |+| (registerSpace $ map pTType $ outPorts op)
|+| (counterSpace $ numComb * (denominator opThroughput) `ceilDiv` (numerator opThroughput))
|+| (counterSpace $ numTokens * (denominator opThroughput) `ceilDiv` (numerator opThroughput))
where
reduceTreeSpace = space (ReduceOp par par op)
-- need to be able to count all clocks in steady state, as that is when
-- will be doing reset every nth
-- thus, divide numComb by throughput in steady state to get clocks for
-- numComb to be absorbed
-- thus, divide numTokens by throughput in steady state to get clocks for
-- numTokens to be absorbed
-- only need throughput from first port as all ports have same throughput
(PortThroughput _ opThroughput) = portThroughput op $ head $ inPorts op
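
The counter sizing described in the comments above divides numTokens by the steady-state throughput. A minimal sketch of that division follows, assuming the throughput is a `Ratio Int` in tokens per clock and that ceilDiv is integer division rounding up. The name clocksToAbsorb is hypothetical. The sketch parenthesizes the product explicitly: a backticked function with no fixity declaration defaults to infixl 9 in Haskell, which binds tighter than `*`, and whether Aetherling declares a fixity for ceilDiv is not visible in this diff.

```haskell
import Data.Ratio (Ratio, denominator, numerator, (%))

-- Assumed helper: integer division rounding up.
ceilDiv :: Int -> Int -> Int
ceilDiv n d = (n + d - 1) `div` d

-- Clocks needed to absorb numTokens tokens at a steady-state throughput
-- given in tokens per clock: ceil (numTokens / throughput).
clocksToAbsorb :: Int -> Ratio Int -> Int
clocksToAbsorb numTokens throughput =
  (numTokens * denominator throughput) `ceilDiv` numerator throughput

-- Two sample throughputs: four tokens per clock, and one token every two clocks.
fourPerClock, halfPerClock :: Ratio Int
fourPerClock = 4 % 1
halfPerClock = 1 % 2
```

With these definitions, `clocksToAbsorb 10 fourPerClock` is 3 and `clocksToAbsorb 3 halfPerClock` is 6. Under the default fixity, the unparenthesized form would instead group as numTokens * (denominator `ceilDiv` numerator), which gives 10 for the first case.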

22 changes: 5 additions & 17 deletions src/Core/Aetherling/Operations/AST.hs
@@ -1,26 +1,14 @@
{-|
Module: Aetherling.Operations.AST
Description: Provides the Aetherling Abstract Syntax Tree (AST) and functions
Description: Aetherling's AST
Provides the Aetherling Abstract Syntax Tree (AST) and functions
for identifying errors in that tree.
-}
module Aetherling.Operations.AST where
import Aetherling.Operations.Types

{-|
The Aetherling Operations. These are split into four groups:
1. Leaf, non-modifiable rate - these are arithmetic, boolean logic,
and bit operations that don't contain any other ops and don't have
a parameter for making them run with a larger or smaller throughput.
2. Leaf, modifiable rate - these are ops like linebuffers,
and space-time type reshapers that have a parameter for changing their
throughput and typically are not mapped over to change their throughput.
These ops don't have child ops
3. Parent, non-modifiable rate - these ops like composeSeq and composePar have
child ops that can have their throughputs' modified, but the parent
op doesn't have a parameter that affects throughput
4. Parent, modifiable rate - map is the canonical example. It has child ops
and can have its throughput modified by changing parallelism.
-}
-- | The operations that can be used to create dataflow DAGs in Aetherling
data Op =
-- LEAF OPS
Add TokenType
@@ -77,7 +65,7 @@ data Op =

-- HIGHER ORDER OPS
| MapOp {mapParallelism :: Int, mappedOp :: Op}
| ReduceOp {reduceParallelism :: Int, reduceNumCombined :: Int, reducedOp :: Op}
| ReduceOp {reduceNumTokens :: Int, reduceParallelism :: Int, reducedOp :: Op}

-- TIMING HELPERS
| NoOp [TokenType]
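
Because this change swaps the order of two Int fields, positional call sites written for the old `ReduceOp par numComb op` order still typecheck after the refactor but mean something different. A minimal sketch of constructing the op through its record fields, which does not depend on argument order, is shown here. The Op type is reduced to a single hypothetical leaf plus ReduceOp with the field names from this commit, and sumEight is an illustrative name.

```haskell
-- Cut-down stand-in for the Aetherling Op type.
data Op
  = Add -- hypothetical leaf; the real Add takes a TokenType
  | ReduceOp {reduceNumTokens :: Int, reduceParallelism :: Int, reducedOp :: Op}
  deriving (Show, Eq)

-- Reduce 8 tokens, 4 per clock. Naming the fields keeps the call site
-- correct even if the field order changes again.
sumEight :: Op
sumEight =
  ReduceOp {reduceNumTokens = 8, reduceParallelism = 4, reducedOp = Add}
```

The positional equivalent under the new order is `ReduceOp 8 4 Add`.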
52 changes: 50 additions & 2 deletions src/Core/Aetherling/Operations/Properties.hs
@@ -1,6 +1,8 @@
{-|
Module: Aetherling.Operations.Properties
Description: Describes properties that are intrinsic to operators that do not
Description: Properties of Aetherling ops that don't require analysis
Describes properties that are intrinsic to operators that do not
require any analysis, like if the operator has a combinational path from at
least one input port to one output port.
-}
@@ -41,7 +43,7 @@ isComb (ArrayReshape _ _) = True
isComb (DuplicateOutputs _ _) = True

isComb (MapOp _ op) = isComb op
isComb (ReduceOp par numComb op) | par == numComb = isComb op
isComb (ReduceOp numTokens par op) | par == numTokens = isComb op
isComb (ReduceOp _ _ op) = False

isComb (NoOp tTypes) = True
@@ -52,3 +54,49 @@ isComb (Delay _ op) = False
isComb (ComposePar ops) = length (filter isComb ops) > 0
isComb (ComposeSeq ops) = length (filter isComb ops) > 0
isComb (Failure _) = True

hasInternalState :: Op -> Bool
hasInternalState (Add t) = False
hasInternalState (Sub t) = False
hasInternalState (Mul t) = False
hasInternalState (Div t) = False
hasInternalState (Max t) = False
hasInternalState (Min t) = False
hasInternalState (Ashr _ t) = False
hasInternalState (Shl _ t) = False
hasInternalState (Abs t) = False
hasInternalState (Not t) = False
hasInternalState (And t) = False
hasInternalState (Or t) = False
hasInternalState (XOr t) = False
hasInternalState Eq = False
hasInternalState Neq = False
hasInternalState Lt = False
hasInternalState Leq = False
hasInternalState Gt = False
hasInternalState Geq = False
hasInternalState (LUT _) = False

-- this is meaningless for units that don't have both an input and an output
hasInternalState (MemRead _) = True
hasInternalState (MemWrite _) = True
hasInternalState (LineBuffer _ _ _ _ _) = True
hasInternalState (Constant_Int _) = False
hasInternalState (Constant_Bit _) = False

hasInternalState (SequenceArrayRepack _ _ _) = True
hasInternalState (ArrayReshape _ _) = False
hasInternalState (DuplicateOutputs _ _) = False

hasInternalState (MapOp _ op) = hasInternalState op
hasInternalState (ReduceOp numTokens par op) | par == numTokens = hasInternalState op
hasInternalState (ReduceOp _ _ op) = True

hasInternalState (NoOp tTypes) = False
hasInternalState (Underutil denom op) = hasInternalState op
-- since pipelined, this doesn't affect clocks per stream
hasInternalState (Delay _ op) = hasInternalState op

hasInternalState (ComposePar ops) = length (filter hasInternalState ops) > 0
hasInternalState (ComposeSeq ops) = length (filter hasInternalState ops) > 0
hasInternalState (Failure _) = True
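
The composite cases of hasInternalState follow one rule: a composition has state if any child does, and a ReduceOp has state of its own unless it is fully parallel. A minimal sketch of that rule follows, over a cut-down Op type with hypothetical Stateless and Stateful leaves. It uses `any p xs` from the Prelude, which returns the same result as `length (filter p xs) > 0`.

```haskell
-- Cut-down stand-in for the Aetherling Op type.
data Op
  = Stateless           -- hypothetical leaf without internal state, like Add
  | Stateful            -- hypothetical leaf with internal state, like LineBuffer
  | ReduceOp Int Int Op -- numTokens, par, reduced op
  | ComposePar [Op]
  | ComposeSeq [Op]

hasInternalState :: Op -> Bool
hasInternalState Stateless = False
hasInternalState Stateful = True
hasInternalState (ReduceOp numTokens par op)
  -- fully parallel reduce is a plain tree of ops
  | par == numTokens = hasInternalState op
  -- otherwise it holds partial results between clocks
  | otherwise = True
hasInternalState (ComposePar ops) = any hasInternalState ops
hasInternalState (ComposeSeq ops) = any hasInternalState ops
```

Here `hasInternalState (ReduceOp 8 4 Stateless)` is True even though the reduced op is stateless, because the partially parallel reduce keeps partial results.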
4 changes: 3 additions & 1 deletion src/Core/Aetherling/Operations/Types.hs
@@ -1,6 +1,8 @@
{-|
Module: Aetherling.Operations.Types
Description: Describes Aetherling's type system and the ports of operations
Description: Type system for interfaces of Aetherling's Ops
Describes Aetherling's type system and the ports of operations
that accept/emit tokens of those types. PortThroughput is also defined here
as it is a type used for comparing ports when composing sequences of ops.
-}