-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial evaluation #950
Comments
In a mildly annoying turn of events, the most recent This should be removed as soon as |
Update: Another paper, Taming Code Explosion in Supercompilation, seems to offer a reasonable approach to reducing the size of code after supercompilation. It might be adaptable to reducing circuit duplication in designs. |
Update: the latest PR for this issue (#1288) provides the common semantics for the evaluator, going all the way to beta-normal eta-long form (NF). However, it should be noted that this is not the full partial evaluator as intended, but simply a good place to stop and get feedback. As I see it, there are three problems to be solved before the level of whole program optimization we're after is achieved:
|
Clash's existing compilation method of exhaustive rewriting, while significantly improved over the years, turns out to be a large performance bottleneck; especially when forced to unroll loops. Specifically, it's the successive traversal of the expression ADT to basically achieve compile-time evaluation in a very round-about way that is hurting us the most.
The idea is to repurpose Clash's existing WHNF-evaluator to an NF evaluator as a first pass of the compiler. Given that it was specifically designed for compile-time evaluation, it is much faster at this job than the exhaustive rewrite system. An early prototype of this approach did, however, show a lot of circuit duplication. So the final implementation should:
Various thoughts
Value
should containTick
ed andCast
ed values, this way the evaluator can preserve both. Ticks must be maintained because they contain register and instance names. Adding these values means we have to update the primitive evaluation rules (https://github.com/clash-lang/clash-compiler/blob/master/clash-ghc/src-ghc/Clash/GHC/Evaluator.hs), as they need to be able to look through ticks (alacoreView
). Additionally, some casts may have to be translated toPrim
s, such as casts fromInteger
toUnsigned
clash-compiler/clash-lib/src/Clash/Normalize/Strategy.hs
Line 57 in 9624b40
appProp
: dropcaseLet
: dropcaseCon
: dropcaseCase
: drop, as mentioned earlier, is essential part of PEcaseElemNonReachable
: drop, but fold functionality into PEelemExistentials
: drop, but fold functionality into PEinlineNonRep
: dropinlineOrLiftNonRep
: droptypeSpec
: keepnonRepSpec
: keepetaExpansionTL
: keepnonRepANF
: maybe keep; seemakeANF
bindConstantVar
: dropconstantSpec
: keepmakeANF
: maybe keep. The "official" normal-form as described in Baaij (2015) prescribes ANF; but we've already started to deviate from that quite a bit. Some parts of theNetlist
stage still expect partial ANF (e.g. RHS of an alternative must be a variable if the case-expression is a projection). So perhaps we can reducemakeANF
to something simpler so that just theseNetlist
expectations are met.makeANF
is something that also enabled oursimpleCSE
pass, but when PE will do CSE, we'll dropsimpleCSE
.deadCode
: maybe keep, let's see if we make it part of PE (e.g. do GC on the evaluator's heap)topLet
: keeprecToLetRec
: keep; this might be another reason to keep fullmakeANF
, asrecToLetRec
is expecting expression in the "official" normal-form of Baaij (2015)inlineWorkFree
: dropinlineSmall
: dropsimpleCSE
: drop, CSE should become part of PEreduceConst
: dropreduceNonRepPrim
: drop; PE should be unrolling primitives in generalcaseFlat
: keepdisjointExpressionConsolidation
: keepremoveUnusedExpr
: keepinlineCleanup
: keepflattenLet
: keepsplitCastWork
: drop, PE should handle castsinlineCast
: dropcaseCast
: drop, PE should propagate castsletCast
: drop, PE should propagate castseliminateCastCast
: maybe keep, see if functionality can be folded into PEargCastSpec
: maybe keep, seeeliminateCastCast
etaExpandSyn
, keepappPropFast
, dropseperateArguments
, keepxOptimize
, drop; make part of PETrie
fromTerm
toHeapPointer
; so we can look up whether equal/equivalent partially evaluated expressions already live on the heap before we start evaluating them, and if so, just replace it to a reference heap variable. Don't forget to put an update frame on the stack, so that after evaluation, the expression is replaced by the heap variable reference again. We can use e.g. http://hackage.haskell.org/package/generic-trie to get us a start at aTrie
; and only roll our own if it turns out to be the bottleneck.register
-like things, so we must not forget to addsetName
ticks based on the current closest let-binder when we put arguments on the heap to defer evaluation. Either that, or give the heap-binder the sameString
part as the closest let-binding.Relevant literature
Remarks with regards to the above literate: their QOR metrics talk about allocation and code size, which don't really apply to our use case; we care about eventual circuit size. This means we should make different trade-offs then the above literature, but what's nice is that they do point out where such trade-offs/decisions can be made. Additionally, we should expect that we get to unroll all recursion; any recursion that cannot be fully unrolled basically isn't a structural description of a circuit (which is the view that Clash has on Haskell programs). We have to check how this aspect will influence our termination methods: I think we still should have termination measures in place, Clash shouldn't loop forever in case it's given an incorrect structural hardware description.
Hopefully fixes issues:
The text was updated successfully, but these errors were encountered: