-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VTA Chisel Wide memory interface. #32
Conversation
* Added SyncQueue with tests - Implementation uses sync memory to implement larger queues. * AXI 64/128/256/512 data bits support by AXIParams->dataBits A wide implementation of load/store is used when AXI interface data width is larger than number of bits in a tesor. Instructions are stored as 64bit tensors to allow 64bit address alignment * TensorLoad is modified to replace all VME load operations. Multiple simultaneous requests can be generated. Load is pipelined and separated from request generation. * TensorStore -> TensorStoreNarrowVME TensorStoreWideVME. The narrow one is the original one * TensorLoad -> TensorLoadSimple (original) TensorLoadWideVME TensorStoreNarrowVME * LoadUop -> LoadUopSimple is the original one. The new one is based on TensorLoad * Fetch -> FetchVME64 FetchWideVME. Reuse communication part from TensorLoad. * DPI intreface changed to transfer more than 64bit. svOpenArrayHandle is used. tsim library compilation now requires verilator * Compute is changed to use TensorLoad style of load uop. * VME changed to generate/queue/respond to multiple simultaneous load requests
Adding @tmoreau89 . I need help on deciding how to proceed further. Checks flow fails because VTA DPI modification require changes to TVM. A Verilator include directory is required in order to build new TSIM DPI communication library by TVM make flow. Those changes are needed to transfer more than 64bit data as sv arrays. There is a pull request in TVM "VTA cmake change to include Verilator header for building tsim library" #8797. It fails checks as no Verilator found by cmake. |
Thank you @aasorokiin - CC-ing @vegaluisjose |
…pache#33) * Fix Makefile to use Chisel -o instead of top name and .sv instead of .v * fix reset to reset.asBool * fix SyncQueue to deprecated module.io * fix toBools to asBools
I think I found the issue of the new chisel version, I just submitted #34 that should fix it. |
#34 was just merged, @aasorokiin please rebase |
…ide verilator array acces fuctionality and avoid compilation warnings
Changes were addressed. TSIM ci also should pass now. @vegaluisjose |
could you push en empty commit to see if we can re-trigger CI? Thanks! @aasorokiin |
I tried that, but checks keep failing. |
Oh I see, what is going on now. Can you do this docker update on a separate PR? we will merge that before this one, so it can take the changes. |
Hey @aasorokiin , now that we have #36 merged...could you please rebase to see how this one goes? |
Is there any reason tsim test run got skipped? @vegaluisjose |
Oh yeah, could you please remove the following? https://github.com/apache/tvm-vta/blob/main/tests/scripts/docker_bash.sh#L70-L74 I believe we fixed this already |
@@ -211,7 +211,7 @@ class AXIMaster(params: AXIParams) extends AXIBase(params) { | |||
ar.bits.qos := params.qosConst.U | |||
ar.bits.region := params.regionConst.U | |||
ar.bits.size := params.sizeConst.U | |||
ar.bits.id := params.idConst.U | |||
//do not override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, what is up with ar.bits.id
and w.bits.strb
here? by default the strobe mask is all ones. Also, ar.bits.id
is not being assigned now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id/strb should be handled by VME now. VME can send multiple read requests with different IDs. Strb is used by TensorStore write to mark data of valid tensors in case when AXI data is wider than tensor data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but why changing the defaults? Isn't better to let these modules to customize (override) these values?
Also, if it is done for reads ar
why it is not done for writes aw
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, those are not defaults. They were set in setConst. That expected no change of value.
VME write path did not get changed other than supporting wider data. It was not identified critical for hiding memory access latency. DE10 config AXI burst of 16 pulses was short enough to make hiding burst-to-burst read latency important. Now even 1 pulse bursts can potentially transfer read data without gaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the original intent of this method, back when we had support for one small board/configuration, was to use this as a mechanism to derive other params in the protocol. However, things have obviously evolved quite a bit since then.
What if we do the same for both reads/writes in terms of id
so future contributors know that they have to handle id
and strb
?
Maybe we can even add comments on top of that setConst
method saying what they should implement and we can point to that documentation later if needed.
For the record, these suggestions are intended to help future contributions. We really appreciate all your efforts on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved write id to VME. Added comments. Check PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright this looks like ready to go on my end, thanks a lot @aasorokiin !
do you have something to add? @tmoreau89
hardware/chisel/src/test/scala/unittest/SyncQueue2PortMemTest.scala
Outdated
Show resolved
Hide resolved
Thanks @vegaluisjose for reviewing the PR and ensuring that the CI tests go green, much appreciated. While going through the PR there were a few things that I'd like to address before we merge:
|
Is there anything else expected form me? @tmoreau89 @vegaluisjose |
Hi @aasorokiin - thanks for the ping and addressing the comments. |
Thanks @vegaluisjose @aasorokiin and Intel team - the PR has been merged. |
VTA modification for parametrizable AXI data size.
Performance change:
Numbers are based on test_benchmark_topi_conv2d.py limited to resnet-18.C2
Baseline 64bit data tsim run: 192M cycles. New 64bit data transfer: 97M cycles.
128bit data: 58M cycles. 256bit data: 39M cycles. 512bit data: 29M cycles.
Code changes:
Code contributions to this PR were made by the following individuals (in alphabetical order): @suvadeep89, @stevenmburns, @pasqoc, @adavare, @sjain12intel, @aasorokiin, and @zhenkuny.