DNN on Mobile Platforms - Good Reads



  1. Efficient Processing of Deep Neural Networks: A Tutorial and Survey [arXiv '17]

Lecture Slides: Hardware architectures for DNN

Opportunities for innovation

Algorithmic Optimizations

 - Kernel Computation - Convert the convolution operation to matrix multiplication
 - Computational Transforms 
      - Reducing Multiplications
            - Gauss's multiplication algorithm
            - Strassen
            - Winograd - 36 multiplications -> 16 multiplications (3x3 filter, 2x2 output tile)
      - FFT
           - Reduces convolution cost from O(No^2 Nf^2) to O(No^2 log No) for an No x No output and an Nf x Nf filter; fewer multiplications, but much more memory space and bandwidth
      - cuDNN
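
As a minimal sketch of how a Winograd transform trades multiplications for additions, the example below implements the 1D case F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. Nesting the same idea in 2D for a 3x3 filter and a 2x2 output tile gives 16 multiplications instead of 36. The function names are illustrative and not taken from the tutorial or from cuDNN.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap 1D filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using only 4 multiplications (m1..m4)."""
    # Filter transform: depends only on the filter, so it can be
    # computed once and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only) followed by 4 multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_f23(tile, taps), winograd_f23(tile, taps))
```

The saving comes at the cost of extra additions and a filter-specific transform, which is why the approach is usually applied to small fixed filter sizes.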

Solve Memory access bottlenecks

 - Data reuse
      - Convolutional reuse, Fmap reuse, filter reuse
 - Local Accumulation

DNN Model and hardware co-design

 - Reduce size of operands for storage/ compute
      - Floating point -> Fixed point
      - Bit width reduction
      - Non-linear quantization
 - Reduce Number of operations for storage/ compute
      - Exploit activation statistics (Compression)
      - Network pruning
      - Compact network architectures - break large convolution layers into series of small convolutional layers
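
The operation-count side of co-design can be illustrated with a small sketch: magnitude pruning zeroes small weights, ReLU zeroes negative activations, and a zero-skipping dot product then performs fewer multiplications. The threshold and the data values are made up for illustration, and the helper names are hypothetical.

```python
def prune(weights, threshold):
    """Magnitude pruning: set weights with |w| < threshold to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def relu(xs):
    """ReLU turns negative activations into zeros, adding sparsity."""
    return [max(0.0, x) for x in xs]


def sparse_dot(weights, activations):
    """Dot product that skips zero operands.

    Returns (result, number of multiplications actually performed).
    """
    total, mults = 0.0, 0
    for w, a in zip(weights, activations):
        if w == 0.0 or a == 0.0:
            continue  # a zero operand contributes nothing, so skip it
        total += w * a
        mults += 1
    return total, mults


if __name__ == "__main__":
    weights = [0.8, -0.05, 0.02, -0.6, 0.01, 0.4]   # made-up values
    acts = relu([1.0, -2.0, 3.0, 0.5, -1.0, 2.0])   # made-up values
    result, mults = sparse_dot(prune(weights, 0.1), acts)
    print(f"result={result:.2f}, multiplications={mults} of {len(weights)}")
```

Whether skipped operations turn into real speedups depends on the hardware: the irregular access pattern is exactly what makes sparsity hard to exploit on wide SIMD machines.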

CNN on embedded FPGA platforms

  1. Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware [ISVLSI '16]

    - Fixed-point implementation
    - Quantization strategy - the best radix position of the fixed-point data is chosen separately for each layer
    - The fixed-point adders and multipliers remain unchanged
    - Runtime-configurable hardware architecture
    - Uses the Caffe framework
  2. NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs [TRETS '17]

    - Convolution-Specific Processor (CSP) implemented on the PL side
       - Convolutional Engine
       - Programmable soft core - manages the execution of complex CNNs
    - 16-bit fixed point
    - Uses the Caffe framework
    - Frame rate - VGG-16 5.5 FPS and ResNet-18 6.6 FPS
    - Two CSPs can fit into next-generation UltraScale+ SoCs
  3. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network [FPGA '16]
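
The per-layer radix position idea noted for Angel-Eye can be sketched as a brute-force search: for each layer, try every fractional bit count and keep the one with the smallest quantization error. This is an illustrative assumption, not the procedure from the paper; the 8-bit width, the search range, and the sample values are all hypothetical.

```python
def quantize(x, total_bits, frac_bits):
    """Round x to signed fixed point with frac_bits fractional bits, saturating."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def best_frac_bits(values, total_bits=8, max_frac_bits=15):
    """Pick the radix position that minimizes the sum of squared errors."""
    def error(frac_bits):
        return sum((v - quantize(v, total_bits, frac_bits)) ** 2 for v in values)
    return min(range(0, max_frac_bits + 1), key=error)


if __name__ == "__main__":
    # Hypothetical per-layer data with very different dynamic ranges.
    layers = {
        "layer_a": [0.5, -0.25, 0.125, 0.9],
        "layer_b": [12.0, -30.5, 7.25, 45.0],
    }
    for name, values in layers.items():
        print(name, best_frac_bits(values))
```

Because only the interpretation of the bits changes between layers, the same integer adders and multipliers can be reused; the radix position only affects how results are shifted between layers.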

What's the best combination for mobile platforms? CPU + FPGA/ GPU/ CGRA/ ASIC?

  1. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Learning? News Article. Paper published in ACM Digital Library, February 2017


 - Superior power efficiency
 - Customizability
     - DNNs go deeper -> improved accuracy -> increased compute, memory bandwidth, and storage demands
     - Compact low-precision data types: 32-bit > 16-bit > 8-bit > ternary > binary
     - Sparsity (the presence of zeros) introduced by pruning and ReLU
     - Low precision + sparsity > irregular parallelism > difficult for GPUs > can exploit the extreme customizability of FPGAs

 - Weak floating-point performance - addressed with DSPs and compact low-precision data types

“The current ML problems using 32-bit dense matrix multiplication is where GPUs excel. We encourage other developers and researchers to join forces with us to reformulate machine learning problems to take advantage of the strength of FPGAs using smaller bit processing because FPGAs can adapt to shifts toward lower precision,”

  1. Wave Dataflow Processing Engine; A Dataflow Processing Chip for Training Deep Neural Networks; Solving the Performance Challenge For Machine Learning Acceleration


 - Wave DPU - targets server-based DNN training and inference
      - Very promising numbers for server-based systems
      - AlexNet inference at a rate of 962,000 fps

 - Superior performance, but becomes obsolete

Batching - Throughput vs Latency

  1. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys’18]
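
The batching trade-off can be made concrete with a toy cost model, assuming each batch pays a fixed overhead plus a per-image cost. The model and the millisecond figures are hypothetical and are not measurements from the paper above.

```python
def batch_stats(batch_size, fixed_ms, per_image_ms):
    """Toy model: one batch takes fixed_ms + batch_size * per_image_ms.

    Returns (latency in ms for the whole batch, throughput in images/s).
    """
    latency_ms = fixed_ms + batch_size * per_image_ms
    throughput_fps = batch_size * 1000.0 / latency_ms
    return latency_ms, throughput_fps


if __name__ == "__main__":
    for b in (1, 2, 4, 8, 16):
        latency, fps = batch_stats(b, fixed_ms=20.0, per_image_ms=5.0)
        print(f"batch={b:2d}  latency={latency:6.1f} ms  throughput={fps:6.1f} fps")
```

Under this model larger batches amortize the fixed overhead, so throughput rises, but every image in the batch waits for the whole batch to finish, so latency rises too.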

Technical Reference

ZCU102 driver install

Forum posts


ZCU102 runtime power measurement

Understand ChaiDNN conv HLS code

How to?

Create a caller graph of a given C/C++ project from its source files

doxygen configfile

Connect to the ZCU102 command window over UART

Before connecting USB UART cable to PC

ls /dev/ > dev_list_1.txt

After connecting USB UART cable to PC

ls /dev/ | diff --suppress... -y - dev_list_1.txt

Find correct interface ttyUSB0 or ttyUSB1

sudo screen -L /dev/ttyUSB0 115200
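
The before/after comparison of /dev above can also be sketched in Python as a set difference between two directory listings. The helper names are hypothetical, and the sketch assumes a Linux host where the adapter appears as a new entry under /dev.

```python
import os


def list_dev(path="/dev"):
    """Return the current set of entries under path."""
    return set(os.listdir(path))


def new_entries(before, after):
    """Entries present in the second listing but not in the first."""
    return sorted(set(after) - set(before))


def detect_new_devices(path="/dev"):
    """Interactive helper: snapshot, wait for the cable, snapshot again."""
    before = list_dev(path)
    input("Connect the USB UART cable, then press Enter... ")
    return new_entries(before, list_dev(path))
```

Calling `detect_new_devices()` from an interactive session should list the newly created entries, for example the ttyUSB interfaces to try with screen.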

How do I find all files containing specific text on Linux?

grep -rnw '/path/to/somewhere/' -e 'pattern'
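
In the grep command above, -r searches recursively, -n prints line numbers, and -w matches whole words only. A rough Python equivalent is sketched below, assuming the pattern is treated as literal text rather than a regular expression; the word-boundary handling only approximates grep's -w behaviour, and the function name is hypothetical.

```python
import os
import re


def grep_rnw(root, text):
    """Return (path, line_number, line) for whole-word matches of text under root."""
    word = re.compile(r"\b" + re.escape(text) + r"\b")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files from raising
                with open(path, "r", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if word.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return hits


if __name__ == "__main__":
    for path, lineno, line in grep_rnw(".", "pattern"):
        print(f"{path}:{lineno}:{line}")
```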

How to use ctags with vim?

In the project folder, run: ctags -R .

Add this line to ~/.vimrc : set tags=./tags,tags;$HOME

Flow of ChaiDNN

Editing software source

Editing hardware source

Copy files to SD Card

Running on board

