DNN_MP

DNN on Mobile Platforms - Good Reads

Papers

Survey

  1. Efficient Processing of Deep Neural Networks: A Tutorial and Survey [arXiv '17]

Lecture Slides: Hardware architectures for DNN

Opportunities for innovation

Algorithmic Optimizations

 - Kernel computation - convert the convolution operation into a matrix multiplication
 
 - Computational Transforms 
 
      - Reducing Multiplications
            - Gauss's multiplication algorithm
            - Strassen
            - Winograd - 2D F(2x2, 3x3): 36 multiplications -> 16 (1D F(2,3): 6 -> 4)
            
      - FFT
           - Reduces convolution multiplications from O(N0^2 Nf^2) to O(N0^2 log N0), for an N0 x N0 output and an Nf x Nf filter - less computation, but requires much more memory space and bandwidth
           
      - cuDNN
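
As a minimal sketch of how Winograd's transform trades multiplications for additions, the 1D F(2,3) case below produces two outputs of a 3-tap filter with 4 multiplications instead of 6. Nesting the same idea in two dimensions gives the F(2x2, 3x3) case (16 multiplications instead of 36). The function names are illustrative and not taken from any library.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, using 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using 4 data-dependent multiplications (Winograd F(2,3))."""
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # The four multiplications between transformed data and transformed filter.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Both functions should agree up to floating-point rounding, for example on `d = [1.0, 2.0, 3.0, 4.0]` and `g = [0.5, -1.0, 2.0]`.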

Solve Memory access bottlenecks

 - Data reuse
      - Convolutional reuse, Fmap reuse, filter reuse
      
 - Local Accumulation

DNN Model and hardware co-design

 - Reduce size of operands for storage/ compute
      - Floating point -> Fixed point
      - Bit width reduction
      - Non-linear quantization
      
 - Reduce Number of operations for storage/ compute
      - Exploit activation statistics (Compression)
      - Network pruning
      - Compact network architectures - break large convolution layers into series of small convolutional layers
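
The floating-point to fixed-point step listed above can be sketched as follows. This is a minimal illustration assuming signed two's-complement codes with saturation and round-half-up; the names `to_fixed` and `from_fixed` are hypothetical and do not come from any of the cited papers.

```python
import math


def to_fixed(x, total_bits, frac_bits):
    """Quantize a float to a signed fixed-point integer code."""
    scaled = math.floor(x * (1 << frac_bits) + 0.5)  # round half up
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))  # saturate instead of wrapping around


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer code back to a float."""
    return q / (1 << frac_bits)
```

Reducing the bit width shrinks storage but increases the rounding error: with these definitions, 0.3 is reproduced far more closely with 16 bits (12 fractional) than with 8 bits (4 fractional).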

CNN on embedded FPGA platforms

  1. Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware [ISVLSI '16]

    - Fixed-point implementation
    - Quantization strategy - the radix position of the fixed-point data is chosen separately for each layer
    - The fixed-point adders and multipliers themselves remain unchanged
    - Runtime-configurable hardware architecture
    - Uses the Caffe framework
    
  2. NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs [TRETS '17]

    - Convolution-Specific Processor (CSP) implemented on the PL side
       - Convolution engine
       - Programmable soft core - manages the execution of complex CNNs
    - 16-bit fixed point
    - Uses the Caffe framework
    - Frame rate - VGG-16: 5.5 FPS, ResNet-18: 6.6 FPS
    - Two CSPs can fit into next-generation UltraScale+ SoCs
    
  3. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network [FPGA '16]
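
The per-layer radix position mentioned for Angel-Eye can be illustrated with a brute-force search. This is only a sketch under assumptions: it tries every possible number of fractional bits and keeps the one with the smallest squared error, which is not necessarily the procedure used in the paper. The sample values and function names are hypothetical.

```python
import math


def quantize_roundtrip(x, total_bits, frac_bits):
    """Quantize x to signed fixed point (saturating) and convert it back."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, math.floor(x * scale + 0.5)))
    return q / scale


def best_frac_bits(values, total_bits):
    """Return the fractional-bit count with the smallest squared error."""
    best, best_err = 0, float("inf")
    for frac_bits in range(total_bits):
        err = sum(
            (v - quantize_roundtrip(v, total_bits, frac_bits)) ** 2
            for v in values
        )
        if err < best_err:  # ties keep the smaller fractional-bit count
            best, best_err = frac_bits, err
    return best
```

A layer with small values such as `[0.1, -0.2, 0.05]` ends up with many fractional bits, while a layer with a wide range such as `[10.5, -20.25, 3.0]` needs more integer bits, even though both use the same 8-bit arithmetic.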

What's the best combination for mobile platforms? CPU + FPGA/ GPU/ CGRA/ ASIC?

  1. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Learning? (news article; paper published in the ACM Digital Library, February 2017)

FPGA

 - Superior power efficiency
 - Customizability
     - Deeper DNNs -> improved accuracy -> higher compute, memory bandwidth, and storage demands
     - Compact low-precision data types: 32-bit -> 16-bit -> 8-bit -> ternary -> binary
     - Sparsity (the presence of zeros) introduced through pruning and ReLU
     - Low precision + sparsity -> irregular parallelism -> difficult for GPUs -> FPGAs can exploit it through their extreme customizability

 - Poor floating-point performance - mitigated by DSP blocks and compact low-precision data types
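
To make the sparsity point concrete, the sketch below shows how ReLU and magnitude-based pruning introduce zeros and how the resulting sparsity can be measured. The threshold-based pruning rule and all names here are assumptions chosen for illustration, not a method taken from the article.

```python
def relu(values):
    """ReLU zeroes every non-positive activation."""
    return [v if v > 0 else 0.0 for v in values]


def prune(weights, threshold):
    """Magnitude pruning: zero every weight whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)
```

For example, `sparsity(relu([-1.0, 2.0, -3.0, 4.0]))` is 0.5, since half of the activations are clamped to zero.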

“The current ML problems using 32-bit dense matrix multiplication is where GPUs excel. We encourage other developers and researchers to join forces with us to reformulate machine learning problems to take advantage of the strength of FPGAs using smaller bit processing because FPGAs can adapt to shifts toward lower precision,”

  1. Wave Dataflow Processing Engine; A Dataflow Processing Chip for Training Deep Neural Networks; Solving the Performance Challenge For Machine Learning Acceleration

CGRA

 - Wave DPU - targets server-based DNN training and inference
      - Very promising numbers for server-based systems
      - AlexNet inference at a rate of 962,000 fps

ASIC

 - Superior performance, but becomes obsolete

Batching - Throughput vs Latency

  1. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys’18]
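
The trade-off named in the heading can be sketched with a toy cost model. It assumes that processing a batch costs a fixed overhead plus a constant time per image; this linear model and the example numbers are assumptions for illustration only and are not measurements from the paper above.

```python
def batch_latency_ms(batch_size, fixed_ms, per_image_ms):
    """Time to finish one batch under the assumed linear cost model."""
    return fixed_ms + per_image_ms * batch_size


def throughput_fps(batch_size, fixed_ms, per_image_ms):
    """Images per second when batches are processed back to back."""
    latency = batch_latency_ms(batch_size, fixed_ms, per_image_ms)
    return 1000.0 * batch_size / latency
```

With a hypothetical 10 ms overhead and 2 ms per image, moving from a batch of 1 to a batch of 8 raises throughput because the overhead is amortized, but every image in the batch now waits 26 ms instead of 12 ms.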

Technical Reference

ZCU102 driver install

https://www.xilinx.com/support/answers/59128.html

Forum posts

Xilinx

ZCU102 runtime power measurement

https://forums.xilinx.com/t5/SDSoC-Environment-and-reVISION/ZCU102-run-time-power-measurement/td-p/863810

Understanding the ChaiDNN convolution HLS code

https://forums.xilinx.com/t5/SDSoC-Environment-and-reVISION/Don-t-understand-how-Convolution-processor-of-ChaiDNN-works/td-p/863797

How to?

Create a call graph of a given C/C++ project from its source files

doxygen configfile

https://github.com/neovim/neovim/wiki/Generate-callgraphs-with-Doxygen

Connect to the ZCU102 console over UART

Before connecting the USB UART cable to the PC

ls /dev/ > dev_list_1.txt

After connecting the USB UART cable to the PC

ls /dev/ | diff --suppress... -y - dev_list_1.txt

Find the correct interface (ttyUSB0 or ttyUSB1)

sudo screen -L /dev/ttyUSB0 115200
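
The before/after comparison of /dev can also be scripted. The sketch below is one possible way to do it, assuming a Linux host where the adapter shows up as a new entry under /dev; the function names are hypothetical.

```python
import os


def snapshot(dev_dir="/dev"):
    """List the entries currently present in the device directory."""
    return os.listdir(dev_dir)


def new_entries(before, after):
    """Entries that appear in the second snapshot but not in the first."""
    return sorted(set(after) - set(before))
```

Call `snapshot()` once before plugging in the cable and once after, then pass both lists to `new_entries` to see which ttyUSB devices appeared.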

How do I find all files containing specific text on Linux?

grep -rnw '/path/to/somewhere/' -e 'pattern'

https://stackoverflow.com/questions/16956810/how-do-i-find-all-files-containing-specific-text-on-linux
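
For cases where the search has to run from a script, the following is a rough Python approximation of the grep command above (recursive, with line numbers, whole-word match). It treats the pattern as literal text rather than a regular expression, so it is not an exact replacement for grep; the function names are hypothetical.

```python
import os
import re


def match_lines(lines, word):
    """Return (line number, line) pairs where word appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [
        (n, line) for n, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def grep_tree(root, word):
    """Walk root recursively and collect (path, line number, line) matches."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as fh:
                    lines = fh.read().splitlines()
            except OSError:
                continue  # skip unreadable files
            hits.extend((path, n, line) for n, line in match_lines(lines, word))
    return hits
```

`grep_tree('/path/to/somewhere/', 'pattern')` returns the matches as a list instead of printing them.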

How to use ctags with vim?

In the project folder, run:

ctags -R .

https://andrew.stwrt.ca/posts/vim-ctags/

Add this line to ~/.vimrc:

set tags=./tags,tags;$HOME

https://stackoverflow.com/questions/11975316/vim-ctags-tag-not-found

Flow of ChaiDNN

Editing software source

Editing hardware source

Copy files to SD Card

Running on board
