Floorplan-Aware Directive Optimization for HLS Designs on Multi-Die FPGAs
Thanks for using our FADO framework! FADO is developed by the Reconfigurable Computing Systems Lab @ HKUST and is to appear as a regular paper (oral presentation) at the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '23).
For personal use (not redistribution), you can refer to the pre-print:
- in this repo as fado.pdf
- on arXiv: https://arxiv.org/abs/2212.11582
Linfeng Du, Tingyuan Liang, Sharad Sinha, Zhiyao Xie, and Wei Zhang. 2022. FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs. In Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’23), February 12–14, 2023, Monterey, CA, USA. ACM, New York, NY, USA, 11 pages.
- Step 0: System Checking (only verified versions are listed; other versions are not guaranteed to work)
- Ubuntu OS: 20.04.4 LTS / 20.04.5 LTS
- Linux: 5.4.0-050400-generic / 5.14.0-1054-oem
- Vitis/Vitis_HLS/Vivado 2020.2
- $\geq$ 64GB DDR4 for back-end implementation using Vitis, as suggested by Xilinx document UG1301
- Step 1: Apt packages: either a single command:
bash step1-install-apt-packages.sh
or separate commands (sudo apt install <the following packages>; see the consolidated sketch below):
- faketime
- iverilog
- swig (prerequisite for pip install oapackage)
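For reference, a minimal sketch of the separate apt commands, assuming the three packages above are the only ones the script installs (step1-install-apt-packages.sh may perform additional setup):

```bash
# Install the system packages listed above.
sudo apt update
sudo apt install -y faketime iverilog swig   # swig is needed before `pip install oapackage`
```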
- Step 2: Python 3.9 packages: either a single command:
pip install -r step2-pip-requirements.txt
or separate commands (pip install <the following packages>; see the consolidated sketch below):
- OApackage==2.7.1 (for plotting the Pareto front)
- matplotlib==3.5.1
- defaultlist==1.0.0
- graphviz==0.20
- anytree==2.8.0
- pyverilog==1.3.0
- mip==1.14.0
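Equivalently, the pinned packages can be installed in one shot; a sketch based on the version list above, assuming the requirements file pins only these seven packages:

```bash
# Install the pinned Python 3.9 dependencies listed above
# (same effect as `pip install -r step2-pip-requirements.txt`).
pip install OApackage==2.7.1 matplotlib==3.5.1 defaultlist==1.0.0 \
            graphviz==0.20 anytree==2.8.0 pyverilog==1.3.0 mip==1.14.0
```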
- Step 3: Packages for the Alveo U250 Board (please note that packages (2) and (3) from our environment are outdated and no longer available on the current Xilinx website; please use our archive to reproduce the same experiment environment)
- (1) Download (https://www.xilinx.com/products/boards-and-kits/alveo/u250.html#gettingStarted) and install the Xilinx Runtime using
sudo apt install ./xrt*.deb
- (2) Download (deployment_archive) and install the U250 Deployment Platform using
sudo apt-get install ./xilinx-u250-xdma-201830.2-2580015_18.04.deb
- (3) Download (development_archive) and install the U250 Development Platform using
sudo apt-get install ./xilinx-u250-xdma-201830.2-dev-2580015_18.04.deb
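For reference, the three installs in order; a sketch using the archived package file names given above, run from the directory holding the downloaded .deb files:

```bash
# 1) Xilinx Runtime (XRT), then 2) the U250 deployment platform,
# then 3) the U250 development platform.
sudo apt install ./xrt*.deb
sudo apt-get install ./xilinx-u250-xdma-201830.2-2580015_18.04.deb
sudo apt-get install ./xilinx-u250-xdma-201830.2-dev-2580015_18.04.deb
```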
For artifact evaluation, if you come across any difficulty with the environment or experiments, or if you need us to provide a remote environment for you, please do not hesitate to contact Linfeng Du @ linfeng.du@connect.ust.hk. We will get back to you ASAP (most likely within 24 hours).
To reproduce the results shown in the FADO paper (specifically, the last two rows of Table 5 and the whole of Table 6), we designed the following three experiments. For each experiment, please find below:
- the working directory and the corresponding data entry in Table 5 and/or Table 6
- the command used in the terminal
- an explanation of the generated results and output log
- an uncertainty analysis: whether you can reproduce the same or very close results as shown in the paper -- the same results will be reproduced for some experiments, while others could vary because of the uncertainty illustrated in the workflow figure below.
- Uncertainty 1: the initial "AutoBridge Floorplanner" using the MILP solver could give different initial solutions
- Uncertainty 2: iteratively calling the "AutoBridge Floorplanner" could introduce further uncertainty in the resulting QoR
- Uncertainty 3: randomness during back-end placement and routing (P&R)
- About runtime:
- CPU performance difference
- Operating system process scheduling
- randomness in the MILP solver's convergence time
- Notice: although we could fix random seeds to keep the solver's behavior stable, doing so could limit the optimality of the generated results. Instead, we ran each experiment multiple times and reported the most commonly observed latency and resource results in our paper.
- ...

Experiment 1
- Command (see the runner sketch at the end of this experiment):
python main.py 2 9
- "2" for the AutoBridge Floorplanner
- "9" for various choices of directive optimization (and iterative floorplan legalization)
- Working directories:
./benchmarks/.*/latency_fp_do
- Corresponding data entry in the paper:
- Table 5: "Initial FP -> Iterative DO" (the second line)
- Uncertainty analysis: (factor: Uncertainty 1)
- latency: almost always the same
- resource: almost always the same
- runtime: could vary
./benchmarks/.*/latency_ab
- Corresponding data entry in the paper:
- Table 5: "Iterative (DO + AutoBridge FP)" (the third line)
- Uncertainty analysis: (factors: Uncertainty 1, Uncertainty 2)
- latency: almost always the same
- resource: almost always the same
- runtime: could vary, especially because of MILP solver's convergence randomness
./benchmarks/.*/latency_fado
- Corresponding data entry in the paper:
- Table 5: "Original (no directive)" (the first line)
- Table 5: "Iterative (DO + Incr FP) (Ours)" (the fourth line)
- Table 6 (the whole table)
- Uncertainty analysis: (factor: Uncertainty 1)
- latency: almost always the same
- resource: almost always the same
- runtime: could vary
- In particular, the "mttkrp_cov" benchmark could show larger randomness because its final utilization is very close to the upper limit of available resources on the FPGA. Besides the most common results reported in our paper, other commonly observed results include:
======== DSE Stages (Table 6) MTTKRP*2+COV*2 ========
Stage 0: Online
Resource: 57.10%, Latency (thousand cycles): 160062.3
Stage 1: Online+Offline
Resource: 57.10%, Latency (thousand cycles): 160062.3
Stage 2: Online+Offline+Ahead
Resource: 63.45%, Latency (thousand cycles): 101763.6
Stage 3: Online+Offline+Ahead+Back
Resource: 64.39%, Latency (thousand cycles): 101755.4
or
======== DSE Stages (Table 6) MTTKRP*2+COV*2 ========
Stage 0: Online
Resource: 63.15%, Latency (thousand cycles): 163241.1
Stage 1: Online+Offline
Resource: 64.67%, Latency (thousand cycles): 153927.2
Stage 2: Online+Offline+Ahead
Resource: 63.26%, Latency (thousand cycles): 129184.0
Stage 3: Online+Offline+Ahead+Back
Resource: 63.25%, Latency (thousand cycles): 128104.0
- Output log:
- in ./benchmarks/.*/output/latency_resource_runtime.log
- Example log of the test ./benchmarks/cnn_2mm/latency_fado:
Iterative (DO + Incr FP) (Our FADO) directive search result (Table 5):
Runtime (s): 1.7685
Latency (thousand cycles): 91.164
Resource: 55%
============ DSE Stages (Table 6) ============
Original (no directive):
Resource: 28.27%, Latency (thousand cycles): 8933.0
Stage 0: Online
Resource: 28.27%, Latency (thousand cycles): 734.6
Stage 1: Online+Offline
Resource: 40.12%, Latency (thousand cycles): 131.8
Stage 2: Online+Offline+Ahead
Resource: 55.01%, Latency (thousand cycles): 91.4
Stage 3: Online+Offline+Ahead+Back
Resource: 54.56%, Latency (thousand cycles): 91.2
- Explanation:
- Experiment 1 is designed for you to get almost the same latency and resource usage, and a proportional runtime, for every test case as reported in our paper.
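For convenience, here is a hedged runner sketch for Experiment 1. It assumes each latency_* working directory ships its own main.py (as the layout above implies); adjust the paths if your checkout differs:

```bash
#!/usr/bin/env bash
# Experiment 1: directive search with floorplanning, then inspect the logs.
set -e
for dir in ./benchmarks/*/latency_fp_do ./benchmarks/*/latency_ab ./benchmarks/*/latency_fado; do
    echo "=== Running ${dir} ==="
    (cd "${dir}" && python main.py 2 9)   # "2": AutoBridge floorplanner, "9": directive optimization
done
# Collect the headline numbers from every generated log.
find ./benchmarks -name latency_resource_runtime.log \
    -exec grep -H -E "Runtime|Latency|Resource" {} +
```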
Experiment 2
- Command (see the runner sketch at the end of this experiment):
python main.py 3 4
- "3" for exporting the RTL design and packing the XO
- "4" for running the Vitis flow (v++)
- Working directories:
./benchmarks/.*/freq_fp_do
- Corresponding data entry in the paper:
- Table 5: "Initial FP -> Iterative DO" (the second line)
- Uncertainty analysis: (factor: Uncertainty 3)
- frequency: almost always the same
./benchmarks/.*/freq_ab
- Corresponding data entry in the paper:
- Table 5: "Iterative (DO + AutoBridge FP)" (the third line)
- Uncertainty analysis: (factor: Uncertainty 3)
- frequency: almost always the same
./benchmarks/.*/freq_fado
- Corresponding data entry in the paper:
- Table 5: "Iterative (DO + Incr FP) (Ours)" (the fourth line)
- Uncertainty analysis: (factor: Uncertainty 3)
- frequency: almost always the same
- Output:
- Please check the post-implementation Fmax using the script ./script/get_freq.py, e.g., starting from the current base directory:
cd ./benchmarks/cnn_2mm/freq_fado/
python ../../../script/get_freq.py .
- Example output in the terminal:
Usage:
python get_freq.py $(realpath [benchmark base])
Relative path: ./vitis_run/top_xilinx_u250_xdma_201830_2.temp/reports
Full vitis report path: ./vitis_run/top_xilinx_u250_xdma_201830_2.temp/reports/link/imp
Timing report found: ./vitis_run/top_xilinx_u250_xdma_201830_2.temp/reports/link/imp/impl_1_xilinx_u250_xdma_201830_2_bb_locked_timing_summary_postroute_physopted.rpt
Fmax: 274.10
- Explanation:
- Experiment 2 is designed for you to get almost the same frequency for every test case as reported in the paper.
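A hedged runner sketch for Experiment 2. It assumes Vitis/Vitis_HLS/Vivado 2020.2 are on PATH and that each freq_* working directory ships its own main.py:

```bash
#!/usr/bin/env bash
# Experiment 2: export RTL, pack the XO, run the Vitis (v++) flow, then read Fmax.
set -e
for dir in ./benchmarks/*/freq_fp_do ./benchmarks/*/freq_ab ./benchmarks/*/freq_fado; do
    echo "=== Running ${dir} ==="
    (cd "${dir}" && python main.py 3 4)                     # "3": export RTL + pack XO, "4": v++ flow
    (cd "${dir}" && python ../../../script/get_freq.py .)   # post-implementation Fmax
done
```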
Experiment 3
- Command (see the runner sketch at the end of this experiment):
python main.py 2 9
python main.py 3 4
- Working directories:
./benchmarks/.*/all_ab
- Corresponding data entry in the paper:
- Table 5: "Iterative (DO + AutoBridge FP)" (the third line)
./benchmarks/.*/freq_fado
- Corresponding data entry in the paper:
- Table 5: "Iterative (DO + Incr FP) (Ours)" (the fourth line)
- Output:
- Latency, Resource, and Runtime in ./benchmarks/.*/output/latency_resource_runtime.log
- Fmax using the script ./script/get_freq.py
- Explanation:
- Experiment 3 is designed for you to test the functionality of FADO's whole workflow.
- Since all of the uncertainties mentioned above are involved in this test, the QoR output could vary a bit more than in the previous experiments.
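A hedged end-to-end sketch for Experiment 3, under the same assumptions as above (each listed working directory ships its own main.py, and Vitis 2020.2 is available):

```bash
#!/usr/bin/env bash
# Experiment 3: the whole FADO workflow -- directive/floorplan co-search followed
# by the Vitis back-end flow -- in the whole-flow working directories.
set -e
for dir in ./benchmarks/*/all_ab ./benchmarks/*/freq_fado; do
    echo "=== Running ${dir} ==="
    (cd "${dir}" && python main.py 2 9)   # directive optimization + floorplanning
    (cd "${dir}" && python main.py 3 4)   # export RTL, pack XO, run Vitis v++
done
# Latency/resource/runtime end up in latency_resource_runtime.log;
# Fmax can be read back with ./script/get_freq.py (see Experiment 2).
```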