proj_dis_perf_improvements_1

Tsukasa OI edited this page Aug 2, 2023 · 33 revisions

Project: Disassembler Performance Improvements (Optimization 1)

Benchmarking System

The benchmarks were run on:

  • Ubuntu 23.04
  • AMD Ryzen 5 PRO 5650G processor

For the parallel runs, I used 6 parallel jobs (-j6), one per physical core. The processor has 12 hardware threads, but -j12 only slowed the benchmark down.
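A bounded job pool like the one -j6 requests can be sketched as follows. This is a minimal illustration only, not the script used for these benchmarks: the function name `run_jobs` and the use of Python's thread pool are assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, jobs=6):
    """Run each command (a list of argv strings) with at most `jobs` at once.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(argv):
        # Discard stdout: for a benchmark only the run time matters,
        # not the disassembly text itself.
        return subprocess.run(argv, stdout=subprocess.DEVNULL).returncode

    # Each worker thread just waits on a child process, so threads are
    # enough to keep `jobs` children running concurrently.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


# Hypothetical usage (the paths are placeholders):
#   run_jobs([["objdump", "-d", path] for path in paths], jobs=6)
```

Setting `jobs` to the number of physical cores rather than hardware threads mirrors the choice described above.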

Aggregate Performance Improvements

| Type | Expected Improvements (in general) |
| --- | --- |
| Binary Files | 50-130% |
| RISC-V ELF Programs | 30-40% |
| RISC-V ELF Libraries | 0-500% |

These figures are relative to the latest master at the time of measurement (commit c8e1332cc7d4) and were taken on 2023-08-01.

Columns

objdump -d (ELF)

| Program | Prerequisites | HTable/Caching | Mapping Syms | Notes |
| --- | --- | --- | --- | --- |
| Busybox 1.35.1 (RV64GC) | 3.4-3.7% | 39.2-43.0% | 39.7-45.4% | |
| OpenSBI 1.1 (generic fw_*.elf) | 4.6-8.4% | 73.4-75.5% | 74.3-76.0% | |
| Linux kernel 5.19 (vmlinux) | 3.7-3.7% | 40.1-40.5% | 40.2-40.7% | |
| Linux kernel 5.19 (vmlinux.o) | (-0.1)-2.5% | 1.2-25.1% | 3.7-25.3% | Not finally linked |
| glibc (libc.so.6) | 3.1-5.5% | 34.9-40.6% | 35.8-41.3% | |

objdump -d (ELF-based archive)

| Program | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- |
| glibc (libc.a) | 0.8-1.9% | 15.1-18.4% | 15.1-18.4% |
| newlib (libc.a) | 0.5-3.2% | 7.4-15.4% | 7.4-16.4% |

objdump -D (binary)

| Program | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- |
| Linux kernel 5.19 (vmlinux) | 1.5-2.0% | 134.0-162.4% | 133.5-161.2% |
| Random files (/dev/urandom) | 0.9-2.4% | 165.0-170.8% | 163.0-169.9% |
| 1M (1048576) CSR instructions | 4.3% | 777.7% | 771.5% |

Although the CSR optimization is part of "Prerequisites", most of its effect only shows up in "HTable/Caching" and later. The hash table optimization has strong synergy with the CSR optimization.
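To illustrate why a hash table can matter so much for instruction lookup, here is a generic sketch. It is not taken from the actual patches: it only assumes that decoding matches a 32-bit word against (match, mask) patterns, and it buckets the patterns by the low 7 opcode bits so that each lookup scans one short bucket instead of the whole table. The match/mask values are the standard RISC-V encodings; the function names and the tiny table are illustrative.

```python
# (name, match, mask): an instruction word `w` matches when (w & mask) == match.
OPCODES = [
    ("lui",   0x00000037, 0x0000007F),
    ("jal",   0x0000006F, 0x0000007F),
    ("addi",  0x00000013, 0x0000707F),
    ("add",   0x00000033, 0xFE00707F),
    ("csrrw", 0x00001073, 0x0000707F),
    ("csrrs", 0x00002073, 0x0000707F),
]


def decode_linear(word):
    """Scan the whole table for every instruction word."""
    for name, match, mask in OPCODES:
        if (word & mask) == match:
            return name
    return None


def build_buckets(opcodes):
    """Group patterns by their low 7 bits (the major opcode field)."""
    buckets = {}
    for entry in opcodes:
        buckets.setdefault(entry[1] & 0x7F, []).append(entry)
    return buckets


BUCKETS = build_buckets(OPCODES)  # built once, reused for every lookup


def decode_hashed(word):
    """Scan only the bucket that shares the word's major opcode."""
    for name, match, mask in BUCKETS.get(word & 0x7F, ()):
        if (word & mask) == match:
            return name
    return None
```

Both functions return the same name for any word; the bucketed version simply skips patterns that cannot match. With a real opcode table of several thousand entries the difference per instruction is much larger than in this six-entry example.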

gdb: disas of nearly all code regions

| Program | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- |
| Linux kernel 5.19 (vmlinux) with debug info | 51.0% | 58.1% | 58.0% |
| Linux kernel 5.19 (vmlinux) without debug info | 149.0% | 169.9% | 172.9% |
| OpenSBI 1.1 (generic fw_*.elf) | 100.5-102.1% | 120.2-121.0% | 122.2-122.8% |
| 1M (1048576) CSR instructions (ELF) | 82.9% | 381.7% | 378.2% |

Batch: objdump -d on Linux distribution

Serial Run: All ELF Files Under the Directory

| System | Path | N | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | /usr/bin | 563 | 3.2% | 33.9% | 34.6% |
| Debian unstable (as of 2022-07-20) | /usr/bin | 269 | 2.3% | 32.8% | 33.1% |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | /usr/lib | 6797 | 27.1% | 49.1% | 86.0% |
| Debian unstable (as of 2022-07-20) | /usr/lib | 548 | 92.9% | 109.3% | 497.0% |
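Collecting "all ELF files under the directory" can be sketched as below. This is only one plausible way to build such a file list, not the script used here: it assumes a file counts as ELF when it starts with the four-byte ELF magic, and it skips symbolic links so that one file is not listed twice (both are assumptions of this sketch, as are the function names).

```python
import os

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def is_elf(path):
    """Return True if `path` is a regular, non-symlink file with the ELF magic."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable files are treated as non-ELF rather than aborting the scan.
        return False


def find_elf_files(root):
    """Return a sorted list of ELF files found recursively under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_elf(path):
                found.append(path)
    return sorted(found)
```

A magic-number check like this also accepts data-only ELF files, so a stricter filter would be needed to restrict the list to files that contain code.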

Parallel Run: All (including data-only ELFs)

| System | N | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 18.7% | 35.5% | 57.4% |
| Debian unstable (as of 2022-07-20) | 946 | 122.3% | 136.2% | 454.6% |

Batch: objdump -D (as binary) on Linux distribution

Serial Run: All ELF Files Under the Directory

| System | Path | N | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | /usr/bin | 563 | 3.1% | 90.5% | 90.8% |
| Debian unstable (as of 2022-07-20) | /usr/bin | 269 | 3.5% | 75.9% | 76.4% |

Parallel Run: All (including data-only ELFs)

| System | N | Prerequisites | HTable/Caching | Mapping Syms |
| --- | --- | --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 3.2% | 93.8% | 93.9% |
| Debian unstable (as of 2022-07-20) | 946 | 3.4% | 75.2% | 75.7% |

objdump -d (ELF): Extreme Examples

| Program | N | Prerequisites | HTable/Caching | Mapping Syms | (in other words) |
| --- | --- | --- | --- | --- | --- |
| OpenSSL | 2 | 132.5-133.6% | 138.0-141.1% | 3247.6-3286.8% | x33.476-33.868 |
| LLVM | 1 | 152.2% | 152.4% | 29048.6% | x291.486 |

  • OpenSSL
    1. Ubuntu 22.04 LTS (image for HiFive Unmatched) : /usr/lib/riscv64-linux-gnu/libcrypto.so.3
    2. Debian unstable (as of 2022-07-20) : /usr/lib/riscv64-linux-gnu/libcrypto.so.3
  • LLVM
    1. Ubuntu 22.04 LTS (Package libllvm14 Version 1:14.0.0-1ubuntu1) : /usr/lib/riscv64-linux-gnu/libLLVM-14.so.1
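The "(in other words)" column is the same result expressed as a speedup factor: an improvement of p% corresponds to a factor of 1 + p/100. The short sketch below shows the conversion in both directions. The last helper additionally assumes that an improvement is derived from run times as old/new - 1, which this page does not state explicitly; all function names are illustrative.

```python
def improvement_to_factor(percent):
    """Convert an improvement in percent to a speedup factor."""
    return 1.0 + percent / 100.0


def factor_to_improvement(factor):
    """Convert a speedup factor back to an improvement in percent."""
    return (factor - 1.0) * 100.0


def improvement_from_times(old_seconds, new_seconds):
    """Assumed definition: how much more work per unit time the new build does."""
    return (old_seconds / new_seconds - 1.0) * 100.0


print(round(improvement_to_factor(3247.6), 3))   # OpenSSL, lower bound
print(round(improvement_to_factor(29048.6), 3))  # LLVM
```

Applied to the table above, 3247.6% gives x33.476 and 29048.6% gives x291.486, which matches the last column.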

For a large library with many symbols, the effect of the mapping symbol optimization is huge. I expected some improvement, but not this much.
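Why the symbol count matters can be seen in a small sketch. RISC-V mapping symbols such as `$x` (instructions) and `$d` (data) mark where the content type changes, and a disassembler needs the last mapping symbol at or before each address it prints. The code below contrasts a per-address scan over all symbols with a lookup in a pre-sorted index. It is a generic illustration under the assumptions that symbol addresses are unique and that code is assumed before the first symbol; it is not the approach taken by the actual patch, and all names are hypothetical.

```python
from bisect import bisect_right


def state_linear(mapping_symbols, address, default="$x"):
    """Scan every (address, name) pair: cost grows with the symbol count."""
    best_addr, best_name = None, default
    for addr, name in mapping_symbols:
        if addr <= address and (best_addr is None or addr > best_addr):
            best_addr, best_name = addr, name
    return best_name


def build_index(mapping_symbols):
    """Sort the symbols once and split them into parallel lists."""
    ordered = sorted(mapping_symbols)
    return [addr for addr, _ in ordered], [name for _, name in ordered]


def state_indexed(index, address, default="$x"):
    """Binary-search the sorted addresses: O(log n) per lookup."""
    addresses, names = index
    i = bisect_right(addresses, address)  # count of symbols at or before address
    return names[i - 1] if i else default
```

With n mapping symbols and m disassembled addresses, the scan costs on the order of n x m comparisons, while the indexed version costs about m x log n after a single sort. For a library with a very large number of symbols this difference dominates the run time.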

This optimization also benefits the Arm architecture (but not AArch64, which handles mapping symbols differently).