Add PE configuration benchmark results and fix decoder bugs #4

Merged
zhoubot merged 2 commits into LinxISA:main from fengzhazha:main
Feb 15, 2026
Conversation

@fengzhazha
Contributor

Changes

Bug Fixes

  • Fix cube v2 decoder for dynamic ARRAY_SIZE (compute the tile shift amount with math.log2 instead of the hardcoded >> 4, which assumed ARRAY_SIZE = 16)
  • Fix multiple continuous assignment bugs in issue_queue and decoder
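
A minimal sketch of the shift-amount fix described above. The function names `tile_shift_amount` and `tile_count` are hypothetical and do not come from the repository; the sketch assumes ARRAY_SIZE is a power of two, which is what makes a right shift equivalent to the division.

```python
import math


def tile_shift_amount(array_size: int) -> int:
    """Return log2(array_size), the right-shift that divides by array_size.

    Hypothetical helper: assumes array_size is a positive power of two.
    """
    if array_size <= 0 or array_size & (array_size - 1):
        raise ValueError("array_size must be a positive power of two")
    return int(math.log2(array_size))


def tile_count(dim: int, array_size: int) -> int:
    """Number of tiles along one dimension, assuming dim is a multiple of array_size."""
    return dim >> tile_shift_amount(array_size)


if __name__ == "__main__":
    for size in (16, 8, 4):
        print(size, tile_shift_amount(size), tile_count(64, size))
```

With ARRAY_SIZE = 16 the shift is 4, which matches the old hardcoded `>> 4`; for 8 and 4 it becomes 3 and 2, so the hardcoded shift would have undercounted tiles.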

Documentation

  • Add benchmark results for 64×64×64 MATMUL with different PE configurations:

    | PE Array | Uops | Actual Cycles | Efficiency |
    |----------|------|---------------|------------|
    | 16×16    | 64   | 74            | 90.54%     |
    | 8×8      | 512  | 579           | 88.95%     |
    | 4×4      | 4096 | 4163          | 98.46%     |
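
The uop counts reported for the 64×64×64 MATMUL are consistent with one uop per (M, K, N) tile triple. The sketch below works under that assumption; the function `matmul_uops` is hypothetical and is not taken from the repository. The PR does not state how the efficiency figure is derived, so the sketch stops at uop counts.

```python
def matmul_uops(m: int, k: int, n: int, array_size: int) -> int:
    """Estimate uops for an m x k x n MATMUL on an array_size x array_size PE array.

    Assumption: one uop per tile triple, with each dimension split
    into ceil(dim / array_size) tiles.
    """
    def tiles(dim: int) -> int:
        return -(-dim // array_size)  # ceiling division

    return tiles(m) * tiles(k) * tiles(n)


if __name__ == "__main__":
    for size in (16, 8, 4):
        print(f"{size}x{size} PE: {matmul_uops(64, 64, 64, size)} uops")
```

Under this assumption, halving the PE array side multiplies the uop count by 8, which is the step from 64 to 512 to 4096 in the results.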

Test Files

  • Add Verilator testbench for cycle count measurement
  • Add test scripts for PE configuration testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Shulin Feng and others added 2 commits February 14, 2026 18:24
- Fix decoder tile calculation to use dynamic shift_amount based on ARRAY_SIZE
  instead of hardcoded >> 4 (divide by 16)
- Fix multiple continuous assignment bugs in decoder, issue_queue, and mmio
  using explicit priority mux pattern
- Add 64x64x64 MATMUL testbench for cycle count measurement
- Change ARRAY_SIZE to 8 for testing different PE configurations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Test results for 64×64×64 MATMUL with different PE array sizes:
- 16×16 PE: 74 cycles (90.54% efficiency)
- 8×8 PE: 579 cycles (88.95% efficiency)
- 4×4 PE: 4163 cycles (98.46% efficiency)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
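
The first commit mentions replacing multiple continuous assignments with an "explicit priority mux pattern". As a rough behavioral illustration only (not the repository's code, and with the hypothetical name `priority_mux`), the idea is that a signal gets exactly one driver, chosen by the first condition that holds:

```python
from typing import Iterable, Tuple, TypeVar

T = TypeVar("T")


def priority_mux(cases: Iterable[Tuple[bool, T]], default: T) -> T:
    """Return the value of the first case whose condition is true.

    Behavioral model of a priority mux: earlier cases win over later ones,
    and `default` is used when no condition holds. This gives the signal a
    single, well-defined driver instead of several competing assignments.
    """
    for condition, value in cases:
        if condition:
            return value
    return default


if __name__ == "__main__":
    # Hypothetical signals: flush has priority over load, which beats hold.
    flush, load = False, True
    next_state = priority_mux([(flush, "IDLE"), (load, "LOADED")], default="HOLD")
    print(next_state)
```

In an HDL this typically corresponds to a chain of nested selects feeding a single assignment, so two conditions asserted in the same cycle cannot produce conflicting drivers.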
@zhoubot zhoubot merged commit 6156214 into LinxISA:main Feb 15, 2026
0 of 2 checks passed

2 participants