
Roadmap / Contributing #1

Open
56 of 73 tasks
Dolu1990 opened this issue Nov 14, 2023 · 10 comments

@Dolu1990
Member

Dolu1990 commented Nov 14, 2023

If you are interested in contributing to the project, please let me know ^^

Here are the current work items, in order of completion:

  • Plugin API
  • Pipeline API
  • Basic frontend
  • Decoder
  • Multi-issue dispatcher
  • Execute
  • Integer ALU / Shift
  • Basic testbench
  • Writeback
  • Bypass
  • Hazard
  • Branch
  • Multi-issue
  • RVLS integration
  • Konata traces
  • Regression framework
  • Load / Store
  • Basic CSR support
  • mul / div / rem
  • Passing riscv-test
  • BTB + RAS predictor
  • Passing riscv-arch-test
  • Passing Embench / CoreMark / Dhrystone in dual issue
  • GShare predictor
  • Late ALU support
  • Exception support
  • Interrupt support
  • RV32 + RV64 support
  • Privilege / CSR implemented
  • Run FreeRTOS tests
  • MMU
  • RVA
  • Linux / buildroot / opensbi
  • Multi-issue fetch aligner
  • RVC
  • I$
  • D$
  • Memory coherency / multi-core
  • SoC
  • Floating point
  • Software prefetcher (Zicbop, -fprefetch-loop-arrays)
  • Hardware prefetcher
  • PMP
  • SMT support
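The roadmap lists a GShare predictor. As a rough illustration of the general gshare technique only (not VexiiRiscv's implementation, which is written in SpinalHDL), a predictor can XOR PC bits with a global history of recent branch outcomes and use the result to index a table of 2-bit saturating counters. The class name, the table size, the history length (equal to the index width here) and the `pc >> 2` indexing are all assumptions made for this sketch.

```python
class GShare:
    """Toy gshare predictor: (PC bits XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                         # global history of recent outcomes
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, start weakly not-taken

    def _index(self, pc: int) -> int:
        # Assumption: drop 2 low PC bits (4-byte instructions), XOR with history
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2  # counter 2 or 3 means "taken"

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        c = self.counters[i]
        # Saturating increment/decrement within 0..3
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the resolved outcome into the global history
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Because the history bits select different counters, a branch that strictly alternates taken / not-taken ends up mapped to two separate counters and becomes predictable, which a single per-branch counter cannot achieve.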

Decoupled todo:

  • Small iterative shift plugin
  • Small iterative mul/div plugin
  • Bit-manipulation extension
  • Crypto extension
  • Bridges from the ibus / dbus to standard buses (Wishbone, TileLink, AXI, AHB, Avalon, ...)
  • ...

Improvements

  • The flush signal should propagate from upstream in the fetch pipeline down into the aligner, instead of only being used late in the aligner (very bad timings when RVC is used)
  • When the FPU is enabled, the DispatchPlugin can't handle late RS uses (val skip), as the short-pipeline assumption no longer holds. Needs a fix
  • Check that out-of-pipe units / the FPU do not write X0
  • Check FPU accesses to the I/O region
  • Maybe lsuL1 stores should freeze the CPU when a refill is already using the bank write interface
  • LsuL1 area increases too much with the way count
  • The RVC decompressor could be optimized
  • Another pipeline would be needed to support serializing multiple uops from one instruction
  • The current AlignerPlugin can easily support 48-bit / 64-bit instructions
  • Coherency bridge to TileLink without FIFO
  • LsuPlugin fence is missing; it is needed, especially now that there is a write buffer
  • The LsuL1 last stage really needs to be stable and insensitive to the progress of any concurrent task
  • LsuL1 write buffer
  • LsuL1 doesn't need bank/way arbitration when there is no prefetch / coherency / multi-threading
  • LsuCacheless bus cmd persistence needs to be implemented
  • DispatchPlugin DONT_FLUSH_FROM_LANES is too pessimistic and reduces performance on lsuL1 (e.g. branch -> lw)
  • RamSyncMwXor isn't good, as it uses async reads; RamSyncMwMux needs to be implemented for the sync regfile instead
  • BranchPlugin timings in the late ALU could be improved by precalculating the target PC (when the target PC doesn't come from registers)
  • BranchPlugin in the late ALU could reuse some of the early BranchPlugin results
  • DispatchPlugin could buffer some instructions (e.g. 1 for dual issue), avoiding half-full dispatch (36% on Dhrystone)
  • Add memory regions and prevent accesses to them via traps
  • Avoid TrapPlugin trap requests directly halting fetch, as this creates a long combinatorial path
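The RamSyncMwXor item refers to a RAM with multiple write ports. Assuming, as its name suggests, that it uses the XOR technique (one bank per write port, each write stores its data XORed with the other banks at that address, and a read XORs all banks together), every write must also read the other banks at its own address in the same cycle, which is presumably where the extra asynchronous reads come from. The behavioural Python sketch below uses a hypothetical class and hypothetical method names; it models the technique, not the SpinalHDL component.

```python
class XorMultiWriteRam:
    """Toy model of an XOR-based RAM with several write ports (one bank per port).

    Invariant: the logical value at `addr` is the XOR of every bank at `addr`.
    """

    def __init__(self, depth: int, write_ports: int = 2):
        self.banks = [[0] * depth for _ in range(write_ports)]

    def read(self, addr: int) -> int:
        value = 0
        for bank in self.banks:
            value ^= bank[addr]
        return value

    def clock(self, writes: dict[int, tuple[int, int]]) -> None:
        """Apply one cycle of writes given as {port: (addr, data)}; addresses must differ."""
        addrs = [addr for addr, _ in writes.values()]
        assert len(addrs) == len(set(addrs)), "same-address writes in one cycle"
        # Each port first reads the *other* banks at its address (old values);
        # this is the extra read traffic the XOR scheme costs.
        updates = []
        for port, (addr, data) in writes.items():
            others = 0
            for i, bank in enumerate(self.banks):
                if i != port:
                    others ^= bank[addr]
            updates.append((port, addr, data ^ others))
        # Then all banks are written together, as on a clock edge
        for port, addr, stored in updates:
            self.banks[port][addr] = stored
```

A mux-based alternative (presumably what RamSyncMwMux refers to) would instead track which bank holds the latest value for each address, avoiding the read-before-write on the other banks.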

Redesign

  • Having the cache out of the pipeline would allow sharing it with probe requests to save area

Dolu1990 changed the title from "Roadmap" to "Roadmap / Contributing" on Nov 21, 2023
Dolu1990 pinned this issue on Nov 22, 2023
@andreasWallner
Collaborator

Just to avoid duplicate work: the iterative shift is on its way, just like the iterative mul.

@bitpasta

bitpasta commented Jan 1, 2024

Just to let you know: I'm working on a radix-2 divider here:
https://github.com/bitpasta/VexiiRiscv/tree/dev_divradix2
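A radix-2 divider produces one quotient bit per iteration, which is why it fits the small iterative mul/div plugin item above. As a rough software model of the general technique (textbook unsigned restoring division; this is an assumption about the algorithm, not a description of the code in that branch), with divide-by-zero results following the RISC-V `divu` / `remu` convention:

```python
def radix2_divide(dividend: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """Unsigned radix-2 restoring division: one quotient bit per iteration.

    Models an iterative divider that needs `width` steps per operation.
    Divide by zero follows RISC-V divu/remu: quotient = all ones, remainder = dividend.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        return mask, dividend
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

A hardware version would hold `remainder` and `quotient` in registers and run one loop iteration per cycle; higher-radix designs retire several quotient bits per step at the cost of more logic.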

@Dolu1990
Member Author

lolololol

[Progress] Start VexiiRiscv test simulation with seed 2

OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name       : NaxRiscv
Platform Features   : timer,mfdeleg
Platform HART Count : 1
Boot HART ID        : 0
Boot HART ISA       : rv32imasu
BOOT HART Features  : scounteren,mcounteren
BOOT HART PMP Count : 0
Firmware Base       : 0x80000000
Firmware Size       : 64 KB
Runtime SBI Version : 0.2

MIDELEG : 0x00000222
MEDELEG : 0x0000b109
[    0.000000] Linux version 5.10.1 (rawrr@rawrr) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2020.11-rc3-8-g9ef54b7d0b) 10.2.0, GNU ld (GNU Binutils) 2.34) #2 SMP Wed Jan 26 14:18:17 CET 2022
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] Initial ramdisk at: 0x(ptrval) (8388608 bytes)
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x8
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] SBI v0.2 HSM extension detected
[    0.000000] riscv: ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s18700 r0 d22260 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64008
[    0.000000] Kernel command line: rootwait console=hvc0 earlycon=sbi root=/dev/ram0 init=/sbin/init
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 241280K/258048K available (4717K kernel code, 553K rwdata, 632K rodata, 166K init, 213K bss, 16768K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] random: get_random_bytes called from start_kernel+0x35c/0x4dc with crng_init=0
[    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[    0.000053] sched_clock: 64 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[    0.001054] Console: colour dummy device 80x25
[    0.001405] printk: console [hvc0] enabled
[    0.001405] printk: console [hvc0] enabled
[    0.001949] printk: bootconsole [sbi0] disabled
[    0.001949] printk: bootconsole [sbi0] disabled
[    0.002617] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[    0.003334] pid_max: default: 32768 minimum: 301
[    0.004330] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.004931] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.011492] rcu: Hierarchical SRCU implementation.
[    0.013344] smp: Bringing up secondary CPUs ...
[    0.013686] smp: Brought up 1 node, 1 CPU
[    0.015071] devtmpfs: initialized
[    0.018177] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.018841] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.020091] NET: Registered protocol family 16
[    0.045047] clocksource: Switched to clocksource riscv_clocksource
[    0.076845] NET: Registered protocol family 2
[    0.079953] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.080635] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.081450] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.082188] TCP: Hash tables configured (established 2048 bind 2048)
[    0.082859] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.083429] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.085938] Unpacking initramfs...
[    0.226491] Initramfs unpacking failed: invalid magic at start of compressed archive
[    0.257373] Freeing initrd memory: 8192K
[    0.259617] workingset: timestamp_bits=30 max_order=16 bucket_order=0
[    0.298254] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[    0.298755] io scheduler mq-deadline registered
[    0.299089] io scheduler kyber registered
[    0.496443] NET: Registered protocol family 10
[    0.499721] Segment Routing with IPv6
[    0.500313] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.503312] NET: Registered protocol family 17
[    0.505957] Freeing unused kernel memory: 164K
[    0.506284] Kernel memory protection not selected by kernel config.
[    0.506742] Run /init as init process
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: [    0.861828] random: dd: uninitialized urandom read (512 bytes read)
OK
Starting network: OK

Welcome to Buildroot
buildroot login: root
root
           _  _                     ___      _
    o O O | \| |   __ _    __ __   | _ \    (_)     ___     __     __ __
   o      | .` |  / _` |   \ \ /   |   /    | |    (_-<    / _|    \ V /
  TS__[O] |_|\_|  \__,_|   /_\_\   |_|_\   _|_|_   /__/_   \__|_   _\_/_
 {======|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|
./o--000'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'

root@buildroot:~# cat /proc/cpuinfo
cat /proc/cpuinfo
processor	: 0
hart		: 0
isa		: rv32ima
mmu		: sv32

root@buildroot:~# echo 1+2+3*4 | bc
echo 1+2+3*4 | bc
15
root@buildroot:~# micropython
micropython
MicroPython v1.13 on 2022-01-26; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import math

>>> math.sin(math.pi/4)

0.7071067811865475
>>> from sys import exit

>>> exit()

root@buildroot:~# ls /
ls /
bin      init     linuxrc  opt      run      tmp
dev      lib      media    proc     sbin     usr
etc      lib32    mnt      root     sys      var
root@buildroot:~# 

@bitpasta

Wow nice! Congratulations!

@djsftree

Congrats indeed!

@Dolu1990
Member Author

Dolu1990 commented Apr 8, 2024

Brawww

root@buildroot:~# cat /proc/cpuinfo 
processor	: 0
hart		: 0
isa		: rv32ima
mmu		: sv32

processor	: 1
hart		: 1
isa		: rv32ima
mmu		: sv32
#################
chocolate-doom -1 -timedemo demo1.lmp &
...
timed 5026 gametics in 2724 realtics (64.577827 fps)
#################
chocolate-doom -1 -timedemo demo1.lmp &
chocolate-doom -1 -timedemo demo1.lmp &
...
timed 5026 gametics in 2897 realtics (60.721436 fps)
timed 5026 gametics in 2918 realtics (60.284443 fps)

Brawwwwwww

For reference:

python3 -m litex_boards.targets.digilent_nexys_video --soc-json build/digilent_nexys_video/csr.json --cpu-type=vexiiriscv  --vexii-args="--allow-bypass-from=0 --debug-privileged --with-mul --with-div --div-ipc --with-rva --with-supervisor --performance-counters 0 --fetch-l1 --fetch-l1-ways=4 --lsu-l1 --lsu-l1-ways=4 --fetch-l1-mem-data-width-min=64 --lsu-l1-mem-data-width-min=64  --with-btb --with-ras --with-gshare --relaxed-branch --regfile-async --lsu-l1-refill-count 2 --lsu-l1-writeback-count 2 --with-lsu-bypass --decoders=2 --lanes=2 --lsu-l1-store-buffer-slots=4 --lsu-l1-store-buffer-ops=32" --cpu-count=2 --with-jtag-tap  --with-video-framebuffer --with-sdcard --with-ethernet --with-coherent-dma --l2-bytes=131072

With the Chocolate Doom patch from litex-hub/linux-on-litex-vexriscv#290, which avoids the X11 layer.

@bitpasta

bitpasta commented Apr 9, 2024

Nice! :) How does that compare to Vex and Nax?

@Dolu1990
Member Author

Dolu1990 commented Apr 9, 2024

It compares quite well too, although not all options are turned on.

Vex   : timed 5026 gametics in 4866 realtics (36.150841 fps) (no L2)
Vexii : timed 5026 gametics in 2724 realtics (64.577827 fps) (128 KB L2)
Nax   : timed 5026 gametics in 2375 realtics (74.067368 fps) (128 KB L2)
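The fps values above are consistent with how chocolate-doom's `-timedemo` computes them: Doom's game logic runs at 35 tics per second, so fps = gametics × 35 / realtics. Since every run plays the same 5026 gametics, the ratio of realtics is directly the speed ratio. A small check using only the numbers quoted above (the function and variable names are illustrative):

```python
TICRATE = 35  # Doom game logic runs at 35 tics per second


def timedemo_fps(gametics: int, realtics: int) -> float:
    """fps as reported at the end of a Doom -timedemo run."""
    return gametics * TICRATE / realtics


# realtics for demo1.lmp (5026 gametics), from the comment above
runs = {
    "Vex": 4866,    # no L2
    "Vexii": 2724,  # 128 KB L2
    "Nax": 2375,    # 128 KB L2
}

for core, realtics in runs.items():
    fps = timedemo_fps(5026, realtics)
    # Same number of gametics in every run, so the realtics ratio is the speed ratio
    speedup = runs["Vex"] / realtics
    print(f"{core:6}{fps:10.6f} fps  {speedup:.2f}x vs Vex")
```

This reproduces the reported fps and puts VexiiRiscv at about 1.79x and NaxRiscv at about 2.05x the speed of VexRiscv on this demo. The Vex run had no L2 while the other two had a 128 KB L2, so that part of the comparison is not like-for-like.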

Just tried quad-core and octa-core configurations now, and they work as well.

@Dolu1990
Member Author

root@nexys:~# cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux trixie/sid"
NAME="Debian GNU/Linux"
VERSION_CODENAME=trixie
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@nexys:~# cat /proc/cpuinfo 
processor	: 0
hart		: 0
isa		: rv64imafdc
mmu		: sv39
mvendorid	: 0x0
marchid		: 0x5
mimpid		: 0x0

processor	: 1
hart		: 1
isa		: rv64imafdc
mmu		: sv39
mvendorid	: 0x0
marchid		: 0x5
mimpid		: 0x0

root@nexys:~# 

@Dolu1990
Member Author

Dolu1990 commented May 3, 2024

root@nexys:~# neofetch 
       _,met$$$$$gg.          root@nexys 
    ,g$$$$$$$$$$$$$$$P.       ---------- 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux trixie/sid riscv64 
 ,$$P'              `$$$.     Kernel: 6.1.0-rc2+ 
',$$P       ,ggs.     `$$b:   Uptime: 17 hours, 47 mins 
`d$$'     ,$P"'   .    $$$    Packages: 1698 (dpkg) 
 $$P      d$'     ,    $$P    Shell: bash 5.2.15 
 $$:      $$.   -    ,d$$'    Resolution: 800x600 
 $$;      Y$b._   _,d$P'      WM: wmaker 
 Y$$.    `.`"Y$$$$P"'         Theme: Adwaita [GTK3] 
 `$$b      "-.__              Icons: Adwaita [GTK3] 
  `Y$$                        Terminal: /dev/pts/2 
   `Y$$.                      CPU: (2) 
     `$$b.                    Memory: 97MiB / 472MiB 
       `Y$$b.
          `"Y$b._                                     
              `"""                                    
