feature request -- compact address space support in the CPU complex #629

Open
biosbob opened this issue Jun 11, 2023 · 2 comments

@biosbob
Collaborator

biosbob commented Jun 11, 2023

with #619 providing some background, here's a proposal for adding a new CAS_EN generic to neorv32_cpu (with a default value of false for backward-compatibility).... if CAS_EN => true, then all of the neorv32_cpu_* modules can assume we're operating within the logical 17-bit address space described in #619.... for review, bits (16 downto 15) of a compact address select among four distinct sub-regions:

  • b"00" is the IMEM space
  • b"01" is the BOOT space
  • b"10" is the DMEM space
  • b"11" is the PERI space

bits (14 downto 0) are then used within respective implementations of these sub-spaces.... note that all of the 15 lower-order bits are not necessarily needed when reading/writing an addressed location; the PERI space, for instance, can surely use fewer bits to reduce wires as well as to simplify address decode....
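The decode described above can be sketched as a small behavioral model. This is Python rather than VHDL, purely for illustration; the function and constant names are hypothetical and are not neorv32 identifiers, and the full 15-bit offset is kept even though a real sub-space might route fewer bits.

```python
# Behavioral sketch of the proposed 17-bit compact address decode.
# All names here are illustrative assumptions, not neorv32 identifiers.

REGIONS = {0b00: "IMEM", 0b01: "BOOT", 0b10: "DMEM", 0b11: "PERI"}


def decode_compact(addr):
    """Split a 17-bit compact address into (region name, 15-bit offset)."""
    if addr < 0 or (addr >> 17) != 0:
        raise ValueError("not a 17-bit compact address")
    region = REGIONS[(addr >> 15) & 0b11]  # bits (16 downto 15)
    offset = addr & 0x7FFF                 # bits (14 downto 0)
    return region, offset
```

For example, `decode_compact(0x10004)` selects the DMEM sub-region with offset 4, because bits (16 downto 15) of that address are b"10".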

when CAS_EN => true, the (32-bit) addresses contained within the ibus_req_o and dbus_req_o ports of neorv32_cpu are effectively 17-bit compressed addresses with bits (31 downto 17) <= '0'.... rather than adding a parallel cas_req_t with only 17 bits of address, i believe we can simply reuse the current bus_req_t type and re-define the semantics of its 32-bit address.... outside of neorv32_cpu, only bits (16 downto 0) of these ports will actually be used; hopefully the synthesis tools can figure out that bits (31 downto 17) are always 0 and never used.... as noted above, we might route even fewer of the remaining address lines to other modules....

inside the CPU complex, we can hopefully reduce synthesized logic when CAS_EN => true.... for example, we "know" that any instruction address (eg, current/next PC) only requires a 16-bit register; and any data address being fetched can be staged in a 17-bit register.... note that this optimization does NOT reduce the size of the data (32 bits) being read/written to these compact addresses....

an alternative (easier???) implementation might simply assume that ALL instruction fetches are within 0x0000 up to 0xffff.... this would presumably still eliminate some logic elements within the CPU complex.... i could then perform the final reduction from 32 bits down to 17 bits outside of the CPU complex -- in a re-designed neorv32_busswitch module for instance....

perhaps this is the smallest step we can take that would still yield some reduction in LUTs.... said another way, CAS_EN => true simply asserts that ALL instruction fetches will address the lower 64K of the address space....
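A minimal sketch of what this weaker assumption amounts to, again as a Python behavioral model rather than VHDL. The names are hypothetical, and the check is only a model of the assertion, not something the proposal says the hardware would perform.

```python
# Sketch of the simplified CAS_EN assumption: every instruction fetch lies
# in the lower 64K, so a 16-bit PC register is enough.
# Names are illustrative assumptions, not neorv32 identifiers.

PC_WIDTH = 16


def fits_compact_fetch(addr):
    """True if a fetch address lies within 0x0000 .. 0xffff."""
    return 0 <= addr < (1 << PC_WIDTH)


def to_pc_register(addr):
    """Return the 16-bit value a narrowed PC register would hold."""
    if not fits_compact_fetch(addr):
        raise ValueError("instruction fetch outside the lower 64K")
    return addr & 0xFFFF
```

Data addresses are unaffected by this variant: they stay 32 bits wide inside the CPU, and any reduction to 17 bits would happen outside it.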

comments????

@stnolting
Owner

Hey Bob!

Finally, I have time to come back to this 🙈

I really like the concept of the "compressed" address space. But I am not sure how to integrate that into the current setup of the core / project.

I have been thinking about a rework of the internal bus system. @agamez made some great suggestions in #576 (which is still pending). A centralized interconnect that takes care of the address decoding might be a good thing to start with. This would make customization of the address map (as for your proposal) much simpler as there would be only one instance that needs to be customized.

I have been testing some VHDL constructs to set up the processor's address map as a single array of records, but I am still not sure how to handle some language-specific aspects.

Anyway, central address decoding should be the first thing to do for supporting your approach (and also for implementing #576).

@stnolting
Owner

I have been thinking about simplifying the address space and decoding... Right now the address space of the IO modules / peripherals is densely packed, making it hard to add further addresses or to relocate entire modules.

So how about this: 256 bytes of address space for every module.

  • right now only the debug memory and the CFS really use an address space of 256 bytes
  • if a module just implements two 32-bit registers the remaining addresses will just "mirror" these two registers
  • we can use bits [12:8] of the address word for easy selection of the accessed IO module (allowing 2^5 = 32 modules)

What do you think about this?
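The scheme above can be sketched as a behavioral model in Python (not VHDL). The names are hypothetical, and the sketch assumes each module implements a power-of-two number of 32-bit registers, so that mirroring falls out of ignoring the unused offset bits.

```python
# Behavioral sketch of the proposed IO decode: 256 bytes per module,
# address bits [12:8] select the module (5 bits, so 32 module slots).
# Names are illustrative assumptions, not neorv32 identifiers.


def io_decode(addr, num_regs):
    """Return (module index, register index) for an IO-space address.

    num_regs is the number of 32-bit registers the module implements
    (assumed here to be a power of two, at most 64).
    """
    module = (addr >> 8) & 0x1F  # bits [12:8] -> 0..31
    word = (addr & 0xFF) >> 2    # 32-bit word offset within the slot, 0..63
    reg = word % num_regs        # unused offset bits ignored -> mirroring
    return module, reg
```

With `num_regs = 2`, byte offsets 0x00, 0x08, 0x10 and so on within a slot all land on register 0, which is the mirroring described in the list.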
