more optimized startup code #135

Open
jnk0le opened this issue May 28, 2023 · 10 comments
Comments

@jnk0le
Contributor

jnk0le commented May 28, 2023

Hi, you could consider using whole or parts of my more optimized startup code available here.

There are also other size optimizations, like gp covering the whole SRAM, .rodata before .text for zero-offset addressing, or .srodata before .rodata (embedded toolchains are usually preconfigured to disable small data to conserve SRAM space).

while(1) is 202 bytes (188 without static initializers)

With a 1 KiB bootloader it should grow by 4 bytes; above that it's more complicated.

@cnlohr
Owner

cnlohr commented May 29, 2023

Hmm... there are a few trade-offs that I've worked through.

  1. It should be in a .c file, so that ch32v003fun can be included with only one .c file. Simplifies inclusion for people on other build systems.
  2. HPE should not be enabled by default. Users should choose if they want it or not. It is not a clear win for many situations.
  3. What about BSS?
  4. What about _data initialization?
  5. Other than that, this is great. Would you consider making some changes to make the default ch32v003fun implementation a little tighter?

I don't understand how .rodata being before .text is helpful, but that would be interesting.

I would like to integrate several of the principles.

Would you consider making a PR for ch32v003fun.c?

@cnlohr
Owner

cnlohr commented May 29, 2023

cruuuuuud... I guess I was turning HPE on in my code. I really should not.

@jnk0le
Contributor Author

jnk0le commented May 30, 2023

First thing is to resolve the wanted naming for linker symbols as there is literally no standardization.
There are also instances relying on .srodata (aka small rodata) being in SRAM.

> I don't understand how .rodata being before .text is helpful, but that would be interesting.

Anything within bottom 2 KiB can be addressed by single addi (or lw) instruction instead of c.lui + addi.
Code uses PC-relative offsets, and taking function pointers is less common than referencing data in .rodata, so .rodata benefits more from sitting at the low addresses.
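To make the 2 KiB figure concrete, here is a small host-side sketch of the RV32 immediate arithmetic. It is not taken from the linked startup code, and the function names are made up for illustration. `addi` and `lw` carry a 12-bit signed immediate, so with `x0` as the base they reach 0x000 to 0x7FF; anything above that needs `lui` for the upper 20 bits first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be built with `addi rd, x0, imm` (or read with
 * `lw rd, imm(x0)`). Only the non-negative half of the signed 12-bit
 * immediate is useful here. */
static bool fits_single_addi(uint32_t addr)
{
    return addr <= 0x7FFu;
}

/* Low part of a lui+addi pair: the sign-extended low 12 bits of addr. */
static int32_t addi_part(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xFFFu);
    return (lo >= 0x800) ? lo - 0x1000 : lo;
}

/* High part of a lui+addi pair, adjusted so that
 * (hi << 12) + sign_extend(lo) == addr. */
static uint32_t lui_part(uint32_t addr)
{
    uint32_t hi = (addr - (uint32_t)addi_part(addr)) >> 12;
    assert((hi << 12) + (uint32_t)addi_part(addr) == addr);
    return hi;
}
```

For example, 0x7FF needs only the `addi`, while an SRAM address such as 0x20000800 splits into a `lui` of 0x20001 and an `addi` of -0x800.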

>   1. What about BSS?
>   2. What about _data initialization?

Do you mean the .data and .bss sections? Those are handled in the L_51 and L_113 loops.
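For readers who have not seen the assembly, the two loops do roughly the following. This is a generic C sketch, not a transcription of the L_51 and L_113 loops. Because linker symbol names are not standardized, the section boundaries are passed in as plain pointers, and the self-check uses ordinary arrays as stand-ins for flash and SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy .data initial values from their load address (flash) to RAM. */
static void copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    assert(words == 0 || (dst != NULL && src != NULL));
    while (words--)
        *dst++ = *src++;
}

/* Clear .bss so zero-initialized globals really start at zero. */
static void zero_bss(uint32_t *start, size_t words)
{
    assert(words == 0 || start != NULL);
    while (words--)
        *start++ = 0;
}

/* Host-side check with arrays standing in for the real sections.
 * Returns 1 if both loops did their job. */
static int startup_loops_selfcheck(void)
{
    const uint32_t flash_image[3] = { 0x11u, 0x22u, 0x33u };
    uint32_t ram_data[3] = { 0, 0, 0 };
    uint32_t ram_bss[2] = { 0xDEADBEEFu, 0xDEADBEEFu };

    copy_data(ram_data, flash_image, 3);
    zero_bss(ram_bss, 2);

    return ram_data[0] == 0x11u && ram_data[1] == 0x22u
        && ram_data[2] == 0x33u
        && ram_bss[0] == 0 && ram_bss[1] == 0;
}
```

In real startup code the pointers would come from whatever symbols the linker script exports, and the loops would run before `main`.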

> cruuuuuud... I guess I was turning HPE on in my code. I really should not.

BTW, only bit 0 of 0x804 is HPE; bit 1 enables interrupt nesting.
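A minimal sketch of keeping the two bits separate, so that nesting can be enabled without also turning HPE on. The macro and function names are hypothetical, and only the two bit positions stated above are assumed. The `csrw` wrapper is compiled only for RISC-V targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two low bits of CSR 0x804. */
#define CSR804_HPE   (1u << 0)  /* bit 0: HPE */
#define CSR804_NEST  (1u << 1)  /* bit 1: interrupt nesting */

/* Build the value to write, with each feature chosen independently. */
static uint32_t csr804_value(int want_hpe, int want_nesting)
{
    uint32_t v = 0;
    if (want_hpe)
        v |= CSR804_HPE;
    if (want_nesting)
        v |= CSR804_NEST;
    assert((v & ~(CSR804_HPE | CSR804_NEST)) == 0);
    return v;
}

#if defined(__riscv)
/* Write the value to CSR 0x804 on the target. */
static inline void csr804_write(uint32_t v)
{
    __asm__ volatile("csrw 0x804, %0" : : "r"(v));
}
#endif
```

With this split, writing 2 enables nesting only, whereas writing 3 enables HPE as well.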

@jnk0le
Contributor Author

jnk0le commented May 30, 2023

by making assumption that mtvec is always initialized to starting address (bootloader sets up mtvec to app address without mode bits), there should be no size diff caused by app offset.

-2 bytes if relying on reset state of mtvec

@jnk0le
Contributor Author

jnk0le commented May 30, 2023

wait, are you sure that this area at 0x1ffff000 can actually be used? datasheet only says it's "factory-cured bootloader" and those areas tend to be read only (and write once only).

@cnlohr
Owner

cnlohr commented May 30, 2023

> by making assumption that mtvec is always initialized to starting address (bootloader sets up mtvec to app address without mode bits), there should be no size diff caused by app offset.
>
> -2 bytes if relying on reset state of mtvec

That makes me anxxxxiousssss

> wait, are you sure that this area at 0x1ffff000 can actually be used? datasheet only says it's "factory-cured bootloader" and those areas tend to be read only (and write once only).

Absolutely! That's what I have been using on my 1920-byte-USB thing. I sometimes use it, sometimes flash, but I can definitely reprogram the bootloader.

@jnk0le
Contributor Author

jnk0le commented Jun 1, 2023

Looking at the openwch repo, reprogramming the bootloader is a documented use case.

The expected way of entering the bootloader is a system reset after setting bit 14 of FLASH->STATR:
https://github.com/openwch/ch32v003/blob/main/EVT/EXAM/IAP/V00x_APP/User/main.c#LL32C5-L32C18

In this way the bootloader experiences the full reset and 0x00000000 remap.
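As a rough illustration of the bit manipulation only: the sketch below computes the new FLASH->STATR value with bit 14 set. The names are hypothetical. Any key/unlock sequence and the system reset itself are deliberately left out, since they are not spelled out here; the linked openwch example is the reference for those.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for bit 14 of FLASH->STATR. */
#define STATR_BOOT_REQUEST (1u << 14)

/* Return the STATR value with the "enter bootloader on next reset" bit
 * set, leaving every other bit as it was. */
static uint32_t statr_with_boot_request(uint32_t current)
{
    uint32_t v = current | STATR_BOOT_REQUEST;
    assert((v & STATR_BOOT_REQUEST) != 0);
    return v;
}
```

On the target the result would be written back to FLASH->STATR and followed by a system reset.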

Exiting to APP probably also requires this procedure (though some of the keying should be already done for flash programming).
Because of system reset there seems to be no need for cleanup in peripheral registers.

That's quite different from stm32 "system" bootloaders

> That makes me anxxxxiousssss

If the bootloader is in a separate flash bank that is entered and exited by system reset, I think this -2 bytes is safe.

> Absolutely! That's what I have been using on my 1920-byte-USB thing. I sometimes use it, sometimes flash, but I can definitely reprogram the bootloader.

minichlink offsets the binaries from 0x00000000 to 0x1ffff000/0x08000000 based on a command parameter.

I'll try to figure out how this works in the cursed openocd fork

@jnk0le
Contributor Author

jnk0le commented Jun 1, 2023

BTW: https://github.com/cnlohr/ch32v003fun/blob/master/ch32v003fun/ch32v003fun.c#L817

This may break if the compiler decides to prepare an address or immediate in a register before the assembly blocks.
GCC is biased to allocate from a5 down (so nothing has broken yet, by accident), but LLVM allocates from a0 up.

Those inline asm blocks need register clobbers to be safe.
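A minimal sketch of what a clobber declaration looks like, using a made-up helper rather than the code at the linked line. The asm body uses `t0` as a scratch register and lists it as clobbered, so the compiler will not keep a live value there or assign it to an operand. The non-RISC-V branch exists only so the sketch builds on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds 1 to x using t0 as a hard-coded scratch. */
static uint32_t add_one_with_scratch(uint32_t x)
{
    uint32_t out;
#if defined(__riscv)
    __asm__ volatile(
        "li   t0, 1\n\t"
        "add  %0, %1, t0"
        : "=r"(out)   /* compiler picks the output register */
        : "r"(x)      /* ... and the input register */
        : "t0");      /* t0 is overwritten, so declare it */
#else
    out = x + 1u;     /* host fallback, no inline asm */
#endif
    assert(out == x + 1u);
    return out;
}
```

Without the `"t0"` entry the code may still work with one compiler and fail with another, which matches the GCC versus LLVM difference described above.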

@mrx23dot

mrx23dot commented Oct 7, 2023

A small side track: does the stack grow toward .code instead of the global vars in the original linker script?

@jnk0le
Contributor Author

jnk0le commented Oct 7, 2023

It always grows from the top down, toward .heap, then .bss and .data.


3 participants