NDS Speed Boost #24

Open: wants to merge 8 commits into base: nds

RetroGamer02

I made a few small changes to boost the speed a bit.
I used NDS BIOS math and hardware-accelerated math where I found it possible, and optimized with -O3 instead of -O2.

@@ -178,7 +178,7 @@ void gd_create_origin_lookat(Mat4f *mtx, struct GdVec3f *vec, f32 roll) {

gd_set_identity_mat4(mtx);
if (hMag != 0.0f) {
- invertedHMag = 1.0f / hMag;
+ invertedHMag = swiDivide(1.0f, hMag);
Owner

swiDivide doesn't support floats. Also, changes to the game code should be ifdefed to preserve the N64 build.
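
To illustrate the first point, here is a minimal host-side sketch of what happens when float arguments reach an integer divide. It assumes swiDivide takes and returns int, as in libnds; `int_divide_standin` is a hypothetical stand-in with that signature, not the BIOS call itself, and the helper names are made up for this example.

```c
/* Hypothetical host stand-in with the same integer signature as
   libnds' swiDivide(int numerator, int divisor). */
static int int_divide_standin(int numerator, int divisor) {
    return numerator / divisor;
}

/* Roughly what the patched line computes: the compiler converts both
   float arguments to int before the call (made explicit here). */
static float inverse_via_int_divide(float hMag) {
    return (float)int_divide_standin((int)1.0f, (int)hMag);
}

/* What the original line computes. */
static float inverse_via_float_divide(float hMag) {
    return 1.0f / hMag;
}
```

With hMag = 2.5f the integer path yields 0.0f while the float path yields 0.4f. Any hMag below 1.0f would truncate to a zero divisor, so the integer path should not be called with such values.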

@@ -417,7 +417,7 @@ static void g_vtx(Gwords *words) {
const Vtx *vertices = (const Vtx*)words->w1;

// Store vertices in the vertex buffer
- memcpy(&vertex_buffer[index - count], vertices, count * sizeof(Vtx));
+ swiFastCopy(vertices, &vertex_buffer[index - count], sizeof(Vtx) * 4);
Owner

Why does this replace count with 4? That seems like a bug.
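
A small sketch of the size arithmetic behind this question. It assumes, per the GBATEK description of the BIOS copy functions, that the length passed to swiFastCopy is a count of 32-bit words rather than bytes, and it uses a hypothetical 16-byte `VtxStandin` in place of the game's Vtx. The helper names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte stand-in for the game's Vtx type. */
typedef struct {
    int16_t  pos[3];
    uint16_t flag;
    int16_t  tc[2];
    uint8_t  color[4];
} VtxStandin;

/* Word count needed to copy `count` vertices if the length is in
   32-bit words (an assumption taken from GBATEK). */
static size_t vtx_copy_words(size_t count) {
    return count * sizeof(VtxStandin) / 4;
}

/* The length the patched line passes; note it does not depend on count. */
static size_t patched_length(void) {
    return sizeof(VtxStandin) * 4;
}

/* How many vertices a given word count actually covers. */
static size_t vertices_covered(size_t words) {
    return words * 4 / sizeof(VtxStandin);
}
```

Under these assumptions, copying 10 vertices needs 40 words, while the patched expression always requests 64 words, which covers 16 vertices regardless of count.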

Epicpkmn11 commented Nov 7, 2022

FYI, swiFastCopy is bugged and actually significantly slower than memcpy. If you want something faster, try DMA, tonccpy, or some other DS-optimized memcpy.

http://problemkaputt.de/gbatek-bios-memory-copy.htm

BUG: The NDS/DSi uses the fast 32-byte-block processing only for the first N bytes (not for the first N words), so only the first quarter of the memory block is FAST, the remaining three quarters are SLOWLY copied word-by-word.

Author

Thank you for the info. I tried DMA copy and found it caused graphical corruption. This is the first I have heard of tonccpy. Would you say that normal swiCopy would be good for this?

Epicpkmn11 commented Nov 7, 2022

swiCopy is even worse than swiFastCopy. I did some testing: tonccpy seems to be more or less equal to memcpy (but VRAM safe), swiFastCopy is about 10% slower than memcpy, and swiCopy is about half the speed of memcpy. For whatever reason dmaCopy isn't cooperating with my testing, so I'm not sure exactly where it lands, but I know it should be faster than memcpy.

You might need to flush the cache for DMA copy to work: since DMA is separate from the CPU, it can't access the CPU's cache. I'm not sure how big a speed penalty cache flushing has, so a CPU copy might end up faster in some cases because of that.

Edit: DMA wasn't cooperating because I was using no$gba it turns out, though my results still seem a bit weird on hardware... I'm getting DMA as like half the speed of memcpy which doesn't seem right... I'm just putting cpuStartTiming(0) before and cpuEndTiming() after doing a large copy, not sure if there's a better way to do that.

RetroGamer02 (Author) commented Nov 7, 2022

I would love to use dmaCopy but I can't seem to get it to work without corrupted graphics. I am flushing the cache.
I used
DC_FlushRange(&vertex_buffer[index - count], count * sizeof(Vtx));
dmaCopy(vertices, &vertex_buffer[index - count], count * sizeof(Vtx));
I have no idea what is wrong. I tried using
DC_FlushRange(vertices, count * sizeof(Vtx));
But that is even worse.

Epicpkmn11 commented Nov 7, 2022

You need to flush the source, not the destination

DC_FlushRange(vertices, count * sizeof(Vtx));

edit: didn't see your whole message whoops, not sure why it's not working tbh 😅, i usually just use tonccpy since it's good enough and always works
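
For reference, a hedged sketch of a DMA copy with cache maintenance on both the source and the destination. It is not a diagnosis of the corruption described above. It assumes the libnds calls DC_FlushRange, DC_InvalidateRange and dmaCopy, and that the ARM9 macro is defined when building for the DS; the host stand-ins only mimic the signatures so the sketch compiles off-target. `dma_copy_coherent` is a name invented for this example.

```c
#include <stdint.h>
#include <string.h>

#ifdef ARM9
#include <nds.h>
#else
/* Host stand-ins: same shapes as the libnds calls, no real cache or DMA. */
static void DC_FlushRange(const void *base, uint32_t size) { (void)base; (void)size; }
static void DC_InvalidateRange(const void *base, uint32_t size) { (void)base; (void)size; }
static void dmaCopy(const void *src, void *dst, uint32_t size) { memcpy(dst, src, size); }
#endif

/* Copy `size` bytes with DMA while keeping the data cache consistent. */
static void dma_copy_coherent(const void *src, void *dst, uint32_t size) {
    /* Write dirty source lines back to RAM so DMA reads current data. */
    DC_FlushRange(src, size);
    /* Write back destination lines so a later eviction cannot
       overwrite what DMA is about to store. */
    DC_FlushRange(dst, size);
    dmaCopy(src, dst, size);
    /* Drop any cached destination lines so the CPU rereads RAM. */
    DC_InvalidateRange(dst, size);
}
```

One thing that may matter on hardware: cache maintenance works on whole cache lines, so a buffer that does not start and end on a line boundary may share a line with neighbouring data, and invalidating that line can discard unrelated writes.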

RetroGamer02 (Author)

I did not think about preserving the N64 build, my apologies. As for replacing count with 4, it seems that swiFastCopy calculates size differently than memcpy. I will start adding the ifdefs in a bit.
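
One way the ifdef guard could look for the vertex copy, as a sketch only. The macro name `TARGET_NDS` is an assumption about how this port marks DS builds, `tonccpy.h` is assumed to have been added to the project (it is not part of libnds), and `vtx_copy` and `store_vertices` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#ifdef TARGET_NDS
/* DS build: use the DS-friendly copy suggested in the review. */
#include "tonccpy.h"
#define vtx_copy(dst, src, n) tonccpy((dst), (src), (n))
#else
/* N64 and other builds: keep the original memcpy behaviour. */
#define vtx_copy(dst, src, n) memcpy((dst), (src), (n))
#endif

/* Copy `count` vertices of `vtx_size` bytes each into the buffer slot.
   The size stays in bytes on every target, as in the original code. */
static void store_vertices(void *slot, const void *vertices,
                           size_t count, size_t vtx_size) {
    vtx_copy(slot, vertices, count * vtx_size);
}
```

Keeping the byte count identical on both branches means only the copy routine differs between targets, so the N64 build compiles exactly the code it had before.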

1upus commented Feb 17, 2023

Maybe you could fix the in-game dialog fonts too? That would be great!

riolubruh

BTW the optimization flag -Ofast is better than -O3 in some cases (I have also tested it and as far as I can tell there is no downside to doing so)

5 participants