Sort not working on Android devices, Quest 3 and Lenovo A3 #112

Open
mark2142 opened this issue Apr 8, 2024 · 21 comments

Comments

@mark2142

mark2142 commented Apr 8, 2024

First I want to say thank you to the author for his great work and effort.
I am using Vulkan.
I am using a Lenovo A3 and my brother is using a Quest 3, and we have the same problem.
I can see the splats on device.
However, the splats quickly go from unsorted but visible to a very sparse set of shimmering splats.
When I set the number of frames between sorts to a high number, it takes longer for the splats to fade away.
If I don't sort at all, all splats are there but out of order, and they don't change.
So the sort is the problem.
Using Vulkan, it does not complain about the wave intrinsics.
So as far as I can tell the sort is trashing my splats. I know there's an older version of the sort in the 0.7 version of the project but I haven't been able to fix using that. Does anybody have a fix for the sort algorithm, a different GPU sort, or any other ideas on how to fix this?

@aras-p
Owner

aras-p commented Apr 9, 2024

I know there's an older version of the sort in the 0.7 version of the project but I haven't been able to fix using that

So you mean the older version also has the sorting issue?

What GPU is in your Lenovo machine?

@mark2142
Author

mark2142 commented Apr 9, 2024

Not sure if the old version has the sorting issue. I can't find the specs for the non-PC edition of the A3 glasses. The same issue occurs on Quest 3.
The function CSCalcDistances is returning a large number of zeros on device, and each time the sort is run the number of zeros increases.
I think the sort is failing because this distance calculation is failing.
Could there be an issue with FloatToSortableUint on device?
What log can I give you to help? I can trace the values out at different stages of the sorting process, but I feel that this distance calculation could be the issue. Is it possible the splats are being deactivated due to an incorrect distance calculation?
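
For context on the FloatToSortableUint question: a float-to-sortable-key mapping is usually written as the bit trick below. This is a minimal Python sketch of that common trick, not a copy of the project's shader; the name `float_to_sortable_uint` and the sample values are assumptions made for illustration.

```python
import struct


def float_to_sortable_uint(f: float) -> int:
    """Map a 32-bit float to a uint32 whose unsigned order matches the float order."""
    # Reinterpret the float's bits as an unsigned 32-bit integer
    # (the CPU-side equivalent of HLSL's asuint).
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # Negative floats: flip every bit, which reverses their order.
    # Non-negative floats: flip only the sign bit, which moves them above the negatives.
    mask = 0xFFFFFFFF if bits & 0x80000000 else 0x80000000
    return bits ^ mask


# Illustrative values only, listed in ascending float order.
samples = [-1000.0, -1.5, -0.0, 0.0, 1e-3, 2.0, 1000.0]
keys = [float_to_sortable_uint(x) for x in samples]

# The encoded keys keep the same ascending order as the floats.
assert keys == sorted(keys)
```

Under this mapping, 0.0 encodes to 0x80000000, and a key of exactly 0 can only come from the bit pattern 0xFFFFFFFF (a NaN). So if the zeros seen on device are encoded keys and the shader uses this mapping, they are unlikely to be valid distances.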

@aras-p
Owner

aras-p commented Apr 11, 2024

Hmm yeah that is very curious. Apparently something goes wrong somewhere, but kinda hard to say where or why. Something in the GPU driver not quite liking or mishandling code that comes out of FloatToSortableUint? No idea :/

@b0nes164
Contributor

b0nes164 commented Apr 16, 2024

I'll open a PR to push the latest version of my sort some time this week, but the only bug I am aware of on the version here is a very rare TDR crash on lower end devices due to cache thrashing.

There could be a bug that is not being caught by the current testing spread, as I have not tested on Qualcomm, and Qualcomm is wave size 128. No idea about Vulkan, as I've only tested on DX12, though AFAIK Vulkan should be more capable than DX12.

Could you also confirm if the older sorting version fixes the problem?

@b0nes164
Contributor

Hi @aras-p, I just want to give a roadmap for where I want to go with this PR and how long it might take.

It's very likely that there is a bug on Qualcomm GPUs, as I mentioned above; I have not tested on Qualcomm. I have a Qualcomm GPU on the way which should arrive sometime this weekend-ish, and so I expect to begin debugging then.

However, bugs aside, I had several things planned that I wanted to push and I figure this would be a good opportunity to roll them all together into one PR:

  • The sorting code now lives at GPUSorting, which includes a Unity package. For ease of maintenance I want to replace the code here with a dependency on that package.
  • This might be a bit of a reach, but I have been cooking a method that should allow GPUs without forward progress to run OneSweep without catching on fire. Maybe this could be added as an "experimental use-at-your-own-risk" option, disabled by default? (Totally understand if you do not want to deal with the headache.)

I expect this to take maybe 2 weeks. If you have any issues please let me know. 👍

@aras-p
Owner

aras-p commented Apr 20, 2024

@b0nes164 much thanks for the continued interest in pushing portable GPU sorting state of the art!

That said, I'd like to avoid external (non-unity) package dependencies in this project. I could perhaps do the "vendoring" approach of like periodically literally copying parts of your GPUSorting repository into this one (with proper attribution etc.), but actually depending on the package directly I'm not too keen on.

Maybe this could be added as an "experimental use-at-your-own-risk" option, disabled by default?

Possibly!

@ZaneZee

ZaneZee commented Apr 29, 2024

Hey there, I'm running into the same issue here as you are when trying to build to the Quest 3. The gaussians seem to be popping in and out of existence, but only when building to the Quest. This popping is slowed down by the "Sort Nth Frame" parameter and does not happen when stopping the sorting code from running. This is using Vulkan. Any help from anyone that has solved this would be amazing, @b0nes164 if you have a solution that worked for you I'd love to hear it! Happy to validate any ideas if you don't have a Quest 3 available. I will update this if I find a fix!

img-4357_ufoiEZVX.mp4

@b0nes164
Contributor

Hi @ZaneZee.

So the laptop I bought to test the sorting on Qualcomm did not have the capabilities to even run the sort (boo). See b0nes164/GPUSorting#3 (comment).

However, if you are willing, I can write a debug version of the sort for you to run on your Quest in Unity, and then you can send the results back to me. Keep in mind this will be a long process that could take several days to finish.

Based on the video, only a small part of the sort is breaking as opposed to #82 (comment), but it's unclear exactly what's going wrong.

@ZaneZee

ZaneZee commented Apr 29, 2024

Hey @b0nes164, thanks for the quick response! I am absolutely willing to help however I can. Let me know what you'd like me to test and I'm happy to build it to my Quest 3 and give you the results. Totally understand the back and forth will be a process! From what I've already tested, it seems to start with a denser blob of gaussians when first loading the scene, then quickly devolves into this blob of flashing gaussians. I can also provide an example of what this looks like if you think it would be helpful. Thanks again, looking forward to testing whatever ideas you've got!

@b0nes164
Contributor

b0nes164 commented May 1, 2024

@ZaneZee Sweet, I should have a debugger up by tonight.

@b0nes164
Contributor

b0nes164 commented May 2, 2024

@ZaneZee Alright. The debugging procedure is as follows:

  1. Create an empty project.
  2. Create an empty gameobject and add the C# script to the gameobject.
  3. Attach the compute shader to the gameobject.
  4. Build using Vulkan and run the project on your Quest 3.
  5. Collect the text files generated from your Quest 3, and zip them back to me.

Many thanks!

DVRDebug.zip

@ZaneZee

ZaneZee commented May 2, 2024

@b0nes164 Amazing I will go through this when I can today and send back the results! Thanks!

@ZaneZee

ZaneZee commented May 3, 2024

@b0nes164 Here are the logs, let me know if there is any issue with these or if you need more info! Happy to help where I can. Should be able to get back to you pretty quickly now if you need anything else!
DebugLogs.zip

@mark2142
Author

mark2142 commented May 3, 2024

I have made the logs for the phone used to run the Lenovo A3.
It is a Motorola Edge Plus 2022.
motorola_edge_plus_2022_lenovo_A3_Android13.zip
It has the same problem as the Quest 3, as the video in an earlier post shows.

@b0nes164
Contributor

b0nes164 commented May 3, 2024

@ZaneZee Do you have Discord? If you do, add me: throw_away_1234. I don't want to clog up this issue thread any more than we already have.

@mark2142 I appreciate the help, but I don't need any testing from you at the moment. Also, you did not set up the tests correctly (probably did not attach the compute shader), as the data collected is uninitialized memory values.

@mark2142
Author

@ZaneZee and @b0nes164 .
What is the progress guys?
Do you want more help?
Do you have any more information on the problem?

@b0nes164
Contributor

Apologies, but I've been extremely busy with other work at the moment. There is definitely something breaking with the sort, but I haven't been able to replicate the issue in isolation. It's definitely fixable and is still on my to do list, but finding time is the issue at the moment. Thanks again to @ZaneZee for testing and their patience.
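
One way to look for a failure in isolation is to read back the buffers after a single sort and check them on the CPU. The sketch below is a hedged Python illustration, assuming a key/payload sort in which the payload holds splat indices 0..n-1; `check_sort_output` and its arguments are hypothetical names, not part of this project or of GPUSorting.

```python
def check_sort_output(keys_in, keys_out, payload_out):
    """Return a list of problems found in one sort run (an empty list means none found)."""
    problems = []
    n = len(keys_in)

    # 1. The output keys must be in ascending order.
    if any(a > b for a, b in zip(keys_out, keys_out[1:])):
        problems.append("keys are not in ascending order")

    # 2. Sorting must not invent or drop key values.
    if sorted(keys_in) != sorted(keys_out):
        problems.append("output keys are not a permutation of the input keys")

    # 3. Every splat index must appear exactly once in the payload.
    if sorted(payload_out) != list(range(n)):
        problems.append("payload is not a permutation of 0..n-1")
    # 4. Each payload index must still point at the key it travelled with.
    elif any(keys_in[i] != k for i, k in zip(payload_out, keys_out)):
        problems.append("payload indices do not match their keys")

    return problems


# Illustrative data: a correct result, then one with a duplicated index.
keys_in = [5, 1, 4, 1]
good = check_sort_output(keys_in, [1, 1, 4, 5], [1, 3, 2, 0])
bad = check_sort_output(keys_in, [1, 1, 4, 5], [1, 1, 2, 0])
```

A duplicated or missing index in the payload would mean some splats are drawn twice and others not at all, which is one possible explanation for splats disappearing; the thread does not confirm that this is the cause.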

@mark2142
Author

Don't know why but another project based on this project seems to work.
https://github.com/ptc-lexvandersluijs/Unity3DGS_VR

@aras-p
Owner

aras-p commented May 16, 2024

Don't know why but another project based on this project seems to work. https://github.com/ptc-lexvandersluijs/Unity3DGS_VR

From the looks of it, it is based on a much older version of this same project, where the GPU sorting routine was based on FidelityFX radix sort. That sort is several times slower, but apparently does not have the bug/issue with some GPUs.

@b0nes164
Contributor

b0nes164 commented May 19, 2024

Just to give a quick update, there are definitely transpilation issues going on. For whatever reason, using WaveGetLaneCount() or WaveGetLaneIndex() as a predicate for large ternary operations causes incorrect behavior for a single lane on the Quest 3 in Vulkan. See lines L189-L191, L283-285, and L471-472, and note that no race conditions are possible at any of these locations. Simply breaking the operation into multiple lines produced correct behavior.

While changing these lines was enough for the sort to work correctly in isolated single runs, it still fails when integrated with the splatting routine, just less than before.

More testing will be required to figure out what else is breaking and what else the transpilation doesn't like.
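
The shape of the rewrite described above can be sketched as follows. The actual shader lines are not shown in this thread, so this is a hypothetical Python illustration only: the function names, arguments, and arithmetic are assumptions, and Python cannot reproduce the driver or transpiler behavior. It only shows that splitting a large conditional expression into separate statements leaves the logic unchanged.

```python
def select_ternary(lane_index, lane_count, a, b):
    # Single large conditional expression predicated on the lane index,
    # the shape reported as misbehaving for one lane after transpilation.
    return (a + lane_index) if lane_index < lane_count - 1 else (b - lane_index)


def select_split(lane_index, lane_count, a, b):
    # Same logic broken into separate statements: evaluate the predicate
    # first, then assign in an explicit branch.
    is_not_last_lane = lane_index < lane_count - 1
    if is_not_last_lane:
        result = a + lane_index
    else:
        result = b - lane_index
    return result


# Both forms agree for every lane of a hypothetical 32-wide wave.
lane_count = 32
matches = all(
    select_ternary(i, lane_count, 100, 200) == select_split(i, lane_count, 100, 200)
    for i in range(lane_count)
)
```

Because the two forms are equivalent in the source language, a difference in behavior on device points at the compilation or driver stage rather than at the algorithm.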

@ZaneZee

ZaneZee commented May 23, 2024

Hey @b0nes164, I sent you another message with some more info on discord. Let me know if you have time to continue trying to solve the issue here! Happy to help in any way I can still as well. Thanks for taking the time with what you've already looked into also, at the very least it is one step closer to being resolved. Looking forward to hearing from you!
