-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Billboard vertex compression #2234
Conversation
… show, and translate into 3 floats.
…ance, color and pick color.
… error when undefined.
compressedAttribute2 : 4, // image height, color, pick color, 2 bytes free | ||
eyeOffset : 5, | ||
scaleByDistance : 6, | ||
pixelOffsetScaleByDistance : 7 | ||
}; | ||
|
||
// Identifies to the VertexArrayFacade the attributes that are used only for the pick | ||
// pass or only for the color pass. | ||
var allPassPurpose = 'all'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should drop purpose from VertexArrayFacade
completely since it is not used anywhere else.
@bagnell can you test this on mobile? We should make sure this works on a lot of GPUs. I will test Intel and NVIDIA on Mac. |
float rotation = positionLowAndRotation.w; | ||
#endif | ||
|
||
float upperBound = pow(2.0, 15.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't trust GLSL compilers (especially on mobile) to optimize out the calls to pow
. Can we instead declare constants or even #define
s in case the constants use a register on some atypical implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps declare some of these as 1 / ...
- I'm not sure that this actually matters anymore though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment for Math.pow
in the JavaScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I was hoping the compiler/interpreter would optimize those calls because its much clearer that bits are being shifted without using constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also write czm_
shift left and right functions, which should get inlined.
We're saving a lot of memory and memory bandwidth, but we added a lot of vertex shader instructions so I'd like us to test this on as many systems as possible before merging. @bagnell can you also run some before/after tests (both FPS and GPU memory usage) using perhaps 1K, 10K, and 30K static labels? The memory win should be nice, but I'm not 100% sure about performance. I believe in the extreme case (and without off/on distances), we become fragment bound (perhaps it is framebuffer memory IO. I doubt it is rasterization/compute or texture IO since the fragment shader is trivial and the textures are small and will be in on-chip cache) so the extra vertex shader instructions may not matter at all and the GPU memory bandwidth savings could be a nice win or at least not a non-trivial slowdown. |
Let me know when the code changes are in and I will help test this on different GPUs. |
@pjcozzi The code changes are in. This is good to go for testing. |
Also, can we carefully test picking? Are we ever at risk for the pick id to be off by one due to the compression? |
I didn't compress the color or pick color, just packed them. I don't expect there to be any problems with picking. I didn't notice the test failure. I'll take a look. |
Billboards worked fine on the Dell and Nexus 7 tablets. I didn't see any change in the FPS. For 1K billboards, we saved a few hundred KB. For 10K billboards, we saved ~30MB. For 30K billboards, we saved ~40MB. |
Except for that test failure, things are also good on my MacBook Pro in Chrome and Firefox and with NVIDIA and Intel. @mramato can you try this branch with the Mac Mini / AMD? |
For 10K billboards, we saved 30 MB? Is that (30 * 1024 * 1024) / (10 * 1024) = 3,072 bytes per billboard? I would have expected roughly like 6 * sizeof(vec4) * 4 = 384 bytes (assuming a savings of 6 attributes and each billboard is 4 vertices). Memory allocator overhead and the such? |
Mini looks good. |
Right, for 30K billboards, I'm expecting ~7MB saved, but I'm getting at least ~30MB consistently according to the Chrome task manager. |
@bagnell anything else you want to do here? I still want to test on a few more machines. |
Ah, the test failure. |
@pjcozzi This is ready for another look. |
For the performance numbers above, did you test with labels or billboards? The numbers would make more sense for labels. Did you tested on IE 11? I'll also test on Windows/NVIDIA tonight with all browsers. Can we also do a dynamic performance: perhaps 15K billboards with their positions changing each frame? We introduced some CPU overhead but we also reduced it a bit in the browser/driver and reduced the memory bandwidth. Perhaps we should also try a performance test where we modify a property that is not packed and compressed along with several others, which I would expect to be a bit slower, which should be fine. |
Looks good on Windows Chrome, Firefox, and IE 11. |
Here's a better idea for the dynamic performance test: instead of just changing position, change all properties. |
Tests are good now on Mac. Let's just do a sanity check for the dynamic performance and then we're good I think. @bagnell anything else? |
ba15279
to
4567101
Compare
For changing all of the properties of 15K billboards per frame, this branch runs at 17 FPS and master runs at 11 FPS. |
For #419.