Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Billboard vertex compression #2234

Merged
merged 20 commits into from
Oct 31, 2014
Merged

Billboard vertex compression #2234

merged 20 commits into from
Oct 31, 2014

Conversation

bagnell
Copy link
Contributor

@bagnell bagnell commented Oct 29, 2014

For #419.

compressedAttribute2 : 4, // image height, color, pick color, 2 bytes free
eyeOffset : 5,
scaleByDistance : 6,
pixelOffsetScaleByDistance : 7
};

// Identifies to the VertexArrayFacade the attributes that are used only for the pick
// pass or only for the color pass.
var allPassPurpose = 'all';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should drop purpose from VertexArrayFacade completely since it is not used anywhere else.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

@bagnell can you test this on mobile? We should make sure this works on a lot of GPUs. I will test Intel and NVIDIA on Mac.

float rotation = positionLowAndRotation.w;
#endif

float upperBound = pow(2.0, 15.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't trust GLSL compilers (especially on mobile) to optimize out the calls to pow. Can we instead declare constants or even #defines in case the constants use a register on some atypical implementations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps declare some of these as 1 / ... - I'm not sure that this actually matters anymore though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for Math.pow in the JavaScript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I was hoping the compiler/interpreter would optimize those calls because its much clearer that bits are being shifted without using constants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also write czm_ shift left and right functions, which should get inlined.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

We're saving a lot of memory and memory bandwidth, but we added a lot of vertex shader instructions so I'd like us to test this on as many systems as possible before merging.

@bagnell can you also run some before/after tests (both FPS and GPU memory usage) using perhaps 1K, 10K, and 30K static labels? The memory win should be nice, but I'm not 100% sure about performance. I believe in the extreme case (and without off/on distances), we become fragment bound (perhaps it is framebuffer memory IO. I doubt it is rasterization/compute or texture IO since the fragment shader is trivial and the textures are small and will be in on-chip cache) so the extra vertex shader instructions may not matter at all and the GPU memory bandwidth savings could be a nice win or at least not a non-trivial slowdown.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

Let me know when the code changes are in and I will help test this on different GPUs.

@bagnell
Copy link
Contributor Author

bagnell commented Oct 29, 2014

@pjcozzi The code changes are in. This is good to go for testing.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

This test fails in Chrome and Firefox on my Mac with NVIDIA:

image

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

Also, can we carefully test picking? Are we ever at risk for the pick id to be off by one due to the compression?

@bagnell
Copy link
Contributor Author

bagnell commented Oct 29, 2014

I didn't compress the color or pick color, just packed them. I don't expect there to be any problems with picking.

I didn't notice the test failure. I'll take a look.

@bagnell
Copy link
Contributor Author

bagnell commented Oct 29, 2014

Billboards worked fine on the Dell and Nexus 7 tablets.

I didn't see any change in the FPS. For 1K billboards, we saved a few hundred KB. For 10K billboards, we saved ~30MB. For 30K billboards, we saved ~40MB.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

Except for that test failure, things are also good on my MacBook Pro in Chrome and Firefox and with NVIDIA and Intel.

@mramato can you try this branch with the Mac Mini / AMD?

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

For 10K billboards, we saved 30 MB? Is that (30 * 1024 * 1024) / (10 * 1024) = 3,072 bytes per billboard? I would have expected roughly like 6 * sizeof(vec4) * 4 = 384 bytes (assuming a savings of 6 attributes and each billboard is 4 vertices). Memory allocator overhead and the such?

@mramato
Copy link
Contributor

mramato commented Oct 29, 2014

Mini looks good.

@bagnell
Copy link
Contributor Author

bagnell commented Oct 29, 2014

Right, for 30K billboards, I'm expecting ~7MB saved, but I'm getting at least ~30MB consistently according to the Chrome task manager.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

@bagnell anything else you want to do here? I still want to test on a few more machines.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 29, 2014

Ah, the test failure.

@bagnell
Copy link
Contributor Author

bagnell commented Oct 29, 2014

@pjcozzi This is ready for another look.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 30, 2014

For the performance numbers above, did you test with labels or billboards? The numbers would make more sense for labels.

Did you tested on IE 11? I'll also test on Windows/NVIDIA tonight with all browsers.

Can we also do a dynamic performance: perhaps 15K billboards with their positions changing each frame? We introduced some CPU overhead but we also reduced it a bit in the browser/driver and reduced the memory bandwidth. Perhaps we should also try a performance test where we modify a property that is not packed and compressed along with several others, which I would expect to be a bit slower, which should be fine.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 30, 2014

Looks good on Windows Chrome, Firefox, and IE 11.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 30, 2014

Here's a better idea for the dynamic performance test: instead of just changing position, change all properties.

@pjcozzi
Copy link
Contributor

pjcozzi commented Oct 31, 2014

Tests are good now on Mac. Let's just do a sanity check for the dynamic performance and then we're good I think. @bagnell anything else?

@bagnell
Copy link
Contributor Author

bagnell commented Oct 31, 2014

For changing all of the properties of 15K billboards per frame, this branch runs at 17 FPS and master runs at 11 FPS.

pjcozzi added a commit that referenced this pull request Oct 31, 2014
@pjcozzi pjcozzi merged commit a51837a into master Oct 31, 2014
@pjcozzi pjcozzi deleted the billboardCompression branch October 31, 2014 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants