GPU local texture data copying #1119
Conversation
    return;
}

#endif // SFML_OPENGL_ES
Isn't this supposed to be an #else
to handle the fallback just below?
Oh... I just noticed the `return` above and its implication for the `if (GLEXT_framebuffer_object && GLEXT_framebuffer_blit)`... a bit spaghetti. 😜
I couldn't think of any other way of writing it so that even on non-ES systems it would fall through to the CPU copy if the extension isn't available...
I'm not familiar enough with OpenGL ES to know if the following would compile... but it could be more readable:

```cpp
#ifdef SFML_OPENGL_ES
    bool fallback = true;
#else
    ensureGlContext();
    priv::ensureExtensionsInit();
    bool fallback = !(GLEXT_framebuffer_object && GLEXT_framebuffer_blit);
#endif

    if (fallback)
    {
        update(texture.copyToImage(), x, y);
    }
    else
    {
        // the big part here
    }
```
If you ask me, this makes the code harder to reason about than the current version.
Interesting, I'll test this code in a bit to see how it affects performance on my machine! :-)
src/SFML/Graphics/Texture.cpp
Outdated
GLenum status;
glCheck(status = GLEXT_glCheckFramebufferStatus(GLEXT_GL_FRAMEBUFFER));

if (status != GLEXT_GL_FRAMEBUFFER_COMPLETE)
It might be more concise to refactor as follows:

```cpp
if (status == GLEXT_GL_FRAMEBUFFER_COMPLETE)
    GLEXT_glBlitFramebuffer(...);
else
    err() << ...;
// cleanup / return
```
Fixed.
Best performance of several runs, without this patch: 👍
Force-pushed from 8904853 to 5cb3895.
@@ -583,11 +583,14 @@ Glyph Font::loadGlyph(Uint32 codePoint, unsigned int characterSize, bool bold, f
// pollute them with pixels from neighbors
const unsigned int padding = 1;

width += 2 * padding;
height += 2 * padding;
(Sorry, I was away the whole week; I'm trying to catch up.)
What's the reason for adding 2 * padding to width and height here, and then constantly subtracting it everywhere width and height are used?
The block of memory we work with now includes the border padding (which it didn't before). Instead of initializing the texture to the padding colour when we create it (which might waste time when the texture gets really big), we do it every time a glyph would actually need the padding around it and upload it then.
Force-pushed from 5cb3895 to bfdd834.
This PR needs a rebase.
src/SFML/Graphics/Font.cpp
Outdated
unsigned int y = glyph.textureRect.top - padding;
unsigned int w = glyph.textureRect.width + 2 * padding;
unsigned int h = glyph.textureRect.height + 2 * padding;
page.texture.update(reinterpret_cast<const Uint8*>(&m_pixelBuffer[0]), w, h, x, y);
This looks unsafe, since it depends on the processor endianness. On little endian machines you'll end up with RGBA byte ordering and on big endian you'll get ABGR.
Why did you change the pixel buffer from uint8 to uint32?
The idea was that because the unused area of m_pixelBuffer should contain transparent white pixels, the initialization could be done once instead of every time a glyph is added. This scales better with bigger glyphs and the more glyphs that are added.
The only reason I changed it to Uint32 is that initialization would then consist of a single line (the assign). Now that I think of it, since a single memset can't be used internally either (it can only assign a single byte value to a range), the assign would loop over each T anyway. I'll probably revert the storage back to Uint8 and loop-initialize using a Uint8 array. If the optimizer isn't too dumb, it should end up producing pretty similar results.
Some tests: https://godbolt.org/g/o4OGt5
src/SFML/Graphics/Font.cpp
Outdated
page.texture.loadFromImage(newImage);
Texture oldTexture(page.texture);
page.texture.create(textureWidth * 2, textureHeight * 2);
page.texture.update(oldTexture);
Wouldn't it be even more efficient to create a fresh new texture with the bigger size, copy the old content to it, and swap them (the swap would have to be implemented)? We already have the implementation for a swap function in operator=, so that would actually just be some refactoring.
Yes. 😁
Does this need additional changes or have all the mentioned points been cleared up?
At the moment this is not compatible with the context changes, i.e. it is broken. Will need to rebase/re-implement/re-test once the context stuff is merged.
Force-pushed from bfdd834 to b6bd0af.
Rebased onto master, addressed the raised issues and fixed a few bugs encountered during testing.
src/SFML/Graphics/Font.cpp
Outdated
unsigned int y = glyph.textureRect.top - padding;
unsigned int w = glyph.textureRect.width + 2 * padding;
unsigned int h = glyph.textureRect.height + 2 * padding;
page.texture.update(reinterpret_cast<const Uint8*>(&m_pixelBuffer[0]), w, h, x, y);
Seems like you forgot to remove the reinterpret_cast
Fixed.
Force-pushed from b6bd0af to ed00f1c.
… roundtrip to the CPU and back, add sf::Texture::swap to enable swapping texture contents, fix sf::Font::cleanup not shrinking its allocated pixel buffer storage when the user loads a new font using the same sf::Font object.
This seems to work nicely. Even though I don't fully understand all the changes, I still approved it, since nobody else has bothered to so far. 😉 Here's some code I used to test things a bit:

```cpp
#include <SFML/Graphics.hpp>

int main()
{
    const int gameWidth = 800;
    const int gameHeight = 600;

    sf::RenderWindow window(sf::VideoMode(gameWidth, gameHeight), "SFML Test");
    window.setVerticalSyncEnabled(true);

    sf::Texture tex1;
    tex1.loadFromFile("200.png");
    sf::Sprite spr1(tex1);

    sf::Texture tex2;
    tex2.create(200, 200);
    sf::Sprite spr2(tex2);
    spr2.setPosition({300, 0});

    sf::RenderTexture rt;
    rt.create(200, 200);
    rt.clear(sf::Color::Green);
    rt.display();

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
            {
                window.close();
                break;
            }

            if (event.type == sf::Event::KeyPressed)
            {
                if (event.key.code == sf::Keyboard::Space)
                    tex1.swap(tex2);
                else if (event.key.code == sf::Keyboard::Return)
                    tex1.update(rt.getTexture());
            }
        }

        window.clear();
        window.draw(spr1);
        window.draw(spr2);
        window.display();
    }
}
```
Force-pushed from ed00f1c to 6b71456.
Since this commit was merged, the following code crashes/asserts when copying a texture:

```cpp
#include <SFML/Graphics.hpp>

int main()
{
    sf::Texture texture;
    texture.loadFromFile("image.jpg");
    sf::Texture texture2 = texture;
}
```

Output:
Thanks for letting us know, I'll see to getting this fixed ASAP.
The fix is trivial; it's just a typo (should be
In many applications, it is common for the user to stitch smaller textures into bigger ones in order to minimize the number of OpenGL state changes necessary when drawing. This is commonly known as atlasing. Even sf::Font makes use of this when loading its glyphs.
The problem with atlasing in SFML is that each time a new texture has to be inserted into the atlas, users would have to copy the texture data back to RAM, resize/update it, and transfer it back to the GPU. This costs a lot of time. With this PR, the user (given the proper hardware support) will be able to copy the existing texture to a temporary texture on the GPU, resize the original texture to its new size, and copy the old texture data over. This does not involve any copying of data between the GPU and CPU.
Testing with this code yields a performance increase of two orders of magnitude. On my system, the time it takes for the font to load the codepoints decreases from 20 seconds to under 1 second.
This only tests the texture copying indirectly and is a contrived example, since it is meant to be a minimal demonstration. If anybody else has texture copy benchmarks, they can provide the data here as well.
In addition to this feature, I went ahead and optimized sf::Font::cleanup() to deallocate its pixel buffer storage. This way, if the user loads glyphs using a huge character size and later reuses the same sf::Font object for another font at smaller character sizes, they won't be stuck with a std::vector using up more memory than necessary.