
GPU local texture data copying #1119

Merged
merged 1 commit into from Mar 7, 2017
Conversation

binary1248
Member

In many applications, it is common for the user to stitch smaller textures into bigger ones in order to minimize the amount of OpenGL state changes necessary when drawing. This is commonly known as atlasing. Even sf::Font makes use of this when loading its glyphs.

The problem with atlasing in SFML is that each time a new texture has to be inserted into the atlas, users have to copy the texture data back to RAM, resize/update it, and transfer it back onto the GPU. This costs a lot of time. With this PR, the user (given the proper hardware support) will be able to copy the existing texture to a temporary texture on the GPU, resize the original texture to its new size, and copy the old texture data over. This does not involve any copying of data between the GPU and CPU.

#include <SFML/Graphics.hpp>

int main()
{
    sf::Font font;
    if (!font.loadFromFile("resources/sansation.ttf"))
        return -1;

    // Request every printable ASCII glyph at a huge character size,
    // forcing the glyph atlas texture to grow repeatedly
    for (unsigned int i = 32; i < 127; i++)
        font.getGlyph(i, 1000, false, 0);
}

Testing with this code yields a performance increase of up to two orders of magnitude. On my system, the time it takes for the font to load the codepoints decreases from 20 seconds to under 1 second.

This is a contrived example that only tests the texture copying indirectly, since it is meant to be a minimal demonstration. If anybody else has texture copy benchmarks, they can provide the data here as well.
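
For illustration, a minimal sketch of the GPU-local pattern described above (the texture names are illustrative; the copy constructor and the update(const Texture&) overload are the paths this PR accelerates):

#include <SFML/Graphics.hpp>

int main()
{
    // An atlas texture that has run out of space
    sf::Texture atlas;
    atlas.create(256, 256);

    // Keep the old contents in a temporary GPU-side copy, grow the
    // atlas, then blit the old data back; given hardware support,
    // no pixel data travels through system RAM
    sf::Vector2u oldSize = atlas.getSize();
    sf::Texture oldAtlas(atlas);
    atlas.create(oldSize.x * 2, oldSize.y * 2);
    atlas.update(oldAtlas);
}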

In addition to this feature, I went ahead and optimized sf::Font::cleanup() to deallocate its pixel buffer storage. This way, if the user loads glyphs at a huge character size and later reuses the same sf::Font object for another font at smaller character sizes, they won't be stuck with a std::vector using up more memory than necessary.
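
(For context, clear() alone does not release a vector's capacity; a sketch of the usual deallocation idiom, assuming this is roughly what cleanup() now does:)

#include <SFML/Config.hpp>
#include <vector>

// Release the pixel buffer's allocation entirely; clear() would only
// reset the size while keeping the capacity allocated
void shrinkPixelBuffer(std::vector<sf::Uint8>& pixelBuffer)
{
    std::vector<sf::Uint8>().swap(pixelBuffer);
}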

return;
}

#endif // SFML_OPENGL_ES
Member

Isn't this supposed to be an #else to handle the fallback just below?

Member

Oh... I just noticed the return above and its implication for the if (GLEXT_framebuffer_object && GLEXT_framebuffer_blit)... a bit spaghetti. 😜

Member Author

I couldn't think of any other way of writing it so that even on non-ES systems it would fall through to the CPU copy if the extension isn't available...

Member

I'm not familiar enough with OpenGL ES to know if the following would compile... but it could be more readable.

#ifdef SFML_OPENGL_ES
    bool fallback = true;
#else
    ensureGlContext();
    priv::ensureExtensionsInit();
    bool fallback = !(GLEXT_framebuffer_object && GLEXT_framebuffer_blit);
#endif

if (fallback)
    update(texture.copyToImage(), x, y);
else
{
    // the big part here
}

Member Author

If you ask me, this makes the code harder to reason about than the current version.

@mantognini
Member

Interesting, I'll test this code in a bit to see how it affects performance on my machine! :-)

GLenum status;
glCheck(status = GLEXT_glCheckFramebufferStatus(GLEXT_GL_FRAMEBUFFER));

if (status != GLEXT_GL_FRAMEBUFFER_COMPLETE)
Member

It might be more concise to refactor as follows:

if (status == GLEXT_GL_FRAMEBUFFER_COMPLETE)
    GLEXT_glBlitFramebuffer(...)
else
    err() << ...

cleanup / return

Member Author

Fixed.

@mantognini
Member

Best performance of several runs, without this patch: 1.67 real 0.60 user 0.33 sys
Worst performance of several runs, with this patch: 1.12 real 0.14 user 0.08 sys

👍

@@ -583,11 +583,14 @@ Glyph Font::loadGlyph(Uint32 codePoint, unsigned int characterSize, bool bold, f
// pollute them with pixels from neighbors
const unsigned int padding = 1;

width += 2 * padding;
height += 2 * padding;
Member

(Sorry, I was away the whole week; I'm trying to catch up.)

What's the reason for adding 2 * padding to width and height here, and then constantly subtracting it everywhere width and height are used?

Member Author

The block of memory we work with now includes the border padding (which it didn't before). Instead of initializing the texture to the padding colour when we create it (which might waste time when the texture gets really big), we do it every time a glyph would actually need the padding around it and upload it then.
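
(A sketch of the resulting arithmetic, with names mirroring the diff below; the x line is the symmetric counterpart of the y line shown there:)

// The glyph's texture rect excludes the border, so the uploaded
// rectangle starts one padding earlier and is 2 * padding larger
unsigned int x = glyph.textureRect.left - padding;
unsigned int y = glyph.textureRect.top  - padding;
unsigned int w = glyph.textureRect.width  + 2 * padding;
unsigned int h = glyph.textureRect.height + 2 * padding;
page.texture.update(&m_pixelBuffer[0], w, h, x, y);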

@eXpl0it3r
Member

This PR needs a rebase.

unsigned int y = glyph.textureRect.top - padding;
unsigned int w = glyph.textureRect.width + 2 * padding;
unsigned int h = glyph.textureRect.height + 2 * padding;
page.texture.update(reinterpret_cast<const Uint8*>(&m_pixelBuffer[0]), w, h, x, y);
Member

This looks unsafe, since it depends on the processor endianness. On little endian machines you'll end up with RGBA byte ordering and on big endian you'll get ABGR.

Why did you change the pixel buffer from uint8 to uint32?
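
(To illustrate the concern, a standalone sketch showing how the in-memory byte order of a packed 32-bit pixel differs between architectures:)

#include <SFML/Config.hpp>
#include <cstdio>
#include <cstring>

int main()
{
    // One pixel packed into a single 32-bit value
    sf::Uint32 pixel = 0xAABBCCDD;

    // Inspect the actual byte layout in memory
    sf::Uint8 bytes[4];
    std::memcpy(bytes, &pixel, sizeof(pixel));

    // Little endian prints "DD CC BB AA", big endian prints "AA BB CC DD",
    // so OpenGL would see a different channel order depending on the CPU
    std::printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}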

Member Author
@binary1248 binary1248 Nov 4, 2016

The idea was that, because the unused area of m_pixelBuffer should contain transparent white pixels, the initialization could be done once up front instead of every time a glyph is added. This scales better with bigger glyphs and as more glyphs are added.

The only reason I changed it to Uint32 is that initialization would then consist of a single line (the assign). Now that I think of it, since a single memset can't be used internally either (it can only assign a single byte value to a range), the assign would loop over each element anyway. I'll probably revert the storage back to Uint8 and loop-initialize using a Uint8 array. If the optimizer isn't too dumb, it should end up producing pretty similar results.

Some tests: https://godbolt.org/g/o4OGt5
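
(A sketch of the Uint8-based loop initialization described above; the function name is illustrative:)

#include <SFML/Config.hpp>
#include <vector>

void initToTransparentWhite(std::vector<sf::Uint8>& pixelBuffer)
{
    // Set every RGBA pixel to (255, 255, 255, 0): transparent white,
    // so the padding around glyphs stays invisible when sampled
    for (std::size_t i = 0; i < pixelBuffer.size(); i += 4)
    {
        pixelBuffer[i + 0] = 255; // R
        pixelBuffer[i + 1] = 255; // G
        pixelBuffer[i + 2] = 255; // B
        pixelBuffer[i + 3] = 0;   // A
    }
}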

page.texture.loadFromImage(newImage);
Texture oldTexture(page.texture);
page.texture.create(textureWidth * 2, textureHeight * 2);
page.texture.update(oldTexture);
Member

Wouldn't it be even more efficient to create a fresh new texture with the bigger size, copy the old content to it, and swap them (swap would have to be implemented)? We already have the implementation for a swap function in operator=, so that would actually just be some refactoring.
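
(A sketch of the suggested variant, using the sf::Texture::swap member this PR ends up adding; it needs one GPU-side copy instead of two:)

// Create the bigger texture, blit the old contents into it, then
// swap so page.texture owns the new storage; the old texture is
// released when newTexture goes out of scope
Texture newTexture;
newTexture.create(textureWidth * 2, textureHeight * 2);
newTexture.update(page.texture);
page.texture.swap(newTexture);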

Member Author

Yes. 😁

@eXpl0it3r
Member

Does this need additional changes or have all the mentioned points been cleared up?

@binary1248
Member Author

At the moment this is not compatible with the context changes, i.e. it is broken. It will need a rebase/re-implementation/re-test once the context stuff is merged.

@binary1248
Member Author

Rebased onto master, addressed the raised issues and fixed a few bugs encountered during testing.

unsigned int y = glyph.textureRect.top - padding;
unsigned int w = glyph.textureRect.width + 2 * padding;
unsigned int h = glyph.textureRect.height + 2 * padding;
page.texture.update(reinterpret_cast<const Uint8*>(&m_pixelBuffer[0]), w, h, x, y);
Member

Seems like you forgot to remove the reinterpret_cast.

Member Author

Fixed.

… roundtrip to the CPU and back, add sf::Texture::swap to enable swapping texture contents, fixed sf::Font::cleanup not shrinking its allocated pixel buffer storage when the user loads a new font using the same sf::Font object.
@eXpl0it3r
Member

This seems to work nicely. Even though I don't fully understand all the changes, I approved it, since nobody else has bothered to so far. 😉

Here's some code I used to test things a bit:

#include <SFML/Graphics.hpp>

int main()
{
    const int gameWidth = 800;
    const int gameHeight = 600;

    sf::RenderWindow window(sf::VideoMode(gameWidth, gameHeight), "SFML Test");
    window.setVerticalSyncEnabled(true);

    sf::Texture tex1;
    tex1.loadFromFile("200.png");
    sf::Sprite spr1(tex1);

    sf::Texture tex2;
    tex2.create(200, 200);
    sf::Sprite spr2(tex2);
    spr2.setPosition({300, 0});

    sf::RenderTexture rt;
    rt.create(200, 200);
    rt.clear(sf::Color::Green);
    rt.display();

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
            {
                window.close();
                break;
            }

            if (event.type == sf::Event::KeyPressed)
            {
                // Space swaps the two textures in place; Return copies
                // the render texture's contents into tex1 on the GPU
                if (event.key.code == sf::Keyboard::Space)
                    tex1.swap(tex2);
                else if (event.key.code == sf::Keyboard::Return)
                    tex1.update(rt.getTexture());
            }
            }

        }

        window.clear();
        window.draw(spr1);
        window.draw(spr2);
        window.display();
    }
}

@eXpl0it3r eXpl0it3r merged commit 6b71456 into master Mar 7, 2017
@eXpl0it3r eXpl0it3r deleted the feature/texture_copy branch March 7, 2017 13:58
@texus
Contributor

texus commented Mar 8, 2017

Since this commit was merged, the following code crashes with an assertion failure when copying a texture.

#include <SFML/Graphics.hpp>

int main()
{
    sf::Texture texture;
    texture.loadFromFile("image.jpg");

    sf::Texture texture2 = texture; // copy construction triggers the assertion
}

Output:

SFML/src/SFML/Graphics/Texture.cpp:450: void sf::Texture::update(const sf::Texture&, unsigned int, unsigned int): Assertion `y + texture.m_size.x <= m_size.y' failed.

@eXpl0it3r
Member

Thanks for letting us know, I'll see to getting this fixed ASAP.

@LaurentGomila
Member

The fix is trivial, it's just a typo (should be texture.m_size.y in the assert condition).
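
(In other words, the condition becomes:)

assert(y + texture.m_size.y <= m_size.y); // was texture.m_size.x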
