New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed sf::Image::create #1166

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
6 participants
@Fytch
Contributor

Fytch commented Nov 11, 2016

Discussion thread: http://en.sfml-dev.org/forums/index.php?topic=20875.0
Basically, the issue with sf::Image::create was, that it would either
occupy space, it doesn't need, because std::vector::resize doesn't
actually shrink the vector, or reallocate in an inefficient way by
needlessly copying the whole old image over. Neither did it grant strong
exception safety guarantee because it changed the non-throwing members
(m_size) prior to doing the critical stuff (reallocating) which may
throw. Changing the order and using a temporary
(create-temporary-and-swap idiom; see http://www.gotw.ca/gotw/059.htm)
fixes both of these problems.

@LaurentGomila

Looks good

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Nov 11, 2016

Member

SFMLBuildMaster: Build this please.

Member

eXpl0it3r commented Nov 11, 2016

SFMLBuildMaster: Build this please.

@binary1248

std::vector::swap is not guaranteed to be noexcept until C++17. Moving the swap before the m_size modification doesn't provide a strong exception guarantee, but there is a better chance that the Image is still usable in some way if an exception is thrown.

Show outdated Hide outdated src/SFML/Graphics/Image.cpp Outdated
@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 12, 2016

Member

My proposal would look something like this:

////////////////////////////////////////////////////////////
void Image::create(unsigned int width, unsigned int height, const Color& color)
{
    if (width && height)
    {
        // Create a new pixel buffer first for exception safety's sake
        std::vector<Uint8> newPixels(width * height * 4);

        // Fill it with the specified color
        Uint8* ptr = &newPixels[0];
        Uint8* end = ptr + newPixels.size();
        while (ptr < end)
        {
            *ptr++ = color.r;
            *ptr++ = color.g;
            *ptr++ = color.b;
            *ptr++ = color.a;
        }

        // Commit the new pixel buffer
        m_pixels.swap(newPixels);

        // Assign the new size
        m_size.x = width;
        m_size.y = height;
    }
    else
    {
        // Truncate the pixel buffer
        m_pixels.swap(std::vector<Uint8>());

        // Assign the new size
        m_size.x = 0;
        m_size.y = 0;
    }
}


////////////////////////////////////////////////////////////
void Image::create(unsigned int width, unsigned int height, const Uint8* pixels)
{
    if (pixels && width && height)
    {
        // Swap out the contents of the pixel buffer
        m_pixels.swap(std::vector<Uint8>(pixels, pixels + width * height * 4));

        // Assign the new size
        m_size.x = width;
        m_size.y = height;
    }
    else
    {
        // Truncate the pixel buffer
        m_pixels.swap(std::vector<Uint8>());

        // Assign the new size
        m_size.x = 0;
        m_size.y = 0;
    }
}
Member

binary1248 commented Nov 12, 2016

My proposal would look something like this:

////////////////////////////////////////////////////////////
void Image::create(unsigned int width, unsigned int height, const Color& color)
{
    if (width && height)
    {
        // Create a new pixel buffer first for exception safety's sake
        std::vector<Uint8> newPixels(width * height * 4);

        // Fill it with the specified color
        Uint8* ptr = &newPixels[0];
        Uint8* end = ptr + newPixels.size();
        while (ptr < end)
        {
            *ptr++ = color.r;
            *ptr++ = color.g;
            *ptr++ = color.b;
            *ptr++ = color.a;
        }

        // Commit the new pixel buffer
        m_pixels.swap(newPixels);

        // Assign the new size
        m_size.x = width;
        m_size.y = height;
    }
    else
    {
        // Truncate the pixel buffer
        m_pixels.swap(std::vector<Uint8>());

        // Assign the new size
        m_size.x = 0;
        m_size.y = 0;
    }
}


////////////////////////////////////////////////////////////
void Image::create(unsigned int width, unsigned int height, const Uint8* pixels)
{
    if (pixels && width && height)
    {
        // Swap out the contents of the pixel buffer
        m_pixels.swap(std::vector<Uint8>(pixels, pixels + width * height * 4));

        // Assign the new size
        m_size.x = width;
        m_size.y = height;
    }
    else
    {
        // Truncate the pixel buffer
        m_pixels.swap(std::vector<Uint8>());

        // Assign the new size
        m_size.x = 0;
        m_size.y = 0;
    }
}
@Fytch

This comment has been minimized.

Show comment
Hide comment
@Fytch

Fytch Nov 12, 2016

Contributor

@binary1248 Your proposal won't compile because swap requires an lvalue-reference while you're trying to invoke it with a temporary. Change that the other way round temp.swap(m_pixels); and it'll work. I agree with your changes regarding the order. I didn't know swap wasn't noexcept as of C++11, which is probably due to allocator intricacies.

Contributor

Fytch commented Nov 12, 2016

@binary1248 Your proposal won't compile because swap requires an lvalue-reference while you're trying to invoke it with a temporary. Change that the other way round temp.swap(m_pixels); and it'll work. I agree with your changes regarding the order. I didn't know swap wasn't noexcept as of C++11, which is probably due to allocator intricacies.

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Nov 12, 2016

Member

The memcpy can be done as part of the initialization thus saving the internal memset

As mentioned in the previous comments, there was a reason for std::memset: it was much faster than vector::assign (which is most likely called by the equivalent constructor).

Member

LaurentGomila commented Nov 12, 2016

The memcpy can be done as part of the initialization thus saving the internal memset

As mentioned in the previous comments, there was a reason for std::memset: it was much faster than vector::assign (which is most likely called by the equivalent constructor).

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 12, 2016

Member

@Fytch I knew there was a reason why the idiomatic way to shrink a vector pre-C++11 always swapped from the temporary... 🙄

@LaurentGomila I think you are mixing things up... memset is still called in the constructor now, as part of the filling with T() since only a size is passed. If T is a POD, the compiler is smart enough to do it all in one go with memset instead of looping over every element. If you initialize the vector directly from an array of PODs, the compiler is also smart enough to only write to the memory once, via the memcpy. A vector assign is slower whether or not the vector was initialized or not because it loops over every element, so that is not really what I was addressing. See what the compiler generates at -O3.

Member

binary1248 commented Nov 12, 2016

@Fytch I knew there was a reason why the idiomatic way to shrink a vector pre-C++11 always swapped from the temporary... 🙄

@LaurentGomila I think you are mixing things up... memset is still called in the constructor now, as part of the filling with T() since only a size is passed. If T is a POD, the compiler is smart enough to do it all in one go with memset instead of looping over every element. If you initialize the vector directly from an array of PODs, the compiler is also smart enough to only write to the memory once, via the memcpy. A vector assign is slower whether or not the vector was initialized or not because it loops over every element, so that is not really what I was addressing. See what the compiler generates at -O3.

@Fytch

This comment has been minimized.

Show comment
Hide comment
@Fytch

Fytch Nov 12, 2016

Contributor

As I am new to GitHub and collaborating on open source projects in general: How do I go about changing the code of this pull request without having to open a separate, second pull request? Should I just push another commit to my branch?

Contributor

Fytch commented Nov 12, 2016

As I am new to GitHub and collaborating on open source projects in general: How do I go about changing the code of this pull request without having to open a separate, second pull request? Should I just push another commit to my branch?

@victorlevasseur

This comment has been minimized.

Show comment
Hide comment
@victorlevasseur

victorlevasseur Nov 12, 2016

Contributor

Yes

Le sam. 12 nov. 2016 12:52, Fytch notifications@github.com a écrit :

As I am new to GitHub and collaborating on open source projects in
general: How do I go about changing the code of this pull request without
having to open a separate, second pull request? Should I just push another
commit to my branch?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1166 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA63oLyd0fmaESP184lgFhgPVbZtvT-Jks5q9ahzgaJpZM4Kvo4M
.

Contributor

victorlevasseur commented Nov 12, 2016

Yes

Le sam. 12 nov. 2016 12:52, Fytch notifications@github.com a écrit :

As I am new to GitHub and collaborating on open source projects in
general: How do I go about changing the code of this pull request without
having to open a separate, second pull request? Should I just push another
commit to my branch?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1166 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA63oLyd0fmaESP184lgFhgPVbZtvT-Jks5q9ahzgaJpZM4Kvo4M
.

@binary1248 binary1248 added this to the 2.4.2 milestone Nov 12, 2016

@binary1248 binary1248 added s:accepted and removed s:undecided labels Nov 12, 2016

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Nov 12, 2016

Member

A vector assign is slower whether or not the vector was initialized or not because it loops over every element

Why in this specific case the compiler would not be as smart as in the constructor, and not use std::memcpy as well?

Member

LaurentGomila commented Nov 12, 2016

A vector assign is slower whether or not the vector was initialized or not because it loops over every element

Why in this specific case the compiler would not be as smart as in the constructor, and not use std::memcpy as well?

@binary1248

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Nov 12, 2016

Member

It was Visual Studio (probably an old version at that time), with the standard options generated by CMake in Release configuration.

Member

LaurentGomila commented Nov 12, 2016

It was Visual Studio (probably an old version at that time), with the standard options generated by CMake in Release configuration.

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 12, 2016

Member

@Fytch Can you squash the commits into a single one?

Member

binary1248 commented Nov 12, 2016

@Fytch Can you squash the commits into a single one?

fixed sf::Image::create
Discussion thread: http://en.sfml-dev.org/forums/index.php?topic=20875.0
Basically, the issue with sf::Image::create was, that it would either
occupy space, it doesn't need, because std::vector::resize doesn't
actually shrink the vector, or reallocate in an inefficient way by
needlessly copying the whole old image over. Neither did it grant strong
exception safety guarantee because it changed the non-throwing members
(m_size) prior to doing the critical stuff (reallocating) which may
throw. Changing the order and using a temporary
(create-temporary-and-swap idiom; see http://www.gotw.ca/gotw/059.htm)
fixes both of these problems.
@Fytch

This comment has been minimized.

Show comment
Hide comment
@Fytch

Fytch Nov 12, 2016

Contributor

Done.

Contributor

Fytch commented Nov 12, 2016

Done.

@Fytch

This comment has been minimized.

Show comment
Hide comment
@Fytch

Fytch Nov 12, 2016

Contributor

On an unrelated note: std::vector<Uint8> turns up quite often in Image.cpp. Why not make it a typedef inside of sf::Image's class body in Image.hpp and use that typedef instead? It would live up to the DRY principle. The same goes for quite a lot of adapter classes.

Contributor

Fytch commented Nov 12, 2016

On an unrelated note: std::vector<Uint8> turns up quite often in Image.cpp. Why not make it a typedef inside of sf::Image's class body in Image.hpp and use that typedef instead? It would live up to the DRY principle. The same goes for quite a lot of adapter classes.

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Nov 13, 2016

Member

I don't know what we would gain with an alias for std::vector<Uint8>. It's not long or hard to type, not likely to change in the future, and at least we know exactly what we deal with in the implementation.

Member

LaurentGomila commented Nov 13, 2016

I don't know what we would gain with an alias for std::vector<Uint8>. It's not long or hard to type, not likely to change in the future, and at least we know exactly what we deal with in the implementation.

@eXpl0it3r eXpl0it3r self-assigned this Nov 21, 2016

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Nov 21, 2016

Member

Probably not the best, but here some simple benchmarking code.

#include <SFML/Graphics.hpp>
#include <iostream>

int main()
{
    const unsigned int ITERATIONS = 1000;
    const unsigned int WIDTH = 1920;
    const unsigned int HEIGHT = 1080;

    // Create a new sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            sf::Image image;

            clock.restart();
            image.create(WIDTH, HEIGHT, sf::Color::Green);
            average += clock.getElapsedTime();
        }

        std::cout << "New: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }

    // Re-use of sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        sf::Image image;
        image.create(WIDTH, HEIGHT, sf::Color::Green);

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            clock.restart();
            image.create(WIDTH, HEIGHT, sf::Color::Green);
            average += clock.getElapsedTime();
        }

        std::cout << "Re-use: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }


    std::vector<sf::Uint8> vec;
    vec.reserve(WIDTH * HEIGHT * 4);
    for(std::size_t i = 0; i < vec.size(); i += 4)
    {
        vec.push_back(0x00);
        vec.push_back(0xFF);
        vec.push_back(0x00);
        vec.push_back(0xFF);
    }

    // Create a new sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            sf::Image image;

            clock.restart();
            image.create(WIDTH, HEIGHT, vec.data());
            average += clock.getElapsedTime();
        }

        std::cout << "Array New: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }

    // Re-use of sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        sf::Image image;
        image.create(WIDTH, HEIGHT, vec.data());

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            clock.restart();
            image.create(WIDTH, HEIGHT, vec.data());
            average += clock.getElapsedTime();
        }

        std::cout << "Array Re-use: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }
}

I can't really see a big difference using -O3 on an i5-4460 @ 3.2 GHz.

master

Type Average Time (± 200μs)
Color New 3488.05μs
Color Re-use 979.852μs
Array New 3660.14μs
Array Re-use 979.725μs

this pr

Type Average Time (± 200μs)
Color New 3591.06μs
Color Re-use 955.014μs
Array New 3721.27μs
Array Re-use 994.17μs

But since this isn't purely about performance, but also about exception safety and given that theory behind it makes more sense, it's seem ready to be merged. 😄

(For me personally it showed once again that spending time on (micro) optimizations is not necessarily worth it.)

This PR has been added to my merge list, meaning it will be merged soon, unless someone raises any concerns.

Member

eXpl0it3r commented Nov 21, 2016

Probably not the best, but here some simple benchmarking code.

#include <SFML/Graphics.hpp>
#include <iostream>

int main()
{
    const unsigned int ITERATIONS = 1000;
    const unsigned int WIDTH = 1920;
    const unsigned int HEIGHT = 1080;

    // Create a new sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            sf::Image image;

            clock.restart();
            image.create(WIDTH, HEIGHT, sf::Color::Green);
            average += clock.getElapsedTime();
        }

        std::cout << "New: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }

    // Re-use of sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        sf::Image image;
        image.create(WIDTH, HEIGHT, sf::Color::Green);

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            clock.restart();
            image.create(WIDTH, HEIGHT, sf::Color::Green);
            average += clock.getElapsedTime();
        }

        std::cout << "Re-use: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }


    std::vector<sf::Uint8> vec;
    vec.reserve(WIDTH * HEIGHT * 4);
    for(std::size_t i = 0; i < vec.size(); i += 4)
    {
        vec.push_back(0x00);
        vec.push_back(0xFF);
        vec.push_back(0x00);
        vec.push_back(0xFF);
    }

    // Create a new sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            sf::Image image;

            clock.restart();
            image.create(WIDTH, HEIGHT, vec.data());
            average += clock.getElapsedTime();
        }

        std::cout << "Array New: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }

    // Re-use of sf::Image
    {
        sf::Clock clock;
        sf::Time average;

        sf::Image image;
        image.create(WIDTH, HEIGHT, vec.data());

        for (unsigned int i = 0; i < ITERATIONS; ++i)
        {
            clock.restart();
            image.create(WIDTH, HEIGHT, vec.data());
            average += clock.getElapsedTime();
        }

        std::cout << "Array Re-use: " << (static_cast<float>(average.asMicroseconds()) / static_cast<float>(ITERATIONS)) << "\n";
    }
}

I can't really see a big difference using -O3 on an i5-4460 @ 3.2 GHz.

master

Type Average Time (± 200μs)
Color New 3488.05μs
Color Re-use 979.852μs
Array New 3660.14μs
Array Re-use 979.725μs

this pr

Type Average Time (± 200μs)
Color New 3591.06μs
Color Re-use 955.014μs
Array New 3721.27μs
Array Re-use 994.17μs

But since this isn't purely about performance, but also about exception safety and given that theory behind it makes more sense, it's seem ready to be merged. 😄

(For me personally it showed once again that spending time on (micro) optimizations is not necessarily worth it.)

This PR has been added to my merge list, meaning it will be merged soon, unless someone raises any concerns.

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Nov 21, 2016

Member

Your "array re-use" test is using a fixed color 😛

For me personally it showed once again that spending time on (micro) optimizations is not necessarily worth it

It was not a micro-optimization, I can tell you the difference was much bigger than in your test. But well, that was a long time ago and in a specific environment. Let's move on.

Member

LaurentGomila commented Nov 21, 2016

Your "array re-use" test is using a fixed color 😛

For me personally it showed once again that spending time on (micro) optimizations is not necessarily worth it

It was not a micro-optimization, I can tell you the difference was much bigger than in your test. But well, that was a long time ago and in a specific environment. Let's move on.

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Nov 21, 2016

Member

Copy-paste error, when prettying it up for posting here. Updated code and results.

Ah, also forgot to mention that the values can jump between multiple hundred μs, so results are sometimes lower sometimes higher. I'd say ± 200μs.

Member

eXpl0it3r commented Nov 21, 2016

Copy-paste error, when prettying it up for posting here. Updated code and results.

Ah, also forgot to mention that the values can jump between multiple hundred μs, so results are sometimes lower sometimes higher. I'd say ± 200μs.

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Nov 22, 2016

Member

Merged in 70e3c35 on branch 2.4.x

Member

eXpl0it3r commented Nov 22, 2016

Merged in 70e3c35 on branch 2.4.x

@eXpl0it3r eXpl0it3r closed this Nov 22, 2016

@Foaly

This comment has been minimized.

Show comment
Hide comment
@Foaly

Foaly Feb 9, 2017

Contributor

Hello!
I know I might be a little late, as this is already merged and all, but looking over this code the other day, I wondered why raw pointers are used in the void Image::create(unsigned int width, unsigned int height, const Color& color) method line 73.

I did a quick test. Turns out the endianness is a bit of a pain in the ass, but other than that using iterators and memcpy work great. I haven't benchmarked if my code is faster.

What do you think?

for (std::vector<Uint8>::iterator iter = newPixels.begin(); iter != newPixels.end(); iter += 4)
{
    Uint32 c = color.toInteger();
    c = (c >> 24) | ((c & 0x00ff0000) >> 8) | ((c & 0x0000ff00) << 8) | (c << 24); // reverse byte order

    std::memcpy(&(*iter), &c, sizeof(c));
}

For the sake of readability I also wondered why

size.x = width;
size.y = height;

was used instead of size = sf::Vector2u(width, height);.

Contributor

Foaly commented Feb 9, 2017

Hello!
I know I might be a little late, as this is already merged and all, but looking over this code the other day, I wondered why raw pointers are used in the void Image::create(unsigned int width, unsigned int height, const Color& color) method line 73.

I did a quick test. Turns out the endianness is a bit of a pain in the ass, but other than that using iterators and memcpy work great. I haven't benchmarked if my code is faster.

What do you think?

for (std::vector<Uint8>::iterator iter = newPixels.begin(); iter != newPixels.end(); iter += 4)
{
    Uint32 c = color.toInteger();
    c = (c >> 24) | ((c & 0x00ff0000) >> 8) | ((c & 0x0000ff00) << 8) | (c << 24); // reverse byte order

    std::memcpy(&(*iter), &c, sizeof(c));
}

For the sake of readability I also wondered why

size.x = width;
size.y = height;

was used instead of size = sf::Vector2u(width, height);.

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila Feb 9, 2017

Member

Turns out the endianness is a bit of a pain in the ass

... and is not correctly handled in your code. You can't interpret a 32-bits variable as 4 separate bytes and get the correct order on both little and big endian. There will always be one that will be reversed.

using iterators and memcpy work great

Except that it looks much more complicated? What's wrong with the original code?

size = sf::Vector2u(width, height)

Why not. Both are ok for me.

Member

LaurentGomila commented Feb 9, 2017

Turns out the endianness is a bit of a pain in the ass

... and is not correctly handled in your code. You can't interpret a 32-bits variable as 4 separate bytes and get the correct order on both little and big endian. There will always be one that will be reversed.

using iterators and memcpy work great

Except that it looks much more complicated? What's wrong with the original code?

size = sf::Vector2u(width, height)

Why not. Both are ok for me.

@Foaly

This comment has been minimized.

Show comment
Hide comment
@Foaly

Foaly Feb 21, 2017

Contributor

I was trying to write code that is a bit more generic and efficient. But if the endianness is a problem I guess it can stay the way it is.

Contributor

Foaly commented Feb 21, 2017

I was trying to write code that is a bit more generic and efficient. But if the endianness is a problem I guess it can stay the way it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment