Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msvc build: adding new metadata to a TIFF file is extremely slow #200

Closed
peterbud opened this issue Dec 25, 2017 · 18 comments
Closed

msvc build: adding new metadata to a TIFF file is extremely slow #200

peterbud opened this issue Dec 25, 2017 · 18 comments
Assignees

Comments

@peterbud
Copy link

Repro:

  • Download Windows build of exiv2
  • Extract exiv2-0.26-msvc.tar.gz\exiv2-0.26-msvc.tar\dist\2015\x64\dll\Release\bin to a folder
  • Download sample.tif from here (93 MB)
  • Open a command prompt in that folder
  • use the following command: exiv2 "-Mset Exif.Photo.UserComment hello world I am a nice big string" sample.tif

Result: it takes ~60 seconds to add this metadata to the tif file
Expected result: it should take max 1-2 secs

@tbeu tbeu changed the title exiv2 Windows build: adding a new metedata to a TIFF file is extremely slow exiv2 Windows build: adding a new metadata to a TIFF file is extremely slow Dec 25, 2017
@peterbud
Copy link
Author

One more comment I have tested with the same file on Ubuntu, the above command completes in less than a second.

@clanmills clanmills self-assigned this Dec 31, 2017
@clanmills
Copy link
Collaborator

I don't know what's the matter here. On an earlier version of Exiv2 on my laptop (built June 2016), this command executes in about 1 second. I haven't reproduced your performance report using the v0.26 msvc/2015/x64/dll build (although I believe you). I built msvc/2015/x64/dll from 'master' and got the message:

763 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/build $ bin/exiv2 ~/Downloads/sample.tif 
Reading TIFF file /Users/rmills/Downloads/sample.tif
Exiv2 exception in print action for file /Users/rmills/Downloads/sample.tif:
tiff directory length is too large
764 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/build $ 

This is coming from the FUJIFILM makernote. I get this message on all platforms on 'master'.

I have more work to discover what's going on here and hope to give this more attention next week.

@clanmills clanmills changed the title exiv2 Windows build: adding a new metadata to a TIFF file is extremely slow msvc build: adding new metadata to a TIFF file is extremely slow Dec 31, 2017
@clanmills
Copy link
Collaborator

I know what's causing the message on master "tiff directory length is too large" on all platforms. PR#180 deals with that matter and hasn't been integrated yet. I'll speak with @D4N about that.

I've build 'master' + PR#180 locally and reproduced your results. On MacOS-X, you command runs in 0.5 seconds and about 50 seconds on msvc/2015/x64/dll.

This is a very surprising find. In June 2016, it was running in about 2 seconds. I'll debug this next week and give you an update.

@peterbud
Copy link
Author

Thanks Robin in advance. Happy New Year!

@peterbud
Copy link
Author

peterbud commented Jan 9, 2018

Is there any update on this issue?

@D4N
Copy link
Member

D4N commented Jan 9, 2018

Not from my side, sorry. I don't own a Windows box. If I'll have some time for this in the next days, I'll try to run the binary with wine. That could tell me if it's a slow Windows syscall or if the windows build is oddly broken.

Since I have never developed anything on Windows: could you run exiv2 with a profiler or with something comparable to strace, if something like that exists?

Concerning #180, that will take a long time until we can merge that, as it requires a lot of grind-work.

@clanmills
Copy link
Collaborator

@D4N Leave this to me. I'll deal with it.

@peterbud I simply haven't got round to investigating this yet.

@clanmills
Copy link
Collaborator

clanmills commented Jan 9, 2018

I have investigated this and the issue was introduced with svn://dev.exiv2.org/svn/trunk --revision 4717. That change removed the API BasicIo::temporary() from basicio.hpp. The code using that API was changed to use an in-memory file. This was done to boost performance. Regrettably, it has resulted in a performance hit on msvc builds (and probably cygwin and mingw builds).

I don't know without further investigation why this change should have caused this performance hit on msvc builds. I would like to investigate cygwin and mingw windows builds to see if they are impacted. More work is necessary before I can offer a patch or work-around.

@peterbud
Copy link
Author

peterbud commented Jan 9, 2018

Hi Robin, I quickly checked that particular commit, and I have found something which can probably help you moving forward.
In the change in action.cpp I see that for MSVC and MINGW build a CRITICAL_SECTION is being used.

However as I see that CRITICAL_SECTION is never initialized by InitializeCriticalSection(). I can imagine that means that therefore EnterCriticalSection() waits until the system defined timeout, and that causes the performance hit.
Please check out the documentation of EnterCriticalSection(). I see it there:

Before using a critical section, some thread of the process must call InitializeCriticalSection or InitializeCriticalSectionAndSpinCount to initialize the object.

Maybe it is worth to check this first?

And one more answer: the mingw build is showing the same performance problem, actually that was the first where we have noticed the problem, and then did a repro with the MSVC standard official build.

@clanmills
Copy link
Collaborator

clanmills commented Jan 9, 2018

Wonderful, Peter. Are you in Australia? sample.tif is full of gum trees! I'll investigate that in the morning. I'm about to go to bed. It's 22:00 in England.

I hope it's something simple like that as I really don't want to re-instate the API BasicIo::temporary(). I think another reason to remove BasicIo::temporary() was because of reports of temporary files being left on disk (although I never totally discovered why).

I think there's an official MINGW and Cygwin build. I'd be very appreciative if you have investigate the Cygwin build.

actions.cpp is part of the application code (exiv2.exe) and not part of the library. I'm surprised there's a CS in there.

I'll look at this on Wednesday morning. Thanks very much for working with me on this.

@clanmills
Copy link
Collaborator

I've looked at the CS in actions.cpp. That code is only used by the application exiv2(.exe) and isn't involved in your performance issue. You are correct that we should call InitializeCriticalSection() before using that CS. I'll do more work on the performance issue this evening.

@clanmills
Copy link
Collaborator

I know what's wrong and have a simple work-around. The MemIo object allocates buffer space in blocks of 32kb. By increasing that to 2mb, the performance issue goes away! The fix is in src/basicio.cpp

    void MemIo::Impl::reserve(long wcount)
    {
        long need = wcount + idx_;
        long blockSize = 2*1024*1024;   // 32768           `

        if (!isMalloced_) {
            // Minimum size for 1st block
            long size  = EXV_MAX(blockSize * (1 + need / blockSize), size_);
            byte* data = (byte*)std::malloc(size);
            std::memcpy(data, data_, size_);
            data_ = data;
            sizeAlloced_ = size;
            isMalloced_ = true;
        }

        if (need > size_) {
            if (need > sizeAlloced_) {
                // Allocate in blocks
                long want      = blockSize * (1 + need / blockSize );
                data_ = (byte*)std::realloc(data_, want);
                sizeAlloced_ = want;
                isMalloced_ = true;
            }
            size_ = need;
        }
    }

Discussion

When the MemIo object (an in-memory file stream) is increased in size, it allocates memory in blocks of 32kb using std::realloc(). std::realloc() may allocate and copy if the underlying block isn't big enough. The existing and arbitrary blockSize of 32kb causes "thrashing". I have modified this to 2mb.

There is no API in v0.26 to to enable the user to set blockSize.

I don't think increasing MemIo blockSize to 2mb will cause excessive memory use by the library. Clearly there's a tradeoff. When blockSize is increased you gain performance at the expense of more memory. If MemIo objects are used for small in-memory files, you are allocating lots of memory for small files.

The previous code (before r4717) used temporaryFile(). This was a temporary file on local storage. Occasionally (as reported in Redmine), when an application crashed, temporary files could be left on disk. Using MemIo (instead of temporaryFile()) is designed to improve performance and avoid temporary "orphaned" files being left on local storage. Because temporaryFile() created a file on local storage, the std::realloc()/thrashing issue cannot appear.

Your feedback and thoughts about this work-around are welcome.

@peterbud
Copy link
Author

Hi Robin, I cannot judge whether this is the right design decision, but I have tested this and this is indeed solving the performance problem at hand. I believe today image files tend to be in the range of 5-30 MB easily, TIFF files even bigger (see this 93MB sample), and as far as I know malloc() can happily allocate big chunk of memory in one piece assuming you have memory, so I'm not sure how much these step-wise chunking is needed, but I'd leave it to you and the team to discuss what is the right strategy .

Thank you Robin once again.

@clanmills
Copy link
Collaborator

clanmills commented Jan 10, 2018

I think the MemIo object was added to Exiv2 many years ago to deal with small file streams. The metadata in JPG and PNG is formatted as a small TIFF file and embedded in the JPG or PNG. The blockSize of 32kb is probably appropriate for that. In a JPG, the embedded TIFF cannot exceed 64k (because of the architecture of the JPG). To write 100mb into a MemIo object has challenged the code in an unanticipated way.

I've thought about how to initialise the MemIo object with a small block and get the blockSize to increase rapidly as the stream grows. I intend to test (and instrument) the code below with our test suite. If all goes well (and I always expect everything to go well), this code will be added to Exiv2 'master' next week.

Be closing this issue, you've expressed your satisfaction with the workaround. I'll reopen the case until the code has been submitted to 'master'.

There is the secondary discovery that the CS is not being correctly initialised in actions.cpp/temporaryFile() which is used by the exiv2(.exe) command-line program. I'll also submit a fix for that. That code is not part of the library and therefore shouldn't be of much interest to you (although you're welcome to comment when I deal with it).

    void MemIo::Impl::reserve(long wcount)
    {
        long need = wcount + idx_;
        long    blockSize =     32*1024;   // 32768           `
        long maxBlockSize = 4*1024*1024;

        if (!isMalloced_) {
            // Minimum size for 1st block
            long size  = EXV_MAX(blockSize * (1 + need / blockSize), size_);
            byte* data = (byte*)std::malloc(size);
            std::memcpy(data, data_, size_);
            data_ = data;
            sizeAlloced_ = size;
            isMalloced_ = true;
        }

        if (need > size_) {
            if (need > sizeAlloced_) {
                blockSize = 2*sizeAlloced_ ;
                if ( blockSize > maxBlockSize ) blockSize = maxBlockSize ;
                // Allocate in blocks
                long want      = blockSize * (1 + need / blockSize );
                data_ = (byte*)std::realloc(data_, want);
                sizeAlloced_ = want;
                isMalloced_ = true;
            }
            size_ = need;
        }
    }

@clanmills clanmills reopened this Jan 10, 2018
@clanmills
Copy link
Collaborator

I've tested the (second) patch above and requested it to be added in MinGW: https://github.com/Alexpux/MINGW-packages/pull/3280

I will not update exiv2 'master' immediately because I also want to service the CS issue discussed above.

@peterbud I would like to have a private discussion with you about MinGW/MSYS2/CMake. Could you email please: robin@clanmills.com

@D4N
Copy link
Member

D4N commented Jan 24, 2018

Although we have already #213 to include the workaround that @clanmills proposed, I'd like to add that nowadays we have so much memory at our hands, that it is often possible to just copy the whole file into RAM and avoid any subsequent reallocations. Don't know if anyone is using exiv2 in a resource constrained environment though, there that would be a problem.

D4N pushed a commit that referenced this issue Jan 26, 2018
* changes to MemIo::reserve() for #200

* Changes following review by Dan (thanks, Dan).
@D4N
Copy link
Member

D4N commented Jan 26, 2018

This should have been fixed by #213. Any objections to closing this issue?

@peterbud
Copy link
Author

Thanks guys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants