What happened to wstring support in open() APIs? #2637

ribtoks · 2023-05-28T08:44:55Z

Hi

I upgraded exiv2 to 0.28.0 just to find that std::wstring support has been nuked out in 7933ff4 but I could not find a matching PR, issue or at least a discussion that would explain why and what to do now.

std::string just does not work in Windows for non-latin1 paths for Exiv2::ImageFactory::open() (Visual Studio compiler).

What is the intended way to use it now?

These are all the old issues that reference the problem:

The text was updated successfully, but these errors were encountered:

cgilles · 2023-06-02T15:46:31Z

I can confirm that digiKam 8.1.0 pre-release based on Exiv2 0.28 is broken under windows to read metadata from images.

https://bugs.kde.org/show_bug.cgi?id=469372
https://bugs.kde.org/show_bug.cgi?id=470566

neheb · 2023-06-06T04:26:57Z

The intended way to use this is to set UTF-8 as the code page. See: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

edit: oh I see. digiKam is compiled with MSVCRT instead of UCRT. I don't think that can work.

ping @piponazo

cgilles · 2023-06-06T06:05:26Z

No, digiKam is cross compiled under Windows using MXE (MinGW env - https://github.com/mxe/mxe) since 10 years now. The MSVC build is just another way to build the application in the KDE infra, it's not the official production release at all, because MSVC is unsafe,slow, and under-productive platform to compile huge code as digiKam, but it's another story...

Exiv2 team, please revert this commit to drop the UNICODE support under Windows. It's not at all a "dead" code... or explain well the work around. Note: I'm surprised to not see an unit-tests crying for the UTF16 long file path under Windows...

neheb · 2023-06-06T06:52:30Z

No, digiKam is cross compiled under Windows using MXE (MinGW env - https://github.com/mxe/mxe) since 10 years now. The MSVC build is just another way to build the application in the KDE infra, it's not the official production release at all, because MSVC is unsafe,slow, and under-productive platform to compile huge code as digiKam, but it's another story...

Exiv2 team, please revert this commit to drop the UNICODE support under Windows. It's not at all a "dead" code... or explain well the work around. Note: I'm surprised to not see an unit-tests crying for the UTF16 long file path under Windows...

Read what I said again. MSVCRT and UCRT

maksim-petukhov · 2023-08-15T18:14:59Z

@neheb @piponazo
What was the reason for deleting perfectly working code in 7933ff4? I'd like to advocate for reverting this deletion. The removed code was functional and served a purpose, so it doesn't seem like dead code. It's been quite useful for me, and I believe it could continue to benefit other users as well.

The intended way to use this is to set UTF-8 as the code page. See: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

I see that the current recommendation is to "set UTF-8 as the code page". It is a bit reckless to ask users of Exiv2 to do that. There are some problems with this approach. First of all, this can perfectly work in tests but there is a reality when you have 150+ external libraries in your project and neither of them mentions that it works flawlessly with relatively new Microsoft invention. There are also loads of your code that relies on assumption "it is not UTF-8 there but something else" you have to check to make sure nothing is broken by forcing UTF-8 code page.

The "UTF-8 as the code page" feature is available only since Windows 1803. For Windows 10 operating systems prior to 1803 (10.0.17134.0), only static linking of UCRT is supported.. Windows 10 version prior 1803 are still supported in LTSC channel and static linking of UCRT is discouraged by Microsoft for security reasons. I don't see a nice way to use 0.28 across all Windows 10 versions.

I'm sharing these insights in the spirit of maintaining the functionality and usability of Exiv2 across a wide range of scenarios. I hope we can discuss this further and consider the potential benefits of restoring the code.

Thank you for your attention and consideration.

neheb · 2023-08-15T18:35:43Z

@maksim-petukhov more reading here: #2090

maksim-petukhov · 2023-08-15T18:51:29Z

@neheb I read it. It is still doesn't answer why @piponazo deleted code for EXIV2_ENABLE_WIN_UNICODE. exiv2.exe binary works with "set UTF-8 as the code page" - great. But as I'm trying to point out, there are big projects out there relying on exiv2 library that can't just "set UTF-8 as the code page" because this could break a lot of things.

cgilles · 2023-08-17T18:00:08Z

And digiKam also depend on Exiv2 in-deep, and we don't have this kind of problem with UTF8 under Windows when it's cross compiled under Linux MinGW. Please revert this code deletion, as the library is not suitable under Windows when it's cross compiled. I second also maksim-petukhov about the topic : "Why to drop working code tested since a vrey long time ?"

NOTE: recommend me to compile whole digiKam with MSVC under Windows is not a solution for digiKam at all...

Gilles Caulier

kmilos · 2023-08-18T07:15:46Z

NOTE: recommend me to compile whole digiKam with MSVC under Windows is not a solution for digiKam at all...

You have the MSYS2 UCRT64 option (the default now for MSYS2). I think Octave recently switched to UCRT builds also using MXE - @mmuetzel can you please advise the digiKam devs?

kmilos · 2023-08-18T07:24:40Z

AFAICT, it seems Fedora's MinGW cross toolchain can also target UCRT since F37: https://fedoraproject.org/wiki/Changes/F37MingwUCRT (e.g. https://src.fedoraproject.org/rpms/mingw-zlib/pull-request/3)

Please get in touch with any other distro not supporting and ask for UCRT targets (cf. Arch, Debian, don't know about Ubuntu...)

mmuetzel · 2023-08-18T08:30:42Z

Just my 2 cents without getting a complete picture yet: Switching Windows to a UTF-8 locale only affects binaries that use the new Universal C Runtime (UCRT) instead of the older MS Visual C Runtime (MSVCRT). But even then, there are still quite a few applications that don't work when switching Windows to a UTF-8 locale. (IIRC, Notepad++ had some issues last time I checked.)

So, the only way to reliably support non-ASCII characters in file names on Windows is to use their wide character API. I guess that is what is meant when people talk about wstring here?

kmilos · 2023-08-18T08:36:02Z

I guess that is what is meant when people talk about wstring here?

Correct. However, we're discussing here about transitioning exiv2 clients to UCRT builds and UTF-8 locale only, as exiv2 in the latest version dropped the W API*. digiKam is a good example, as it uses MXE builds like Octave.

* Why, and why now is a separate discussion, but let's be pragmatic for a moment and focus on the UCRT build transition, as it'll have to be done by people sooner or later anyway...

mmuetzel · 2023-08-18T08:47:23Z

I understand that this issue took a turn to become about linking to the UCRT. But to come back to the original issue:
Last time I checked, setting Windows to a UTF-8 locale was still experimental. And I'd really urge you guys too not recommend to your users to activate a setting that will leave them with a half borked Windows installation.

kmilos · 2023-08-18T08:58:02Z

And I'd really urge you guys too not recommend to your users to activate a setting that will leave them with a half borked Windows installation.

How so? The UTF-8 setlocale setting (or the app manifest) only affects the running process, not the system.

Anyhow, as I said, I'm not too keen on getting into why (feel free to join the prior issues and PRs linked above), but trying to offer workarounds.

kmilos · 2023-08-18T09:33:02Z

The "UTF-8 as the code page" feature is available only since Windows 1803. For Windows 10 operating systems prior to 1803 (10.0.17134.0), only static linking of UCRT is supported.. Windows 10 version prior 1803 are still supported in LTSC channel and static linking of UCRT is discouraged by Microsoft for security reasons. I don't see a nice way to use 0.28 across all Windows 10 versions.

Btw, even more than one year ago this was already less than 0.2% of all Windows 10/11 machines (since 2018 >90% out of those 2.4% were already on 1803). (Windows 7 & 8.1 are not counted here as they are no longer supported by MS.) If you say a lot of enterprise machines are probably also not counted here, what could it be realistically then given the 3-5 year enterprise cycles? Still less than 1%, no? What percentage of those will ever use exiv2? The UCRT toolchains are there (both native and cross). Let's move on.

mmuetzel · 2023-08-21T16:17:21Z

If you are sure that you won't restore using the wide character Windows API, that's your decision.

(Cross-)compiling MinGW and GCC that default to linking to the UCRT should only require adding --with-default-msvcrt=ucrt to their respective configure flags.

neheb · 2023-08-21T18:02:18Z

wstring API removal was premature looks like. Can probably be added back in terms of std::filesystem.

Debian and hence Ubuntu do not have a UCRT target: https://groups.google.com/g/linux.debian.bugs.dist/c/vU9KxW1RTLg

Fedora has one but it's very barebones.

neheb · 2023-08-21T19:21:16Z

Yeah messing around with std::filesystem leads me to believe the wstring API should come back. fs::path is an std::(w)string based on the platform. For Windows it requires calling .string(), which does not work with references as it's a temporary.

gh-andre · 2023-10-19T18:03:53Z

I encountered this regression as well after upgrading to v0.28.0 and would like to note that Microsoft added UTF-8 code page on top of core Windows functionality, which remains to be UTF-16 (well, UCS-2, actually), so it will end up calling the same W functions in Win32 that _wfopen did, just slower, because of the extra character conversion.

Also, I imagine processes using a specific code page would need to check the code page inherited from the console and report an error if it doesn't match - wouldn't this cause more problems, compared to just using wide characters as 0.26.x did and not worrying about code pages?

It would be great if this breaking change was reverted until a usable alternative is found, whatever it is. You can even make it smaller if you use std::filesystem::path instead of std::string in FileIo::Impl because it already provides functionality to obtain std::string, std::wstring and std::u8string for any given path, so it should be possible to reduce code duplication by just conditionally compiling code where there's difference, like calling _wfopen vs. fopen (or using file I/O traits of sorts).

I will also note that since UTF-8 being the target here, keep in mind that in C++20 char8_t* and char*, as well as std::u8string and std::string are no longer interchangeable, so if you end up calling path functions returning std::u8string, it won't work with the rest of the code expecting std::string.

neheb · 2023-10-19T18:39:25Z

Requires small API change. See #2800

gh-andre · 2023-10-20T02:01:32Z

While it's a good change, I don't mean it as just an improvement, but rather as a way to reduce the amount of conditionally compiled code, which in 0.26.x included entire methods. Having std::filesystem::path would allow compile conditionally only key function calls, such as _wfopen vs. fopen and call the string method appropriate for the API call.

As a side note, you may want to revisit by-value path parameters - most paths are over the short string optimization limit and will unnecessarily allocate memory. Given that std::filesystem::directory_entry::path() returns std::const filesystem::path&, it would reduce memory allocations for apps that scan a lot of images.

neheb · 2023-10-21T19:36:50Z

@gh-andre public API is kept at C++11 and private is C++17. The ref to value change was because it does not work under Windows. The alternative is to reintroduce const std::string& and const std::wstring&.

gh-andre · 2023-10-22T20:05:40Z

Not sure what wasn't working on Windows. Maybe it is a C++11 issue, I cannot say. Definitely has no problem in C++17, so I was able to create a patch to handle opening wide-character file paths for Exiv2 v0.28.0. It will only work for files because the implementation isn't cleanly separated for various derived classes of BasicIo and surrounding code.

That is, some things seem like they got worse over time, such as calling fopen and stat outside of the implementation class, and some were like that from the beginning, like calling ::remove and fs::remove within the same block of code, or extracting a part of a URL as if it's a file (e.g. std::make_unique<FileIo>(pathOfFileUrl(path))), which will fail for any URL-encoded sequences, like %20.

Initially, I took a stab at the whole thing, similar to how previous EXIV2_ENABLE_WIN_UNICODE did it, but it was futile because file paths and protocols were all mixed in. The WIN_UNIQUDE patch hacked it in places doing things like assigning wchar_t to char characters, which only works for ASCII, duplicating thumbnail extensions methods, etc. So, I opted for a localized change, just to be able to work with image files. If anyone is interested, here's the patch:

https://github.com/StoneStepsInc/exiv2-nuget/blob/master/patches/01-wide-char-paths.patch

It will apply against the released Exiv2 0.28.0 source in exiv2-0.28.0-Source.tar.gz. Narrow character functions will still work as before, but will not be able to open file paths outside of the current code page. Functions that take const std::filesystem::path& will open Windows paths comprised from valid UTF-16 characters.

AlynxZhou · 2023-12-01T18:03:17Z

If I am not wrong, this could be fixed with #2800? Is there anything blocking it from being merged? Thanks!

This is an alternative approach to using the std::wstring versions of exiv2 methods that are not available in v0.28.x (see Exiv2/exiv2#2637). It should also allow non ascii paths to be used when libexiv2 has been compiled without wstring methods enabled. The proper approach is to set the Windows code page to utf-8, but this probably won't work on old Windows, and it's not clear how to do it for a Python application. Instead we transcode paths from utf-8 to the current code page, which could fail.

cgilles · 2024-03-16T12:40:45Z

Note : digiKam is now also compiled with VCPKG and MSVC compiler under Windows and the problem is also present in native build of Exiv2.

https://bugs.kde.org/show_bug.cgi?id=483743

Gilles Caulier

kmilos · 2024-03-25T16:07:04Z

@cgilles As noted in the darktable pull request and elsewhere, please do not use the .manifest directly (MSVC only) - if you wrap it in a .rc, it works for both MinGW and MSVC (you'll need one extra option for MSVC though...)

ribtoks added the bug label May 28, 2023

ribtoks changed the title ~~What happend to wstring support in open() APIs?~~ What happened to wstring support in open() APIs? May 28, 2023

kmilos mentioned this issue Jun 9, 2023

Exiv2 0.28 msys2/MINGW-packages#17480

Closed

neheb mentioned this issue Jul 1, 2023

basic conversion to C++20 #2645

Draft

maksim-petukhov mentioned this issue Aug 15, 2023

Allow building Exiv2 0.27 with C++17 compiler #2743

Open

kmilos added platforms Cygwin, MinGW, cross-compilation, NetBSD, FreeBSD etc windows Windows Specific issues labels Aug 18, 2023

kmilos mentioned this issue Oct 19, 2023

In Exiv 0.27.6 there was an overload of ImageFactory::open taking std::wstring and it's gone in 0.28.0 #2799

Closed

kmilos mentioned this issue Oct 25, 2023

EXIF broken in recent 4.5.0 darktable-org/darktable#15478

Closed

kmilos added the i18n Internationalisation label Oct 25, 2023

kmilos mentioned this issue Dec 7, 2023

Chinese path not supported #2857

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What happened to wstring support in open() APIs? #2637

What happened to wstring support in open() APIs? #2637

ribtoks commented May 28, 2023 •

edited

cgilles commented Jun 2, 2023 •

edited

neheb commented Jun 6, 2023 •

edited

cgilles commented Jun 6, 2023

neheb commented Jun 6, 2023

maksim-petukhov commented Aug 15, 2023

neheb commented Aug 15, 2023

maksim-petukhov commented Aug 15, 2023

cgilles commented Aug 17, 2023

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

mmuetzel commented Aug 18, 2023

kmilos commented Aug 18, 2023 •

edited

mmuetzel commented Aug 18, 2023

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

mmuetzel commented Aug 21, 2023

neheb commented Aug 21, 2023

neheb commented Aug 21, 2023

gh-andre commented Oct 19, 2023 •

edited

neheb commented Oct 19, 2023

gh-andre commented Oct 20, 2023

neheb commented Oct 21, 2023

gh-andre commented Oct 22, 2023

AlynxZhou commented Dec 1, 2023

cgilles commented Mar 16, 2024

kmilos commented Mar 25, 2024 •

edited

What happened to wstring support in open() APIs? #2637

What happened to wstring support in open() APIs? #2637

Comments

ribtoks commented May 28, 2023 • edited

cgilles commented Jun 2, 2023 • edited

neheb commented Jun 6, 2023 • edited

cgilles commented Jun 6, 2023

neheb commented Jun 6, 2023

maksim-petukhov commented Aug 15, 2023

neheb commented Aug 15, 2023

maksim-petukhov commented Aug 15, 2023

cgilles commented Aug 17, 2023

kmilos commented Aug 18, 2023 • edited

kmilos commented Aug 18, 2023 • edited

mmuetzel commented Aug 18, 2023

kmilos commented Aug 18, 2023 • edited

mmuetzel commented Aug 18, 2023

kmilos commented Aug 18, 2023 • edited

kmilos commented Aug 18, 2023 • edited

mmuetzel commented Aug 21, 2023

neheb commented Aug 21, 2023

neheb commented Aug 21, 2023

gh-andre commented Oct 19, 2023 • edited

neheb commented Oct 19, 2023

gh-andre commented Oct 20, 2023

neheb commented Oct 21, 2023

gh-andre commented Oct 22, 2023

AlynxZhou commented Dec 1, 2023

cgilles commented Mar 16, 2024

kmilos commented Mar 25, 2024 • edited

ribtoks commented May 28, 2023 •

edited

cgilles commented Jun 2, 2023 •

edited

neheb commented Jun 6, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

kmilos commented Aug 18, 2023 •

edited

gh-andre commented Oct 19, 2023 •

edited

kmilos commented Mar 25, 2024 •

edited