Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get argv from main entry point encoded as UTF-8 #2090

Merged
merged 9 commits into from Feb 14, 2022
Merged

Get argv from main entry point encoded as UTF-8 #2090

merged 9 commits into from Feb 14, 2022

Conversation

piponazo
Copy link
Collaborator

Fix #1996

Thanks to the discussion we had ad #1996, I could find out how to properly get the arguments from argv on Windows properly encoded as UTF-8. Thanks to this, we can drop more than 1K LoC which was conditionally enabled by EXV_UNICODE_PATH (by setting the CMake option: EXIV2_ENABLE_WIN_UNICODE).

I could not have found how to do this without checking this project:
https://github.com/huangqinjin/wmain

Other interesting article about the topic can be found here:
https://utf8everywhere.org/

@codecov
Copy link

codecov bot commented Feb 11, 2022

Codecov Report

Merging #2090 (ddec60f) into main (e3ca59b) will decrease coverage by 0.00%.
The diff coverage is 48.14%.

❗ Current head ddec60f differs from pull request most recent head f2279e6. Consider uploading reports for the commit f2279e6 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2090      +/-   ##
==========================================
- Coverage   63.19%   63.19%   -0.01%     
==========================================
  Files          96       96              
  Lines       19204    19203       -1     
  Branches     9797     9797              
==========================================
- Hits        12136    12135       -1     
  Misses       4762     4762              
  Partials     2306     2306              
Impacted Files Coverage Δ
include/exiv2/basicio.hpp 100.00% <ø> (ø)
include/exiv2/error.hpp 65.51% <ø> (ø)
include/exiv2/exif.hpp 100.00% <ø> (ø)
include/exiv2/image.hpp 100.00% <ø> (ø)
include/exiv2/types.hpp 70.58% <ø> (ø)
src/error.cpp 74.41% <ø> (ø)
src/exif.cpp 66.01% <ø> (ø)
src/futils.cpp 72.06% <ø> (ø)
src/image.cpp 63.94% <ø> (ø)
src/preview.cpp 50.00% <ø> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3ca59b...f2279e6. Read the comment docs.

Copy link
Collaborator

@kmilos kmilos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@@ -169,6 +169,7 @@ if(MSVC)

# Object Level Parallelism
add_compile_options(/MP)
add_compile_options(/utf-8)
Copy link
Collaborator

@kmilos kmilos Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need something equivalent for MinGW gcc/clang, or nothing is required there? I.e. I don't see -DUNICODE -D_UNICODE defined anywhere that would give us string->wstring redefinition etc. on MinGW, if presumably /utf-8 does this for MSVC? If these are equivalent, maybe we prefer the compiler independent definitions?

Also maybe add a separate comment, the one above does not pertain to this line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSYS2/MinGW CI jobs indeed failed. I have to check locally with that environment to check what to do there. The new test added is at the moment failing on that platform:
https://github.com/Exiv2/exiv2/runs/5154900058?check_suite_focus=true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I have seen, we do not need to define UNICODE nor _UNICODE. It looks like they that behavior is enabled by default in VisualStudio and I mostly have seen people trying to disable those flags on my google searchs.

Regarding the MSYS2 & MinGW support ... it looks like a nightmare to get that running. I have installed MSYS2 from scratch, installed all the dependencies, configured & compiled the Exiv2 project and then ... I found out that the MSYS2 terminal was not choosing the UTF-8 character set by defualt 😅 . Here a screenshot with different terminals on windows (from top to bottom: PowerShell, Command Prompt, Command Prompt + Bash, The MSYS2 terminal):

Terminals_UTF8

By chanching the options in that terminal, I could see the special characters:
MSYS2_Terminal_With_UTF8

However the exiv2 cmd application generated on MSYS2 + MinGW does not seem to be able to process that special path:

pipon@LuisDesktopWin MINGW64 /f/projects/exiv2/buildMSYS2
$ bin/exiv2.exe /f/exiv2Exps/Εκκρεμότητες/Stonehenge.heic
F:/exiv2Exps/????e▒?t?te?/Stonehenge.heic: Failed to open the file

The /utf-8 flag is specific to Visual Studio and not used with MinGW. I have tried to use the compiler flags -municode with MinGW, but then, MinGW tries to convert char to wchar:

[ 49%] Building CXX object src/CMakeFiles/exiv2lib.dir/futils.cpp.obj
F:/projects/exiv2/src/futils.cpp: In function 'std::string Exiv2::getProcessPath()':
F:/projects/exiv2/src/futils.cpp:403:23: error: no match for 'operator=' (operand types are 'std::string' {aka 'std::__cxx11::basic_string<char>'} and 'TCHAR [260]' {aka 'wchar_t [260]'})
  403 |                 ret = filename;
      |                       ^~~~~~~~

From my google searchs, I am not very optimistic about being able to have generate a Windows build with MinGW with support for UTF-8.

Now the question is ... do we need this? From my point of view, it would be enough to provide a Windows Release generated with Visual Studio which is able to deal with this situation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would like to have a portable codebase that runs on Windows without assuming the compiler. There are many projects out there that rely on MinGW provided binaries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too. I would like to take a pragmatic approach and fix one thing at each time. We can focus on this PR on fixing #1996 for MSVC and later investigate how to do it with MinGW.

I think that by following that approach the build with MinGW would remain the same (we would not be breaking anything).

However, I need to check how I could skip the new test on Win+MinGW builds

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the way to go then if this is MSVC specific.

I still don't get why this approach won't (or currently doesn't) work:

  • have main() w/ char* argc (i.e. assume multi-byte UTF-8 input for all arguments on all platforms)
  • parse arguments as multi-byte UTF-8 char* only on all platforms (i.e. as today)
  • on Windows, convert only filenames w/ existing s2ws() when needed to open the I/O w/ Windows wchar APIs

The only sticking point, as you say, might be what the different consoles encode, I haven't checked that thoroughly.

@@ -120,6 +120,8 @@ namespace {
// Main
int main(int argc, char* const argv[])
{
setlocale(LC_CTYPE, ".utf8");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this require Windows 10 as minimum?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yes, indeed Windows 10 is the minimum. We either say: Only Win10 or greater is supported in the command line application or we add some preprocessor conditions there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's reasonable for the exiv2 command-line program to be targeted at Windows/10 and later.

Exiv2 "legacy support" is branch 0.27-maintenance and branch 'main' is about modernisation. If a user needs something "modern" to run on "legacy", they can report an issue and a PR.

src/version.cpp Show resolved Hide resolved
src/wmain.c Outdated Show resolved Hide resolved
@clanmills
Copy link
Collaborator

I found running exiv2/msvc builds work fine on DOS. exiv2/gcc/mingw builds work fine on MinGW/bash. And that's what we build and test.

Trying to use msvc/utf-8 enabled applications on MinGW/bash is a horror show which has nothing to do with exiv2.

@piponazo piponazo force-pushed the main_WinUtf8 branch 2 times, most recently from a918201 to 6766374 Compare February 11, 2022 15:31
@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

Trying to use msvc/utf-8 enabled applications on MinGW/bash is a horror show which has nothing to do with exiv2.

However, the inverse should/must be possible. MinGW built programs are as native Windows programs as any, and should run in cmd and PowerShell just like any other Windows program.

@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

I also wonder if our current s2ws()/ws2s() implementation is entirely correct and perhaps shadowing some problems here - most conversions like this use CP_UTF8 and not CP_ACP here, including MinGW's own helper function.

@piponazo
Copy link
Collaborator Author

Trying to use msvc/utf-8 enabled applications on MinGW/bash is a horror show which has nothing to do with exiv2.

However, the inverse should/must be possible. MinGW built programs are as native Windows programs as any, and should run in cmd and PowerShell just like any other Windows program.

Right, I normally can run MSVC builds from any terminal (cmd, powershell, bash). With the MinGW build it should be the same. The only difference at the moment is that with MSVC we can use the flag /utf-8 to indicate that the execution character set is UTF-8:
https://docs.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170

At the moment I did not find a way to make this work with MinGW.

@neheb
Copy link
Collaborator

neheb commented Feb 11, 2022

I doubt this can ever workeith MinGW. MSYS has a UCRT and clang tool chain that should work fine.

@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

I doubt this can ever workeith MinGW. MSYS has a UCRT and clang tool chain that should work fine.

Eh? When we say MinGW we mean MSYS2 MINGW64/UCRT64/CLANG64 (and hopefully CLANGARM64 soon). Plenty of UTF-8 programs there for years w/ MSVCRT. This is like saying there was no UTF-8 before UCRT, like no MSVC compiled programs against MSVCRT before Windows 8 and VS2017 were UTF-8 capable?!

@neheb
Copy link
Collaborator

neheb commented Feb 11, 2022

I doubt this can ever workeith MinGW. MSYS has a UCRT and clang tool chain that should work fine.

Eh? When we say MinGW we mean MSYS2 MINGW64/UCRT64/CLANG64 (and hopefully CLANGARM64 soon). Plenty of UTF-8 programs there for years w/ MSVCRT. This is like saying there was no UTF-8 before UCRT, like no MSVC compiled programs against MSVCRT before Windows 8 and VS2017 were UTF-8 capable?!

I don't think I've seen utf-8 programs with msvcrt. I've seen plenty of utf-16 ones but not utf-8.

@clanmills
Copy link
Collaborator

I'm not sure we need to do anything here. When this topic was brought up a few months ago, we discovered that the posix platforms (mingw, macOS, Linux) required no attention. The aim is to get the Greek Path to work on msvc/Windows and that's what you now have working. Great Job.

@kmilos. Miloš your point about s2ws()/ws2s() is interesting, however I don't believe that's related to the Greek Path.

@piponazo
Copy link
Collaborator Author

@clanmills The issue #1996 correctly states that he problem is present on MSVC & MinGW builds.

With this PR I am proposing to fix the MSVC build and discard all the usages of std::wstring and wchar_t. By doing so, we will start to assume that std::string and char* variables are considered UTF-8, anywhere in the program. That seems to be enough thanks to the new UTF-8 support on the Universal C Runtime (UCRT) which Windows 10 brings.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support

The problem seems to be that MinGW does not (and cannot) use UCRT:
https://mingwpy.github.io/ucrt.html
https://stackoverflow.com/questions/57528555/how-do-i-build-against-the-ucrt-with-mingw-w64

I think we could merge this as it is, solve 1 of the 2 problems (MSVC build) and if somebody else is interested in the topic, we could work on also fixing the MinGW problem in another branch.

@neheb
Copy link
Collaborator

neheb commented Feb 11, 2022

Note that UCRT is available for 7 and 8.1. So it's not that bad.

@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

The problem seems to be that MinGW does not (and cannot) use UCRT

Don't think this is true any longer. Note the link above again: MSYS2 has many environments for which exiv2 is built. At least UCRT64 and CLANG64 link to UCRT.

@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

Note that UCRT is available for 7 and 8.1. So it's not that bad.

You might be right about MSVCRT, sorry. From here:

UCRT can be installed on older versions of Windows, but UTF-8 support will only work on Windows 10 (November 2019 update) and newer.

@piponazo
Copy link
Collaborator Author

piponazo commented Feb 11, 2022

I found a interesting thread in which MinGW people is discussing about providing UCRT version of their packages:
msys2/MINGW-packages#6901

So it seems that at some point we will be able to compile with MSYS2/MinGW and link against UCRT 🛩️

@kmilos
Copy link
Collaborator

kmilos commented Feb 11, 2022

So it seems that at some point we will be able to compile with MSYS2/MinGW and link against UCRT

That point was as soon as UCRT64 environment was made available 😉

Exiv2 builds are already available for all supported MSYS2 environments: https://packages.msys2.org/base/mingw-w64-exiv2

@piponazo
Copy link
Collaborator Author

After I followed the link provided by @kmilos I tried to use the packages which are using UCRT:
https://www.msys2.org/docs/environments/

Exiv2 compiled correctly, I got some some pop-ups complaining about missing DLLs, I placed them in the bin directory and I finally ended up with a weird crash 😭

image

@piponazo
Copy link
Collaborator Author

Very strange ... if I launch the exiv2 exe from that build directory from a windows CMD I get that pop-up. However I can run the same exe from the UCRT64 terminal. Sadly, the magic is not working 😢
image

@neheb
Copy link
Collaborator

neheb commented Feb 11, 2022

strange.

@clanmills
Copy link
Collaborator

Here's the evidence that the Greek works on MinGW/msys2: #1996 (comment)

If I create a Greek path in DOS, I am unable to cd to that directory in bash.

@kmilos
Copy link
Collaborator

kmilos commented Feb 13, 2022

Good going Luis. Sorry I wasn't able to help because I have all these environments set up on my box already, but I am unfortunately away for a week...

kevinbackhouse
kevinbackhouse previously approved these changes Feb 13, 2022
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/wmain.c Outdated Show resolved Hide resolved
piponazo and others added 8 commits February 13, 2022 22:49
- Adapt exifprint to the new wmain strategy
- Delete have_unicode_path
- wmain does not work with MSYS & MinGW
- cmake: entry point via cmake instead of pragma
- cmake: better doc for MSVC flags
- Fix entry point in sample apps
- Adapt CMake code to work with MSVC & MinGW
- Use the CMake generator 'MSYS Makefiles' for MSYS builds
- Run CI build in parallel
- MSYS with NLS OFF
Co-authored-by: Kevin Backhouse <kevinbackhouse@github.com>
@piponazo piponazo requested a review from hassec February 14, 2022 15:23
@piponazo piponazo merged commit e450c49 into main Feb 14, 2022
@piponazo piponazo deleted the main_WinUtf8 branch February 14, 2022 15:37
@clanmills
Copy link
Collaborator

Great Work, @piponazo. Well Done.

lazka added a commit to lazka/MINGW-packages that referenced this pull request Jun 21, 2023
new libinih dependency

EXIV2_ENABLE_WIN_UNICODE was removed in Exiv2/exiv2#2090
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug compilers Related with compiler options, definitions, support, etc. windows Windows Specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using exiv2.exe on msvc and mingw with greek letters on path
7 participants