curl: revert to non-Unicode builds [ci skip] #20

Merged
merged 1 commit into from
Jul 20, 2021

Conversation

@vszakats vszakats commented Jul 20, 2021

@vszakats changed the title from "curl: revert back to non-Unicode builds [ci skip]" to "curl: revert to non-Unicode builds [ci skip]" on Jul 20, 2021
@vszakats force-pushed the unioff branch 2 times, most recently from 54dd5e9 to b0ac143, on July 20, 2021 00:32

On closer inspection, the state of Windows Unicode support in libcurl does
not seem to be ready for production. Existing support extended certain
Windows interfaces to use the Unicode flavour of the Windows API, but that
also meant that the expected encoding/codepage of strings (e.g. local
filenames, URLs) exchanged via the libcurl API became ambiguous and
undefined.

Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API, build options/dependencies, internal fallback logic based on
runtime auto-detection of the passed string, and the result of file
operations (scheduled for removal in 7.78.0). Meanwhile, some parts of
libcurl kept using 8-bit strings internally, e.g. when reading the
environment.
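
To illustrate why runtime auto-detection leaves the encoding ambiguous, here is
a minimal sketch of such a heuristic. It is not libcurl's actual fallback code;
`looks_like_utf8` is a hypothetical helper that only checks whether a byte
string is well-formed UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a libcurl function): returns true if the
   NUL-terminated byte string is well-formed UTF-8. */
static bool looks_like_utf8(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;

  while(*p) {
    size_t extra;       /* number of continuation bytes expected */
    unsigned long cp;   /* code point decoded so far */
    unsigned long min;  /* smallest code point valid for this length */

    if(*p < 0x80) {     /* 7-bit ASCII: identical in both encodings */
      p++;
      continue;
    }
    else if((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; min = 0x80; }
    else if((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; min = 0x800; }
    else if((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; min = 0x10000; }
    else
      return false;     /* stray continuation byte or invalid lead byte */
    p++;

    while(extra--) {
      if((*p & 0xC0) != 0x80)
        return false;   /* also stops at a terminating NUL (truncated input) */
      cp = (cp << 6) | (*p & 0x3F);
      p++;
    }

    /* reject overlong forms, surrogates and out-of-range values */
    if(cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
  }
  return true;
}
```

A check like this cannot settle the question. The bytes `C3 A9` are valid
UTF-8 for "é", but they are equally valid in the Windows-1252 codepage, where
they read "Ã©". A caller passing the latter would be silently misinterpreted.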

From the user's perspective, this makes it unreasonably complex to find out
how to pass (or read) a certain non-ASCII string to (from) a specific API
without unwanted or accidental conversions or other side-effects. Using the
wrong encoding may result in unexpected behaviour, e.g. in some cases not
finding files, reading/writing a different file, accessing the wrong URL or
passing a corrupt username or password.
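
The "different file" failure can be sketched with a small conversion routine.
This is an illustration under simplifying assumptions: `latin1_to_utf8` is a
hypothetical helper, and ISO-8859-1 stands in for the active 8-bit codepage
(real Windows codepages such as Windows-1252 differ in the 0x80-0x9F range,
and real code would go through the Windows conversion APIs instead).

```c
#include <stddef.h>

/* Hypothetical helper: converts an ISO-8859-1 string to UTF-8.
   'out' must have room for at least 2 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the NUL. */
static size_t latin1_to_utf8(const char *in, char *out)
{
  const unsigned char *p = (const unsigned char *)in;
  size_t n = 0;

  for(; *p; p++) {
    if(*p < 0x80) {
      out[n++] = (char)*p;                    /* ASCII passes through */
    }
    else {
      out[n++] = (char)(0xC0 | (*p >> 6));    /* two-byte UTF-8 sequence */
      out[n++] = (char)(0x80 | (*p & 0x3F));
    }
  }
  out[n] = '\0';
  return n;
}
```

Passing the 8-bit name `caf\xE9.txt` yields the UTF-8 bytes `caf\xC3\xA9.txt`,
as intended. Passing a name that is already UTF-8 yields
`caf\xC3\x83\xC2\xA9.txt`, a double-encoded name that refers to a different
file than the one the caller meant.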

Note that these issues can only affect strings with _non-7-bit-ASCII_
content.
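
An application that wants to know whether a given string is exposed to these
issues could test for bytes outside the 7-bit range. The following is a
minimal sketch; `has_non_ascii` is a hypothetical helper, not part of the
libcurl API.

```c
#include <stdbool.h>

/* Hypothetical helper (not a libcurl function): returns true if the
   NUL-terminated string contains any byte outside 7-bit ASCII. */
static bool has_non_ascii(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;

  for(; *p; p++) {
    if(*p > 0x7F)
      return true;  /* the encoding matters for this string */
  }
  return false;     /* pure ASCII: same bytes in UTF-8 and 8-bit codepages */
}
```

Strings for which this returns false are encoded identically in UTF-8 and in
ASCII-compatible 8-bit codepages, so the ambiguity described above does not
arise for them.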

For now the least bad solution seems to be to revert to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit of surfacing some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5