*.rc: escape non-ASCII/non-UTF-8 character for clarity #1217

Closed
vszakats wants to merge 1 commit into master from winresesc

5 participants
@vszakats
Member

vszakats commented Jan 17, 2017

No description provided.

@vszakats vszakats changed the title from *.rc: escape non-ASCII/non-UTF8 character for clarity to *.rc: escape non-ASCII/non-UTF-8 character for clarity Jan 17, 2017

@bagder

bagder approved these changes Jan 17, 2017

I'm fine with this. Maybe add a comment above, explaining what \xA9 will render as?

@vszakats

vszakats Jan 17, 2017

Member

Agreed. Added a comment about the meaning of \xA9.

@jay

jay Jan 17, 2017

Member

I'm just curious: in your codepage, what does © look like?

@vszakats

vszakats Jan 17, 2017

Member

It comes down to how the editor or viewer at hand detects (or misdetects) and interprets the file's codepage.

E.g. in my regular editor (mcedit), I've set the default encoding to UTF-8, so it looks like this:
[Screenshot: the file as displayed in mcedit, 2017-01-17 19:39:13]

In the browser (Safari) this particular github.com PR diff looks like this for the first hunk (and appears fine for the second hunk):
[Screenshot: the first hunk of this PR's diff as rendered in Safari, 2017-01-17 19:45:11]

It's generally difficult to predict how 8-bit codepages behave under default configurations in different environments, so one approach I've found that works is to stick to 7-bit ASCII for text files, or UTF-8 where 7-bit ASCII is not enough. For Windows resources this may not always be practical, due to the spotty support for UTF-8 encoded .rc files, though it can work with the more popular C toolchains.
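To make the (mis)detection concrete, here is a minimal Python sketch (not part of the PR) of what happens to © (U+00A9) when the same bytes are read under different encodings. It assumes the original .rc file stored © as the single CP1252 byte 0xA9, which is what the \xA9 escape in this PR suggests.

```python
# Sketch: how the copyright sign survives (or not) under different encodings.
COPYRIGHT = "\u00a9"  # ©

# In CP1252 (and Latin-1), © is the single byte 0xA9.
cp1252_bytes = COPYRIGHT.encode("cp1252")
# In UTF-8, the same character takes two bytes: 0xC2 0xA9.
utf8_bytes = COPYRIGHT.encode("utf-8")

# A viewer assuming UTF-8 cannot decode a lone 0xA9 (it is a continuation
# byte), so with errors="replace" it yields U+FFFD, the replacement character.
cp1252_seen_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# A viewer assuming CP1252 shows the two UTF-8 bytes as two characters: "Â©".
utf8_seen_as_cp1252 = utf8_bytes.decode("cp1252")

# An escape written as plain 7-bit ASCII text reads the same in every
# ASCII-compatible codepage, which is the point of escaping it.
escape_text = "\\xa9".encode("ascii")
ascii_roundtrips = all(
    escape_text.decode(enc) == "\\xa9"
    for enc in ("ascii", "cp1252", "cp936", "utf-8")
)

if __name__ == "__main__":
    print(cp1252_bytes, utf8_bytes)
    print(repr(cp1252_seen_as_utf8), repr(utf8_seen_as_cp1252), ascii_roundtrips)
```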

@vszakats

vszakats Jan 17, 2017

Member

FWIW, here's a commit that implements UTF-8 Windows Resources for another project:
vszakats/harbour-core@10d2537

@jay

jay Jan 18, 2017

Member

Looking at that makes me wonder why our resource file's translation field is currently set to 1200 (Unicode) when the block is not in any Unicode encoding. Is that a mistake, and should it be 1252 (ANSI/Latin-1) instead?
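For reference, a VERSIONINFO resource names each StringFileInfo string table with eight hex digits: a four-digit language ID followed by a four-digit codepage, and the VarFileInfo "Translation" value normally lists the same pair. The Python sketch below uses US English (0x0409) purely as an illustrative assumption, not as a value taken from curl's .rc file, to show how 1200 and 1252 appear in that form.

```python
# Sketch: how a language ID and a codepage combine into the string-table
# block name of a VERSIONINFO resource (e.g. BLOCK "040904b0").
LANG_EN_US = 0x0409  # assumption for illustration: US English


def string_block_name(lang_id: int, codepage: int) -> str:
    """Return the 8-hex-digit block name: language ID, then codepage."""
    return f"{lang_id:04x}{codepage:04x}"


unicode_block = string_block_name(LANG_EN_US, 1200)  # 1200 = Unicode (UTF-16)
ansi_block = string_block_name(LANG_EN_US, 1252)     # 1252 = Windows Latin-1

if __name__ == "__main__":
    print(unicode_block, ansi_block)
```

So the question above amounts to choosing between a block named "040904b0" and one named "040904e4" (with the matching Translation pair), under the stated language assumption.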

Also, though there's no hard-and-fast rule for this, the traditional practice is to put PR branches on your own fork (origin) rather than upstream. An exception would be a branch that multiple project collaborators work on, like the https proxy branch that was around for a while, or, to a much lesser extent, generate-curl-1 now. A small change like this is transient, though, and not that kind of branch.

@vszakats

vszakats Jan 18, 2017

Member

@jay As for 1200 vs 1252, I think it's fine as it is. Although I've yet to find definitive information, my experience is that this value doesn't influence the codepage in which resource compilers interpret the .rc file; for that, most RC tools offer a command-line option (usually -c). Nor does it seem to control how values are stored inside the PE image, which is always UTF-16. So what does it control? Hard to tell. Maybe \xA9 works because it is the copyright symbol in Unicode as well.
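The "works because it means copyright in Unicode as well" guess can be illustrated with a short Python sketch. It assumes, as a simplification and not as documented rc behaviour, that the escaped byte is mapped through CP1252 or Latin-1 and then stored as UTF-16LE, per the comment above. In that case the byte value 0xA9 and the code point U+00A9 coincide, although that coincidence does not hold for the whole upper half of CP1252.

```python
# Sketch: why \xA9 may come out right regardless of the declared codepage.
# Assumption: the byte is mapped through CP1252 or Latin-1 and the result is
# stored as UTF-16LE (the PE image stores version strings as UTF-16).
byte_value = 0xA9

as_cp1252 = bytes([byte_value]).decode("cp1252")  # '©'
as_latin1 = bytes([byte_value]).decode("latin-1")  # '©'
code_point = ord(as_cp1252)                        # same number as the byte
utf16le = as_cp1252.encode("utf-16-le")            # b'\xa9\x00'

# Counter-example: in CP1252, byte 0x80 is the euro sign U+20AC,
# so "byte value == code point" is not true for every byte.
euro = bytes([0x80]).decode("cp1252")

if __name__ == "__main__":
    print(hex(code_point), utf16le, hex(ord(euro)))
```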

I've updated the patch to leave out guesses about the codepage being used in relation to \xA9.

Noted on the branching; it crossed my mind too, but I decided this change is simple enough that the branch would be short-lived.

@vszakats

vszakats Jan 18, 2017

Member

Converted hex to lowercase to match rest of file and cleaned the comment further.
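The transformation this PR applies by hand can be sketched in Python as a hypothetical helper (not a script used by the project). It rewrites a line so that it is pure 7-bit ASCII, replacing each non-ASCII byte with a lowercase \xNN escape to match the rest of the file. This is only meaningful inside string literals. Also note that in C a \x escape keeps consuming hex digits, so an escaped byte immediately followed by a character such as `a` or `1` would need extra care; this sketch does not handle that.

```python
def escape_non_ascii(line: bytes) -> str:
    """Return `line` as ASCII text with bytes >= 0x80 written as lowercase \\xNN."""
    out = []
    for b in line:
        if b < 0x80:
            out.append(chr(b))        # 7-bit ASCII: keep as-is
        else:
            out.append(f"\\x{b:02x}")  # e.g. 0xA9 -> \xa9
    return "".join(out)


# Hypothetical .rc line with a raw CP1252 copyright byte in a string literal.
sample = b'VALUE "LegalCopyright", "\xa9 Example"'
escaped = escape_non_ascii(sample)

if __name__ == "__main__":
    print(escaped)
```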

@jay

jay Jan 19, 2017

Member

OK, LGTM. Please make sure to reference this discussion in the body of the upstream commit message with a Ref: or Closes: line, for example one of these:
Ref: <url>
Ref: https://github.com/curl/curl/pull/1217
Closes https://github.com/curl/curl/pull/1217
Closes #1217 etc

@vszakats vszakats closed this in df86db7 Jan 19, 2017

@vszakats

vszakats Jan 19, 2017

Member

Thank you @jay. I hope I got it right!

@vszakats vszakats deleted the winresesc branch Jan 19, 2017

bagder added a commit that referenced this pull request Jan 19, 2017

jkralik pushed a commit to jkralik/curl that referenced this pull request Jan 23, 2017

peterpih pushed a commit to railsnewbie257/curl that referenced this pull request Jan 24, 2017

@miurahr

miurahr Mar 30, 2018

The code \xA9 is defined in CP1252 and Unicode but not in CP936, which causes a compilation problem on Chinese and other non-English Windows systems where the "default language for non-Unicode applications" is set to a non-Western codepage.
In such environments the MSVC toolset assumes the source code is DBCS-encoded, and compilation fails. That is why a comment in the vcpkg project refers here.
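The difference can be reproduced in a small Python sketch that decodes the same byte under CP1252 and CP936 (Python's `cp936` codec is an alias of its GBK codec, which may differ from Microsoft's table in a few code points). In CP936, 0xA9 is a lead byte of a double-byte sequence, so on its own it cannot be decoded. The `try_decode` helper is hypothetical, written only for this illustration.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the codepage cannot decode `data`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


byte_a9 = b"\xa9"
western = try_decode(byte_a9, "cp1252")  # '©'
chinese = try_decode(byte_a9, "cp936")   # None: lead byte with no trail byte

if __name__ == "__main__":
    print(repr(western), repr(chinese))
```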

@jay

jay Mar 30, 2018

Member

The suggested solution in that issue is to use (c) instead. Does anyone object to that?

@vszakats

vszakats Mar 30, 2018

Member

(c) is fine of course. Copyright is another option.

BTW, to avoid relying on default codepages used by the resource compiler in various environments, it may help to explicitly specify the .rc codepage by passing the -c65001 option (for UTF-8) to the resource compiler. This exact option works with MSVC, windres, Borland C, Pelles C, ICC. It would be interesting to see if this resolves the problem on Chinese Windows.
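A minimal Python sketch of how a build script might apply that suggestion. It assumes the .rc file really is saved as UTF-8, since declaring 65001 for a CP1252 file would only move the problem, so it first checks that the bytes decode as UTF-8. The helper names and the `libcurl.rc` file name are hypothetical. The only detail taken from this thread is the -c65001 option, which the sketch places right after the tool name so that it precedes the input file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8, i.e. -c65001 matches the file."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def with_utf8_codepage(argv: list[str]) -> list[str]:
    """Insert -c65001 right after the tool name, keeping other args in order."""
    if not argv:
        raise ValueError("empty command line")
    return [argv[0], "-c65001", *argv[1:]]


# Hypothetical command line; run it with subprocess.run(cmd, check=True).
cmd = with_utf8_codepage(["rc", "libcurl.rc"])

if __name__ == "__main__":
    print(cmd)
```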

@dfandrich

dfandrich Apr 1, 2018

Collaborator

IANAL, but my understanding is that (c) doesn't carry the same legal significance as © in those jurisdictions where a copyright notice is still required.

@jay

jay Apr 1, 2018

Member

it may help to explicitly specify the .rc codepage by passing the -c65001 option (for UTF-8) to the resource compiler

@miurahr, can you try that?

@miurahr

miurahr Apr 2, 2018

@dfandrich IANAL, but under the Berne Convention, to which 175 countries are party, copyright does not depend on formal registration or a notice. So adding © is not mandatory, except for authors living outside Berne Convention countries, or for works authored in the USA before 1989, when the US joined the Berne Convention.

@jay I'd like to try -c65001.

@lock lock bot locked as resolved and limited conversation to collaborators Jul 1, 2018
