Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*.rc: escape non-ASCII/non-UTF-8 character for clarity #1217

Closed
wants to merge 1 commit into from

Conversation

vszakats
Copy link
Member

No description provided.

@vszakats vszakats changed the title *.rc: escape non-ASCII/non-UTF8 character for clarity *.rc: escape non-ASCII/non-UTF-8 character for clarity Jan 17, 2017
Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this. Maybe add a comment above, explaining what \xA9 will render as?

@vszakats
Copy link
Member Author

Agreed. Added a comment about the meaning of \xA9.

@jay
Copy link
Member

jay commented Jan 17, 2017

I'm just curious in your codepage what does © look like

@vszakats
Copy link
Member Author

vszakats commented Jan 17, 2017

It comes down to how the codepage of the file is (mis)detected/-interpreted by the editor/viewer at hand.

E.g. in my regular editor (mcedit), I've set the default encoding to UTF-8, so it looks like this:
screen shot 2017-01-17 at 19 39 13

In the browser (Safari) this particular github.com PR diff looks like this for the first hunk (and appears fine for the second hunk):
screen shot 2017-01-17 at 19 45 11

It's generally difficult to predict how 8-bit codepages behave using default configurations in different environments, so one solution I found to work is to stick to 7-bit ASCII or UTF-8 (where 7-bit ASCII is not enough) for text files. In case of Windows Resources this may not always be practical due to the spotty support for UTF-8 encoded .rc files, though it can work with the more popular C toolchains.

@vszakats
Copy link
Member Author

FWIW, here's a commit that implements UTF-8 Windows Resources for another project:
vszakats/hb@10d2537

@jay
Copy link
Member

jay commented Jan 18, 2017

Looking at that makes me wonder why our resource file translation field is currently set to 1200 (Unicode) if the block is not in any Unicode encoding. Is that a mistake and should it be 1252 (ANSI/Latin1) instead?

Also, though there's no hard-and-fast rule for this the traditional practice for PR branches is to go on your origin instead of upstream. An exception would be multiple project collaborators working on a branch like recall the https proxy branch that was around for a while or now to a much lesser extent generate-curl-1. A small change like this will be transient though and not that.

@vszakats
Copy link
Member Author

vszakats commented Jan 18, 2017

@jay As for 1200 vs 1252 I think it's fine as it as, and although I've yet to find definitive information, my experience is that this value doesn't influence how resource compilers interpret the .rc file in any particular codepage. For that, most RC tools offer a command-line option (usually -c). Nor does it seem to control how values are stored inside the PE image, which is always UTF-16. So what does it control? Hard to tell. Maybe \xA9 works because it means copyright symbol in Unicode as well.

I've updated the patch to leave out guesses about the codepage being used in relation to \xA9.

The branching is noted, again something that crossed my mind, but decided this is such a simple change that the branch will be short lived.

@vszakats vszakats force-pushed the winresesc branch 2 times, most recently from a144aab to e23a934 Compare January 18, 2017 14:34
@vszakats
Copy link
Member Author

Converted hex to lowercase to match rest of file and cleaned the comment further.

@jay
Copy link
Member

jay commented Jan 19, 2017

Ok LGTM, please make sure to reference this discussion in the body of the upstream commit message with a Ref: or Closes: line, for example one of these
Ref: <url>
Ref: https://github.com/curl/curl/pull/1217
Closes https://github.com/curl/curl/pull/1217
Closes #1217 etc

@vszakats vszakats closed this in df86db7 Jan 19, 2017
@vszakats
Copy link
Member Author

Thank you @jay. I hope I got it right!

@vszakats vszakats deleted the winresesc branch January 19, 2017 11:37
bagder pushed a commit that referenced this pull request Jan 19, 2017
peterpih pushed a commit to railsnewbie257/curl that referenced this pull request Jan 24, 2017
@miurahr
Copy link

miurahr commented Mar 30, 2018

A code \xA9 is defined in CP1252 and Unicode but is not defined in CP936 that cause a compilation problem on Chinese and other non-English Windows that "default language for non-unicode application" is non-Unicode.
MSVC toolset assumes source code is DBCS on such windows environment, and it cause a compilation problem. That is why vcpkg project comment refers here.

@jay
Copy link
Member

jay commented Mar 30, 2018

the suggested solution in that issue is use (c) instead, does anyone object to that?

@vszakats
Copy link
Member Author

vszakats commented Mar 30, 2018

(c) is fine of course. Copyright is another option.

BTW, to avoid relying on default codepages used by the resource compiler in various environments, it may help to explicitly specify the .rc codepage by passing the -c65001 option (for UTF-8) to the resource compiler. This exact option works with MSVC, windres, Borland C, Pelles C, ICC. It would be interesting to see if this resolves the problem on Chinese Windows.

@dfandrich
Copy link
Contributor

IINAL, but my understanding is that (c) doesn't have the same legal significance that © does, in those jurisdictions where a copyright notice is still required.

@jay
Copy link
Member

jay commented Apr 1, 2018

it may help to explicitly specify the .rc codepage by passing the -c65001 option (for UTF-8) to the resource compiler

@miurahr can you try that

@miurahr
Copy link

miurahr commented Apr 2, 2018

@dfandrich INAL but, Berne Convention contract among 175 countries takes a way not to require formal registration or representation, that is because it is not mandatory to add © except for one who were living out of Berne Convension, or one who authored in USA before 1989 when US joined to Berne convention.

@jay I'd like try -c65001

@lock lock bot locked as resolved and limited conversation to collaborators Jul 1, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants