Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmake: Replace invalid UTF-8 byte sequence #1271

Closed
wants to merge 1 commit into from

Conversation

webmaster128
Copy link
Contributor

by and alternative UTF-8 compatible pattern.

Before, the file contained special chars in a different encoding than ASCII or UTF-8 making text editors and Python complain when reading the file. Looks like 7-bit ASCII and UTF-8 are the preferred encodings: https://cmake.org/cmake/help/v3.0/manual/cmake-language.7.html#encoding

@mention-bot
Copy link

@webmaster128, thanks for your PR! By analyzing the history of the files in this pull request, we identified @billhoffman, @Sukender and @jzakrzewski to be potential reviewers.

@jay jay added the cmake label Feb 20, 2017
@jay
Copy link
Member

jay commented Feb 20, 2017

hm. I don't know why it was done like that in the first place. I would think there's a regex that makes both of those lines unnecessary? If there is no other feedback after a while I don't see any problem with it.

/cc @Sukender

@ligfx
Copy link
Contributor

ligfx commented Feb 21, 2017

It looks like that second line (REPLACE "§!§" "\n") is supposed to add newlines back, but it doesn't work for me. It's also a cosmetic, not syntactical, concern for the generated CMake file (which I assume people don't normally read?).

I'd say, remove the Unicode shenanigans entirely:

-  string(REGEX REPLACE "\\\\\n" "§!§" MAKEFILE_INC_TEXT ${MAKEFILE_INC_TEXT})
+  string(REGEX REPLACE "\\\\\n" "" MAKEFILE_INC_TEXT ${MAKEFILE_INC_TEXT})
string(REGEX REPLACE "([a-zA-Z_][a-zA-Z0-9_]*)[\t ]*=[\t ]*([^\n]*)" "SET(\\1 \\2)" MAKEFILE_INC_TEXT ${MAKEFILE_INC_TEXT})
-  string(REPLACE "§!§" "\n" MAKEFILE_INC_TEXT ${MAKEFILE_INC_TEXT})

@webmaster128
Copy link
Contributor Author

webmaster128 commented Feb 21, 2017

I have no clue what the code is doing. I just tried to get rid of the non-UTF-8 char without changing behavior. Feel free to fix this differently.

@ligfx
Copy link
Contributor

ligfx commented Feb 21, 2017

Feel free to fix this differently.

@webmaster128 What do you mean? Are you asking me to put up another pull request instead of this one?

@webmaster128
Copy link
Contributor Author

What do you mean? Are you asking me to put up another pull request instead of this one?

I was just trying to say that I do not feel confident to do greater changes to code I did not write and do not understand what it is doing. So I'd be happy if someone provides a change that superseeds this one by providing a different solution to the same problem.

ligfx added a commit to ligfx/curl that referenced this pull request Feb 21, 2017
A solution that came out of discussion on curl#1271. The strange bytes are
used to mark escaped newlines so that they can be converted back after
processing. However, text editors and other programs tend to silently
change these characters in CMakeLists.txt or complain. This is a
cosmetic concern for generated files (which I assume no-one reads), so
let's remove it entirely.
@ligfx
Copy link
Contributor

ligfx commented Feb 21, 2017

@webmaster128 In that case, I've opened up #1275 with my proposed solution (removing the strange bytes entirely).

@webmaster128
Copy link
Contributor Author

Closing in favour of #1275

ligfx added a commit to ligfx/curl that referenced this pull request Feb 24, 2017
A solution that came out of discussion on curl#1271. The strange bytes are
used to mark escaped newlines so that they can be converted back after
processing. However, text editors and other programs tend to silently
change these characters in CMakeLists.txt or complain. This is a
cosmetic concern for generated files (which I assume no-one reads), so
let's remove it entirely.
jay pushed a commit that referenced this pull request Feb 25, 2017
- Change the encoding of the regex temp placeholder token to UTF-8.

Prior to this change the file contained special chars in a different
encoding than ASCII or UTF-8 making text editors and Python complain
when reading the file.

Closes #1271
Closes #1275
@lock lock bot locked as resolved and limited conversation to collaborators Jan 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

Successfully merging this pull request may close these issues.

4 participants