Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Parser] Detect non-breaking space (U+00A0) and offer a fix-it #16090

Merged
merged 5 commits into from Apr 24, 2018

Conversation

@hashemi
Copy link
Contributor

hashemi commented Apr 22, 2018

Detect Unicode non-breaking space character (U+00A0) in the lexer, diagnose it, and offer a fix-it turning it into a regular space. Also treat the non-breaking space as whitespace when considering whether an operator is left/right bound.

Resolves SR-7122.

@CodaFi CodaFi requested a review from jrose-apple Apr 22, 2018
@hashemi hashemi force-pushed the hashemi:non-breaking-spaces branch to 9d28cd5 Apr 22, 2018
@@ -716,6 +722,12 @@ static bool isRightBound(const char *tokEnd, bool isLeftBound,
else
return true;

case '\302':
if (tokEnd[1] == '\240')

This comment has been minimized.

Copy link
@rintaro

rintaro Apr 23, 2018

Member

Could you make these '\xC2' and '\xA0'?

@@ -682,6 +682,12 @@ static bool isLeftBound(const char *tokBegin, const char *bufferBegin) {
else
return true;

case '\240':
if (tokBegin - 1 != bufferBegin && tokBegin[-2] == '\302')

This comment has been minimized.

Copy link
@rintaro

rintaro Apr 23, 2018

Member

ditto

} else if (Codepoint == 0x000000A0) {
// Non-breaking whitespace (U+00A0)
diagnose(CurPtr - 1, diag::lex_nonbreaking_space)
.fixItReplaceChars(getSourceLoc(CurPtr - 1), getSourceLoc(Tmp), " ");

This comment has been minimized.

Copy link
@rintaro

rintaro Apr 23, 2018

Member

Nitpick: indent.

Copy link
Member

rintaro left a comment

This looks great! Just a few code style requests.

@rintaro

This comment has been minimized.

Copy link
Member

rintaro commented Apr 23, 2018

Also, could you add tests for fix-it in test/Parse ?

@hashemi

This comment has been minimized.

Copy link
Contributor Author

hashemi commented Apr 23, 2018

Thanks @rintaro, all suggestions incorporated.
I made a new file for the parse tests. Is there a better spot? Recovery.swift is another possibility.

@rintaro

This comment has been minimized.

Copy link
Member

rintaro commented Apr 24, 2018

I think dedicated test file is OK. Since this test is handling invisible characters, it's more safe from accidental (automatic) edits.
@swift-ci Please smoke test

@@ -716,6 +722,12 @@ static bool isRightBound(const char *tokEnd, bool isLeftBound,
else
return true;

case '\xC2':
if (tokEnd[1] == '\xA0')

This comment has been minimized.

Copy link
@hashemi

hashemi Apr 24, 2018

Author Contributor

Can you confirm that this dereference is safe? Can we assume that the buffer is always null-terminated?

This comment has been minimized.

@rintaro rintaro merged commit 06c528d into apple:master Apr 24, 2018
2 checks passed
2 checks passed
Swift Test Linux Platform (smoke test)
Details
Swift Test OS X Platform (smoke test)
Details
@rintaro

This comment has been minimized.

Copy link
Member

rintaro commented Apr 24, 2018

Thank you Ahmad! Please mark the SR issue RESOLVED.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.