Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An implementation for 0168-multi-line-string-literals.md #8813

Merged
merged 39 commits into from Apr 26, 2017

Conversation

@johnno1962
Copy link
Contributor

@johnno1962 johnno1962 commented Apr 17, 2017

A prototype implementation for the proposal 0168 as discussed on the swift evolution thread over the last week. It has been tested inside the Xcode source editor is completely functional as a reference implementation bar any changes that would be need to be made to other parts of the toolchain which seem to be minimal. There is a new test file.

Resolves SR-170.

// work back from the end to find whitespace to strip
while (start > Bytes.begin() && isWhitespace(start[-1])) {
if (*--start == '\n' || *start == '\r') {
if (start[-1] == '\r')
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right, I think: \r\r is two classic Mac newlines, not one. Surely, what you want is this:

switch (*--start) {
case '\n':
  if (start[-1] == '\r')
    --start;
  LLVM_FALLTHROUGH;
case '\r':
  return std::string(start, end-start);
}

Loading

@@ -397,12 +402,13 @@ class Lexer {
/// If a copy needs to be made, it will be allocated out of the provided
/// Buffer.
static StringRef getEncodedStringSegment(StringRef Str,
SmallVectorImpl<char> &Buffer);
SmallVectorImpl<char> &Buffer,
unsigned Modifiers = 0, const std::string &ToReplace = "");
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 80-character lines, here and throughout.

Loading

@@ -505,6 +511,9 @@ class Lexer {
/// Try to lex conflict markers by checking for the presence of the start and
/// end of the marker in diff3 or Perforce style respectively.
bool tryLexConflictMarker();

// new for multiline string literals
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: capitalize and punctuate comments, here and throughout.

Loading

case '\n': // String literals cannot have \n or \r in them.
case '\r':
if (Modifiers & StringLiteralMultiline) // ... unless they are mutli-line
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, typo. (Also, decide if you want to write "multi-line" or "multiline" throughout, and camel-case accordingly in the code.)

Loading

@@ -1192,6 +1207,15 @@ unsigned Lexer::lexCharacter(const char *&CurPtr, char StopQuote,
case '"': ++CurPtr; return '"';
case '\'': ++CurPtr; return '\'';
case '\\': ++CurPtr; return '\\';
case '\n':
LLVM_FALLTHROUGH;
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per existing code in these files, no need for LLVM_FALLTHROUGH; when your case is otherwise empty.

Loading

}

/// determine contents of literal to be normalised - either
/// to strip indenting or normalise line endings to a single \n
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: while you're capitalizing and re-wrapping these comments to 80 characters, U.S. English spellings. (Yes, I know; I'm Canadian, but them's the rules...)

Loading

}

// are there windows line endings in the source, if so return it to be replaced
const char *windowsLinesep = strnstr(Bytes.begin(), "\r\n", Bytes.end()-Bytes.begin());
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit and a more substantive comment.

  • Nit: windowsLineSeparator (capitalize).
  • More substantive comment: strnstr isn't portable, is it? Ironic that this won't work on Windows...

Loading

while ((BytesPtr = (const char *)memchr(BytesPtr, '\n', Bytes.end()-BytesPtr)) != nullptr) {
const char *NextPtr = BytesPtr + 1;
if (*NextPtr != '\n' && *NextPtr != '\r') {
if (BytesPtr[-1] == '\r')
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment about how this treats \r\r.

Loading

@@ -1324,8 +1404,15 @@ void Lexer::lexStringLiteral() {
// NOTE: We only allow single-quote string literals so we can emit useful
// diagnostics about changing them to double quotes.

bool wasErroneous = false;

bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;
Copy link
Collaborator

@xwu xwu Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: wasAllWhitespace, surely.

Loading

@kubamracek kubamracek self-requested a review Apr 20, 2017
@ematejska ematejska requested a review from milseman Apr 21, 2017
// CHECK: -2-
print("-2-")
// SKIP-CHECK-NEXT: <"Two Beta">
//print(delimit(""""Two Beta""""))
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Loading

Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for two quotes in the middle of the string?

Loading

BytesPtr = NextPtr;
}
}

/// lexStringLiteral:
/// string_literal ::= ["]([^"\\\n\r]|character_escape)*["]
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment

Loading

// Strips any indent that corresponds to the indent
// of the multi-line string terminating line and
// normalises line endings in the source to \n.
// It also removes any intial empty line.
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap comments at 80 lines.

Loading

bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;
unsigned Modifiers = 0;

// is this the start of a multiline string litersl
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Loading

// CHECK: -4-
print("-4-")
// CHECK-NEXT: <FourDelta>
print(delimit("""Four\
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on swift-evolution, this should be disallowed.

Loading


// CHECK: -14-
print("-14-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an error instead of a warning.

Loading


// CHECK: -11-
print("-11-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an error.

Loading

// CHECK: -12-
print("-12-")
// Note: The next few tests use physical tab characters, not spaces.
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an error.

Loading


// CHECK: -13-
print("-13-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Copy link
Contributor

@kubamracek kubamracek Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an error.

Loading

@kubamracek
Copy link
Contributor

@kubamracek kubamracek commented Apr 22, 2017

Hi, @johnno1962, thanks for sending this patch! Do you think you'll have time to address the review comments? I'd like to get this into Swift 4, and we're getting close to the deadline for it.

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 23, 2017

Thanks @xwu and @kubamracek for the comments which I have endevoured to address with this last commit. This should be compliant with the core teams decision except that it includes a small amount of code to implement a new proposal https://github.com/johnno1962c/swift-evolution/blob/master/proposals/0173-newline-escape-in-strings.md to allow escaping of newlines in all strings. This code is around line 1212 of Lexer.cpp if you want to remove it.

There are limitations to the implementation in that while it will normalise end-of-line to \n for sources that use \n, \r\n or \r as the line separator, they can not be mixed within a literal and have this work.

Loading

@kubamracek
Copy link
Contributor

@kubamracek kubamracek commented Apr 23, 2017

@johnno1962, could you extract the newline escaping feature into a separate pull request? Since it's a separate proposal that wasn't accepted yet, we shouldn't block merging this PR on the other feature.

Loading

Copy link
Collaborator

@xwu xwu left a comment

Lots of nits, sorry.

My main feedback would be that I'd like to see more tests. It'd be very important to guarantee that your line normalization code does not strip manually escaped \r\n (or, for that matter, \r + literal newline).

Also, what can be done about this limitation as to normalization of mixed newlines?

Loading

ERROR(lex_unicode_escape_braces,none,
"expected hexadecimal code in braces after unicode escape", ())
ERROR(lex_illegal_multiline_string_start,none,
"inavlid start of multi-line string literal", ())
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Loading

ERROR(lex_illegal_multiline_string_start,none,
"inavlid start of multi-line string literal", ())
ERROR(lex_illegal_multiline_string_end,none,
"inavlid end of multi-line string literal", ())
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Loading

ERROR(lex_ambiguous_string_indent,none,
"invalid mix of multi-line string literal indentation", ())
WARNING(lex_trailing_multiline_whitespace,none,
"includes trailing space characters in multi-line string literal", ())
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning on trailing whitespace is not a part of the approved proposal; please also split into separate PR.

Loading

@@ -505,6 +512,9 @@ class Lexer {
/// Try to lex conflict markers by checking for the presence of the start and
/// end of the marker in diff3 or Perforce style respectively.
bool tryLexConflictMarker();

/// New for multiline string literals
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should have punctuation; also, please describe what it does and not just label it as new.

Loading

@@ -46,6 +46,9 @@ class Token {
/// \brief Whether this token is an escaped `identifier` token.
unsigned EscapedIdentifier : 1;

/// modifiers for string literals
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: capitalize and punctuate.

Loading

Segments.push_back(
StringSegment::getLiteral(getSourceLoc(SegmentStartPtr),
Bytes.end()-SegmentStartPtr));
Bytes.end()-SegmentStartPtr, Modifiers, ToReplace));
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80-character lines please.

Loading


// ===---------- Done --------===
// CHECK-NEXT: Done.
print("Done.")
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more tests to ensure that the errors and warnings you have added behave as intended?

Some other ideas:

  • Check that only one leading newline and one trailing newline is stripped.
  • Check that all other newlines are preserved, including multiple consecutive ones.
  • Check that manually escaped \t, \r, \n, \\ are all possible and correctly lexed, including at the end of a line.
  • Check that a manually escaped \r\n is not normalized to \n.
  • Check that string literal interpolation works correctly.
  • Check that invalid ragged leading indents trigger the expected error.
  • Check that escaping \""" works correctly.

Also, I would like to see, for the purposes of this particular implementation, that end-of-line newlines after \ are preserved and that trailing whitespace is correctly preserved.

Loading

Copy link
Contributor Author

@johnno1962 johnno1962 Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xwu I’ve actioned most of your nits. “wasAllWhitespace” is there to be able to generate errors when there is non-whitespace before the closing delimiter. @kubamracek this commit should be in line with the proposal as accepted. I’ll put newline escapes back in as a separate PR if required. Looking at more tests while the toolchain builds.

Loading

Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo left :)

Why can find-and-replace operations not simply replace all literal \r with \n and all literal \r\n with \n?

Loading

Copy link
Contributor Author

@johnno1962 johnno1962 Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do it the other way around I suppose. It tries to do all the replacing including indent stripping with a single loop due to how the code developed. Seems OK as it is apart from exotic mixes of line endings inside a single literal.

Loading

Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the logic behind doing indent stripping in a single loop, because you need to get to the end of the literal before you know how much to strip. But, without having thought too deeply, it would seem that normalizing line endings can happen line-by-line as you go.

Loading

Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and your logic isn't stripping leading whitespace from escaped newlines, is it?

  """
  \n  \n
  """

...should give "\n \n" and not "\n\n".

Loading

Copy link
Contributor Author

@johnno1962 johnno1962 Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent stripping is done on program text not expanded escapes which happens afterwards

Loading

Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I assumed so--just wanted to check.

Loading

std::string ToReplace;

static StringSegment getLiteral(SourceLoc Loc, unsigned Length,
unsigned Modifiers, const std::string &ToReplace) {
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80-character lines, please.

Loading

@@ -495,7 +502,7 @@ class Lexer {
static unsigned lexUnicodeEscape(const char *&CurPtr, Lexer *Diags);

unsigned lexCharacter(const char *&CurPtr,
char StopQuote, bool EmitDiagnostics);
char StopQuote, bool EmitDiagnostics, unsigned Modifiers = 0);
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80-character lines, please.

Loading

@@ -273,11 +276,17 @@ class Token {
void setText(StringRef T) { Text = T; }

/// \brief Set the token to the specified kind and source range.
void setToken(tok K, StringRef T, unsigned CommentLength = 0) {
void setToken(tok K, StringRef T, unsigned CommentLength = 0, unsigned Modifiers = 0) {
Copy link
Collaborator

@xwu xwu Apr 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80-character lines, please.

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 25, 2017

Bah, you're too modest. I'll use that as a starting point ;-)

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 25, 2017

Can it wait until tomorrow? I have a better diagnostic on the way for "invalid end of multi-line string literal” (small change I’ve committed it but it is untested)

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 25, 2017

@johnno1962 let me know when you're ready for a final test and then merge

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 25, 2017

Sure, sounds good

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 25, 2017

If what I’ve just committed builds then we’re good to go - can you run the test?

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 25, 2017

@swift-ci please smoke test

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 26, 2017

One more time 🙄

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 26, 2017

@johnno1962

/Users/buildnode/jenkins/workspace/swift-PR-osx-smoke-test/branch-master/swift/lib/Parse/Lexer.cpp:1365:25: error: use of undeclared identifier 'getSourceLoc'; did you mean 'swift::Lexer::getSourceLoc'?
        Diags->diagnose(getSourceLoc(start),
                        ^~~~~~~~~~~~
                        swift::Lexer::getSourceLoc
/Users/buildnode/jenkins/workspace/swift-PR-osx-smoke-test/branch-master/swift/include/swift/Parse/Lexer.h:433:20: note: 'swift::Lexer::getSourceLoc' declared here
  static SourceLoc getSourceLoc(const char *Loc) {
                   ^
1 error generated.

BTW, you can get a quicker turn around time by running "ninja swift" in your local swift build directory to just build the compiler (without the standard library).

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 26, 2017

@swift-ci please smoke test

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 26, 2017

Thanks for the tip. I’ve been building toolchains. Is there an easy way to build a toolchain without building all architectures?

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 26, 2017

Nothing supported that I know of. Of course, you can intercept the commands and hack something together for yourself, but that's pretty dirty.

If you're just iterating fast on a .cpp file, "ninja swift" will turn around in about 20 seconds or so. You can then use e.g. ./bin/swift -frontend -parse -verify foo.swift for the error message checking for a test.

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 26, 2017

Toolchains weren't so bad. it’s just that if the day changes you have to rebuild llvm + clang. Looks like tests are ok. Merge away 👍

Loading

@milseman milseman merged commit 981e706 into apple:master Apr 26, 2017
2 checks passed
Loading
@milseman
Copy link
Contributor

@milseman milseman commented Apr 26, 2017

🍾

Loading

@milseman
Copy link
Contributor

@milseman milseman commented Apr 26, 2017

SR for diagnostics improvements: https://bugs.swift.org/browse/SR-4701

@kubamracek do you have an SR for multi-line literals inside of interpolations?

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 26, 2017

Thanks @milseman, @kubamracek and particularly @xwu for all your help. Final toolchain:
http://johnholdsworth.com/swift-LOCAL-2017-04-27-a-osx.tar.gz

Loading

@kubamracek
Copy link
Contributor

@kubamracek kubamracek commented Apr 26, 2017

https://bugs.swift.org/browse/SR-4708: Add support for multiline strings inside string interpolations

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented Apr 26, 2017

@kubamracek I’ve just opened PR #9049 for multiline inside interpolations

Loading

@DevAndArtist
Copy link
Contributor

@DevAndArtist DevAndArtist commented May 11, 2017

Was the warning about trailing whitespaces removed? 😞

let s = """
	abc                                                               
	"""

print(s.characters.count) // prints 66

Loading

@milseman
Copy link
Contributor

@milseman milseman commented May 11, 2017

What would be the workaround if trailing whitespace is desired?

Loading

@DevAndArtist
Copy link
Contributor

@DevAndArtist DevAndArtist commented May 11, 2017

@milseman apple/swift-evolution#695

Without the trailing backslash you'd need something like \("").

Loading

@DevAndArtist
Copy link
Contributor

@DevAndArtist DevAndArtist commented May 11, 2017

The example from above is actually this:

let s = """
	abc                                                               \("")
	"""

print(s.characters.count) // prints 66

which has a visible indication that trailing whitespaces are involved. Ideally we still need the trailing backslash.

Loading

@xwu
Copy link
Collaborator

@xwu xwu commented May 11, 2017

As already discussed on swift-evolution, the accepted proposal does not include warnings about trailing whitespace. This PR correctly implements the proposal as accepted.

Loading

@DevAndArtist
Copy link
Contributor

@DevAndArtist DevAndArtist commented May 12, 2017

@xwu I bet you've included all the points the core team mentioned in the accepted thread, which cases should be errors and which should be allowed, like for instance the blank line without any indent, but you seem exclusively pick things that you like and silently ignore things that you don't like the same way you did during the discussion thread. I'm not being offensive by any means, but I' highly critical about that.

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035931.html

That seems like a reasonable thing to warn about. [...]

-Joe

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035934.html

I don't want to sit around and watch how we'll introduce a half baked multi-line string literal.

Loading

@DevAndArtist
Copy link
Contributor

@DevAndArtist DevAndArtist commented May 12, 2017

In case you insist for a bug issue: https://bugs.swift.org/browse/SR-4874

Loading

@johnno1962
Copy link
Contributor Author

@johnno1962 johnno1962 commented May 12, 2017

Adrian, you have a subjective opinion that trailing whitespace is critical that didn’t win over the majority of the thread on this proposal. It seems like you're a bit confused as to the idea behind my follow up proposal about newline escapes (elided newlines) I'd not intended it to have anything to do with helping make explicit trailing whitespace.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants