Initial implementation of raw string literals #1304

Merged: 50 commits, Jun 22, 2022
Changes from 46 commits
a4ff50d
test cases for raw string literals
SlaterLatiao May 31, 2022
a31dda2
raw string literal implementation
SlaterLatiao May 31, 2022
d3be382
match as block string if starting with triple ", and better error mes…
SlaterLatiao Jun 1, 2022
ece0b7f
fix broken test case
SlaterLatiao Jun 1, 2022
4d56ea6
test cases for raw string literals
SlaterLatiao May 31, 2022
7a32eec
raw string literal implementation
SlaterLatiao May 31, 2022
5745c55
match as block string if starting with triple ", and better error mes…
SlaterLatiao Jun 1, 2022
d5f69af
fix broken test case
SlaterLatiao Jun 1, 2022
75bcbcf
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 1, 2022
ac735be
removed unused initial value
SlaterLatiao Jun 3, 2022
ebdf8f7
rename flag to indicate multi-line string and remove comment
SlaterLatiao Jun 3, 2022
743aed6
use * to get value from std::optional
SlaterLatiao Jun 3, 2022
d509aa5
clean-ups
SlaterLatiao Jun 3, 2022
d51377d
removed skip_scan flag and directly return in case of a single line s…
SlaterLatiao Jun 3, 2022
855fe32
Updated error message: simple string -> single-line string.
SlaterLatiao Jun 3, 2022
17bc3cf
Updated test cases according to changes in error message
SlaterLatiao Jun 3, 2022
9ac7418
Removed counting_hashtag flag.
SlaterLatiao Jun 3, 2022
43ab9a6
Implemented ScanHelper class to handle scanning
SlaterLatiao Jun 4, 2022
750b034
Fixed explanation of ReadHashTags.
SlaterLatiao Jun 4, 2022
266359a
Addressed PR comment.
SlaterLatiao Jun 4, 2022
8437c5c
Clarify that scan_helper holds the source text.
SlaterLatiao Jun 6, 2022
51b2af9
Addressed PR comments.
SlaterLatiao Jun 7, 2022
b5791ad
Updated error messages in test cases.
SlaterLatiao Jun 7, 2022
1914495
Added const keyword to return type of GetCurrentStr().
SlaterLatiao Jun 7, 2022
4acf2ae
addressed PR comments.
SlaterLatiao Jun 7, 2022
c8dcc8b
Addressed PR comments.
SlaterLatiao Jun 7, 2022
46ed305
Removed the multi_line flag and skip_read field to improve readability.
SlaterLatiao Jun 8, 2022
cb0039b
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 8, 2022
9cf8448
Copied default parameter value to definition of UnescapeStringLiteral.
SlaterLatiao Jun 13, 2022
54c46c1
Copied default parameter value to definition of ParseBlockStringLiteral.
SlaterLatiao Jun 13, 2022
78da5b9
Prefix CARBON_ to SIMPLE_TOKEN and ARG_TOKEN macros.
SlaterLatiao Jun 13, 2022
3c2e90d
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 13, 2022
70709cd
Rollback redefinition of arguments.
SlaterLatiao Jun 13, 2022
6f45efc
Updated comment on the flex macro.
SlaterLatiao Jun 14, 2022
00401d8
Updated wording.
SlaterLatiao Jun 14, 2022
65facf5
Moved the EOF error out of the loop.
SlaterLatiao Jun 14, 2022
0c91724
Removed duplicated declaration.
SlaterLatiao Jun 14, 2022
f8e8054
Changed type of `hashtag_num` and `leading_quotes` to int.
SlaterLatiao Jun 14, 2022
7cc8cbb
Minor fix: string copy.
SlaterLatiao Jun 17, 2022
a45fd15
Added comment on YyinputWrapper.
SlaterLatiao Jun 17, 2022
24d3149
Grammar in comment.
SlaterLatiao Jun 17, 2022
ec4477b
Added check of eof before reading next char.
SlaterLatiao Jun 17, 2022
1346f92
Minor updates based on PR comments.
SlaterLatiao Jun 21, 2022
bb63820
Minor changes to address PR comments.
SlaterLatiao Jun 21, 2022
3a8d488
Used a clearer way to calculate `hashtag_num` and `leading_quotes`. S…
SlaterLatiao Jun 21, 2022
aa6e246
Directly copy StringRef for compilation error message.
SlaterLatiao Jun 22, 2022
6a77fea
Make str_with_quote const as we don't change it.
SlaterLatiao Jun 22, 2022
eae97d5
Added TODO for unsupported cases.
SlaterLatiao Jun 22, 2022
2aba1f6
Merged upstream trunk into raw_string.
SlaterLatiao Jun 22, 2022
4e238e8
Fixed a typo.
SlaterLatiao Jun 22, 2022
138 changes: 69 additions & 69 deletions common/string_helpers.cpp
@@ -27,87 +27,86 @@ static auto FromHex(char c) -> std::optional<char> {
return std::nullopt;
}

- auto UnescapeStringLiteral(llvm::StringRef source, bool is_block_string)
-     -> std::optional<std::string> {
+ auto UnescapeStringLiteral(llvm::StringRef source, const int hashtag_num,
+                            bool is_block_string) -> std::optional<std::string> {
std::string ret;
ret.reserve(source.size());
std::string escape = "\\";
escape.resize(hashtag_num + 1, '#');
size_t i = 0;
while (i < source.size()) {
char c = source[i];
switch (c) {
case '\\':
++i;
if (i == source.size()) {
return std::nullopt;
}
switch (source[i]) {
case 'n':
ret.push_back('\n');
break;
case 'r':
ret.push_back('\r');
break;
case 't':
ret.push_back('\t');
break;
case '0':
if (i + 1 < source.size() && llvm::isDigit(source[i + 1])) {
// \0[0-9] is reserved.
return std::nullopt;
}
ret.push_back('\0');
break;
case '"':
ret.push_back('"');
break;
case '\'':
ret.push_back('\'');
break;
case '\\':
ret.push_back('\\');
break;
case 'x': {
i += 2;
if (i >= source.size()) {
return std::nullopt;
}
std::optional<char> c1 = FromHex(source[i - 1]);
std::optional<char> c2 = FromHex(source[i]);
if (c1 == std::nullopt || c2 == std::nullopt) {
return std::nullopt;
}
ret.push_back(16 * *c1 + *c2);
break;
if (i + hashtag_num < source.size() &&
source.slice(i, i + hashtag_num + 1).equals(escape)) {
i += hashtag_num + 1;
if (i == source.size()) {
return std::nullopt;
}
switch (source[i]) {
case 'n':
ret.push_back('\n');
break;
case 'r':
ret.push_back('\r');
break;
case 't':
ret.push_back('\t');
break;
case '0':
if (i + 1 < source.size() && llvm::isDigit(source[i + 1])) {
// \0[0-9] is reserved.
return std::nullopt;
}
ret.push_back('\0');
break;
case '"':
ret.push_back('"');
break;
case '\'':
ret.push_back('\'');
break;
case '\\':
ret.push_back('\\');
break;
case 'x': {
i += 2;
if (i >= source.size()) {
return std::nullopt;
}
case 'u':
CARBON_FATAL() << "\\u is not yet supported in string literals";
case '\n':
if (!is_block_string) {
return std::nullopt;
}
break;
default:
// Unsupported.
std::optional<char> c1 = FromHex(source[i - 1]);
std::optional<char> c2 = FromHex(source[i]);
if (c1 == std::nullopt || c2 == std::nullopt) {
return std::nullopt;
}
ret.push_back(16 * *c1 + *c2);
break;
}
break;

case '\t':
// Disallow non-` ` horizontal whitespace:
// https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/whitespace.md
// TODO: This doesn't handle unicode whitespace.
return std::nullopt;

default:
ret.push_back(c);
break;
case 'u':
CARBON_FATAL() << "\\u is not yet supported in string literals";
case '\n':
if (!is_block_string) {
return std::nullopt;
}
break;
default:
// Unsupported.
return std::nullopt;
}
} else if (c == '\t') {
// Disallow non-` ` horizontal whitespace:
// https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/whitespace.md
// TODO: This doesn't handle unicode whitespace.
return std::nullopt;
} else {
ret.push_back(c);
}
++i;
}
return ret;
}

- auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string> {
+ auto ParseBlockStringLiteral(llvm::StringRef source, const int hashtag_num)
+     -> ErrorOr<std::string> {
llvm::SmallVector<llvm::StringRef> lines;
source.split(lines, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/true);
if (lines.size() < 2) {
@@ -150,8 +149,9 @@ auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string> {
}
// Unescaping with \n appended to handle things like \\<newline>.
llvm::SmallVector<char> buffer;
- std::optional<std::string> unescaped = UnescapeStringLiteral(
-     (line + "\n").toStringRef(buffer), /*is_block_string=*/true);
+ std::optional<std::string> unescaped =
+     UnescapeStringLiteral((line + "\n").toStringRef(buffer), hashtag_num,
+                           /*is_block_string=*/true);
if (!unescaped.has_value()) {
return Error("Invalid escaping in " + line);
}
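
The reworked loop in UnescapeStringLiteral is hard to follow in diff form, so here is a minimal, self-contained sketch of the same idea: the escape introducer becomes a backslash followed by `hashtag_num` `#` characters, and anything else is copied through. This is an illustration under assumptions, not the PR's code: `UnescapeSketch` is a hypothetical name, `std::string_view` stands in for `llvm::StringRef`, only four escapes are handled, and the raw-tab rejection is omitted.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for UnescapeStringLiteral. Supports only \n, \t, \"
// and \\ (each after the introducer) to keep the sketch short.
auto UnescapeSketch(std::string_view source, int hashtag_num)
    -> std::optional<std::string> {
  // The escape introducer is a backslash followed by `hashtag_num` '#'s.
  std::string escape = "\\";
  escape.append(static_cast<std::size_t>(hashtag_num), '#');

  std::string ret;
  ret.reserve(source.size());
  std::size_t i = 0;
  while (i < source.size()) {
    if (source.substr(i, escape.size()) == std::string_view(escape)) {
      // Skip the introducer; `i` now points at the escaped character.
      i += escape.size();
      if (i == source.size()) {
        return std::nullopt;  // Introducer with nothing after it.
      }
      switch (source[i]) {
        case 'n':
          ret.push_back('\n');
          break;
        case 't':
          ret.push_back('\t');
          break;
        case '"':
          ret.push_back('"');
          break;
        case '\\':
          ret.push_back('\\');
          break;
        default:
          return std::nullopt;  // Not supported in this sketch.
      }
    } else {
      // Not an escape at this raw level: copy the character unchanged.
      ret.push_back(source[i]);
    }
    ++i;
  }
  return ret;
}
```

With `hashtag_num` set to 1, a plain `\n` in the source is copied through as two characters, while `\#n` becomes a newline, which matches the behavior exercised by the new test cases.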
6 changes: 4 additions & 2 deletions common/string_helpers.h
@@ -19,11 +19,13 @@ namespace Carbon {
// Unescapes Carbon escape sequences in the source string. Returns std::nullopt
// on bad input. `is_block_string` enables escaping unique to block string
// literals, such as \<newline>.
- auto UnescapeStringLiteral(llvm::StringRef source, bool is_block_string = false)
+ auto UnescapeStringLiteral(llvm::StringRef source, int hashtag_num = 0,
+                            bool is_block_string = false)
      -> std::optional<std::string>;

// Parses a block string literal in `source`.
- auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string>;
+ auto ParseBlockStringLiteral(llvm::StringRef source, int hashtag_num = 0)
+     -> ErrorOr<std::string>;

// Returns true if the pointer is in the string ref (including equality with
// `ref.end()`). This should be used instead of `<=` comparisons for
8 changes: 8 additions & 0 deletions common/string_helpers_test.cpp
@@ -28,6 +28,8 @@ TEST(UnescapeStringLiteral, Valid) {
EXPECT_THAT(UnescapeStringLiteral("test\\\\n"), Optional(Eq("test\\n")));
EXPECT_THAT(UnescapeStringLiteral("\\xAA"), Optional(Eq("\xAA")));
EXPECT_THAT(UnescapeStringLiteral("\\x12"), Optional(Eq("\x12")));
EXPECT_THAT(UnescapeStringLiteral("test", 1), Optional(Eq("test")));
EXPECT_THAT(UnescapeStringLiteral("test\\#n", 1), Optional(Eq("test\n")));
}

TEST(UnescapeStringLiteral, Invalid) {
@@ -43,6 +45,7 @@ TEST(UnescapeStringLiteral, Invalid) {
EXPECT_THAT(UnescapeStringLiteral("\\xaa"), Eq(std::nullopt));
// Reserved.
EXPECT_THAT(UnescapeStringLiteral("\\00"), Eq(std::nullopt));
EXPECT_THAT(UnescapeStringLiteral("\\#00", 1), Eq(std::nullopt));
}

TEST(UnescapeStringLiteral, Nul) {
@@ -90,6 +93,11 @@ TEST(ParseBlockStringLiteral, FailInvalidEscaping) {
""")";
EXPECT_THAT(ParseBlockStringLiteral(Input).error().message(),
Eq("Invalid escaping in \\q"));
constexpr char InputRaw[] = R"("""
\#q
""")";
EXPECT_THAT(ParseBlockStringLiteral(InputRaw, 1).error().message(),
Eq("Invalid escaping in \\#q"));
}

TEST(ParseBlockStringLiteral, OkEmptyString) {
3 changes: 3 additions & 0 deletions explorer/syntax/BUILD
@@ -53,6 +53,9 @@ cc_library(
cc_library(
name = "syntax",
srcs = [
"lex_helper.h",
"lex_scan_helper.cpp",
"lex_scan_helper.h",
"lexer.cpp",
"lexer.h",
"parse.cpp",
25 changes: 25 additions & 0 deletions explorer/syntax/lex_helper.h
@@ -0,0 +1,25 @@
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#ifndef CARBON_EXPLORER_SYNTAX_LEX_HELPER_H_
#define CARBON_EXPLORER_SYNTAX_LEX_HELPER_H_

// Flex expands this macro immediately before each action.
//
// Advances the current token position by yyleng columns without changing
// the line number, and takes us out of the after-whitespace / after-operand
// state.
#define YY_USER_ACTION \
context.current_token_position.columns(yyleng); \
if (YY_START == AFTER_WHITESPACE || YY_START == AFTER_OPERAND) { \
BEGIN(INITIAL); \
}

#define CARBON_SIMPLE_TOKEN(name) \
Carbon::Parser::make_##name(context.current_token_position);

#define CARBON_ARG_TOKEN(name, arg) \
Carbon::Parser::make_##name(arg, context.current_token_position);

#endif // CARBON_EXPLORER_SYNTAX_LEX_HELPER_H_
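
The two token macros in lex_helper.h rely on `##` token pasting and on a variable named `context` being in scope wherever they are expanded. A small sketch of that pattern follows; everything in it is a hypothetical stand-in (the `Token` and `Context` types, the `make_AND` and `make_identifier` factories, and the `SKETCH_` macro names), and unlike the macros in the diff the sketch versions carry no trailing semicolon.

```cpp
#include <string>

// Hypothetical stand-ins for the parser's generated make_* token factories.
struct Token {
  std::string kind;
  int column;
};
inline auto make_AND(int column) -> Token { return {"AND", column}; }
inline auto make_identifier(const std::string& text, int column) -> Token {
  return {"identifier:" + text, column};
}

// Hypothetical stand-in for the lexer context that tracks the position.
struct Context {
  int current_token_position = 0;
};

// `make_##name` pastes the token name onto the `make_` prefix, so
// SKETCH_SIMPLE_TOKEN(AND) expands to
// make_AND(context.current_token_position).
#define SKETCH_SIMPLE_TOKEN(name) make_##name(context.current_token_position)
#define SKETCH_ARG_TOKEN(name, arg) \
  make_##name(arg, context.current_token_position)

// The macros only compile where a variable called `context` is visible.
inline auto LexAnd(Context& context) -> Token {
  return SKETCH_SIMPLE_TOKEN(AND);
}
inline auto LexIdentifier(Context& context, const std::string& text) -> Token {
  return SKETCH_ARG_TOKEN(identifier, text);
}
```

Prefixing the real macros with `CARBON_` keeps them from colliding with other macros once the header is included outside the generated lexer.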
68 changes: 68 additions & 0 deletions explorer/syntax/lex_scan_helper.cpp
@@ -0,0 +1,68 @@
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#include "explorer/syntax/lex_scan_helper.h"

#include "common/string_helpers.h"
#include "explorer/syntax/lex_helper.h"
#include "llvm/Support/FormatVariadic.h"

namespace Carbon {

auto StringLexHelper::Advance() -> bool {
CARBON_CHECK(is_eof_ == false);
const char c = YyinputWrapper(yyscanner_);
if (c <= 0) {
context_.RecordSyntaxError("Unexpected end of file");
is_eof_ = true;
return false;
}
str_.push_back(c);
return true;
}

auto ReadHashTags(Carbon::StringLexHelper& scan_helper,
const size_t hashtag_num) -> bool {
for (size_t i = 0; i < hashtag_num; ++i) {
if (!scan_helper.Advance() || scan_helper.last_char() != '#') {
Review comment (Contributor):
This is backwards from what I expect:

Suggested change
- if (!scan_helper.Advance() || scan_helper.last_char() != '#') {
+ if (scan_helper.last_char() != '#' || !scan_helper.Advance()) {

That way only # characters would be consumed.

Reply (Contributor Author):
Advance() needs to be called before calling ReadHashTags() if hashtags are checked first. It is still possible to consume a non # char after switching the order.

return false;
}
}
return true;
}

auto ProcessSingleLineString(llvm::StringRef str,
Carbon::ParseAndLexContext& context,
const size_t hashtag_num)
-> Carbon::Parser::symbol_type {
Review comment (Contributor):
Save a copy of str for error messages before you consume the front and back. Also for ProcessMultiLineString.

Reply (Contributor Author):
Instead of copying str (the parameter type of str is changed to llvm::StringRef to avoid such copies), the string used for error message will be reconstructed by prepending and appending the quotes. The hashtags are not added, to be consistent with ProcessMultiLineString. The error messages in ProcessMultiLineString are handled in ParseBlockStringLiteral, where the hashtags are already removed when calling.

Reply (Contributor):
Copying a llvm::StringRef should be cheap, and not involve copying the string.

Reply (Contributor Author):
Updated with copy of llvm::StringRef.

std::string hashtags(hashtag_num, '#');
auto str_with_quote = str;
CARBON_CHECK(str.consume_front(hashtags + "\"") &&
str.consume_back("\"" + hashtags));

std::optional<std::string> unescaped =
Carbon::UnescapeStringLiteral(str, hashtag_num);
if (unescaped == std::nullopt) {
return context.RecordSyntaxError(
llvm::formatv("Invalid escaping in string: {0}", str_with_quote));
}
return CARBON_ARG_TOKEN(string_literal, *unescaped);
}

auto ProcessMultiLineString(llvm::StringRef str,
Carbon::ParseAndLexContext& context,
const size_t hashtag_num)
-> Carbon::Parser::symbol_type {
std::string hashtags(hashtag_num, '#');
CARBON_CHECK(str.consume_front(hashtags) && str.consume_back(hashtags));
Carbon::ErrorOr<std::string> block_string =
Carbon::ParseBlockStringLiteral(str, hashtag_num);
if (!block_string.ok()) {
return context.RecordSyntaxError(llvm::formatv(
"Invalid block string: {0}", block_string.error().message()));
}
return CARBON_ARG_TOKEN(string_literal, *block_string);
}

} // namespace Carbon
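
The review exchange about ReadHashTags turns on which characters get consumed when the check fails. The sketch below reproduces the loop against an in-memory buffer so that this can be observed directly. It rests on assumptions: `SketchLexHelper` and `ReadHashTagsSketch` are hypothetical names, the helper reads from a `std::string_view` instead of calling into the Flex scanner, and no syntax error is recorded on end of input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical in-memory stand-in for StringLexHelper.
class SketchLexHelper {
 public:
  explicit SketchLexHelper(std::string_view input) : input_(input) {}

  // Consumes one character. Sets the EOF flag and returns false when the
  // input is exhausted.
  auto Advance() -> bool {
    if (pos_ == input_.size()) {
      is_eof_ = true;
      return false;
    }
    str_.push_back(input_[pos_++]);
    return true;
  }
  // Only valid after at least one successful Advance().
  auto last_char() const -> char { return str_.back(); }
  auto str() const -> const std::string& { return str_; }
  auto is_eof() const -> bool { return is_eof_; }

 private:
  std::string_view input_;
  std::size_t pos_ = 0;
  std::string str_;
  bool is_eof_ = false;
};

// Same loop shape as ReadHashTags in the diff: advance first, then check.
auto ReadHashTagsSketch(SketchLexHelper& helper, std::size_t hashtag_num)
    -> bool {
  for (std::size_t i = 0; i < hashtag_num; ++i) {
    if (!helper.Advance() || helper.last_char() != '#') {
      return false;
    }
  }
  return true;
}
```

Asking for two hashtags from the input `#x#` fails after consuming `#x`: one hashtag plus the mismatching character, which is the case the header comment on ReadHashTags describes.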
58 changes: 58 additions & 0 deletions explorer/syntax/lex_scan_helper.h
@@ -0,0 +1,58 @@
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#ifndef CARBON_EXPLORER_SYNTAX_LEX_SCAN_HELPER_H_
#define CARBON_EXPLORER_SYNTAX_LEX_SCAN_HELPER_H_

#include <string>

#include "explorer/syntax/parse_and_lex_context.h"
#include "explorer/syntax/parser.h"

// Exposes yyinput; defined in lexer.lpp.
extern auto YyinputWrapper(yyscan_t yyscanner) -> int;

namespace Carbon {

class StringLexHelper {
public:
StringLexHelper(const char* text, yyscan_t yyscanner,
Carbon::ParseAndLexContext& context)
: str_(text), yyscanner_(yyscanner), context_(context), is_eof_(false) {}
// Advances yyscanner by one char. Sets is_eof to true and returns false on
// EOF.
auto Advance() -> bool;
// Returns the last scanned char.
auto last_char() -> char { return str_.back(); };
// Returns the scanned string.
auto str() -> const std::string& { return str_; };

auto is_eof() -> bool { return is_eof_; };

private:
std::string str_;
yyscan_t yyscanner_;
Carbon::ParseAndLexContext& context_;
// True once Advance() has reached end of file.
bool is_eof_;
};

// Tries to read `hashtag_num` hashtags. Returns true on success.
// Reads `hashtag_num` characters on success, and number of consecutive hashtags
// (< `hashtag_num`) + 1 characters on failure.
auto ReadHashTags(Carbon::StringLexHelper& scan_helper, size_t hashtag_num)
-> bool;

// Removes quotes and escapes a single line string. Reports an error on
// invalid escaping.
auto ProcessSingleLineString(llvm::StringRef str,
Carbon::ParseAndLexContext& context,
size_t hashtag_num) -> Carbon::Parser::symbol_type;
auto ProcessMultiLineString(llvm::StringRef str,
Carbon::ParseAndLexContext& context,
size_t hashtag_num) -> Carbon::Parser::symbol_type;

} // namespace Carbon

#endif // CARBON_EXPLORER_SYNTAX_LEX_SCAN_HELPER_H_
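
ProcessSingleLineString strips the delimiters before unescaping: `hashtag_num` hashtags plus a quote from the front, and a quote plus the same number of hashtags from the back. A minimal sketch of that step follows, assuming `std::string_view` in place of `llvm::StringRef`. `StripSingleLineDelimiters` is a hypothetical helper, and it returns `std::nullopt` on a mismatch, whereas the code in the diff treats a mismatch as a failed CARBON_CHECK.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: removes `#...#"` from the front and `"#...#` from the
// back of a single-line string literal, where each run has `hashtag_num` '#'s.
auto StripSingleLineDelimiters(std::string_view str, std::size_t hashtag_num)
    -> std::optional<std::string_view> {
  const std::string hashtags(hashtag_num, '#');
  const std::string open = hashtags + "\"";
  const std::string close = "\"" + hashtags;
  // The literal must be long enough to hold both delimiters without overlap.
  if (str.size() < open.size() + close.size() ||
      str.substr(0, open.size()) != std::string_view(open) ||
      str.substr(str.size() - close.size()) != std::string_view(close)) {
    return std::nullopt;
  }
  // Narrowing a view is cheap: no characters are copied.
  str.remove_prefix(open.size());
  str.remove_suffix(close.size());
  return str;
}
```

Because the view is passed by value, the caller's original view still covers the whole literal, which is why keeping a copy for error messages costs nothing, as the review thread points out for `llvm::StringRef`.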