Initial implementation of raw string literals #1304

Merged on Jun 22, 2022 (50 commits). Changes from 20 commits.

Commits
a4ff50d
test cases for raw string literals
SlaterLatiao May 31, 2022
a31dda2
raw string literal implementation
SlaterLatiao May 31, 2022
d3be382
match as block string if starting with triple ", and better error mes…
SlaterLatiao Jun 1, 2022
ece0b7f
fix broken test case
SlaterLatiao Jun 1, 2022
4d56ea6
test cases for raw string literals
SlaterLatiao May 31, 2022
7a32eec
raw string literal implementation
SlaterLatiao May 31, 2022
5745c55
match as block string if starting with triple ", and better error mes…
SlaterLatiao Jun 1, 2022
d5f69af
fix broken test case
SlaterLatiao Jun 1, 2022
75bcbcf
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 1, 2022
ac735be
removed unused initial value
SlaterLatiao Jun 3, 2022
ebdf8f7
rename flag to indicate multi-line string and remove comment
SlaterLatiao Jun 3, 2022
743aed6
use * to get value from std::optional
SlaterLatiao Jun 3, 2022
d509aa5
clean-ups
SlaterLatiao Jun 3, 2022
d51377d
removed skip_scan flag and directly return in case of a single line s…
SlaterLatiao Jun 3, 2022
855fe32
Updated error message: simple string -> single-line string.
SlaterLatiao Jun 3, 2022
17bc3cf
Updated test cases according to changes in error message
SlaterLatiao Jun 3, 2022
9ac7418
Removed counting_hashtag flag.
SlaterLatiao Jun 3, 2022
43ab9a6
Implemented ScanHelper class to handle scanning
SlaterLatiao Jun 4, 2022
750b034
Fixed explanation of ReadHashTags.
SlaterLatiao Jun 4, 2022
266359a
Addressed PR comment.
SlaterLatiao Jun 4, 2022
8437c5c
Clarify that scan_helper holds the source text.
SlaterLatiao Jun 6, 2022
51b2af9
Addressed PR comments.
SlaterLatiao Jun 7, 2022
b5791ad
Updated error messages in test cases.
SlaterLatiao Jun 7, 2022
1914495
Added const keyword to return type of GetCurrentStr().
SlaterLatiao Jun 7, 2022
4acf2ae
addressed PR comments.
SlaterLatiao Jun 7, 2022
c8dcc8b
Addressed PR comments.
SlaterLatiao Jun 7, 2022
46ed305
Removed the multi_line flag and skip_read field to improve readability.
SlaterLatiao Jun 8, 2022
cb0039b
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 8, 2022
9cf8448
Copied default parameter value to definition of UnescapeStringLiteral.
SlaterLatiao Jun 13, 2022
54c46c1
Copied default parameter value to definition of ParseBlockStringLiteral.
SlaterLatiao Jun 13, 2022
78da5b9
Prefix CARBON_ to SIMPLE_TOKEN and ARG_TOKEN macros.
SlaterLatiao Jun 13, 2022
3c2e90d
Merge branch 'raw_string' of github.com:SlaterLatiao/carbon-lang into…
SlaterLatiao Jun 13, 2022
70709cd
Rollback redefinition of arguments.
SlaterLatiao Jun 13, 2022
6f45efc
Updated comment on the flex macro.
SlaterLatiao Jun 14, 2022
00401d8
Updated wording.
SlaterLatiao Jun 14, 2022
65facf5
Moved the EOF error out of the loop.
SlaterLatiao Jun 14, 2022
0c91724
Removed duplicated declaration.
SlaterLatiao Jun 14, 2022
f8e8054
Changed type of `hashtag_num` and `leading_quotes` to int.
SlaterLatiao Jun 14, 2022
7cc8cbb
Minor fix: string copy.
SlaterLatiao Jun 17, 2022
a45fd15
Added comment on YyinputWrapper.
SlaterLatiao Jun 17, 2022
24d3149
Garmmar in comment.
SlaterLatiao Jun 17, 2022
ec4477b
Added check of eof before readling next char.
SlaterLatiao Jun 17, 2022
1346f92
Minor updates based on PR comments.
SlaterLatiao Jun 21, 2022
bb63820
Minor changes to address PR comments.
SlaterLatiao Jun 21, 2022
3a8d488
Used a clearer way to calculate `hashtag_num` and `leading_quotes`. S…
SlaterLatiao Jun 21, 2022
aa6e246
Directly copy StringRef for compilation error message.
SlaterLatiao Jun 22, 2022
6a77fea
Make str_with_quote const as we don't change it.
SlaterLatiao Jun 22, 2022
eae97d5
Added TODO for unsupported cases.
SlaterLatiao Jun 22, 2022
2aba1f6
Merged upstream trunk into raw_string.
SlaterLatiao Jun 22, 2022
4e238e8
Fixed a typo.
SlaterLatiao Jun 22, 2022
137 changes: 69 additions & 68 deletions common/string_helpers.cpp
@@ -27,87 +27,87 @@ static auto FromHex(char c) -> std::optional<char> {
   return std::nullopt;
 }

-auto UnescapeStringLiteral(llvm::StringRef source, bool is_block_string)
+auto UnescapeStringLiteral(llvm::StringRef source,
+                           const std::size_t hashtag_num, bool is_block_string)
     -> std::optional<std::string> {
   std::string ret;
+  std::string escape = "\\" + std::string(hashtag_num, '#');
   ret.reserve(source.size());
   size_t i = 0;
   while (i < source.size()) {
     char c = source[i];
-    switch (c) {
-      case '\\':
-        ++i;
-        if (i == source.size()) {
-          return std::nullopt;
-        }
-        switch (source[i]) {
-          case 'n':
-            ret.push_back('\n');
-            break;
-          case 'r':
-            ret.push_back('\r');
-            break;
-          case 't':
-            ret.push_back('\t');
-            break;
-          case '0':
-            if (i + 1 < source.size() && llvm::isDigit(source[i + 1])) {
-              // \0[0-9] is reserved.
-              return std::nullopt;
-            }
-            ret.push_back('\0');
-            break;
-          case '"':
-            ret.push_back('"');
-            break;
-          case '\'':
-            ret.push_back('\'');
-            break;
-          case '\\':
-            ret.push_back('\\');
-            break;
-          case 'x': {
-            i += 2;
-            if (i >= source.size()) {
-              return std::nullopt;
-            }
-            std::optional<char> c1 = FromHex(source[i - 1]);
-            std::optional<char> c2 = FromHex(source[i]);
-            if (c1 == std::nullopt || c2 == std::nullopt) {
-              return std::nullopt;
-            }
-            ret.push_back(16 * *c1 + *c2);
-            break;
+    if (i + hashtag_num < source.size() &&
+        source.slice(i, i + hashtag_num + 1).equals(escape)) {
+      i += hashtag_num + 1;
+      if (i == source.size()) {
+        return std::nullopt;
+      }
+      switch (source[i]) {
+        case 'n':
+          ret.push_back('\n');
+          break;
+        case 'r':
+          ret.push_back('\r');
+          break;
+        case 't':
+          ret.push_back('\t');
+          break;
+        case '0':
+          if (i + 1 < source.size() && llvm::isDigit(source[i + 1])) {
+            // \0[0-9] is reserved.
+            return std::nullopt;
+          }
+          ret.push_back('\0');
+          break;
+        case '"':
+          ret.push_back('"');
+          break;
+        case '\'':
+          ret.push_back('\'');
+          break;
+        case '\\':
+          ret.push_back('\\');
+          break;
+        case 'x': {
+          i += 2;
+          if (i >= source.size()) {
+            return std::nullopt;
           }
-          case 'u':
-            CARBON_FATAL() << "\\u is not yet supported in string literals";
-          case '\n':
-            if (!is_block_string) {
-              return std::nullopt;
-            }
-            break;
-          default:
-            // Unsupported.
+          std::optional<char> c1 = FromHex(source[i - 1]);
+          std::optional<char> c2 = FromHex(source[i]);
+          if (c1 == std::nullopt || c2 == std::nullopt) {
             return std::nullopt;
+          }
+          ret.push_back(16 * *c1 + *c2);
+          break;
         }
-        break;
-
-      case '\t':
-        // Disallow non-` ` horizontal whitespace:
-        // https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/whitespace.md
-        // TODO: This doesn't handle unicode whitespace.
-        return std::nullopt;
-
-      default:
-        ret.push_back(c);
-        break;
+        case 'u':
+          CARBON_FATAL() << "\\u is not yet supported in string literals";
+        case '\n':
+          if (!is_block_string) {
+            return std::nullopt;
+          }
+          break;
+        default:
+          // Unsupported.
+          return std::nullopt;
+      }
+    } else if (c == '\t') {
+      // Disallow non-` ` horizontal whitespace:
+      // https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/whitespace.md
+      // TODO: This doesn't handle unicode whitespace.
+      return std::nullopt;
+    } else {
+      ret.push_back(c);
     }
     ++i;
   }
   return ret;
 }

-auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string> {
+auto ParseBlockStringLiteral(llvm::StringRef source,
+                             const std::size_t hashtag_num)
+    -> ErrorOr<std::string> {
   llvm::SmallVector<llvm::StringRef> lines;
   source.split(lines, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/true);
   if (lines.size() < 2) {
@@ -150,8 +150,9 @@ auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string> {
     }
     // Unescaping with \n appended to handle things like \\<newline>.
     llvm::SmallVector<char> buffer;
-    std::optional<std::string> unescaped = UnescapeStringLiteral(
-        (line + "\n").toStringRef(buffer), /*is_block_string=*/true);
+    std::optional<std::string> unescaped =
+        UnescapeStringLiteral((line + "\n").toStringRef(buffer), hashtag_num,
+                              /*is_block_string=*/true);
     if (!unescaped.has_value()) {
       return Error("Invalid escaping in " + line);
     }
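Note on the change above: the escape introducer in `UnescapeStringLiteral` is now a backslash followed by `hashtag_num` `#` characters, so with one hashtag `\#n` is a newline escape while a plain `\n` is kept as literal text. The following is a minimal standalone sketch of that matching rule, not the PR's implementation. `SketchUnescape` is a hypothetical name, it uses `std::string_view` in place of `llvm::StringRef`, and it assumes only a handful of escapes (`n`, `t`, `\`, `"`) with no tab or block-string handling.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for UnescapeStringLiteral. It only knows
// the escapes n, t, backslash and double quote, and skips the tab check and
// block-string handling that the real function performs.
auto SketchUnescape(std::string_view source, std::size_t hashtag_num = 0)
    -> std::optional<std::string> {
  // The introducer is a backslash followed by `hashtag_num` '#' characters.
  const std::string escape = "\\" + std::string(hashtag_num, '#');
  std::string ret;
  ret.reserve(source.size());
  std::size_t i = 0;
  while (i < source.size()) {
    // substr clamps at the end of the input, so a truncated introducer
    // simply fails to compare equal and is copied through as plain text.
    if (source.substr(i, escape.size()) == escape) {
      i += escape.size();
      if (i == source.size()) {
        return std::nullopt;  // Introducer with no escape character after it.
      }
      switch (source[i]) {
        case 'n':
          ret.push_back('\n');
          break;
        case 't':
          ret.push_back('\t');
          break;
        case '\\':
          ret.push_back('\\');
          break;
        case '"':
          ret.push_back('"');
          break;
        default:
          return std::nullopt;  // Not supported in this sketch.
      }
    } else {
      ret.push_back(source[i]);
    }
    ++i;
  }
  return ret;
}
```

Under these assumptions, `SketchUnescape("test\\#n", 1)` yields `test` followed by a newline, `SketchUnescape("a\\nb", 1)` returns the four characters unchanged, and `SketchUnescape("\\#q", 1)` returns `std::nullopt`, which mirrors the cases added to the tests in this PR.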
8 changes: 6 additions & 2 deletions common/string_helpers.h
@@ -19,11 +19,15 @@ namespace Carbon {
 // Unescapes Carbon escape sequences in the source string. Returns std::nullopt
 // on bad input. `is_block_string` enables escaping unique to block string
 // literals, such as \<newline>.
-auto UnescapeStringLiteral(llvm::StringRef source, bool is_block_string = false)
+auto UnescapeStringLiteral(llvm::StringRef source,
+                           const std::size_t hashtag_num = 0,
+                           bool is_block_string = false)
     -> std::optional<std::string>;

 // Parses a block string literal in `source`.
-auto ParseBlockStringLiteral(llvm::StringRef source) -> ErrorOr<std::string>;
+auto ParseBlockStringLiteral(llvm::StringRef source,
+                             const std::size_t hashtag_num = 0)
+    -> ErrorOr<std::string>;

 // Returns true if the pointer is in the string ref (including equality with
 // `ref.end()`). This should be used instead of `<=` comparisons for
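Note on the new signature: `hashtag_num` is declared before `is_block_string`, and both are defaulted. In C++ a `bool` converts implicitly to `std::size_t`, so a call written against the old two-argument form, passing `true` as the second positional argument, would still compile but bind to `hashtag_num`. The sketch below illustrates that general language behaviour only. `DescribeCall` is a hypothetical helper that reports how its arguments were bound and is not part of the PR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper with the same parameter order as the updated
// declaration; it only reports how its arguments were bound.
auto DescribeCall(const std::string& /*source*/, std::size_t hashtag_num = 0,
                  bool is_block_string = false) -> std::string {
  return "hashtag_num=" + std::to_string(hashtag_num) +
         " is_block_string=" + (is_block_string ? "true" : "false");
}
```

With this helper, `DescribeCall("x", true)` reports `hashtag_num=1 is_block_string=false`, whereas `DescribeCall("x", 0, true)` reports `hashtag_num=0 is_block_string=true`. The updated call inside `ParseBlockStringLiteral` passes `hashtag_num` explicitly and keeps the `/*is_block_string=*/` argument comment, which avoids this ambiguity.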
8 changes: 8 additions & 0 deletions common/string_helpers_test.cpp
@@ -28,6 +28,8 @@ TEST(UnescapeStringLiteral, Valid) {
   EXPECT_THAT(UnescapeStringLiteral("test\\\\n"), Optional(Eq("test\\n")));
   EXPECT_THAT(UnescapeStringLiteral("\\xAA"), Optional(Eq("\xAA")));
   EXPECT_THAT(UnescapeStringLiteral("\\x12"), Optional(Eq("\x12")));
+  EXPECT_THAT(UnescapeStringLiteral("test", 1), Optional(Eq("test")));
+  EXPECT_THAT(UnescapeStringLiteral("test\\#n", 1), Optional(Eq("test\n")));
 }

 TEST(UnescapeStringLiteral, Invalid) {
@@ -43,6 +45,7 @@ TEST(UnescapeStringLiteral, Invalid) {
   EXPECT_THAT(UnescapeStringLiteral("\\xaa"), Eq(std::nullopt));
   // Reserved.
   EXPECT_THAT(UnescapeStringLiteral("\\00"), Eq(std::nullopt));
+  EXPECT_THAT(UnescapeStringLiteral("\\#00", 1), Eq(std::nullopt));
 }

 TEST(UnescapeStringLiteral, Nul) {
@@ -90,6 +93,11 @@ TEST(ParseBlockStringLiteral, FailInvalidEscaping) {
   """)";
   EXPECT_THAT(ParseBlockStringLiteral(Input).error().message(),
               Eq("Invalid escaping in \\q"));
+  constexpr char InputRaw[] = R"("""
+  \#q
+  """)";
+  EXPECT_THAT(ParseBlockStringLiteral(InputRaw, 1).error().message(),
+              Eq("Invalid escaping in \\#q"));
 }

 TEST(ParseBlockStringLiteral, OkEmptyString) {