Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update regex literal lexing and emission #40595

Merged
merged 2 commits into from
Dec 18, 2021
Merged

Conversation

hamishknight
Copy link
Collaborator

Based on top of #40531, only the last commit is relevant.


Update the regex lexing implementation to defer to the regex library, which will pass back the pointer from which to resume lexing, and update the emission to call the new Regex(_regexString:version:) overload, that will accept the regex string with delimiters.

Because this uses the library's lexing implementation, the delimiters are now '/.../' and '|...|' instead of plain '...'.

@hamishknight
Copy link
Collaborator Author

@rintaro Mind taking a look at the lexer/parser changes?

Copy link
Member

@rxwei rxwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

lib/Parse/Lexer.cpp Show resolved Hide resolved
lib/Parse/Lexer.cpp Outdated Show resolved Hide resolved
include/swift/Parse/ExperimentalRegexBridging.h Outdated Show resolved Hide resolved
rintaro
rintaro previously approved these changes Dec 16, 2021
Comment on lines 1982 to 1992
if (ErrStr) {
diagnose(TokStart, diag::regex_literal_parsing_error, ErrStr);
formToken(tok::unknown, TokStart);
Copy link
Member

@rintaro rintaro Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be some cases where we emit an error, but still want to form a regex literal token.

String literal has a FIXME where even if the end delimiter is missing, we should form a string literal and type check it, so that we can get additional semantic diagnostics, and more importantly, we can get code completion even inside malformed string interpolations. For example, consider "Name: \(user.#HERE# ", in this case, end delimiter is missing because it's consumed as the string interpolation, because ) is missing. But we still want code completion for user.

How about changing regexLiteralLexingFn like:

const char *Ptr = TokStart;
const char *ErrStr = nullptr;

// Success: status = SUCCESS, advances `Ptr`, ErrStr == nullptr..
// Error, but parsable: status = SUCCESS, advances `Ptr` and populate `ErrStr`. 
// Error, non parsable: status = FAILED, advances `Ptr` and populate `ErrStr`. 
// Total failure: status = FAILED, Ptr == TokStart, ErrStr == nullptr.

bool status = regexLiteralLexingFn(&Ptr, BufferEnd, &ErrStr)
if (ErrStr) {
  diagnose(TokStart, diag::regex_literal_parsing_error, ErrStr);
}
if (Ptr == TokStart) {
  return false;
}
assert(Ptr > TokStart);
formToken(status ? tok::unknown : tok::regex_literal, TokStart);
return true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rintaro How does this look?

@rintaro rintaro dismissed their stale review December 16, 2021 21:12

Approved by accident

@rxwei
Copy link
Member

rxwei commented Dec 17, 2021

I updated my PR to fix some build script issues so you might need to rebase this. Hopefully both PRs can be merged today.

…d regex.

- Checkout apple/swift-experimental-string-processing using a tag.
- Build `_MatchingEngine` as part of libswift (`ExperimentalRegex`) using sources from the package.
- Parse regex literals using the parser from `_MatchingEngine`.
- Build both `_MatchingEngine` and `_StringProcessing` as part of core libs using sources from the package.
- Use `Regex<DynamicCaptures>` as the default regex type until we finalize apple/swift-experimental-string-processing#68.
@apple apple deleted a comment from swift-ci Dec 17, 2021
@milseman
Copy link
Contributor

@swift-ci please test macOS platform

Update the lexing implementation to defer to the
regex library, which will pass back the pointer
from to resume lexing, and update the emission to
call the new `Regex(_regexString:version:)`
overload, that will accept the regex string with
delimiters.

Because this uses the library's lexing
implementation, the delimiters are now `'/.../'`
and `'|...|'` instead of plain `'...'`.
@hamishknight
Copy link
Collaborator Author

@swift-ci please test

@swift-ci
Copy link
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 128f5d4

@milseman
Copy link
Contributor

@swift-ci please test macOS platform

Copy link
Member

@rintaro rintaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lexer/Parser changes LGTM. Thank you!

@milseman milseman merged commit a67a043 into apple:main Dec 18, 2021
@hamishknight hamishknight deleted the straw-bales branch December 18, 2021 10:01
aschwaighofer added a commit to aschwaighofer/swift that referenced this pull request Dec 18, 2021
This reverts commit a67a043, reversing
changes made to 9965df7.

This commit or the earlier commit this commit is based on (apple#40531) broke the
incremental bot.
@aschwaighofer
Copy link
Member

Looks like this or the commit this is based on broke the incremental bot.

aschwaighofer added a commit that referenced this pull request Dec 19, 2021
…ing_and_emission

Revert "Merge pull request #40595 from hamishknight/straw-bales"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants