-
Notifications
You must be signed in to change notification settings - Fork 372
Add support for captured groups in Find & Replace #222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
(We're a bit behind because of all the new feedback! Thanks for doing this; we will get to the review as soon as we can!) |
src/buffer/mod.rs
Outdated
if !next_ch.is_ascii_digit() { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One feedback I can give early is that this is not quite correct, I believe. Try something like "$12345" in VS Code for instance. It'll only recognize "$1" as replacement but then use "2345" as a literal string.
I believe this requires us to recognize here how many capture groups we have and then replace those. This may be outside of the scope of what you have time for to implement (probably requires adding new ICU FFIs!), so please feel free to say so and we'll merge it as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/buffer/mod.rs
Outdated
@@ -2378,6 +2380,61 @@ impl TextBuffer { | |||
pub fn read_forward(&self, off: usize) -> &[u8] { | |||
self.buffer.read_forward(off) | |||
} | |||
|
|||
/// Processes the replacement string when using regex for capture groups. | |||
fn get_regex_replacement<'a>(&mut self, replacement: &'a str) -> Cow<'a, str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can feel free to return an ArenString
or better yet, Vec<u8, &Arena>
here. You can find various places in this code that demonstrates how to do that. Allocating in the arena is very cheap and copying memory is as well. 🙂
This may help you below because then you won't need String::from_utf8_lossy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the function to return ArenaString
, but I couldn't find a way to remove String::from_utf8_lossy
without maybe changing too much other code, sorry. 😅
856c3bd
to
1f1da92
Compare
let mut status = icu_ffi::U_ZERO_ERROR; | ||
let start = unsafe { (f.uregex_start64)(self.0, group, &mut status) }; | ||
let end = unsafe { (f.uregex_end64)(self.0, group, &mut status) }; | ||
if status.is_failure() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhecker can uregex_start64
fail too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the nice property of ICU is that its functions short-circuit if you pass in an error status. So, if uregex_start64
fails, status
will contain a failure, uregex_end64
will short-circuit and then we check it. Makes some of this code really neat.
src/buffer/mod.rs
Outdated
// Unescape any escaped characters. | ||
while off < replacement.len() && replacement[off] == b'\\' { | ||
off += 2; | ||
text.push(match replacement.get(off - 1).map_or(b'\\', |&c| c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhecker we didn't check whether replacement
contained two characters, only that it contained at least one. what happens if it only contains one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can leave a comment there to make that logic a bit easier to understand. The tl;dr is that this should then result in map_or
returning b'\\'
, because off + 2 - 1
will be out of bounds.
Closes microsoft#111 Co-authored-by: Leonard Hecker <leonard@hecker.io>
Closes #111
I chose to use
$
for captured groups following VS Code's convention.I'm still fairly new to Rust, so feedback is appreciated.