Skip to content

Add support for captured groups in Find & Replace #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 17, 2025

Conversation

viyic
Copy link
Contributor

@viyic viyic commented May 22, 2025

Closes #111

I chose to use $ for captured groups following VS Code's convention.

image
image

I'm still fairly new to Rust, so feedback is appreciated.

@viyic

This comment was marked as resolved.

@DHowett
Copy link
Member

DHowett commented May 23, 2025

(We're a bit behind because of all the new feedback! Thanks for doing this; we will get to the review as soon as we can!)

Comment on lines 2412 to 2414
if !next_ch.is_ascii_digit() {
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One feedback I can give early is that this is not quite correct, I believe. Try something like "$12345" in VS Code for instance. It'll only recognize "$1" as replacement but then use "2345" as a literal string.

I believe this requires us to recognize here how many capture groups we have and then replace those. This may be outside of the scope of what you have time for to implement (probably requires adding new ICU FFIs!), so please feel free to say so and we'll merge it as is for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I didn't think that through 😅 I've added the ICU function to get group count and an additional check, and now it seems to work for that case.
Cuplikan layar 2025-05-23 111008
Cuplikan layar 2025-05-23 110952

@@ -2378,6 +2380,61 @@ impl TextBuffer {
pub fn read_forward(&self, off: usize) -> &[u8] {
self.buffer.read_forward(off)
}

/// Processes the replacement string when using regex for capture groups.
fn get_regex_replacement<'a>(&mut self, replacement: &'a str) -> Cow<'a, str> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can feel free to return an ArenString or better yet, Vec<u8, &Arena> here. You can find various places in this code that demonstrates how to do that. Allocating in the arena is very cheap and copying memory is as well. 🙂

This may help you below because then you won't need String::from_utf8_lossy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the function to return ArenaString, but I couldn't find a way to remove String::from_utf8_lossy without maybe changing too much other code, sorry. 😅

@viyic viyic force-pushed the captured_group_replacement branch from 856c3bd to 1f1da92 Compare May 23, 2025 05:06
@lhecker lhecker enabled auto-merge (squash) June 17, 2025 21:17
@lhecker lhecker requested a review from DHowett June 17, 2025 21:17
let mut status = icu_ffi::U_ZERO_ERROR;
let start = unsafe { (f.uregex_start64)(self.0, group, &mut status) };
let end = unsafe { (f.uregex_end64)(self.0, group, &mut status) };
if status.is_failure() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhecker can uregex_start64 fail too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the nice property of ICU is that its functions short-circuit if you pass in an error status. So, if uregex_start64 fails, status will contain a failure, uregex_end64 will short-circuit and then we check it. Makes some of this code really neat.

// Unescape any escaped characters.
while off < replacement.len() && replacement[off] == b'\\' {
off += 2;
text.push(match replacement.get(off - 1).map_or(b'\\', |&c| c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhecker we didn't check whether replacement contained two characters, only that it contained at least one. what happens if it only contains one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can leave a comment there to make that logic a bit easier to understand. The tl;dr is that this should then result in map_or returning b'\\', because off + 2 - 1 will be out of bounds.

@lhecker lhecker merged commit 70f5b73 into microsoft:main Jun 17, 2025
3 checks passed
Erithax pushed a commit to Erithax/edit that referenced this pull request Jun 18, 2025
Closes microsoft#111

Co-authored-by: Leonard Hecker <leonard@hecker.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support replacement string for regex search
3 participants