Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow regex search #9228

Closed
wants to merge 11 commits into from
Closed

Allow regex search #9228

wants to merge 11 commits into from

Conversation

PankajBhojwani
Copy link
Contributor

@PankajBhojwani PankajBhojwani commented Feb 20, 2021

Summary of the Pull Request

Initial stage of implementing regex search

Todos left:

  • Allow find previous
  • Wraparound when search fails
  • Change the icon in the search box

References

#3920

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Documentation updated. If checked, please file a pull request on our docs repo and link it here: #xxx
  • Schema updated.
  • I work here

Validation Steps Performed

Comment on lines 394 to 400
// Regex iterators are not bidirectional, so we cannot create a reverse iterator out of one
// So, if we want to find the previous match, we need to manually progress the iterator
while (matches_begin != matches_end)
{
desired = matches_begin;
++matches_begin;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There really should be a better way to do this


// Create the regex object and iterator
std::wregex regexObj{ _inputString };
auto matches_begin = std::wsregex_iterator(concatAll.begin(), concatAll.end(), regexObj);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user accidentally types a pathological regex and the search takes too long, is there any way to abort it? It doesn't look like std::regex_iterator supports any kind of cancellation token.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if cancellation could be implemented by throwing an exception from operators of a custom iterator type. Would there be a risk that such an exception can skip some cleanup in the regexp algorithm, or that the algorithm can copy part of the input to an internal buffer and then spend excessive time examining that buffer, without using the original iterators?

src/buffer/out/search.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalControl/SearchBoxControl.cpp Outdated Show resolved Hide resolved
@PankajBhojwani
Copy link
Contributor Author

#8588 makes our searching much more efficient, so we will wait till that merges before merging this one.

@github-actions
Copy link

github-actions bot commented Oct 26, 2021

@check-spelling-bot Report

Unrecognized words, please review:

  • AAAAA
  • AAAAAAAAAAAAA
  • BBBBB
  • BBBBBBBB
  • Gravell
  • It'd
  • ok'd
  • pws
  • qos
  • RIS
  • sxs
  • sys
  • We'd
  • ZZZZZ
Previously acknowledged words that are now absent aef apos aspnet bc boostorg BSODs Cac ci COINIT conpixels cplinfo crt cw cx dahall DEFAPP DEFCON df dh dw eb EK ev exeuwp fde fea fmtlib gb gh Gravell's hc hh hk hotkeys HPCON hu isocpp Kd KF KJ KU KX lk llvm mintty msvcrtd Nc NVDA Nx oa OI Oj Oo oq Ou Ov pb pinam pv pw px py qi QJ qo QOL remoting ri Rl rmdir ru smalllogo splashscreen storelogo sx sy sz td tf tl tt uk ul Unk unte vcrt vf vk wx xa xamarin xy Yw yx YZ Zc zd zh ZM zu zy
Some files were were automatically ignored

These sample patterns would exclude them:

^src/host/ft_uia/run\.bat$
^src/host/runft\.bat$
^src/host/runut\.bat$
^src/terminal/adapter/ut_adapter/run\.bat$
^src/terminal/parser/delfuzzpayload\.bat$
^src/terminal/parser/ft_fuzzer/run\.bat$
^src/terminal/parser/ft_fuzzwrapper/run\.bat$
^src/terminal/parser/ut_parser/run\.bat$
^src/tools/integrity/packageuwp/ConsoleUWP\.appxSources$
^src/tools/lnkd/lnkd\.bat$
^src/tools/pixels/pixels\.bat$
^src/tools/texttests/fira\.txt$

You should consider excluding directory paths (e.g. (?:^|/)vendor/), filenames (e.g. (?:^|/)yarn\.lock$), or file extensions (e.g. \.gz$)

You should consider adding them to:

.github/actions/spelling/excludes.txt

File matching is via Perl regular expressions.

To check these files, more of their words need to be in the dictionary than not. You can use patterns.txt to exclude portions, add items to the dictionary (e.g. by adding them to allow.txt), or fix typos.

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:microsoft/terminal.git repository
on the dev/pabhoj/regex_search branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect/alphabet.txt
.github/actions/spelling/expect/expect.txt
.github/actions/spelling/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect/9662bc69102e2e152a062a9060cb63d2742582d3.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
(cat '.github/actions/spelling/excludes.txt' - <<EOF
$should_exclude_patterns
EOF
) |grep .|
sort -f |
uniq > '.github/actions/spelling/excludes.txt.temp' &&
mv '.github/actions/spelling/excludes.txt.temp' '.github/actions/spelling/excludes.txt'
}

comment_json=$(mktemp)
curl -L -s -S \
  --header "Content-Type: application/json" \
  "https://api.github.com/repos/microsoft/terminal/issues/comments/952184317" > "$comment_json"
comment_body=$(mktemp)
jq -r .body < "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
  

patch_add=$(perl -e '$/=undef;
$_=<>;
s{<details>.*}{}s;
s{^#.*}{};
s{\n##.*}{};
s{(?:^|\n)\s*\*}{}g;
s{\s+}{ }g;
print' < "$comment_body")
  

should_exclude_patterns=$(perl -e '$/=undef;
$_=<>;
exit unless s{(?:You should consider excluding directory paths|You should consider adding them to).*}{}s;
s{.*These sample patterns would exclude them:}{}s;
s{.*\`\`\`([^`]*)\`\`\`.*}{$1}m;
print' < "$comment_body" | grep . || true)

update_files
rm $comment_body
git add -u

@zadjii-msft
Copy link
Member

@PankajBhojwani I'm tempted to close this and resurrect it after #15858 merges. Thoughts?

@zadjii-msft zadjii-msft mentioned this pull request Sep 20, 2022
20 tasks
@lhecker
Copy link
Member

lhecker commented Aug 21, 2023

Oh neat, this PR already got a SVG icon for SearchBox_Regex. We can use that in a future PR!
After #15858 all we need to do to support regex is shove a bool all the way through to TextBuffer and that's it. It already uses a regex searcher (ICU uregex) even for non-regex searches (by setting UREGEX_LITERAL).

@PankajBhojwani
Copy link
Contributor Author

@PankajBhojwani I'm tempted to close this and resurrect it after #15858 merges. Thoughts?

Agreed!

@zadjii-msft
Copy link
Member

Gotcha. We'll resurrect... shortly 😄

@DHowett
Copy link
Member

DHowett commented Aug 21, 2023

Delete branch?

@DHowett
Copy link
Member

DHowett commented Aug 22, 2023

Ah, the asset was a copy of the Case Search asset

image

I was keeping the branch around because of the asset, but... yea.

@lhecker @KalleOlaviNiemitalo asked about pathological regeces and cancellation, which we may want to address or at least consider in the followup PR

@DHowett DHowett deleted the dev/pabhoj/regex_search branch August 22, 2023 21:02
@zadjii-msft
Copy link
Member

ACK
image

Yea we'll need a new svg for that 🤦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants