Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ICU for text search #15858

Merged
merged 22 commits into from Aug 24, 2023
Merged

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Aug 21, 2023

The ultimate goal of this PR was to use ICU for text search to

  • Improve Unicode support
    Previously we used towlower and only supported BMP glphs.
  • Improve search performance (10-100x)
    This allows us to search for all results in the entire text buffer
    at once without having to do so asynchronously.

Unfortunately, this required some significant changes too:

  • ICU's search facilities operate on text positions which we need to be
    mapped back to buffer coordinates. This required the introduction of
    CharToColumnMapper to implement sort of a reverse-_charOffsets
    mapping. It turns text (character) positions back into coordinates.
  • Previously search restarted every time you clicked the search button.
    It used the current selection as the starting position for the new
    search. But since ICU's uregex cannot search backwards we're
    required to accumulate all results in a vector first and so we
    need to cache that vector in between searches.
  • We need to know when the cached vector became invalid and so we have
    to track any changes made to TextBuffer. The way this commit solves
    it is by splitting GetRowByOffset into GetRowByOffset for
    const ROW access and GetMutableRowByOffset which increments a
    mutation counter on each call. The Search instance can then compare
    its cached mutation count against the previous mutation count.

Finally, this commit makes 2 semi-unrelated changes:

  • URL search now also uses ICU, since it's closely related to regular
    text search anyways. This significantly improves performance at
    large window sizes.
  • A few minor issues in UiaTracing were fixed. In particular
    2 functions which passed strings as wstring by copy are now
    using wstring_view and TraceLoggingCountedWideString.

Related to #6319 and #8000

Validation Steps Performed

  • Search upward/downward in conhost ✅
  • Search upward/downward in WT ✅
  • Searching for any of ß, ẞ, ss or SS matches any of the other ✅
  • Searching for any of Σ, σ, or ς matches any of the other ✅

@lhecker lhecker added Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. Area-i18n Internationalization issues (e.g.: non-US input handling doesn't work) labels Aug 21, 2023
@github-actions

This comment has been minimized.

template<typename T>
T _adjustBackward(T column) const noexcept;
template<typename T>
T _adjustForward(T column) const noexcept;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were made into templates so that I can use ptrdiff_t indices with them.

src/buffer/out/Row.cpp Show resolved Hide resolved
_needle(s_CreateNeedleFromString(str)),
_renderData(renderData),
_coordAnchor(s_GetInitialAnchor(renderData, direction))
// - reverse - True when searching backward/upwards in the buffer
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After going through all the callers of Search I found that just using bool parameters is more ergonomic. None of the callers, not even conhost, actually used the two enums. They just converted them back/forth.

I would suggest reviewing this file without the diff viewer. It's a mostly new implementation.

_fpDirectionDownAverage = 0;
_fpMatchCaseAverage = 0;
_uiFindNextClickedTotal = 0;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems no one ever noticed that we passed reverse == true (= upwards_search == true) as the fDirectionDown parameter. In other words, the telemetry for DirectionDownAverage was broken. Since I feel like no one is using this telemetry anyways, I chose to remove it instead of fixing it.

auto searchAnchor = _start;
if (searchBackward)
const std::wstring_view queryText{ text, SysStringLen(text) };
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1);
Copy link
Member Author

@lhecker lhecker Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this code could use the Search class, because Search would stop searching once it found something. The new Search class will go through the entire buffer however.

To fix that we could either add 2 parameters to Search to specify a bounding range for the search or use TextBuffer's function directly. I felt like the latter is a better option, because while the 2 for loops below are verbose and somewhat difficult to understand initially, they're also easier to verify for their correctness IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlos-zamora I didn't quite understand the UiaTextRangeBase::FindText implementation. Could you review that file for me? 🙂 Also, how do I best test that function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing 😊

As for how to test it on your own, you can do that using "Accessibility Insights":

  • open A11y insights
  • Click on the terminal
  • Under "Patterns", Click "Explore..." button next to "TextPattern"
  • A "TextPattern Explorer" window should appear
  • Click the "..." button next to the "Document" range
  • Click "Find..."

This new window should allow you to directly interact with the API much like any screen reader could.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.

After testing it, I think that my implementation works correctly. If I type "eeeee" the find dialog will find all "e"s no matter where they are or whether I use forwards or backwards search. So there don't seem to be any off-by-1 errors in the code.

However, there's one big difference to the old code: My new implementation returns a "not found" result when it has reached the end of the buffer. The old code continued to circle around the buffer if you reached the end. (This didn't always work for the backward search due to the start > _start check. It's supposed to be start >= _start, otherwise it can't find results that are at the very start of the buffer at 0,0.)
Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this code could use the Search class, because Search would stop searching once it found something. The new Search class will go through the entire buffer however.

Won't this cause a decent performance hit? _getOptimizedBufferSize() used to reduce the search space by ignoring anything past the mutable bottom. Now that we're searching through the whole buffer again, won't FindText('a', ...) be slower. And if the user calls that function repeatedly, the user may have to wait quite a bit before being able to perform the next search? (FYI I haven't verified the performance question via a11y insights yet, so this could all be false)

It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.

You could also use inspect.exe. It's the predecessor to a11y insights and can be found in the Windows SDK's \bin\<version>\<platform> folder. To interact with the API, do the following:

  1. open inspect.exe
  2. click on the terminal (the terminal should be highlighted in the UIA tree in inspect.exe)
  3. Click "Action" > "Text Pattern Explorer..." (a new "Text Explorer" window should appear)
  4. Click "Range" > "Find..."

Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?

Interesting! I was originally going to say that we should keep the same behavior as before, but it looks like this experience is different from MS Word's! There were two notable differences with your implementation:

  1. circling the buffer: MS Word doesn't circle the text area whereas we do in Terminal (old and new impl). So, we probably shouldn't do that anymore.
  2. no more results: MS Word returns something that is interpreted by inspect.exe as "ProcessFind: FindText couldn't find matching text.". But Terminal (new impl) is interpreted as "ProcessFind: MoveEndpointByRange failed. Error -2147024809".

We should align these experiences (which is a bit annoying because your PR wasn't really intended to deal with any of this). If that's easy to do, it'd be best to do that as a part of this PR or in a follow-up before the next release. If not, I think we should keep the old experience, but I know that's not really possible anymore given that the Search class is now completely different. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this code could use the Search class, because Search would stop searching once it found something. The new Search class will go through the entire buffer however.

Won't this cause a decent performance hit? _getOptimizedBufferSize() used to reduce the search space by ignoring anything past the mutable bottom.

The new ICU-based searcher can search the entire text buffer in 4ms.

It also avoids searching any part of the text buffer that has not been "committed" to memory. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ICU also has usearch_previous. We could use it for UIA specifically since it doesn't need the total number of results unlike WT (which is why it uses uregex because it's way way faster when searching for all results). Just like uregex it's very easy to use. I can make it fit. 🙂

And I'll try to make our UIA function behave like Word.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it seems that searching through a character-iterator is a C++ API feature and Windows only ships the C API. Well it's probably fine. I feel like this is not going to be a bottleneck no matter what we do.

This was referenced Aug 21, 2023
@zadjii-msft
Copy link
Member

URL search now also uses ICU

Does it just so happen to work across wrapped lines now? @asklar literally reported that to me on Teams this weekend 😄

@lhecker
Copy link
Member Author

lhecker commented Aug 21, 2023

image

@lhecker
Copy link
Member Author

lhecker commented Aug 21, 2023

Oh wait, I think you meant a different problem, right? I think that's something like Terminal::GetHyperlinkAtBufferPosition, because that one searches only a single line?

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving notes before lunch

@@ -122,7 +124,7 @@ class TextBuffer final
ROW& GetScratchpadRow();
ROW& GetScratchpadRow(const TextAttribute& attributes);
const ROW& GetRowByOffset(til::CoordType index) const;
ROW& GetRowByOffset(til::CoordType index);
ROW& GetMutableRowByOffset(til::CoordType index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a ton of this diff is just this rename

Copy link
Member Author

@lhecker lhecker Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. But it's a necessary change because many callers have a TextBuffer& but only need a const ROW&. (TextBufferCellIterator for instance is widely used and does this.) 😟

src/buffer/out/Row.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalControl/ControlCore.cpp Outdated Show resolved Hide resolved
src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
Comment on lines 28 to 30
_results = textBuffer.SearchText(str, caseInsensitive);
_mutationCount = textBuffer.GetMutationCount();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably, the caller has a write lock on the text buffer so the MutationCount can't change between the search and getting the count, but figured I'd mention it here. If there's a way to guarantee that, it might be worthwhile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could only assert that the lock is being held, but I think that this might be better as a separate PR that does this for all the parts that implicitly rely on the console lock being held. The mutation count could be made atomic, but this won't realistically improve anything, because you need to hold the lock anyways to initiate any kind of search (either to do a new search or to select stuff in the buffer).

src/buffer/out/Row.cpp Show resolved Hide resolved
src/buffer/out/Row.cpp Show resolved Hide resolved
src/buffer/out/Row.cpp Outdated Show resolved Hide resolved
const til::CoordType columnCount = _columnCount;
const til::CoordType scale = _lineRendition != LineRendition::SingleWidth;
const til::CoordType padding = _doubleBytePadded;
return (columnCount - padding) >> scale;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised padding comes before un-scaling, but I think it makes sense? A double-byte padded character that doesn't fit on a double-width row didn't fit because it took up 4 columns.. hm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was really unsure in what order these need to be, and I still am a little.
Thinking about it some more, does it have to be like this?

const auto padding = _doubleBytePadded << scale;
return (columnCount - padding) >> scale;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the same as...

return (columnCount >> scale) - padding;

right?

Copy link
Member Author

@lhecker lhecker Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite since the rounding is different. In your code it would subtract 2 no matter what. For instance, imagine padding is 2 and columnCount is 11. In my code snippet it would return 4 and with yours it would return 3. I think 4 is correct because 4*2+3 = 11 (3 being the max. wide glyph padding).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

@@ -65,6 +65,28 @@ struct RowCopyTextFromState
til::CoordType sourceColumnEnd = 0; // OUT
};

// This structure is basically an inverse of ROW::_charOffsets. If you have a pointer
// into a ROW's text this class can tell you what cell that pointer belongs to.
struct CharToColumnMapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nobody consumes this externally, do they? Can it be a forward declaration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to use this class in other parts of the code base in the future (the renderer primarily).

@@ -26,6 +26,7 @@
<ClCompile Include="..\precomp.cpp">
<PrecompiledHeader>Create</PrecompiledHeader>
</ClCompile>
<ClCompile Include="..\UTextAdapter.cpp" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update sources! and probably the sources for the UT too.z

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 21, 2023
Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

27/35

Comment on lines 1550 to 1552
_searcherGoForward = goForward;
_searcherCaseSensitive = caseSensitive;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you don't want to make these getters on Search?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it some more, I'd like to do that a different time if anything. Other editors reset the search when there's any change whatsoever. Technically we shouldn't need to track these members - we should just reset the search whenever any field in the dialog is changed. I just haven't done the work for that.

src/buffer/out/textBuffer.hpp Outdated Show resolved Hide resolved
@@ -373,12 +372,15 @@ class TextBuffer final

TextAttribute _currentAttributes;
til::CoordType _firstRow = 0; // indexes top row (not necessarily 0)
uint64_t _mutationCount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fathom a textbuffer having more than 900 quin-zill-trillion mutations. Is this a count, or an opaque number? If it's an opaque number, which is not orderable, wraparound is implicitly OK.

If somebody is going to use > on it, we're in trouble.

if (text.size() == 0)
auto lock = _terminal->LockForWriting();

if (_searcher.IsStale() || _searcherText != text || _searcherGoForward != goForward || _searcherCaseSensitive != caseSensitive)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. we need some way to tell the searcher that it is stale when we switch into the alt buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alt buffer will not have the same mutation count.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a one in 900 quin-jillion-millio nchance that it will, but... fair.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll take that chance! 😄 I mean the worst-case scenario is that the search finds results that are stale.

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 21, 2023
@asklar
Copy link
Member

asklar commented Aug 21, 2023

Oh wait, I think you meant a different problem, right? I think that's something like Terminal::GetHyperlinkAtBufferPosition, because that one searches only a single line?

@lhecker yes this sounds about right - will this PR address hyperlinks being broken up across lines (and launching to the wrong url)? if not, is there a bug tracking that already?

@github-actions

This comment has been minimized.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it. The mutable row stuff is great, and Search is completely understandable by a single human now. Thank you.

I need @carlos-zamora to sign off on the UIA changes -- my biggest concern is searching backwards off a previous search (?) because of accidental off-by-one errors. I am not sure how that API is supposed to work.

Blocking until Carlos signs, but otherwise I am a ✔️

const til::CoordType columnCount = _columnCount;
const til::CoordType scale = _lineRendition != LineRendition::SingleWidth;
const til::CoordType padding = _doubleBytePadded;
return (columnCount - padding) >> scale;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

//
// @param ut the UText to get the length of.
// @return the length, in the native units of the original text string.
static int64_t U_CALLCONV utextNativeLength(UText* ut) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these functions return a failure if the underlying buffer was mutated since the utext adapter was created? Or is that not related to the purpose of these guys?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that this would provide a false sense of security that people might end up relying on. Unlocking the console at all during a search is likely a bug or at least a bad idea because it's pretty error prone (not all state relevant to searching is contained in TextBuffer after all - selections for instance). I think we could make it an assertion? Not sure if it should be assert() or WI_ASSERT(). Probably the former, because searching without holding the lock is still somewhat predictable.

src/buffer/out/UTextAdapter.cpp Outdated Show resolved Hide resolved
src/buffer/out/UTextAdapter.cpp Outdated Show resolved Hide resolved
src/buffer/out/UTextAdapter.h Outdated Show resolved Hide resolved
src/buffer/out/textBuffer.hpp Outdated Show resolved Hide resolved

// An excerpt from the ICU documentation:
//
// Extract text from a UText into a UChar buffer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you complete this one?

Copy link
Member Author

@lhecker lhecker Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I'll keep the code but remove it from the utextFuncs struct and add a comment that the correctness of the function hasn't been verified yet. I don't think it's worth it to finish that function, because we don't need it right now. But I do think it's worth keeping because the code has already been written and is probably 99% correct (if not 100%). I initially thought I had to implement it so that uregex can work, but it seems I was wrong about that. It uses the ut->chunkContents directly (via UTEXT_NEXT32 and friends).

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Aug 22, 2023
@zadjii-msft
Copy link
Member

I need @carlos-zamora to sign off on the UIA changes

Basically same here. I think I've got like 3 files left but I didn't want to ✅ without him reading that (cause obviously no one knows how that works)

auto searchAnchor = _start;
if (searchBackward)
const std::wstring_view queryText{ text, SysStringLen(text) };
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1);
const auto results = _pData->GetTextBuffer().SearchText(queryText, !!ignoreCase, _start.y, _end.y + 1);

I think we do !! generally. I'm not really attached to this one though. Idk if one is better than the other.

auto searchAnchor = _start;
if (searchBackward)
const std::wstring_view queryText{ text, SysStringLen(text) };
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this code could use the Search class, because Search would stop searching once it found something. The new Search class will go through the entire buffer however.

Won't this cause a decent performance hit? _getOptimizedBufferSize() used to reduce the search space by ignoring anything past the mutable bottom. Now that we're searching through the whole buffer again, won't FindText('a', ...) be slower. And if the user calls that function repeatedly, the user may have to wait quite a bit before being able to perform the next search? (FYI I haven't verified the performance question via a11y insights yet, so this could all be false)

It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.

You could also use inspect.exe. It's the predecessor to a11y insights and can be found in the Windows SDK's \bin\<version>\<platform> folder. To interact with the API, do the following:

  1. open inspect.exe
  2. click on the terminal (the terminal should be highlighted in the UIA tree in inspect.exe)
  3. Click "Action" > "Text Pattern Explorer..." (a new "Text Explorer" window should appear)
  4. Click "Range" > "Find..."

Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?

Interesting! I was originally going to say that we should keep the same behavior as before, but it looks like this experience is different from MS Word's! There were two notable differences with your implementation:

  1. circling the buffer: MS Word doesn't circle the text area whereas we do in Terminal (old and new impl). So, we probably shouldn't do that anymore.
  2. no more results: MS Word returns something that is interpreted by inspect.exe as "ProcessFind: FindText couldn't find matching text.". But Terminal (new impl) is interpreted as "ProcessFind: MoveEndpointByRange failed. Error -2147024809".

We should align these experiences (which is a bit annoying because your PR wasn't really intended to deal with any of this). If that's easy to do, it'd be best to do that as a part of this PR or in a follow-up before the next release. If not, I think we should keep the old experience, but I know that's not really possible anymore given that the Search class is now completely different. Thoughts?

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm giving this a ✅ on the assumption that the UIA stuff is fine. I don't feel like any of these comments are blocking. And this makes searchv2 so much nicer

src/buffer/out/Row.cpp Outdated Show resolved Hide resolved
src/buffer/out/Row.cpp Show resolved Hide resolved
Comment on lines 1036 to 1037
auto mapper = _createCharToColumnMapper(offset);
return mapper.GetLeadingColumnAt(offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems weird to me that you need to pass the offset into the ctor (via _createCharToColumnMapper), then immediately just call Get*ColumnAt(offset) with the same offset. I don't think I'll block over that but it feels weird

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_createCharToColumnMapper only uses the offset to guess the initial column. GetLeadingColumnAt will then actually compute the proper offset. This approach avoids GetLeadingColumnAt being O(n) most of the time.

The reason the CharToColumnMapper exists as a separate class is because I want to use it for DirectWrite to map the character positions that DirectWrite returns directly back to columns. Instead, right now, we got a really ugly and inefficient approach. You can see it if you search for bufferLineColumn.

src/cascadia/TerminalCore/Terminal.cpp Show resolved Hide resolved
@@ -1348,6 +1345,97 @@ void Terminal::_updateUrlDetection()
}
}

// Interns URegularExpression instances so that they can be reused. This method is thread-safe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I thinks I get what this does, but it might deserve a bit of a longer comment than this.

If I'm not mistaken, this basically creates a process-wide cache of unique_uregexs. That way, if someone comes asking for a particular regex, (on any thread), we can hand back the pre-compiled version of it, if we've already seen it. There's also some affordance for flushing out old cache entries, if there's more than 128.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe intern isn't the right verb:
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird one. In the compiler/language runtime field, it pretty much means means "caching". You almost always only hear it in the context of strings (e.g. "string interning").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL i guess. Feel free to ignore me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I turned it into a small class so we can more easily extract it in the future. I left it in the same place though because I still feel like it's better to keep this relatively "new" code close to where we're going to add custom patterns for the first time soon.

const auto guard = shared.lock.lock_shared();
if (const auto it = shared.set.find(pattern); it != shared.set.end())
{
return ICU::unique_uregex{ uregex_clone(it->second.re.get(), &status) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to get std::move'd on the way out? I always forget how exactly move semantics work

(and below too)

Copy link
Member Author

@lhecker lhecker Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returned objects are always being moved, but with much stronger guarantees than regular std::move. return will behave as if there was no function call at all. In other words, returning a std::string by value is like constructing that string in the caller directly.
std::move is basically just a T& with extra syntactic sugar. So, if you return std::move(string); it will actually be forced to call the string(string&&) constructor on the way out, because you're not returning an object value anymore - you're returning a reference which now needs to be converted to a value. That's why one should never return std::move.
The reason it can be seen anyways sometimes is because when you have code like this:

struct StringWrapper {
    // implicit constructor!
    StringWrapper(std::string&& str) {}
};

StringWrapper foobar() {
    std::string str = "...";
    // calls the implicit constructor
    return std::move(str);
}

getFontCollection() in FontCache.h is an example for that.

src/buffer/out/UTextAdapter.cpp Show resolved Hide resolved
@lhecker lhecker added AutoMerge Marked for automatic merge by the bot when requirements are met and removed Needs-Attention The core contributors need to come back around and look at this ASAP. labels Aug 24, 2023
@github-actions

This comment has been minimized.

//
// An alternative approach would be to not make this method thread-safe and give each
// Terminal instance its own cache. I'm not sure which approach would've been better.
ICU::unique_uregex Intern(const std::wstring_view& pattern)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI suppress whitespace changes. The conversion to a class barely changed anything.

@microsoft-github-policy-service microsoft-github-policy-service bot merged commit cd80f3c into main Aug 24, 2023
15 of 17 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the dev/lhecker/icu-utext branch August 24, 2023 22:56
zadjii-msft added a commit that referenced this pull request Sep 5, 2023
This is a resurrection of #8588. That PR became painfully stale after
the `ControlCore` split. Original description:

> ## Summary of the Pull Request
> This is a PoC for:
> * Search status in SearchBox (aka number of matches + index of the
current match)
> * Live search (aka search upon typing)
> ## Detailed Description of the Pull Request / Additional comments
> * Introduced this optionally (global setting to enable it)
> * The approach is following:
>   * Every time the filter changes, enumerate all matches
>   * Upon navigation just take the relevant match and select it
> 

I cleaned it up a bit, and added support for also displaying the
positions of the matches in the scrollbar (if `showMarksOnScrollbar` is
also turned on).

It's also been made SUBSTANTIALLY easier after #15858 was merged.

Similar to before, searching while there's piles of output running isn't
_perfect_. But it's pretty awful currently, so that's not the end of the
world.

Gifs below.

* closes #8631 (which is a bullet point in #3920)
* closes #6319


Co-authored-by: Don-Vito <khvitaly@gmail.com>

---------

Co-authored-by: khvitaly <khvitaly@gmail.com>
@zadjii-msft zadjii-msft linked an issue Sep 7, 2023 that may be closed by this pull request
microsoft-github-policy-service bot pushed a commit that referenced this pull request Feb 28, 2024
…row (#16775)

## Summary of the Pull Request

URL detection was broken again in #15858. When the regex matched, we
calculate the column(cell) by its offset, we use forward or backward
iteration of the column to find the correct column that displays the
glyphs of `_chars[offset]`.


https://github.com/microsoft/terminal/blob/abf5d9423a23b13e9af4c19ca35858f9aaf0a63f/src/buffer/out/Row.cpp#L95-L104

However, when calculating the `currentOffset` we forget that MSB of
`_charOffsets[col]` could be `1`, or col is pointing to another glyph in
preceding column.


https://github.com/microsoft/terminal/blob/abf5d9423a23b13e9af4c19ca35858f9aaf0a63f/src/buffer/out/Row.hpp#L223-L226
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-i18n Internationalization issues (e.g.: non-US input handling doesn't work) AutoMerge Marked for automatic merge by the bot when requirements are met Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search is often broken while console process is running
5 participants