Skip to content

[AutoFill Debugging] Refactor debug text extraction to work with non-Cocoa ports#51787

Merged
webkit-commit-queue merged 1 commit intoWebKit:mainfrom
whsieh:eng/300135
Oct 4, 2025
Merged

[AutoFill Debugging] Refactor debug text extraction to work with non-Cocoa ports#51787
webkit-commit-queue merged 1 commit intoWebKit:mainfrom
whsieh:eng/300135

Conversation

@whsieh
Copy link
Copy Markdown
Member

@whsieh whsieh commented Oct 3, 2025

b7f73eb

[AutoFill Debugging] Refactor debug text extraction to work with non-Cocoa ports
https://bugs.webkit.org/show_bug.cgi?id=300135
rdar://161919662

Reviewed by Abrar Rahman Protyasha.

This patch moves logic to convert a `WebCore::TextExtraction::Item` into a hierarchical debug text
representation out of platform-specific Swift/ObjC code in `_WKTextExtraction.swift` and
`WKTextExtractionUtilities.mm`, and into a new `TextExtractionToStringConversion.cpp`, which is
platform agnostic.

This also allows us to skip the conversion from `WebCore::TextExtraction::Item` to a tree of
`WKTextExtractionItem` Objective-C objects for the purposes of extracting `-_debugText:`, and
instead directly convert from `WebCore::TextExtraction::Item` to a string representation.

Note that this new helper file is in `Source/WebKit/Shared`, so that it can be invoked from within
the web process if necessary, in a subsequent patch. For now, this helper is only invoked from the
UI process.

No change in behavior (beyond the lack of trailing newline in the output below); see below for more
details.

* LayoutTests/fast/text-extraction/debug-text-extraction-basic-expected.txt:

Rebaseline a layout test; the debug text extraction no longer has a trailing newline with the new
way of aggregating text results per line.

* Source/WebKit/Shared/Extensions/WebExtensionSQLiteStatement.h:
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp: Added.
(WebKit::TextExtractionAggregator::TextExtractionAggregator):
(WebKit::TextExtractionAggregator::~TextExtractionAggregator):

Create a new RAII object to encapsulate the debug text conversion lifecycle. This is necessary
because `TextExtraction::Item` itself is just a struct (not ref-counted) that's passed back in the
completion handler of `WebPageProxy::requestTextExtraction`, but when recursively converting these
text extraction results, we may need to perform asynchronous filtering operations to verify that
(e.g.) text content in extracted text items represent user-visible text, and the
`TextExtractionFilter`'s classifier considers the text valid. This was previously not necessary,
because we took advantage of the fact that we first converted `TextExtraction::Item` to ObjC objects
which we just retained while performing async filtering (and whose text we could mutate as needed).

This is created with the overall completion handler, which is invoked in the destructor. See
`convertToText` below for the usage.

(WebKit::TextExtractionAggregator::create):
(WebKit::TextExtractionAggregator::addResult):

Add a helper method to add results (give a line number, indentation level and comma-separated
components) to the final text extraction. Importantly, note that these results don't necessary come
in in order by line number, due to the fact that extraction for text nodes may involve additional
async validation.

(WebKit::TextExtractionAggregator::advanceToNextLine):

Add a helper method to advance to the next line (i.e. claiming the next line number) when dumping
text for a new item in the extraction result tree.

(WebKit::commaSeparatedString):
(WebKit::escapeString):
(WebKit::eventListenerTypesToStringArray):

Migrate several helper methods over from `_WKTextExtraction.swift`.

(WebKit::partsForItem):
(WebKit::addPartsForText):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):

Implement the bulk of the algorithm here. This recursively traverses extraction results, adding
results to the `aggregator` in the process. This takes an async filtering callback (which returns a
`NativePromise`) which applies to all extracted text items. While any of these promises is pending,
the `aggregator` will be kept alive.

(WebKit::convertToText):
* Source/WebKit/Shared/TextExtractionToStringConversion.h: Copied from Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.h.
(WebKit::convertToText):
* Source/WebKit/Sources.txt:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _activeNativeMenuItemTitles]):

Make this return a list of item titles, now that we no longer use `WKTextExtractionPopupMenu`.

(-[WKWebView _debugTextWithConfiguration:completionHandler:]):

Reimplement this in terms of `-_requestTextExtractionInternal:completion:`; instead of getting a
`WKTextExtractionResult` and converting that into a text representation, we now go directly from
`TextExtraction::Item` to `NSString *`.

We also reimplement the async filtering logic here (i.e. passing text through
`WebKit::TextExtractionFilter` and `-_validateText:`) using the `TextExtractionFilterCallback` in
`convertToText`, which allows us to delete the now-unused helpers to recursively filter out text in
the ObjC extraction result tree, in `WKTextExtractionUtilities.mm`.

(-[WKWebView _requestTextExtractionInternal:completion:]):

Pull common logic to obtain an `TextExtraction::Item` out into a separate helper method.

(-[WKWebView _requestTextExtraction:completionHandler:]):
(-[WKWebView _validateText:inNode:completionHandler:]):
(-[WKWebView _popupMenuForTextExtractionResults]): Deleted.

Remove `WKTextExtractionPopupMenu` entirely, now that debug text extraction no longer depends on
producing a `WKTextExtractionResult` and asking it for a text representation.

* Source/WebKit/UIProcess/API/Cocoa/WKWebViewInternal.h:
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.swift:
(WKTextExtractionEventListenerTypes.description): Deleted.
(eventListenerTypesAsArray(_:)): Deleted.
(String.escaped): Deleted.
(WKTextExtractionItem.textRepresentationRecursive(_:)): Deleted.
(WKTextExtractionItem.textRepresentationParts): Deleted.
(WKTextExtractionContainerItem.textRepresentationParts): Deleted.
(WKTextExtractionContentEditableItem.textRepresentationParts): Deleted.
(WKTextExtractionTextFormControlItem.textRepresentationParts): Deleted.
(WKTextExtractionLinkItem.textRepresentationParts): Deleted.
(WKTextExtractionTextItem.textRepresentationParts): Deleted.
(WKTextExtractionScrollableItem.textRepresentationParts): Deleted.
(WKTextExtractionSelectItem.textRepresentationParts): Deleted.
(WKTextExtractionImageItem.textRepresentationParts): Deleted.
(WKTextExtractionPopupMenu.textRepresentation): Deleted.
(WKTextExtractionResult.textRepresentation): Deleted.

Delete now-unused code (see `TextExtractionToStringConversion.cpp` above).

* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h:
* Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.h:
* Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.mm:
(WebKit::filterTextRecursive): Deleted.
(WebKit::filterText): Deleted.

Moved into `WKWebView.mm` (see above).

* Source/WebKit/WebKit.xcodeproj/project.pbxproj:

Canonical link: https://commits.webkit.org/301002@main

6c1924f

Misc iOS, visionOS, tvOS & watchOS macOS Linux Windows Apple Internal
❌ 🧪 style ✅ 🛠 ios ✅ 🛠 mac ✅ 🛠 wpe ✅ 🛠 win ✅ 🛠 ios-apple
✅ 🛠 ios-sim ✅ 🛠 mac-AS-debug ✅ 🧪 wpe-wk2 ❌ 🧪 win-tests ✅ 🛠 mac-apple
✅ 🧪 webkitperl ✅ 🧪 ios-wk2 ✅ 🧪 api-mac ✅ 🧪 api-wpe loading 🛠 vision-apple
✅ 🧪 ios-wk2-wpt ✅ 🧪 mac-wk1 ✅ 🛠 wpe-cairo
✅ 🧪 api-ios ✅ 🧪 mac-wk2 ✅ 🛠 gtk
✅ 🛠 vision ✅ 🧪 mac-AS-debug-wk2 ✅ 🧪 gtk-wk2
✅ 🛠 vision-sim ✅ 🧪 mac-wk2-stress ✅ 🧪 api-gtk
✅ 🛠 🧪 merge ✅ 🧪 vision-wk2 ✅ 🧪 mac-intel-wk2 ✅ 🛠 playstation
✅ 🛠 tv ✅ 🛠 mac-safer-cpp
✅ 🛠 tv-sim
✅ 🛠 watch
✅ 🛠 watch-sim

@whsieh whsieh requested review from cdumez and rr-codes as code owners October 3, 2025 23:04
@whsieh whsieh self-assigned this Oct 3, 2025
@whsieh whsieh added the New Bugs Unclassified bugs are placed in this component until the correct component can be determined. label Oct 3, 2025
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Oct 4, 2025
@whsieh whsieh removed the merging-blocked Applied to prevent a change from being merged label Oct 4, 2025
@webkit-ews-buildbot
Copy link
Copy Markdown
Collaborator

Safer C++ Build #57091 (8fb9cff)

❌ Found 1 failing file with 12 issues. Please address these issues before landing. See WebKit Guidelines for Safer C++ Programming.
(cc @rniwa)

@whsieh
Copy link
Copy Markdown
Member Author

whsieh commented Oct 4, 2025

Thanks for the review!

The safer-CPP failures in WebExtensionSQLiteStatement.cpp all appear to be unrelated. I wonder if it's just failing now due to source unification? (Though it's somewhat unclear why unified source order might matter).

Just in case though, I'll patch those up in a separate PR… this PR.

@webkit-early-warning-system
Copy link
Copy Markdown
Collaborator

Starting EWS tests for 6c1924f. Live statuses available at the PR page, #51787

@whsieh whsieh added the merge-queue Applied to send a pull request to merge-queue label Oct 4, 2025
…Cocoa ports

https://bugs.webkit.org/show_bug.cgi?id=300135
rdar://161919662

Reviewed by Abrar Rahman Protyasha.

This patch moves logic to convert a `WebCore::TextExtraction::Item` into a hierarchical debug text
representation out of platform-specific Swift/ObjC code in `_WKTextExtraction.swift` and
`WKTextExtractionUtilities.mm`, and into a new `TextExtractionToStringConversion.cpp`, which is
platform agnostic.

This also allows us to skip the conversion from `WebCore::TextExtraction::Item` to a tree of
`WKTextExtractionItem` Objective-C objects for the purposes of extracting `-_debugText:`, and
instead directly convert from `WebCore::TextExtraction::Item` to a string representation.

Note that this new helper file is in `Source/WebKit/Shared`, so that it can be invoked from within
the web process if necessary, in a subsequent patch. For now, this helper is only invoked from the
UI process.

No change in behavior (beyond the lack of trailing newline in the output below); see below for more
details.

* LayoutTests/fast/text-extraction/debug-text-extraction-basic-expected.txt:

Rebaseline a layout test; the debug text extraction no longer has a trailing newline with the new
way of aggregating text results per line.

* Source/WebKit/Shared/Extensions/WebExtensionSQLiteStatement.h:
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp: Added.
(WebKit::TextExtractionAggregator::TextExtractionAggregator):
(WebKit::TextExtractionAggregator::~TextExtractionAggregator):

Create a new RAII object to encapsulate the debug text conversion lifecycle. This is necessary
because `TextExtraction::Item` itself is just a struct (not ref-counted) that's passed back in the
completion handler of `WebPageProxy::requestTextExtraction`, but when recursively converting these
text extraction results, we may need to perform asynchronous filtering operations to verify that
(e.g.) text content in extracted text items represent user-visible text, and the
`TextExtractionFilter`'s classifier considers the text valid. This was previously not necessary,
because we took advantage of the fact that we first converted `TextExtraction::Item` to ObjC objects
which we just retained while performing async filtering (and whose text we could mutate as needed).

This is created with the overall completion handler, which is invoked in the destructor. See
`convertToText` below for the usage.

(WebKit::TextExtractionAggregator::create):
(WebKit::TextExtractionAggregator::addResult):

Add a helper method to add results (give a line number, indentation level and comma-separated
components) to the final text extraction. Importantly, note that these results don't necessary come
in in order by line number, due to the fact that extraction for text nodes may involve additional
async validation.

(WebKit::TextExtractionAggregator::advanceToNextLine):

Add a helper method to advance to the next line (i.e. claiming the next line number) when dumping
text for a new item in the extraction result tree.

(WebKit::commaSeparatedString):
(WebKit::escapeString):
(WebKit::eventListenerTypesToStringArray):

Migrate several helper methods over from `_WKTextExtraction.swift`.

(WebKit::partsForItem):
(WebKit::addPartsForText):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):

Implement the bulk of the algorithm here. This recursively traverses extraction results, adding
results to the `aggregator` in the process. This takes an async filtering callback (which returns a
`NativePromise`) which applies to all extracted text items. While any of these promises is pending,
the `aggregator` will be kept alive.

(WebKit::convertToText):
* Source/WebKit/Shared/TextExtractionToStringConversion.h: Copied from Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.h.
(WebKit::convertToText):
* Source/WebKit/Sources.txt:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _activeNativeMenuItemTitles]):

Make this return a list of item titles, now that we no longer use `WKTextExtractionPopupMenu`.

(-[WKWebView _debugTextWithConfiguration:completionHandler:]):

Reimplement this in terms of `-_requestTextExtractionInternal:completion:`; instead of getting a
`WKTextExtractionResult` and converting that into a text representation, we now go directly from
`TextExtraction::Item` to `NSString *`.

We also reimplement the async filtering logic here (i.e. passing text through
`WebKit::TextExtractionFilter` and `-_validateText:`) using the `TextExtractionFilterCallback` in
`convertToText`, which allows us to delete the now-unused helpers to recursively filter out text in
the ObjC extraction result tree, in `WKTextExtractionUtilities.mm`.

(-[WKWebView _requestTextExtractionInternal:completion:]):

Pull common logic to obtain an `TextExtraction::Item` out into a separate helper method.

(-[WKWebView _requestTextExtraction:completionHandler:]):
(-[WKWebView _validateText:inNode:completionHandler:]):
(-[WKWebView _popupMenuForTextExtractionResults]): Deleted.

Remove `WKTextExtractionPopupMenu` entirely, now that debug text extraction no longer depends on
producing a `WKTextExtractionResult` and asking it for a text representation.

* Source/WebKit/UIProcess/API/Cocoa/WKWebViewInternal.h:
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.swift:
(WKTextExtractionEventListenerTypes.description): Deleted.
(eventListenerTypesAsArray(_:)): Deleted.
(String.escaped): Deleted.
(WKTextExtractionItem.textRepresentationRecursive(_:)): Deleted.
(WKTextExtractionItem.textRepresentationParts): Deleted.
(WKTextExtractionContainerItem.textRepresentationParts): Deleted.
(WKTextExtractionContentEditableItem.textRepresentationParts): Deleted.
(WKTextExtractionTextFormControlItem.textRepresentationParts): Deleted.
(WKTextExtractionLinkItem.textRepresentationParts): Deleted.
(WKTextExtractionTextItem.textRepresentationParts): Deleted.
(WKTextExtractionScrollableItem.textRepresentationParts): Deleted.
(WKTextExtractionSelectItem.textRepresentationParts): Deleted.
(WKTextExtractionImageItem.textRepresentationParts): Deleted.
(WKTextExtractionPopupMenu.textRepresentation): Deleted.
(WKTextExtractionResult.textRepresentation): Deleted.

Delete now-unused code (see `TextExtractionToStringConversion.cpp` above).

* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h:
* Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.h:
* Source/WebKit/UIProcess/Cocoa/TextExtraction/WKTextExtractionUtilities.mm:
(WebKit::filterTextRecursive): Deleted.
(WebKit::filterText): Deleted.

Moved into `WKWebView.mm` (see above).

* Source/WebKit/WebKit.xcodeproj/project.pbxproj:

Canonical link: https://commits.webkit.org/301002@main
@webkit-commit-queue
Copy link
Copy Markdown
Collaborator

Committed 301002@main (b7f73eb): https://commits.webkit.org/301002@main

Reviewed commits have been landed. Closing PR #51787 and removing active labels.

@webkit-commit-queue webkit-commit-queue merged commit b7f73eb into WebKit:main Oct 4, 2025
@webkit-commit-queue webkit-commit-queue removed the merge-queue Applied to send a pull request to merge-queue label Oct 4, 2025
@whsieh whsieh deleted the eng/300135 branch October 4, 2025 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

New Bugs Unclassified bugs are placed in this component until the correct component can be determined.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants