Add near-miss suggestions for unresolved symbol link error messages #420

d-ronnqvist · 2022-11-08T20:05:00Z

Bug/issue #, if applicable: rdar://59660520

Summary

This adds a mechanism for sorting and filtering strings based on how "similar" they are to a specific string for use to create near-miss suggestions in diagnostics.

The changes are divided in two parts:

One that wraps the the difference capabilities in Swift to calculate the length of the segments of common and difference elements.
Another that "score" these segments to "rank" values based on how "similar" they are to the authored value.

Most of changes in this PR (1500+ LOC) are the tests for both these parts.

For the second part there is no single right or wrong answer for how to do that. The approach I took is to write test of what results I would expect when matching a collection of possibilities against a certain value and then tweak the "score" and "ranking" implementations until they produces the results that I'd expect. This implementation is completely arbitrary.

My hopes is that when there are situations where this doesn't behave well, we'll add those as more tests and tweak the "score" and "ranking" implementations again.

Dependencies

None

Testing

Please give this a spin with some real code bases to see how it performs with real links in real projects.

To do so:

Checkout this branch and build the docc executable.
Build documentation for another project with DocCUseHierarchyBasedLinkResolver enabled and DOCC_EXEC pointed to the locally built docc executable.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

Added tests
Ran the ./bin/test script and it succeeded
Updated documentation if necessary

ktoso · 2022-11-09T00:12:53Z

Very nice, I'm looking forward to trying this! I'm more than happy to run with main based toolchains etc and provide some feedback how it feels from editing some real docs :-)

d-ronnqvist · 2022-11-09T23:27:24Z

@swift-ci please test

d-ronnqvist · 2022-11-10T01:29:33Z

@swift-ci please test

d-ronnqvist · 2022-11-10T17:08:52Z

@swift-ci please test

d-ronnqvist · 2022-11-14T17:15:10Z

@swift-ci please test

d-ronnqvist · 2022-11-18T19:31:01Z

@swift-ci please test

d-ronnqvist · 2022-12-01T15:40:17Z

@swift-ci please test

d-ronnqvist · 2022-12-01T20:01:30Z

@swift-ci please test

d-ronnqvist · 2022-12-05T20:03:06Z

@swift-ci please test

d-ronnqvist · 2022-12-08T17:05:03Z

@swift-ci please test

d-ronnqvist · 2022-12-09T18:39:15Z

@daniel-grumberg or @ethan-kusters do either of you have time to review this?

daniel-grumberg · 2022-12-09T23:24:29Z

Sources/SwiftDocC/Semantics/MarkupReferenceResolver.swift

@@ -71,6 +71,8 @@ struct MarkupReferenceResolver: MarkupRewriter {
            }

            // FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input (rdar://59660520).


Suggested change

// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input (rdar://59660520).

Or have I misunderstood? Either way this needs clarification.

I don't consider the FIXME's (and rdar://59660520) fully resolved until the near-miss suggestions are structured as as Solution elements on the Diagnostic.

With these changes the near-miss suggestions are part of the error message which both means that they're not presented as fix-its and that they need to be a single line to appear in inline error messages.

Solving that may require breaking changes—since TopicReferenceResolutionResult is a public type. I'm not sure what the best solution is here since the resolver doesn't know the URL or SourceRange which it needs to create Solution elements with a Replacement.

I can add more info to the existing FIXME comment to better describe what's remaining. I could also consider rdar://59660520 done because the information is there and open a new issue for the remaining work to structure the near-miss suggestions are Solution elements with a Replacement. Which do you prefer?

I created a new issue here #438

Updated in 9a6fa12

daniel-grumberg · 2022-12-09T23:25:30Z

Sources/SwiftDocC/Semantics/ReferenceResolver.swift

@@ -85,6 +85,8 @@ struct ReferenceResolver: SemanticVisitor {

        case let .failure(unresolved, errorMessage):
            // FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input.


Same as the previous comment

Updated in 9a6fa12

daniel-grumberg · 2022-12-09T23:41:10Z

Sources/SwiftDocC/Utility/CollectionChanges.swift

+        }
+
+        var changes = ChangeSegmentBuilder(originalCount: from.count)
+        for change in to.difference(from: from, by: areEquivalent) {


This assumes the iteration order of CollectionDifference I would be more comfortable if we explicitly defined the order of operations here or made ChangeSegmentBuilder able to handle the changes coming in without a fixed iteration order.

Especially since the docs for CollectionDifference imply that removals are ordered from lowest offset to largest.

This assumes the iteration order of CollectionDifference [...]

The iteration order of CollectionDifference is documented in the source:

/// A CollectionDifference is itself a Collection. /// /// The enumeration order of `Change` elements is: /// /// 1. `.remove`s, from highest `offset` to lowest /// 2. `.insert`s, from lowest `offset` to highest /// /// This guarantees that applicators on compatible base states are safe when /// written in the form: /// /// ``` /// for c in diff { /// switch c { /// case .remove(offset: let o, element: _, associatedWith: _): /// arr.remove(at: o) /// case .insert(offset: let o, element: let e, associatedWith: _): /// arr.insert(e, at: o) /// } /// } /// ```

but partly because it's documented as an extension and partly because it's implements a protocol conformance, this documentation comment doesn't appear in the rendered documentation.

Especially since the docs for CollectionDifference imply that removals are ordered from lowest offset to largest.

Yes, when accessing the removals or insertions properties directly they are both sorted lowest to highest but when enumerating the CollectionDifference the removals are ordered from highest to lowest (per the documentation comment above).

[...] I would be more comfortable if we explicitly defined the order of operations here or made ChangeSegmentBuilder able to handle the changes coming in without a fixed iteration order.

Since this is the documented safe way to apply the changes to a base state I don't feel that we need to explicitly define an order or make the builder handle arbitrary ordered changes.

I can add a link to the documentation in the source to highlight that the order is well defined.

Fixed in 36fac60

daniel-grumberg · 2022-12-12T17:29:13Z

Sources/SwiftDocC/Utility/CollectionChanges.swift

+        //   Since the removal has to be at a lower index, it can only be applied to the shortened 'original' segment.
+        //
+        // This process repeats, meaning that every removal is always applied to the first segment.
+        let segment = segments[0]


Usage of segment vs segments[0] is inconsistent and slightly confusing to me. Since you need to go through segments[0] to mutate anyway, might be worth just using that everywhere to make the code more readable.

My intention was that segment would indicate a read and segments[0] would indicate a write but there are a few places where a subscript is used to read as well.

I've taken a second look over the insert and remove code and was able to simplify the conditionals a little bit and remove any segments[0] reads (but there are still a segments[1] read)

Updated in 08346bf

daniel-grumberg · 2022-12-12T17:31:15Z

Sources/SwiftDocC/Utility/CollectionChanges.swift

+        else if removalIndex == segment.count - 1 {
+            // Removing at end of segment
+            segments[0].count -= 1
+            assert(segments[0].count > 0, """


I don't think this assert can hit? Wouldn't the control flow go inside the removalIndex == 0 branch? Consider removing the assert

You're right. This assertion was left in from an earlier version of the code. I've removed it.

Fixed as part of 08346bf

d-ronnqvist · 2022-12-13T02:55:24Z

@swift-ci please test

daniel-grumberg · 2022-12-13T14:31:37Z

Sources/SwiftDocC/Utility/CollectionChanges.swift

+            switch segments[insertSegmentIndex].kind {
+            case .insert:
+                // If the next segment is an 'insert' segment, simply increment it
+                segments[insertSegmentIndex].count += 1


Is this case possible? Since the documentation you linked above seems to say that insertions are then applied from lowest offset to highest?

As long as this code is well behaved this shouldn't be possible. I'll turn it into an assertion.

Updated in 77721d8

daniel-grumberg · 2022-12-13T14:34:33Z

Sources/SwiftDocC/Utility/CollectionChanges.swift

+                // If the next segment is a 'remove' segment, skip over it so that insertions are always after removals.
+                segments.insert(Segment(kind: .insert, count: 1), at: insertSegmentIndex + 1)
+
+                assert(insertSegmentIndex + 2 == segments.count || segments[insertSegmentIndex + 2].kind == .common,


This essentially just asserts that insertions are ordered from lowest index to highest (and that remove segments are correctly created).

Maybe if we want to make that kind of assertion, we could assert that the collection difference traversal ordered is correct. That way if that ever changes we can get a clear signal and fix the code.

My goal with these assertions is more to verify that this code does the right thing than to verify that its inputs. For example; if this code inserted or removed at the wrong indices then it could hit one of these assertions even if the removal and insertion indices are passed in the correct order. I prefer to leave this assertion as is.

daniel-grumberg · 2022-12-13T14:45:35Z

Sources/SwiftDocC/Utility/NearMiss.swift

+            } else {
+                // To produce higher contributions for longer common sequences, this implementation sums the sequence (1...length)
+                // and adds an arbitrary constant factor.
+                return Double((1...segment.count).reduce(0, +)) + 3


Nit: not sure this gets peephole optimized, but consider rewriting the this as:

Suggested change

return Double((1...segment.count).reduce(0, +)) + 3

return Double(((segment.count * (segment.count + 1)) / 2) + 3)

You're right, this probably doesn't get optimized.

Fixed in 52f57eb

daniel-grumberg

LGTM

Also, lower one `precondition` to `assert`.

d-ronnqvist · 2022-12-13T18:05:10Z

@swift-ci please test

d-ronnqvist · 2022-12-13T19:29:46Z

@swift-ci please test

d-ronnqvist added 6 commits November 6, 2022 20:31

Add a type to compute consecutive sequence lengths

2c27b20

Add a type to find "near-misses" for diagnostics

f761878

Identify near misses for unresolved symbol link error messages

b768f89

Update comments about adding near-misses with current status.

48a134c

Silence a deprecation warning in one of the unit tests

5934b9e

Update error message to reflect that suggestions can be non-symbols.

74f84de

d-ronnqvist requested review from ethan-kusters, franklinsch and daniel-grumberg and removed request for ethan-kusters and franklinsch November 9, 2022 20:00

Add missing variable binding in condition for Swift < 5.7 compatibility

7d76228

Small scoring adjustments for best-match ranking

0df73fd

Merge branch 'main' into near-miss

2cd33ad

Merge branch 'main' into near-miss

07f2d83

Merge branch 'main' into near-miss

6e9e9e3

d-ronnqvist requested a review from ethan-kusters December 1, 2022 17:29

Merge branch 'main' into near-miss

a7a6a14

Merge branch 'main' into near-miss

bf57811

Merge branch 'main' into near-miss

545fe62

daniel-grumberg reviewed Dec 9, 2022

View reviewed changes

daniel-grumberg reviewed Dec 12, 2022

View reviewed changes

d-ronnqvist added 4 commits December 12, 2022 17:56

Add link to documentation for CollectionDifference enumeration order

36fac60

Simplify the implementations of segment insertions and removals

08346bf

Update some terminology in implementation comments

529d795

Update previous FIXME comments to describe the follow up work

9a6fa12

daniel-grumberg reviewed Dec 13, 2022

View reviewed changes

daniel-grumberg approved these changes Dec 13, 2022

View reviewed changes

Assert that insert segments aren't followed by other insert segments.

77721d8

Also, lower one `precondition` to `assert`.

Make micro optimization in near-miss score calculation

352fe26

d-ronnqvist force-pushed the near-miss branch from 52f57eb to 352fe26 Compare December 13, 2022 19:29

d-ronnqvist merged commit 150eb7d into apple:main Dec 13, 2022

d-ronnqvist deleted the near-miss branch December 13, 2022 21:01

		@@ -71,6 +71,8 @@ struct MarkupReferenceResolver: MarkupRewriter {
		}

		// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input (rdar://59660520).

		@@ -85,6 +85,8 @@ struct ReferenceResolver: SemanticVisitor {

		case let .failure(unresolved, errorMessage):
		// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input.

	return Double((1...segment.count).reduce(0, +)) + 3
	return Double(((segment.count * (segment.count + 1)) / 2) + 3)

Add near-miss suggestions for unresolved symbol link error messages #420

Add near-miss suggestions for unresolved symbol link error messages #420

Conversation

d-ronnqvist commented Nov 8, 2022 • edited

Summary

Dependencies

Testing

Checklist

ktoso commented Nov 9, 2022

d-ronnqvist commented Nov 9, 2022

d-ronnqvist commented Nov 10, 2022

d-ronnqvist commented Nov 10, 2022

d-ronnqvist commented Nov 14, 2022

d-ronnqvist commented Nov 18, 2022

d-ronnqvist commented Dec 1, 2022

d-ronnqvist commented Dec 1, 2022

d-ronnqvist commented Dec 5, 2022

d-ronnqvist commented Dec 8, 2022

d-ronnqvist commented Dec 9, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

d-ronnqvist commented Dec 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daniel-grumberg Dec 13, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daniel-grumberg left a comment

Choose a reason for hiding this comment

d-ronnqvist commented Dec 13, 2022

d-ronnqvist commented Dec 13, 2022

d-ronnqvist commented Nov 8, 2022 •

edited

daniel-grumberg Dec 13, 2022 •

edited