Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add near-miss suggestions for unresolved symbol link error messages #420

Merged
merged 20 commits into from
Dec 13, 2022

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented Nov 8, 2022

Bug/issue #, if applicable: rdar://59660520

Summary

This adds a mechanism for sorting and filtering strings based on how "similar" they are to a specific string for use to create near-miss suggestions in diagnostics.

The changes are divided in two parts:

  • One that wraps the the difference capabilities in Swift to calculate the length of the segments of common and difference elements.
  • Another that "score" these segments to "rank" values based on how "similar" they are to the authored value.

Most of changes in this PR (1500+ LOC) are the tests for both these parts.

For the second part there is no single right or wrong answer for how to do that. The approach I took is to write test of what results I would expect when matching a collection of possibilities against a certain value and then tweak the "score" and "ranking" implementations until they produces the results that I'd expect. This implementation is completely arbitrary.

My hopes is that when there are situations where this doesn't behave well, we'll add those as more tests and tweak the "score" and "ranking" implementations again.

Dependencies

None

Testing

Please give this a spin with some real code bases to see how it performs with real links in real projects.

To do so:

  1. Checkout this branch and build the docc executable.
  2. Build documentation for another project with DocCUseHierarchyBasedLinkResolver enabled and DOCC_EXEC pointed to the locally built docc executable.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary

@ktoso
Copy link
Member

ktoso commented Nov 9, 2022

Very nice, I'm looking forward to trying this! I'm more than happy to run with main based toolchains etc and provide some feedback how it feels from editing some real docs :-)

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@daniel-grumberg or @ethan-kusters do either of you have time to review this?

@@ -71,6 +71,8 @@ struct MarkupReferenceResolver: MarkupRewriter {
}

// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input (rdar://59660520).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input (rdar://59660520).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or have I misunderstood? Either way this needs clarification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't consider the FIXME's (and rdar://59660520) fully resolved until the near-miss suggestions are structured as as Solution elements on the Diagnostic.

With these changes the near-miss suggestions are part of the error message which both means that they're not presented as fix-its and that they need to be a single line to appear in inline error messages.

Solving that may require breaking changes—since TopicReferenceResolutionResult is a public type. I'm not sure what the best solution is here since the resolver doesn't know the URL or SourceRange which it needs to create Solution elements with a Replacement.

I can add more info to the existing FIXME comment to better describe what's remaining. I could also consider rdar://59660520 done because the information is there and open a new issue for the remaining work to structure the near-miss suggestions are Solution elements with a Replacement. Which do you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a new issue here #438

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 9a6fa12

@@ -85,6 +85,8 @@ struct ReferenceResolver: SemanticVisitor {

case let .failure(unresolved, errorMessage):
// FIXME: Provide near-miss suggestion here. The user is likely to make mistakes with capitalization because of character input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 9a6fa12

}

var changes = ChangeSegmentBuilder(originalCount: from.count)
for change in to.difference(from: from, by: areEquivalent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes the iteration order of CollectionDifference I would be more comfortable if we explicitly defined the order of operations here or made ChangeSegmentBuilder able to handle the changes coming in without a fixed iteration order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially since the docs for CollectionDifference imply that removals are ordered from lowest offset to largest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes the iteration order of CollectionDifference [...]

The iteration order of CollectionDifference is documented in the source:

/// A CollectionDifference is itself a Collection.
///
/// The enumeration order of `Change` elements is:
///
/// 1. `.remove`s, from highest `offset` to lowest
/// 2. `.insert`s, from lowest `offset` to highest
///
/// This guarantees that applicators on compatible base states are safe when
/// written in the form:
///
/// ```
/// for c in diff {
///   switch c {
///   case .remove(offset: let o, element: _, associatedWith: _):
///     arr.remove(at: o)
///   case .insert(offset: let o, element: let e, associatedWith: _):
///     arr.insert(e, at: o)
///   }
/// }
/// ```

but partly because it's documented as an extension and partly because it's implements a protocol conformance, this documentation comment doesn't appear in the rendered documentation.

Especially since the docs for CollectionDifference imply that removals are ordered from lowest offset to largest.

Yes, when accessing the removals or insertions properties directly they are both sorted lowest to highest but when enumerating the CollectionDifference the removals are ordered from highest to lowest (per the documentation comment above).

[...] I would be more comfortable if we explicitly defined the order of operations here or made ChangeSegmentBuilder able to handle the changes coming in without a fixed iteration order.

Since this is the documented safe way to apply the changes to a base state I don't feel that we need to explicitly define an order or make the builder handle arbitrary ordered changes.

I can add a link to the documentation in the source to highlight that the order is well defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 36fac60

// Since the removal has to be at a lower index, it can only be applied to the shortened 'original' segment.
//
// This process repeats, meaning that every removal is always applied to the first segment.
let segment = segments[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of segment vs segments[0] is inconsistent and slightly confusing to me. Since you need to go through segments[0] to mutate anyway, might be worth just using that everywhere to make the code more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was that segment would indicate a read and segments[0] would indicate a write but there are a few places where a subscript is used to read as well.

I've taken a second look over the insert and remove code and was able to simplify the conditionals a little bit and remove any segments[0] reads (but there are still a segments[1] read)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 08346bf

else if removalIndex == segment.count - 1 {
// Removing at end of segment
segments[0].count -= 1
assert(segments[0].count > 0, """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this assert can hit? Wouldn't the control flow go inside the removalIndex == 0 branch? Consider removing the assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This assertion was left in from an earlier version of the code. I've removed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as part of 08346bf

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

switch segments[insertSegmentIndex].kind {
case .insert:
// If the next segment is an 'insert' segment, simply increment it
segments[insertSegmentIndex].count += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this case possible? Since the documentation you linked above seems to say that insertions are then applied from lowest offset to highest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this code is well behaved this shouldn't be possible. I'll turn it into an assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 77721d8

// If the next segment is a 'remove' segment, skip over it so that insertions are always after removals.
segments.insert(Segment(kind: .insert, count: 1), at: insertSegmentIndex + 1)

assert(insertSegmentIndex + 2 == segments.count || segments[insertSegmentIndex + 2].kind == .common,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially just asserts that insertions are ordered from lowest index to highest (and that remove segments are correctly created).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if we want to make that kind of assertion, we could assert that the collection difference traversal ordered is correct. That way if that ever changes we can get a clear signal and fix the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal with these assertions is more to verify that this code does the right thing than to verify that its inputs. For example; if this code inserted or removed at the wrong indices then it could hit one of these assertions even if the removal and insertion indices are passed in the correct order. I prefer to leave this assertion as is.

} else {
// To produce higher contributions for longer common sequences, this implementation sums the sequence (1...length)
// and adds an arbitrary constant factor.
return Double((1...segment.count).reduce(0, +)) + 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not sure this gets peephole optimized, but consider rewriting the this as:

Suggested change
return Double((1...segment.count).reduce(0, +)) + 3
return Double(((segment.count * (segment.count + 1)) / 2) + 3)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this probably doesn't get optimized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 52f57eb

Copy link
Contributor

@daniel-grumberg daniel-grumberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit 150eb7d into apple:main Dec 13, 2022
@d-ronnqvist d-ronnqvist deleted the near-miss branch December 13, 2022 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants