Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continues searching if a non-symbol matches a symbol link #901

Merged
merged 4 commits into from May 3, 2024

Conversation

d-ronnqvist
Copy link
Contributor

Bug/issue #, if applicable: rdar://126580675

Summary

This fixes a bug where DocC would stop searching as soon as it found any match while resolving a link, even if that match wasn't valid for the search (for example a non-symbol match for a symbol link). This resulted in "Symbol links can only resolve symbols" errors for some symbol links if there was a closer non-symbol match to the page that contained the link.

Now, DocC will check that the match is valid for the search, and if it's not; DocC will remember the error message from the first match but continue searching to see if there's another valid match.

Dependencies

None

Testing

In a project with some top-level symbol,

  • Add an article with a heading (or topic section) with the same name as the symbol and also link to that symbol with a symbol link. For example:

    # Some article
    
    An article with a heading with the same name as a symbol
    
    ### SomeClass
    
    - ``SomeClass``
  • Build documentation for the project.

    • The link should resolve to the symbol without warnings.
  • Change the link to a general documentation link (<doc:SomeClass>) and build documentation again.

    • The link should resolve to the heading without warnings.
  • Add another heading to the article and link to it using a symbol link

      # Some article
     
      An article with a heading with the same name as a symbol
     
    + ### OtherHeading
    +
      ### SomeClass
     
      - ``SomeClass``
    + - ``OtherHeading``
  • Build documentation for the project again

    • The new link should result in a warning that "Symbol links can only resolve symbols"

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary Not applicable

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

Comment on lines 217 to 219
if node.identifier == nil {
throw Error.unfindableMatch(node)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check needed if the identifier property inside theNode class is an implicitly unwrapped optional?

fileprivate(set) var identifier: ResolvedIdentifier!

If there's a possibility for this value to be nil, then shouldn't it be an optional value?

All the tests pass without this check, so I'm not clear on whether it's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code didn't change, I just moved it.

If there's a possibility for this value to be nil, then shouldn't it be an optional value?

Yes, it can be nil. If you look at this code (PathHierarchy.swift L302-L305) you can see that for the nodes initialized from the symbol graph file, a node is only assigned an identifier if it has a symbol.

The reason why this exists is that the convert service build documentation with a "sparse" symbol graph file that only contains a single symbol. However, this symbol may have path components that represent symbols that aren't in that symbol graph. For example, if the SGF only contains someMethod() but that symbol is a member of SomeClass then that path hierarchy will create an "unfindable" node for "SomeClass" and use it to build up a documentation hierarchy. Since there isn't a symbol for that node, the convert service provides information about how that path component should be disambiguated via the knownDisambiguatedPathComponents argument. Having a node for the "SomeClass" path components enabled the content to write links like SomeClass/someMethod() or SomeClass-class/someMethod() despite the path hierarchy not knowing that there is a "SomeClass" symbol.

The reason for Error.unfindableMatch(node) is that in a convert service build with only one symbol, it's allowed to traverse past "SomeClass" but since SomeClass doesn't exist in the symbol graph, it's not allowed to resolve a link to only SomeClass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment elsewhere in the path hierarchy implementation that talk about this:

Placeholder values, also called "unfindable elements" or "sparse nodes", are created when constructing a path hierarchy for a "partial" symbol graph file.

When the ConvertService builds documentation for a single symbol with multiple path components, the path hierarchy fills in placeholder nodes for the other path components. This ensures that the nodes in the hierarchy are connected and that there's the same number of nodes—with the same names—between the module node and the non-placeholder node as there would be in the full symbol graph.

The placeholder nodes can be traversed up and down while resolving a link—to reach a non-placeholder node—but the link will be considered "not found" if it ends at a placeholder node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but I agree with @sofiaromorales this is confusing or even dangerous. We need to be 100% sure that the identifier will never be nil before trying to access it... Maybe that's the entire purpose of this function? To insure that no other code that uses PathHierarchy::Node's will ever see a nil identifier? Are nodes ever returned in any other code paths?

@@ -18,14 +18,7 @@ extension PathHierarchy {
/// - Returns: Returns the unique identifier for the found match or raises an error if no match can be found.
/// - Throws: Raises a ``PathHierarchy/Error`` if no match can be found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, it was very easy to state that this method returned a PathHierarchy/Error. However, now that this method calls another very extensive method with its own inner method calls, it's more difficult to assure that this - Throws: Raises a PathHierarchy/Error if no match can be found. is always true.

I checked all the other throw statements inside findNode, and it seems like they all throw a PathHierarchy/Error. However, since there are multiple nested calls, I find it very easy to mistakenly throw another type of error without noticing this comment.

Any thoughts about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still holds true. All errors raised from PathHierarchy/find(path:parent:onlyFindSymbols:) are PathHierarchy/Error. In the future when we eventually update to Swift 6 we can adopt typed throws here to have the compiler enforce this.

@@ -305,7 +310,8 @@ extension PathHierarchy {
}

// Couldn't resolve the collision by look ahead.
return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
// return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and I was able to verify the fix works using your instructions. However, reviewing this was a bit tricky since it's not obvious how moving the calls to validateReturnValue actually fixes the problem. I wasn't able to understand the impact of this change until I discovered in the debugger that we catch the exceptions in various places and continue on with the algorithm.

I wonder if this code could be more easily understood if we changed the return value of searchForNode to be a new structure that contains the node and maybe some error info, that would only be returned by validateReturnValue. This would make it less likely that some future developer would accidentally return a node from some new code path they added in searchForNode. Right now it seems error prone to assume that all possible code paths (and there are many!) from searchForNode call validateReturnValue. Adding a new struct could allow the Swift compiler to help us remember to call validateReturnValue in the future.

What do you think?

@d-ronnqvist
Copy link
Contributor Author

I wonder if this code could be more easily understood if we changed the return value of searchForNode to be a new structure that contains the node and maybe some error info, that would only be returned by validateReturnValue. This would make it less likely that some future developer would accidentally return a node from some new code path they added in searchForNode. Right now it seems error prone to assume that all possible code paths (and there are many!) from searchForNode call validateReturnValue. Adding a new struct could allow the Swift compiler to help us remember to call validateReturnValue in the future.

What do you think?

I don't quite know what I think...

I got curious so I locally rewrote searchForNode(...) two times to return Result<Node, PathHierarchy.Error> instead of being a throwing function that returns Node. Doing so makes impossible to straight up return the node without and can serve as a reminder to call validateReturnValue(_ node: Node) -> Result<Node, Error> before returning. However, it's still possible to write return .success(node) instead of calling validateReturnValue(...).

Also, since the current implementation already calls and catches some of the internal errors, the Result<Node, PathHierarchy.Error> implementation either needs to repeatedly break up the calling code

+ let children: DisambiguationContainer
+ let pathComponent: PathComponent
+ do {
-     let (children, pathComponent) = try findChildContainer(node: &node, remaining: remaining, rawPathForError: rawPathForError)
+     (children, pathComponent) = try findChildContainer(node: &node, remaining: remaining, rawPathForError: rawPathForError)
+ } catch let error as PathHierarchy.Error {
+     return .failure(error)
+ } catch {
+     fatalError("PathHierarchy only throws PathHierarchy.Error")
+ }

Or the implementation needs to nest multiple do statements to have certain parts that catch some internal errors

do {
    // Do work where we want to return if it raises an error
  
    do {
       // Do work where we want to handle certain errors specifically
    } catch DisambiguationContainer.Error.lookupCollision(let collisions) {
        do {
            // Do more work where we want to return if it raises an error
        }
    }
} catch let error as PathHierarchy.Error {
    return .failure(error)
} catch {
    fatalError("PathHierarchy only throws PathHierarchy.Error")
}

The latter can be written like above, where the inner do statements don't have their own catch statements and instead rely on the error being caught by an outer catch statement. This makes the control flow a little bit harder to follow since the reader need to scan the rest of the method to see where certain errors are caught from certain scopes. Alternatively, the inner do statements can repeat,

} catch let error as PathHierarchy.Error {
    return .failure(error)
} catch {
    fatalError("PathHierarchy only throws PathHierarchy.Error")
}

but that makes the implementation longer and full of do-catch blocks, which makes it harder to spot the business logic for all the error handling code.

One way to solve all this extra internal error handling syntax and make the implementation read better again is to change the the other internal methods to return Result<Node, PathHierarchy.Error> but if we do that then the compiler can no longer help catch cases where we forget to call validateReturnValue(...).

@d-ronnqvist
Copy link
Contributor Author

I think if we want to ensure that validateReturnValue(...) is always called, the best way to move the implementation into an even more more private version and have searchForNode(...) call that and do the validation there:

private func searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    let node = try _searchForNode(descendingFrom: startingPoint, pathComponents: pathComponents, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
    
    /// Ensure that the return value is valid for this search.
    if node.identifier == nil {
        throw Error.unfindableMatch(node)
    }
    if onlyFindSymbols, node.symbol == nil {
        throw Error.nonSymbolMatchForSymbolLink(path: rawPathForError)
    }
    return node
}

private func _searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    // original implementation without validateReturnValue goes here.
}

If we want to be really, really, sure that no-one accidentally calls _searchForNode(...) instead of searchForNode() we could define it within the implementation of searchForNode()

private func searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    func _innerImplementation(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
        // original implementation without validateReturnValue goes here.
    }

    let node = try _innerImplementation(descendingFrom: startingPoint, pathComponents: pathComponents, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
    
    /// Ensure that the return value is valid for this search.
    if node.identifier == nil {
        throw Error.unfindableMatch(node)
    }
    if onlyFindSymbols, node.symbol == nil {
        throw Error.nonSymbolMatchForSymbolLink(path: rawPathForError)
    }
    return node
}

That gives the most guarantees from the compiler but it may be a new pattern to the reader. However, once the pattern is understood, the inner implementation can be read independently and the reader can know that the extra return value validation will always run.

I'll change to code to follow this pattern.

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

1 similar comment
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit aa0a99e into apple:main May 3, 2024
2 checks passed
@d-ronnqvist d-ronnqvist deleted the non-symbol-match-to-symbol-link branch May 3, 2024 16:30
@patshaughnessy
Copy link
Contributor

"a new structure that contains the node and maybe some error info" ha ha I was describing the Result struct. I've used that in Rust before but not in Swift yet. It would be great to use it here but yes probably a larger rewrite of the search algorithm would be needed to make it work. Thanks for updating this!

@d-ronnqvist
Copy link
Contributor Author

It would be great to use it here but yes probably a larger rewrite of the search algorithm would be needed to make it work.

Like I said above, I did rewrite two different versions of the search algorithm that used the Result type but discarded both because I felt that they made the error handling a more verbose without any making the control flow easier to follow and without any significant extra safeguards from the compiler.

@patshaughnessy
Copy link
Contributor

Yes I saw that you tried rewriting this in a couple of different ways. Thank for looking into this idea!

I suppose what I mean by "larger rewrite" would be to look for an algorithm that doesn't use the error handling approach at all, or at least not with exceptions. Maybe what you meant with:

One way to solve all this extra internal error handling syntax and make the implementation read better again is to change the other internal methods to return Result<Node, PathHierarchy.Error>.

I think what you ended up with here this week is great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants