Continues searching if a non-symbol matches a symbol link #901

d-ronnqvist · 2024-04-26T13:44:42Z

Bug/issue #, if applicable: rdar://126580675

Summary

This fixes a bug where DocC would stop searching as soon as it found any match while resolving a link, even if that match wasn't valid for the search (for example a non-symbol match for a symbol link). This resulted in "Symbol links can only resolve symbols" errors for some symbol links if there was a closer non-symbol match to the page that contained the link.

Now, DocC will check that the match is valid for the search, and if it's not; DocC will remember the error message from the first match but continue searching to see if there's another valid match.

Dependencies

None

Testing

In a project with some top-level symbol,

Add an article with a heading (or topic section) with the same name as the symbol and also link to that symbol with a symbol link. For example:
```
# Some article

An article with a heading with the same name as a symbol

### SomeClass

- ``SomeClass``
```
Build documentation for the project.
- The link should resolve to the symbol without warnings.
Change the link to a general documentation link (<doc:SomeClass>) and build documentation again.
- The link should resolve to the heading without warnings.

Add another heading to the article and link to it using a symbol link

  # Some article
 
  An article with a heading with the same name as a symbol
 
+ ### OtherHeading
+
  ### SomeClass
 
  - ``SomeClass``
+ - ``OtherHeading``

Build documentation for the project again
- The new link should result in a warning that "Symbol links can only resolve symbols"

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

Added tests
Ran the ./bin/test script and it succeeded
~~Updated documentation if necessary~~ Not applicable

rdar://126580675

d-ronnqvist · 2024-04-26T13:45:09Z

@swift-ci please test

d-ronnqvist · 2024-04-26T13:54:35Z

@swift-ci please test

sofiaromorales · 2024-04-30T12:14:40Z

Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+Find.swift

+            if node.identifier == nil {
+                throw Error.unfindableMatch(node)
+            }


Why is this check needed if the identifier property inside theNode class is an implicitly unwrapped optional?

fileprivate(set) var identifier: ResolvedIdentifier!

If there's a possibility for this value to be nil, then shouldn't it be an optional value?

All the tests pass without this check, so I'm not clear on whether it's needed.

This code didn't change, I just moved it.

If there's a possibility for this value to be nil, then shouldn't it be an optional value?

Yes, it can be nil. If you look at this code (PathHierarchy.swift L302-L305) you can see that for the nodes initialized from the symbol graph file, a node is only assigned an identifier if it has a symbol.

The reason why this exists is that the convert service build documentation with a "sparse" symbol graph file that only contains a single symbol. However, this symbol may have path components that represent symbols that aren't in that symbol graph. For example, if the SGF only contains someMethod() but that symbol is a member of SomeClass then that path hierarchy will create an "unfindable" node for "SomeClass" and use it to build up a documentation hierarchy. Since there isn't a symbol for that node, the convert service provides information about how that path component should be disambiguated via the knownDisambiguatedPathComponents argument. Having a node for the "SomeClass" path components enabled the content to write links like SomeClass/someMethod() or SomeClass-class/someMethod() despite the path hierarchy not knowing that there is a "SomeClass" symbol.

The reason for Error.unfindableMatch(node) is that in a convert service build with only one symbol, it's allowed to traverse past "SomeClass" but since SomeClass doesn't exist in the symbol graph, it's not allowed to resolve a link to only SomeClass.

There's a comment elsewhere in the path hierarchy implementation that talk about this:

Placeholder values, also called "unfindable elements" or "sparse nodes", are created when constructing a path hierarchy for a "partial" symbol graph file.

When the ConvertService builds documentation for a single symbol with multiple path components, the path hierarchy fills in placeholder nodes for the other path components. This ensures that the nodes in the hierarchy are connected and that there's the same number of nodes—with the same names—between the module node and the non-placeholder node as there would be in the full symbol graph.

The placeholder nodes can be traversed up and down while resolving a link—to reach a non-placeholder node—but the link will be considered "not found" if it ends at a placeholder node.

Makes sense, but I agree with @sofiaromorales this is confusing or even dangerous. We need to be 100% sure that the identifier will never be nil before trying to access it... Maybe that's the entire purpose of this function? To insure that no other code that uses PathHierarchy::Node's will ever see a nil identifier? Are nodes ever returned in any other code paths?

sofiaromorales · 2024-04-30T12:20:28Z

Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+Find.swift

@@ -18,14 +18,7 @@ extension PathHierarchy {
    /// - Returns: Returns the unique identifier for the found match or raises an error if no match can be found.
    /// - Throws: Raises a ``PathHierarchy/Error`` if no match can be found.


Before, it was very easy to state that this method returned a PathHierarchy/Error. However, now that this method calls another very extensive method with its own inner method calls, it's more difficult to assure that this - Throws: Raises a PathHierarchy/Error if no match can be found. is always true.

I checked all the other throw statements inside findNode, and it seems like they all throw a PathHierarchy/Error. However, since there are multiple nested calls, I find it very easy to mistakenly throw another type of error without noticing this comment.

Any thoughts about this?

The comment still holds true. All errors raised from PathHierarchy/find(path:parent:onlyFindSymbols:) are PathHierarchy/Error. In the future when we eventually update to Swift 6 we can adopt typed throws here to have the compiler enforce this.

sofiaromorales · 2024-04-30T12:23:15Z

Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+Find.swift

@@ -305,7 +310,8 @@ extension PathHierarchy {
                }

                // Couldn't resolve the collision by look ahead.
-                return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
+//                return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)


Suggested change

// return try handleCollision(node: node, remaining: remaining, collisions: collisions, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)

d-ronnqvist · 2024-04-30T12:46:29Z

@swift-ci please test

patshaughnessy

Looks good and I was able to verify the fix works using your instructions. However, reviewing this was a bit tricky since it's not obvious how moving the calls to validateReturnValue actually fixes the problem. I wasn't able to understand the impact of this change until I discovered in the debugger that we catch the exceptions in various places and continue on with the algorithm.

I wonder if this code could be more easily understood if we changed the return value of searchForNode to be a new structure that contains the node and maybe some error info, that would only be returned by validateReturnValue. This would make it less likely that some future developer would accidentally return a node from some new code path they added in searchForNode. Right now it seems error prone to assume that all possible code paths (and there are many!) from searchForNode call validateReturnValue. Adding a new struct could allow the Swift compiler to help us remember to call validateReturnValue in the future.

What do you think?

d-ronnqvist · 2024-05-03T13:12:22Z

I wonder if this code could be more easily understood if we changed the return value of searchForNode to be a new structure that contains the node and maybe some error info, that would only be returned by validateReturnValue. This would make it less likely that some future developer would accidentally return a node from some new code path they added in searchForNode. Right now it seems error prone to assume that all possible code paths (and there are many!) from searchForNode call validateReturnValue. Adding a new struct could allow the Swift compiler to help us remember to call validateReturnValue in the future.

What do you think?

I don't quite know what I think...

I got curious so I locally rewrote searchForNode(...) two times to return Result<Node, PathHierarchy.Error> instead of being a throwing function that returns Node. Doing so makes impossible to straight up return the node without and can serve as a reminder to call validateReturnValue(_ node: Node) -> Result<Node, Error> before returning. However, it's still possible to write return .success(node) instead of calling validateReturnValue(...).

Also, since the current implementation already calls and catches some of the internal errors, the Result<Node, PathHierarchy.Error> implementation either needs to repeatedly break up the calling code

+ let children: DisambiguationContainer
+ let pathComponent: PathComponent
+ do {
-     let (children, pathComponent) = try findChildContainer(node: &node, remaining: remaining, rawPathForError: rawPathForError)
+     (children, pathComponent) = try findChildContainer(node: &node, remaining: remaining, rawPathForError: rawPathForError)
+ } catch let error as PathHierarchy.Error {
+     return .failure(error)
+ } catch {
+     fatalError("PathHierarchy only throws PathHierarchy.Error")
+ }

Or the implementation needs to nest multiple do statements to have certain parts that catch some internal errors

do {
    // Do work where we want to return if it raises an error
  
    do {
       // Do work where we want to handle certain errors specifically
    } catch DisambiguationContainer.Error.lookupCollision(let collisions) {
        do {
            // Do more work where we want to return if it raises an error
        }
    }
} catch let error as PathHierarchy.Error {
    return .failure(error)
} catch {
    fatalError("PathHierarchy only throws PathHierarchy.Error")
}

The latter can be written like above, where the inner do statements don't have their own catch statements and instead rely on the error being caught by an outer catch statement. This makes the control flow a little bit harder to follow since the reader need to scan the rest of the method to see where certain errors are caught from certain scopes. Alternatively, the inner do statements can repeat,

} catch let error as PathHierarchy.Error {
    return .failure(error)
} catch {
    fatalError("PathHierarchy only throws PathHierarchy.Error")
}

but that makes the implementation longer and full of do-catch blocks, which makes it harder to spot the business logic for all the error handling code.

One way to solve all this extra internal error handling syntax and make the implementation read better again is to change the the other internal methods to return Result<Node, PathHierarchy.Error> but if we do that then the compiler can no longer help catch cases where we forget to call validateReturnValue(...).

d-ronnqvist · 2024-05-03T13:43:08Z

I think if we want to ensure that validateReturnValue(...) is always called, the best way to move the implementation into an even more more private version and have searchForNode(...) call that and do the validation there:

private func searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    let node = try _searchForNode(descendingFrom: startingPoint, pathComponents: pathComponents, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
    
    /// Ensure that the return value is valid for this search.
    if node.identifier == nil {
        throw Error.unfindableMatch(node)
    }
    if onlyFindSymbols, node.symbol == nil {
        throw Error.nonSymbolMatchForSymbolLink(path: rawPathForError)
    }
    return node
}

private func _searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    // original implementation without validateReturnValue goes here.
}

If we want to be really, really, sure that no-one accidentally calls _searchForNode(...) instead of searchForNode() we could define it within the implementation of searchForNode()

private func searchForNode(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
    func _innerImplementation(descendingFrom startingPoint: Node, pathComponents: ArraySlice<PathComponent>, onlyFindSymbols: Bool, rawPathForError: String) throws -> Node {
        // original implementation without validateReturnValue goes here.
    }

    let node = try _innerImplementation(descendingFrom: startingPoint, pathComponents: pathComponents, onlyFindSymbols: onlyFindSymbols, rawPathForError: rawPathForError)
    
    /// Ensure that the return value is valid for this search.
    if node.identifier == nil {
        throw Error.unfindableMatch(node)
    }
    if onlyFindSymbols, node.symbol == nil {
        throw Error.nonSymbolMatchForSymbolLink(path: rawPathForError)
    }
    return node
}

That gives the most guarantees from the compiler but it may be a new pattern to the reader. However, once the pattern is understood, the inner implementation can be read independently and the reader can know that the extra return value validation will always run.

I'll change to code to follow this pattern.

d-ronnqvist · 2024-05-03T13:53:18Z

@swift-ci please test

d-ronnqvist · 2024-05-03T15:52:04Z

@swift-ci please test

patshaughnessy · 2024-05-03T16:51:22Z

"a new structure that contains the node and maybe some error info" ha ha I was describing the Result struct. I've used that in Rust before but not in Swift yet. It would be great to use it here but yes probably a larger rewrite of the search algorithm would be needed to make it work. Thanks for updating this!

d-ronnqvist · 2024-05-04T06:35:43Z

It would be great to use it here but yes probably a larger rewrite of the search algorithm would be needed to make it work.

Like I said above, I did rewrite two different versions of the search algorithm that used the Result type but discarded both because I felt that they made the error handling a more verbose without any making the control flow easier to follow and without any significant extra safeguards from the compiler.

patshaughnessy · 2024-05-04T14:42:38Z

Yes I saw that you tried rewriting this in a couple of different ways. Thank for looking into this idea!

I suppose what I mean by "larger rewrite" would be to look for an algorithm that doesn't use the error handling approach at all, or at least not with exceptions. Maybe what you meant with:

One way to solve all this extra internal error handling syntax and make the implementation read better again is to change the other internal methods to return Result<Node, PathHierarchy.Error>.

I think what you ended up with here this week is great.

rdar://126580675

Continues searching if a non-symbol matches a symbol link

6abed54

rdar://126580675

d-ronnqvist assigned sofiaromorales Apr 26, 2024

d-ronnqvist requested a review from sofiaromorales April 26, 2024 13:54

d-ronnqvist unassigned sofiaromorales Apr 26, 2024

Merge branch 'main' into non-symbol-match-to-symbol-link

45a398b

sofiaromorales reviewed Apr 30, 2024

View reviewed changes

sofiaromorales requested review from patshaughnessy and mayaepps April 30, 2024 12:23

Merge branch 'main' into non-symbol-match-to-symbol-link

b9badcf

patshaughnessy approved these changes May 3, 2024

View reviewed changes

sofiaromorales approved these changes May 3, 2024

View reviewed changes

Use an inner function to guarantee searchForNode return value validation

cfa4103

d-ronnqvist merged commit aa0a99e into apple:main May 3, 2024
2 checks passed

d-ronnqvist deleted the non-symbol-match-to-symbol-link branch May 3, 2024 16:30

d-ronnqvist added a commit to d-ronnqvist/swift-docc that referenced this pull request May 6, 2024

Continues searching if a non-symbol matches a symbol link (apple#901)

fcc2301

rdar://126580675

d-ronnqvist mentioned this pull request May 6, 2024

[6.0] Continues searching if a non-symbol matches a symbol link (#901) #909

Merged

d-ronnqvist added a commit that referenced this pull request May 7, 2024

Continues searching if a non-symbol matches a symbol link (#901) (#909)

6b83d37

rdar://126580675

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Continues searching if a non-symbol matches a symbol link #901

Continues searching if a non-symbol matches a symbol link #901

d-ronnqvist commented Apr 26, 2024

d-ronnqvist commented Apr 26, 2024

d-ronnqvist commented Apr 26, 2024

sofiaromorales Apr 30, 2024

d-ronnqvist Apr 30, 2024

d-ronnqvist Apr 30, 2024

patshaughnessy May 3, 2024

sofiaromorales Apr 30, 2024

d-ronnqvist Apr 30, 2024

sofiaromorales Apr 30, 2024

d-ronnqvist commented Apr 30, 2024

patshaughnessy left a comment

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

patshaughnessy commented May 3, 2024

d-ronnqvist commented May 4, 2024

patshaughnessy commented May 4, 2024

		@@ -18,14 +18,7 @@ extension PathHierarchy {
		/// - Returns: Returns the unique identifier for the found match or raises an error if no match can be found.
		/// - Throws: Raises a ``PathHierarchy/Error`` if no match can be found.

Continues searching if a non-symbol matches a symbol link #901

Continues searching if a non-symbol matches a symbol link #901

Conversation

d-ronnqvist commented Apr 26, 2024

Summary

Dependencies

Testing

Checklist

d-ronnqvist commented Apr 26, 2024

d-ronnqvist commented Apr 26, 2024

sofiaromorales Apr 30, 2024

Choose a reason for hiding this comment

d-ronnqvist Apr 30, 2024

Choose a reason for hiding this comment

d-ronnqvist Apr 30, 2024

Choose a reason for hiding this comment

patshaughnessy May 3, 2024

Choose a reason for hiding this comment

sofiaromorales Apr 30, 2024

Choose a reason for hiding this comment

d-ronnqvist Apr 30, 2024

Choose a reason for hiding this comment

sofiaromorales Apr 30, 2024

Choose a reason for hiding this comment

d-ronnqvist commented Apr 30, 2024

patshaughnessy left a comment

Choose a reason for hiding this comment

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

d-ronnqvist commented May 3, 2024

patshaughnessy commented May 3, 2024

d-ronnqvist commented May 4, 2024

patshaughnessy commented May 4, 2024