Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Symbol Graph Format Extensible #39

Merged
merged 14 commits into from
Sep 14, 2022

Conversation

theMomax
Copy link
Contributor

@theMomax theMomax commented Jun 9, 2022

This PR is required to allow SwiftDocC to restructure the Symbol Graph Format after decoding the Symbol Graph Files. This is necessary to implement extensions to external types (fix apple/swift-docc#210).

Summary

This PR contains four major changes:

  • change Symbol.KindIdentifier to use a struct-type with static properties for known values
  • allow for registering new Symbol.KindIdentifiers to ensure identifiers get parsed correctly
  • change Codable conformance of Symbol and Relationship to allow for encoding and decoding unknown Mixins
  • change Hashable conformance of Relationship so unknown Mixins can be taken into account while hashing

Please refer to SymbolKit.docc/GraphFormatExtensions.md to see how the new APIs are to be used.

The most critical part of this implementation is how to handle decoding (and encoding) of custom Symbol.KindIdentifiers and Mixins. Symbol (and Relationship's) Codable implementations must know how to encode/decode the custom types. This PR solves this problem by having the instance defining the custom extensions register the extensions to the Encoder/Decoder. This is realized via Encoder/Decoder's userInfo. This choice has two key advantages over registering the extensions to the static context:

  • no need for synchronization
  • different extension views on the symbol graph format can coexist within one executable without influencing each other

Next to the user info based approach, this PR also offers the option to register Symbol.KindIdentifier to the static context for convenience.

#7 only provided a solution for Symbol.KindIdentifier which did not register custom identifiers in a global context, but stored the raw value in the Symbol.KindIdentifier locally. This, however, has one disadvantage: Custom identifiers are parsed differently than those defined by SymbolKit. The language prefix cannot be removed, since the initializer cannot know, if the first section of the identifier is a language, or already important for the language-agnostic identifier.

Dependencies

This PR should be merged after #45 as this PR is based on that branch.

Furthermore, this PR incorporates similar changes as #7 which seems to be stale.

Testing

I added test cases that:

  • check encoding/decoding works with custom mixins
  • Symbol.KindIdentifiers registered on the static context are parsed as expected and appear in allCases
  • Symbol.KindIdentifiers registered on a decoder are decoded as expected and do not appear in allCases
  • Hashable conformance of Relationships works as intended with unknown Mixins which do or do not conform to Hashable

Benchmark

Benchmarked this PR (e6478fd) against main (cfa80d7) using Swift-DocC's bin/benchmark.swift. Swift-DocC was on main (ed1770f974c3a1c6d2d1437cc07bc90a8f220c89). I used a bundle generated using swift run make-test-bundle --sizeFactor 50.

There is no significant change in runtime:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Duration for 'bundle-registration' (sec) | +0.54%     | 12.01                | 12.08                |
| Duration for 'convert-total-time' (sec)  | +0.28%     | 19.92                | 19.98                |
| Duration for 'documentation-processing'  | -0.15%     | 7.38                 | 7.37                 |
| Duration for 'finalize-navigation-index' | no change  | 0.24                 | 0.24                 |
| Peak memory footprint (MB)               | -0.21%     | 1966.29              | 1962.25              |
| Data subdirectory size (MB)              | no change  | 538.01               | 538.01               |
| Index subdirectory size (MB)             | no change  | 4.84                 | 4.84                 |
| Total DocC archive size (MB)             | no change  | 566.93               | 566.93               |
| Topic Anchor Checksum                    | no change  | 08d0a84bec913460905f | 08d0a84bec913460905f |
| Topic Graph Checksum                     | no change  | 74d25c4f1ee185ad5676 | 74d25c4f1ee185ad5676 |
+-----------------------------------------------------------------------------------------------------+

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary
  • Ran Benchmarks

@theMomax theMomax force-pushed the graph-format-extensibility branch from 67877ce to 22adbd3 Compare July 7, 2022 16:06
@theMomax theMomax changed the title [WIP] Update Symbol+Relationship Kinds and Make Symbol Graph Format Extensible [WIP] Make Symbol Graph Format Extensible Jul 7, 2022
@theMomax theMomax force-pushed the graph-format-extensibility branch from 22adbd3 to d735e76 Compare July 7, 2022 16:18
@theMomax theMomax changed the title [WIP] Make Symbol Graph Format Extensible Make Symbol Graph Format Extensible Jul 8, 2022
@theMomax theMomax marked this pull request as ready for review July 8, 2022 10:03
@franklinsch
Copy link
Member

Hi @theMomax, I haven't reviewed these changes in detail yet, but I'm curious about these:

  • allow for registering new Symbol.KindIdentifiers to ensure identifiers get parsed correctly
  • change Codable conformance of Symbol and Relationship to allow for encoding and decoding unknown Mixins
  • change Hashable conformance of Relationship so unknown Mixins can be taken into account while hashing

I'm not sure I follow why these are required for apple/swift-docc#210, can you elaborate? Are we expecting clients like Swift-DocC to declare their own kind identifiers?

@theMomax
Copy link
Contributor Author

Thanks, @franklinsch, these are very valid questions.

  • allow for registering new Symbol.KindIdentifiers to ensure identifiers get parsed correctly
  • change Codable conformance of Symbol and Relationship to allow for encoding and decoding unknown Mixins
  • change Hashable conformance of Relationship so unknown Mixins can be taken into account while hashing

I'm not sure I follow why these are required for apple/swift-docc#210, can you elaborate?

Actually, none of those are strictly required by apple/swift-docc#335 to resolve apple/swift-docc#210.

However, before I get into detail on this, let me first answer your second question:

Are we expecting clients like Swift-DocC to declare their own kind identifiers?

Yes, we do. This corresponds to the first point:

  • change Symbol.KindIdentifier to use a struct-type with static properties for known values

My PR apple/swift-docc#335 makes use of this by declaring five new kind identifiers. This change is strictly required as the symbols with the new kind identifiers have to be processed and displayed differently from other symbols (see e.g. here).

Now, coming back to your first point about why the lower three points are required. I see them as a pure consequence of SymbolGraph.Symbol conforming to Codable and Hashable.

  • allow for registering new Symbol.KindIdentifiers to ensure identifiers get parsed correctly

If we allow users of SymbolKit to define new symbol kind identifiers, these custom identifiers should be treated exactly like the predefined ones. If I hadn't introduced this change, parsing would behave like here in #7. This behavior from #7 could lead to a lot of trouble with the KindIdentifier's Equatable conformance, which simply compares the identifier string. I.e. the declared KindIdentifier(rawValue: "custom") and the identifier decoded from "swift.custom" or "c.custom" would be unequal.

  • change Codable conformance of Symbol and Relationship to allow for encoding and decoding unknown Mixins
  • change Hashable conformance of Relationship so unknown Mixins can be taken into account while hashing

I definitely expected Symbol/Relationship to consider custom mixins that conform to Codable in the encoding process before I looked at the implementation. Even though Mixin does not require conformance to Codable/Hashable, I believe that SymbolKit should take all mixins into account which conform to the respective protocol.

Mixins are an awesome way to attach additional information to a symbol graph, which could be very interesting for many applications. For example, imagine a tool that attaches authorship information or other git metadata for automated generation of change-protocols/differential documentation. If, however, custom mixins are not considered in the encoding/decoding process, then these applications have no other choice than maintaining a fork of SymbolKit.

Copy link
Member

@franklinsch franklinsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the summary above, that's super helpful. I agree that the work you're doing is a good opportunity to make the symbol graph coding logic more flexible.

I think the extension kinds you defined in apple/swift-docc#335 could also belong directly in SymbolKit though, since they map well to existing kinds. I'd love to hear more of your thoughts about why you preferred the approach of declaring them in Swift-DocC instead. Should some existing kinds and mixins move to Swift-DocC and should SymbolKit treat them as opaque values instead?

I really appreciate how you've clearly spent time designing an API that makes it easy for clients to define their own mixins and kinds. I have some thoughts on how we could possibly simplify this code, as the extra flexibility it offers does result in more complexity.

/// - Note: When working in an uncontrolled environment where other parts of your executable might be disturbed by your
/// modifications to the symbol graph structure, register identifiers on your coder instead using
/// ``CustomizableCoder/register(symbolKinds:)`` and maintain your own list of ``allCases``.
public static func register(_ identifiers: Self...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registering these at the static level on this model seems odd to me from an architecture perspective. It limits how SymbolKit can be used within the same client, e.g., if you're decoding symbol graphs from two different languages that use different kinds within the same client. Is there a reason we can't just parse the value as a string and just wrap it in a KindIdentifier value, without requiring clients to configure something ahead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registering these at the static level on this model seems odd to me from an architecture perspective. It limits how SymbolKit can be used within the same client, e.g., if you're decoding symbol graphs from two different languages that use different kinds within the same client.

That's exactly the reason why I provide the alternative of registering the custom kinds to the Encoder/Decoder. However, I think that working with the static context is easier, if working in a controlled environment where there is only one "view" of the symbol graph format. KindIdentifier provides static functionality such as allCases, and I think this functionality should still work when extending the graph format.

Is there a reason we can't just parse the value as a string and just wrap it in a KindIdentifier value, without requiring clients to configure something ahead?

There might be some consequences for memory allocation as already discussed in #7, but I think the biggest problem is the Equatable conformance. For known kinds, SymbolKit removes the language specifier prefix, i.e. swift.enum is recognized as just enum and stored this way. Therefore, KindIdentifier(rawValue: "enum") == KindIdentifier(identifier: "swift.enum") yields true (note that the right initializer is used for decoding, whereas the left one is used for declaring the valid kinds in code). This is the behavior we want, as the language tag is handled elsewhere.

If we don't know all valid kinds (because we don't allow custom kinds to be registered), we can't just throw away the prefix, as it might be important (as for enum.case). Thus, swift.custom must be stored as swift.custom and KindIdentifier(rawValue: "custom") == KindIdentifier(identifier: "swift.custom") yields false. This is very bad, if one wants to handle the kinds in a language agnostic way (as it is currently done in Swift-DocC).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. This implementation of register(_:) as written will create data races in Swift-DocC due to the way it loads symbol graphs and symbols concurrently. This is why change KindIdentifier into a struct to handle unknown symbol kinds #7 originally had its set synchronized, and ultimately did away with the set entirely. The fact that these are registered separately from decoding sidesteps the issue somewhat, but doesn't prevent a data race if registering these symbol kinds on-demand.
  2. I briefly looked into this a while back, but is it possible to register the symbol's "current language" to the decoder before decoding the symbol kind while it's being parsed, then storing that language token to KindIdentifier as well as the parsed kind if it's read in the raw identifier? That way, we can reliably detect that swift.class and objc.class are the same kind, while also letting new languages participate in fully-parsed symbol kinds without needing to conform to SymbolKit's existing listing. This could also be an enhancement that happens later on, but i'd be curious to see whether we can make it work.

Comment on lines 109 to 112
let encoder: ((Self, Mixin, inout KeyedEncodingContainer<CodingKeys>) throws -> ())?
let decoder: ((Self, KeyedDecodingContainer<CodingKeys>) throws -> Mixin?)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be storing these in this model. Also, I can see why you've made these closures for type-erasure purposes, but could we instead just keep a mapping from keys to the types they encode to/decode from, such as a [String: Mixin.Type] dictionary? Then, we'd look up the mixin type for a given key and use that for decoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would you like them to be stored? I can only build the encoder/decoder closures from inside an extension to Mixin, otherwise I cannot access the Self I need for calling container.decode/container.encode.

I can see why you've made these closures for type-erasure purposes, but could we instead just keep a mapping from keys to the types they encode to/decode from, such as a [String: Mixin.Type] dictionary? Then, we'd look up the mixin type for a given key and use that for decoding.

I don't think that's possible with our target language version (might be possible from Swift 5.6):

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing the error you're seeing with Swift 5.7. It's worth trying with 5.6 as well, but we should consider bumping the Swift version to 5.6 here, or guarding this code in an #if swift(>=5.7) check. It would remove the need for the type-erasing closures, and we can document that custom mixins are only available in 5.6/5.7. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would definitely be possible. I'd prefer to make a clean cut, though. If we want to go this route, I'd prefer to bump the language version for SymbolKit (and all of apple/swift-docc) to 5.7 and make Mixin Hashable and Codable in one move. I think it would be really weird if bumping the language version to 5.7 in a project using SymbolKit would cause it to behave entirely different.

Sources/SymbolKit/SymbolGraph/Symbol/Symbol.swift Outdated Show resolved Hide resolved
// However, if this `Mixin` does not conform to `Equatable`,
// the `maybeEquatable` merely checks that the compared elements'
// types match.
var maybeEquatable: MaybeEquatable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These mechanisms to implement equality and hashing are ingenious, but I think they're overly-complicated for this use case. If Relationship is Equatable but some of its members are not (e.g., some mixins), then maybe these should not contribute to equality checking of Relationship. It would be the client's responsibility to verify equality for their custom mixins that are Equatable. I'm leaning towards SymbolKit checking equality of Equatable mixins defined in SymbolKit, and to document that clients should do equality checking for their own mixins if required.

What is the purpose of Relationship and Symbol being Equatable anyway? cc @QuietMisdreavus

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relationship is Hashable so that Swift-DocC can build sets of them when resolving external symbols. Equatable is there because Hashable requires it. As far as i can tell, Symbol does not implement either of these protocols (though now it could, given that there's a mechanism for Mixin to be hashed/equated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards SymbolKit checking equality of Equatable mixins defined in SymbolKit, and to document that clients should do equality checking for their own mixins if required.

I disagree. I think this would make handling Symbols/Relationships very annoying for clients with custom mixins. E.g. if you wanted to use the types as a key in a Dictionary, you'd always have to wrap them in a type that implements the additional equality constraints.

What would be a situation where you think my approach would be problematic @franklinsch? Or is this more about code cleanliness/performance concerns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added flexibility is nice, my concern is more about SymbolKit having an internal expectation that mixins are Equatable when in fact that's not enforced by the type system. Should we just make Mixin Equatable and Hashable, then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with making Mixin Equatable and Hashable is that it restricts us to Swift 5.7 - before then there's no way to use Mixin generically since the Equatable definition needs to reference the type in question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my concern is more about SymbolKit having an internal expectation that mixins are Equatable when in fact that's not enforced by the type system

SymbolKit can rightfully make the assumption that everything important for equality will be considered in Symbol/Relationship's Equatable conformance. SymbolKit knows for itself all predefined Mixins are Equatable and the rest is up to those defining custom Mixins.

Should we just make Mixin Equatable and Hashable, then?

If we were to do that, Mixin would get "self or associated type requirements", which (pre Swift 5.7) would break A LOT of source code. I'd only do that if we also bump SymbolKit's minimum language version to 5.7.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to bumping SymbolKit to 5.7 on the main branch, which tracks the minor/major release after 5.7. The SymbolKit API is still evolving, and clients that need to use <5.7 can depend on SymbolKit's release/5.7 branch instead. Also, because SymbolKit is (mostly) a model library that evolves alongside the Swift compiler, it's not unreasonable to align SymbolKit's compiler version requirements with the toolchain version, even though of course, backwards compatibility wherever we can would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would also mean bumping the language version for all of apple/swift-docc to 5.7 then? Don't get me wrong, I'm absolutely not opposed to adopting Swift 5.7 - after all, it would be a huge quality of life improvement for all maintainers. I'm just not sure if getting rid of this (admittedly rather complex and - consequently - very ugly) type-erasure code justifies such a huge breakage.

///
/// Next to logging warnings, the function allows for either re-throwing the error,
/// skipping the errornous entry, or providing a default value.
static func onDecodingError(_ error: Error) throws -> Mixin?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like it belongs on the Mixin model type. It's purely related to the coding process of the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also implement this using the userInfo on Encoder/Decoder meaning that one would have to also register this behavior for custom Mixins. Does that sound good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be better from an API perspective, yes. Could it be registered at the same time as the mixin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure. I'll try it that way!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in b93d460.

@QuietMisdreavus
Copy link
Contributor

I'm gradually reading through this; it'll take some time to digest! What i've read so far is pretty impressive. There are a couple issues i remember running into that are fixed by this PR.

@QuietMisdreavus
Copy link
Contributor

@swift-ci Please test

@theMomax
Copy link
Contributor Author

theMomax commented Jul 26, 2022

I think the extension kinds you defined in apple/swift-docc#335 could also belong directly in SymbolKit though, since they map well to existing kinds. I'd love to hear more of your thoughts about why you preferred the approach of declaring them in Swift-DocC instead. Should some existing kinds and mixins move to Swift-DocC and should SymbolKit treat them as opaque values instead?

I realized I had forgotten to answer this @franklinsch : To put it short, citing the SymbolKit Readme, SymbolKit is The specification and reference model for the Symbol Graph File Format.

The types I add in apple/swift-docc#335 are not part of the Symbol Graph File Format. I use them to bring the graph in a different structure that better fits DocC's document structure. However, this is an implementation detail of Swift-DocC and is irrelevant for the Swift compiler or other language integrations.

@theMomax
Copy link
Contributor Author

I just cleaned up the code again after moving the onDecodingError/onEncodingError functions out of the Mixin protocol. Since I needed to store said error handlers as closures anyway, I merged them with the type-erasure-closures for encoding/decoding. I feel like this looks way cleaner than before and makes me think again if we can avoid bumping the project to Swift 5.7. Please take another look at how it's implemented now.

Of course, there still is the Hashable implementation of Relationship. I think demanding the Hashable/Equatable conformance for all Mixins creates a much bigger inconvenience than assuming what doesn't conform to Equatable isn't important for equality. Of course, the < Swift 5.7 implementation requires 140 lines of ugly type-erasure-code, however, the behavior is documented and the code can be easily replaced with a leaner Swift 5.7 implementation if we decide to drop support for lower language versions in the future.

@franklinsch
Copy link
Member

This looks much cleaner indeed, thanks for doing that! Regarding the rest of the type-erasure code, I'm personally ok with bumping to 5.7 on the main branch of Swift-DocC as well, since the primary purpose of that repository is to build the docc executable for the Swift toolchain, and clients of the SwiftDocC library can depend on the release/5.7 branch if they can't use the latest tools for some reason. For example, if a client uses SwiftDocC to decode Render JSON, you'd expect the version of SwiftDocC they're using to match the version of the Swift toolchain documentation was used to built. @QuietMisdreavus what are your thoughts on this?

I think demanding the Hashable/Equatable conformance for all Mixins creates a much bigger inconvenience than assuming what doesn't conform to Equatable isn't important for equality.

Is it an inconvenience? I expect most Mixins to be model types that would be able to get the synthesised Equatable/Hashable requirements.

@theMomax
Copy link
Contributor Author

Is it an inconvenience? I expect most Mixins to be model types that would be able to get the synthesised Equatable/Hashable requirements.

The Equatable conformance itself is no inconvenience, no. There is an inconvenience when a Mixin should not contribute to equality, because you have to add a no-op/always true Hashable/Equatable implementation, however, that is only a minor inconvenience.

The bigger inconvenience is that - even in Swift 5.7 - one would have to add a lot of any and some keywords when dealing with the Mixin protocol. I'm getting 39 compiler errors in SymbolKit alone when I require Mixins to be Hashable. This would create a lot of noise in this PR, but especially in apple/swift-docc#335, which would make them very hard to review.

Therefore, I suggest to do either of the following:

  • keep the type-erasure-based Hashable conformance as is
  • revert the Hashable conformance to the previous state, where it ignores custom mixins

In both cases I could start two follow up PRs on this repo and apple/swift-docc that add the Hashable requirement to Mixin and bump the language version to 5.7 once apple/swift-docc#335 is merged.

Copy link
Contributor

@QuietMisdreavus QuietMisdreavus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts about the PR and its discussion so far:

I'm fine with keeping the type-erasure code in SymbolKit, though (as i mention below) i would prefer if it were refactored into its own file(s) and more thoroughly documented. It's clever, which is a good thing and a bad thing, but it also resolves an issue with SymbolKit's maintenance - that new mixins need to be accommodated in their parent types' corresponding Equatable/Hashable implementations.

I think updating SymbolKit and Swift-DocC to Swift 5.7 (and other versions in the future) is fine, specifically because of how tied they are to the Swift compiler itself. SymbolKit is effectively the model library for handling symbol graphs as they appear from the compiler. If we add something to the compiler, there's a corresponding change here. Similarly for Swift-DocC, since it's distributed with the Swift toolchains, there's less of an impetus to keep a strict backwards-compatibility story. In both of these cases, people who need to accommodate older compilers can use release branches with little loss of functionality. Specifically in SymbolKit's case, having the model match what you're expecting to parse is important - it's better to have exactly the information you need, instead of that information plus a bunch of empty fields and properties where future data would have gone. I also agree that that enhancement can come later on, to keep the source churn per-PR down.

Overall, i really like this PR. Great work!

/// - Note: When working in an uncontrolled environment where other parts of your executable might be disturbed by your
/// modifications to the symbol graph structure, register identifiers on your coder instead using
/// ``CustomizableCoder/register(symbolKinds:)`` and maintain your own list of ``allCases``.
public static func register(_ identifiers: Self...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. This implementation of register(_:) as written will create data races in Swift-DocC due to the way it loads symbol graphs and symbols concurrently. This is why change KindIdentifier into a struct to handle unknown symbol kinds #7 originally had its set synchronized, and ultimately did away with the set entirely. The fact that these are registered separately from decoding sidesteps the issue somewhat, but doesn't prevent a data race if registering these symbol kinds on-demand.
  2. I briefly looked into this a while back, but is it possible to register the symbol's "current language" to the decoder before decoding the symbol kind while it's being parsed, then storing that language token to KindIdentifier as well as the parsed kind if it's read in the raw identifier? That way, we can reliably detect that swift.class and objc.class are the same kind, while also letting new languages participate in fully-parsed symbol kinds without needing to conform to SymbolKit's existing listing. This could also be an enhancement that happens later on, but i'd be curious to see whether we can make it work.

@theMomax
Copy link
Contributor Author

theMomax commented Aug 1, 2022

I'm fine with keeping the type-erasure code in SymbolKit, though (as i mention below) i would prefer if it were refactored into its own file(s) and more thoroughly documented. It's clever, which is a good thing and a bad thing, but it also resolves an issue with SymbolKit's maintenance - that new mixins need to be accommodated in their parent types' corresponding Equatable/Hashable implementations.

I moved the code to separate files and added a detailed explanation. I changed the behavior a little. Previously, mixins that did not conform to Hashable/Equatable still contributed to the respective protocol implementation of Relationship in the following ways:

  • for hashing, their respective mixinKey was still hashed into the hasher
  • for equality, the count of non-equatable mixins was considered and the type of non-equatable mixins with the same mixinKey had to be equal

Now, mixins that are not Equatable/Hashable are ignored entirely. This unlocks even more flexibility and provides for a cleaner implementation.

The previous behavior can still be achieved by providing an "empty" Hashable conformance where hash(into:) does nothing and ==(lhs:rhs:) always returns true.

This implementation of register(_:) as written will create data races in Swift-DocC due to the way it loads symbol graphs and symbols concurrently. This is why #7 originally had its set synchronized, and ultimately did away with the set entirely. The fact that these are registered separately from decoding sidesteps the issue somewhat, but doesn't prevent a data race if registering these symbol kinds on-demand.

Yes, however, as you already pointed out, it is not intended to be used "on demand". I added a respective note about concurrent access to the documentation. If one wants to register kinds on demand in a concurrent decoding process, one can still do so by registering them on the decoder in a synchronized manner.

I briefly looked into this a while back, but is it possible to register the symbol's "current language" to the decoder before decoding the symbol kind while it's being parsed, then storing that language token to KindIdentifier as well as the parsed kind if it's read in the raw identifier? That way, we can reliably detect that swift.class and objc.class are the same kind, while also letting new languages participate in fully-parsed symbol kinds without needing to conform to SymbolKit's existing listing. This could also be an enhancement that happens later on, but i'd be curious to see whether we can make it work.

That is an interesting idea! Would the "current language" have to be set manually based on the file that is decoded? I think that would be rather annoying. One could also use the identifier.interfaceLanguage like this:

// Symbol.swift

public init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)
            identifier = try container.decode(Identifier.self, forKey: .identifier)
            kind = try container.decode(Kind.self, forKey: .kind)
            kind.identifier = kind.identifier.trimming(identifier.interfaceLanguage)

I'm not sure if the assumption interfaceLanguage == kind prefix holds in all edge-cases, though.

@QuietMisdreavus
Copy link
Contributor

That is an interesting idea! Would the "current language" have to be set manually based on the file that is decoded? I think that would be rather annoying.

The idea i had in mind (that i'm not sure is even possible) is to have the decode logic for Symbol set this after loading its identifier, but before loading its kind. If there was any other way to have this information "automatically", then that would be ideal.

@theMomax
Copy link
Contributor Author

theMomax commented Aug 6, 2022

The idea i had in mind (that i'm not sure is even possible) is to have the decode logic for Symbol set this after loading its identifier, but before loading its kind. If there was any other way to have this information "automatically", then that would be ideal.

I think that is not possible while using Codable. It would, however, be possible to first load identifier, and then decode kind using a simple struct that does no post-processing on the kind.identifier, but just stores the raw strings. Afterwards, one could just create the KindIdentifier using a custom initializer and the information from identifier and kind. That's essentially what the code-sample in my post above does.

What I dislike about this approach is that decoding behavior changes when decoding a Kind as part of a Symbol vs on its own or as part of any other type. That's bad for testability and could cause some unexpected behavior in the future.

@franklinsch
Copy link
Member

@QuietMisdreavus any remaining feedback here?

Copy link
Contributor

@QuietMisdreavus QuietMisdreavus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go now. Thanks so much!

@QuietMisdreavus
Copy link
Contributor

@swift-ci Please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SR-15410] Can't document extensions with DocC [SR-15982] SymbolKit crashes when encoding unknown mixins
4 participants