Proofing
johnno1962 committed Feb 20, 2024
1 parent 40c2321 commit ef0b101
Showing 1 changed file with 82 additions and 29 deletions.
111 changes: 82 additions & 29 deletions proposals/0243-character-operators.md
@@ -1,106 +1,159 @@
# Character Literal Opertaors
# Character Literal Operators

* Proposal: [SE-0243](0243-character-operators.md)
* Authors: [Dianna ma (“Taylor Swift”)](https://github.com/kelvin13), [John Holdsworth](https://github.com/johnno1962)
* Review manager: [Ben Cohen](https://github.com/airspeedswift)
* Status: **Second review**
* Implementation: [apple/swift#NNN](https://github.com/apple/swift/compare/main...johnno1962:swift:character-ops?expand=1)
* Implementation: [apple/swift#71749](https://github.com/apple/swift/pull/71749)
* Threads: [1](https://forums.swift.org/t/prepitch-character-integer-literals/10442) [2](https://forums.swift.org/t/se-0243-codepoint-and-character-literals/21188) [3](https://forums.swift.org/t/single-quoted-character-literals-why-yes-again/61898)

## Introduction

This proposal reboots efforts to improve the ergonomics of the Swift language for a class of code involved in parsing, for example JSON or the Swift language itself. Whereas previously it was thought a single quoted syntax for these literals could be pressed into service alongside integer express-ability it was realised that adding few well chosen operators to the standard library could serve the most pressing use cases. That this works and is performant has been demonstrated in [this PR](https://github.com/apple/swift-syntax/pull/2439#issuecomment-1922292277) to the `swift-syntax` library where the readability of code was increased with an ever so slight improvement in performance.
This proposal improves Swift's character-literal ergonomics. This support is fundamental not only to parsing tasks within the Swift language but also to tasks that require developers to extract and manipulate data. Areas that would benefit include handling domain-specific languages (DSLs) and parsing commonly-used data formats such as JSON. Any workflow based on lexical analysis or tokenization requirements will gain from this proposal.

The Swift community previously considered single-quote syntax for character literals. While working on Swift's Lexer code, another solution came to light. Adding well-chosen operators to the Standard Library tidied up the Lexer implementation with minimal impact on the language. These operators leave the single quote free for future use, serve the most pressing use cases effectively, and demonstrated small but measurable performance improvements.

This improvement was validated through our work on [PR 2439](https://github.com/apple/swift-syntax/pull/2439#issuecomment-1922292277). The patch showcased how to streamline character-binary integer interchange for low level code. This proposal offers the same readable solution that seamlessly integrates with the established character and style of Swift. Additionally, it provides a slight performance boost, making it a valuable enhancement for performant code.

## Motivation

At present, the rather cumbersome constructs `UInt8(ascii: "c")` or perhaps `UnicadeScalar("c").value` are the only interface to the binary integer equivalent of `unicode scalars` in the Swift language. Data is not always UInt8 or UInt32 however so, frequently these have to be combined with a cast and overall the ergonomics are sub-optimal. For example, when wanting to `switch` over a range of values as in the previous version of the lexer code cluttered with `UInt8(ascii: "x")` in the PR mentioned above.
Swift's existing character-literal constructs are hard to read and laborious to write. Contorted expressions like `UInt8(ascii: "c")` and `UnicodeScalar("c").value` provide Swift's current entry points to the binary integer equivalent of Unicode scalars.

Since data is not always `UInt8` or `UInt32`, these expressions frequently must be combined with casts, and the ergonomics suffer. Consider the lexer code as it stood before [PR 2439](https://github.com/apple/swift-syntax/pull/2439#issuecomment-1922292277). To `switch` over a range of values, that implementation was cluttered with expressions like the previously mentioned `UInt8(ascii: "x")`.
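As a rough illustration of the status quo described above, the following sketch uses only existing standard library API. The function name `isStructural` and the sample values are hypothetical and chosen purely for illustration; they are not taken from the lexer code.

```swift
// Status quo: every byte comparison needs an explicit UInt8(ascii:) conversion.
// `isStructural` is a hypothetical helper, not part of swift-syntax.
func isStructural(_ byte: UInt8) -> Bool {
    switch byte {
    case UInt8(ascii: "{"), UInt8(ascii: "}"),
         UInt8(ascii: "["), UInt8(ascii: "]"):
        return true
    default:
        return false
    }
}

// Other integer widths need a cast on top of the scalar's UInt32 value.
let scalarValue: UInt32 = UnicodeScalar("c").value
let narrowed = UInt16(UnicodeScalar("c").value)
```

The conversions are correct, but they crowd out the characters being matched, which is the readability problem this section describes.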

Swift deserves better.

Swift allows you to define operators for equivalence and also for the pattern matching used in `switch` statements. It is sufficient therefore to add binary operators to allow direct comparisons between integer types and unicode scalars. This approach has been shown to be perfectly performant and effectively "compiles down to the same thing".
Our proposed change has precedent. Swift allows you to define custom operators for both equivalence and the pattern matching used in `switch` statements and elsewhere. Adding binary operators allows direct comparisons between integer types and Unicode scalars. The more readable source effectively compiles down to the same thing as the explicit conversions.
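A minimal sketch of this idea follows. It declares module-local versions of two of the operators (without the standard library's underscored attributes) so that it can be tried today. The function `classify` and its return strings are assumptions made for illustration only.

```swift
// Hypothetical local declarations mirroring the shape of the proposed operators.
extension UInt8 {
    // Equality between a byte and a Unicode scalar.
    // UInt8(ascii:) traps if the scalar is not ASCII.
    static func == (i: UInt8, s: Unicode.Scalar) -> Bool {
        return i == UInt8(ascii: s)
    }

    // Pattern match used by `switch`: the pattern (a scalar) comes first.
    static func ~= (s: Unicode.Scalar, i: UInt8) -> Bool {
        return i == UInt8(ascii: s)
    }
}

// `classify` is an illustrative helper, not part of any library.
func classify(_ byte: UInt8) -> String {
    switch byte {
    case "{", "}": return "brace"
    case "[", "]": return "bracket"
    case ",": return "comma"
    default: return "other"
    }
}
```

With these overloads in scope, the double-quoted literals in each `case` are inferred as `Unicode.Scalar`, so the `switch` reads like the text being parsed while still operating on raw bytes.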

## Proposed solution

Specifically, this proposal puts forward that the following code be added to the file: stdlib/public/core/UnicodeScalar.swift in the standard library:
We propose to introduce the following code to "stdlib/public/core/UnicodeScalar.swift" in the Swift standard library:

```Swift
/// Allows direct comparisons between UInt8 and double quoted literals.
/// Extends `UInt8` to allow direct comparisons with double quoted literals.
extension UInt8 {
/// Basic equality operator
/// Returns a Boolean indicating whether the `UInt8` is equal to the provided Unicode scalar.
///
/// - Parameters:
/// - i: The `UInt8` value to compare.
/// - s: The Unicode scalar to compare against.
/// - Returns: `true` when the `UInt8` is equal to the provided Unicode scalar; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func == (i: Self, s: Unicode.Scalar) -> Bool {
return i == UInt8(ascii: s)
}
/// Basic inequality operator

/// Returns a Boolean indicating whether the `UInt8` is not equal to the provided Unicode scalar.
///
/// - Parameters:
/// - i: The `UInt8` value to compare.
/// - s: The Unicode scalar to compare against.
/// - Returns: `true` if the `UInt8` is not equal to the provided Unicode scalar; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func != (i: Self, s: Unicode.Scalar) -> Bool {
return i != UInt8(ascii: s)
}
/// Used in switch statements

/// Enables pattern matching of Unicode scalars in switch statements.
///
/// - Parameters:
/// - s: The Unicode scalar to match.
/// - i: The `UInt8` value to match against.
/// - Returns: `true` if the Unicode scalar matches the `UInt8` value; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
return i == UInt8(ascii: s)
}
}

/// Extends `Optional<UInt8>` to allow direct comparisons with double quoted literals.
extension UInt8? {
/// Optional equality operator
/// Returns a Boolean value indicating whether the optional `UInt8` is equal to the provided Unicode scalar.
///
/// - Parameters:
/// - i: The optional `UInt8` value to compare.
/// - s: The Unicode scalar to compare against.
/// - Returns: `true` if the optional `UInt8` is equal to the provided Unicode scalar; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func == (i: Self, s: Unicode.Scalar) -> Bool {
return i == UInt8(ascii: s)
}
/// Optional inequality operator

/// Returns a Boolean value indicating whether the optional `UInt8` is not equal to the provided Unicode scalar.
///
/// - Parameters:
/// - i: The optional `UInt8` value to compare.
/// - s: The Unicode scalar to compare against.
/// - Returns: `true` if the optional `UInt8` is not equal to the provided Unicode scalar; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func != (i: Self, s: Unicode.Scalar) -> Bool {
return i != UInt8(ascii: s)
}
/// Used in switch statements

/// Allows pattern matching of Unicode scalars in switch statements.
///
/// - Parameters:
/// - s: The Unicode scalar to match.
/// - i: The optional `UInt8` value to match against.
/// - Returns: `true` if the Unicode scalar matches the optional `UInt8` value; otherwise, `false`.
@_transparent @_alwaysEmitIntoClient
public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
return i == UInt8(ascii: s)
}
}

/// Extends `Array` where Element is a FixedWidthInteger, providing initialization from a string of Unicode scalars.
extension Array where Element: FixedWidthInteger {
/// Initialise an Integer array with "unicode scalars"
@inlinable @_alwaysEmitIntoClient @_unavailableInEmbedded
public init(scalars: String) {
self.init(scalars.unicodeScalars.map { Element(unicode: $0) })
}
/// Initializes an array of Integers with Unicode scalars represented by the provided string.
///
/// - Parameter scalars: A string containing Unicode scalars.
@inlinable @_alwaysEmitIntoClient @_unavailableInEmbedded
public init(scalars: String) {
self.init(scalars.unicodeScalars.map { Element(unicode: $0) })
}
}

/// Extends `FixedWidthInteger` providing initialization from a Unicode scalar.
extension FixedWidthInteger {
/// Construct with value `v.value`.
/// Initializes a FixedWidthInteger with the value of the provided Unicode scalar.
///
/// - Parameter unicode: The Unicode scalar to initialize from.
/// - Note: Construct with value `v.value`.
@inlinable @_alwaysEmitIntoClient
public init(unicode v: Unicode.Scalar) {
_precondition(v.value <= Self.max,
"Code point value does not fit into type")
"Code point value does not fit into type")
self = Self(v.value)
}
}
```

The last initialiser can be considered optional but could be considered as an alternative to the existing `IntX(UnicodeScalar("c").value)` incantation people are expected to discover at the moment for non-ascii code points.
This last initializer is optional. It provides an alternative to the existing `IntX(UnicodeScalar("c").value)` incantation currently needed for non-ASCII code points.
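To show how the two initializers might be used, here is a sketch that declares module-local equivalents. It substitutes the public `precondition` for the standard library's internal `_precondition` and omits the underscored attributes. The constants `euro` and `method` are illustrative values only.

```swift
// Hypothetical local equivalents of the proposed initializers.
extension FixedWidthInteger {
    // Traps if the scalar's code point does not fit in Self.
    init(unicode v: Unicode.Scalar) {
        precondition(v.value <= Self.max,
                     "Code point value does not fit into type")
        self = Self(v.value)
    }
}

extension Array where Element: FixedWidthInteger {
    // Builds an integer array from the Unicode scalars of a string.
    init(scalars: String) {
        self.init(scalars.unicodeScalars.map { Element(unicode: $0) })
    }
}

// A non-ASCII code point (U+20AC) stored in a 16-bit integer.
let euro = UInt16(unicode: "€")

// The ASCII bytes of a short keyword.
let method = [UInt8](scalars: "GET")
```

Because the initializer traps rather than truncating, a call such as `UInt8(unicode: "€")` would stop at the precondition instead of silently producing a wrong byte.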

## Source compatibility

The operators proposed are additive and after running the existing test suite the net effect seems to be to change the diagnostics given on some pattern matching code which was invalid anyway. The last initialiser proposed can affect what is currently valid code such as the following:
Our proposed operator suite is additive. After running the existing test suite, the net effect appears to be a change in the diagnostics given for some pattern-matching code that was already invalid. Finally, the last initializer (the one we noted as optional) may affect currently valid code, such as the following:

```Swift
unicodeScalars.map(UInt32.init)
```

Becomes ambiguous and needs to be rewritten explicitly as:

Upon adoption, this becomes ambiguous and will need to be rewritten explicitly as:

```Swift
unicodeScalars.map { UInt32($0) }
```

## Effect on ABI stability

The new operator have been annotated with `@_alwaysEmitIntoClient` so any code using them will back-port to versions of the Swift runtime before these operators were added.
Each new operator has been annotated with `@_alwaysEmitIntoClient`. Any code that adopts these operators will back-port to versions of the Swift runtime before these operators were added.

## Effect on API resilience

The operators are straightforward and it is not anticipated they would need to evolve their ABI.
The operators are simple and focused. We don't anticipate the need to evolve their ABI.

## Alternatives considered

There is a long history to the proposal and this is a much scaled back version with less collateral impact on the language than previously reviewed proposals which still satisfy the main use cases. It uses the features of the language rather than changing the language itself. One could argue that users would still be able to define these operators themselves but in the end it is a question of whether this would be a battery sufficiently useful to be included in the standard library. Including them would help discovery as something that people might have previously expected to work coming from another language would "simply work".
This proposal emerges from a history of consideration. This scaled-back proposal presents less collateral impact on the language than previously reviewed proposals. At the same time, it satisfies the most important use cases.

Our proposed approach embraces Swift's existing language features rather than changing the language to reach its solution. We believe this enhancement is sufficiently useful to merit inclusion in the Standard Library. It will support and improve Swift's tooling and provide better ergonomics for Swift's user base without forcing Swift adopters to write their own solutions. Its inclusion will promote discovery, providing a feature that people might have expected to "simply work".
