Character Literals (#1934) #1964

Merged
91 commits by cabmeurer, Sep 4, 2022 to Jun 15, 2023
373 changes: 373 additions & 0 deletions proposals/p1964.md
@@ -0,0 +1,373 @@
# Character literals

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/1964)

<!-- toc -->

## Table of contents

- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
- [Types](#types)
- [Operations](#operations)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)
- [No distinct character types](#no-distinct-character-types)
- [No distinct character literal](#no-distinct-character-literal)
- [Supporting prefix declarations](#supporting-prefix-declarations)
- [Allowing numeric escape sequences](#allowing-numeric-escape-sequences)
- [Supporting formulations of grapheme clusters and non-code-point code-units](#supporting-formulations-of-grapheme-clusters-and-non-code-point-code-units)
- [Future Work](#future-work)
- [UTF code unit types proposal](#utf-code-unit-types-proposal)

<!-- tocstop -->

## Abstract

This proposal specifies lexical rules for constant characters in Carbon:

Put character literals in single quotes, like `'a'`. Character literals work
like numeric literals:

- Every different literal value has its own type.
- The bit width is determined by the type of the variable the literal is
assigned to, not the literal itself.
- A character literal must contain exactly one code point.

This follows the plan from the open design idea
[#1934: Character Literals](https://github.com/carbon-language/carbon-lang/issues/1934).

## Problem

Carbon currently has no lexical syntax for character literals, and only provides
string literals and numeric literals. We wish to provide a distinct lexical
syntax for character literals versus string literals.

The advantage of having an explicit character type fundamentally comes down to
characters being represented as integers whereas strings are represented as
buffers. This will allow characters to have different operations, and be more
familiar to use. For example:

```
if (c >= 'A' and c <= 'Z') {
  c += 'a' - 'A';
}
```

The example above shows how characters can be used with operations similar to
those on integers. Supporting comparison and arithmetic operations provides an
intuitive way to work with characters. It removes the type conversions and extra
control flow that are otherwise needed to work with a single-element string. See
[Rationale](#rationale) for more examples where characters are more appropriate
than strings.
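
For contrast, here is a minimal Python sketch of the same case conversion in a
language where a character is a string of length one. The function name
`to_lower_ascii` is hypothetical and only for illustration. The `ord`/`chr`
round trip is the kind of conversion logic the paragraph above refers to.

```python
def to_lower_ascii(c: str) -> str:
    """Lower-case a single ASCII letter held in a length-one string."""
    if "A" <= c <= "Z":
        # Strings cannot be adjusted numerically, so convert to an integer,
        # apply the offset, and convert back to a string.
        return chr(ord(c) + (ord("a") - ord("A")))
    return c
```

With the proposed character literals, the body collapses to the comparison and
the `+=` shown in the Carbon example.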

## Background

A character literal is a type of literal used to represent a single character's
value within the source code of a program. Character literals differ in minor
ways between languages but are fundamentally designed for the same purpose.
Languages that have a dedicated character data type generally include character
literals, for example C++, Java, and Swift. Languages that lack a distinct
character type, like Python, use strings of length one to serve the same purpose
as a character data type. For more information, see
[Character Literals Wiki](https://en.wikipedia.org/wiki/Character_literal) and
[Character Literals DBpedia](https://dbpedia.org/page/Character_literal).

## Proposal

Put character literals in single quotes, like `'a'`. Character literals work
like numeric literals:

- Every different literal value has its own type.
- The bit width is determined by the type of the variable the literal is
assigned to, not the literal itself. This follows the plan from #1934.
- A character literal will model single Unicode code points that have a single
concrete numerical representation. We will not be supporting other
formulations like code unit sequences or grapheme clusters, as these will be
modeled with normal string literals.

## Details

- A character literal is a sequence of UTF-8 code units, enclosed in
single-quote (`'`) delimiters, that must be a valid encoding. This matches
[the UTF-8 encoding of Carbon source files](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0142.md#character-encoding).
- A character literal must encode exactly one code point.
- It supports addition and subtraction. These operations produce another code
point literal, or produce an error if the result is out of range.
- Character literals support some backslash (`\`) escape sequences, including
`\t`, `\n`, `\r`, `\"`, `\'`, `\\`, `\0`, and `\u{HHHH...}`. See
[String Literals: Escape sequence](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0199.md#escape-sequences).

We will not support:

- character literals that don't contain exactly one Unicode code point;
- multi-line literals;
- "raw" literals (using `#'x'#`);
- `\x` escape sequences;
- character literals with a single quote (`'`) or backslash (`\`), except as
part of an escape sequence;
- empty character literals (`''`);
- ASCII control codes (0...31), including whitespace characters other than
word space (tab, line feed, carriage return, form feed, and vertical tab),
except when specified with an escape sequence.
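
The rules in the two lists above can be sketched as a small validator. This is
an illustrative Python model, not the Carbon lexer: the function name
`parse_char_literal`, the error messages, and the choice to accept both
upper-case and lower-case hex digits in `\u{...}` are assumptions made for the
sketch.

```python
import string

# Escapes named in the list above, mapped to their code points.
SIMPLE_ESCAPES = {
    "t": 0x09, "n": 0x0A, "r": 0x0D,
    '"': 0x22, "'": 0x27, "\\": 0x5C, "0": 0x00,
}
MAX_CODE_POINT = 0x10FFFF


def parse_char_literal(token: str) -> int:
    """Return the code point of a literal such as "'a'", or raise ValueError."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not delimited by single quotes")
    body = token[1:-1]
    if not body:
        raise ValueError("empty character literal")

    if body[0] == "\\":
        if len(body) == 2 and body[1] in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[body[1]]
        if body.startswith("\\u{") and body.endswith("}"):
            digits = body[3:-1]
            if digits and all(d in string.hexdigits for d in digits):
                value = int(digits, 16)
                if value <= MAX_CODE_POINT:
                    return value
        # Covers `\x`, a lone backslash, and malformed or out-of-range `\u{...}`.
        raise ValueError("unsupported escape sequence")

    if len(body) != 1:  # Python strings are indexed by code point.
        raise ValueError("must contain exactly one code point")
    if body == "'" or ord(body) < 32:
        raise ValueError("character must be written as an escape sequence")
    return ord(body)
```

For example, `parse_char_literal("'a'")` yields the code point 97, while
`"''"`, `"'AB'"`, and a literal containing a raw tab are all rejected.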

### Types

For the time being we will support character types `Char8`, `Char16`, and
`Char32` that will hold both code units and code points, and will leave the
different UTF-encoding code unit types to another proposal. See
[UTF code unit types proposal](#utf-code-unit-types-proposal).

We will have the type `CharN` and only support literals that map directly to the
complete value of a code point.

```
let allowed: CharN = 'a';
```

The above is allowed because the type of `'a'` is the character literal
consisting of the single Unicode code point 97, which can be converted to
`CharN` since 97 is less than or equal to 0x7F.

### Operations

Character literals representing a single code point support the following
operators:

- Comparison: `<`, `>`, `<=`, `>=`, `==`
- Plus: `+`. This doesn't concatenate, but allows numerically adjusting the
value:
    - Only one operand may be a character literal; the other must be an
      integer literal.
Comment on lines +185 to +186

Contributor:

One consequence appears to be that this is invalid:

fn Digit(n: i8) -> Char32 {
  return '0' + n;
}

... and something like `return ('0' as Char32) + n;` would be needed instead. I think I'm OK with that, but I expect it to be a minor source of friction as that kind of usage is fairly common in C++.

Contributor:

Given that `'0' + n` has a value that is only known at runtime, what type should it be? Using the type of `n` here is a bit worrisome, due to overflow. I would be fine with saying the result would be `Char32`, but maybe that would only make sense for some types of `n`? For example if `n: i64`, a `Char32` result would be surprising.

Contributor:

I think I like keeping this super-explicit for now (requiring a cast to a specific sized type). We can try to add better defaults if in practice this friction is something users dislike. I'm somewhat hopeful that instead we can have a really easily discovered (and targeted in migration) API for mapping to digits, and avoid how much this comes up in practice. But it seems easy to address if it does come up.

    - The result is the character literal whose numeric value is the sum of
      the numeric values of the operands. If that sum is not a valid Unicode
      code point, it is an error.
- Subtract: `-`. This will subtract the value of two characters, or of a
character followed by an integer literal:
    - If the `-` is used between two character literals, the result will be an
      integer constant. For example, `'z' - 'a'` is equivalent to `25`.
    - If the `-` is used between a character literal followed by an integer
      literal, this will produce a character constant. For example, `'z' - 4`
      is equivalent to `'v'`.
    - If the `-` is used between an integer literal followed by a character
      literal, as in `100 - 'a'`, this will be rejected unless the integer is
      cast to a character.

There is intentionally no implicit conversion from character literals to integer
types, but explicit conversions are permitted between character literals and
integer types. Carbon will separate the integer types from character types
entirely.
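
The operator rules above can be modeled with a small Python class. This is a
sketch of the literal-level semantics only: the class name `CharLiteral` and
the use of Python exceptions to stand in for compile-time errors are
assumptions, and runtime values such as `'0' + n` from the review thread are
out of scope.

```python
from dataclasses import dataclass

MAX_CODE_POINT = 0x10FFFF


@dataclass(frozen=True, order=True)  # order=True supplies <, >, <=, >=
class CharLiteral:
    code_point: int

    def __post_init__(self):
        if not 0 <= self.code_point <= MAX_CODE_POINT:
            raise ValueError("not a valid Unicode code point")

    def __add__(self, other):
        # Character + integer adjusts the value; character + character is
        # not defined, so Python raises TypeError for it.
        if isinstance(other, int):
            return CharLiteral(self.code_point + other)
        return NotImplemented

    __radd__ = __add__  # integer + character

    def __sub__(self, other):
        if isinstance(other, CharLiteral):
            return self.code_point - other.code_point  # integer result
        if isinstance(other, int):
            return CharLiteral(self.code_point - other)  # character result
        return NotImplemented

    # No __rsub__: integer - character is rejected with TypeError.
```

Here `CharLiteral(ord("z")) - CharLiteral(ord("a"))` gives the integer `25`,
`CharLiteral(ord("z")) - 4` gives the literal for `'v'`, and
`100 - CharLiteral(ord("a"))` is rejected.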

## Rationale

This proposal supports the goal of making Carbon code
[easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
Adding support for a specific character literal supports clean, readable,
concise use and is a much more familiar concept that will make it easier to
adopt Carbon coming from other languages. Having a distinct character literal
will also allow us to support useful operations designed to manipulate the
literal's value. When working with an explicit character type, we can use
operators that have unique behavior. For example, say we wanted to advance a
character to the next literal. In other languages the `+` operator is often used
for concatenation, so using a `String` will produce a type error: `"a" + 1`.
However, with a character literal, we can support operations for these use
cases:

```
var b: u8;

b = 'a' + 1;
b + 1 == 'c';
```

See [Operations](#operations) and
[No Distinct Character Literal](#no-distinct-character-literal) for more
information.

Further, this design follows other standards set in place by previous proposals.
For example, it follows
[String Literals: Escaping Sequence](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0199.md#escape-sequences-1)
and represents characters as integers with behavior in line with
[Integer Literals](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0143.md).

This also supports our goal for
[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
by ensuring that every kind of character literal that exists in C++ can be
represented in a Carbon character literal. This is done in a way that is natural
to adopt, understand, and read, by having explicit character types mapped to
the C++ character types and the correct associated encoding.

Finally, the choice to use Unicode and UTF-8 by default reflects the Carbon goal
to prioritize
[modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments).
This reflects the
[growing adoption of UTF-8](https://en.wikipedia.org/wiki/UTF-8#Adoption).

## Alternatives considered

### No distinct character types

Unlike C++, Carbon will separate the integer and the character types. We
considered using `u8`, `u16`, and `u32` instead of `Char8`, `Char16`, and
`Char32` to reduce the number of different types users needed to be aware of and
convert between. We decided against it because it came with a number of
disadvantages:

- `u8`, `u16`, and `u32` have the wrong arithmetic semantics: we don't want
wrapping, and many `uN` operations, like multiplication, division, and
shift, are not meaningful on code units. There may be rare cases where you
want to use those operations, such as if you're implementing a conversion to
or from code units. But in those rare cases it would be reasonable for the
user to convert to an integer type to perform that operation and convert
back when done.
- Some operations want to be able to tell the difference between values that
are intended to be UTF-8 and values that have no specified encoding.
- Some operations want to be able to know that they've been given text rather
than random bytes of data. For example, `Print(0x41 as u8)` would be
expected to print `"65"` while `Print('\u{41}')` and `Print(0x41 as Char8)`
would be expected to print `"A"`.
- It's useful for developers to document the intended meaning of a value, and
using a distinct type is one way to do that.

See [UTF code unit types proposal](#utf-code-unit-types-proposal) for more
information about UTF encoding types for a future proposal.

### No distinct character literal

In principle, a character literal can be represented by reusing string literals,
similar to how Python handles character literals. However, this would prevent
performing operations on characters as integers. For example, the `+` operator
on strings is used for concatenation, but `+` on a character would change its
value.

```
// `digit` must be in the range 0..9.
fn DigitToChar(digit: i32) -> Char8 {
  return '0' + digit;
}
```

Furthermore, many properties of Unicode characters are defined on ranges of code
points, motivating supporting comparison operators on code points.

```
fn IsDingBatCodePoint(c: Char32) -> bool {
  return c >= '\u{2700}' and c <= '\u{27BF}';
}
```

### Supporting prefix declarations

No support is proposed for prefix declarations like `u`, `U`, or `L`. In
practice they are used to specify character literal types and their encoding
in languages like C and C++. There are several benefits to omitting prefix
declarations: improved readability, and a simpler way of determining a
character's type and how character literals are encoded. When declaring a
character literal, the type is based on the contents of the character, so that
`var c: u8 = 'a'` is a valid character that can be converted to `u8`. To
support prefix declarations, we would need to extend our type system to have
other explicit type checks like in C++: UTF-16 `u'`, UTF-32 `U'`, and wide
characters `L'`. This would be more familiar for individuals coming to Carbon
from a C++ background, and would simplify our approach for C++ interoperability,
but at the cost of diverging from existing standards. For example,
[Proposal 142](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0142.md#character-encoding)
states that all Carbon source code should be UTF-8 encoded. Prefix declarations
would detract from the readability of character literals and increase the
complexity of character literal [Types](#types).

### Allowing numeric escape sequences

This proposal does not support numeric escape sequences using `\x`. This
simplifies the design of character types and literals, making them only
represent code points and not code units. However this does come with the
disadvantage of less consistency of character literals with string literals,
since they now accept different escape sequences. We don't want to remove
numeric escape sequences from string literals, so that strings can still support
use cases like representing invalid encodings.

This approach has the additional concern that if character literals don't
support numeric escape sequences, developers may choose to use numeric literals
instead, at a cost of type-safety and readability. For example, it isn't clear
in `var first_digit: Char8 = 0;` whether `0` is supposed to be a `NUL` character
or the encoding of the `'0'` character (48). We addressed this concern, and type
safety concerns about distinguishing numbers and characters, by making the
integer to character conversions explicit.

### Supporting formulations of grapheme clusters and non-code-point code-units

Rather than explicitly limiting character literals to a more integer-like
representation of a single Unicode code point, we could allow character literals
to represent grapheme clusters and non-code-point code units. What
humans tend to think of as a "character" corresponds to a "grapheme cluster."
The encoding of a grapheme cluster can be arbitrarily long and complex, which
would sacrifice the ability to perform integer operations. If we wanted to add
support for other character formulations, we would need to use separate
spellings to represent a small set of operations that are today expressed with
integer-based math on C++'s character literals. This includes things like
converting an integer between 0 and 9 into the corresponding digit character, or
computing the difference between two digits/two other characters. For these
reasons, we have decided to start out by representing character literals as
single Unicode code points following a more integer-like model. However this
topic should be revisited if we find that there is a significant need for the
additional functionality and attendant complexity for these other character
formulations.
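
To make the difference between a code point and a grapheme cluster concrete,
the following Python sketch writes the same user-perceived character, "é", in
two ways. The helper name `code_points` is hypothetical. The decomposed form
has no single integer value, which is why integer-style arithmetic does not
carry over to grapheme clusters.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (U+0301).
decomposed = "e\u0301"
# The same grapheme cluster as a single precomposed code point (U+00E9).
composed = unicodedata.normalize("NFC", decomposed)


def code_points(text: str) -> list:
    """List the Unicode code points that make up `text`."""
    return [ord(ch) for ch in text]


print(code_points(decomposed))  # two code points: 0x65 and 0x301
print(code_points(composed))    # one code point: 0xE9
```

Under this proposal only the composed form could be written as a character
literal, since it is exactly one code point.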

## Future Work

### UTF code unit types proposal

There have been several ideas and discussions around how we would like to handle
UTF code units. This section will hopefully provide some guidance for a future
proposal when the topic is revisited for how we would like to build out
encoding/decoding for character literals.

We will have the types `Char8`, `Char16`, and `Char32` representing code units
in UTF-8, UTF-16, and UTF-32. We will not support all code units, only
those which map directly to the complete value of a code point. However,
character literals will use their own types distinct from these:

- We will support value preserving implicit conversions from character
literals to code point or code unit types. In particular, a character
literal converts to a `Char8` UTF-8 code unit if it is less than or equal to
0x7F, and `Char16` UTF-16 code unit if it is less than or equal to 0xFFFF.
- Conversions from string or character literals to a non-value-preserving
encoding must be explicit.
- Conversions from string literals to Unicode strings are implicit, even
though the numeric values of the encoding may change.

Whether a particular literal can be represented in a variable's type can be
determined by looking only at the types.

```
let allowed: Char8 = 'a';
```

The above is allowed because the type of `'a'` is the character literal
consisting of the single Unicode code point 97, which can be converted to
`Char8` since 97 is less than or equal to 0x7F.

```
let error1: Char8 = '😃';
let error2: Char8 = 'AB';
```

However these should produce errors. The type of `'😃'` is the character literal
consisting of the single Unicode code point `0x1F603`, which is greater than
0x7F. The type of `'AB'` is a character literal that is a sequence of two
Unicode code points, which has no conversion to a type that only handles a
single UTF-8 code unit.
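
The conversion rule described above can be sketched as a range check. This is
a Python model under stated assumptions: the function name
`converts_implicitly` is hypothetical, a literal is represented by the list of
code points it contains, and the `Char32` bound (all Unicode code points) is
assumed, since the text only gives the bounds for `Char8` and `Char16`.

```python
# Upper bounds taken from the conversion rules above; the `Char32` bound
# is an assumption made for this sketch.
MAX_VALUE = {"Char8": 0x7F, "Char16": 0xFFFF, "Char32": 0x10FFFF}


def converts_implicitly(code_points, target):
    """True if a literal made of `code_points` converts to `target`."""
    if len(code_points) != 1:
        # A literal such as 'AB' holds two code points and never converts.
        return False
    return 0 <= code_points[0] <= MAX_VALUE[target]


allowed = converts_implicitly([ord("a")], "Char8")   # 97 <= 0x7F
error1 = converts_implicitly([0x1F603], "Char8")     # 0x1F603 > 0x7F
error2 = converts_implicitly([0x41, 0x42], "Char8")  # two code points
```

The check depends only on the literal's value and the target type, matching
the point that the outcome is determined by the types alone.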

Both `'\n'` and `'\u{A}'` represent the same character and so have the same
type. However, explicitly converting this character literal to another character
set might result in a character with a different value, but that still
represents the newline character.
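
As an illustration of that last point, the Python sketch below encodes the
newline character in ASCII and in `cp037`, an EBCDIC code page included with
Python's standard codecs. Using EBCDIC as the "other character set" is an
assumption chosen for the example; the proposal does not name one.

```python
newline = "\n"
same_character = newline == "\u000A"  # both spellings are code point 10

ascii_bytes = newline.encode("ascii")
ebcdic_bytes = newline.encode("cp037")

# The numeric value differs between the two character sets, but decoding
# the EBCDIC byte gives back the same newline character.
value_changed = ebcdic_bytes != ascii_bytes
round_trip = ebcdic_bytes.decode("cp037")
```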