Open design idea: character literals #1934
Comments
Hi @zygoloid I'd love to give this a try but I'm new to open source contribution and I write Python. Although I have read the issue, I'm not sure I understand what's expected. Could you please link the source code or have some screenshots added for clarity? Thank you
I am also interested in this, @opelolo you can take it if you're still interested. Or if you want we can collaborate.
Sure @cabmeurer we can work together. Although I am still confused as to what's expected. Do you understand the issue?
@opelolo This issue is asking for someone to write up a design of how character literals will work in Carbon, and take it through the proposal process. For previous examples of this sort of work, see #199 (which proposed string literals), and #143/#144 (which proposed numeric literals).
I'd definitely recommend finding a collaborator who's familiar with C++. The main task here is to write a document, not code, but interoperability and migration with C++ is a central focus of Carbon's design. Also, sometimes if there are implementability questions about a proposal, we ask for a proof-of-concept implementation in the Explorer, which is written in C++.
@opelolo feel free to message me on the discord and we can get started! Edit: Created a draft as per the proposal process @zygoloid @geoffromer could one of you add the
Sure, done!
Hello @geoffromer, I'm good at C++ I can work on this or I can team up with them as well.
I don't think it makes sense to have multiple groups working on this independently, so I think you should coordinate with @cabmeurer and @opelolo.
Put character literals in single quotes, like `'a'`. Character literals work like numeric literals:
- Every different literal value has its own type.
- The bit width is determined by the type of the variable the literal is assigned to, not the literal itself.
Follows the plan from #1934.
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: josh11b <josh11b@users.noreply.github.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Fixed with #1964!
We currently have no lexical support for character literals. While in principle we could reuse string literals for this purpose, it would make code more self-documenting and readable if we had a distinct lexical syntax for character literals versus string literals. Based on the syntactic choices made in other languages, single-quoted text (`'x'`) seems likely to be the best option here. We need a proposal for this.
One significant open question is how character literals should interact with Unicode: should they represent a single UTF-8 code unit, a single code point, a single grapheme cluster, or something else? The choice seems to depend on what type is being initialized by the literal: for example, something akin to C++'s `char8_t` would want a single UTF-8 code unit, whereas something akin to C++'s `char32_t` would want a single Unicode code point, and a `Glyph` type in a text rendering system may want something more complex, such as a base character plus a sequence of combining characters.
Design idea: Treat character literals exactly the same way as simple string literals but with `'` delimiters instead of `"`. Do not support multi-line literals nor "raw" literals using `#'x'#`, and do not allow empty character literals to keep `'''` unambiguous. As with string literals, each character literal would have a different type.
In a little more detail: a character literal is a `'` followed by a sequence of one or more non-newline, non-`'`, non-`\` characters and escape sequences, followed by another `'`. As with string literals, the type of a character literal would depend on its contents, so `'a'` and `'b'` would have different types, as would `'a'` and `"a"`, but `'\n'` and `'\u{A}'` would have the same type. No restriction is placed on the number of UTF-8 code units in a character literal, but conversions from character literal type to each kind of character type would be supported only for characters that are representable, in the same way as we restrict integer literals to only convert to integer types in which they are representable.
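To make the design idea above concrete, here is a rough sketch in Python (not Carbon, and not taken from any Carbon implementation) of the lexical rule and the representability check. Everything in it is an assumption made for illustration: the function names `decode_char_literal`, `is_valid_char_literal`, and `fits`, the small table of simple escapes, and the two character kinds standing in for types akin to `char8_t` and `char32_t`. A real proposal would define the escape set and conversions precisely.

```python
# Hypothetical illustration only; names and the escape table are assumptions.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0",
                  "'": "'", '"': '"', "\\": "\\"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_char_literal(text: str) -> str:
    """Expand a token such as 'a' or '\\u{A}' into the characters it denotes."""
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError("character literal must be delimited by '")
    body = text[1:-1]
    if not body:
        # Empty literals are rejected, which keeps ''' unambiguous.
        raise ValueError("empty character literal")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in ("\n", "'"):
            raise ValueError("unescaped newline or quote in character literal")
        if ch != "\\":
            out.append(ch)  # ordinary character
            i += 1
        elif body.startswith("\\u{", i):
            end = body.find("}", i)
            digits = body[i + 3:end] if end != -1 else ""
            if not digits or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed \\u{...} escape")
            out.append(chr(int(digits, 16)))
            i = end + 1
        elif i + 1 < len(body) and body[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            raise ValueError("unknown escape sequence")
    return "".join(out)


def is_valid_char_literal(text: str) -> bool:
    """True if the token satisfies the lexical rule sketched above."""
    try:
        decode_char_literal(text)
        return True
    except ValueError:
        return False


def fits(value: str, kind: str) -> bool:
    """Whether a decoded literal is representable in a hypothetical character type."""
    if kind == "utf8_code_unit":  # something akin to C++'s char8_t
        return len(value) == 1 and ord(value) < 0x80
    if kind == "code_point":      # something akin to C++'s char32_t
        return len(value) == 1
    raise ValueError("unknown character kind")
```

In this sketch the decoded value plays the role of the literal's type: `'\n'` and `'\u{A}'` decode to the same value, while `'a'` and `'b'` do not. A literal holding a base character plus a combining mark passes the lexical check but fits neither kind, which matches the suggestion that only a richer type such as `Glyph` would accept it.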