Open design idea: character literals #1934

Closed
zygoloid opened this issue Aug 6, 2022 · 11 comments
Labels
design idea: An issue recording a specific language design idea that folks can potentially pick up.
long term: Issues expected to take over 90 days to resolve.

Comments

@zygoloid
Contributor

zygoloid commented Aug 6, 2022

We currently have no lexical support for character literals. While in principle we could reuse string literals for this purpose, it would make code more self-documenting and readable if we had a distinct lexical syntax for character literals versus string literals. Based on the syntactic choices made in other languages, single-quoted text ('x') seems likely to be the best option here. We need a proposal for this.

One significant open question is how character literals should interact with Unicode: should they represent a single UTF-8 code unit, a single code point, a single grapheme cluster, or something else? The choice seems to depend on what type is being initialized by the literal: for example, something akin to C++'s char8_t would want a single UTF-8 code unit, whereas something akin to C++'s char32_t would want a single Unicode code point, and a Glyph type in a text rendering system may want something more complex, such as a base character plus a sequence of combining characters.
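To make the distinction between code units and code points concrete, here is a minimal C++ sketch. The function names are hypothetical and not part of any Carbon or C++ library, and the code point count assumes well-formed UTF-8 input. Grapheme clusters are deliberately left out because segmenting them requires Unicode property tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The number of UTF-8 code units is simply the byte length.
inline std::size_t CountCodeUnits(const std::string& utf8) {
  return utf8.size();
}

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8.
inline std::size_t CountCodePoints(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

// Illustrative checks (hypothetical helper).
inline void CheckUnicodeCountExamples() {
  // U+00E9 (precomposed e-acute): one code point, two code units.
  assert(CountCodeUnits("\xC3\xA9") == 2);
  assert(CountCodePoints("\xC3\xA9") == 1);
  // 'e' followed by U+0301 (combining acute): two code points, three
  // code units, yet a reader sees a single character.
  assert(CountCodeUnits("e\xCC\x81") == 3);
  assert(CountCodePoints("e\xCC\x81") == 2);
}
```

The second case is the one a Glyph-like type would care about: it is a base character plus a combining character, so it fits neither a single code unit nor a single code point.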

Design idea: Treat character literals exactly the same way as simple string literals but with ' delimiters instead of ". Support neither multi-line literals nor "raw" literals using #'x'#, and disallow empty character literals so that ''' stays unambiguous. As with string literals, each character literal would have a different type.

In a little more detail: a character literal is a ' followed by a sequence of one or more non-newline, non-', non-\ characters and escape sequences, followed by another '. As with string literals, the type of a character literal would depend on its contents, so 'a' and 'b' would have different types, as would 'a' and "a", but '\n' and '\u{A}' would have the same type. No restriction is placed on the number of UTF-8 code units in a character literal, but conversions from character literal type to each kind of character type would be supported only for characters that are representable, in the same way as we restrict integer literals to only convert to integer types in which they are representable.
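The lexical rule in the paragraph above could be sketched as a small recognizer. This is an illustration in C++, not the Carbon toolchain's lexer: `IsCharacterLiteral` is a hypothetical name, and the escape set it accepts (`\n`, `\r`, `\t`, `\0`, `\'`, `\"`, `\\`, and `\u{...}` with hex digits) is a simplifying assumption rather than a rule stated in this issue.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `text` is exactly one character literal: a ', then one
// or more plain characters or escape sequences, then a closing '.
// Assumption: the escape set is a simplified, hypothetical one.
inline bool IsCharacterLiteral(const std::string& text) {
  const std::size_t size = text.size();
  // Shortest literal is 'x' (three characters), so '' is rejected here.
  if (size < 3 || text.front() != '\'' || text.back() != '\'') {
    return false;
  }
  const std::size_t end = size - 1;  // Index of the closing quote.
  std::size_t i = 1;
  while (i < end) {
    const char c = text[i];
    // Plain content may not contain a newline or an unescaped quote.
    if (c == '\n' || c == '\'') return false;
    if (c != '\\') {
      ++i;
      continue;
    }
    // Escape sequence: something must follow the backslash before the
    // closing quote.
    if (++i >= end) return false;
    const char escape = text[i];
    if (escape == 'u') {
      // \u{H...}: at least one hex digit inside braces.
      ++i;
      if (i >= end || text[i] != '{') return false;
      ++i;
      std::size_t digits = 0;
      while (i < end && std::isxdigit(static_cast<unsigned char>(text[i]))) {
        ++i;
        ++digits;
      }
      if (digits == 0 || i >= end || text[i] != '}') return false;
      ++i;
    } else if (std::string("nrt0'\"\\").find(escape) != std::string::npos) {
      ++i;
    } else {
      return false;
    }
  }
  return true;
}

// Illustrative checks (hypothetical helper).
inline void CheckLiteralExamples() {
  assert(IsCharacterLiteral("'a'"));
  assert(IsCharacterLiteral("'\\n'"));
  assert(IsCharacterLiteral("'\\u{A}'"));
  assert(!IsCharacterLiteral("''"));   // Empty literals are not allowed.
  assert(!IsCharacterLiteral("'''"));  // Unescaped quote inside.
}
```

Note that the recognizer accepts `'ab'`: in line with the idea above, the lexical rule places no limit on length, and the restriction to a single representable character would be enforced later, when the literal is converted to a character type.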

@zygoloid zygoloid added the good first issue and design idea labels Aug 6, 2022
@opelolo

opelolo commented Aug 6, 2022

Hi @zygoloid I'd love to give this a try but I'm new to open source contribution and I write Python.

Although I have read the issue, I'm not sure I understand what's expected. Could you please link the source code or have some screenshots added for clarity? Thank you

@cabmeurer
Contributor

I am also interested in this. @opelolo, you can take it if you're still interested, or we can collaborate if you want.

@opelolo

opelolo commented Aug 8, 2022

Sure @cabmeurer we can work together. Although I am still confused as to what's expected. Do you understand the issue?

@geoffromer
Contributor

@opelolo This issue is asking for someone to write up a design of how character literals will work in Carbon, and take it through the proposal process. For previous examples of this sort of work, see #199 (which proposed string literals), and #143/#144 (which proposed numeric literals).

> I'd love to give this a try but I'm new to open source contribution and I write Python.

I'd definitely recommend finding a collaborator who's familiar with C++. The main task here is to write a document, not code, but interoperability and migration with C++ is a central focus of Carbon's design. Also, sometimes if there are implementability questions about a proposal, we ask for a proof-of-concept implementation in the Explorer, which is written in C++.

@cabmeurer
Contributor

cabmeurer commented Aug 8, 2022

@opelolo feel free to message me on the discord and we can get started!

Edit: Created a draft as per the proposal process

@zygoloid @geoffromer could one of you add the proposal label to draft when you have some time?

@geoffromer
Contributor

> @zygoloid @geoffromer could one of you add the proposal label to draft when you have some time?

Sure, done!

@Aniumbott

Hello @geoffromer, I'm good at C++ I can work on this or I can team up with them as well.

@geoffromer
Contributor

> Hello @geoffromer, I'm good at C++ I can work on this or I can team up with them as well.

I don't think it makes sense to have multiple groups working on this independently, so I think you should coordinate with @cabmeurer and @opelolo.

@github-actions

github-actions bot commented Dec 6, 2022

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time.
This issue is labeled inactive because the last activity was over 90 days ago.

@github-actions github-actions bot added the inactive label Dec 6, 2022
@eeshvardasikcm

This comment was marked as off-topic.

@github-actions github-actions bot removed the inactive label Dec 15, 2022
@josh11b josh11b added the long term label Dec 21, 2022
@zygoloid zygoloid removed the good first issue label Mar 10, 2023
chandlerc added a commit that referenced this issue Jun 15, 2023
Put character literals in single quotes, like `'a'`. Character literals work
like numeric literals:

-   Every different literal value has its own type.
-   The bit width is determined by the type of the variable the literal is
    assigned to, not the literal itself. Follows the plan from #1934.

Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: josh11b <josh11b@users.noreply.github.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
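
The rule that the destination type fixes the bit width, combined with the issue's restriction to representable characters, can be sketched as a check on the literal's value. This C++ sketch rests on an assumption the thread does not confirm: that "representable" means the code point encodes as a single code unit of the target width (UTF-8 for 8 bits, UTF-16 for 16 bits, UTF-32 for 32 bits). `FitsInOneCodeUnit` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `code_point` can be stored in one code unit of the given
// bit width. Assumption: 8, 16 and 32 bits mean UTF-8, UTF-16 and UTF-32.
inline bool FitsInOneCodeUnit(std::uint32_t code_point, int bit_width) {
  // Surrogates and values above U+10FFFF are not Unicode scalar values.
  if (code_point > 0x10FFFF ||
      (code_point >= 0xD800 && code_point <= 0xDFFF)) {
    return false;
  }
  switch (bit_width) {
    case 8:
      return code_point <= 0x7F;  // Only ASCII is a single UTF-8 code unit.
    case 16:
      return code_point <= 0xFFFF;  // Basic Multilingual Plane.
    case 32:
      return true;
    default:
      return false;  // Other widths are outside this sketch.
  }
}

// Illustrative checks (hypothetical helper).
inline void CheckRepresentabilityExamples() {
  assert(FitsInOneCodeUnit(0x61, 8));       // 'a'
  assert(!FitsInOneCodeUnit(0xE9, 8));      // U+00E9 needs two UTF-8 units.
  assert(FitsInOneCodeUnit(0xE9, 16));
  assert(!FitsInOneCodeUnit(0x1F600, 16));  // Needs a surrogate pair.
  assert(FitsInOneCodeUnit(0x1F600, 32));
}
```

This mirrors how an integer literal converts only to integer types that can hold its value: the literal itself carries no width, and the check happens at the point of conversion.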
@chandlerc
Contributor

Fixed with #1964!

8 participants