Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support u8 type suffix for UTF8 string literals. #58991

Merged

Conversation

AlekseyTs
Copy link
Contributor

@AlekseyTs AlekseyTs commented Jan 21, 2022

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jan 21, 2022

Slightly worried about two lang features going in at the same time that affect strings literals :-)


In reply to: 1018191692

@AlekseyTs
Copy link
Contributor Author

AlekseyTs commented Jan 21, 2022

@cston, @RikkiGibson Please review.


In reply to: 1018610271

@RikkiGibson RikkiGibson self-assigned this Jan 21, 2022
Copy link
Contributor

@RikkiGibson RikkiGibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments and questions


if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8')
{
info.Kind = SyntaxKind.UTF8StringLiteralToken;
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the above assignment to info.Kind be in an else below this if? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the above assignment to info.Kind be in an else below this if?

That is an alternative way to get to the same result. If you prefer, I can refactor the code that way.


if (TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8')
{
info.Kind = SyntaxKind.UTF8StringLiteralToken;
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the above assignment to info.Kind be in an else below this if? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an alternative way to get to the same result. If you prefer, I can refactor the code that way.

static void Main()
{
var y = (C2)""dog""u8;
var z = (C3)""cat""u8;
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why an error is produced here. I thought "cat"u8 has natural type ReadOnlySpan<byte> and then we can just use the user-defined conversion. Maybe we can sync offline? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why an error is produced here.

You are correct. This doesn't follow the proposed specification. I guess, I forgot to add a PROTOTYPE comment about that in the Binder. I decided to go with byte[] as the natural type for now. That allows us to get away without special conversion rules for UTF8 string literals (do not confuse with string constant expressions, UTF8 string literals are not constants) because there are already implicit user-defined conversions from byte[] to Span<byte> and ReadOnlySpan<byte>, and there are no existing conversions from ReadOnlySpan<byte> to byte[] or Span<byte>. It feels like the language rules are simpler. Also, even when we are creating a ReadOnlySpan<byte>, we are creating a byte[] first. It feels like byte[] is a better choice for a natural type.

[Fact]
public void Errors_01()
{
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_07.
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "suffixes" #Resolved

[Fact]
public void Errors_05()
{
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_06.
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "suffixes" #Resolved

[Fact]
public void Errors_08()
{
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_07.
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "suffixes" #Resolved

[Fact]
public void Errors_12()
{
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_06.
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "suffixes" #Resolved

[Fact]
public void Interpolation_01()
{
UsingExpression(@"$""hello""u8",
Copy link
Contributor

@RikkiGibson RikkiGibson Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to modify the proposal to specify what happens when @ or $ prefix is used, and either:

  • the u8 suffix is used, or
  • the new utf8 conversion is used

Intuitively I thought that a constant-valued interpolated string would be fine to use with the u8 suffix, but that doesn't appear to be the case. I don't have a strong position on that, but it seems like whatever is decided should be included in the proposal.
#Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure where confusion is coming from. I think the proposal is clear: "the language will provide the u8 suffix on string literals to force the type to be UTF8." The suffix is for literals, not for constant expressions. Interpolated strings are not literals, even when they have a constant value.

@@ -82,6 +82,7 @@ private static bool AreTokensEquivalent(GreenNode? before, GreenNode? after, Fun
case SyntaxKind.NumericLiteralToken:
case SyntaxKind.CharacterLiteralToken:
case SyntaxKind.StringLiteralToken:
case SyntaxKind.UTF8StringLiteralToken:
Copy link
Member

@cston cston Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyntaxKind.UTF8StringLiteralToken

Are we testing this case? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we testing this case?

Not explicitly. I'll make note in test plan

";
var comp = CreateCompilation(source, targetFramework: TargetFramework.NetCoreApp, options: TestOptions.DebugExe);
CompileAndVerify(comp, expectedOutput: @"byte[]"). // The behavior has changed
VerifyDiagnostics();
Copy link
Member

@cston cston Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider adding:

comp = CreateCompilation(source, targetFramework: TargetFramework.NetCoreApp, parseOptions: TestOptions.Regular10, options: TestOptions.DebugExe);
CompileAndVerify(comp, expectedOutput: "string");
``` #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry in "docs/compilers/CSharp/Compiler Breaking Changes - DotNet 7.md".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario will become an error. We will not change semantics based on language version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry in "docs/compilers/CSharp/Compiler Breaking Changes - DotNet 7.md".

I'll add a PROTOTYPE comment about that for now. We might still change the design around this

CompileAndVerify(comp, expectedOutput: @"array").VerifyDiagnostics();
}

[ConditionalFact(typeof(CoreClrOnly))]
Copy link
Member

@cston cston Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ConditionalFact(typeof(CoreClrOnly))]

[Fact] #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.

").VerifyDiagnostics();
}

[ConditionalFact(typeof(CoreClrOnly))]
Copy link
Member

@cston cston Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ConditionalFact(typeof(CoreClrOnly))]

[Fact] #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.

);
}

[ConditionalFact(typeof(CoreClrOnly))]
Copy link
Member

@cston cston Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ConditionalFact(typeof(CoreClrOnly))]

[Fact] #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.

@@ -673,6 +673,9 @@ private BoundExpression BindExpressionInternal(ExpressionSyntax node, BindingDia
case SyntaxKind.NullLiteralExpression:
return BindLiteralConstant((LiteralExpressionSyntax)node, diagnostics);

case SyntaxKind.UTF8StringLiteralExpression:
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, in the raw-string-work we decided to not have a special expression type. it's just a string-literal that consumer can check the token kind on. up to you if you want to do the same. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, in the raw-string-work we decided to not have a special expression type. it's just a string-literal that consumer can check the token kind on. up to you if you want to do the same.

I will add a PROTOTYPE comment for now and will revisit this once raw-string-work gets into this branch.

@@ -77,7 +77,18 @@ private void ScanStringLiteral(ref TokenInfo info, bool inDirective)
}
else
{
info.Kind = SyntaxKind.StringLiteralToken;
if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8')
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this may follow the spec, but we probably want to take back to ldm if we support U8 as well as u8 all our other suffixes are case insensitive. so we probably want the same here. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may follow the spec, but we probably want to take back to ldm if we support U8 as well as u8 all our other suffixes are case insensitive. so we probably want the same here.

Will add a PROTOTYPE comment and will bring this design question to LDM

@@ -77,7 +77,18 @@ private void ScanStringLiteral(ref TokenInfo info, bool inDirective)
}
else
{
info.Kind = SyntaxKind.StringLiteralToken;
if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8')
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: for raw strings, i allowed them to be parsed in directives, but then give a particular message that it isn't allowed. up to you if you want to do the same. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: for raw strings, i allowed them to be parsed in directives, but then give a particular message that it isn't allowed. up to you if you want to do the same.

I do not believe it is worth the effort

@AlekseyTs AlekseyTs enabled auto-merge (squash) January 25, 2022 14:38
@AlekseyTs AlekseyTs merged commit d681e4e into dotnet:features/Utf8StringLiterals Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants