Unicode tables used for checking identifiers seem to be very out of date #44284
Comments
Yes, that is an issue, and it's already tracked by #13474 and #9731.
The C# compiler uses character tables from the framework which it is running on. And since character tables in .Net Framework seem to be fairly outdated, while recent versions of .Net Core use newer tables, whether U+9FED is accepted in an identifier depends on the way you run the compiler. For example, consider the following program:

    class \u9FED
    {
        static void Main() {}
    }

When I compile it using the .Net Core SDK 3.1 (which uses .Net Core 3.1), it works fine:
But if I use .Net Core SDK version 2.2 instead (by editing
It also doesn't work if I use the .Net Framework version of MSBuild:
How are you running the compiler? Is it possible for you to switch to running it on a recent version of .Net Core? |
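One way to see which tables a given runtime carries is to ask it directly. The following is a minimal sketch, not part of the compiler: the class name `UnicodeProbe` and the method `Describe` are made up for illustration. It prints the runtime description and the category that runtime reports for U+9FED.

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not taken from Roslyn.
static class UnicodeProbe
{
    // Formats the category that the *current runtime's* tables report for ch.
    public static string Describe(char ch)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(ch);
        return $"U+{(int)ch:X4} is {category}";
    }

    static void Main()
    {
        // Identifies the runtime this process is actually running on.
        Console.WriteLine(RuntimeInformation.FrameworkDescription);
        // The result for U+9FED depends on the Unicode version of the runtime's tables.
        Console.WriteLine(Describe('\u9FED'));
    }
}
```

If the runtime's tables predate the character, I would expect the category to come back as `OtherNotAssigned` rather than `OtherLetter`, which would match the behaviour described above. Running the same program on each runtime should show the difference.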
That's strange. I was using the 5.0 preview. I was using the character directly in the source code instead of the escape syntax that you used in that example, though. Would that make a difference? |
Roslyn uses the latest Unicode table (Unicode 12.0 in .NET Core 3.1), but only for BMP characters. I'm working on the fix: changing |
How exactly are you compiling your code? Using When I use VS 16.6 Preview 5, it doesn't work. I believe this is because VS is a .Net Framework app, and so it runs the compiler on .Net Framework, which means it uses outdated character tables.
It shouldn't. |
Oh, this line actually builds now! I had to retype it for some reason. Maybe I had some lingering surrogate halves somewhere in the code (outside of the string literal of course).
Yes, I was compiling using
Oh nice, so it will probably be 13.0 in 5.0 then.
That's really good to hear! I'm really happy to see the Unicode situation improving. I'm guessing you're using |
Is this issue actionable at this point? |
It seems to me now like the issue, if there is an issue at all, is with the language server and/or the VS Code plugin, since only it complains about that identifier, not the actual compiler. |
Yep, I can confirm that I still get a red squiggly and an error about an unexpected character in VS Code. I'm not sure if the language server is in this repository or another one. |
@Serentty The VS Code plugin for C# uses OmniSharp, whose server part runs on .Net Framework, which would explain why it uses old character tables. There's an issue with more details about why OmniSharp uses .Net Framework at OmniSharp/omnisharp-roslyn#1703. |
Closing, as I expect the VS Code plugin for C# is where that will have to be fixed, and that is being tracked at OmniSharp/omnisharp-roslyn#1703. |
That makes sense! Thanks for figuring out why this happens. |
According to the specification here, many characters which the compiler currently does not allow in identifiers should be allowed, as they have the necessary Unicode properties to qualify. At first I suspected that this might be an issue with characters falling outside of the BMP (which could theoretically also be another issue), but it seems that even many characters within the BMP such as 鿭 (the simplified Chinese character for the element nihonium) which should qualify do not work, since they were added more recently than whatever tables the compiler is using. The C# standard does direct the reader to the Unicode standard version 3.0 section 4.5 for information on character classes, but it does not specify that identifiers are to be limited to those characters which already existed in version 3.0, or that C# is fixed to Unicode 3.0 for character class data.
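To check whether a particular character has the properties the specification asks for, something like the sketch below can be used. It tests for the classes the C# specification lists for a letter character (Lu, Ll, Lt, Lm, Lo, Nl). It is only an illustration under assumptions: `IdentifierCategories` and `IsLetterCharacter` are hypothetical names, this is not how Roslyn implements the check, and it ignores the underscore and the other identifier-part classes. The string-and-index overload is used so that characters outside the BMP, which are stored as surrogate pairs, are classified as whole code points.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not Roslyn's implementation.
static class IdentifierCategories
{
    // True if the code point starting at s[index] is in one of the classes
    // the C# specification lists for a letter character: Lu, Ll, Lt, Lm, Lo, Nl.
    public static bool IsLetterCharacter(string s, int index)
    {
        // This overload reads a surrogate pair as a single code point.
        switch (CharUnicodeInfo.GetUnicodeCategory(s, index))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.LetterNumber:
                return true;
            default:
                return false;
        }
    }

    static void Main()
    {
        // U+9FED is the character from this issue; U+20000 lies outside the BMP.
        foreach (string s in new[] { "A", "\u9FED", "\U00020000", "1" })
        {
            int codePoint = char.ConvertToUtf32(s, 0);
            Console.WriteLine($"U+{codePoint:X4}: {IsLetterCharacter(s, 0)}");
        }
    }
}
```

The answer for U+9FED depends on the tables of the runtime the program runs on, so the same source can report different results on different runtimes.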