Fix Supplementary Character / Surrogate Pair info (no code changes) #7221

Merged
merged 1 commit into dotnet:master on Jul 13, 2019

@srutzky (Contributor) commented Jul 12, 2019

Terminology and info regarding Supplementary Characters / Surrogate Pairs is either incorrect, or at least incomplete (which then leads to incorrect statements and/or code).

  1. Introduce the term "Supplementary Character", since that is usually what we are dealing with; "surrogate pair" is an encoding-specific concept (UTF-16 only).

  2. Add comment re: Supplementary Character code point range, which helps to explain the `elif high > 0x10 then Invalid` condition (line 173).

  3. Fix URI for Unicode Standard PDF, Chapter 3, and specify the name of the section (i.e. "Surrogates") instead of the section number (i.e. 3.8) since the section number was 3.7 but is now 3.8 (line 174).

  4. Add comment for definition of a valid "surrogate pair" because why make the reader guess or have to go look it up when it will never change? (line 175)

  5. Correct and expand comment with example long Unicode escape sequence (line 64): `"\UDEADBEEF"` is not a valid escape sequence. Usage of the `\U` escape has been misstated from the very beginning, in this documentation, in the C# Specification documentation, and in the language references for "String" for both F# and C#:

    1. `\U` specifies a Unicode code point (equivalently a UTF-32 code unit, since UTF-32 maps 1:1 to code points), not a surrogate pair. The valid range is therefore `00000000` - `0010FFFF`: the first two digits are always `0`, and the third digit can only ever be `0` or `1`. The escape can specify either a BMP character or a Supplementary Character. Supplementary Characters are encoded as a surrogate pair in UTF-16 only, not in UTF-8 or UTF-32. To specify an actual surrogate pair, use the `\u` escape, e.g. `\uD83D\uDC7D` == `\U0001F47D`.
    2. Even if you could specify a surrogate pair using `\U`, "DEADBEEF" would not be valid. U+DEAD is a valid surrogate, but it is a low surrogate code point and cannot come first in the pair (meaning, at best, one could use `\UxxxxDEAD`). Also, U+BEEF is not a surrogate code point at all, high or low: surrogate code points are in the range U+D800 to U+DFFF.
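
To make points 2, 4, and 5 concrete, here is a minimal Python sketch of the checks described above. It is an illustration only, not the F# compiler's code: the function names are hypothetical, and the range check assumes the `high > 0x10` condition is applied to the upper 16 bits of the 8-digit value.

```python
import string

MAX_HIGH_WORD = 0x10  # upper 16 bits of U+10FFFF, the last Unicode code point


def is_valid_long_escape(hex_digits):
    # Hypothetical helper: takes the 8 hex digits that follow a long escape
    # and reports whether they name a code point in U+0000..U+10FFFF.
    if len(hex_digits) != 8 or any(c not in string.hexdigits for c in hex_digits):
        return False
    high = int(hex_digits, 16) >> 16  # upper 16 bits of the 32-bit value
    return high <= MAX_HIGH_WORD      # mirrors "elif high > 0x10 then Invalid"


def is_valid_surrogate_pair(first, second):
    # A valid pair is a high surrogate (U+D800..U+DBFF) followed by a
    # low surrogate (U+DC00..U+DFFF).
    return 0xD800 <= first <= 0xDBFF and 0xDC00 <= second <= 0xDFFF


def to_surrogate_pair(code_point):
    # UTF-16 encoding of a Supplementary Character (U+10000..U+10FFFF).
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a Supplementary Character")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(is_valid_long_escape("0001F47D"))              # True
    print(is_valid_long_escape("DEADBEEF"))              # False
    print(is_valid_surrogate_pair(0xDEAD, 0xBEEF))       # False
    print([hex(u) for u in to_surrogate_pair(0x1F47D)])  # ['0xd83d', '0xdc7d']
    # Python's long escape also takes a code point; only UTF-16 yields a pair.
    print("\U0001F47D".encode("utf-16-be").hex())        # d83ddc7d
    print("\U0001F47D".encode("utf-32-be").hex())        # 0001f47d
```

The last two lines show the same code point producing a surrogate pair only under UTF-16, which is why "Supplementary Character" is the encoding-neutral term.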

For more info, please see:

Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)

https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/#fsharp

@KevinRansom (Member) left a comment

Nice .. thank you

@KevinRansom KevinRansom merged commit 7780cab into dotnet:master Jul 13, 2019

15 checks passed

WIP Ready for review
fsharp-ci Build #20190712.21 succeeded
fsharp-ci (Linux) Linux succeeded
fsharp-ci (Linux_FCS) Linux_FCS succeeded
fsharp-ci (MacOS_FCS) MacOS_FCS succeeded
fsharp-ci (SourceBuild_Linux) SourceBuild_Linux succeeded
fsharp-ci (SourceBuild_Windows) SourceBuild_Windows succeeded
fsharp-ci (UpToDate_Windows) UpToDate_Windows succeeded
fsharp-ci (Windows coreclr_release) Windows coreclr_release succeeded
fsharp-ci (Windows desktop_release) Windows desktop_release succeeded
fsharp-ci (Windows fsharpqa_release) Windows fsharpqa_release succeeded
fsharp-ci (Windows vs_release) Windows vs_release succeeded
fsharp-ci (Windows_FCS) Windows_FCS succeeded
fsharp-ci (macOS) macOS succeeded
license/cla All CLA requirements met.

TIHan added a commit to TIHan/visualfsharp that referenced this pull request Jul 31, 2019

Merge fsharp47 + dev16.3 into Dim interop (#7)
* Copy sources from Versions.props to NuGet.config (dotnet#7191)

* Only check distinct errors (dotnet#7140)

* Use 1-based column numbers in tests (dotnet#7141)

* Use 1-based column numbers in tests

* Helper that can check for multiple type errors

* Update dependencies from https://github.com/dotnet/arcade build 20190710.8 (dotnet#7200)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19360.8

* Better record and value formatting in tools (dotnet#7021)

* Remove semicolons from record tooltips

* Update to put a space between braces

* Update formatting as best I can, plus some tests I guess

* More baseline updates

* Anonymous records

* Update anon records tests

* Add vsbsl lol

* Update baselines and reduce a simple filter

* Update baselines maybe last time

* Update fsharpqa test

* make tests pass

* Add formatting for values

* Update tests

* Update test

* Update fsharpqa tests

* tryit

* lol

* get yote

* shlerp

* Update tests again I guess

* more update

* mother of pearl

* this is a real turd

* fix portable PDBs for anon records (dotnet#7099)

* Moving ElseBranchHasWrongTypeTests over to NUnit (dotnet#7104)

* Port tests for missing else branch to NUnit (dotnet#7209)

* Update dependencies from https://github.com/dotnet/arcade build 20190711.7 (dotnet#7216)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19361.7

* Update dependencies from https://github.com/dotnet/arcade build 20190712.5

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19362.5

* Fix Supplementary Character / Surrogate Pair info (no code changes) (dotnet#7221)

* Update IlxGen.fs (dotnet#7227)

* Check for exit code in compiler tests (dotnet#7211)

* Moving AccessOfTypeAbbreviationTests over to NUnit (dotnet#7226)

* Moving AccessOfTypeAbbreviationTests over to NUnit

* ha! now I know what this `1` means =)

* Error range updated and removed `exit 0`

* Error message prefixed by "This construct is deprecated."

* Error messages changed based on current state of FSI

* Moving ConstructorTests over to NUnit (dotnet#7236)

* [master] Update dependencies from dotnet/arcade (dotnet#7233)

* Update dependencies from https://github.com/dotnet/arcade build 20190713.1

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19363.1

* Update dependencies from https://github.com/dotnet/arcade build 20190714.1

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19364.1

* Update dependencies from https://github.com/dotnet/arcade build 20190715.4 (dotnet#7240)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19365.4

* Moving UpcastDowncastTests over to NUnit (dotnet#7229)

* Moving UpcastDowncastTests over to NUnit

* missing new line

* Update dependencies from https://github.com/dotnet/arcade build 20190716.4 (dotnet#7245)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19366.4

* Move ErrorMessages/NameResolution Tests to NUnit (dotnet#7237)

* Mark name resolution error message tests for port to NUnit

* Add Nunit tests for FieldNotInRecord, RecordFieldProposal, and GlobalQualifierAfterDot

* Fix expected error message in FieldNotInRecord Compiler test

* Change global Qualifier after dot test back to original fsharpqa version.
Needs separate assert function for parsing errors instead of typechecking

* Remove unnecessary double ticks from NameResolution tests

* Moving WarnExpressionTests over to NUnit (dotnet#7232)

* move some error and warning tests to NUnit (dotnet#7244)

* move some error and warning tests to NUnit

* CompilerAssert.ParseWithErrors now uses ParseFile instead of ParseAndCheckFileInProject

* merge conflicts

* Update dependencies from https://github.com/dotnet/arcade build 20190717.8

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19367.8

* Move UnitGenericAbstractType To Nunit (dotnet#7257)

* Update dependencies from https://github.com/dotnet/arcade build 20190718.7 (dotnet#7256)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19368.7

* Moving TypeMismatchTests over to NUnit (dotnet#7250)

* Moving TypeMismatchTests over to NUnit

* ci restart

* publish pdbs in FSharp.Core.nupkg (dotnet#7255)

Also publish native symbols so they can be archived later.

* Enable hash algorithm selection (dotnet#7252)

* Enable hash algorithm selection

* Feedback

* More feedback

* Revert "Feedback"

This reverts commit 6ab1b07.

* feedback

* Update dependencies from https://github.com/dotnet/arcade build 20190719.2 (dotnet#7260)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19369.2

* Improve netcore reference selection (dotnet#7263)

* Improve netcore reference selection

* Update baselines

* Moving Libraries Control tests to NUnit (dotnet#7234)

* Moving Libraries Control tests to NUnit

* Names tests as Async instead of Control

* Member constraints and PrimitiveConstraints (dotnet#7210)

* Update dependencies from https://github.com/dotnet/arcade build 20190722.10 (dotnet#7268)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19372.10

* fixes issue dotnet#6832 (dotnet#7259)

* fixes issue dotnet#6832

* Fix comment

* Forgot to remove pdbs from fsharp.compiler.nuget

* Rename Program.fs and exclude FSharpSdk from checks

* Embedded doesn't need to verify tmp or obj directories.

* typo

* typo

* Don't check FSharpSdk for hash

* Make comment match code

* Empty commit to force rebuild

* Color nameof as intrinsic (dotnet#7273)

* [master] Update dependencies from dotnet/arcade (dotnet#7269)

* Update dependencies from https://github.com/dotnet/arcade build 20190723.6

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19373.6

* Update dependencies from https://github.com/dotnet/arcade build 20190724.2

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19374.2

* Update dependencies from https://github.com/dotnet/arcade build 20190725.2

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19375.2

* Update dependencies from https://github.com/dotnet/arcade build 20190725.15 (dotnet#7282)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19375.15

* Fix test assert (dotnet#7283)

* disablewarningtests

* code cleanup prior to optional interop improvements (dotnet#7276)

* add test cases we need to make work

* cleanup method call argument processing

* Update dependencies from https://github.com/dotnet/arcade build 20190726.18 (dotnet#7285)

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19376.18

* Moving ClassesTests over to NUnit (dotnet#7264)

* Moving ClassesTests over to NUnit

* Fixed assertion for ParseWithErrors test

* Move Basic Constants to NUnit (dotnet#7262)

* Move BasicConstants.fs Tests To Nunit

* fix typo

* Another typo

* Moved Don't Suggest Tests over to NUnit (dotnet#7288)

* [master] Update dependencies from dotnet/arcade (dotnet#7287)

* Update dependencies from https://github.com/dotnet/arcade build 20190727.2

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19377.2

* Update dependencies from https://github.com/dotnet/arcade build 20190728.1

- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19378.1

* Fix langversion with multiple projects (dotnet#7293)

* Resolve merge issues, and take care of Test Framework incompatibilities

* Tweak tests