More String escape sequence improvements (F#) #13257

Merged
merged 2 commits into dotnet:master on Jul 5, 2019

3 participants
@srutzky
Contributor

commented Jul 5, 2019

"Literals" page

In my previous update I seem to have added an extra 0 to the 0010FFFF at the end of the first paragraph in the Remarks section.

"Strings" page

  1. Added 5 missing sequences: \a, \f, \v, \x, and \DDD. The following example code ran in LINQPad 5 (Language = "F# Program"):

    printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
    printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
    printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
    printfn "---------------------";
    printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
    printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
    printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
    printfn "---------------------";
    printfn "Test \\a: \a";
    printfn "Test \\f: \f";
    printfn "Test \\v: \v";

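    The sequences can also be found in the F# source code on GitHub:

      * Defined here:
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
      * Processed here:
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179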

  2. Broke \u and \U sequences into separate entries.

  3. Provided range and an example for each Unicode character sequence.

  4. Added "Important" note regarding \DDD being decimal, not octal, notation.

  5. Added note regarding \DDD and \xx effectively being ISO-8859-1 (which is the first 256 Unicode code points), including a link to the Wikipedia article for ISO-8859-1.

  6. NOTE: I changed the Xs into Hs for the \u and \U sequences due to adding the \x sequence and not wanting to have \xXX as I feel that is less readable, and I wanted to be consistent between all of them regarding what represented a hex digit (and "H" meaning hex also helps distinguish it from the Ds used for the newly added \DDD sequence). If anyone feels strongly that it should remain as X, then it can be changed back.

Please see Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters) for more details.
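
As a rough illustration of the `\DDD` arithmetic described in item 4 and in the sample code above, the sketch below computes the same `DDD % 256` mapping with ordinary integer math instead of string literals. `trigraphChar` is a hypothetical helper name, not something from the F# compiler, and the code only mirrors the rule as stated in this PR; it does not exercise the lexer itself.

```fsharp
// Hypothetical helper (assumption, not compiler code): maps the decimal
// value of a three-digit \DDD sequence to a character, using the
// "DDD % 256" rule stated in the example above.
let trigraphChar (ddd: int) : char =
    char (ddd % 256)

// Print the resulting code point for several of the values used in the PR.
for ddd in [ 9; 100; 176; 365; 621; 999 ] do
    printfn "\\%03d -> U+%04X" ddd (int (trigraphChar ddd))
```

Under that rule, 365 and 621 both land on U+006D and 999 lands on U+00E7, which agrees with the "Max = \\999 (U+00E7)" note in the example.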

srutzky added some commits Jul 3, 2019

Minor fix literals page
In my previous update I seem to have added an extra `0` to the `0010FFFF` at the end of the first paragraph in the **Remarks** section.
More String escape sequence improvements (F#)
1. Added 5 missing sequences: `\a`, `\f`, `\v`, `\x`, and `\DDD`. The following example code ran in LINQPad 5 (Language = "F# Program"):

```fsharp
printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
printfn "---------------------";
printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
printfn "---------------------";
printfn "Test \\a: \a";
printfn "Test \\f: \f";
printfn "Test \\v: \v";
```

The sequences can also be found in the F# source code on GitHub:

* Defined here:
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
* Processed here:
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179

2. Broke `\u` and `\U` sequences into separate entries.

3. Provided range and an example for each Unicode character sequence.

4. Added "Important" note regarding `\DDD` being decimal, not octal, notation.

5. Added note regarding `\DDD` and `\xx` effectively being ISO-8859-1 (which is the first 256 Unicode code points)

6. **NOTE:** I changed the `X`s into `H`s for the `\u` and `\U` sequences due to adding the `\x` sequence and not wanting to have `\xXX` as I feel that is less readable, and I wanted to be consistent between all of them regarding what represented a hex digit (and "H" meaning hex also helps distinguish it from the `D`s used for the newly added `\DDD` sequence).

Please see https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/#fsharp for more details.
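
To make the "`\x` is _not_ creating UTF-8" point from the example concrete, here is a minimal sketch that builds the two results by hand and compares them. It assumes, as the PR describes, that each `\xHH` yields the single code point U+00HH. The names `fromEscapes` and `fromUtf8` are illustrative only.

```fsharp
open System.Text

// What "\xE0\xBC\x82" is described as producing: three separate
// characters, U+00E0, U+00BC and U+0082 (built here from integers).
let fromEscapes = System.String([| char 0xE0; char 0xBC; char 0x82 |])

// The same three values treated as UTF-8 bytes decode to one character.
let fromUtf8 = Encoding.UTF8.GetString([| 0xE0uy; 0xBCuy; 0x82uy |])

printfn "escapes: %d chars; UTF-8 decode: %d char, U+%04X"
    fromEscapes.Length fromUtf8.Length (int fromUtf8.[0])
```

The first string has three characters, while the UTF-8 decode yields the single character U+0F02, so the escape sequences cannot be used to spell out UTF-8 byte sequences.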

@srutzky srutzky requested a review from cartermp as a code owner Jul 5, 2019

@cartermp
Contributor

left a comment

Thanks @srutzky! I definitely love the alien character, too 😄

@cartermp cartermp merged commit bcf6465 into dotnet:master Jul 5, 2019

7 checks passed

Docs Content Validation Status: Succeeded
OpenPublishing.Build Validation status: passed
OpenPublishing.Build (1 of 3) Waiting for processor completed at 14:18:07 PST
OpenPublishing.Build (2 of 3) Preparing completed at 14:20:35 PST
OpenPublishing.Build (3 of 3) Building completed at 14:21:21 PST
WIP Ready for review
license/cla All CLA requirements met.

@mairaw mairaw added this to the July 2019 milestone Jul 6, 2019

@cartermp cartermp referenced this pull request Jul 8, 2019

Closed

Unicode literal range #13271
