The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

KevinRansom · 2020-05-14T01:34:56Z

Looking through the FSharpQA testcode, I came across this test case

  // DU
  type ख़तरxनाक = 
               | Uअलगाववादी     // There's no uppercase/lowercase in Hindi, so I'm adding a latin char
               | Aमिलती of ख़तरनाक
               | X

I think we should lose the requirement that DU identifiers must be upper case. As well as the warning about Upper case identifiers in patterns. There are plenty of languages that do not have the notion of uppercase and lowercase, building language requirements and conventions based around our notions of the alphabet is probably something we should avoid.

In the test above, the hindi code needed to have an upper case latin character prefixed in order to successfully compile.

  <data name="UpperCaseIdentifierInPattern" xml:space="preserve">
    <value>Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.</value>
  </data>
  <data name="NotUpperCaseConstructor" xml:space="preserve">
    <value>Discriminated union cases and exception labels must be uppercase identifiers</value>
  </data>

The text was updated successfully, but these errors were encountered:

KevinRansom · 2020-05-14T01:36:21Z

@dsyme , I'm sure this must have come up before but I don't remember it. Why is the Uppercase requirement useful?

Thanks

Kevin

smoothdeveloper · 2020-05-14T03:32:58Z

@KevinRansom I think it is due to the fact it would cause some ambiguity between matching a DU case using a (generally) lowercase identifier to bind a value.

If DU members are always uppercased and bound value use lowercase identifiers, I think it sets some "conventional distinction" in the way those match expressions look like.

Aside, and jokingly, ख़तरनाक would better be defined as an exception type rather than a DU.

KevinRansom · 2020-05-14T04:37:30Z

@smoothdeveloper is the ambiguity in the developer or the compiler? a hindi writing developer after all has to deal with that ambiguity. using case as a convention only works when using bicameral scripts, Latin, Cyrilic. There are plenty of non bicameral scripts that people use.

cartermp · 2020-05-14T04:50:30Z

Duplicate of #4697 :)

(As a recent learner of Korean I've also been hit by this; I can't define myself as a person type like this)

type 사람 =
    | 필립 of string

smoothdeveloper · 2020-05-14T05:01:52Z

@KevinRansom ambiguity for developper, but I'm mostly throwing darts while blinded, best would be to inquire from the source of truth (@dsyme) as to the motive for the rule, they don't seem explained in his comments in #4697.

I personally like it the way it is for latin script (or bicameral scripts, TIL), and also feel that for non bicameral scripts, this restriction could be lifted.

abelbraaksma · 2020-05-14T19:57:14Z

An alternative, and possibly a middle ground is to check whether the first letter has an uppercase variant or not. This is simple enough, just compare uppercase and lowercase of the first char. If it's the same, the user chose a first character that is unicameral. If different, and the given char is lowercase, it's bicameral, and the warning should be given.

I think the motivation for the warning is that it's easier to recognize errors. A misspelled union case (with capital) triggers the warning. If we don't force uppercase DUs, such errors are harder to spot:

type D = One | Two | Thousand | Other
match x with
| One 
| Two -> "low number" 
| Thousan -> "high number"

Above code checks as complete, the last match is a variable, but this is probably not his intent. With the warning, the user is suggested to use lowercase for the variable, and probably fixes the typo, which in turn shows he missed a case (completeness check). If cases can be lowercase, such useful warnings are harder to get right, if at all.

This feature has saved me quite a few times already, and while I think we should be inclusive of other languages, we shouldn't lose this feature too easily either. Hence my suggestion for middle ground.

KevinRansom · 2020-05-14T20:06:46Z

@abelbraaksma, I am working on exactly that as an approach. Whilst I dislike the capitalization requirement, from a compatibility perspective, the ship has already sailed on it. So Latin/Cyrilic and the like will have the requirement of capitalization, languages without capitals, won't.

abelbraaksma · 2020-05-14T20:08:39Z

Oh, I didn't read the other bug, apparently I'm repeating myself from 2018 ;). It appears it mostly works as expected and as I wrote above, except for what @dsyme wrote here: #4697 (comment) (summary: some languages report false for both IsUpper and IsLower, catering for that would solve the general issue).

KevinRansom · 2020-05-14T20:12:02Z

I have generalized it a bit further :-) In my implementation if IsUpper and IsLower yield the same result then, because you can't tell, it must be okay, it should be noted the identifier is already a valid identifier we are merely deciding about the casing :-)

abelbraaksma · 2020-05-14T20:56:28Z

@KevinRansom, the part were both are true is covered in the code already, both false isn't. That's the actual bug from the other issue. Bottom line is that the intention was there to cater for unicameral languages, it just wasn't complete.

But yeah, the generalized rule is what it should be ;)

KevinRansom added the Feature Request label May 14, 2020

KevinRansom mentioned this issue May 15, 2020

Ensure that scripts without the notion of upper/lower case can create du identifiers #9199

Merged

KevinRansom closed this as completed in #9199 May 22, 2020

auduchinok mentioned this issue Jul 7, 2022

RFC FS 1126 Allow lower-case DU cases when RequireQualifiedAccess is specified #13432

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

KevinRansom commented May 14, 2020

KevinRansom commented May 14, 2020

smoothdeveloper commented May 14, 2020

KevinRansom commented May 14, 2020

cartermp commented May 14, 2020 •

edited

Loading

smoothdeveloper commented May 14, 2020

abelbraaksma commented May 14, 2020 •

edited

Loading

KevinRansom commented May 14, 2020

abelbraaksma commented May 14, 2020

KevinRansom commented May 14, 2020

abelbraaksma commented May 14, 2020 •

edited

Loading

The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

Comments

KevinRansom commented May 14, 2020

KevinRansom commented May 14, 2020

smoothdeveloper commented May 14, 2020

KevinRansom commented May 14, 2020

cartermp commented May 14, 2020 • edited Loading

smoothdeveloper commented May 14, 2020

abelbraaksma commented May 14, 2020 • edited Loading

KevinRansom commented May 14, 2020

abelbraaksma commented May 14, 2020

KevinRansom commented May 14, 2020

abelbraaksma commented May 14, 2020 • edited Loading

cartermp commented May 14, 2020 •

edited

Loading

abelbraaksma commented May 14, 2020 •

edited

Loading

abelbraaksma commented May 14, 2020 •

edited

Loading