Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The requirement that DU Identifiers start with an Upper case character does not account for languages without a notion of case. #9186

Closed
KevinRansom opened this issue May 14, 2020 · 10 comments · Fixed by #9199

Comments

@KevinRansom
Copy link
Member

Looking through the FSharpQA testcode, I came across this test case

  // DU
  type ख़तरxनाक = 
               | Uअलगाववादी     // There's no uppercase/lowercase in Hindi, so I'm adding a latin char
               | Aमिलती of ख़तरनाक
               | X

I think we should lose the requirement that DU identifiers must be upper case. As well as the warning about Upper case identifiers in patterns. There are plenty of languages that do not have the notion of uppercase and lowercase, building language requirements and conventions based around our notions of the alphabet is probably something we should avoid.

In the test above, the hindi code needed to have an upper case latin character prefixed in order to successfully compile.

  <data name="UpperCaseIdentifierInPattern" xml:space="preserve">
    <value>Uppercase variable identifiers should not generally be used in patterns, and may indicate a missing open declaration or a misspelt pattern name.</value>
  </data>
  <data name="NotUpperCaseConstructor" xml:space="preserve">
    <value>Discriminated union cases and exception labels must be uppercase identifiers</value>
  </data>
@KevinRansom
Copy link
Member Author

@dsyme , I'm sure this must have come up before but I don't remember it. Why is the Uppercase requirement useful?

Thanks

Kevin

@smoothdeveloper
Copy link
Contributor

@KevinRansom I think it is due to the fact it would cause some ambiguity between matching a DU case using a (generally) lowercase identifier to bind a value.

If DU members are always uppercased and bound value use lowercase identifiers, I think it sets some "conventional distinction" in the way those match expressions look like.

Aside, and jokingly, ख़तरनाक would better be defined as an exception type rather than a DU.

@KevinRansom
Copy link
Member Author

@smoothdeveloper is the ambiguity in the developer or the compiler? a hindi writing developer after all has to deal with that ambiguity. using case as a convention only works when using bicameral scripts, Latin, Cyrilic. There are plenty of non bicameral scripts that people use.

@cartermp
Copy link
Contributor

cartermp commented May 14, 2020

Duplicate of #4697 :)

(As a recent learner of Korean I've also been hit by this; I can't define myself as a person type like this)

type 사람 =
    | 필립 of string

@smoothdeveloper
Copy link
Contributor

@KevinRansom ambiguity for developper, but I'm mostly throwing darts while blinded, best would be to inquire from the source of truth (@dsyme) as to the motive for the rule, they don't seem explained in his comments in #4697.

I personally like it the way it is for latin script (or bicameral scripts, TIL), and also feel that for non bicameral scripts, this restriction could be lifted.

@abelbraaksma
Copy link
Contributor

abelbraaksma commented May 14, 2020

An alternative, and possibly a middle ground is to check whether the first letter has an uppercase variant or not. This is simple enough, just compare uppercase and lowercase of the first char. If it's the same, the user chose a first character that is unicameral. If different, and the given char is lowercase, it's bicameral, and the warning should be given.

I think the motivation for the warning is that it's easier to recognize errors. A misspelled union case (with capital) triggers the warning. If we don't force uppercase DUs, such errors are harder to spot:

type D = One | Two | Thousand | Other
match x with
| One 
| Two -> "low number" 
| Thousan -> "high number"

Above code checks as complete, the last match is a variable, but this is probably not his intent. With the warning, the user is suggested to use lowercase for the variable, and probably fixes the typo, which in turn shows he missed a case (completeness check). If cases can be lowercase, such useful warnings are harder to get right, if at all.

This feature has saved me quite a few times already, and while I think we should be inclusive of other languages, we shouldn't lose this feature too easily either. Hence my suggestion for middle ground.

@KevinRansom
Copy link
Member Author

@abelbraaksma, I am working on exactly that as an approach. Whilst I dislike the capitalization requirement, from a compatibility perspective, the ship has already sailed on it. So Latin/Cyrilic and the like will have the requirement of capitalization, languages without capitals, won't.

@abelbraaksma
Copy link
Contributor

Oh, I didn't read the other bug, apparently I'm repeating myself from 2018 ;). It appears it mostly works as expected and as I wrote above, except for what @dsyme wrote here: #4697 (comment) (summary: some languages report false for both IsUpper and IsLower, catering for that would solve the general issue).

@KevinRansom
Copy link
Member Author

I have generalized it a bit further :-) In my implementation if IsUpper and IsLower yield the same result then, because you can't tell, it must be okay, it should be noted the identifier is already a valid identifier we are merely deciding about the casing :-)

@abelbraaksma
Copy link
Contributor

abelbraaksma commented May 14, 2020

@KevinRansom, the part were both are true is covered in the code already, both false isn't. That's the actual bug from the other issue. Bottom line is that the intention was there to cater for unicameral languages, it just wasn't complete.

But yeah, the generalized rule is what it should be ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants