(Discussion) Case insensitive strings as a type #15266

Closed
ashmind opened this issue Sep 24, 2015 · 12 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime design-discussion Ongoing discussion about design without consensus

@ashmind

ashmind commented Sep 24, 2015

I have read #14065, and it seems to be a special case of a generic problem.

Consider a system where some strings are case sensitive (e.g. a hash) and some aren't (e.g. a name). Currently .NET defaults to being case sensitive, so I have to remember which string is which in every use case. And those cases are not only about s1.Equals(s2) -- they also include every call to ToDictionary, every creation of a HashSet, and various other cases and optimizations that are really hard to keep track of.

It would be great if I could define a string as case-insensitive at the type level, e.g. by using CaseInsensitiveString. This would affect comparer selection wherever this type is used as a key, as well as all comparisons.

Not quite sure whether it makes general sense and what the API would be like, so just putting it here as a discussion point.
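
To make the idea concrete, here is a minimal sketch of what such a type could look like. CaseInsensitiveString is hypothetical and not part of .NET; the choice of ordinal comparison and the implicit conversion from string are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: no comparer is passed anywhere below.
var distinctNames = new[] { "CoreFX", "corefx", "Roslyn" }
    .Select(s => (CaseInsensitiveString)s)
    .ToHashSet();
Console.WriteLine(distinctNames.Count);   // 2

CaseInsensitiveString framework = "CoreFX";
Console.WriteLine(framework == "corefx"); // True

// Hypothetical type, not part of .NET. Ordinal comparison is an assumption.
public readonly struct CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    private static readonly StringComparer s_comparer = StringComparer.OrdinalIgnoreCase;
    private readonly string _value;

    public CaseInsensitiveString(string value) =>
        _value = value ?? throw new ArgumentNullException(nameof(value));

    // default(CaseInsensitiveString) behaves like the empty string.
    public string Value => _value ?? string.Empty;

    public bool Equals(CaseInsensitiveString other) => s_comparer.Equals(Value, other.Value);
    public override bool Equals(object? obj) => obj is CaseInsensitiveString other && Equals(other);

    // The hash must come from the same comparer so that equal values hash equally.
    public override int GetHashCode() => s_comparer.GetHashCode(Value);
    public override string ToString() => Value;

    public static bool operator ==(CaseInsensitiveString left, CaseInsensitiveString right) => left.Equals(right);
    public static bool operator !=(CaseInsensitiveString left, CaseInsensitiveString right) => !left.Equals(right);

    // Implicit from string so that literals can appear on either side of ==.
    public static implicit operator CaseInsensitiveString(string value) => new CaseInsensitiveString(value);
    public static explicit operator string(CaseInsensitiveString value) => value.Value;
}
```

Because Equals and GetHashCode are defined on the type itself, default equality comparers (and therefore Dictionary, HashSet and LINQ operators) pick up the case-insensitive behaviour without any extra arguments.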

@ashmind
Author

ashmind commented Sep 24, 2015

Just thought of one more thing -- #14065 discusses a string comparison operator, e.g.

if (framework ==~ "corefx") // true when framework is "CoreFX"

If we had case insensitive strings instead, we could have a literal syntax in C#, e.g. "x"i:

if (framework == "corefx"i) // true

@CodesInChaos

Which culture should be used for comparison?

If you attach it to the string this leads to the question of what should happen when you compare two case insensitive strings which use different cultures.

Or should it always use ordinal comparisons?

@mikedn
Contributor

mikedn commented Sep 24, 2015

it is also every call to ToDictionary, every creation of HashSet and various other cases and optimizations that are really hard to keep track of.

And how would a different type help? Currently you can write:

string[] args = ...;
var set = new HashSet<string>(args, StringComparer.OrdinalIgnoreCase);

But if a separate string type is added you'd need something like the following:

string[] args = ...;
var set = new HashSet<CaseInsensitiveString>(args.Select(s => (CaseInsensitiveString)s));

I don't see how this is better.

@ashmind
Author

ashmind commented Sep 24, 2015

@CodesInChaos

Which culture should be used for comparison?

This is a good question. Ordinal seems reasonable, but it's up for debate.

@mikedn

And how would a different type help?

In your example it wouldn't, but let's think about where that string comes from. E.g. let's start with

public class ItemType {
    public CaseInsensitiveString Name { get; }
}

Now some use cases:

var typesByName = types.ToDictionary(t => t.Name);
var found = types.SingleOrDefault(t => t.Name == nameToFind);
var names = types.Select(t => t.Name).ToSet(); // custom, but common extension method
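
For contrast, here is a sketch of how the same three use cases look today when Name is a plain string: the comparer has to be repeated at every call site. The ItemType class, the sample data and the use of HashSet in place of the custom ToSet extension are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var types = new[] { new ItemType("Widget"), new ItemType("Gadget") };
var nameToFind = "WIDGET";
var comparer = StringComparer.OrdinalIgnoreCase;

// Each call site must remember to pass the comparer (or comparison) explicitly.
var typesByName = types.ToDictionary(t => t.Name, comparer);
var found = types.SingleOrDefault(
    t => string.Equals(t.Name, nameToFind, StringComparison.OrdinalIgnoreCase));
var names = new HashSet<string>(types.Select(t => t.Name), comparer);

Console.WriteLine(typesByName.ContainsKey("gadget")); // True
Console.WriteLine(found?.Name);                       // Widget
Console.WriteLine(names.Contains("widget"));          // True

// Illustrative stand-in for the domain class above, with Name as a plain string.
public sealed class ItemType
{
    public ItemType(string name) => Name = name;
    public string Name { get; }
}
```

Forgetting the comparer in any one of these places compiles cleanly and silently falls back to case-sensitive behaviour, which is the class of mistake the proposal is trying to rule out.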

@ashmind
Author

ashmind commented Sep 24, 2015

@mikedn So basically you decide which strings are CI when you are writing your domain model.

@JonHanna
Contributor

This is a good question. Ordinal seems reasonable, but it's up for debate.

Great for checking keywords. Not so good for anything human-readable.

There's enough variation in whether or not ß == SS and i == I that people are still going to need to customise their case-folding. Once they're customising their case folding, what is saved?
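
A small sketch of the variation in question, using only existing .NET APIs. The ordinal results are fixed; the culture-aware results depend on the culture and on the platform's globalization backend, so they are printed without an expected value. The choice of tr-TR as the example culture is an assumption, and the lookup may fail if the runtime is configured for invariant globalization.

```csharp
using System;
using System.Globalization;

// Ordinal ignore-case compares code units after simple case mapping,
// so a one-character string can never equal a two-character one.
Console.WriteLine(string.Equals("ß", "SS", StringComparison.OrdinalIgnoreCase)); // False
Console.WriteLine(string.Equals("i", "I", StringComparison.OrdinalIgnoreCase));  // True

// Culture-aware comparisons can answer differently; results vary by culture and platform.
var turkish = CultureInfo.GetCultureInfo("tr-TR");
Console.WriteLine(string.Compare("i", "I", turkish, CompareOptions.IgnoreCase) == 0);
Console.WriteLine(string.Compare("ß", "SS", CultureInfo.InvariantCulture, CompareOptions.IgnoreCase) == 0);
```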

@mikedn
Contributor

mikedn commented Sep 24, 2015

So basically you decide which strings are CI when you are writing your domain model.

But in that case a domain specific string type - ItemTypeName for example - would probably be more useful as in addition to case insensitivity it could offer other features such as ensuring that the string has a specific format, length etc.

Of course, there will be cases where CaseInsensitiveString would be good enough but I'm not convinced that there are so many.

And interestingly, one common case where case insensitivity is needed - Windows file names - isn't served well by CaseInsensitiveString. On other OSes file names are case sensitive, so this particular common issue can only be dealt with by using a domain specific type, say, FileName.

@mburbea

mburbea commented Sep 24, 2015

One further step: instead of a CaseInsensitiveString, how about a CollatedString, which would let you specify the culture and the type of comparison you expect, similar to how SQL strings are stored? That way, if you really need a comparison that treats ß == ss, you can have one, or you could fail on things like I == I when the encodings differ.

@ashmind
Author

ashmind commented Sep 24, 2015

@mburbea

One further step could be instead of a CaseInsensitiveString, how about a CollatedString, which could allow you to specify the Culture and the type of comparison you expect.

That would be good, but the main question is what to do when two of those are compared.

or you could fail on things like I == I when the encodings differ

If by "Fail" you mean "return false" -- it would be OK for dynamic cases, but I think it's inconvenient for static comparisons.

E.g. x.Name == y.Name:

  1. One is "Collated", other is just string = compare using "Collated" comparer
  2. Both are "Collated" with the same comparer = compare using "Collated" comparer
  3. Both are "Collated" with diff comparers = compiles but always returns false? Potential for mistakes.

It is possible to do a generic hack using marker types, like CollatedString<CaseInsensitiveOrdinal>, though that would limit comparers to a specific list (and I'm not sure how the .NET team feels about generic hacks).
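
A rough sketch of that marker-type idea, under stated assumptions: CollatedString<TComparison>, IComparison, CaseInsensitiveOrdinal and CaseSensitiveOrdinal are all hypothetical names, not .NET types. The point is only that case 3 above (different comparers) becomes a compile-time error instead of a silent false.

```csharp
using System;

var a = new CollatedString<CaseInsensitiveOrdinal>("CoreFX");
var b = new CollatedString<CaseInsensitiveOrdinal>("corefx");
Console.WriteLine(a == b);        // True  (case 2: same comparer)
Console.WriteLine(a == "COREFX"); // True  (case 1: plain string is converted)

var c = new CollatedString<CaseSensitiveOrdinal>("corefx");
Console.WriteLine(c == "COREFX"); // False
// Console.WriteLine(a == c);     // case 3: does not compile, no == accepts both types

// Marker types select the comparer at compile time (hypothetical design).
public interface IComparison { StringComparer Comparer { get; } }

public readonly struct CaseInsensitiveOrdinal : IComparison
{
    public StringComparer Comparer => StringComparer.OrdinalIgnoreCase;
}

public readonly struct CaseSensitiveOrdinal : IComparison
{
    public StringComparer Comparer => StringComparer.Ordinal;
}

public readonly struct CollatedString<TComparison> : IEquatable<CollatedString<TComparison>>
    where TComparison : struct, IComparison
{
    // Resolved once per closed generic type.
    private static readonly StringComparer s_comparer = default(TComparison).Comparer;
    private readonly string _value;

    public CollatedString(string value) =>
        _value = value ?? throw new ArgumentNullException(nameof(value));

    public string Value => _value ?? string.Empty;

    public bool Equals(CollatedString<TComparison> other) => s_comparer.Equals(Value, other.Value);
    public override bool Equals(object? obj) => obj is CollatedString<TComparison> other && Equals(other);
    public override int GetHashCode() => s_comparer.GetHashCode(Value);
    public override string ToString() => Value;

    public static bool operator ==(CollatedString<TComparison> left, CollatedString<TComparison> right) => left.Equals(right);
    public static bool operator !=(CollatedString<TComparison> left, CollatedString<TComparison> right) => !left.Equals(right);

    public static implicit operator CollatedString<TComparison>(string value) => new CollatedString<TComparison>(value);
}
```

As noted above, the trade-off is that every supported comparison needs its own marker type, so culture-specific comparers chosen at run time do not fit this shape.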

@ashmind
Author

ashmind commented Sep 24, 2015

@mikedn

But in that case a domain specific string type - ItemTypeName for example - would probably be more useful as in addition to case insensitivity it could offer other features such as ensuring that the string has a specific format, length etc.

In a perfect world -- yes. However, having a .NET type can provide some benefits (most of those can be implemented manually, but many developers will never care enough):

  1. No effort required to implement the type with all Equality members and compare/cast operators
  2. No effort required to implement string interop, e.g. string.StartsWith(mycistring)
  3. No effort required to implement text serialization (once JSON.NET etc. pick up the new built-in type)
  4. Known type provides info for libraries like EF that can create DBs/queries accordingly
  5. Languages can potentially provide special syntax (istring x = "aaaa"i;, xs.Where(x => x == "test"i))

@AlexGhiondea
Contributor

Reading through this, it looks like the clear advantage would come if this was actually a language feature. Having a specific type that holds a case insensitive string seems like overkill in most scenarios.

@KrzysztofCwalina should we move this issue to the Roslyn repo to see if there is interest from the compiler to implement this?

@GrabYourPitchforks
Member

Is anybody still driving this? It has been many years since the last comment.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@jkotas jkotas closed this as completed Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
10 participants