Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison #1222

Open
divega opened this issue Dec 4, 2014 · 110 comments

Comments

@divega
Copy link
Contributor

divega commented Dec 4, 2014

Currently we support the == operator and the String.Equals(String) methods which both have the same behavior in-memory as String.Equals(String, StringComparison.Ordinal). We could also support the latter without making significant changes and possibility other values of StringComparsion such as OrdinalIgnoreCase by applying LOWER() on both operands.

@divega divega changed the title Query: Consider options for translating String.Equals(String, StringComparison) for selected values of StringComparison Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison Dec 4, 2014
@rowanmiller rowanmiller added this to the Backlog milestone Dec 8, 2014
@TrabacchinLuigi
Copy link

Actually I just got a warning about String.Equals (without the third parameter) not being able to be translated and being evaluated locally ...

@JaymzZh
Copy link

JaymzZh commented Dec 26, 2016

After upgrade to 1.1.0, I use String.Equals(String, StringComparison.Ordinal) got this error :

InvalidOperationException: variable '__name_0' of type 'System.String' referenced from scope '', but it is not defined

@divega
Copy link
Contributor Author

divega commented Dec 26, 2016

@jeffiy can you please create a new issue and provide a repro? If this is a regression in 1.1 we may want to fix it in 1.1.1.

@akamud
Copy link

akamud commented Dec 28, 2016

Could this be implemented using Collation on SQL Server? Using LOWER on indexed columns throws the index away for querying, as far as I know, which could hurt performance for people using this.

We wouldn't know this is happening since this would be translated by EF internally and would get us by surprise. Since I would have no warning about "local evaluation" on this case, I would expect that my ORM is doing the right thing and not creating a performance problem. Doing a LOWER should be an explicit decision, in my opinion.

@Suriman
Copy link

Suriman commented Jun 22, 2017

@divega Please, it is a must have.

@xperiandri
Copy link

Still does not work in 2.0
I was shocked that it doesn't!

@xperiandri
Copy link

I used string.Equals(string, string)

@mgolois
Copy link

mgolois commented May 3, 2018

Any update on this?

@mgolois
Copy link

mgolois commented Jul 28, 2018

Migrating to EFCore is becoming a show stopper for us because this issue. Any update @divega @rowanmiller ?

@divega
Copy link
Contributor Author

divega commented Aug 2, 2018

@mgolois This issue is about String.Equals(String, StringComparison). could you please elaborate about what exactly the query looks like that you are trying to port to EF Core?

@divega
Copy link
Contributor Author

divega commented Aug 2, 2018

@mgolois to be more precise:

  1. EF6 completely ignores the StringComparison argument. Is this the behavior that you expect from EF Core? If yes, why?
  2. Also, are you aware that ==, string.Equals(String), and String.Equals(string, string) work? If yes, why is lack of support for String.Equals(String, StringComparison) blocking you?

@divega divega removed this from the Backlog milestone Aug 2, 2018
@xperiandri
Copy link

Does equals usage with StringComparison at least produces a warning?

@divega divega added this to the Backlog milestone Aug 2, 2018
@smitpatel
Copy link
Member

@xperiandri - It does evaluation on client to produce correct results.

@divega
Copy link
Contributor Author

divega commented Aug 2, 2018

@xperiandri and it produces a warning like this in the log:

warn: Microsoft.EntityFrameworkCore.Query[20500]
      => Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor
      The LINQ expression 'where [p].Name.Equals(__name_0, CurrentCulture)' could not be translated and will be evaluated locally.

Which can optionally be turned into an error as explained in the documentation.

@xperiandri
Copy link

Maybe you would add Roslyn analyzer that will display info about that in design time?
Because having this warning in runtime is not an excellent experience.

@xperiandri
Copy link

Which can optionally be turned into an error as explained in the documentation.

It is very good!

@Bartmax
Copy link

Bartmax commented Oct 19, 2018

so what exactly happens right now if I use Equals with OrdinalIgnoreCase ?
does it perform an Equals on the db and then evaluate the ignore case locally or does it gets all records of the db and do the filtering locally ?

@BalassaMarton
Copy link

Is there any plan to do this in the near future? I wasn't aware of the fact that it's not supported, saw the warning when I already used this pattern in several queries. Do you recommend changing it to Where(x => x.SomeProperty.ToLower() == parameter.ToLower()) - is string.ToLower translated correctly?

@smitpatel
Copy link
Member

@divega - This issue suggest that for OrdinalIgnoreCase we could apply Lower on both sides, but that would mean index cannot be used (as pointed out by @akamud ), but there was no definitive answer which direction we would be taking. Can you clarify?

@divega
Copy link
Contributor Author

divega commented Nov 30, 2018

@smitpatel let’s bring this to a design meeting. I would be ok with the overload with StringComparison started to throw in the default implementation in absence of client eval, but I would also be open to grandfathering it (via ignoring the StringComparison argument) if the general feeling is that it is preferable to mitigate the impact of the breaking change.

I think it is preferable not to do the ToLower trick behind the users’ backs. They can still do that themselves explicitly, and there is still a chance that some of them will miss the fact that it isn’t sargable, but at least we didn’t do it without them asking.

@divega
Copy link
Contributor Author

divega commented Nov 30, 2018

@smitpatel Besides, I believe many databases will default to case insensitive comparisons anyway, so what is harder to achieve is a case sensitive one.

@divega
Copy link
Contributor Author

divega commented Nov 30, 2018

@BalassaMarton if you control the database schema it is preferable to specify a column collation that will compare string according to your preference.

If you have a case insensitive collation and can’t change it but need a comparison that is case sensitive, you can do a regular (==) comparison in the database and then filter out false positives on the client with a case sensitive comparison.

I have tried other workarounds using functions that encapsulate comparisons with explicit collations, but SQL Server actually doesn’t support inline scalar functions, and non-inline functions aren’t sargable, so you end up needing to write a TVF. It is possible but complicated.

@Saibamen
Copy link

Saibamen commented Dec 14, 2020

Workaround:

Instead of:
FirstOrDefault(b => b.X.Equals(Y, StringComparison.InvariantCultureIgnoreCase))
use ToLower() to simulate comparison with ignoring case:
FirstOrDefault(b => b.X.ToLower() == Y.ToLower())

Warning: ToLowerInvariant() also not working here

I know this not contains invariant culture, but this is better than waiting for EF Core 6 or 7, staying on old EF 6 or rewriting all API consumers (if we gets rid of comparison and use only FirstOrDefault(b => b.X == Y))

@roji
Copy link
Member

roji commented Dec 15, 2020

@Saibamen it's recommended to look into column collations rather than calling ToLower. With collations, an index defined over your column can be used.

@PMExtra
Copy link

PMExtra commented Jan 9, 2021

In some provider such as PostgreSQL, I think InvariantCultureIgnoreCase should be map to ILIKE for better performance.

@roji
Copy link
Member

roji commented Jan 9, 2021

@PMExtra this issue is about String.Equals, where PostgreSQL ILIKE does pattern matching with wildcards (like LIKE), so that doesn't seem right. You can already use collations with EF Core 5.0 to make PostgreSQL insensitive equality operations.

@PMExtra
Copy link

PMExtra commented Jan 10, 2021

@roji But we can use LIKE or ILIKE without wildcards, So the question is, which is the better performance between TOLOWER(@param) == TOLOWER(field) and field ILIKE @param?

@roji
Copy link
Member

roji commented Jan 10, 2021

@PMExtra I'm not aware of a way to use LIKE/ILIKE without wildcards. If the pattern is constant, we can indeed escape all the wildcards (this is what we currently do for other cases), but not when the pattern is a parameter or column.

More generally, I suggest following the discussing from the beginning to understand why I think the recently-introduced collations functionality is the better way to go. Regardless, if users prefer either TOLOWER or ILIKE, they can already express that themselves.

@IanKemp
Copy link

IanKemp commented Aug 18, 2021

The whole .NET StringComparison vs DB collation pain is yet another reminder to the unwary that clients and databases work very differently, and that databases are still about two decades behind the state-of-the-art (exhibit A: SQL Server's default collation is still case-insensitive). EF.Functions.Collate is at best a band-aid over this gaping dichotomy.

And it doesn't solve the problem that users can write EF code that uses StringComparison and said code will only fail at runtime. The in-memory EF provider doesn't barf on StringComparison, which means the only way to find out you've misused StringComparison is to integration or manually test your code. That's a problem.

At the very least the EF Core team should add a Roslyn analyzer for StringComparison to detect and inform users at compile-time. I would also like to see the in-memory provider updated to fail when it encounters StringComparison. Another useful Roslyn analyzer would be one that warns about explicitly calling ToLower() and ToUpper() inside EF expressions, as those are unlikely to have the desired effect with DB collations...

@roji
Copy link
Member

roji commented Aug 18, 2021

The whole .NET StringComparison vs DB collation pain is yet another reminder to the unwary that clients and databases work very differently, and that databases are still about two decades behind the state-of-the-art (exhibit A: SQL Server's default collation is still case-insensitive). EF.Functions.Collate is at best a band-aid over this gaping dichotomy.

FWIW this isn't about databases "being behind", but about these operations execute in a very different way in the database and in a regular program. The important point here is that databases natively use indexes, which is why the collation must be set up in advance in your schema, and queries need to match that collation; whereas a regular .NET program simply goes over a list of strings in memory, and can do whatever type of comparison it wants. This is precisely why StringComparison makes sense in a .NET program, but not when translating a query to SQL.

Re the rest, there's absolutely no guarantee that queries which work on InMemory will work against a real database - this is very much by-design. The specific constructs and functions that are supported vary across databases (SQL Server, PostgreSQL and Sqlite support very different things), and InMemory simply cannot mimic a real database.

There have been various requests in the past to have an analyzer which detects queries which would fail at runtime - unfortunately this simply isn't feasible for most cases. Specifically for your proposal, consider that StringComparison does make sense on some providers even if it doesn't on most relational databases: here's a proposal to translate it for Cosmos. In other words, the analyzer would have to somehow know - at compile-time - which provider is being targeted, and that's not really possible.

At the end of the day, a robust test suite is the only way to be sure that your queries execute properly, and continue executing properly.

@IanKemp
Copy link

IanKemp commented Aug 19, 2021

There have been various requests in the past to have an analyzer which detects queries which would fail at runtime - unfortunately this simply isn't feasible for most cases. Specifically for your proposal, consider that StringComparison does make sense on some providers even if it doesn't on most relational databases: here's a proposal to translate it for Cosmos. In other words, the analyzer would have to somehow know - at compile-time - which provider is being targeted, and that's not really possible.

Unnecessary. Simply write an analyzer that assumes that people are using a database that supports nothing and should therefore be warned about everything, make it an add-on NuGet package, and allow end-users to ignore the warnings that don't fit their DBMS and/or use-case using the mechanisms that exist in .NET today.

At the end of the day, a robust test suite is the only way to be sure that your queries execute properly, and continue executing properly.

Most people would consider a test suite using the in-memory provider to be sufficiently robust, and yet it demonstrably isn't in this case. The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

@roji
Copy link
Member

roji commented Aug 19, 2021

Unnecessary. Simply write an analyzer that assumes that people are using a database that supports nothing and should therefore be warned about everything, make it an add-on NuGet package, and allow end-users to ignore the warnings that don't fit their DBMS and/or use-case using the mechanisms that exist in .NET today.

That sounds like a pretty bad user experience, and would also not work for applications using multiple providers (which have different capabilities). And once again, predicting at compile-time whether a query will execute successfully is simply not feasible; for the specific case of StringComparison it does seem easy to detect the specific invocation, but in many other cases translatibility depends on more complex factors (what exact parameter is being used, where in the query is the construct/function being used...). And since translatibility can't be successfully predicted in the general case, users have to write tests anyway to ensure their code works.

Stepping back, I understand the initial frustration of writing a query using StringComparison and getting an exception - but that's part of learning how an API works when first starting to use it. LINQ and queries are maybe a bit tougher, since they're an open space where query shapes are basically infinite. But if you try using StringComparison, we already include a very specific message pointing you to the doc page about collations - that seems reasonable enough. Not every issue in programming can be flagged at compile-time.

Most people would consider a test suite using the in-memory provider to be sufficiently robust, and yet it demonstrably isn't in this case.

We very explicitly discourage testing EF Core applications with InMemory, for precisely this kind of thing. InMemory cannot and will never behave like your real database; either create a repository layer above EF and mock that (in which case you provide results directly and don't need InMemory), or test against your production database system (in which case you get full fidelity between test and production).

The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

The difference is that the InMemory provider can support ToLower/ToUpper/StringComparison, whereas it cannot support transactions or raw SQL. And once again, ToLower/ToUpper/StringComparison may make sense in certain other production providers.

@IanKemp
Copy link

IanKemp commented Aug 19, 2021

Not every issue in programming can be flagged at compile-time.

But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.

The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

The difference is that the InMemory provider can support ToLower/ToUpper/StringComparison, whereas it cannot support transactions or raw SQL. And once again, ToLower/ToUpper/StringComparison may make sense in certain other production providers.

Which is why you should be able to configure it. Default behaviour:

services.AddDbContext<MyDbContext>(o => o
            .UseInMemoryDatabase("blah")
            .ConfigureWarnings(w => 
                {
                    w.Ignore(InMemoryEventId.UseOfToLowerOrToUpper);
                    w.Ignore(InMemoryEventId.UseOfToStringComparison);
                });

Someone like me using MSSQL in prod:

services.AddDbContext<MyDbContext>(o => o
            .UseInMemoryDatabase("blah")
            .ConfigureWarnings(w => 
                {
                    w.Throw(InMemoryEventId.UseOfToLowerOrToUpper);
                    w.Throw(InMemoryEventId.UseOfToStringComparison);
                });

@roji
Copy link
Member

roji commented Aug 19, 2021

Not every issue in programming can be flagged at compile-time.

But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.

I disagree - this would give the illusion that InMemory does approximate the SQL Server behavior; if it catches this specific SQL Server incompatibility, the expectation/assumption would be that it catches in general. It's better to say loud and clear that "this provider does not behave like SQL Server", then to make it match SQL Server on one very small bit. Once again, users must properly test their queries on their production database system, so it's better not to give any illusions to the contrary.

Which is why you should be able to configure it. Default behaviour:

This is not scalable, given the number of other little behavior differences between InMemory and any/all other provider. There would have to be hundreds/thousands of flags controlling every possible imaginable language construct and function.

It also does not help users, since it puts the onus on them to tweak their InMemory configuration to match the SQL Server behavior. A solution that would require every user to fully research which function is supported on their provider, and tweak InMemory accordingly doesn't make sense to me.

@dahlbyk
Copy link

dahlbyk commented Jul 5, 2023

Some folks here may have thoughts on dotnet/roslyn-analyzers#6743

@roji
Copy link
Member

roji commented Jul 6, 2023

@dahlbyk thanks for the point, wrote a response there.

@dahlbyk
Copy link

dahlbyk commented Jul 6, 2023

Pulling over the most relevant part of @roji's comment:

The long-term solution here is probably for EF to have its own diagnostics suppressor, and for that suppressor to be made aware of which database provider is being used (probably explicitly by the user via an .editorconfig setting, since having a suppressor detect the EF provider is probably unfeasible). At that point, we could suppress diagnostics within expressions based on that.

Wondering if this is a problem for EF to solve vs individual providers. Or if EF provides the base Analyzer and providers can register their limitations somehow?

@roji
Copy link
Member

roji commented Jul 6, 2023

Yeah, I'm not sure how this would look like. There's definitely a common functionality there across providers, e.g. to detect that something is within an expression in the first place. I suspect we'll have some common core/relational analyzer functionality which individual providers can leverage or something similar.

@kcaswick
Copy link

Not every issue in programming can be flagged at compile-time.

But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.

I disagree - this would give the illusion that InMemory does approximate the SQL Server behavior; if it catches this specific SQL Server incompatibility, the expectation/assumption would be that it catches in general. It's better to say loud and clear that "this provider does not behave like SQL Server", then to make it match SQL Server on one very small bit. Once again, users must properly test their queries on their production database system, so it's better not to give any illusions to the contrary.

I don't want this illusion. What I want is that when I write the C# expression that exactly matches the database behavior, it is translated, rather than triggering an error. Otherwise, even when what you are doing is possible with all the providers, it is impossible† to share a filter expression between logic running against databases, web APIs, and lists.

I'm fine with making this a requirement on providers rather than translating it in EF Core itself, if EF Core doesn't have enough information to know what the C# equivalent of the current collation is. Perhaps the provider would be responsible for registering which StringComparison or CompareInfo, if any, matches each database collation.

I would think the (EF or dotnet) platform should provide the shared functionality of recognizing which comparison an expression is using and/or translating the matching expression(s) to a simple Equals.

† without spending a bunch of time writing custom expression visitors and properly hooking them up as a wrapper around the provider's built-in translation.

@roji
Copy link
Member

roji commented Jul 10, 2023

I don't want this illusion. What I want is that when I write the C# expression that exactly matches the database behavior, it is translated, rather than triggering an error.

In various cases, that's simply not possible. For example, you may be running your EF application against some database which EF did not create; we do not know the collation of the column you're querying, and so cannot know whether the comparison will be case-sensitive or not. EF cannot inspect the schema of each and every column in your database just in order to know what the collation is and how comparisons against that column will operate.

Otherwise, even when what you are doing is possible with all the providers, it is impossible† to share a filter expression between logic running against databases, web APIs, and lists.

This is the (possibly unfortunate) reality: LINQ is a nice abstraction, but it is necessarily leaky because the query is executed differently based on where it's executed. There are many behavioral discrepancies between where queries execute, around case-sensitivity, null semantics, which functions can be translated, and many other factors. By design, EF does not attempt to gloss over and make queries execute the same wherever executed; that is mostly impossible, and where possible, can lead to significant perf degradation.

I'm fine with making this a requirement on providers rather than translating it in EF Core itself, if EF Core doesn't have enough information to know what the C# equivalent of the current collation is. Perhaps the provider would be responsible for registering which StringComparison or CompareInfo, if any, matches each database collation.

This is not just a provider problem, it's a collation problem, and different collations can be defined on different columns. For example, SQL Server by default uses a case-insensitive collation, but that can be overridden both at the database level and at the column level.

@kcaswick
Copy link

I don't want this illusion. What I want is that when I write the C# expression that exactly matches the database behavior, it is translated, rather than triggering an error.

In various cases, that's simply not possible. For example, you may be running your EF application against some database which EF did not create; we do not know the collation of the column you're querying, and so cannot know whether the comparison will be case-sensitive or not. EF cannot inspect the schema of each and every column in your database just in order to know what the collation is and how comparisons against that column will operate.

I was under the impression EF already could know the collation of each and every column, because you can define it as shown in collations and case sensitivity. If that is not the case, I still think the platform should provide the functionality for identifying the "collation" used by a comparison expression, rather than having to rewrite that in each provider, some providers only support StringComparison, others only support CompareInfo from the locale, etc. Then I would send the request to allow matching collations to the team for the provider for MS SQL Server (and any other databases I'm using).

If it is possible to specify collation in the model, but the developer has not, just throw an error that gives enough info for the developer to realize they need to specify it.

Otherwise, even when what you are doing is possible with all the providers, it is impossible† to share a filter expression between logic running against databases, web APIs, and lists.

This is the (possibly unfortunate) reality: LINQ is a nice abstraction, but it is necessarily leaky because the query is executed differently based on where it's executed. There are many behavioral discrepancies between where queries execute, around case-sensitivity, null semantics, which functions can be translated, and many other factors. By design, EF does not attempt to gloss over and make queries execute the same wherever executed; that is mostly impossible, and where possible, can lead to significant perf degradation.

Yes, but it is also preventing the developer from doing it when it is possible, and at least in the case of specified collation, if only the one matching the model is allowed, it should not cause perf degradation. It's preventing the simplest, most common case too, when the only difference is default case-sensitivity, and even the simplest C# expression for case-insensitive compare is not allowed.

I agree that unspecified/default collation (==) should continue to be allowed, and leak the underlying behavior. Only when the expression specifies the collation to compare with (ex. String.Equals(String, StringComparison.Ordinal)) should it be disallowed against non-matching database collations.

@roji
Copy link
Member

roji commented Jul 10, 2023

I was under the impression EF already could know the collation of each and every column, because you can define it as shown in collations and case sensitivity.

As I wrote above, that's only if you're using EF to create the database. But you could be using EF to connect to an existing database that's managed externally. It's not practical for various reasons to ask EF to discover schema details about all columns you'll potentially be querying.

An additional friction point here is that database collations can be very elaborate/complex, and don't necessarily correspond to .NET's StringComparison enum; to get an idea, see the SQL Server collations page (and remember that's just one database in what we have to support). Since StringComparison cannot capture the richness of the database collation, we really cannot say that the comparison will occur in the query as it would have occurred in .NET; this is frequently something that English-only devs miss (case sentivity just isn't enough: what about accent sensitivity, kana sensitivity...).

And at the end of the day, say we implemented this; this would mean that you can start using e.g. StringComparison.OrdinalIgnoreCase, and it would work assuming you've configured your column with the appropriate collation, and throw otherwise. Just like today, the dev would have to go and make sure the column collation is insensitive - that must happen either way. So the only value here is somewhat better discoverability of the fact that you've misconfigured your column collation when you get it wrong; it's not nothing, but I don't think it makes sense in the grand scheme of things.

EF translates from one domain (.NET/LINQ) to another (relational database/SQL); there are mismatches along the way, both between the source and target domains and between the different target domains. This is unfortunately something that devs must accept and embrace.

@dahlbyk
Copy link

dahlbyk commented Jul 10, 2023

Put another way, if a particular column is meant to match case-sensitively or not then very often the optimal way to specify that is in the database rather than scattered in the EF code. Unless you really like table scans.

Thus my request to turn off the compiler warnings suggesting the CultureInfo/StringComparison overloads that we actually don't want to use because the database should be configured for the right comparison.

@kcaswick
Copy link

I was under the impression EF already could know the collation of each and every column, because you can define it as shown in collations and case sensitivity.

As I wrote above, that's only if you're using EF to create the database. But you could be using EF to connect to an existing database that's managed externally. It's not practical for various reasons to ask EF to discover schema details about all columns you'll potentially be querying.

Does adding UseCollation to the model force you to use a migration, or will it currently be ignored if you don't run a migration? Since we're using DACPAC, not migrations, and we call UseCollation, I'm pretty sure it doesn't cause problems.

On projects I've been working on, we've usually used the command line tool to scaffold the context, since we were not usually using migrations. That already exports information about collations, at least for some providers.

An additional friction point here is that database collations can be very elaborate/complex, and don't necessarily correspond to .NET's StringComparison enum; to get an idea, see the SQL Server collations page (and remember that's just one database in what we have to support). Since StringComparison cannot capture the richness of the database collation, we really cannot say that the comparison will occur in the query as it would have occurred in .NET; this is frequently something that English-only devs miss (case sentivity just isn't enough: what about accent sensitivity, kana sensitivity...).

Yes, I expected that for some collations there would be no match for the database collation in .Net. I don't have statistics on global collation use, but I bet a very large fraction use one of Latin1_General_100_CI_AS_SC, Latin1_General_100_CI_AS_SC_UTF8, Latin1_General_CI_AS, *_BIN2_UTF8, Chinese_PRC_CI_AS, or Modern_Spanish_CI_AS. Many others will be supported by methods like Microsoft.SqlServer.Management.Smo.Server.GetStringComparer or have some collations that follow the Unicode rules for sorting and comparison.

And at the end of the day, say we implemented this; this would mean that you can start using e.g. StringComparison.OrdinalIgnoreCase, and it would work assuming you've configured your column with the appropriate collation, and throw otherwise. Just like today, the dev would have to go and make sure the column collation is insensitive - that must happen either way.

The dev really wouldn't have to go make sure of the column collation in many cases... if you know all your non-legacy databases are configured by default with a particular collation, you can just use that one and only worry about the specifics in special circumstances, with the added safety that you'll get an exception if you forget you are in a special circumstance.

So the only value here is somewhat better discoverability of the fact that you've misconfigured your column collation when you get it wrong; it's not nothing, but I don't think it makes sense in the grand scheme of things.

It's not just discoverability of getting it wrong, it is the ability to even run the code that does it right. Such as writing a comparison expression that will work the same across databases and LINQ to objects. (Though perhaps only if you're able to change all of them to similar enough collations.)

Would you say it would be an objectionable feature for EF to have? Or a feature that would be ok, but not important enough to allocate resources to?

@roji
Copy link
Member

roji commented Jul 11, 2023

Does adding UseCollation to the model force you to use a migration, or will it currently be ignored if you don't run a migration? Since we're using DACPAC, not migrations, and we call UseCollation, I'm pretty sure it doesn't cause problems.

Adding UseCollation doesn't force you to use a migration - EF generally doesn't force you to migrations; you give it a connection string and it connects to that database. If you're managing your database externally, it simply doesn't take effect on the column collation. UseCollation may still affect certain things in the query pipeline, but the column collation is what you defined it to be when creating the database.

On projects I've been working on, we've usually used the command line tool to scaffold the context, since we were not usually using migrations. That already exports information about collations, at least for some providers.

Not everyone does that.

It's not just discoverability of getting it wrong, it is the ability to even run the code that does it right. Such as writing a comparison expression that will work the same across databases and LINQ to objects. (Though perhaps only if you're able to change all of them to similar enough collations.)

The point is that if your databases are set up correctly, it makes zero difference whether you're using the StringComparison overload or the overload without StringComparison - you get the exact same behavior. The only thing that your proposal brings is for EF to throw when the database column is not set up correctly, in which case the dev would go and set it up correctly. That's why I can't see this being anything beyond a misconfiguration discoverability feature.

Would you say it would be an objectionable feature for EF to have? Or a feature that would be ok, but not important enough to allocate resources to?

First of all, EF never does the sort of "automatic schema inspection" this feature would require, i.e. to know the collations of all columns in your database in advance in case they are queried (or to get that information on-demand when some column is queried). Aside from the considerable work this would require, this will very likely be a problematic approach for many reasons - so I'd object to doing it, especially for the very little value this brings.

And second, as I wrote above, since things really do not correspond between .NET and the database on this, I really do think it's a better design for the LINQ query to be completely agnostic of collation details. IMO the right way here is for the user to express that "the column should be compared to something" without specifying anything about the how. This promotes understanding of how things are actually working under the hood, i.e. that the how is fully specified on the database-side in the column collation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests