Common LINQ mistakes
This is a list of common LINQ mistakes, which has grown out from reading code of others - and my own - and using the results to train people new to .NET and LINQ.
These mistakes and problems apply to all LINQ providers, but as people new to LINQ are hopefully familiar with SQL, some of the mistakes have explanations or examples in SQL.
In many cases, LINQ provider will fix the issue – but programmer should be smarter than her/his tools and not expext the provider, compiler or interpreter to fix the bad code. Not to mention, in many cases the misuse obfuscates the intent of the code, uses more resources or has invalid/unexpected results.
If you need to find these problems in the code, here is a list of regular expressions for that.
Understanding deferred execution is by far the biggest issue people have when learning LINQ.
There are great many explanations, for example (in no particular order):
- LINQ and Deferred Execution
- Deferred query execution
- Deferred vs Immediate Query Execution in LINQ
- Deferred execution vs immediate execution in LINQ
- Understanding LINQ's Deferred Execution
var person = persons.Where(x => x.LastName == "Smith" && x.FirstName == "John").First();
var person = persons.First(x => x.LastName == "Smith" && x.FirstName == "John");
Depending on optimizations of the LINQ provider, this may mean that your database fetches all John Smiths to your application, and the application then returns the first of those. You definitely don't want that.
This also applies for
Where(x => x.LastName == "Smith").Count() and other similar constructs, all of which obfuscate the intent of the code.
Let's explain the example code with Rubber Duck Debugging:
- Incorrect: I will take all persons named John Smith and use the first of them.
- Correct: I will get and use the first person named John Smith.
Using SingleOrDefault()/Single() instead FirstOrDefault()/First()
var person = persons.SingleOrDefault(x => x.Id == 21034);
var person = persons.FirstOrDefault(x => x.Id == 21034);
Single()/SingleOrDefault() ensures that there is just one and only one match in the set, whereas
First()/FirstOrDefault() returns the first match.
Single.. methods iterate over all elements of the set until two matching items are found – validating that there is just one match, and throw an error if second is found - which is both costly and usually unneeded, unless validating user input or such.
In terms of SQL,
SELECT TOP 2... whereas First is
SELECT TOP 1.... If you have 10M rows, and your match is the very first row,
Single() will still run through all of the rows, validating that there are no more matches, whereas
First() returns immediately.
In most cases, you aren't interested in ensuring that there is just one match - or often it is logically impossible. See the examples above - if your database has two Person entries with same primary key (ID), you have far bigger problems than using LINQ badly...
A special case is
Single() without predicate, which should be used when logically or semantically there should be only a single element in the set.
Not understanding the difference between First() and FirstOrDefault()
This is highly situation-dependent. Sometimes
First() is correct, and sometimes it isn't.
It is important to understand that
Single() throw an exception when match is not found, whereas
SingleOrDefault() won't – and return the default/null value, depending on the data type.
Take, for example:
var person = persons.First(x => x.Id == 21034);
Exception may be the correct behavior if a person with ID 21034 should exist, and not finding the entry is literally exceptional. However, remember that exceptions should not be used for flow control!
On the other hand:
var login = users.First(x => x.Username == "email@example.com");
Instead of having
First() throwing an exception, you probably should use
FirstOrDefault() and check
login variable for null - and log the invalid log-in attempt along with the details.
A very common pattern is "update or create" ("upsert" in SQL terms), for which you should always use
FirstOrDefault() and not exception handling.
Using Count() instead of Any() or All()
if (persons.Count() > 0)... if (persons.Count(x => x.LastName == "Smith") > 0)... if (persons.Count(x => x.LastName == "Smith") == persons.Count())...
if (persons.Any())... if (persons.Any(x => x.LastName == "Smith"))... if (persons.All(x => x.LastName == "Smith"))...
Count() will do the full iteration of the (matching) items in the set. If you are interested that there is a matching item in the set, use
Any() just returns true when there is at least one matching item, with an empty
Any() just returning true when there are items in your set.
E.g. if you have list of 10M persons, and you need to know that a person with last name Smith exists,
persons.Count(x => x.LastName == "Smith") will go through all the persons and check their last name, returning total number of Smiths.
persons.Any(x => x.LastName == "Smith") returns true after encountering the first entry with last name Smith.
All() is the opposite of
Any() - if you need to know that all your persons have lastname Smith,
persons.All(x => x.LastName == "Smith") will return false when encountering first person with last name not Smith, without going over all of the set.
persons.Where(x => x.LastName == "Smith").Where(x => x.FirstName == "John");
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John");
Where() entries may end up in SQL database as multiple embedded queries (in style of
SELECT * FROM (SELECT * FROM PERSONS WHERE LastName = 'Smith') WHERE FirstName = 'John', instead of
SELECT * FROM PERSONS WHERE LastName = 'Smith' AND FirstName = 'John') – or will do multiple iterations over sets.
It also obfuscates the intent of the query. While most if not all LINQ providers optimize the multi-Where() properly - or the underlying provider does that - let's explain the code to our rubber duck again:
- Incorrect: I will fetch all persons with last name Smith, and out of them I will take people with first name John.
- Correct: I will fetch all persons named John Smith.
persons.OrderBy(x => x.LastName).OrderBy(x => x.FirstName);
persons.OrderBy(x => x.LastName).ThenBy(x => x.FirstName);
The "incorrect" version sorts persons by LastName - and then ignores that sort completely and re-sorts persons by FirstName. Correct use is one
OrderBy() followed by
ThenBy(), which can be chained multiple times (
persons.OrderBy(x => x.LastName).ThenBy(x => x.FirstName).ThenBy(x => x.Age)).
This also applies to
Select(x => x)
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John").Select(x => x);
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John");
It is unclear why this artefact pops up every now and then. Maybe the author intended to do something with
Select() and forgot, or thought that
Select() materializes the query, like
Another possible explanation is that they used this pattern to return IEnumerable<T>, but didn't know about
AsEnumerable() extension method.
Using is...as instead of OfType<>()
var employees = persons.Where(x => x is Employee).Select(x => x as Employee); var employees = persons.Where(x => x is Employee).Select(x => (Employee)x); var employees = persons.Select(x => x as Employee).Where(x => x != null);
var employees = persons.OfType<Employee>();
is...as antipattern is a very common occurrence in .NET world.
is does nothing more internally besides trying to cast the object to specified type and returning false if the casting fails. Using
is...as antipattern therefore casts the object to specified type twice - something that really should be avoided (note that c# 7 added
is pattern matching, allowing constructs like
if (person is Employee e)...)
In case of LINQ, OfType<>() returns only the objects of the specified type. This allows both shorter and clearer code, compared to the incorrect version.
Empty Count() for arrays, List<T>, Dictionary<T>, ...
var count = peopleArray.Count(); var count = peopleList.Count(); var count = peopleDictionary.Count();
var count = peopleArray.Length; var count = peopleList.Count; var count = peopleDictionary.Count;
Don't use Enumerable.Count() method for arrays or any collections that implement ICollection<T>/ICollection interface, such as List<T> and Dictionary<T>. Use Length for arrays and Count property for ICollection implementations.
Depending on platform, there are optimizations in place (see here) preventing O(n) operation of
Count() method, just returning existing