[API Proposal]: The Except and ExceptBy methods #110121

Mr0N · 2024-11-24T17:05:02Z

Background and motivation

I propose adding an overload to the Except and ExceptBy methods that allows for removing elements from the provided array without removing duplicates.

API Proposal

IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second)
{
    var set = new HashSet<TSource>(second);

    foreach (TSource element in first)
    {
     
        if (!set.Contains(element))
        {
            yield return element;
        }
    }
}

API Usage

using System.ComponentModel;
using System.Linq;


var range = new int[] { 1 };
var rangeEx = new int[] { 1, 2, 3, 4, 5 ,2,5};

var result = rangeEx.ExceptBy(range, a => a).ToArray();
Write(result);
var write = rangeEx.Except(range);
Write(write.ToArray());

var iter = ExceptIterator(rangeEx, range);
Write(iter.ToArray());
Console.ReadKey();




void Write(params int[] values)
{
    Console.WriteLine($"Array:{string.Join(',',values)}");
}
IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second)
{
    var set = new HashSet<TSource>(second);

    foreach (TSource element in first)
    {
     
        if (!set.Contains(element))
        {
            yield return element;
        }
    }
}

Array:2,3,4,5
Array:2,3,4,5
Array:2,3,4,5,2,5

Alternative Designs

No response

Risks

No response

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-11-24T17:05:35Z

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Clockwork-Muse · 2024-11-24T20:44:40Z

nit: Except (along with Union and Intersect) are terms that come from set math, so any method wouldn't/shouldn't be named this way.

For the rare cases where something like this is needed, most people can probably use the following extension method:

IEnumerable<TSource> WhereNotIn<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second)
{
    var set = new HashSet<TSource>(second);

    return first.Where(e => !set.Contains(e));
}

eiriktsarpalis · 2024-11-24T21:05:39Z

Duplicate of #107338, #75066, and #102112.

Clockwork-Muse · 2024-11-25T05:00:08Z

@eiriktsarpalis - This isn't complaining about the existing behavior, so not a strict duplicate?

Mr0N · 2024-11-25T06:22:55Z

I’m not suggesting changing the logic of the existing method, but rather adding an overload where you can choose whether to remove duplicates. For example, it could be implemented like this:

public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, bool removeDuplicate = true)

`

eiriktsarpalis · 2024-11-25T11:26:19Z

Adding boolean flags that fundamentally change the semantics of a function isn't considered good practice, and is definitely not something we've done before in LINQ. For better or for worse Intersect and Except reflect the semantics of SQL, and it would be more appropriate to consider separate method groups for that functionality. @Clockwork-Muse's use of WhereIn and WhereNotIn seems more appropriate in that regard, but I think these would need to be filed as a distinct proposal.

Mr0N · 2024-11-25T11:34:37Z

Adding boolean flags that fundamentally change the semantics of a function isn't considered good practice, and is definitely not something we've done before in LINQ. For better or for worse Intersect and Except reflect the semantics of SQL, and it would be more appropriate to consider separate method groups for that functionality. @Clockwork-Muse's use of WhereIn and WhereNotIn seems more appropriate in that regard, but I think these would need to be filed as a distinct proposal.

Well, yes, the SQL method is more understandable, but for C#, it's not immediately clear how to use this method. In SQL, you can use a LEFT JOIN to quickly find the difference between two tables, but in C#, it's quite difficult to achieve, especially in terms of performance.

It would be great if there were some method that could work between two collections and leverage all possible optimizations to speed up the process, similar to what SQL offers.

Mr0N · 2024-11-25T11:35:36Z

Without losing extra data

Clockwork-Muse · 2024-11-25T19:36:45Z

but in C#, it's quite difficult to achieve, especially in terms of performance

It would be great if there were some method that could work between two collections and leverage all possible optimizations to speed up the process, similar to what SQL offers.

This is mostly because RDBMSs put a lot more work into their dynamic optimizers than would be reasonable for C#. If you're dealing with a large enough dataset to actually affect program performance, stick it into an actual database (especially because chances are you're going to want to do more things with it).

In SQL, you can use a LEFT JOIN to quickly find the difference between two tables

... I really wish (the iSeries version of) DB2's EXCEPTION JOIN would be added to the standard....

Mr0N · 2024-11-26T19:22:07Z

It would be good to implement such an algorithm in C#, and then use it to separate one collection from another. It seems like a complex algorithm at first glance.

https://en.wikipedia.org/wiki/Sort-merge_join

chat gpt:

Merge Join (Merging Join)

The Merge Join is an algorithm used for efficiently executing a JOIN operation, especially when both tables are already sorted by the column(s) involved in the join. Its key advantage lies in its linear traversal of rows, making it highly performant for large, sorted datasets.

How Does Merge Join Work?

Input Requirements:

Both tables must be sorted on the join column(s). If not, the database management system (DBMS) performs a sort operation before initiating the Merge Join.

Data Comparison:

The algorithm starts with the first row of each table.
It compares the join keys:
- If the key in one table is smaller, the pointer for that table advances to the next row.
- If the keys match, a result row is produced, and both pointers advance.
- This continues until all rows are processed.

Example

Table A:

ID | Name -- | -- 1 | John 2 | Alice 3 | Bob

This behavior is crucial for accurate join results but may increase the size of the output significantly.

Advantages of Merge Join

Efficient for Sorted Data: Works linearly, O(n+m)O(n+m), when data is pre-sorted.
Scalable for Large Datasets: Ideal for large tables with indexes on join columns.
Low Memory Overhead: Requires minimal additional memory compared to Hash Join.

Limitations

Sorting Overhead: If the input data isn’t sorted, the pre-sorting step can be expensive, O(nlog⁡n)O(nlogn).
Equality-Only Conditions: Only supports joins based on equality (=). For conditions like >, <, or !=, other join algorithms are needed.

Mr0N added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 24, 2024

dotnet-issue-labeler bot added the area-System.Linq label Nov 24, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 24, 2024

eiriktsarpalis closed this as not planned Won't fix, can't repro, duplicate, stale Nov 24, 2024

dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Nov 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: The Except and ExceptBy methods #110121

[API Proposal]: The Except and ExceptBy methods #110121

Mr0N commented Nov 24, 2024 •

edited

Loading

dotnet-policy-service bot commented Nov 24, 2024

Clockwork-Muse commented Nov 24, 2024

eiriktsarpalis commented Nov 24, 2024

Clockwork-Muse commented Nov 25, 2024

Mr0N commented Nov 25, 2024 •

edited

Loading

eiriktsarpalis commented Nov 25, 2024

Mr0N commented Nov 25, 2024

Mr0N commented Nov 25, 2024

Clockwork-Muse commented Nov 25, 2024

Mr0N commented Nov 26, 2024

[API Proposal]: The Except and ExceptBy methods #110121

[API Proposal]: The Except and ExceptBy methods #110121

Comments

Mr0N commented Nov 24, 2024 • edited Loading

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

dotnet-policy-service bot commented Nov 24, 2024

Clockwork-Muse commented Nov 24, 2024

eiriktsarpalis commented Nov 24, 2024

Clockwork-Muse commented Nov 25, 2024

Mr0N commented Nov 25, 2024 • edited Loading

eiriktsarpalis commented Nov 25, 2024

Mr0N commented Nov 25, 2024

Mr0N commented Nov 25, 2024

Clockwork-Muse commented Nov 25, 2024

Mr0N commented Nov 26, 2024

chat gpt:

Merge Join (Merging Join)

How Does Merge Join Work?

Input Requirements:

Data Comparison:

Example

Advantages of Merge Join

Limitations

Mr0N commented Nov 24, 2024 •

edited

Loading

Mr0N commented Nov 25, 2024 •

edited

Loading