Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CollectionsMarshal support for the HashSet<T> collection #82840

Closed
theodorzoulias opened this issue Mar 1, 2023 · 4 comments
Closed

Add CollectionsMarshal support for the HashSet<T> collection #82840

theodorzoulias opened this issue Mar 1, 2023 · 4 comments

Comments

@theodorzoulias
Copy link
Contributor

theodorzoulias commented Mar 1, 2023

Background and motivation

The Dictionary<TKey,TValue> collection was improved in .NET 6, by introducing direct ref access to its values. This improvement made the collection more suitable for performance-critical applications, by reducing the number of times a key has to be hashed when adding/modifying values in the collection. For example the GetOrAdd extension method can be implemented efficiently with a single hashing of the key. The same improvement could be made to the HashSet<T> collection as well. A GetOrAdd API has also been proposed for the HashSet<T>, but that proposal was closed with a mention about adding CollectionsMarshal support in the future. This proposal here is exactly about adding this support.

API Proposal

namespace System.Runtime.InteropServices
{
    public static partial class CollectionsMarshal
    {
        public static ref T? GetValueRefOrAddDefault<T> (HashSet<T> hashSet, T item, out bool exists);
        public static ref T GetValueRefOrNullRef<T> (HashSet<T> hashSet, T item);
    }    
}

API Usage

Below is a partial implementation of a concurrent HashSet<T> with bounded capacity. The proposed API GetValueRefOrAddDefault is used when space is available in the collection, and the proposed API GetValueRefOrNullRef is used when the collection is full:

public partial class ConcurrentBoundedHashSet<T>
{
    private readonly HashSet<T> _set = new();
    private readonly int _boundedCapacity;

    /// <summary>
    /// Searches the set for a given value and returns the equal value it finds,
    /// otherwise adds the given value to the set if space is available,
    /// otherwise returns false.
    /// </summary>
    public bool TryGetOrAdd(T equalValue, out T? actualValue)
    {
        lock (_set)
        {
            if (_set.Count < _boundedCapacity)
            {
                ref T? valueRef = ref CollectionsMarshal.GetValueRefOrAddDefault(_set, equalValue,
                    out bool exists);
                if (!exists) valueRef = equalValue;
                actualValue = valueRef; return true;
            }
            else
            {
                ref T valueRef = ref CollectionsMarshal.GetValueRefOrNullRef(_set, equalValue);
                if (!Unsafe.IsNullRef(ref valueRef))
                {
                    actualValue = valueRef; return true;
                }
                else
                {
                    actualValue = default; return false;
                }
            }
        }
    }
}

Alternative Designs

No response

Risks

None that I can think of.

Labels: area-System.Collections / api-suggestion.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 1, 2023
@ghost
Copy link

ghost commented Mar 1, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The Dictionary<TKey,TValue> collection was improved in .NET 6, by introducing direct ref access to its values. This improvement made the collection more suitable for performance-critical applications, by reducing the number of times a key has to be hashed when adding/modifying values in the collection. For example the GetOrAdd extension method can be implemented efficiently with a single hashing of the key. The same improvement could be made to the HashSet<T> collection as well. A GetOrAdd API has also been proposed for the HashSet<T>, but that proposal was closed with a mention about adding CollectionsMarshal support in the future. This proposal here is exactly about adding this support.

API Proposal

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static ref T? GetValueRefOrAddDefault<T> (HashSet<T> hashSet, T item, out bool exists);
        public static ref T GetValueRefOrNullRef<T> (HashSet<T> hashSet, T item);
    }    
}

API Usage

Below is a partial implementation of a concurrent HashSet<T> with bounded capacity. The proposed API GetValueRefOrAddDefault is used when space is available in the collection, and the proposed API GetValueRefOrNullRef is used when the collection is full:

public partial class ConcurrentBoundedHashSet<T>
{
    private readonly HashSet<T> _set = new();
    private readonly int _boundedCapacity;

    /// <summary>
    /// Searches the set for a given value and returns the equal value it finds,
    /// otherwise adds the given value to the set if space is available,
    /// otherwise returns false.
    /// </summary>
    public bool TryGetOrAdd(T equalValue, out T? actualValue)
    {
        lock (_set)
        {
            if (_set.Count < _boundedCapacity)
            {
                ref T? valueRef = ref CollectionsMarshal.GetValueRefOrAddDefault(_set, equalValue,
                    out bool exists);
                if (!exists) valueRef = equalValue;
                actualValue = valueRef; return true;
            }
            else
            {
                ref T valueRef = ref CollectionsMarshal.GetValueRefOrNullRef(_set, equalValue);
                if (!Unsafe.IsNullRef(ref valueRef))
                {
                    actualValue = valueRef; return true;
                }
                else
                {
                    actualValue = default; return false;
                }
            }
        }
    }
}

Alternative Designs

No response

Risks

None that I can think of.

Labels: area-System.Collections / api-suggestion.

Author: theodorzoulias
Assignees: -
Labels:

area-System.Collections, untriaged

Milestone: -

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Mar 1, 2023

The CollectionMarshal methods for Dictionary provide ref access to the value of a given entry, not its key. In the case of HashSet you'd be necessarily permitting external modification of the key itself which has a number of problems:

  1. What does AddDefault mean in GetValueRefOrAddDefault? If you're actually adding default(T) then that would not be equal to the item parameter and for reference types it would be null which is not a supported key value.
  2. Mutating the resultant ref would corrupt the hashtable unless the value you're setting it with is equal to the original value.

@stephentoub
Copy link
Member

stephentoub commented Mar 1, 2023

I agree with @eiriktsarpalis; the proposed GetValueRefOrAddDefault doesn't seem viable. I could see us adding GetValueRefOrNullRef if it were changed to return a ref readonly, but I don't know what scenarios that would really aid (presumably the same ones that wanted the overload that returns the actual value, and those are relatively rare, then here compounding that rarity by also wanting it by ref).

@theodorzoulias
Copy link
Contributor Author

@eiriktsarpalis, @stephentoub you are both right. This was a poorly thought proposal.

On top of the above, in the "API Usage" example the else block can be trivially replaced with return _set.TryGetValue(equalValue, out actualValue);.

I am closing this one.

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Mar 1, 2023
@teo-tsirpanis teo-tsirpanis closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2023
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Mar 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants