Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

Closed
nietras opened this issue Sep 17, 2017 · 16 comments
Closed

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

nietras opened this issue Sep 17, 2017 · 16 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@nietras
Copy link
Contributor

nietras commented Sep 17, 2017

This proposal adds utility methods for checking spans for overlaps or if a span is within in another span. For previous discussions see https://github.com/dotnet/corefx/issues/17793, dotnet/corefxlab#827 and https://github.com/dotnet/corefx/issues/18750. And a closed PR dotnet/corefx#18731.

Proposed API

Add two sets of extension methods Overlaps and Within for ReadOnlySpan<T> in SpanExtensions:

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);}
}
  • If second span does not match the alignment of first span an ArgumentException is thrown.
  • out int elementOffset is the relative offset or index of the start of the second span to the start of the first span.
  • elementOffset is only valid if the check in question returns true.

Rationale and Usage

For Overlaps the scenario can involve algorithms that transform elements from first to second span,
if the spans overlap the result might be wrong depending on the algorithm. To be able to detect this
the Overlaps method can be called.

For Within the scenario can involve buffer pool management, where spans are leased from a buffer and later needs to be returned to a given buffer. Calling Within allows checking for which
buffer the given span is within and thus where it must be returned to.

Expected Results

  • A: first entirely before second, no overlap
first:   [--------)
         xRef     xRef + xLength
second:           [------------)     
                  yRef         yRef + yLength
  • B: first starts before second and ends inside second
first:   [------------)
         xRef         xRef + xLength
second:              [------------)     
                     yRef         yRef + yLength
  • C: first is entirely within second
first:       [-----------------------)
             xRef                    xRef + xLength
second:    [--------------------------)     
           yRef                       yRef + yLength
  • D: first starts inside target second and ends after second end
first:            [------)
                  xRef   xRef + xLength
second:    [-------)
           yRef   .            yRef + yLength
  • E: first entirely after second, no overlap
first:            [------------)     
                  xRef         xRef + xLength
second:  [--------)
         yRef     yRef + yLength
  • F: first starts before second and
    ends after second, i.e. second is contained in first
first:   [-------------------------------)
         xRef                            xRef + xLength
second:    [--------------------------)     
           yRef                       yRef + yLength
  • G: first is same as second
first:   [--------------------------)
         xRef                       xRef + xLength
second:  [--------------------------)     
         yRef                       yRef + yLength
  • H: first is empty within second
first:    [)
          xRef
second:  [--------------------------)     
         yRef                       yRef + yLength
  • I: first is empty before second
first:  [)
        xRef
second:  [--------------------------)     
         yRef                       yRef + yLength
  • J: first is empty after second
first:                              [)
                                    xRef
second:  [--------------------------)     
         yRef                       yRef + yLength

In table below Overlaps => first.Overlaps(second) or Within => first.Within(second). x => first and y => second.

Overlaps Within Extra checks/comment
A/E false false No way to know which case from these methods alone.
B true false elementOffset > 0
C true true
D true false elementOffset < 0
F true false elementOffset >= 0 && xLength >= (elementOffset + yLength)
G true true elementOffset == 0 && xLength == yLength
H true true xLength == 0
I/J false false No way to know which case from these methods alone.

Examples

public static void RepeatEvenIndices(ReadOnlySpan<byte> src, Span<byte> dst)
{
    if (src.Overlaps(dst, out var elementOffset) && elementOffset >= -1)
    {
        throw new ArgumentException();
    }

    for (int i = 0; i < src.Length / 2; i++)
    {
        dst[i + 0] = src[i * 2];
        dst[i + 1] = src[i * 2];
    }
}

Open Questions

Could this API be expressed differently, perhaps with an all encompassing Overlap method returning an enum? Performance downsides?

Naming of the methods should be reviewed.

Updates

UPDATE 1: Rename Contains to Within with reversed first/second relationship, due to naming confusion. Rename elementIndex to elementOffset latter considered better. elementOffset only valid if check returns true. Revise table and add empty cases.

cc: @ektrah @mgravell @ahsonkhan @shiftylogic @jkotas

@ektrah
Copy link
Member

ektrah commented Sep 17, 2017

A few thoughts:

  • elementIndex cannot be an int because the difference between xRef and yRef may be greater than int.MaxValue.
  • If the spans do not overlap (A and E) then elementIndex is not meaningful, because the GC may change the relative order of the two objects at any time. (This might actually fix the previous issue.)
  • The name Contains is very confusing.
  • The name elementIndex should probably be something like startDifference or startOffset.
  • An empty span does not overlap any other span.

Another idea that might be worth considering:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference, out int endDifference);

where startDifference = yRef - xRef and endDifference = (yRef + yLength) - (xRef + xLength) = startDifference + (yLength - xLength).

Overlaps Extra checks
A false
B true startDifference > 0 && endDifference > 0
C true startDifference < 0 && endDifference > 0
D true startDifference < 0 && endDifference < 0
E false
F true startDifference > 0 && endDifference < 0
G true startDifference == 0 && endDifference == 0

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 19, 2017

@ektrah, what is the value of having the following:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference);

Also, there is no way to distinguish between scenario A & E based on the output of Overlaps (from your design) but I don't think we need to.
@nietras, if Overlaps returns false, does it ever matter if it is because of scenario A instead of E and vice versa?

Regarding:

In case of the second not matching the alignment of first span an ArgumentException is thrown.

Can you please provide an example where we would throw an exception?

Similar to @ektrah, I don't think Contains would work as a name. I think of contains the same way as Array.contains or List.Contains:
https://msdn.microsoft.com/en-us/library/bb384015.aspx
https://msdn.microsoft.com/en-us/library/bhkz42b3(v=vs.110).aspx

I am not sure about ContainsSlice either. I think of ContainsSlice as follows:

Span<byte> src = new byte [1, 2, 3, 4, 5];
Span<byte> items = new byte [2, 3, 4];
src.ContainsSlice(items); // returns true;

Maybe IsContainedWithin?

src.IsContainedWithin(dest)

@ektrah's proposal avoids the naming issue altogether.

@ektrah
Copy link
Member

ektrah commented Sep 19, 2017

what is the value of having the following:

You don't have to type , out _ if you don't need the second out parameter.

I don't have a strong opinion if this overload should be there or not. Since endDifference can be easily computed from startDifference and the span lengths, the overload with out endDifference may actually not be strictly necessary.

Also, there is no way to distinguish between scenario A & E based on the output of Overlaps (from your design) but I don't think we need to.

If the spans overlap, then it's reasonable to assume that the spans refer to the same object in managed memory. (Unsafe code can defeat that assumption.)

If the spans do not overlap (A & E), then it's quite likely that they do not refer to the same object in managed memory. Since there is no defined distance or relative order between two arbitrary objects in managed memory, there is no meaningful elementIndex that Overlaps could return from which the caller could infer if it's A or E.

Can you please provide an example where we would throw an exception?

var bytes = new byte[16];

var first = new Span<byte>(bytes, 0, 8).NonPortableCast<byte, int>();
var second = new Span<byte>(bytes, 7, 8).NonPortableCast<byte, int>();

var overlaps = first.Overlaps(second, out int elementIndex);

What should the value of elementIndex be?

Ignoring this corner case may actually not be that bad. An exception would be fine as well though (IMO).

@ahsonkhan
Copy link
Member

ahsonkhan commented Oct 28, 2017

@nietras, any thoughts or updates for this API proposal (fix the TODOs)?

I would consider @ektrah's idea which removes the need of a Contains method:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference, out int endDifference);

You mentioned previously (from https://github.com/dotnet/corefx/issues/18750#issuecomment-296601237):

but we probably need dedicated methods for each since there are different optimizations paths for each

Can you explain why they would have different optimization paths?

@nietras
Copy link
Contributor Author

nietras commented Oct 28, 2017

@ektrah thanks for the feedback and answering questions :)

elementIndex cannot be an int because the difference between xRef and yRef may be greater than int.MaxValue.

They cannot be greater than int.MaxValue since Span<T> is limited to int Length. Note also that this is element offset.

If the spans do not overlap (A and E) then elementIndex is not meaningful, because the GC may change the relative order of the two objects at any time. (This might actually fix the previous issue.)

If a user knows the spans come from the same memory segment this could have utility, but it is an edge case, and I agree. It makes the API/implementation more straightforward too by removing this case.

The name elementIndex should probably be something like startDifference or startOffset.

I think element is crucial in the name here, since the offset is in number of elements, not in bytes. startElementOffset I can live with, or if we favor brevity then just elementOffset, offset at least seems better than index. Good suggestions.

An empty span does not overlap any other span.

There are edge cases here, an empty span has Length == 0, but it still might have a position i.e. a ref so it still could be interpreted to be inside another span. Perhaps the analogy to Rectangle in System.Drawing could be used as a guiding path here, see below.

The name Contains is very confusing.

It is not ideal I agree, but there is prior work especially from System.Drawing (see e https://msdn.microsoft.com/en-us/library/system.drawing.rectangle(v=vs.110).aspx), where Contains is used for example to find rectangles inside other rectangles. Here IntersectsWith is used instead of Overlaps. Although different rectangle types also use simply Intersects. Arguably, Overlaps is better in my mind than Intersects for spans.

Below examples of Rectangle methods:

Contains(Point)	Determines if the specified point is contained within this Rectangle structure.
Contains(Rectangle)	Determines if the rectangular region represented by rect is entirely contained within this Rectangle structure.
IntersectsWith(Rectangle)	Determines if this rectangle intersects with rect.
Intersect(Rectangle)	
Replaces this Rectangle with the intersection of itself and the specified Rectangle.

This also suggests that an Intersect extension method may have interest if one was interested in only the intersection of spans.

There are other examples and one I would then pitch instead, namely Within, this reverses the relationship but I think it reads well:

if (child.Within(parent, out var childElementOffset))
{
    // Return to pool
}
if (child.Within(parent))
{
    // Return to pool
}

consider @ektrah's idea which removes the need of a Contains method/why they would have different optimization paths?

@ahsonkhan the Contains/IsSliceOf/Within method started off as a proposal from @mgravell and I believe the scenario is in a hot spot in memory management for kestrel. Thus, Within needs to be fast, and the checks are a bit different for the two methods, which maybe have different performance characteristics. Maybe, but I don't think that should be the main reason for having Within, it should be usability. In my mind it is better to have dedicated methods for each since code will read much better.

So to sum I would change the proposal to:

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);}
}

Let me know what you think. And I will try to update the proposal as soon as feedback comes in, sorry for the long reply here. If we still cannot agree on Within perhaps we should consider splitting this proposal again, despite my earlier reservations about this.

Possible other extensions inspired by Rectangle and set theory, just so we consider the family of these set like methods...

public static class SpanExtensions
{
    public static Span<T> Intersect<T>(this Span<T> first, Span<T> second);
    public static ReadOnlySpan<T> Intersect<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
}

There is also the Difference between sets. I.e. removing any elements overlapping between spans from say the first span. All of which is probably a bit theoretical, since I have no concrete use case for these yet.

@ahsonkhan
Copy link
Member

ahsonkhan commented Oct 31, 2017

So to sum I would change the proposal to

@nietras, I have updated the original post with the method rename (Contains -> Within) and parameter rename (elementOffset -> elementIndex).

If you get a chance, please update (or remove) the TODOs and remove the WIP. Marking API ready for review.

I will solicit additional feedback (if any) during API review.

@nietras nietras changed the title [WIP] Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T> Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T> Oct 31, 2017
@nietras
Copy link
Contributor Author

nietras commented Oct 31, 2017

@ahsonkhan I have revised the proposal and added new cases and checks to remove the last edge cases. Didn't have time to add more examples, hope this is good enough for review.

@terrajobst
Copy link
Member

terrajobst commented Oct 31, 2017

Video

We seem to believe Overlaps seems like a good addition (having both overloads), but it looks like Within is redundant/not any faster.

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second,  out int elementOffset);
}

@benaadams
Copy link
Member

out int byteOffset? then copes with misaligned

CompareTo uses

Value Meaning
Less than zero This instance precedes obj in the sort order.
Zero This instance occurs in the same position in the sort order as obj.
Greater than zero This instance follows obj in the sort order.

@nietras nietras changed the title Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T> Proposal: Add Overlaps/Within extension methods for ReadOnlySpan<T> Oct 31, 2017
@ektrah
Copy link
Member

ektrah commented Oct 31, 2017

Should there also be overloads with Span<T> as the first parameter?

public static class SpanExtensions
{
    public static bool Overlaps<T>(this Span<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this Span<T> first, ReadOnlySpan<T> second, out int elementOffset);
}

Otherwise users have to cast every Span<T> to ReadOnlySpan<T> first when using extension method syntax:

Span<byte> first = ...;
Span<byte> second = ...;
bool result = ((ReadOnlySpan<byte>)first).Overlaps(second);

@nietras
Copy link
Contributor Author

nietras commented Oct 31, 2017

@ektrah Span implicitly converts to ReadOnlySpan

 public static implicit operator ReadOnlySpan<T>(Span<T> span)

UPDATE: Sorry, you are right that for as an extension method this has issues...

it looks like Within is redundant/not any faster.

@terrajobst fair enough. Guess we´ll have to prove that wrong if not 😉 Shall I update the proposal to show only Overlaps has been approved? I also want to add that alignment exception is only thrown if there is overlap and misalignment.

out int byteOffset? then copes with misaligned

@benaadams cast to Span<byte> then check. I believe the current API shape fits most users expectations.

@ektrah
Copy link
Member

ektrah commented Oct 31, 2017

@nietras Yes, except if you use extension method syntax.

@nietras
Copy link
Contributor Author

nietras commented Oct 31, 2017 via email

@ahsonkhan
Copy link
Member

@mgravell, FYI

@karelz karelz changed the title Proposal: Add Overlaps/Within extension methods for ReadOnlySpan<T> Add Overlaps/Within extension methods for ReadOnlySpan<T> Nov 21, 2017
@karelz
Copy link
Member

karelz commented Nov 21, 2017

FYI: The API review discussion was recorded - see https://youtu.be/bHwCPVNQLwo?t=2568 (33 min duration)

@ahsonkhan
Copy link
Member

it looks like Within is redundant/not any faster.

Guess we´ll have to prove that wrong if not 😉

@nietras, feel free to revisit this if it turns out we need Within.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

No branches or pull requests

7 participants