Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

nietras · 2017-09-17T10:07:34Z

This proposal adds utility methods for checking spans for overlaps or if a span is within in another span. For previous discussions see https://github.com/dotnet/corefx/issues/17793, dotnet/corefxlab#827 and https://github.com/dotnet/corefx/issues/18750. And a closed PR dotnet/corefx#18731.

Proposed API

Add two sets of extension methods Overlaps and Within for ReadOnlySpan<T> in SpanExtensions:

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);}
}

If second span does not match the alignment of first span an ArgumentException is thrown.
out int elementOffset is the relative offset or index of the start of the second span to the start of the first span.
elementOffset is only valid if the check in question returns true.

Rationale and Usage

For Overlaps the scenario can involve algorithms that transform elements from first to second span,
if the spans overlap the result might be wrong depending on the algorithm. To be able to detect this
the Overlaps method can be called.

For Within the scenario can involve buffer pool management, where spans are leased from a buffer and later needs to be returned to a given buffer. Calling Within allows checking for which
buffer the given span is within and thus where it must be returned to.

Expected Results

A: first entirely before second, no overlap

first:   [--------)
         xRef     xRef + xLength
second:           [------------)     
                  yRef         yRef + yLength

B: first starts before second and ends inside second

first:   [------------)
         xRef         xRef + xLength
second:              [------------)     
                     yRef         yRef + yLength

C: first is entirely within second

first:       [-----------------------)
             xRef                    xRef + xLength
second:    [--------------------------)     
           yRef                       yRef + yLength

D: first starts inside target second and ends after second end

first:            [------)
                  xRef   xRef + xLength
second:    [-------)
           yRef   .            yRef + yLength

E: first entirely after second, no overlap

first:            [------------)     
                  xRef         xRef + xLength
second:  [--------)
         yRef     yRef + yLength

F: first starts before second and
ends after second, i.e. second is contained in first

first:   [-------------------------------)
         xRef                            xRef + xLength
second:    [--------------------------)     
           yRef                       yRef + yLength

G: first is same as second

first:   [--------------------------)
         xRef                       xRef + xLength
second:  [--------------------------)     
         yRef                       yRef + yLength

H: first is empty within second

first:    [)
          xRef
second:  [--------------------------)     
         yRef                       yRef + yLength

I: first is empty before second

first:  [)
        xRef
second:  [--------------------------)     
         yRef                       yRef + yLength

J: first is empty after second

first:                              [)
                                    xRef
second:  [--------------------------)     
         yRef                       yRef + yLength

In table below Overlaps => first.Overlaps(second) or Within => first.Within(second). x => first and y => second.

	`Overlaps`	`Within`	Extra checks/comment
A/E	`false`	`false`	No way to know which case from these methods alone.
B	`true`	`false`	`elementOffset > 0`
C	`true`	`true`
D	`true`	`false`	`elementOffset < 0`
F	`true`	`false`	`elementOffset >= 0 && xLength >= (elementOffset + yLength)`
G	`true`	`true`	`elementOffset == 0 && xLength == yLength`
H	`true`	`true`	`xLength == 0`
I/J	`false`	`false`	No way to know which case from these methods alone.

Examples

public static void RepeatEvenIndices(ReadOnlySpan<byte> src, Span<byte> dst)
{
    if (src.Overlaps(dst, out var elementOffset) && elementOffset >= -1)
    {
        throw new ArgumentException();
    }

    for (int i = 0; i < src.Length / 2; i++)
    {
        dst[i + 0] = src[i * 2];
        dst[i + 1] = src[i * 2];
    }
}

Open Questions

Could this API be expressed differently, perhaps with an all encompassing Overlap method returning an enum? Performance downsides?

Naming of the methods should be reviewed.

Updates

UPDATE 1: Rename Contains to Within with reversed first/second relationship, due to naming confusion. Rename elementIndex to elementOffset latter considered better. elementOffset only valid if check returns true. Revise table and add empty cases.

cc: @ektrah @mgravell @ahsonkhan @shiftylogic @jkotas

The text was updated successfully, but these errors were encountered:

ektrah · 2017-09-17T13:33:48Z

A few thoughts:

elementIndex cannot be an int because the difference between xRef and yRef may be greater than int.MaxValue.
If the spans do not overlap (A and E) then elementIndex is not meaningful, because the GC may change the relative order of the two objects at any time. (This might actually fix the previous issue.)
The name Contains is very confusing.
The name elementIndex should probably be something like startDifference or startOffset.
An empty span does not overlap any other span.

Another idea that might be worth considering:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference, out int endDifference);

where startDifference = yRef - xRef and endDifference = (yRef + yLength) - (xRef + xLength) = startDifference + (yLength - xLength).

	`Overlaps`	Extra checks
A	`false`
B	`true`	`startDifference > 0 && endDifference > 0`
C	`true`	`startDifference < 0 && endDifference > 0`
D	`true`	`startDifference < 0 && endDifference < 0`
E	`false`
F	`true`	`startDifference > 0 && endDifference < 0`
G	`true`	`startDifference == 0 && endDifference == 0`

ahsonkhan · 2017-09-19T02:11:50Z

@ektrah, what is the value of having the following:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference);

Also, there is no way to distinguish between scenario A & E based on the output of Overlaps (from your design) but I don't think we need to.
@nietras, if Overlaps returns false, does it ever matter if it is because of scenario A instead of E and vice versa?

Regarding:

In case of the second not matching the alignment of first span an ArgumentException is thrown.

Can you please provide an example where we would throw an exception?

Similar to @ektrah, I don't think Contains would work as a name. I think of contains the same way as Array.contains or List.Contains:
https://msdn.microsoft.com/en-us/library/bb384015.aspx
https://msdn.microsoft.com/en-us/library/bhkz42b3(v=vs.110).aspx

I am not sure about ContainsSlice either. I think of ContainsSlice as follows:

Span<byte> src = new byte [1, 2, 3, 4, 5];
Span<byte> items = new byte [2, 3, 4];
src.ContainsSlice(items); // returns true;

Maybe IsContainedWithin?

src.IsContainedWithin(dest)

@ektrah's proposal avoids the naming issue altogether.

ektrah · 2017-09-19T14:31:32Z

what is the value of having the following:

You don't have to type , out _ if you don't need the second out parameter.

I don't have a strong opinion if this overload should be there or not. Since endDifference can be easily computed from startDifference and the span lengths, the overload with out endDifference may actually not be strictly necessary.

Also, there is no way to distinguish between scenario A & E based on the output of Overlaps (from your design) but I don't think we need to.

If the spans overlap, then it's reasonable to assume that the spans refer to the same object in managed memory. (Unsafe code can defeat that assumption.)

If the spans do not overlap (A & E), then it's quite likely that they do not refer to the same object in managed memory. Since there is no defined distance or relative order between two arbitrary objects in managed memory, there is no meaningful elementIndex that Overlaps could return from which the caller could infer if it's A or E.

Can you please provide an example where we would throw an exception?

var bytes = new byte[16];

var first = new Span<byte>(bytes, 0, 8).NonPortableCast<byte, int>();
var second = new Span<byte>(bytes, 7, 8).NonPortableCast<byte, int>();

var overlaps = first.Overlaps(second, out int elementIndex);

What should the value of elementIndex be?

Ignoring this corner case may actually not be that bad. An exception would be fine as well though (IMO).

ahsonkhan · 2017-10-28T01:46:12Z

@nietras, any thoughts or updates for this API proposal (fix the TODOs)?

I would consider @ektrah's idea which removes the need of a Contains method:

public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
    out int startDifference, out int endDifference);

You mentioned previously (from https://github.com/dotnet/corefx/issues/18750#issuecomment-296601237):

but we probably need dedicated methods for each since there are different optimizations paths for each

Can you explain why they would have different optimization paths?

nietras · 2017-10-28T11:43:33Z

@ektrah thanks for the feedback and answering questions :)

elementIndex cannot be an int because the difference between xRef and yRef may be greater than int.MaxValue.

They cannot be greater than int.MaxValue since Span<T> is limited to int Length. Note also that this is element offset.

If the spans do not overlap (A and E) then elementIndex is not meaningful, because the GC may change the relative order of the two objects at any time. (This might actually fix the previous issue.)

If a user knows the spans come from the same memory segment this could have utility, but it is an edge case, and I agree. It makes the API/implementation more straightforward too by removing this case.

The name elementIndex should probably be something like startDifference or startOffset.

I think element is crucial in the name here, since the offset is in number of elements, not in bytes. startElementOffset I can live with, or if we favor brevity then just elementOffset, offset at least seems better than index. Good suggestions.

An empty span does not overlap any other span.

There are edge cases here, an empty span has Length == 0, but it still might have a position i.e. a ref so it still could be interpreted to be inside another span. Perhaps the analogy to Rectangle in System.Drawing could be used as a guiding path here, see below.

The name Contains is very confusing.

It is not ideal I agree, but there is prior work especially from System.Drawing (see e https://msdn.microsoft.com/en-us/library/system.drawing.rectangle(v=vs.110).aspx), where Contains is used for example to find rectangles inside other rectangles. Here IntersectsWith is used instead of Overlaps. Although different rectangle types also use simply Intersects. Arguably, Overlaps is better in my mind than Intersects for spans.

Below examples of Rectangle methods:

Contains(Point)	Determines if the specified point is contained within this Rectangle structure.
Contains(Rectangle)	Determines if the rectangular region represented by rect is entirely contained within this Rectangle structure.
IntersectsWith(Rectangle)	Determines if this rectangle intersects with rect.
Intersect(Rectangle)	
Replaces this Rectangle with the intersection of itself and the specified Rectangle.

This also suggests that an Intersect extension method may have interest if one was interested in only the intersection of spans.

There are other examples and one I would then pitch instead, namely Within, this reverses the relationship but I think it reads well:

if (child.Within(parent, out var childElementOffset))
{
    // Return to pool
}
if (child.Within(parent))
{
    // Return to pool
}

consider @ektrah's idea which removes the need of a Contains method/why they would have different optimization paths?

@ahsonkhan the Contains/IsSliceOf/Within method started off as a proposal from @mgravell and I believe the scenario is in a hot spot in memory management for kestrel. Thus, Within needs to be fast, and the checks are a bit different for the two methods, which maybe have different performance characteristics. Maybe, but I don't think that should be the main reason for having Within, it should be usability. In my mind it is better to have dedicated methods for each since code will read much better.

So to sum I would change the proposal to:

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Within<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second, 
        out int elementOffset);}
}

Let me know what you think. And I will try to update the proposal as soon as feedback comes in, sorry for the long reply here. If we still cannot agree on Within perhaps we should consider splitting this proposal again, despite my earlier reservations about this.

Possible other extensions inspired by Rectangle and set theory, just so we consider the family of these set like methods...

public static class SpanExtensions
{
    public static Span<T> Intersect<T>(this Span<T> first, Span<T> second);
    public static ReadOnlySpan<T> Intersect<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
}

There is also the Difference between sets. I.e. removing any elements overlapping between spans from say the first span. All of which is probably a bit theoretical, since I have no concrete use case for these yet.

ahsonkhan · 2017-10-31T01:36:42Z

So to sum I would change the proposal to

@nietras, I have updated the original post with the method rename (Contains -> Within) and parameter rename (elementOffset -> elementIndex).

If you get a chance, please update (or remove) the TODOs and remove the WIP. Marking API ready for review.

I will solicit additional feedback (if any) during API review.

nietras · 2017-10-31T10:53:55Z

@ahsonkhan I have revised the proposal and added new cases and checks to remove the last edge cases. Didn't have time to add more examples, hope this is good enough for review.

terrajobst · 2017-10-31T18:21:01Z

Video

We seem to believe Overlaps seems like a good addition (having both overloads), but it looks like Within is redundant/not any faster.

public static class SpanExtensions
{
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second,  out int elementOffset);
}

benaadams · 2017-10-31T18:21:05Z

out int byteOffset? then copes with misaligned

CompareTo uses

Value	Meaning
Less than zero	This instance precedes obj in the sort order.
Zero	This instance occurs in the same position in the sort order as obj.
Greater than zero	This instance follows obj in the sort order.

ektrah · 2017-10-31T18:31:00Z

Should there also be overloads with Span<T> as the first parameter?

public static class SpanExtensions
{
    public static bool Overlaps<T>(this Span<T> first, ReadOnlySpan<T> second);
    public static bool Overlaps<T>(this Span<T> first, ReadOnlySpan<T> second, out int elementOffset);
}

Otherwise users have to cast every Span<T> to ReadOnlySpan<T> first when using extension method syntax:

Span<byte> first = ...;
Span<byte> second = ...;
bool result = ((ReadOnlySpan<byte>)first).Overlaps(second);

nietras · 2017-10-31T19:06:48Z

@ektrah Span implicitly converts to ReadOnlySpan

 public static implicit operator ReadOnlySpan<T>(Span<T> span)

UPDATE: Sorry, you are right that for as an extension method this has issues...

it looks like Within is redundant/not any faster.

@terrajobst fair enough. Guess we´ll have to prove that wrong if not 😉 Shall I update the proposal to show only Overlaps has been approved? I also want to add that alignment exception is only thrown if there is overlap and misalignment.

out int byteOffset? then copes with misaligned

@benaadams cast to Span<byte> then check. I believe the current API shape fits most users expectations.

ektrah · 2017-10-31T19:08:16Z

@nietras Yes, except if you use extension method syntax.

nietras · 2017-10-31T19:12:45Z

@ektrah yeah I updated my comment two seconds after realizing that. My mistake. I think it would be good to add these overloads too, otherwise they become too cumbersome.

…

On Oct 31, 2017 8:08 PM, "ektrah" ***@***.***> wrote: @nietras <https://github.com/nietras> Yes, except if you use extension method syntax. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/dotnet/corefx/issues/24103#issuecomment-340875103>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKTG73GwXCkJhZ6__5Kz7Z1__HRnrTB5ks5sx3BEgaJpZM4PaGZ9> .

ahsonkhan · 2017-10-31T19:55:13Z

@mgravell, FYI

karelz · 2017-11-21T17:40:17Z

FYI: The API review discussion was recorded - see https://youtu.be/bHwCPVNQLwo?t=2568 (33 min duration)

ahsonkhan · 2017-12-04T22:10:13Z

it looks like Within is redundant/not any faster.

Guess we´ll have to prove that wrong if not 😉

@nietras, feel free to revisit this if it turns out we need Within.

nietras changed the title ~~[WIP] Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T>~~ Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T> Oct 31, 2017

nietras changed the title ~~Proposal: Add Overlaps/Contains extension methods for ReadOnlySpan<T>~~ Proposal: Add Overlaps/Within extension methods for ReadOnlySpan<T> Oct 31, 2017

ahsonkhan assigned ektrah Nov 15, 2017

karelz changed the title ~~Proposal: Add Overlaps/Within extension methods for ReadOnlySpan<T>~~ Add Overlaps/Within extension methods for ReadOnlySpan<T> Nov 21, 2017

ahsonkhan closed this as completed in dotnet/corefx#24980 Dec 4, 2017

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.1.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

nietras commented Sep 17, 2017

ektrah commented Sep 17, 2017 •

edited

Loading

ahsonkhan commented Sep 19, 2017 •

edited

Loading

ektrah commented Sep 19, 2017

ahsonkhan commented Oct 28, 2017 •

edited

Loading

nietras commented Oct 28, 2017 •

edited

Loading

ahsonkhan commented Oct 31, 2017 •

edited

Loading

nietras commented Oct 31, 2017

terrajobst commented Oct 31, 2017 •

edited

Loading

benaadams commented Oct 31, 2017

ektrah commented Oct 31, 2017 •

edited

Loading

nietras commented Oct 31, 2017 •

edited

Loading

ektrah commented Oct 31, 2017

nietras commented Oct 31, 2017 via email •

edited

Loading

ahsonkhan commented Oct 31, 2017

karelz commented Nov 21, 2017

ahsonkhan commented Dec 4, 2017

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

Add Overlaps/Within extension methods for ReadOnlySpan<T> #23580

Comments

nietras commented Sep 17, 2017

Proposed API

Rationale and Usage

Expected Results

Examples

Open Questions

Updates

ektrah commented Sep 17, 2017 • edited Loading

ahsonkhan commented Sep 19, 2017 • edited Loading

ektrah commented Sep 19, 2017

ahsonkhan commented Oct 28, 2017 • edited Loading

nietras commented Oct 28, 2017 • edited Loading

ahsonkhan commented Oct 31, 2017 • edited Loading

nietras commented Oct 31, 2017

terrajobst commented Oct 31, 2017 • edited Loading

benaadams commented Oct 31, 2017

ektrah commented Oct 31, 2017 • edited Loading

nietras commented Oct 31, 2017 • edited Loading

ektrah commented Oct 31, 2017

nietras commented Oct 31, 2017 via email • edited Loading

ahsonkhan commented Oct 31, 2017

karelz commented Nov 21, 2017

ahsonkhan commented Dec 4, 2017

ektrah commented Sep 17, 2017 •

edited

Loading

ahsonkhan commented Sep 19, 2017 •

edited

Loading

ahsonkhan commented Oct 28, 2017 •

edited

Loading

nietras commented Oct 28, 2017 •

edited

Loading

ahsonkhan commented Oct 31, 2017 •

edited

Loading

terrajobst commented Oct 31, 2017 •

edited

Loading

ektrah commented Oct 31, 2017 •

edited

Loading

nietras commented Oct 31, 2017 •

edited

Loading

nietras commented Oct 31, 2017 via email •

edited

Loading