This repository was archived by the owner on Aug 2, 2023. It is now read-only.

Simplified OwnedBuffer Pinning Support#1527

Merged
KrzysztofCwalina merged 6 commits into
dotnet:masterfrom
KrzysztofCwalina:PreparingToChangeOBToInterface
May 6, 2017

@KrzysztofCwalina
Member

@KrzysztofCwalina KrzysztofCwalina commented May 4, 2017

  • Removed TryGetPointer. Pin should now be used to get to buffer pointers.
  • Pin got faster. The method used to be implemented in the base class and just called through to TryGetPointer. Now OwnedBuffer subtypes have specialized pinning logic that is often simpler and faster than the base class implementation. Compare MemoryPoolBlock.Pin after this PR to OwnedBuffer.Pin before.
  • Getting a Span from an OwnedBuffer is now done through the AsSpan method, which avoids slicing. See Buffer.get_Span.

The perf (elapsed time and instructions retired) of the E2EPipelineTests benchmark improved very slightly (0.5%), but consistently.

cc: @davidfowl, @mjp41, @ahsonkhan, @shiftylogic

@KrzysztofCwalina
Member Author

Before

 Test Name                                                                                       | Metric                                        | Iterations |    AVERAGE |    STDEV.S |        MIN |        MAX
:----------------------------------------------------------------------------------------------- |:--------------------------------------------- |:----------:| ----------:| ----------:| ----------:| ----------:
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Duration                                      |     11     |    986.483 |     24.952 |    958.166 |   1039.723
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Instructions Retired                          |     11     | 7.220E+009 | 4.629E+007 | 7.131E+009 | 7.274E+009
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Branch Mispredictions                         |     11     | 9.290E+005 |  70012.088 | 7.987E+005 | 1.049E+006
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Cache Misses                                  |     11     | 6.639E+006 | 1.358E+006 | 4.387E+006 | 9.671E+006
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Allocation Size on Benchmark Execution Thread |     11     | 5.043E+007 |   3695.813 | 5.043E+007 | 5.044E+007
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Duration                                      |     11     |    972.915 |      8.116 |    962.184 |    986.075
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Instructions Retired                          |     11     | 7.223E+009 | 4.696E+007 | 7.121E+009 | 7.276E+009
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Branch Mispredictions                         |     11     | 9.447E+005 |  44315.340 | 8.929E+005 | 1.028E+006
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Cache Misses                                  |     11     | 6.436E+006 | 1.033E+006 | 4.317E+006 | 7.487E+006
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Allocation Size on Benchmark Execution Thread |     11     | 5.044E+007 |   4608.166 | 5.043E+007 | 5.044E+007

After

 Test Name                                                                                       | Metric                                        | Iterations |    AVERAGE |    STDEV.S |        MIN |        MAX
:----------------------------------------------------------------------------------------------- |:--------------------------------------------- |:----------:| ----------:| ----------:| ----------:| ----------:
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Duration                                      |     11     |    982.122 |     20.869 |    954.812 |   1029.571
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Instructions Retired                          |     11     | 7.214E+009 | 4.710E+007 | 7.126E+009 | 7.268E+009
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Branch Mispredictions                         |     11     | 9.000E+005 |  83564.241 | 7.823E+005 | 1.085E+006
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Cache Misses                                  |     11     | 6.725E+006 | 1.397E+006 | 4.202E+006 | 9.572E+006
 E2EPipelineTests.TechEmpowerHelloWorldNoIO(numberOfRequests: 10000, concurrentConnections: 256) | Allocation Size on Benchmark Execution Thread |     11     | 5.043E+007 |   2853.642 | 5.043E+007 | 5.044E+007
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Duration                                      |     11     |    970.666 |      7.329 |    953.240 |    982.167
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Instructions Retired                          |     11     | 7.217E+009 | 4.575E+007 | 7.115E+009 | 7.269E+009
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Branch Mispredictions                         |     11     | 9.168E+005 |  36673.187 | 8.520E+005 | 9.748E+005
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Cache Misses                                  |     11     | 6.567E+006 | 1.066E+006 | 4.063E+006 | 7.676E+006
 E2EPipelineTests.TechEmpowerJsonNoIO(numberOfRequests: 10000, concurrentConnections: 256)       | Allocation Size on Benchmark Execution Thread |     11     | 5.043E+007 |   3951.940 | 5.043E+007 | 5.044E+007

@KrzysztofCwalina KrzysztofCwalina force-pushed the PreparingToChangeOBToInterface branch from 6d22339 to fda80d6 Compare May 4, 2017 16:34
@KrzysztofCwalina KrzysztofCwalina force-pushed the PreparingToChangeOBToInterface branch from fda80d6 to 99216a6 Compare May 4, 2017 17:01
Member

@davidfowl davidfowl left a comment

I see less and less value in OwnedBuffer the more changes we make to it. What are the semantics of Retain/Release? If I call Retain twice and then Release, what behavior should I expect? If it's basically reference counting behavior, then let's call it what it is instead of being abstract. Trying to make it too abstract will break systems that rely on reference counting for correctness (aka pipelines).

@KrzysztofCwalina
Member Author

KrzysztofCwalina commented May 4, 2017

The contract is: if Retain has been called more times than Release, Buffer.Span will succeed.

This requires reference counting for unmanaged buffers and pooled arrays. It does not require reference counting for array based buffers.

Also, this change does not seem to have anything to do with your concern. It literally just optimizes the Buffer.Pin and Buffer.Span code paths, plus fixes a bunch of bugs. Can we resolve the concern offline (i.e. not as part of this PR)?
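The contract stated above can be illustrated with a minimal sketch. `RefCountedOwner` is a hypothetical stand-in, not the real `OwnedBuffer<T>`, and it assumes that access fails once the count reaches zero (the comment above only says access succeeds while Retain calls outnumber Release calls).

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a reference-counted owner; not the corefxlab OwnedBuffer<T>.
public sealed class RefCountedOwner
{
    private readonly byte[] _array;
    private int _retained; // number of Retain calls minus number of Release calls

    public RefCountedOwner(byte[] array)
    {
        _array = array ?? throw new ArgumentNullException(nameof(array));
    }

    public void Retain() => Interlocked.Increment(ref _retained);

    public void Release()
    {
        // If the decrement takes the count below zero, undo it and report the misuse.
        if (Interlocked.Decrement(ref _retained) < 0)
        {
            Interlocked.Increment(ref _retained);
            throw new InvalidOperationException("Release called more times than Retain.");
        }
    }

    public bool IsRetained => Volatile.Read(ref _retained) > 0;

    // Assumption: access is only allowed while Retain calls outnumber Release calls.
    public ArraySegment<byte> GetMemory()
    {
        if (!IsRetained) throw new InvalidOperationException("Buffer is not retained.");
        return new ArraySegment<byte>(_array);
    }
}
```

Under this sketch, calling Retain twice and Release once leaves the buffer retained, which is the plain reference counting behavior asked about earlier in the thread.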


#endregion

protected static unsafe void* Add(void* pointer, int offset)
Contributor

Does this Add method belong here?

Member Author

It's used in many subtypes. I would like to move it somewhere, but I am not sure where.

Contributor

What about using Unsafe.Add? We use it in Primitive Formatters a lot.

Member Author

Unsafe.Add takes ref T. I need byte*.

Contributor

Unsafe.AsPointer(Unsafe.Add(ref Unsafe.AsRef(myPointer), offset))

Contributor

Might not be exactly that since I'm going from memory. But then you don't have to do the sizeof math and it all gets inlined anyways.

Member Author

This is code that implementers of OwnedBuffers will have to write. I would prefer this be simpler than going back and forth between references and pointers. Let me leave the PR as-is and we can think separately what's the best way to provide convenient API for this scenario. I filed issue #1529

Contributor

Is adding it to S.R.CS.U a valid option?

cc @jkotas

Member

Sounds good to me. S.R.CS.Unsafe has both void* and ref T overloads for most methods. void* overloads for Add and Subtract are missing for no good reason.
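For context, a helper of the kind discussed in this thread might look like the sketch below. `PointerMath` is a hypothetical name, and treating the offset as a byte count is an assumption; the thread does not show the body of the PR's `Add` method. Compiling it requires unsafe code to be enabled.

```csharp
using System;

// Hypothetical helper, not the method from the PR.
public static unsafe class PointerMath
{
    // Advances an untyped pointer by a number of bytes (assumed semantics).
    public static void* Add(void* pointer, int byteOffset)
    {
        // Cast to byte* so that pointer arithmetic moves one byte per unit.
        return (byte*)pointer + byteOffset;
    }
}
```

With a byte-based helper, callers that work with larger elements do the size math themselves, for example `(int*)PointerMath.Add(p, 1 * sizeof(int))`, which is the `sizeof` arithmetic the `Unsafe`-based suggestion was trying to avoid.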

return true;
}

public unsafe bool TryGetArray(out ArraySegment<T> buffer)
Contributor

Does this method still need to be unsafe?

Member Author

fixed

public abstract int Length { get; }

public abstract Span<T> Span { get; }
public abstract Span<T> AsSpan(int index, int length);
Contributor

Do we need this overload? Why not just use AsSpan().Slice(index, length)?

Member Author

This is faster. The method is virtual, so the same optimizations we were able to do in Span probably will not work here. Moreover, it allows subclasses to materialize only part of the buffer representing this range.

Contributor

@ahsonkhan ahsonkhan May 5, 2017

> This is faster. The method is virtual, so the same optimizations we were able to do in Span probably will not work here.

Does this mean that the IndexOf that takes an index/count vs Slice analysis doesn't apply to this? If so, and it is faster, then it is good to keep.

> Moreover, it allows subclasses to materialize only part of the buffer representing this range.

Why exactly can't the subtypes just slice the span?

Member Author

> Does this mean that the IndexOf that takes an index/count vs Slice analysis doesn't apply to this? If so, and it is faster, then it is good to keep.

It does not apply because IndexOf and Slice on Span are non-virtual, can be inlined, and code optimized out.

> Why exactly can't the subtypes just slice the span?

Let's say the buffer wraps a memory-mapped file. AsSpan(0, 10) needs to read only the first ten bytes of the file. Span.Slice(0, 10) would first have to read the whole file and then create a span of just the first ten bytes.

Contributor

@ahsonkhan ahsonkhan May 5, 2017

> Span.Slice(0, 10) would first have to read the whole file and then create a span of just first ten bytes.

I don't completely understand this. The cost of creating a span isn't proportional to the input buffer size it is wrapping, correct? It is a constant cost operation. So why does Span.Slice(0, 10) read the whole file whereas AsSpan(0, 10) does not?

Member Author

If your buffer is in memory, then you are right. But OwnedBuffer is an abstraction, and not all of its data has to be in memory.
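The difference between asking for a range up front and slicing afterwards can be sketched with a buffer whose data is not resident in memory. `LazyBuffer` and its delegate-based backing store are hypothetical and only simulate an expensive read; they are not types from this PR.

```csharp
using System;

// Hypothetical lazily materialized buffer; the backing store is simulated by a delegate.
public sealed class LazyBuffer
{
    private readonly Func<int, byte> _readByte; // stands in for an expensive read, e.g. from a file

    public int Length { get; }
    public int BytesRead { get; private set; } // how many bytes have been fetched so far

    public LazyBuffer(int length, Func<int, byte> readByte)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _readByte = readByte ?? throw new ArgumentNullException(nameof(readByte));
    }

    // Fetches only the requested range [index, index + length).
    public byte[] Materialize(int index, int length)
    {
        if ((uint)index > (uint)Length || (uint)length > (uint)(Length - index))
            throw new ArgumentOutOfRangeException();

        var result = new byte[length];
        for (int i = 0; i < length; i++) result[i] = _readByte(index + i);
        BytesRead += length;
        return result;
    }

    // Fetches everything; a caller that slices afterwards has already paid for the full read.
    public byte[] MaterializeAll() => Materialize(0, Length);
}
```

In this sketch, `Materialize(0, 10)` touches ten bytes, while `MaterializeAll()` followed by taking the first ten elements touches all of them. That is the cost a range-taking overload lets a subtype avoid.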


namespace System.Buffers
{
public class OwnedArray<T> : OwnedBuffer<T>
Contributor

@ahsonkhan ahsonkhan May 5, 2017

Is the only change here moving OwnedArray.cs from System/Buffers/Internals to System/Buffers, or are there other changes as well?

Edit: Never mind, I see the changes.

_array = array;
}

public OwnedArray(ArraySegment<T> segment)
Contributor

What is the motivation to remove the OwnedArray constructor that takes an ArraySegment?

Member Author

It requires fields for offset and length to be stored in OwnedArray.

return _array;
}
if (IsDisposed) BufferPrimitivesThrowHelper.ThrowObjectDisposedException(nameof(OwnedBuffer));
return _array.Slice(index, length);
Contributor

_array.AsSpan().Slice(index, length); instead?

Member Author

Yeah, I was wondering why this Slice method is still here. I will fix.

public void* PinnedPointer => _pointer;
public void* PinnedPointer {
get {
if (_pointer == null) throw new InvalidOperationException();
Contributor

Member Author

Will do.


public override void Retain()
{
if (IsDisposed) throw new InvalidOperationException();
Contributor

Use BufferPrimitivesThrowHelper?


public OwnedArray(T[] array)
{
if (array == null) throw new ArgumentNullException(nameof(array));
Contributor

Use BufferPrimitivesThrowHelper?

return new Span<byte>(Slab.Array, _offset, _length);
}
if (IsDisposed) PipelinesThrowHelper.ThrowObjectDisposedException(nameof(MemoryPoolBlock));
if (length > _length - index) throw new ArgumentOutOfRangeException();
Contributor

Member Author

It's not. The slab array is very large and the span ctor will often work

Contributor

Should it then be if (length > _length - (_offset + index))?

Member Author

No, because _length is not the length of the whole buffer, but rather the length of the section represented by this OwnedBuffer.
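The point about `_length` can be checked with a small sketch. `SlabSection` is a hypothetical type that models one block carved out of a larger slab; the field names mirror the snippet under review, but the class itself is not from the PR.

```csharp
using System;

// Hypothetical model of a section [offset, offset + length) inside a larger shared slab.
public sealed class SlabSection
{
    private readonly int _offset; // where this section starts inside the slab
    private readonly int _length; // length of this section only, not of the slab

    public SlabSection(int offset, int length)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _offset = offset;
        _length = length;
    }

    // Validates a section-relative range and returns its absolute start index in the slab.
    public int ToSlabIndex(int index, int length)
    {
        if (index < 0 || index > _length) throw new ArgumentOutOfRangeException(nameof(index));
        // index and length are relative to the section, so _offset plays no part in the check.
        if (length < 0 || length > _length - index) throw new ArgumentOutOfRangeException(nameof(length));
        return _offset + index;
    }
}
```

For a section with offset 4096 and length 100, the range (90, 10) is valid and starts at slab index 4186. The alternative check `length > _length - (_offset + index)` would compare 10 against 100 - 4186 and wrongly reject it.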

if (IsDisposed) PipelinesThrowHelper.ThrowObjectDisposedException(nameof(MemoryPoolBlock));
pointer = (Slab.NativePointer + _offset + index).ToPointer();
return true;
if (index > _length) throw new ArgumentException(nameof(index));
Contributor

Use ThrowHelper.

newStart = 0;
newEnd = length;
return buffer;
return (OwnedArray<byte>)buffer;
Contributor

Why is this cast necessary?

Member Author

An implicit cast cannot go through two hops (array -> OwnedArray -> OwnedBuffer).

@mjp41
Member

mjp41 commented May 5, 2017

@KrzysztofCwalina I have been experimenting with your refactor today, partly to enable ReferenceCounter.

I wonder if it makes sense to create a method that is called when a BufferHandle is released:

   protected internal abstract void ReleaseHandle(); 

I tried to refactor the code a little to remove the lifetime management from the surface that Buffer sees, introducing a BufferSource that is what Buffer uses, which doesn't have the Retain/Release that are likely to map into reference counting. This would enable making subclasses that don't use reference counting for Pin, while still having reference counting for lifetime management, which may please @davidfowl. Then OwnedBuffer is a subclass which implements lifetime management.

I have pushed the first part of my experiment mjp41@944933e

@KrzysztofCwalina KrzysztofCwalina force-pushed the PreparingToChangeOBToInterface branch from 1e650f7 to 2c2483a Compare May 5, 2017 15:32
@KrzysztofCwalina
Member Author

@mjp41, how is ReleaseHandle different from Release?

Also, BufferSource has RetainHandle and ReleaseHandle. How are these different from Retain/Release?

@mjp41
Member

mjp41 commented May 5, 2017

So RetainHandle constructs the Handle for accessing, but it is not required to be hooked up to any lifetime management.

ReleaseHandle is called when a BufferHandle is disposed. But again it is not required to be hooked up with any lifetime management.

At the moment, I hooked them up to call the OwnedBuffer lifetime management routines, but they don't have to. This allows a BufferHandle not to be lifetime managed even if the underlying BufferSource is (by implementing OwnedBuffer).

@KrzysztofCwalina
Member Author

Sorry, I still don't understand the difference. OwnedBuffer.Retain/Release can do whatever the subclasses want, i.e. don't have to do lifetime management.

@mjp41
Member

mjp41 commented May 5, 2017

So for BufferSegment, the implementation requires Retain and Release to do lifetime management, otherwise it won't work. The BufferHandle, in this case, will be forced to follow the same lifetime management strategy. With this extra part, it can use a different (or non-existent) strategy for the BufferHandle. This would allow, say, using ReferenceCounter for BufferHandles and interlocked operations for the Retain and Release calls coming from BufferSegments.

@KrzysztofCwalina
Member Author

KrzysztofCwalina commented May 5, 2017

Ah, I get it. You want to have a different lifetime management strategy for stack-bound handles and for heap OwnedBuffers.

Though, I have to say I am not a big fan of having two similar but different APIs. I really think we need to simplify lifetime management, and I am not sure whether this is a simplification or additional complexity that users and implementers would have to think about. Let me think about it a bit more.

dotnet-bot and others added 2 commits May 5, 2017 09:52
…otnet#1516)

* Renaming charactersWritten/Consumed to codePointsWritten/Consumed.

* Renaming codePoints to codeUnits

* Updating Utf8TextEncoder arg name to match public surface of TextEncoder.
@KrzysztofCwalina
Member Author

@davidfowl, any more feedback on this?

}
}

private ArraySegment<byte> _buffer;
Member

Private fields belong at the top of the class.

Member

@davidfowl davidfowl left a comment

There are a few style nits.

@KodrAus
Contributor

KodrAus commented May 6, 2017

@KrzysztofCwalina @mjp41 Is there a reason OwnedBuffer.Retain/OwnedBuffer.Release aren't idempotent but Buffer.Retain/Buffer.Release over that owner are?

@KrzysztofCwalina
Member Author

@KodrAus, they seem to be equivalent. Buffer.Retain just calls OwnedBuffer.Retain.

@KrzysztofCwalina KrzysztofCwalina merged commit 7ffb869 into dotnet:master May 6, 2017
@KodrAus
Contributor

KodrAus commented May 6, 2017

@KrzysztofCwalina I guess the difference is OwnedBuffer.Retain returns void, whereas Buffer.Retain returns a BufferHandle, which can only be Released once when disposed.
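The distinction drawn in the last comment can be sketched as follows. `SimpleOwner` and `Handle` are hypothetical names, and `Handle` is written as a class for simplicity; this illustrates the release-once pattern and is not the PR's `BufferHandle`.

```csharp
using System;

// Hypothetical owner with a bare counter; every Release call decrements it.
public sealed class SimpleOwner
{
    public int Count { get; private set; }
    public void Retain() => Count++;
    public void Release() => Count--;
}

// Hypothetical handle: retains on creation and releases at most once on disposal.
public sealed class Handle : IDisposable
{
    private SimpleOwner _owner;

    public Handle(SimpleOwner owner)
    {
        _owner = owner ?? throw new ArgumentNullException(nameof(owner));
        owner.Retain();
    }

    public void Dispose()
    {
        var owner = _owner;
        if (owner == null) return; // already disposed, so a second Dispose is a no-op
        _owner = null;
        owner.Release();
    }
}
```

Disposing the handle twice releases the owner once, whereas calling `SimpleOwner.Release()` twice directly decrements the count twice.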
