
Data slice mutation support #11939

Merged: 11 commits, Sep 16, 2017

Conversation

phausler
Member

@phausler commented Sep 15, 2017

Mutation of Data slices had issues with re-indexing relative to the base slice region. Previously, when a slice of Data was mutated, the slice region was recalculated relative to the previous backing, which was incorrect. Now, under CoW semantics, mutating a slice copies the specific region out into a new Data and offsets the index region on access. Mutating a non-CoW slice retains the previous backing region but adjusts the remaining allocation buffer accordingly. To validate this I have added tests covering combinations that should approximate all possible combinations of mutations, backing storages, and CoW scenarios.
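
The slice-offset bookkeeping described above can be sketched as a toy model in C (names and structure are hypothetical illustrations, not the actual _DataStorage internals): on CoW, only the sliced region is copied out, and the region's offset is remembered so absolute indices remain valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a slice that keeps absolute indices valid after a
 * copy-on-write: the slice remembers the offset of its region, so
 * index i is always read at storage[i - offset]. */
typedef struct {
    unsigned char *storage; /* backing bytes for this slice only */
    size_t offset;          /* absolute index of storage[0] */
    size_t length;          /* number of bytes in the slice */
} SliceModel;

/* "CoW" step: copy just the sliced region out of a larger backing,
 * recording its offset so absolute indexing keeps working. */
SliceModel slice_copy(const unsigned char *base, size_t lower, size_t upper) {
    SliceModel s;
    s.length = upper - lower;
    s.offset = lower;
    s.storage = malloc(s.length);
    memcpy(s.storage, base + lower, s.length);
    return s;
}

unsigned char slice_get(const SliceModel *s, size_t absolute_index) {
    return s->storage[absolute_index - s->offset];
}
```

With this shape, `data[4...7]` copied on write still answers `slice[5]` with the same byte the base held at index 5.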

This fixes the following issues:
rdar://problem/34206043
SR-5887
SR-5873
SR-5810

tl;dr – Slicing a CoW should result in filet mignon, not ground chuck.

@phausler
Member Author

@swift-ci please test

@phausler
Member Author

@swift-ci please test

@swift-ci
Collaborator

Build failed
Swift Test Linux Platform
Git Sha - 95bd083

@swift-ci
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 95bd083

Contributor

@itaiferber left a comment

Looks good to me overall — minor code style comments, but nothing blocking. Glad we have such a thorough list of slicing tests now!


public var bytes: UnsafeRawPointer? {
    @inline(__always)
    get {
        switch _backing {
        case .swift:
            return UnsafeRawPointer(_bytes)
Contributor

Don't know how much this matters, but you should be able to shorten this a little bit:

let bytesPtr: UnsafeRawPointer?
switch _backing {
case .swift: fallthrough
case .immutable: fallthrough
case .mutable: bytesPtr = _bytes

case .customReference(let d): bytesPtr = d.bytes
case .customMutableReference(let d): bytesPtr = d.bytes
}

return bytesPtr?.advanced(by: -_offset)

Contributor

Alternatively, use a default case for .swift, .immutable, and .mutable (we do this inconsistently below, and I think we can use it in more places)

Member Author

It probably can be; for now I was avoiding massive refactoring of the unrelated switches to reduce risk.

case .mutable:
return try apply(UnsafeRawBufferPointer(start: _bytes?.advanced(by: range.lowerBound - _offset), count: Swift.min(range.count, _length)))
case .customReference(let d):
if d._isCompact() {
Contributor

Can we factor this out like the switch above?

let reference: Data
switch _backing {
case .swift: fallthrough
case .immutable: fallthrough
case .mutable:
    return try ...

case .customReference(let d): reference = d
case .customMutableReference(let d): reference = d
}

if reference._isCompact() {
    ...
} else {
    ...
}

Member Author

Again, I think this can be done in a subsequent change.

let len = d.length
return try apply(UnsafeMutableRawBufferPointer(start: d.mutableBytes.advanced(by:range.lowerBound - _offset), count: Swift.min(range.count, len - range.lowerBound)))
case .immutable(let d):
let data = d.mutableCopy() as! NSMutableData
Contributor

And same here

@@ -176,20 +273,18 @@ public final class _DataStorage {
}
}

public func enumerateBytes(_ block: (_ buffer: UnsafeBufferPointer<UInt8>, _ byteIndex: Data.Index, _ stop: inout Bool) -> Void) {
public func enumerateBytes(in range: Range<Int>, _ block: (_ buffer: UnsafeBufferPointer<UInt8>, _ byteIndex: Data.Index, _ stop: inout Bool) -> Void) {
Contributor

Does this kind of change need to be marked as @_versioned at some point? This would be ABI breaking because of inlining, right?

Member Author

Not fully certain that matters just yet, but once ABI stability hits we should definitely consider these kinds of changes; thankfully the instance of _DataStorage itself is versioned, so we should be in the clear here.

@@ -349,6 +444,8 @@ public final class _DataStorage {

}

////////////***** AUDITED UP TO HERE
Contributor

What does this comment mean? Audited by who and for what?

Member Author

Oops, will remove that. It was my first pass on the offset additions, and I forgot to remove it when I did the remainder of the audits.

@@ -1137,7 +1273,7 @@ public struct Data : ReferenceConvertible, Equatable, Hashable, RandomAccessColl
}
@inline(__always)
set {
precondition(count >= 0, "Count must be positive")
precondition(count >= 0, "count must be positive")
Contributor

If we're already fixing this comment — should this not be "non-negative" instead of "positive"? (Or "count must not be negative"?)

Member Author

Not sure why this actually changed; perhaps it was the sync across sclf?

self[idx] = byte
idx = newIndex
var ptr = UnsafeMutablePointer<UInt8>.allocate(capacity: estimatedCount)
defer { ptr.deallocate(capacity: estimatedCount) }
Contributor

Can we use the optional stack-allocating version here?

Member Author

Yes! I think that would be a reasonable change to avoid perf regressions here.

var ptr = UnsafeMutablePointer<UInt8>.allocate(capacity: estimatedCount)
defer { ptr.deallocate(capacity: estimatedCount) }
let buffer = UnsafeMutableBufferPointer(start: ptr, count: estimatedCount)
var (iterator, endPoint) = newElements._copyContents(initializing: buffer)
Contributor

Or better yet, can we avoid an intermediate copy in the first place?

Member Author

Not fully certain that can be done easily without a more extensive refactor.

@phausler
Member Author

As a note: this patch is intended to be as minimal as possible to reduce the risk of regressions since we are considering this for an earlier submission. So some improvements might make it in with later work we have scheduled for integrating other functionality.

@michael-lehew
Member

michael-lehew commented Sep 15, 2017

tl;dr – Slicing a CoW should result in filet mignon, not ground chuck.

This is a winning comment regardless of the rest of the content (reading now) :)

Member

@michael-lehew left a comment

This looks good to me overall; the tests are a great improvement.

if (amount <= buffer->capacity) { return true; }
void *newMemory;
if (buffer->onStack) {
newMemory = malloc(amount);
Member

check for malloc fail and abort?

memcpy(newMemory, buffer->memory, buffer->capacity);
buffer->onStack = false;
} else {
newMemory = realloc(buffer->memory, amount);
Member

this will leak if realloc fails (the normal realloc nasty bug)
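
The two review comments above (check malloc for failure, and don't lose the old pointer when realloc fails) are both fixed by the usual pattern, sketched here in C with hypothetical names standing in for _ConditionalAllocationBuffer:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *memory;
    size_t capacity;
    int    onStack;
} Buffer; /* hypothetical stand-in for _ConditionalAllocationBuffer */

/* Grow the buffer, reporting failure instead of aborting or leaking.
 * On realloc failure the original pointer is untouched and still owned
 * by the caller, avoiding the classic `p = realloc(p, n)` leak. */
int buffer_resize(Buffer *buffer, size_t amount) {
    if (amount <= buffer->capacity) return 1;
    void *newMemory;
    if (buffer->onStack) {
        newMemory = malloc(amount);
        if (newMemory == NULL) return 0;   /* malloc failed: report it */
        memcpy(newMemory, buffer->memory, buffer->capacity);
        buffer->onStack = 0;
    } else {
        newMemory = realloc(buffer->memory, amount);
        if (newMemory == NULL) return 0;   /* buffer->memory is still valid */
    }
    buffer->memory = newMemory;
    buffer->capacity = amount;
    return 1;
}
```

Returning false on allocation failure matches the direction the PR eventually took for _resizeConditionalAllocationBuffer.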

@phausler
Member Author

@swift-ci please test

@swift-ci
Collaborator

Build failed
Swift Test Linux Platform
Git Sha - d7b0fac

@swift-ci
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - d7b0fac

@@ -198,7 +293,7 @@ public final class _DataStorage {
d.enumerateBytes { (ptr, range, stop) in
var stopv = false
let bytePtr = ptr.bindMemory(to: UInt8.self, capacity: range.length)
block(UnsafeBufferPointer(start: bytePtr, count: range.length), range.location, &stopv)
block(UnsafeBufferPointer(start: bytePtr, count: range.length), range.location - _offset, &stopv)
Member

As I mentioned to you separately, these still need to be clamped to the range parameter passed to enumerateBytes. It might be a good idea to avoid the variable aliasing as well.

Member Author

Yep, working on a test and patch for that one; thanks again.

return d.bytes.advanced(by: index - _offset).assumingMemoryBound(to: UInt8.self).pointee
} else {
var byte: UInt8 = 0
d.enumerateBytes { (ptr, range, stop) in
Member

It might be more code-size-efficient to just call getBytes:range:NSMakeRange(offsetIndex, 1) here. Discontiguous data implementations of that method should do essentially the same thing that you have written here.
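
For a discontiguous backing, the single-byte lookup the reviewer describes amounts to walking regions until one contains the index. A toy C sketch (a plain region list, not the NSData implementation) of that idea:

```c
#include <stddef.h>

/* Toy model of discontiguous data: an array of regions, each with an
 * absolute start offset and a run of bytes. Reading one byte walks the
 * regions, as an enumerate-based fallback (or getBytes with a one-byte
 * range) effectively would. */
typedef struct {
    size_t location;              /* absolute offset of the region */
    size_t length;
    const unsigned char *bytes;
} Region;

int byte_at(const Region *regions, size_t regionCount,
            size_t index, unsigned char *out) {
    for (size_t i = 0; i < regionCount; i++) {
        const Region *r = &regions[i];
        if (index >= r->location && index < r->location + r->length) {
            *out = r->bytes[index - r->location];
            return 1;             /* found; stop enumerating */
        }
    }
    return 0;                     /* index outside all regions */
}
```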

@@ -1329,22 +1460,26 @@ public struct Data : ReferenceConvertible, Equatable, Hashable, RandomAccessColl
if !isKnownUniquelyReferenced(&_backing) {
_backing = _backing.mutableCopy(_sliceRange)
}
_backing.append(buffer.baseAddress!, length: buffer.count * MemoryLayout<SourceType>.stride)
_backing.replaceBytes(in: NSRange(location: _sliceRange.upperBound, length: _backing.length - (_sliceRange.upperBound - _backing._offset)), with: buffer.baseAddress, length: buffer.count * MemoryLayout<SourceType>.stride)
Member

This seems odd to me from a principled/conceptual standpoint. It seems like the Data should not be reaching outside of its _sliceRange; instead, this should be _backing.replaceBytes(in: NSRange(location: _sliceRange.upperBound, length: 0)…, which is exactly how you'd implement an abstract "append" if "replace" were the only primitive available.

In practice, I get it. Either:

  1. The slice is the last reference of a previously larger backing, and we don't need to keep any bytes after our _sliceRange around anymore.
    or
  2. _backing is newly allocated due to CoW and the length parameter ends up 0 anyway.

I feel vaguely like a slightly different abstraction would satisfy both the principled and practiced side, but I'm not sure what that'd be right now.

Do we ever have occasion to clean up the bytes that precede the slice range when the slice is the last reference?
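
The "append is a zero-length replace at the end" shape the reviewer describes can be made concrete with a toy C buffer (simplified, not the real _DataStorage; names are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *bytes;
    size_t length;
} Buf;

/* Replace `length` bytes starting at `location` with srcLen bytes from src. */
void buf_replace(Buf *b, size_t location, size_t length,
                 const unsigned char *src, size_t srcLen) {
    size_t newLength = b->length - length + srcLen;
    unsigned char *out = malloc(newLength);
    memcpy(out, b->bytes, location);                       /* prefix */
    memcpy(out + location, src, srcLen);                   /* replacement */
    memcpy(out + location + srcLen,                        /* suffix */
           b->bytes + location + length,
           b->length - (location + length));
    free(b->bytes);
    b->bytes = out;
    b->length = newLength;
}

/* Append expressed purely in terms of replace: the empty range at the end,
 * so the buffer never addresses anything beyond its own bounds. */
void buf_append(Buf *b, const unsigned char *src, size_t srcLen) {
    buf_replace(b, b->length, 0, src, srcLen);
}
```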


static inline _Bool _withStackOrHeapBuffer(size_t amount, void (__attribute__((noescape)) ^ _Nonnull applier)(_ConditionalAllocationBuffer *_Nonnull)) {
_ConditionalAllocationBuffer buffer;
buffer.capacity = malloc_good_size(amount);
Member

Why round up prior to determining onStack? If the requested amount of memory is 500 bytes, malloc_good_size will probably return 512 or higher, which puts it out of stack viability (on the main thread). From a quick reading, it seems like we intend for allocations of that size to be on the stack.

I'm curious what the benefit to malloc_good_size is here at all. I suppose if you expected it to grow, you'd want to malloc_good_size to give you some more time before needing to realloc, but I don't actually see _resizeConditionalAllocationBuffer being called anywhere.
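
The ordering issue raised here (rounding up before the stack-viability check can push a 500-byte request onto the heap) can be illustrated with a small C sketch; the threshold and rounding rule below are hypothetical stand-ins for the real limit and malloc_good_size:

```c
#include <stddef.h>

#define STACK_LIMIT 512  /* hypothetical stack-viability threshold */

/* Stand-in for malloc_good_size: round to the next multiple of 16. */
size_t round_up_allocation(size_t amount) {
    return (amount + 15) & ~(size_t)15;
}

/* Decide stack vs. heap from the *requested* size, so a 500-byte ask
 * isn't pushed to the heap just because rounding took it to 512.
 * Round up only for the heap path, where the allocator benefits. */
int should_use_stack(size_t requestedAmount) {
    return requestedAmount < STACK_LIMIT;
}
```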

func test_validateMutation_appendBytes() {
var data = Data(bytes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
data.append("hello", count: 5)
expectEqual(data[data.startIndex.advanced(by: 5)], 0x5)
Member

This test seems odd since it doesn't test that it actually appended the new bytes.

let other = Data(bytes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
data.append(other)
expectEqual(data[data.startIndex.advanced(by: 9)], 9)
expectEqual(data[data.startIndex.advanced(by: 10)], 0)
Member

Should we test the length too?

Member Author

Probably I should change it to the full comparison version.

}

func test_validateMutation_cow_mutableBacking_replaceSubrangeWithBytes() {
var data = NSData(b
Member

Huh… I thought extending an AllOnesData's count filled it with ones. Guess I'm wrong?

}

func test_validateMutation_cow_mutableBacking_replaceSubrangeWithBytes() {
var data = NSData(b
Member

If data really is expected to be [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] after data.count = 10, then this doesn't really test for anything.

…e in the case of custom backing (e.g. dispatch_data_t)
@phausler
Member Author

@kperryua the latest change should address the enumeration of slice regions such that a slice of discontiguous data will enumerate regions that are restricted to the slice range.

@phausler
Member Author

@swift-ci please test

@swift-ci
Collaborator

Build failed
Swift Test Linux Platform
Git Sha - 1cf5445

@swift-ci
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 1cf5445

@phausler merged commit c8bbce6 into apple:master Sep 16, 2017
phausler added a commit to phausler/swift that referenced this pull request Sep 19, 2017
* Mutations of slices of data should preserve relative indexing as well as cow semantics of slices

* Ensure hashes of ranges are uniform to the expected hash for Data

* Correct a few mistakes in the slice mutation tests

* Update sequence initializations to avoid directly calling mutableCopy which prevents slice offset mismatches

* Avoid invalid index slices in creating mirrors

* Restore the original Data description

* Resetting a slice region should expand the slice to the maximum of the region (not an out-of-bounds index of the backing buffer)

* Remove stray comment and use a stack buffer for sequence appending

* Return false when allocations fail in _resizeConditionalAllocationBuffer (not yet in use)

* Enumeration of regions of a slice should be limited to the slice range in the case of custom backing (e.g. dispatch_data_t)

* adjust assertion warnings for data indexes that are negative
tkremenek added a commit that referenced this pull request Sep 20, 2017
@CharlesJS

tl;dr – Slicing a CoW should result in filet mignon, not ground chuck.

Dang it, that's better than both my "The CoW got tipped" and "mutates like nuclear fallout" puns. Nicely done.
