Question: Immutable arrays with mutable values #2

Closed
andrewsardone opened this Issue Jun 4, 2014 · 32 comments

Owner

andrewsardone commented Jun 4, 2014

Thanks to @cdzombak, who noted this interesting passage in the Swift book:

Immutability has a slightly different meaning for arrays, however. You are still not allowed to perform any action that has the potential to change the size of an immutable array, but you are allowed to set a new value for an existing index in the array. This enables Swift’s Array type to provide optimal performance for array operations when the size of an array is fixed.

This raises two questions:

  1. What optimizations are being helped and why?
  2. Why, even for array constants, are you allowed to mutate the contents at
    specific indices? This seems counterintuitive in an
    NSArray/NSMutableArray world, and when Dictionary seemingly doesn’t
    support such mutation when set to a constant.

Owner

andrewsardone commented Jun 4, 2014

It’s not really relevant, but I think some reading on Go’s arrays versus slices is tangentially interesting.

Owner

andrewsardone commented Jun 4, 2014

Here’s an example of the behavior:

var arrayVar = ["foo", "bar", "baz"]
arrayVar[0] = "andrew"
arrayVar += "bing"
// arrayVar is now ["andrew", "bar", "baz", "bing"]

let arrayConst = ["foo", "bar", "baz"]
arrayConst[0] = "andrew" // arrayConst is now ["andrew", "bar", "baz"]
arrayConst += "bing" // error: could not find an overload for '+=' that accepts the supplied arguments
Owner

andrewsardone commented Jun 4, 2014

It looks like @joshaber is similarly bummed out.

Collaborator

cdzombak commented Jun 4, 2014

Caveat: I'm busy with conference things, so I haven't gotten as far in the Swift book as you all have. So things I say might be silly.

It's unclear whether the passage quoted in #2 (comment) says that allowing replacement in a constant array is what enables the performance enhancements, or that the enhancements come from constant arrays having a constant length.

I don't know what sort of optimizations allowing replacement could permit. But then I don't know much about compiler optimizations in general.

If in fact it's the constant length aspect, not allowing replacement, that permits optimizations, then why was this decision made? This feels like a huge problem with an otherwise well-designed language, so I assume there must be a good reason, but what is it and why can't we paper over it in the compiler or runtime or somewhere?

@jmont noted this in a tweet to me:

I think you're about to hit the fun part of Swift arrays… see unshare and copy

I took a brief tour through the book where those methods are mentioned.

It can be useful to ensure that you have a unique copy of an array before performing an action on that array’s contents, or before passing that array to a function or method. You ensure the uniqueness of an array reference by calling the unshare method on a variable of array type. (The unshare method cannot be called on a constant array.)

If multiple variables currently refer to the same array, and you call the unshare method on one of those variables, the array is copied, so that the variable has its own independent copy of the array. However, no copying takes place if the variable is already the only reference to the array.

Having to copy mutable things is nothing new, but this passage opens up more questions for me than it answers:

  • why only support unshare for array references, not dictionaries? why not extend it to be a general-purpose method on any class?
  • is copying always achieved with copy() on any object, perhaps any object that implements a protocol? (I suspect this will be answered as I read the book.)
  • why can't unshare be called on a constant array? "constant" arrays aren't immutable, so presumably I want to unshare them before using them.
  • why expose this to the programmer? Strings are passed by value, but under the hood they're only copy-on-write, for performance. Why not do the same thing for arrays, so you'd simply always write copy in your code and the runtime will only copy the array when necessary?

Between mutable "constant" arrays and this special-case unshare/copy concept that only applies to arrays, I'm feeling really, really sad.
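For reference, the copy-on-write arrangement asked about in the last bullet can be sketched as below. This is a minimal illustration that assumes a current Swift toolchain, where the beta-era unshare() and copy() array methods quoted above are not available. `Storage` and `CowArray` are hypothetical names; `isKnownUniquelyReferenced` is a standard library function.

```swift
// Hypothetical types for illustration; not part of the standard library.
final class Storage {
    var items: [String]
    init(_ items: [String]) { self.items = items }
}

struct CowArray {
    private var storage: Storage

    init(_ items: [String]) { storage = Storage(items) }

    var count: Int { return storage.items.count }

    // Copy the storage only if another value still shares it.
    private mutating func makeUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.items)
        }
    }

    subscript(index: Int) -> String {
        get { return storage.items[index] }
        set {
            makeUnique()
            storage.items[index] = newValue
        }
    }

    mutating func append(_ item: String) {
        makeUnique()
        storage.items.append(item)
    }
}

let original = CowArray(["foo", "bar", "baz"])
var shared = original          // no copy yet: both values share one Storage
shared[0] = "andrew"           // the first write triggers the copy
print(original[0], shared[0])  // prints "foo andrew"
```

Because every write goes through the uniqueness check, the caller never has to ask for a copy explicitly, which is the behaviour the bullet is asking for.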

Collaborator

cdzombak commented Jun 4, 2014

There are some discussions on the Apple dev forums which haven't gone anywhere:

Someone filed a radar on it: http://www.openradar.me/radar?id=5883417732317184 but it's even wrong:

Sure, I can call unshare() defensively, but then what's the point of having the optimization at all?

But the book says that's not true:

(The unshare method cannot be called on a constant array.)

Collaborator

cdzombak commented Jun 4, 2014

I filed another radar: http://www.openradar.me/17150380

jmont commented Jun 4, 2014

I think the optimization here is that comparing arrays' identities is really cheap, since if multiple variables refer to the same array, then the === operator can just compare addresses. If an element is added, then a new array instance is created, and then the arrays' addresses will be different (which means the arrays are not identical).

The tradeoff here is one of usability to the programmer (how does any of this make sense?! And why do only arrays behave like this??) and the high possibility of bugs if values in an array are changed on a shared array.

I personally don't often compare arrays for identity (or equality), but I think this optimization is here for a very good reason.

Excerpt From: Apple Inc. “The Swift Programming Language.” iBooks. https://itun.es/us/jEUH0.l

It can sometimes be useful to find out if two constants or variables refer to exactly the same instance of a class. To enable this, Swift provides two identity operators (===) and (!==) [sic].
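To make the cost argument concrete, here is a small sketch of identity versus equality. It assumes current Swift, where `===` applies to class instances only, so a hypothetical `Buffer` class stands in for the array's underlying storage.

```swift
// Hypothetical stand-in for a heap-allocated array buffer.
final class Buffer {
    var items: [Int]
    init(_ items: [Int]) { self.items = items }
}

let first = Buffer([1, 2, 3])
let alias = first                    // second reference, same instance
let lookalike = Buffer([1, 2, 3])    // equal contents, separate instance

// Identity: a single reference comparison, independent of element count.
print(first === alias)               // true
print(first === lookalike)           // false

// Equality: has to compare the elements themselves.
print(first.items == lookalike.items)  // true
```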

Owner

andrewsardone commented Jun 4, 2014

Good notes on unshare & copy, @cdzombak, and thanks for filing a radar!

Owner

andrewsardone commented Jun 4, 2014

This note makes things even more boggling:

why can't unshare be called on a constant array? "constant" arrays aren't immutable, so presumably I want to unshare them before using them.

Collaborator

cdzombak commented Jun 4, 2014

I'm wondering if I've missed some deeper concept behind value types vs.
reference types, constant or variable references, and mutability rules. It
seems like the language conflates some of these concerns.

Collaborator

KevinVitale commented Jun 4, 2014

Maybe we can use this convention for the time being?

Be sure to not modify a const (this will error):

let xray = ["kevin", "andrew"]
xray.replaceRange(0..1, with: ["chris"])

vs.
Go nuts!

var xray = ["kevin", "andrew"]
xray.replaceRange(0..1, with: ["chris"])

Bonus: since subscript ranges can't be used to append to arrays (of any declaration), neither can replaceRange

Owner

andrewsardone commented Jun 4, 2014

Conventions to protect against language peculiarities? What are we, Objective-C developers? :trollface:

Collaborator

cdzombak commented Jun 4, 2014

Conventions to protect against language peculiarities? What are we, Objective-C developers? :trollface:

TOO SOON

Collaborator

cdzombak commented Jun 4, 2014

edit: added some replies to myself/additional commentary in this comment

Okay, I talked to Ted at a lab, and I got a kind of intuitive sense for what's going on. Let's see if I can actually express it here…

At the core, I think, the problem is that we're still learning how to think about Swift's reference semantics. Think about them this way: var is a variable reference, let is a constant reference. Any object — struct or class — that these refer to could be mutated (imagine a constant reference to a class that holds and mutates internal state).
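The parenthetical about a constant reference to a class can be shown in a few lines. This sketch assumes current Swift and covers only the class case; `Counter` is a hypothetical type made up for the example.

```swift
// Hypothetical class used only to illustrate the point.
final class Counter {
    var count = 0
    func increment() { count += 1 }
}

let counter = Counter()   // constant reference
counter.increment()       // allowed: the object changes, the reference does not
counter.count = 10        // also allowed
// counter = Counter()    // compile-time error: cannot assign to a 'let' constant

print(counter.count)      // 10
```

Note that if `Counter` were declared as a struct, current Swift would reject both mutations on the `let` constant, so the example holds for classes only.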

They have a somewhat overloaded meaning right now for arrays and dictionaries. What we're considering as "overloaded" traits (im/mutability for arrays based on what sort of reference you hold) are effectively a consequence of the implementation. (The array itself is strictly speaking not explicitly immutable.)

I am unclear (i.e. I didn't explicitly ask) whether these same details are the reason that constant dictionaries are immutable, and I don't know how dictionaries are implemented; so the rest of this comment will focus on Swift arrays exclusively.

The reference you hold to a Swift array is effectively a reference to a buffer in memory. That means there are certain things you can and can't do if you've made that reference constant: you can replace elements, because the buffer remains at the same location. You can't add objects, however, because that could require changing that reference.

There may be additional underlying details that I don't yet grasp, and I don't quite understand why that detail precludes removing elements from a "constant" array.

So, there's really no such thing as immutable primitive Swift arrays, they're just kind of immutable due to implementation details.

This is also why copy() is available for constant arrays, but unshare() isn't. If you want to unshare() an array, your reference to it may need to change, which isn't possible with a constant reference.

As I was told, they actually started out with strong immutability for constant arrays, but they abandoned it because it turned out that providing great performance with C-style arrays (a constant reference to a memory buffer) was incompatible with providing perfect immutability. I still don't really understand these optimization concerns and I need to reason through why various arrangements of copy-on-write semantics couldn't be made to work in some way; if anyone wants to explain this to me, please do.

Note that this is a bit of a break from what we're used to considering as immutable data types in ObjC — as far as I can tell, what we're thinking about as mutability rules are actually implementation details determined by the kind of reference you hold to an object. That's where our confusion is coming from, and that's why these things seem conflated at first glance.


Let's see if some experiments can help us develop an understanding of how these details can actually come into play, and how we're not really seeing immutability but an implementation detail.

let a = ["a", "c"]
var b = a
a === b  // true ; we've set a var to a constant array, and identity is the same

b[0] = "y"
a === b  // true ; a and b are still the same underlying object after replacement

b += "b"
a === b  // false ; copy-on-write semantics for other modifications to the array

Interestingly, assigning a constant array to a variable yields the same object. Replacing an object in that variable also modifies the original constant array; they're still the same object. Appending to that variable array causes a copy of the array to a new object, and the var is reassigned to reference that new array in memory.

let anotherA = a
anotherA[0] = "x"
anotherA === a  // true; replacement is possible even in another constant reference to the same array

Replacing an object in a constant reference to that original array modifies the original array, as you'd expect now that you understand the implementation. Same as replacing in the var above.

let c = [1, 2, 3]
let d = c.copy()

d === c  // false

At least copying works as you expect.


The engineers I spoke to also noted that you can still use immutable arrays via NSArray, should you wish to enforce proper immutability.

And it's interesting to note that the bulk of what we're learning as Swift is actually the standard library, not the language. Most things we'd think are primitives are, in Swift, actually part of the standard library. Really the only primitives in Swift are LLVM IR primitives (I think, if I'm remembering this correctly). This has some interesting implications which I have yet to really think through.


The reference you hold to a Swift array is effectively a reference to a buffer in memory.

That buffer is laid out something like this:

 -------------------------------------------
| length | capacity | item A | item B | ... |
 -------------------------------------------
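A rough model of that layout, assuming current Swift, might look like the following. `FixedBuffer` and `appending(_:)` are hypothetical names, and doubling the capacity on growth is an assumption made for the sketch. Only `UnsafeMutablePointer` and its methods come from the standard library. The point is that replacing an item leaves the buffer where it is, while growing hands back a different reference.

```swift
// Hypothetical model of the layout above; not a standard library type.
final class FixedBuffer {
    let length: Int                                  // number of live items
    let capacity: Int                                // slots allocated
    private let base: UnsafeMutablePointer<String>   // start of the item slots

    init(_ items: [String], capacity requested: Int = 0) {
        let slots = max(items.count, requested, 1)
        let pointer = UnsafeMutablePointer<String>.allocate(capacity: slots)
        for (offset, item) in items.enumerated() {
            (pointer + offset).initialize(to: item)
        }
        length = items.count
        capacity = slots
        base = pointer
    }

    deinit {
        base.deinitialize(count: length)
        base.deallocate()
    }

    // Replacement writes into an existing slot; the buffer does not move.
    subscript(index: Int) -> String {
        get {
            precondition(index >= 0 && index < length, "index out of range")
            return base[index]
        }
        set {
            precondition(index >= 0 && index < length, "index out of range")
            base[index] = newValue
        }
    }

    // Growing is modelled as building a new, larger buffer, so the
    // caller ends up holding a different reference.
    func appending(_ item: String) -> FixedBuffer {
        var items = (0..<length).map { base[$0] }
        items.append(item)
        return FixedBuffer(items, capacity: capacity * 2)
    }
}

let constant = FixedBuffer(["a", "c"])
constant[0] = "y"                        // fine through a constant reference
let grown = constant.appending("b")      // a new buffer, hence a new reference

print(constant[0], constant.length)      // y 2
print(grown.length, grown === constant)  // 3 false
```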

I still don't really understand these optimization concerns and I need to reason through why various arrangements of copy-on-write semantics couldn't be made to work in some way; if anyone wants to explain this to me, please do.

One could imagine that providing a strong immutability guarantee alongside a constant reference would add a good deal of complexity and extra checks every time someone mentions that reference.

Key takeaway

But as I noted, and as I understand it, this semi-immutability is an implementation detail. The point of a constant array is to provide a constant reference to the same array. The implementation of the primitive array dictates that an array with a constant reference to it can't change its size. I don't think immutability is a design goal at all.

Collaborator

KevinVitale commented Jun 5, 2014

@cdzombak Excellent stuff! So, my takeaway involves the following:

Let's review how let is introduced in the documentation:

“The value of a constant doesn’t need to be known at compile time, but you must assign it a value exactly once.”

From what you described, the value of an Array<T> is the starting memory address of its buffer. Therefore, it would now make sense that changing elements in the array doesn't violate this principle.

Furthermore, also stated is:

“The value of a constant cannot be changed once it is set”

Again, the way in which Array<T> types are implemented wouldn't violate how let is defined. This could probably explain why the semantics of Dictionary differ.

Finally, given how such implementation details affect the copy semantics of different class types, it may help us better understand the reasoning behind the decision to use let instead of const.

lefb766 commented Jun 5, 2014

I noticed that we can safely replace foo.unshare() with foo = foo.copy(), foo.append(bar) with foo += [bar], etc.

In most modern programming languages, explicit reassignment is the only way to alter the reference bound to an identifier. However, some methods of Swift's Array behave as if they implicitly reassign a new reference to their object's identifier, which is unnatural and feels ugly to me.

I imagine that those methods are just aliases for corresponding operations involving reassignment. It is natural that we cannot perform unshare() on a constant array if foo.unshare() is an alias for foo = foo.copy(). But I am not sure that is enough to explain all of the strange behaviors of Swift's Array.
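The "alias for reassignment" idea can be sketched with a struct whose mutating method assigns to self. This assumes current Swift; `Box` and `Handle` are hypothetical types, and their copy() and unshare() methods only imitate the beta-era Array methods of the same names. Unlike the unshare described in the book, this version always copies and skips the uniqueness check.

```swift
// Hypothetical types for illustration; Handle stands in for an array
// value that wraps a reference to shared storage.
final class Box {
    var items: [String]
    init(_ items: [String]) { self.items = items }
}

struct Handle {
    private(set) var box: Box

    init(_ items: [String]) { box = Box(items) }

    // Non-mutating: returns a new value, so it works on a constant.
    func copy() -> Handle {
        return Handle(box.items)
    }

    // Mutating: shorthand for `self = self.copy()`, so it needs a `var`.
    mutating func unshare() {
        self = copy()
    }
}

let fixed = Handle(["foo", "bar"])
let duplicate = fixed.copy()        // allowed on a constant
// fixed.unshare()                  // compile-time error: 'fixed' is a 'let' constant

var flexible = fixed                // shares fixed's Box
print(flexible.box === fixed.box)   // true
flexible.unshare()                  // rebinds flexible to its own Box
print(flexible.box === fixed.box)   // false
print(duplicate.box === fixed.box)  // false
```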

roop Jun 5, 2014

I think this anomaly is strongly tied to Assignment and Copy Behavior for Arrays, which says:

For arrays, copying only takes place when you perform an action that has the potential to modify
the length of the array. This includes appending, inserting, or removing items, or using a ranged
subscript to replace a range of items in the array.

From the docs, whenever something is assigned or passed to a function:

  1. All classes always use pass-by-reference
  2. Dictionaries internally use pass-by-reference, but use copy-on-write to make it seem like pass-by-value
  3. Arrays use pass-by-reference whenever the size of the array is guaranteed to remain constant, and copy-on-write in case the size is not guaranteed to be constant (e.g. calls to append(), replaceRange()).

In other words, it's as if Arrays are always passed-by-reference, and some of the modifying methods call a hypothetical copyOnWrite() method internally, but some don't. Specifically, append() would call copyOnWrite() internally, but sort() wouldn't. Note that copyOnWrite() requires ref counting to know if there are any other strong references to this data.

Now, my hypothesis:

  • let disables copy-on-write. For non-Array types, this means they cannot be modified at all (since all modifications would need copy-on-write). For Arrays, this means only those methods that can potentially modify the size are disabled.
  • An Array's copy() method increments ref_count and returns an instance that internally points to the same data. This would enable it to use copy-on-write when it has to be modified.
  • An Array's unshare() method is basically just the hypothetical copyOnWrite() method.

So, it appears that the problem or anomaly is with the inconsistency in how Array is implemented: Some of the modifying methods of Array use copy-on-write semantics internally, but some don't.
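
For readers following the copy-on-write hypothesis above, here is a minimal sketch of how a value type can layer copy-on-write over a reference-counted buffer. It is written in current Swift rather than the beta under discussion. `Buffer`, `CowArray`, and `makeUnique()` are made-up names for illustration and are not Array's actual internals; `isKnownUniquelyReferenced` is the standard library's uniqueness check.

```swift
// Hypothetical storage class: the shared, reference-counted buffer.
final class Buffer {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

// Hypothetical value type that copies its buffer only when it is shared.
struct CowArray {
    private var buffer: Buffer

    init(_ items: [Int]) { buffer = Buffer(items: items) }

    var count: Int { return buffer.items.count }

    // Plays the role of the hypothetical copyOnWrite() described above:
    // it needs the reference count to know whether the buffer is shared.
    private mutating func makeUnique() {
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = Buffer(items: buffer.items)
        }
    }

    subscript(index: Int) -> Int {
        get { return buffer.items[index] }
        set {
            makeUnique()
            buffer.items[index] = newValue
        }
    }

    mutating func append(_ value: Int) {
        makeUnique()
        buffer.items.append(value)
    }
}

let first = CowArray([1, 2, 3])
var second = first        // shares the buffer; nothing is copied yet
second[0] = 99            // the buffer is shared, so it is copied before the write
second.append(4)          // the buffer is now unique, so no further copy is made
print(first[0], first.count, second[0], second.count)
```

In this sketch every mutating operation calls `makeUnique()`, which is the consistent behavior the comment above says the beta Array lacked for element assignment.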


roop Jun 5, 2014

@lefb766

I noticed that we can safely replace foo.unshare() to foo = foo.copy()

Forcing a copy of an Array seems to say unshare() and copy() are not equivalent:

If you simply need to be sure that your reference to an array’s contents is the only reference in existence, call the unshare method, not the copy method. The unshare method does not make a copy of the array unless it is necessary to do so. The copy method always copies the array, even if it is already unshared.

Also, copy() is allowed on let arrays but unshare() is not.

let a = [1, 2, 3]
var b = a.copy() // valid
b.append(4) // valid, doesn't change 'a'
a.unshare() // error


lefb766 Jun 5, 2014

@roop
I didn't know that unshare() does copy-on-write but copy() doesn't. The alias hypothesis might be wrong.

Also, copy() is allowed on let arrays but unshare() is not.

Yes. We can call copy() on let arrays, but can't assign it to the original identifier.

let a = [1, 2, 3]
a = a.copy() // invalid

As I understand it, we can ensure that an array has a unique reference to its contents just after unshare() is called, but we can't ensure that for the future. For example,

var a = [1, 2, 3]
a.unshare()
var b = a
b[0] = 4
println(a[0])

will print 4. Is that right? If so, foo.unshare() and foo = foo.copy() have the same external behavior (i.e. we can use foo = foo.copy() instead of foo.unshare()), regardless of performance.


cdzombak Jun 5, 2014

Collaborator

Finally, given how such implementation details affect the copy semantics of different class types, it may just help better understand the reasoning behind the decision to use let instead of const.

@KevinVitale could you expand on that? I'm having some trouble parsing that sentence.


KevinVitale Jun 5, 2014

Collaborator

@cdzombak Some criticism has been thrown at Swift's decision to use the let keyword, instead of perhaps the const keyword, to denote "constant variables". Indeed, the Swift book repeatedly refers to let variables as "constants."

However, given the info you've shared, Swift's definition of "constant" isn't the same as the const keyword well understood and universally defined across multiple languages. Rather, Swift's concept of "constant" implies something about the implementation of its types. I believe had const actually been used, things would have been doubly confusing, and made even more people cringe than what's happening in the current debate.

To summarize: something else was needed, so let happened.


cdzombak Jun 6, 2014

Collaborator

I think I misunderstood this on Wednesday and wrote some incorrect things above. let is, in fact, overloaded to dictate immutability for value types (i.e. structs, which include "primitives" like Int). It means no such thing for reference types. This makes intuitive sense and allows you to use value types as you'd expect. Arrays are an unusual special case, since they're also a struct.

I am hoping some more experimentation and examining the Array struct definition in detail (⌘-click on the keyword Array to go to its definition in Xcode) might be informative.

I'll write more about this and array semantics this weekend, after the conference :)
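
The value-type versus reference-type distinction described above can be sketched as follows. This is a small illustration in current Swift, and `PointValue` and `PointRef` are hypothetical types invented for the example.

```swift
// Hypothetical types for illustration only.
struct PointValue {          // value type
    var x: Int
}

final class PointRef {       // reference type
    var x: Int
    init(x: Int) { self.x = x }
}

let value = PointValue(x: 1)
// value.x = 2               // compile-time error: 'value' is a 'let' constant

let ref = PointRef(x: 1)
ref.x = 2                    // allowed: 'let' fixes the reference, not the object's state
// ref = PointRef(x: 3)      // compile-time error: a 'let' cannot be reassigned

var copy = value             // an independent copy of the struct
copy.x = 10                  // does not affect 'value'
print(value.x, ref.x, copy.x)
```

With a struct, `let` freezes the whole value; with a class, it only freezes which object the name refers to.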


apsanz Jun 9, 2014

I still don't understand why they decided on such a confusing, halfway-immutable behavior. If it should be mutable, make it behave like a reference type all the time and forget about the copy-on-write stuff that makes it seem like a value type. If you want to make it seem like a value type, go all the way. If it is a "constant" array, just block reassigning values. I don't see what the performance penalty of that would be. It seems like it could just be a compile-time check.


emovla Jun 9, 2014

I believe the reason for allowing element modification without doing a copy-on-write is an optimization for the multithreaded world, when otherwise doing a lock or some sort of synchronization on each element access would be very expensive.

(AFAIK that's why copy-on-write was abandoned for C++ strings, for example.)


apsanz Jun 10, 2014

I am fine with not doing a copy-on-write, but Swift seems to already do it, at least in cases which change the size of the array. I'm not sure why you would even bother with copy-on-write for a few cases. It seems like it will cause subtle bugs.

Even keeping the var behavior the same, I don't see why you can't just block reassignment for let constant arrays. Real constant arrays will be much more useful in a multithreaded world.


emovla Jun 10, 2014

Well, imagine you assign an array from a var to a let (assuming copy-on-write won't be done per element access).

To guarantee complete immutability they would have to copy the array (bad).
So then if they block the reassignment, that would mean they'd have to say something like "well, you can't modify elements, but someone else can sneak a modification in behind your back".

(In Apple's forums people complain you can't have an immutable array as you can in Objective-C, but in ObjC anyone can sneak in an NSMutableArray instead of an NSArray, so the situation is similar with respect to APIs with array/string params. You still need to make a copy if you want to be sure no one else changes your array.)
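
The Objective-C situation mentioned above can be sketched from Swift using Foundation. This is only an illustration in current Swift, and the variable names are made up; it shows that an NSArray-typed reference may point at a mutable object, and that a defensive copy is what actually isolates you from later changes.

```swift
import Foundation

let mutable = NSMutableArray(array: [1, 2, 3])
let view: NSArray = mutable                  // typed as immutable, but it is the same object
let snapshot = mutable.copy() as! NSArray    // defensive copy taken before any later change

mutable.add(4)                               // the change is visible through 'view'
print(view.count, snapshot.count)
```

Here `view` sees the added element because it aliases the mutable array, while `snapshot` keeps the contents it had when the copy was made.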


apsanz Jun 10, 2014

Personally, I don't see the problem with defensively copying when changing from var to let. They have already decided to do this with every size-changing operation. The issue is whether the "if let" pattern might make it more common. Thanks for explaining this, because it's the only way this behavior makes some sense.


andrewsardone Jun 26, 2014

Owner

I've been preoccupied with other things, but I'm just catching up on this thread. Thanks for all the insight!

There's still a lot of confusion, but for what it's worth it sounds like Chris Lattner and crew will be improving these Array semantics:

I can confirm that array semantics are going to change significantly in later seeds, to be more similar to dictionary and strings.

I guess this info is on ice until a later seed.


roop Jun 27, 2014

I think I mostly understand what's happening. Explained in detail in this blog post: Arrays in Swift Beta 2 - Problem, Solution and Workaround.


andrewsardone Jun 29, 2014

Owner

Thanks for sharing, @roop! I’ll give this a read.


KevinVitale Jul 7, 2014

Collaborator

Resolved in Swift Beta 3:

Array in Swift has been completely redesigned to have full value semantics like Dictionary and String have always had in Swift. This resolves various mutability problems – now a 'let' array is completely immutable, and a 'var' array is completely mutable – composes properly with Dictionary and String, and solves other deeper problems. Value semantics may be surprising if you are used to NSArray or C arrays: a copy of the array now produces a full and independent copy of all of the elements using an efficient lazy copy implementation. This is a major change for Array, and there are still some performance issues to be addressed.
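
As a rough illustration of the value semantics described in that release note, the following sketch uses current Swift syntax (not the Beta 3 toolchain itself), with made-up variable names.

```swift
let fixed = [1, 2, 3]
// fixed[0] = 10       // compile-time error: 'fixed' is a 'let' constant
// fixed.append(4)     // compile-time error for the same reason

var a = [1, 2, 3]
var b = a              // logically a full copy; the storage is copied lazily
b[0] = 4               // writing to 'b' does not affect 'a'
b.append(5)
a[1] = 20              // a 'var' array allows both element and size changes
print(fixed, a, b)
```

Element assignment and size changes are now treated alike: both are rejected on a `let` array, and both leave other copies untouched on a `var` array.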


cdzombak Jul 7, 2014

Collaborator

In light of this fix, any objections to closing this issue?
