Maps should take into account the `==` operator if available instead of pointer comparison #308

dumblob · 2014-10-29T14:27:36Z

load time import time
m=(map<time.DateTime, any>){->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

false
= none

load time import time
m=(map<int, any>){->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x.value] = 5
if (y.value in m)
 io.writeln(true)
else
 io.writeln(false)

true
= none
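For comparison, the same two behaviors can be reproduced in Python. There, a class that defines neither __eq__ nor __hash__ is hashed and compared by identity, much like Dao's pointer comparison. This is only an analogy: DateTime and make() below are hypothetical stand-ins, not Dao's time module.

```python
# Analogy only: a hypothetical DateTime with no __eq__/__hash__, so Python
# falls back to identity-based hashing and comparison for dict keys.
class DateTime:
    def __init__(self, year, month, day):
        self.value = (year, month, day)  # stand-in for the underlying time value

def make(year, month, day):
    return DateTime(year, month, day)

x = make(2014, 1, 1)
y = make(2014, 1, 1)

by_object = {x: 5}        # keyed by the object itself (identity)
by_value = {x.value: 5}   # keyed by a serialized scalar value

print(y in by_object)       # False: y is a different object
print(y.value in by_value)  # True: equal tuples hash and compare equal
```

The second dict corresponds to the serialization workaround from the second example: the key is plain data, so two separately constructed values match.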

Also, I'm not sure about blind pointer comparison for all non-primitive data. I'd probably disable the default fallback to pointer comparison for data Dao doesn't know anything about (i.e. types which don't define any routine for the == operator).


Night-walker · 2014-10-29T15:39:33Z

The == operator is not sufficient for that, as it doesn't tell how to hash the given value or how to compare it for being less than or greater than other values. Given the dual nature of Dao maps (hash or tree; somehow, dualism is a typical case for Dao), it is problematic to define requirements for a class/type to be compatible with maps.
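To make the point concrete in a language with a similar split, here is a Python sketch (an analogy, not Dao's mechanism; OnlyEq is a hypothetical type). Defining only equality leaves the type unusable both as a hash-map key and as a sorted, tree-like key.

```python
# A hypothetical key type that defines only equality.
class OnlyEq:
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.v == other.v

# Defining __eq__ without __hash__ makes Python set __hash__ to None,
# so the type can no longer be used as a dict (hash map) key.
try:
    {OnlyEq(1): "a"}
    hashable = True
except TypeError:
    hashable = False

# Equality also says nothing about order, which a sorted (tree) map needs.
try:
    sorted([OnlyEq(2), OnlyEq(1)])
    orderable = True
except TypeError:
    orderable = False

print(hashable, orderable)  # False False
```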

Comparison with < and <= doesn't make real sense in most cases (io::Stream? fs::Entry? xml::Element?). But if you don't define them, you won't be able to use your class with any map (since it's impossible to distinguish hashes and trees by the type). And you cannot implement comparison and hashing in terms of DaoValue pointers yourself in pure Dao.

So disabling "fallback to pointer comparison" would essentially deny the use of almost all non-built-in types/classes with maps. It doesn't look like a bright prospect.

If you really want to store certain values instead of references, you'd better just serialize class instances into numeric or string data, just as you did in your second example. It should prove to be a far less troublesome approach.

dumblob · 2014-10-29T15:53:48Z

Yes, you're right, but roughly speaking, the first example (without serialization) should fail, or warn the user, in cases where the output won't be what the user expects. Because I have no idea how to detect this, I just proposed disabling the fallback for such types, while being aware of the edge cases in which one needs to use some weird data (i.e. data for which < and <= don't make sense) as map keys.

dumblob · 2014-10-29T15:58:20Z

The current behavior is extremely error-prone (admittedly, it took me almost an hour to figure this out in code that shuffled such a map around between different places).

Night-walker · 2014-10-29T16:27:47Z

Yes, you're right, but roughly speaking, the first example (without serialization) should fail, or warn the user, in cases where the output won't be what the user expects.

Personally, I find the way it currently works very simple and clear. We're not on Java/.NET, after all.

I just proposed disabling the fallback for such types while being aware of the edge cases in which one needs to use some weird data

It's not an edge case. There are plenty of use cases for storing non-comparable objects (and tuples with these objects, lists, etc.) in maps. It is highly impractical to discard all that. The approach from languages with (more or less) unified type systems is arguably ill-suited here.

The current behavior is extremely error-prone (admittedly, it took me almost an hour to figure this out in code that shuffled such a map around between different places).

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if the inner data of the object changes? That should then imply a different place in the hash or tree! How do you propose to solve that?
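The mutation problem can be demonstrated with a Python sketch, assuming a hypothetical Key type whose hash and equality depend on a mutable field. After the field changes, the entry still sits in a slot computed from the old hash, so lookups by both the new and the old value miss it. The results noted are what CPython's dict produces.

```python
class Key:
    """Hypothetical key whose hash/equality depend on mutable inner data."""

    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, Key) and self.v == other.v

    def __hash__(self):
        return hash(self.v)

k = Key(1)
d = {k: "value"}
k.v = 2  # inner data changes after insertion; the entry stays where hash(1) put it

print(k in d)                      # False: lookup probes using the new hash
print(Key(1) in d)                 # False: old hash matches, but the stored key no longer equals Key(1)
print(any(key is k for key in d))  # True: the object is still in the dict
```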

dumblob · 2014-10-29T16:58:09Z

It's not an edge case. There are plenty of use cases for storing non-comparable objects (and tuples with these objects, lists, etc.) in maps. It is highly impractical to discard all that. The approach from languages with (more or less) unified type systems is arguably ill-suited here.

Are you really sure that you'd like to store tuples with these objects, lists, etc. as keys in a map? Well, why not, but then requiring these keys to be invar makes perfect sense to me, since the bundle of data represents a unique combination.

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if the inner data of the object changes? That should then imply a different place in the hash or tree! How do you propose to solve that?

We have the magic invar, don't we? It could be enforced for the cases mentioned above.

Btw, the first example above was initially

load time import time
m={->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

but because of the recent bug with any in map keys, I've added the cast (map<time.DateTime, any>), which makes it explicit and, let's say, at least a bit clearer. I'm pretty sure, though, that this piece of code should either be forbidden or work out of the box as expected. I'd be perfectly fine with writing:

load time import time
m={->}
invar x=time.make(2014, 1, 1)
invar y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

Night-walker · 2014-10-29T18:44:07Z

Are you really sure that you'd like to store tuples with these objects, lists, etc. as keys in a map?

Why not? I don't want to care about what can be put in a map, and what cannot. I may want to associate some data with some objects, and map is a natural choice regardless of what those objects are.

Well, why not, but then requiring these keys to be invar makes perfect sense to me, since the bundle of data represents a unique combination.

It simply cannot be assured. There is no way to request an object which cannot be changed in any other context, no invar will do that.

We have the magic invar, don't we? It could be enforced for the cases mentioned above.

As I said, it can't. It just guarantees local immutability.

Also note that (as you like things running quickly :) ) using Dao-level functions for comparison will result in significant overhead on every map operation. Instead of calling a C function which does a switch and e.g. return x.data < y.data ? -1 : (x.data > y.data ? 1 : 0), there will be a practically normal Dao routine call, with lots of related manipulations, plus execution of the code in the routine body. And generally this will happen multiple times per operation, depending on the map structure.
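A rough way to see how often user-level comparisons run: the Python sketch below uses a sorted list maintained with bisect.insort as a stand-in for a tree-based map and counts calls to a user-defined __lt__. CountingKey is hypothetical, and the counter only illustrates how the number of calls grows, not Dao's actual per-call overhead.

```python
import bisect

class CountingKey:
    """Hypothetical key with a user-level comparison that counts its calls."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        CountingKey.calls += 1  # each call is a full interpreted method call
        return self.v < other.v

n = 1000
ordered = []  # a sorted list standing in for a tree-based map
for i in range(n):
    # 7919 is coprime with 1000, so this inserts a permutation of 0..n-1
    bisect.insort(ordered, CountingKey((i * 7919) % n))

print(CountingKey.calls)  # several calls per insertion, on the order of n * log2(n)
```

Every one of those calls would be a routine invocation rather than a single inlined C comparison, which is the overhead being described.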

dumblob · 2014-10-29T19:23:47Z

As I said, it can't. It just guarantees local immutability.

Also note that (as you like things running quickly :) ) using Dao-level functions for comparison will result in significant overhead on every map operation. Instead of calling a C function which does a switch and e.g. return x.data < y.data ? -1 : (x.data > y.data ? 1 : 0), there will be a practically normal Dao routine call, with lots of related manipulations, plus execution of the code in the routine body. And generally this will happen multiple times per operation, depending on the map structure.

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...)  # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false. It's absolutely opaque, as you don't know which branch will be chosen without looking at the return type of something_returning_some_type(), which might also be any. It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... evaluates to false (note that these two are very often used together). I always want to be absolutely sure that the in operator will behave the same, even without deep knowledge about the type of the left operand. And I say deep knowledge because the type might be a variant, a renamed/aliased primitive type, etc.

Any ideas how to prevent this discrepancy?

Night-walker · 2014-10-29T20:03:12Z

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...) # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false.

It can't possibly be achieved universally. If you construct two values in the same way, it doesn't mean they are equal. You can't always rely on it even for time.make() (what if there is os.setenv('TZ=...')). Moreover, if objects are put in a map using some potentially non-unique field of those objects, you can't really store objects -- you store their unique field values. It may be just as confusing and error-prone.

It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... evaluates to false (note that these two are very often used together)

Not sure what you mean.

(dao) f = io::Stream()
= Stream[02437CD8]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) m[f]
[[Error::Key]] --- Invalid key:

In code snippet:
      1 :  GETVG       :     0 ,     1 ,     1 ;     1;   f
>>    2 :  GETI        :     0 ,     1 ,     2 ;     1;   m[f]
      3 :  RETURN      :     2 ,     1 ,     0 ;     1;   m[f]
Raised by:  __main__(), at instruction 2 in line 1 in file "interactive codes";

Everything's fine as far as I can see.

I always want to be absolutely sure that the in operator will behave the same, even without deep knowledge about the type of the left operand.

With the current approach, everything is as plain and explicit as it can possibly be. For map operations, any data is treated by its value. For an object, its value is, naturally, the object itself. Not some hidden data within it, and not something returned by the method which you cannot check.

dumblob · 2014-10-31T09:46:43Z

Not sure what you mean.

I'm away from my computer right now, but IIRC it was something to do with any:

f = io::Stream()
f2 = io::Stream()
m = {->}
f in m
f2 in m
m[f] = 5
f in m
f2 in m
m[f]
m[f2]

Anyway, the whole problem is exactly what you've described: how to tell from the syntax whether we're dealing with an object (i.e. a pointer) or a primitive type as the map key. It's similar to passing data to routines, which is explicit/obvious (because you can't work with the reference directly in Dao; you have to use some object interface like methods or overloaded operators), but with map keys it's hidden :(

Night-walker · 2014-10-31T11:39:45Z

I don't see any problem. There is no real necessity to distinguish primitive and non-primitive data here, as it is handled in a simple and uniform way. There are no hidden values or methods acting behind the scenes when interacting with a map; everything's clear and predictable.

You just have to keep in mind that DateTime, BigInt, etc. are not scalar values. That won't change simply because of some ad-hoc handling for maps etc. The only good option I see is value classes which mimic primitive (scalar) values. We already touched this topic, and such feature was deemed not worth the efforts required to provide it.

dumblob · 2014-10-31T14:29:48Z

We already touched this topic, and such feature was deemed not worth the efforts required to provide it.

Hm, I must have missed this discussion, or simply forgotten it :( . Can you please point me to it? I was rather thinking that this is actually quite a similar problem to what we were discussing in #263 about implicit/explicit references. Either way, it seems we should at least somehow unify the interface for serializing non-primitive data (e.g. by documenting it).

Btw the problem with any is really there (it's the one from #306):

(dao) f = io::Stream()
= Stream[0x203bb40]
(dao) f2 = io::Stream()
= Stream[0x2188e40]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) f2 in m
= false
(dao) m[f] = 5
= 5
(dao) f in m
= false                            # nonsense!
(dao) f2 in m
= false
(dao) m[f]
= 5                                # a proof that (f in m) returning false is a nonsense
(dao) m[f2]
[[Error::Key]] --- Invalid key:

In code snippet:
      1 :  GETVG       :     0 ,     3 ,     1 ;     1;   f2
>>    2 :  GETI        :     0 ,     1 ,     2 ;     1;   m[f2]
      3 :  RETURN      :     2 ,     1 ,     0 ;     1;   m[f2]
Raised by:  __main__(), at instruction 2 in line 1 in file "interactive codes";

Night-walker · 2014-10-31T16:42:59Z

From here.

I have considered this before (I even left some comments regarding this in the source a long time ago). But this is probably not a good idea for Dao instances. For C data types, it may be OK to support this, as you mentioned some of them are essentially scalars. I have actually been considering this for bigint. But there may be too many places that require changing, so it's probably better not to do it (for now at least).

Btw the problem with any is really there

That's just a bug.

dumblob · 2014-10-31T17:38:21Z

From here.

Thank you. It seems it's still an open question for the future. At least the issue I raised proves that the current state will be painful.

Night-walker · 2014-10-31T17:59:29Z

Well, I wouldn't call this a dire issue: when an object can be uniquely identified by a scalar value, you may (but don't have to) use that value instead of the object itself as a key.

I doubt using overloaded operators is a good idea here anyway. I suppose only a noticeable conceptual shift from reference-based objects to value-based ones could provide ground for the behavior you want. But that's an extra layer of complexity, so I'm not sure it's a good idea either.

dumblob · 2014-10-31T18:23:12Z

Considering that there'll be quite a large amount of wrappers and scalar-like objects (C data types), it doesn't sound that futile to me.

About the extra layer of complexity, it doesn't look that bad. Of course the devil is in the detail, but in general the problem is not that much about implementation, but rather about all the particular decisions where a reference should be used and where a value.

dumblob · 2014-11-04T07:49:23Z

I woke up today thinking about adding something like a scalar<> wrapper type, which would enforce the needed interface on objects when used. The usage could then look like:

i = BigInt(5)
m1: map<@K, @V> = {->}
m2: map<scalar<@K>, @V> = {->}
m1[i] = 'abc'
m2[i] = 'def'
i += 4
io.writeln(m1[i])  # { BigInt<0x...> -> "abc" }
io.writeln(m2[i])  # error, because the key is missing

This wouldn't change the simplicity and semantics of the current approach (each object is treated as a pointer), would retain seamless treatment of existing scalar types (as those would be compatible with scalar<> out of the box), but would allow a very simple and transparent compile-time check for value-like behavior. The implementation should also be quite straightforward, as we already have similar "wrap" types.

Night-walker · 2014-11-04T19:22:39Z

The thing is, it does not resolve this situation by itself. If e.g. BigInt does not define a specific "map interface", this scalar will be of no use other than raising a compile-time error. And if BigInt does provide some special identification/comparison means, scalar should simply be redundant, only making it all more complex and variational.

If any change is to take place (of which I am not certain), I think it should be on the side of the class/type in order to ensure simple and intuitive behavior in all cases.

dumblob · 2014-11-05T08:28:52Z

If e.g. BigInt does not define a specific "map interface", this scalar will be of no use other than raising a compile-time error.

That's the goal. Btw, I wouldn't call it a "map interface", as it would have other use cases no less important than maps (serialization of such scalar-like objects is very common; with scalar<>, it would look like (scalar<@T>)my_bigint_number).

And if BigInt does provide some special identification/comparison means, scalar should simply be redundant, only making it all more complex and variational.

Why redundant? We want to work with pointers (as it's simple and fast), but in certain cases we want both pointer and scalar-like handling (depending on a situation which is not known at the time the class/type is defined).

daokoder · 2014-11-05T16:25:10Z

I don't think so, not for Dao at least. A counter-example: a map reports that it contains a value you never actually put there because of an overloaded comparison implementation. Even better, what to do if the inner data of the object changes? That should then imply a different place in the hash or tree! How do you propose to solve that?

Right, this is the real issue with supporting user-defined comparison for map keys. As @Night-walker also pointed out, invar cannot solve the problem here. I agree with @Night-walker that pointer comparison is the only correct approach for map keys, unless the key objects are truly immutable.

I was fully aware of all this when I proposed it. I never ever want to experience code like

load time import time
m={->}
x=something_returning_some_type(...)  # might be also a variant type
y=something_returning_some_type(...)
m[x] = 5
if (y in m)
 io.writeln(true)
else
 io.writeln(false)

to return false. It's absolutely opaque, as you don't know which branch will be chosen without looking at the return type of something_returning_some_type(), which might also be any. It's just nonsense that if (m[y]) ... succeeds without throwing any exception, but if (y in m) ... evaluates to false (note that these two are very often used together). I always want to be absolutely sure that the in operator will behave the same, even without deep knowledge about the type of the left operand. And I say deep knowledge because the type might be a variant, a renamed/aliased primitive type, etc.

Any ideas how to prevent this discrepancy?

The only solution I can think of is to make the type that you want to behave as you described immutable, and make its objects unique with respect to their data. So, for example, DateTime could be implemented such that each DateTime object corresponds to a unique time value. So,

x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)

will always return the same object. This way, DateTime objects can be compared as pointers in map keys. If the type provides no method for users to modify its objects, user-defined comparisons could also be supported. For user-defined scalar-like C data types such as BigInt and DateTime, such comparisons can naturally be implemented as C functions for efficiency.
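The proposal can be sketched in Python with a hypothetical DateTime whose constructor returns one shared, immutable object per value (interning). Because equal values yield the same object, the default identity-based hashing already gives the expected map behavior. The weak-value cache is an assumption added here, as one possible way to avoid keeping a table of unlimited size alive; it is not part of the proposal.

```python
import weakref

class DateTime:
    """Hypothetical immutable, interned date: one object per (y, m, d) value."""
    __slots__ = ("_ymd", "__weakref__")
    # Weak cache: an entry disappears once nothing else references its object.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, year, month, day):
        ymd = (year, month, day)
        obj = cls._cache.get(ymd)
        if obj is None:
            obj = super().__new__(cls)
            object.__setattr__(obj, "_ymd", ymd)  # bypass the immutability guard once
            cls._cache[ymd] = obj
        return obj

    def __setattr__(self, name, value):
        raise AttributeError("DateTime is immutable")

    @property
    def ymd(self):
        return self._ymd

def make(year, month, day):
    return DateTime(year, month, day)

x = make(2014, 1, 1)
y = make(2014, 1, 1)
m = {x: 5}             # hashing/equality are still the identity-based defaults
print(x is y, y in m)  # True True
```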

It should be pointed out that there is no way to do this similarly for Dao class types, as they cannot be made truly immutable (again, invar cannot fully guarantee this). But I don't see any issue with not supporting it for Dao class types.

Night-walker · 2014-11-05T17:42:04Z

The only solution I can think of is to make the type that you want to behave as you described immutable, and make its objects unique with respect to their data. So, for example, DateTime could be implemented such that each DateTime object corresponds to a unique time value.

That's a nice and simple solution, albeit it should still be implemented on the user's side, as always keeping a hash of possibly unlimited size behind the scene is probably unreasonable.

If the type provides no method for users to modify its objects, user-defined comparisons could also be supported. For user-defined scalar-like C data types such as BigInt and DateTime, such comparisons can naturally be implemented as C functions for efficiency.

If objects are treated similar to scalar values when used in maps, it may make sense to extend this behavior onto other cases like assignment/passing to routine. Otherwise there will be an inconsistency. I think it is simpler to reason about the behavior of different data when you can draw a strict line between scalar-like values and reference-based objects. If not, it seems better to leave things simple.

daokoder · 2014-11-05T18:00:20Z

it should still be implemented on the user's side

Yes, this was what I had in mind.

If objects are treated similar to scalar values when used in maps, it may make sense to extend this behavior onto other cases like assignment/passing to routine.

This is not an issue, because the object/pointer and the data are in one-to-one correspondence, so no data copying or any special treatment is needed other than pointer assignment/copying.

dumblob · 2014-11-05T18:01:14Z

I think it is simpler to reason about the behavior of different data when you can draw a strict line between scalar-like values and reference-based objects. If not, it seems better to leave things simple.

I definitely agree, but those cases near this strict line should be covered by some mechanism like the proposed scalar<>. This way we would avoid changes in assignment/passing to routines etc. How it will be implemented is another issue (btw the idea of a hash table is a nice one and should scale pretty well).

Night-walker · 2014-11-06T14:33:29Z

One way or another, I am against ad-hoc mechanisms which break the conceptual meaning of data one operates on. That is, when behavior differs (conceptually) depending on the context.

If e.g. DateTime is treated as an (opaque) object when doing assignment/passing, and as a scalar time_t value when referring to a map<DateTime, ...>, that's inconsistent, confusing and error-prone. DateTime should always be treated either as an object or as a value, so that you can safely and easily abstract away from its technical side.

I definitely agree, but those cases near this strict line should be covered by some mechanism like the proposed scalar<>.

There should not be any edge cases, exceptions or magical transmutation wands like scalar<>. Either a type represents a scalar value, or it is an opaque object. That, I believe, is the only way of not making a mess of all this.

dumblob · 2014-11-06T15:05:39Z

There should not be any edge cases, exceptions or magical transmutation wands like scalar<>.

Why magical transmutation wands? It's explicit in all cases I can think of and doesn't mess anything up. It's like specifying an interface ScalarInterface (whose methods are private/accessible_only_using scalar<>) instead of scalar<MyScalarLikeType>.

Either a type represents a scalar value, or it is an opaque object.

That would make sense if we had a simple mechanism for defining both scalar and non-scalar types. Currently we can define only classes or compound types, both of which are always non-scalar (which is a sane default choice).

Night-walker · 2014-11-06T15:36:31Z

Why magical transmutation wands? It's explicit in all cases I can think of and doesn't mess anything up. It's like specifying an interface ScalarInterface (whose methods are private/accessible_only_using scalar<>) instead of scalar<MyScalarLikeType>.

It's magical because it de-facto turns an object into scalar value in certain local context. I consider this to be too hackish.

That would make sense if we had a simple mechanism for defining both scalar and non-scalar types. Currently we can define only classes or compound types, both of which are always non-scalar (which is a sane default choice).

If you want a 1:1 correspondence between an object reference and its underlying value (like in the example with DateTime), just use a custom constructor routine that provides flyweight/unique objects by using a hash. At least that's clearer and more predictable than ad-hoc hacks which make the whole meaning of an object (class instance) vague and its behavior unclear.

dumblob · 2014-11-06T15:42:52Z

just use a custom constructor routine

That also means a custom type, which is not exactly what one would expect to need for scalar-like objects (especially those provided in the official dao-modules).

daokoder · 2014-11-06T16:12:35Z

Supporting things like scalar<> would pull in several other issues regarding complications and overheads. For instance, customized copying would need to be supported, and such copying may need to be invoked every time an object is moved or assigned to a variable of type scalar<x>. The overhead associated with this would be unpredictable. And the use of scalar<x> on types that do not support clean copying (or where copying is not done right) would have unpredictable consequences.

I think the approach I proposed is cleaner and more predictable. It should also make scalar<> redundant. It may even be unnecessary to create a 1:1 correspondence between objects and values/data, if customized comparison is supported. In other words, creating fully immutable types with customized comparisons seems to be the right solution for this.
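The "fully immutable type with customized comparisons" route has a close Python analogue in frozen dataclasses. They generate value-based __eq__, __hash__ and, with order=True, the ordering methods, while rejecting attribute assignment. BigIntLike below is a hypothetical example type, not an actual binding.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class BigIntLike:
    """Hypothetical immutable scalar-like type with value-based ==, <, and hash."""
    value: int

a = BigIntLike(5)
b = BigIntLike(5)
m = {a: "abc"}
print(b in m, a is b)              # True False: equal values, distinct objects
print(sorted([BigIntLike(9), a]))  # [BigIntLike(value=5), BigIntLike(value=9)]

try:
    a.value = 9  # a key cannot be mutated after insertion
except FrozenInstanceError:
    print("immutable")
```

Since the object can't change, the hash computed at insertion time stays valid, which is exactly what makes value-based keys safe here.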

dumblob · 2014-11-06T16:52:52Z

In other words, creating fully immutable types with customized comparisons seem to be the right solution for this.

Yes, but I'm worried about classes. The approach you outlined in #308 (comment) assumes that one will never need to create a scalar-like object in Dao, only in C. I agree that it's the simplest, least problematic and fastest solution, but I'm not sure it's sufficient (considering the huge number of classes I've seen in Java implementing the .equals() method). I only hope that such scalar-like objects will mostly be needed as bindings or wrappers over existing libraries, so there won't be a need to define them in Dao.

dumblob · 2014-11-06T17:24:58Z

It might indeed not be a big issue in Dao, but the new system programming language will need a more comprehensive solution, as scalar-like objects will have to be constructed directly in that language.

Night-walker · 2014-11-06T21:13:41Z

Yes, but I'm worried about classes. The approach you outlined in #308 (comment) assumes that one will never need to create a scalar-like object in Dao, only in C. I agree that it's the simplest, least problematic and fastest solution, but I'm not sure it's sufficient (considering the huge number of classes I've seen in Java implementing the .equals() method). I only hope that such scalar-like objects will mostly be needed as bindings or wrappers over existing libraries, so there won't be a need to define them in Dao.

If there is a need to create scalar-like objects in C, it exists for Dao as well, simply because implementing everything in C is impractical. Dao should rather provide sufficient and robust means for expanding its domain, ensuring it can easily be extended by its users without knowing C and the DaoVM API. It would be good to avoid further divergence between wrapped types and Dao classes.

Night-walker · 2014-11-10T15:43:29Z

Of course I trust myself. But without the guarantee provided by the rule, how can we "boast" that we have true immutability in Dao :)

I doubt it is that important. Paying more attention to thread-safety is more significant, I think: C-written code (even for a method marked invar) may easily not be re-entrant. That can cause many more problems than mutating hash keys (radioactive :) ), which are rather unlikely to be created accidentally.

The rule is actually extremely simple, if you have a look at the code for this rule:)

OK, but writing code to conform to this rule will not always be simple and straightforward. I am sure it will often take more time and effort than adding some invars to a non-invar class. I don't feel enthusiastic about playing such games, bypassing crude restrictions via sub-optimal patch code (with respect to simplicity, readability, performance); that may simply become absurd. For the cases of Dao-written immutable classes I can think of, it is simpler not to use such invar at all than to face the necessity of adapting the code to conform to it.

dumblob · 2014-11-10T17:15:33Z

I am sure it will often take more time and effort than adding some invars to a non-invar class.

This is what I'm afraid of the most. Even now, I feel a bit reluctant to get into the hassle of invar classes just to raise immutability from 99% (achievable by careful class implementation and using invar wherever possible) to 100%.

Night-walker · 2014-11-10T18:27:00Z

This is what I'm afraid of the most. Even now, I feel a bit reluctant to get into the hassle of invar classes just to raise immutability from 99% (achievable by careful class implementation and using invar wherever possible) to 100%.

Yes. invar methods, for instance, are mainly useful not because they prevent modification of class data within themselves, but because the user of your class will not be able to call anything except them on invar class instances, and because the user knows these methods don't touch the instance's data. From this point of view, these sophisticated rules for invar class constructors serve no real purpose other than self-control, which is redundant unless it's late Friday and you're drunk and half-asleep.

I don't think a "100% immutable!!!" label on a class changes a lot. It does not guarantee, for instance, that you can access it concurrently without syncing with absolute safety, or serialize it and deserialize it back without disrupting anything. For me it is sufficient that the class is designed as immutable, that is, it provides only an invar-based interface. Using an invar class may just make this a bit simpler and clearer, but those draconian constructor restrictions don't add much value in this regard.

Night-walker · 2014-11-11T06:32:16Z

Finally, there may be cases where you actually need to 'leak' some data outside in the constructor. For instance, we already concluded that it's perfectly reasonable to create hashes of objects in order to avoid creating duplicates. Obviously, one then needs to consult such a hash inside the constructor, which will not be possible with the current implementation. There may be many cases like this one.

At the same time, these constructor rules still cannot ensure logical immutability. I can easily store an ID of primitive kind in a class instance, and then fetch some data through that ID making it look like the instance actually contains this data. Will it always be the same? No one can say. Is this immutability? Well, not really.
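The ID loophole is easy to reproduce. In this Python sketch (Handle and _registry are hypothetical), the object itself is frozen and hashes consistently, yet the data it appears to contain changes underneath it.

```python
from dataclasses import dataclass

_registry = {}  # hypothetical external store, keyed by a primitive ID

@dataclass(frozen=True)
class Handle:
    id: int

    def data(self):
        return _registry[self.id]  # the "content" lives outside the object

_registry[1] = "first"
h = Handle(1)
before = h.data()
_registry[1] = "second"  # h itself never changed...
after = h.data()         # ...but what it appears to contain did
print(before, after)     # first second
```

No constructor rule on Handle could catch this, since the object's own fields are never touched.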

There can't be 100%-auto-assured actual immutability, and I don't think it matters much. invar for a class should be a design hint, not a jail.

daokoder · 2014-11-11T07:44:26Z

Now I disabled the constructor restriction. We might add something in the future if we can find a clever and non-obstructive way to do it. For now, the restriction is too much.

At the same time, these constructor rules still cannot ensure logical immutability. I can easily store an ID of primitive kind in a class instance, and then fetch some data through that ID making it look like the instance actually contains this data. Will it always be the same? No one can say. Is this immutability? Well, not really.

I think for all our discussions, "immutability" was meant for the objects themselves, not anything else logically associated with it. Anyway, I think we can conclude our discussion on the immutability now :)

Night-walker · 2014-11-11T07:57:20Z

OK, good :)

Now, the last thing troubling me is the implicit conversion of var to invar. Isn't it better to just oblige the use of invar for class fields? I can hardly imagine myself using var for invar fields in any case; that's definitely counter-intuitive and confusing. Implicit addition of invar is fine, but discarding an explicit var is a bit too much. Even if a class was not initially written as invar, replacing var fields with invar is a matter of seconds, and I would still do it even though it's technically redundant, because readability and consistency are more important than saving a few keystrokes.

daokoder · 2014-11-11T08:08:14Z

Now, the last thing troubling me is implicit conversion of var to invar. Isn't it better to just oblige the use of invar for class fields?

I have decided to make the change, but forgot about it in the last commit. Thanks for the reminder.

daokoder · 2014-11-11T16:07:49Z

Now, the last thing troubling me is implicit conversion of var to invar. Isn't it better to just oblige the use of invar for class fields?
I have decided to make the change, but forgot about it in the last commit. Thanks for reminding.

Done.

I have also added support for user-defined comparisons through the overloaded (pseudo-)operator <=>. Such comparison is used for map keys and in most other places. For hashing, I am using an overloaded cast, (int)( hashing = false ).

Now I feel we don't have to try hard to ensure absolute consistency in map key comparisons. Because for user defined type it is just not possible, as @Night-walker has mentioned the following possibility:

I can easily store an ID of primitive kind in a class instance, and then fetch some data through that ID making it look like the instance actually contains this data.

Another reason is that we cannot enforce referential transparency on user-defined comparisons (we could apply the same rule previously used for constructors, but it is too obtrusive). So we have to trust users on this. I am also considering trusting users for other key types, such as tuples.

Night-walker · 2014-11-11T19:32:26Z

I only wonder why hashing = false for something meant exactly for hashing :)

Overall, everything's nice and clear now; I will definitely make use of invar classes as well as the new operators.

Night-walker · 2014-11-11T19:57:33Z

I only wonder why hashing = false for something meant exactly for hashing :)

I guess it's because such operator can behave both as usual cast operator (so hashing is implicitly false), and as hashed value generator (for which hashing can be viewed as true).

Night-walker · 2014-11-11T20:18:51Z

Changed DateTime to invar type.

Night-walker · 2014-11-11T20:40:55Z

BTW, it may make sense to even allow passing instances of invar classes with a separate invar<@T> attribute to non-invar parameters and variables, just as is allowed for strings, for example. Solely for simplicity; it need not be considered a violation of immutability.

daokoder · 2014-11-12T15:07:07Z

I guess it's because such operator can behave both as usual cast operator (so hashing is implicitly false), and as hashed value generator (for which hashing can be viewed as true).

That's the idea.

BTW, it may make sense to even allow to pass instances of invar classes with separate invar@T attribute to non-invar parameters and variables, just like it's allowed for strings, for example.

Right, this should be more convenient.

dumblob · 2014-11-12T17:49:33Z

Shouldn't the overloaded routine for the artificial operator <=> be picked up if it exists and there is no overloaded == routine?

invar class C {
  invar x
  routine C(){ x = 5 }
  routine <=>(invar self: C, invar other: C){
    x == other.x ? 0 : (x < other.x) ? -1 : 1
  }
}
a = C()
b = C()
io.writeln('eq', a == b)

eq false

With defined == routine:

invar class C {
  invar x
  routine C(){ x = 5 }
  routine <=>(invar self: C, invar other: C){
    x == other.x ? 0 : (x < other.x) ? -1 : 1
  }
  routine ==(invar self: C, invar other: C){ x == other.x }
}
a = C()
b = C()
io.writeln('eq', a == b)

eq true

Btw, the <=> routine is missing an argument signalling which type of comparison is requested (<, > or ==).

dumblob · 2014-11-12T19:03:25Z

Also, only one casting routine with hashing = true should be allowed per class.

daokoder · 2014-11-13T04:45:22Z

Shouldn't be the artificial operator <=> overloaded routine be picked up if it exists and there is no == overloaded routine?

I have considered this, but there is a minor issue: the result of <=>() cannot be used directly for ==, so supporting it cannot be as simple as pushing the call onto the stack, which is how overloaded operators are currently called.

Btw, there is missing argument in the <=> routine signalling which type of comparison is asked (either < > or ==).

Right, I forgot about this. This should make it more convenient to use <=> for other comparisons.

Also only one casting routine with hashing = true should be allowed in one class.

There is really no need to enforce this, since it is harmless. It would be awkward to enforce such a thing by checking parameter names and types and raising errors saying that certain parameters cannot be used with certain methods. And considering that there are many other cases where such enforcement may sound "reasonable" (e.g. other operator overloading), it would just become a mess.

Night-walker · 2014-11-13T06:03:58Z

Then perhaps it's time to drop overloading of all comparison operators other than the combined <=>, in order to make the handling of user-defined comparison transparent.

Also, I wonder what should be done when a class is comparable for equality, but not for greater/lesser relation. It means that plain pointer comparison should be used instead, but how to tell DaoVM that your <=> cannot handle all the operations?

daokoder · 2014-11-13T08:54:45Z

Then perhaps it's time to drop overloading of all comparison operators other than the combined <=>, in order to make the handling of user-defined comparison transparent.

Maybe.

Also, I wonder what should be done when a class is comparable for equality, but not for greater/lesser relation. It means that plain pointer comparison should be used instead, but how to tell DaoVM that your <=> cannot handle all the operations?

That's simple: just define the following overload:

routine <=>( other: UserType, comparison: enum<EQ> )

for equality and inequality checking only.

daokoder · 2014-11-13T11:13:29Z

After considering all the pros and cons of <=>, I decided to remove it. For map keys, the overloaded == and < will be used instead. It may be a bit less efficient, but there will be no inconsistency of any kind.
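A minimal sketch of what this decision would mean for the example that opened the issue, assuming the behaviour described above (map keys compared through overloaded == and <). The class Date and its fields are hypothetical stand-ins for time.DateTime, and the sketch assumes {=>} creates the ordered (tree) map variant. Under that behaviour the final lookup should succeed, whereas with pointer comparison it failed in the original report.

```dao
# Hypothetical value-type key: equality and ordering are defined on the
# field values, so two separately constructed instances compare as equal.
invar class Date {
    invar year: int
    invar month: int
    invar day: int

    routine Date(y: int, m: int, d: int) {
        year = y
        month = m
        day = d
    }

    # Decides whether two keys denote the same map entry.
    routine ==(invar self: Date, invar other: Date) {
        return year == other.year && month == other.month && day == other.day
    }

    # Orders keys lexicographically on (year, month, day).
    routine <(invar self: Date, invar other: Date) {
        if (year != other.year) { return year < other.year }
        if (month != other.month) { return month < other.month }
        return day < other.day
    }
}

var m: map<Date, int> = {=>}
var x = Date(2014, 1, 1)
var y = Date(2014, 1, 1)   # distinct instance, equal value
m[x] = 5
io.writeln(y in m)         # lookup goes through the overloaded == and <
```

The ordering routine must stay consistent with ==: two dates for which neither is < the other should also be ==, otherwise the map can hold entries it cannot find again.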

Night-walker · 2014-11-13T11:38:21Z

That would be simpler indeed.

dumblob · 2014-11-13T17:01:00Z

The surprising thing is that the <=> implementation was much slower:

invar class A {
  invar x: int
  routine A(val: int) { x = val }
  routine ==(invar self: A, invar other: A) {
    # make it produce similar VM code as class B
    x == other.x ? 1 : x < other.x ? 0 : 0
  }
  routine <(invar self: A, invar other: A) {
    x == other.x ? 0 : x < other.x ? 1 : 0
  }
}

invar class B {
  invar x: int
  routine B(val: int) { x = val }
  routine <=>(invar self: A, invar other: A) {
    x == other.x ? 0 : x < other.x ? -1 : 1
  }
}

var ma: map<A, int> = {=>}
var mb: map<B, int> = {=>}

var la: list<A> = {}
var lb: list<B> = {}

for (i = 1 : 10**6) {
  la.append(A(rand(i)))
  lb.append(B(rand(i)))
}

routine measure_A_for() { for (X in la) ma[X] = X.x }
routine measure_B_for() { for (X in lb) mb[X] = X.x }
routine measure_A_iter() { la.iterate { ma[X] = X.x } }
routine measure_B_iter() { lb.iterate { mb[X] = X.x } }
measure_A_for()
measure_B_for()
measure_A_iter()
measure_B_iter()

0$ dao -p del/test_map.dao

============== Program Profile (Time in Seconds) ==============
-------------------------------------------------------------------------------
Routine                                                   :    #Calls, CPU Time
-------------------------------------------------------------------------------
__main__(  )                                              :         1,    22.71
measure_B_for(  )                                         :         1,    16.13
measure_B_iter(  )                                        :         2,    10.39
measure_A_for(  )                                         :         1,     2.05
measure_A_iter(  )                                        :         2,     1.39
var<list<A>>::append( item:A, ...:@T )                    :   1000000,     0.95
var<list<B>>::append( item:B, ...:@T )                    :   1000000,     0.94
B( val:int )                                              :   1000000,     0.74
A( val:int )                                              :   1000000,     0.74
var<list<A>>::iterate( direction=enum<forw~ )             :         1,     0.00
var<list<B>>::iterate( direction=enum<forw~ )             :         1,     0.00

-------------------------------------------------------------------------------
Routine                         :                   Caller,    #Calls, CPU Time
-------------------------------------------------------------------------------
__main__()                      :                         ,         1,    22.71
measure_B_for()                 :               __main__(),         1,    16.13
measure_B_iter()                :  var<list<B>>::iterate(),         1,    10.39
                                :               __main__(),         1,     0.00
measure_A_for()                 :               __main__(),         1,     2.05
measure_A_iter()                :  var<list<A>>::iterate(),         1,     1.39
                                :               __main__(),         1,     0.00
var<list<A>>::append()          :               __main__(),   1000000,     0.95
var<list<B>>::append()          :               __main__(),   1000000,     0.94

daokoder · 2014-11-13T17:42:12Z

The surprising thing is, that the implementation of <=> was much slower:

I am afraid your test is wrong. You cannot test these two approaches in a single test, because they are not supported simultaneously.

The B class using <=> is actually using pointer comparisons, so all the keys are distinct, and there are 10**6 of them. The A class using == and <, on the other hand, actually compares the values, and there are far fewer distinct values, hence far fewer keys in the map holding instances of class A. That's why it is much faster.
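The key-count effect can be checked directly by printing map sizes instead of timing insertions. A minimal sketch, assuming the post-change behaviour (a class overloading == and < is keyed by value, a class without them by pointer) and assuming the map method size() returns the number of keys; the classes ByValue and ByRef are hypothetical. The first map should end up with one key per distinct value, the second with one key per instance, which is the same asymmetry that skewed the benchmark.

```dao
# Keyed by value: overloads == and <.
invar class ByValue {
    invar x: int
    routine ByValue(val: int) { x = val }
    routine ==(invar self: ByValue, invar other: ByValue) { return x == other.x }
    routine <(invar self: ByValue, invar other: ByValue) { return x < other.x }
}

# Keyed by reference: no comparison overloads, so pointer comparison applies.
invar class ByRef {
    invar x: int
    routine ByRef(val: int) { x = val }
}

var mv: map<ByValue, int> = {=>}
var mr: map<ByRef, int> = {=>}

# Insert fresh instances that carry only 10 distinct values (0..9).
for (i = 1 : 1000) {
    var k = i % 10
    mv[ByValue(k)] = k
    mr[ByRef(k)] = k
}

# Number of distinct keys each map ended up with.
io.writeln(mv.size(), mr.size())
```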

dumblob · 2014-11-13T17:44:34Z

The B class using <=> is actually using pointer comparisons,

I thought so, but wasn't sure :(

dumblob · 2014-11-13T17:46:40Z

Anyway, one can nicely see that calling two methods instead of doing a pointer comparison is about 7.5x slower.

dumblob · 2014-11-13T20:56:37Z

Oh yeah. I made a copy-and-paste mistake in the benchmark code. Instead of routine <=>(invar self: A, invar other: A) { ... there should have been routine <=>(invar self: B, invar other: B) { .... This kind of mistake should be checked automatically for all operator routines, producing an error if the first argument of such a routine (and possibly the others as well) doesn't match the class's own type (or its base type in case of inheritance, mixins etc.). The results then look very different:

0$ dao -p del/test_map.dao 

============== Program Profile (Time in Seconds) ==============
-------------------------------------------------------------------------------
Routine                                                   :    #Calls, CPU Time
-------------------------------------------------------------------------------
measure_B_iter(  )                                        :         2,   173.47
measure_A_iter(  )                                        :         2,   169.38
measure_A_for(  )                                         :         1,   121.43
A::==( other:invar<A> )                                   : 145494630,    96.15
__main__(  )                                              :         1,    19.74
measure_B_for(  )                                         :         1,     2.34
var<list<A>>::append( item:A, ...:@T )                    :   1000000,     0.81
var<list<B>>::append( item:B, ...:@T )                    :   1000000,     0.81
B( val:int )                                              :   1000000,     0.68
A( val:int )                                              :   1000000,     0.68
var<list<A>>::iterate( direction=enum<forw~ )             :         1,     0.00
var<list<B>>::iterate( direction=enum<forw~ )             :         1,     0.00

-------------------------------------------------------------------------------
Routine                         :                   Caller,    #Calls, CPU Time
-------------------------------------------------------------------------------
measure_B_iter()                :  var<list<B>>::iterate(),         1,   173.47
                                :               __main__(),         1,     0.00
measure_A_iter()                :  var<list<A>>::iterate(),         1,   169.38
                                :               __main__(),         1,     0.00
measure_A_for()                 :               __main__(),         1,   121.43
A::==()                         :         measure_A_iter(),  76749043,    51.87
                                :          measure_A_for(),  68745587,    44.28
__main__()                      :                         ,         1,    19.74
measure_B_for()                 :               __main__(),         1,     2.34
var<list<A>>::append()          :               __main__(),   1000000,     0.81
var<list<B>>::append()          :               __main__(),   1000000,     0.81
B()                             :               __main__(),   1000000,     0.68
A()                             :               __main__(),   1000000,     0.68
var<list<A>>::iterate()         :         measure_A_iter(),         1,     0.00
var<list<B>>::iterate()         :         measure_B_iter(),         1,     0.00

daokoder · 2014-11-14T00:59:44Z

Anyway, one can nicely see that calling two methods instead of doing pointer comparison is cca 7.5x slower.

It seems you still don't understand why there is such a difference in speed. Please read my comment again: as I pointed out, the speed difference is not due to the comparison, but due to the fact that the maps contain very different numbers of keys!

dumblob · 2014-11-14T07:39:52Z

Sorry about that, I was too tired and wrote nonsense :(

daokoder · 2014-11-14T14:27:22Z

Sorry for that, I was too tired and written a nonsense :(

Take it easy, and double-check often.

BTW, it may make sense to even allow to pass instances of invar classes with separate invar@T attribute to non-invar parameters and variables, just like it's allowed for strings, for example. Solely for simplicity; it may not be considered violation of immutability.

Now it is done.

daokoder · 2014-11-18T09:50:16Z

I think we can close this issue now.

daokoder closed this as completed Nov 18, 2014
