-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maps should take into account the ==
operator if available instead of pointer comparison
#308
Comments
Comparison with So disabling "fallback to pointer comparison" would essentially deny the use of almost all non-built-in types/classes with maps. It doesn't look like a bright prospect. If you really want to store certain values instead of references, you better just serialize class instances into numeric or string data, just as you did in your second example. It should prove to be a far less troublesome approach. |
Yes, you're right, but roughly said, the first example (without serialization) should fail or warn the user about it in cases, where the output won't be what the user expects. Because I have no idea how to detect it, I just proposed disabling the fallback for such types while being aware of the edge cases in which one needs to use some weird data (i.e. those where |
The current behavior is very extremely error prone (yes, I admit that this took me almost an hour to figure it out in a code shuffling with such map in different places). |
Personally, I find the way it currently works very simple and clear. We're not on Java/.NET, after all.
It's not an edge case. There is plenty of use cases for storing non-comparable objects (and tuples with these objects, and lists, and etc.) in maps. It is highly impractical to discard all that. The approach from languages with (more or less) unified type systems is arguably ill-suited here.
I don't think so, not for Dao at least. A counter-example: a map reports that it contains value you never actually put there because of overloaded comparison implementation. Even better, what to do if inner data of the object changes? It should then imply a different place in a hash or tree! How do you propose to solve that? |
Are you really sure, that you'd like to store tuples with these objects, and lists, and etc. as keys in a map? Well, why not, but then requiring that these keys are
We have the magic Btw, the first example above was initially load time import time
m={->}
x=time.make(2014, 1, 1)
y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
io.writeln(true)
else
io.writeln(false) but because of the recent bug with load time import time
m={->}
invar x=time.make(2014, 1, 1)
invar y=time.make(2014, 1, 1)
m[x] = 5
if (y in m)
io.writeln(true)
else
io.writeln(false) |
Why not? I don't want to care about what can be put in a map, and what cannot. I may want to associate some data with some objects, and map is a natural choice regardless of what those objects are.
It simply cannot be assured. There is no way to request an object which cannot be changed in any other context, no
As I said, it can't. It just guarantees local immutability. Also note that (as you like things running quickly :) ) using Dao-level functions for comparison will result in significant overhead during any operation on maps. Instead of calling C function which does |
I was fully aware of all this when I proposed it. I never ever want to experience code like
to return Any ideas how to prevent this discrepancy? |
It can't possibly be achieved universally. If you construct two values in the same way, it doesn't mean they are equal. You can't always rely on it even for
Not sure what you mean. (dao) f = io::Stream()
= Stream[02437CD8]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) m[f]
[[Error::Key]] --- Invalid key:
In code snippet:
1 : GETVG : 0 , 1 , 1 ; 1; f
>> 2 : GETI : 0 , 1 , 2 ; 1; m[f]
3 : RETURN : 2 , 1 , 0 ; 1; m[f]
Raised by: __main__(), at instruction 2 in line 1 in file "interactive codes"; Everything's fine as far as I can see.
With the current approach, everything is as plain and explicit as it can possibly be. For map operations, any data is treated by its value. For an object, its value is, naturally, the object itself. Not some hidden data within it, and not something returned by the method which you cannot check. |
I'm right now away from my computer, but IIRC, it was something with f = io::Stream()
f2 = io::Stream()
m = {->}
f in m
f2 in m
m[f] = 5
f in m
f2 in m
m[f]
m[f2] Anyway, the whole problem is exactly what you've described - how to distinguish by syntax, that we're dealing with object (i.e. pointer) or with a primitive type in the map key. It's similar to passing data to routines which is explicit/obvious (because you can't simply work with the reference directly in Dao, you have to use some object interface like methods or overloaded operators etc.), but with map keys it's hidden :( |
I don't see any problem. There is no real necessity to distinguish primitive and non-primitive data here, as it is handled in simple and uniform way. There is no hidden values or methods which act behind the scene when interacting with a map, everything's clear and predictable. You just have to keep in mind that |
Hm, I had to miss this discussion or I simply forgot it :( . Can you please point me there? I was rather thinking that this is actually quite similar problem to what we were discussing in #263 about implicit/explicit references. Either way, it seems we should at least unite somehow an interface for serialization of non-primitive data (e.g. by writing it to documentation). Btw the problem with (dao) f = io::Stream()
= Stream[0x203bb40]
(dao) f2 = io::Stream()
= Stream[0x2188e40]
(dao) m = {->}
= { -> }
(dao) f in m
= false
(dao) f2 in m
= false
(dao) m[f] = 5
= 5
(dao) f in m
= false # nonsense!
(dao) f2 in m
= false
(dao) m[f]
= 5 # a proof that (f in m) returning false is a nonsense
(dao) m[f2]
[[Error::Key]] --- Invalid key:
In code snippet:
1 : GETVG : 0 , 3 , 1 ; 1; f2
>> 2 : GETI : 0 , 1 , 2 ; 1; m[f2]
3 : RETURN : 2 , 1 , 0 ; 1; m[f2]
Raised by: __main__(), at instruction 2 in line 1 in file "interactive codes"; |
From here.
That's just a bug. |
Thank you. It seems, it's still an open question for the future. At least this issue I raised proves that the current state will be painful. |
Well, I wouldn't call this a dire issue: when an object can be uniquely identified by a scalar value, you may (but don't have to) use that value instead of the object itself as a key. I doubt using overloaded operators is a good idea here anyway. I suppose only a noticeable conceptual shift from reference-based objects to value-based ones could provide ground for the behavior you want. But that's an extra layer of complexity, so I'm not sure it's a good idea either. |
Considering that there'll be quite a large amount of wrappers and scalar-like objects (C data types), it doesn't sound that futile to me. About the extra layer of complexity, it doesn't look that bad. Of course the devil is in the detail, but in general the problem is not that much about implementation, but rather about all the particular decisions where a reference should be used and where a value. |
I woke up today and was thinking about adding something like i = BigInt(5)
m1: map<@K, @V> = {->}
m2: map<scalar<@K>, @V> = {->}
m1[i] = 'abc'
m2[i] = 'def'
i += 4
io.writeln(m1[i]) # { BigInt<0x...> -> "abc" }
io.writeln(m2[i]) # error, because the key is missing This wouldn't change the simplicity and semantics of the current approach each object is treated as pointer, would retain a seamless-treatment for existing scalar types (as those would be compatible with |
The thing is, it does not resolve this situation by itself. If e.g. If any change is to take place (of which I am not certain), I think it should be on the side of the class/type in order to ensure simple and intuitive behavior in all cases. |
That's the goal. Btw I wouldn't call it "map interface" as it will have more use cases not less important than map (serialization of such scalar-like objects is very common - in case of
Why redundant? We want to work with pointers (as it's simple and fast), but in certain cases we want both - pointer and scalar-like handling (depending on the situation which is not known at the time the class/type is defined). |
Right, this is the real issue of supporting user defined comparison for map keys. As @Night-walker also pointed out,
The only solution I can think of is to make the type that you want to behavior as you described immutable, and make its object unique with respect to its data. So for example, for DateTime, the type could be implemented such that, each DateTime object will corresponds to a unique time value. So,
will always return the same object. This way DateTime can be compared as pointers in map keys. If the type provides no method for users to modify its objects, user defined comparisons could also be supported. For user defined scalar-like C data types such BigInt and DateTime, such comparison can be naturally implemented as C functions for efficiency. It should be pointed out that, there is no way to do it similarly for Dao class types, as they cannot be made truly immutable (again |
That's a nice and simple solution, albeit it should still be implemented on the user's side, as always keeping a hash of possibly unlimited size behind the scene is probably unreasonable.
If objects are treated similar to scalar values when used in maps, it may make sense to extend this behavior onto other cases like assignment/passing to routine. Otherwise there will be an inconsistency. I think it is simpler to reason about the behavior of different data when you can draw a strict line between scalar-like values and reference-based objects. If not, it seems better to leave things simple. |
Yes, this was what I had in mind.
This is not an issue, because the object/pointer and the data is one-to-one related, so there needs no data copying or any special treatment other than pointer assignment/copying. |
I definitely agree, but those cases near this strict line should be covered by some mechanism like the proposed |
One way or another, I am against ad-hoc mechanisms which break the conceptual meaning of data one operates on. That is, when behavior differs (conceptually) depending on the context. If e.g.
There should not be any edge cases, exceptions or magical transmutation wands like |
Why magical transmutation wands? It's explicit in all cases I can think of and doesn't mess anything up. It's like specifying an interface
That would make sense if we had a simple mechanism how to define both scalar and non-scalar types. Currently we can define only classes or compound types whereas both are always non-scalar (which is a sane default choice). |
It's magical because it de-facto turns an object into scalar value in certain local context. I consider this to be too hackish.
If you want 1:1 correspondence of object reference and underlying value (like in the example with |
That means also custom type which is not exactly what would one expect to do with scalar-like objects (especially those provided in official dao-modules). |
Supporting things like I think the approach I proposed is more clean and predictable. Also it should have made |
Yes, but I'm scared about classes. The approach you outlined in #308 (comment) assumes that one will never need to create a scalar-like object in Dao, but only in C. I agree, that it's the simpliest, least-problematic and fastest solution, but I'm not sure if it's sufficient (considering the huge amount of classes I've seen in Java with implemented method |
It might not be indeed a big issue in Dao, but for the new system programming language there will be needed a more comprehensive solution as construction of scalar-like objects will need to be done directly in the language. |
If there is need to create scalar-like objects in C, it exists for Dao as well. Simply because implementing everything in C is impractical. Dao should better provide sufficient and robust means for expanding its domain, which assures it can easily be extended by its users without knowing C and DaoVM API. It would be good to avoid further divergence of wrapped types and Dao classes. |
I doubt it is that important. Paying more attention to thread-safety is of more significance, I think: C-written code (even for a method marked by
OK, but writing code to conform to this rule will not always be simple and straightforward. I am sure it will often take more time and efforts then adding some |
This is what I'm afraid of the most. Already now, I'm feeling a bit reluctant to get into the hassle of |
Yes. I don't think "100% immutable!!!" label on a class changes a lot. It does not guarantee you can absolutely safely access it concurrently without syncing, or serialize it and deserialize back without disrupting anything, for instance. For me it is sufficient that the class is designed as immutable -- that is, it provides only |
Finally, there may be cases where you actually need to 'leak' some data outside in the constructor. For instance, we already concluded that it's perfectly reasonable to create hashes of object in order to avoid creation of duplicates. Obviously, one then need to consult such hash inside the constructor, which will not be possible with the current implementation. There may be many cases like this one. At the same time, these constructor rules still cannot ensure logical immutability. I can easily store an ID of primitive kind in a class instance, and then fetch some data through that ID making it look like the instance actually contains this data. Will it always be the same? No one can say. Is this immutability? Well, not really. There can't be 100%-auto-assured actual immutability, and I don't think it matters much. |
Now I disabled the constructor restriction. We might add something in the future if we can find a clever and non-obstructive way to do it. For now, the restriction is too much.
I think for all our discussions, "immutability" was meant for the objects themselves, not anything else logically associated with it. Anyway, I think we can conclude our discussion on the immutability now :) |
OK, good :) Now, the last thing troubling me is implicit conversion of |
I have decided to make the change, but forgot about it in the last commit. Thanks for reminding. |
Done. Also, now I have added support for user defined comparisons by using overloaded (pseudo) operator Now I feel we don't have to try hard to ensure absolute consistency in map key comparisons. Because for user defined type it is just not possible, as @Night-walker has mentioned the following possibility:
Another reason is that, we cannot enforce referential transparency on user defined comparisons (well, we could use the same rule previously used for constructors, but it is too obtrusive). So we have to trust users on this. I am also considering to trust users for other key types such as tuples. |
I only wonder why Overall, everything's nice and clear now; I will definitely make use of |
I guess it's because such operator can behave both as usual cast operator (so |
Changed |
BTW, it may make sense to even allow to pass instances of |
That's the idea.
Right, this should be more convenient. |
Shouldn't be the artificial operator invar class C {
invar x
routine C(){ x = 5 }
routine <=>(invar self: C, invar other: C){
x == other.x ? 0 : (x < other.x) ? -1 : 1
}
}
a = C()
b = C()
io.writeln('eq', a == b)
With defined invar class C {
invar x
routine C(){ x = 5 }
routine <=>(invar self: C, invar other: C){
x == other.x ? 0 : (x < other.x) ? -1 : 1
}
routine ==(invar self: C, invar other: C){ x == other.x }
}
a = C()
b = C()
io.writeln('eq', a == b)
Btw, there is missing argument in the |
Also only one casting routine with |
I have consider this, but there is a minor issue: the result of
Right, I forgot about this. This should make it more convenient to use
There is really no need to enforce this, since it really harmless. It would be awkward to enforce such thing by checking the parameter names and types, and raising errors saying that certain parameters cannot be used with certain methods. And consider that there are many other cases such enforcing may sound "reasonable" (consider other operator overloading), it would just become a mess. |
Then perhaps it's time to drop overloading of all other comparison operators other then combined Also, I wonder what should be done when a class is comparable for equality, but not for greater/lesser relation. It means that plain pointer comparison should be used instead, but how to tell DaoVM that your |
Maybe.
That's simple, simply define the following overloading: routine <=>( other: UserType, comparison: enum<EQ> ) for equality and inequality checking only. |
After considering the all the pros and cons of using |
That would be simpler indeed. |
The surprising thing is, that the implementation of invar class A {
invar x: int
routine A(val: int) { x = val }
routine ==(invar self: A, invar other: A) {
# make it produce similar VM code as class B
x == other.x ? 1 : x < other.x ? 0 : 0
}
routine <(invar self: A, invar other: A) {
x == other.x ? 0 : x < other.x ? 1 : 0
}
}
invar class B {
invar x: int
routine B(val: int) { x = val }
routine <=>(invar self: A, invar other: A) {
x == other.x ? 0 : x < other.x ? -1 : 1
}
}
var ma: map<A, int> = {=>}
var mb: map<B, int> = {=>}
var la: list<A> = {}
var lb: list<B> = {}
for (i = 1 : 10**6) {
la.append(A(rand(i)))
lb.append(B(rand(i)))
}
routine measure_A_for() { for (X in la) ma[X] = X.x }
routine measure_B_for() { for (X in lb) mb[X] = X.x }
routine measure_A_iter() { la.iterate { ma[X] = X.x } }
routine measure_B_iter() { lb.iterate { mb[X] = X.x } }
measure_A_for()
measure_B_for()
measure_A_iter()
measure_B_iter()
|
I am afraid your test is wrong. You cannot test these two approaches in a single test, because they are not supported simultaneously. The |
I thought so, but wasn't sure :( |
Anyway, one can nicely see that calling two methods instead of doing pointer comparison is cca 7.5x slower. |
Oh yeah. I made a copy&paste mistake in the benchmark code. Instead of
|
It seems you still don't understand why there is such difference in speed. Please read my comment again, where I have pointed out, the speed difference is not due to the comparison, but due to the fact the maps contain very different number of keys! |
Sorry for that, I was too tired and written a nonsense :( |
Just be easy, and often double check.
Now it is done. |
I think we can close this issue now. |
Also I'm not sure about blind pointer comparison for all non-primitive data. I'd probably disable the default fallback to pointer comparison for data which Dao doesn't know anything about (i.e. those, which don't have any routine for the
==
operator defined).The text was updated successfully, but these errors were encountered: