fixed indexing issue when vertices are <:Integer #133
How precisely is it enumerating the vertices?
Well, perhaps it's not. I'm passing vertices(g) as an additional argument, and my concern was that doing so might force an evaluation of the vertices, which could get expensive.
I don't think it is, and I don't think it can be. Consider:
The old (bad) code would simply return The new code, which fixes all cases, needs at a minimum one search through an unordered list of vertices, which I believe is O(n). It could be faster depending on how Julia does array (vector) searching. Also, depending on dispatch method, this is no worse than others (see common.jl for some examples with other types). |
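To make the trade-off in the comment above concrete, here is a minimal sketch in current Julia syntax. The names `index_identity` and `index_by_search` are hypothetical and are not the package's functions. The sketch assumes the fast behavior amounts to treating an integer vertex as its own index, and contrasts it with a linear search through the vertex list.

```julia
# Hypothetical helpers for illustration; not the package's actual functions.

# Assumed "fast" behavior: treat an integer vertex as its own index. O(1),
# but only correct when the vertices are exactly 1, 2, ..., n in order.
index_identity(v::Integer) = Int(v)

# Search-based behavior: find the position of v in the vertex list. O(n) for
# an unordered Vector, but correct for any collection of distinct vertices.
function index_by_search(v, vs::AbstractVector)
    i = findfirst(isequal(v), vs)
    i === nothing && throw(KeyError(v))
    return i
end

uids = [501, 0, 1000]            # e.g. non-contiguous UNIX UIDs
index_by_search(1000, uids)      # 3
index_identity(1000)             # 1000, out of bounds for a 3-vertex graph
```

The search version pays a linear scan per lookup, which is the cost being debated in this thread.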
Agreed, but we definitely need a "fast case" available when the vertices are 1-based. Maybe @lindahua should comment on how this was intended to be handled in the initial design.
The only way to do that, I think, is to figure out a way to check whether vertices(g) is a range. Even then, if you add a node that is nonsequential afterwards, you're basically hosed, and this will break any attempt to remove a node.
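One way to picture the "check whether it is a range" idea is to let dispatch choose the lookup strategy from the container type. This is only a sketch under that assumption; `position_of` is a hypothetical name, not something defined by the package.

```julia
# Hypothetical sketch: pick the lookup strategy from the container type.

# In a unit range the position of v follows from arithmetic, so lookup is O(1).
function position_of(v::Integer, vs::AbstractUnitRange{<:Integer})
    v in vs || throw(KeyError(v))
    return Int(v - first(vs) + 1)
end

# Any other vector falls back to an O(n) linear search.
function position_of(v, vs::AbstractVector)
    i = findfirst(isequal(v), vs)
    i === nothing && throw(KeyError(v))
    return i
end

position_of(3, 1:5)              # 3, via the range method
position_of(7, [4, 7, 9])        # 2, via the search method
```

As the comment above points out, the fast method only applies while the vertex container really is a range. Once a nonsequential vertex is added, the container has to become a general vector and lookups fall back to the search method.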
Or the convention could be that by using integers for vertices, you're guaranteeing that the indices are 1-based and contiguous.
I think that would be a mistake. This is the reason Consider my use case, which is mapping UNIX UIDs to activity. I'd have to construct a custom (memory-laden) type as a vertex identifier, or use a non-indexing property which would destroy insert performance. Anyone who does time-based graphing using epoch will also be burned by this. It would also mean that graphs couldn't have a vertex with integer value <= 0. |
Well, there's no memory overhead for creating a custom |
So, I'm not sure it's O(n). I don't know enough about how Julia does vector searching, but IIRC unordered lists are O(n) search.

Another option is to create a new graph type that would have the guarantees you desire. This would have the advantage of requiring the user to explicitly acknowledge the indexing limitations (and would allow us to throw appropriate error messages if constraints are violated).

Regardless, I urge adoption of this PR because as it stands right now, 0.3.3 is broken for char and nonconsecutive/nonpositive ints, and 0.4.x is broken for nonconsecutive/nonpositive ints. This could really burn someone who's expecting proper behavior. IMO, it's better to have a correct result whose derivation MAY BE slower than a guaranteed incorrect result that's delivered lightning fast. You can optimize speed, but correctness is boolean.

ETA: EVEN FOR contiguous, 1-based sequences of vertices, this code is broken:
I am not satisfied with the current way of handling vertex/edge indexing either. In the original design, what we intended is:
The following problem can be easily resolved by adding bounds checking in
However, using an algorithm of complexity |
I wish I had known of this implicit assumption - I wouldn't have spent the last two days trying to optimize dijkstra and pulling my hair out due to the catastrophic errors introduced by vertex_index().

If you decide to keep this constraint, I would urge copious documentation, and a rewrite of add_vertex! must ensure that the new vertex is sequential. I really think this is a mistake, though - it's certainly one that will preclude me from using this package for my application - and without documentation and constraint checks, it will cause others to rely on expected functionality that will (silently) break their code.

Right now, there's nothing magical about Integer vertices that requires them to be positive, 1-based, and sequential. I would again recommend a special graph type that enforces these constraints with the advantage of speed. It's not obvious at all in the documentation or the implementation that using an Integer for a vertex should result in incorrect indexing.

If you decide to enforce the limitation, then you should explicitly disallow all Integer vertex types other than unsigned (including char for 0.3.x), you should ensure that add_vertex! does not require a vertex parameter, and, as a side effect, you will give up any possibility of deleting vertices in any efficient manner. That last part means that this package will be completely unsuitable for modeling dynamic graphs such as AMI, power systems, and user activity networks.

In any case, if that's the decision, so be it, and I'll close the PR.
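A special constrained type of the kind recommended above might look roughly like the following. This is a sketch in current Julia syntax, not the package's design: `ContiguousVertices`, `add_next_vertex!`, and `index_of` are hypothetical names chosen for illustration.

```julia
# Hypothetical type for illustration; not part of the package.
# Invariant: the vertices are always exactly 1:nv, so vertex == index.
mutable struct ContiguousVertices
    nv::Int
end
ContiguousVertices() = ContiguousVertices(0)

# No vertex argument: the structure picks the next sequential id itself,
# so callers cannot introduce gaps or nonpositive values.
function add_next_vertex!(g::ContiguousVertices)
    g.nv += 1
    return g.nv
end

# O(1) lookup with an explicit bounds check instead of a silent wrong answer.
function index_of(v::Integer, g::ContiguousVertices)
    1 <= v <= g.nv || throw(BoundsError(1:g.nv, v))
    return Int(v)
end

g = ContiguousVertices()
add_next_vertex!(g)              # 1
add_next_vertex!(g)              # 2
index_of(2, g)                   # 2
```

The sketch also shows the cost raised in the comment: with the invariant "vertices are exactly 1:nv", deleting a vertex from the middle would require renumbering everything after it.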
@sbromberger Your concerns are valid. I agree that this implicit assumption is a recipe for errors (and even disasters in some applications). A stopgap to remedy this:
I think your suggestion is a good long-term solution as well - this is what I was thinking about when I proposed creating a special constrained graph for fast indexing. In order to make SimpleGraph safe, it will be necessary to modify For performance, perhaps using a heap or other search-optimized structure for non-SimpleGraph vertices would be a possibility. If memory serves, you'd never get O(1), but you could approach O(log n). |
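As a rough illustration of the O(log n) idea, here is a sketch that swaps in a sorted vector with binary search rather than a heap. All names (`SortedVertexIndex`, `insert_vertex!`, `index_of`) are hypothetical and not part of the package.

```julia
# Hypothetical sketch: keep vertex ids in a sorted Vector so lookup is a
# binary search, O(log n), at the cost of O(n) insertion to keep the order.
struct SortedVertexIndex{T}
    ids::Vector{T}               # invariant: sorted, no duplicates
end
SortedVertexIndex{T}() where {T} = SortedVertexIndex{T}(T[])

function insert_vertex!(s::SortedVertexIndex{T}, v::T) where {T}
    r = searchsorted(s.ids, v)   # empty range means v is not present yet
    isempty(r) || throw(ArgumentError("vertex $v already present"))
    insert!(s.ids, first(r), v)  # first(r) is the insertion point
    return s
end

function index_of(s::SortedVertexIndex, v)
    r = searchsorted(s.ids, v)
    isempty(r) && throw(KeyError(v))
    return first(r)
end

s = SortedVertexIndex{Int}()
for v in (1000, 0, 501)
    insert_vertex!(s, v)
end
index_of(s, 501)                 # 2, since the sorted ids are [0, 501, 1000]
```

One caveat of this particular choice: inserting a vertex shifts the index of every larger vertex, so any per-vertex data stored by index would have to be shifted as well.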
I've made the following changes:
…or add_vertex!() deprecated add_vertex!(g::SimpleGraph, v)
So this change isn't correct because if the vertex type is |
Are there any other graph types that need to be excluded?
On Dec 20, 2014, at 11:58, Miles Lubin wrote: So this change isn't correct because if the vertex type is KeyVertex, which —
Probably the right way is to do something like:
Then undefine |
I'm not sure I follow:
...which would put us right back into #131 (comment) issue 1. What am I missing?
Removing the definition |
That "breaks" simple_graph (that is, it would force simple_graph to be O(n), which is what @lindahua didn't want), unless we keep the second commit (items 1 and 2 from #133 (comment)). Is that what you're suggesting?
Yeah, remove |
Yep, I got it now. See latest?
LGTM, but tests are failing and the docs should be updated also.
Thank goodness for CI. I'm not sure this is going to work, since |
I think the code will need to be updated to use |
Thanks @sbromberger Fixes #131 Fixes #127 Closes #133
Fixes #131. This was a simple change, but may have memory implications, as it's enumerating the vertices. Tests on graphs with order=1e7 don't show any appreciable slowdown, though.