Skip to content
This repository has been archived by the owner on Oct 21, 2021. It is now read-only.

fixed indexing issue when vertices are <:Integer #133

Closed
wants to merge 3 commits into from
Closed

fixed indexing issue when vertices are <:Integer #133

wants to merge 3 commits into from

Conversation

sbromberger
Copy link
Contributor

Fixes #131. This was a simple change, but may have memory implications, as it's enumerating the vertices. Tests on graphs with order=1e7 don't show any appreciable slowdown, though.

@mlubin
Copy link
Contributor

mlubin commented Dec 19, 2014

How precisely is it enumerating the vertices?

@sbromberger
Copy link
Contributor Author

Well, perhaps it's not. I'm passing vertices(g) as an additional argument, and my concern was that doing so might force an evaluation of the vertices which could possibly get expensive.

@mlubin
Copy link
Contributor

mlubin commented Dec 19, 2014

vertex_index should be a fast O(1) operation when the vertices are integers. As long as we're careful to preserve that, this seems fine.

@sbromberger
Copy link
Contributor Author

I don't think it is, and I don't think it can be. Consider:

h = graph([10,11,12,13],[])

The old (bad) code would simply return 10 for vertex_index(10,h). This is clearly incorrect and has the side effect of corrupting any type <:Integer. It is only correct when the vertices are 1-based sequences.

The new code, which fixes all cases, needs at a minimum one search through an unordered list of vertices, which I believe is O(n). It could be faster depending on how Julia does array (vector) searching.

Also, depending on dispatch method, this is no worse than others (see common.jl for some examples with other types).

@mlubin
Copy link
Contributor

mlubin commented Dec 19, 2014

Agreed, but we definitely need a "fast case" available when the vertices are 1-based. Maybe @lindahua should comment on how this was intended to be handled in the initial design.

@sbromberger
Copy link
Contributor Author

The only way to do that, I think, is to figure out a way to check to see whether vertices(g) is a range. Even then, if you add a node that is nonsequential afterwards, you're basically hosed, and this will break any attempt to remove a node.

@mlubin
Copy link
Contributor

mlubin commented Dec 19, 2014

Or the convention could be that by using integers for vertices, you're guaranteeing that that the indices are 1-based and contiguous.

@sbromberger
Copy link
Contributor Author

I think that would be a mistake. This is the reason char breaks in 3.3, and it would continue to break all subtypes of Integer, which include all signed, unsigned, bigint, and bool types.

Consider my use case, which is mapping UNIX UIDs to activity. I'd have to construct a custom (memory-laden) type as a vertex identifier, or use a non-indexing property which would destroy insert performance.

Anyone who does time-based graphing using epoch will also be burned by this.

It would also mean that graphs couldn't have a vertex with integer value <= 0.

@mlubin
Copy link
Contributor

mlubin commented Dec 19, 2014

Well, there's no memory overhead for creating a custom immutable type, though it certainly is ugly. I have to object to making vertex lookups O(n) in the linear base case since it will silently increase the algorithmic complexity of existing code. Hopefully we can gather a few more opinions here.

@sbromberger
Copy link
Contributor Author

So, I'm not sure it's O(n). I don't know enough about how Julia does vector searching, but IIRC unordered lists are O(n) search.

Another option is to create a new graph type that would have the guarantees you desire. This would have the advantage of requiring the user to explicitly acknowledge the indexing limitations (and would allow us to throw appropriate error messages if constraints are violated).

Regardless, I urge adoption of this PR because as it stands right now, 0.3.3 is broken for char and nonconsecutive/nonpositive ints, and 0.4.x is broken for nonconsecutive/nonpositive ints. This could really burn someone who's expecting proper behavior. IMO, it's better to have a correct result whose derivation MAY BE slower, than a guaranteed incorrect result that's delivered lightning fast. You can optimize speed, but correctness is boolean.

ETA: EVEN FOR contiguous, 1-based sequences of vertices, this code is broken:

julia> a = simple_graph(10)
Directed Graph (10 vertices, 0 edges)

julia> vertex_index(1000,a)
1000

@lindahua
Copy link
Contributor

I am not satisfied with the current way of handling vertex/edge indexing either.

In the original design, what we intend is:

  • When using Int as vertex type, you implicitly assume that vertex_index(i, g) == i always holds.
  • If you want more sophisticated way of indexing vertices, you have to explicitly introduce a new vertex type, and handle the indexing by overriding vertex_index(v, g) for the customized vertex type.

@lindahua
Copy link
Contributor

The following problem can be easily resolved by adding a bound checking in vertex_index:

julia> a = simple_graph(10)
Directed Graph (10 vertices, 0 edges)

julia> vertex_index(1000,a)
1000

However, using an algorithm of complexity O(n) or O(log n) for vertex(i::Int, g::AbstractGraph) will cause huge performance issues in many applications.

@sbromberger
Copy link
Contributor Author

@lindahua

When using Int as vertex type, you implicitly assume that vertex_index(i, g) == i always holds.

I wish I had known of this implicit assumption - I wouldn't have spent the last two days trying to optimize dijkstra and pulling my hair out due to the catastrophic errors introduced by vertex_index().

If you decide to keep this constraint, I would urge copious documentation, and a rewrite of add_vertex! must ensure that the new vertex is sequential.

I really think this is a mistake, though - it's certainly one that will preclude me from using this package for my application - and without documentation and constraint checks, will cause others to rely on expected functionality that will (silently) break their code. Right now, there's nothing magical about Integer vertices that requires them to be positive, 1-based, and sequential. I would again recommend a special graph type that enforces these constraints with the advantage of speed.

It's not obvious at all in the documentation or the implementation that using an Integer for a vertex should result in incorrect indexing. If you decide to enforce the limitation, then you should explicitly disallow all Integer vertex types other than unsigned (including char for 0.3.x), you should ensure that add_vertex! does not require a vertex parameter, and, as a side effect, you will give up any possibility of deleting vertices in any efficient manner.

That last part means that this package will be completely unsuitable for modeling dynamic graphs such as AMI, power systems, and user activity networks.

In any case, if that's the decision, so be it, and I'll close the PR.

@lindahua
Copy link
Contributor

@sbromberger Your concerns are valid.

I agree that this implicit assumption is a recipe for errors (and even disasters in some applications).

A stopgap to remedy this:

  • restrict vertex_index(i, g) = i to only SimpleGraph.
  • for others, using a safer way.

@sbromberger
Copy link
Contributor Author

I think your suggestion is a good long-term solution as well - this is what I was thinking about when I proposed creating a special constrained graph for fast indexing.

In order to make SimpleGraph safe, it will be necessary to modify add_vertex! so that v is ignored (you can't / don't want to use it), and will need to enforce bounds checking on vertex_index.

For performance, perhaps using a heap or other search-optimized structure for non-SimpleGraph vertices would be a possibility. If memory serves, you'd never get O(1), but you could approach O(log n).

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 5c7460e on sbromberger:fix_vertex_index into 6525b82 on JuliaLang:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 5c7460e on sbromberger:fix_vertex_index into 6525b82 on JuliaLang:master.

@sbromberger
Copy link
Contributor Author

I've made the following changes:

  1. created a new add_vertex!(g::SimpleGraph) that does not take a vertex, since these graphs must ensure consistent state of the vertex range, and deprecated add_vertex!(g::SimpleGraph, v)

  2. made vertex_index(v::Integer, g::SimpleGraph) do proper bounds checking and still produce an O(1) result (= v) - fixes incorrect behavior of vertex_index (in all versions) #131 (comment) issue 2.

  3. kept the original change for all other graphs to ensure that vertex_index() does the right thing (even if it's O(n)). - fixes incorrect behavior of vertex_index (in all versions) #131 (comment) issue 1.

…or add_vertex\!()

deprecated add_vertex\!(g::SimpleGraph, v)
@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling fe2b228 on sbromberger:fix_vertex_index into 6525b82 on JuliaLang:master.

@mlubin
Copy link
Contributor

mlubin commented Dec 20, 2014

So this change isn't correct because if the vertex type is KeyVertex, which stores its own index, vertex_index will still end up doing a linear search on the list of vertices.

@sbromberger
Copy link
Contributor Author

Are there any other graph types that need to be excluded?

On Dec 20, 2014, at 11:58, Miles Lubin notifications@github.com wrote:

So this change isn't correct because if the vertex type is KeyVertex, which
stores its own index, vertex_index will still end up doing a linear search
on the list of vertices.


Reply to this email directly or view it on GitHub
#133 (comment).

@mlubin
Copy link
Contributor

mlubin commented Dec 21, 2014

Probably the right way is to do something like:

function vertex_index{V<:ProvidedVertexType}(v::V, g::GenericGraph{V})
    if applicable(vertex_index, v)
        return vertex_index(v)
    else
        return vertex_index(v,vertices(g))
    end
end

Then undefine vertex_index(v::Integer) = v if this is consensus.

@sbromberger
Copy link
Contributor Author

@mlubin

I'm not sure I follow:

julia> a = graph([2,4,6,8],[])
Directed Graph (4 vertices, 0 edges)

julia> applicable(vertex_index, 2)
true

...which would put us right back into #131 (comment) issue 1. What am I missing?

@mlubin
Copy link
Contributor

mlubin commented Dec 21, 2014

Removing the definition vertex_index(v::Integer) = v

@sbromberger
Copy link
Contributor Author

That "breaks" simple_graph (that is, it would force simple_graph to be O(n) which is what @lindahua didn't want), ...unless we keep the second commit (items 1 and 2 from #133 (comment)). Is that what you're suggesting?

@mlubin
Copy link
Contributor

mlubin commented Dec 21, 2014

Yeah, remove vertex_index(v::Integer) and define vertex_index(v::Integer, g::SimpleGraph).

@sbromberger
Copy link
Contributor Author

Yep, I got it now. See latest?

@mlubin
Copy link
Contributor

mlubin commented Dec 21, 2014

LGTM, but tests are failing and the docs should be updated also.

@sbromberger
Copy link
Contributor Author

Thank goodness for CI. I'm not sure this is going to work, since vertex_index(v) is being called by other areas of the code. (How one can have a vertex_index call without an associated graph is puzzling to me.)

@mlubin
Copy link
Contributor

mlubin commented Dec 21, 2014

I think the code will need to be updated to use vertex_index(v,g) instead. They're probably a holdover from before the graph argument was introduced.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

incorrect behavior of vertex_index (in all versions)
5 participants