Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Add dag-topo in ShortestPaths #1486

Closed
wants to merge 3 commits into from

Conversation

kanishk509
Copy link

#1485
Uses topo sort + DP to find shortest paths in a DAG.
I'll work on writing test cases benchmarks soon.

@sbromberger
Copy link
Owner

I'd be interested in the benchmarks. You're essentially doing two traversals here: the first in topological sort, and then your own. I can't see how this would be faster than djikstra but I'll wait to see the numbers :)

@kanishk509
Copy link
Author

Comparison with Dijkstra.
I used a simple tree-structure DAG:

function make_dag(;breadth, len_chain)
    sz = breadth * len_chain + 1

    dag = LightGraphs.SimpleGraphs.SimpleDiGraph(sz)
    mat = spzeros(Int, sz, sz)

    for v in 1:breadth
        add_edge!(dag, sz, v)
        mat[sz, v] = 1
    end

    for v in breadth+1:sz-1
        add_edge!(dag, v-breadth, v)
        mat[v-breadth, v] = 1
    end
    
    return (dag, mat)
end

This gives the following type graph (for breadth=10, len_chain=3): image

Tested with sizes from 1K-10K:

djk = LightGraphs.ShortestPaths.Dijkstra()
dag_topo = LightGraphs.ShortestPaths.DagTopo()

for b in 1000:1000:10000
    (dag, mat) = make_dag(breadth=b, len_chain=10);
    println("sz=", nv(dag))
    
    println("  dijkstra:")
    @btime LightGraphs.ShortestPaths.shortest_paths(dag, nv(dag), mat, djk);
    
    println("  dag-topo:")
    @btime LightGraphs.ShortestPaths.shortest_paths(dag, nv(dag), mat, dag_topo);
end
sz=10001
  dijkstra:
  2.232 ms (54 allocations: 663.00 KiB)
  dag-topo:
  400.126 μs (35 allocations: 503.59 KiB)
sz=20001
  dijkstra:
  4.832 ms (61 allocations: 1.34 MiB)
  dag-topo:
  933.452 μs (37 allocations: 1005.03 KiB)
sz=30001
  dijkstra:
  7.842 ms (65 allocations: 2.19 MiB)
  dag-topo:
  1.886 ms (37 allocations: 1.22 MiB)
sz=40001
  dijkstra:
  13.940 ms (65 allocations: 2.58 MiB)
  dag-topo:
  3.152 ms (38 allocations: 1.96 MiB)
sz=50001
  dijkstra:
  19.705 ms (66 allocations: 3.63 MiB)
  dag-topo:
  4.242 ms (38 allocations: 2.20 MiB)
sz=60001
  dijkstra:
  25.034 ms (66 allocations: 4.02 MiB)
  dag-topo:
  5.579 ms (38 allocations: 2.44 MiB)
sz=70001
  dijkstra:
  30.471 ms (72 allocations: 4.94 MiB)
  dag-topo:
  6.809 ms (39 allocations: 3.68 MiB)
sz=80001
  dijkstra:
  36.878 ms (72 allocations: 5.33 MiB)
  dag-topo:
  8.007 ms (39 allocations: 3.92 MiB)
sz=90001
  dijkstra:
  38.698 ms (67 allocations: 6.50 MiB)
  dag-topo:
  9.244 ms (39 allocations: 4.16 MiB)
sz=100001
  dijkstra:
  44.362 ms (67 allocations: 6.89 MiB)
  dag-topo:
  10.261 ms (39 allocations: 4.40 MiB)

@kanishk509
Copy link
Author

We can optimize dag-topo by writing our own topological sort where we only visit vertices reachable from the source(s). Traversals.topological_sort() visits all the vertices.

@sbromberger
Copy link
Owner

Ah. The issue with this code being in LightGraphs proper is that it depends on being passed a DAG. As I mentioned in the related issue, it's probably better if you create a separate specialized graph type that is ONLY a DAG so that we don't get errors when we try to feed it a general AbstractGraph:

julia> g = SimpleDiGraph(LightGraphs.Generators.Cycle(10))
{10, 10} directed simple Int64 graph

julia> ShortestPaths.shortest_paths(g, 1, ShortestPaths.DagTopo())
ERROR: Cycles are not allowed in this function.
Stacktrace:
 [1] macro expansion at /home/seth/.julia/dev/LightGraphs/src/Traversals/depthfirst.jl:174 [inlined]

In general within LightGraphs, we don't want to produce errors like this. We'd much rather throw a MethodError because the function isn't specialized for the specific graph type being passed in.

@kanishk509
Copy link
Author

@sbromberger I thought about this and although I agree that a separate DAG type would be nice, I would like to make the case that this algo should be included in LightGraphs.ShortestPaths based on SimpleDiGraph itself.

  1. Letting the user decide if he wants to use this algo and ensure that his graph is a DAG, is in principle, same as what we are currently doing with algorithms like Dijkstra (assumption that the user passes non-negative weight matrix), no-negative-cycle requirement and other constraints that different algorithms have. We can do something about the error handling in case of a non-DAG. Dijkstra also just gives out the wrong result if negative weights are passed, right? (btw, dag-topo can handle negative weights).

  2. It's the fastest algo for this type of graph. The initial benchmarks look promising, and theoretically, the speedup(ratio) vs dijkstra should increase with the size of the graph, as the logV factor becomes substantial.

  3. This is a fairly common use-case, judging by how often I've used it. Other libraries like Boost Graph Library also have this as part of their shortest-paths arsenal.

Thoughts?

@jpfairbanks
Copy link
Contributor

I think that throwing an exception in the topological sort is a fine way to handle this. Since the algorithm is not being set as the default and has "Dag" in the name, I think we can rely on people to ensure that their graph is a DAG or catch the exception if they aren't sure.

The exception should tell people to use Dijkstra's for graphs that can contain cycles. So far we have used the introduction of dependencies to scope the boundaries of LG. Since this just uses two features we already have (toposort and traversal), I think it is good to include.

A bike shedding issue is that DAG is an abbreviation so it should probably be all caps.

@kanishk509
Copy link
Author

kanishk509 commented Nov 9, 2020

I'm catching the CycleError from topological_sort and throwing a DAGError (should it be named something else?).

@sbromberger
Copy link
Owner

I'm not sure I agree, especially given the discussion in the related issue about the usefulness of storing additional topological data in the DAG data structure. If that's beneficial - and I think it is - then this argues for a separate package that handles DAGs, just like we have for weighted graphs, metagraphs, vertex-safe graphs, and what we've always talked about regarding random graphs and small graphs (I think @matbesancon actually implemented a few of these, with O(1) memory for small graphs).

@kanishk509
Copy link
Author

try/catch is an antipattern in Julia. This may significantly impact performance, especially in the sad path.

Oh, I wasn't aware of that. TIL. What would you suggest as an alternative in a case like this?

then this argues for a separate package that handles DAGs

Yeah that does make sense.

@sbromberger
Copy link
Owner

@kanishk509 have you decided on an approach here? If you wanted to create a DAG graph type in a new package and implement the LightGraphs API, we could transfer it into the JuliaGraphs organization and make you the maintainer.

@kanishk509
Copy link
Author

@kanishk509 have you decided on an approach here? If you wanted to create a DAG graph type in a new package and implement the LightGraphs API, we could transfer it into the JuliaGraphs organization and make you the maintainer.

Hey @sbromberger,

I did sketch out roughly how I would want to implement a DAG:

struct DAG
    g::SimpleDiGraph
    topoOrder::<some list type>
    isStale::Bool
end

Here we would construct a topo-order on the creation of the graph and set the isStale flag true if the graph is modified in a way that modifies that order. We would re-calculate the topo-order lazily when required (for eg. on a call to shortest path).
This would be ideal for static graphs and also a good compromise for dynamic graphs.
I had also thought of various methods that I would like to implement for the DAG type.

The project has been on my to-do list for a while. However, I haven't been able to devote much time to it because of an ongoing internship.

I was wondering if JuliaGraphs would be interested in participating in GSoC'21? If so and if this project is not currently urgent, do you think this would make a good project proposal? (I'm a CS undergrad and eligible to apply). I realize that it's still a few months off (student applications start in March), but I would be able to properly work on this then as my internship would be wrapped up by then.

Let me know what you think.

Cheers.

@sbromberger
Copy link
Owner

That looks good, but I wonder if you can avoid the extra memory. That is, you're carrying the full graph around, and then a topological ordering on the graph, that will be 2V + E. Are there more space-efficient ways of storing a DAG?

As far as GSOC, I've been a GSOC mentor the past 4-5 years (when Julia has gotten slots), but this year I won't have time to do it. There are others here who have also been GSOC mentors, though. It's probably worth discussing, and a proposal seems like a good idea.

@codecov
Copy link

codecov bot commented Dec 31, 2020

Codecov Report

Merging #1486 (b1c5408) into master (ce53372) will decrease coverage by 0.30%.
The diff coverage is 13.04%.

@@            Coverage Diff             @@
##           master    #1486      +/-   ##
==========================================
- Coverage   92.66%   92.35%   -0.31%     
==========================================
  Files         195      196       +1     
  Lines        6325     6345      +20     
==========================================
- Hits         5861     5860       -1     
- Misses        464      485      +21     

@sbromberger
Copy link
Owner

Also, just a thought:

We would re-calculate the topo-order lazily when required

You should give the user the option. Lazy recalculation in library code could create nondeterministic behavior in end-user code.

@kanishk509
Copy link
Author

@sbromberger Thanks for the pointers, I'm making a note of them.

What would be the proper place to ask about GSoC participation and whether someone would be interested in mentoring such a project? Should I post it on the Gitter channel?

Base automatically changed from master to development January 11, 2021 21:00
Base automatically changed from development to master January 11, 2021 21:02
Base automatically changed from master to master-old January 11, 2021 21:08
@sbromberger sbromberger deleted the branch sbromberger:master-old January 11, 2021 21:11
@sbromberger
Copy link
Owner

Sorry about this - this PR was closed because we renamed some branches. Would you mind rebasing on master and reopening the PR? Thanks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants