[WIP] Search interface rework #55

ferranpujolcamins · 2018-11-12T01:12:22Z

Introduction

This PR is the follow of PR #43.

The goals of this PR are:

Remove code duplication and homogenize the api for DFS and BFS.
Extend the api for DFS and BFS so an arbitrary computation can be performed over a graph (a closure is fed with the graph vertices in a DFS or BFS order).

Explanation of the changes

GraphTraverser is a protocol representing an algorithm to traverse a graph. Types implementing it implement a graph traversal algorithm in its from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? method.

An extension to GraphTraverser provides with default implementations for several convenience methods that wrap from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int?.

DFS and BFS are structs conforming to GraphTraverser, so both algorithms have the same api provided by the extension. Furthermore, both for BFS and DFS, there's an extension to Graph with convenience methods to perform a search by calling a method on a graph instance.

DFSVisitOrder and BFSVisitOrder are variants of DFS and BFS that can guarantee a certain order of traversal for the neighbors of each visited vertex. I've kept them as a separate algorithm because the visit order has a performance penalty, even when the ordering closure is a no-op ({ $0 }). DFSand BFS have the withVisitOrder method to easily construct a DFSVisitOrder and BFSVisitOrder variant.

Examples

// Perform a DFS on graph from Seattle to Miami
_ = graph.dfs(from: "Seattle", to: "Miami")
_ = DFS(on: graph).from("Seattle", to: "Miami")

// Perform a BFS on graph visiting neighbors in lexicographic order
_ = BFS(on: graph)
         .withVisitOrder({ $0.sorted(by: { $0.v < $1.v }) })
         .from(index: 0, toIndex: 5)
_ = BFSVisitOrder(on: graph, withVisitOrder: { $0.sorted(by: { $0.v < $1.v }) }))
         .from(index: 0, toIndex: 5)

// Perform an abstract computation over graph in a depth-first order
_ = DFS(on: graph).from(0, goalTest: somePredicate, reducer: { print($0) })

API Changes

This PR introduces one and only one breaking api change:
findAll is gone, now you must choose between findAllBFS and findAllDFS

All other changes are non-breaking changes to the api.

Performance

Swift is not Rust, so more often than not abstraction comes with a performance penalty. In this PR I've sacrificed a lot of performance in favor of less code duplication.

Below there's a table with the results of some of the performance tests compared to master, stating execution time difference (+ is bad, - is good):

Test	vs. master (`cf9a6a0`)	vs. master before latest perf. improvements (`d01175e`)
testDfsInStarGraphWithGoalTestByIndex	+45%	+11%
testBfsInStarGraphWithGoalTestByIndex	+28%	-89%
testDfsInPathWithGoalTestByIndex	+42%	+2%
testBfsInPathWithGoalTestByIndex	+34%	+1%
testDfsInCompleteGraphWithGoalTestByIndex	-98%	-98%
testBfsInCompleteGraphWithGoalTestByIndex	-4%	-26%
testDfsInStarGraphByIndex	-42%	-62%
testBfsInStarGraphByIndex	+118%	-86%
testDfsInPathByIndex	+85%	+30%
testBfsInPathByIndex	+75%	+21%

As you can see, this PR introduces a major performance regression compared to our current master, although with some surprising big improvements on some cases. So, if performance was the only consideration we should not merge this PR.

When comparing to a commit on master before the latest merged performance improvements, we see that after merging this PR we could ship a new version with good performance improvements in most cases and only some performance drop of up to 30% in few cases.

Future Performance

This PR as is, is bad for performance. The main reason of this drop of performance is that all the methods implemented in the extension of GraphTraverser are now written as special cases of an abstract algorithm that operates with closures. Before, this algorithms had no function calls at all.

The good news is that this inefficient methods can be overwritten in DFS and BFS. For example, bfs(fromIndex: Int, toIndex: Int) -> [E] is the method which has seen its performance dropped the most. We could write a specialized version of the dfs for it, with no closures, just like it is in master now. We must decide what balance between code reuse and performance we want. I left this for future PR though.

The someone implementing a new totally new GraphTraverser can follow a progressive path:

Implement GraphTraverser and get a bunch of non-optimized convenience methods for free.
Write code using their GraphTraverser implementation, benchmark it and then decide if some methods need to be specialized for better performance.

Also if Swift ever gets more efficient abstractions we might be able to remove some of the specialized functions.

Notes

I've tried merging the goalTest and the reducer closures of the from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? method into a single closure returning a tuple of bools, but this resulted in a slight loss of performance. It's ok since I thing the current solution with two closure is more ergonomic.
I've tried making DFS and BFS conform to Sequence, so both can be consumed with a for in loop but it seemed to drop performance a bit.
Can the Dijkstra's algorithm be made to conform GraphTraverser? I think we could. But shall we? The main point of GraphTraverser is to get the bunch of convenience methods it defines in its extension for free. Are those methods a sound interface to Dijkstra's algorithm?

ferranpujolcamins · 2018-11-12T01:39:38Z

TODO

Add test coverage for DFSVisitOrder and BFSVisitOrder

davecom · 2018-11-12T02:11:59Z

Hi @ferranpujolcamins ,
Thank you for the amazing work on this and the great write up. I wish everyone in open source wrote as thoughtful and thorough a dialog as you did here. It's a very interesting set of pros and cons that we have here:

Pros:

Removes some code duplication
Creates an abstraction that allows more flexibility if someone wants to create a custom search
Maybe will make adding some additional search algorithms in the future a little easier (or maybe not as well; since there may be performance costs and issues in conforming to the protocol)

Cons:

We have something that is a significant performance regression (very significant in some test cases)
Adds some real complexity to the code base. The current search implementations are very easy to reason about since there are not a lot of moving parts; now someone digging in needs to know what a GraphTraverser is; what a VisitOrder is; and what certain closures should provide. I guess SwiftGraph is at the point now though, where it's no longer a teaching project. People are using it in real apps and maybe we shouldn't worry as much about whether someone new can learn an algorithm from it? On the other hand this can also be another point against the performance hit. If we are thinking about SwiftGraph now as mainly targeted towards production users, then shouldn't we really avoid these big performance regressions that come with the added complexity?
Even though the point here originally (I think) was to eliminate code duplication; even though we're doing that we're actually adding more protocols, more structs, and more code in total to maintain than what we had before.

To get some little things out of the way: I agree with the API break with regards to findAllBFS and findAllDFS. I really don't think that's a big deal at all.

So, is the cool abstraction we're getting worth the performance degradation and the added complexity being introduced into the code base? My guess is the vast majority of users of SwiftGraph would benefit more from a built-in implementation of A* than they would from being able to customize BFS or DFS. I don't think GraphTraverser as it stands would be a suitable protocol for Dijkstra since Dijkstra returns the shortest distance to every vertex, not a single final vertex. So, I think that would stay separate. I think we can implement A* on top of GraphTraverser. But of course we can also pretty easily implement it on its own as well.

Personally, I think the cons outweigh the pros. Maybe I am missing some though. I'd like to discuss it further. I also, frankly, feel bad saying that because I see you put a lot of work into this and I see how well laid out and structured the code is, even if it does add complexity. If we were getting performance improvements for extra code complexity, I'd say let's do it. If we were getting simplified code for a slight performance decrease I'd say let's do it too. But I don't think we're actually getting simplified code here—we're getting several new structs and a new protocol for what were previously ~10-15 line single methods.

I would also add two further points:

I have an existing A* implementation from SwiftPriorityQueue that I plan to add. So, I will wait until we decide on this about how to add it.
I think your idea about a hybrid approach may be the best of both worlds in terms of performance and customization, but it actually makes the complexity issue even worse (we will then have all of the original methods plus all of this added machinery for non-simple cases with customized search properties).

In conclusion, I think the four things we need to ask ourselves with regards to this change are how it impacts: performance, user customization, maintenance, and code complexity. I would say they are as follows:

performance 👎🏼
customization 👍🏼
maintenance 😐 (neutral—more code to maintain with less code duplication, but 👎🏼 if we do hybrid)
code complexity 👎🏼

I think we should talk about it more though. I am open to making the change I just want to make sure we are doing it for the right reasons and not just because it's cooler from an algorithmic perspective or because you've already put a lot of time into it...

Best,
Dave

ferranpujolcamins · 2018-11-13T18:55:39Z

Hi @davecom thanks for your answer.

To be honest, when I started implementing all this my hope was that I could get code reuse without sacrificing performance by the use of generics. It turned out that generics in Swift are not like Rust or C++ so at the end I got super bad performance. But I wanted to present the results anyway to know your thoughts. Thanks to your explanation I now better understand what's your view on SwiftGraph so I think we can get to common ground. I'm opening yet another PR soon, so you can compare.
I will try to:

Do not drop performance on existing methods
Try to bring the unification of BFS and DFS apis without adding complexity. Let's just forget about making them conform to a protocol and simply make sure they both have the same set of methods. Also let's forget about shared implementations, let them be two totally independent classes.
Implement abstract DFS and BFS algorithms. So I really want DFS and BFS to get something like func from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? and also the ability to specify a visit order. Let's just add this method to both DFS and BFS directly and just add an adhoc implementation to both of them. Forget about the other methods of this classes begin special cases of this one and just have different implementation.
Forget about the builder pattern, since it'll be no longer needed. (If we like it, we can add it afterwards)
KISS

davecom · 2018-11-13T19:36:41Z

Hi @ferranpujolcamins ,
I agree with the five goals that you laid out in bullet points. I think we can add the method in point three without modifying the existing methods. Yes, we have some code duplication, but it may be the only way to maintain performance. We can even pretty much reuse the implementations from your existing pull request without the protocol and maybe a couple other small changes. So, this is kind of your hybrid approach. In fact, maybe we should just do the hybrid approach? I don't know. I agree with KISS. It seems here we've run into a situation where KISS is in conflict with DRY. 😀

I really appreciate all of your work on the project. You've given it a good shot in the arm.

Best,
Dave

ferranpujolcamins · 2018-11-13T19:40:57Z

Thanks!
By hybrid approach you mean adding specialized implementations to BFS and DFS to this PR until we match performance on master?

davecom · 2018-11-13T19:53:10Z

Yes, what do you think? Or we could follow your advice above:

Try to bring the unification of BFS and DFS apis without adding complexity. Let's just forget about making them conform to a protocol and simply make sure they both have the same set of methods. Also let's forget about shared implementations, let them be two totally independent classes.

In that case can we still reuse most of your code? I think we can?

ferranpujolcamins · 2018-11-13T19:56:00Z

Let me finish this alternative simpler PR so we can compare.

ferranpujolcamins · 2018-11-17T19:02:13Z

Maybe the way to go to get top performance, api flexibility and code maintainability is meta-programming: https://github.com/krzysztofzablocki/Sourcery/blob/master/guides/Writing%20templates.md

ferranpujolcamins · 2018-11-18T18:31:01Z

I've tried to get good performance with meta-programming and again is difficult. To get max performance, each variant of the algorithm needs to be very specific, thus is difficult to work with a generic algorithm that fits all methods. The complication we would be adding to get good performance is too high.

So I vote to merge #57. If we have good test coverage and tests for all edge cases for all methods we will be fine.

ferranpujolcamins added 2 commits November 11, 2018 19:00

Add performance tests and add baselines

b74111b

Make common interface for DFS and BFS

c481e89

Add tests

4aede8c

ferranpujolcamins force-pushed the Search-interface-rework branch from cab1921 to 4aede8c Compare November 12, 2018 01:46

ferranpujolcamins mentioned this pull request Nov 14, 2018

Dfs improvement #56

Merged

ferranpujolcamins changed the title ~~Search interface rework~~ [WIP] Search interface rework Nov 17, 2018

ferranpujolcamins mentioned this pull request Nov 17, 2018

[WIP] Abstract dfs and bfs #57

Merged

ferranpujolcamins closed this Nov 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Search interface rework #55

[WIP] Search interface rework #55

ferranpujolcamins commented Nov 12, 2018 •

edited

ferranpujolcamins commented Nov 12, 2018 •

edited

davecom commented Nov 12, 2018

ferranpujolcamins commented Nov 13, 2018

davecom commented Nov 13, 2018 •

edited

ferranpujolcamins commented Nov 13, 2018

davecom commented Nov 13, 2018

ferranpujolcamins commented Nov 13, 2018

ferranpujolcamins commented Nov 17, 2018

ferranpujolcamins commented Nov 18, 2018 •

edited

[WIP] Search interface rework #55

[WIP] Search interface rework #55

Conversation

ferranpujolcamins commented Nov 12, 2018 • edited

Introduction

Explanation of the changes

Examples

API Changes

Performance

Future Performance

Notes

ferranpujolcamins commented Nov 12, 2018 • edited

TODO

davecom commented Nov 12, 2018

ferranpujolcamins commented Nov 13, 2018

davecom commented Nov 13, 2018 • edited

ferranpujolcamins commented Nov 13, 2018

davecom commented Nov 13, 2018

ferranpujolcamins commented Nov 13, 2018

ferranpujolcamins commented Nov 17, 2018

ferranpujolcamins commented Nov 18, 2018 • edited

ferranpujolcamins commented Nov 12, 2018 •

edited

ferranpujolcamins commented Nov 12, 2018 •

edited

davecom commented Nov 13, 2018 •

edited

ferranpujolcamins commented Nov 18, 2018 •

edited