Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Search interface rework #55

Conversation

ferranpujolcamins
Copy link
Collaborator

@ferranpujolcamins ferranpujolcamins commented Nov 12, 2018

Introduction

This PR is the follow of PR #43.

The goals of this PR are:

  • Remove code duplication and homogenize the api for DFS and BFS.
  • Extend the api for DFS and BFS so an arbitrary computation can be performed over a graph (a closure is fed with the graph vertices in a DFS or BFS order).

Explanation of the changes

GraphTraverser is a protocol representing an algorithm to traverse a graph. Types implementing it implement a graph traversal algorithm in its from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? method.

An extension to GraphTraverser provides with default implementations for several convenience methods that wrap from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int?.

DFS and BFS are structs conforming to GraphTraverser, so both algorithms have the same api provided by the extension. Furthermore, both for BFS and DFS, there's an extension to Graph with convenience methods to perform a search by calling a method on a graph instance.

DFSVisitOrder and BFSVisitOrder are variants of DFS and BFS that can guarantee a certain order of traversal for the neighbors of each visited vertex. I've kept them as a separate algorithm because the visit order has a performance penalty, even when the ordering closure is a no-op ({ $0 }). DFSand BFS have the withVisitOrder method to easily construct a DFSVisitOrder and BFSVisitOrder variant.

Examples

// Perform a DFS on graph from Seattle to Miami
_ = graph.dfs(from: "Seattle", to: "Miami")
_ = DFS(on: graph).from("Seattle", to: "Miami")

// Perform a BFS on graph visiting neighbors in lexicographic order
_ = BFS(on: graph)
         .withVisitOrder({ $0.sorted(by: { $0.v < $1.v }) })
         .from(index: 0, toIndex: 5)
_ = BFSVisitOrder(on: graph, withVisitOrder: { $0.sorted(by: { $0.v < $1.v }) }))
         .from(index: 0, toIndex: 5)

// Perform an abstract computation over graph in a depth-first order
_ = DFS(on: graph).from(0, goalTest: somePredicate, reducer: { print($0) })

API Changes

This PR introduces one and only one breaking api change:
findAll is gone, now you must choose between findAllBFS and findAllDFS

All other changes are non-breaking changes to the api.

Performance

Swift is not Rust, so more often than not abstraction comes with a performance penalty. In this PR I've sacrificed a lot of performance in favor of less code duplication.

Below there's a table with the results of some of the performance tests compared to master, stating execution time difference (+ is bad, - is good):

Test   vs. master (cf9a6a0) vs. master before latest perf. improvements (d01175e)
testDfsInStarGraphWithGoalTestByIndex +45% +11%
testBfsInStarGraphWithGoalTestByIndex +28% -89%
testDfsInPathWithGoalTestByIndex +42% +2%
testBfsInPathWithGoalTestByIndex +34% +1%
testDfsInCompleteGraphWithGoalTestByIndex -98% -98%
testBfsInCompleteGraphWithGoalTestByIndex -4% -26%
testDfsInStarGraphByIndex -42% -62%
testBfsInStarGraphByIndex +118% -86%
testDfsInPathByIndex +85% +30%
testBfsInPathByIndex +75% +21%

As you can see, this PR introduces a major performance regression compared to our current master, although with some surprising big improvements on some cases. So, if performance was the only consideration we should not merge this PR.

When comparing to a commit on master before the latest merged performance improvements, we see that after merging this PR we could ship a new version with good performance improvements in most cases and only some performance drop of up to 30% in few cases.

Future Performance

This PR as is, is bad for performance. The main reason of this drop of performance is that all the methods implemented in the extension of GraphTraverser are now written as special cases of an abstract algorithm that operates with closures. Before, this algorithms had no function calls at all.

The good news is that this inefficient methods can be overwritten in DFS and BFS. For example, bfs(fromIndex: Int, toIndex: Int) -> [E] is the method which has seen its performance dropped the most. We could write a specialized version of the dfs for it, with no closures, just like it is in master now. We must decide what balance between code reuse and performance we want. I left this for future PR though.

The someone implementing a new totally new GraphTraverser can follow a progressive path:

  1. Implement GraphTraverser and get a bunch of non-optimized convenience methods for free.
  2. Write code using their GraphTraverser implementation, benchmark it and then decide if some methods need to be specialized for better performance.

Also if Swift ever gets more efficient abstractions we might be able to remove some of the specialized functions.

Notes

  • I've tried merging the goalTest and the reducer closures of the from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? method into a single closure returning a tuple of bools, but this resulted in a slight loss of performance. It's ok since I thing the current solution with two closure is more ergonomic.
  • I've tried making DFS and BFS conform to Sequence, so both can be consumed with a for in loop but it seemed to drop performance a bit.
  • Can the Dijkstra's algorithm be made to conform GraphTraverser? I think we could. But shall we? The main point of GraphTraverser is to get the bunch of convenience methods it defines in its extension for free. Are those methods a sound interface to Dijkstra's algorithm?

@ferranpujolcamins
Copy link
Collaborator Author

ferranpujolcamins commented Nov 12, 2018

TODO

  • Add test coverage for DFSVisitOrder and BFSVisitOrder

@davecom
Copy link
Owner

davecom commented Nov 12, 2018

Hi @ferranpujolcamins ,
Thank you for the amazing work on this and the great write up. I wish everyone in open source wrote as thoughtful and thorough a dialog as you did here. It's a very interesting set of pros and cons that we have here:

Pros:

  • Removes some code duplication
  • Creates an abstraction that allows more flexibility if someone wants to create a custom search
  • Maybe will make adding some additional search algorithms in the future a little easier (or maybe not as well; since there may be performance costs and issues in conforming to the protocol)

Cons:

  • We have something that is a significant performance regression (very significant in some test cases)
  • Adds some real complexity to the code base. The current search implementations are very easy to reason about since there are not a lot of moving parts; now someone digging in needs to know what a GraphTraverser is; what a VisitOrder is; and what certain closures should provide. I guess SwiftGraph is at the point now though, where it's no longer a teaching project. People are using it in real apps and maybe we shouldn't worry as much about whether someone new can learn an algorithm from it? On the other hand this can also be another point against the performance hit. If we are thinking about SwiftGraph now as mainly targeted towards production users, then shouldn't we really avoid these big performance regressions that come with the added complexity?
  • Even though the point here originally (I think) was to eliminate code duplication; even though we're doing that we're actually adding more protocols, more structs, and more code in total to maintain than what we had before.

To get some little things out of the way: I agree with the API break with regards to findAllBFS and findAllDFS. I really don't think that's a big deal at all.

So, is the cool abstraction we're getting worth the performance degradation and the added complexity being introduced into the code base? My guess is the vast majority of users of SwiftGraph would benefit more from a built-in implementation of A* than they would from being able to customize BFS or DFS. I don't think GraphTraverser as it stands would be a suitable protocol for Dijkstra since Dijkstra returns the shortest distance to every vertex, not a single final vertex. So, I think that would stay separate. I think we can implement A* on top of GraphTraverser. But of course we can also pretty easily implement it on its own as well.

Personally, I think the cons outweigh the pros. Maybe I am missing some though. I'd like to discuss it further. I also, frankly, feel bad saying that because I see you put a lot of work into this and I see how well laid out and structured the code is, even if it does add complexity. If we were getting performance improvements for extra code complexity, I'd say let's do it. If we were getting simplified code for a slight performance decrease I'd say let's do it too. But I don't think we're actually getting simplified code here—we're getting several new structs and a new protocol for what were previously ~10-15 line single methods.

I would also add two further points:

  • I have an existing A* implementation from SwiftPriorityQueue that I plan to add. So, I will wait until we decide on this about how to add it.
  • I think your idea about a hybrid approach may be the best of both worlds in terms of performance and customization, but it actually makes the complexity issue even worse (we will then have all of the original methods plus all of this added machinery for non-simple cases with customized search properties).

In conclusion, I think the four things we need to ask ourselves with regards to this change are how it impacts: performance, user customization, maintenance, and code complexity. I would say they are as follows:

  • performance 👎🏼
  • customization 👍🏼
  • maintenance 😐 (neutral—more code to maintain with less code duplication, but 👎🏼 if we do hybrid)
  • code complexity 👎🏼

I think we should talk about it more though. I am open to making the change I just want to make sure we are doing it for the right reasons and not just because it's cooler from an algorithmic perspective or because you've already put a lot of time into it...

Best,
Dave

@ferranpujolcamins
Copy link
Collaborator Author

Hi @davecom thanks for your answer.

To be honest, when I started implementing all this my hope was that I could get code reuse without sacrificing performance by the use of generics. It turned out that generics in Swift are not like Rust or C++ so at the end I got super bad performance. But I wanted to present the results anyway to know your thoughts. Thanks to your explanation I now better understand what's your view on SwiftGraph so I think we can get to common ground. I'm opening yet another PR soon, so you can compare.
I will try to:

  • Do not drop performance on existing methods
  • Try to bring the unification of BFS and DFS apis without adding complexity. Let's just forget about making them conform to a protocol and simply make sure they both have the same set of methods. Also let's forget about shared implementations, let them be two totally independent classes.
  • Implement abstract DFS and BFS algorithms. So I really want DFS and BFS to get something like func from(_ initalVertexIndex: Int, goalTest: (Int) -> Bool, reducer: G.Reducer) -> Int? and also the ability to specify a visit order. Let's just add this method to both DFS and BFS directly and just add an adhoc implementation to both of them. Forget about the other methods of this classes begin special cases of this one and just have different implementation.
  • Forget about the builder pattern, since it'll be no longer needed. (If we like it, we can add it afterwards)
  • KISS

@davecom
Copy link
Owner

davecom commented Nov 13, 2018

Hi @ferranpujolcamins ,
I agree with the five goals that you laid out in bullet points. I think we can add the method in point three without modifying the existing methods. Yes, we have some code duplication, but it may be the only way to maintain performance. We can even pretty much reuse the implementations from your existing pull request without the protocol and maybe a couple other small changes. So, this is kind of your hybrid approach. In fact, maybe we should just do the hybrid approach? I don't know. I agree with KISS. It seems here we've run into a situation where KISS is in conflict with DRY. 😀

I really appreciate all of your work on the project. You've given it a good shot in the arm.

Best,
Dave

@ferranpujolcamins
Copy link
Collaborator Author

Thanks!
By hybrid approach you mean adding specialized implementations to BFS and DFS to this PR until we match performance on master?

@davecom
Copy link
Owner

davecom commented Nov 13, 2018

Yes, what do you think? Or we could follow your advice above:

Try to bring the unification of BFS and DFS apis without adding complexity. Let's just forget about making them conform to a protocol and simply make sure they both have the same set of methods. Also let's forget about shared implementations, let them be two totally independent classes.

In that case can we still reuse most of your code? I think we can?

@ferranpujolcamins
Copy link
Collaborator Author

Let me finish this alternative simpler PR so we can compare.

@ferranpujolcamins ferranpujolcamins mentioned this pull request Nov 14, 2018
@ferranpujolcamins
Copy link
Collaborator Author

Maybe the way to go to get top performance, api flexibility and code maintainability is meta-programming: https://github.com/krzysztofzablocki/Sourcery/blob/master/guides/Writing%20templates.md

@ferranpujolcamins ferranpujolcamins changed the title Search interface rework [WIP] Search interface rework Nov 17, 2018
@ferranpujolcamins
Copy link
Collaborator Author

ferranpujolcamins commented Nov 18, 2018

I've tried to get good performance with meta-programming and again is difficult. To get max performance, each variant of the algorithm needs to be very specific, thus is difficult to work with a generic algorithm that fits all methods. The complication we would be adding to get good performance is too high.

So I vote to merge #57. If we have good test coverage and tests for all edge cases for all methods we will be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants