Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimization by introducing bfs and topological sort #290

Closed
wants to merge 12 commits into from

Conversation

Enter-tainer
Copy link
Contributor

@Enter-tainer Enter-tainer commented May 20, 2022

For a DAG, we can find its shortest path by using bfs in $O(V+E)$, while dijkstra takes $O(V+E\log V)$ time. BTW, I guess maybe we can introduce heuristic search algorithms like A* to further optimize it.

I also use the arena to allocate nodes to avoid the overhead of refcount.

As for the performance issue discussed in #236, it took ~25s on my machine, and currently, it only takes ~15s.

A unit test is failing and some cleaning work is needed. I'm still working on this. Feel free to give comments!

@Wilfred
Copy link
Owner

Wilfred commented May 20, 2022

Wow, this is really awesome. I've spent a lot of time trying to make this code faster, so it's amazing to see more performance wins!

Do you have any sense of how much of the perf win is arenas, vs how much is switching to DFS (edit: oops, I meant BFS)?

I did experiment with A* search in this branch: https://github.com/Wilfred/difftastic/tree/a_star_module but I wasn't able to get a performance improvement and I was concerned it was significantly more complex. It's quite possible I was doing something wrong.

@Enter-tainer
Copy link
Contributor Author

I've done some profiles and I found that it takes a lot of time(30%) to drop the keys and values of the hashmap.
image

I guess it is because of the overhead introduced by Rc. So I switch to arenas and try to use references to avoid this issue. And the profile result seems good -- drop only takes 10% time.
image

@Enter-tainer
Copy link
Contributor Author

Enter-tainer commented May 20, 2022

And it seems that quite a lot of time is used on the rehash of hashmap. This happens when inserting into the hashmap and it is not large enough to hold the elements. In this situation, all data will be rehashed which is really expensive. So if we are able to estimate the vertex count before actually starting finding the shortest path, we can reserve the hashmap.

As for the DFS, I'm not pretty sure but I think naive DFS may not be able to find the shortest path. The BFS shortest path algorithm for DAG is correct because the nearer the vertex is, the earlier it is added to the BFS queue. So when we find the end vertex, the shortest path is found. Maybe we can use dp and DFS to solve this: $\displaystyle cost(V) = \min_{(V, V_i) \in E}cost(V_i) + edge(V, V_i)$. The time complexity of dp is the same as BFS: $O(V+E)$. But I think the dp algorithm still needs a hashmap or something to store the intermediate costs. I'm not pretty sure whether this will be faster or not.

And there are some statistics when I was testing:

  • baseline: ~25s
  • bfs: ~20s
  • bfs+arena: ~15s

I also did investigations on parallel algorithms such as parallel BFS. But I found it really hard because Vertex is not Send because it contains Cell<T>. This makes it impossible to send it around threads 😢.

@Enter-tainer
Copy link
Contributor Author

Enter-tainer commented May 20, 2022

I'm not familiar with the unit test. The error msg is as follows:

failures:

---- dijkstra::tests::atom_after_empty_list stdout ----
thread 'dijkstra::tests::atom_after_empty_list' panicked at 'assertion failed: `(left == right)`
  left: `[EnterNovelDelimiterLHS { contiguous: false }, EnterNovelDelimiterRHS { contiguous: false }, UnchangedNode { depth_difference: 0 }, UnchangedNode { depth_difference: 0 }, ExitDelimiterLHS, ExitDelimiterRHS]`,
 right: `[EnterNovelDelimiterRHS { contiguous: false }, EnterNovelDelimiterLHS { contiguous: false }, UnchangedNode { depth_difference: 0 }, UnchangedNode { depth_difference: 0 }, ExitDelimiterRHS, ExitDelimiterLHS]`', src/dijkstra.rs:320:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dijkstra::tests::atom_after_empty_list

test result: FAILED. 102 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.04s

It seems that everything is equal, except for LHS and RHS being swapped. Any ideas on this?

@Wilfred
Copy link
Owner

Wilfred commented May 20, 2022

Just update that test: the two routes have the same length, so both are correct.

@Wilfred
Copy link
Owner

Wilfred commented May 21, 2022

FWIW it's totally reasonable to make Syntax implement Send. Syntax is immutable, and it only mutates the Cells during init.

@Enter-tainer Enter-tainer marked this pull request as ready for review May 21, 2022 04:09
@Enter-tainer
Copy link
Contributor Author

FWIW it's totally reasonable to make Syntax implement Send. Syntax is immutable, and it only mutates the Cells during init.

Yeah, I'm working on this. I've dived into it a little and I found that if we are to make &Vertex Send, Vertex itself must be Sync. Therefore, Syntax and SyntaxInfo must be Sync too. Unfortunately, SyntaxInfo contains Cell and it is !Sync. I'm trying to replace it with something like OnceCell. But I'm not sure whether it will work or not. Because OnceCell requires at most 1 time assignment. So if the init process assigns the cells more than once, this may not work.

@Wilfred
Copy link
Owner

Wilfred commented May 21, 2022

Wait, are you sure that BFS is giving the correct answer? It's a DAG, but the edges are weighted. BFS will find the route with the fewest edges, which isn't the same as the shortest route .

I see a significant number of sample files have changed their output. I'd expected these not to change.

@Enter-tainer
Copy link
Contributor Author

Wait, are you sure that BFS is giving the correct answer? It's a DAG, but the edges are weighted. BFS will find the route with the fewest edges, which isn't the same as the shortest route .

I see a significant number of sample files have changed their output. I'd expected these not to change.

Oh I think I've missed something, I will fix it. Maybe using dp.

@Wilfred
Copy link
Owner

Wilfred commented May 22, 2022

I've merged the arena allocation improvements in 1646d45 :)

Wilfred added a commit that referenced this pull request May 22, 2022
This reduces instruction counts by 4-6% on the sample files.

Thanks to @Enter-tainer for suggesting this in #290.
@Enter-tainer Enter-tainer changed the title perf: optimization by introducing arena and bfs perf: optimization by introducing bfs May 22, 2022
@Enter-tainer Enter-tainer marked this pull request as draft May 22, 2022 16:37
@Enter-tainer Enter-tainer changed the title perf: optimization by introducing bfs perf: optimization by introducing bfs and topological sort May 22, 2022
@Enter-tainer
Copy link
Contributor Author

Hi, I profile on the latest master branch. And I found that it is pretty fast! The performance is very similar to the wrong BFS code in this PR.

image

And currently, the bottleneck seems to be neighbour and hashmap-related functions. For the hashmap, I guess this may result from the large memory footprint, making it cache inefficient. Profile result shows a 20% cache miss.

So code in this PR might not be helpful. But I will still investigate performance optimizations! I will send a new PR if I can further optimize difftastic.

Wilfred pushed a commit that referenced this pull request Mar 16, 2023
Add `nameof` contextual keyword test cases
Wilfred pushed a commit that referenced this pull request Jun 13, 2023
chore: generate and sync latest changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants