perf: optimization by introducing bfs and topological sort #290

Enter-tainer · 2022-05-20T02:52:55Z

For a DAG, we can find its shortest path by using bfs in $O(V+E)$, while dijkstra takes $O(V+E\log V)$ time. BTW, I guess maybe we can introduce heuristic search algorithms like A* to further optimize it.

I also use the arena to allocate nodes to avoid the overhead of refcount.

As for the performance issue discussed in #236, it took ~25s on my machine, and currently, it only takes ~15s.

A unit test is failing and some cleaning work is needed. I'm still working on this. Feel free to give comments!

Wilfred · 2022-05-20T17:09:32Z

Wow, this is really awesome. I've spent a lot of time trying to make this code faster, so it's amazing to see more performance wins!

Do you have any sense of how much of the perf win is arenas, vs how much is switching to DFS (edit: oops, I meant BFS)?

I did experiment with A* search in this branch: https://github.com/Wilfred/difftastic/tree/a_star_module but I wasn't able to get a performance improvement and I was concerned it was significantly more complex. It's quite possible I was doing something wrong.

Enter-tainer · 2022-05-20T17:22:52Z

I've done some profiles and I found that it takes a lot of time(30%) to drop the keys and values of the hashmap.

I guess it is because of the overhead introduced by Rc. So I switch to arenas and try to use references to avoid this issue. And the profile result seems good -- drop only takes 10% time.

Enter-tainer · 2022-05-20T17:26:47Z

And it seems that quite a lot of time is used on the rehash of hashmap. This happens when inserting into the hashmap and it is not large enough to hold the elements. In this situation, all data will be rehashed which is really expensive. So if we are able to estimate the vertex count before actually starting finding the shortest path, we can reserve the hashmap.

As for the DFS, I'm not pretty sure but I think naive DFS may not be able to find the shortest path. The BFS shortest path algorithm for DAG is correct because the nearer the vertex is, the earlier it is added to the BFS queue. So when we find the end vertex, the shortest path is found. Maybe we can use dp and DFS to solve this: $\displaystyle cost(V) = \min_{(V, V_i) \in E}cost(V_i) + edge(V, V_i)$. The time complexity of dp is the same as BFS: $O(V+E)$. But I think the dp algorithm still needs a hashmap or something to store the intermediate costs. I'm not pretty sure whether this will be faster or not.

And there are some statistics when I was testing:

baseline: ~25s
bfs: ~20s
bfs+arena: ~15s

I also did investigations on parallel algorithms such as parallel BFS. But I found it really hard because Vertex is not Send because it contains Cell<T>. This makes it impossible to send it around threads 😢.

Enter-tainer · 2022-05-20T17:46:55Z

I'm not familiar with the unit test. The error msg is as follows:

failures:

---- dijkstra::tests::atom_after_empty_list stdout ----
thread 'dijkstra::tests::atom_after_empty_list' panicked at 'assertion failed: `(left == right)`
  left: `[EnterNovelDelimiterLHS { contiguous: false }, EnterNovelDelimiterRHS { contiguous: false }, UnchangedNode { depth_difference: 0 }, UnchangedNode { depth_difference: 0 }, ExitDelimiterLHS, ExitDelimiterRHS]`,
 right: `[EnterNovelDelimiterRHS { contiguous: false }, EnterNovelDelimiterLHS { contiguous: false }, UnchangedNode { depth_difference: 0 }, UnchangedNode { depth_difference: 0 }, ExitDelimiterRHS, ExitDelimiterLHS]`', src/dijkstra.rs:320:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dijkstra::tests::atom_after_empty_list

test result: FAILED. 102 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.04s

It seems that everything is equal, except for LHS and RHS being swapped. Any ideas on this?

Wilfred · 2022-05-20T21:06:45Z

Just update that test: the two routes have the same length, so both are correct.

Wilfred · 2022-05-21T01:21:31Z

FWIW it's totally reasonable to make Syntax implement Send. Syntax is immutable, and it only mutates the Cells during init.

Enter-tainer · 2022-05-21T04:31:13Z

FWIW it's totally reasonable to make Syntax implement Send. Syntax is immutable, and it only mutates the Cells during init.

Yeah, I'm working on this. I've dived into it a little and I found that if we are to make &Vertex Send, Vertex itself must be Sync. Therefore, Syntax and SyntaxInfo must be Sync too. Unfortunately, SyntaxInfo contains Cell and it is !Sync. I'm trying to replace it with something like OnceCell. But I'm not sure whether it will work or not. Because OnceCell requires at most 1 time assignment. So if the init process assigns the cells more than once, this may not work.

Wilfred · 2022-05-21T04:43:57Z

Wait, are you sure that BFS is giving the correct answer? It's a DAG, but the edges are weighted. BFS will find the route with the fewest edges, which isn't the same as the shortest route .

I see a significant number of sample files have changed their output. I'd expected these not to change.

Enter-tainer · 2022-05-21T04:46:12Z

Wait, are you sure that BFS is giving the correct answer? It's a DAG, but the edges are weighted. BFS will find the route with the fewest edges, which isn't the same as the shortest route .

I see a significant number of sample files have changed their output. I'd expected these not to change.

Oh I think I've missed something, I will fix it. Maybe using dp.

Wilfred · 2022-05-22T05:28:33Z

I've merged the arena allocation improvements in 1646d45 :)

@Enter-tainer

This reduces instruction counts by 4-6% on the sample files. Thanks to @Enter-tainer for suggesting this in #290.

Enter-tainer · 2022-05-28T06:07:15Z

Hi, I profile on the latest master branch. And I found that it is pretty fast! The performance is very similar to the wrong BFS code in this PR.

And currently, the bottleneck seems to be neighbour and hashmap-related functions. For the hashmap, I guess this may result from the large memory footprint, making it cache inefficient. Profile result shows a 20% cache miss.

So code in this PR might not be helpful. But I will still investigate performance optimizations! I will send a new PR if I can further optimize difftastic.

Add `nameof` contextual keyword test cases

chore: generate and sync latest changes

Enter-tainer added 2 commits May 20, 2022 02:03

use bfs to calculate shortest path

e7d7d92

chore: apply clippy

af90231

Enter-tainer added 2 commits May 21, 2022 12:07

test: change testcase to pass ut

f5d1684

chore: dijkstra -> bfs

fb9648d

Enter-tainer marked this pull request as ready for review May 21, 2022 04:09

fix: shortest path and apply fmt

ab7d3ac

Enter-tainer force-pushed the perf branch from ab7d3ac to fb9648d Compare May 21, 2022 07:56

Merge branch 'master' into perf

b0616f5

Wilfred added a commit that referenced this pull request May 22, 2022

Preallocate the predecessors hashmap to reduce rehashing/reallocations

1fefc27

This reduces instruction counts by 4-6% on the sample files. Thanks to @Enter-tainer for suggesting this in #290.

Merge branch 'master' into perf

400e416

Enter-tainer changed the title ~~perf: optimization by introducing arena and bfs~~ perf: optimization by introducing bfs May 22, 2022

Enter-tainer marked this pull request as draft May 22, 2022 16:37

Enter-tainer changed the title ~~perf: optimization by introducing bfs~~ perf: optimization by introducing bfs and topological sort May 22, 2022

Enter-tainer added 5 commits May 24, 2022 22:17

Merge branch 'master' into perf

a728d78

keep in track with upstream and resolve conflicts

eecb8df

add bfs

7879904

update code to keep up with upstream changes

cdf87ff

fmt code

84a7a0f

Enter-tainer closed this May 28, 2022

Wilfred pushed a commit that referenced this pull request Mar 16, 2023

Merge pull request #290 from tamasvajk/contextual-keywords-nameof

eed2576

Add `nameof` contextual keyword test cases

Wilfred pushed a commit that referenced this pull request Jun 13, 2023

Merge pull request #290 from tree-sitter/generation

007a184

chore: generate and sync latest changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: optimization by introducing bfs and topological sort #290

perf: optimization by introducing bfs and topological sort #290

Enter-tainer commented May 20, 2022 •

edited

Wilfred commented May 20, 2022 •

edited

Enter-tainer commented May 20, 2022

Enter-tainer commented May 20, 2022 •

edited

Enter-tainer commented May 20, 2022 •

edited

Wilfred commented May 20, 2022

Wilfred commented May 21, 2022

Enter-tainer commented May 21, 2022

Wilfred commented May 21, 2022

Enter-tainer commented May 21, 2022

Wilfred commented May 22, 2022

Enter-tainer commented May 28, 2022

perf: optimization by introducing bfs and topological sort #290

perf: optimization by introducing bfs and topological sort #290

Conversation

Enter-tainer commented May 20, 2022 • edited

Wilfred commented May 20, 2022 • edited

Enter-tainer commented May 20, 2022

Enter-tainer commented May 20, 2022 • edited

Enter-tainer commented May 20, 2022 • edited

Wilfred commented May 20, 2022

Wilfred commented May 21, 2022

Enter-tainer commented May 21, 2022

Wilfred commented May 21, 2022

Enter-tainer commented May 21, 2022

Wilfred commented May 22, 2022

Enter-tainer commented May 28, 2022

Enter-tainer commented May 20, 2022 •

edited

Wilfred commented May 20, 2022 •

edited

Enter-tainer commented May 20, 2022 •

edited

Enter-tainer commented May 20, 2022 •

edited