
Fixes the performance issue in Project() #2383

Merged: 1 commit, Jan 15, 2023


sighingnow (Collaborator) commented Jan 15, 2023

What do these changes do?

ProjectToSimple() needs to scan all edges to establish the new offset ranges. The previous implementation was slow (many times slower than the edge scan in PageRank) because:

  • It used a linear scan to find the start and end points of the edges with a given label, which costs O(E) over all vertices. Since the CSR is sorted internally, a binary search (bisection) is enough.
  • It processed the vertices in a single thread.
  • The method getRangeOfLabel takes a std::shared_ptr by value, so a shared pointer is constructed and destroyed O(V) times, and each copy atomically updates the reference count.
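The first and third fixes can be illustrated together with a minimal sketch. It assumes a hypothetical CSR in which each vertex's neighbor slice is sorted by edge label; the `Edge` and `Csr` types and this `GetRangeOfLabel` signature are illustrative only and are not GraphScope's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical CSR layout, for illustration only: the neighbors of vertex v
// are edges[offsets[v] .. offsets[v + 1]), sorted by edge label.
struct Edge {
  uint32_t neighbor;
  uint8_t label;
};

struct Csr {
  std::vector<size_t> offsets;  // V + 1 entries
  std::vector<Edge> edges;      // E entries
};

// Returns the [begin, end) edge-index range of `label` for vertex `v`.
// Taking the shared_ptr by const reference avoids an atomic reference-count
// increment and decrement on every call (a plain `const Csr&` works too).
std::pair<size_t, size_t> GetRangeOfLabel(const std::shared_ptr<Csr>& csr,
                                          size_t v, uint8_t label) {
  assert(v + 1 < csr->offsets.size());
  auto base = csr->edges.begin();
  auto first = base + csr->offsets[v];
  auto last = base + csr->offsets[v + 1];
  // Two binary searches, O(log d) for a vertex of degree d, instead of a
  // linear scan over the whole neighbor slice.
  auto lo = std::lower_bound(first, last, label,
                             [](const Edge& e, uint8_t l) { return e.label < l; });
  auto hi = std::upper_bound(lo, last, label,
                             [](uint8_t l, const Edge& e) { return l < e.label; });
  return {static_cast<size_t>(lo - base), static_cast<size_t>(hi - base)};
}
```

For a vertex whose neighbor labels are {0, 1, 1, 2}, label 1 maps to the index range [1, 3) without touching the other edges.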

This PR optimizes the implementation. On an internal dataset, benchmarks show that the Project() operation drops from 2–3 seconds to 0.02–0.1 seconds, including the overhead of creating threads for parallelism. For comparison, PageRank needs about 0.02–0.03 seconds to scan the edges.
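Combining the fixes, a projection can be sketched in three passes: a parallel binary search per vertex, a prefix sum that establishes the new offset ranges, and a parallel copy. This is a minimal sketch under stated assumptions: `Edge`, `Csr`, `ParallelFor` and `ProjectToLabel` are hypothetical names, vertices are split evenly across plain `std::thread`s, and the real ProjectToSimple() may be structured differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CSR layout, for illustration only: the neighbors of vertex v
// are edges[offsets[v] .. offsets[v + 1]), sorted by edge label.
struct Edge {
  uint32_t neighbor;
  uint8_t label;
};

struct Csr {
  std::vector<size_t> offsets;  // V + 1 entries
  std::vector<Edge> edges;      // E entries
};

// Splits [0, n) into contiguous chunks and runs fn(begin, end) on one
// std::thread per chunk. Threads are created on every call.
template <typename Fn>
void ParallelFor(size_t n, unsigned num_threads, Fn fn) {
  std::vector<std::thread> workers;
  size_t chunk = (n + num_threads - 1) / num_threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    size_t begin = std::min(n, t * chunk);
    size_t end = std::min(n, begin + chunk);
    if (begin < end) workers.emplace_back(fn, begin, end);
  }
  for (auto& w : workers) w.join();
}

// Builds a new CSR that keeps only the edges carrying `label`.
Csr ProjectToLabel(const Csr& in, uint8_t label, unsigned num_threads) {
  assert(num_threads > 0 && !in.offsets.empty());
  const size_t num_vertices = in.offsets.size() - 1;
  std::vector<size_t> begins(num_vertices), ends(num_vertices);

  // Pass 1 (parallel): binary-search each vertex's label range.
  ParallelFor(num_vertices, num_threads, [&](size_t lo, size_t hi) {
    auto base = in.edges.begin();
    for (size_t v = lo; v < hi; ++v) {
      auto first = base + in.offsets[v];
      auto last = base + in.offsets[v + 1];
      auto range_begin = std::lower_bound(first, last, label,
          [](const Edge& e, uint8_t l) { return e.label < l; });
      auto range_end = std::upper_bound(range_begin, last, label,
          [](uint8_t l, const Edge& e) { return l < e.label; });
      begins[v] = static_cast<size_t>(range_begin - base);
      ends[v] = static_cast<size_t>(range_end - base);
    }
  });

  // Pass 2 (sequential): a prefix sum over range sizes yields new offsets.
  Csr out;
  out.offsets.assign(num_vertices + 1, 0);
  for (size_t v = 0; v < num_vertices; ++v) {
    out.offsets[v + 1] = out.offsets[v] + (ends[v] - begins[v]);
  }
  out.edges.resize(out.offsets[num_vertices]);

  // Pass 3 (parallel): copy each vertex's slice to its new position.
  ParallelFor(num_vertices, num_threads, [&](size_t lo, size_t hi) {
    for (size_t v = lo; v < hi; ++v) {
      std::copy(in.edges.begin() + begins[v], in.edges.begin() + ends[v],
                out.edges.begin() + out.offsets[v]);
    }
  });
  return out;
}
```

Because this `ParallelFor` spawns fresh threads on every call, its start-up cost is fixed regardless of graph size, which is one plausible source of the thread-creation overhead mentioned above; reusing a thread pool could amortize it.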

Related issue number

Part of #1300

codecov-commenter commented Jan 15, 2023

Codecov Report

Merging #2383 (e7549dc) into main (45cec05) will decrease coverage by 0.76%.
The diff coverage is 100.00%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2383      +/-   ##
==========================================
- Coverage   73.20%   72.43%   -0.77%     
==========================================
  Files          88       88              
  Lines        9752     9753       +1     
==========================================
- Hits         7139     7065      -74     
- Misses       2613     2688      +75     
| Impacted Files | Coverage Δ |
|---|---|
| python/graphscope/config.py | 100.00% <100.00%> (ø) |
| python/graphscope/tests/unittest/test_context.py | 81.35% <100.00%> (+0.32%) ⬆️ |
| python/graphscope/tests/unittest/test_java_app.py | 52.38% <0.00%> (-47.62%) ⬇️ |
| python/graphscope/analytical/app/java_app.py | 24.36% <0.00%> (-4.57%) ⬇️ |
| python/graphscope/client/rpc.py | 81.34% <0.00%> (-2.99%) ⬇️ |
| python/graphscope/framework/graph_utils.py | 80.15% <0.00%> (-2.39%) ⬇️ |
| python/graphscope/client/session.py | 75.37% <0.00%> (-1.51%) ⬇️ |
| python/graphscope/framework/app.py | 90.32% <0.00%> (-1.39%) ⬇️ |
| python/graphscope/framework/utils.py | 67.50% <0.00%> (-1.00%) ⬇️ |
| python/graphscope/framework/graph_schema.py | 64.61% <0.00%> (-0.23%) ⬇️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45cec05...e7549dc.

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
@sighingnow sighingnow merged commit 379a881 into alibaba:main Jan 15, 2023
@sighingnow sighingnow deleted the ht/fix-project branch January 15, 2023 13:11