
Preparation for benchmark in hpc #22

Merged · 20 commits · Jan 13, 2021

Conversation

georgegito
Collaborator

No description provided.

@georgegito
Collaborator Author

georgegito commented Jan 10, 2021

As the number of processes increases, the average number of nodes each query point visits increases, but the total time decreases:

  • Parallelism outweighs this increase in visits.

  • Check cache misses.

  • Analyze VPT search behavior for an increasing number of processors.

@georgegito
Collaborator Author

georgegito commented Jan 10, 2021

We need to choose a reasonable b value with respect to the local n.

Update: b = 0.3 * log2(n / world_size)

Repository owner deleted a comment from georgegito Jan 10, 2021
@Stavrosfil
Owner

We can also measure how long it takes to calculate each local knn based on the tree, and how long each data transaction takes, to search for possible bottlenecks.

@georgegito
Collaborator Author

v1: Almost perfect speedup (check cache misses).
v2: The number of visited nodes increases with the number of processors, so the speedup is slightly worse.

@georgegito
Collaborator Author

> We can also measure how long it takes to calculate each local knn based on the tree, and how long each data transaction takes, to search for possible bottlenecks.

Sure, we could choose some of our datasets and analyze them in depth.

@Stavrosfil
Owner

Stavrosfil commented Jan 11, 2021

I will perform an in-depth cache-miss and branch-miss analysis for the single-threaded instance of the problem, and perhaps a simple profiling report for the speedup analysis.
Appropriate plots will be included in the final report.

@georgegito
Collaborator Author

We could compare the average communication time of v1 versus v2 and analyze the tree-sending time.

@Stavrosfil
Owner

Yes, each procedure in the algorithm should be analyzed to an extent. Communication times, tree build and rebuild, and processing times should be measured for comparison.

We should also take b and the y block size into account, both for speed and for cache misses.

@georgegito
Collaborator Author

> Yes, each procedure in the algorithm should be analyzed to an extent. Communication times, tree build and rebuild, and processing times should be measured for comparison.
>
> We should also take b and the y block size into account, both for speed and for cache misses.

If we take b and the y block size into account too, we will need 10 pages for the report…

@Stavrosfil Stavrosfil linked an issue Jan 11, 2021 that may be closed by this pull request
@Stavrosfil
Owner

Stavrosfil commented Jan 11, 2021

Let's move the discussion to #23 to avoid clutter here.

@Stavrosfil Stavrosfil merged commit 147c3bf into master Jan 13, 2021
Successfully merging this pull request may close: Benchmarking - Profiling discussion