Docs: add table with amount of memory required at different matrix sizes #3

Closed
DilumAluthge opened this issue Jan 20, 2021 · 5 comments · Fixed by #6

@DilumAluthge
Member

E.g. here's what I'm envisioning:

| Matrix Size | Required Memory |
| ----------- | --------------- |
| 10 by 10 | 123 MB |
| 100 by 100 | 456 MB |
| 1k by 1k | 789 MB |
| 10k by 10k | 123 GB |
| 50k by 50k | 456 GB |
| 73k by 73k | 789 GB |
| 100k by 100k | 123 TB |

I just made those numbers up - obviously they are incorrect. But it would be nice to have a table like that, with the actual numbers.

That way, when people want to run these benchmarks on their own computer, they can just:

  1. Look up how much RAM their computer has
  2. Cross-reference with this table to figure out the biggest matrix size they can use

Also, if someone wants to run the benchmarks on a cluster, they can ask their scheduler (e.g. SLURM) for the necessary amount of memory.

@chriselrod
Collaborator

Worth pointing out that it tries to be reasonably efficient, i.e. it reuses the same memory for all of the arrays being benchmarked, so only the largest matrix size matters.
This is important because of a BenchmarkTools bug that stops memory used in benchmarks from being freed if you interpolate the arrays into the benchmark expressions.

This gives the memory requirement, in GiB, for a given (largest) matrix size:

julia> mem_req(s, T = Float64) = 3s^2*sizeof(T) / (1 << 30)
mem_req (generic function with 2 methods)

julia> mem_req(10_000)
2.2351741790771484

julia> mem_req(20_000)
8.940696716308594

So for 10_000 x 10_000, the three matrices would take up about 2.25 GiB. Of course, some extra memory is needed for the Julia process, the OS, etc.
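
As a rough sanity check, Julia can report the machine's physical memory via `Sys.total_memory()`. A minimal sketch (the `fits` helper name is hypothetical, and the 0.9 headroom factor is an arbitrary assumption):

```julia
# Sketch: does the estimated working set fit comfortably in physical RAM?
# Sys.total_memory() returns the machine's physical memory in bytes;
# mem_req returns GiB, so multiply by 2^30 to compare in bytes.
# The 0.9 factor leaves headroom for the OS and the Julia process.
fits(s, T = Float64) = mem_req(s, T) * 2^30 < 0.9 * Sys.total_memory()
```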

2.25 GiB isn't too bad. However, if you have an 8-core (16-thread) computer capable of running full-rate AVX2 at 4 GHz, you have

julia> 4 * (4 + 4) * 2 * 8
512

512 theoretical peak GFLOPS.
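
Presumably the factors in that product break down as follows (an assumption about the hardware, not a measurement):

```julia
# Assumed breakdown of the 4 * (4 + 4) * 2 * 8 product above:
ghz           = 4      # clock speed in GHz
fma_lanes     = 4 + 4  # two 256-bit FMA units × 4 Float64 lanes each
flops_per_fma = 2      # a fused multiply-add counts as two flops
cores         = 8
peak_gflops   = ghz * fma_lanes * flops_per_fma * cores  # == 512
```
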
A dense matrix multiplication takes about 2·size³ floating-point operations (size³ multiplies and size³ adds), so each multiplication would take a minimum of

julia> time_est(sz, gflops) = 2e-9 * sz^3 / gflops
time_est (generic function with 1 method)

julia> time_est(10_000, 512)
3.9062500000000004

julia> time_est(20_000, 512)
31.250000000000004

3.9 or 31 seconds. Running a lot of benchmarks with big matrices can take a while.
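
Putting the two estimates together (a hypothetical `budget` helper, sketched on top of the `mem_req` and `time_est` definitions above and the assumed 512 GFLOPS peak):

```julia
# Sketch: print the memory estimate and the per-multiply time floor for a
# given matrix size.  The gflops default is the theoretical peak assumed
# above, not a measured number.
function budget(sz; gflops = 512)
    mem  = round(mem_req(sz); digits = 2)
    secs = round(time_est(sz, gflops); digits = 1)
    println("$sz x $sz: about $mem GiB, at least $secs s per multiply")
end

budget(10_000)  # 10000 x 10000: about 2.24 GiB, at least 3.9 s per multiply
```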

@DilumAluthge
Member Author

Based on the above mem_req function, here's what I get:


Float64

| Matrix Size | Memory |
| ----------- | ------ |
| 1k by 1k | 0.02 GiB |
| 2k by 2k | 0.09 GiB |
| 3k by 3k | 0.2 GiB |
| 4k by 4k | 0.36 GiB |
| 5k by 5k | 0.56 GiB |
| 6k by 6k | 0.8 GiB |
| 7k by 7k | 1.1 GiB |
| 8k by 8k | 1.43 GiB |
| 9k by 9k | 1.81 GiB |
| 10k by 10k | 2.24 GiB |
| 11k by 11k | 2.7 GiB |
| 12k by 12k | 3.22 GiB |
| 13k by 13k | 3.78 GiB |
| 14k by 14k | 4.38 GiB |
| 15k by 15k | 5.03 GiB |
| 16k by 16k | 5.72 GiB |
| 17k by 17k | 6.46 GiB |
| 18k by 18k | 7.24 GiB |
| 19k by 19k | 8.07 GiB |
| 20k by 20k | 8.94 GiB |
| 30k by 30k | 20.12 GiB |
| 40k by 40k | 35.76 GiB |
| 50k by 50k | 55.88 GiB |
| 60k by 60k | 80.47 GiB |
| 70k by 70k | 109.52 GiB |
| 80k by 80k | 143.05 GiB |
| 90k by 90k | 181.05 GiB |
| 100k by 100k | 223.52 GiB |

Float32

| Matrix Size | Memory |
| ----------- | ------ |
| 1k by 1k | 0.01 GiB |
| 2k by 2k | 0.04 GiB |
| 3k by 3k | 0.1 GiB |
| 4k by 4k | 0.18 GiB |
| 5k by 5k | 0.28 GiB |
| 6k by 6k | 0.4 GiB |
| 7k by 7k | 0.55 GiB |
| 8k by 8k | 0.72 GiB |
| 9k by 9k | 0.91 GiB |
| 10k by 10k | 1.12 GiB |
| 11k by 11k | 1.35 GiB |
| 12k by 12k | 1.61 GiB |
| 13k by 13k | 1.89 GiB |
| 14k by 14k | 2.19 GiB |
| 15k by 15k | 2.51 GiB |
| 16k by 16k | 2.86 GiB |
| 17k by 17k | 3.23 GiB |
| 18k by 18k | 3.62 GiB |
| 19k by 19k | 4.03 GiB |
| 20k by 20k | 4.47 GiB |
| 30k by 30k | 10.06 GiB |
| 40k by 40k | 17.88 GiB |
| 50k by 50k | 27.94 GiB |
| 60k by 60k | 40.23 GiB |
| 70k by 70k | 54.76 GiB |
| 80k by 80k | 71.53 GiB |
| 90k by 90k | 90.52 GiB |
| 100k by 100k | 111.76 GiB |

Int64

| Matrix Size | Memory |
| ----------- | ------ |
| 1k by 1k | 0.02 GiB |
| 2k by 2k | 0.09 GiB |
| 3k by 3k | 0.2 GiB |
| 4k by 4k | 0.36 GiB |
| 5k by 5k | 0.56 GiB |
| 6k by 6k | 0.8 GiB |
| 7k by 7k | 1.1 GiB |
| 8k by 8k | 1.43 GiB |
| 9k by 9k | 1.81 GiB |
| 10k by 10k | 2.24 GiB |
| 11k by 11k | 2.7 GiB |
| 12k by 12k | 3.22 GiB |
| 13k by 13k | 3.78 GiB |
| 14k by 14k | 4.38 GiB |
| 15k by 15k | 5.03 GiB |
| 16k by 16k | 5.72 GiB |
| 17k by 17k | 6.46 GiB |
| 18k by 18k | 7.24 GiB |
| 19k by 19k | 8.07 GiB |
| 20k by 20k | 8.94 GiB |
| 30k by 30k | 20.12 GiB |
| 40k by 40k | 35.76 GiB |
| 50k by 50k | 55.88 GiB |
| 60k by 60k | 80.47 GiB |
| 70k by 70k | 109.52 GiB |
| 80k by 80k | 143.05 GiB |
| 90k by 90k | 181.05 GiB |
| 100k by 100k | 223.52 GiB |

Int32

| Matrix Size | Memory |
| ----------- | ------ |
| 1k by 1k | 0.01 GiB |
| 2k by 2k | 0.04 GiB |
| 3k by 3k | 0.1 GiB |
| 4k by 4k | 0.18 GiB |
| 5k by 5k | 0.28 GiB |
| 6k by 6k | 0.4 GiB |
| 7k by 7k | 0.55 GiB |
| 8k by 8k | 0.72 GiB |
| 9k by 9k | 0.91 GiB |
| 10k by 10k | 1.12 GiB |
| 11k by 11k | 1.35 GiB |
| 12k by 12k | 1.61 GiB |
| 13k by 13k | 1.89 GiB |
| 14k by 14k | 2.19 GiB |
| 15k by 15k | 2.51 GiB |
| 16k by 16k | 2.86 GiB |
| 17k by 17k | 3.23 GiB |
| 18k by 18k | 3.62 GiB |
| 19k by 19k | 4.03 GiB |
| 20k by 20k | 4.47 GiB |
| 30k by 30k | 10.06 GiB |
| 40k by 40k | 17.88 GiB |
| 50k by 50k | 27.94 GiB |
| 60k by 60k | 40.23 GiB |
| 70k by 70k | 54.76 GiB |
| 80k by 80k | 71.53 GiB |
| 90k by 90k | 90.52 GiB |
| 100k by 100k | 111.76 GiB |

mem_req(s, T) = 3s^2*sizeof(T) / (1 << 30)

function f(::Type{T}, Ns = nothing) where {T}
    println("| Matrix Size | Memory |")
    println("| ----------- | ------ |")
    if Ns isa Nothing
        _Ns = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    else 
        _Ns = Ns
    end
    for N in _Ns 
        mem = mem_req(N * 1_000, T)
        m = round(mem; digits = 2)
        println("| $(N)k by $(N)k | $(m) GiB |")
    end
    return nothing
end
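
The four tables above can then be regenerated by calling `f` once per element type (assuming the definitions above):

```julia
# One markdown table per element type, matching the tables above.
f(Float64)
f(Float32)
f(Int64)
f(Int32)
```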

@DilumAluthge
Member Author

I'll make a PR to add those tables to a page in the docs.

@chriselrod
Collaborator

chriselrod commented Jan 20, 2021

Sorry, I realized it creates two C matrices, so that it can check that all the results match.
So mem_req should be multiplying by 4, not 3, since we need 4 matrices total.
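
For example, the adjusted estimate would be (a sketch based on this comment; the benchmark's own code may differ):

```julia
# Four s-by-s matrices: A, B, and two copies of C (one per implementation,
# so the results can be cross-checked).  Result is in GiB.
mem_req(s, T = Float64) = 4 * s^2 * sizeof(T) / (1 << 30)
```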

@DilumAluthge
Member Author

Ah good point. I'll regenerate the tables.
