Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.9 rc1 significant slowdown in garbage collection #49120

Closed
denglerchr opened this issue Mar 23, 2023 · 16 comments
Closed

v1.9 rc1 significant slowdown in garbage collection #49120

denglerchr opened this issue Mar 23, 2023 · 16 comments
Labels
GC Garbage collector kind:regression Regression in behavior compared to a previous version
Milestone

Comments

@denglerchr
Copy link

I have a significant slowdown for some code in Julia 1.9 rc1 which I could track down to garbage collection during filling of arrays. There seems to be something fishy with v1.9 rc1.
Below is some reduced code to reproduce the issue. The garbage collection is here significantly slower for Julia 1.9 compared to previous versions.
Total runtime for ´julia test.jl' on my machine is

  • around 5 s for Julia 1.8
  • around 94 s for Julia 1.9
using Random: seed!
seed!(1)

abstract type Cell end

struct CellA<:Cell
    a::Int
end

struct CellB<:Cell
    b::String
end

function fillcells!(mc::Array{Cell})
    for ind in eachindex(mc)
        mc[ind] = ifelse(rand() > 0.5, CellA(ind), CellB(string(ind)))
    end
    return mc
end

mcells = Array{Cell}(undef, 5000, 5000 )
t1 = @elapsed fillcells!(mcells)
t2 = @elapsed fillcells!(mcells)

println("filling: $t1 s\nfilling again: $t2 s")

I saw some GC issues in #48923 , not sure if this is related. If so, let me know and I will close this.

@oscardssmith
Copy link
Member

Is this a mac? If so this is a known issue that's fixed in RC2.

@KristofferC
Copy link
Sponsor Member

Might be #48923

@denglerchr
Copy link
Author

denglerchr commented Mar 23, 2023

No, it is on Linux unfortunately. Also issue remains on nightly 703b3f8.

@d-netto
Copy link
Member

d-netto commented Mar 23, 2023

The number of loaded packages is small, so I don't think #48923 could be affecting this benchmark.

FWIW, we are doing a lot more collections on 1.9:

  • 1.8.5:
GC: pause 4.01ms. collected 10.198471MB. incr 
GC: pause 34.94ms. collected 34.963833MB. incr 
GC: pause 84.85ms. collected 75.577256MB. incr 
GC: pause 109.72ms. collected 79.675280MB. full 
GC: pause 213.68ms. collected 85.997408MB. full 
GC: pause 321.27ms. collected 215.055408MB. full 
GC: pause 591.63ms. collected 533.485312MB. full 
GC: pause 1086.78ms. collected 1378.285368MB. full 
filling: 3.566998475 s
filling again: 3.228617506 s
  • 1.9.0-rc1:
GC: pause 67.15ms. collected 6.790489MB. incr 
GC: pause 36.52ms. collected 34.751601MB. incr 
GC: pause 55.13ms. collected 34.410968MB. incr 
GC: pause 75.47ms. collected 34.403344MB. full 
GC: pause 152.60ms. collected 34.417240MB. full 
GC: pause 169.60ms. collected 34.399952MB. full 
GC: pause 185.98ms. collected 34.403560MB. full 
GC: pause 202.43ms. collected 34.388320MB. full 
GC: pause 263.89ms. collected 34.406648MB. full 
GC: pause 236.16ms. collected 34.409280MB. full 
GC: pause 257.39ms. collected 34.406736MB. full 
GC: pause 269.43ms. collected 34.408752MB. full 
GC: pause 286.03ms. collected 34.406968MB. full 
GC: pause 327.78ms. collected 34.416368MB. full 
GC: pause 320.19ms. collected 34.405104MB. full 
GC: pause 369.43ms. collected 34.418976MB. full 
GC: pause 397.49ms. collected 34.403640MB. full 
GC: pause 370.10ms. collected 34.413672MB. full 
GC: pause 402.93ms. collected 34.407912MB. full 
GC: pause 403.85ms. collected 34.408168MB. full 
GC: pause 457.09ms. collected 34.407040MB. full 
GC: pause 442.94ms. collected 34.417576MB. full 
GC: pause 493.65ms. collected 34.397328MB. full 
GC: pause 483.23ms. collected 34.423648MB. full 
GC: pause 487.27ms. collected 34.411808MB. full 
GC: pause 541.25ms. collected 33.980184MB. full 
GC: pause 516.42ms. collected 33.632160MB. full 
GC: pause 560.09ms. collected 33.627168MB. full 
GC: pause 559.40ms. collected 33.642096MB. full 
GC: pause 589.34ms. collected 33.638320MB. full 
GC: pause 581.41ms. collected 33.664896MB. full 
GC: pause 641.15ms. collected 33.639904MB. full 
GC: pause 615.90ms. collected 33.627584MB. full 
GC: pause 630.25ms. collected 33.646992MB. full 
GC: pause 688.77ms. collected 33.634736MB. full 
GC: pause 663.60ms. collected 33.628400MB. full 
GC: pause 711.74ms. collected 33.650528MB. full 
GC: pause 696.64ms. collected 33.641456MB. full 
GC: pause 749.13ms. collected 33.637648MB. full 
GC: pause 730.56ms. collected 33.657232MB. full 
GC: pause 747.19ms. collected 33.634912MB. full 
GC: pause 803.27ms. collected 33.653600MB. full 
GC: pause 779.60ms. collected 33.642720MB. full 
GC: pause 839.74ms. collected 33.643504MB. full 
GC: pause 821.32ms. collected 33.637008MB. full 
GC: pause 842.50ms. collected 33.641488MB. full 
GC: pause 846.64ms. collected 33.651024MB. full 
GC: pause 861.75ms. collected 33.657360MB. full 
GC: pause 910.56ms. collected 33.649168MB. full 
GC: pause 895.63ms. collected 33.632240MB. full 
GC: pause 961.85ms. collected 33.636976MB. full 
GC: pause 928.86ms. collected 33.657312MB. full 
GC: pause 945.02ms. collected 33.620464MB. full 
GC: pause 999.88ms. collected 33.635504MB. full 
GC: pause 982.78ms. collected 33.633616MB. full 
GC: pause 1030.34ms. collected 33.629536MB. full 
GC: pause 1012.70ms. collected 33.659360MB. full 
GC: pause 1028.10ms. collected 33.623680MB. full 
GC: pause 1087.57ms. collected 33.648432MB. full 
GC: pause 1061.18ms. collected 33.641808MB. full 
GC: pause 1080.80ms. collected 33.647808MB. full 
GC: pause 1137.38ms. collected 33.644848MB. full 
GC: pause 1110.80ms. collected 33.642080MB. full 
GC: pause 1127.25ms. collected 33.655104MB. full 
GC: pause 1190.59ms. collected 37.473696MB. full 
GC: pause 1153.88ms. collected 45.852728MB. full 
GC: pause 1172.37ms. collected 45.884352MB. full 
GC: pause 1161.35ms. collected 45.877512MB. full 
GC: pause 1156.07ms. collected 45.856904MB. full 
GC: pause 1169.86ms. collected 45.887744MB. full 
GC: pause 1155.81ms. collected 45.890536MB. full 
GC: pause 1156.82ms. collected 45.879776MB. full 
GC: pause 1193.41ms. collected 45.858280MB. full 
GC: pause 1160.31ms. collected 45.862328MB. full 
GC: pause 1159.53ms. collected 45.868136MB. full 
GC: pause 1161.25ms. collected 45.871096MB. full 
GC: pause 1204.36ms. collected 45.862464MB. full 
GC: pause 1163.47ms. collected 45.869152MB. full 
GC: pause 1163.83ms. collected 45.880328MB. full 
GC: pause 1212.69ms. collected 45.865936MB. full 
GC: pause 1165.71ms. collected 45.861096MB. full 
GC: pause 1169.89ms. collected 45.869000MB. full 
GC: pause 1167.31ms. collected 45.865880MB. full 
GC: pause 1210.51ms. collected 45.891904MB. full 
GC: pause 1169.93ms. collected 45.860176MB. full 
GC: pause 1173.41ms. collected 45.882840MB. full 
GC: pause 1197.74ms. collected 45.878024MB. full 
GC: pause 1174.25ms. collected 45.849552MB. full 
GC: pause 1174.11ms. collected 45.869592MB. full 
GC: pause 1179.72ms. collected 45.882544MB. full 
GC: pause 1213.05ms. collected 45.882688MB. full 
GC: pause 1181.81ms. collected 45.872368MB. full 
GC: pause 1183.26ms. collected 45.899648MB. full 
GC: pause 1185.66ms. collected 45.871536MB. full 
GC: pause 1185.15ms. collected 45.861456MB. full 
GC: pause 1228.86ms. collected 45.908352MB. full 
GC: pause 1187.62ms. collected 45.881552MB. full 
GC: pause 1192.51ms. collected 45.880400MB. full 
GC: pause 1192.16ms. collected 45.875680MB. full 
GC: pause 1240.61ms. collected 45.879984MB. full 
GC: pause 1194.24ms. collected 45.868464MB. full 
GC: pause 1198.63ms. collected 45.874480MB. full 
GC: pause 1198.35ms. collected 45.856720MB. full 
GC: pause 1216.16ms. collected 45.851456MB. full 
GC: pause 1231.99ms. collected 45.872032MB. full 
GC: pause 1204.07ms. collected 45.865936MB. full 
GC: pause 1203.68ms. collected 45.872208MB. full 
GC: pause 1204.24ms. collected 45.879024MB. full 
GC: pause 1227.46ms. collected 45.879664MB. full 
GC: pause 1211.78ms. collected 45.864832MB. full 
GC: pause 1209.41ms. collected 45.858048MB. full 
GC: pause 1210.23ms. collected 45.813392MB. full 
GC: pause 1212.27ms. collected 45.885424MB. full 
GC: pause 1252.96ms. collected 45.896848MB. full 
GC: pause 1217.07ms. collected 45.866448MB. full 
GC: pause 1217.12ms. collected 45.879232MB. full 
GC: pause 1218.14ms. collected 45.896880MB. full 
GC: pause 1222.23ms. collected 45.877728MB. full 
GC: pause 1221.28ms. collected 45.868944MB. full 
GC: pause 1267.69ms. collected 45.851040MB. full 
GC: pause 1224.28ms. collected 45.870176MB. full 
GC: pause 1228.03ms. collected 45.890464MB. full 
GC: pause 1228.11ms. collected 45.864160MB. full 
GC: pause 1227.94ms. collected 45.883888MB. full 
GC: pause 1281.35ms. collected 45.873344MB. full 
GC: pause 1234.88ms. collected 45.880016MB. full 
GC: pause 1233.84ms. collected 45.869696MB. full 
GC: pause 1233.89ms. collected 45.854336MB. full 
filling: 41.371417009 s
filling again: 78.452537743 s

@d-netto
Copy link
Member

d-netto commented Mar 23, 2023

Also:

julia> versioninfo()
Julia Version 1.9.0-rc1.1
Commit b1b388a6d1 (2023-03-08 14:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 128 virtual cores

@d-netto d-netto added kind:regression Regression in behavior compared to a previous version GC Garbage collector status:bisect wanted labels Mar 23, 2023
@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 23, 2023

(as the label indicates) a bisection would be extremely helpful here.

Edit: I am doing one.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 23, 2023

It looks like we used to be willing to let the collections grow exponentially in size, while now we cap them at some fixed rate

@KristofferC KristofferC added this to the 1.9 milestone Mar 23, 2023
@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 23, 2023

Bisected to #44215. The title of that PR and the behaviour here seem to be at odds with each other...

@gbaraldi
Copy link
Member

A local profile shows a bunch of time spent doing the in_image check so I'm not so sure it's not just that. Did the bisection already show a big regression there?

@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 28, 2023

There might have been two separate regressions. But IIRC #44215 was one of the big ones. And just checking in_image shouldn't change the behavior of the GC as in the OP, just the speed of it, no?

@KristofferC
Copy link
Sponsor Member

I didn't really notice any improvement here with the pkgimage bisection GC PR.

@gbaraldi
Copy link
Member

The GC PRs should only make this slightly better because they make the GC faster, but the issue here is why the hell are we doing so many small GCs

@KristofferC
Copy link
Sponsor Member

I'm a bit worried about how much code was deleted in #44215. Was all that really irrelevant?

@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 30, 2023

Before #44215, the GC_TIME output looks something like

GC mark pause 581.52 ms | scanned 0 kB = 0 + 0 | remset 0 25000167
GC sweep arrays 0.00 ms (freed 0 / 33)
GC sweep big 0.00 ms (freed 0 / 38 with 0 rst)
GC sweep pools end 266.47 ms at 5.3 GB/s (skipped 26.41 % of 93194, swept 68579 pgs, 24615 freed with 0 lazily) full
GC sweep sysimg end 7.76 ms
GC sweep pause 274.30 ms live 939259 kB (freed 699977 kB EST 699977 kB [error 0] = 99% of allocd b 700000) (0.02 ms in post_mark) full | next in -700000 kB

GC mark pause 602.97 ms | scanned 0 kB = 0 + 0 | remset 0 25000035
GC sweep arrays 0.00 ms (freed 0 / 33)
GC sweep big 0.00 ms (freed 0 / 38 with 0 rst)
GC sweep pools end 310.55 ms at 4.9 GB/s (skipped 23.69 % of 98777, swept 75381 pgs, 23396 freed with 0 lazily) full
GC sweep sysimg end 7.18 ms
GC sweep pause 317.80 ms live 939281 kB (freed 699973 kB EST 699973 kB [error 0] = 99% of allocd b 700000) (0.02 ms in post_mark) full | next in -700000 kB

After:

GC mark pause 756.38 ms | scanned 0 kB = 0 + 0 | remset 0 25000035
GC sweep arrays 0.01 ms (freed 0 / 46)
GC sweep big 0.00 ms (freed 0 / 45 with 0 rst)
GC sweep pools end 97.87 ms at 7.7 GB/s (skipped 3.16 % of 49684, swept 48113 pgs, 1571 freed with 0 lazily) full
GC sweep sysimg end 8.76 ms
GC sweep pause 106.72 ms live 939382 kB (freed 44803 kB EST 44803 kB [error 0] = 100% of allocd b 44800) (0.02 ms in post_mark) full | next in -44800 kB

GC mark pause 768.63 ms | scanned 0 kB = 0 + 0 | remset 0 25000035
GC sweep arrays 0.01 ms (freed 0 / 46)
GC sweep big 0.00 ms (freed 0 / 45 with 0 rst)
GC sweep pools end 98.83 ms at 7.7 GB/s (skipped 3.16 % of 49684, swept 48113 pgs, 1571 freed with 0 lazily) full
GC sweep sysimg end 8.67 ms
GC sweep pause 107.58 ms live 939378 kB (freed 44791 kB EST 44791 kB [error 0] = 99% of allocd b 44800) (0.02 ms in post_mark) full | next in -44800 kB

Notice how before the next in was 700000 kB, and after it is 44800 kB.

@gbaraldi
Copy link
Member

gbaraldi commented Mar 30, 2023

Looking into this I see a bug where we weren't increasing the interval at all on these sweeps, and that I can fix, but since the interval was based on live bytes after sweeping we weren't increasing it enough. You can see that on 1.8 we almost double it per iteration, and even after fixing it we weren't getting that, so I'm looking a bit into what's going on.
But with a prototype I kinda get the same performance we had before.

@gbaraldi
Copy link
Member

gbaraldi commented Apr 3, 2023

Can we close this? It's basically fixed. If fixing #49205 is simple then maybe we add it in a patch release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector kind:regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

6 participants