
Make LLVM Profiling robust for multithreaded programs #47778

Merged (14 commits) on Jan 11, 2023

Conversation

bachdavi
Contributor

@bachdavi bachdavi commented Dec 2, 2022

This is a draft PR to make LLVM optimization logging correct for multithreaded programs.

It guards against the case where one thread enables or disables LLVM optimization logging (@snoopl) while another thread is midway through an optimization, or stops logging before that optimization has finished. Without this PR, that can produce a malformed .yaml file, since the log might start with the "after" clause, or might contain only the "before" clause.

With this PR, such cases are simply not logged: only LLVM optimizations that ran entirely while logging was enabled are recorded.
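The guard described above can be sketched as follows. All names here (`logging_enabled`, `OptLog`) are illustrative, not the actual identifiers in the Julia runtime: each optimization remembers whether it wrote its "before" entry, and the "after" entry is emitted only if logging stayed enabled for the whole optimization.

```cpp
#include <atomic>
#include <sstream>
#include <string>

// Hypothetical global toggle standing in for the @snoopl stream state.
static std::atomic<bool> logging_enabled{false};

struct OptLog {
    bool wrote_before = false;
    std::ostringstream buf;

    void before(const std::string &name) {
        if (logging_enabled.load()) {
            buf << "- before: " << name << "\n";
            wrote_before = true;
        }
    }

    // Returns the complete YAML block, or an empty string when the
    // optimization was not fully covered by an enabled logger.
    std::string after(const std::string &name) {
        if (wrote_before && logging_enabled.load()) {
            buf << "- after: " << name << "\n";
            return buf.str();
        }
        return ""; // partially-covered optimizations are skipped entirely
    }
};
```

Because the block is either emitted whole or not at all, a toggle that races with an in-flight optimization can no longer leave a dangling "before" or a stray "after" in the file.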

@giordano giordano added multithreading Base.Threads and related functionality compiler:llvm For issues that relate to LLVM labels Dec 2, 2022
@pchintalapudi
Member

It's not super clear to me what the purpose of this PR is. Is the intent to allow access to these timings within Julia code? If so, why is the timing struct embedding a vector<> and not a Julia array? Also, the previous implementation of timing reporting should have been thread-safe, albeit not manipulable from Julia, since we locked before ever operating on the stream.

@bachdavi
Contributor Author

bachdavi commented Dec 4, 2022

@pchintalapudi This is very much a work in progress :)
Sorry, I could have mentioned that in the description; I'm adding it right now.
I pushed the PR so that @NHDaly and @vilterp can work on this together.
We'll ping you once it's in any state to review!

@bachdavi
Contributor Author

bachdavi commented Dec 4, 2022

Exactly, we want to access the timings from within Julia while potentially having another thread do code generation. Interesting, we thought we might run into a case where we unset the streams while codegen is trying to write to them, finding a null pointer. Where does the locking happen?

Review thread on src/codegen.cpp (outdated, resolved)
@bachdavi
Contributor Author

bachdavi commented Dec 8, 2022

@pchintalapudi Does the actual optimization in jitlayers.cpp: (***PMs).run(M); acquire any locks? If not, would it be feasible to hold the stream lock over the whole optimization time to ensure we don't write just the first part of the YAML block or just the second one? I've adapted this PR to do that. The alternative would be to write the first part into a string and write the whole YAML block atomically after optimization.

@NHDaly
Member

NHDaly commented Dec 8, 2022

🤔 @bachdavi: now that I'm looking at the PR change, we maybe don't want to hold the lock over the whole LLVM optimization, since that would force serialization of two different threads doing LLVM optimization if we were profiling.

So I think that the string-buffer approach is probably the best one. 👍 Sorry I didn't think of this when we discussed it earlier! Thanks for the quick change today :)

@pchintalapudi
Member

pchintalapudi commented Dec 8, 2022

We actually specifically want to avoid holding a lock during optimization, so that in the future multiple modules can be optimizing at the same time. Rather than write to a string first, I would say holding the data in variables and dumping the entire yaml block post-optimization using those variables would be a better solution.
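A minimal sketch of that suggestion, with illustrative names rather than the real jitlayers.cpp internals: measurements are collected in locals while optimization runs with no lock held, and the stream lock is taken only for the single final write.

```cpp
#include <chrono>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical stand-in for the lock guarding dump_llvm_opt_stream.
static std::mutex stream_mutex;

std::string optimize_and_log(const std::string &module_name) {
    auto t0 = std::chrono::steady_clock::now();
    // ... run the pass managers here, with no lock held, so other
    // threads can optimize their own modules concurrently ...
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // Assemble the entire YAML block from locals, post-optimization.
    std::ostringstream block;
    block << "- module: " << module_name << "\n"
          << "  time_ms: " << ms << "\n";

    // The lock is held only for the duration of the single write.
    std::lock_guard<std::mutex> lock(stream_mutex);
    // (the real code would write block.str() to the opt stream here)
    return block.str();
}
```

The point of the design is that the critical section shrinks from "the whole optimization" to "one buffered write", so profiling never serializes concurrent module optimization.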

Guess I got sniped by @NHDaly :)

@NHDaly
Member

NHDaly commented Dec 8, 2022

💡 @pchintalapudi, yeah, just keeping the results in variables seems even better. 👍 👍

@pchintalapudi
Member

As an aside, we don't currently measure machine code optimization and emission time (this is in jitlayers.cpp, CompilerT::operator()), although it probably constitutes a decent fraction of optimization time. That may also be of interest if you're looking to collect optimization statistics.

@NHDaly
Member

NHDaly commented Dec 8, 2022

I was wondering about that, @pchintalapudi: Does that get measured in @snoopc (codegen)?

@pchintalapudi
Member

It does get measured by @snoopc since that relies on the dump_compiles_stream instead of the dump_llvm_opt_stream, but that includes time spent in codegen, middle-end optimization, and backend optimization/mc emission. We don't currently have anything measuring the contribution of just backend optimization/mc emission.

@bachdavi
Contributor Author

bachdavi commented Dec 9, 2022

This uses a stringstream; instead we could use a map and store instructions and basic blocks as a tuple. The downside of the stringstream is the lack of proper formatting: AFAIK, std::format is only available from C++20.
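For reference, one pre-C++20 way to get fixed-width formatting into a stringstream is to snprintf into a small buffer first and append that. A hypothetical helper (`format_timing` is not part of the PR):

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// snprintf gives printf-style width/precision control that plain
// stream insertion lacks before std::format (C++20).
std::string format_timing(const std::string &name, double ms) {
    char line[128];
    std::snprintf(line, sizeof line, "  %-20s %8.3f ms\n", name.c_str(), ms);
    std::ostringstream ss;
    ss << line;
    return ss.str();
}
```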

Review thread on src/jitlayers.cpp (outdated, resolved)
@NHDaly NHDaly changed the title [wip] make llvm opt profiling thread-safe Make LLVM Profiling robust for multithreaded programs Dec 13, 2022
Member

@NHDaly NHDaly left a comment


LGTM, thanks, @bachdavi!
I've updated the description as well, to explain this situation better.

@pchintalapudi: Can you do a final review pass as well?

@NHDaly NHDaly marked this pull request as ready for review December 13, 2022 22:41
@NHDaly
Member

NHDaly commented Dec 30, 2022

Friendly ping for @pchintalapudi. Do you want to review? We've been using this in production at RAI and so far it seems to be working correctly. Can we merge this?

CC: @IanButterworth as well, since this is profiling related.

@NHDaly NHDaly added the backport 1.9 Change should be backported to release-1.9 label Dec 30, 2022
@KristofferC KristofferC mentioned this pull request Jan 2, 2023
Member

@pchintalapudi pchintalapudi left a comment


Seems fine to me

@vchuravy
Member

vchuravy commented Jan 5, 2023

Needs a rebase

@NHDaly
Member

NHDaly commented Jan 5, 2023

Also, are the test failures here real? Or unrelated?

@vchuravy
Member

vchuravy commented Jan 9, 2023

analyzegc failure seems relevant.

@bachdavi
Contributor Author

bachdavi commented Jan 9, 2023

> analyzegc failure seems relevant.

Hm, it seems so indeed!

@NHDaly
Member

NHDaly commented Jan 11, 2023

Thanks Valentin and David. The remaining failures look unrelated:

Error During Test at C:\buildkite-agent\builds\win2k22-aws-vms-2\julialang\julia-master\julia-4d55f71f4d\share\julia\stdlib\v1.10\Downloads\test\runtests.jl:12
  Got exception outside of a @test
  RequestError: Could not resolve host: httpbingo.julialang.org while requesting https://httpbingo.julialang.org/base64/SnVsaWEgaXMgZ3JlYXQh
  Stacktrace:
    [1] (::Downloads.var"#9#18"{IOStream, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Bool, String, Bool, Bool})(easy::Downloads.Curl.Easy)

and

intrinsics                                       (5) |        started at 2023-01-09T13:12:37.840
      From worker 4:
      From worker 4:	[180] signal (11.1): Segmentation fault
      From worker 4:	in expression starting at /cache/build/default-power9bot2-0/julialang/julia-master/julia-4d55f71f4d/share/julia/test/atomics.jl:281

@NHDaly NHDaly merged commit 8985403 into JuliaLang:master Jan 11, 2023
KristofferC pushed a commit that referenced this pull request Jan 16, 2023
* Use stringstream to atomically write LLVM opt timings

* Add boolean to ensure we don't _only_ write the after block

* Use ios_printf

Co-authored-by: Nathan Daly <NHDaly@gmail.com>
(cherry picked from commit 8985403)
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Jan 17, 2023