Partition SuperPMI replay task #66065
Conversation
Tagging subscribers to this area: @JulieLeeMSFT

Issue Details: Create 2 hard-coded partitions of work to be done to increase pipeline parallelism and reduce overall job time. The partitions are sets of different JitStressRegs options. We could create a partition for each JitStressRegs option, but the concern is that there is potentially a lot of overhead downloading the large set of MCH files, and we might want to share that overhead between work partitions.
/azp run runtime-coreclr superpmi-replay
Azure Pipelines successfully started running 1 pipeline(s).
@dotnet/jit-contrib
Not sure if you would like to do this as part of this PR, but please consider creating …
Created #66067 to track this request.
- jit_flags = [
+ jit_flags1 = [
Instead of hardcoding, why not just split the jit_flags into partitions? That way we can tweak the partitions in the future.
From https://stackoverflow.com/a/2135920:
# splits array `a` in `n` partitions
def split(a, n):
k, m = divmod(len(a), n)
return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
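As a quick sanity check of how the Stack Overflow helper above behaves (the flag values here are only illustrative, not the actual list in superpmi_replay.py):

```python
def split(a, n):
    # Split list `a` into `n` contiguous, near-equal partitions
    # (same helper as in the Stack Overflow answer above).
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))

flags = ["JitStressRegs=1", "JitStressRegs=2", "JitStressRegs=4",
         "JitStressRegs=8", "JitStressRegs=0x10"]
parts = [list(p) for p in split(flags, 2)]
# Earlier partitions absorb the remainder when the split is uneven,
# so the sizes differ by at most one.
print(parts)
# [['JitStressRegs=1', 'JitStressRegs=2', 'JitStressRegs=4'],
#  ['JitStressRegs=8', 'JitStressRegs=0x10']]
```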
Sure, that's fine and general. I went for simplicity. We'd want to pass both "partition#" and "total partition count" from the proj file; then any change would only need to be made in the proj file.
Perhaps the better approach would be to partition based on collection type; that way it would create (existing partitions × collection count). With that, we would download one type of collection for each platform/architecture and run all JitStressRegs options on that partition. You might have to figure out how to magically write all of that in the .proj file.
<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="benchmark"/>
<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="libraries.pmi"/>
<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="crossgen2"/>
<!-- ... likewise for win-arm64, unix-x64, linux-arm64, osx-arm64 -->
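A rough sketch of how such a CollectionFilter might be applied on the Python side. The function and variable names here are hypothetical illustrations, not the actual superpmi.py interface:

```python
# Hypothetical sketch: select only the MCH collections matching a
# partition's CollectionFilter, so each partition downloads one
# collection type instead of the full set.
def filter_collections(mch_files, collection_filter):
    """Return the MCH file names whose name contains the filter string."""
    return [f for f in mch_files if collection_filter in f]

# Illustrative file names (assumed, not taken from an actual run).
mch_files = ["benchmarks.run.windows.x64.checked.mch",
             "libraries.pmi.windows.x64.checked.mch",
             "crossgen2.windows.x64.checked.mch"]
print(filter_collections(mch_files, "crossgen2"))
# ['crossgen2.windows.x64.checked.mch']
```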
I like that because of the download efficiency. However, the different collections are very different sizes, so some partitions will be much slower (e.g., libraries_tests). Also, it's not great to hard-code the set of collections we have, since currently we'll download any and all of them; this is especially true for benchmark and aspnet, which currently are not all-platform.
Split per-platform/architecture work into multiple partitions to increase pipeline parallelism and reduce overall job time. The partitions are sets of different JitStressRegs options. We could create a partition for each JitStressRegs option, but the concern is that there is potentially a lot of overhead downloading the large set of MCH files, and we might want to share that overhead between work partitions.
Force-pushed from 91a4acc to eb1921a.
/azp run runtime-coreclr superpmi-replay
Azure Pipelines successfully started running 1 pipeline(s).
@kunalspathak I made the partitioning dynamic and driven by arguments, so it's possible to have different sets of partitions for different arch/platform settings. I used this to make the x86 runs use 3 partitions and the x64 runs use 2, since the x86 ones take more time.
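A minimal sketch of what argument-driven partitioning could look like. The argument names (`-partition_count`, `-partition_index`) are assumptions for illustration, not necessarily the real script's flags, and the flag list below is illustrative:

```python
import argparse

# Hypothetical sketch: let the .proj file pass a partition index and total
# partition count, so the set of partitions per platform/arch can be tuned
# without touching the Python script. Argument names are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("-partition_count", type=int, default=1)
parser.add_argument("-partition_index", type=int, default=0)

def split(a, n):
    # Split list `a` into `n` contiguous, near-equal partitions.
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))

# Illustrative flag list (not the full list from the script).
jit_flags = ["JitStressRegs=1", "JitStressRegs=2", "JitStressRegs=4",
             "JitStressRegs=8", "JitStressRegs=0x10", "JitStressRegs=0x80",
             "JitStressRegs=0x1000"]

# e.g. the second of three x86 partitions:
args = parser.parse_args(["-partition_count", "3", "-partition_index", "1"])
my_flags = list(split(jit_flags, args.partition_count))[args.partition_index]
print(my_flags)  # ['JitStressRegs=8', 'JitStressRegs=0x10']
```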
@@ -38,6 +40,19 @@
    "JitStressRegs=0x1000",
]

def split(a, n):
    """ Splits array `a` in `n` partitions.
Consider adding credit for the SO post here.
So currently, we will do 2X more downloads on x64 and 3X more downloads on x86, which is fine for now. I will think of a better way to do this partitioning.
I actually don't think it's a problem. In my previous partitioned job (https://dev.azure.com/dnceng/public/_build/results?buildId=1639935&view=results), it takes about 4-5 minutes to do the download, compared to 45-60 minutes (approx.) to do the replays.
The replay with partitioning is here and looks good: 1 hour 40 minutes for the full run, ~1:10 for the replay alone, and x64/x86 are balanced.
@kunalspathak Any more comments?
LGTM
Create 2 hard-coded partitions of work to be done to increase pipeline
parallelism and reduce overall job time. The partitions are sets of
different JitStressRegs options.
We could create a partition for each JitStressRegs option, but the concern
is that there is potentially a lot of overhead downloading the large set
of MCH files, and we might want to share that overhead between work partitions.