Partition SuperPMI replay task #66065

BruceForstall · 2022-03-02T06:09:02Z

Create 2 hard-coded partitions of work to be done to increase pipeline
parallelism and reduce overall job time. The partitions are sets of
different JitStressRegs options.

We could create a partition for each JitStressRegs option, but the concern
is that there is potentially a lot of overhead downloading the large set
of MCH files, and we might want to share that overhead between work partitions.

ghost · 2022-03-02T06:09:09Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	BruceForstall
Assignees:	BruceForstall
Labels:	`area-CodeGen-coreclr`
Milestone:	-

BruceForstall · 2022-03-02T06:09:19Z

/azp run runtime-coreclr superpmi-replay

azure-pipelines · 2022-03-02T06:09:32Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2022-03-02T06:09:32Z

@dotnet/jit-contrib

kunalspathak · 2022-03-02T06:17:54Z

Not sure if you would like to do this as part of this PR, but please consider creating a summary.md that shows the failures in the Extensions tab. Without that, it is hard to spot the failures.

BruceForstall · 2022-03-02T06:59:25Z

> Not sure if you would like to do this as part of this PR, but please consider creating a summary.md that shows the failures in the Extensions tab. Without that, it is hard to spot the failures.

Created #66067 to track this request

kunalspathak · 2022-03-02T07:06:11Z

src/coreclr/scripts/superpmi_replay.py


-jit_flags = [
+jit_flags1 = [


Instead of hardcoding, why not split jit_flags into partitions? That way we can tweak the partitions in the future.

From https://stackoverflow.com/a/2135920:

    # splits array `a` in `n` partitions
    def split(a, n):
        k, m = divmod(len(a), n)
        return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))

Sure, that's fine and general. I went for simplicity. We'd want to pass both "partition#" and "total partition count" from the proj file; then any change would only need to be made in the proj file.
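
For illustration, here is a minimal sketch of how the script could pick its share of the work from a partition number and a total partition count. The argument names `-partition` and `-partition_count`, the helper `flags_for_partition`, and the flag values are assumptions made for this example; they are not taken from the PR.

```python
import argparse

# Illustrative flag list. The values are placeholders for this sketch,
# not necessarily the set used by superpmi_replay.py.
jit_flags = [
    "JitStressRegs=1",
    "JitStressRegs=2",
    "JitStressRegs=3",
    "JitStressRegs=4",
    "JitStressRegs=8",
    "JitStressRegs=0x10",
    "JitStressRegs=0x80",
    "JitStressRegs=0x1000",
]


def split(a, n):
    """Split list `a` into `n` contiguous partitions whose sizes differ by at most one."""
    k, m = divmod(len(a), n)
    return [a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def parse_args(argv):
    """Parse the (hypothetical) partition arguments passed in from the proj file."""
    parser = argparse.ArgumentParser(description="Partitioned replay sketch")
    parser.add_argument("-partition", type=int, required=True,
                        help="1-based index of the partition to run")
    parser.add_argument("-partition_count", type=int, required=True,
                        help="total number of partitions")
    args = parser.parse_args(argv)
    if not 1 <= args.partition <= args.partition_count:
        parser.error("partition must be between 1 and partition_count")
    return args


def flags_for_partition(flags, partition, partition_count):
    """Return the flags that belong to the given 1-based partition."""
    return split(flags, partition_count)[partition - 1]


# Example: run as partition 2 of 3.
args = parse_args(["-partition", "2", "-partition_count", "3"])
print(flags_for_partition(jit_flags, args.partition, args.partition_count))
```

With this shape, changing the number of partitions only requires changing the values passed in; the flag list itself stays in one place.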

Perhaps a better approach would be to partition by collection type, which would create (existing partitions × collection count) partitions. With that, we would download one type of collection for each platform/architecture and run all JitStressRegs options on that partition. You might have to figure out how to magically write all of that in the .proj file.

<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="benchmark"/>
<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="libraries.pmi"/>
<SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="crossgen2"/>
... ; likewise for win-arm64, unix-x64, linux-arm64, osx-arm64

I like that because of the download efficiency. However, the different collections are very different sizes, so some partitions will be much slower (e.g., libraries_tests). Also, it's not great to hard-code the set of collections we have, since currently we'll download any and all -- especially true for benchmark & aspnet which currently are not all-platform.
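
To make the size of the per-collection proposal concrete, the sketch below enumerates one work item per (target, collection) pair. It uses only the targets and collections named in this thread, so the real set would be larger; the function name `collection_partitions` and the dictionary keys are hypothetical.

```python
from itertools import product

# Targets and collections mentioned in the discussion above. This is not
# the full set of collections, which is part of the concern raised.
targets = ["win-x64", "win-arm64", "unix-x64", "linux-arm64", "osx-arm64"]
collections = ["benchmark", "libraries.pmi", "crossgen2"]


def collection_partitions(targets, collections):
    """Build one work item per (target, collection) pair."""
    return [
        {"target": target, "collection_filter": collection}
        for target, collection in product(targets, collections)
    ]


work_items = collection_partitions(targets, collections)
print(len(work_items), "work items")
for item in work_items[:3]:
    print(item)
```

Each work item would download a single collection and run every JitStressRegs option against it, so the download cost is not repeated, but the items would take very different amounts of time because the collections differ in size.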

Split per-platform/architecture work into multiple partitions to increase pipeline parallelism and reduce overall job time. The partitions are sets of different JitStressRegs options. We could create a partition for each JitStressRegs option, but the concern is that there is potentially a lot of overhead downloading the large set of MCH files, and we might want to share that overhead between work partitions.

BruceForstall · 2022-03-02T20:38:08Z

/azp run runtime-coreclr superpmi-replay

azure-pipelines · 2022-03-02T20:38:21Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2022-03-02T20:40:00Z

@kunalspathak I made the partitioning dynamic and driven by arguments, so it's possible to have different sets of partitions for different arch/platform settings. I used this to make the x86 runs use 3 partitions and the x64 runs 2, since the x86 ones take more time.
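
As a rough sketch of what argument-driven partitioning with a different count per architecture might look like: the partition counts (x64: 2, x86: 3) come from the comment above, while the dictionary, the `work_items` helper, and the flag values are assumptions made for illustration.

```python
# Partition counts per architecture, as described in the comment above.
partition_counts = {"x64": 2, "x86": 3}

# Illustrative flag list; placeholders, not necessarily the script's actual set.
jit_flags = [
    "JitStressRegs=1",
    "JitStressRegs=2",
    "JitStressRegs=3",
    "JitStressRegs=4",
    "JitStressRegs=8",
    "JitStressRegs=0x10",
    "JitStressRegs=0x80",
    "JitStressRegs=0x1000",
]


def split(a, n):
    """Split list `a` into `n` contiguous partitions whose sizes differ by at most one."""
    k, m = divmod(len(a), n)
    return [a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def work_items(flags, counts):
    """Return (arch, partition index, partition count, flags) for every partition."""
    items = []
    for arch, count in counts.items():
        for index, part in enumerate(split(flags, count), start=1):
            items.append((arch, index, count, part))
    return items


for arch, index, count, part in work_items(jit_flags, partition_counts):
    print(f"{arch} partition {index}/{count}: {part}")
```

Giving the slower architecture more partitions means each of its work items runs fewer flag settings, which is one way to bring its wall-clock time closer to the faster one.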

kunalspathak · 2022-03-02T20:41:09Z

src/coreclr/scripts/superpmi_replay.py

@@ -38,6 +40,19 @@
    "JitStressRegs=0x1000",
 ]

+def split(a, n):
+    """ Splits array `a` in `n` partitions.


Consider crediting the SO post here.

kunalspathak · 2022-03-02T20:42:29Z

> @kunalspathak I made the partitioning dynamic and driven by arguments, so it's possible to have different sets of partitions for different arch/platform settings. I used this to make the x86 runs use 3 partitions and the x64 runs 2, since the x86 ones take more time.

So currently, we will do 2X more downloads on x64 and 3X more downloads on x86, which is fine for now. I will think of a better way to do this partition.

BruceForstall · 2022-03-02T20:53:30Z

> So currently, we will do 2X more downloads on x64 and 3X more downloads on x86, which is fine for now. I will think of a better way to do this partition.

I actually don't think it's a problem. In my previous partitioned job (https://dev.azure.com/dnceng/public/_build/results?buildId=1639935&view=results), it takes about 4-5 minutes to do the download, compared to approximately 45-60 minutes to do the replays.
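
As a back-of-the-envelope check on those numbers, the snippet below computes what share of a work item's time goes to downloading. It assumes a work item consists only of the download and the replay, which ignores setup and other steps; the function name is hypothetical.

```python
def download_share(download_minutes, replay_minutes):
    """Fraction of a work item's time spent downloading MCH files."""
    return download_minutes / (download_minutes + replay_minutes)


# Bounds taken from the figures quoted above: 4-5 minutes of download
# against 45-60 minutes of replay.
best_case = download_share(4, 60)   # shortest download, longest replay
worst_case = download_share(5, 45)  # longest download, shortest replay

print(f"download share: {best_case:.1%} to {worst_case:.1%}")
```

Under that assumption the download is at most about a tenth of each work item, so repeating it in every partition costs little compared with the time saved by running the replays in parallel.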

BruceForstall · 2022-03-02T23:00:09Z

The replay with partitioning is here and looks good: 1 hour 40 minutes for the full run, ~1:10 for the replay alone, and x64/x86 are balanced.

BruceForstall · 2022-03-02T23:02:07Z

@kunalspathak Any more comments?

kunalspathak

LGTM

ghost assigned BruceForstall Mar 2, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Mar 2, 2022

BruceForstall requested a review from kunalspathak March 2, 2022 06:09

kunalspathak mentioned this pull request Mar 2, 2022

Run superpmi-replay pipeline on JIT PRs #66063

Merged

kunalspathak requested changes Mar 2, 2022

ghost added and removed the needs-author-action label Mar 2, 2022

BruceForstall force-pushed the PartitionSpmiReplay branch from 91a4acc to eb1921a Compare March 2, 2022 20:37

kunalspathak reviewed Mar 2, 2022

Add comment

63418ed

kunalspathak approved these changes Mar 2, 2022

BruceForstall merged commit 83f204e into dotnet:main Mar 2, 2022

BruceForstall deleted the PartitionSpmiReplay branch March 2, 2022 23:05

ghost locked as resolved and limited conversation to collaborators Apr 2, 2022
