Partition SuperPMI replay task #66065

Merged
merged 2 commits into from
Mar 2, 2022

Conversation

BruceForstall
Member

Create 2 hard-coded partitions of work to be done to increase pipeline
parallelism and reduce overall job time. The partitions are sets of
different JitStressRegs options.

We could create a partition for each JitStressRegs option, but the concern
is that there is potentially a lot of overhead downloading the large set
of MCH files, and we might want to share that overhead between work partitions.

@ghost ghost assigned BruceForstall Mar 2, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 2, 2022
@ghost

ghost commented Mar 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Member Author

/azp run runtime-coreclr superpmi-replay

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Member Author

@dotnet/jit-contrib

@kunalspathak
Member

Not sure if you would like to do this as part of this PR, but please consider creating a summary.md that can show the failures in the extension tab. Without that, it is hard to spot the failures.

@BruceForstall
Member Author

> Not sure if you would like to do this as part of this PR, but please consider creating a summary.md that can show the failures in the extension tab. Without that, it is hard to spot the failures.

Created #66067 to track this request


- jit_flags = [
+ jit_flags1 = [
Member

Instead of hardcoding, why not split jit_flags into partitions programmatically? That way we can tweak the partitions in the future.

From https://stackoverflow.com/a/2135920:

# splits array `a` in `n` partitions

def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
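
As a quick illustration of how this helper behaves, here is a minimal sketch. The flag list is a shortened, hypothetical stand-in for the real `jit_flags` list, not the actual contents of the script. When the list length is not divisible by `n`, the first `len(a) % n` partitions each receive one extra element, and the partitions stay contiguous.

```python
def split(a, n):
    """Split sequence `a` into `n` contiguous partitions of near-equal size."""
    k, m = divmod(len(a), n)
    return (a[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n))

# Hypothetical, shortened flag list used only for illustration.
flags = [
    "JitStressRegs=1",
    "JitStressRegs=2",
    "JitStressRegs=3",
    "JitStressRegs=4",
    "JitStressRegs=8",
]

# split() returns a generator, so materialize it before indexing.
partitions = list(split(flags, 2))
for index, part in enumerate(partitions, start=1):
    print(index, part)
```

With five items and two partitions, the first partition holds three flags and the second holds two.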

Member Author

Sure, that's fine and general. I went for simplicity. We'd want to pass both "partition#" and "total partition count" from the proj file, so that any later change would only need to be made in the proj file.
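
A minimal sketch of what that could look like on the script side, assuming two hypothetical command-line options named `--partition` and `--partition_count` (these names, the helper `select_partition`, and the flag list are illustrative assumptions, not the options the PR actually added):

```python
import argparse

def split(a, n):
    """Split sequence `a` into `n` contiguous partitions of near-equal size."""
    k, m = divmod(len(a), n)
    return [a[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n)]

def select_partition(items, partition, partition_count):
    """Return the 1-based `partition` out of `partition_count` slices of `items`."""
    if not 1 <= partition <= partition_count:
        raise ValueError("partition must be in the range 1..partition_count")
    return split(items, partition_count)[partition - 1]

def parse_args(argv):
    # Hypothetical option names; the proj file would supply both values.
    parser = argparse.ArgumentParser()
    parser.add_argument("--partition", type=int, required=True)
    parser.add_argument("--partition_count", type=int, required=True)
    return parser.parse_args(argv)

# Hypothetical flag list used only for illustration.
flags = ["flag%d" % i for i in range(7)]

args = parse_args(["--partition", "2", "--partition_count", "3"])
selected = select_partition(flags, args.partition, args.partition_count)
print(selected)
```

Because the script derives its slice from the two numbers alone, changing the partition count means editing only the caller.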

Member

Perhaps a better approach would be to partition based on collection type, which would create (existing partitions × collection count) partitions. With that, each partition would download one type of collection for its platform/architecture and run all JitStressRegs options on it. You might have to figure out how to magically write all of that in the .proj file.

    <SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="benchmark"/>
    <SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="libraries.pmi"/>
    <SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" CollectionFilter="crossgen2"/>
    ...

    <!-- likewise for win-arm64, unix-x64, linux-arm64, osx-arm64 -->
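
To make the "(existing partitions × collection count)" arithmetic concrete, here is a rough sketch that enumerates the cross product. It uses only the five targets and three collection names mentioned in this comment; the real set of collections is larger and, as noted elsewhere in this thread, not the same on every platform, so treat both lists as assumptions.

```python
from itertools import product

# Targets and collection filters named in the comment above (not exhaustive).
targets = ["win-x64", "win-arm64", "unix-x64", "linux-arm64", "osx-arm64"]
collections = ["benchmark", "libraries.pmi", "crossgen2"]

# One work item per (target, collection) pair.
work_items = [
    {"target": target, "collection_filter": collection}
    for target, collection in product(targets, collections)
]

print(len(work_items))
```

Each work item would download a single collection and run every JitStressRegs option against it.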

Member Author

I like that because of the download efficiency. However, the different collections are very different sizes, so some partitions will be much slower (e.g., libraries_tests). Also, it's not great to hard-code the set of collections we have, since currently we'll download any and all -- especially true for benchmark & aspnet which currently are not all-platform.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 2, 2022
Split per-platform/architecture work into multiple partitions to increase pipeline
parallelism and reduce overall job time. The partitions are sets of
different JitStressRegs options.

We could create a partition for each JitStressRegs option, but the concern
is that there is potentially a lot of overhead downloading the large set
of MCH files, and we might want to share that overhead between work partitions.
@BruceForstall
Member Author

/azp run runtime-coreclr superpmi-replay

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Member Author

@kunalspathak I made the partitioning dynamic and driven by arguments, so it's possible to have different sets of partitions for different arch/platform settings. I used this to make the x86 runs use 3 partitions and the x64 runs 2, since the x86 ones take more time.
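
Roughly, the per-architecture counts expand into work items like this. This is an illustrative sketch only: the counts come from the comment above, but the dictionary, the function name, and the tuple layout are assumptions rather than the PR's actual code.

```python
# Partition counts per architecture, as described in the comment above.
PARTITION_COUNTS = {"x64": 2, "x86": 3}

def work_items(counts):
    """Yield (arch, partition, partition_count) tuples with 1-based partition numbers."""
    for arch, count in counts.items():
        for partition in range(1, count + 1):
            yield arch, partition, count

items = list(work_items(PARTITION_COUNTS))
for item in items:
    print(item)
```

That gives five jobs in total, two for x64 and three for x86.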

@@ -38,6 +40,19 @@
    "JitStressRegs=0x1000",
]

def split(a, n):
    """ Splits array `a` in `n` partitions.
Member

Consider crediting the Stack Overflow post here.

@kunalspathak
Member

> @kunalspathak I made the partitioning dynamic and driven by arguments, so it's possible to have different sets of partitions for different arch/platform settings. I used this to make the x86 runs use 3 partitions and the x64 runs 2, since the x86 ones take more time.

So currently, we will do 2X more downloads on x64 and 3X more downloads on x86, which is fine for now. I will think of a better way to do this partition.

@BruceForstall
Member Author

> So currently, we will do 2X more downloads on x64 and 3X more downloads on x86, which is fine for now. I will think of a better way to do this partition.

I actually don't think it's a problem. In my previous partitioned job (https://dev.azure.com/dnceng/public/_build/results?buildId=1639935&view=results), it takes about 4-5 minutes to do the download, compared to 45-60 minutes (approx.) to do the replays.
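
Put as back-of-the-envelope arithmetic, using only the approximate figures quoted above (the function name is just for illustration), the download accounts for roughly 6% to 10% of each partition's wall-clock time:

```python
def download_share(download_minutes, replay_minutes):
    """Fraction of a partition's total time spent downloading."""
    return download_minutes / (download_minutes + replay_minutes)

# Best case: shortest download paired with the longest replay.
low = download_share(4, 60)
# Worst case: longest download paired with the shortest replay.
high = download_share(5, 45)

print(f"download share: {low:.3f} to {high:.3f}")
```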

@BruceForstall
Member Author

The replay with partitioning is here and looks good: 1 hour 40 minutes for the full run, ~1:10 for the replay alone, and x64/x86 are balanced.

@BruceForstall
Member Author

@kunalspathak Any more comments?

Member

@kunalspathak kunalspathak left a comment

LGTM

@BruceForstall BruceForstall merged commit 83f204e into dotnet:main Mar 2, 2022
@BruceForstall BruceForstall deleted the PartitionSpmiReplay branch March 2, 2022 23:05
@ghost ghost locked as resolved and limited conversation to collaborators Apr 2, 2022