Proposal
Currently, we version SuperPMI collections on the JIT-EE version GUID. However, a JIT change can make new or different calls across the JIT-EE interface without changing the GUID, and replaying an existing collection then hits “missing data”. This causes the collections to go stale. It is often useful to then recollect, with the same GUID, to fix the problem. However, this recollection currently overwrites the existing collections.
This proposal is to introduce the concept of a SuperPMI minor version, and never overwrite existing collections. The minor version identifier will be the git hash of the tree used to create the collections. It is assumed here that all collections are done with the same minor version. That is, we don't create some subset of collections with one git hash and others with a different git hash.
Collections will be stored with the minor version identifier. The superpmi.py script will automatically look for the appropriate minor version, in either the local cache or the remote store. You will be able to override this and request a specific minor version.
Minor versions will be a path component after the JIT-EE GUID component, so, e.g.,
a1f5e9a1-ee44-42f9-9319-e2a2dbf8c5c9\22dc92037bb1ece7bafd3d2061da439dafedefc8\windows\x64\benchmarks.run.windows.arm64.checked.mch
a1f5e9a1-ee44-42f9-9319-e2a2dbf8c5c9\c42d26c2cc12e83d722d1129b29820e207b3d392\windows\x64\benchmarks.run.windows.arm64.checked.mch
Similarly, the minor version will be a component of the local cache directory path.
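As a minimal sketch of the layout described above, the following shows how such a relative path could be assembled. The function name `collection_path` and its parameters are hypothetical and are not taken from superpmi.py; only the component order (GUID, minor version, OS, architecture, file name) comes from the example paths.

```python
from pathlib import PureWindowsPath


def collection_path(jit_ee_guid, minor_version, host_os, arch, mch_name):
    """Build the relative store path for one collection (hypothetical helper).

    The minor version (a git hash) sits directly after the JIT-EE GUID,
    mirroring the example paths in this proposal.
    """
    return PureWindowsPath(jit_ee_guid, minor_version, host_os, arch, mch_name)


if __name__ == "__main__":
    path = collection_path(
        "a1f5e9a1-ee44-42f9-9319-e2a2dbf8c5c9",
        "22dc92037bb1ece7bafd3d2061da439dafedefc8",
        "windows",
        "x64",
        "benchmarks.run.windows.arm64.checked.mch",
    )
    print(path)
```

Because the minor version is its own path component, two collections made under the same GUID at different git hashes never collide, which is what allows recollecting without overwriting.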
Implementation
The last JIT-EE GUID change hash can be found with:
git log --pretty=format:%H HEAD -1 -- src/coreclr/inc/jiteeversionguid.h
From that, all the JIT changes between that commit and HEAD in the enlistment can be found with:
git log --pretty=format:%H <GUID-change-hash>..HEAD -- src/coreclr/jit
If any of those commits has a collection, use it.
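The two git queries and the lookup could be wired together roughly as follows. This is a sketch under assumptions: the helper names (`run_git`, `last_guid_change`, `jit_changes_since`, `first_with_collection`) are hypothetical, and `available` stands in for whatever listing of minor versions the local cache or remote store would provide.

```python
import subprocess

GUID_HEADER = "src/coreclr/inc/jiteeversionguid.h"
JIT_DIR = "src/coreclr/jit"


def run_git(args, repo="."):
    """Run a git command in `repo` and return its output split into tokens."""
    result = subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def last_guid_change(repo="."):
    """Hash of the most recent commit that touched the JIT-EE GUID header."""
    hashes = run_git(
        ["log", "--pretty=format:%H", "HEAD", "-1", "--", GUID_HEADER], repo
    )
    return hashes[0] if hashes else None


def jit_changes_since(guid_change_hash, repo="."):
    """Hashes of JIT commits after the GUID change, newest first (git log order)."""
    return run_git(
        ["log", "--pretty=format:%H", f"{guid_change_hash}..HEAD", "--", JIT_DIR],
        repo,
    )


def first_with_collection(candidates, available):
    """Return the first candidate hash that has a collection, or None.

    `available` is an assumed set of git hashes for which collections exist.
    """
    for commit in candidates:
        if commit in available:
            return commit
    return None
```

Note that the range `<GUID-change-hash>..HEAD` excludes the GUID change commit itself, so a collection made exactly at that commit would need to be checked separately.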
There is a question about which collection is the "best" one to use if your tree is not synced to the same git hash as any collection. The simplest choice is to always choose the newest collection that is not newer than your sync point. That might not be the best choice, but the user can override it (or sync and build to a newer point that has a newer collection). For example,
| Collection | JIT |
| --- | --- |
| 0 | A |
| | B // modified JIT-EE traffic |
| | C // you’re based on this |
| | D |
| 1 | E |
You want ‘1’, because it was collected after ‘B’ modified the JIT-EE traffic, but we will walk backward from ‘C’ to ‘0’. And if ‘D’ also modified traffic, well, there’s no good answer. And we don’t keep track of which changes modified traffic.
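The walk-backward policy in this example can be simulated on a toy linear history. The names `history`, `collections`, and `walk_back` are illustrative only, and real history would come from git rather than a list.

```python
# Commits oldest to newest, and the commits at which collections were made.
history = ["A", "B", "C", "D", "E"]
collections = {"A": "0", "E": "1"}


def walk_back(history, collections, sync_point):
    """Return the newest collection made at or before `sync_point`, or None."""
    idx = history.index(sync_point)
    for commit in reversed(history[: idx + 1]):
        if commit in collections:
            return collections[commit]
    return None


if __name__ == "__main__":
    # Based on C: the walk visits C, B, A and stops at A's collection.
    print(walk_back(history, collections, "C"))
```

The walk never looks forward, so it cannot pick collection ‘1’ for a tree based on ‘C’ even though that collection already reflects the traffic change in ‘B’. The user override is what covers this case.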
category:eng-sys
theme:super-pmi
skill-level:beginner
cost:small