Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime compressed refs work #8324

Merged
merged 1 commit into from
Feb 5, 2020
Merged

Conversation

gacholio
Copy link
Contributor

@gacholio gacholio commented Jan 16, 2020

Remove assumptions about the compile-time size of various types (f9object_t/fomrobject_t/object headers).

[ci skip]

Signed-off-by: Graham Chapman graham_chapman@ca.ibm.com

@gacholio
Copy link
Contributor Author

@dmitripivkine @amicic

This is not complete, but there's enough here to start having a look.

@dmitripivkine
Copy link
Contributor

I walked through the changes and they seems correct, however probably I will need another look focus on possible missed changed in pointers math

@gacholio gacholio force-pushed the compress branch 7 times, most recently from c42b2c5 to 69227f0 Compare January 20, 2020 22:11
@@ -195,7 +195,7 @@ MM_IndexableObjectAllocationModel::layoutContiguousArraylet(MM_EnvironmentBase *
GC_SlotObject slotObject(env->getOmrVM(), arrayoidPtr);
slotObject.writeReferenceToSlot((omrobjectptr_t)leafOffset);
leafOffset += arrayletLeafSize;
arrayoidPtr += 1;
arrayoidPtr = GC_SlotObject::addToSlotAddress(arrayoidPtr, 1, compressed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a non-static variant of this method and call for each iteration slotObject.addToSlotAddress(1, compressed); while constructing slotObject only once before entering the loop. It would be slightly more readable to me, but I'm not pushing for it.

return convertPointerFromToken(*srcAddress);
mm_j9object_t result = NULL;
if (compressObjectReferences()) {
result = convertPointerFromToken(*(uint32_t*)srcAddress);
Copy link
Contributor

@amicic amicic Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... convertPointerFromToken itself has already a runtime 'if CR' check.

arrayletLeafBase = (UDATA)convertPointerFromToken(*(U_32*)arrayletLeafSlot);
} else {
arrayletLeafBase = *(UDATA*)arrayletLeafSlot;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have up to 3 'if CR' runtime checks, while it's probably doable with only one :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this?

		if (compressObjectReferences()) {
			arrayletLeafBase = (UDATA)convertCompressedPointerFromToken(*((U_32*)arrayoidPointer) + arrayletIndex);
		} else {
			arrayletLeafBase = *(((UDATA*)arrayoidPointer) + arrayletIndex);
		}

where convertCompressedPointerFromToken() does just shifting and no 'if CR' check. This helper could be used elsewhere (in readImpl() )

@@ -200,8 +200,9 @@ class MM_MetronomeDelegate : public MM_BaseNonVirtual
MMINLINE uintptr_t
scanPointerArraylet(MM_EnvironmentRealtime *env, fomrobject_t *arraylet)
{
UDATA const referenceSize = env->compressObjectReferences() ? sizeof(uint32_t) : sizeof(uintptr_t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern repeats often. we should just have something like env->getObjectReferenceSize()

GC_SlotObject slotObject(_javaVM->omrVM, (fj9object_t*)scanPtr);
_markingScheme->markObject(env, slotObject.readReferenceFromSlot());
scanPtr++;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass is somewhat perf critical and extra runtime checks in readReferenceFromSlot() could be tangible (sparsly populated arrays). Would it make sense to have specialized GC_SlotObject::readCRReferenceFromSlot/readNonCRReferenceFromSlot APIs?

#endif /* OMR_GC_COMPRESSED_POINTERS */
{
return (fj9object_t)(UDATA)pointer;
}
Copy link
Contributor

@amicic amicic Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This itself is good, but similarly to slow barrier code, this is invoked from a site where there is already a runtime CR check (see readObjectImpl).

As mentioned before, we could specialize this method further into two methods for CR and nonCR, and use proper one from the call site that already knows if it's CR or not.

We could keep this one too, for sites where we don't know if it's CR.

/**
* @return the next leaf reference slot in the arraylet
* @return NULL if there are no more reference slots in the object
*/
MMINLINE GC_SlotObject *nextLeafPointer()
{
if (_numLeafsCounted < _numLeafs) {
_slotObject.writeAddressToSlot(&_arrayoid[_numLeafsCounted]);
_slotObject.writeAddressToSlot(GC_SlotObject::addToSlotAddress(_arrayoid, _numLeafsCounted, compressObjectReferences()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as proposed before we could use a non-static addToSlotAddress

@gacholio
Copy link
Contributor Author

@amicic My current goal is to avoid regressing the single-mode builds. Optimizing multiple compressed checks is in plan for later on (likely by specializing entire code paths for compressed/full rather than microoptimizing the individual functions).

@gacholio gacholio force-pushed the compress branch 5 times, most recently from 90b19a6 to a039663 Compare February 3, 2020 04:24
@gacholio
Copy link
Contributor Author

gacholio commented Feb 3, 2020

@dmitripivkine @amicic With this PR, mixed mode now works on xa64 (metronome and balanced work is not complete).

I'd prefer to make the suggested optimizations/clarifications in future PRs (as they require some OMR work first).

@gacholio
Copy link
Contributor Author

gacholio commented Feb 3, 2020

@andrewcraik @fjeremic Can you please find someone to review the JIT changes?

@gacholio
Copy link
Contributor Author

gacholio commented Feb 3, 2020

I believe I've found all of the places doing fobj pointer math (I temporarily replaced fobj* with a new type defined as a pointer to an undefined struct, which makes all pointer math fail to compile), so there should be no need to look for missed changes, just correctness of the current ones.

Some math changes remain in metronome and balanced, but I know where they are now.

Copy link
Contributor

@fjeremic fjeremic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT changes look straightforward. I'd advise another x86 expert to take a look at the NASM changes.

@gacholio gacholio force-pushed the compress branch 3 times, most recently from 448e7b9 to 5f169d3 Compare February 3, 2020 19:10
@gacholio
Copy link
Contributor Author

gacholio commented Feb 3, 2020

Metronome is now working in mixed mode as well.

@@ -58,7 +58,7 @@ class GC_PointerArrayObjectScanner : public GC_IndexableObjectScanner
* @param[in] flags Scanning context flags
*/
MMINLINE GC_PointerArrayObjectScanner(MM_EnvironmentBase *env, omrobjectptr_t arrayPtr, fomrobject_t *basePtr, fomrobject_t *limitPtr, fomrobject_t *scanPtr, fomrobject_t *endPtr, uintptr_t flags)
: GC_IndexableObjectScanner(env, arrayPtr, basePtr, limitPtr, scanPtr, endPtr, ((endPtr - scanPtr) < _bitsPerScanMap) ? (((uintptr_t)1 << (endPtr - scanPtr)) - 1) : UDATA_MAX, sizeof(fomrobject_t), flags)
: GC_IndexableObjectScanner(env, arrayPtr, basePtr, limitPtr, scanPtr, endPtr, (GC_SlotObject::subtractSlotAddresses(endPtr, scanPtr, env->compressObjectReferences()) < _bitsPerScanMap) ? ((uintptr_t)1 << GC_SlotObject::subtractSlotAddresses(endPtr, scanPtr, env->compressObjectReferences())) - 1 : UDATA_MAX, env->compressObjectReferences() ? sizeof(uint32_t) : sizeof(uintptr_t), flags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long now... Can it be split somehow to be more read friendly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if it's better now...

@gacholio gacholio changed the title WIP: Runtime compressed refs work Runtime compressed refs work Feb 5, 2020
@fjeremic
Copy link
Contributor

fjeremic commented Feb 5, 2020

Jenkins test sanity all jdk8

@gacholio
Copy link
Contributor Author

gacholio commented Feb 5, 2020

Jenkins test sanity all jdk8

Remove assumptions about the compile-time size of various types
(f9object_t/fomrobject_t/object headers).

[ci skip]

Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>
@gacholio
Copy link
Contributor Author

gacholio commented Feb 5, 2020

Jenkins test sanity all jdk8

@gacholio
Copy link
Contributor Author

gacholio commented Feb 5, 2020

The failing testing looks like a build issue (JIT can't load due to protobuf stuff) - I suggest ignoring it and merging once the rest completes.

@pshipton
Copy link
Member

pshipton commented Feb 5, 2020

The failing testing looks like a build issue (JIT can't load due to protobuf stuff)

Ya, it's a problem on ub16-ppcle-2 #8257. I had disabled, but every time jenkins restarts the machine comes back online. I'll disable the machine again.

@amicic amicic merged commit b33493c into eclipse-openj9:master Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants