Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in PxsNphaseImplementationContext::unregisterContactManager method. #48

Closed
lbrysh opened this issue Jan 24, 2019 · 14 comments
Closed

Comments

@lbrysh
Copy link

lbrysh commented Jan 24, 2019

Visual studio debugger console output:

"Exception thrown: read access violation.
cm was nullptr."

Autos output:

  •   cm	0x0000000000000000 <NULL>	physx::PxsContactManager *
    
  •   this	0x0000022da9505fa0 {mRemovedContactManagers={mData=0x0000000000000000 {???} mSize=0 mCapacity=0 } mNarrowPhasePairs=...}	physx::PxsNphaseImplementationContext *
    
  •   physx::PxvNphaseImplementationContextUsableAsFallback	{...}	physx::PxvNphaseImplementationContextUsableAsFallback
    
  •   mRemovedContactManagers	{mData=0x0000000000000000 {???} mSize=0 mCapacity=0 }	physx::shdfnd::Array<unsigned int,physx::shdfnd::NamedAllocator>
    
  •   mNarrowPhasePairs	{mOutputContactManagers={mData=0x0000022db38209a0 {contactPatches=0x0000022e2d271120 "" contactPoints=...} ...} ...}	physx::PxsContactManagers
    
  •   mNewNarrowPhasePairs	{mOutputContactManagers={mData=0x0000022e2ca80000 {contactPatches=0x0000000000000000 <NULL> contactPoints=...} ...} ...}	physx::PxsContactManagers
    
  •   mModifyCallback	0x0000000000000000 <NULL>	physx::PxContactModifyCallback *
    
  •   mIslandSim	0x0000022dc6042b40 {mIslandHandles={mFreeHandles={mData=0x0000022ee403ac80 {1162} mSize=1257 mCapacity=...} ...} ...}	physx::IG::IslandSim *
    

Call Stack:

PhysX_64.dll!physx::PxsNphaseImplementationContext::unregisterContactManager(physx::PxsContactManager * cm) Line 691 C++
PhysX_64.dll!physx::Sc::Scene::finishBroadPhaseStage2(const unsigned int ccdPass) Line 6074 C++
PhysX_64.dll!physx::Sc::Scene::updateCCDSinglePassStage3(physx::PxBaseTask * continuation) Line 3125 C++
PhysX_64.dll!physx::Cm::Task::run() Line 67 C++
CarbonEditor.exe!physx::Ext::CpuWorkerThread::execute() Line 97 C++
PhysXFoundation_64.dll!physx::shdfnd::`anonymous namespace'::PxThreadStart(void * arg) Line 101 C++
kernel32.dll!BaseThreadInitThunk�() Unknown
ntdll.dll!RtlUserThreadStart�() Unknown

@kstorey-nvidia
Copy link

kstorey-nvidia commented Jan 24, 2019

Thanks for reporting. Are you using a debug configuration? We're there any errors logged before this? Are you able to reproduce the issue and would you be able to provide either a repro or repro steps?

Thanks

Kier

@lbrysh
Copy link
Author

lbrysh commented Jan 24, 2019

I'm using Debug and Checked 'win.x86_64.vc140.mt' configurations and the code crashes in both.
When I run the same simulation scene using PhysX 3.4 SDK compiled with my project the code doesn't crash. So, I assume this is new issue with PhysX 4.0 code.

I'll try to create demo project to repro this issue.

@kstorey-nvidia
Copy link

Thanks

@ikkah
Copy link

ikkah commented Jan 25, 2019

I also get this same exact crash, but it happens very rarely and I haven't been able to reproduce it. I'm running a checked config of "PhysX 3.4.1, APEX 1.4.1 Release @2284554". I recently switched to latest 3.4, and I haven't seen any of these crashes yet, but like I said it's a rare crash.

I log all the errors and I've often (but not always) seen this happen after many repeated errors with the text: "Number of required 16k memory blocks has exceeded the initial number of blocks. Allocator is being called. Consider increasing the number of pre-allocated 16k blocks.".

@ikkah
Copy link

ikkah commented Jan 25, 2019

...and I just now got a crash report from a build running the latest 3.4.

@kstorey-nvidia
Copy link

If you can provide a repro - even one that intermittently happens, we will investigate and fix it. Ideally, it would be an isolated repro, but we have previously taken entire games and grinded with them for days to get to the bottom of intermittent issues like this.

@lbrysh
Copy link
Author

lbrysh commented Jan 26, 2019

Hi,

I was trying to reproduce this crash using PhysX Sample framework and found another issue.

The PhysX will crash at:

PhysX_64.dll!physx::Bp::AABBManager::finalizeUpdate(unsigned int numCpuTasks, physx::PxcScratchAllocator * scratchAllocator, physx::PxBaseTask * continuation, physx::PxBaseTask * narrowPhaseUnlockTask) Line 1708 C++
PhysX_64.dll!physx::Bp::AABBManager::updateAABBsAndBP(unsigned int numCpuTasks, physx::Cm::FlushPool & flushPool, physx::PxcScratchAllocator * scratchAllocator, bool hasContactDistanceUpdated, physx::PxBaseTask * continuation, physx::PxBaseTask * narrowPhaseUnlockTask) Line 1672 C++
PhysX_64.dll!physx::Sc::Scene::ccdBroadPhase(physx::PxBaseTask * continuation) Line 3068 C++
PhysX_64.dll!physx::Cm::Task::run() Line 67 C++
Samples_64.exe!00007ff7ad841b00() Unknown
PhysXFoundation_64.dll!physx::shdfnd::`anonymous namespace'::PxThreadStart(void * arg) Line 101 C++
kernel32.dll!BaseThreadInitThunk�() Unknown
ntdll.dll!RtlUserThreadStart�() Unknown

Steps to reproduce:

  1. Build PhysX Samples project using attached file
    "\PhysX-4.0\physx\samples\samplebase\PhysXSample.cpp".
    In this file I modified PhysXSample::onInit() method enabling CCD collisions,
    in void SetupDefaultRigidDynamic() method setting CCD flag for rigid actor,
    and PhysXSample::spawnDebugObject() method will throw always Convex shapes.

  2. Run PhysX Samples executable.

  3. Select 'SampleHelloWorld'.

  4. Throw at least two objects by using [1] hot key.

  5. Let executable run a minute or so, observe crash in 'AABBManager::updateAABBsAndBP' method.

PhysXSample.zip
screencapture

@kstorey-nvidia
Copy link

Thanks for the repro. I've tracked this down and have a fix for the repro you provided.

It is due to a mismatch in the bounds computed in CCD and in discrete broad phase, which results in CCD broad phase being incorrectly entered when there are no shapes that updated. There was an optimization added to bounds computation (to make tighter bounds), which was controlled with a flag in 3.4 but turned on by default in 4.0. This optimization was not present in the CCD bounds. As the bounds are used to control whether CCD is required, this mismatch incorrectly triggered CCD for some resting convex shapes, which resulted in a nullptr being dereferenced.

The fix for that crash (and also small optimization) is to modify the Gu::computeBoundsWithCCDThreshold in GuBounds.cpp to match this:

`// PT: TODO: refactor this with regular function
PxF32 Gu::computeBoundsWithCCDThreshold(Vec3p& origin, Vec3p& extent, const PxGeometry& geometry, const PxTransform& pose, const CenterExtentsPadded* PX_RESTRICT localSpaceBounds)
{
// Box, Convex, Mesh and HeightField will compute local bounds and pose to world space.
// Sphere, Capsule & Plane will compute world space bounds directly.

const PxReal inSphereRatio = 0.75f;

//The CCD thresholds are as follows:
//(1) sphere = inSphereRatio * radius
//(2) plane = inf (we never need CCD against this shape)
//(3) capsule = inSphereRatio * radius
//(4) box = inSphereRatio * (box minimum extent axis)
//(5) convex = inSphereRatio * convex in-sphere * min scale
//(6) triangle mesh = 0.f (polygons have 0 thickness)
//(7) heightfields = 0.f (polygons have 0 thickness)

//The decision to enter CCD depends on the sum of the shapes' CCD thresholds. One of the 2 shapes must be a 
//sphere/capsule/box/convex so the sum of the CCD thresholds will be non-zero.

//Various changes have been made to computeBounds (e.g. support tight bounds) that have not been made
//to this method. Therefore, let's pipe the bounds query through a common method to ensure that stationary
//bodies do not still enter CCD
PxBounds3 bounds;
computeBounds(bounds, geometry, pose, 0.f, localSpaceBounds, 1.f);

origin = bounds.getCenter();
extent = bounds.getExtents();

switch (geometry.getType())
{
	case PxGeometryType::eSPHERE:
	{
		const PxSphereGeometry& shape = static_cast<const PxSphereGeometry&>(geometry);
		return shape.radius*inSphereRatio;
	}
	case PxGeometryType::ePLANE:
	{
		return PX_MAX_REAL;
	}
	case PxGeometryType::eCAPSULE:
	{
		const PxCapsuleGeometry& shape = static_cast<const PxCapsuleGeometry&>(geometry);
		return shape.radius * inSphereRatio;
	}

	case PxGeometryType::eBOX:
	{
		const PxBoxGeometry& shape = static_cast<const PxBoxGeometry&>(geometry);
		return PxMin(PxMin(shape.halfExtents.x, shape.halfExtents.y), shape.halfExtents.z)*inSphereRatio;
	}

	case PxGeometryType::eCONVEXMESH:
	{
		const PxConvexMeshGeometry& shape = static_cast<const PxConvexMeshGeometry&>(geometry);
		const Gu::ConvexHullData& hullData = static_cast<const Gu::ConvexMesh*>(shape.convexMesh)->getHull();
		return PxMin(shape.scale.scale.z, PxMin(shape.scale.scale.x, shape.scale.scale.y)) * hullData.mInternal.mRadius * inSphereRatio;
	}

	case PxGeometryType::eTRIANGLEMESH:
	{
		return 0.0f;
	}

	case PxGeometryType::eHEIGHTFIELD:
	{
		return 0.f;
	}

	case PxGeometryType::eGEOMETRY_COUNT:
	case PxGeometryType::eINVALID:
	{
		PX_ASSERT(0);		
		Ps::getFoundation().error(PxErrorCode::eINTERNAL_ERROR, __FILE__, __LINE__, "Gu::GeometryUnion::computeBounds: Unknown shape type.");
	}
}
return PX_MAX_REAL;

}`

This ensures that the same bounds are used for both CCD and discrete BP so, in the resting case, the BP is never invoked.

I will make sure that this appears in our next code drop.

I expect that this is unrelated to the issue that you originally reported, especially given that Ikkah said he can reproduce in 3.4 (which doesn't use this "tight bounds" optimization by default, although he may have enabled it). Please let us know if you manage to create a repro for the original issue, and thanks again for uncovering this most recent issue. We really appreciate it

Kind regards

Kier

@kstorey-nvidia
Copy link

Actually, maybe this is the same issue. The issue is a reported lost pair that doesn't exist. The crash you reported is where we attempt to update the BP when nothing had moved. However, maybe there is a similar edge case where we skip updating some objects in the bounds array because the actors weren't moving, but we adjusted their bounds, thereby missing a pair found event or a pair lost event. It seems like it would be a super rare, totally random edge case caused by the sorted list of bounds being partially unsorted due to this CCD bounds computation issue.

Please give the fix a try and report back if the original issue still reproduces? I at first thought this might be a separate issue but, on further thought, this might also be the origin of the issue you were having.

@ikkah
Copy link

ikkah commented Jan 28, 2019

We do in fact run 3.4 with tight bounds enabled.

@kstorey-nvidia
Copy link

OK. Then definitely, please try this fix. It may not solve the issue, but it is at least worth trying out

@kstorey-nvidia
Copy link

kstorey-nvidia commented Jan 28, 2019

Another detail: this issue also requires eENABLE_STABILIZATION to trigger the crash.

@ikkah
Copy link

ikkah commented Jan 28, 2019

Hmm, we don't have stabilization enabled.

I guess I should try to integrate the fix anyway? That overload of computeBounds() is missing on 3.4, but I suppose computeBounds(bounds, geometry, pose, 0.f, localSpaceBounds, 1.f, false) would work?

@kstorey-nvidia
Copy link

Yes, I think that should be equivalent. This might not actually be the solution for the bug you're reporting, but definitely worth trying out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants