Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling API: Different ClassID in ClassLoad and ClassUnload callbacks #13249

Closed
ghost opened this issue Aug 12, 2019 · 6 comments
Closed

Profiling API: Different ClassID in ClassLoad and ClassUnload callbacks #13249

ghost opened this issue Aug 12, 2019 · 6 comments
Labels
area-Diagnostics-coreclr tracking This issue is tracking the completion of other related issues.
Milestone

Comments

@ghost
Copy link

ghost commented Aug 12, 2019

Hey all,
this is a followup of a bug we encountered in all .Net frameworks version, which was un-reproducible until recently (.Net Core < 3 didn't support class unloading).
First, I'll link to the original posts (SO and msdn forums): https://stackoverflow.com/questions/54883986/icorprofilercallbackclassunloadstarted-not-called-for-a-generic-class-even-th https://social.msdn.microsoft.com/Forums/vstudio/en-US/06f3022d-0d6e-450c-8100-9e082fb21ee1/icorprofilercallbackclassunloadstarted-not-called-for-a-generic-class-instantiation-even-though?forum=clr

We further analyzed it back then and noticed we get a different ClassID on load then we get on Unload.

I managed to reproduce in on latest .net core preview (3.0.100-preview7-012821) on Windows 10. You can find it here: https://github.com/NitzanEgozy/dotnet/tree/master/testers/Unloading (cd out, run.bat)
The important part of the output:

ClassLoadFinished: Plugin.PluginClass (140731382877856)
...
ClassLoadFinished: Plugin.MyGenericTest (140731382879056)
...
ClassLoadFinished: Plugin.MyGenList`1 <System.__Canon> (140731382880136)
ClassLoadFinished: Plugin.MyGenList`1 <System.String> (140731382880344)
...
ERROR - can't retrieve moduleId
ClassUnloadStarted: Plugin.MyGenList`1 <Unknown> (140731382879688)
ClassUnloadStarted: Plugin.MyGenericTest (140731382879056)
ClassUnloadStarted: Plugin.PluginClass (140731382877856)
ClassUnloadStarted: Plugin.MyGenList`1 <System.__Canon> (140731382880136)

As we can see, Plugin.MyGenList1 is loaded twice and unloaded twice. The load/unload for System.__Canon is fine (same ClassID). We have another Load/Unload set. The Load is for System.String, but the the unload is unknown (Can't retrieve moduleID, probably too late). They are of different ClassIDs. Also, the loaded ClassID (Plugin.MyGenList1 <System.String> (140731382880344)) is never unloaded, and the unloaded ClassID (Plugin.MyGenList1 (140731382879688)) is never loaded, nor do we get this ClassID in any other way.

This is a huge issue for profilers who rely on tracking currently loaded classes.

@hoyosjs
Copy link
Member

hoyosjs commented Aug 12, 2019

@davmason @dotnet/dotnet-diag

@davmason
Copy link
Member

Hi @NitzanEgozy,

I did a first level of investigation, but I don't have a proposed fix yet. There are two separate issues here causing the strangeness you're seeing.

There are three separate types involved and each callback is only giving you two (but a different set of two).

  • Plugin.MyGenList`1: the unbound generic type
  • Plugin.MyGenList`1 <System.__Canon>: the generic type bound to thecanonical type (used for normal references)
  • Plugin.MyGenList`1 <System.String>: the generic type bound to System.String.

For ClassLoadStarted we have logic that that specifically excludes unbound generic types (i.e. Plugin.MyGenList`1) from being shown to the profiler in ClassLoader::Notify

This means you ClassLoadStarted only gives you callbacks for the canonical and string instances. This seems the right thing to do here, since as a profiler you would only care about bound generic types and there's nothing of interest for unbound ones.

The issue is that we do a different set of filtering for ClassUnloadStarted. That callback occurs inside EEClass::Destruct, and Destruct is only called on non-generic types, unbound generic types, and canonical generic types. Non-canonical generic types ( i.e. Plugin.MyGenList`1 <System.String> ) are skipped.

This second part is where I suspect the fix will go. I think we will have to make the ClassUnloadStarted callbacks match the ClassLoadStarted behavior by skipping unbound generic types and including non-canonical generic types.

@davmason
Copy link
Member

@janvorli do you know why there is special logic to skip calling Destruct on non-canonical closed generic types here? https://github.com/dotnet/coreclr/blob/5d27a3ce086d66673e2da3b768340b3cf19ea7b3/src/vm/ceeload.cpp#L3267-L3272

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@tommcdon tommcdon added the tracking This issue is tracking the completion of other related issues. label May 6, 2020
@tommcdon tommcdon modified the milestones: 5.0, Future May 6, 2020
@tommcdon tommcdon modified the milestones: Future, 6.0.0 Jul 8, 2020
@davmason
Copy link
Member

Closing as by design

@ghost
Copy link
Author

ghost commented Nov 15, 2020

What do you mean "by design"? In your previous comment, you detailed the bug and where you think the fix should take place.
This bug is a pretty big deal for any Profiler that tracks class loads/unloads in order to create a mapping.

@davmason
Copy link
Member

davmason commented Dec 1, 2020

Sorry for the delay, I was out for a couple weeks on vacation.

"By design" isn't the best phrasing. The issue is that this involves a bunch of really low level code that hasn't changed in forever, so fixing carries some real risk that we could break something unrelated.

There is a relatively simple workaround available - instead of relying on the class unload callback, you can rely on the module unload callback and unload all classes from that module. We do that in the profiler tests here:

HRESULT GetAppDomainStaticAddress::ModuleUnloadStarted(ModuleID moduleId)
{
printf("Forcing GC due to module unload\n");
gcWaitEvent.Signal();
{
lock_guard<mutex> guard(classADMapLock);
constexpr size_t nameLen = 1024;
WCHAR name[nameLen];
HRESULT hr = pCorProfilerInfo->GetModuleInfo2(moduleId,
NULL,
nameLen,
NULL,
name,
NULL,
NULL);
if (FAILED(hr))
{
printf("GetModuleInfo2 failed with hr=0x%x\n", hr);
++failures;
return E_FAIL;
}
wprintf(L"Module 0x%" PRIxPTR " (%s) unload started\n", moduleId, name);
for (auto it = classADMap.begin(); it != classADMap.end(); )
{
ClassID classId = it->first;
ModuleID modId;
hr = pCorProfilerInfo->GetClassIDInfo(classId, &modId, NULL);
if (FAILED(hr))
{
printf("Failed to get ClassIDInfo hr=0x%x\n", hr);
++failures;
return E_FAIL;
}
if (modId == moduleId)
{
if (DEBUG_OUT)
{
printf("ClassID 0x%" PRIxPTR " being removed due to parent module unloading\n", classId);
}
it = classADMap.erase(it);
continue;
}
// Now check the generic arguments
bool shouldEraseClassId = false;
vector<ClassID> genericTypes = GetGenericTypeArgs(classId);
for (auto genericIt = genericTypes.begin(); genericIt != genericTypes.end(); ++genericIt)
{
ClassID typeArg = *genericIt;
ModuleID typeArgModId;
if (DEBUG_OUT)
{
printf("Checking generic argument 0x%" PRIxPTR " of class 0x%" PRIxPTR "\n", typeArg, classId);
}
hr = pCorProfilerInfo->GetClassIDInfo(typeArg, &typeArgModId, NULL);
if (FAILED(hr))
{
printf("Failed to get ClassIDInfo hr=0x%x\n", hr);
++failures;
return E_FAIL;
}
if (typeArgModId == moduleId)
{
if (DEBUG_OUT)
{
wprintf(L"ClassID 0x%" PRIxPTR " (%s) being removed due to generic argument 0x%" PRIxPTR " (%s) belonging to the parent module 0x%" PRIxPTR " unloading\n",
classId, GetClassIDName(classId).ToCStr(), typeArg, GetClassIDName(typeArg).ToCStr(), typeArgModId);
}
shouldEraseClassId = true;
break;
}
}
if (shouldEraseClassId)
{
it = classADMap.erase(it);
}
else
{
++it;
}
}
}
return S_OK;
}

Since there is a workaround and the fix has risk, this bug is unlikely to get fixed.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics-coreclr tracking This issue is tracking the completion of other related issues.
Projects
None yet
Development

No branches or pull requests

4 participants