Updates KernelAttributes to avoid possible dangling handles.#957
Updates KernelAttributes to avoid possible dangling handles.#957Andy-Jost merged 4 commits intoNVIDIA:mainfrom
Conversation
| @@ -76,15 +77,18 @@ def _get_cached_attribute(self, device_id: int, attribute: driver.CUfunction_att | |||
| """Helper function to get a cached attribute or fetch and cache it if not present.""" | |||
| if device_id in self._cache and attribute in self._cache[device_id]: | |||
There was a problem hiding this comment.
I also see an opportunity to simplify the logic throughout to use pairs as keys:
if (device_id, attribute) in self._cache:
rwgk
left a comment
There was a problem hiding this comment.
Looks good to me.
My suggestion is meant to be a minor by easy optimization.
| """Helper function to get a cached attribute or fetch and cache it if not present.""" | ||
| if device_id in self._cache and attribute in self._cache[device_id]: | ||
| return self._cache[device_id][attribute] | ||
| if (device_id, attribute) in self._cache: |
There was a problem hiding this comment.
WDYT about
cache_key = (device_id, attribute) # so this tuple doesn't have to be rebuilt
result = self._cache.get(cache_key, cache_key) # the tuple doubles as sentinel; there is only one cache lookup
if result is not cache_key:
return result
then all the way below
self.cache[cache_key] = result
return result
There was a problem hiding this comment.
Good suggestion, thanks!
|
/ok to test 1878c2d |
This comment has been minimized.
This comment has been minimized.
rwgk
left a comment
There was a problem hiding this comment.
Looks great to me!
Sorry I missed the full context: How did you discover the bug fixed in this PR? Is there a reasonably easy way to add a test that would have created a dangling handle? — I wouldn't spend more than 10 minutes on that.
Leo asked me to bundle the attributes for IPC-enabled mempools and pointed to this as a reference implementation. I noticed this issue by inspection when reading the code. There is an example test called Maybe to be more specific, it was this code that set off the alarm bells:
When an object ( |
I looked into this and I think it would be difficult right now. The reference management is a bit of a mess. E.g., there is a reference cycle between Kernel, which refers to a Module, which refers to its Kernels. That makes it difficult to accurately delete Kernels as we'd need for this test. If we cleaned up the references overall it would be easier. |
|
|
Sorry I did not have a chance to read this PR. It is fine. But there was not bug, and this PR is not a bug fix. In case it is not clear, this was implemented by design. |
Previously a
KernelAttributesbundle held a reference to an internal handle from aKernelobject. This change introduces a weak reference to ensure that a dangling handle is never used. Testing shows no statistically significant change to property access times.