Description
I've recently been experimenting with XNNPACK's weights cache, both to reduce load time by caching packed weights and to reduce memory pressure for weights that are repeated across the same kernels.
I was experimenting with the fully-connected operator and found that the weights cache was never being hit. I noticed that when using the APIs to create the xnn_weights_cache_t, we set the look-up function to be xnn_internal_weights_cache_look_up:
Line 148 in 85071b8
Looking at this function, it appears to be a placeholder that always returns XNN_CACHE_NOT_FOUND:
Lines 491 to 496 in 85071b8
Now, when I use the weights cache to create a runtime_t containing only a fully-connected operator, the operator-creation flow first looks up the cache to see whether the weights have been packed before, using xnn_weights_cache_look_up:
XNNPACK/src/operators/fully-connected-nc.c
Lines 154 to 157 in 85071b8
However, this just calls the placeholder function above, which returns XNN_CACHE_NOT_FOUND:
Lines 530 to 534 in 85071b8
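To make the dispatch I'm describing concrete, here is a minimal, self-contained sketch of the pattern as I understand it. Everything prefixed with demo_ or DEMO_ is hypothetical and written only for illustration; it is not XNNPACK's actual code, and the struct layout and signatures are my assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XNN_CACHE_NOT_FOUND. */
#define DEMO_CACHE_NOT_FOUND SIZE_MAX

/* Hypothetical cache key identifying an unpacked kernel. */
struct demo_cache_key {
  const void* kernel;
  size_t seed;
};

/* Hypothetical cache handle: a context plus an installed look-up callback. */
struct demo_weights_cache {
  void* context;
  size_t (*look_up)(void* context, const struct demo_cache_key* key);
};

/* Placeholder look-up: ignores its arguments and always reports a miss. */
size_t demo_placeholder_look_up(void* context, const struct demo_cache_key* key) {
  (void) context;
  (void) key;
  return DEMO_CACHE_NOT_FOUND;
}

/* Public wrapper: forwards to whichever callback was installed. */
size_t demo_weights_cache_look_up(struct demo_weights_cache* cache,
                                  const struct demo_cache_key* key) {
  assert(cache != NULL && cache->look_up != NULL);
  return cache->look_up(cache->context, key);
}

/* Returns 1 if two look-ups of the same key both miss, 0 otherwise. */
int demo_same_key_always_misses(void) {
  static const float kernel[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  struct demo_weights_cache cache = {NULL, demo_placeholder_look_up};
  struct demo_cache_key key = {kernel, 0};
  size_t first = demo_weights_cache_look_up(&cache, &key);
  size_t second = demo_weights_cache_look_up(&cache, &key);
  return first == DEMO_CACHE_NOT_FOUND && second == DEMO_CACHE_NOT_FOUND;
}
```

With the placeholder installed, the wrapper can never report a hit, no matter how many times the same key is looked up.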
As a result, every look-up falls through to the XNN_CACHE_NOT_FOUND path, in which the weights have to be repacked and memory has to be allocated for the newly packed weights:
XNNPACK/src/operators/fully-connected-nc.c
Lines 159 to 179 in 85071b8
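The consequence of the miss path can be sketched as follows. This is a simplified model under my own assumptions, not XNNPACK's implementation: the demo_ names are hypothetical, "packing" is just a memcpy, the cache holds a single entry keyed by the kernel pointer, and the model frees the previous packed copy on each miss (which understates the real memory cost).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for XNN_CACHE_NOT_FOUND. */
#define DEMO_CACHE_NOT_FOUND SIZE_MAX

/* Hypothetical one-entry cache with a pluggable look-up callback. */
struct demo_cache {
  const float* kernel;  /* unpacked kernel the entry was built from */
  float* packed;        /* packed copy owned by the cache, NULL if empty */
  size_t (*look_up)(const struct demo_cache* cache, const float* kernel);
  size_t packs;         /* how many times weights were packed */
  size_t bytes;         /* total bytes allocated for packed weights */
};

/* Placeholder: always reports a miss. */
size_t demo_look_up_placeholder(const struct demo_cache* cache, const float* kernel) {
  (void) cache;
  (void) kernel;
  return DEMO_CACHE_NOT_FOUND;
}

/* Working look-up for comparison: hits when the same kernel was packed before. */
size_t demo_look_up_by_pointer(const struct demo_cache* cache, const float* kernel) {
  return (cache->packed != NULL && cache->kernel == kernel) ? 0 : DEMO_CACHE_NOT_FOUND;
}

/* Look up first; on a miss, allocate and pack. Returns 0 on success, -1 on failure. */
int demo_create_operator(struct demo_cache* cache, const float* kernel, size_t count) {
  assert(cache != NULL && cache->look_up != NULL);
  if (cache->look_up(cache, kernel) != DEMO_CACHE_NOT_FOUND) {
    return 0;  /* hit: the existing packed copy is reused */
  }
  float* packed = (float*) malloc(count * sizeof(float));
  if (packed == NULL) {
    return -1;
  }
  memcpy(packed, kernel, count * sizeof(float));  /* stand-in for real packing */
  free(cache->packed);  /* the model keeps only the most recent copy */
  cache->kernel = kernel;
  cache->packed = packed;
  cache->packs += 1;
  cache->bytes += count * sizeof(float);
  return 0;
}

/* Creates n operators that share one kernel and returns how many packs occurred. */
size_t demo_count_packs(size_t (*look_up)(const struct demo_cache*, const float*), size_t n) {
  static const float kernel[8] = {0};
  struct demo_cache cache = {NULL, NULL, look_up, 0, 0};
  for (size_t i = 0; i < n; i++) {
    if (demo_create_operator(&cache, kernel, 8) != 0) {
      break;
    }
  }
  free(cache.packed);
  return cache.packs;
}
```

In this model, calling demo_count_packs with demo_look_up_placeholder packs once per operator, whereas demo_look_up_by_pointer packs only once for all operators sharing the kernel. That difference is the behavior I expected from the weights cache but am not seeing.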
Am I looking at this incorrectly? Or is this a feature that is still a WIP? Or is this a bug that is meant to be fixed in the future?