```
import dpctl
import time

# Warm-up call so one-time initialization is not included in the timings
dpctl.SyclDevice()

# Time repeated construction of the default SyclDevice
for i in range(10):
    start = time.time()
    dpctl.SyclDevice()
    print(time.time() - start)
```

Output:
```
0.0028018951416015625
0.0028688907623291016
0.0030078887939453125
0.0030264854431152344
0.002977132797241211
0.00286102294921875
0.0028085708618164062
0.00516057014465332
0.003843069076538086
0.0028009414672851562
```

```
import dpctl
import time

d = dpctl.SyclDevice()

# Time repeated access to filter_string on an existing device
for i in range(10):
    start = time.time()
    d.filter_string
    print(time.time() - start)
```

Output:
```
0.0021190643310546875
0.002058744430541992
0.0021059513092041016
0.0021822452545166016
0.0024938583374023438
0.0023038387298583984
0.0023317337036132812
0.0023195743560791016
0.002201080322265625
0.002750396728515625
```

Since the filter string is part of the type signature in numba-dpex, numba-dpex calls `SyclDevice` and `*.filter_string` at least once per usm/dpnp array parameter on each function call. With each of these taking roughly 2-3 ms in the measurements above, this adds significant performance overhead to every call.

Related issue: https://github.com/IntelPython/numba-dpex/issues/945
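
To put the timings above in perspective, here is a rough back-of-the-envelope sketch of the per-call overhead. It is not a measurement of numba-dpex itself: it assumes exactly one `SyclDevice()` construction and one `filter_string` lookup per array argument (the report says "at least once", so this is a lower bound), and the helper name `estimated_overhead` is made up for illustration.

```python
from statistics import median

# Timings in seconds, copied from the two measurements reported above.
SYCL_DEVICE_CTOR = [
    0.0028018951416015625, 0.0028688907623291016, 0.0030078887939453125,
    0.0030264854431152344, 0.002977132797241211, 0.00286102294921875,
    0.0028085708618164062, 0.00516057014465332, 0.003843069076538086,
    0.0028009414672851562,
]
FILTER_STRING = [
    0.0021190643310546875, 0.002058744430541992, 0.0021059513092041016,
    0.0021822452545166016, 0.0024938583374023438, 0.0023038387298583984,
    0.0023317337036132812, 0.0023195743560791016, 0.002201080322265625,
    0.002750396728515625,
]


def estimated_overhead(n_array_args: int) -> float:
    """Estimate the overhead (seconds) of one call taking n_array_args arrays.

    Hypothetical helper. Assumption: one SyclDevice() construction plus one
    filter_string lookup per usm/dpnp array argument, so the result is a
    lower bound on the overhead described in the report.
    """
    # Median is used so the occasional slow sample does not skew the estimate.
    per_arg = median(SYCL_DEVICE_CTOR) + median(FILTER_STRING)
    return n_array_args * per_arg


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} array arg(s): ~{estimated_overhead(n) * 1e3:.1f} ms per call")
```

Under these assumptions the cost is roughly 5 ms per array argument, and it grows linearly with the number of array parameters regardless of how little work the kernel itself does.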