Description
Describe the issue
System:
Manufacturer Dell
Processor Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) CPU 3.42 GHz
Installed RAM 32.0 GB (31.6 GB usable)
System type 64-bit operating system, ARM-based processor
Onnxruntime-qnn:
Version 1.22.0
When I rerun the same notebook or script, onnxruntime-qnn intermittently throws a 5005 error. To recover, I have to close the terminal (when running main.py), restart the kernel (Jupyter notebook), or sometimes shut the computer down and leave it off for a few minutes. I suspect a cache is not being reset, but I'm not 100% sure.
This is not caused by having more than one QnnHTP.dll installed or by another QnnHTP.dll on the system path. I'm only using the HTP driver that is installed with onnxruntime-qnn. Below is an example of the error

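A quick way to double-check the claim about duplicate drivers is to list every copy of the HTP backend DLL that is reachable from PATH or from the installed onnxruntime package. The following is a minimal sketch, not part of the original report: the helper names find_dll_copies and candidate_dirs are hypothetical, and it assumes the package keeps the DLL in its capi folder, which may differ between versions.

```python
import importlib.util
import os
from pathlib import Path


def find_dll_copies(dll_name, search_dirs):
    """Return resolved paths of files named dll_name (case-insensitive) in search_dirs."""
    hits, seen = [], set()
    for directory in search_dirs:
        try:
            entries = list(Path(directory).iterdir())
        except OSError:
            continue  # missing or unreadable directory, skip it
        for entry in entries:
            if entry.name.lower() == dll_name.lower() and entry.is_file():
                resolved = entry.resolve()
                if resolved not in seen:  # the same folder can appear twice on PATH
                    seen.add(resolved)
                    hits.append(resolved)
    return hits


def candidate_dirs():
    """PATH entries plus, if installed, the onnxruntime capi folder (assumed location)."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    try:
        spec = importlib.util.find_spec("onnxruntime")
    except (ImportError, ValueError):
        spec = None
    if spec is not None and spec.origin:
        dirs.append(str(Path(spec.origin).parent / "capi"))
    return dirs


if __name__ == "__main__":
    for path in find_dll_copies("QnnHtp.dll", candidate_dirs()):
        print(path)
```

If this prints more than one path, a second copy of the driver is visible to the process and the duplicate-DLL explanation cannot be ruled out yet.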
To reproduce
There is only one way I can force this error to occur, since it is otherwise very random:
with a Jupyter notebook running that holds an active InferenceSession on the QNNExecutionProvider, I run main.py from the command line.
You can use this repo as an example:
- Follow instructions in README.md to download models.
- Run /qnn_sample_apps/notebooks/llm/Deepseek_r1_7b_Optimized_Temperature_TopK.ipynb
- Open PowerShell and run python /qnn_sample_apps/src/deepseek_r1/main.py --query "how to resolve this 5005 error"
- The 5005 error will show up in the terminal.
Repo: github.com/DerrickJ1612/qnn_sample_apps
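The steps above can be reduced to two plain Python processes, which may help separate an onnxruntime-qnn problem from a notebook problem. This is a sketch under assumptions, not a confirmed reproduction: hold_session.py is a hypothetical file name, the model path is left for the caller to supply, and it assumes the error comes from two processes each holding a QNN session on the HTP backend.

```python
import sys
import time


def qnn_providers(backend_path="QnnHtp.dll"):
    """Provider list that selects the QNN execution provider with the HTP backend."""
    return [("QNNExecutionProvider", {"backend_path": backend_path})]


def hold_session(model_path, hold_seconds=0.0):
    """Create a QNN session, report the active providers, then keep it alive."""
    import onnxruntime as ort  # imported lazily; requires onnxruntime-qnn

    session = ort.InferenceSession(model_path, providers=qnn_providers())
    print("session created with providers:", session.get_providers())
    time.sleep(hold_seconds)  # stands in for the notebook keeping its session open


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python hold_session.py <model.onnx> [hold_seconds]")
    else:
        seconds = float(sys.argv[2]) if len(sys.argv) > 2 else 0.0
        hold_session(sys.argv[1], seconds)
```

Run it once with a long hold time (for example 600) in one terminal, then run it again without a hold time in a second terminal. If the second run fails with the 5005 error while the first is still sleeping, the notebook is not required to trigger the problem.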
Urgency
I work for Qualcomm, so this is urgent for us: we've been showcasing this workflow for running LLMs. I'm trying to narrow down whether this is an onnxruntime-qnn issue or not.
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.22.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
QNN Execution Provider. I'm not sure why the Execution Provider field above has SNPE and not QNN.
Model File
Download from referenced repo
https://drive.google.com/drive/folders/1hCopYw7rMdeOm3zV6NC2do9orzpKqAMf
Is this a quantized model?
Yes