Describe the issue
There's a memory leak when creating an inference session with the runtime in Node: each time a session is created and released, memory usage grows. We run inference on an edge device with limited memory, so after a few sessions the device runs out of memory. Because this is a realtime system, we need to be able to reload the session without restarting the whole application.
Disabling "enableCpuMemArena" and "enableMemPattern", as suggested for similar issues, does not stop the leak.
To reproduce
I've made a minimal JavaScript example that reproduces the issue:
import * as ort from "onnxruntime-node";

const MODEL_FILE_PATH = "./model.onnx";

// Create a session with the CPU memory arena and memory pattern
// optimizations disabled, as suggested for similar issues.
const getModel = async (filename) =>
  await ort.InferenceSession.create(filename, {
    enableCpuMemArena: false,
    enableMemPattern: false,
  });

// Load a session and release it immediately.
const loadSession = async () => {
  const session = await getModel(MODEL_FILE_PATH);
  await session.release();
};

const main = async () => {
  for (let i = 0; i < 10; i++) {
    await loadSession();
  }
};

(async () => {
  await main();
})();
When testing locally, the application's memory usage sits at 325 MB after loading the session once. After 10 load/release cycles it grows to 994 MB, and after 100 cycles it reaches 9.12 GB.
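For reference, here is a sketch of how the per-cycle growth can be tracked from within the script itself, logging process.memoryUsage().rss after each cycle and forcing a GC pass (requires running Node with --expose-gc) to rule out ordinary JS-heap growth. The logRss helper is illustrative, not part of any API, and RSS readings may not exactly match the figures above, which were read from the OS:

import * as ort from "onnxruntime-node";

const MODEL_FILE_PATH = "./model.onnx";

// Illustrative helper: print the resident set size in MB.
const logRss = (label) => {
  const rssMb = (process.memoryUsage().rss / 1024 / 1024).toFixed(1);
  console.log(`${label}: rss = ${rssMb} MB`);
};

(async () => {
  for (let i = 0; i < 100; i++) {
    const session = await ort.InferenceSession.create(MODEL_FILE_PATH, {
      enableCpuMemArena: false,
      enableMemPattern: false,
    });
    await session.release();
    // Force a GC pass when available (run with: node --expose-gc repro.mjs)
    // so pending JS garbage doesn't inflate the reading.
    if (global.gc) global.gc();
    logRss(`after cycle ${i + 1}`);
  }
})();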
Urgency
MEDIUM
Platform
Mac
OS Version
14.5
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.22.0-rev
ONNX Runtime API
JavaScript
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Unknown