failures during experimental feature parallel compile

testing new experimental feature from PR #5826 **Add functions for parallel compilation**  
which was recently merged into main branch  

im loading number of small models and attempting to run pre-compile and i'm getting errors on all attempts  

here ive documented 3 different failures:

- compile fails on some models with a totally random message such as:  
  (and works fine on some models)

  > Uncaught (in promise) Error: Pass at least one tensor to `tf.stack`

- compile completes without errors, but later actual code execution in js fails:  
  (same code works just fine if there is no pre-compile)

      Uncaught (in promise) TypeError: Cannot read properties of null (reading 'A')
        at tfjs.esm.js:47772:27
        at Array.forEach (<anonymous>)
        at runProgram (tfjs.esm.js:47770:10)
        at _MathBackendWebGL.runWebGLProgram (tfjs.esm.js:49796:7)
        at _MathBackendWebGL.uploadToGPU (tfjs.esm.js:49916:40)

  which happens in a trivial function that runs `tf.image.resizeBilinear` followed by `tf.div` to normalize input tensor
 
 - compile completes without errors, but later model inference fails with the same error as above  
   actual backtrace shows that it happens during `execute` call and kernel op in model that triggers error is a simple `sub`  
  (same model executes without issues if there is no pre-compile)


my function that runs precompile on all models is:

```js
type Models: Record<string, GraphModel>;

async function runCompile(allModels: Models) {
  const backendType = tf.getBackend();
  const webGLBackend = tf.backend();
  if ((backendType !== 'webgl') || (!webGLBackend || !webGLBackend.checkCompileCompletion)) {
    log('compile pass: skip');
    return;
  }
  const models = Object.values(allModels).filter((m) => m !== null) as GraphModel[];
  tf.env().set('ENGINE_COMPILE_ONLY', true);
  const numTensorsStart = tf.engine().state.numTensors;
  for (const model of models) {
    const shape = (model.inputs && model.inputs[0] && model.inputs[0].shape) ? [...model.inputs[0].shape] : [1, 64, 64, 3];
    const dtype = (model.inputs && model.inputs[0] && model.inputs[0].dtype) ? model.inputs[0].dtype : 'float32';
    for (let dim = 0; dim < shape.length; dim++) {
      if (shape[dim] === -1) shape[dim] = dim === 0 ? 1 : 64; // override batch number and any dynamic dimensions
    }
    const tensor = tf.zeros(shape, dtype);
    const res = await model.executeAsync(tensor);
    if (Array.isArray(res)) res.forEach((t) => tf.dispose(t));
    else tf.dispose(res);
    tf.dispose(tensor);
  }
  const kernels = await webGLBackend.checkCompileCompletionAsync(); // same errors if check is moved inside per-model loop
  webGLBackend.getUniformLocations();
  log('compile pass kernels:', kernels.length); // getting a reasonable value here
  tf.env().set('ENGINE_COMPILE_ONLY', false);
  const numTensorsEnd = tf.engine().state.numTensors;
  if ((numTensorsEnd - numTensorsStart) > 0) log('tensor leak:', numTensorsEnd - numTensorsStart); // no leaks
}
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

failures during experimental feature parallel compile #6250

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

failures during experimental feature parallel compile #6250

Description

Activity

lina128 commented on Apr 14, 2022

vladmandic commented on Apr 15, 2022

vladmandic commented on Sep 29, 2022

SangbumChoi commented on Jan 19, 2023

gaikwadrahul8 commented on May 30, 2023

vladmandic commented on May 30, 2023

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions