
recover TF inference for deepJet and deepDoubleX and MXNET inf for deepBoosted (migrated to ONNX in #28112) #28959

Closed
slava77 opened this issue Feb 13, 2020 · 12 comments


slava77 commented Feb 13, 2020

[This issue is contingent on making ONNX work properly on PPC.]

As part of #28112, support for running DeepFlavour and DeepDoubleX inference with TensorFlow was removed and replaced with ONNX-based inference.
The trained model is the same in both cases.
The migration was done to minimize code replication and to avoid implementing and supporting an adaptor layer that would be needed to support both inference engines consistently.

Similarly, the deepBoosted tagger was migrated from MXNet.

However, it was later discovered that ONNX lacks support for the PPC architecture.
Significant effort was made on the CMS side to add support, e.g. in cms-externals/onnxruntime#4, but this did not give numerically correct results at run time.

There is significant motivation to provide PPC support with numerically correct performance.
So, the more straightforward solution for now is to keep support for TF-based [and MXNet] inference on PPC.

@hqucms @makortel @mrodozov


slava77 commented Feb 13, 2020

assign reconstruction

@cmsbuild

New categories assigned: reconstruction

@slava77,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild

A new Issue was created by @slava77 Slava Krutelyov.

@Dr15Jones, @smuzaffar, @silviodonato, @makortel, @davidlange6, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel

@slava77 Thanks for opening the issue. Are you essentially suggesting to continue using ONNX inference on x86 and ARM, and TF on POWER (i.e. not using TF for all architectures, e.g. after #28711)?

Technically we could do that with SwitchProducer, but we would need to solve #28576 first (and that will likely take a while).

@mrodozov @smuzaffar We should probably anyway disable building the ONNX-dependent libraries and plugins on POWER.


slava77 commented Feb 13, 2020

@makortel
I think that a SwitchProducer would be a good option.
The alternative is some "ppcProduction" process modifier, which may be possible to pass through the submission infrastructure.

I'm not sure if #28576 fully applies here. IIUC, it is possible to build ONNX on Power and load libraries without anything exploding.
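The per-architecture dispatch being discussed here can be sketched in plain Python. This is illustrative only, not actual CMSSW code; the function name and backend labels are assumptions standing in for the SwitchProducer / modifier mechanics:

```python
import platform

def pick_backend(arch=None):
    """Return an inference backend name for a CPU architecture.

    Sketch of the idea under discussion: ONNX on x86 and ARM,
    falling back to the TensorFlow (or MXNet) inference path on
    POWER, where ONNXRuntime was found to be numerically unreliable.
    """
    arch = arch or platform.machine()
    if arch == "ppc64le":
        return "tensorflow"  # pre-#28112 inference path
    return "onnx"

print(pick_backend("x86_64"))   # -> onnx
print(pick_backend("ppc64le"))  # -> tensorflow
```

A SwitchProducer would make this choice at runtime inside an unchanged configuration, whereas a modifier bakes it into the configuration itself.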

@makortel

> The alternative is some "ppcProduction" process modifier, which may be possible to pass through the submission infrastructure.

I just want to remind that in that case the configuration hash would be different between x86 and POWER (which motivated the SwitchProducer).

> I'm not sure if #28576 fully applies here. IIUC, it is possible to build ONNX on Power and load libraries without anything exploding.

At the moment the POWER build fails:
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc7_ppc64le_gcc820/CMSSW_11_1_X_2020-02-12-2300/PhysicsTools/ONNXRuntime

Do I interpret correctly this means that cms-externals/onnxruntime#4 (or something like it) is needed to build ONNX for POWER in the first place?
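The configuration-hash concern above can be shown with a minimal sketch (the producer name is hypothetical; this is not CMSSW code): embedding the backend choice in the configuration, as a "ppcProduction"-style modifier would, makes the hash differ per architecture, which is exactly what the SwitchProducer approach avoids by keeping the configuration identical and deferring the choice to runtime.

```python
import hashlib
import json

def config_hash(backend):
    """Hash a toy configuration that embeds the backend choice."""
    cfg = {"producer": "pfDeepFlavourTagInfos", "backend": backend}
    blob = json.dumps(cfg, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

# A modifier changes the config content, hence the hash:
assert config_hash("onnx") != config_hash("tensorflow")
```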


slava77 commented Feb 13, 2020

> Do I interpret correctly this means that cms-externals/onnxruntime#4 (or something like it) is needed to build ONNX for POWER in the first place?

right, that's what I implied by "it's possible to build ONNX on power"

@makortel

> > Do I interpret correctly this means that cms-externals/onnxruntime#4 (or something like it) is needed to build ONNX for POWER in the first place?
>
> right, that's what I implied by "it's possible to build ONNX on power"

Thanks for the clarification. I'm not really fond of "hacking an external to build" as a mid/long-term solution. I suppose the added complexity might be justified in the (very) short term if ONNX continues to be "much faster" than TF for these models even after #28711.

I'd prefer to find a general solution (#28576) because I'm afraid the situation will only get worse in the future (but I'm happy to be proven wrong).

@smuzaffar

cms-externals/onnxruntime#4 only allows us to build, but it does not run: the ONNX unit tests failed on PowerPC.

@slava77 slava77 changed the title recover TF inference for deepJet and DeepDoubleX (migrated to ONNX in #28112) recover TF inference for deepJet and deepDoubleX and MXNET inf for deepBoosted (migrated to ONNX in #28112) Feb 26, 2020

slava77 commented Feb 26, 2020

I just realized that this issue should extend to MXNet as well (used in the deepBoosted taggers).


hqucms commented Mar 10, 2020

#29172 is open to address this issue. Feedback is welcome!


slava77 commented May 1, 2020

I'm closing this for now, after cms-sw/cmsdist#5743 was integrated

@slava77 slava77 closed this as completed May 1, 2020