From the thread "How do we use the computational power of A17 Pro Neural Engine?" (https://developer.apple.com/forums/thread/740518),
I learned that to get the high-performance INT8 path (38 TOPS) of the ANE when running inference with my mlmodel on an iPad Pro with the M4 SoC, I have to use the Core ML torch API to quantize both weights and activations to INT8 at training time.
My question is:
I only have an FP32 mlmodel, without the torch code or the original torch model. What can I do?
By the way, with weight-only INT8 quantization, will the M4 ANE compute in FP16 or in INT8?
Thanks for your help!