Need help quantizing both model weights and activations with only a float32 mlmodel

From the forum thread "How do we use the computational power of A17 Pro Neural Engine?" (https://developer.apple.com/forums/thread/740518):

I learned that if I want to run inference with my mlmodel on my iPad Pro (M4 SoC) and get the Neural Engine's high int8 performance (38 TOPS), I have to use the coremltools PyTorch API to quantize both weights and activations to int8 through training-time quantization.

My question is:
I only have an fp32 mlmodel, without the PyTorch code or the original model. What can I do?
By the way, if only the weights are quantized to int8, will the M4 ANE compute in fp16 or in int8?
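
To make the two cases in the question concrete, here is a minimal plain-Python sketch of the arithmetic behind weight-only int8 quantization versus int8 weights plus int8 activations (W8A8). It only illustrates the general technique (symmetric, per-tensor linear quantization). It is not coremltools code and makes no claim about what the ANE actually does internally; the function names `quantize_symmetric`, `dot_weight_only`, and `dot_w8a8` are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dot_weight_only(weights, activations):
    """Weight-only case: int8 weights are dequantized, then the dot product runs in float."""
    q_w, s_w = quantize_symmetric(weights)
    deq_w = [q * s_w for q in q_w]  # back to float before computing
    return sum(w * a for w, a in zip(deq_w, activations))


def dot_w8a8(weights, activations):
    """W8A8 case: both operands are int8, products accumulate as integers, one rescale at the end."""
    q_w, s_w = quantize_symmetric(weights)
    q_a, s_a = quantize_symmetric(activations)
    acc = sum(w * a for w, a in zip(q_w, q_a))  # integer accumulation
    return acc * s_w * s_a


# Illustrative values only, not taken from any real model.
weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))

print("float reference:", exact)
print("weight-only    :", dot_weight_only(weights, activations))
print("W8A8           :", dot_w8a8(weights, activations))
```

In the weight-only path the multiply-accumulate still happens in floating point, so the saving is in storage and memory bandwidth. Only the W8A8 path performs the multiply-accumulate on integers, which is the case the forum thread ties to the int8 throughput figure.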
Thanks for your help～


Need help quantizing both model weights and activations with only a float32 mlmodel #2227
