per-layer feature mask #4273

Closed · nihui (Member) opened this issue Oct 14, 2022 · 0 comments · Fixed by #4278
nihui commented Oct 14, 2022

create a new param entry with id 31 (uint type)
use its bits for per-layer feature masking, e.g. to control these options:

bool use_fp16_packed;
bool use_fp16_storage;
bool use_fp16_arithmetic;
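
A minimal sketch of the idea (not the actual ncnn implementation; apply_feature_mask is a hypothetical helper): the value read from param id 31 becomes a per-layer copy of the global Option with the masked features cleared.

#include "option.h" // ncnn::Option

// Hypothetical helper: derive a per-layer Option from the global one using
// the proposed mask bits (bit 0 = no fp16 arithmetic, bit 1 = no fp16 storage).
static ncnn::Option apply_feature_mask(const ncnn::Option& global_opt, unsigned int featmask)
{
    ncnn::Option opt = global_opt;

    if (featmask & (1u << 0))
    {
        opt.use_fp16_arithmetic = false;
    }
    if (featmask & (1u << 1))
    {
        opt.use_fp16_packed = false;
        opt.use_fp16_storage = false;
    }

    return opt;
}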

Sample use case

7767517
6 6
Input            data         0 1 data 0=224 1=224 2=3
Convolution      conv1_1      1 1 data conv1_1 0=64 1=3 4=1 5=1 6=1728 9=1
Convolution      conv1_2      1 1 conv1_1 conv1_2 0=64 1=3 4=1 5=1 6=36864 9=1
Pooling          pool1        1 1 conv1_2 pool1 1=2 2=2
Convolution      conv2_1      1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1
Convolution      conv2_2      1 1 conv2_1 output 0=128 1=3 4=1 5=1 6=147456 9=1

Typically, we use fp16 computation to improve inference speed.
Because the weight values of conv2_1 are large, fp16 accumulation may cause numerical overflow, so fp16 needs to be disabled individually for conv2_1 while the other layers continue to use fp16 mode.
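
For context on the overflow concern: the largest finite fp16 value is 65504, so a long accumulation over large weights can push a half-precision accumulator to infinity. A tiny standalone demonstration (not ncnn code; it assumes a compiler with the _Float16 extension, e.g. recent GCC or Clang):

#include <cstdio>

int main()
{
    _Float16 acc = 0;

    // Repeatedly accumulate a moderate product; once the running sum passes
    // the fp16 maximum of 65504 it rounds to +inf and the result is ruined.
    for (int i = 0; i < 1000; i++)
        acc += (_Float16)100.f * (_Float16)1.f;

    printf("acc = %f\n", (float)acc); // prints inf
    return 0;
}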

Add 31=1, i.e. set bit (1<<0), to disable fp16 for that layer:

Convolution      conv2_1      1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1 31=1

It would also be possible to control num_threads for each layer individually, but that is not very useful, so no precious bits are spent on it.

mask                 bit     rationale
no fp16 arithmetic   1<<0    precision concern
no fp16 storage      1<<1    precision concern
no bf16 storage      1<<2    precision concern
no int8              1<<3    debug dynamic quantized model
no vulkan            1<<4    reduce overhead for cpu op - gpu split - cpu op
no sgemm             1<<5    reduce some memory
no winograd          1<<6    reduce some memory
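
Expressed as code, the proposed bits could be written as constants like these (illustrative names only, not ncnn API):

// Hypothetical constants mirroring the mask bits in the table above.
enum LayerFeatMask
{
    LAYER_FEATMASK_NO_FP16_ARITHMETIC = 1 << 0,
    LAYER_FEATMASK_NO_FP16_STORAGE    = 1 << 1,
    LAYER_FEATMASK_NO_BF16_STORAGE    = 1 << 2,
    LAYER_FEATMASK_NO_INT8            = 1 << 3,
    LAYER_FEATMASK_NO_VULKAN          = 1 << 4,
    LAYER_FEATMASK_NO_SGEMM           = 1 << 5,
    LAYER_FEATMASK_NO_WINOGRAD        = 1 << 6,
};

For example, a layer carrying 31=5 would combine LAYER_FEATMASK_NO_FP16_ARITHMETIC | LAYER_FEATMASK_NO_BF16_STORAGE (1 + 4), disabling both fp16 arithmetic and bf16 storage for that layer.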

These masks will be implemented, and more bits can be added for other needs in the future.

nihui linked a pull request on Oct 17, 2022 that will close this issue.