[PTAL]Gridsample op support #4288

LRY89757 · 2022-10-19T13:03:35Z

Many bugs remain, but I will try to finish it as soon as possible. :)

codecov-commenter · 2022-10-19T13:34:58Z

Codecov Report

Merging #4288 (3990c33) into master (5b28c17) will decrease coverage by 0.03%.
The diff coverage is 96.47%.

@@            Coverage Diff             @@
##           master    #4288      +/-   ##
==========================================
- Coverage   91.70%   91.66%   -0.04%     
==========================================
  Files         783      784       +1     
  Lines      184366   185051     +685     
==========================================
+ Hits       169070   169632     +562     
- Misses      15296    15419     +123

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/layer/gridsample.cpp | `96.47% <96.47%> (ø)` | |
| src/cpu.cpp | `58.28% <0.00%> (-0.64%)` | ⬇️ |
| src/layer/riscv/packing_riscv.cpp | `89.84% <0.00%> (+<0.01%)` | ⬆️ |
| src/layer.cpp | `47.61% <0.00%> (+1.61%)` | ⬆️ |
| src/mat.h | `92.21% <0.00%> (+2.33%)` | ⬆️ |


lgtm-com · 2022-10-19T13:39:45Z

This pull request introduces 1 alert when merging b28e6f9 into c33cbc9 - view on LGTM.com

new alerts:

1 for Empty branch of conditional

LRY89757 · 2022-10-19T17:01:50Z

Almost all of the combination correctness tests pass, except one: bicubic + reflection. That case only passes at a tolerance of 1e-1, while all the others pass at 1e-4:

(0, 0, 0, 0):  tensor(0.0808) tensor(0.0809)
(0, 0, 0, 1):  tensor(0.4152) tensor(0.4463)
(0, 0, 0, 2):  tensor(0.3586) tensor(0.3593)
(0, 0, 0, 3):  tensor(0.5852) tensor(0.6012)
(0, 0, 0, 4):  tensor(0.3457) tensor(0.3605)
(0, 0, 0, 5):  tensor(0.0900) tensor(0.1026)
(0, 0, 0, 6):  tensor(0.3670) tensor(0.3727)
(0, 0, 0, 7):  tensor(-0.0226) tensor(-0.0166)
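
To put a number on the gap, the sketch below computes the largest absolute difference over the eight value pairs printed above. It assumes the test compares by maximum absolute difference, which may not be exactly what the real test does; `max_abs_diff` and `passes` are hypothetical helper names, and the listing does not say which column is ncnn and which is torch.

```python
# Value pairs copied from the comparison printed above, in the order printed.
pairs = [
    (0.0808, 0.0809),
    (0.4152, 0.4463),
    (0.3586, 0.3593),
    (0.5852, 0.6012),
    (0.3457, 0.3605),
    (0.0900, 0.1026),
    (0.3670, 0.3727),
    (-0.0226, -0.0166),
]


def max_abs_diff(value_pairs):
    """Largest absolute difference between the two columns."""
    return max(abs(a - b) for a, b in value_pairs)


def passes(value_pairs, tol):
    """Assumed criterion: every pair differs by at most tol."""
    return max_abs_diff(value_pairs) <= tol


worst = max_abs_diff(pairs)
print(f"max abs diff = {worst:.4f}")
print("passes at 1e-1:", passes(pairs, 1e-1))
print("passes at 1e-4:", passes(pairs, 1e-4))
```

Under that assumption the worst pair is element (0, 0, 0, 1), which is consistent with the case passing at 1e-1 but not at 1e-4.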

lgtm-com · 2022-10-19T17:39:32Z

This pull request introduces 1 alert when merging 9c77f2e into c33cbc9 - view on LGTM.com

new alerts:

1 for Empty branch of conditional

LRY89757 · 2022-10-20T15:46:45Z

Almost all of the combination correctness tests pass, except one: bicubic + reflection. That case only passes at a tolerance of 1e-1, while all the others pass at 1e-4:
(0, 0, 0, 0):  tensor(0.0808) tensor(0.0809)
(0, 0, 0, 1):  tensor(0.4152) tensor(0.4463)
(0, 0, 0, 2):  tensor(0.3586) tensor(0.3593)
(0, 0, 0, 3):  tensor(0.5852) tensor(0.6012)
(0, 0, 0, 4):  tensor(0.3457) tensor(0.3605)
(0, 0, 0, 5):  tensor(0.0900) tensor(0.1026)
(0, 0, 0, 6):  tensor(0.3670) tensor(0.3727)
(0, 0, 0, 7):  tensor(-0.0226) tensor(-0.0166)

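I really can't figure out why. gridsample has three parameters: resize_type, padding_mode, and align_corner. Currently only the combination resize_type=bicubic, padding_mode=reflection fails. All the other reflection cases pass and all the other bicubic cases pass, which suggests that the two code paths being called are each fine. Combining them does not look like a logic error either; the only problem is that the precision check fails, which is really strange.

The source used to be something I had hacked together, and it only passed a few of the cases. It now follows PyTorch's C++ implementation throughout, and this is the only test that still fails.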
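
For orientation, here is a minimal sketch of how a normalized grid coordinate can be mapped to a pixel coordinate and then reflected. It is written from my understanding of the conventions in PyTorch's grid sampler and is not copied from this PR or from PyTorch. The function names `unnormalize`, `reflect`, and `reflect_pixel` are hypothetical, and the final clamp is an assumption about how boundary values are handled.

```python
import math


def unnormalize(coord, size, align_corners):
    """Map a grid coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


def reflect(x, twice_low, twice_high):
    """Reflect x into [twice_low / 2, twice_high / 2]."""
    if twice_low == twice_high:
        return 0.0
    low = twice_low / 2.0
    span = (twice_high - twice_low) / 2.0
    x = abs(x - low)
    extra = math.fmod(x, span)
    flips = math.floor(x / span)
    if flips % 2 == 0:
        return extra + low
    return span - extra + low


def reflect_pixel(x, size, align_corners):
    """Reflect a pixel coordinate, then clamp it to [0, size - 1]."""
    if align_corners:
        x = reflect(x, 0, 2 * (size - 1))   # reflect about pixel centers
    else:
        x = reflect(x, -1, 2 * size - 1)    # reflect about pixel edges
    return min(max(x, 0.0), size - 1.0)


print(unnormalize(-1.0, 4, False))    # -0.5
print(reflect_pixel(4.0, 4, True))    # 2.0
print(reflect_pixel(4.0, 4, False))   # 3.0
```

The reflection axis differs between the two align_corner settings, so the same out-of-range coordinate lands on a different pixel in each mode.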

LRY89757 · 2022-10-21T12:19:19Z

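gridsample now works for 4D and 5D inputs in all modes. In the 5D case, the test passes after manually editing the auto-generated test_ncnn.py to add a squeeze.
The only exception is a 4D tensor (N,C,H,W) with resize_type=bicubic and padding_mode=reflection, where the result matches torch only to a tolerance of 1e-1.
Separately, the pnnx conversion for the 5D input2 still needs to be solved. pnnx's generation logic for test_ncnn.py currently has a problem, and I don't quite understand how to "set batchindex on the second input of F grid sample as well".
@nihui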

LRY89757 · 2022-10-21T15:44:03Z

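gridsample now works for 4D and 5D inputs in all modes. In the 5D case, the test passes after manually editing the auto-generated test_ncnn.py to add a squeeze.

The only exception is a 4D tensor (N,C,H,W) with resize_type=bicubic and padding_mode=reflection, where the result matches torch only to a tolerance of 1e-1.

Separately, the pnnx conversion for the 5D input2 still needs to be solved. pnnx's generation logic for test_ncnn.py currently has a problem, and I don't quite understand how to "set batchindex on the second input of F grid sample as well".
@nihui

Thanks for the guidance, the problem is solved. A pleasant surprise is that one of the two cases that used to exceed the tolerance no longer does. The only remaining flaw is a small precision issue with griddims=4d, resize_type=bicubic, padding_mode=reflection, align_corners=False:
the test tolerance is 1e-4, and the current precision is about 1e-3.5 🤣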

nihui · 2022-11-01T09:13:06Z

We could have a linear_interp1d helper, like cubic_interp1d, to get rid of many in-bound checks.
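
A rough sketch of what such helpers could look like is shown below. The signatures are hypothetical and are not taken from the ncnn source. The cubic coefficient `a = -0.75` is the value I understand PyTorch to use and should be treated as an assumption. `get_value_bounded` is an illustrative zero-padding fetch, so the in-bound test lives in one place and the interpolation code stays branch-free.

```python
import math


def linear_interp1d(x0, x1, t):
    """Linear blend between two neighbouring samples, t in [0, 1]."""
    return x0 * (1.0 - t) + x1 * t


def cubic_coeffs(t, a=-0.75):
    """Cubic convolution weights for taps at offsets -1, 0, 1, 2."""
    def near(x):  # kernel for |x| <= 1
        return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0

    def far(x):   # kernel for 1 < |x| < 2
        return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a

    return far(t + 1.0), near(t), near(1.0 - t), far(2.0 - t)


def cubic_interp1d(x0, x1, x2, x3, t):
    """Weighted sum of four consecutive samples, t in [0, 1]."""
    c0, c1, c2, c3 = cubic_coeffs(t)
    return x0 * c0 + x1 * c1 + x2 * c2 + x3 * c3


def get_value_bounded(img, x, y):
    """Fetch img[y][x], or 0.0 when (x, y) is outside (zeros padding)."""
    h, w = len(img), len(img[0])
    return img[y][x] if 0 <= x < w and 0 <= y < h else 0.0


def sample_bilinear(img, x, y):
    """Bilinear sample built from two horizontal and one vertical lerp."""
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = x - x0, y - y0
    top = linear_interp1d(get_value_bounded(img, x0, y0),
                          get_value_bounded(img, x0 + 1, y0), tx)
    bottom = linear_interp1d(get_value_bounded(img, x0, y0 + 1),
                             get_value_bounded(img, x0 + 1, y0 + 1), tx)
    return linear_interp1d(top, bottom, ty)


img = [[1.0, 2.0], [3.0, 4.0]]
print(sample_bilinear(img, 0.5, 0.5))   # 2.5
```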

tests/CMakeLists.txt

src/layer/gridsample.h

tools/pnnx/src/pass_ncnn/solve_batch_index.cpp

src/layer/gridsample.cpp

LRY89757 · 2022-11-03T05:29:55Z

As it turns out, nearest still needs the in_bound check; otherwise test_gridsample fails :(
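
As an illustration of why the check matters, here is a minimal sketch of nearest sampling with zeros padding. It is not the PR's code: the names `in_bounds` and `sample_nearest_zeros` are hypothetical, and zeros padding is assumed. Python's `round` rounds halves to even, which I believe matches the rounding PyTorch uses for nearest mode, but treat that as an assumption too.

```python
def in_bounds(x, y, w, h):
    """True when integer pixel (x, y) lies inside a w-by-h image."""
    return 0 <= x < w and 0 <= y < h


def sample_nearest_zeros(img, x, y):
    """Nearest-neighbour sample at pixel coordinate (x, y), zeros padding."""
    h, w = len(img), len(img[0])
    ix, iy = round(x), round(y)  # round half to even
    # Without this check a rounded index such as -1 or w would read the
    # wrong element (Python wraps negative indices; C++ reads out of range).
    if not in_bounds(ix, iy, w, h):
        return 0.0
    return img[iy][ix]


img = [[1.0, 2.0], [3.0, 4.0]]
print(sample_nearest_zeros(img, 0.4, 0.6))    # 3.0
print(sample_nearest_zeros(img, -1.0, 0.0))   # 0.0
print(sample_nearest_zeros(img, 1.5, 0.0))    # 0.0, since 1.5 rounds to 2
```

A coordinate that is only slightly outside the valid range can still round to an index one past the edge, which is why the check cannot be dropped for nearest mode.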

nihui · 2022-11-11T12:56:06Z

Thanks for your contribution!


LRY89757 and others added 3 commits October 19, 2022 20:57

Add the naive impl but many bugs

00e9943

apply code-format changes

2d96bb5

Merge branch 'Tencent:master' into gridsample

b28e6f9

Pass the correctness check

19e4511

Add the cmath header

9c77f2e

LRY89757 added 2 commits October 20, 2022 23:31

grid_sample to gridsample

0a10106

temporarily disable the test of 5d input

793be94

LRY89757 and others added 4 commits October 21, 2022 00:41

fix small bugs

acf0109

remove omp collapse

48387c6

Add the support for 5d in torch

ec7c4e1

apply code-format changes

4f5b319

LRY89757 and others added 3 commits October 21, 2022 23:33

Merge branch 'Tencent:master' into gridsample

56b5f3e

Update 5d support add gridsample batch_index

0a0b2df

apply code-format changes

b5bcd6d

LRY89757 and others added 2 commits October 22, 2022 00:07

Update the test func

25cba0c

apply code-format changes

fc3b74a

nihui closed this Oct 22, 2022

nihui reopened this Oct 22, 2022

LRY89757 added 2 commits October 22, 2022 16:12

remove cmath header for ci

e1a0be6

clang-format

917f19e

LRY89757 changed the title ~~[WIP]Gridsample op support~~ [PTAL]Gridsample op support Oct 22, 2022

LRY89757 added 2 commits October 26, 2022 00:02

Update the Operators doc

5b65777

Update the Operators doc

84f3969

fix bugs of int and remove hacks

04342c5

nihui closed this Nov 1, 2022

nihui reopened this Nov 1, 2022

nihui and others added 3 commits November 1, 2022 14:34

Merge remote-tracking branch 'upstream/master' into gridsample

e66f034

NCNN_FORCEINLINE thx to Yoh

9022828

simplify reflect coord

041c759

LRY89757 and others added 2 commits November 1, 2022 18:58

linear_interp1d,3d, rm nearest redundant

2b4bcb6

apply code-format changes

b7ceb6a

LRY89757 closed this Nov 1, 2022

LRY89757 reopened this Nov 1, 2022

LRY89757 and others added 2 commits November 1, 2022 22:17

change to c,d,h,w loop for better reading

0eb87e4

apply code-format changes

7aaeb6a

nihui reviewed Nov 2, 2022


LRY89757 and others added 2 commits November 2, 2022 16:36

drw2zxy, rm _, resize2sample

6ddff1c

apply code-format changes

025a0b2

LRY89757 closed this Nov 3, 2022

LRY89757 reopened this Nov 3, 2022

nearest fixed

db12e8b

LRY89757 and others added 2 commits November 3, 2022 13:34

nearest fixed

bd29ece

Merge branch 'master' into gridsample

c96ed53

LRY89757 closed this Nov 8, 2022

LRY89757 reopened this Nov 8, 2022

nihui added 2 commits November 11, 2022 14:40

Merge remote-tracking branch 'upstream/master' into gridsample

b13e514

update gridsample

3990c33

nihui merged commit 6a47f8d into Tencent:master Nov 11, 2022

LRY89757 mentioned this pull request Nov 15, 2022

PNNX is an open standard for PyTorch model interoperability #3262

Merged
