[PTAL]Gridsample op support #4288

Merged 36 commits into Tencent:master on Nov 11, 2022

Conversation

LRY89757
Contributor

Many bugs remain, but I will try to finish it as soon as possible. :)

@codecov-commenter

codecov-commenter commented Oct 19, 2022

Codecov Report

Merging #4288 (3990c33) into master (5b28c17) will decrease coverage by 0.03%.
The diff coverage is 96.47%.

@@            Coverage Diff             @@
##           master    #4288      +/-   ##
==========================================
- Coverage   91.70%   91.66%   -0.04%     
==========================================
  Files         783      784       +1     
  Lines      184366   185051     +685     
==========================================
+ Hits       169070   169632     +562     
- Misses      15296    15419     +123     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/layer/gridsample.cpp | 96.47% <96.47%> (ø) | |
| src/cpu.cpp | 58.28% <0.00%> (-0.64%) | ⬇️ |
| src/layer/riscv/packing_riscv.cpp | 89.84% <0.00%> (+<0.01%) | ⬆️ |
| src/layer.cpp | 47.61% <0.00%> (+1.61%) | ⬆️ |
| src/mat.h | 92.21% <0.00%> (+2.33%) | ⬆️ |


@lgtm-com

lgtm-com bot commented Oct 19, 2022

This pull request introduces 1 alert when merging b28e6f9 into c33cbc9 - view on LGTM.com

new alerts:

  • 1 for Empty branch of conditional

@LRY89757
Contributor Author

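Almost every combination now passes the correctness tests. The one exception is bicubic + reflection, which only passes at a tolerance of 1e-1, while all the others pass at 1e-4: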

(0, 0, 0, 0):  tensor(0.0808) tensor(0.0809)
(0, 0, 0, 1):  tensor(0.4152) tensor(0.4463)
(0, 0, 0, 2):  tensor(0.3586) tensor(0.3593)
(0, 0, 0, 3):  tensor(0.5852) tensor(0.6012)
(0, 0, 0, 4):  tensor(0.3457) tensor(0.3605)
(0, 0, 0, 5):  tensor(0.0900) tensor(0.1026)
(0, 0, 0, 6):  tensor(0.3670) tensor(0.3727)
(0, 0, 0, 7):  tensor(-0.0226) tensor(-0.0166)
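
To put the gap in numbers, the eight pairs printed above can be compared directly. This is a minimal pure-Python sketch: the values are copied from the output above, the names `pairs`, `max_abs_diff`, and `passes` are illustrative, and no assumption is made about which column comes from ncnn and which from torch.

```python
# Value pairs copied from the printed comparison (four decimal places each).
pairs = [
    (0.0808, 0.0809),
    (0.4152, 0.4463),
    (0.3586, 0.3593),
    (0.5852, 0.6012),
    (0.3457, 0.3605),
    (0.0900, 0.1026),
    (0.3670, 0.3727),
    (-0.0226, -0.0166),
]

# Largest absolute difference over the listed elements.
max_abs_diff = max(abs(a - b) for a, b in pairs)


def passes(tolerance):
    """Return True if every listed pair agrees within `tolerance`."""
    return max_abs_diff < tolerance


print(f"max |a - b| = {max_abs_diff:.4f}")
print("passes at 1e-1:", passes(1e-1))
print("passes at 1e-4:", passes(1e-4))
```

The largest gap among these eight elements is about 0.031 (element `(0, 0, 0, 1)`), which is why the case passes at 1e-1 but not at 1e-4.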

@lgtm-com

lgtm-com bot commented Oct 19, 2022

This pull request introduces 1 alert when merging 9c77f2e into c33cbc9 - view on LGTM.com

new alerts:

  • 1 for Empty branch of conditional

@LRY89757
Contributor Author

LRY89757 commented Oct 20, 2022

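I really cannot figure out why. gridsample has three parameters: resize_type, padding_mode, and align_corner. Currently only the resize_type=bicubic, padding_mode=reflection case fails. Every other reflection case passes and every other bicubic case passes, which suggests the two routines being called are each fine. Combining them does not look like a logic error either; the result simply fails on precision. Strange.

The earlier source was something I had hacked together, and it only passed a few cases. It now follows PyTorch's C++ implementation throughout, and this is the only test still failing.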
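
For readers unfamiliar with how padding_mode and align_corner interact, here is a minimal pure-Python sketch of the coordinate mapping, written from my reading of PyTorch's CPU grid sampler. It is not the ncnn code from this PR, the function names are illustrative, and it is not offered as a diagnosis of the mismatch.

```python
import math


def unnormalize(coord, size, align_corners):
    """Map a normalized grid coordinate in [-1, 1] to pixel space."""
    if align_corners:
        # -1 and +1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and +1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


def reflect(x, twice_low, twice_high):
    """Reflect x into [twice_low / 2, twice_high / 2].

    The bounds are passed doubled so that half-pixel edges stay integers.
    """
    if twice_low == twice_high:
        return 0.0
    low = twice_low / 2.0
    span = (twice_high - twice_low) / 2.0
    x = abs(x - low)
    extra = math.fmod(x, span)
    flips = int(math.floor(x / span))
    if flips % 2 == 0:
        return extra + low
    return span - extra + low


def reflected_index(coord, size, align_corners):
    """Unnormalize, reflect, then clip to the valid pixel range."""
    x = unnormalize(coord, size, align_corners)
    if align_corners:
        x = reflect(x, 0, 2 * (size - 1))
    else:
        x = reflect(x, -1, 2 * size - 1)
    return min(max(x, 0.0), size - 1.0)


print(reflected_index(1.5, 4, True))  # a grid value outside [-1, 1]
```

The reflection boundaries differ between the two align_corner settings (pixel centers versus pixel edges), so each padding mode has to be checked under both settings.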

@LRY89757
Contributor Author

LRY89757 commented Oct 21, 2022

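  • gridsample now works for 4D and 5D inputs in all cases. For 5D, the test passes after manually changing the squeeze in the auto-generated test_ncnn.py.
  • Only the case of a 4D tensor (N,C,H,W) with resize_type=bicubic, padding_mode=reflection is stuck on precision (1e-1) and cannot be aligned with torch.
  • The pnnx conversion for the 5D input2 still needs to be solved. The logic that generates pnnx's test_ncnn.py has some problems at the moment, and I do not quite understand how to "set the batchindex on the second input of F grid sample as well".
    @nihui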

@LRY89757
Contributor Author

LRY89757 commented Oct 21, 2022

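Thanks for the guidance; the problem is solved. A pleasant surprise is that one of the two cases that used to exceed the tolerance no longer does. The only remaining shortcoming is a small precision issue when griddims=4d, resize_type=bicubic, padding_mode=reflection, align_corners=False:
the test tolerance is 1e-4, and the current precision is about 1e-3.5 🤣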

@nihui nihui closed this Oct 22, 2022
@nihui nihui reopened this Oct 22, 2022
@LRY89757 LRY89757 changed the title from "[WIP]Gridsample op support" to "[PTAL]Gridsample op support" Oct 22, 2022
@nihui nihui closed this Nov 1, 2022
@nihui nihui reopened this Nov 1, 2022
@nihui
Member

nihui commented Nov 1, 2022

We could have a linear_interp1d, like cubic_interp1d, to get rid of many in-bound checks.
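
One way to read this suggestion, sketched in pure Python rather than ncnn's C++: each helper returns only the interpolation weights for a fractional offset, and a single fetch routine owns the bounds check. The signatures here are assumptions, `fetch` and `interp_row` are hypothetical names, and the cubic coefficient `a = -0.75` is the value PyTorch uses for bicubic sampling.

```python
import math


def linear_interp1d(t):
    """Weights for the 2 taps around a sample at fractional offset t."""
    return [1.0 - t, t]


def cubic_interp1d(t, a=-0.75):
    """Weights for the 4 taps of cubic convolution at fractional offset t."""
    def near(x):  # kernel for |x| <= 1
        return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0

    def far(x):  # kernel for 1 < |x| < 2
        return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a

    return [far(t + 1.0), near(t), near(1.0 - t), far(2.0 - t)]


def fetch(row, i):
    """Zero-padded read: the single place where bounds are checked."""
    return row[i] if 0 <= i < len(row) else 0.0


def interp_row(row, x, mode):
    """Interpolate `row` at real-valued position x."""
    x0 = math.floor(x)
    t = x - x0
    if mode == "linear":
        weights, first = linear_interp1d(t), x0
    else:
        weights, first = cubic_interp1d(t), x0 - 1
    return sum(w * fetch(row, first + k) for k, w in enumerate(weights))


row = [1.0, 2.0, 3.0, 4.0]
print(interp_row(row, 1.5, "linear"), interp_row(row, 1.5, "cubic"))
```

Both weight sets sum to 1, so on linearly increasing data the two modes agree at interior points.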

@LRY89757 LRY89757 closed this Nov 1, 2022
@LRY89757 LRY89757 reopened this Nov 1, 2022
tests/CMakeLists.txt (outdated, resolved)
src/layer/gridsample.h (outdated, resolved)
src/layer/gridsample.h (outdated, resolved)
tools/pnnx/src/pass_ncnn/solve_batch_index.cpp (outdated, resolved)
src/layer/gridsample.cpp (outdated, resolved)
src/layer/gridsample.cpp (outdated, resolved)
@LRY89757 LRY89757 closed this Nov 3, 2022
@LRY89757 LRY89757 reopened this Nov 3, 2022
@LRY89757
Contributor Author

LRY89757 commented Nov 3, 2022

It turns out that nearest still needs the in_bound check; otherwise test_gridsample does not pass :(
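
A minimal pure-Python sketch of why the check matters for nearest sampling, assuming zeros padding. The function name and the nested-list image layout are illustrative and are not taken from the PR.

```python
def nearest_sample_2d(image, x, y):
    """Return the pixel nearest to (x, y), or 0.0 when it falls outside.

    `image` is a list of rows; x indexes columns and y indexes rows.
    """
    height, width = len(image), len(image[0])
    # Python's round() rounds halves to even, like C's nearbyint() by default.
    ix, iy = int(round(x)), int(round(y))
    # In-bound check: without it, ix == -1 would silently wrap to the last
    # column in Python, and would read outside the buffer in C++.
    if 0 <= ix < width and 0 <= iy < height:
        return image[iy][ix]
    return 0.0


image = [[1.0, 2.0],
         [3.0, 4.0]]
print(nearest_sample_2d(image, 0.6, 0.4))   # inside the image
print(nearest_sample_2d(image, -0.6, 0.0))  # rounds to column -1, outside
```

Rounding can push a coordinate that is only slightly outside the image onto index -1 or `width`, so the check is still needed even though nearest uses a single tap.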

@LRY89757 LRY89757 closed this Nov 8, 2022
@LRY89757 LRY89757 reopened this Nov 8, 2022
@nihui nihui merged commit 6a47f8d into Tencent:master Nov 11, 2022
@nihui
Member

nihui commented Nov 11, 2022

Thanks for your contribution!

csukuangfj added a commit to csukuangfj/ncnn that referenced this pull request Dec 1, 2022
* remove duplicated newline (Tencent#4187)

* remove duplicated newline (Tencent#4188)

* optimize softmax arm neon (Tencent#4171)

* [docs] Fix typo (Tencent#4201)

* [Prelu x86] Finish intrinsic with elempack merged (Tencent#4177)

* changed size of images for pretty formatting of page (Tencent#4193)

* [Gelu x86] Finish intrinsic with elempack merged(fast version) (Tencent#4144)

* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl

* Ignore .xmake directory (Tencent#4212)

* Bump pypa/cibuildwheel from 2.9.0 to 2.10.1 (Tencent#4207)

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.9.0 to 2.10.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.9.0...v2.10.1)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* style: space alignment (Tencent#4217)

* Ignore CMakeSettings.json, the Visual Studio CMake schema file (Tencent#4228)

* RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part Tencent#4100) (Tencent#4118)

* RVV: use size_t for vl

* RVV: replace vsseg.v tuple type by using regex

-----

search:
vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);

substitute by:
vsseg$1e$2_v_$3$2m$4($5, $6, vl);
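
The search and substitute pair above can be tried out with Python's `re` module. This is a sketch: the pattern is copied from the commit message, the `$N` placeholders are rewritten as `\N` for Python, and the input line and function name are hypothetical examples rather than code taken from ncnn.

```python
import re

# Search pattern copied from the commit message; Python accepts it unchanged.
SEARCH = re.compile(
    r"vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1"
    r"\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);"
)

# Same substitution, with `$N` written as `\N`.
REPLACEMENT = r"vsseg\1e\2_v_\3\2m\4(\5, \6, vl);"


def rewrite_segment_store(line):
    """Rewrite a tuple-type segment store into the flattened-argument form."""
    return SEARCH.sub(REPLACEMENT, line)


# Hypothetical input line shaped to match the pattern.
old = "vsseg2e32_v_f32m1x2(outptr, vcreate_f32m1x2(_v0, _v1), vl);"
print(rewrite_segment_store(old))
```

The back-references `\1` to `\4` force the segment count, element width, and type letter to agree between the store call and the nested `vcreate_` call, so mismatched pairs are left untouched.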

* RVV: replace vssseg.v tuple types by using regex

---

search:
vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);

substitute by:
vssseg$1e$2_v_f$2m1($3, $4, vl);

* RVV: replace vlseg.v tuple types in load/store

* RVV: replace vloxseg2ei32.v tuple types

* RVV: add a wrapper for old compilers

* RVV: add segment load/store wrapper in packing

* RVV: fix cmake test

* RVV: make clang happy by dropping VLAs in sgemm

* RVV: add clang cmake toolchain configure

* RVV: add clang ci, riscv64-unknown-linux-gnu

Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>

* Bump pypa/cibuildwheel from 2.10.1 to 2.10.2 (Tencent#4220)

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.1 to 2.10.2.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.10.1...v2.10.2)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add c906 build ci (Tencent#4232)

* Add benchmark result of T-Head TH1520 (Tencent#4240)

`cpuinfo`: 

```
isa             : rv64imafdcvsu
mmu             : sv39
cpu-freq                : 1.848Ghz
cpu-icache              : 64KB
cpu-dcache              : 64KB
cpu-l2cache             : 1MB
cpu-tlb         : 1024 4-ways
cpu-cacheline           : 64Bytes
cpu-vector              : 0.7.1
```

Compiled with `-DCMAKE_TOOLCHAIN_FILE=../toolchains/c910-v240.toolchain.cmake -DCMAKE_BUILD_TYPE=release -DNCNN_OPENMP=OFF -DNCNN_THREADS=OFF -DNCNN_RUNTIME_CPU=OFF -DNCNN_RVV=ON -DNCNN_SIMPLEOCV=ON -DNCNN_BUILD_EXAMPLES=ON` 

Seems much worse than expected 🤔

* fix param parsing issue when layer/blob name exceeds 255 (Tencent#4236)

* fix param parsing issue when layer/blob name exceeds 255

* apply code-format changes

Co-authored-by: ZhangGe6 <ZhangGe6@users.noreply.github.com>

* Memory Pool Improvement For Variadic Sized Inputs (Tencent#4190)

* Simple miss count for better space efficiency

* Simple double ended greedy;

* Add size drop threshold setter;

* set workspace allocator cr to zero as we had some sort of recycling capability :P

Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>

* docs: disable fp16 when wrong results encountered caused by overflow (Tencent#4248)

* pnnx math operation (Tencent#4251)

* more stricter armv7 fp16 and armv84 bf16 compiler check, fix Tencent#4147 fix Tencent#4222 (Tencent#4247)

* modified the param axes of expanddims in modelwriter (Tencent#4259)

* Add TH1520 (4*C910V) toolchain support.  (Tencent#4267)

* implement lstm proj_size (Tencent#4263)

* Optimize x86 DeformableConv2D (Tencent#4128)

* fix compile warning with gcc 9.1.0 including simplestl.h file (Tencent#4274)

* fix compile warning with gcc 9.1.0 including simplestl.h file

* apply code-format changes

Co-authored-by: veahow <veahow@users.noreply.github.com>

* add benchmark for rk3588 on rock5b (Tencent#4275)

* linux-x64-cpu-gcc on tencent ci

* implement layer feature disabled bit (Tencent#4278)

* add elu vulkan operator (Tencent#4280)

* fix tencent ci (Tencent#4277)

* implement GLU and pnnx conversion (Tencent#4283)

* Bump pypa/cibuildwheel from 2.10.2 to 2.11.1 (Tencent#4271)

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.2 to 2.11.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.10.2...v2.11.1)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix pnnx softmax/normalize/slice negative axis conversion to ncnn (Tencent#4284)

* pnnx glu batchindex aware conversion (Tencent#4285)

* 1. Fix typo in readme (Tencent#4287)

* x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family (Tencent#4286)

* pnnx skip dynamic size evaluation (Tencent#4291)

* Fix linux build error(Tencent#4265) (Tencent#4294)

Co-authored-by: wangyu <786794414@qq.com>

* general cpu feature detection on macos/ios, enable bf16 and i8mm on a15 a16 and m2 (Tencent#4300)

* x86 unified fc fp32/fp16s (Tencent#4303)

* more fma
* more transpose utility function

* Bump pypa/cibuildwheel from 2.11.1 to 2.11.2 (Tencent#4308)

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.11.1 to 2.11.2.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.11.1...v2.11.2)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* pnnx pytorch 1.13 (Tencent#4314)

* fix Tencent#4315 (Tencent#4316)

* get_physical_cpu_count api family (Tencent#4302)

* get_physical_cpu_count api family

* set default to physical big cpu

* always treat smt core as big core

* is_smt_cpu

* get max freq mhz on windows

* windows thread affinity

* groupnorm 1d/2d/4d (Tencent#4312)

* fix slice end index, fix fp16 model weight alignment (Tencent#4317)

* tencent ci test-coverage pnnx (Tencent#4305)

* RVV: BatchNorm with fp16s(a) support (Tencent#4075)

* RVV: InstanceNorm with fp16s(a) support (Tencent#4078)

* fix ci pnnx build

* fold new_full and full_like (Tencent#4323)

* pnnx convert nn.Softmax2d (Tencent#4324)

* pnnx convert fold unfold (Tencent#4325)

* support yolov5 6.2 (Tencent#4328)

* implement ncnn fold and unfold (Tencent#4326)

* pnnx load gpu torchscript and reset device (Tencent#4330)

* fix:pnnx-softmax (Tencent#4333)

* pnnx save onnx zero (Tencent#4077)

* save foldable constants in file for reducing memory usage (Tencent#4337)

* match inplace slice copy pattern, rewrite copy uses (Tencent#4338)

* add vector optimization for loongarch64 (Tencent#4242)

* ci loongarch64 lsx (Tencent#4344)

* gridsample op support (Tencent#4288)



Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>

* squeeze and expanddims 4d (Tencent#4346)

* implement MultiheadAttention kdim vdim (Tencent#4347)

* pnnx convert torch bitwise left_shift right_shift (Tencent#4349)

* pnnx fp16 option for ncnn and onnx weight type (Tencent#4350)

* pnnx fuse more function to module (Tencent#4351)

* pnnx fuse more function to module

* rename some pass name

* fuse adjacent reshape, fuse pad conv2d

* fuse pad conv1d

* split tests (Tencent#4354)

* Support mat.numpy() in Python (Tencent#4356)

* Fix typo in stb_image.h (Tencent#4358)

exitting -> exiting

* Fix windows-arm64 build for non-neon case (Tencent#4227)

* update release ci (Tencent#4359)

* update release ci

* find modern glslang

* parallel jobs on windows

* Fix c api allocator (Tencent#4360)

* add some c_api interfaces related to allocator setup.

* fix errors in allocator parameters in c_api.

* test c api allocator

Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>

* update glslang (Tencent#4361)

* disable out-of-line atomics since ndk23+ for resolving linking issue with old ndk (Tencent#4362)

* I added one more project to the list of examples. (Tencent#4205)

* Dedicated to coloring black and white photographs.

* add example project link (Tencent#4365)

* fix(pybind11): build error (Tencent#4368)

* fix openmp affinity abort when cpu goes offline (Tencent#4370)

* Update release-python.yml

* small fixes

* unpack list input

* Remove LSTM2

* fix LSTM

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Menci <huanghaorui301@gmail.com>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: Lry89757 <77330637+LRY89757@users.noreply.github.com>
Co-authored-by: magicse <magicse@users.noreply.github.com>
Co-authored-by: Zhuo Zhang <imzhuo@foxmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: 汤圆奶昔 <47135403+tonori@users.noreply.github.com>
Co-authored-by: Xavier Hsinyuan <me@lstlx.com>
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: 柚木鉉 <740291272@qq.com>
Co-authored-by: Zhang Ge <sjtu.zg123@gmail.com>
Co-authored-by: ZhangGe6 <ZhangGe6@users.noreply.github.com>
Co-authored-by: LinHe <LinHe.Lurking@gmail.com>
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: MisakaBit <MisakaBit@gmail.com>
Co-authored-by: LiuYi-Up <73060646+LiuYi-Up@users.noreply.github.com>
Co-authored-by: 陸 言 <robinluaa@outlook.com>
Co-authored-by: miemie2013 <53960695+miemie2013@users.noreply.github.com>
Co-authored-by: Eahow Chen <15228088+veahow@users.noreply.github.com>
Co-authored-by: veahow <veahow@users.noreply.github.com>
Co-authored-by: li mengyang <hwdefcom@outlook.com>
Co-authored-by: Yoh <wpz_yoh@163.com>
Co-authored-by: Caize Wu <zepanwucai@gmail.com>
Co-authored-by: bestpower <wangyu117136@gmail.com>
Co-authored-by: wangyu <786794414@qq.com>
Co-authored-by: shaoshengsong <30892500+shaoshengsong@users.noreply.github.com>
Co-authored-by: WuJinxuan <2456510228@qq.com>
Co-authored-by: junchao-loongson <68935141+junchao-loongson@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: Ikko Ashimine <eltociear@gmail.com>
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
Co-authored-by: tpoisonooo <khj.application@aliyun.com>