AVS+ Support #7

Open
WolframRhodium opened this issue Jul 25, 2021 · 64 comments

Comments

@WolframRhodium
Owner

This issue tracks information related to AviSynth+ support. Please discuss in the doom9 thread.

@WolframRhodium
Owner Author

WolframRhodium commented Jul 25, 2021

experimental releases

@kedaitinh12

Thanks, can you update the AviSynth CPU version?

@kedaitinh12

Deleted, sorry to disturb. I don't care about SSE3 or SSE4.

@kedaitinh12

Can you set up an automatic build for AVS+ x86? Thanks.
https://github.com/WolframRhodium/VapourSynth-BM3DCUDA/actions/runs/1168740493

@WolframRhodium
Owner Author

Yes, I will do it in the near future.

@kedaitinh12

Thanks

@WolframRhodium
Owner Author

Done 26a5015.

@kedaitinh12

kedaitinh12 commented Sep 3, 2021

Thanks, I didn't think "near future" meant only 5 hours 😂😂😂

@kedaitinh12

kedaitinh12 commented Sep 3, 2021

But the x86 version doesn't have bm3d_vaggregate_avs.dll

@WolframRhodium
Owner Author

> But the x86 version doesn't have bm3d_vaggregate_avs.dll

Thanks.

> Thanks, I didn't think "near future" meant only 5 hours 😂😂😂

I expected some compilation errors, but everything went smoothly.

@kedaitinh12

The x86 version has bm3d_vaggregate_avs now. Thanks.

@mysteryx93

In Avisynth, bm_range is limited to 1-8 while it can easily be 16 in VapourSynth

@WolframRhodium
Owner Author

> In Avisynth, bm_range is limited to 1-8 while it can easily be 16 in VapourSynth

Fixed. Thanks for the information.

@Reel-Deal

@WolframRhodium

Thank you for putting up the new releases on here.

@tormento

Port 2.8 with internal VAggregate to AVS+ :)

@tormento

@WolframRhodium please? :)

@madey83

madey83 commented Oct 18, 2022

Hi,

I use BM3DCUDA_AVS-test9 on my RTX 2060 with the call below as a prefilter:
BM3D(sigma=10,preset="normal",radius=3,UV=1,gpuid=0,tv_range=true)

and I saw that the clock of the RTX card is set to only 855 MHz.

@mysteryx93

mysteryx93 commented Oct 19, 2022

The GPU may be limited by transfer bandwidth. It's designed for outputting graphics, not for transferring massive amounts of data back and forth.

@WolframRhodium
Owner Author

WolframRhodium commented Oct 19, 2022

> Hi,
>
> I use BM3DCUDA_AVS-test9 on my RTX 2060 with the call below as a prefilter: BM3D(sigma=10,preset="normal",radius=3,UV=1,gpuid=0,tv_range=true)
>
> and I saw that the clock of the RTX card is set to only 855 MHz.

@madey83

Hi. You should always use Prefetch() to enable multi-threading. The difference in speed is huge. Check the example here.

> Port 2.8 with internal VAggregate to AVS+ :)

Various wrappers for AVS+ exist and I don't think there is a need to introduce it.

@madey83

madey83 commented Oct 19, 2022

@WolframRhodium
At the end of my script I use Prefetch(6,12), and it can't break 855 MHz.
[image]

@WolframRhodium
Owner Author

WolframRhodium commented Oct 19, 2022

The Prefetch call should follow the BM3D_VAggregate call immediately.

@madey83

madey83 commented Oct 19, 2022

Hi,
this is my script call:
[image]

@WolframRhodium
Owner Author

WolframRhodium commented Oct 19, 2022

...
ex_BM3D(...)
Prefetch(...)
...

@madey83

madey83 commented Oct 19, 2022

Sorry, but I don't understand your answer.

@WolframRhodium
Owner Author

Sorry about that.

The script should be

...
pre=ex_BM3D(sigma=10,preset="normal",radius=3,UV=1,gpuid=0,tv_range=true)
pre=Prefetch(pre,6,12) # <= ** new line here **
SMDgrain(prefilter=pre, LFR=false, limits=false, DFTFlicker=false, tr=2, thSAD=)
...

@madey83

madey83 commented Oct 19, 2022

Thank you for the answer. This did not help, but maybe the problem is with Windows.
I will test it again once I have a clean Windows installation.

@WolframRhodium
Owner Author

What if you remove all the following filters and output pre directly?

@tormento

The real bottleneck is the aggregate part (i.e. the temporal part of BM3D), which is still done on the CPU.

@tormento

@WolframRhodium sorry to bother you again, but I'd like to see a port with internal aggregation :)

@WolframRhodium
Owner Author

In AviSynth terms, it is simply a kind of wrapper, because AviSynth treats scripts and plugins equally.

@tormento

@WolframRhodium so there is no speed advantage in the so-called BM3Dv2?

@WolframRhodium
Owner Author

Yep.

@tormento

And has this part

> Improve performance of VAggregate() and BM3Dv2() for temporal denoising.
> This VAggregate() implementation is measured to be ~40% faster than the original implementation, resulting in 0 ~ 5% speedup overall.

been ported? :)

@WolframRhodium
Owner Author

Previously, bm3dcpu/cuda on VapourSynth used VapourSynth-BM3D for the VAggregate computation, which was never available for AVS.

@newcapricasean

Any chance of this AviSynth+ version remaining under development? I would particularly love to see builds optimized for specific CPU types, as with the VapourSynth version, along with any other improvements included in the VapourSynth builds. If you could show me/us how to compile the AviSynth+ version from the VapourSynth source code (if that's how you did it; you didn't provide source code for these AviSynth+ versions), then I/we could do it ourselves. Why hold on to the obsolete AviSynth+? Well, for the moment, I've found that the TemporalDegrain2 AviSynth+ script with BM3D_CPU does better than BM3D_CUDA/CPU alone. That script also produces the best output with BM3D_CUDA/CPU. So, if you could possibly help in some way, that would be great! Thanks in advance!

@WolframRhodium
Owner Author

The source code for the avisynth+ version is in the avs+ branch, and the corresponding automatic compilation script is here.

BM3D_CPU should not produce noticeably different results compared to BM3D_CUDA. That is a design objective.

@newcapricasean

newcapricasean commented Jan 28, 2024 via email

@WolframRhodium
Owner Author

The cuda one can be made deterministic by setting the extractor_exp parameter to 3 or higher.

@newcapricasean

newcapricasean commented Jan 29, 2024 via email

@WolframRhodium
Owner Author

There is no maximum value, because the smallest value required for a reproducible sum depends on the number of summation operands, which may increase as the values of the parameters radius, bm_range, block_step, ps_num and ps_range increase.

This parameter does reduce accuracy; that is the price of a deterministic result. However, the error is marginal compared to the error of the conventional fp32 -> uint16/uint10/uint8 conversion.
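
Editor's sketch of the general idea, not the plugin's code: floating-point addition is not associative, so a sum accumulated in a different order can round differently. One common remedy is to round every operand onto a fixed grid first, so that all later additions are exact and the order no longer matters. The function names and the grid exponent `exp` below are hypothetical and do not map onto the scale of `extractor_exp`; the example also uses Python's float64 rather than fp32. It shows the same trade-off described above: a coarser grid leaves more headroom for many operands but discards more low-order bits.

```python
import random


def extract(x: float, exp: int) -> float:
    """Round x to the nearest multiple of 2**exp (float64 only).

    Adding a constant whose unit in the last place is 2**exp discards the
    lower bits of x; subtracting it again is exact.
    Valid while abs(x) < 2**(exp + 51).
    """
    big = 1.5 * 2.0 ** (exp + 52)  # ulp(big) == 2**exp
    return (x + big) - big


def sum_in_order(values) -> float:
    """Plain left-to-right accumulation; the result can depend on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values, exp: int) -> float:
    """Sum pre-rounded operands.

    Every partial sum is a multiple of 2**exp, so each addition is exact
    (and therefore order-independent) while abs(partial sum) < 2**(exp + 53).
    """
    return sum_in_order(extract(v, exp) for v in values)


def max_conversion_error(bits: int) -> float:
    """Worst-case rounding error when a value in [0, 1] is stored as an n-bit integer."""
    return 0.5 / (2 ** bits - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    values = [rng.uniform(0.0, 1.0) for _ in range(1000)]
    shuffled = values[:]
    rng.shuffle(shuffled)

    exp = -20  # hypothetical grid of 2**-20; more operands need a coarser grid
    print("plain:      ", sum_in_order(values), sum_in_order(shuffled))
    print("pre-rounded:", reproducible_sum(values, exp), reproducible_sum(shuffled, exp))
    print("per-operand rounding bound:", 2.0 ** (exp - 1))
    print("8-bit conversion bound:    ", max_conversion_error(8))
```

In this sketch the two pre-rounded sums are always identical, while the two plain sums may or may not agree in the last bits. The per-operand rounding bound of the grid is far below the 8-bit conversion bound, which is the kind of comparison the comment above makes.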

@newcapricasean

newcapricasean commented Jan 29, 2024 via email

@WolframRhodium
Owner Author

The yml file uses GitHub Actions to compile the plugins on GitHub-hosted runners. You can check the individual cmake commands in that file to compile on your own machine.

You can change the compilation flags.

@newcapricasean

newcapricasean commented Jan 29, 2024 via email

@WolframRhodium
Owner Author

git clone -b avs+ https://github.com/WolframRhodium/VapourSynth-BM3DCUDA

cd VapourSynth-BM3DCUDA

cmake -S . -B build -G Ninja -D CMAKE_BUILD_TYPE=Release -D USE_NVRTC_STATIC=ON -D ENABLE_AVISYNTHPLUS=ON -D AVISYNTHPLUS_INCLUDE_DIRECTORY="%cd%\avisynth+\avs_core\include" -D ENABLE_VAPOURSYNTH=OFF -D CMAKE_CXX_FLAGS="/fp:fast" -D CMAKE_CUDA_FLAGS="--threads 0 --use_fast_math --resource-usage -Wno-deprecated-gpu-targets" -D CMAKE_CUDA_ARCHITECTURES="50;61-real;75-real;86-real;89-real"

cmake --build build

@newcapricasean

newcapricasean commented Jan 29, 2024 via email

@newcapricasean

newcapricasean commented Jan 29, 2024 via email

@WolframRhodium
Owner Author

Only cxx flags will be used. The flags depend on the compiler you use.

In general, AVS+ code should be re-implemented in VapourSynth for maximal performance.

@newcapricasean

newcapricasean commented Jan 30, 2024 via email

@WolframRhodium
Owner Author

This is not related to this repository.

@tormento

Any plans to bring the AVS+ version on par with the VS release?

@WolframRhodium
Owner Author

The cuda backend is equivalent.

@tormento

VAggregate is still external, plus you didn't add all the subsequent fixes and support for new cards.

@WolframRhodium
Owner Author

VAggregate is external but it can be easily handled by various wrappers in the field.
Which fixes and new cards are not included?

@tormento

I was reading:

> bm3d.VAggregate should be called after temporal filtering, as in VapourSynth-BM3D. Alternatively, you may use the BM3Dv2() interface for both spatial and temporal denoising in one step.

Isn't that faster than a wrapper?

@WolframRhodium
Owner Author

No.

@tormento

About GPU support, forgive me if I am wrong but the last AVS+ build was on Jan 31, 2023 (R12.3.test).

Later you released VS builds up to R12.4, introducing AMD and Intel support.

@WolframRhodium
Owner Author

I do not intend to port these implementations to AVS+.

@kedaitinh12

> About GPU support, forgive me if I am wrong but the last AVS+ build was on Jan 31, 2023 (R12.3.test).
>
> Later you released VS builds up to R12.4, introducing AMD and Intel support.

You can ask Asd-g if he is interested in it.
