Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifacts using TemporalMedian #1

Closed
Selur opened this issue Apr 25, 2024 · 51 comments
Closed

Artifacts using TemporalMedian #1

Selur opened this issue Apr 25, 2024 · 51 comments
Assignees
Labels
bug Something isn't working

Comments

@Selur
Copy link

Selur commented Apr 25, 2024

Am I missing something?
Using:

clip = core.lsmas.LWLibavSource(source="C:/Users/Selur/Desktop/test edge bars.avi", format="YUV422P8", stream_index=0, cache=0, prefer_hw=0)
clip = core.zsmooth.TemporalMedian(clip, radius=1`)

grafik
grafik
grafik
with both the avx2 and the avx512 version on Windows11, Ryzen 9 7950X, Vapoursynth R65.
Artifacts change depending on the input color space.
This only happens with some files.

This does not happen with all files, but a bunch, I attached a small sample file
example.zip
grafik
no clue what the cause happens with different files with different resolution and color spaces.

@adworacz
Copy link
Owner

adworacz commented Apr 25, 2024

Fascinating - thanks for the bug report. I'll take a look at what could be causing this in the next few days.

Do the artifacts appear on any other filters, or just TemporalMedian? Nevermind, I'll test it on all filters regardless and see what's happening.

@adworacz
Copy link
Owner

I have a sneaking suspicion that I already know what the issue is (and that's my handling of stride). I'll try and get some tests in soon.

@adworacz adworacz self-assigned this Apr 25, 2024
@adworacz adworacz added the bug Something isn't working label Apr 25, 2024
@dnjulek
Copy link

dnjulek commented Apr 25, 2024

Yes, I just tested it and this is a stride problem, VS won't always leave stride == width, it depends on the type of memory alignment, in this case the video is 720 and the stride 736.

@adworacz
Copy link
Owner

Solid, thank you for confirming @dnjulek .

I'll update all of my functions to respect stride properly and then I'll post a new version.

@Selur
Copy link
Author

Selur commented Apr 26, 2024

Nice, looking forward to a fixed version.
(happy that it isn't an issue only I have and it is a problem with my hardware ;))

@adworacz
Copy link
Owner

Selur, please try the 0.3 release, which should have everything fixed now: https://github.com/adworacz/zsmooth/releases/tag/0.3

@Selur
Copy link
Author

Selur commented Apr 27, 2024

At least here it is not fixed, now I get a black bar on the right side,... (resolution: 720x480)
grafik

@adworacz
Copy link
Owner

Bugger okay, thank you for confirming.

I also found that it crashes with non 8-bit content as well, so I'll work on a new round of fixes.

@dnjulek
Copy link

dnjulek commented Apr 27, 2024

I also found that it crashes with non 8-bit content as well

For non 8-bit stride you need to divide it by sizeOf(T)

@dnjulek
Copy link

dnjulek commented Apr 27, 2024

If you don't want to divide, sekrit-twc usually casts the ptr back to u8 and always makes the ptr shift as u8.
example:
https://github.com/vapoursynth/vapoursynth/blob/d27b54e121112cb85cdb8f4200c706396265bbb7/src/core/kernel/x86/convolution_avx2.cpp#L18-L21

@adworacz
Copy link
Owner

I also found that it crashes with non 8-bit content as well

For non 8-bit stride you need to divide it by sizeOf(T)

Yup, that's exactly the solution I came to myself. Since all divisions are powers of 2, I've got no problems using division since the compiler will optimize it all to right shifts anyways.

@Selur - I've got version 0.4 up now: https://github.com/adworacz/zsmooth/releases/tag/0.4

I tested it locally with TemporalMedian (radius = 1) on RGB, YUV420, YUV422, YUV444, in 1920x1080 and 720x480 and I get no black bars anywhere. I even added unit tests to assert the stride handling behavior.

So fingers cross all of the artifacts/black bars issues should be fixed now.

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Doesn't work either.
with the sample I posted above:
grafik
with others:
grafik
(for both avx2 and avx512)
grafik
Here's another sample to try:
Fever.zip
I use QTGMC to deinterlace and then:

  core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/DenoiseFilter/ZSmooth/zsmooth.avx512.dll")
  clip = core.zsmooth.TemporalMedian(clip, radius=1)

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Just to show that it's not just mpeg2 content:
grafik
small sample (190x240, MPEG-4 ASP): sample_mpeg4.zip

@dnjulek
Copy link

dnjulek commented Apr 28, 2024

I've just build from source and the error doesn't happen, maybe the dll is wrong?

@dnjulek
Copy link

dnjulek commented Apr 28, 2024

ah, you're using avx512, I can't test it here

@Selur
Copy link
Author

Selur commented Apr 28, 2024

I tested both and used:
https://github.com/adworacz/zsmooth/releases/download/0.4/zsmooth.avx2.dll
https://github.com/adworacz/zsmooth/releases/download/0.4/zsmooth.avx512.dll
just tested it again.
broken

Here's the script I use for the comparision:

# Imports
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/LSMASHSource.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/DenoiseFilter/ZSmooth/zsmooth.avx2.dll")

# Source: 'C:\Users\Selur\Desktop\sample_mpeg4.mp4'
# Current color space: YUV420P8, bit depth: 8, resolution: 190x240, frame rate: 30fps, scanorder: progressive, yuv luminance scale: full, matrix: 470bg, transfer: bt.601
# Loading C:\Users\Selur\Desktop\sample_mpeg4.mp4 using LibavSMASHSource
clip = core.lsmas.LibavSMASHSource(source="C:/Users/Selur/Desktop/sample_mpeg4.mp4")
frame = clip.get_frame(0)
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)
# setting color transfer (601), if it is not set.
if ('_Transfer' not in frame.props) or (frame.props['_Transfer'] == '') or (frame.props['_Transfer'] == 0):
  clip = core.std.SetFrameProps(clip, _Transfer=5)
# setting color primaries info (to 470), if it is not set.
if ('_Primaries' not in frame.props) or (frame.props['_Primaries'] == '') or (frame.props['_Primaries'] == 0):
  clip = core.std.SetFrameProps(clip, _Primaries=5)
# setting color range to PC (full) range.
clip = core.std.SetFrameProps(clip=clip, _ColorRange=0)
# making sure frame rate is set to 30fps
clip = core.std.AssumeFPS(clip=clip, fpsnum=30, fpsden=1)
# making sure the detected scan type is set (detected: progressive)
clip = core.std.SetFrameProps(clip=clip, _FieldBased=0) # progressive

original = clip
clip = core.zsmooth.TemporalMedian(clip, radius=1)
# adjusting output color from: YUV420P8 to YUV420P10 for NVEncModel
original = core.resize.Bicubic(clip=original, format=vs.YUV420P10, range_s="full")
# adjusting output color from: YUV420P8 to YUV420P10 for NVEncModel
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, range_s="full")
original = core.text.Text(clip=original,text="Original",scale=1,alignment=7)
clip = core.text.Text(clip=clip,text="Filtered",scale=1,alignment=7)
stacked = core.std.StackHorizontal([original,clip])
# set output frame rate to 30fps (progressive)
stacked = core.std.AssumeFPS(clip=stacked, fpsnum=30, fpsden=1)
# output
stacked.set_output()

@Selur
Copy link
Author

Selur commented Apr 28, 2024

To be sure it's nothing else I did, I simplified the script:

# Imports
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/LSMASHSource.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/DenoiseFilter/ZSmooth/zsmooth.avx2.dll")

# Source: 'C:\Users\Selur\Desktop\sample_mpeg4.mp4'
# Current color space: YUV420P8, bit depth: 8, resolution: 190x240, frame rate: 30fps, scanorder: progressive, yuv luminance scale: full, matrix: 470bg, transfer: bt.601
# Loading C:\Users\Selur\Desktop\sample_mpeg4.mp4 using LibavSMASHSource
clip = core.lsmas.LibavSMASHSource(source="C:/Users/Selur/Desktop/sample_mpeg4.mp4")

original = clip
clip = core.zsmooth.TemporalMedian(clip, radius=1)


original = core.text.Text(clip=original,text="Original",scale=1,alignment=7)
clip = core.text.Text(clip=clip,text="Filtered",scale=1,alignment=7)
stacked = core.std.StackHorizontal([original,clip])


# output
stacked.set_output()

broken

@dnjulek
Copy link

dnjulek commented Apr 28, 2024

python_O54GocB8t4.webm

I've just done another test, the previous one was with Linux from source, now I've done it in Windows with 0.4/zsmooth.avx2.dll, no errors.

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Okay, so it must be something related to my setup.
You did use https://github.com/adworacz/zsmooth/releases/download/0.4/zsmooth.avx2.dll right?

@dnjulek
Copy link

dnjulek commented Apr 28, 2024

Yes, CRC64: E3D6E45A3C709BF6

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Do you know a tool which calculates CRC64 on Windows?

SHA512-Hash of zsmooth.avx2.dll:
2c19f8cdce4b53be14d273b6bc20fb74263108d3fe5545c0c09a7f91c6fa2568b5a91b6178dd9a450c7013ad1e813ea57a0ac80697cf7e8537340408d1d9d6b0

@Selur
Copy link
Author

Selur commented Apr 28, 2024

found one:
crc64 zsmooth.avx2.dll
CRC: [0x50F2D3E114F1F9A4] File: [zsmooth.avx2.dll]

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Redownloaded https://github.com/adworacz/zsmooth/releases/download/0.4/zsmooth.avx2.dll
crc64 is still 50F2D3E114F1F9A4
Seems like we are using different files
I used https://sourceforge.net/projects/crc64/ to calculate the crc64.
(also downloaded the file with different browsers, same result)

@adworacz
Copy link
Owner

This is what I have on my end, which seems to match your original sha512, Selur

sha256
e6d9360b2fd8a1bc3a94ed81efd272bd533f6f553de9ef56a4d45a913ab8f7c3 zsmooth.avx2.dll

sha512
2c19f8cdce4b53be14d273b6bc20fb74263108d3fe5545c0c09a7f91c6fa2568b5a91b6178dd9a450c7013ad1e813ea57a0ac80697cf7e8537340408d1d9d6b0 zsmooth.avx2.dll

@adworacz
Copy link
Owner

I'll try testing with your sample

@adworacz
Copy link
Owner

Hmmm, I'm also seeing no issues at all using your test sample.

I'm using version 66 of Vapoursynth - what version are you using?

no-artifacts

Used script:

import vapoursynth as vs
core = vs.core

clip = core.lsmas.LibavSMASHSource("/home/adub/Downloads/sample_mpeg4.mp4")
zsmooth = clip.zsmooth.TemporalMedian(radius=1)

clip = clip.text.Text("Original")
zsmooth = zsmooth.text.Text("Filtered")
stacked = core.std.StackHorizontal([clip,zsmooth])
stacked.set_output(0)

@dnjulek
Copy link

dnjulek commented Apr 28, 2024

Do you know a tool which calculates CRC64 on Windows?

7-zip, right-click -> 7-zip has all the options.

@Selur
Copy link
Author

Selur commented Apr 28, 2024

I'm using R65 since some of the torch stuff I use isn't available for Python 3.12.
I'll test with R66.

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Do you know a tool which calculates CRC64 on Windows?

7-zip, right-click -> 7-zip has all the options.

Name: zsmooth.avx2.dll
Größe: 946176 Bytes (924 KiB)
CRC64: E3D6E45A3C709BF6
so we are using the same file :)

@Selur
Copy link
Author

Selur commented Apr 28, 2024

Using R66 I still have the same error. :(
grafik

@adworacz
Copy link
Owner

This is very strange. It's almost like the it's not processing all planes, or getting the wrong width for different planes.

I think I'm going to have to put a debug build together that spits out some of this information so we can see what the plugin is working with. I have no idea why it would be any different on your machine, since it should be using the same function calls in Vapoursynth for all of them.

It is interesting that the output looks a little different in VS 66 - there's some of the original image that's come through, but it's like one of the planes wasn't operated on properly.

@Selur
Copy link
Author

Selur commented Apr 28, 2024

It is interesting that the output looks a little different in VS 66 - there's some of the original image that's come through, but it's like one of the planes wasn't operated on properly.
I agree, using the same script:
R65:
grafik
R66:
grafik

@NSQY
Copy link

NSQY commented Apr 29, 2024

I can reproduce this with the 0.4 release DLLs (both avx2 and avx512), however I cannot repo when compiling af9c956 with 0.13.0-dev.46+3648d7df1

Here's my env

Windows 11
7950X3D

        Core R66
        API R4.1
        API R3.6
        Options: -
        Number of Threads: 32
        Max Cache Size: 4096

Minimal test case:

import vapoursynth as vs
core = vs.core

clip = core.std.BlankClip(None, width=720, height=480, format=vs.RGB24, color=[127]*3)

zsmooth = core.zsmooth.TemporalMedian(clip, radius=1)

zsmooth.set_output()

@adworacz
Copy link
Owner

Oh realllyyy???

That’s fascinating. I wonder if it’s a zig version difference (I’m using 0.12), or something else.

Or are you saying that the issue was created in DLL 0.4?

@NSQY - would you mind trying a rebuild using 0.12 with the latest commit?

I want to track down if it’s a zig version thing or a bug introduced in 0.4.

@adworacz
Copy link
Owner

It might also be a cross-compilation issue, since I’m compiling the binaries on a Linux machine. Zig should be doing everything right for cross-compilation, but we’ll have to see.

I’ll look into setting up a Windows VM so I can attempt to replicate the problem myself.

@NSQY
Copy link

NSQY commented Apr 29, 2024

Or are you saying that the issue was created in DLL 0.4?

I'm saying that I can only reproduce with whatever was used to compile the provided .dll files, so either they're based off a different commit or something went wrong in the compile.

I can reproduce the original problem with git checkout 0.2, but not the broken column of pixels. I can only reproduce this with the provided DLL. See the provided .webm.

test512.webm

@NSQY
Copy link

NSQY commented Apr 29, 2024

992421d

0.13.0-dev.46+3648d7df1 OK
0.12.0 OK

@NSQY
Copy link

NSQY commented Apr 29, 2024

I wrote a basic script that just checks the PlaneStatsAverage of all BlankClip frames and ensures they're all the same. Looks like 5d40eff and newer are ok?

..\zig-windows-x86_64-0.12.0\zig.exe build -Doptimize=ReleaseFast

Test passed at commit 992421d08eca6a5692beab08bd56d9818598d255
Test passed at commit 2c36c478cfd0011c5739ee8a212a036fc93c69b6
Test passed at commit 690726fcb3a2df6142e6d24658a758250b833b40
Test passed at commit fb036ca4a4da4c403c055ca7eb06dad98a9d3073
Test passed at commit df86a2f6b83862a6720a05e65445b887dfe8c054
Test passed at commit cd1b7bae0ea7ebf4d50cce85a237e96fa1f19de6
Test passed at commit af9c95659b34d033dd8ae2172120282c0d217d83
Test passed at commit 8f5472849952ab1f59adaf01a7f83dc86dfec23f
Test passed at commit 156ec629543dc6575a474028d2178eba89dd71de
Test passed at commit 9e4457d577894d78af58fd765c96096aa677c46c
Test passed at commit 640ff1c4e20bc2c75e6dfd558047301e4b8293b1
Test passed at commit 5d40effeb758236e98b6dcf62a1c0931a7566951
Test failed at commit bb062280882a67603821fced5aa07a3cb2b84ec8: ERROR
Test failed at commit 3568626264eb04729df5f38eb11650b2cc0aa0e8: ERROR
Test failed at commit c4148f94d67e572d0c1fe068afafb8cb038aff3e: ERROR
Test failed at commit f303a18bc8825576288ec9d6f60c52da3c509f2c: ERROR
Test failed at commit 6137ada151550da02f24798cf45afb776b771d14: ERROR
Test failed at commit 7ce83e6089089e8104e4cd24bede54cf26f43edd: ERROR
Test failed at commit 6eed4ff6d87ca96518b180baecf26e100d7da4ca: ERROR
Test failed at commit fdb20191afc9e920a60b26a19e92ee1183ea1391: ERROR
Test failed at commit bdbe528db77d85a18a26ac5e728c221348417bf5: ERROR
Test failed at commit 21776f36345d5e0c9cf3ae75f564338f9033bd51: ERROR
Test failed at commit d1d703c74ef534f6a8eb04177633dba7f21a991d: ERROR
Test failed at commit 9ac320921529b7513715b3a133e4cb65ff0b06f7: ERROR
Test failed at commit 8f01d05ffd9503895ce43346788c8363fa43f28a: ERROR
Test failed at commit d97d55e5c55478977684123187f8085d1edf19a5: ERROR
Test failed at commit 9764ef150b3532d1039bb33d780d18850501045f: ERROR
Test failed at commit 587a1aa91b4279ce918b0df1c0bb6919b0c2f617: ERROR
Test failed at commit 214637fe4ed1a2693c5d048635f8c75e4c5a5ce8: ERROR
Test failed at commit 54ef2089cbc6ac4847fed786861a3b79fefe1562: ERROR
Test failed at commit fc2159cce5429bce2edb9998b756bca8109715ff: ERROR
Test failed at commit 1d87982690ef01415ab663edde38267e749e10b4: ERROR
Test failed at commit e7f3e3e2d282fa2b192f55dbfdb63d39d5870413: ERROR
Test failed at commit b4cc332b6d49e789841d0261d56347fcb16e5c78: ERROR
Test failed at commit f251e346253bcd221298c41e319cc6451703edcd: ERROR
Test failed at commit fd03c56968807574e2fb64f3795f54492724efaa: ERROR
Test failed at commit 998e9249c290a37522e417c8f74e3d3f8b91894f: ERROR
Test failed at commit 0dd1432b767c94785ea9acd11c8cb8274568f3a9: ERROR
Test failed at commit c224127d3a7099a347025b87cf556a2bf12be5a1: ERROR
Test failed at commit 3e6d03c0c2fc362e0dde08ea2663e38e53d883a4: ERROR
Test failed at commit 28a0a09a93a8a9328b86b610289e3bed2f6281f5: ERROR
(Memory error)

@adworacz
Copy link
Owner

Okay, this is super helpful testing.

The DLLs were compiled with this script, and cross compilation happens at this line specifically:

zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v3

This was done on my Intel laptop, running Linux.

I’m going to set up a VM to see if I can get this issue reproduced.

Meanwhile, it sounds like the workaround is simply for Windows users to compile their own DLLs from source on the machines they’re going to run the DLL on.

@NSQY
Copy link

NSQY commented Apr 29, 2024

I will try cross-compiling from my WSL Arch image. zig-windows-x86_64-0.13.0-dev.46+3648d7df1 errors at the same spot btw

Test passed at commit 992421d08eca6a5692beab08bd56d9818598d255
Test passed at commit 2c36c478cfd0011c5739ee8a212a036fc93c69b6
Test passed at commit 690726fcb3a2df6142e6d24658a758250b833b40
Test passed at commit fb036ca4a4da4c403c055ca7eb06dad98a9d3073
Test passed at commit df86a2f6b83862a6720a05e65445b887dfe8c054
Test passed at commit cd1b7bae0ea7ebf4d50cce85a237e96fa1f19de6
Test passed at commit af9c95659b34d033dd8ae2172120282c0d217d83
Test passed at commit 8f5472849952ab1f59adaf01a7f83dc86dfec23f
Test passed at commit 156ec629543dc6575a474028d2178eba89dd71de
Test passed at commit 9e4457d577894d78af58fd765c96096aa677c46c
Test passed at commit 640ff1c4e20bc2c75e6dfd558047301e4b8293b1
Test passed at commit 5d40effeb758236e98b6dcf62a1c0931a7566951
Test failed at commit bb062280882a67603821fced5aa07a3cb2b84ec8: ERROR
Test failed at commit 3568626264eb04729df5f38eb11650b2cc0aa0e8: ERROR
Test failed at commit c4148f94d67e572d0c1fe068afafb8cb038aff3e: ERROR
Test failed at commit f303a18bc8825576288ec9d6f60c52da3c509f2c: ERROR
Test failed at commit 6137ada151550da02f24798cf45afb776b771d14: ERROR
Test failed at commit 7ce83e6089089e8104e4cd24bede54cf26f43edd: ERROR
Test failed at commit 6eed4ff6d87ca96518b180baecf26e100d7da4ca: ERROR

Meanwhile, it sounds like the workaround is simply for Windows users to compile their own DLLs from source on the machines they’re going to run the DLL on.

Thankfully zig makes this trivial 😊

@NSQY
Copy link

NSQY commented Apr 29, 2024

992421d .dll built with 0.12.0 on Arch using zig/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v3 is bad.

zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-windows OK
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-windows -Dcpu=x86_64_v3 ERROR
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-windows -Dcpu=x86_64_v4 ERROR
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-windows -Dcpu=znver4 OK

zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows OK
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v3 ERROR
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v4 ERROR
zig/zig-linux-x86_64-0.12.0/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=znver4 OK

zig/zig-linux-x86_64-0.13.0-dev.46+3648d7df1/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v3 ERROR
zig/zig-linux-x86_64-0.13.0-dev.46+3648d7df1/zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v4 ERROR

zig/zig-linux-x86_64-0.13.0-dev.46+3648d7df1/zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-windows OK

@adworacz
Copy link
Owner

This is great @NSQY ! I was starting to wonder the same thing, if it's related to the -Dcpu param.

If you run zig build -Doptimize=ReleaseFast -Dcpu=x86_64_v3 on you Windows install (not cross-compiling from an OS level, but from a CPU level), does the issue reproduce?

I'm starting to suspect that there's something wrong with the chosen SIMD vector lengths, and I'm wondering if it's specific to the CPU and not the operating system. Both you and Selur are running 7950X's, which have AVX512 and would use a 64 element wide SIMD vector for 8-bit content (for 512 bits). BUT the x86_64_v3 target uses a 32-element wide vector (256 bits), and shockingly so does the x86_64_v4 target

In theory, this shouldn't matter, but maybe I have a bug in my math somewhere or for some reason running 256bit vectors on a machine that supports 512bit vectors may be causing some kind of issue.

@NSQY
Copy link

NSQY commented Apr 29, 2024

build -Doptimize=ReleaseFast -Dcpu=x86_64_v3 does indeed reproduce the ERROR with both zig 0.12.0 and 0.13.0-dev.46+3648d7df1 when compiled on Windows.

@adworacz
Copy link
Owner

Excellent, we’re really narrowing things down now. I have a suspicion on the source of the issue.

I’ll try and get a potential fix together tonight.

@adworacz
Copy link
Owner

I have a quick potential fix up in a branch here: 389e11e

The pertinent one-line change is here:
389e11e#diff-52223101217accb3adcc290fd525773c059bbc26636d96b6597dbf0f7024d597L76-L78

🤞 that's the final piece.

I'll work on getting all other filters updated tonight if that proves to be the solution.

@dnjulek
Copy link

dnjulek commented Apr 29, 2024

I was checking this part of the code just now, I thought it would be here too.
About the width_simd formula, wouldn't width - (width % vec_size) be better?

@adworacz
Copy link
Owner

I was checking this part of the code just now, I thought it would be here too. About the width_simd formula, wouldn't width - (width % vec_size) be better?

They produce the exact same code after compilation. See this Compiler Explorer (Godbolt) link with both implementations. One jumps to other because they are identical.

https://zig.godbolt.org/z/fvW73PGrG

@adworacz
Copy link
Owner

Okay, I've committed potential fixes to Temporal Soften and FluxSmooth as well.

sha256

30fc26acad1c5e1708598ca1503e41f447280bd1703b7c70fb44c7ee4df07a5b  zsmooth.avx2.dll

Please try this version of the DLL:
zsmooth.avx2.dll.zip

@Selur
Copy link
Author

Selur commented Apr 30, 2024

That one seems to work for me. 👍

@NSQY
Copy link

NSQY commented Apr 30, 2024

I think with this we're good to go.

992421d ..\zig-windows-x86_64-0.12.0\zig.exe build -Doptimize=ReleaseFast -Dcpu=x86_64_v3

Summary
TemporalSoften1: ERROR
TemporalSoften2: ERROR
TemporalSoften3: ERROR
TemporalSoften4: ERROR
TemporalSoften5: ERROR
TemporalSoften6: ERROR
TemporalSoften7: ERROR
TemporalMedian1: ERROR
TemporalMedian2: ERROR
TemporalMedian3: ERROR
TemporalMedian4: ERROR
TemporalMedian5: ERROR
TemporalMedian6: ERROR
TemporalMedian7: ERROR
FluxSmoothT: ERROR
FluxSmoothST: OK
RemoveGrain1: OK
RemoveGrain2: OK
RemoveGrain3: OK
RemoveGrain4: OK
RemoveGrain5: OK
RemoveGrain6: OK
RemoveGrain7: OK
RemoveGrain8: OK
RemoveGrain9: OK
RemoveGrain10: OK
RemoveGrain11: OK
RemoveGrain12: OK
RemoveGrain13: OK
RemoveGrain14: OK
RemoveGrain15: OK
RemoveGrain16: OK
RemoveGrain17: OK
RemoveGrain18: OK
RemoveGrain19: OK
RemoveGrain20: OK
RemoveGrain21: OK
RemoveGrain22: OK
RemoveGrain23: OK
RemoveGrain24: OK

30fc26acad1c5e1708598ca1503e41f447280bd1703b7c70fb44c7ee4df07a5b zsmooth.avx2.dll

Summary
TemporalSoften1: OK
TemporalSoften2: OK
TemporalSoften3: OK
TemporalSoften4: OK
TemporalSoften5: OK
TemporalSoften6: OK
TemporalSoften7: OK
TemporalMedian1: OK
TemporalMedian2: OK
TemporalMedian3: OK
TemporalMedian4: OK
TemporalMedian5: OK
TemporalMedian6: OK
TemporalMedian7: OK
FluxSmoothT: OK
FluxSmoothST: OK
RemoveGrain1: OK
RemoveGrain2: OK
RemoveGrain3: OK
RemoveGrain4: OK
RemoveGrain5: OK
RemoveGrain6: OK
RemoveGrain7: OK
RemoveGrain8: OK
RemoveGrain9: OK
RemoveGrain10: OK
RemoveGrain11: OK
RemoveGrain12: OK
RemoveGrain13: OK
RemoveGrain14: OK
RemoveGrain15: OK
RemoveGrain16: OK
RemoveGrain17: OK
RemoveGrain18: OK
RemoveGrain19: OK
RemoveGrain20: OK
RemoveGrain21: OK
RemoveGrain22: OK
RemoveGrain23: OK
RemoveGrain24: OK

626ade0 ..\zig-windows-x86_64-0.12.0\zig.exe build -Doptimize=ReleaseFast -Dcpu=x86_64_v3

Summary
TemporalSoften1: OK
TemporalSoften2: OK
TemporalSoften3: OK
TemporalSoften4: OK
TemporalSoften5: OK
TemporalSoften6: OK
TemporalSoften7: OK
TemporalMedian1: OK
TemporalMedian2: OK
TemporalMedian3: OK
TemporalMedian4: OK
TemporalMedian5: OK
TemporalMedian6: OK
TemporalMedian7: OK
FluxSmoothT: OK
FluxSmoothST: OK
RemoveGrain1: OK
RemoveGrain2: OK
RemoveGrain3: OK
RemoveGrain4: OK
RemoveGrain5: OK
RemoveGrain6: OK
RemoveGrain7: OK
RemoveGrain8: OK
RemoveGrain9: OK
RemoveGrain10: OK
RemoveGrain11: OK
RemoveGrain12: OK
RemoveGrain13: OK
RemoveGrain14: OK
RemoveGrain15: OK
RemoveGrain16: OK
RemoveGrain17: OK
RemoveGrain18: OK
RemoveGrain19: OK
RemoveGrain20: OK
RemoveGrain21: OK
RemoveGrain22: OK
RemoveGrain23: OK
RemoveGrain24: OK

@Selur
Copy link
Author

Selur commented Apr 30, 2024

Nice, looking forward for the next release. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants