[Bug] Player hangs after seeking backwards with multiple Prefetchers. #21

Closed
Nuihc88 opened this issue Nov 2, 2020 · 15 comments
Labels
bug Something isn't working

Comments

@Nuihc88

Nuihc88 commented Nov 2, 2020

Environment

  • Filter version: All recent versions.
  • AviSynth+ version: Tested v3.5.2 (r3218) & 3.6.2 (r3325)
  • OS: Windows7 x64
  • Video player: Tested: PotPlayer 200908(1.7.21295), MPC-HC (1.9.8) & MPC-BE (1.5.5.5433) 32bit & 64bit
  • Splitter filter and video decoder: LAV Filters, MPC-HC & MPC-BE internal defaults
  • Video renderer: Tested: MadVR & EVR(C/P)
  • Video format: Tested: YV12
  • AviSynth script:

AvsFilterSource()
Prefetch(1,8)
OnCPU(16)

or

AvsFilterSource()
Prefetch(1,1)
Prefetch(1,2)

  • Do you use other DirectShow filters? Specify name and version: Happens with or without changing player defaults.

Describe the bug

The player window hangs as a result of seeking with multiple prefetchers enabled. Simplifying the script seems to have made the hang harder to reproduce than it was with filters in between. With the full script in which i originally detected it, the difference between AviSynth Filter builds seemed more obvious; however, the bug could very well have been present all along.

To Reproduce

  1. Load video player.
  2. Load a video.
  3. Seek to middle.
  4. Repeatedly seek backwards in quick succession.

Checklist

  • Did you try a different video player and check if the issue persists? For example, if your main video player is MPC-HC, try MPC-BE.
    Answer: Yes.

  • Did you try a different video renderer and check if the issue persists? For example, if issue exists with madVR, try EVR.
    Answer: Yes

  • Did you try a different video file and check if the issue persists? Try another file with different format, dimension and frame rate.
    Answer: Yes

I'll describe the function and purpose of OnCPU & Prefetch (as i understand it) later on in another comment.

@Nuihc88 Nuihc88 added the bug Something isn't working label Nov 2, 2020
@Nuihc88
Author

Nuihc88 commented Nov 2, 2020

Continued from here...

I tried many previous versions, and so far all of them hang.

I don't know why, but for me it started hanging consistently after updating the filter and worked normally again after switching back. Initially, after updating, a single backwards seek was enough to trigger it; with the simplified script it takes more tries. It could be that i was unknowingly triggering some other hang condition and ended up locating this one while troubleshooting.

I've never used the OnCPU function before. Isn't it introduced in the Neo version? How is it supposed to be used? Should it be used in conjunction with Prefetch or mutually exclusive? If I only use one of them it works.

Neo builds (and derivatives) allow for the creation of OnCPU & OnCUDA pipelines, which usually speed up processing even when not strictly necessary and can be used in conjunction with their enhanced Prefetch function. There is very little documentation and most of what i know is from Google-translated Japanese from here and my own experimentation. The main differences between OnCPU & Prefetch as i understand them are:

  • OnCPU is a cache management pipeline applied to every function preceding it, Prefetch() included; it would appear to cycle through functions from upstream to downstream; it also always uses only one thread. OnCPU is usually faster than Prefetch(), but there seem to be times when a combination works better.
  • Neo version of the Prefetch() function allows for multiple uses throughout the script for extra buffering between functions; the user can also specify the number of frames to buffer.

The hang comes from the script clip waiting for its thread pool to terminate, which never happens. I have not changed any logic around that for a long time.

I guess it could even be an AviSynth+ bug. I remember noticing some latency degradation before when first testing the performance of multiple prefetchers. The latency degradation wasn't there with OnCPU, which is why i started using it with Prefetch in the first place. In other words:

  • OnCPU() only: No problems.
  • OnCPU() & Prefetch(): This bug.
  • 1x Prefetch(): Increased latency.
  • 2x Prefetch(): This bug & 2x Increased latency

I remember the AVS documentation mentioning that Prefetch() should be placed at the end. Wouldn't OnCPU invalidate that, or worse, conflict with it?

I'm fairly sure the documentation predates OnCPU and likely Prefetch(threads , frames) as well.
In the unsimplified script i also had another Prefetch() at the end.

In any case, I think this is a completely separate issue. You should create separate ticket with detailed information (use the template please).

Done.

@CrendKing
Owner

Try AviSynthFilter.zip, should no longer hang.

Here is how I reached the fix, for documentation purposes.

First, I tested OnCPU a bit. If I only have OnCPU, all computation happens on just one thread, regardless of what number I gave it. Prefetch is the one that activates multithreading.

I tried the combination on a 4K video + SVP, which pressures CPU to 50%. I can't tell any difference with and without OnCPU.

Then I went to try ffdshow. It does not hang. The reason is that ffdshow does not reload the script when seeking, which is why you see those ghost frames after a seek. I can comment out two lines in our code to achieve the same, but obviously that's the wrong decision.

Then I investigated the call stack at the hang. It looks like, because the new OnCPU is placed BELOW Prefetch(), when the script is reloaded AVS tries to release the previous prefetcher variable, create a new prefetcher and assign it to that variable. Unfortunately, the prefetcher is constantly trying to get a new frame, but no frame is available during flush, so it is blocked forever, which in turn blocks the flush itself. Basically a deadlock.
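The blocking pattern described above can be modelled in a few lines of Python. This is only an illustrative sketch, not code from AviSynth+ or AviSynth Filter: `prefetch_worker`, `flush_without_frames` and the `None` stop marker are hypothetical names, and a timeout stands in for the flush that would otherwise wait forever.

```python
import queue
import threading


def prefetch_worker(frames: queue.Queue, fetched: list) -> None:
    # Stand-in for a prefetcher thread: it keeps asking upstream for frames.
    while True:
        frame = frames.get()  # blocks for as long as no frame is available
        if frame is None:     # hypothetical stop marker, used only to end the demo
            return
        fetched.append(frame)


def flush_without_frames(timeout: float = 0.2) -> bool:
    """Return True if the worker is still stuck after `timeout` seconds."""
    frames: queue.Queue = queue.Queue()
    worker = threading.Thread(
        target=prefetch_worker, args=(frames, []), daemon=True
    )
    worker.start()
    # During the flush no frame is delivered, so the worker never leaves get().
    # A flush that waits for the worker without a timeout would hang right here.
    worker.join(timeout)
    stuck = worker.is_alive()
    frames.put(None)  # release the worker so the sketch exits cleanly
    worker.join()
    return stuck
```

Calling `flush_without_frames()` reports that the worker is still waiting when the timeout expires, which is the situation where an unconditional wait on the thread pool never returns.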

I first tried to prove the theory by hacking together a version that destroys and recreates the AVS environment during seek, and as expected it no longer hangs.

Finally I changed the flush logic to fix this: 1) break the flush into two steps; 2) cache the drain frame during reload instead of during flush (the latter would cause another deadlock). The change is in 16ee618
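As a rough illustration of why a cached drain frame lets a blocked prefetcher finish, here is a small Python model. It is not the code from 16ee618; `FrameSource`, `cache_drain_frame`, `begin_flush` and `reload_with_drain` are hypothetical names, and polling with a short timeout is used only to keep the sketch short.

```python
import queue
import threading


class FrameSource:
    """Hypothetical frame source that can answer requests with a drain frame."""

    def __init__(self) -> None:
        self._frames: queue.Queue = queue.Queue()
        self._drain_frame = None
        self._flushing = threading.Event()

    def cache_drain_frame(self, frame) -> None:
        # Done at reload time, before any flush starts.
        self._drain_frame = frame

    def begin_flush(self) -> None:
        self._flushing.set()

    def end_flush(self) -> None:
        self._flushing.clear()

    def get_frame(self):
        # Poll so that a pending request notices when a flush has begun.
        while True:
            if self._flushing.is_set():
                return self._drain_frame
            try:
                return self._frames.get(timeout=0.01)
            except queue.Empty:
                continue


def reload_with_drain(timeout: float = 1.0):
    source = FrameSource()
    source.cache_drain_frame("drain")
    received = []
    worker = threading.Thread(
        target=lambda: received.append(source.get_frame()), daemon=True
    )
    worker.start()
    source.begin_flush()   # step 1: pending requests get the drain frame
    worker.join(timeout)   # step 2: the worker is now able to terminate
    finished = not worker.is_alive()
    source.end_flush()
    return finished, received
```

In this model the worker always ends up with the drain frame, because no real frame is ever delivered, so waiting for it no longer blocks.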

@CrendKing
Owner

Ideally there should be no need for a "drain frame". AviSynth+ should provide a client API to flush its internal cache and stop the prefetcher individually. Then we could simply stop the prefetcher during seek. Unfortunately there has been no response to that request: AviSynth/AviSynthPlus#180

@Nuihc88
Author

Nuihc88 commented Nov 2, 2020

Try AviSynthFilter.zip, should no longer hang.

Seeking is now much faster and smoother than it has ever been. However, after adding another prefetcher at the end i can still eventually cause a hang by holding the seek button down for several seconds.

AvsFilterSource()
Prefetch(1,4)
#OnCPU(8)
Prefetch(1,16)

I can even comment out OnCPU() as above and seek backwards and forwards in quick succession for the same result.

First, I tested OnCPU a bit. If I only have OnCPU, all computation happens on just one thread, regardless of what number I gave it. Prefetch is the one that activates multithreading.

Yes, that is the expected behavior since as the (Google-translated) documentation states:

OnCPU / OnCUDA (collectively called OnDevice) specifications
If all are valid, the chain will be as follows.

Upstream → Upstream cache → Thread → Transfer → Downstream cache → Downstream

→ is the flow of frame data (reverse of GetFrame call direction)

Number of prefetch frames

0: Synchronous call without all cache
1: Synchronous call, but only transfer is read ahead and executed asynchronously. Downstream cache is enabled.
2 or more: Pre-read upstream processing using threads. Both upstream and downstream caches are valid.
When prefetch is 2 or more, the number of upstream threads is fixed at 1 and the number of upstream prefetches is fixed at 2. The downstream look-ahead count is set to the specified number of prefetch frames.

In other words, if i'm interpreting this right, the intended usage is basically: SourceFilter -> Prefetch() -> Filtering -> OnCPU() -> Prefetch() -> return last.
My own tests would suggest that this is indeed generally optimal.
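The three prefetch settings quoted from the translated documentation can be summarised as a small lookup. The Python below is only a reading aid built from that text: `OnDeviceChain` and `on_device_chain` are hypothetical names, not part of any AviSynth API, and fields the translation does not specify are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OnDeviceChain:
    upstream_cache: bool
    downstream_cache: bool
    read_ahead: str                   # "none", "transfer only" or "upstream"
    upstream_threads: int             # 0 means the upstream call is synchronous
    upstream_frames: Optional[int]    # None where the text gives no number
    downstream_frames: Optional[int]  # None where the text gives no number


def on_device_chain(prefetch: int) -> OnDeviceChain:
    """Map an OnCPU/OnCUDA prefetch value to the behaviour the text describes."""
    if prefetch < 0:
        raise ValueError("prefetch must be >= 0")
    if prefetch == 0:
        # Synchronous call without any cache.
        return OnDeviceChain(False, False, "none", 0, None, None)
    if prefetch == 1:
        # Synchronous call; only the transfer is read ahead. Downstream cache on.
        return OnDeviceChain(False, True, "transfer only", 0, None, None)
    # 2 or more: one upstream thread, 2 upstream frames,
    # and `prefetch` frames of downstream look-ahead.
    return OnDeviceChain(True, True, "upstream", 1, 2, prefetch)
```

Under this reading, `on_device_chain(8)` uses the same single upstream thread as `on_device_chain(16)`; only the downstream look-ahead differs.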

I tried the combination on a 4K video + SVP, which pressures CPU to 50%. I can't tell any difference with and without OnCPU.

That's unsurprising, given i generally only start noticing a difference at over 85% CPU usage with fluctuating loads.

PS. I now also seem to be getting a hang with high CPU usage when switching subtitle tracks in MPC-HC or PotPlayer.
Also, i'm not sure if i misconfigured something, but the AviSynth Filter isn't loading on MPC-BE now, i'll look into it more tomorrow.

@CrendKing
Owner

I can even comment out OnCPU() as above and seek backwards and forwards in quick succession for the same result.

This works fine for me in AviSynth+ 3.6. In 3.5.1 it doesn't work due to the single-prefetcher limitation. What's the benefit of having multiple prefetchers instead of just increasing the thread count?

@Nuihc88
Author

Nuihc88 commented Nov 3, 2020

This works fine for me in AviSynth+ 3.6.

This i only tested with AviSynth+ v3.6.2 (r3325) & MPC-HC (1.9.8) 32bit & 64bit. It generally took about 3 seconds of non-stop back and forth seeking to trigger. I'm not sure how often that hang would be encountered in normal usage, but i'm guessing it requires other prerequisite(s) such as a video containing at least one subtitle-track; i tried one without a subtitle and that worked fine.

What's the benefit of having multiple prefetchers instead of just increasing the thread count?

I find that with Neo's Prefetch, one thread and more frames usually perform better. Having one at upstream and another at downstream can mitigate delays caused by bottlenecking from multiple filters.
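The idea of one thread plus a few buffered frames between filters can be sketched as a generic producer/consumer pipeline. This Python model is only an analogy for how a chain of Prefetch(1,4) calls might decouple stages; `run_pipeline` and `start_stage` are hypothetical names and nothing here reflects AviSynth+ internals.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of the stream


def start_stage(func, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Run `func` on every item from `inbox` in one dedicated thread."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)
                return
            outbox.put(func(item))  # blocks when the downstream buffer is full

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(frames, filters, buffered_frames: int = 4) -> list:
    """Chain `filters`, each with 1 thread and a bounded buffer after it."""
    first: queue.Queue = queue.Queue(maxsize=buffered_frames)
    inbox = first
    for func in filters:
        outbox: queue.Queue = queue.Queue(maxsize=buffered_frames)
        start_stage(func, inbox, outbox)
        inbox = outbox
    last = inbox

    def feed() -> None:
        for frame in frames:
            first.put(frame)
        first.put(_END)

    threading.Thread(target=feed, daemon=True).start()

    results = []
    while True:
        item = last.get()
        if item is _END:
            return results
        results.append(item)
```

Because every stage has a single thread and FIFO buffers, frame order is preserved, while a slow stage only stalls its neighbours once their buffers of `buffered_frames` items fill up or run dry.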

An optimized real-world example could even end up looking something like this:

#OnCPU(4) pipeline is applied to every function up to here.
AvsFilterSource()
Prefetch(1,4)
RemoveGrain(mode=27)
Prefetch(1,4)
Dup1(maxcopies=3,threshold=1.618,chroma=false,blksize=32)
Prefetch(1,4)
FDecimate2(rate=19.98,threshold=1.618,chroma=false)
OnCPU(4)
#OnCPU(8) pipeline is applied to every function up to here
Prefetch(1,8)
super=SVSuper(last, super_params)
vectors=SVAnalyse(super, analyse_params, src=last)
SVSmoothFps(last, super, vectors, smoothfps_params, mt=16)
OnCPU(8)
Prefetch(1,16) #This one is operating separately from OnCPU to always be ready to supply frames downstream.

And yes, that's still usage as intended, according to what little documentation exists.

PS. This build appears to be working correctly, but i'll test it for a few days and let you know if i encounter any new problems.

@CrendKing
Owner

If I were the author of AviSynth+, I don't think it would be a good idea to ask users to spam Prefetch() in scripts. I would design it so that a single Prefetch() at the end determines the maximum amount of resources the engine can use, and the engine automatically uses all available resources between the instructions in the script. It shouldn't require the user to supply so many heuristics to do a good job.

@Nuihc88
Author

Nuihc88 commented Nov 4, 2020

If I were the author of AviSynth+, I don't think it would be a good idea to ask users to spam Prefetch() in scripts. I would design it so that a single Prefetch() at the end determines the maximum amount of resources the engine can use, and the engine automatically uses all available resources between the instructions in the script. It shouldn't require the user to supply so many heuristics to do a good job.

Given that each filter can have its own optimal setting for both Threads & Frames, it may not always be as simple as setting one Prefetch at the end, applying it for all filters above it and then forgetting about it. I have however personally observed a pattern that filters running in mtmodes 1 & 3 generally work well with just one thread and one frame per core; with filters requiring mtmode 2 or running on GPU things seem to become more complicated. For the final downstream cache at the end, a value equal to the Renderer's frame-buffer/queue seems to be the optimal value.

(Google-Translated) Documentation:

MT improvement
In the original Plus, you could only use one Prefetch, but in Neo you can use as many as you like. Also, an argument has been added to specify the number of frames to prefetch.

Prefetch (clip, int 'threads', int 'frames')

  • clip: The clip to parallelize.
  • int threads = (number of logical cores in the system) + 1: Number of threads. If it is 0, it will pass through without doing anything.
  • int frames = threads * 2: The number of frames to prefetch. If it is 0, it will pass through without doing anything.

Example: Pipeline parallelization

Filtering A
Prefetch (1,4)
Filtering B
Prefetch (1,4)
Filtering C
Prefetch (1,4)

Prefetch (1,4) starts 1 thread that reads 4 frames ahead. In the above example, the filtering processes A, B, and C are executed in parallel in a pipeline. Since the number of threads of each Prefetch is arbitrary, if for example filtering B is heavy and you want more parallelism for it, you can increase its number of threads as follows.

Filtering A
Prefetch (1,4)
Filtering B
Prefetch (4)
Filtering C
Prefetch (1,4)

All that said, i think most of the time users will just end up copy-pasting Prefetch(1,4) or Prefetch(4) anywhere they detect a bottleneck, regardless of whether that's anywhere close to optimal for their script.
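The defaults in the translated Prefetch description above can be written out as a small helper. This Python function is only a restatement of that text: `effective_prefetch` and its `logical_cores` parameter are hypothetical, and the actual Neo implementation may differ.

```python
import os
from typing import Optional, Tuple


def effective_prefetch(
    threads: Optional[int] = None,
    frames: Optional[int] = None,
    logical_cores: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (threads, frames) as described, or None for a pass-through."""
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1
    if threads is None:
        threads = logical_cores + 1  # documented default
    if frames is None:
        frames = threads * 2         # documented default
    if threads == 0 or frames == 0:
        return None                  # "pass through without doing anything"
    return threads, frames
```

Under this reading, Prefetch(4) on its own buffers 8 frames, whereas Prefetch(1,4) keeps a single thread and buffers 4.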

@CrendKing
Owner

CrendKing commented Nov 4, 2020

i think most of the time users will just end up copy-pasting

I think most of the time users will just end up copy-pasting other people's code without understanding why, or just use tools like SVP. You are probably among the 0.1% of people who would ever put more than one Prefetch() in a script, consciously.

Seriously, most users who experience a performance problem will just upgrade their 10-year-old CPU and get an immediate result, rather than scratching their heads to figure out how to optimally place prefetchers to squeeze out maybe 5% performance.

@Nuihc88
Author

Nuihc88 commented Nov 4, 2020

I think most of the time users will just end up copy-pasting other people's code without understanding why, or just use tools like SVP. You are probably among the 0.1% of people who would ever put more than one Prefetch() in a script, consciously.

I think that using AviSynth is more of a power-user thing to do in general. Searching for other people's code-snippets and figuring out where to paste them is in itself a fairly involved process. People with previous knowledge of AviSynth, use AviSynth; People with previous knowledge of Prefetch, use Prefetch, etc.

Seriously, most users who experience a performance problem will just upgrade their 10-year-old CPU and get an immediate result, rather than scratching their heads to figure out how to optimally place prefetchers to squeeze out maybe 5% performance.

I think proper cache optimization can produce up to ~25% performance improvement at times. Very often the issue isn't raw computing power, but some old or poorly optimized filter effectively bottlenecking the CPU to ~60%. Upgrading hardware is an easy solution, but it doesn't always work as well as people hope it would, since old software is still going to run like old software and newer alternatives don't always exist or they require more computing power rather than less.

@Nuihc88
Author

Nuihc88 commented Nov 4, 2020

I've noticed a few minor problems with switching tracks, but can't reliably reproduce most of them. One exception is a semi-hung state with >90% CPU usage after a subtitle or audio-track switch in PotPlayer: after seeking once the video starts playing back normally, but do anything else and it hangs completely. I copied the log at the hang and then after seeking:
AviSynthFilter_PP32_CopyAtHang.txt & AviSynthFilter_Log_PP_HangThenSeek.txt
AviSynthFilter_Log_PP64_CopyAtHang.txt & AviSynthFilter_Log_PP64_CopyThenSeek.txt

PS. Using the splitter from LAV Filters seems to be a prerequisite for triggering a semi-hanged state by switching subtitles, but semi-hanging by switching audio tracks happens with other splitters as well.
PPS. Here's a snapshot of a difficult-to-reproduce output glitch on video-stream switch with MPC-HC 64bit; the process also silently hangs in the background when the window is closed...
https://user-images.githubusercontent.com/58824736/98207619-9e5c0500-1f44-11eb-9d7b-a79148a3368b.png
AviSynthFilter_Log_MPC-HC64_VideoStreamSwitch.txt
PPPS. The new build over here seems much better. I'll test it for a few days...

@CrendKing
Owner

CrendKing commented Nov 7, 2020

Yes, that fix should affect all use cases where a stop-and-play happens. If it fixes the problem for you and chainikdn, I'll release a version for you guys.

BTW, how did you notice an update on a non-participated closed issue? Were you subscribing to all tickets, or ... wait, are you ... actually the second account of chainikdn himself 😉 I'm kidding.

@Nuihc88
Author

Nuihc88 commented Nov 7, 2020

Yes, that fix should affect all use cases where a stop-and-play happens. If it fixes the problem for you and chainikdn, I'll release a version for you guys.

The video-stream switch output glitch still seems to be happening with MPC-HC. I'm also getting a frame freeze on PotPlayer under the same conditions. It seems to require pretty specific circumstances to trigger; since multiple video streams aren't common, for me it's only happening with a test file i hastily put together with mkvtoolnix for debugging purposes. This specific issue is unlikely to be encountered in real use, but since there could be similar issues arising from the same source, i've uploaded a test clip here.

I'm guessing the steps to reproduce go something like this:

  1. Load Video.
  2. Switch Video-track (for still-frame).
  3. Seek Once (for glitched output).
  4. ...Go back to 2. & Repeat...

After that the video might stick to glitched, still-frame or normal state and you might need to rename the file to (re-)trigger.

BTW, how did you notice an update on a non-participated closed issue? Were you subscribing to all tickets, or ... wait, are you ... actually the second account of chainikdn himself 😉 I'm kidding.

I've clicked the 'Watch'-button at the top of this project and configured GitHub to make email notifications for pretty much everything.

@CrendKing
Owner

CrendKing commented Nov 7, 2020

Thanks for providing test clip. I have never seen a file with multiple video tracks.

I repeated the steps using MPC-HC (which uses LAV) but I can't get a green screen while getting the SVP FRC effect. The only differences between the two video tracks are the resolution and the watermark at the black border.

I've clicked the 'Watch'-button at the top of this project and configured GitHub to make email notifications for pretty much everything.

But why? I never expect anyone to watch this repo for non-release stuff (except myself ofc) 😄

BTW, if you use AviSynth 3.5.*, never use more than 1 output thread. Otherwise it is not compatible with 3.5's one-prefetcher limitation.

@Nuihc88
Author

Nuihc88 commented Nov 8, 2020

But why? I never expect anyone to watch this repo for non-release stuff (except myself ofc)

I'm planning on using AviSynth Filter as a component in a few future projects (if i decide to do them), as well as recommending it for running my realtime optimized SVPflow Templates. Thus i want to know about any new features, issues, bugs, fixes, technical details, etc. as soon as possible. Also, there are still a few minor problems i would like to figure out, including the one concerning some input-colorspace selections causing crashes and glitches on some systems including mine.

BTW, if you use AviSynth 3.5.*, never use more than 1 output thread. Otherwise it is not compatible with 3.5's one-prefetcher limitation.

I generally don't use AviSynth+ builds prior to v3.5.2 (r3218) as most of my scripts aren't compatible with them; while troubleshooting i usually use the latest test build.

I repeated the steps using MPC-HC (which uses LAV) but I can't get a green screen while getting the SVP FRC effect. The only differences between the two video tracks are the resolution and the watermark at the black border.

I tested with MPC-HC (v1.9.8) as well as MPC-HC Portable (v1.9.7) with default settings, using MadVR & EVR (C/P), with this one-line script: 'AvsFilterSource()', even with 1 Output Thread, and then... I think i figured it out:
MPC-HC and even its Portable version by default load all filters registered on the host system, thus adding an unnecessary 'ffdshow Video Decoder' to the Filter Graph (on systems that have it installed) after the video has already been processed using the internal LAV Video Decoder. That second decoder isn't picking up the stream switch, which causes the persistent video output glitch. After disabling ffdshow for raw video, the issue goes away except for a few grayish frames immediately after a stream switch, which i guess is key-frame related. That just left the PotPlayer frame freeze issue, which on closer inspection turned out to be an issue with the internal source splitter.

Looks like all bugs connected to the original issue have now been solved, so i'm closing the issue.

@Nuihc88 Nuihc88 closed this as completed Nov 8, 2020