Update README with comparison vs. other scene detect tools #7

Open
tonycpsu opened this Issue Jan 16, 2016 · 17 comments

tonycpsu commented Jan 16, 2016

Hi, this looks like a neat project, and a few quick tests with some videos showed promising results. Unfortunately, it's much slower than other scene detection tools I've experimented with, so unless it gets faster or the detection capability is much better, I'm not sure it will meet my needs.

The other tools I've used for content scene detection are ffprobe:

ffprobe -show_frames -of compact=p=0 -f lavfi "movie=${FILE},select=gt(scene\,${THRESHOLD})"

and x264 (transcoding the video with ffmpeg first to make it faster):

ffmpeg -i ${FILE} -vf scale=320:-1 -sws_flags neighbor -an -pix_fmt yuv420p -f yuv4mpegpipe - 2>/dev/null | x264 - --demuxer y4m --bframes 0 --min-keyint 10 --scenecut ${THRESHOLD} --preset superfast --crf 30 --threads 1 -v --output /dev/null 2>&1 | grep scene | cut -d ' ' -f 6

(ffmpeg also has a blackframe filter, but I haven't used it much and am more interested in detection when there aren't black frames to make things easy.)

I'm sure you're familiar with these approaches, and I'm wondering if you've spent any time comparing them to your tool for quality / speed. I see you have an open issue to reduce the file size to make things faster, but even with a ~2m long 320x180 file, I'm seeing 17 seconds for your tool vs. 8 seconds for x264 vs. less than a second for ffprobe. I've subjectively found x264 to do a better job than ffprobe, so I think parity with it would be compelling enough reason to switch.

So, my actual questions:

  1. Do you think that's an achievable performance target for your tool?
  2. If not, do you think the slowness is coming from OpenCV or Python?
  3. Would you consider adding some text to the README documenting what's better about your tool over these others?

I know you have plans to add some other detection methods, which sounds great, but right now I don't see a compelling reason to switch given the speed difference.

Nonetheless, this is a really cool project, and I'll be interested to see how it develops in the future.

Breakthrough (Owner) commented Jan 17, 2016

Thanks for the feedback, @tonycpsu. Just a quick update until I can give a more comprehensive reply: if you haven't tried content mode yet (the mode that detects changes between scenes rather than just using thresholds), please feel free to try it out.

Right now I'm working on completing the documentation on Readthedocs (very much incomplete, but well underway), which may answer some of the questions you brought up, as well as creating a GUI with GTK+. Eventually I'll strip most of the Readme file down and direct people to the Readthedocs site. I've already planned a page there comparing PySceneDetect with other programs, and I'll be sure to add the tools you outlined above once I get to it.

Performance improvements are definitely on the radar, but right now PySceneDetect is focused on completeness over efficiency. The slowness is an equal mix of OpenCV and Python/Numpy, so I'm looking into either using a JIT/acceleration library like Numba, and/or allowing scene detection modules to be written in C/C++ and called natively via the ctypes module. I'm also looking into memoizing certain scene detection heuristics, which would allow subsequent analyses of the same video to proceed significantly faster (although this depends on the detection method and parameters used, and wouldn't save any time if you only process each video once).

With the GUI version, I'm also planning on integrating the ability to cut videos with either ffmpeg, libav, or mkvmerge right from PySceneDetect. I'm trying to ensure I can keep both a CLI-only and a GUI version, although the command-line arguments list is growing a lot larger than I'd currently like (ideally, the GUI would display the equivalent command line arguments so you could use it to find the best detection mode/parameters). Your feedback on this is most welcome.


What are you using PySceneDetect for, out of curiosity? (Are you only focused on the actual detection aspect, or interested in cutting the video itself as well?) Also, which detection mode did you use for the times you cited (threshold, the default, or content)? In general, content mode is much slower, and threshold mode still has room for optimization. If you can, feel free to attach/share the sample video you used for testing.

If I get a better idea of how you're using PySceneDetect, perhaps I can better focus my development efforts (and profile the related detection methods/function calls and hopefully achieve a more reasonable runtime). Would you use a GUI version, or do you prefer the CLI version? Thanks again for your comments, questions, and feedback, looking forward to your reply.

tonycpsu commented Jan 17, 2016

Thanks for the thorough reply.

What are you using PySceneDetect for, out of curiosity?

I've been putting together what sounds suspiciously like the GUI video editing thing you say you're planning, except it's a TUI using urwid. It started out as just a bunch of shell scripts with sort of a REPL-style text-based menu that would shell out to mplayer2 to preview the clips then use mkvmerge to join them, then a few weeks ago I migrated it to Python, put a (somewhat) proper UI on it and switched the video player to a long-running mpv process via python-mpv, and switched to ffmpeg to do the stitching.

It's nowhere near releasable right now, but it can currently load files, split them, and let you select which scenes to include in the output, then save them. It also has a janky but functional method of splitting scenes manually for cases where the scene detection doesn't do the right thing, which is where I thought something like PySceneDetect might come into play. Even if it's not the fastest method, I could use the others for a coarse first pass and then maybe try to use your tool to find transitions the others missed. I haven't had a chance to test that approach yet, though.

I'm mainly interested in the content method, since the blackframe filter in ffmpeg can handle the job pretty well when there are obvious cuts. The dissolve method you mention in the source code sounds really interesting, and I guess the holy grail would be some hybrid method that tries to find all of those different kinds of transitions at the same time, but that's probably a heavy lift when you're doing the work in Python, even with numpy speeding things up.

Anyway, that's what I'm doing. Once I get the code to some kind of stable state, I'll let you know and you can maybe use it to guide your development of a client API and maybe your GUI if you decide to do one.

Breakthrough (Owner) commented Jan 17, 2016

@tonycpsu thank you for the detailed response! Indeed, I'm planning on developing all the scene detection methods to have a standardized interface, so it should be easy to integrate them with other Python programs, or even as plugins for other applications. The GUI I'm planning is more just for newer users to visually see how PySceneDetect works, and provide an easy interface for selecting the appropriate detection method & parameters.

There will always be a Python API for integration with other applications, so I may look into multiple methods for improving performance (Cython, multi-threading, or GPGPU optimized perhaps), although want to keep compatibility as high as possible. For now I'll continue to strive and improve PySceneDetect's performance in pure Python, but will also be investigating what kind of performance gains can be had compiling the detection methods in C and calling them via ctypes.

In theory, I can combine the content and dissolve detection methods. What I was going to do for the dissolve method was similar to content detection, but instead of detecting every adjacent frame, only check frames before/after a defined crossfade length.
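To make that concrete, here's a rough Python sketch of the idea (illustrative only, not the shipped implementation; `frame_delta`, a mean absolute pixel difference, is a stand-in for whatever metric content mode actually uses):

```python
import numpy as np

def frame_delta(a, b):
    # Mean absolute per-pixel difference between two frames (uint8 arrays).
    return np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16)))

def detect_cuts(frames, threshold=30.0, crossfade_len=1):
    # With crossfade_len == 1 this is plain content detection (adjacent
    # frames); with a larger value, each frame is compared against the
    # frame crossfade_len frames earlier, so a slow dissolve still
    # accumulates into one large delta instead of many tiny ones.
    cuts = []
    for i in range(crossfade_len, len(frames)):
        if frame_delta(frames[i], frames[i - crossfade_len]) >= threshold:
            cuts.append(i)
    return cuts
```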

tonycpsu commented Jan 17, 2016

OK, re: the client API, the things that would help me as a potential user would be:

  1. detect_scenes currently expects a cv2 object as a parameter -- ideally this would be a filename instead, with the API managing the cv2 handle, so that the client doesn't have to import cv2.
  2. The ability to do scene detection on a segment of the input file (not the whole thing) by passing in a start position and a duration. The detector would simply skip over frames until it gets to the start time, then stop detection when the duration has elapsed.
  3. I don't know if cv2 supports reading from streams, but if it does, the ability to read from stdin rather than a file would be nice.
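For item 1, I'm imagining something like this (a hypothetical sketch; `open_video`, `detect_scenes_in_file`, and the injectable `capture_factory` are made-up names, with `capture_factory` standing in for `cv2.VideoCapture` so the client never has to import cv2):

```python
import contextlib

@contextlib.contextmanager
def open_video(path, capture_factory=None):
    # The library owns the capture handle; callers only pass a filename.
    if capture_factory is None:
        import cv2  # deferred import keeps the dependency inside the library
        capture_factory = cv2.VideoCapture
    cap = capture_factory(path)
    try:
        yield cap
    finally:
        cap.release()

def detect_scenes_in_file(path, detector, capture_factory=None):
    # Client-facing entry point: takes a filename, not a cv2 object.
    with open_video(path, capture_factory) as cap:
        return detector(cap)
```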
Breakthrough (Owner) commented Jan 17, 2016

@tonycpsu Heh, you've highlighted the exact issue with using OpenCV for this: you can only open a video file or a capture device (by integer index, starting from 0). Seeking doesn't work very well (if at all) with the underlying VideoCapture implementation. I've been looking into MoviePy or the ffmpeg API itself, but want to avoid having to go that route if possible.

It's funny you brought this up, as it's the last "big" item on the big-picture list I'm working through. One possibility is implementing seeking manually, by reading and discarding each frame until the target frame is reached (just as you describe), although for now it's probably easier to do a rough cut with mkvmerge or ffmpeg before feeding a file to PySceneDetect.
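That manual-seek approach might look roughly like this (hypothetical sketch; `process_segment` and its frame-iterable interface are made up for illustration, with the iterable standing in for a cv2 read loop):

```python
def process_segment(frames, fps, start_sec, duration_sec):
    # frames: any iterable of decoded frames (e.g. yielded from a
    # cv2.VideoCapture read loop). Since VideoCapture can't seek
    # reliably, we decode and discard everything before start_sec,
    # then stop once duration_sec has elapsed.
    start = int(start_sec * fps)
    end = start + int(duration_sec * fps)
    selected = []
    for idx, frame in enumerate(frames):
        if idx < start:
            continue          # the manual "seek": decode and throw away
        if idx >= end:
            break             # duration elapsed; stop early
        selected.append(frame)
    return selected
```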

Another option I'm considering is supporting more than just OpenCV for video I/O, and supporting other video processing libraries (in addition to OpenCV, not instead of), although this greatly complicates things. Hopefully I can hack something together similar to how MoviePy uses ffmpeg I/O streams (I think this is similar to what you're talking about) to implement seeking efficiently.

The ability to do scene detection on a segment of the input file (not the whole thing) by passing in a start position and a duration. The detector would simply skip over frames until it gets to the start time, then stop detection when the duration has elapsed.

For now I'll implement this exactly as you described (as I've had issues seeking with OpenCV), and make it a priority issue as I can see this being something a lot of people might want. You can expect some command-line flags to be included for this in the next release of PySceneDetect (v0.3.1-beta). I'll also try to have some performance improvements completed by then (frame skipping and resolution reduction would both affect content mode, although these options might not be ideal in all cases so I'll still be looking into making a more efficient detector; as an example, if you halved the resolution of the video, and only processed every other frame, in theory that could be 8x faster, bringing the runtime for your test-file to just over 2 seconds, from 17s).

I can also add a few more API functions to cover the first item you brought up; that shouldn't be a problem. The API is very much in a rough state right now (as you can probably tell, with everything still in a single file), so you can expect a lot of additions on that front, and feel free to share anything else you might find useful. Finally, as for reading from streams: unfortunately OpenCV can't do this from either Python or C++ (it's just a basic wrapper around ffmpeg). I can see it being a very useful feature, though, and I'll definitely be looking into how it could be implemented in the future (either by patching OpenCV, making a similar wrapper around ffmpeg, or taking an approach like MoviePy's). I'll make an entry in the issue tracker for it, or feel free to file one yourself.

Thanks again for your feedback, very insightful!

Breakthrough (Owner) commented Jan 23, 2016

Hey @tonycpsu, just a quick update. The latest release of PySceneDetect (v0.3.1) includes downscaling (-df / --downscale_factor), which will significantly improve performance. If you don't mind, can you try your previous benchmark video, but add -df 2 or -df 3 and see what the runtime is? (I'm getting a 3-4x performance gain with -df 2).

You should notice a significant performance boost, although I understand this isn't acceptable in all situations (raising -df too high will cause problems as the image shrinks too small). For the next version, I'm also working on a multi-core implementation, which should also achieve a nice performance boost. The implementation will be similar to pipelining, although this will require making the current scene detector classes thread-safe.

I've also implemented a frame skipping option (-fs), but I wouldn't use this unless the source video has a high frame rate (e.g. > 60 FPS).
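To illustrate where the savings come from (a hypothetical sketch; whether -df subsamples like this or does a proper cv2.resize is an implementation detail I'm glossing over):

```python
import numpy as np

def downscale(frame, factor=2):
    # Cheap downscale by subsampling rows and columns. With factor=2,
    # each dimension is halved, so there are 4x fewer pixels per frame;
    # combined with processing only every other frame, that's the
    # theoretical 8x speedup mentioned above (17 s -> ~2 s).
    return frame[::factor, ::factor]

def frames_to_process(total_frames, frame_skip=0):
    # With -fs N, only every (N+1)-th frame index is analyzed.
    return list(range(0, total_frames, frame_skip + 1))
```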

tonycpsu commented Jan 23, 2016

-df 3 brings the detection time for a ~2 minute 720p 24 fps file from 2:39 to 1:30. Better, but still nowhere near the 8 second time for x264.

For what it's worth, if I downscale the input file first with ffmpeg/x264 and then run scenedetect without -df, it only takes 25 seconds.

This probably isn't going to change performance much if at all, but I notice the tool is writing JPG files of the scene cuts. Could this be made optional?

Breakthrough (Owner) commented Jan 23, 2016

@tonycpsu my apologies, the thumbnail generation code was supposed to be released in a future version with the proper flags (I'll update the release now and remove that for the time being). Thanks for the benchmark numbers, hopefully when I update PySceneDetect to make use of multicore processors the runtime will get more acceptable.

Breakthrough (Owner) commented Jan 24, 2016

@tonycpsu quick question, what OS are you running?

Interestingly enough, I'm getting much better performance under Linux versus Windows (even if I disable thumbnail generation entirely), so I'm still trying to figure out what's causing the discrepancy.

tonycpsu commented Jan 24, 2016

I'm on OSX. Haven't tried it on any of my Linux machines yet.

Breakthrough (Owner) commented Jan 26, 2016

Hey @tonycpsu just to give you a quick update, as of v0.3.2 you can now set the start time (-st) and end time (-et) or duration (-dt) so you no longer have to process the entire video. Timecodes can be specified in absolute number of frames (e.g. 123), seconds (number followed by an s, e.g. 123s, or 123.4s), or standard format HH:MM:SS[.nnn]. You can see some examples here.
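A parser for those three timecode formats looks roughly like this (a hypothetical sketch for illustration, not the actual PySceneDetect code):

```python
def parse_timecode(value, fps):
    # Convert an accepted timecode string into an absolute frame number:
    #   "123"            -> frame 123
    #   "123s" / "123.4s" -> seconds
    #   "HH:MM:SS[.nnn]" -> timestamp
    if ":" in value:
        hh, mm, ss = value.split(":")
        seconds = int(hh) * 3600 + int(mm) * 60 + float(ss)
        return int(seconds * fps)
    if value.endswith("s"):
        return int(float(value[:-1]) * fps)
    return int(value)
```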

BaddMann commented Mar 4, 2016

I'm currently compiling OpenCV in a Docker container... a long process, and I'm still learning Docker, OpenCV, and Python.
Anyway, once I'm done I'm hoping to use your code to get timecodes for slide transitions in a video of a PowerPoint presentation. I'm guessing this is outside the scope of why you made this project.
Is this doable, given that the slides differ very little from one another and only the text really changes?
Any tips for optimizing for PowerPoint?

Thanks in Advance

Breakthrough (Owner) commented Mar 5, 2016

Hello @BaddMann;

This application is certainly within the scope of PySceneDetect. For this, I would recommend using content-aware detection mode (-d content). With regards to slideshows in particular, you'll probably need to fiddle with the --threshold value (sensitivity, default 30) as well as the minimum scene length (-m / --min-scene-length, default 15). You can figure these values out manually by generating a statsfile (-s video_stats.csv) when running PySceneDetect.
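To illustrate what the statsfile gives you (a hypothetical stand-in; the real statsfile columns differ): log a per-frame change metric to a CSV, then pick a --threshold just below the spikes that line up with slide transitions.

```python
import csv
import numpy as np

def write_stats(frames, path, fps=24.0):
    # For each frame, record (frame number, time in seconds, mean absolute
    # difference from the previous frame). For slideshows the deltas are
    # near zero within a slide and spike at each transition, so a good
    # threshold sits between the two clusters.
    rows = []
    prev = None
    for idx, frame in enumerate(frames):
        delta = 0.0 if prev is None else float(
            np.mean(np.abs(frame.astype(np.int16) - prev.astype(np.int16))))
        rows.append((idx, idx / fps, delta))
        prev = frame
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "seconds", "delta"])
        writer.writerows(rows)
    return rows
```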

Do you happen to have an example video slideshow I can look at? Would help me visualize what other parameters to focus on. Lastly, thank you for the feedback (feel free to suggest any other ideas/solutions if you'd like).

albanie commented Mar 29, 2016

Do you think pyscenedetect should work well for shot detection (rather than scene detection) under content aware mode?

Breakthrough (Owner) commented Mar 29, 2016

@albanie each shot already is a "scene" in content-aware mode, as in the following example in the documentation:
http://pyscenedetect.readthedocs.org/en/latest/examples/usage-example/

As you can see from the output images on that page, the YouTube clip gets split on each shot cut (which is what PySceneDetect calls a "scene"), so in this case "shot" and "scene" are equivalent terms.

Is this what you meant? If so, do you think it would be worth explicitly stating that content-aware mode detects individual shots as opposed to scenes?


In the future I may add additional detection modes to group individual shots into bundles/"scenes" based on some similarity metric, but right now, PySceneDetect will generate a new scene every time the current shot changes.

albanie commented Mar 30, 2016

@Breakthrough Thanks - I figured that was the meaning but I've noticed that sometimes the two are considered distinct (e.g. in this explanation http://production.4filmmaking.com/cinematography1.html).

In case it's of interest to others, I've benchmarked some shot detection tools (including PySceneDetect) on a few videos: https://github.com/albanie/shot-detection-benchmarks

Breakthrough (Owner) commented Apr 1, 2016

@albanie yes, certainly; a standardized benchmark or set of metrics for shot detection would be really helpful in itself. (I hope it's alright with you if I use those benchmarks to help optimize PySceneDetect's performance over a wider variety of material.)

Feel free to share any input on the matter in #13; I plan on updating that discussion periodically with the best set of detectors/parameters I can find for the current (and any future) videos in your benchmark, since they serve as great real-world test cases. As a suggestion, including a show interleaved with some commercials/advertisements might be another good addition to the test set.
