
Doesn't parse CSV file based on detect-threshold when splitting video. #211

Closed
jeremymeyers opened this issue Mar 6, 2021 · 7 comments

@jeremymeyers

Trying to figure out why I can't use the generated CSV file to split the video without having to re-scan. Are scene CSV files different from stats CSV files? The documentation is unclear.

Procedure:

scenedetect --input "file.mp4" detect-threshold list-scenes
(this works fine)
scenedetect --input "file.mp4" -s "file-stats.csv" detect-threshold split-video

error:
[PySceneDetect] PySceneDetect v0.5.5
[PySceneDetect] Loaded 1 video, framerate: 29.97 FPS, resolution: 1920 x 1080
[PySceneDetect] Downscale factor set to 6, effective resolution: 320 x 180
[PySceneDetect] Loading frame metrics from stats file: scenes.csv
[PySceneDetect] Could not load stats file.
Failed to parse stats file:
Could not load frame metrics from stats file - file is corrupt or not a valid PySceneDetect stats file. If the file exists, ensure that it is a valid stats file CSV, otherwise delete it and run PySceneDetect again to re-generate the stats file.
[PySceneDetect] Could not parse CLI options.:
Could not load given stats file, see above output for details.

@Breakthrough
Owner

Hey @jeremymeyers;

Indeed, the scene CSVs generated by list-scenes are different from the stats CSV files (-s). The latter contain frame-by-frame metrics for each detector you are using, both to enable statistical analysis (e.g. determining the optimal threshold) and to speed up subsequent runs. Sorry that's unclear; is there a particular section of the documentation that you found difficult to follow? If you're running into this issue, others likely are as well, so I'd be happy to hear any feedback on how best to convey that information.
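To make the difference concrete, here is a minimal Python sketch (not part of PySceneDetect) that guesses which kind of CSV a file is from its first rows. It assumes the v0.5.x layouts: a list-scenes file starts with a "Timecode List:" row (or with the "Scene Number" header row when the cut list is omitted), while a stats file has a header row containing a "Frame Number" column, possibly preceded by a metadata row in older versions. The function name `classify_pyscenedetect_csv` is hypothetical.

```python
import csv
import io


def classify_pyscenedetect_csv(text: str) -> str:
    """Guess whether CSV text is a list-scenes scene list or a -s stats file.

    Hypothetical helper, not a PySceneDetect API; assumes the v0.5.x layouts.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # The identifying row should be within the first two non-empty rows.
    for row in rows[:2]:
        cells = [cell.strip() for cell in row]
        if cells[0] in ("Timecode List:", "Scene Number"):
            return "scene list"
        if "Frame Number" in cells:
            return "stats file"
    return "unknown"
```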

The -s argument is used both to generate and to load a statsfile: if the file does not exist, it is created; if it does exist, the existing stats are loaded into memory, updated if required, and written back to disk when PySceneDetect finishes. In the example above, how did you generate file-stats.csv? Could you share the first few lines of the file here? Also, if you could add -v debug before any other arguments and share the output of that run, that would be very helpful.
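The load-or-create behaviour described above is a common caching pattern. The following is a hypothetical, simplified Python sketch of that pattern only; the two-column `frame,metric` file layout, the `run_with_stats_cache` name, and the `compute_metric` callback are assumptions for illustration and do not reflect PySceneDetect's actual stats file format or API.

```python
import csv
import os
from typing import Callable, Dict


def run_with_stats_cache(
    path: str,
    num_frames: int,
    compute_metric: Callable[[int], float],
) -> Dict[int, float]:
    """Load cached per-frame metrics from `path` if it exists, compute any
    missing frames, then write everything back (hypothetical sketch)."""
    metrics: Dict[int, float] = {}
    if os.path.exists(path):
        # Existing file: load previously computed metrics into memory.
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            for frame, value in reader:
                metrics[int(frame)] = float(value)
    for frame in range(num_frames):
        if frame not in metrics:
            # Only frames missing from the cache are actually computed.
            metrics[frame] = compute_metric(frame)
    # Write the (possibly updated) metrics back to disk when finished.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "metric"])
        for frame in sorted(metrics):
            writer.writerow([frame, metrics[frame]])
    return metrics
```

On a second run against the same file, every frame is already cached, so `compute_metric` is never called; this is the same idea behind a statsfile speeding up subsequent runs.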

The statsfile format was updated in the latest release but should be backwards compatible, so the error you're seeing may indicate a bug. Lastly, note that detect-threshold currently doesn't benefit from a statsfile outside of determining the optimal threshold, due to some implementation details I still need to work out as part of #178 (it skips a lot of stats calculations when a particular frame doesn't require them). Even though detect-threshold won't currently run faster with a statsfile, you should definitely be able to load and save an existing statsfile.

Thanks!

@jeremymeyers
Author

So, my current workflow involves large files (usually 1-2 GB), each containing around 5-6 short-ish mini-movies that usually end in cuts to black. I am using scenedetect to split each large file into 5-6 individual files, not to split at a more granular level.

My approach has been to run detect-threshold list-scenes to check whether scenedetect has identified the right number of scenes and will therefore split correctly (I've found that detect-content gives me a much longer list of shorter clips, as designed). If the count is correct (plus or minus a few extra blips where it picked up title sequences or similar), I run it again with split-video.
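The count check can also be done from a list-scenes CSV already on disk while ignoring short blips. A minimal Python sketch, assuming the v0.5.x column names "Scene Number" and "Length (seconds)" and a leading "Timecode List:" row; `long_scenes` and the cutoff value are illustrative choices, not PySceneDetect features.

```python
import csv
import io
from typing import List, Tuple


def long_scenes(scene_csv: str, min_seconds: float) -> List[Tuple[int, float]]:
    """Return (scene number, length in seconds) for every scene at least
    `min_seconds` long in a list-scenes CSV (hypothetical helper)."""
    lines = scene_csv.splitlines()
    if lines and lines[0].startswith("Timecode List:"):
        lines = lines[1:]  # drop the cut-list row; the table header follows
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    result = []
    for row in reader:
        length = float(row["Length (seconds)"])
        # Short "blips" (title cards and the like) fall below the cutoff.
        if length >= min_seconds:
            result.append((int(row["Scene Number"]), length))
    return result
```

With the file contents read into a string `text`, `len(long_scenes(text, 60))` can be compared against the expected number of mini-movies before deciding to split.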

As you can imagine, I would really love a way to avoid scanning through the whole movie twice every time, so I thought I might be able to use the scenes.csv that it generates to split the video accordingly. I suppose I could feed it into ffmpeg/mkvmerge directly, but it seemed like there should be a workflow that accounts for this.

In terms of the documentation, I think it would be useful to note that the -s flag, and stats files generally, only apply when detect-content is used.

I will add the file info shortly.

@Breakthrough
Owner

Breakthrough commented Mar 7, 2021

Indeed, the first row of the file output by list-scenes contains all the information required to split the video without re-running PySceneDetect, so you shouldn't have to run it twice here; however, it does require some extra work, such as feeding it into ffmpeg yourself as you suggested.
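As a sketch of that manual route, the following Python builds (but does not run) one ffmpeg command per scene from the "Timecode List:" row. It assumes the cut timecodes are in a form ffmpeg accepts (such as `00:10:00.000`) and uses stream copy, which avoids re-encoding but can only cut on keyframes, so boundaries may shift slightly; dropping `-c copy` re-encodes for frame-accurate cuts. The function name `ffmpeg_split_commands` and the `scene-NNN.mp4` output naming are hypothetical.

```python
import csv
import io
from typing import List, Optional


def ffmpeg_split_commands(scene_csv: str, video: str) -> List[List[str]]:
    """Build one ffmpeg command per scene from the "Timecode List:" row that
    list-scenes writes first. Commands are returned, not executed."""
    first_row = next(csv.reader(io.StringIO(scene_csv)), [])
    if not first_row or first_row[0] != "Timecode List:":
        raise ValueError("expected a list-scenes CSV starting with 'Timecode List:'")
    cuts = [tc for tc in first_row[1:] if tc]
    # Scene i spans from the previous cut (or the start) to the next cut (or the end).
    starts: List[str] = ["00:00:00.000"] + cuts
    ends: List[Optional[str]] = list(cuts) + [None]
    commands = []
    for i, (start, end) in enumerate(zip(starts, ends), start=1):
        # -ss/-to placed after -i are output options, so both are positions in the input.
        cmd = ["ffmpeg", "-i", video, "-ss", start]
        if end is not None:
            cmd += ["-to", end]
        # Stream copy is fast but can only cut on keyframes.
        cmd += ["-c", "copy", f"scene-{i:03d}.mp4"]
        commands.append(cmd)
    return commands
```

Each returned command can then be executed with `subprocess.run(cmd, check=True)`.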

That being said, you are correct: this use case is definitely supposed to work with PySceneDetect, and it will be supported properly when #178 is closed. I'll see if I can squeeze that into the v0.5.6 release, but in the meantime, hopefully using the .csv from list-scenes is a sufficient workaround for you. Sorry about that, and thanks for the report. I'll keep this issue open and link it to the other one so they get tracked together.

@jeremymeyers
Author

Awesome, thank you Brandon!

@Breakthrough
Owner

This should now be resolved in the v0.5.6 branch and will be available in the next official release. I also removed the -p/--min-percent argument, as it wasn't providing any performance or analysis benefit and has no effect on accuracy, since the threshold can be adjusted to achieve the same result.

Thanks for the report!

@jeremymeyers
Author

What is the syntax for this?

@Breakthrough
Owner

As in your original post, this should now just work:

scenedetect --input file.mp4 -s file-stats.csv detect-threshold split-video
