-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
list-scenes "CSV" file isn't really a CSV file #136
Comments
Hi @tonycpsu; Are you referencing RFC 4180 here? I agree it's not really a good idea to have different numbers of columns per row in a CSV file. Thoughts on just outputting two files by default? Otherwise a format argument to specify scenes/cuts works as well. Any comments/suggestions would be most welcome. Thanks! |
It's not just that the number of colums in each row that's wrong that seems off, it's that the first line isn't really the CSV header -- the second line is. The first line is just carrying extra information that should ideally be in a separate file. Switching to a more flexible format like YAML or JSON doesn't really make sense, since they're harder to inspect visually, and the keys would be repeated, bloating the size of the file. I also noticed tonight that the stats file has the same thing going on with the FPS at the top. I'm not going to be a purist here -- it's easy enough for the tools I'm writing to skip over these lines, and in the case of FPS, it's kind of handy that it's there in the stats file. But I figured I'd ask about this while I was in there to see if some other way to go here. My thinking is there should be 3 files here. The scenes CSV, the stats CSV, and a third file in YAML or JSON (your choice -- I prefer YAML but JSON comes without an extra dependency.) The third file would contain the cuts and the framerate, and could in the future contain anything else that's needed. This isn't super high on my list right now, but it's something I could throw together if there's interest. |
Thanks for your comments, I think being a purist about this kind of stuff is somewhat important to address ongoing technical debt - quite frankly I didn't give much thought to automated analysis of the output CSV files when I first implemented those features (I was still doing a lot of manual analysis using spreadsheet software). Having the first row not being actual headers is indeed confusing, and definitely wasn't the best long-term choice. My only reasoning for including a list of cuts in comma-separated format was that it could be copypasted directly into mkvmerge. This was before I had implemented the split-video command which I guess kind of depreciated the use for the cutting list as an exported file (most users requiring a cutting list get it directly from the Python API now I think). The approach you suggested is sound and I agree with it (personally would go for JSON just for the all batteries included approach). That being said I'm wondering if it even makes sense to output a list of the cuts detected anymore, as most use cases for them would be automated and likely bypass using a file as an intermediate. Maybe it's better to just keep the scenes and stats CSV files pure (e.g. same # of columns per row, first row is a real header), and only output the cutting list to the terminal. Thoughts? (I know this isn't a high priority, but I still consider this technical debt which the project would be better off addressing sooner - say for the v0.6 release - rather than leaving it and possibly forgetting about it). |
Yeah, that'd work for me. If you don't think outputting the cuts or frame rate to a file are important, there's no need to do extra work for them. Just clean up both CSVs and be done with it. Will send a PR. |
Updated PR with a new one, but still need to add backwards compatibility for existing statsfiles. |
Closed as per #196, will include in v0.5.5 release. |
Resolved in v0.5.5 branch, will be included in next release of PySceneDetect. Thanks @tonycpsu! |
I realize I only did this for statsfiles. For list-scenes, I should probably add a command line flag that allows to trigger the first row being not written (rather than just removing it entirely). |
Hey @tonycpsu; I've added a new option to the Hopefully this will work out for most people, but feel free to share any additional thoughts/feedback. Currently committed to v0.5.5 branch, will be included in the next release. |
Sounds great. Looking forward to getting back to playing with this project soon. Thanks for the fix! |
The file produced by
list-scenes
starts with a line containing the cut list, which means it isn't really a valid CSV file. While it's handy to have a concise list of cuts in addition to the full scene data, I think it makes more sense for the default output to just be the CSV, and add an option to instead (or in addition) write a separate file with only the scene cut timestamps.Maybe something like a
-f
option forlist-scenes
that takes a value ofscenes
,cuts
, orboth
?The text was updated successfully, but these errors were encountered: