Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list-scenes "CSV" file isn't really a CSV file #136

Closed
tonycpsu opened this issue Nov 24, 2019 · 10 comments
Closed

list-scenes "CSV" file isn't really a CSV file #136

tonycpsu opened this issue Nov 24, 2019 · 10 comments
Milestone

Comments

@tonycpsu
Copy link
Contributor

tonycpsu commented Nov 24, 2019

The file produced by list-scenes starts with a line containing the cut list, which means it isn't really a valid CSV file. While it's handy to have a concise list of cuts in addition to the full scene data, I think it makes more sense for the default output to just be the CSV, and add an option to instead (or in addition) write a separate file with only the scene cut timestamps.

Maybe something like a -f option for list-scenes that takes a value of scenes, cuts, or both?

@Breakthrough
Copy link
Owner

Hi @tonycpsu;

Are you referencing RFC 4180 here? I agree it's not really a good idea to have different numbers of columns per row in a CSV file.

Thoughts on just outputting two files by default? Otherwise a format argument to specify scenes/cuts works as well. Any comments/suggestions would be most welcome.

Thanks!

@tonycpsu
Copy link
Contributor Author

It's not just that the number of colums in each row that's wrong that seems off, it's that the first line isn't really the CSV header -- the second line is. The first line is just carrying extra information that should ideally be in a separate file. Switching to a more flexible format like YAML or JSON doesn't really make sense, since they're harder to inspect visually, and the keys would be repeated, bloating the size of the file.

I also noticed tonight that the stats file has the same thing going on with the FPS at the top. I'm not going to be a purist here -- it's easy enough for the tools I'm writing to skip over these lines, and in the case of FPS, it's kind of handy that it's there in the stats file. But I figured I'd ask about this while I was in there to see if some other way to go here.

My thinking is there should be 3 files here. The scenes CSV, the stats CSV, and a third file in YAML or JSON (your choice -- I prefer YAML but JSON comes without an extra dependency.) The third file would contain the cuts and the framerate, and could in the future contain anything else that's needed.

This isn't super high on my list right now, but it's something I could throw together if there's interest.

@Breakthrough
Copy link
Owner

Thanks for your comments, I think being a purist about this kind of stuff is somewhat important to address ongoing technical debt - quite frankly I didn't give much thought to automated analysis of the output CSV files when I first implemented those features (I was still doing a lot of manual analysis using spreadsheet software). Having the first row not being actual headers is indeed confusing, and definitely wasn't the best long-term choice.

My only reasoning for including a list of cuts in comma-separated format was that it could be copypasted directly into mkvmerge. This was before I had implemented the split-video command which I guess kind of depreciated the use for the cutting list as an exported file (most users requiring a cutting list get it directly from the Python API now I think).

The approach you suggested is sound and I agree with it (personally would go for JSON just for the all batteries included approach). That being said I'm wondering if it even makes sense to output a list of the cuts detected anymore, as most use cases for them would be automated and likely bypass using a file as an intermediate. Maybe it's better to just keep the scenes and stats CSV files pure (e.g. same # of columns per row, first row is a real header), and only output the cutting list to the terminal.

Thoughts?

(I know this isn't a high priority, but I still consider this technical debt which the project would be better off addressing sooner - say for the v0.6 release - rather than leaving it and possibly forgetting about it).

@Breakthrough Breakthrough removed the bug label Nov 27, 2019
@tonycpsu
Copy link
Contributor Author

Yeah, that'd work for me. If you don't think outputting the cuts or frame rate to a file are important, there's no need to do extra work for them. Just clean up both CSVs and be done with it.

Will send a PR.

@Breakthrough Breakthrough added this to the v0.6 milestone Aug 6, 2020
@Breakthrough Breakthrough modified the milestones: v0.6, v0.5.5 Jan 10, 2021
@Breakthrough Breakthrough linked a pull request Jan 10, 2021 that will close this issue
Breakthrough added a commit that referenced this issue Jan 10, 2021
@Breakthrough Breakthrough linked a pull request Jan 10, 2021 that will close this issue
@Breakthrough
Copy link
Owner

Updated PR with a new one, but still need to add backwards compatibility for existing statsfiles.

@Breakthrough
Copy link
Owner

Closed as per #196, will include in v0.5.5 release.

@Breakthrough
Copy link
Owner

Resolved in v0.5.5 branch, will be included in next release of PySceneDetect. Thanks @tonycpsu!

@Breakthrough Breakthrough modified the milestones: v0.5.5, v0.6 Jan 10, 2021
@Breakthrough Breakthrough reopened this Jan 10, 2021
@Breakthrough
Copy link
Owner

I realize I only did this for statsfiles. For list-scenes, I should probably add a command line flag that allows to trigger the first row being not written (rather than just removing it entirely).

@Breakthrough
Copy link
Owner

Hey @tonycpsu;

I've added a new option to the list-scenes command called -s / --skip-cuts which will omit the first row when writing the CSV file to disk. The default behaviour is still to include the extra cutting list as the first row, but if you specify this flag, it will be omitted. The cutting list can still be saved from the last line of the terminal output, so I figured it wasn't worth dealing with writing it to another file.

Hopefully this will work out for most people, but feel free to share any additional thoughts/feedback. Currently committed to v0.5.5 branch, will be included in the next release.

@tonycpsu
Copy link
Contributor Author

Sounds great. Looking forward to getting back to playing with this project soon. Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants