Note end_time in the above output from a Pittsburgh, PA captions processing run. It turns out that the event does have a caption file in the Legistar data structure, but the file is in fact empty.
We want to filter these out and leave Session.caption_uri = None.
Use Case
It is better to avoid wasting resources on processing these invalid caption files.
Solution
Compare the lengths of the video and the caption file. If they differ by more than some threshold, e.g. 20%, throw away the caption file.
This means the scraper can no longer just hand off the caption file URL as-is.
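The comparison itself could be sketched roughly as below. This is only an illustration: `durations_match` is a hypothetical helper, not an existing function in the scraper, and it assumes both durations have already been obtained in seconds by some other means. The 20% tolerance is the example threshold from the text above, measured relative to the video length.

```python
def durations_match(
    video_seconds: float,
    caption_seconds: float,
    tolerance: float = 0.2,
) -> bool:
    """Return True if the caption length is within `tolerance` of the video length.

    Hypothetical helper: how the two durations are measured is out of scope here.
    """
    if video_seconds <= 0:
        # Without a usable video duration there is nothing to compare against.
        return False
    relative_difference = abs(video_seconds - caption_seconds) / video_seconds
    return relative_difference <= tolerance
```

With this check, an empty caption file (duration 0) would fail against any real video, and the scraper would leave Session.caption_uri = None.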
Alternatives
Throw away caption files smaller than some threshold, e.g. 100 bytes.
Throw away caption files shorter than ~1 minute.
The more I think about this, the more I prefer not to compare against the associated video. Mostly, I don't want to have to handle all the different video formats. So I prefer to just do some very simple, stupid validation on the caption file alone.
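A minimal sketch of what such caption-only validation might look like, combining the two alternatives from the issue (minimum file size and minimum duration). Everything here is an assumption for illustration: the function names and constants are hypothetical, and the code assumes the caption text uses WebVTT- or SRT-style cue timings such as `00:01:02.500 --> 00:01:05.000`, which may not hold for every Legistar caption file.

```python
import re
from typing import Optional

# Assumed thresholds, taken from the examples in the issue text.
MIN_CAPTION_BYTES = 100
MIN_CAPTION_SECONDS = 60.0

# Matches cue timings like "00:01:02.500 --> 00:01:05.000" (WebVTT)
# or "00:01:02,500 --> 00:01:05,000" (SRT). The hours field is optional.
_CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
    r"\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


def _to_seconds(hours: Optional[str], minutes: str, seconds: str, millis: str) -> float:
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def caption_end_time(caption_text: str) -> float:
    """Return the latest cue end time in seconds, or 0.0 if there are no cues."""
    end = 0.0
    for match in _CUE_TIMING.finditer(caption_text):
        # Groups 5-8 hold the end timestamp of the cue.
        end = max(end, _to_seconds(*match.groups()[4:]))
    return end


def is_valid_caption(caption_text: str) -> bool:
    """Reject captions that are too small or too short to be real."""
    if len(caption_text.encode("utf-8")) < MIN_CAPTION_BYTES:
        return False
    return caption_end_time(caption_text) >= MIN_CAPTION_SECONDS
```

The scraper would download the caption file, run `is_valid_caption` on its text, and set Session.caption_uri only when the check passes. No video handling is needed.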