Feature Description
Note the end_time in the above output from a Pittsburgh, PA caption-processing run. It turns out that the event does have a caption file in its Legistar data structure, but the file is empty.
We want to filter out these empty caption files and just attempt speech-to-text instead.
Use Case
Allow for proper transcript generation in generate_transcript() by appropriately filtering out empty caption files.
Solution
Compare the duration of the video with the duration covered by the caption file. If they differ by more than some threshold, e.g. 20%, throw away the caption file.
This means the scraper can no longer just hand off the caption file URL as-is.
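A minimal sketch of the duration comparison, assuming the caption files are WebVTT and that the scraper has already fetched the caption text and knows the video duration in seconds (e.g. by probing the downloaded video). The names caption_duration_seconds and captions_match_video, and the 0.2 default, are hypothetical and not existing functions in the codebase. An empty caption file has no cues, so its caption duration is 0 and it fails the check.

```python
import re

# Matches a WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000".
# The hours field is optional in WebVTT, so "01:02.500" is also accepted.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis) -> float:
    """Convert captured timestamp fields to seconds; missing hours count as 0."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def caption_duration_seconds(vtt_text: str) -> float:
    """Return the latest cue end time in a WebVTT document, or 0.0 if it has no cues."""
    latest = 0.0
    for match in _TIMING.finditer(vtt_text):
        # Groups 5-8 hold the end timestamp of the cue.
        end = _to_seconds(*match.groups()[4:])
        latest = max(latest, end)
    return latest


def captions_match_video(vtt_text: str, video_duration: float, threshold: float = 0.2) -> bool:
    """True if the caption duration is within `threshold` (relative) of the video duration."""
    if video_duration <= 0:
        return False
    caption_duration = caption_duration_seconds(vtt_text)
    return abs(video_duration - caption_duration) / video_duration <= threshold
```

For example, captions_match_video("WEBVTT\n\n", video_duration=3600.0) returns False, so the pipeline would skip the caption file and fall back to speech-to-text.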
Alternatives
Throw away caption files with file size less than some threshold, e.g. 100 bytes.
Throw away caption files that cover less than ~1 minute.
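The two alternatives could be sketched as below, again assuming WebVTT caption files. The constants reuse the example thresholds from this issue (100 bytes, ~1 minute); is_too_small and is_too_short are hypothetical helper names.

```python
import re

# Hypothetical defaults taken from the example thresholds in this issue.
MIN_CAPTION_BYTES = 100
MIN_CAPTION_SECONDS = 60.0

# WebVTT cue timing line; the hours field is optional.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def is_too_small(caption_bytes: bytes, min_bytes: int = MIN_CAPTION_BYTES) -> bool:
    """Alternative 1: reject caption files smaller than `min_bytes`."""
    return len(caption_bytes) < min_bytes


def is_too_short(vtt_text: str, min_seconds: float = MIN_CAPTION_SECONDS) -> bool:
    """Alternative 2: reject caption files whose last cue ends before `min_seconds`."""
    ends = [
        # Groups 5-8 hold the end timestamp (hours, minutes, seconds, milliseconds).
        int(m.group(5) or 0) * 3600 + int(m.group(6)) * 60 + int(m.group(7)) + int(m.group(8)) / 1000
        for m in _TIMING.finditer(vtt_text)
    ]
    # A file with no cues at all counts as 0 seconds long.
    return max(ends, default=0.0) < min_seconds
```

The size check needs no parsing, while the duration check has to read cue timings. Neither catches a caption file that covers only part of a long meeting, which the ratio comparison in the Solution does.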