[BUG] Update Transcript Parsing to support Valid WebVTT formats #440

Closed
3 tasks
Dananji opened this issue Feb 29, 2024 · 5 comments
Labels
bug 🐛 (Something isn't working), transcripts (Transcript component related)

Comments

@Dananji
Collaborator

Dananji commented Feb 29, 2024

Description

The Transcript component cannot parse WebVTT files that contain additional blocks (such as REGION definitions) between the header and the first cue. These files are interpreted correctly as captions, but the Transcript component freezes when trying to display them.

Example of a valid VTT file from the W3C that doesn't work in the Transcript component:

WEBVTT

REGION
id:fred
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

REGION
id:bill
width:40%
lines:3
regionanchor:100%,100%
viewportanchor:90%,90%
scroll:up

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I’m Bill

00:00:05.000 --> 00:00:25.000 region:fred align:left
<v Fred>Would you like to get a coffee?

NOTE: When parsing WebVTT, does the parser allow arbitrary text before the timed text? Is there a way to identify this?

Done Looks Like

  • The Transcript component parses valid WebVTT files
  • Only timed text blocks are displayed to the end user
  • Region and styling blocks are ignored; they are not used to style text
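
The three criteria above can be illustrated with a minimal sketch. This is not the component's actual implementation; it is written in Python purely for illustration, and the names `parse_cues`, `Cue`, and `TIMING` are hypothetical. It assumes blocks are separated by blank lines, treats a block as a cue only if its first or second line is a timing line, and skips every other block (REGION, STYLE, and so on) without inspecting its contents. The STYLE block in the sample is added for illustration and is not part of the W3C example above.

```python
import re
from typing import List, NamedTuple

# Timing line such as "00:00:00.000 --> 00:00:20.000 region:fred align:left".
# Hours are optional; cue settings after the end time are simply not captured.
TIMING = re.compile(
    r"^\s*((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


class Cue(NamedTuple):
    start: str
    end: str
    text: str


def parse_cues(vtt: str) -> List[Cue]:
    """Return only the timed cues; all other blocks are skipped unparsed."""
    # Drop a leading byte order mark and normalize line endings.
    text = vtt.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT header")

    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # A cue may start with an optional identifier line, so the timing
        # line is either the first or the second line of the block.
        for i, line in enumerate(lines[:2]):
            match = TIMING.match(line)
            if match:
                payload = "\n".join(lines[i + 1:])
                cues.append(Cue(match.group(1), match.group(2), payload))
                break
        # Blocks without a timing line (REGION, STYLE, ...) fall through here.
    return cues


SAMPLE = """\
WEBVTT

REGION
id:fred
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

STYLE
::cue { color: yellow; }

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I’m Bill
"""

for cue in parse_cues(SAMPLE):
    print(cue.start, cue.end, cue.text)
```

Because non-cue blocks are identified only by the absence of a timing line, the sketch never needs to know the full set of block types a valid file may contain.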

Related resources

@Dananji added the bug 🐛 and transcripts labels on Feb 29, 2024
@Dananji self-assigned this on Apr 5, 2024
@elynema

elynema commented Apr 8, 2024

Dananji is not trying to validate or parse what is within these blocks, just identifying them and skipping them.

Styling should not be displayed to the end user; it is supposed to be read by the parser and used to display text. Region is used for caption display only, so it will be ignored for transcript text.

@elynema

elynema commented Apr 8, 2024

@joncameron Dananji's suggestion for this first pass was that styling and region info be ignored in the transcript context. Region info at least is intended for caption display, and does not pertain to transcript display. That sound ok?

@elynema

elynema commented Apr 9, 2024

Notes at the top of the transcript, or between cues within it, will be shown as plain text. Because they have no timing component, they span both the time and text columns.
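
One way to picture the note handling described above is the following sketch. It is an assumption-laden illustration in Python, not the component's code: `transcript_rows` and `Row` are hypothetical names, and the note text in the sample is made up. A NOTE block becomes a row with no start or end time (so a renderer could let it span both columns), cues become timed rows, and REGION or STYLE blocks are dropped.

```python
import re
from typing import List, NamedTuple, Optional

# Loose timing-line match: "<start> --> <end>", ignoring any cue settings.
TIMING = re.compile(r"^\s*(\S+)\s+-->\s+(\S+)")


class Row(NamedTuple):
    start: Optional[str]  # None for untimed notes
    end: Optional[str]    # None for untimed notes
    text: str


def transcript_rows(vtt: str) -> List[Row]:
    text = vtt.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    rows = []
    for block in blocks[1:]:  # blocks[0] is the WEBVTT header
        lines = block.split("\n")
        first = lines[0]
        # A comment block starts with "NOTE" alone or followed by whitespace.
        if first == "NOTE" or first.startswith(("NOTE ", "NOTE\t")):
            note = "\n".join([first[4:].strip()] + lines[1:]).strip()
            rows.append(Row(None, None, note))
            continue
        # Otherwise keep the block only if it is a cue (timing line first,
        # or second after an optional identifier); skip everything else.
        for i, line in enumerate(lines[:2]):
            match = TIMING.match(line)
            if match:
                rows.append(
                    Row(match.group(1), match.group(2), "\n".join(lines[i + 1:]))
                )
                break
    return rows


SAMPLE = """\
WEBVTT

NOTE This transcript was generated for illustration.

REGION
id:fred
width:40%

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

NOTE
Second speaker joins

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I'm Bill
"""

for row in transcript_rows(SAMPLE):
    label = "(note)" if row.start is None else f"{row.start} - {row.end}"
    print(label, "|", row.text)
```

Keeping notes and cues in one ordered list preserves their position in the file, which matters if a note appears between two cues rather than at the top.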

@Dananji mentioned this issue on Apr 9, 2024
@Dananji
Collaborator Author

Dananji commented Apr 26, 2024

This can be tested on the Ramp demo site.

@joncameron
Contributor

Works great; I created an issue for a small rendering bug I saw while testing: #500.
