Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebvttParser creates duplicate CuesWithTiming when handling cues sharing same start/end timestamps #1177

Closed
1 task
JunkFood02 opened this issue Mar 12, 2024 · 4 comments
Assignees
Labels

Comments

@JunkFood02
Copy link

Version

Media3 1.3.0

More version details

media3-extractor

Devices that reproduce the issue

Android Studio

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Not tested

Reproduction steps

Use WebvttParser.parse() to parse the vtt captions below:

private const val vttSample = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:01.712
[MUSIC PLAYING]

00:00:02.533 --> 00:00:03.950
SPEAKER: As an
Android engineer, I

00:00:03.950 --> 00:00:06.350
have seen firsthand how
the ecosystem is growing--
"""
val cuesWithTimingList = buildList {
            WebvttParser().parse(
                vttSample.toByteArray(),
                SubtitleParser.OutputOptions.allCues()
            ) {
                add(it)
            }
        }

Expected result

The cuesWithTimingList contains 3 items or lines

Actual result

The list contains 4 items, see screenshot

image

Media

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:01.712
[MUSIC PLAYING]

00:00:02.533 --> 00:00:03.950
SPEAKER: As an
Android engineer, I

00:00:03.950 --> 00:00:06.350
have seen firsthand how
the ecosystem is growing--

Bug Report

@icbaker
Copy link
Collaborator

icbaker commented Mar 13, 2024

The duplication is expected for cues that actually overlap, that's checked in a test case here:

public void parseWithOverlappingTimestamps() throws Exception {

This duplication is currently required due to the way we implement WebVTT's simultaneous cues handling (by creating synthetic CuesWithTiming objects representing the overlapping state) - which is mostly done inside WebvttSubtitle:

// Steps 4 - 10 of https://www.w3.org/TR/webvtt1/#cue-computed-line

If we didn't create these synthetic cues here, we would have lost too much information to correctly implement the layout rules later in the playback pipeline. It's still not perfect (see e.g. google/ExoPlayer#10980), and it would be better to implement this layout logic later, but it's not likely something we're going to change soon.

However, this duplication is not expected for consecutive cues where one cue ends at the same time the next cue starts. I can reproduce that the duplication does occur in this case, I'll look into fixing it.

@JunkFood02
Copy link
Author

Thanks for the quick and detailed response! Edited the title since I didn't intend to work with subtitles with overlapping timestamp

@JunkFood02 JunkFood02 changed the title WebvttParser generates duplicated CuesWithTiming when handling overlapped timestamps WebvttParser creates duplicate CuesWithTiming when handling cues sharing same start/end timestamps Mar 13, 2024
copybara-service bot pushed a commit that referenced this issue Mar 15, 2024
It's a bit arguable whether the `Subtitle` implementation supports
zero-duration events, since `getEventTimeCount` is documented as
effectively "the number of times the cues returns by `getCues(long)`
changes", and zero-duration events violate that. However, the current
`WebvttSubtitle` impl **does** produce zero-duration events, so it
seems safer to handle them gracefully here and then, as a possible
follow-up, fix the `WebvttSubtitle` impl (or remove it completely).

Issue: #1177

#minor-release

PiperOrigin-RevId: 616095798
copybara-service bot pushed a commit to google/ExoPlayer that referenced this issue Mar 15, 2024
It's a bit arguable whether the `Subtitle` implementation supports
zero-duration events, since `getEventTimeCount` is documented as
effectively "the number of times the cues returns by `getCues(long)`
changes", and zero-duration events violate that. However, the current
`WebvttSubtitle` impl **does** produce zero-duration events, so it
seems safer to handle them gracefully here and then, as a possible
follow-up, fix the `WebvttSubtitle` impl (or remove it completely).

Issue: androidx/media#1177

#minor-release

PiperOrigin-RevId: 616095798
@icbaker icbaker closed this as completed Mar 15, 2024
@szaboa
Copy link
Contributor

szaboa commented Apr 2, 2024

@icbaker as I understood this was just a preparation to handle overlapping vtt cues?

If we didn't create these synthetic cues here, we would have lost too much information to correctly implement the layout rules later in the playback pipeline. It's still not perfect (see e.g. google/ExoPlayer#10980), and it would be better to implement this layout logic later, but it's not likely something we're going to change soon.

Do you have suggestions on the "layout logic"?

@icbaker
Copy link
Collaborator

icbaker commented Apr 4, 2024

@szaboa I'm not sure I completely understand the question - do you mind filing a new question issue with a bit more detail? (I'm likely to lose track of this closed one)

SheenaChhabra pushed a commit that referenced this issue Apr 8, 2024
It's a bit arguable whether the `Subtitle` implementation supports
zero-duration events, since `getEventTimeCount` is documented as
effectively "the number of times the cues returns by `getCues(long)`
changes", and zero-duration events violate that. However, the current
`WebvttSubtitle` impl **does** produce zero-duration events, so it
seems safer to handle them gracefully here and then, as a possible
follow-up, fix the `WebvttSubtitle` impl (or remove it completely).

Issue: #1177

PiperOrigin-RevId: 616095798
(cherry picked from commit e9ed874)
l1068 pushed a commit to l1068org/media that referenced this issue Apr 15, 2024
It's a bit arguable whether the `Subtitle` implementation supports
zero-duration events, since `getEventTimeCount` is documented as
effectively "the number of times the cues returns by `getCues(long)`
changes", and zero-duration events violate that. However, the current
`WebvttSubtitle` impl **does** produce zero-duration events, so it
seems safer to handle them gracefully here and then, as a possible
follow-up, fix the `WebvttSubtitle` impl (or remove it completely).

Issue: androidx#1177

PiperOrigin-RevId: 616095798
(cherry picked from commit e9ed874)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants