AX: Consider VTT-based audio descriptions with text-to-speech.
https://bugs.webkit.org/show_bug.cgi?id=243600
rdar://98206665

Reviewed by Jer Noble.

* LayoutTests/media/track/captions-webvtt/captions-descriptions.vtt: Added.
* LayoutTests/media/track/track-description-cue-expected.txt: Added.
* LayoutTests/media/track/track-description-cue.html: Added.
* LayoutTests/TestExpectations: Feature is Cocoa-specific so far, skip test globally.
* LayoutTests/platform/ios/TestExpectations: Enable test on iOS.
* LayoutTests/platform/mac/TestExpectations: Enable test on macOS.
* Source/WTF/Scripts/Preferences/WebPreferencesExperimental.yaml: Add 'AudioDescriptionsEnabled' setting.
* Source/WebCore/Modules/speech/SpeechSynthesis.cpp:
(WebCore::SpeechSynthesis::handleSpeakingCompleted): Call utterance to fire event.
(WebCore::SpeechSynthesis::boundaryEventOccurred): Ditto.
(WebCore::SpeechSynthesis::didStartSpeaking): Ditto.
(WebCore::SpeechSynthesis::didPauseSpeaking): Ditto.
(WebCore::SpeechSynthesis::didResumeSpeaking): Ditto.
(WebCore::SpeechSynthesis::fireEvent const): Deleted.
(WebCore::SpeechSynthesis::fireErrorEvent const): Deleted.
* Source/WebCore/Modules/speech/SpeechSynthesis.h:
(WebCore::SpeechSynthesis::userGestureRequiredForSpeechStart const):
(WebCore::SpeechSynthesis::removeBehaviorRestriction): Make public.
* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.cpp:
(WebCore::SpeechSynthesisUtterance::create): Add version that takes a completion handler.
(WebCore::SpeechSynthesisUtterance::SpeechSynthesisUtterance): Add completion handler parameter.
(WebCore::SpeechSynthesisUtterance::eventOccurred): New. Call completion handler or dispatch event.
(WebCore::SpeechSynthesisUtterance::errorEventOccurred): Ditto.
* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.h:
* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.idl: JSGenerateToJSObject.
* Source/WebCore/html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::updateActiveTextTrackCues): Return early if a seek is pending. Call new executeCueEnterOrLeaveAction method instead of dispatching events directly.
(WebCore::HTMLMediaElement::setSpeechSynthesisState): Maintain synthesis state.
(WebCore::HTMLMediaElement::speakCueText): Speak a cue.
(WebCore::HTMLMediaElement::pauseSpeakingCueText):
(WebCore::HTMLMediaElement::resumeSpeakingCueText):
(WebCore::HTMLMediaElement::cancelSpeakingCueText):
(WebCore::HTMLMediaElement::shouldSpeakCueTextForTime):
(WebCore::HTMLMediaElement::executeCueEnterOrLeaveAction): Trigger cue speech if the track contains descriptions and we are entering a cue range, schedule an enter or exit event.
(WebCore::HTMLMediaElement::seekWithTolerance): INFO_LOG -> ALWAYS_LOG
(WebCore::HTMLMediaElement::seekTask): Cancel speaking if necessary.
(WebCore::HTMLMediaElement::finishSeek): Update logging. If there isn't a pending seek, queue a task to update text track cues.
(WebCore::HTMLMediaElement::configureTextTrackGroup): When processing descriptions and the user wants text descriptions, set `fallbackTrack` to the first track seen in case none of the tracks matches the audio track language.
(WebCore::HTMLMediaElement::playPlayer): Call resumeSpeakingCueText.
(WebCore::HTMLMediaElement::pausePlayer): Call pauseSpeakingCueText.
(WebCore::HTMLMediaElement::effectiveVolume const): Use the speech volume multiplier when calculating the effective volume.
(WebCore::m_categoryAtMostRecentPlayback): Deleted.
* Source/WebCore/html/HTMLMediaElement.h:
(WebCore::HTMLMediaElement::cueBeingSpoken const):
* Source/WebCore/html/shadow/MediaControlTextTrackContainerElement.cpp:
(WebCore::MediaControlTextTrackContainerElement::updateDisplay): Skip spoken tracks.
* Source/WebCore/html/track/InbandGenericTextTrack.cpp: Remove unneeded include.
* Source/WebCore/html/track/TextTrack.cpp:
(WebCore::TextTrack::trackIndex): Use textTrackList() instead of m_textTrackList.
(WebCore::TextTrack::isRendered): Consider descriptions.
(WebCore::TextTrack::isSpoken):
(WebCore::TextTrack::trackIndexRelativeToRenderedTracks): Use textTrackList() instead of m_textTrackList.
(WebCore::TextTrack::speechSynthesis):
* Source/WebCore/html/track/TextTrack.h:
* Source/WebCore/html/track/TextTrackCue.cpp:
(WebCore::operator<<): All cues have a `text()` method, just use it.
* Source/WebCore/html/track/TextTrackCue.h:
(WebCore::TextTrackCue::text const):
(WebCore::TextTrackCue::speak):
* Source/WebCore/html/track/VTTCue.cpp:
(WebCore::VTTCue::updateDisplayTree): `track()` can return null, check it.
(WebCore::VTTCue::getDisplayTree): Ditto.
(WebCore::VTTCue::toJSON const): Drive-by: address a Darin FIXME.
(WebCore::VTTCue::speak):
* Source/WebCore/html/track/VTTCue.h:
(WebCore::VTTCue::speechUtterance const):
(WebCore::VTTCue::text const): Deleted.
* Source/WebCore/html/track/VTTCue.idl:
* Source/WebCore/page/CaptionUserPreferences.cpp:
(WebCore::CaptionUserPreferences::userPrefersTextDescriptions const): Check audioDescriptionsEnabled setting.
(WebCore::CaptionUserPreferences::textTrackSelectionScore const): Consider description tracks if the user preference is set. Clean up logic.
* Source/WebCore/page/CaptionUserPreferencesMediaAF.cpp:
(WebCore::CaptionUserPreferencesMediaAF::userPrefersCaptions const):
(WebCore::CaptionUserPreferencesMediaAF::userPrefersTextDescriptions const): check MediaAccessibility framework preference.
* Source/WebCore/page/CaptionUserPreferencesMediaAF.h:
* Source/WebCore/platform/cf/MediaAccessibilitySoftLink.cpp:
* Source/WebCore/platform/cf/MediaAccessibilitySoftLink.h:
* Source/WebCore/platform/cocoa/PlatformSpeechSynthesizerCocoa.mm:
(-[WebSpeechSynthesisWrapper speakUtterance:]): If `utteranceVoice` is non-NULL and the URI is empty, it is invalid so check the language.
* Source/WebCore/platform/graphics/InbandGenericCue.cpp:
(WebCore::InbandGenericCue::toJSONString const): Log cue text like VTTCue now does.
* Source/WebCore/platform/graphics/iso/ISOVTTCue.cpp:
(WebCore::ISOWebVTTCue::toJSONString const): Ditto.
* Source/WebCore/testing/Internals.cpp:
(WebCore::Internals::speechSynthesisUtteranceForCue):
(WebCore::Internals::mediaElementCurrentlySpokenCue):
* Source/WebCore/testing/Internals.h:
* Source/WebCore/testing/Internals.idl:
* Source/WebKit/UIProcess/WebPageProxy.cpp:
(WebKit::WebPageProxy::speechSynthesisSpeak): Remove the `startTime` parameter name, it isn't used.

Canonical link: https://commits.webkit.org/253931@main
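The HTMLMediaElement entries above describe a small state machine: entering a cue on a descriptions track starts speech, pausing and resuming playback pauses and resumes speech, and a seek cancels it. The following is only a rough JavaScript model of that description, not the C++ code in this commit. `createDescriptionSpeaker`, `cueAction`, `currentCue`, and the shape of the `synth` object are all hypothetical names chosen for illustration.

```javascript
// Hypothetical model of the cue speech behavior described in the commit message.
// None of these names are WebKit APIs. `synth` is assumed to be any object
// with speak(text, onDone), pause(), resume(), and cancel() methods.
function createDescriptionSpeaker(synth) {
    let cueBeingSpoken = null;

    return {
        // The cue whose text is currently being spoken, or null.
        currentCue() { return cueBeingSpoken; },

        // Starts speech when entering a cue on a 'descriptions' track, and
        // always returns the name of the event that should be scheduled.
        cueAction(track, cue, isEntering) {
            if (isEntering && track.kind === 'descriptions') {
                cueBeingSpoken = cue;
                synth.speak(cue.text, () => {
                    // Only clear the state if no newer cue replaced this one.
                    if (cueBeingSpoken === cue)
                        cueBeingSpoken = null;
                });
            }
            return isEntering ? 'enter' : 'exit';
        },

        // Playback pause and resume are forwarded only while a cue is spoken.
        pause() { if (cueBeingSpoken) synth.pause(); },
        resume() { if (cueBeingSpoken) synth.resume(); },

        // A seek cancels any speech in progress.
        seek() {
            if (!cueBeingSpoken)
                return;
            synth.cancel();
            cueBeingSpoken = null;
        },
    };
}
```

Passing the synthesizer in as a parameter keeps the sketch independent of any particular speech backend, so it can be exercised with a recording stub.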
1 parent 7a01e74
commit 969ac99ba7af596fe35f7adc7e5e60161ad137a3
Showing 35 changed files with 533 additions and 116 deletions.
@@ -0,0 +1,14 @@
WEBVTT

1
00:00:01.000 --> 00:00:02.000
1 - The first cue

2
00:00:03.000 --> 00:00:15.000
2 - The second cue, from time 3 to 15

3
00:00:30.000 --> 00:00:40.000
2 - The second cue, from time 30 to 40

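To make the cue timings in this file concrete, here is a minimal sketch of reading such a file into plain objects and asking which cues are active at a given time. It is a simplified illustration, not the WebVTT parser used by WebKit: it ignores cue settings, NOTE blocks, and most error handling, and `parseTimestamp`, `parseCues`, and `activeCuesAt` are hypothetical helper names.

```javascript
// Converts a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") into seconds.
function parseTimestamp(stamp) {
    const match = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(stamp.trim());
    if (!match)
        throw new Error(`Invalid WebVTT timestamp: ${stamp}`);
    const [, hours = '0', minutes, seconds, millis] = match;
    return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds) + Number(millis) / 1000;
}

// Splits the file into blank-line-separated blocks and keeps those that
// contain a timing line. Simplification: cue settings are discarded.
function parseCues(vttText) {
    const blocks = vttText.replace(/\r\n?/g, '\n').split(/\n{2,}/);
    const cues = [];
    for (const block of blocks) {
        const lines = block.split('\n').filter(line => line.length > 0);
        const timingIndex = lines.findIndex(line => line.includes('-->'));
        if (timingIndex === -1)
            continue; // Header or other non-cue block.
        const [start, rest] = lines[timingIndex].split('-->');
        const end = rest.trim().split(/\s+/)[0];
        cues.push({
            id: timingIndex > 0 ? lines[timingIndex - 1] : '',
            startTime: parseTimestamp(start),
            endTime: parseTimestamp(end),
            text: lines.slice(timingIndex + 1).join('\n'),
        });
    }
    return cues;
}

// A cue is treated as active from its start time up to, but not including, its end time.
function activeCuesAt(cues, time) {
    return cues.filter(cue => cue.startTime <= time && time < cue.endTime);
}
```

With the file above, `activeCuesAt(cues, 1.1)` would select only the first cue and `activeCuesAt(cues, 2.9)` would select none, which lines up with the two seek positions used by the layout test.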
@@ -0,0 +1,4 @@


PASS WebVTT audio descriptions

@@ -0,0 +1,65 @@
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <script src="../../resources/testharness.js"></script>
    <script src="../../resources/testharnessreport.js"></script>
    <script src=../media-file.js></script>
</head>
<body>
    <video controls muted id=video>
        <track id='testTrack' src='captions-webvtt/captions-descriptions.vtt' kind='descriptions' >
    </video>

    <script>

    promise_test(async (t) => {

        let descriptionsTrack = document.querySelector("track");

        if (window.internals)
            internals.settings.setShouldDisplayTrackKind('TextDescriptions', true);

        video.src = findMediaFile('video', '../content/test');
        await new Promise(resolve => video.oncanplaythrough = resolve);

        let cues = descriptionsTrack.track.cues;
        assert_equals(cues.length, 3);

        let checkCue = (cue, expectedText) => {
            assert_equals(cue.text, expectedText);
            if (!window.internals)
                return;

            let spokenCue = window.internals.mediaElementCurrentlySpokenCue(video);
            assert_not_equals(spokenCue, null, 'descriptive cue is being spoken');

            let props = ['vertical', 'snapToLines', 'line', 'lineAlign', 'position', 'positionAlign', 'size', 'align', 'text', 'region', 'id', 'startTime', 'endTime', 'pauseOnExit'];
            props.forEach(prop => {
                assert_equals(cue[prop], spokenCue[prop], `spoken cue has correct "${prop}" value`);
            });

            let utterance = window.internals.speechSynthesisUtteranceForCue(spokenCue);
            assert_not_equals(utterance, null, 'cue utterance is not null');
            assert_equals(utterance.text, expectedText, 'correct text is being spoken');
        }

        // Seek into the range for the first cue.
        video.currentTime = 1.1;
        await new Promise(resolve => cues[0].onenter = resolve);
        checkCue(cues[0], '1 - The first cue');

        video.currentTime = 2.9;
        await new Promise(resolve => video.onseeked = resolve);

        // Play into the range of the second cue.
        video.play();
        await new Promise(resolve => cues[1].onenter = (e) => { video.pause(); resolve() });
        checkCue(cues[1], '2 - The second cue, from time 3 to 15');

    }, "WebVTT audio descriptions");

    </script>

</body>
</html>
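The test waits for `canplaythrough`, `enter`, and `seeked` by wrapping an `on…` handler assignment in a promise each time. One possible way to factor that pattern out is sketched below. `waitForEvent` is a hypothetical helper that is not part of this commit or of testharness.js; it assumes only the standard `EventTarget.addEventListener` API with the `once` option.

```javascript
// Hypothetical helper: resolves with the first event of the given type
// dispatched on `target`, then removes its listener automatically.
function waitForEvent(target, type) {
    return new Promise(resolve => {
        target.addEventListener(type, resolve, { once: true });
    });
}
```

Under that assumption, a wait such as the one for the first cue could be written as `await waitForEvent(cues[0], 'enter')`. Because it adds a listener instead of assigning `onenter`, it would not replace a handler that some other code had already installed.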