AX: Support VTT-based extended audio descriptions
https://bugs.webkit.org/show_bug.cgi?id=244931
<rdar://99697422>

Reviewed by Jer Noble.

* LayoutTests/media/track/captions-webvtt/captions-extended-descriptions.vtt: Added.
* LayoutTests/media/track/track-description-cue.html:
* LayoutTests/media/track/track-extended-descriptions-expected.txt: Added.
* LayoutTests/media/track/track-extended-descriptions.html: Added.
* LayoutTests/TestExpectations: The feature is Cocoa-specific so far, so skip the test globally.
* LayoutTests/platform/ios/TestExpectations: Enable test on iOS.
* LayoutTests/platform/mac/TestExpectations: Enable test on macOS.

* Source/WTF/Scripts/Preferences/WebPreferencesExperimental.yaml: Add 'ExtendedAudioDescriptionsEnabled'
setting.

* Source/WebCore/Modules/speech/SpeechSynthesis.cpp:
(WebCore::SpeechSynthesis::setPlatformSynthesizer): Take a Ref<> synthesizer instead of a
unique_ptr.
(WebCore::SpeechSynthesis::ensurePlatformSpeechSynthesizer): Use PlatformSpeechSynthesizer::create.
* Source/WebCore/Modules/speech/SpeechSynthesis.h:

* Source/WebCore/html/HTMLMediaElement.cpp:
(WebCore::convertEnumerationToString): Added.
(WebCore::HTMLMediaElement::unregisterWithDocument): Cancel speech, clear the synthesizer.
(WebCore::HTMLMediaElement::updateActiveTextTrackCues): Clean up for readability.
(WebCore::HTMLMediaElement::setSpeechSynthesisState): Update for extended descriptions.
(WebCore::HTMLMediaElement::speakCueText): Call cue.prepareToSpeak as we don't always want
a cue to begin speaking immediately.
(WebCore::HTMLMediaElement::pauseSpeakingCueText): Support completing an extended description.
(WebCore::HTMLMediaElement::resumeSpeakingCueText): Ditto.
(WebCore::HTMLMediaElement::pausePlaybackForExtendedTextDescription): New, pause playback
to allow an extended description to complete.
(WebCore::HTMLMediaElement::speechSynthesis): Have the media element own the synthesizer
since the logic for managing it is here.
(WebCore::HTMLMediaElement::executeCueEnterOrExitActionForTime): Support extended descriptions.
(WebCore::HTMLMediaElement::addTextTrack): Set m_userPrefersExtendedDescriptions.
(WebCore::HTMLMediaElement::configureTextTrackGroup): Use m_userPrefersTextDescriptions.
(WebCore::HTMLMediaElement::updatePlayState): Pause and resume speaking here, not in `playPlayer`
and `pausePlayer`.
(WebCore::HTMLMediaElement::playPlayer): Ditto.
(WebCore::HTMLMediaElement::pausePlayer): Ditto.
(WebCore::HTMLMediaElement::captionPreferencesChanged): Set m_userPrefersExtendedDescriptions.
(WebCore::HTMLMediaElement::executeCueEnterOrLeaveAction): Deleted.
* Source/WebCore/html/HTMLMediaElement.h:
(WTF::LogArgument<WebCore::HTMLMediaElement::SpeechSynthesisState>::toString):

* Source/WebCore/html/track/TextTrack.cpp:
(WebCore::TextTrack::speechSynthesis): Deleted.
* Source/WebCore/html/track/TextTrack.h:

* Source/WebCore/html/track/TextTrackCue.h:
(WebCore::TextTrackCue::prepareToSpeak): Renamed from `speak`.
(WebCore::TextTrackCue::beginSpeaking): Added.
(WebCore::TextTrackCue::pauseSpeaking): Added.
(WebCore::TextTrackCue::cancelSpeaking): Added.
(WebCore::TextTrackCue::speak): Deleted.

* Source/WebCore/html/track/VTTCue.cpp:
(WebCore::mapVideoRateToSpeechRate): Convert playback rate to web speech rate.
(WebCore::VTTCue::prepareToSpeak): Call completion handler when there is nothing to speak
or when the track is null. Stash `speechSynthesis` for future use.
(WebCore::VTTCue::beginSpeaking): Begin or resume speaking.
(WebCore::VTTCue::pauseSpeaking):
(WebCore::VTTCue::cancelSpeaking):
(WebCore::VTTCue::speak): Deleted.
* Source/WebCore/html/track/VTTCue.h:

* Source/WebCore/page/CaptionUserPreferences.cpp:
(WebCore::CaptionUserPreferences::userPrefersTextDescriptions const): Consider setting for
extended descriptions.
* Source/WebCore/platform/PlatformSpeechSynthesizer.h: Make refcounted.

* Source/WebCore/platform/cocoa/PlatformSpeechSynthesizerCocoa.mm:
(-[WebSpeechSynthesisWrapper speakUtterance:]): `client()` is a ref, not a pointer.
(-[WebSpeechSynthesisWrapper speechSynthesizer:didStartSpeechUtterance:]): Ditto.
(-[WebSpeechSynthesisWrapper speechSynthesizer:didFinishSpeechUtterance:]): Ditto.
(-[WebSpeechSynthesisWrapper speechSynthesizer:didPauseSpeechUtterance:]): Ditto.
(-[WebSpeechSynthesisWrapper speechSynthesizer:didContinueSpeechUtterance:]): Ditto.
(-[WebSpeechSynthesisWrapper speechSynthesizer:didCancelSpeechUtterance:]): Ditto.
(-[WebSpeechSynthesisWrapper speechSynthesizer:willSpeakRangeOfSpeechString:utterance:]): Ditto.
(WebCore::PlatformSpeechSynthesizer::create): The client can never be null, so take a ref, not a pointer.
(WebCore::PlatformSpeechSynthesizer::PlatformSpeechSynthesizer):

* Source/WebCore/platform/mock/PlatformSpeechSynthesizerMock.cpp:
(WebCore::PlatformSpeechSynthesizerMock::create): The client can never be null, so take a ref, not a pointer.
(WebCore::PlatformSpeechSynthesizerMock::PlatformSpeechSynthesizerMock):
(WebCore::PlatformSpeechSynthesizerMock::speakingFinished): `client()` is a ref, not a pointer.
(WebCore::PlatformSpeechSynthesizerMock::speak): Ditto. Use configurable utterance duration.
(WebCore::PlatformSpeechSynthesizerMock::cancel): Ditto.
(WebCore::PlatformSpeechSynthesizerMock::pause): Ditto. Null-check `m_utterance`.
(WebCore::PlatformSpeechSynthesizerMock::resume): Ditto.
* Source/WebCore/platform/mock/PlatformSpeechSynthesizerMock.h:
(WebCore::PlatformSpeechSynthesizerMock::setUtteranceDuration):

* Source/WebCore/testing/Internals.cpp:
(WebCore::Internals::enableMockSpeechSynthesizer):
(WebCore::Internals::enableMockSpeechSynthesizerForMediaElement):
(WebCore::Internals::setSpeechUtteranceDuration):
* Source/WebCore/testing/Internals.h:
* Source/WebCore/testing/Internals.idl:

* Source/WebKit/UIProcess/WebPageProxy.cpp:
(WebKit::WebPageProxy::resetSpeechSynthesizer): PlatformSpeechSynthesizer is a Ref, not
a unique_ptr.
(WebKit::WebPageProxy::speechSynthesisData): Ditto.
* Source/WebKit/UIProcess/WebPageProxy.h:

Canonical link: https://commits.webkit.org/254502@main
eric-carlson committed Sep 15, 2022
1 parent d122af2 commit c2f7594742ab1169aa5e66097a2a4de8f928a746
Showing 27 changed files with 402 additions and 109 deletions.
@@ -4795,6 +4795,7 @@ http/tests/media/hls/hls-hdr-switch.html [ Skip ]
http/tests/media/video-canplaythrough-webm.html [ Skip ]
media/media-session/mock-coordinator.html [ Skip ]
media/track/track-description-cue.html [ Skip ]
media/track/track-extended-descriptions.html [ Skip ]

# These tests rely on webkit-test-runner flags that aren't implemented for DumpRenderTree, so they will fail under legacy WebKit.
editing/selection/expando.html [ Failure ]
@@ -0,0 +1,10 @@
WEBVTT
1
00:00:01.100 --> 00:00:02.000
1 - The first cue, it is much too long to complete

2
00:00:03.000 --> 00:00:15.000
2 - The second cue, from time 3 to 15

@@ -7,22 +7,30 @@
<script src=../media-file.js></script>
</head>
<body>
<video controls muted id=video>
<track id='testTrack' src='captions-webvtt/captions-descriptions.vtt' kind='descriptions' >
<video controls muted>
</video>

<script>

promise_test(async (t) => {

let descriptionsTrack = document.querySelector("track");
let video = document.getElementsByTagName('video')[0];

if (window.internals)
if (window.internals) {
internals.settings.setShouldDisplayTrackKind('TextDescriptions', true);
internals.settings.setAudioDescriptionsEnabled(true);
internals.enableMockSpeechSynthesizerForMediaElement(video);
}

video.src = findMediaFile('video', '../content/test');
await new Promise(resolve => video.oncanplaythrough = resolve);

let descriptionsTrack = document.createElement('track');
descriptionsTrack.setAttribute('kind', 'descriptions');
descriptionsTrack.setAttribute('src', 'captions-webvtt/captions-descriptions.vtt');
video.appendChild(descriptionsTrack);
await new Promise(resolve => descriptionsTrack.onload = resolve);

let cues = descriptionsTrack.track.cues;
assert_equals(cues.length, 3);

@@ -0,0 +1,4 @@


PASS WebVTT extended audio descriptions

@@ -0,0 +1,80 @@
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script src="../../resources/testharness.js"></script>
<script src="../../resources/testharnessreport.js"></script>
<script src=../media-file.js></script>
</head>
<body>
<video controls>
</video>

<script>

promise_test(async (t) => {

let video = document.getElementsByTagName('video')[0];

if (window.internals) {
internals.settings.setShouldDisplayTrackKind('TextDescriptions', true);
internals.settings.setExtendedAudioDescriptionsEnabled(true);
internals.enableMockSpeechSynthesizerForMediaElement(video);
internals.setSpeechUtteranceDuration(2);
}

video.src = findMediaFile('video', '../content/test');
await new Promise(resolve => video.oncanplaythrough = resolve);

let descriptionsTrack = document.createElement('track');
descriptionsTrack.setAttribute('kind', 'descriptions');
descriptionsTrack.setAttribute('src', 'captions-webvtt/captions-extended-descriptions.vtt');
video.appendChild(descriptionsTrack);
await new Promise(resolve => descriptionsTrack.onload = resolve);

let cues = descriptionsTrack.track.cues;
assert_equals(cues.length, 2);

let checkCue = (cue, expectedId) => {
assert_equals(cue.id, expectedId);
if (!window.internals)
return;

let spokenCue = window.internals.mediaElementCurrentlySpokenCue(video);
assert_not_equals(spokenCue, null, 'descriptive cue is being spoken');

let props = ['vertical', 'snapToLines', 'line', 'lineAlign', 'position', 'positionAlign', 'size', 'align', 'text', 'region', 'id', 'startTime', 'endTime', 'pauseOnExit'];
props.forEach(prop => {
assert_equals(cue[prop], spokenCue[prop], `spoken cue has correct "${prop}" value`);
});

let utterance = window.internals.speechSynthesisUtteranceForCue(spokenCue);
assert_not_equals(utterance, null, 'cue utterance is not null');
assert_equals(utterance.text, cue.text, 'correct text is being spoken');
}

// Play into the range of the first cue...
video.currentTime = 1;
video.play();
await new Promise(resolve => cues[0].onenter = resolve);
checkCue(cues[0], '1');

// playback should pause...
await new Promise(resolve => video.onpause = resolve);

// and resume.
await new Promise(resolve => video.onplay = resolve);

// Play into the range of the second cue.
video.currentTime = 2.9;
await new Promise(resolve => video.onseeked = resolve);
video.play();
await new Promise(resolve => cues[1].onenter = (e) => { video.pause(); resolve() });
checkCue(cues[1], '2');

}, "WebVTT extended audio descriptions");

</script>

</body>
</html>
@@ -59,6 +59,7 @@ imported/w3c/web-platform-tests/speech-api/ [ Pass ]

http/tests/media/fairplay [ Pass ]
media/track/track-description-cue.html [ Pass ]
media/track/track-extended-descriptions.html [ Pass ]

#//////////////////////////////////////////////////////////////////////////////////////////
# End platform-specific directories.
@@ -88,6 +88,7 @@ imported/w3c/web-platform-tests/speech-api [ Pass ]

http/tests/media/fairplay [ Pass ]
media/track/track-description-cue.html [ Pass ]
media/track/track-extended-descriptions.html [ Pass ]

#//////////////////////////////////////////////////////////////////////////////////////////
# End platform-specific directories.
@@ -132,8 +132,8 @@ AsyncClipboardAPIEnabled:
AudioDescriptionsEnabled:
type: bool
condition: ENABLE(VIDEO)
humanReadableName: "Audio descriptions for video"
humanReadableDescription: "Enable audio descriptions for video"
humanReadableName: "Audio descriptions for video - Standard"
humanReadableDescription: "Enable standard audio descriptions for video"
defaultValue:
WebKitLegacy:
default: false
@@ -645,6 +645,19 @@ ExposeSpeakersEnabled:
WebCore:
default: false

ExtendedAudioDescriptionsEnabled:
type: bool
condition: ENABLE(VIDEO)
humanReadableName: "Audio descriptions for video - Extended"
humanReadableDescription: "Enable extended audio descriptions for video"
defaultValue:
WebKitLegacy:
default: false
WebKit:
default: false
WebCore:
default: false

FTPEnabled:
type: bool
humanReadableName: "FTP support enabled"
@@ -74,9 +74,9 @@ SpeechSynthesis::SpeechSynthesis(ScriptExecutionContext& context)

SpeechSynthesis::~SpeechSynthesis() = default;

void SpeechSynthesis::setPlatformSynthesizer(std::unique_ptr<PlatformSpeechSynthesizer> synthesizer)
void SpeechSynthesis::setPlatformSynthesizer(Ref<PlatformSpeechSynthesizer>&& synthesizer)
{
m_platformSpeechSynthesizer = WTFMove(synthesizer);
m_platformSpeechSynthesizer = synthesizer.ptr();
m_voiceList.clear();
m_currentSpeechUtterance = nullptr;
m_utteranceQueue.clear();
@@ -93,7 +93,7 @@ void SpeechSynthesis::voicesDidChange()
PlatformSpeechSynthesizer& SpeechSynthesis::ensurePlatformSpeechSynthesizer()
{
if (!m_platformSpeechSynthesizer)
m_platformSpeechSynthesizer = makeUnique<PlatformSpeechSynthesizer>(this);
m_platformSpeechSynthesizer = PlatformSpeechSynthesizer::create(*this);
return *m_platformSpeechSynthesizer;
}

@@ -63,7 +63,7 @@ class SpeechSynthesis : public PlatformSpeechSynthesizerClient, public SpeechSyn
const Vector<Ref<SpeechSynthesisVoice>>& getVoices();

// Used in testing to use a mock platform synthesizer
WEBCORE_EXPORT void setPlatformSynthesizer(std::unique_ptr<PlatformSpeechSynthesizer>);
WEBCORE_EXPORT void setPlatformSynthesizer(Ref<PlatformSpeechSynthesizer>&&);

// Restrictions to change default behaviors.
enum BehaviorRestrictionFlags {
@@ -106,7 +106,7 @@ class SpeechSynthesis : public PlatformSpeechSynthesizerClient, public SpeechSyn

PlatformSpeechSynthesizer& ensurePlatformSpeechSynthesizer();

std::unique_ptr<PlatformSpeechSynthesizer> m_platformSpeechSynthesizer;
RefPtr<PlatformSpeechSynthesizer> m_platformSpeechSynthesizer;
Vector<Ref<SpeechSynthesisVoice>> m_voiceList;
RefPtr<SpeechSynthesisUtterance> m_currentSpeechUtterance;
Deque<Ref<SpeechSynthesisUtterance>> m_utteranceQueue;
