AX: Consider VTT-based audio descriptions with text-to-speech. #3486

Merged

Conversation

Contributor

@eric-carlson commented on Aug 19, 2022

969ac99

AX: Consider VTT-based audio descriptions with text-to-speech.
https://bugs.webkit.org/show_bug.cgi?id=243600
rdar://98206665

Reviewed by Jer Noble.

* LayoutTests/media/track/captions-webvtt/captions-descriptions.vtt: Added.
* LayoutTests/media/track/track-description-cue-expected.txt: Added.
* LayoutTests/media/track/track-description-cue.html: Added.
* LayoutTests/TestExpectations: Feature is Cocoa-specific so far, skip test globally.
* LayoutTests/platform/ios/TestExpectations: Enable test on iOS.
* LayoutTests/platform/mac/TestExpectations: Enable test on macOS.

* Source/WTF/Scripts/Preferences/WebPreferencesExperimental.yaml: Add 'AudioDescriptionsEnabled'
setting.

* Source/WebCore/Modules/speech/SpeechSynthesis.cpp:
(WebCore::SpeechSynthesis::handleSpeakingCompleted): Call utterance to fire event.
(WebCore::SpeechSynthesis::boundaryEventOccurred): Ditto.
(WebCore::SpeechSynthesis::didStartSpeaking): Ditto.
(WebCore::SpeechSynthesis::didPauseSpeaking): Ditto.
(WebCore::SpeechSynthesis::didResumeSpeaking): Ditto.
(WebCore::SpeechSynthesis::fireEvent const): Deleted.
(WebCore::SpeechSynthesis::fireErrorEvent const): Deleted.
* Source/WebCore/Modules/speech/SpeechSynthesis.h:
(WebCore::SpeechSynthesis::userGestureRequiredForSpeechStart const):
(WebCore::SpeechSynthesis::removeBehaviorRestriction): Make public.

* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.cpp:
(WebCore::SpeechSynthesisUtterance::create): Add version that takes a completion handler.
(WebCore::SpeechSynthesisUtterance::SpeechSynthesisUtterance): Add completion handler
parameter.
(WebCore::SpeechSynthesisUtterance::eventOccurred): New. Call completion handler or
dispatch event.
(WebCore::SpeechSynthesisUtterance::errorEventOccurred): Ditto.
* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.h:
* Source/WebCore/Modules/speech/SpeechSynthesisUtterance.idl: JSGenerateToJSObject.

* Source/WebCore/html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::updateActiveTextTrackCues): Return early if a seek
is pending. Call new executeCueEnterOrLeaveAction method instead of dispatching
events directly.
(WebCore::HTMLMediaElement::setSpeechSynthesisState): Maintain synthesis state.
(WebCore::HTMLMediaElement::speakCueText): Speak a cue.
(WebCore::HTMLMediaElement::pauseSpeakingCueText):
(WebCore::HTMLMediaElement::resumeSpeakingCueText):
(WebCore::HTMLMediaElement::cancelSpeakingCueText):
(WebCore::HTMLMediaElement::shouldSpeakCueTextForTime):
(WebCore::HTMLMediaElement::executeCueEnterOrLeaveAction): Trigger cue speech if
the track contains descriptions and we are entering a cue range, schedule an enter
or exit event.
(WebCore::HTMLMediaElement::seekWithTolerance): INFO_LOG -> ALWAYS_LOG
(WebCore::HTMLMediaElement::seekTask): Cancel speaking if necessary.
(WebCore::HTMLMediaElement::finishSeek): Update logging. If there isn't a pending
seek, queue a task to update text track cues.
(WebCore::HTMLMediaElement::configureTextTrackGroup): When processing descriptions
and the user wants text descriptions, set `fallbackTrack` to the first track seen
in case none of the tracks matches the audio track language.
(WebCore::HTMLMediaElement::playPlayer): Call resumeSpeakingCueText.
(WebCore::HTMLMediaElement::pausePlayer): Call pauseSpeakingCueText.
(WebCore::HTMLMediaElement::effectiveVolume const): Use the speech volume multiplier
when calculating the effective volume.
(WebCore::m_categoryAtMostRecentPlayback): Deleted.
* Source/WebCore/html/HTMLMediaElement.h:
(WebCore::HTMLMediaElement::cueBeingSpoken const):

* Source/WebCore/html/shadow/MediaControlTextTrackContainerElement.cpp:
(WebCore::MediaControlTextTrackContainerElement::updateDisplay): Skip spoken tracks.

* Source/WebCore/html/track/InbandGenericTextTrack.cpp: Remove unneeded include.

* Source/WebCore/html/track/TextTrack.cpp:
(WebCore::TextTrack::trackIndex): Use textTrackList() instead of m_textTrackList.
(WebCore::TextTrack::isRendered): Consider descriptions.
(WebCore::TextTrack::isSpoken):
(WebCore::TextTrack::trackIndexRelativeToRenderedTracks): Use textTrackList()
instead of m_textTrackList.
(WebCore::TextTrack::speechSynthesis):
* Source/WebCore/html/track/TextTrack.h:

* Source/WebCore/html/track/TextTrackCue.cpp:
(WebCore::operator<<): All cues have a `text()` method, just use it.
* Source/WebCore/html/track/TextTrackCue.h:
(WebCore::TextTrackCue::text const):
(WebCore::TextTrackCue::speak):

* Source/WebCore/html/track/VTTCue.cpp:
(WebCore::VTTCue::updateDisplayTree): `track()` can return null, check it.
(WebCore::VTTCue::getDisplayTree): Ditto.
(WebCore::VTTCue::toJSON const): Drive-by: address a Darin FIXME.
(WebCore::VTTCue::speak):
* Source/WebCore/html/track/VTTCue.h:
(WebCore::VTTCue::speechUtterance const):
(WebCore::VTTCue::text const): Deleted.
* Source/WebCore/html/track/VTTCue.idl:

* Source/WebCore/page/CaptionUserPreferences.cpp:
(WebCore::CaptionUserPreferences::userPrefersTextDescriptions const): Check audioDescriptionsEnabled
setting.
(WebCore::CaptionUserPreferences::textTrackSelectionScore const): Consider description
tracks if the user preference is set. Clean up logic.

* Source/WebCore/page/CaptionUserPreferencesMediaAF.cpp:
(WebCore::CaptionUserPreferencesMediaAF::userPrefersCaptions const):
(WebCore::CaptionUserPreferencesMediaAF::userPrefersTextDescriptions const): Check
MediaAccessibility framework preference.
* Source/WebCore/page/CaptionUserPreferencesMediaAF.h:
* Source/WebCore/platform/cf/MediaAccessibilitySoftLink.cpp:
* Source/WebCore/platform/cf/MediaAccessibilitySoftLink.h:

* Source/WebCore/platform/cocoa/PlatformSpeechSynthesizerCocoa.mm:
(-[WebSpeechSynthesisWrapper speakUtterance:]): If `utteranceVoice` is non-NULL
and the URI is empty, it is invalid so check the language.

* Source/WebCore/platform/graphics/InbandGenericCue.cpp:
(WebCore::InbandGenericCue::toJSONString const): Log cue text like VTTCue now does.

* Source/WebCore/platform/graphics/iso/ISOVTTCue.cpp:
(WebCore::ISOWebVTTCue::toJSONString const): Ditto.

* Source/WebCore/testing/Internals.cpp:
(WebCore::Internals::speechSynthesisUtteranceForCue):
(WebCore::Internals::mediaElementCurrentlySpokenCue):
* Source/WebCore/testing/Internals.h:
* Source/WebCore/testing/Internals.idl:

* Source/WebKit/UIProcess/WebPageProxy.cpp:
(WebKit::WebPageProxy::speechSynthesisSpeak): Remove the `startTime` parameter name,
it isn't used.

Canonical link: https://commits.webkit.org/253931@main
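
The `configureTextTrackGroup` entry above describes a fallback: remember the first description track seen and use it when no track matches the audio track language. A minimal sketch of that selection logic follows. It assumes a plain language-string comparison, and `Track` and `selectDescriptionTrack` are hypothetical names for illustration, not WebKit's API; the real method works with WebKit's track types and user preferences.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a description text track.
struct Track {
    std::string language;
};

// Returns the first track whose language matches the audio language, or the
// first track seen when nothing matches. Returns nullptr for an empty list.
const Track* selectDescriptionTrack(const std::vector<Track>& tracks, const std::string& audioLanguage)
{
    const Track* fallbackTrack = nullptr;
    for (const auto& track : tracks) {
        // Remember the first track in case no language matches.
        if (!fallbackTrack)
            fallbackTrack = &track;
        if (track.language == audioLanguage)
            return &track;
    }
    return fallbackTrack;
}
```

With tracks in `fr` and `en`, an `en` audio track would select the second track, while a `de` audio track would fall back to the first.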

2afe52d

Misc iOS, tvOS & watchOS macOS Linux Windows
✅ 🧪 style ✅ 🛠 ios   🛠 mac ✅ 🛠 wpe ✅ 🛠 🧪 win
✅ 🧪 bindings ✅ 🛠 ios-sim   🛠 mac-debug ✅ 🛠 gtk   🛠 wincairo
✅ 🧪 webkitperl   🧪 ios-wk2 ✅ 🛠 mac-AS-debug ✅ 🧪 gtk-wk2
  🧪 api-ios   🧪 api-mac ✅ 🧪 api-gtk
✅ 🛠 🧪 jsc ✅ 🛠 tv   🧪 mac-wk1 ✅ 🛠 jsc-armv7
✅ 🛠 tv-sim   🧪 mac-wk2 ✅ 🧪 jsc-armv7-tests
✅ 🛠 🧪 merge ✅ 🛠 watch   🧪 mac-AS-debug-wk2 ✅ 🛠 jsc-mips
✅ 🛠 watch-sim   🧪 mac-wk2-stress ✅ 🧪 jsc-mips-tests

@eric-carlson self-assigned this on Aug 19, 2022
@eric-carlson added the Accessibility and WebKit Nightly Build labels on Aug 19, 2022
@webkit-ews-buildbot added the merging-blocked label on Aug 19, 2022

auto exitEvent = Event::create(eventNames().exitEvent, Event::CanBubble::No, Event::IsCancelable::No);
scheduleEventOn(*eventTask.second, WTFMove(exitEvent));
executeCueEnterOrLeaveAction(*eventTask.second, CueAction::Enter);
Contributor

nit: It might be cleaner to make `eventTask` a structured binding: `auto& [eventTime, eventCue]`.

Contributor Author

Good point, we should do that as a followup.
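
For readers unfamiliar with the suggestion, here is a minimal sketch of the structured-binding form in a range-based loop. The `Cue` and `EventTask` types and the `cuesDueBy` helper are hypothetical stand-ins; the real code iterates pairs holding WebKit's own time and cue types. The point is only that `eventCue` reads more clearly than `eventTask.second`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the (time, cue) pairs in the reviewed loop.
struct Cue {
    std::string id;
};
using EventTask = std::pair<double, Cue*>;

// Collects the ids of cues whose event time is at or before `now`.
std::vector<std::string> cuesDueBy(const std::vector<EventTask>& eventTasks, double now)
{
    std::vector<std::string> ids;
    // C++17 structured binding names both halves of the pair directly.
    for (const auto& [eventTime, eventCue] : eventTasks) {
        if (eventCue && eventTime <= now)
            ids.push_back(eventCue->id);
    }
    return ids;
}
```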

if (!m_cueBeingSpoken)
return false;

ALWAYS_LOG(LOGIDENTIFIER, "time = ", time, ", returning ", (time.toDouble() >= m_cueBeingSpoken->startTime() && time.toDouble() < m_cueBeingSpoken->endTime()));
Contributor

`(time.toDouble() >= m_cueBeingSpoken->startTime() && time.toDouble() < m_cueBeingSpoken->endTime())` can be its own variable, like `timeIsInRange`.

Contributor Author

Yes, I'll change that.
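
A minimal sketch of the agreed change, under two assumptions: that the function returns the same expression it logs, and that plain `double` times stand in for the real time type. `SpokenCue` is a hypothetical type for illustration.

```cpp
// Hypothetical stand-in for the cue currently being spoken.
struct SpokenCue {
    double startTime;
    double endTime;
};

bool shouldSpeakCueTextForTime(const SpokenCue* cueBeingSpoken, double time)
{
    if (!cueBeingSpoken)
        return false;

    // Half-open range: the start time is included, the end time is not.
    bool timeIsInRange = time >= cueBeingSpoken->startTime && time < cueBeingSpoken->endTime;

    // The named value can be passed to the logging call and returned,
    // so the comparison is written only once.
    return timeIsInRange;
}
```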

return;

auto& track = *this->track();
if (!m_speechUtterance) {
Contributor

Is it possible for `m_speechUtterance` to exist at this point? If so, the completion handler won't be called.

Contributor Author

It shouldn't be possible. I'll add an assert and remove the `if (!m_speechUtterance)` test.
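
A rough sketch of what replacing the conditional with an assertion could look like. Everything here is hypothetical: `Utterance`, `Cue`, `prepareToSpeak`, and `finishSpeaking` are illustrative names, and clearing the utterance on completion is an assumption about the lifecycle, not something the PR text confirms.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an utterance that carries a completion handler.
struct Utterance {
    std::string text;
    std::function<void()> completionHandler;
};

class Cue {
public:
    explicit Cue(std::string text)
        : m_text(std::move(text))
    {
    }

    void prepareToSpeak(std::function<void()> completionHandler)
    {
        // Invariant from the review: no utterance may exist yet. Otherwise
        // the new completion handler would be dropped without being called.
        assert(!m_utterance);
        m_utterance = std::make_unique<Utterance>(Utterance { m_text, std::move(completionHandler) });
    }

    void finishSpeaking()
    {
        if (!m_utterance)
            return;
        // Assumption: release the utterance first so the cue can speak again.
        auto utterance = std::move(m_utterance);
        if (utterance->completionHandler)
            utterance->completionHandler();
    }

    bool hasUtterance() const { return static_cast<bool>(m_utterance); }

private:
    std::string m_text;
    std::unique_ptr<Utterance> m_utterance;
};
```

The assertion documents the invariant in debug builds, where the earlier conditional would have skipped the handler silently.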

@eric-carlson added the merge-queue label and removed the merging-blocked label on Aug 30, 2022
@webkit-early-warning-system merged commit 969ac99 into WebKit:main on Aug 30, 2022
@webkit-commit-queue
Collaborator

Committed 253931@main (969ac99): https://commits.webkit.org/253931@main

Reviewed commits have been landed. Closing PR #3486 and removing active labels.

@webkit-commit-queue removed the merge-queue label on Aug 30, 2022
Labels
Accessibility For bugs related to accessibility.
6 participants