SPXSpeechSynthesizer stopSpeaking() method cannot return immediately on iOS 17 #2081

Closed
solossam opened this issue Sep 19, 2023 · 15 comments
Labels
accepted Issue moved to product team backlog. Will be closed when addressed.

Comments

@solossam

solossam commented Sep 19, 2023

After calling SPXSpeechSynthesizer startSpeakingSsml() to synthesize speech, calling stopSpeaking() while the speech is playing back does not return immediately on iOS 17. The audio stops immediately, but the method takes 10-20 seconds to return.

iOS 16 and iOS 15 don't have this issue.

Below is our sample Swift code to illustrate this issue on iOS 17.

// below is the code snippet to start text to speech

let speechConfig = try SPXSpeechConfiguration(subscription: key, region: region)
synthesizer = try SPXSpeechSynthesizer(speechConfig)

let ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"  xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <mstts:express-as style="friendly">
            <prosody rate="1.0">
                Hello how are you ?
            </prosody>
        </mstts:express-as>
    </voice>
</speak>
"""

try synthesizer?.startSpeakingSsml(ssml)

Below is the code snippet to stop text to speech

try synthesizer?.stopSpeaking() // this method takes 10-20s to return

We're using MicrosoftCognitiveServicesSpeech-iOS ver 1.32.1
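
A minimal sketch of how the return latency described above could be measured, assuming the blocking call is wrapped in a closure. The helper name `measureReturnLatency` is hypothetical and not part of the Speech SDK, and the 50 ms sleep is only a stand-in for `try synthesizer?.stopSpeaking()`.

```swift
import Foundation

/// Runs `operation` and returns how long it took to return, in seconds.
/// `operation` is a stand-in for a blocking call such as
/// `try synthesizer?.stopSpeaking()`.
func measureReturnLatency(_ operation: () throws -> Void) rethrows -> TimeInterval {
    // systemUptime is monotonic, so wall-clock adjustments do not skew the result.
    let start = ProcessInfo.processInfo.systemUptime
    try operation()
    return ProcessInfo.processInfo.systemUptime - start
}

// Hypothetical stand-in that blocks for about 50 ms; replace with the real call.
let elapsed = measureReturnLatency {
    Thread.sleep(forTimeInterval: 0.05)
}
print(String(format: "stop call returned after %.3f s", elapsed))
```

Logging this value on iOS 16 and iOS 17 devices would make the difference reported here easy to compare across SDK versions.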

@trrwilson
Member

@yulin-li -- do you have any insight into this iOS 17-specific behavioral difference relative to older versions? @jhakulin rightly recommended asking you early while we track down a repro device.

@trrwilson trrwilson added the in-review In review label Sep 26, 2023
@ssadel

ssadel commented Oct 2, 2023

Bump: this is causing a crash in my app when my class deinits.

Temp fix: I have replaced the SDK with REST where I let users preview voices in my app.

@jhakulin jhakulin self-assigned this Oct 2, 2023
@jhakulin

jhakulin commented Oct 2, 2023

Internal workitem: 5706031

@jhakulin jhakulin added accepted Issue moved to product team backlog. Will be closed when addressed. and removed in-review In review labels Oct 2, 2023
@jhakulin

jhakulin commented Oct 3, 2023

@ssadel Does your app crash because stop takes too long, or does the crash happen inside the SDK? Could you please share a backtrace?

@jhakulin jhakulin added the update needed For items that are in progress but have not been updated label Oct 3, 2023
@solossam
Author

solossam commented Oct 4, 2023

Is there an ETA for this fix? Our production iOS app suffers from this issue. Is there any suggested workaround in the meantime, before the fix is available? Many thanks!

@ssadel

ssadel commented Oct 4, 2023

> @ssadel Does your app crash because stop takes too long, or does the crash happen inside the SDK? Could you please share a backtrace?

Hopefully this helps. It's the trace from an iOS 17.0.2 device. For context, the stopPreview func is called right before deallocation of a view model. It comes from Crashlytics, so it might be a bit wonky; I'm unable to test myself as I haven't updated yet.

Crashed: com.apple.main-thread
0  libsystem_platform.dylib            0x6290 _os_unfair_lock_corruption_abort + 88
1  libsystem_platform.dylib            0x3358 _os_unfair_lock_lock_slow + 300
2  libsystem_pthread.dylib             0x8030 pthread_mutex_destroy + 64
3  MicrosoftCognitiveServicesSpeech    0x8f3fb8 pal_get_value + 89132
4  MicrosoftCognitiveServicesSpeech    0x34ae28 GetModuleObject + 1701908
5  MicrosoftCognitiveServicesSpeech    0x34ad94 GetModuleObject + 1701760
6  MicrosoftCognitiveServicesSpeech    0x3481d0 GetModuleObject + 1690556
7  MicrosoftCognitiveServicesSpeech    0x347f70 GetModuleObject + 1689948
8  MicrosoftCognitiveServicesSpeech    0x34864c GetModuleObject + 1691704
9  MicrosoftCognitiveServicesSpeech    0x3930c4 GetModuleObject + 1997488
10 MicrosoftCognitiveServicesSpeech    0x393c90 GetModuleObject + 2000508
11 MicrosoftCognitiveServicesSpeech    0xc69b4 property_bag_copy + 5460
12 MicrosoftCognitiveServicesSpeech    0xc66b0 property_bag_copy + 4688
13 MicrosoftCognitiveServicesSpeech    0xc657c property_bag_copy + 4380
14 MicrosoftCognitiveServicesSpeech    0xc4d40 property_bag_release + 20
15 MicrosoftCognitiveServicesSpeech    0x18bdf8 pal_string_to_wstring + 807452
16 MicrosoftCognitiveServicesSpeech    0x18b9f0 pal_string_to_wstring + 806420
17 MicrosoftCognitiveServicesSpeech    0x165f78 pal_string_to_wstring + 652188
18 libobjc.A.dylib                     0x5354 object_cxxDestructFromClass(objc_object*, objc_class*) + 116
19 libobjc.A.dylib                     0x5090 objc_destructInstance + 80
20 libobjc.A.dylib                     0x503c _objc_rootDealloc + 80
21 MicrosoftCognitiveServicesSpeech    0x185f3c pal_string_to_wstring + 783200
22 libobjc.A.dylib                     0x5354 object_cxxDestructFromClass(objc_object*, objc_class*) + 116
23 libobjc.A.dylib                     0x5090 objc_destructInstance + 80
24 libobjc.A.dylib                     0x503c _objc_rootDealloc + 80
25 MicrosoftCognitiveServicesSpeech    0x185c58 pal_string_to_wstring + 782460
26 Span                                0x44748 AddVoiceViewModel.stopPreview() + 145 (AddVoiceViewModel.swift:145)
27 Span                                0x43058 AddVoiceViewModel.previewVoice(_:) + 59 (AddVoiceViewModel.swift:59)
28 Span                                0x67840 specialized AddVoiceView.previewVoice(_:didTapCell:) + 147 (AddVoiceView.swift:147)
29 Span                                0x67f74 partial apply for closure #1 in closure #2 in closure #1 in closure #1 in closure #1 in AddVoiceView.body.getter + 4304928628 (<compiler-generated>:4304928628)
30 Span                                0xcf370 partial apply for closure #1 in VoiceCell.body.getter + 20 (VoiceCell.swift:20)
31 SwiftUI                             0x198c318 OUTLINED_FUNCTION_2 + 900
32 SwiftUI                             0xf75c6c OUTLINED_FUNCTION_4 + 12156
33 SwiftUI                             0xf7be9c objectdestroy.73Tm + 300
34 SwiftUI                             0xf79b5c OUTLINED_FUNCTION_4 + 28268
35 SwiftUI                             0xf79aa8 OUTLINED_FUNCTION_4 + 28088
36 SwiftUI                             0x15f2b48 OUTLINED_FUNCTION_0 + 5048
37 SwiftUI                             0x15f1e3c OUTLINED_FUNCTION_0 + 1708
38 SwiftUI                             0x15f180c OUTLINED_FUNCTION_0 + 124
39 SwiftUI                             0x11474c0 OUTLINED_FUNCTION_11 + 40
40 SwiftUI                             0x11474e8 OUTLINED_FUNCTION_11 + 80
41 SwiftUI                             0x11474c0 OUTLINED_FUNCTION_11 + 40
42 SwiftUI                             0x1977230 OUTLINED_FUNCTION_4 + 2828
43 SwiftUI                             0x1977948 OUTLINED_FUNCTION_4 + 4644
44 SwiftUI                             0x1b3b42c OUTLINED_FUNCTION_3 + 152
45 SwiftUI                             0x18c3414 OUTLINED_FUNCTION_9 + 9908
46 SwiftUI                             0x18c193c OUTLINED_FUNCTION_9 + 3036
47 SwiftUI                             0x18c1acc OUTLINED_FUNCTION_9 + 3436
48 UIKitCore                           0x166494 -[UIGestureRecognizer _componentsEnded:withEvent:] + 172
49 UIKitCore                           0x1f408 -[UITouchesEvent _sendEventToGestureRecognizer:] + 464
50 UIKitCore                           0x1ba84 -[UIGestureEnvironment _deliverEvent:toGestureRecognizers:usingBlock:] + 172
51 UIKitCore                           0x1b9b0 -[UIGestureEnvironment _updateForEvent:window:] + 188
52 UIKitCore                           0x2054b8 -[UIWindow sendEvent:] + 3188
53 UIKitCore                           0x204748 -[UIApplication sendEvent:] + 560
54 UIKitCore                           0x1c6d88 __dispatchPreprocessedEventFromEventQueue + 6492
55 UIKitCore                           0x1c508c __processEventQueue + 5540
56 UIKitCore                           0x1c3a9c updateCycleEntry + 160
57 UIKitCore                           0xaad94 _UIUpdateSequenceRun + 84
58 UIKitCore                           0xaa484 schedulerStepScheduledMainSection + 144
59 UIKitCore                           0xaa540 runloopSourceCallback + 92
60 CoreFoundation                      0x37acc __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28
61 CoreFoundation                      0x36d48 __CFRunLoopDoSource0 + 176
62 CoreFoundation                      0x354fc __CFRunLoopDoSources0 + 244
63 CoreFoundation                      0x34238 __CFRunLoopRun + 828
64 CoreFoundation                      0x33e18 CFRunLoopRunSpecific + 608
65 GraphicsServices                    0x35ec GSEventRunModal + 164
66 UIKitCore                           0x22f350 -[UIApplication _run] + 888
67 UIKitCore                           0x22e98c UIApplicationMain + 340
68 SwiftUI                             0x114d354 OUTLINED_FUNCTION_31 + 604
69 SwiftUI                             0x114d198 OUTLINED_FUNCTION_31 + 160
70 SwiftUI                             0xdca434 OUTLINED_FUNCTION_26 + 2196
71 Span                                0x77b8 main + 10 (SpanApp.swift:10)
72 ???                                 0x1be597d44 (Missing)

@github-actions github-actions bot removed the update needed For items that are in progress but have not been updated label Oct 4, 2023
@jhakulin

jhakulin commented Oct 4, 2023

@ssadel thanks for the backtrace. Could you please share sample code that reproduces this?

@solossam we currently have a fix for the stop delay; the ETA is the 1.33.0 release (approx. end of Oct 2023). The problem seems to be due to a behavioral change in iOS 17 in https://developer.apple.com/documentation/audiotoolbox/1501970-audioqueuestop?language=objc, which causes the delay in the Speech SDK.

If your use case is synthesis playback via speaker output, one workaround until the SDK fix is released is to create the SpeechSynthesizer using PullAudioOutputStream or PushAudioOutputStream, get the synthesized audio bytes in your application, and then stream those bytes to the device speaker using AVFoundation APIs.

If you want to try that route, please take a look at the following Objective-C sample for Push/PullAudioOutputStream examples (Swift is similar; unfortunately, we do not have a Swift sample for the stream usage).
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/objective-c/ios/synthesis-samples/synthesis-samples/ViewController.m
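
The pull-stream half of this workaround can be sketched in Swift roughly as follows. This is an illustration under stated assumptions, not the SDK's API: `PullAudioSource`, `FakeSource`, and `drain` are hypothetical names, the protocol only stands in for the SDK's pull audio output stream, and the `sink` closure marks the place where the app would hand bytes to its own AVFoundation playback code. See the linked Objective-C sample for the real stream calls.

```swift
import Foundation

/// Hypothetical stand-in for the SDK's pull audio output stream.
/// Returns the number of bytes written into `buffer`; 0 means end of stream.
protocol PullAudioSource {
    func read(into buffer: inout [UInt8]) -> Int
}

/// Drains `source` in fixed-size chunks and hands each chunk to `sink`.
/// `shouldStop` is polled between chunks, so the app decides when playback
/// ends instead of waiting on a blocking stop call.
func drain(_ source: PullAudioSource,
           chunkSize: Int = 1024,
           shouldStop: () -> Bool = { false },
           sink: (Data) -> Void) -> Int {
    var buffer = [UInt8](repeating: 0, count: chunkSize)
    var total = 0
    while !shouldStop() {
        let count = source.read(into: &buffer)
        if count == 0 { break }          // end of synthesized audio
        sink(Data(buffer[0..<count]))    // app-side playback would go here
        total += count
    }
    return total
}

/// In-memory fake used only to exercise the loop.
final class FakeSource: PullAudioSource {
    private var remaining: [UInt8]
    init(bytes: [UInt8]) { remaining = bytes }

    func read(into buffer: inout [UInt8]) -> Int {
        let count = min(buffer.count, remaining.count)
        buffer.replaceSubrange(0..<count, with: remaining[0..<count])
        remaining.removeFirst(count)
        return count
    }
}

var received = Data()
let source = FakeSource(bytes: [UInt8](repeating: 7, count: 2500))
let total = drain(source, sink: { received.append($0) })
print("drained \(total) bytes")
```

Because the app owns the read loop, stopping playback becomes a matter of flipping the `shouldStop` condition and halting its own player.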

@github-actions

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

@github-actions github-actions bot added the update needed For items that are in progress but have not been updated label Oct 24, 2023
@pasha-o

pasha-o commented Oct 24, 2023

hi @jhakulin is the update going out as planned?

@github-actions github-actions bot removed the update needed For items that are in progress but have not been updated label Oct 25, 2023
@streamlify

Desperately need this fix too!
Not just stopSpeaking; stopContinuousRecognition() is also taking much longer to return.

@liuduoios

Desperately need this fix too!

@jhakulin

jhakulin commented Nov 7, 2023

The fix has been released in the 1.33.0 Speech SDK release. Let us know if there are any problems.

@jhakulin jhakulin closed this as completed Nov 7, 2023
@PatrickkZhao

PatrickkZhao commented Nov 28, 2023

> The fix has been released in the 1.33.0 Speech SDK release. Let us know if there are any problems.

stopContinuousRecognition still takes more than 5 seconds, even after I upgraded to 1.33.

@wyk111wyk

Not really fixed (1.36): the handler registered with synthesizer.addSynthesisCompletedEventHandler still takes 10-20 s to receive its callback.

@wtto00

wtto00 commented May 8, 2024

@jhakulin I am using version 1.37.0, and I have encountered a similar issue.

stopSpeaking does not immediately terminate the playback process; it only stops the speaker from playing.

For example, if I generate 14 seconds of audio and call stopSpeaking at 10 seconds, then let speakResult = synthesizer?.speakSsml(ssml) returns immediately with speakResult?.reason = 9 (SPXResultReason_SynthesizingAudioCompleted) instead of 1 (SPXResultReason_Canceled). Moreover, the callback registered with synthesizer?.addSynthesisCompletedEventHandler is triggered after a 4-second wait, rather than the callback registered with synthesizer?.addSynthesisCanceledEventHandler.

let ssml =
          "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts'><voice name='\(identifier)'>\(mstts)</voice></speak>"
let speakResult = try self.synthesizer?.speakSsml(ssml)
print(speakResult?.reason ?? "")
try synthesizer?.stopSpeaking()

Here is a demo repository: https://github.com/wtto00/flutter_azure_speech/tree/main/example

The Swift code is at https://github.com/wtto00/flutter_azure_speech/blob/eb419b89fcc16903cabaa8f9820559d93ed80861/ios/Classes/AzureSpeechPlugin.swift#L294

Related to #2350 and #2367
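
The reason check described in this comment could be sketched as below. This is a self-contained illustration, not SDK code: `ReportedReason` and `describeStopOutcome` are hypothetical names, and the enum carries only the two raw values quoted above (1 for canceled, 9 for synthesizing audio completed).

```swift
/// Only the two raw values quoted in the comment above.
/// The type and case names are hypothetical, not the SDK's.
enum ReportedReason: Int {
    case canceled = 1
    case synthesizingAudioCompleted = 9
}

/// Describes how a stop request was surfaced, given the raw reason value
/// read from a synthesis result.
func describeStopOutcome(rawReason: Int) -> String {
    switch ReportedReason(rawValue: rawReason) {
    case .some(.canceled):
        return "stop surfaced as cancellation"
    case .some(.synthesizingAudioCompleted):
        return "stop surfaced as completion"
    case .none:
        return "unrecognized reason \(rawReason)"
    }
}

// The value reported in this comment after calling stopSpeaking mid-playback.
print(describeStopOutcome(rawReason: 9))
```

An app that needs to tell a user-initiated stop from a natural finish may want to track its own "stop requested" flag rather than rely on the reported reason alone, given the behavior described here.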

10 participants