
SpeechToText has horrendous performance issues and/or not working properly #1645

Closed
2 tasks done
sej69 opened this issue Jan 15, 2024 · 14 comments · Fixed by #1741
Labels
area/essentials, bug, unverified

Comments

@sej69

sej69 commented Jan 15, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Did you read the "Reporting a bug" section on Contributing file?

Current Behavior

I first wrote this up on Stack Overflow: https://stackoverflow.com/questions/77808446/communitytoolkit-speechtotext-performance-is-extremely-slow?noredirect=1#comment137188380_77808446

I also found this in your issue tracker, closed as unverified: https://github.com/CommunityToolkit/Maui/issues/1586. Using the code below, I have now verified that it is not operating as your documentation states it should.

SpeechToText is not detecting silence and only reports back after what appears to be a timeout. I created a .NET MAUI app using the following async method:

```csharp
async Task StartListener(CancellationToken cancellationToken)
{
    var isGranted = await speechToText.RequestPermissions(cancellationToken);
    if (!isGranted)
    {
        await Toast.Make("Permission not granted").Show(CancellationToken.None);
        return;
    }

    do
    {
        Stopwatch sw = Stopwatch.StartNew();

        var recognitionResult = await speechToText.ListenAsync(
            CultureInfo.GetCultureInfo(Language),
            new Progress<string>(partialText =>
            {
                RecognitionText += partialText + " ";
            }), cancellationToken);

        sw.Stop();

        if (recognitionResult.IsSuccessful)
        {
            RecognitionText = recognitionResult.Text;
            Debug.WriteLine("Success " + RecognitionText + " Time " + sw.Elapsed);
            RecognitionText = string.Empty;
        }
        else
        {
            Debug.WriteLine("failed - Time " + sw.Elapsed);
        }
    } while (!cancellationToken.IsCancellationRequested);
}
```

When I run this on my iPad, it loads and waits. I say, "This is a test," and then wait about a minute before it responds in the debug window with the success message and the time it took. It doesn't seem to matter whether I say the phrase right after starting or wait 30 seconds; it still takes about a minute.

[0:] Success This is a test Time 00:01:00.9459623

This method is unusable in its current state; people won't wait a minute for a response each time they speak. And according to your docs, this method should detect silence.

Expected Behavior

It should detect silence and return the recognized text when it does. You probably also want a way to adjust the timeout value, which doesn't appear to exist now.
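Until a configurable timeout exists, the best I can do is cap the wait myself by cancelling the token after a fixed interval. A minimal sketch of that workaround (my own idea, not toolkit API; the 10-second cap is arbitrary):

```csharp
// Workaround sketch: bound a single ListenAsync call ourselves by linking a
// timeout to the caller's token. CancelAfter fires the token after 10 seconds.
using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
timeoutCts.CancelAfter(TimeSpan.FromSeconds(10)); // arbitrary cap

try
{
    var recognitionResult = await speechToText.ListenAsync(
        CultureInfo.GetCultureInfo(Language),
        new Progress<string>(partialText => RecognitionText += partialText + " "),
        timeoutCts.Token);
}
catch (OperationCanceledException)
{
    // Depending on the platform implementation, cancellation may surface here
    // rather than as an unsuccessful result.
    Debug.WriteLine("Listen cancelled after timeout.");
}
```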

Steps To Reproduce

  1. Create a new .NET MAUI application
  2. Add the CommunityToolkit.Maui package
  3. Use the above code to test

Link to public reproduction project repository

https://github.com/sej69/TestSpeechRecognizer

Environment

- .NET MAUI CommunityToolkit:
- OS: iOS, running on an iPad through my Mac (set up through VS); the iPad is on the current version.
- .NET MAUI:

Anything else?

Using current versions of all libraries and MAUI (.NET 8). Very simple, basic project.

@sej69 sej69 added the bug and unverified labels on Jan 15, 2024
@bijington
Contributor

What iPad are you running this on?

@sej69
Author

sej69 commented Jan 16, 2024 via email

@bijington
Contributor

Sorry, what model and age is the iPad?

@vhugogarcia vhugogarcia added the area/essentials label on Jan 19, 2024
@sej69
Author

sej69 commented Jan 19, 2024 via email

@sej69
Author

sej69 commented Feb 7, 2024 via email

@sej69
Author

sej69 commented Feb 22, 2024

Playing around with this a bit more, I discovered that the OnRecognitionTextComplete event does fire on the iPad 10 if I hit a breakpoint in the OnRecognitionTextUpdated handler and continue execution. It does not fire on its own, though, if I take out that breakpoint.
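The fact that a breakpoint changes the outcome smells like a timing or threading race. One thing worth ruling out (purely speculative on my part, assuming the toolkit's RecognitionResultUpdated event and its RecognitionResult payload) is whether the handlers need to marshal UI-bound work onto the main thread:

```csharp
// Speculative check, not a confirmed fix: marshal handler work onto the main
// thread in case the recognition events fire on a background thread.
speechToText.RecognitionResultUpdated += (sender, args) =>
    MainThread.BeginInvokeOnMainThread(() => RecognitionText += args.RecognitionResult);
```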

@sej69
Author

sej69 commented Feb 28, 2024

This is running on an iPad 10, current / most recent hardware version.

The sample on this page: https://learn.microsoft.com/en-us/dotnet/communitytoolkit/maui/essentials/speech-to-text?tabs=android

seems to show ListenAsync returning a recognitionResult, but this never seems to happen. I created a test project using this code and set a breakpoint on `if (recognitionResult.IsSuccessful)`, and it never hits.

I thought it might have something to do with the interface I had set up per the documentation:

```csharp
public interface ISpeechToText
{
    Task RequestPermissions();

    Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}

// speech recognition code using the interface:

private ISpeechToText speechToText;

public SpeechListener(IAudioManager am, Iris.ISpeechToText stt)
{
    speechToText = stt;
}

// .....

var recognitionResult = await speechToText.Listen(CultureInfo.GetCultureInfo("en-us"),
    new Progress<string>(partialText =>
    {
        Debug.WriteLine("--- " + partialText + "---");

        //RecognitionText += partialText + " "; // demo code shows to use +=, but that duplicates incoming text.
        RecognitionText = partialText + " ";  // this doesn't duplicate the incoming text, but it does continue to report one additional time after the timer ends.
    }), cancellationToken);
```

But even when I switch to using SpeechToText.Default.ListenAsync(), e.g.:

```csharp
_cancellationToken = cancellationToken;

var recognitionResult = await SpeechToText.Default.ListenAsync(CultureInfo.GetCultureInfo("en-us"),
    new Progress<string>(partialText =>
    {
        Debug.WriteLine("--- " + partialText + "---");
        RecognitionText += partialText; // need to do += here or it won't add the additional words
    }), cancellationToken);

if (recognitionResult.IsSuccessful) // breakpoint here
{
    RecognitionText = recognitionResult.Text;
}
else
{
    await Toast.Make(recognitionResult.Exception?.Message ?? "Unable to recognize speech").Show(CancellationToken.None);
}
```

This doesn't end with a recognitionResult either; the call stays open and continues to recognize text.

To make matters worse, the SpeechToText.Default variant seems to work "better" than the ISpeechToText one. I've had to create timing routines because ListenAsync never returns anything; it just keeps listening and spitting out partial results.

When running the interface version (ISpeechToText), it keeps coming back and reporting the same text one more time, which wakes my timer routine again. I can write some code to get around this behavior, but I shouldn't have to...
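Concretely, my timing routine amounts to a silence watchdog around ListenAsync. A sketch of the idea (names and the two-second window are my own, not toolkit API):

```csharp
// Silence watchdog sketch: treat "no new partial result for silenceWindow"
// as end of speech and cancel the listen. The window length is arbitrary.
var silenceWindow = TimeSpan.FromSeconds(2);
using var watchdogCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);

var progress = new Progress<string>(partialText =>
{
    RecognitionText = partialText + " ";
    watchdogCts.CancelAfter(silenceWindow); // each partial pushes the deadline out
});

watchdogCts.CancelAfter(silenceWindow); // arm the watchdog before listening starts

try
{
    var recognitionResult = await speechToText.ListenAsync(
        CultureInfo.GetCultureInfo("en-us"), progress, watchdogCts.Token);
}
catch (OperationCanceledException)
{
    // Watchdog fired: the speaker went quiet, so use the last partial text.
    Debug.WriteLine("Silence detected: " + RecognitionText);
}
```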

I pulled the interface out, but then I ran into an issue with SpeechToText dropping the Bluetooth headset configuration set up in the AppDelegate:

```csharp
public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
{
    SetAudioSession();

    return base.FinishedLaunching(application, launchOptions);
}

public bool SetAudioSession()
{
    var audioSession = AVAudioSession.SharedInstance();
    var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord,
        AVAudioSessionCategoryOptions.AllowBluetooth |
        AVAudioSessionCategoryOptions.AllowAirPlay |
        AVAudioSessionCategoryOptions.DefaultToSpeaker);

    if (err != null)
        return false;

    err = audioSession.SetActive(true);

    if (err != null)
        return false;

    return true;
}
```

Bluetooth works when the ISpeechToText interface is used, but not with SpeechToText.Default.ListenAsync. The headphone icon on the iPad goes away until I exit the app, at which point it shows up again; meanwhile, all audio destined for the audioManager interface goes through the iPad only. However, the mic on the headphones still seems to work over Bluetooth. I've tracked it down to the ListenAsync method: if I comment it out, Bluetooth works; if I add it back in, it doesn't. If I change back to the ISpeechToText interface, it does work with my timing routines to get spoken text, but the method never returns.

I'd love to have ListenAsync detect silence and report back if possible. Or I can continue to use my timing routines, but I think something is broken with the API here.
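In case someone else hits the Bluetooth routing problem: since ListenAsync appears to reconfigure the shared audio session, one unverified workaround is to re-apply the category after the call returns, mirroring the SetAudioSession code above:

```csharp
#if IOS
// Unverified workaround: re-assert the Bluetooth-friendly category after
// ListenAsync returns, in case the toolkit reset the shared AVAudioSession.
var audioSession = AVFoundation.AVAudioSession.SharedInstance();
audioSession.SetCategory(AVFoundation.AVAudioSessionCategory.PlayAndRecord,
    AVFoundation.AVAudioSessionCategoryOptions.AllowBluetooth |
    AVFoundation.AVAudioSessionCategoryOptions.AllowAirPlay |
    AVFoundation.AVAudioSessionCategoryOptions.DefaultToSpeaker);
audioSession.SetActive(true);
#endif
```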

@sej69
Author

sej69 commented Feb 28, 2024

Another item you can look at: I downloaded the samples from https://github.com/CommunityToolkit/Maui/tree/main and pointed them at my iPad. They exhibit the same issue of not detecting the end of speech or returning to advance the recognitionResult.

@sej69
Author

sej69 commented Feb 28, 2024

Sorry, I should have tested the AppDelegate as well.

This:

```csharp
public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
{
    SetAudioSession();

    return base.FinishedLaunching(application, launchOptions);
}

public bool SetAudioSession()
{
    var audioSession = AVAudioSession.SharedInstance();
    var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord,
        AVAudioSessionCategoryOptions.AllowBluetooth |
        AVAudioSessionCategoryOptions.AllowAirPlay |
        AVAudioSessionCategoryOptions.DefaultToSpeaker);

    if (err != null)
        return false;

    err = audioSession.SetActive(true);

    if (err != null)
        return false;

    return true;
}
```

has the same effect with the sample code: the headset icon disappears and all audio still goes through the iPad, not the headphones.

@VladislavAntonyuk
Collaborator

VladislavAntonyuk commented Feb 29, 2024

  1. There is no difference between using the interface and SpeechToText.Default.
  2. Feel free to open a PR to include Bluetooth support for the speaker.
  3. Listen is designed for continuous listening. You can use the StartListening/StopListening methods to get control over the recognition process (see the sketch below).
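For anyone following along, here is a minimal sketch of that event-driven flow, assuming the StartListenAsync/StopListenAsync methods and the recognition events described in the toolkit docs:

```csharp
async Task StartListening(CancellationToken cancellationToken)
{
    var isGranted = await speechToText.RequestPermissions(cancellationToken);
    if (!isGranted)
    {
        await Toast.Make("Permission not granted").Show(CancellationToken.None);
        return;
    }

    // Partial results arrive via events instead of an IProgress callback.
    speechToText.RecognitionResultUpdated += OnRecognitionTextUpdated;
    speechToText.RecognitionResultCompleted += OnRecognitionTextCompleted;
    await speechToText.StartListenAsync(CultureInfo.GetCultureInfo("en-US"), cancellationToken);
}

async Task StopListening(CancellationToken cancellationToken)
{
    // The caller decides when recognition ends, instead of waiting on a timeout.
    await speechToText.StopListenAsync(cancellationToken);
    speechToText.RecognitionResultUpdated -= OnRecognitionTextUpdated;
    speechToText.RecognitionResultCompleted -= OnRecognitionTextCompleted;
}

void OnRecognitionTextUpdated(object? sender, SpeechToTextRecognitionResultUpdatedEventArgs args) =>
    RecognitionText += args.RecognitionResult;

void OnRecognitionTextCompleted(object? sender, SpeechToTextRecognitionResultCompletedEventArgs args) =>
    RecognitionText = args.RecognitionResult;
```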

@sej69
Author

sej69 commented Feb 29, 2024 via email

@VladislavAntonyuk
Collaborator

It depends on how you registered ISpeechToText, but if you register it correctly there is no difference.
Pull Request.
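For completeness, the registration the toolkit docs describe is a one-liner in MauiProgram; if it is missing or wrong, constructor injection will not resolve the same ISpeechToText:

```csharp
// In MauiProgram.CreateMauiApp(), per the toolkit docs: register the default
// implementation behind the interface so DI hands out the same instance.
builder.Services.AddSingleton<ISpeechToText>(SpeechToText.Default);
```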

@vchelaru

I have noticed that on Android, silence is automatically detected and listening ends successfully. On iOS, it does not. I added info there, but the issue was closed because I didn't have a sample:

#1723

@sej69
Author

sej69 commented Mar 15, 2024 via email
