
SpeechToText has horrendous performance issues and/or not working properly #1645

Closed
2 tasks done
sej69 opened this issue Jan 15, 2024 · 14 comments · Fixed by #1741
Labels
area/essentials, bug, unverified

Comments

@sej69

sej69 commented Jan 15, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Did you read the "Reporting a bug" section on Contributing file?

Current Behavior

I first wrote this up on Stack Overflow: https://stackoverflow.com/questions/77808446/communitytoolkit-speechtotext-performance-is-extremely-slow?noredirect=1#comment137188380_77808446

I also found this in your issue tracker, closed as unverified: https://github.com/CommunityToolkit/Maui/issues/1586. Using the code below, I have now verified that it is not operating as your documentation states it should.

SpeechToText is not detecting silence and only reports back after what appears to be a timeout. I created a .NET MAUI app using the following async method:

```csharp
async Task StartListener(CancellationToken cancellationToken)
{
    var isGranted = await speechToText.RequestPermissions(cancellationToken);
    if (!isGranted)
    {
        await Toast.Make("Permission not granted").Show(CancellationToken.None);
        return;
    }

    do
    {
        Stopwatch sw = Stopwatch.StartNew();

        var recognitionResult = await speechToText.ListenAsync(
            CultureInfo.GetCultureInfo(Language),
            new Progress<string>(partialText =>
            {
                RecognitionText += partialText + " ";
            }), cancellationToken);

        sw.Stop();

        if (recognitionResult.IsSuccessful)
        {
            RecognitionText = recognitionResult.Text;
            Debug.WriteLine("Success " + RecognitionText + " Time " + sw.Elapsed);
            RecognitionText = string.Empty;
        }
        else
        {
            Debug.WriteLine("failed - Time " + sw.Elapsed);
        }
    } while (!cancellationToken.IsCancellationRequested);
}
```

When I run this on my iPad, it loads and waits. I say, "This is a test," and then wait about a minute before it responds in the debug window with the success message and the time it took. It doesn't seem to matter whether I say the phrase right after starting or wait 30 seconds; it still takes about a minute.

[0:] Success This is a test Time 00:01:00.9459623

This method is unusable in its current state; people won't wait a minute for a response each time they speak. And according to your docs, this method should detect silence.

Expected Behavior

It should detect silence and return the recognized text when it does. You probably also want a way to adjust the timeout value, which doesn't appear to exist now.
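Until a configurable timeout exists, the best I can do is cap the wait myself by cancelling the token after a fixed interval. A minimal sketch of that workaround (my own idea, not toolkit API; the 10-second cap is arbitrary):

```csharp
// Workaround sketch: bound a single ListenAsync call ourselves by linking a
// timeout to the caller's token. CancelAfter fires the token after 10 seconds.
using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
timeoutCts.CancelAfter(TimeSpan.FromSeconds(10)); // arbitrary cap

try
{
    var recognitionResult = await speechToText.ListenAsync(
        CultureInfo.GetCultureInfo(Language),
        new Progress<string>(partialText => RecognitionText += partialText + " "),
        timeoutCts.Token);
}
catch (OperationCanceledException)
{
    // Depending on the platform implementation, cancellation may surface here
    // rather than as an unsuccessful result.
    Debug.WriteLine("Listen cancelled after timeout.");
}
```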

Steps To Reproduce

  1. Create a new .NET MAUI application
  2. Add the CommunityToolkit.Maui package
  3. Use the above code to test

Link to public reproduction project repository

https://github.com/sej69/TestSpeechRecognizer

Environment

- .NET MAUI CommunityToolkit:
- OS: iOS, running on an iPad through my Mac (set up through VS); the iPad is on the current version.
- .NET MAUI:

Anything else?

Using current versions of all libraries and MAUI (.NET 8). Very simple, basic project.

@sej69 sej69 added the bug and unverified labels on Jan 15, 2024
@bijington
Contributor

What iPad are you running this on?

@sej69
Author

sej69 commented Jan 16, 2024 via email

@bijington
Contributor

Sorry, what model and age is the iPad?

@vhugogarcia vhugogarcia added the area/essentials label on Jan 19, 2024
@sej69
Author

sej69 commented Jan 19, 2024 via email

@sej69
Author

sej69 commented Feb 7, 2024 via email

@sej69
Author

sej69 commented Feb 22, 2024

Playing around with this a bit more, I discovered that the OnRecognitionTextComplete event does fire on the iPad 10 if I hit a breakpoint in the OnRecognitionTextUpdated handler and continue execution. It does not fire on its own, though, if I take out that breakpoint.
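The fact that a breakpoint changes the outcome smells like a timing or threading race. One thing worth ruling out (purely speculative on my part, assuming the toolkit's RecognitionResultUpdated event and its RecognitionResult payload) is whether the handlers need to marshal UI-bound work onto the main thread:

```csharp
// Speculative check, not a confirmed fix: marshal handler work onto the main
// thread in case the recognition events fire on a background thread.
speechToText.RecognitionResultUpdated += (sender, args) =>
    MainThread.BeginInvokeOnMainThread(() => RecognitionText += args.RecognitionResult);
```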

@sej69
Author

sej69 commented Feb 28, 2024

This is running on an iPad 10, current / most recent hardware version.

The sample on this page: https://learn.microsoft.com/en-us/dotnet/communitytoolkit/maui/essentials/speech-to-text?tabs=android

seems to show ListenAsync returning a recognitionResult, but this never seems to happen. I created a test project using this code and set a breakpoint on `if (recognitionResult.IsSuccessful)`, and it never hits.

I thought it might have something to do with the interface I had set up per the documentation:

```csharp
public interface ISpeechToText
{
    Task RequestPermissions();

    Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}

// speech recognition code using the interface:

private ISpeechToText speechToText;

public SpeechListener(IAudioManager am, Iris.ISpeechToText stt)
{
    speechToText = stt;
}

// .....

var recognitionResult = await speechToText.Listen(CultureInfo.GetCultureInfo("en-us"),
    new Progress<string>(partialText =>
    {
        Debug.WriteLine("--- " + partialText + "---");

        //RecognitionText += partialText + " "; // demo code shows to use +=, but that duplicates incoming text.
        RecognitionText = partialText + " ";  // this doesn't duplicate the incoming text, but it does continue to report one additional time after the timer ends.
    }), cancellationToken);
```

But even when I switch to using SpeechToText.Default.ListenAsync(), e.g.:

```csharp
_cancellationToken = cancellationToken;

var recognitionResult = await SpeechToText.Default.ListenAsync(CultureInfo.GetCultureInfo("en-us"),
    new Progress<string>(partialText =>
    {
        Debug.WriteLine("--- " + partialText + "---");
        RecognitionText += partialText; // need to do += here or it won't add the additional words
    }), cancellationToken);

if (recognitionResult.IsSuccessful) // breakpoint here
{
    RecognitionText = recognitionResult.Text;
}
else
{
    await Toast.Make(recognitionResult.Exception?.Message ?? "Unable to recognize speech").Show(CancellationToken.None);
}
```

This doesn't end with a recognitionResult either; the call stays open and continues to recognize text.

To make matters worse, the SpeechToText.Default variant seems to work "better" than the ISpeechToText one. I've had to create timing routines because ListenAsync never returns anything; it just keeps listening and spitting out partial results.

When running the interface version (ISpeechToText), it keeps coming back and reporting the same text one more time, which wakes my timer routine again. I can write some code to get around this behavior, but I shouldn't have to...
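Concretely, my timing routine amounts to a silence watchdog around ListenAsync. A sketch of the idea (names and the two-second window are my own, not toolkit API):

```csharp
// Silence watchdog sketch: treat "no new partial result for silenceWindow"
// as end of speech and cancel the listen. The window length is arbitrary.
var silenceWindow = TimeSpan.FromSeconds(2);
using var watchdogCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);

var progress = new Progress<string>(partialText =>
{
    RecognitionText = partialText + " ";
    watchdogCts.CancelAfter(silenceWindow); // each partial pushes the deadline out
});

watchdogCts.CancelAfter(silenceWindow); // arm the watchdog before listening starts

try
{
    var recognitionResult = await speechToText.ListenAsync(
        CultureInfo.GetCultureInfo("en-us"), progress, watchdogCts.Token);
}
catch (OperationCanceledException)
{
    // Watchdog fired: the speaker went quiet, so use the last partial text.
    Debug.WriteLine("Silence detected: " + RecognitionText);
}
```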

I pulled the interface out, but then I ran into an issue with SpeechToText dropping the Bluetooth headset configuration set up in the AppDelegate:

```csharp
public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
{
    SetAudioSession();

    return base.FinishedLaunching(application, launchOptions);
}

public bool SetAudioSession()
{
    var audioSession = AVAudioSession.SharedInstance();
    var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord,
        AVAudioSessionCategoryOptions.AllowBluetooth |
        AVAudioSessionCategoryOptions.AllowAirPlay |
        AVAudioSessionCategoryOptions.DefaultToSpeaker);

    if (err != null)
        return false;

    err = audioSession.SetActive(true);

    if (err != null)
        return false;

    return true;
}
```

Bluetooth works when the ISpeechToText interface is used, but not with SpeechToText.Default.ListenAsync. The headphone icon on the iPad goes away until I exit the app, at which point it shows up again; meanwhile, all audio destined for the audioManager interface goes through the iPad only. However, the mic on the headphones still seems to work over Bluetooth. I've tracked it down to the ListenAsync method: if I comment it out, Bluetooth works; if I add it back in, it doesn't. If I change back to the ISpeechToText interface, it does work with my timing routines to get spoken text, but the method never returns.

I'd love to have ListenAsync detect silence and report back if possible. Or I can continue to use my timing routines, but I think something is broken with the API here.
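In case someone else hits the Bluetooth routing problem: since ListenAsync appears to reconfigure the shared audio session, one unverified workaround is to re-apply the category after the call returns, mirroring the SetAudioSession code above:

```csharp
#if IOS
// Unverified workaround: re-assert the Bluetooth-friendly category after
// ListenAsync returns, in case the toolkit reset the shared AVAudioSession.
var audioSession = AVFoundation.AVAudioSession.SharedInstance();
audioSession.SetCategory(AVFoundation.AVAudioSessionCategory.PlayAndRecord,
    AVFoundation.AVAudioSessionCategoryOptions.AllowBluetooth |
    AVFoundation.AVAudioSessionCategoryOptions.AllowAirPlay |
    AVFoundation.AVAudioSessionCategoryOptions.DefaultToSpeaker);
audioSession.SetActive(true);
#endif
```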

@sej69
Author

sej69 commented Feb 28, 2024

Another item you can look at: I downloaded the samples from https://github.com/CommunityToolkit/Maui/tree/main and pointed them at my iPad. They exhibit the same issue of not detecting the end of speech or returning to advance the recognitionResult.

@sej69
Author

sej69 commented Feb 28, 2024

Sorry, I should have tested the AppDelegate as well.

This:

```csharp
public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
{
    SetAudioSession();

    return base.FinishedLaunching(application, launchOptions);
}

public bool SetAudioSession()
{
    var audioSession = AVAudioSession.SharedInstance();
    var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord,
        AVAudioSessionCategoryOptions.AllowBluetooth |
        AVAudioSessionCategoryOptions.AllowAirPlay |
        AVAudioSessionCategoryOptions.DefaultToSpeaker);

    if (err != null)
        return false;

    err = audioSession.SetActive(true);

    if (err != null)
        return false;

    return true;
}
```

has the same effect with the sample code: the headset icon disappears and all audio still goes through the iPad, not the headphones.

@VladislavAntonyuk
Collaborator

VladislavAntonyuk commented Feb 29, 2024

  1. There is no difference between using the interface and SpeechToText.Default.
  2. Feel free to open a PR to include Bluetooth support for the speaker.
  3. Listen is designed for continuous listening. You can use the StartListening/StopListening methods to get control over the recognition process (see the sketch below).
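For anyone following along, here is a minimal sketch of that event-driven flow, assuming the StartListenAsync/StopListenAsync methods and the recognition events described in the toolkit docs:

```csharp
async Task StartListening(CancellationToken cancellationToken)
{
    var isGranted = await speechToText.RequestPermissions(cancellationToken);
    if (!isGranted)
    {
        await Toast.Make("Permission not granted").Show(CancellationToken.None);
        return;
    }

    // Partial results arrive via events instead of an IProgress callback.
    speechToText.RecognitionResultUpdated += OnRecognitionTextUpdated;
    speechToText.RecognitionResultCompleted += OnRecognitionTextCompleted;
    await speechToText.StartListenAsync(CultureInfo.GetCultureInfo("en-US"), cancellationToken);
}

async Task StopListening(CancellationToken cancellationToken)
{
    // The caller decides when recognition ends, instead of waiting on a timeout.
    await speechToText.StopListenAsync(cancellationToken);
    speechToText.RecognitionResultUpdated -= OnRecognitionTextUpdated;
    speechToText.RecognitionResultCompleted -= OnRecognitionTextCompleted;
}

void OnRecognitionTextUpdated(object? sender, SpeechToTextRecognitionResultUpdatedEventArgs args) =>
    RecognitionText += args.RecognitionResult;

void OnRecognitionTextCompleted(object? sender, SpeechToTextRecognitionResultCompletedEventArgs args) =>
    RecognitionText = args.RecognitionResult;
```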

@sej69
Author

sej69 commented Feb 29, 2024 via email

@VladislavAntonyuk
Collaborator

It depends on how you registered ISpeechToText, but if you register it correctly there is no difference.
Pull Request.
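For completeness, the registration the toolkit docs describe is a one-liner in MauiProgram; if it is missing or wrong, constructor injection will not resolve the same ISpeechToText:

```csharp
// In MauiProgram.CreateMauiApp(), per the toolkit docs: register the default
// implementation behind the interface so DI hands out the same instance.
builder.Services.AddSingleton<ISpeechToText>(SpeechToText.Default);
```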

@vchelaru

I have noticed that on Android, silence is automatically detected and listening ends successfully. On iOS, it does not. I added info there, but the issue was closed because I didn't have a sample:

#1723

@sej69
Author

sej69 commented Mar 15, 2024 via email
