Why is SpeechRecognition not working correctly in Safari? #96

Open
nicktoot opened this issue Jul 18, 2021 · 0 comments

I am using the browser's SpeechRecognition/webkitSpeechRecognition API for voice recognition. I need the microphone to listen for commands continuously once the page is open. When the user says one of the expected phrases, I change the state of a video (for example, play/pause) depending on the command.

On desktop Chrome this all works fine. The microphone starts listening once the page opens: rec.start() is called, the browser asks for permission to use the microphone on the page, and the user agrees. If the user says nothing, the browser stops recording on its own after 7-10 seconds of silence, but I restart it after it stops and that causes no problems.

A simplified example that works well on a desktop in Chrome:

let rec = null;
let isRecording = false;

function startRecording() {
  if (rec && !isRecording) rec.start();
}

function stopRecording() {
  if (rec && isRecording) rec.stop();
}


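// Checking the technology support in the browser
try {
  // Prefer the unprefixed constructor; fall back to the webkit-prefixed one
  const SpeechRecognitionImpl = window.SpeechRecognition || window.webkitSpeechRecognition;
  rec = new SpeechRecognitionImpl(); // throws if neither constructor exists
} catch (e) {
  console.log(e);
}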

// If the technology is supported, then we set up the recording
if (rec) {

  rec.continuous = true;
  rec.lang = 'en-US';
  rec.interimResults = false;

  // Event after receiving voice transcription as text
  rec.onresult = function(e) {
    // Code that compares what the user said with existing commands,
    // after which some action takes place with the video on the page
  }

  // The event is triggered after recording has started
  // rec.start();
  rec.onstart = function(e) {
    isRecording = true;
  }

  // Event after recording stopped
  // rec.stop(); or after an error occurs
  rec.onend = function(e) {
    isRecording = false;

    startRecording(); // Restart recording
  }

  // Event that occurs when audio recording fails
  rec.onerror = function(e) {
    console.log('Speech recognition error detected: ' + e.error);
  }

  // Start recording after page load
  startRecording();
  
}
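
One thing to watch in the example above: onend restarts unconditionally, and an end event also follows an error. If the user denies microphone access, this could turn into a restart loop. Below is a minimal sketch of a guard, assuming the restart should be skipped after the `not-allowed` and `service-not-allowed` error codes. `guardAutoRestart` and `FATAL_ERRORS` are hypothetical names, not part of the original code.

```javascript
// Error codes after which a restart is unlikely to succeed without user action.
// Which codes to treat as fatal is an assumption of this sketch.
const FATAL_ERRORS = new Set(['not-allowed', 'service-not-allowed']);

// Hypothetical helper: `rec` is a SpeechRecognition-like object and
// `restart` is whatever function should run when a session ends normally.
function guardAutoRestart(rec, restart) {
  let lastError = null;

  // Remember the most recent error so that onend can inspect it
  rec.onerror = function (e) {
    lastError = e.error;
    console.log('Speech recognition error detected: ' + e.error);
  };

  // Restart only if the session did not end because of a fatal error
  rec.onend = function () {
    const fatal = lastError !== null && FATAL_ERRORS.has(lastError);
    lastError = null;
    if (!fatal) restart();
  };
}
```

With the variables from the example, this could be wired as `guardAutoRestart(rec, function () { isRecording = false; startRecording(); })` in place of the separate onerror and onend handlers.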

In Safari on iOS, the onresult event fires for the first phrase the user says and then never fires again. The recording does not report an error (onerror does not fire) and does not stop (onend does not fire), and the address bar keeps showing the red microphone icon. Why is this happening?

I tried restarting the recording after processing the first phrase, and recognition started working again. Unfortunately it is not very stable: sometimes the microphone stops processing commands for other reasons, which I also cannot detect through the recognition object's built-in events. The only workaround I can think of is some kind of regular restart, for example on a timer, but I don't think that is the best option.

Changes that helped to listen to more than one phrase in Safari:

// Event after receiving voice transcription as text
rec.onresult = function(e) {
  // Code that compares what the user said with existing commands,
  // after which some action takes place with the video on the page

  // Restart recording
  restartRecording();
}
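
restartRecording() is called above but is not defined in the simplified example. Below is one possible shape, as a sketch only. It assumes that a restart means ending the current session with abort() and starting a new one from onend. `createRecognitionController` and `keepListening` are hypothetical names, and `rec` stands for the recognition object created earlier.

```javascript
// Hypothetical controller; `rec` is a SpeechRecognition-like object that
// exposes start(), stop(), abort() and the onstart/onend handler slots.
function createRecognitionController(rec) {
  let isRecording = false;
  let keepListening = false; // true while the page wants the mic to stay on

  rec.onstart = function () {
    isRecording = true;
  };

  // Runs whenever a session ends, for any reason
  rec.onend = function () {
    isRecording = false;
    if (keepListening) rec.start();
  };

  return {
    startRecording() {
      keepListening = true;
      if (!isRecording) rec.start();
    },
    stopRecording() {
      keepListening = false;
      if (isRecording) rec.stop();
    },
    // End the current session without waiting for a final result;
    // the onend handler above then starts a fresh one.
    restartRecording() {
      keepListening = true;
      if (isRecording) rec.abort();
      else rec.start();
    },
  };
}
```

The `keepListening` flag separates a deliberate stop from a restart, so stopRecording() does not immediately trigger a new session. Whether abort() or stop() behaves better in Safari is not something this sketch settles.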

I also noticed that if I am not using a headset with a microphone, the browser or device lowers the video's speaker volume to the minimum while recording is on. Is this expected behavior?


UPD
After many tests, I realized that the problem is not that only one phrase can be recognized, but a conflict between the video and SpeechRecognition. If SpeechRecognition is running and the video is played or paused, recognition breaks and stops returning results, although neither the error event nor the end event fires. If I restart the recording after that, everything works again. The hardest part is finding the right moment to restart. For now I use a timer that delays the restart by 3.5-5 seconds, because an earlier restart does not restore speech recognition, but that is a very long wait.
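
The delayed restart described in the update could look roughly like the sketch below. It assumes the restart is triggered by the video element's `play` and `pause` events and uses a 4000 ms delay picked from the 3.5-5 second range mentioned above. `restartAfterVideoToggle` and its parameters are hypothetical, and `restart` is whatever function restarts the recording.

```javascript
// Hypothetical helper: schedule a recognition restart some time after the
// video is played or paused. `timers` is injectable to ease testing and
// defaults to the global setTimeout/clearTimeout.
function restartAfterVideoToggle(video, restart, delayMs = 4000, timers = globalThis) {
  let handle = null;

  // Each new play/pause cancels the pending restart and schedules a new
  // one, so rapid toggling results in a single restart after the last event.
  function schedule() {
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(function () {
      handle = null;
      restart();
    }, delayMs);
  }

  video.addEventListener('play', schedule);
  video.addEventListener('pause', schedule);

  // Returns a cleanup function that removes the listeners and any pending timer
  return function dispose() {
    video.removeEventListener('play', schedule);
    video.removeEventListener('pause', schedule);
    if (handle !== null) timers.clearTimeout(handle);
    handle = null;
  };
}
```

This only packages the timer workaround. It does not remove the wait itself, and the right delay may differ between devices.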

