
Speech need to be primed using touch/tap in Safari #995

Open
rodmcleay opened this Issue May 30, 2018 · 15 comments

@rodmcleay

rodmcleay commented May 30, 2018

I have a Web Chat control for a bot that is up and running and working well in Chrome. The link How to enable speech in Web Chat shows how to set this up, and we have done it exactly like this.

It mentions multiple browsers, but does not mention Safari in any way.

We need this working on an iPhone, but it just doesn't seem to work, and there is not a lot of feedback from the browser; the icon changes, and the microphone appears to turn on after access is approved.

Nothing spoken is recorded or recognized, and the text area of the bot stays empty; there is no 'Listening...' or any other indication it's working, other than the red microphone icon in the browser header. Clicking the icon mutes and unmutes as you'd expect; it just doesn't seem to be connected to the Web Chat control in the browser.

All of my investigation appears to go around in circles.

  1. Has anyone achieved this with Web Chat, or with any other Direct Line component?
  2. Or can anyone confirm it definitely doesn't work with Safari, so I can stop banging my head against it?
  3. Are there any alternatives to Web Chat that do work on iPhone/Safari?

Thanks for taking the time to read this; any assistance would be much appreciated. I'm at the end of this investigation and pulling my hair out.

@compulim


Collaborator

compulim commented Jun 5, 2018

@rodmcleay, we have just tested it on an iPhone with iOS 11.4, running Safari and using Cognitive Services Speech. It works.

Can you check for a few things?

  1. Your iPhone is running iOS 11+
  2. You are using Safari, not the Chrome or Edge app
  3. Settings app > Safari > Camera & Microphone Access is enabled
  4. Your web site is on HTTPS; Safari blocks the microphone on insecure HTTP
  5. You are running on an iPhone
    • We tested that it doesn't run on an iPod with iOS 11.4; we haven't tested an iPad yet
  6. Your page is using Cognitive Services, not browser speech (a.k.a. the WebSpeech API)

I agree we need to make the speech detection more robust and informative. But we also need to make sure detection doesn't pop up the "Access to Microphone" dialog too early. Unfortunately, in some cases, you can't have both.
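As an editorial aside, most of the checklist above can be expressed as a small feature check that runs before the microphone dialog is ever triggered. This is a hypothetical sketch (the function name and the env shape are assumptions, not Web Chat API); in a real page you would feed it window.isSecureContext and navigator.mediaDevices:

```javascript
// Hypothetical helper (not part of Web Chat): report the first failed
// prerequisite for microphone access so the page can tell the user,
// instead of failing silently as described in this issue.
function checkSpeechPrereqs(env) {
    // env: { isSecureContext, hasMediaDevices, hasGetUserMedia }
    if (!env.isSecureContext) {
        return "Page must be served over HTTPS"; // Safari blocks the mic on HTTP
    }
    if (!env.hasMediaDevices || !env.hasGetUserMedia) {
        return "Browser has no microphone API"; // e.g. older iOS versions
    }
    return null; // prerequisites met; still subject to the user's permission
}
```

In the browser you would call it with the real values, e.g. checkSpeechPrereqs({ isSecureContext: window.isSecureContext, hasMediaDevices: !!navigator.mediaDevices, hasGetUserMedia: !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia) }). Because the check never touches getUserMedia itself, it does not pop up the permission dialog.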

@shubhamchawla


shubhamchawla commented Jun 5, 2018

How can I get it working on Chrome for iOS? Any help would be greatly appreciated.
Thanks in advance.

@compulim


Collaborator

compulim commented Jun 5, 2018

@shubhamchawla It doesn't work in Chrome for iOS because Chrome does not support WebRTC on iOS. The only browser on iOS that supports WebRTC right now is Safari.

@rodmcleay


rodmcleay commented Jun 6, 2018

Hi @compulim,
I don't mind the popup asking for access; that is understandable and expected.
I'm using Cognitive Services, as per the code below, and it is on HTTPS, working fine in Chrome on Windows and on Android phones. The iPhone is on iOS 11.3.1.

const speechOptions = {
    speechRecognizer: new CognitiveServices.SpeechRecognizer({
        fetchCallback: (authFetchEventId) => getToken(),
        fetchOnExpiryCallback: (authFetchEventId) => getToken()
    }),
    speechSynthesizer: new CognitiveServices.SpeechSynthesizer({
        gender: CognitiveServices.SynthesisGender.Female,
        subscriptionKey: '@System.Configuration.ConfigurationManager.AppSettings["CognitiveKey"]',
        voiceName: 'Microsoft Server Speech Text to Speech Voice (en-US, JessaRUS)'
    })
};

Is that the config you would expect?

getToken is on the client at the moment.

function getToken() {
    // Normally this token fetch is done from your secured backend to avoid exposing the API key, and this call
    // would be to your backend, or it would retrieve a token that was served as part of the original page.
    return fetch(
        'https://api.cognitive.microsoft.com/sts/v1.0/issueToken',
        {
            headers: {
                'Ocp-Apim-Subscription-Key': '@System.Configuration.ConfigurationManager.AppSettings["CognitiveKey"]'
            },
            method: 'POST'
        }
    ).then(res => res.text());
}
@rosskyl


Contributor

rosskyl commented Jun 8, 2018

I got it working with Safari and Firefox with the following JavaScript. I just include this in a JavaScript file while still using the linked CognitiveServices.js file from the CDN. I use the Bing speech recognizer and the browser speech synthesizer.

This works because the current version uses window.navigator.getUserMedia, which is being deprecated, so I change that to use window.navigator.mediaDevices.getUserMedia. Then, Safari has problems playing audio from the speech synthesizer programmatically, so I register an event on the microphone click to play a sound from the speech synthesizer and then remove that event. Finally, Safari also has problems recording audio programmatically, so I create the audio context before actually needing it and connect the processor. Safari doesn't allow recording audio or playing audio with the speech synthesizer unless it is a direct result of a touch or tap. This includes the .then part of the promise returned from window.navigator.mediaDevices.getUserMedia.

I've tested this with the latest versions of Chrome, Firefox, and Edge on Windows 10, Chrome on Android, and Safari on an iPad Pro. The only browser I haven't gotten it to work on is Internet Explorer.

// Necessary for safari
// Safari will only speak after speaking from a button click
var isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);

function SpeakText() {
    var msg = new SpeechSynthesisUtterance();
    window.speechSynthesis.speak(msg);

    document.getElementsByClassName("wc-mic")[0].removeEventListener("click", SpeakText);
}

if (isSafari) {

    window.addEventListener("load", function () {
        document.getElementsByClassName("wc-mic")[0].addEventListener("click", SpeakText);
    });
}

// Needed to change between the two audio contexts
var AudioContext = window.AudioContext || window.webkitAudioContext;

var context;
var processor;

// Overrides the base constructor to use a singleton like structure
// Needed for Safari
var BasePrototype = AudioContext.prototype;
AudioContext = function () {
    return context;
};
AudioContext.prototype = BasePrototype;

// Redirects the old-style getUserMedia to the new style, which is supported in more browsers
window.navigator.getUserMedia = function (constraints, successCallback, errorCallback) {
    context = new BasePrototype.constructor();
    processor = context.createScriptProcessor(1024, 1, 1);
    processor.connect(context.destination);

    window.navigator.mediaDevices.getUserMedia(constraints)
        .then(function (e) {
            successCallback(e);
        })
        .catch(function (e) {
            errorCallback(e);
        });
};
@compulim


Collaborator

compulim commented Jun 13, 2018

@rosskyl this is a good hack, without the need to touch the Web Chat code.

Can you explain the synthesis part a little more? Do you mean Safari requires a touch/tap for both the synthesis and the recognition parts?

@rosskyl


Contributor

rosskyl commented Jun 13, 2018

The first time you use either the speech synthesizer or the recognizer, it needs to be triggered by a user touch or tap. After the speech synthesis has been triggered once, I was able to get it to work without needing a touch or tap. Apple requires this to prevent the web page from automatically playing or recording audio, even though all of the other browsers allow it.

The speech synthesizer or recognizer will not work if it is triggered from a setTimeout or from the .then portion of a promise (which is what the newer version of getUserMedia uses). For getUserMedia, the AudioContext object must be created from the tap, and the processor created and connected from the tap. The recording itself can be done later.
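The two requirements described above, priming the synthesizer with a gesture and creating the AudioContext plus processor during that same gesture, can be combined into one tap handler. A minimal sketch follows, with the dependencies passed in as parameters so the sequencing is explicit; primeSafariAudio is a hypothetical name, not part of Web Chat:

```javascript
// Hypothetical sketch: everything here must run synchronously inside a
// user tap/click handler for Safari to allow later programmatic audio.
function primeSafariAudio(synth, UtteranceCtor, AudioContextCtor) {
    // 1. Speak an empty utterance so later programmatic synthesis works.
    synth.speak(new UtteranceCtor(""));

    // 2. Create the AudioContext and connect a processor during the tap,
    //    so recording can start later without another gesture.
    var context = new AudioContextCtor();
    var processor = context.createScriptProcessor(1024, 1, 1);
    processor.connect(context.destination);

    return context;
}

// In a page you would wire it to the first tap only, e.g.:
// document.addEventListener("click", function handler() {
//     primeSafariAudio(window.speechSynthesis, SpeechSynthesisUtterance,
//                      window.AudioContext || window.webkitAudioContext);
//     document.removeEventListener("click", handler);
// });
```

The key design point is that nothing in the function is deferred: no setTimeout, no promise .then, because those run outside the gesture's call stack and Safari then refuses the audio operations.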

@compulim


Collaborator

compulim commented Jun 13, 2018

@rosskyl Thanks for the explanation. I totally understand the recognizer requiring a tap/touch, but the synthesis part feels weird to me. I bet one doesn't need to tap/touch for WebAudio.

Anyway, if it's Apple's requirement, then we need to work with it. 😉

@rosskyl


Contributor

rosskyl commented Jun 13, 2018

You could try it without adding the event listener, but I couldn't get it to work without it. You could also write your own custom speech synthesizer and try it with WebAudio. I originally wrote my own that used the speech synthesizer, but it ended up with the same problem the BrowserSpeechSynthesizer had. I fixed it with the event listener and figured out that it worked with the BrowserSpeechSynthesizer as well.

@compulim compulim added Bug and removed investigating labels Jun 16, 2018

@compulim


Collaborator

compulim commented Jun 16, 2018

Thanks @rosskyl. I will make this a bug.

BTW, we are planning to polyfill the HTML WebSpeech API using Cognitive Services. Then we won't need to maintain two different APIs, and we can bring Cognitive Services to platforms that do not support WebSpeech (e.g. Edge, desktop Firefox).

As always, we welcome contributions, and we will take quality projects as dependencies.

Anyway, note to bug fixer:

  • Safari requires a touch/tap to enable both speech recognition and speech synthesis
  • We need to work around this; one possible move:
    1. On any touch/tap on Web Chat, synthesize an empty string to prime the browser

@compulim compulim added this to To do in (Deprecated) Ignite: Others via automation Jun 18, 2018

@compulim compulim changed the title from Bot Framework Webchat microphone speech not working in Safari to Speech need to be primed using touch/tap in Safari Jun 18, 2018

@serpino


serpino commented Aug 6, 2018

Hi @rosskyl,
I am using the chat, and without your JavaScript code the voice conversation works correctly, except on iOS.

If I add your code to the project, it gives me an error when I press the microphone. Can you please help me?

The error I get in Chrome is this:

export function __awaiter(thisArg, _arguments, P, generator) {
    return new (P || (P = Promise))(function (resolve, reject) {
        function fulfilled(value) { try { step(generator.next(value)); } catch (e) { reject(e); } }
        function rejected(value) { try { step(generator.throw(value)); } catch (e) { reject(e); } } // <-- the error points to this line
        function step(result) { result.done ? resolve(result.value) : new P(function (resolve) { resolve(result.value); }).then(fulfilled, rejected); }
        step((generator = generator.apply(thisArg, _arguments || [])).next());
    });
}
Uncaught (in promise) TypeError: Illegal invocation
    at MicAudioSource.TurnOn (MicAudioSource.ts:110)
    at MicAudioSource.Listen (MicAudioSource.ts:182)
    at MicAudioSource.Attach (MicAudioSource.ts:131)
    at Recognizer.Recognize (Recognizer.ts:97)
    at SpeechRecognizer.<anonymous> (SpeechRecognition.ts:153)
    at step (tslib.es6.js:91)
    at Object.next (tslib.es6.js:72)
    at tslib.es6.js:65
    at new Promise (<anonymous>)
    at Object.__awaiter (tslib.es6.js:61)

And in Firefox it is:
TypeError: 'get state' called on an object that does not implement interface BaseAudioContext

I am using Cognitive Services. What could be failing?

Thanks

@rosskyl


Contributor

rosskyl commented Aug 6, 2018

I believe that is because some of the internals of Cognitive Services changed. The following is what I currently use:

var isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);

function SpeakText() {
    var msg = new SpeechSynthesisUtterance();
    window.speechSynthesis.speak(msg);

    document.getElementsByClassName("wc-mic")[0].removeEventListener("click", SpeakText);
}

if (isSafari) {

    window.addEventListener("load", function () {
        document.getElementsByClassName("wc-mic")[0].addEventListener("click", SpeakText);
    });
}

// Needed to change between the two audio contexts
var AudioContext = window.AudioContext || window.webkitAudioContext;

// Sets the old style getUserMedia to use the new style that is supported in more browsers even though the framework uses the new style
if (window.navigator.mediaDevices && window.navigator.mediaDevices.getUserMedia && !window.navigator.getUserMedia) {
    window.navigator.getUserMedia = function (constraints, successCallback, errorCallback) {
        window.navigator.mediaDevices.getUserMedia(constraints)
            .then(function (e) {
                successCallback(e);
            })
            .catch(function (e) {
                errorCallback(e);
            });
    };
}

I have this working in all of the major browsers on Windows, Android, macOS, and iOS.
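A side note on the isSafari test used in both snippets: the regex matches user agents that contain "Safari" with no occurrence of "Chrome" or "Android" before it. It can be sanity-checked against representative user-agent strings (the strings below are illustrative excerpts, not authoritative):

```javascript
// Same regex as in the snippets above, wrapped so it can be exercised
// against arbitrary user-agent strings.
function isSafariUA(ua) {
    return /^((?!chrome|android).)*safari/i.test(ua);
}

// Illustrative user-agent excerpts (assumptions, trimmed for readability):
var iphoneSafari = "Mozilla/5.0 (iPhone; CPU iPhone OS 11_4 like Mac OS X) AppleWebKit/605.1.15 Version/11.0 Mobile/15E148 Safari/604.1";
var windowsChrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/67.0.3396.99 Safari/537.36";
```

One caveat: Chrome on iOS identifies itself with "CriOS" rather than "Chrome", so this test treats it as Safari. Since Chrome on iOS is also WebKit-based, that may be acceptable for this workaround.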

@serpino


serpino commented Aug 7, 2018

@rosskyl This works much better, at least for the rest of the browsers.

I have already made it work on any mobile device.

In the end, I did it the following way:
when a response to the user's message arrives, I call this function, passing the text of the response and the language in which it should be spoken.

function playMessage(msgText, locale) {
    var msg = new SpeechSynthesisUtterance();
    msg.text = msgText;
    msg.volume = 1;    // 0 to 1
    msg.rate = 1;      // 0.1 to 9
    msg.pitch = 1;     // 0 to 2, 1 = normal
    msg.lang = locale; // e.g. "en-US"
    speechSynthesis.speak(msg);
}

Apart from that, I do some other checks, such as whether the user is on a mobile device and whether the message came from the microphone or not.
Thank you very much!

@rosskyl


Contributor

rosskyl commented Sep 25, 2018

Just a note that this will only work with the browser speech synthesizer. It does not work with the Cognitive Services speech synthesizer.

I tried to prime it as above by creating an audio context and playing a tone, but that does not work. I can get the tone to play on the mic tap, but I can't get it to work programmatically.

@serpino


serpino commented Sep 26, 2018

@rosskyl
Right. I only use this process when Cognitive Services does not work, which depends on the browser. So I use both methods, depending on the browser.
