You talk to the computer. The computer talks to you. Brought to you by sweet sweet parens.
Clojure(Script) is great. Speech synthesis and recognition are fun and probably useful. The aim of vocloj is to make working with speech synthesis and recognition across different platforms simple. For now only native browser APIs are supported.
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web]
         '[cljs.core.async :refer [<!] :refer-macros [go-loop]])

;;; Recognition

(def recognizer (vocloj.web/create-recognizer {:continuous? true}))

(vocloj.core/listen recognizer #(println %))

;; Omitting a handler will return a core.async channel
(let [ch (vocloj.core/listen recognizer)]
  (go-loop []
    (println (<! ch))
    (recur)))

;;; Synthesis

(def synthesizer (vocloj.web/create-synthesizer))
(def voice-id "Alex")

(vocloj.core/speak synthesizer voice-id {:text "Hello World!"})
The web implementation is currently the only one supported. Synthesis is backed by the SpeechSynthesis API, and recognition by the SpeechRecognition API.
The main caveat of vocloj's web interface is that most browsers require a user interaction before synthesis and recognition can be initialized.
See the demo app for an example.
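As a minimal sketch of working around this restriction, initialization can be deferred until the user clicks a button. The `"start"` element id and the `init!` function name below are hypothetical, not part of vocloj's API:

```clojure
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web])

;; Hypothetical: create the synthesizer only after a user gesture, which
;; satisfies browser policies that gate audio behind user interaction.
(defn init! []
  (let [synthesizer (vocloj.web/create-synthesizer)]
    (vocloj.core/speak synthesizer "Alex" {:text "Ready!"})))

;; Assumes a <button id="start"> exists in the page.
(when-let [btn (.getElementById js/document "start")]
  (.addEventListener btn "click" init!))
```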
At the time of this writing, Chrome has by far the best support for speech recognition.
Recognition technically works in Safari, but it has degraded to the point of being nearly unusable.
See the browser compatibility table.
Speech synthesizers and speech recognizers implement a simple state machine protocol. Tracking state is useful for knowing when speech is occurring or when a recognizer is listening, and for tracking which voices are available for utterances. The vocloj.core interface supports several methods for accessing and manipulating that internal state. These are used largely by the implementations themselves, but can also be useful for adding effects such as logging.
add-effect
is used internally to effect change in response to state transitions. It can also be useful for monitoring internal changes or logging transitions.
(require '[vocloj.web :as web]
         '[vocloj.core :as core])

(def synthesizer
  (-> (web/create-synthesizer)
      (core/add-effect ::logger (fn [synth old-state new-state]
                                  (println new-state)))
      (core/add-effect ::resumed :paused :speaking #(println "speaking again"))))
current-state
Returns the current state of a state machine as a hash map. For instance, the web-based implementation of speech synthesis stores the available voices in state:
(require '[vocloj.web :as web]
         '[vocloj.core :as core])
(def synthesizer (web/create-synthesizer))
;; Get available voices
(def voices (-> synthesizer core/current-state :voices))
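A voice pulled from state could then be handed to speak. Note the `:name` key below is an assumption about the shape of the voice maps; inspect `(core/current-state synthesizer)` in your environment to confirm:

```clojure
(require '[vocloj.web :as web]
         '[vocloj.core :as core])

(def synthesizer (web/create-synthesizer))

;; Hypothetical: take the first available voice and use its name as the
;; voice id for speak. The :name key is assumed, not documented here.
(when-let [voice (first (:voices (core/current-state synthesizer)))]
  (core/speak synthesizer (:name voice) {:text "Hello from the first voice"}))
```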
Recognizers are started and stopped. The easiest way to use a recognizer is to create one and call vocloj.core/listen, either registering a callback or obtaining a core.async channel.
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web]
         '[cljs.core.async :refer [<!] :refer-macros [go-loop]])

(def recognizer (vocloj.web/create-recognizer {:continuous? true}))

(vocloj.core/listen recognizer #(println %))

;; Omitting a handler will return a core.async channel
(let [ch (vocloj.core/listen recognizer)]
  (go-loop []
    (println (<! ch))
    (recur)))
(vocloj.core/stop recognizer) ;; stop listening
See API docs for all functions and options.
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web])
(def synthesizer (vocloj.web/create-synthesizer))
(def voice-id "Alex")
(vocloj.core/speak synthesizer voice-id {:text "Hello World!"})
See API docs for all functions and options.
Microphone streams are started and stopped. The easiest way to use a stream is to create one and call vocloj.core/listen, either registering a callback or obtaining a core.async channel.
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web]
         '[cljs.core.async :refer [<!] :refer-macros [go-loop]])

(def stream (vocloj.web/create-microphone-stream))

(vocloj.core/listen stream #(println %))

;; Omitting a handler will return a core.async channel
(let [ch (vocloj.core/listen stream)]
  (go-loop []
    (println (<! ch))
    (recur)))
(vocloj.core/stop stream) ;; stop listening
The microphone stream is named for the MediaStream API. However, the current implementation delivers chunks of JavaScript Blobs on a channel via the MediaRecorder API.
Future iterations will target true streaming via AudioWorklets.
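As a sketch of consuming those chunks, the Blobs can be accumulated and joined into a single Blob once the stream is stopped. The `audio/webm` MIME type is an assumption (MediaRecorder's default format varies by browser), and `combined-audio` is a hypothetical helper:

```clojure
(require '[vocloj.core :as vocloj.core]
         '[vocloj.web :as vocloj.web]
         '[cljs.core.async :refer [<!] :refer-macros [go-loop]])

(def chunks (atom []))

(def stream (vocloj.web/create-microphone-stream))

;; Accumulate each Blob chunk as it arrives on the channel.
(let [ch (vocloj.core/listen stream)]
  (go-loop []
    (when-some [blob (<! ch)]
      (swap! chunks conj blob)
      (recur))))

;; Later, e.g. after (vocloj.core/stop stream), combine the chunks into
;; one Blob suitable for playback or upload.
(defn combined-audio []
  (js/Blob. (clj->js @chunks) #js {:type "audio/webm"}))
```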