diff --git a/.gitignore b/.gitignore
index 843268d11..807ffb5a4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
 .idea/
 binding/python/__pycache__/
+binding/python/*.pyc
 resources/porcupine/binding/python/__pycache__/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 6a88f9c4a..769376832 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,13 +1,14 @@
 ## Ideas for Contributing
 
-* Adding new language/platform bindings. JavaScript, maybe? When adding a new binding please do make sure it
-is tested. Adding accompanying unit test is a great way to assure that. Take a look at
-[binding/python/](/binding/python) to find out how to unit test new bindings.
+* Adding new language/platform bindings. When adding a new binding please do make sure it is tested. Adding an
+accompanying unit test is a great way to ensure that. Take a look at [binding/python/](/binding/python) to find out how
+to unit test new bindings.
 
-* Adding new demos. Feel free to add new demos showcasing Rhino's capabilities on new platforms. Even better,
-if you have a cool application idea using Rhino feel free to add it under [demo/](/demo). If you end up making a new
-repository for your application idea let us know and we'll be more than happy to provide a link to your project in
-Rhino's documentation.
+* Adding new demos. Feel free to add new demos showcasing Rhino's capabilities on new platforms. Even better, if you
+have a cool application idea using Rhino feel free to add it under [demo/](/demo). If you end up making a new repository
+for your application idea let us know and we'll be more than happy to provide a link to your project in Rhino's
+documentation.
 
 * Adding tutorials. Step-by-step tutorials are a great way of sharing knowledge with the community. These are extremely
-helpful especially when some hardware setup is involved (e.g. Raspberry Pi project). These can go under [docs]() directory.
+helpful especially when some hardware setup is involved (e.g. Raspberry Pi project). These can go under the [docs]()
+directory.
diff --git a/README.md b/README.md
index 93a5fec74..f8e4b35b1 100644
--- a/README.md
+++ b/README.md
@@ -2,17 +2,17 @@
 
 Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 
-Rhino is Picovoice's Speech-to-Intent engine. It translates speech commands into structured data representing user's
-intention. For example, given a speech command *Can I have a small double-shot espresso with two sugars and no milk* it
-will infer the intent and outputs the following structured data that can be used to take an action.
+Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from speech commands within a given context of
+interest in real time. For example, given a speech command "*Can I have a small double-shot espresso with a lot of sugar
+and some milk*" it infers that the user wants to *order a drink* with the following specific requirements.
 
 ```json
 {
-    "product": "espresso",
+    "type": "espresso",
     "size": "small",
-    "# shots": "double shot",
-    "sugar": "two sugars",
-    "milk": "no milk"
+    "numberOfShots": "2",
+    "sugar": "a lot",
+    "milk": "some"
 }
 ```
@@ -21,39 +21,86 @@ Rhino is
 
 * intuitive. It allows users to utter their intention in a natural and conversational fashion.
 * using deep neural networks trained in **real-world situations**.
 * compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 100 KB of RAM.
-* cross-platform. Currently **Android**, **iOS**, **Raspberry Pi**, **ARM Cortex-A**, **ARM Cortex-M**, and
-a growing number of embedded platforms are supported.
-* customizable. It can be customized for any given domain (set of commands).
+* cross-platform. It is implemented in fixed-point ANSI C. Currently **ARM Cortex-M**, **ARM Cortex-A**,
+**Raspberry Pi**, **Android**, **iOS**, **watchOS**, **Linux**, **Mac**, **Windows**, and **WebAssembly** are supported.
+* customizable. It can be customized for any given domain.
+
+NOTE: Currently only Linux and Raspberry Pi builds are available to the open-source community. We plan to make other
+platforms available in upcoming releases.
 
 ## Table of Contents
 
 * [Try It Out](#try-it-out)
 * [Motivation](#motivation)
+* [Terminology](#terminology)
+    * [Context](#context)
+    * [Expression](#expression)
+    * [Intent](#intent)
+    * [Slot](#slot)
 * [Structure of Repository](#structure-of-repository)
 * [Running Demo Applications](#running-demo-applications)
     * [Running Python Demo Application](#running-python-demo-application)
 * [Integration](#integration)
-    * [Python](#python)
     * [C](#c)
+    * [Python](#python)
 * [Releases](#releases)
 * [License](#license)
 
 ## Try It Out
 
-Try out Rhino using its [interactive web demo](https://picovoice.ai/products/#speech-to-intent-demo). You need a working microphone.
+Try out Rhino using its [interactive web demo](https://picovoice.ai/products/#speech-to-intent-demo). You need a working
+microphone.
 
 ## Motivation
 
-A good number of use-cases when building voice-enabled products revolves around understanding speech commands within a
-specific (limited) domain. For example, smart home alliances, mobile applications, etc. Rhino is a tight combination of
-speech-to-text and natural-language-understanding engines that are optimized to work for a specific domain. Rhino is quite
-lean and can run on small embedded processors with very limited RAM (as low as 100 KB) making it ideal for IoT applications.
-Furthermore, it can understand potentially unlimited number of commands within a specific domain. For example for coffee maker
-example above it can correctly recognize the following commands
+A significant number of use-cases when building voice-enabled products revolve around understanding spoken commands
+within a specific domain. Smart home, appliances, infotainment systems, and command and control for mobile applications
+are a few examples. The current solutions use a domain-specific natural language understanding (NLU) engine on top of a
+generic speech recognition system. This approach is computationally expensive and, if not delegated to cloud services,
+requires significant CPU and memory for an on-device implementation.
+
+Rhino solves this problem by providing tightly-coupled speech recognition and NLU engines that are jointly optimized
+for a specific domain (use case). Rhino is quite lean and can even run on small embedded processors
+(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 100 KB), making it ideal for
+resource-constrained IoT applications.
+
+## Terminology
+
+Below we define a set of terms that form the main ideas behind how Rhino functions.
+
+### Context
+
+A context defines the set of spoken commands that users of the application might say. Additionally, it maps each spoken
+command to the user's intent. For example, when building a smart lighting system, the following are a few examples
+of spoken commands:
+
+* Turn off the lights.
+* Make the bedroom light darker.
+* Set the lights in the living room to purple.
+* ...
+
+### Expression
+
+A context is made of a collection of spoken commands mapped to the user's intent. An expression defines a mapping
+between one or more spoken commands and their corresponding intent. For example:
+
+* {turnCommand} the lights. -> {turnIntent}
+* Make the {location} light {intensityChange}. -> {changeIntensityIntent}
+* Set the lights in the {location} to {color}. -> {setColorIntent}
+
+The tokens within curly braces represent variables in spoken commands. They are either the user's intent (e.g. turnIntent)
+or the intent's details (e.g. location). More on this below.
 
-* can I have a latte?
-* make me a single-shot espresso.
-* I want a triple-shot americano with milk.
-* may I have a large cappuccino with cream?
+### Intent
+
+An intent represents what a user wants to accomplish with a spoken command. For example the intent of the phrase
+"*Set the lights in the living room to purple*" is to set the color of lights. In order to take action based on this,
+we might need more information, such as which light to change or the desired color. More on this below.
+
+### Slot
+
+A slot represents the details of the user's intent. For example the intent of the phrase
+"*Set the lights in the living room to purple*" is to set the color of lights, and the slots are the location (living
+room) and the color (purple).
 
 ## Structure of Repository
 
@@ -67,142 +114,99 @@ applications within the repository.
 
 ### Running Python Demo Application
 
-This demo application allows testing Rhino using computer's microphone. It opens an input audio stream, monitors it
-using [Porcupine's](https://github.com/Picovoice/Porcupine) library, and when the wake phrase is detected it will extract
-the intention within the follow-up command.
+This [demo application](/demo/python) allows testing Rhino using the computer's microphone. It opens an input audio
+stream, monitors it using the [Porcupine](https://github.com/Picovoice/Porcupine) wake word detection engine, and when
+the wake phrase is detected it will extract the intent within the follow-up spoken command using Rhino.
 
-The following runs the Rhino engine to translate speech commands in the context of a *coffee maker machine*.
-Also, it initializes the Porcupine engine to detect the wake phrase *Alfred*. When the wake phrase is detected the Rhino
-starts processing the following speech command and prints out the inferred attributes and their values on the console.
+The following runs the demo application on a *Linux* machine to infer intent from spoken commands in the context of a
+*coffee maker*. It also initializes the Porcupine engine to detect the wake phrase *Hey Alfred*. When the wake
+phrase is detected, Rhino starts processing the follow-up spoken command and prints out the inferred intent and slot
+values on the console.
 ```bash
-python demo/python/rhino_demo.py --rhino_context_file_path=resources/contexts/coffee_maker.pv \
---porcupine_keyword_file_path=resources/porcupine/resources/keyword_files/alfred_linux.ppn
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/linux/x86_64/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/linux/coffee_maker_linux.rhn \
+--porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
 ```
 
-The following command runs the speech to intent engine within a *smart light* domain with wake phrase set to *Rachel*.
-
+The following runs the engine on a *Raspberry Pi 3* to infer intent within the context of a *coffee maker*.
 ```bash
-python demo/python/rhino_demo.py --rhino_context_file_path=resources/contexts/smart_light.pv \
---porcupine_keyword_file_path=resources/porcupine/resources/keyword_files/rachel_linux.ppn
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/raspberry-pi/cortex-a53/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn \
+--porcupine_library_path ./resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
 ```
 
 ## Integration
 
 Below are code snippets showcasing how Rhino can be integrated into different applications.
 
-### Python
-
-[rhino.py](/binding/python/rhino.py) provides a Python binding for Rhino library. Below is a quick demonstration of how
-to construct an instance of it.
-
-```python
-library_path = ... # absolute path to Rhino's dynamic library
-model_file_path = ... # available at lib/common/rhino_params.pv
-context_file_path = ... # absolute path to Rhino's context file for the domain of interest
-
-rhino = Rhino(
-    library_path=library_path,
-    model_file_path=model_file_path,
-    context_file_path=context_file_path)
-```
-
-When initialized, valid sample rate can be obtained using `rhino.sample_rate`. Expected frame length
-(number of audio samples in an input array) is `rhino.frame_length`. The object can be used to monitor incoming audio as
-below.
-
-```python
-def get_next_audio_frame():
-    # implement the logic to get the next frame of audio
-
-is_finalized = False
-
-while not is_finalized:
-    is_finalized = rhino.process(get_next_audio_frame())
-
-    if is_finalized:
-        if rhino.is_understood():
-            for attribute in rhino.get_attributes():
-                attribute_value = rhino.get_attribute_value(attribute)
-
-            # logic to take action based on attributes and their values
-        else:
-            # logic to handle unsupported command
-```
-
-Finally, when done be sure to explicitly release the resources as the binding class does not rely on the garbage
-collector.
-
-```python
-rhino.delete()
-```
-
 ### C
 
-Rhinos is implemented in ANSI C and therefore can be directly linked to C applications.
-[pv_rhino.h](/include/pv_rhino.h) header file contains relevant information. An instance of Rhino object can be
-constructed as follows.
+Rhino is implemented in ANSI C and therefore can be directly linked to C applications.
+The [pv_rhino.h](/include/pv_rhino.h) header file contains relevant information. An instance of the Rhino object can be
+constructed as follows.
 
 ```c
 const char *model_file_path = ... // available at lib/common/rhino_params.pv
-const char *context_file_path = ... // absolute path to Rhino's context file for the domain of interest
-
-pv_rhino_object_t *handle;
-const pv_status_t status = pv_rhino_init(model_file_path, context_file_path, &handle);
+const char *context_file_path = ... // absolute path to context file for the domain of interest
+
+pv_rhino_object_t *rhino;
+const pv_status_t status = pv_rhino_init(model_file_path, context_file_path, &rhino);
 if (status != PV_STATUS_SUCCESS) {
-    // error handling logic goes here
+    // add error handling code
 }
 ```
 
-Now the `handle` can be used to monitor incoming audio stream. Rhino accepts single channel, 16-bit PCM audio. The
-sample rate can be retrieved using `pv_sample_rate()`. Finally, Rhino accepts input audio in consecutive chunks
-(aka frames) the length of each frame can be retrieved using `pv_rhino_frame_length()`.
+Now the handle `rhino` can be used to infer intent from an incoming audio stream. Rhino accepts single-channel, 16-bit
+PCM audio. The sample rate can be retrieved using `pv_sample_rate()`. Finally, Rhino accepts input audio in consecutive
+chunks (frames); the length of each frame can be retrieved using `pv_rhino_frame_length()`.
 
 ```c
 extern const int16_t *get_next_audio_frame(void);
-
+
 while (true) {
     const int16_t *pcm = get_next_audio_frame();
-
+
     bool is_finalized;
-    pv_status_t status = pv_rhino_process(handle, pcm, &is_finalized);
+    pv_status_t status = pv_rhino_process(rhino, pcm, &is_finalized);
     if (status != PV_STATUS_SUCCESS) {
-        // error handling logic goes here
+        // add error handling code
     }
-
+
     if (is_finalized) {
         bool is_understood;
-        status = pv_rhino_is_understood(handle, &is_understood);
+        status = pv_rhino_is_understood(rhino, &is_understood);
         if (status != PV_STATUS_SUCCESS) {
-            // error handling logic goes here
+            // add error handling code
         }
-
+
         if (is_understood) {
-            int num_attribtes;
-            char **attributes;
-            status = pv_rhino_get_attributes(handle, num_attributes, attributes);
+            const char *intent;
+            int num_slots;
+            const char **slots;
+            const char **values;
+            status = pv_rhino_get_intent(rhino, &intent, &num_slots, &slots, &values);
             if (status != PV_STATUS_SUCCESS) {
-                // error handling logic goes here
-            }
-
-            for (int i = 0; i < num_attributes; i++) {
-                char *attribute_value;
-                status = pv_rhino_get_attribute_value(handle, attributes[i], &attribute_value)
-                if (status != PV_STATUS_SUCCESS) {
-                    // error handling logic goes here
-                }
-
-                // logic to take an action based on attribute value
+                // add error handling code
             }
-
-            free(attributes);
-        }
-        else {
-            // logic to handle out of context commands
+
+            // add code to take action based on inferred intent and slot values
+
+            free(slots);
+            free(values);
+        } else {
+            // add code to handle unsupported commands
         }
-
-        pv_rhino_reset(handle);
+
+        pv_rhino_reset(rhino);
     }
 }
 ```
@@ -210,12 +214,62 @@ while (true) {
 
 When done be sure to release the resources acquired.
 
 ```c
-pv_rhino_delete(handle);
+pv_rhino_delete(rhino);
+```
+
+### Python
+
+[rhino.py](/binding/python/rhino.py) provides a Python binding for the Rhino library. Below is a quick demonstration of
+how to construct an instance of it.
+
+```python
+library_path = ... # absolute path to Rhino's dynamic library
+model_file_path = ... # available at lib/common/rhino_params.pv
+context_file_path = ... # absolute path to context file for the domain of interest
+
+rhino = Rhino(
+    library_path=library_path,
+    model_file_path=model_file_path,
+    context_file_path=context_file_path)
 ```
 
+When initialized, the valid sample rate can be obtained using `rhino.sample_rate`. Expected frame length
+(number of audio samples in an input array) is `rhino.frame_length`. The object can be used to infer intent from spoken
+commands as below.
+
+```python
+def get_next_audio_frame():
+    # add code to get the next audio frame
+    pass
-##Releases
+
+while True:
+    is_finalized = rhino.process(get_next_audio_frame())
+
+    if is_finalized:
+        if rhino.is_understood():
+            intent, slot_values = rhino.get_intent()
+            # add code to take action based on inferred intent and slot values
+        else:
+            # add code to handle unsupported commands
+            pass
+
+        rhino.reset()
+```
+
+Finally, when done be sure to explicitly release the resources as the binding class does not rely on the garbage
+collector.
+
+```python
+rhino.delete()
+```
+
+## Releases
+
+### v1.1.0 December 23rd, 2018
+
+* Accuracy improvements.
+* Open-sourced Raspberry Pi build.
 
 ### v1.0.0 November 2nd, 2018
 
@@ -224,6 +278,7 @@ pv_rhino_delete(handle);
 
 ## License
 
 Everything in this repository is licensed under Apache 2.0 including the contexts available under
-[resources/contexts](/resources/contexts). Custom contexts are only provided with the purchase of the commercial license.
-In order to inquire about the commercial license send an email to contact@picovoice.ai with a brief description of your
-use case.
+[resources/contexts](/resources/contexts).
+
+Custom contexts are only provided with the purchase of the commercial license. In order to inquire about the commercial
+license, [contact us](https://picovoice.ai/company/#contact-us).
diff --git a/binding/README.md b/binding/README.md
new file mode 100644
index 000000000..378e15ed3
--- /dev/null
+++ b/binding/README.md
@@ -0,0 +1 @@
+If you'd like to add a binding please submit a pull request.
diff --git a/binding/python/README.md b/binding/python/README.md
new file mode 100644
index 000000000..acc3c2de1
--- /dev/null
+++ b/binding/python/README.md
@@ -0,0 +1,23 @@
+# Prerequisites
+
+Python 3.5 or higher is required to use the binding and run its accompanying unit tests.
+
+The unit test uses [PySoundFile](https://pypi.python.org/pypi/PySoundFile) for reading audio test files. It can be
+installed using
+
+```bash
+pip install pysoundfile
+```
+
+# Running Unit Tests
+
+Using the command line (from the root of the repository)
+
+```bash
+python binding/python/test_rhino.py
+```
+
+# Binding Class
+
+Rhino's Python binding uses [ctypes](https://docs.python.org/3.5/library/ctypes.html) to access Rhino's C
+library. For example usage refer to the [Rhino demo application](/demo/python/rhino_demo.py).
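+
+Below is a minimal usage sketch showing how the binding can process a pre-recorded audio file. It assumes the script is
+run from the root of the repository on a Linux x86_64 machine; adapt the file paths for your platform.
+
+```python
+import soundfile
+
+from rhino import Rhino
+
+rhino = Rhino(
+    library_path='lib/linux/x86_64/libpv_rhino.so',
+    model_file_path='lib/common/rhino_params.pv',
+    context_file_path='resources/contexts/linux/coffee_maker_linux.rhn')
+
+# Rhino expects single-channel, 16-bit linearly-encoded audio sampled at 'rhino.sample_rate'.
+audio, sample_rate = soundfile.read('resources/audio_samples/test_within_context.wav', dtype='int16')
+assert sample_rate == rhino.sample_rate
+
+# Feed the audio to the engine in frames of 'rhino.frame_length' samples until intent extraction is finalized.
+num_frames = len(audio) // rhino.frame_length
+for i in range(num_frames):
+    frame = audio[i * rhino.frame_length:(i + 1) * rhino.frame_length]
+    if rhino.process(frame):
+        if rhino.is_understood():
+            intent, slot_values = rhino.get_intent()
+            print(intent, slot_values)
+        break
+
+# The binding does not rely on the garbage collector; release resources explicitly.
+rhino.delete()
+```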
diff --git a/binding/python/rhino.py b/binding/python/rhino.py
index 3db5bcfd0..e881e7371 100644
--- a/binding/python/rhino.py
+++ b/binding/python/rhino.py
@@ -20,7 +20,7 @@
 
 class Rhino(object):
-    """Python binding for Picovoice's Speech to Intent (a.k.a Rhino) library."""
+    """Python binding for Picovoice's Speech to Intent (a.k.a Rhino) engine."""
 
     class PicovoiceStatuses(Enum):
         """Status codes corresponding to 'pv_status_t' defined in 'include/picovoice.h'"""
@@ -31,13 +31,15 @@ class PicovoiceStatuses(Enum):
         INVALID_ARGUMENT = 3
         STOP_ITERATION = 4
         KEY_ERROR = 5
+        INVALID_STATE = 6
 
     _PICOVOICE_STATUS_TO_EXCEPTION = {
         PicovoiceStatuses.OUT_OF_MEMORY: MemoryError,
         PicovoiceStatuses.IO_ERROR: IOError,
         PicovoiceStatuses.INVALID_ARGUMENT: ValueError,
         PicovoiceStatuses.STOP_ITERATION: StopIteration,
-        PicovoiceStatuses.KEY_ERROR: KeyError
+        PicovoiceStatuses.KEY_ERROR: KeyError,
+        PicovoiceStatuses.INVALID_STATE: RuntimeError
     }
 
     class CRhino(Structure):
@@ -48,12 +50,13 @@ def __init__(self, library_path, model_file_path, context_file_path):
         Constructor.
 
         :param library_path: Absolute path to Rhino's dynamic library.
-        :param model_file_path: Absolute path to Rhino's model parameter file.
-        :param context_file_path: Absolute path to Rhino's context file.
+        :param model_file_path: Absolute path to file containing model parameters.
+        :param context_file_path: Absolute path to file containing context parameters. A context represents the set of
+        expressions (commands), intents, and intent arguments (slots) within a domain of interest.
         """
 
         if not os.path.exists(library_path):
-            raise ValueError("couldn't find library path at '%s'" % library_path)
+            raise ValueError("couldn't find library at '%s'" % library_path)
 
         library = cdll.LoadLibrary(library_path)
 
@@ -85,121 +88,144 @@ def __init__(self, library_path, model_file_path, context_file_path):
         self._is_understood_func.argtypes = [POINTER(self.CRhino), POINTER(c_bool)]
         self._is_understood_func.restype = self.PicovoiceStatuses
 
-        self._get_num_attributes_func = library.pv_rhino_get_num_attributes
-        self._get_num_attributes_func.argtypes = [POINTER(self.CRhino), POINTER(c_int)]
-        self._get_num_attributes_func.restype = self.PicovoiceStatuses
-
-        self._get_attribute_func = library.pv_rhino_get_attribute
-        self._get_attribute_func.argtypes = [POINTER(self.CRhino), c_int, POINTER(c_char_p)]
-        self._get_attribute_func.restype = self.PicovoiceStatuses
-
-        self._get_attribute_value_func = library.pv_rhino_get_attribute_value
-        self._get_attribute_value_func.argtypes = [POINTER(self.CRhino), c_char_p, POINTER(c_char_p)]
-        self._get_attribute_value_func.restype = self.PicovoiceStatuses
+        self._get_intent_func = library.pv_rhino_get_intent
+        self._get_intent_func.argtypes = [
+            POINTER(self.CRhino),
+            POINTER(c_char_p),
+            POINTER(c_int),
+            POINTER(POINTER(c_char_p)),
+            POINTER(POINTER(c_char_p))]
+        self._get_intent_func.restype = self.PicovoiceStatuses
 
         self._reset_func = library.pv_rhino_reset
         self._reset_func.argtypes = [POINTER(self.CRhino)]
         self._reset_func.restype = self.PicovoiceStatuses
 
+        context_expressions_func = library.pv_rhino_context_expressions
+        context_expressions_func.argtypes = [POINTER(self.CRhino), POINTER(c_char_p)]
+        context_expressions_func.restype = self.PicovoiceStatuses
+
+        expressions = c_char_p()
+        status = context_expressions_func(self._handle, byref(expressions))
+        if status is not self.PicovoiceStatuses.SUCCESS:
+            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('getting expressions failed')
+
+        self._context_expressions = expressions.value.decode('utf-8')
+
+        version_func = library.pv_rhino_version
+        version_func.argtypes = []
+        version_func.restype = c_char_p
+        self._version = version_func().decode('utf-8')
+
         self._frame_length = library.pv_rhino_frame_length()
 
         self._sample_rate = library.pv_sample_rate()
 
+    def delete(self):
+        """Releases resources acquired by Rhino's library."""
+
+        self._delete_func(self._handle)
+
     def process(self, pcm):
         """
-        Processes a frame of audio.
+        Processes a frame of audio and emits a flag indicating if the engine has finalized intent extraction. When
+        finalized, 'self.is_understood()' should be called to check if the command was valid
+        (is within context of interest).
 
-        :param pcm: An array (or array-like) of consecutive audio samples. For more information regarding required audio
-        properties (i.e. sample rate, number of channels encoding, and number of samples per frame) please refer to
-        'include/pv_rhino.h'.
+        :param pcm: A frame of audio samples. The number of samples per frame can be attained by calling
+        'self.frame_length'. The incoming audio needs to have a sample rate equal to 'self.sample_rate' and be 16-bit
+        linearly-encoded. Furthermore, Rhino operates on single channel audio.
 
-        :return: A flag if the engine has finalized intent extraction.
+        :return: Flag indicating whether the engine has finalized intent extraction.
         """
 
-        assert len(pcm) == self.frame_length
+        if len(pcm) != self.frame_length:
+            raise ValueError("invalid frame length")
+
         is_finalized = c_bool()
         status = self._process_func(self._handle, (c_short * len(pcm))(*pcm), byref(is_finalized))
         if status is not self.PicovoiceStatuses.SUCCESS:
-            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Processing failed')
+            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('processing failed')
 
         return is_finalized.value
 
     def is_understood(self):
         """
-        Indicates weather the engine understood the intent within speech command.
+        Indicates if the spoken command is valid, is within the domain of interest (context), and the engine understood
+        it.
 
-        :return: Flag indicating if the engine understood the intent.
+        :return: Flag indicating if the spoken command is valid, is within the domain of interest (context), and the
+        engine understood it.
         """
 
         is_understood = c_bool()
         status = self._is_understood_func(self._handle, byref(is_understood))
         if status is not self.PicovoiceStatuses.SUCCESS:
-            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Processing failed')
+            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('failed to verify if the spoken command is understood')
 
         return is_understood.value
 
-    def get_attributes(self):
+    def get_intent(self):
         """
-        Retrieves the attributes within the speech command.
+        Getter for the intent inferred from the spoken command. The intent is presented as an intent string and a
+        dictionary mapping slots to their values. It should be called only after intent extraction is finalized and it
+        is verified that the spoken command is valid and understood via calling 'self.is_understood()'.
 
-        :return: Inferred attributes.
+        :return: Tuple of intent string and slot dictionary.
""" - num_attributes = c_int() - status = self._get_num_attributes_func(self._handle, byref(num_attributes)) + intent = c_char_p() + num_slots = c_int() + slots = POINTER(c_char_p)() + values = POINTER(c_char_p)() + status = self._get_intent_func( + self._handle, + byref(intent), + byref(num_slots), + byref(slots), + byref(values)) if status is not self.PicovoiceStatuses.SUCCESS: - raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Getting number of attributes failed') - - attributes = list() - - for i in range(num_attributes.value): - attribute = c_char_p() - status = self._get_attribute_func(self._handle, i, byref(attribute)) - if status is not self.PicovoiceStatuses.SUCCESS: - raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Getting attribute failed') + raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('getting intent failed') - attributes.append(attribute.value.decode('utf-8')) + slot_values = dict() + for i in range(num_slots.value): + slot_values[slots[i].decode('utf-8')] = values[i].decode('utf-8') - return set(attributes) + return intent.value.decode('utf-8'), slot_values - def get_attribute_value(self, attribute): + def reset(self): """ - Retrieves the value of a given attribute. - - :param attribute: Attribute. - :return: Attribute's value. + Resets the internal state of the engine. It should be called before the engine can be used to infer intent from + a new stream of audio. """ - attribute_value = c_char_p() - status = self._get_attribute_value_func( - self._handle, - create_string_buffer(attribute.encode('utf-8')), - byref(attribute_value)) + status = self._reset_func(self._handle) if status is not self.PicovoiceStatuses.SUCCESS: - raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Getting attribute value failed') + raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('reset failed') - return attribute_value.value.decode('utf-8') - - def reset(self): - """Reset's the internal state of Speech to Intent engine.""" + @property + def context_expressions(self): + """ + Getter for expressions. Each expression maps a set of spoken phrases to an intent and possibly a number of slots + (intent arguments). 
+        """
 
-        status = self._reset_func(self._handle)
-        if status is not self.PicovoiceStatuses.SUCCESS:
-            raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Reset failed')
+        return self._context_expressions
 
-    def delete(self):
-        """Releases resources acquired by Rhino's library."""
+    @property
+    def version(self):
+        """Getter for version string."""
 
-        self._delete_func(self._handle)
+        return self._version
 
     @property
     def frame_length(self):
-        """Number of audio samples per frame expected by C library."""
+        """Getter for length (number of audio samples) per frame."""
 
         return self._frame_length
 
     @property
     def sample_rate(self):
-        """Audio sample rate accepted by Rhino library."""
+        """Audio sample rate accepted by Picovoice."""
 
         return self._sample_rate
diff --git a/binding/python/test_rhino.py b/binding/python/test_rhino.py
index d4ebc11f6..f160e0f4c 100644
--- a/binding/python/test_rhino.py
+++ b/binding/python/test_rhino.py
@@ -1,93 +1,113 @@
 import os
+import platform
 import unittest
 
 import soundfile
 
-from .rhino import Rhino
+from rhino import Rhino
 
 
 class RhinoTestCase(unittest.TestCase):
-    def test_within_context(self):
-        rhino = Rhino(
-            library_path=self._library_path,
-            model_file_path=self._abs_path('lib/common/rhino_params.pv'),
-            context_file_path=self._abs_path('resources/contexts/coffee_maker.pv'))
+    rhino = None
+
+    @classmethod
+    def setUpClass(cls):
+        cls.rhino = Rhino(
+            library_path=cls._library_path(),
+            model_file_path=cls._abs_path('lib/common/rhino_params.pv'),
+            context_file_path=cls._context_file_path())
+
+    @classmethod
+    def tearDownClass(cls):
+        if cls.rhino is not None:
+            cls.rhino.delete()
+
+    def tearDown(self):
+        self.rhino.reset()
 
-        audio, sample_rate = soundfile.read(
-            self._abs_path('resources/audio_samples/test_within_context.wav'),
-            dtype='int16')
-        assert sample_rate == rhino.sample_rate
+    def test_within_context(self):
+        audio, sample_rate =\
+            soundfile.read(self._abs_path('resources/audio_samples/test_within_context.wav'), dtype='int16')
+        assert sample_rate == self.rhino.sample_rate
 
-        num_frames = len(audio) // rhino.frame_length
+        num_frames = len(audio) // self.rhino.frame_length
         is_finalized = False
         for i in range(num_frames):
-            frame = audio[i * rhino.frame_length:(i + 1) * rhino.frame_length]
-            is_finalized = rhino.process(frame)
+            frame = audio[i * self.rhino.frame_length:(i + 1) * self.rhino.frame_length]
+            is_finalized = self.rhino.process(frame)
             if is_finalized:
                 break
 
         self.assertTrue(is_finalized, "couldn't finalize")
 
-        is_understood = rhino.is_understood()
-
-        self.assertTrue(is_understood, "couldn't understand")
-
-        expected_attribute_values = dict(
-            milk='no milk',
-            sugar='two sugars',
-            twist='cherry twist',
-            product='espresso',
-            taste='salted caramel',
-            shots='single shot',
-            roast='dark roast',
-            size='small')
-
-        attributes = rhino.get_attributes()
+        self.assertTrue(self.rhino.is_understood(), "couldn't understand")
 
-        self.assertEqual(expected_attribute_values.keys(), attributes, "incorrect attributes")
+        intent, slot_values = self.rhino.get_intent()
 
-        for attribute in attributes:
-            self.assertEqual(
-                rhino.get_attribute_value(attribute),
-                expected_attribute_values[attribute],
-                "incorrect attribute value")
+        self.assertEqual('orderDrink', intent, "incorrect intent")
 
-        rhino.delete()
+        expected_slot_values = dict(
+            sugarAmount='some sugar',
+            milkAmount='lots of milk',
+            coffeeDrink='americano',
+            numberOfShots='double shot',
+            size='medium')
+        self.assertEqual(slot_values, expected_slot_values, "incorrect slot values")
 
     def test_out_of_context(self):
-        rhino = Rhino(
-            library_path=self._library_path,
-            model_file_path=self._abs_path('lib/common/rhino_params.pv'),
-            context_file_path=self._abs_path('resources/contexts/coffee_maker.pv'))
+        audio, sample_rate =\
+            soundfile.read(self._abs_path('resources/audio_samples/test_out_of_context.wav'), dtype='int16')
+        assert sample_rate == self.rhino.sample_rate
 
-        audio, sample_rate = soundfile.read(
-            self._abs_path('resources/audio_samples/test_out_of_context.wav'),
-            dtype='int16')
-        assert sample_rate == rhino.sample_rate
-
-        num_frames = len(audio) // rhino.frame_length
+        num_frames = len(audio) // self.rhino.frame_length
         is_finalized = False
         for i in range(num_frames):
-            frame = audio[i * rhino.frame_length:(i + 1) * rhino.frame_length]
-            is_finalized = rhino.process(frame)
+            frame = audio[i * self.rhino.frame_length:(i + 1) * self.rhino.frame_length]
+            is_finalized = self.rhino.process(frame)
             if is_finalized:
                 break
 
         self.assertTrue(is_finalized, "couldn't finalize")
 
-        self.assertTrue(not rhino.is_understood(), "shouldn't be able to understand")
-        rhino.delete()
+        self.assertFalse(self.rhino.is_understood(), "shouldn't be able to understand")
+
+    def test_context_expressions(self):
+        self.assertIsInstance(self.rhino.context_expressions, str)
 
-    @property
-    def _library_path(self):
-        return self._abs_path('lib/linux/x86_64/libpv_rhino.so')
+    def test_version(self):
+        self.assertIsInstance(self.rhino.version, str)
 
     @staticmethod
     def _abs_path(rel_path):
         return os.path.join(os.path.dirname(__file__), '../..', rel_path)
 
+    @classmethod
+    def _library_path(cls):
+        system = platform.system()
+        machine = platform.machine()
+
+        if system == 'Linux':
+            if machine == 'x86_64':
+                return cls._abs_path('lib/linux/x86_64/libpv_rhino.so')
+            elif machine.startswith('arm'):
+                return cls._abs_path('lib/raspberry-pi/arm11/libpv_rhino.so')
+
+        raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+
+    @classmethod
+    def _context_file_path(cls):
+        system = platform.system()
+        machine = platform.machine()
+
+        if system == 'Linux' and machine == 'x86_64':
+            return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
+        elif system == 'Linux' and machine.startswith('arm'):
+            return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
+
+        raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+
 
-if __name__ == '__main__ ':
-    pass
+if __name__ == '__main__':
+    unittest.main()
diff --git a/demo/README.md b/demo/README.md
new file mode 100644
index 000000000..ef8fbfcdc
--- /dev/null
+++ b/demo/README.md
@@ -0,0 +1 @@
+If you'd like to add a new demo please submit a pull request.
diff --git a/demo/python/README.md b/demo/python/README.md
new file mode 100644
index 000000000..d4f7fca8d
--- /dev/null
+++ b/demo/python/README.md
@@ -0,0 +1,185 @@
+# Prerequisites
+
+First, consult the prerequisites section of the [Python binding](/binding/python). Additionally, the demo application
+uses [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/) for recording input audio (i.e. from the microphone).
+Consult the installation guide at [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/).
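+
+For reference, the following is one common way to install PyAudio on Debian-based systems (including Raspbian); the
+package names are an assumption and may differ on your platform:
+
+```bash
+sudo apt-get install portaudio19-dev
+pip install pyaudio
+```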
+
+# Demo Application
+
+Usage information can be found via
+
+```bash
+python demo/python/rhino_demo.py --help
+```
+
+On a Raspberry Pi 3 the demo can be run via
+
+```bash
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/raspberry-pi/cortex-a53/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/raspberrypi/smart_lighting_raspberrypi.rhn \
+--porcupine_library_path ./resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
+```
+
+It starts recording audio from the **default** input audio device, initializes instances of the Porcupine and Rhino
+engines, and monitors the incoming audio for the wake phrase **Hey Alfred**. Upon detection of the wake word, the
+follow-up command is processed by Rhino to infer the user's intent, and the inferred result is written to the console.
+
+For running on a different platform you need to use the corresponding platform-specific library paths, Porcupine
+keyword file, and Rhino context file. For example, to run the same demo on a Linux box
+
+```bash
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/linux/x86_64/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/linux/smart_lighting_linux.rhn \
+--porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
+```
+
+Below is an example console output
+
+```
+LIGHTING SYSTEM CONTEXT:
+
+EXPRESSIONS:
+
+{turnCommand} the light(s). -> {turnLight}
+{turnCommand} the {location} light(s). -> {turnLight}
+{turnCommand} the light(s) in the {location}. -> {turnLight}
+make the light(s) {intensityChange}. -> {changeIntensity}
+make the {location} light(s) {intensityChange}. -> {changeIntensity}
+make the light(s) in the {location} {intensityChange}. -> {changeIntensity}
+set the light(s) to {color}. -> changeColor
+set the {location} light(s) to {color}. -> changeColor
+set the light(s) in the {location} to {color}. -> changeColor
+
+SLOT VALUES:
+
+turnCommand: [turn off, turn on]
+location: [attic, balcony, basement, bathroom, bedroom, corridor, den, entrance, kitchen, living room]
+color: [blue, green, lavender, olive, pink, purple, red, silver, violet, white, yellow]
+intensityChange: [brighter, darker]
+
+EXAMPLES:
+
+turn off the lights. -> (intent: turnLight, slots: {turnCommand: turn off})
+turn on the kitchen light. -> (intent: turnLight, slots: {location: kitchen})
+set the living room light to blue. -> (intent: changeColor, slots: {location: living room, color: blue})
+make the light in the attic brighter. -> (intent: changeIntensity, slots: {location: attic, intensityChange: brighter})
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
+ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'defaults.bluealsa.device'
+ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
+ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
+ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'defaults.bluealsa.device'
+ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
+ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
+ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
+ALSA lib pcm_hw.c:1713:(_snd_pcm_hw_open) Invalid value for card
+ALSA lib pcm_hw.c:1713:(_snd_pcm_hw_open) Invalid value for card
+Cannot connect to server socket err = No such file or directory
+Cannot connect to server request channel
+jack server is not running or cannot be started
+JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
+JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
+```
+
+First, the demo outputs the information regarding the context of Rhino, including supported expressions, slots, and
+their possible values. The information is retrieved by calling `rhino.context_expressions`.
+
+There are a few lines of output generated by [ALSA](https://en.wikipedia.org/wiki/Advanced_Linux_Sound_Architecture).
+Don't be alarmed; this is normal! In order to detect a different wake word, change the keyword file. In order to infer
+commands within a different context, change the context file.
+
+## FAQ
+
+#### The demo application does not detect/infer anything. Why?
+
+The most probable cause of this is that the default audio input device recognized by PyAudio is not the one being used.
+There are a couple of debugging facilities baked into the demo application to solve this. First, type the following into
+the console
+
+```bash
+python ./demo/python/rhino_demo.py --show_audio_devices_info
+```
+
+It provides information about various audio input devices on the box. On a Linux box, this is the console output
+
+```
+'index': '0', 'name': 'HDA Intel PCH: ALC892 Analog (hw:0,0)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '2'
+'index': '1', 'name': 'HDA Intel PCH: ALC892 Alt Analog (hw:0,2)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '2'
+'index': '2', 'name': 'HDA NVidia: HDMI 0 (hw:1,3)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '3', 'name': 'HDA NVidia: HDMI 1 (hw:1,7)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '4', 'name': 'HDA NVidia: HDMI 2 (hw:1,8)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '5', 'name': 'HDA NVidia: HDMI 3 (hw:1,9)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '6', 'name': 'HDA NVidia: HDMI 0 (hw:2,3)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '7', 'name': 'HDA NVidia: HDMI 1 (hw:2,7)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '8', 'name': 'HDA NVidia: HDMI 2 (hw:2,8)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '9', 'name': 'HDA NVidia: HDMI 3 (hw:2,9)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '10', 'name': 'Logitech USB Headset: Audio (hw:3,0)', 'defaultSampleRate': '44100.0', 'maxInputChannels': '1'
+'index': '11', 'name': 'sysdefault', 'defaultSampleRate': '48000.0', 'maxInputChannels': '128'
+'index': '12', 'name': 'front', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '13', 'name': 'surround21', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '14', 'name': 'surround40', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '15', 'name': 'surround41', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '16', 'name': 'surround50', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '17', 'name': 'surround51', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '18', 'name': 'surround71', 'defaultSampleRate': '44100.0', 'maxInputChannels': '0'
+'index': '19', 'name': 'pulse', 'defaultSampleRate': '44100.0', 'maxInputChannels': '32'
+'index': '20', 'name': 'dmix', 'defaultSampleRate': '48000.0', 'maxInputChannels': '0'
+'index': '21', 'name': 'default', 'defaultSampleRate': '44100.0', 'maxInputChannels': '32'
+```
+
+It can be seen that the last device (index 21) is considered default. But on this machine, a headset is being used as
+the input device, which has an index of 10. After finding the correct index, the demo application can be invoked as below
+
+```bash
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/linux/x86_64/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/linux/smart_lighting_linux.rhn \
+--porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn \
+--input_audio_device_index 10
+```
+
+If the problem persists, we suggest storing the recorded audio into a file for inspection. This can be achieved by
+
+```bash
+python demo/python/rhino_demo.py \
+--rhino_library_path ./lib/linux/x86_64/libpv_rhino.so \
+--rhino_model_file_path ./lib/common/rhino_params.pv \
+--rhino_context_file_path ./resources/contexts/linux/smart_lighting_linux.rhn \
+--porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
+--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn \
+--input_audio_device_index 10 \
+--output_path ~/test.wav
+```
+
+If, after listening to the stored file, no apparent problem is detected, please open an issue.
diff --git a/demo/python/rhino_demo.py b/demo/python/rhino_demo.py
index 0bae2c6ee..2d9cb384e 100644
--- a/demo/python/rhino_demo.py
+++ b/demo/python/rhino_demo.py
@@ -113,6 +113,7 @@ def run(self):
             library_path=self._rhino_library_path,
             model_file_path=self._rhino_model_file_path,
             context_file_path=self._rhino_context_file_path)
+        print(rhino.context_expressions)
 
         pa = pyaudio.PyAudio()
 
@@ -124,12 +125,6 @@ def run(self):
             frames_per_buffer=porcupine.frame_length,
             input_device_index=self._input_device_index)
 
-        context_help_path = self._rhino_context_file_path.replace('.pv', '_info.txt')
-        if os.path.exists(context_help_path):
-            with open(context_help_path, 'r') as f:
-                for x in f:
-                    print(x.strip('\n'))
-
         # NOTE: This is true now and will be correct possibly forever. If it changes the logic below need to change.
         assert porcupine.frame_length == rhino.frame_length
 
@@ -148,8 +143,12 @@ def run(self):
                     intent_extraction_is_finalized = rhino.process(pcm)
                 else:
                     if rhino.is_understood():
-                        for attribute in rhino.get_attributes():
-                            print('%s: %s' % (attribute, rhino.get_attribute_value(attribute)))
+                        intent, slot_values = rhino.get_intent()
+                        print('intent: %s' % intent)
+                        print('---')
+                        for slot, value in slot_values.items():
+                            print('%s: %s' % (slot, value))
+                        print()
                     else:
                         print("didn't understand the command")
 
@@ -195,31 +194,15 @@ def show_audio_devices_info(cls):
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
 
-    parser.add_argument(
-        '--rhino_library_path',
-        help="absolute path to Rhino's dynamic library",
-        type=str,
-        default=_abs_path('lib/linux/x86_64/libpv_rhino.so'))
+    parser.add_argument('--rhino_library_path', help="absolute path to Rhino's dynamic library")
 
-    parser.add_argument(
-        '--rhino_model_file_path',
-        help="absolute path to Rhino's model file path",
-        type=str,
-        default=_abs_path('lib/common/rhino_params.pv'))
+    parser.add_argument('--rhino_model_file_path', help="absolute path to Rhino's model file path")
 
     parser.add_argument('--rhino_context_file_path', help="absolute path to Rhino's context file", type=str)
 
-    parser.add_argument(
-        '--porcupine_library_path',
-        help="absolute path to Porcupine's dynamic library",
-        type=str,
-        default=_abs_path('resources/porcupine/lib/linux/x86_64/libpv_porcupine.so'))
+    parser.add_argument('--porcupine_library_path', help="absolute path to Porcupine's dynamic library")
 
-    parser.add_argument(
-        '--porcupine_model_file_path',
-        help="absolute path to Porcupine's model parameter file",
-        type=str,
-        default=_abs_path('resources/porcupine/lib/common/porcupine_params.pv'))
+    parser.add_argument('--porcupine_model_file_path', help="absolute path to Porcupine's model parameter file")
 
     parser.add_argument('--porcupine_keyword_file_path', help='absolute path to porcupine keyword file', type=str)
diff --git a/include/README.md b/include/README.md
new file mode 100644
index 000000000..865a94bbe
--- /dev/null
+++ b/include/README.md
@@ -0,0 +1 @@
+`PV_RHINO_BAREMACHINE` should be defined only when compiling for systems without filesystem support.
\ No newline at end of file
diff --git a/include/picovoice.h b/include/picovoice.h
index bb1939451..afdb8828a 100644
--- a/include/picovoice.h
+++ b/include/picovoice.h
@@ -39,6 +39,7 @@ typedef enum {
     PV_STATUS_INVALID_ARGUMENT,
     PV_STATUS_STOP_ITERATION,
     PV_STATUS_KEY_ERROR,
+    PV_STATUS_INVALID_STATE,
 } pv_status_t;
 
 #ifdef __cplusplus
diff --git a/include/pv_rhino.h b/include/pv_rhino.h
index 2151c9c11..00aaeeeae 100644
--- a/include/pv_rhino.h
+++ b/include/pv_rhino.h
@@ -28,141 +28,110 @@ extern "C"
 #endif
 
 /**
- * Forward declaration for Speech to Intent object (a.k.a Rhino). The object translates speech (commands) in a given
- * context into structured data (intent). It processes incoming audio in consecutive frames (chunks). The number of
- * samples per frame can be attained by calling 'pv_rhino_frame_length()'. The incoming audio needs to have a sample
- * rate equal to 'pv_sample_rate()' and be 16-bit linearly-encoded. Furthermore, Rhino operates on single channel audio.
+ * Forward declaration for speech-to-intent object (a.k.a Rhino).
+ * The object directly infers intent from speech commands within a given context of interest in real-time. It
+ * processes incoming audio in consecutive frames (chunks) and at the end of each frame indicates if the intent
+ * extraction is finalized. When finalized, the intent can be retrieved as structured data in the form of an intent
+ * string and pairs of slots and values representing arguments (details) of the intent. The number of samples per frame
+ * can be attained by calling 'pv_rhino_frame_length()'. The incoming audio needs to have a sample rate equal to
+ * 'pv_sample_rate()' and be 16-bit linearly-encoded. Furthermore, Rhino operates on single channel audio.
  */
 typedef struct pv_rhino_object pv_rhino_object_t;
 
-#ifdef PV_RHINO_BARE_MACHINE
-PV_API pv_status_t pv_rhino_init(const void *context, pv_rhino_object_t **object);
+#ifdef PV_RHINO_BAREMACHINE
+/**
+ * Constructor.
+ *
+ * @param context Context parameters. A context represents the set of expressions (commands), intents, and intent
+ * arguments (slots) within a domain of interest.
+ * @param context_length Length of context in bytes.
+ * @param[out] object Constructed speech-to-intent object.
+ * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' or 'PV_STATUS_OUT_OF_MEMORY' on failure.
+ */
+PV_API pv_status_t pv_rhino_init(const void *context, int context_length, pv_rhino_object_t **object);
 #else
 
 /**
  * Constructor.
  *
  * @param model_file_path Absolute path to file containing model parameters.
- * @param context_file_path Absolute path to file containing context parameters.
- * @param object Constructed Speech to Intent object.
- * @return Status code. Returns 'PV_STATUS_OUT_OF_MEMORY', 'PV_STATUS_IO_ERROR', or 'PV_STATUS_INVALID_ARGUMENT' on
+ * @param context_file_path Absolute path to file containing context parameters. A context represents the set of
+ * expressions (commands), intents, and intent arguments (slots) within a domain of interest.
+ * @param[out] object Constructed speech-to-intent object.
+ * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT', 'PV_STATUS_IO_ERROR', or 'PV_STATUS_OUT_OF_MEMORY' on
+ * failure.
  */
-PV_API pv_status_t pv_rhino_init(
-    const char *model_file_path,
-    const char *context_file_path,
-    pv_rhino_object_t **object);
+PV_API pv_status_t pv_rhino_init(const char *model_file_path, const char *context_file_path, pv_rhino_object_t **object);
 #endif
 
 /**
  * Destructor.
  *
- * @param object Speech to Intent object.
+ * @param object Speech-to-intent object.
  */
 PV_API void pv_rhino_delete(pv_rhino_object_t *object);
 
 /**
- * Processes a frame of audio and returns a flag weather it has finalized intent extraction.
+ * Processes a frame of audio and emits a flag indicating if the engine has finalized intent extraction. When finalized,
+ * 'pv_rhino_is_understood()' should be called to check if the command was valid (is within context of interest).
  *
- * @param object Speech to Intent object.
+ * @param object Speech-to-intent object.
  * @param pcm A frame of audio samples. The number of samples per frame can be attained by calling
  * 'pv_rhino_frame_length()'. The incoming audio needs to have a sample rate equal to 'pv_sample_rate()' and be 16-bit
  * linearly-encoded. Furthermore, Rhino operates on single channel audio.
- * @param is_finalized Flag indicating whether the engine has finalized intent extraction.
- * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
+ * @param[out] is_finalized Flag indicating whether the engine has finalized intent extraction.
+ * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' or 'PV_STATUS_OUT_OF_MEMORY' on failure.
  */
 PV_API pv_status_t pv_rhino_process(pv_rhino_object_t *object, const int16_t *pcm, bool *is_finalized);
 
 /**
- * Indicates weather the engine understood the intent within speech command.
+ * Indicates if the spoken command is valid, is within the domain of interest (context), and the engine understood it.
  *
- * @param object Speech to Intent object.
- * @param is_understood Flag indicating weather the engine understood the intent within the speech.
+ * @param object Speech-to-intent object.
+ * @param[out] is_understood Flag indicating if the spoken command is valid, is within the domain of interest (context),
+ * and the engine understood it.
  * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
  */
 PV_API pv_status_t pv_rhino_is_understood(const pv_rhino_object_t *object, bool *is_understood);
 
 /**
- * Retrieves the intent attributes after the engine has finalized the extraction and only if the command is understood.
- * i.e. this should be called only after 'is_finalized' returned by 'pv_rhino_process' is set to true and then
- * 'is_understood' returned by 'pv_rhino_is_understood' is set to true.
- *
- * @param object Speech to Intent object.
- * @param num_attributes Number of extracts attributes within speech command.
- * @param attributes Attribute values.
- * @return Status code. Returns 'PV_STATUS_OUT_OF_MEMORY', or 'PV_STATUS_INVALID_ARGUMENT' on failure.
- */
-PV_API pv_status_t pv_rhino_get_attributes(const pv_rhino_object_t *object, int *num_attributes, const char ***attributes);
-
-/**
- * Retrieves the number of intent attributes after the engine has finalized the extraction and only if the command is
- * understood. i.e. this should be called only after 'is_finalized' returned by 'pv_rhino_process' is set to true and
- * then 'is_understood' returned by 'pv_rhino_is_understood' is set to true.
+ * Getter for the intent inferred from the spoken command. The intent is presented as an intent string and pairs of
+ * slots and their values. It should be called only after intent extraction is finalized and it is verified that the
+ * spoken command is valid and understood via calling 'pv_rhino_is_understood()'.
  *
- * @param object Speech to Intent object.
- * @param num_attributes Number of inferred attributes.
- * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
- */
-PV_API pv_status_t pv_rhino_get_num_attributes(const pv_rhino_object_t *object, int *num_attributes);
-
-/**
- * Retrieves the a given attribute's value after the engine has finalized the extraction and only if the command is
- * understood. i.e. this should be called only after 'is_finalized' returned by 'pv_rhino_process' is set to true and
- * then 'is_understood' returned by 'pv_rhino_is_understood' is set to true.
- *
- * @param object Speech to Intent object.
- * @param attribute_index The index of attribute.
- * @param attribute Attribute value.
- * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
- */
-PV_API pv_status_t pv_rhino_get_attribute(const pv_rhino_object_t *object, int attribute_index, const char **attribute);
-
-/**
- * Retrieves the a given attribute's value after the engine has finalized the extraction and only if the command is
- * understood. i.e. this should be called only after 'is_finalized' returned by 'pv_rhino_process' is set to true and
- * then 'is_understood' returned by 'pv_rhino_is_understood' is set to true.
- *
- * @param object Speech to Intent object.
- * @param attribute Attribute.
- * @param value Returned attribute value.
- * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
+ * @param object Speech-to-intent object.
+ * @param[out] intent Inferred intent.
+ * @param[out] num_slots Number of slots.
+ * @param[out] slots Array of inferred slots. Its memory needs to be freed by the caller.
+ * @param[out] values Array of inferred slot values in the same order as the inferred slots. Its memory needs to be
+ * freed by the caller.
+ * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT', 'PV_STATUS_INVALID_STATE', or 'PV_STATUS_OUT_OF_MEMORY' on
+ * failure.
  */
-PV_API pv_status_t pv_rhino_get_attribute_value(const pv_rhino_object_t *object, const char *attribute, const char **value);
+PV_API pv_status_t pv_rhino_get_intent(
+    const pv_rhino_object_t *object,
+    const char **intent,
+    int *num_slots,
+    const char ***slots,
+    const char ***values);
 
 /**
- * Resets the internal state of the Speech to Intent engine.
+ * Resets the internal state of the engine. It should be called before the engine can be used to infer intent from a new
+ * stream of audio.
  *
- * @param object Speech to Intent object.
+ * @param object Speech-to-intent object.
  * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
  */
 PV_API pv_status_t pv_rhino_reset(pv_rhino_object_t *object);
 
 /**
- * Getter of attributes within a context. The caller is responsible for freeing the returned array of attributes.
+ * Getter for expressions. Each expression maps a set of spoken phrases to an intent and possibly a number of slots
+ * (intent arguments).
  *
- * @param object Speech to Intent object.
- * @param num_attributes Number of attributes within current context.
- * @param attributes Context attributes.
- * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' or 'PV_STATUS_OUT_OF_MEMORY' on failure.
- */
-PV_API pv_status_t pv_rhino_get_context_attributes(
-        const pv_rhino_object_t *object,
-        int *num_attributes,
-        const char ***attributes);
-
-/**
- * Getter for different values of a given attribute. The caller is responsible for freeing the returned array of values.
- *
- * @param object Speech to Intent object.
- * @param attribute A given attribute within current context.
- * @param num_values Number of possible values for the given attribute in current context.
- * @param values Possible values for given attribute in current context.
+ * @param object Speech-to-intent object.
+ * @param[out] expressions Expressions within the current context.
+ * @return Status code. Returns 'PV_STATUS_INVALID_ARGUMENT' on failure.
  */
-PV_API pv_status_t pv_rhino_get_attribute_values(
-        const pv_rhino_object_t *object,
-        const char *attribute,
-        int *num_values,
-        const char ***values);
+PV_API pv_status_t pv_rhino_context_expressions(const pv_rhino_object_t *object, const char **expressions);

 /**
  * Getter for version string.
diff --git a/lib/README.md b/lib/README.md
new file mode 100644
index 000000000..9b609f634
--- /dev/null
+++ b/lib/README.md
@@ -0,0 +1,15 @@
+## Common
+
+Contains parameters for the deep neural networks used by Rhino. Parameters are shipped as a separate binary file to
+reduce the total size of the library on platforms that need to support multiple ABIs (e.g. Android).
+
+
+## Linux
+
+Tested on Ubuntu 16.04/18.04.
+
+## Raspberry Pi
+
+* **arm11** is tuned for Raspberry Pi A, B, and Zero.
+* **cortex-a7** is tuned for Raspberry Pi 2.
+* **cortex-a53** is tuned for Raspberry Pi 3 and 3 B+.
diff --git a/lib/common/rhino_params.pv b/lib/common/rhino_params.pv
index 8cadda395..821bcd95c 100644
Binary files a/lib/common/rhino_params.pv and b/lib/common/rhino_params.pv differ
diff --git a/lib/linux/x86_64/libpv_rhino.so b/lib/linux/x86_64/libpv_rhino.so
index 18cfd4ead..3f2bfb61e 100755
Binary files a/lib/linux/x86_64/libpv_rhino.so and b/lib/linux/x86_64/libpv_rhino.so differ
diff --git a/lib/raspberry-pi/arm11/libpv_rhino.so b/lib/raspberry-pi/arm11/libpv_rhino.so
new file mode 100755
index 000000000..4382011d2
Binary files /dev/null and b/lib/raspberry-pi/arm11/libpv_rhino.so differ
diff --git a/lib/raspberry-pi/cortex-a53/libpv_rhino.so b/lib/raspberry-pi/cortex-a53/libpv_rhino.so
new file mode 100755
index 000000000..115e9512c
Binary files /dev/null and b/lib/raspberry-pi/cortex-a53/libpv_rhino.so differ
diff --git a/lib/raspberry-pi/cortex-a7/libpv_rhino.so b/lib/raspberry-pi/cortex-a7/libpv_rhino.so
new file mode 100755
index 000000000..7d030ac76
Binary files /dev/null and b/lib/raspberry-pi/cortex-a7/libpv_rhino.so differ
diff --git a/resources/README.md b/resources/README.md
new file mode 100644
index 000000000..d68dca457
--- /dev/null
+++ b/resources/README.md
@@ -0,0 +1,4 @@
+## Porcupine
+
+This is a minimal subset of [Porcupine](https://github.com/Picovoice/Porcupine) needed to enable wake-word detection for
+Rhino demo applications. In practice, Porcupine and Rhino are almost always used together.
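Taken together, the revised header replaces the per-attribute getters with a single 'pv_rhino_get_intent()' call, and the
resulting call flow is a simple state machine: run 'pv_rhino_process()' until 'is_finalized' is set, check
'pv_rhino_is_understood()', fetch the intent, then 'pv_rhino_reset()' before the next utterance. Below is a minimal,
hypothetical C sketch of that flow. It assumes 'pv_rhino_frame_length()' returns the per-frame sample count as an int and
that 'PV_STATUS_SUCCESS' denotes success (neither declaration is shown in this hunk); 'read_audio_frame()' is a
placeholder for the host's audio capture, and the model/context paths simply mirror files added elsewhere in this diff.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "pv_rhino.h"

/* Hypothetical audio source: fills 'pcm' with one frame of single-channel,
 * 16-bit linearly-encoded audio sampled at pv_sample_rate(). */
extern bool read_audio_frame(int16_t *pcm, int frame_length);

int main(void) {
    pv_rhino_object_t *rhino = NULL;
    pv_status_t status =
            pv_rhino_init("lib/common/rhino_params.pv", "resources/contexts/linux/coffee_maker_linux.rhn", &rhino);
    if (status != PV_STATUS_SUCCESS) {
        return EXIT_FAILURE;
    }

    const int frame_length = pv_rhino_frame_length();
    int16_t *pcm = malloc(frame_length * sizeof(int16_t));
    if (!pcm) {
        pv_rhino_delete(rhino);
        return EXIT_FAILURE;
    }

    /* Feed audio until the engine finalizes intent extraction. */
    bool is_finalized = false;
    while (!is_finalized && read_audio_frame(pcm, frame_length)) {
        if (pv_rhino_process(rhino, pcm, &is_finalized) != PV_STATUS_SUCCESS) {
            break;
        }
    }

    bool is_understood = false;
    if (is_finalized &&
        pv_rhino_is_understood(rhino, &is_understood) == PV_STATUS_SUCCESS &&
        is_understood) {
        const char *intent = NULL;
        int num_slots = 0;
        const char **slots = NULL;
        const char **values = NULL;
        if (pv_rhino_get_intent(rhino, &intent, &num_slots, &slots, &values) == PV_STATUS_SUCCESS) {
            printf("intent: %s\n", intent);
            for (int i = 0; i < num_slots; i++) {
                printf("%s: %s\n", slots[i], values[i]);
            }
            /* Per the header comments, the slot and value arrays are caller-owned. */
            free(slots);
            free(values);
        }
    }

    /* Reset before inferring intent from a new stream of audio. */
    pv_rhino_reset(rhino);

    free(pcm);
    pv_rhino_delete(rhino);

    return EXIT_SUCCESS;
}
```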
diff --git a/resources/audio_samples/test_within_context.wav b/resources/audio_samples/test_within_context.wav
index a8571e908..518bdc53b 100644
Binary files a/resources/audio_samples/test_within_context.wav and b/resources/audio_samples/test_within_context.wav differ
diff --git a/resources/contexts/coffee_maker.pv b/resources/contexts/coffee_maker.pv
deleted file mode 100644
index 57347d499..000000000
Binary files a/resources/contexts/coffee_maker.pv and /dev/null differ
diff --git a/resources/contexts/coffee_maker_info.txt b/resources/contexts/coffee_maker_info.txt
deleted file mode 100644
index 782f69abb..000000000
--- a/resources/contexts/coffee_maker_info.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-*******************************************************************************
-coffee maker context info:
-
-This context is designed to ask a voice-enabled coffe maker to make you coffee.
-You can select the (1) size, (2) number of shots, (3) type of drink, and (4)
-milk or cream. Below are a few examples
-
-* Can I have an Americano
-* Can I get a large latte
-* Please may I have a single-shot espresso
-* I want a medium double-shot Cappucino
-* Make me a large Americano with cream
-* I'd like a small Mocha with milk
-
-Available sizes: Small, Medium, Large
-Available number of shots: single-shot, double shot, triple shot
-Available drinks: Americano, Cappucino, Espresso, Latte, Mocha
-Additions: Milk, Cream
-
-*******************************************************************************
-
diff --git a/resources/contexts/linux/coffee_maker_linux.rhn b/resources/contexts/linux/coffee_maker_linux.rhn
new file mode 100644
index 000000000..d3007ebe6
Binary files /dev/null and b/resources/contexts/linux/coffee_maker_linux.rhn differ
diff --git a/resources/contexts/linux/smart_lighting_linux.rhn b/resources/contexts/linux/smart_lighting_linux.rhn
new file mode 100644
index 000000000..49ddfc6f0
Binary files /dev/null and b/resources/contexts/linux/smart_lighting_linux.rhn differ
diff --git a/resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn b/resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn
new file mode 100644
index 000000000..c7b7726f3
Binary files /dev/null and b/resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn differ
diff --git a/resources/contexts/raspberrypi/smart_lighting_raspberrypi.rhn b/resources/contexts/raspberrypi/smart_lighting_raspberrypi.rhn
new file mode 100644
index 000000000..98ec8f846
Binary files /dev/null and b/resources/contexts/raspberrypi/smart_lighting_raspberrypi.rhn differ
diff --git a/resources/contexts/smart_light.pv b/resources/contexts/smart_light.pv
deleted file mode 100644
index e3505ab14..000000000
Binary files a/resources/contexts/smart_light.pv and /dev/null differ
diff --git a/resources/contexts/smart_light_info.txt b/resources/contexts/smart_light_info.txt
deleted file mode 100644
index 4bcc5c4ec..000000000
--- a/resources/contexts/smart_light_info.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-*******************************************************************************
-Smart Light context info:
-
-This context is designed for controlling a voice-enabled smart light system.
-You can turn on/off, change color, or change intensity (make dimmer or brighter)
-of lights. Optionally you select to control the lights within a specific location.
-Below are a few examples.
-
-* Turn on the lights
-* Set the lights green
-* Make the lights brighter
-* Turn on the lights in the kitchen
-* Set the lights in living room lavender
-
-Available actions: Turn on, Turn off, Set, Make
-Available locations: Living room, Bedroom, Bathroom, Balcony, Kitchen, Entrance
-Available colors: Blue, Violet, Red, Green, Yellow, Lavender, Pink, Olive, Purple, Silver, White
-Available intensities: Brighter, Darker
-
-*******************************************************************************
-
diff --git a/resources/porcupine/lib/common/porcupine_params.pv b/resources/porcupine/lib/common/porcupine_params.pv
index 6477b651a..7206be899 100644
Binary files a/resources/porcupine/lib/common/porcupine_params.pv and b/resources/porcupine/lib/common/porcupine_params.pv differ
diff --git a/resources/porcupine/lib/linux/x86_64/libpv_porcupine.so b/resources/porcupine/lib/linux/x86_64/libpv_porcupine.so
index f6db7bb82..159602de0 100755
Binary files a/resources/porcupine/lib/linux/x86_64/libpv_porcupine.so and b/resources/porcupine/lib/linux/x86_64/libpv_porcupine.so differ
diff --git a/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.a b/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.a
new file mode 100644
index 000000000..10d9abf3a
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.a differ
diff --git a/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.so b/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.so
new file mode 100755
index 000000000..d1ea1fef8
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/arm11/libpv_porcupine.so differ
diff --git a/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.a b/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.a
new file mode 100644
index 000000000..fc6e05b7b
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.a differ
diff --git a/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so b/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so
new file mode 100755
index 000000000..242aa5c30
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so differ
diff --git a/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.a b/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.a
new file mode 100644
index 000000000..9980a6f31
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.a differ
diff --git a/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.so b/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.so
new file mode 100755
index 000000000..4c9ac428a
Binary files /dev/null and b/resources/porcupine/lib/raspberry-pi/cortex-a7/libpv_porcupine.so differ
diff --git a/resources/porcupine/resources/keyword_files/alfred_linux.ppn b/resources/porcupine/resources/keyword_files/alfred_linux.ppn
deleted file mode 100644
index 375ab1de4..000000000
--- a/resources/porcupine/resources/keyword_files/alfred_linux.ppn
+++ /dev/null
@@ -1 +0,0 @@
-%UcB'oa;u&&[Vm2FmYDف)Ũ`:XStmR
\ No newline at end of file
diff --git a/resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn b/resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
new file mode 100755
index 000000000..bb8e745bb
--- /dev/null
+++ b/resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
@@ -0,0 +1,2 @@
+;dD7svkUba@RFhKZ 򜫦sz&/^,V,5C^.ak
+g-u)Z'u
\ No newline at end of file
diff --git a/resources/porcupine/resources/keyword_files/linux/hey_rachel_linux.ppn b/resources/porcupine/resources/keyword_files/linux/hey_rachel_linux.ppn
new file mode 100644
index 000000000..1ef3d21f9
Binary files /dev/null and b/resources/porcupine/resources/keyword_files/linux/hey_rachel_linux.ppn differ
diff --git a/resources/porcupine/resources/keyword_files/rachel_linux.ppn b/resources/porcupine/resources/keyword_files/rachel_linux.ppn
deleted file mode 100644
index 48d77a42f..000000000
--- a/resources/porcupine/resources/keyword_files/rachel_linux.ppn
+++ /dev/null
@@ -1,2 +0,0 @@
-H
-Ro/:̙dk1 Sظżu@GŖ> á +jqjkZp r2`^
\ No newline at end of file
diff --git a/resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn b/resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
new file mode 100644
index 000000000..cdaef0220
--- /dev/null
+++ b/resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
@@ -0,0 +1,3 @@
+yNO0>RCv2~gQb7
+2tM46
+֧f@ulY%ttgOn  Pj+#t
\ No newline at end of file
diff --git a/resources/porcupine/resources/keyword_files/raspberrypi/hey_rachel_raspberrypi.ppn b/resources/porcupine/resources/keyword_files/raspberrypi/hey_rachel_raspberrypi.ppn
new file mode 100644
index 000000000..9b96be43f
--- /dev/null
+++ b/resources/porcupine/resources/keyword_files/raspberrypi/hey_rachel_raspberrypi.ppn
@@ -0,0 +1 @@
+ 56}CءE{@1iHoyN^5|}&-y^^OO
\ No newline at end of file