Speech to Intent Benchmark

This is a framework for benchmarking the accuracy of Picovoice's speech-to-intent engine (a.k.a. Rhino) against other natural language understanding engines. For more information about the engine, refer to its repository here. This repository contains all the data and code needed to reproduce the results. The benchmark evaluates the accuracy of the engine in the context of a voice-enabled coffee maker. You can listen to one of the sample commands here. To simulate real-life conditions, the engine is tested in two noisy environments: (1) cafe and (2) kitchen. You can listen to samples of the noisy data here and here.

Additionally, we compare the accuracy of Rhino with that of Google's Dialogflow.

Data

The speech commands were crowdsourced from more than 50 unique speakers. Each speaker contributed about 10 different commands. In total, 625 commands are used in this benchmark. The noise recordings were downloaded from Freesound.

Usage

Clone the repository and its submodules via

git clone --recurse-submodules https://github.com/Picovoice/speech-to-intent-benchmark.git

The repository pulls in Rhino as a Git submodule under rhino. All data needed for this benchmark, including speech, noise, and labels, is provided under data. Additionally, the Dialogflow agent used in this benchmark is exported here. The benchmark code is located under benchmark.

The first step is to mix the clean speech data under clean with noise. Two types of noise are used in this benchmark: (1) cafe and (2) kitchen. To create the noisy test data, run the following in a shell from the root of the repository

python benchmark/mixer.py ${NOISE}

${NOISE} can be either kitchen or cafe.
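For readers unfamiliar with noise mixing, the sketch below shows one common way to add noise to clean speech at a target signal-to-noise ratio (SNR). It is not taken from benchmark/mixer.py; the function names, the use of an SNR parameter, and the looping of short noise clips are all assumptions made for illustration.

```python
import math


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB.

    Hypothetical helper for illustration only; benchmark/mixer.py may
    mix the signals differently.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise clip so it covers the whole speech signal.
    looped = [noise[i % len(noise)] for i in range(len(speech))]

    speech_power = mean_power(speech)
    noise_power = mean_power(looped)
    if noise_power == 0:
        # Silent noise: nothing to mix in.
        return list(speech)

    # Scale the noise so that speech_power / scaled_noise_power
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, looped)]
```

A lower snr_db yields a noisier mix. Real audio would first be read into sample arrays (for example with the standard wave module) and written back after mixing.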

Then, to run the noisy commands through the speech-to-intent engine, run the following

python benchmark/picovoice.py ${NOISE}

The script produces the accuracy results for the speech-to-intent engine. To run the noisy spoken commands through the Dialogflow API, run the following

python benchmark/dialogflow.py ${GOOGLE_CLOUD_PLATFORM_CREDENTIAL_PATH} ${GOOGLE_CLOUD_PLATFORM_PROJECT_ID} ${NOISE}
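The steps above can be collected into a small driver so that both noise conditions are processed the same way. The sketch below only builds the command lines from the scripts and arguments listed in this README; the function name benchmark_commands and the NOISE_TYPES tuple are hypothetical and not part of the repository.

```python
import sys

# The two noise conditions named in this README.
NOISE_TYPES = ("cafe", "kitchen")


def benchmark_commands(noise, credential_path=None, project_id=None):
    """Return the benchmark command lines for one noise condition.

    Hypothetical helper: it mirrors the commands documented above and
    adds the Dialogflow step only when both Google Cloud Platform
    arguments are supplied.
    """
    if noise not in NOISE_TYPES:
        raise ValueError("noise must be one of %s" % (NOISE_TYPES,))

    python = sys.executable or "python"
    commands = [
        [python, "benchmark/mixer.py", noise],
        [python, "benchmark/picovoice.py", noise],
    ]
    if credential_path and project_id:
        commands.append(
            [python, "benchmark/dialogflow.py", credential_path, project_id, noise]
        )
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True) from the root of the repository, once per entry in NOISE_TYPES.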

Results

Below are the results of the benchmark. Command Acceptance Probability (Accuracy) is defined as the probability that the engine correctly understands a spoken command.
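As a rough illustration of this metric, accuracy can be estimated as the fraction of commands whose inferred intent matches the label. The sketch below assumes each result is compared to its label by plain equality; the function name and the example intents are hypothetical, and the benchmark scripts may score results differently.

```python
def command_acceptance_probability(predicted, expected):
    """Fraction of commands whose prediction equals the expected label.

    Illustrative only. A prediction of None stands for a command the
    engine failed to understand and is counted as incorrect.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one command is required")

    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical labels and predictions for three commands.
expected = [
    {"intent": "orderDrink", "size": "large"},
    {"intent": "orderDrink", "size": "small"},
    {"intent": "orderDrink", "size": "medium"},
]
predicted = [
    {"intent": "orderDrink", "size": "large"},
    None,
    {"intent": "orderDrink", "size": "medium"},
]
accuracy = command_acceptance_probability(predicted, expected)
```

Here two of the three commands are understood correctly, so accuracy is 2/3.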
