
Bespoken Benchmarking Project

This is Bespoken's open-source benchmarking project.

This provides a general mechanism for testing and evaluating NLP platforms.

We have conducted two tests so far: General Knowledge and Speech Recognition Accuracy.


We interact with the voice assistants using the Bespoken Device Service, which allows us to interact exactly as a real person would with an actual device. Read more here.

For running the tests and collecting the results, we leverage our batch testing framework:

Benchmark Results

Results are meant to be published on a bi-monthly basis. The table below summarizes our tests and results to date:

| Date | Test Type | Data Set | Platforms | Results |
| --- | --- | --- | --- | --- |
| 7/26/2020 | General Knowledge | ComQA | Alexa, Google Assistant, Siri | Link |
| 11/20/2020 | Speech Recognition | DefinedCrowd | Amazon Connect, Google Dialogflow, Twilio Voice | Link |

The published results are viewable here:


General Knowledge

We classify an answer as correct if the platform's response contains the answer from the dataset.

When the dataset lists multiple answers, the response counts as correct if any one of them is present. Read more here
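The presence check described above could be sketched roughly as follows. This is an illustration only, not the project's actual scoring code: the function names `normalize` and `is_correct` are hypothetical, and the normalization (lowercasing, stripping punctuation, matching on whole words) is an assumption about what "presence" might reasonably mean.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace.

    Assumption: the real benchmark may normalize differently (or not at all).
    """
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def is_correct(response: str, expected_answers: Iterable[str]) -> bool:
    """Return True if any expected answer appears in the response.

    Padding both sides with spaces makes the match whole-word, so the
    answer "Paris" does not match inside the word "Comparison".
    """
    padded_response = f" {normalize(response)} "
    for answer in expected_answers:
        normalized_answer = normalize(answer)
        # Skip answers that are empty after normalization.
        if normalized_answer and f" {normalized_answer} " in padded_response:
            return True
    return False
```

For example, `is_correct("The capital of France is Paris.", ["Paris"])` evaluates to `True`, while a response that mentions none of the dataset's answers evaluates to `False`. A substring check like this is simple and transparent, but it can mark a response correct when it mentions the answer only in passing.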

Speech Recognition Accuracy

We take datasets from DefinedCrowd and run them through the various platforms using our Virtual Devices for IVR:

Read more here


We appreciate all feedback. Open an issue to suggest additional datasets as well as improvements to our methodology.

Contact us at