
Bespoken Benchmarking Project

This is Bespoken's open-source benchmarking project. It provides a general mechanism for testing and evaluating NLP platforms.

We have conducted two tests so far: a general knowledge test and a speech recognition accuracy test, both summarized in the Benchmark Results table below.

Process

We interact with the voice assistants using the Bespoken Device Service, which allows us to interact exactly as a real person would with an actual device. Read more here.

For running the tests and collecting the results, we leverage our batch testing framework:
https://gitlab.com/bespoken/batch-tester
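
The overall loop is straightforward: each utterance from a dataset is sent to the platform under test and the reply is recorded for scoring. Below is a minimal TypeScript sketch of that loop, assuming a hypothetical `sendUtterance` helper that relays text through a virtual device and resolves with the platform's spoken reply; the names here are illustrative placeholders, not the batch-tester's actual API.

```typescript
// Minimal sketch of the batch-testing loop. `Utterance`, `Result`, and
// `sendUtterance` are illustrative placeholders, not the batch-tester's API.
interface Utterance {
  question: string;
  expectedAnswers: string[];
}

interface Result {
  question: string;
  response: string;
}

async function runBatch(
  utterances: Utterance[],
  // Relays text to the platform under test and resolves with its reply
  sendUtterance: (text: string) => Promise<string>
): Promise<Result[]> {
  const results: Result[] = [];
  for (const utterance of utterances) {
    // Each question is sent exactly as a person would speak it; the reply
    // is captured verbatim for later scoring.
    const response = await sendUtterance(utterance.question);
    results.push({ question: utterance.question, response });
  }
  return results;
}
```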

Benchmark Results

Results are meant to be published on a bi-monthly basis. The table below summarizes our tests and results to date:

| Date | Test Type | Data Set | Platforms | Results |
| --- | --- | --- | --- | --- |
| 7/26/2020 | General Knowledge | ComQA | Alexa, Google Assistant, Siri | Link |
| 11/20/2020 | Speech Recognition | DefinedCrowd | Amazon Connect, Google Dialogflow, Twilio Voice | Link |

The published results are viewable here:
https://benchmark.bespoken.io

Methodology

General Knowledge

We classify an answer as correct or incorrect based on whether the answer from the dataset appears in the platform's response.

When the dataset contains multiple answers, the response is counted as correct if any one of them is present. Read more here
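
As a rough illustration, this check can be as simple as a case-insensitive containment test. The sketch below uses hypothetical names and simplifies the normalization actually applied.

```typescript
// Sketch of the correctness check: a response is correct if any one of the
// dataset's expected answers appears within it. Names are illustrative.
function isCorrect(response: string, expectedAnswers: string[]): boolean {
  const normalized = response.toLowerCase();
  return expectedAnswers.some((answer) =>
    normalized.includes(answer.toLowerCase())
  );
}

// Example with multiple acceptable answers: only one needs to be present.
isCorrect("It is roughly 6,400 kilometers long.", ["6,400 kilometers", "4,000 miles"]); // true
```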

Speech Recognition Accuracy

We take datasets from DefinedCrowd and run them through the various platforms using our Virtual Devices for IVR.

Read more here
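
As a rough illustration of the comparison being made, the sketch below normalizes an expected transcription from the dataset and the platform's recognized text before comparing them. The exact scoring used for the published results is described in the methodology linked above; the names and the exact-match comparison here are assumptions for illustration only.

```typescript
// Sketch of comparing a dataset transcription with a platform's recognition
// result. The normalization and exact matching here are illustrative only.
function normalize(text: string): string {
  // Lowercase, strip punctuation, and collapse whitespace so formatting
  // differences are not counted against the platform.
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

function transcriptionMatches(expected: string, recognized: string): boolean {
  return normalize(expected) === normalize(recognized);
}

transcriptionMatches("Hello, world!", "hello world"); // true
```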

Contact

We appreciate all feedback. Open an issue to suggest additional datasets as well as improvements to our methodology.

Contact us at contact@bespoken.io.