SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

MissRu/Leaderboard

 
 

SpeechColab ASR leaderboard

1. Overview

"If you can’t measure it, you can’t improve it." -- Peter Drucker

Regarding the current state of Automatic Speech Recognition (ASR), the term "State-Of-The-Art" (SOTA) is vague in the sense that:

  • For industry, there is no objective and quantitative benchmark of how commercial APIs perform in real-life scenarios, at least in the public domain.
  • For academia, it is becoming harder to compare ASR models due to the fragmentation of research toolkits and ecosystems.
  • How are academic SOTA and industrial SOTA related?

[Figure: Overview]

As the figure above shows, the SpeechIO leaderboard serves as an ASR benchmarking platform by providing 3 components:

  1. TestSet Zoo: a collection of test sets covering a wide range of speech recognition scenarios.
  2. Model Zoo: a collection of models, including commercial APIs and open-source pretrained models.
  3. An automated benchmarking pipeline:
    • It defines the simplest possible specification for the recognition interface, the format of input test sets, and the format of output recognition results.
    • As long as model submitters conform to this specification, a fully automated pipeline takes care of the rest (e.g. data preparation -> recognition invocation -> text post-processing -> WER/CER/SER evaluation).
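
The evaluation step at the end of the pipeline can be illustrated with a minimal sketch. It assumes the usual definitions: WER and CER are the Levenshtein distance between reference and hypothesis, counted over words or characters and divided by the reference length, and SER is the fraction of sentences containing at least one error. All function names here are hypothetical, and the leaderboard's own scoring tools and text normalization may differ.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def words(text: str) -> List[str]:
    """Whitespace tokenization, used for WER."""
    return text.split()


def chars(text: str) -> List[str]:
    """Character tokenization with whitespace removed, used for CER."""
    return list("".join(text.split()))


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], List[str]]) -> float:
    """Total edit distance divided by total reference length."""
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("references contain no tokens")
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(refs, hyps))
    return errors / total


def wer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, words)


def cer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, chars)


def ser(refs: List[str], hyps: List[str]) -> float:
    """Fraction of sentences whose word sequences differ."""
    wrong = sum(words(r) != words(h) for r, h in zip(refs, hyps))
    return wrong / len(refs)
```

For example, with references `["the cat sat", "hello world"]` and hypotheses `["the cat sit", "hello world"]`, there is one substitution among five reference words, so `wer` returns 0.2, and one of two sentences is wrong, so `ser` returns 0.5. CER is the natural metric for the Chinese test sets, where there are no whitespace word boundaries.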

With the SpeechIO leaderboard, anyone can benchmark, reproduce, and compare others' systems on a local machine, as long as those systems and test sets are published in the model zoo and test-set zoo.


2. TestSet Zoo

Test Sets From Public Academic Datasets

| Unlocked | TEST_SET_ID | DESCRIPTION | LANGUAGE |
|---|---|---|---|
| | LIBRISPEECH_TEST_CLEAN | "test_clean" set of LibriSpeech | en |
| | LIBRISPEECH_TEST_OTHER | "test_other" set of LibriSpeech | en |
| | GIGASPEECH_V1.0.0_DEV | dev set of GigaSpeech | en |
| | GIGASPEECH_V1.0.0_TEST | test set of GigaSpeech | en |
| | AISHELL1_TEST | test set of AISHELL-1 | zh |
| | AISHELL2_IOS_TEST | test set of AISHELL-2 (iOS channel) | zh |
| | AISHELL2_ANDROID_TEST | test set of AISHELL-2 (Android channel) | zh |
| | AISHELL2_MIC_TEST | test set of AISHELL-2 (Microphone channel) | zh |

SpeechIO Test Sets (ZH)

SpeechIO test sets are carefully curated by the SpeechIO authors. They are crawled from publicly available sources (YouTube, TV programs, podcasts, etc.), cover a variety of well-known acoustic scenarios (AM) and content domains (LM & vocabulary), and are labeled by professional annotators.

| Unlocked | TEST_SET_ID | Name | Scenario | Topic Domain | Hours | Difficulty (1-5) |
|---|---|---|---|---|---|---|
| | SPEECHIO_ASR_ZH00000 | 接入调试集 (for leaderboard submitter debugging) | Video conference & forum speech | Economy, currency, finance | 1.0 | ★★☆ |
| | SPEECHIO_ASR_ZH00001 | 新闻联播 | TV News | News & politics | 9 | |
| | SPEECHIO_ASR_ZH00002 | 鲁豫有约 | TV interview | Celebrity & film & music & daily | 3 | ★★☆ |
| | SPEECHIO_ASR_ZH00003 | 天下足球 | TV program | Sports & Football & World Cup | 2.7 | ★★☆ |
| | SPEECHIO_ASR_ZH00004 | 罗振宇跨年演讲 | Stadium Public Speech | Society & Culture & Business Trend | 2.7 | ★★ |
| | SPEECHIO_ASR_ZH00005 | 李永乐老师在线讲堂 | Online Education | Popular Science | 4.4 | ★★★ |
| | SPEECHIO_ASR_ZH00006 | 张大仙 & 骚白 王者荣耀直播 | Live Broadcasting | Game | 1.6 | ★★★☆ |
| | SPEECHIO_ASR_ZH00007 | 李佳琪 & 薇娅 直播带货 | Live Broadcasting | Makeup & Online shopping/advertising | 0.9 | ★★★★☆ |
| | SPEECHIO_ASR_ZH00008 | 老罗语录 | Offline lecture | Life & Purpose & Ethics | 1.3 | ★★★★☆ |
| | SPEECHIO_ASR_ZH00009 | 故事FM | Podcast | Ordinary Life Story Telling | 4.5 | ★★☆ |
| | SPEECHIO_ASR_ZH00010 | 创业内幕 | Podcast | Startup & Entrepreneur & Product & Investment | 4.2 | ★★☆ |
| | SPEECHIO_ASR_ZH00011 | 罗翔 刑法法考培训讲座 | Online Education | Law & Lawyer Qualification Exams | 3.4 | ★★☆ |
| | SPEECHIO_ASR_ZH00012 | 张雪峰 考研线上小讲堂 | Online Education | University & Graduate School Entrance Exams | 3.4 | ★★★☆ |
| | SPEECHIO_ASR_ZH00013 | 谷阿莫&牛叔说电影 | VLog | Movie Cuts | 1.8 | ★★★ |
| | SPEECHIO_ASR_ZH00014 | 贫穷料理 & 琼斯爱生活 | VLog | Food & Cooking & Gourmet | 1 | ★★★☆ |
| | SPEECHIO_ASR_ZH00015 | 单田芳 白眉大侠 | Traditional Podcast | Kung Fu Fiction | 2.2 | ★★☆ |
| | SPEECHIO_ASR_ZH00016 | 德云社相声演出 | Theater Crosstalk Show | Funny Stories | 1 | ★★★ |
| | SPEECHIO_ASR_ZH00017 | 吐槽大会 | Standup Comedy | Celebrity Jokes | 1.8 | ★★☆ |
| | SPEECHIO_ASR_ZH00018 | 小猪佩奇 & 熊出没 | Children Cartoon | Fairy Tale | 0.9 | ★☆ |
| | SPEECHIO_ASR_ZH00019 | CCTV5 NBA 比赛转播 | Sports Game Live | NBA Game | 0.7 | ★★★ |
| | SPEECHIO_ASR_ZH00020 | 篮球人物 | Documentary | NBA Super Stars' Life & History | 2.2 | ★★ |
| | SPEECHIO_ASR_ZH00021 | 汽车之家 车辆评测 | VLog | Car benchmarks, Road driving test | 1.7 | ★★★☆ |
| | SPEECHIO_ASR_ZH00022 | 小艾大叔 豪宅带看 | VLog | Real estate, Mansion tour | 1.7 | ★★★ |
| | SPEECHIO_ASR_ZH00023 | 无聊开箱 & Zealer评测 | VLog | Unboxing | 2 | ★★★ |
| | SPEECHIO_ASR_ZH00024 | 付老师种植技术 | VLog | Agriculture, Planting | 2.7 | ★★★☆ |
| | SPEECHIO_ASR_ZH00025 | 石国鹏讲古希腊哲学 | Offline lecture | History, Greek philosophy | 1.3 | ★★☆ |
| | SPEECHIO_ASR_ZH00026 | 张震鬼故事 | Broadcasting Program | Horror Stories | 2.4 | ★★★ |
| | SPEECHIO_ASR_ZH00027 | 华语辩论世界杯 | Debate Contest | Hobby, Skill, Growth | 1.4 | ★★★ |
| | SPEECHIO_ASR_ZH00028 | 时政现场同传 | Simultaneous Translation | News & Events on Public Governance | 2.1 | ★★★☆ |

To pull an unlocked test set from the cloud to your local dataset zoo (leaderboard/datasets/*):

ops/pull dataset <TEST_SET_ID>
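
When several test sets are needed, the command above can be scripted. The following is a minimal sketch, assuming it is run from the repository root and that `ops/pull` exits with a non-zero status on failure (an assumption, not documented here). The helper names `pull_dataset_command` and `pull_datasets` are hypothetical and not part of the repository.

```python
import subprocess
from typing import List


def pull_dataset_command(test_set_id: str) -> List[str]:
    """Build the argv for `ops/pull dataset <TEST_SET_ID>`."""
    if not test_set_id.strip():
        raise ValueError("TEST_SET_ID must not be empty")
    return ["ops/pull", "dataset", test_set_id]


def pull_datasets(test_set_ids: List[str], dry_run: bool = True) -> List[List[str]]:
    """Return the pull commands; execute them one by one unless dry_run is set."""
    commands = [pull_dataset_command(i) for i in test_set_ids]
    if not dry_run:
        for cmd in commands:
            # Assumption: the current working directory is the repository root
            # and a failed pull returns a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `pull_datasets(["AISHELL1_TEST", "SPEECHIO_ASR_ZH00000"])` only returns the two command lines; passing `dry_run=False` actually runs them.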

3. Model Zoo

Cloud API Models

API models are usually small (basically client programs), so we normally keep them in this GitHub repo.

| Unlocked | MODEL_ID | Type | Model author/owner | Description | Service URL |
|---|---|---|---|---|---|
| | aispeech_api_zh | Cloud API | AISpeech | AISpeech Open Platform | https://cloud.aispeech.com |
| | aliyun_api_en | Cloud API | Alibaba | Alibaba Cloud | https://www.alibabacloud.com/product/intelligent-speech-interaction |
| | aliyun_api_zh | Cloud API | Alibaba | Alibaba Cloud | https://ai.aliyun.com/nls/asr |
| | baidu_pro_api_zh | Cloud API | Baidu | Baidu AI Cloud (fast version) | https://cloud.baidu.com/product/speech/asr |
| | google_api_en | Cloud API | Google | Google Cloud | https://cloud.google.com/speech-to-text |
| | | Cloud API | IFlyTek | IFlyTek Open Platform (dictation) | https://www.xfyun.cn/services/voicedictation |
| | iflytek_lfasr_api_zh | Cloud API | IFlyTek | IFlyTek Open Platform (transcription) | https://www.xfyun.cn/services/lfasr |
| | microsoft_rest_api_en | Cloud API | Microsoft | Azure | https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/ |
| | microsoft_rest_api_zh | Cloud API | Microsoft | Azure | https://azure.microsoft.com/zh-cn/services/cognitive-services/speech-services/ |
| | microsoft_sdk_en | Cloud API | Microsoft | Azure | https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/ |
| | microsoft_sdk_zh | Cloud API | Microsoft | Azure | https://azure.microsoft.com/zh-cn/services/cognitive-services/speech-services/ |
| | sogou_api_zh | Cloud API | Sogou | AI Open Platform | https://ai.sogou.com/product/one_recognition/ |
| | tencent_api_zh | Cloud API | Tencent | Tencent Cloud | https://cloud.tencent.com/product/asr |
| | yitu_api_zh | Cloud API | YituTech | Yitu Speech Open Platform | https://speech.yitutech.com |

Local Engine (Open-sourced Pretrained ASR Models)

Local models/engines are normally too large for GitHub, so we store them in the cloud.

| Unlocked | MODEL_ID | Type | Model author/owner | Description |
|---|---|---|---|---|
| | speechio_kaldi_multicn | pretrained model | Xingyu NA(那兴宇) | Kaldi multi_cn recipe |
| | wenet_multi_cn | pretrained model | Binbin Zhang(张彬彬)@wenet-e2e | WeNet multi_cn recipe |
| | vosk_model_cn | batteries-included local engine | alphacephei | Chinese engine of Vosk |
| | wenet_wenetspeech | pretrained model | Binbin Zhang(张彬彬)@wenet-e2e | WeNet wenetspeech recipe |

To pull an unlocked model from the cloud to your local model zoo (leaderboard/models/*):

ops/pull model <MODEL_ID>

4. Benchmarking Pipeline

To submit your model to the leaderboard and have it benchmarked on all test sets (including locked ones), follow this Specification.

You can also pull publicly unlocked models and test sets and trigger the benchmarking pipeline on your local machine via:

ops/leaderboard_runner requests/request.yaml

The content of request.yaml is described in the specification above.


5. Latest Leaderboard Report

[Figure: result]


Contacts

Email: leaderboard@speechio.ai
