List of datasets for robust speech processing

This is a summary of the dataset list found on the dataset page of the original rosp wiki. For more detailed information, including links to the data and related papers, refer to that page. New datasets can be added to this page; when adding one, please provide detailed information about the data on a webpage or in a paper, and reference it in the Link cell.

| Dataset | Use case | Language | Speaking style | Total time (h) | Distant or noisy mics | Video cams | Release year | Cost (non-members) | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShATR | meeting | UK English | spontaneous | 0.6 | 3 | no | 1994 | free | to add |
| LLSEC | dialog | | read, spontaneous | 1.4 | 4 | no | 1996 | free | to add |
| MicArray | office | US English | digits, command | 0.2 | 9-16 | no | 1996 | free | to add |
| RWCP Spoken Dialog Corpus | dialog | Japanese | spontaneous | 10 | 2 | no | 1996 - 1997 | free | to add |
| SUSAS | stress | US English | command | ? | 1 | no | 1999 | 0.5 k$ | to add |
| Aurora-2 | public spaces | US English | digits | 33 | 1 | no | 2000 | free given TIDigits (0.5 k$) | to add |
| SPINE1, SPINE2 | military | US English | command, spontaneous | 38 | 2 | no | 2000 - 2001 | 7.4 k$ | to add |
| Aurora-3 (subset of SpeechDat-Car) | car | various | digits | ? | 4 | no | 2000 - 2003 | 1 k€ | to add |
| RWCP Meeting Speech Corpus | meeting | Japanese | spontaneous | 3.5 | 1 | 3 | 2001 | free | to add |
| RWCP Real Environment Speech Database | domestic, office | US English, Japanese | read | ? | 84 | no | 2001 | free | to add |
| SpeechDat-Car | car | various | digits, command, read, spontaneous | ? | 4 | no | 2001 - 2011 | 39 - 182 k€ per lang | to add |
| Aurora-4 | public spaces | US English | read | ? | 1 | no | 2002 | free given WSJ0 (1.5 k$) | to add |
| TED | seminar | non-native English | lecture | 47 | 1 | no | 2002 | 0.5 k$ | to add |
| CUAVE | speech overlap | US English | digits | 3 | 1 | 1 | 2002 | free | to add |
| CU-Move Microphone Array Data | car | US English | digits, command, read, dialog | 286 | 6-8 | no | 2002 - 2011 | 25 k$ | to add |
| PDA | office | US English | read | 1.6-3 | 1-4 | no | 2003 | free | to add |
| CENSREC-1 (Aurora-2J) | public spaces | Japanese | digits | ? | 1 | no | 2003 | free | to add |
| AVICAR | car | US English, non-native English | read | 40 | 7 | 4 | 2004 | free | to add |
| AV16.3 | meeting | N/S | spontaneous | 1.5 | 16 | 3 | 2004 | free | to add |
| ICSI Meeting Corpus | meeting | US English, other English | meeting | 72 | 6 | no | 2004 | 2.8 k$ | to add |
| NIST Meeting Pilot Corpus Speech | meeting | US English | meeting | 15 | 7 | no | 2004 | 5.5 k$ | to add |
| CHIL Meetings | seminar, meeting | non-native English | seminar, meeting | 60 | 79 - 147 | 6-9 | 2004 - 2007 | 3.5 k€ | to add |
| SPEECON | public spaces, domestic, office, car | various | command, read, spontaneous | ? | 3 | no | 2004 - 2011 | 75 k€ per lang | to add |
| CENSREC-2 | car | Japanese | digits | ? | 1 | no | 2005 | free | to add |
| CENSREC-3 | car | Japanese | read | ? | 1 | no | 2005 | 21 k¥ | to add |
| Aurora-5 | public spaces, domestic, office, car | US English | digits | ? | 1 | no | 2006 | free given TIDigits (0.5 k$) | to add |
| AMI | meeting | UK English, other English | meeting | 100 | 16 | 6 | 2006 | free | to add |
| PASCAL SSC | speech overlap | UK English | command | 8.8 | 1 | no | 2006 | free | to add |
| HIWIRE | airplane | non-native English | command | 21 | 1 | no | 2007 | 0.05 k€ | to add |
| NOIZEUS | public spaces | US English | read | 0.6 | 1 | no | 2007 | free | to add |
| UT-Drive | car | US English | command, dialog | 40 | 5 | 2 | 2007 | 25 k$ | to add |
| SASSEC, SiSEC under-determined | cocktail party | N/S | read | 0.3 | 2 | no | 2007 - 2011 | free | to add |
| MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | speech overlap | UK English | read | 10 | 8 - 40 | partial | 2007 - 2014 | 1.5 k$ | to add |
| CENSREC-4 (Simulated) | public spaces, domestic, office, car | Japanese | digits | ? | 1 | no | 2008 | free | to add |
| CENSREC-4 (Real) | public spaces, domestic, office, car | Japanese | digits | ? | 1 | no | 2008 | free | to add |
| DICIT | domestic | Italian | command | 6 | 16 | 2 | 2008 | free | to add |
| SiSEC head-geometry | speech overlap | N/S | read | 1.9 | 2 | no | 2008 | free | to add |
| COSINE | dialog | US English, non-native English | spontaneous | 38 | 20 | no | 2009 | free | to add |
| SiSEC real-world noise | public spaces | N/S | read | 0.3 | 2-4 | no | 2010 | free | to add |
| SiSEC dynamic | cocktail party | N/S | read | 0.2 | 2-4 | no | 2010 - 2011 | free | to add |
| CHiME 1, CHiME 2 Grid | domestic | UK English | command | 70 | 2 | no | 2011 - 2012 | free | to add |
| CHiME 2 WSJ0 | domestic | US English | read | 78 | 2 | no | 2012 | free given WSJ0 (1.5 k$) | to add |
| ETAPE | TV/radio debates, outdoor interviews | French | spontaneous | 42 | 1 | 1 | 2012 | ? | to add |
| GALE | TV dialog | Mandarin, Arabic | spontaneous | 120 - 251 per lang | 1 | no | 2013 | 3.5 - 7 k$ per lang | to add |
| REVERB SimData | domestic, office | UK English | read | 25 | 8 | no | 2013 | free given WSJCAM0 (1.75 k$) | to add |
| Sheffield Wargames Corpus | cocktail party | UK English | spontaneous | 7 | 92 | 3 | 2013 | free | to add |
| DIRHA | domestic | various | command, read, spontaneous | 11 | 40 | no | 2014 | free (partial avail.) | to add |
| CHiME 3 | public spaces | US English | read | 48 | 6 | no | 2015 | free given WSJ0 (1.5 k$) | to add |
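
For readers who want to query the list programmatically, the following is a minimal sketch of one way to do it. It assumes the rows are available as a pipe-delimited markdown table. The column names in `SAMPLE` and the helpers `parse_table` and `free_datasets` are hypothetical and chosen only for illustration. The three sample rows are copied from the list above.

```python
# Minimal sketch: parse a pipe-delimited markdown table into dictionaries
# and select the datasets whose cost is exactly "free".
# SAMPLE, parse_table and free_datasets are illustrative names (assumptions),
# not part of any existing tool.

SAMPLE = """\
| Dataset | Use case | Language | Speaking style | Total time (h) | Distant or noisy mics | Video cams | Release year | Cost (non-members) | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMI | meeting | UK English, other English | meeting | 100 | 16 | 6 | 2006 | free | to add |
| UT-Drive | car | US English | command, dialog | 40 | 5 | 2 | 2007 | 25 k$ | to add |
| CHiME 3 | public spaces | US English | read | 48 | 6 | no | 2015 | free given WSJ0 (1.5 k$) | to add |
"""


def parse_table(text):
    """Return one dict per data row, keyed by the header cells."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def free_datasets(rows):
    """Names of datasets that are free without requiring another corpus."""
    return [row["Dataset"] for row in rows if row["Cost (non-members)"] == "free"]


if __name__ == "__main__":
    print(free_datasets(parse_table(SAMPLE)))
```

The exact-match test on the cost cell deliberately excludes entries such as "free given WSJ0 (1.5 k$)", which depend on a paid source corpus.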