This is a summarized version of the dataset list found on the dataset page of the original rosp wiki. For more detailed information, including links to the data and related papers, refer to that page. New datasets can be added to this page. When adding one, please provide detailed information about the data on a webpage or in a paper, and reference it in the Link cell.
Datasets | use case | language | speaking style | total time (h) | distant or noisy mics | video cams | release year | cost (non-member) | Link |
---|---|---|---|---|---|---|---|---|---|
ShATR | meeting | UK English | spontaneous | 0.6 | 3 | no | 1994 | free | to add |
LLSEC | dialog | | read, spontaneous | 1.4 | 4 | no | 1996 | free | to add |
MicArray | office | US English | digits, command | 0.2 | 9-16 | no | 1996 | free | to add |
RWCP Spoken Dialog Corpus | dialog | Japanese | spontaneous | 10 | 2 | no | 1996 - 1997 | free | to add |
SUSAS | stress | US English | command | ? | 1 | no | 1999 | 0.5 k$ | to add |
Aurora-2 | public spaces | US English | digits | 33 | 1 | no | 2000 | free given TIDigits (0.5 k$) | to add |
SPINE1, SPINE2 | military | US English | command, spontaneous | 38 | 2 | no | 2000 - 2001 | 7.4 k$ | to add |
Aurora-3 (subset of SpeechDat-Car) | car | various | digits | ? | 4 | no | 2000 - 2003 | 1 k€ | to add |
RWCP Meeting Speech Corpus | meeting | Japanese | spontaneous | 3.5 | 1 | 3 | 2001 | free | to add |
RWCP Real Environment Speech Database | domestic, office | US English, Japanese | read | ? | 84 | no | 2001 | free | to add |
SpeechDat-Car | car | various | digits, command, read, spontaneous | ? | 4 | no | 2001 - 2011 | 39 - 182 k€ per lang | to add |
Aurora-4 | public spaces | US English | read | ? | 1 | no | 2002 | free given WSJ0 (1.5 k$) | to add |
TED | seminar | non-native English | lecture | 47 | 1 | no | 2002 | 0.5 k$ | to add |
CUAVE | speech overlap | US English | digits | 3 | 1 | 1 | 2002 | free | to add |
CU-Move Microphone Array Data | car | US English | digits, command, read, dialog | 286 | 6-8 | no | 2002 - 2011 | 25 k$ | to add |
PDA | office | US English | read | 1.6-3 | 1-4 | no | 2003 | free | to add |
CENSREC-1 (Aurora-2J) | public spaces | Japanese | digits | ? | 1 | no | 2003 | free | to add |
AVICAR | car | US English, non-native English | read | 40 | 7 | 4 | 2004 | free | to add |
AV16.3 | meeting | N/S | spontaneous | 1.5 | 16 | 3 | 2004 | free | to add |
ICSI Meeting Corpus | meeting | US English, other English | meeting | 72 | 6 | no | 2004 | 2.8 k$ | to add |
NIST Meeting Pilot Corpus Speech | meeting | US English | meeting | 15 | 7 | no | 2004 | 5.5 k$ | to add |
CHIL Meetings | seminar, meeting | non-native English | seminar, meeting | 60 | 79 - 147 | 6 - 9 | 2004 - 2007 | 3.5 k€ | to add |
SPEECON | public space, domestic, office, car | various | command, read, spontaneous | ? | 3 | no | 2004 - 2011 | 75 k€ per lang | to add |
CENSREC-2 | car | Japanese | digits | ? | 1 | no | 2005 | free | to add |
CENSREC-3 | car | Japanese | read | ? | 1 | no | 2005 | 21 k¥ | to add |
Aurora-5 | public spaces, domestic, office, car | US English | digits | ? | 1 | no | 2006 | free given TIDigits (0.5 k$) | to add |
AMI | meeting | UK English, other English | meeting | 100 | 16 | 6 | 2006 | free | to add |
PASCAL SSC | speech overlap | UK English | command | 8.8 | 1 | no | 2006 | free | to add |
HIWIRE | airplane | non-native English | command | 21 | 1 | no | 2007 | 0.05 k€ | to add |
NOIZEUS | public spaces | US English | read | 0.6 | 1 | no | 2007 | free | to add |
UT-Drive | car | US English | command, dialog | 40 | 5 | 2 | 2007 | 25 k$ | to add |
SASSEC, SiSEC under-determined | cocktail party | N/S | read | 0.3 | 2 | no | 2007 - 2011 | free | to add |
MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | speech overlap | UK English | read | 10 | 8 - 40 | partial | 2007 - 2014 | 1.5 k$ | to add |
CENSREC-4 (Simulated) | public spaces, domestic, office, car | Japanese | digits | ? | 1 | no | 2008 | free | to add |
CENSREC-4 (Real) | public spaces, domestic, office, car | Japanese | digits | ? | 1 | no | 2008 | free | to add |
DICIT | domestic | Italian | command | 6 | 16 | 2 | 2008 | free | to add |
SiSEC head-geometry | speech overlap | N/S | read | 1.9 | 2 | no | 2008 | free | to add |
COSINE | dialog | US English, non-native English | spontaneous | 38 | 20 | no | 2009 | free | to add |
SiSEC real-world noise | public spaces | N/S | read | 0.3 | 2-4 | no | 2010 | free | to add |
SiSEC dynamic | cocktail party | N/S | read | 0.2 | 2-4 | no | 2010 - 2011 | free | to add |
CHiME 1, CHiME 2 Grid | domestic | UK English | command | 70 | 2 | no | 2011 - 2012 | free | to add |
CHiME 2 WSJ0 | domestic | US English | read | 78 | 2 | no | 2012 | free given WSJ0 (1.5 k$) | to add |
ETAPE | TV/radio debates, outdoor interviews | French | spontaneous | 42 | 1 | 1 | 2012 | ? | to add |
GALE | TV dialog | Mandarin, Arabic | spontaneous | 120 - 251 per lang | 1 | no | 2013 | 3.5 - 7 k$ per lang | to add |
REVERB SimData | domestic, office | UK English | read | 25 | 8 | no | 2013 | free given WSJCAM0 (1.75 k$) | to add |
Sheffield Wargames Corpus | cocktail party | UK English | spontaneous | 7 | 92 | 3 | 2013 | free | to add |
DIRHA | domestic | various | command, read, spontaneous | 11 | 40 | no | 2014 | free (partial avail.) | to add |
CHiME 3 | public spaces | US English | read | 48 | 6 | no | 2015 | free given WSJ0 (1.5 k$) | to add |