Getting some data
-----------------

.. toctree::
   :maxdepth: 4

You may have data of your own that you would like to use the package
with, or you may know of datasets already in this format that you would
like to reuse.

It may be easier to start with an existing dataset. The datasets we know
of are listed below. Please note that the large majority of these data
are NOT public; if you cannot retrieve a dataset, you will need to get
in touch with its data managers.

Public data sets
~~~~~~~~~~~~~~~~

We have prepared a `public
dataset <https://gin.g-node.org/LAAC-LSCP/vandam-data>`__ for
testing purposes. It is based on the `VanDam Public Daylong HomeBank
Corpus <https://homebank.talkbank.org/access/Public/VanDam-Daylong.html>`__:
VanDam, Mark (2018). VanDam Public Daylong HomeBank Corpus.
doi:10.21415/T5388S.

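
As a rough illustration, the public dataset could be fetched with DataLad
along the following lines. This is only a sketch under assumptions that
are not stated above: that DataLad is installed, that the GIN page URL
also works as an HTTPS clone URL, and that ``vandam-data`` is an
acceptable local folder name. The helper names ``build_commands`` and
``fetch`` are hypothetical and are not part of the package.

```python
import shlex
import subprocess

# Assumption: the GIN page URL given in the text can also be used for cloning.
VANDAM_URL = "https://gin.g-node.org/LAAC-LSCP/vandam-data"


def build_commands(source=VANDAM_URL, path="vandam-data"):
    """Return the DataLad command lines as argument lists (hypothetical helper)."""
    return [
        # Clone the dataset; large file contents are not downloaded at this step.
        ["datalad", "install", "--source", source, path],
        # Download the file contents of the cloned dataset.
        ["datalad", "get", path],
    ]


def fetch(source=VANDAM_URL, path="vandam-data", dry_run=True):
    """Print the commands (dry run) or execute them one after the other."""
    commands = build_commands(source, path)
    for cmd in commands:
        if dry_run:
            # Show what would be run without touching the network.
            print(shlex.join(cmd))
        else:
            # Stop at the first failing command.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    fetch(dry_run=True)
```

By default nothing is downloaded: the commands are only printed. Calling
``fetch(dry_run=False)`` would actually run them, which requires network
access and may transfer a large amount of data.
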
From the `LAAC team <https://lscp.dec.ens.fr/en/research/teams-lscp/language-acquisition-across-cultures>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table:: List of LAAC datasets
   :header-rows: 1
   :file: _static/datasets.csv

EL1000
~~~~~~

The `EL1000 dataset <https://gin.g-node.org/EL1000/EL1000>`__ contains several corpora
that are accessible upon request.

Other private datasets
~~~~~~~~~~~~~~~~~~~~~~

We know of no other private datasets at present, but we hope one day to
be able to use `datalad’s search
feature <http://docs.datalad.org/en/stable/generated/man/datalad-search.html>`__
to find them.