Chinese Structure Dataset from Szeto et al.'s (2018) paper in CLDF-Format

Notes on the dataset

This is a structural dataset originally published along with a paper by Szeto et al. (2018) on Chinese dialect classification:

Szeto, P. Y.; Ansaldo, U. & Matthews, S. Typological variation across Mandarin dialects: An areal perspective with a quantitative approach. Linguistic Typology, 2018, 22, 233-275.

The raw data in the folder raw/ was extracted and typed up from the original paper, which contains the main data table but unfortunately does not list the sources for each dataset.

The data is converted to CLDF using cldfbench. The commands for installing, converting, and testing are given below.

Install requirements

$ pip install -e .[test]

Running the script

$ cldfbench makecldf cldfbench_szetosinitic.py

Test with `pycldf` API

$ pytest
$ cldf stats cldf/cldf-metadata.json 
<cldf:v1.0:StructureDataset at cldf>
                          value
------------------------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dc:bibliographicCitation  Szeto, P. Y.; Ansaldo, U. & Matthews, S.Typological variation across Mandarin dialects: An areal perspective with a quantitative approach Linguistic Typology, 2018, 22, 233-275.
dc:conformsTo             http://cldf.clld.org/v1.0/terms.rdf#StructureDataset
dc:identifier             https://github.com/cldf-datasets/szetosinitic
dc:license                http://www.apache.org/licenses/LICENSE-2.0
dc:source                 sources.bib
dc:title                  Structure Dataset on Chinese Dialects
dcat:accessURL            https://github.com/cldf-datasets/szetosinitic
prov:wasDerivedFrom       [{'rdf:about': 'https://github.com/cldf-datasets/szetosinitic', 'rdf:type': 'prov:Entity', 'dc:created': 'v1.0-7-g3c848be', 'dc:title': 'Repository'}, {'rdf:about': 'https://github.com/glottolog/glottolog', 'rdf:type': 'prov:Entity', 'dc:created': 'v4.7', 'dc:title': 'Glottolog'}]
prov:wasGeneratedBy       [{'dc:title': 'python', 'dc:description': '3.8.10'}, {'dc:title': 'python-packages', 'dc:relation': 'requirements.txt'}]
rdf:ID                    szetosinitic
rdf:type                  http://www.w3.org/ns/dcat#Distribution

                Type              Rows
--------------  --------------  ------
values.csv      ValueTable         882
languages.csv   LanguageTable       42
parameters.csv  ParameterTable      21
codes.csv       CodeTable           42
sources.bib     Sources              1
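
The row counts above are consistent with a complete feature matrix: 42 languages × 21 parameters = 882 values. A minimal sketch of how one might check this with the Python standard library follows. It assumes that values.csv uses the default CLDF column names `Language_ID` and `Parameter_ID`, which is not confirmed here; the function name `missing_cells` and the inline sample data are hypothetical.

```python
import csv
import io
from itertools import product


def missing_cells(rows):
    """Return sorted (language, parameter) pairs that have no row in the value table."""
    rows = list(rows)
    languages = {r["Language_ID"] for r in rows}
    parameters = {r["Parameter_ID"] for r in rows}
    seen = {(r["Language_ID"], r["Parameter_ID"]) for r in rows}
    # Every language/parameter combination that never occurs is a gap.
    return sorted(set(product(languages, parameters)) - seen)


# Tiny made-up sample standing in for cldf/values.csv (assumed column names).
SAMPLE = """ID,Language_ID,Parameter_ID,Value
1,lang_a,p1,1
2,lang_a,p2,0
3,lang_b,p1,1
"""

gaps = missing_cells(csv.DictReader(io.StringIO(SAMPLE)))
print(gaps)
```

To run the same check on the converted data, one could pass `csv.DictReader(open("cldf/values.csv", encoding="utf-8", newline=""))` instead of the sample; an empty list would indicate that every language has a value for every parameter.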

Converting to NEXUS format

$ python nexus.py
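
For orientation, a NEXUS file wraps a taxon-by-character matrix in a DATA block. The following is a minimal sketch of writing such a block. It is not the repository's nexus.py; the function name `to_nexus`, the sample varieties, and the assumption that every state is a single symbol are illustrative only.

```python
def to_nexus(matrix, symbols="01", missing="?"):
    """Render {taxon: [state, ...]} as a NEXUS DATA block (single-symbol states)."""
    taxa = sorted(matrix)
    nchar = len(matrix[taxa[0]]) if taxa else 0
    if any(len(matrix[t]) != nchar for t in taxa):
        raise ValueError("all taxa must have the same number of characters")
    width = max((len(t) for t in taxa), default=0)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD MISSING={missing} SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    for taxon in taxa:
        # None marks a missing value and is written as the MISSING symbol.
        states = "".join(missing if s is None else str(s) for s in matrix[taxon])
        lines.append(f"    {taxon.ljust(width)}  {states}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Made-up varieties and values, for illustration only.
example = {"VarietyA": [1, 0, None], "VarietyB": [0, 0, 1]}
nexus_text = to_nexus(example)
print(nexus_text)
```

Taxon names containing spaces or punctuation would need quoting or renaming before being written this way; the sketch does not handle that case.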

Acknowledgements

Thanks a lot to David Morrison for extracting the major data table from the PDF provided along with the original paper.
