Tone-WordLength

Scripts for preparing data and replicating analyses for the paper "Tones and word length across languages".

tones.R contains the code for the main analyses and figures. The first part of the script gives instructions for accessing the data that are combined for the analyses. The script produces pho2.RData, pho3.RData, and wld.RData, which are also supplied for convenience. Producing the first two requires languages.tsv and tones.tsv (from EURPhon). For anyone who wants to carry out their own analyses without the script, two files are offered: pho_tones_count_wl.txt and pho_tones_pa_wl.txt. The first contains the essential data on word length and number of tones. It is a subset of the second, which additionally covers some languages for which only the presence or absence of tone (not tone counts) was available. The script generates these datasets internally and does not read or write them as files; they are provided here only for convenience.
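For readers who only want the supplied data, the following is a minimal sketch of inspecting one of the .RData files without running tones.R. The helper name peek_rdata is hypothetical and not part of the repository. The names of the objects stored inside the files are not documented here, so the sketch lists them rather than assuming them.

```r
# Hypothetical helper (not part of the repository): load an .RData file into
# its own environment so its objects do not overwrite anything in the workspace.
peek_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  list(names = loaded, env = env)
}

# Usage, assuming wld.RData is in the working directory
if (file.exists("wld.RData")) {
  wld <- peek_rdata("wld.RData")
  print(wld$names)                       # names of the stored objects
  str(mget(wld$names, envir = wld$env))  # structure of each object
}
```

Loading into a separate environment is a design choice for safety: a plain load() call would silently replace any workspace objects that happen to share a name with those in the file.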

analyze_TeDDi.R extracts word length data for Bible and Universal Declaration of Human Rights texts from TeDDi (Moran et al. 2022. TeDDi sample: Text Data Diversity sample for language comparison and multilingual NLP. LREC 2022, 1150–1158) and correlates them with word length data from ASJP. The script stores the word length data from TeDDi in two files, wl_bibles.RData and wl_udhrs.RData, which are supplied here for anyone who wants to avoid downloading and processing the large (773 MB) TeDDi_v0_1.RData file.

analyze_NorthEuraLex.R extracts word length data from NorthEuraLex (Dellert et al. 2020. NorthEuraLex: a wide-coverage lexical database of Northern Eurasia. Lang. Resources & Evaluation 54, 273–301) and correlates them with ASJP data. It also correlates word lengths from the ASJP 40-item and 100-item lists.
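As a rough illustration of this kind of comparison, the sketch below correlates mean word lengths from two sources for the languages they share. It is not taken from analyze_NorthEuraLex.R or analyze_TeDDi.R: the helper correlate_wl, the column names iso and wl, the choice of a Pearson correlation, and the toy values are all assumptions made for illustration.

```r
# Hypothetical helper: correlate mean word lengths from two sources for the
# languages present in both. The column names `iso` and `wl` are assumptions.
correlate_wl <- function(a, b, method = "pearson") {
  # merge() keeps only languages found in both tables; the shared `wl`
  # column is disambiguated as wl_a and wl_b
  both <- merge(a, b, by = "iso", suffixes = c("_a", "_b"))
  cor.test(both$wl_a, both$wl_b, method = method)
}

# Toy values for illustration only, not real data
source_a <- data.frame(iso = c("aaa", "bbb", "ccc", "ddd", "eee"),
                       wl  = c(3.1, 4.0, 4.8, 5.5, 4.2))
source_b <- data.frame(iso = c("aaa", "bbb", "ccc", "ddd", "fff"),
                       wl  = c(3.5, 4.4, 5.1, 6.0, 3.9))

res <- correlate_wl(source_a, source_b)
print(res$estimate)   # correlation coefficient for the four shared languages
print(res$parameter)  # degrees of freedom (number of shared languages minus 2)
```

Passing method = "spearman" gives a rank-based correlation instead, which may be preferable when word length distributions are skewed.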

phylogenetic_correlation.R performs phylogenetic correlation for 16 families with 6 or more members and at least one tonal language. It requires matrices of lexical distances, supplied here as 16 files (called Afro-Asiatic_LDN.txt, Athabaskan-Eyak-Tlingit_LDN.txt, etc.) contained in LDN_matrices.zip. This archive should be unzipped and the files placed directly (not in a subfolder) in the folder with the script, together with the other required files mentioned in phylogenetic_correlation.R. The script explains how the creation of these files can be replicated. Branch lengths derived from the matrices are assigned to Glottolog trees. The other input to the analysis is data on tones and word length for each of the 16 families. This is prepared by the script, which then calls BayesTraitsV4.exe; the executable should be downloaded to the working directory from http://www.evolution.reading.ac.uk/BayesTraitsV4.0.1/BayesTraitsV4.0.1.html. The direct input to and output from BayesTraits are generated by the script and by BayesTraits itself, but are supplied here in two zipped files for documentation: BayesTraitsInput.zip and BayesTraitsOutput.zip. A summary of the results, also generated by the script, is in results_BT_correlation.txt. Every run of BayesTraits will produce slightly different results, which is why all output files are supplied here for documentation.
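The unzipping step can be sketched in R as follows. The helper find_ldn_files is hypothetical and not part of the repository; it assumes only that the matrix files end in _LDN.txt, as in the two names given above, and that 16 of them are expected.

```r
# Hypothetical helper: list the LDN matrix files in a folder and warn if the
# count differs from the expected number of families.
find_ldn_files <- function(dir = ".", expected = 16) {
  files <- list.files(dir, pattern = "_LDN\\.txt$")
  if (length(files) != expected) {
    warning(sprintf("Found %d LDN files in '%s', expected %d",
                    length(files), dir, expected))
  }
  files
}

# Usage, assuming LDN_matrices.zip is in the working directory with the script
if (file.exists("LDN_matrices.zip")) {
  # junkpaths = TRUE drops any folder structure stored in the archive,
  # so the files land directly in exdir
  unzip("LDN_matrices.zip", exdir = ".", junkpaths = TRUE)
  print(find_ldn_files("."))
}
```

Checking the file count before running phylogenetic_correlation.R is a cheap way to catch an incomplete or nested extraction early.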
