This package is at an early stage of development. Function definitions are subject to change.
This R package is designed to make it easier to carry out analysis of monophthong covariation.
The package is written in a tidyverse style. Plotting functions produce ggplot objects, which can be modified by the user in the usual ggplot fashion. PCA visualisations use functions from the factoextra package. The functions to fit by-vowel models use mgcv by default.
The origin of these functions is in work towards Brand et al. (2021) and Wilson Black et al. (under review), carried out at the New Zealand Institute of Language, Brain and Behaviour and the University of Canterbury.
The package includes two pairs of datasets. The first pair is designed to allow exploration of analysis in the style of Brand et al. (2021); the second is designed to follow the analysis of Wilson Black et al. (2022).
- A sample of 100 speakers from ONZE.
  - `onze_vowels`: the mean F1 and F2 values for 100 speakers in the ONZE corpus, with 50 born at or before 1920 and 50 after.
  - `onze_intercepts`: random intercepts for the 100 speakers in `onze_means`. Intercepts generated by Brand et al. (2021).
- A sample of 77 speakers from QuakeBox, with 11 in each age category.
  - `qb_vowels`: vowel tokens for speaker monologues after filtering to remove unstressed tokens, stop words, and outliers.
  - `qb_intervals`: monologues divided into 60 and 240 second intervals for the same speakers as `qb_vowels`.
We recommend using these datasets to try out methods, but not using them directly for research. Full anonymised datasets are given in supplementary data for Brand et al. (2021) and Wilson Black et al. (under review). If you are interested in research on the basis of the ONZE or QuakeBox corpora, contact NZILBB.
Apply Lobanov 2.0 normalisation as developed in Brand et al. (2021). This variant of Lobanov normalisation is designed to work for datasets where the vowel types have different token counts from one another. The Lobanov 2.0 value for a vowel is given by:
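The idea can be sketched in base R. This is a minimal illustration, not the package's own implementation. It assumes that, for a single speaker and formant, the Lobanov 2.0 value is the raw formant value minus the mean of the by-vowel means, divided by the standard deviation of the by-vowel means. The helper name `lobanov_2_sketch` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of the package): Lobanov 2.0 for one speaker
# and one formant, assuming normalisation against the by-vowel means.
lobanov_2_sketch <- function(formant, vowel) {
  # One mean per vowel type, so each type counts once regardless of token count.
  vowel_means <- tapply(formant, vowel, mean)
  (formant - mean(vowel_means)) / sd(vowel_means)
}

# Toy data: FLEECE has three tokens, the other vowels one each.
toy <- data.frame(
  vowel = c("FLEECE", "FLEECE", "FLEECE", "TRAP", "THOUGHT"),
  F1 = c(300, 320, 310, 600, 450)
)
toy$F1_lob2 <- lobanov_2_sketch(toy$F1, toy$vowel)
```

Because the centre and scale come from the vowel means rather than from all tokens, the over-represented FLEECE tokens do not pull the normalisation towards themselves.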
Take a PCA object produced by `prcomp` and plot a PC by the percentage contribution of the original variables to the PC. For instance:
A cutoff value of 50% is given by default. This highlights the collection of highest-contribution variables which account for 50% of the PC in question. This can be turned off by setting `cutoff = NULL`.

The PC to plot is selected by the argument `pc_n`.
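The quantities behind such a plot can be sketched in base R. This is an illustration only and does not call the package's plotting function. It assumes that a variable's percentage contribution to a PC is its squared loading times 100, and that the cutoff selects the smallest set of highest-contribution variables whose total reaches the cutoff. The built-in `USArrests` data stands in for vowel data.

```r
# Sketch only: compute percentage contributions for one PC from a prcomp object.
pca <- prcomp(USArrests, scale. = TRUE)
pc_n <- 1
cutoff <- 50

# Loadings for each PC have unit length, so squared loadings sum to 1.
contrib <- 100 * pca$rotation[, pc_n]^2
contrib <- sort(contrib, decreasing = TRUE)

# Assumed cutoff rule: keep top variables until their total reaches the cutoff.
n_keep <- which(cumsum(contrib) >= cutoff)[1]
highlighted <- names(contrib)[seq_len(n_keep)]
```

Here `highlighted` holds the names of the variables that would be emphasised in the plot under that assumed rule.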
This function runs a permutation test on a PCA analysis. Given data in a format accepted by `prcomp` (a data frame with all numeric columns), the function repeats the analysis on `n` permutations of the data in each column. It returns an S3 object with attributes:
- `$permuted_variances`: an `n` x `pc_no` matrix of variances explained by the first `pc_no` PCs in `n` permutations of the original data.
- `$permuted_correlations`: a list of length `n` of significant pairwise correlations in `n` permutations of the data (<= 0.05).
- `$actual_variances`: a `pc_n` x 2 tibble of variances explained by the first `pc_n` PCs with the original data.
- `$actual_correlations`: the number of significant pairwise correlations (<= 0.05) in the original data.
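The logic of the test can be sketched in base R. This is a rough illustration under stated assumptions, not the package's function: each column is shuffled independently, "significant" is taken to mean a `cor.test` p-value at or below 0.05, and counts are stored rather than the correlations themselves. The helper names `var_explained` and `count_sig_cors` are hypothetical, and `USArrests` stands in for vowel data.

```r
set.seed(1)
dat <- USArrests  # stand-in data: all columns numeric
n <- 20           # number of permutations
pc_n <- 2         # number of PCs to track

# Proportion of variance explained by the first k PCs.
var_explained <- function(d, k) {
  pca <- prcomp(d, scale. = TRUE)
  (pca$sdev^2 / sum(pca$sdev^2))[seq_len(k)]
}

# Number of column pairs with a significant correlation.
count_sig_cors <- function(d, alpha = 0.05) {
  pairs <- combn(ncol(d), 2)
  p_vals <- apply(pairs, 2, function(ix) {
    cor.test(d[[ix[1]]], d[[ix[2]]])$p.value
  })
  sum(p_vals <= alpha)
}

# Shuffle each column independently to break relationships between columns.
permuted <- replicate(n, as.data.frame(lapply(dat, sample)), simplify = FALSE)

permuted_variances <- t(sapply(permuted, var_explained, k = pc_n))
permuted_correlations <- sapply(permuted, count_sig_cors)
actual_variances <- var_explained(dat, pc_n)
actual_correlations <- count_sig_cors(dat)
```

If the first PC of the real data explains clearly more variance than the first PC of the permuted data, the structure it captures is unlikely to be an artefact of the individual column distributions.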
Plots the results of the permutation test. If the argument `violin` is set to `FALSE`, then the values for each permutation are connected with a line. The plots for the number of significant correlations and for variance explained by each PC are joined together using the patchwork library.

In the near future, this will be turned into a generic plot for the `permutation_test` class.
TODO
TODO
TODO
It is often useful to have Wells lexical sets in small capitals in R Markdown documents. The way to achieve this manually is to add "vowel". This package includes an RStudio add-in, which can be attached to a keyboard shortcut, to quickly include this code within an R Markdown document. For instance:
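As a rough sketch of the kind of markup involved, the helper below wraps a vowel name in an HTML span with `font-variant: small-caps`, which Pandoc recognises as small caps. The function name `small_caps_sketch` is hypothetical, and the exact markup inserted by the add-in is an assumption here.

```r
# Hypothetical helper (not part of the package): wrap text in small-caps markup.
# Assumes the CSS-style span form, which Pandoc converts to small caps.
small_caps_sketch <- function(vowel) {
  sprintf('<span style="font-variant: small-caps;">%s</span>', vowel)
}

fleece <- small_caps_sketch("fleece")
```

In an R Markdown document, the result could be placed inline with `` `r small_caps_sketch("fleece")` ``.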