Supplementary Online Material for the First Edition of "Corpus Linguistics: A Guide to the Methodology" by Anatol Stefanowitsch
"Corpus Linguistics: A Guide to the Methodology" is a corpus linguistics textbook by Anatol Stefanowitsch, published by Language Science Press. The directories listed below contain Supplementary Online Material for some of the case studies presented in the book. The directory names are arbitrary four-character sequences cross-referenced in the book, typically as a note accompanying the table presenting the results of the respective case study. The descriptions below also contain references to the section numbers of the respective case study in the first edition of the book. Each directory contains an info file with further information and at least one data file, typically in csv format.
Note: The files containing co-occurrence frequencies typically also contain a column with the corresponding values for the G statistic.
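For readers who want to recompute or check the G values in those columns, the following is a minimal sketch of the G statistic (log-likelihood ratio) for a 2-by-2 contingency table, using the standard formulation G = 2 Σ O ln(O/E). The function name g_statistic and the cell names o11 to o22 are hypothetical; this README does not say how the G values in the files were computed, so small differences (for example from rounding) are possible.

```python
import math

def g_statistic(o11, o12, o21, o22):
    """Return G for a 2x2 table of observed co-occurrence frequencies.

    Assumed layout (hypothetical, not taken from the data files):
        o11 = word A with word B      o12 = word A without word B
        o21 = other words with B      o22 = other words without B
    """
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    n = sum(row_totals)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with 0 observations contributes 0 to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2 * total
```

A table whose observed frequencies equal the expected ones, such as g_statistic(10, 20, 30, 60), yields a G of 0; larger values indicate a stronger deviation from chance co-occurrence.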
4DYT: Complete concordance of begin and start as matrix verbs in the LOB corpus, annotated for type of complement, embedded verb and aktionsart of the embedded verb as discussed in Section 8.2.3.1.
83MR: The complete co-occurrence frequencies of the adjectives little and small with nouns and adjectives, and the nouns boy and girl with adjectives as discussed in Section 7.2.4.1.
847K: Frequency of potential military keywords in the BROWN and LOB corpora as discussed in Section 10.2.2.1.
BYHW: Lemmatized frequency lists of the nouns functioning as direct objects to the verbs cause, bring about and lead to as discussed in Section 7.2.3.2.
CUBF: Frequency of the name Karl Marx in the Google Books archives for German and English as discussed in Section 10.2.5.2.
CWPX: Complete co-occurrence frequencies for adjectives coordinated with and as discussed in Section 7.2.2.2.
D9P4: Complete co-occurrence frequencies for nouns with the adjectives high and tall as discussed in Section 7.2.2.2.
DVJ9: Concordance of begin and start as matrix verbs in the British National Corpus where they are preceded by immediately, suddenly, quickly, slowly, gradually, or eventually, annotated for type of adverb, complement and embedded verb as discussed in Section 8.2.3.1.
EWTN: Lists of hits for the affixes -ise/-ize/-yse/-yze and -ify in the LOB Corpus in their order of occurrence.
FXHV: Co-occurrence frequencies of the collocational framework [a _ of] and its collocates in the Written Academic subsection of the British National Corpus, Baby Edition, as discussed in Section 10.2.1.1.
H4BM: Concordance of analytic and synthetic comparatives of the adjectives angry, empty, friendly, lively, risky and sorry from the British National Corpus, annotated for whether there is a comparative form in the preceding 7 and/or 20 words, and if so, what type as discussed in Section 8.2.4.4.
HKD3: Data set (csv) for the case studies of the English possessive constructions presented in Sections 5.2, 5.3 and 5.4.
K7BC: Complete frequencies for words occurring in the grammar pattern [there VERB something ADJECTIVE about] in the British National Corpus as discussed in Section 8.2.1.2.
KVCF: Complete concordance of electric/electrical in the LOB and BROWN corpora, annotated for modified noun and type of modified noun according to the categories described in Section 9.2.1.2.
KVMN: Frequency of words (and other tokens) in the 2017 election manifestos of the British parties Labour and Liberal Democrats as discussed in Section 10.2.4.1.
LAF3: Lists of hits for the affix -ship in the genres Fiction and Newspapers in the British National Corpus in their order of occurrence as discussed in Section 9.2.2.1.
LKTH: Complete co-occurrence frequencies for adjectives and selected degree adverbs as discussed in Section 7.2.1.1.
LMY7: Lists of hits for the affixes -icle and mini- in the British National Corpus in their order of occurrence.
MNH4: Co-occurrence frequencies for the equine nouns and their adjectival collocates presented in Section 7.1.3.6.
MPXF: Co-occurrence frequencies of the collocational framework [a _ of] and its collocates in the Written Academic subsection of the British National Corpus, Baby Edition, compared to the rest of the corpus as discussed in Section 10.2.1.2.
NFHY: Sample of monosyllabic binomials annotated for sonority as shown in Table 8.18.
PYX4: Frequency of words (and other tokens) in the 2001 and 2017 election manifestos of the British Labour party as discussed in Section 10.2.4.1.
Q8DT: Frequency of verbs in the going-to future and the will future in the Corpus of Late Modern English Texts in three periods of English as discussed in Section 10.2.5.1.
QXVR: Complete co-occurrence frequencies, percentages and G values for words occurring in the collocational framework [a _ of] in the British National Corpus as discussed in Section 8.2.1.1.
RLW8: Frequencies for the co-occurrence of verbs of communication with the subject pronouns he and she in the British National Corpus as discussed in Section 8.2.6.1.
TXQP: Various corpus files. Check the TXQP-info file for more details.
U7BR: Annotated frequency list of words containing the suffixes -ic and -ical in the LOB corpus as discussed in Section 9.2.1.3.
UH9B: Frequency of words in the Commerce subsection of the British National Corpus, Baby Edition, compared to the frequency of words in the rest of the corpus as discussed in Section 11.2.3.1.
VQTL: Frequencies of the suffixes -ic and -ical with stems containing the affixes -olog- and -ist in the British National Corpus as discussed in Section 9.2.1.4.
VU79: Frequency of words (and other tokens) in the 2001 and 2017 election manifestos of the British Labour party as discussed in Section 10.2.4.1.
WLVF: Concordance of ADJ-NOUN sequences with hot, warm, cool, cold in the British National Corpus, Baby Edition, annotated for metaphoricity and type of metaphor as discussed in Section 11.2.1.1.
Y7JC: Complete co-occurrence frequencies of the adjectives electric and electrical with the nouns they modify attributively in the British National Corpus as discussed in Section 9.2.1.2.
If you are interested in helping to maintain this material, please send an email to clmbook@stefanowitsch.net.
To the extent that the material is copyrightable and unless otherwise noted, it is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, see https://creativecommons.org/licenses/by-sa/4.0/. You are advised to check the licensing conditions of the corpora from which the data are derived for potential further restrictions.