Significance testing for collocates.



CollocateR is a package for the statistical programming language R. The package increasingly, if still imperfectly, uses functions and workflows from the tidyverse and tidytext packages.


CollocateR serves a simple purpose. It extracts the collocates of keywords in context from text files and calculates their significance, based on the tests set out in Barnbrook et al.'s Collocation: Applications and Implications (Palgrave, 2013) and the formulae explained on the British National Corpus home page.


  • save_collocates: returns a list containing a tokenised version of the original document, a record of the node in original and hashed format, lists of left and right collocate locations, and the document word_length.
  • get_freqs: a frequency count for collocates, both in context and in the document as a whole.
  • pmi: a 'pointwise mutual information' significance test based on the probability of nodes and collocates occurring together compared to the probability of their occurring independently.
  • npmi: as pmi, but normalised so that all results fall between 1 (perfect collocation) and -1 (the terms never collocate).
  • z-score: a significance test comparing the probability of a collocate occurring near the node with its probability of occurring across the text as a whole.
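
The measures listed above can be written out as follows. This is a sketch of the standard textbook definitions, not necessarily the exact estimates CollocateR computes. The symbols are assumptions introduced here for illustration: $N$ is the number of tokens in the document, $f(n)$ and $f(c)$ are the frequencies of the node and the collocate, $f(n,c)$ is the number of times the collocate occurs within the span around the node, and $S$ is the span size (the number of positions considered around each node occurrence).

$$
p(n) = \frac{f(n)}{N}, \qquad p(c) = \frac{f(c)}{N}, \qquad p(n,c) = \frac{f(n,c)}{N}
$$

$$
\mathrm{PMI}(n,c) = \log_2 \frac{p(n,c)}{p(n)\,p(c)}
$$

$$
\mathrm{NPMI}(n,c) = \frac{\mathrm{PMI}(n,c)}{-\log_2 p(n,c)}
$$

With this normalisation, NPMI is 1 when node and collocate only ever occur together, 0 when they are independent, and approaches -1 as co-occurrence vanishes.

For the z-score, one common formulation (assumed here) compares the observed co-occurrence count with the count expected if the collocate were spread evenly through the rest of the text:

$$
p = \frac{f(c)}{N - f(n)}, \qquad E = p \, f(n) \, S, \qquad z = \frac{f(n,c) - E}{\sqrt{E\,(1 - p)}}
$$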


  • save_collocates
  • pmi
  • npmi
  • z-score
  • MI Cubed
  • log_log
  • log_likelihood
  • Import other elements


README generated with readme2tex.
