This is a simple application that helps researchers calculate Log Likelihood (LL) and Odds Ratio (OR) statistics for their own corpus against a word list from the British National Corpus (BNC).
Compiled packages exist for both Windows and Mac OS. If you have Python 3 installed, you can run the source code directly on any platform. Just navigate to the source directory and run:
- Download the appropriate version
- Unzip the folder. All contents of the folder are necessary, so do not remove any of them. If you need to move the .exe file, move the entire folder with it.
- Run the executable file from the folder by double-clicking it
- Under the "File" menu, load either a file or folder. Each time more data is loaded, the statistics are re-calculated.
- The program will create a word frequency list from your corpus and use it to calculate LL and OR against the BNC. Results are sorted by LL, but you can re-sort them in the program, or save them as a CSV file and sort them by other criteria in MS Excel, LibreOffice, or similar programs.

Note: The running time depends on the size of your corpus. A large corpus may take some time to process, so please be patient; results are displayed when the calculations are complete.
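The statistics described above can be sketched in Python, the language the source code is written in. This is a minimal sketch, assuming the standard two-corpus log-likelihood (G²) formula used in keyword analysis and a plain odds ratio; LL_OR_BNC's own OR algorithm has changed between versions and may differ, for example in how zero counts are handled. The names `log_likelihood`, `odds_ratio`, and `keyness` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) of one word.

    a: frequency in your corpus      c: total tokens in your corpus
    b: frequency in the reference    d: total tokens in the reference
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing (0 * log 0 -> 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def odds_ratio(a, b, c, d):
    """Odds of the word in your corpus divided by its odds in the reference.

    Edge-case handling is an assumption, not necessarily the app's behaviour.
    """
    if b == 0 or a == c:  # reference odds are zero or corpus odds are infinite
        return math.inf
    if b == d:  # reference odds are infinite
        return 0.0
    return (a / (c - a)) / (b / (d - b))


def keyness(corpus_counts, reference_counts):
    """Return (word, a, b, LL, OR) rows sorted by LL, highest first."""
    c = sum(corpus_counts.values())
    # Assumption: the word list's total stands in for the reference corpus size.
    d = sum(reference_counts.values())
    rows = []
    for word, a in corpus_counts.items():
        b = reference_counts.get(word, 0)
        rows.append((word, a, b, log_likelihood(a, b, c, d), odds_ratio(a, b, c, d)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Because `Counter.update` adds to existing counts, updating the corpus `Counter` with each newly loaded file and calling `keyness` again mirrors the "re-calculate on every load" behaviour of the File menu.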
The tokenizer splits text on spaces and converts upper-case letters to lower case. Punctuation is handled as follows:
- Punctuation followed by a space is separated from the preceding word.
- Most punctuation counts as a word boundary (but is not included in words).
- Numbers may contain the digits 0-9, commas, periods, and colons (e.g. 12:45 or 1,300.00).
- A single quote (apostrophe) inside a word splits the word in two and is attached to the second part.

| Text | Tokens |
| --- | --- |
| You're fine, fire-truck! | you 're fine fire truck |
| "'Tis", replied Aunt Helga. | tis replied aunt helga |
| Don't tell someone what they can or can't do | don 't tell someone what they can or can 't do |
| There are 100,000,000,000.00 words in the BNC. | there are 100,000,000,000.00 words in the bnc |
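The tokenization rules and examples above can be approximated with a single regular expression. This is a hypothetical sketch, not LL_OR_BNC's actual tokenizer: it assumes an ASCII apostrophe is attached to the following letters only when it directly follows a letter (which is why "'Tis" becomes "tis"), and it treats every other non-letter, non-number character as a boundary. The names `TOKEN_RE` and `tokenize` are illustrative.

```python
import re

# Hypothetical approximation of the documented rules (not the app's code).
# Alternatives, tried left to right at each position:
#   1. an apostrophe directly after a letter, plus the letters that follow it
#      ("you're" -> "you", "'re")
#   2. a run of letters in any script (\w minus digits and underscore)
#   3. a number: digits, optionally with , . : between digits ("1,300.00", "12:45")
TOKEN_RE = re.compile(r"(?<=[^\W\d_])'[^\W\d_]+|[^\W\d_]+|\d(?:[\d,.:]*\d)?")


def tokenize(text):
    """Lower-case the text and return its tokens; other punctuation is dropped."""
    return TOKEN_RE.findall(text.lower())
```

Passing the result to `collections.Counter` produces the kind of word frequency list the program builds from your corpus. Requiring a number to end in a digit keeps a sentence-final period out of tokens such as "12:45.".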
The BNC word list is a CSV file located in the LL_OR_BNC/Data directory. It is encoded in UTF-8, which may cause some characters to display incorrectly in MS Excel.
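Because the word list is a plain UTF-8 CSV file, it can also be read from Python or re-saved for Excel. A minimal sketch follows; the two-column word,frequency layout and the function names `load_word_list` and `copy_for_excel` are assumptions, so check the actual file in LL_OR_BNC/Data before relying on them. Writing with Python's `utf-8-sig` codec prepends a byte-order mark, which Excel generally uses to recognise a CSV file as UTF-8.

```python
import csv
from collections import Counter


def load_word_list(path):
    """Read a word,frequency CSV into a Counter.

    Assumption: each row is word, frequency. Rows whose second column is
    not an integer (e.g. a header) are skipped.
    """
    counts = Counter()
    # utf-8-sig decodes plain UTF-8 and also strips a BOM if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            try:
                counts[row[0]] += int(row[1])
            except ValueError:
                continue  # header or malformed row
    return counts


def copy_for_excel(src, dst):
    """Copy a UTF-8 CSV, adding a byte-order mark so Excel detects UTF-8."""
    # newline="" on both sides preserves the original line endings.
    with open(src, newline="", encoding="utf-8-sig") as fin, \
         open(dst, "w", newline="", encoding="utf-8-sig") as fout:
        fout.write(fin.read())
```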
v1.2 - July 3rd, 2014
- Fixed display bug on OS X for information windows
- Increased font size
- Tokenization has been modified to be more similar to that of AntConc
- Added option to filter out numbers
- Added option to filter by minimum frequency
- Added support for Home, End, Page Up, and Page Down keys
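The two filter options added in v1.2 can be pictured as a pass over the frequency list. This is a hypothetical sketch: `filter_counts` and `NUMBER_RE` are illustrative names, the number test reuses the number shape from the tokenizer rules (digits with commas, periods, or colons), and whether the real options also change the corpus totals used for LL and OR is not documented here.

```python
import re
from collections import Counter

# Assumption: a "number" is a token made of digits, optionally with , . :
# between digits, as described in the tokenizer rules.
NUMBER_RE = re.compile(r"\d(?:[\d,.:]*\d)?")


def filter_counts(counts, drop_numbers=False, min_freq=1):
    """Return a new Counter without numbers (optional) and rare words."""
    return Counter({
        word: n
        for word, n in counts.items()
        if n >= min_freq and not (drop_numbers and NUMBER_RE.fullmatch(word))
    })
```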
v1.1 - June 6th, 2014
- Minor changes to OR algorithm
- Added sorting by any column
- UI enhancements
- Updated word list and tokenization
- Added proper help and about information