type_token_ratio.py is a program designed to calculate the Type-Token Ratio from a speech sample. More information about the Type-Token Ratio can be obtained by searching the ASHA website using the term "type token ratio".
To be processed, a speech sample must be saved as a text file containing a list of utterances. Two common text file formats are .csv and .txt; both can be created using MS Excel or any text editor, such as TextEdit on Mac OS X, Microsoft Notepad or WordPad on Windows, or gedit on Linux.
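As a minimal sketch of what such an input file could look like, the following Python snippet writes three utterances, one per line, to a .csv file. The utterances and the helper name write_sample are illustrative assumptions and are not part of type_token_ratio.py; the one-utterance-per-line layout is assumed from the description above.

```python
import csv

# Hypothetical utterances, chosen only for illustration.
SAMPLE_UTTERANCES = [
    "I ate chicken",
    "chicken is good",
    "good for the chicken",
]


def write_sample(path, utterances=SAMPLE_UTTERANCES):
    """Write one utterance per row to a plain-text csv file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for utterance in utterances:
            writer.writerow([utterance])  # a single column holding the utterance
    return path


if __name__ == "__main__":
    write_sample("sample_utterances.csv")
```

A file written this way is plain text, so it can also be opened and checked in any text editor.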
To download this as a stand-alone application, click on the appropriate link: Mac, Windows, or Linux. Once the file downloads, you should be able to double-click on the resulting file to run the application. Select a csv file that includes only the speech samples you want analyzed, and the application will calculate the frequency of unique words.
The command to calculate the Type-Token Ratio for a word sample saved with the filename sample_utterances.csv will be:
python ~/path/to/type_token_ratio.py ~/path/to/sample_utterances.csv
The corresponding output will be:
================================================================================
Copyright (C) 2013 Steven C. Howell
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
================================================================================
Counter({'chicken': 3, 'good': 2, 'ate': 1, 'for': 1, 'i': 1, 'is': 1, 'the': 1})
unique words: ['ate', 'for', 'i', 'is', 'good', 'chicken', 'the']
total utterances: 3
total words: 10
unique words: 7
unique/total: 0.7
output saved to: sample_utterances_out.txt
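The figures above can be reproduced with a short sketch built on collections.Counter, which the printed output suggests the program uses. This is not the program's actual source: the function name type_token_ratio, the whitespace tokenization, the lower-casing, and the three utterances are all assumptions chosen to match the totals shown.

```python
from collections import Counter


def type_token_ratio(utterances):
    """Return (word counts, total words, unique words, ratio) for a list of utterances."""
    words = []
    for utterance in utterances:
        # Assumption: lower-case and split on whitespace; punctuation is not handled.
        words.extend(utterance.lower().split())
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    ratio = unique / total if total else 0.0  # avoid dividing by zero on empty input
    return counts, total, unique, ratio


if __name__ == "__main__":
    # Hypothetical utterances that give the same totals as the example output.
    sample = ["I ate chicken", "chicken is good", "good for the chicken"]
    counts, total, unique, ratio = type_token_ratio(sample)
    print("total utterances:", len(sample))
    print("total words:", total)
    print("unique words:", unique)
    print("unique/total:", ratio)
```

The ratio is simply the number of distinct words (types) divided by the number of words overall (tokens), here 7 divided by 10.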
Additionally, a text file containing the results will be created in the same directory. The file name for this output file will match the input file with _out added at the end and the file extension changed to .txt. For this example, the output file will be called sample_utterances_out.txt. The contents of this file will have the following format:
Type-Token Ratio: 0.7
total utterances: 3
total words: 10
unique words: 7
1 ate
1 for
1 i
1 is
2 good
3 chicken
1 the
Counter({'chicken': 3, 'good': 2, 'ate': 1, 'for': 1, 'i': 1, 'is': 1, 'the': 1})
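The naming rule for the output file described above can be sketched with os.path.splitext. The helper name output_path is hypothetical, and the program may build the name differently; this only illustrates the rule.

```python
import os


def output_path(input_path):
    """Replace the input file extension with '_out.txt', keeping the directory."""
    root, _extension = os.path.splitext(input_path)
    return root + "_out.txt"


if __name__ == "__main__":
    print(output_path("sample_utterances.csv"))
```

Because only the extension is replaced, the output file lands next to the input file.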
For those using Mac OS X who are unfamiliar with using a Unix terminal, these steps can be followed to simplify the process. Please note that the commands must be typed exactly, replacing sample.csv with the csv file name used in step 2.
- Create a fresh worksheet within an MS Excel spreadsheet containing all the utterances you want to analyze and no other information (this can be a copy of another worksheet with everything removed except the utterances you want to analyze).
- Use Save As to save this current worksheet as a csv file. For this example, we will say you saved it as sample_utterances.csv:
  - From the File menu, select Save As.
  - Type in the desired file name.
  - Select the down arrow next to the file name box and choose to place the file on the Desktop.
  - In the File Format drop-down menu, select the Comma Separated Values (.csv) option.
- Save a copy of type_token_ratio.py to your Desktop (you can download this file by right-clicking here, selecting Save As, then choosing to save it to the Desktop).
- On your Mac, open Launchpad, choose the Utilities menu, and run the Terminal program (which has an icon similar to a TV screen).
- In the window that opens, type the command
  cd
  and press Enter.
- Type the command
  cd Desktop/
  and press Enter.
- Type the command
  python ./type_token_ratio.py sample.csv
  and press Enter.
- The results are shown in the terminal window, and the output file, sample_out.txt (same name as the input with _out added to the end and csv changed to txt), will be located on the Desktop.
For those using Windows who are unfamiliar with running Python scripts, these steps can be followed to simplify the process. Please note that the commands must be typed exactly, replacing sample.csv with the csv file name used in step 4.
- Download the Anaconda Python Distribution from this webpage: https://www.continuum.io/downloads (the Python 3.5 64-bit installer is recommended).
- Install Anaconda Python, following the installation prompts:
  - Install for: Just Me (recommended)
  - Destination Folder: C:\Users\schowell\Anaconda3
  - Advanced Options: Yes to all
  - Installation will proceed for a few minutes
  - After the installation finishes, select "Next", unselect "Learn more about Anaconda Cloud", then select "Finish"
- Create a fresh worksheet within an MS Excel spreadsheet containing all the utterances you want to analyze and no other information (this can be a copy of another worksheet with everything removed except the utterances you want to analyze).
- Use Save As to save this current worksheet as a csv file. For this example, we will say you saved it as sample_utterances.csv:
  - From the File menu, select Save As.
  - Type in the desired file name.
  - Select the down arrow next to the file name box and choose to place the file on the Desktop.
  - In the File Format drop-down menu, select the Comma Separated Values (.csv) option.
- Save a copy of type_token_ratio.py to your Desktop (you can download this file by right-clicking here, selecting Save As, then choosing to save it to the Desktop).
- Open a command prompt by opening the Start Menu (press the Windows key), typing
  cmd
  and then pressing Enter.
- Type the command
  cd Desktop
  and press Enter.
- Type the command
  python type_token_ratio.py sample.csv
  and press Enter.
- The results are shown in the command prompt window, and the output file, sample_out.txt (same name as the input with _out added to the end and csv changed to txt), will be located on the Desktop.
- If the input file extension is xlsx, xls, doc, docx, or any other non-text, binary file format, the program will fail to run or the output will not represent the Type-Token Ratio of the language sample.
- If a correct path and filename are not provided for both the type_token_ratio.py program and the language sample text file (e.g., sample_utterances.csv in the above example), an error will be displayed similar to one of the following two errors:
bash: ./type_token_ration.py: No such file or directory
or
Traceback (most recent call last):
  File "type_token_ratio.py", line 41, in <module>
    f = open(sys.argv[1]) # open the file
IOError: [Errno 2] No such file or directory: 'sample_uterances.csv'
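Both problems can be caught before running the program with a small pre-flight check. This is a sketch and not part of type_token_ratio.py: the function name check_input and the set of accepted extensions are assumptions based on the two text formats mentioned in this document.

```python
import os
import sys

# Assumption: only the two plain-text formats named in this document are accepted.
TEXT_EXTENSIONS = {".csv", ".txt"}


def check_input(path):
    """Return a list of problems found with the input file path (empty if none)."""
    problems = []
    if not os.path.isfile(path):
        problems.append("No such file: " + path)
    extension = os.path.splitext(path)[1].lower()
    if extension not in TEXT_EXTENSIONS:
        problems.append("Not a plain-text format: " + (extension or "(no extension)"))
    return problems


if __name__ == "__main__":
    # Check the file name given on the command line, if any.
    target = sys.argv[1] if len(sys.argv) > 1 else ""
    for problem in check_input(target):
        print(problem)
```

An empty result means the file exists and has a .csv or .txt extension; it does not guarantee that the contents are actually plain text.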