This script contains the basic functions needed to perform functional load calculations.

In this case, we apply the functions to the classic toy example from [Surendran and Niyogi (2006)](http://people.cs.uchicago.edu/~dinoj/fload_bookchapter.pdf).



# 1. Introduction
First, we import the Counter class from the standard collections module, together with the standard math module.

In [3]:
from collections import Counter
import math

Then, we encode the toy example in Surendran and Niyogi (2006) as a string.

In [4]:
toy = 'atuattatuatatautuaattuua'

# 2. Extract ngrams

Now, we need a function that counts the ngrams of length k+1 in a text, where k is the order of the Markov model.

In [5]:
def ngrams(text, k=1):
    '''
    :param text: the input text
    :param k: the order of the Markov model
    :return: a Counter mapping each (k+1)-gram to its count
    '''
    counts = Counter()
    if k == 0:  # order 0: count single symbols (unigrams)
        counts = Counter(text)
    else:  # order k >= 1: count every window of k+1 symbols
        for index in range(len(text) - k):
            counts[text[index:index+k+1]] += 1
    return counts


By applying the function to the text with k=1, we obtain the phoneme bigram distribution presented in Surendran and Niyogi (2006).

In [7]:
for bigram, count in ngrams(toy).items():
    print(bigram, count)

at 6
tu 4
ua 4
tt 2
ta 3
au 1
ut 1
aa 1
uu 1
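
The same windowing idea extends to other orders. As a quick illustration, the sketch below uses a compact variant of the counting function (the name `ngram_counts` is hypothetical and not part of the original script) to obtain the unigram counts (k=0) and the number of trigrams (k=2) for the toy string.

```python
from collections import Counter

toy = 'atuattatuatatautuaattuua'

def ngram_counts(text, k=1):
    # Hypothetical compact variant: count every window of k+1 symbols.
    # range(len(text) - k) also covers k=0, so no special case is needed.
    return Counter(text[i:i + k + 1] for i in range(len(text) - k))

unigrams = ngram_counts(toy, k=0)   # single symbols
trigrams = ngram_counts(toy, k=2)   # windows of three symbols

print(dict(unigrams))               # {'a': 9, 't': 9, 'u': 6}
print(sum(trigrams.values()))       # 22 trigrams in a 24-symbol string
```

A string of n symbols always yields n-k windows of length k+1, which is why the bigram counts above sum to 23.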


# 3. Calculate entropy

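Now that we have an ngram distribution, we can calculate entropy using the classic Shannon formula, normalized by the ngram length k+1. Here X is the set of observed (k+1)-grams and p(x) is the relative frequency of x: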

$$H_{kS}(L) = \frac{1}{k+1} \left(-\sum_{x \in X} p(x) \log_{2} p(x)\right)$$


In [8]:
def entropy(text, k=1):
    '''
    :param text: the input text
    :param k: the order of the Markov model
    :return: entropy in bits per symbol
    '''
    ngrams_dic = ngrams(text, k)  # (k+1)-gram counts
    total = sum(ngrams_dic.values())  # total number of (k+1)-grams
    summation = 0
    for value in ngrams_dic.values():  # accumulate p(x) * log2 p(x)
        summation += value/total * math.log(value/total, 2)
    summation = summation / (k+1)  # normalize by the ngram length
    return -summation


We can now calculate the entropy of our toy example:

In [15]:
print('The entropy of the text is %s' % round(entropy(toy),3))

The entropy of the text is 1.43
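
The value 1.43 can be reproduced directly from the bigram counts printed earlier. The sketch below hard-codes those counts (the variable names are illustrative, not from the original script) and applies the formula step by step with k=1.

```python
import math

# Bigram counts copied from the output of the ngram step.
bigram_counts = {'at': 6, 'tu': 4, 'ua': 4, 'tt': 2, 'ta': 3,
                 'au': 1, 'ut': 1, 'aa': 1, 'uu': 1}

total = sum(bigram_counts.values())  # 23 bigrams in total

# Shannon entropy of the bigram distribution, in bits.
bigram_entropy = -sum(c / total * math.log2(c / total)
                      for c in bigram_counts.values())

# Divide by k+1 = 2 to express the result per symbol.
per_symbol = bigram_entropy / 2

print(round(bigram_entropy, 2))  # 2.86
print(round(per_symbol, 2))      # 1.43
```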


# 4. Estimate functional load

Functional load follows the formula defined by Hockett (1955), where *L<sub>xy</sub>* is the text after the phonemes *x* and *y* have been merged:

$$FL(x,y) = \frac{H(L) - H(L_{xy})}{H(L)}$$

With this formula, we can estimate the functional load of every contrast *(x,y)* in the text.

In [10]:
def functional_load(text, phon1, phon2):
    '''
    :param text: the input text
    :param phon1: phoneme replaced
    :param phon2: phoneme used as replacement
    :return: the relative difference in entropy between the two states
    '''
    merged_text = text.replace(phon1, phon2)
    return (entropy(text)-entropy(merged_text))/entropy(text)
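
As a worked instance of the formula, consider merging [a] into [u]. Replacing every a with u turns the toy string into `utuuttutuututuutuuuttuuu`, whose 23 bigrams are ut (7), tu (7), uu (7) and tt (2). With k=1 and the entropy of the original text rounded to 1.430, the calculation is approximately:

$$
\begin{aligned}
H(L_{au}) &= \frac{1}{2}\left(-3\cdot\frac{7}{23}\log_{2}\frac{7}{23} - \frac{2}{23}\log_{2}\frac{2}{23}\right) \approx 0.937 \\
FL(a,u) &= \frac{1.430 - 0.937}{1.430} \approx 0.345
\end{aligned}
$$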


Now we can estimate the relative entropy loss for each contrast.

In [16]:
print('Entropy loss after a merger between [a] and [u] is %s' % round(functional_load(toy, 'a', 'u'),3))
print('Entropy loss after a merger between [a] and [t] is %s' % round(functional_load(toy, 'a', 't'),3))
print('Entropy loss after a merger between [t] and [u] is %s' % round(functional_load(toy, 't', 'u'),3))

Entropy loss after a merger between [a] and [u] is 0.345
Entropy loss after a merger between [a] and [t] is 0.425
Entropy loss after a merger between [t] and [u] is 0.381
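
For larger inventories, writing one call per contrast quickly becomes tedious. The sketch below is one possible way to loop over every pair of symbols with `itertools.combinations`. It restates `entropy` and `functional_load` in compact form so that the block runs on its own; the `loads` dictionary and the ranking step are illustrative additions, not part of the original script.

```python
import math
from collections import Counter
from itertools import combinations

toy = 'atuattatuatatautuaattuua'

def entropy(text, k=1):
    # Compact restatement: entropy of the (k+1)-grams divided by k+1.
    counts = Counter(text[i:i + k + 1] for i in range(len(text) - k))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values()) / (k + 1)

def functional_load(text, phon1, phon2):
    base = entropy(text)  # entropy before the merger
    return (base - entropy(text.replace(phon1, phon2))) / base

# Every unordered pair of distinct symbols found in the text.
loads = {pair: functional_load(toy, *pair)
         for pair in combinations(sorted(set(toy)), 2)}

# Print the contrasts from highest to lowest functional load.
for (x, y), fl in sorted(loads.items(), key=lambda item: item[1], reverse=True):
    print(x, y, round(fl, 3))
```

Unordered pairs are enough here because entropy depends only on the ngram counts, not on the symbol labels, so merging x into y gives the same value as merging y into x.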
