# Simulating Language, Lab 5, Rational Speech Act model

This lab introduces the Rational Speech Act model, which is a way of doing Bayesian modelling of pragmatic inference and rational behaviour during communication. The RSA model is potentially very general, but here we are going to use it to model pragmatic word use, and see how pragmatic speakers and listeners can go beyond literal word use to facilitate communication. Our model of word meaning builds on the model we used in Lab 2; now we are going to move beyond single words to (very small!) lexicons, and model inference during communication rather than learning.

## Representing meanings, words, and lexicons

As in Lab 2, we are going to assume that the meaning of a word is the set of things that that word can be used to refer to, and we are going to model "things that words refer to" (referents) as numbers. So a word meaning is just a set of numbers, representing the set of things that the word can be used to refer to. As in Lab 2, we will represent word meanings as sets, e.g. like this:
```python
a_word_meaning = {0,1,2}
```
We also need a representation of words, which we will just represent as strings (in python, you can write these as single- or double-quoted sequences of characters), e.g.:
```python
a_word = 'word'
another_word = "floccinaucinihilipilification"
```
Now we have our representation of meanings and words, we can model a lexical entry, which is a pairing of a meaning and its associated word. We are going to represent these as *tuples* in python, which for our purposes work a bit like lists but use round brackets instead of square brackets. So this is how we would represent a lexical entry for a word which refers to referents 0, 1 and 2:
```python
a_lexical_entry = ({0,1,2},'dax')
```
You can access the elements in a tuple by index in the same way you would pull items out from a list, e.g. `a_lexical_entry[0]` will give you the meaning from `a_lexical_entry`, `a_lexical_entry[1]` will pull out the word.
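
The model code later in this lab pulls lexical entries apart with tuple unpacking (`for meaning,word in lexicon`) rather than indexing. As a small sketch of the two equivalent ways of getting at the parts of an entry (the variable names here are just illustrative):

```python
a_lexical_entry = ({0, 1, 2}, 'dax')

# Indexing, as described above
meaning_by_index = a_lexical_entry[0]
word_by_index = a_lexical_entry[1]

# Tuple unpacking: assign both elements in one step
meaning, word = a_lexical_entry

print(meaning, word)
print(1 in meaning)  # True: referent 1 is part of this word's meaning
print(5 in meaning)  # False: referent 5 is not
```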

Check you can access the meaning and word from `a_lexical_entry` in this way, then create your own lexical entry for a word, *fep*, which can be used to refer to referents 6, 7, 8, and 9.

In [1]:
a_lexical_entry = ({0,1,2},'dax')
print(a_lexical_entry[0])
print(a_lexical_entry[1])
fep_lexical_entry = ({6,7,8,9},'fep')
print(fep_lexical_entry)

{0, 1, 2}
dax
({8, 9, 6, 7}, 'fep')


Now we have a way of representing meanings, words, and single lexical entries, we can represent a lexicon, which is just going to be a list of lexical entries. For instance, a lexicon with three very specific words which refer to single referents, would be represented like this:
```python
lab_tutor_lexicon = [({0},'tamar'),({1},'henry'),({2},'andres')]
```

Create `lab_tutor_lexicon` by copying that line of code into a cell. How would you access the first lexical entry from this lexicon? How would you access the meaning of the 2nd entry? How would you access the word from the 3rd entry? How would you count how many lexical entries the lexicon contains?

In [2]:
lab_tutor_lexicon = [({0},'tamar'),({1},'henry'),({2},'andres')]
#first entry
print(lab_tutor_lexicon[0])
#meaning of second entry
print(lab_tutor_lexicon[1][0])
#word from third entry
print(lab_tutor_lexicon[2][1])
#number of lexical entries
print(len(lab_tutor_lexicon))

({0}, 'tamar')
{1}
andres
3


## The code

Now we've introduced the notation we are going to be using to represent lexicons, we can get on and build the model. As usual we start by loading the libraries we need, including some functions for manipulating log probabilities.

In [3]:
from scipy.special import logsumexp
from math import log, log1p, exp

%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf')

### Functions for dealing with log probabilities

As per Lab 4, we are going to deal with log-probabilities rather than raw probabilities, so we need a couple of functions for dealing with those. I am also providing a little utility function that will convert log probabilities to normal probabilities, which you might find helpful when it comes to looking at the model results.

In [4]:
def log_subtract(x,y):
    return x + log1p(-exp(y - x))

def normalize_logprobs(logprobs):
    logtotal = logsumexp(logprobs) #calculates the log of the summed probabilities
    normedlogs = []
    for logp in logprobs:
        normedlogs.append(logp - logtotal) #normalise - subtracting in the log domain
                                           #is equivalent to dividing in the normal domain
    return normedlogs

#A little utility function that converts a list of log probabilities to normal probabilities 
#Note there are other ways this could be written, e.g. using list comprehensions
def logprobs_to_probs(logprobs):
    probs = []
    for logp in logprobs:
        probs.append(exp(logp))
    return probs
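
To see what normalising in the log domain does, and why logs are worth the trouble, here is a minimal self-contained sketch. It uses a hand-written `logsumexp_sketch` as a stand-in for `scipy.special.logsumexp` (an assumption made purely so the example runs without SciPy); the names ending in `_sketch` are hypothetical and not part of the lab code.

```python
from math import log, exp

def logsumexp_sketch(logprobs):
    # log(sum(exp(lp))), computed stably by factoring out the largest value
    biggest = max(logprobs)
    return biggest + log(sum(exp(lp - biggest) for lp in logprobs))

def normalize_logprobs_sketch(logprobs):
    # subtracting the log total is the log-domain version of dividing by the total
    logtotal = logsumexp_sketch(logprobs)
    return [lp - logtotal for lp in logprobs]

# Unnormalised weights: one likely option and two unlikely ones
weights = [log(0.999), log(0.001), log(0.001)]
normed = normalize_logprobs_sketch(weights)
probs = [exp(lp) for lp in normed]
print(probs)
print(sum(probs))  # 1, up to floating point rounding

# A product of many small probabilities underflows as a raw float,
# but stays perfectly representable as a log probability
tiny_logprob = 1000 * log(0.001)
tiny_prob = 0.001 ** 1000
print(tiny_logprob, tiny_prob)
```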

### Possible words and possible referents
We'll start out with a very simple two-word lexicon, where one of the words has a specific meaning and the other has a more general meaning that subsumes the meaning of the first word - that's `small_lexicon` below. 

In order to make the code work smoothly we also have to specify what words our model has to consider, and what possible referents there are that words can refer to. These are just lists. **Important point to note for when you are creating your own lexicons**: if your lexicon refers to a word that isn't in `possible_words` or a referent that isn't in `possible_referents`, or there are possible words or referents that aren't covered in your lexicon, your code will break!

In [5]:
small_lexicon = [({0}, 'word1'), ({0, 1, 2}, 'word2')]
possible_words = ['word1','word2']
possible_referents = [0,1,2]
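
Since a mismatch between the lexicon and these two lists is an easy mistake to make, it can help to check for it before running the model. The following is a hedged sketch of such a check; `check_lexicon` is a hypothetical helper, not part of the lab code, and it simply tests the conditions listed in the warning above.

```python
def check_lexicon(lexicon, words, referents):
    """Return a list of problems; an empty list means the lexicon and lists agree."""
    problems = []
    lexicon_words = [word for meaning, word in lexicon]
    covered_referents = set()
    for meaning, word in lexicon:
        covered_referents.update(meaning)
    for word in lexicon_words:
        if word not in words:
            problems.append('word in lexicon but not in possible words: ' + str(word))
    for word in words:
        if word not in lexicon_words:
            problems.append('possible word missing from lexicon: ' + str(word))
    for referent in sorted(covered_referents):
        if referent not in referents:
            problems.append('referent in lexicon but not in possible referents: ' + str(referent))
    for referent in referents:
        if referent not in covered_referents:
            problems.append('possible referent not covered by any word: ' + str(referent))
    return problems

small_lexicon = [({0}, 'word1'), ({0, 1, 2}, 'word2')]
problems_ok = check_lexicon(small_lexicon, ['word1', 'word2'], [0, 1, 2])
problems_bad = check_lexicon(small_lexicon, ['word1', 'word2', 'word3'], [0, 1, 2])
print(problems_ok)
print(problems_bad)
```

Running a check like this each time you redefine the lexicon, `possible_words` and `possible_referents` should catch the mismatch before it shows up as a confusing error or a distribution of the wrong length.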

### A literal speaker and a literal listener

We'll start by writing functions for our literal speaker (*s0* in Frank & Goodman's terms) and a literal listener (*l0*). The literal speaker and literal listener just use words as dictated by their lexicon, without doing any fancy pragmatics. 

For our model of a literal speaker, we are going to provide a target referent (the object the literal speaker wants to refer to) and the speaker's lexicon, and the literal speaker will return a (log) probability distribution over possible words. The values in this probability distribution give the (log) probability of the literal speaker producing each of those words in order to label the target referent. The literal speaker behaves as you would expect - they are equally likely to use any of the words that include the target referent in their meaning. We also include a small probability of the literal speaker making a mistake and using the wrong word to refer to an object, specified by `error_probability`. This ensures we never deal with zero probabilities (which don't play nice in the log domain), but you can think of it as the probability of lexical access errors by the speaker. It can be very small if you like.

In [6]:
error_probability = 0.001 #you can make this as small as you like, as long as it's not 0

def s0(target_referent,lexicon):
    word_probs = []
    for candidate_word in possible_words: #consider each possible word
        for meaning,word in lexicon: #work through the lexicon
            if word==candidate_word: #if the word for this lexical entry is the one we want
                if target_referent in meaning: #can this word be used to refer to this referent?
                    word_probs.append(log(1-error_probability)) #if yes, it is likely to be used                
                else:
                    word_probs.append(log(error_probability)) #otherwise it is unlikely to be used 
    return normalize_logprobs(word_probs) #normalise these so they are true log probabilities

Check that the literal speaker behaves as expected using `small_lexicon`. How likely are they to produce word1 to convey target referent 0? How likely are they to use word2? You can find this out by calling `s0(0,small_lexicon)` or, if you'd rather see probabilities than log probabilities, `logprobs_to_probs(s0(0,small_lexicon))`. If you want to see the words printed out alongside their probabilities of being produced you can do something like:
```python
s0_probabilities = logprobs_to_probs(s0(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s0_probabilities[i])
```

*OK, so using the suggested code above, here are the probabilities for the two available words (word1 and word2) for a literal speaker labelling referent 0. Referent 0 can be labelled using either word, so the literal speaker just decides that both are equally possible.*

In [7]:
s0_probabilities = logprobs_to_probs(s0(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s0_probabilities[i])

word1 0.5
word2 0.5


Check the literal speaker's behaviour for referents 0, 1 and 2, and make sure you understand why the literal speaker behaves the way they do.

*And here are the outputs for referents 1 and 2. According to `small_lexicon` these can only be labelled using word2, so that's what the literal speaker does (note that, because of `error_probability`, a small probability is assigned to word1 in both cases).*

In [8]:
print("For referent 1")
s0_probabilities = logprobs_to_probs(s0(1,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s0_probabilities[i])

print("For referent 2")
s0_probabilities = logprobs_to_probs(s0(2,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s0_probabilities[i])

For referent 1
word1 0.0010000000000000002
word2 0.9989999999999999
For referent 2
word1 0.0010000000000000002
word2 0.9989999999999999


We can do the same for a literal listener, who receives a word and, based on their lexicon, returns a (log) probability distribution over referents, indicating how likely they think it is that the word they heard refers to each possible referent. The literal hearer just assumes that all referents included in the word's meaning are equally likely, and any others are very unlikely (but can occur with some small probability given by `error_probability`).

In [9]:
def l0(received_word,lexicon):
    referent_probs = []
    for candidate_referent in possible_referents: #consider each possible referent in turn
        for meaning,word in lexicon: #consider each lexical entry
            if word==received_word: #if this lexical entry matches the received word
                if candidate_referent in meaning: #if the candidate referent is included in the word's meaning 
                    referent_probs.append(log(1-error_probability)) #then it's likely to be this referent they are talking about
                else:
                    referent_probs.append(log(error_probability)) #otherwise it's quite unlikely
    return normalize_logprobs(referent_probs) #normalise at the end

How likely does a literal listener who uses `small_lexicon` think each referent is after hearing word1? Again, you can see this probability distribution by running something like the following:
```python
l0_probabilities = logprobs_to_probs(l0('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l0_probabilities[i])
```
Also check out the literal listener estimates given word2, and make sure you understand them.

*Here are the literal listener probabilities over each referent after hearing word1 and word2. Word1 can only be used to refer to referent 0, so that's where the literal listener assigns the probability; word2 is 3-ways-ambiguous for the literal listener (it can refer to referents 0, 1 or 2), so the probability is split over those three interpretations.*

In [10]:
print("After hearing word1")
l0_probabilities = logprobs_to_probs(l0('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l0_probabilities[i])
print("After hearing word2")
l0_probabilities = logprobs_to_probs(l0('word2',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l0_probabilities[i])

After hearing word1
referent 0 0.9980019980019978
referent 1 0.0009990009990009992
referent 2 0.0009990009990009992
After hearing word2
referent 0 0.3333333333333333
referent 1 0.3333333333333333
referent 2 0.3333333333333333
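
*These numbers follow directly from the normalisation step. As a quick check in plain (non-log) probabilities, assuming the same `error_probability` of 0.001 (the helper `normalise` is just for this illustration):*

```python
error_probability = 0.001

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

# word1 means {0}: referent 0 gets weight 1-e, referents 1 and 2 each get weight e
l0_word1 = normalise([1 - error_probability, error_probability, error_probability])

# word2 means {0, 1, 2}: every referent gets the same weight 1-e
l0_word2 = normalise([1 - error_probability] * 3)

print(l0_word1)  # roughly [0.998, 0.001, 0.001]
print(l0_word2)  # one third each
```

*So the slightly odd-looking 0.998 for word1 is just 0.999/(0.999 + 0.001 + 0.001).*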


### A pragmatic speaker and a pragmatic listener

Now we can move on to define our pragmatic speaker (*s1*), who reasons about a literal listener when selecting their word to convey an intended referent, and a pragmatic listener, (*l1*), who reasons about a pragmatic speaker when interpreting words. Because the l1 model builds on the s1 model, we will build the s1 model first.

Just like the literal speaker, our s1 speaker has a target referent they'd like to convey, and a lexicon that they are going to use. But unlike s0, who just looks up the target referent in their lexicon in a fairly dumb way, s1 is going to be pragmatic: they will reason about how a literal listener might interpret the word they produce, and weight their production in favour of words that are more likely to be interpreted as conveying their intended referent. The crucial line in the s1 definition below is `l0_prob_of_target_referent = l0(candidate_word,lexicon)[target_referent]`: if I use this particular `candidate_word`, what's the probability that the literal listener will correctly conclude I was talking about `target_referent`? For the pragmatic speaker, this *probability of correct interpretation by a literal listener* is what determines the probability of producing a particular word - the pragmatic speaker is more likely to use words that have a better chance of being interpreted correctly by the literal listener.

In [11]:
def s1(target_referent,lexicon):
    word_probs = []
    for candidate_word in possible_words: #consider each candidate word
        l0_prob_of_target_referent = l0(candidate_word,lexicon)[target_referent] #how likely is the literal listener  
                                                            #to choose target_referent if they hear this word?
        word_probs.append(l0_prob_of_target_referent)#note that down
    return normalize_logprobs(word_probs) #and normalise at the end
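
One detail worth noticing: `l0(candidate_word,lexicon)[target_referent]` uses the referent itself as a list index. That works in this lab because `possible_referents` is `[0,1,2]`, so each referent is equal to its own position in the list. Below is a sketch of a variant that looks the position up explicitly, so it would also work for referents with other labels. The function name `s1_by_position`, the helper `normalise_logs` (a simple stand-in for `normalize_logprobs`) and the relabelled referents 10, 20 and 30 are all assumptions made for illustration.

```python
from math import log, exp

error_probability = 0.001
lexicon = [({10}, 'word1'), ({10, 20, 30}, 'word2')]
possible_words = ['word1', 'word2']
possible_referents = [10, 20, 30]  # deliberately not 0, 1, 2

def normalise_logs(logprobs):
    # simple stand-in for normalize_logprobs, fine for probabilities of this size
    logtotal = log(sum(exp(lp) for lp in logprobs))
    return [lp - logtotal for lp in logprobs]

def l0(received_word, lexicon):
    referent_probs = []
    for candidate_referent in possible_referents:
        for meaning, word in lexicon:
            if word == received_word:
                if candidate_referent in meaning:
                    referent_probs.append(log(1 - error_probability))
                else:
                    referent_probs.append(log(error_probability))
    return normalise_logs(referent_probs)

def s1_by_position(target_referent, lexicon):
    # find where the target sits in possible_referents instead of using it as an index
    referent_index = possible_referents.index(target_referent)
    word_probs = []
    for candidate_word in possible_words:
        word_probs.append(l0(candidate_word, lexicon)[referent_index])
    return normalise_logs(word_probs)

s1_probs = [exp(lp) for lp in s1_by_position(10, lexicon)]
print(s1_probs)
```

This mirrors the way `l1` uses `possible_words.index(received_word)` to find the position of a word.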

Use `s1(0,small_lexicon)` to figure out how likely the pragmatic speaker is to produce each word in order to convey referent 0 to a literal listener; e.g. you could do:
```python
s1_probabilities = logprobs_to_probs(s1(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
```
Are these the same as or different from the literal speaker probabilities for conveying the same target referent? Can you figure out why the pragmatic speaker behaves like this? It might help to look back at the literal listener's behaviour. Similarly, what does the pragmatic speaker do to convey referents 1 and 2? Why?

*Here are the s1 pragmatic speaker probabilities for each word, given that they want to convey referent 0 to a literal listener. Recall that the literal speaker would be equally likely to use word1 and word2 to do this. However, the pragmatic speaker knows that the literal listener finds word2 ambiguous (see above - l0 will interpret word2 as meaning referent 0 or 1 or 2). As a result the pragmatic speaker prefers to use word1 rather than word2 - s1 has roughly 3 times the probability of using word1 because it's roughly 3 times more likely to be correctly interpreted as conveying referent 0 by the literal listener.*

In [12]:
s1_probabilities = logprobs_to_probs(s1(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])

word1 0.7496248124062032
word2 0.2503751875937969
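
*The "3 times" ratio can be checked by hand. A short sketch in plain probabilities, assuming `error_probability` is 0.001 as above (variable names are just for this illustration):*

```python
error_probability = 0.001

# How likely is the literal listener to pick referent 0 after each word?
l0_ref0_given_word1 = (1 - error_probability) / ((1 - error_probability) + 2 * error_probability)
l0_ref0_given_word2 = 1 / 3  # word2 covers all three referents equally

# The pragmatic speaker's production probabilities are these values, normalised
total = l0_ref0_given_word1 + l0_ref0_given_word2
s1_word1 = l0_ref0_given_word1 / total
s1_word2 = l0_ref0_given_word2 / total
ratio = s1_word1 / s1_word2

print(s1_word1, s1_word2)
print(ratio)  # just under 3
```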


*Here's the same stuff for a pragmatic speaker attempting to convey referents 1 and 2 - there's nothing fancy going on here, a literal listener would almost never interpret word1 as conveying either of these referents so the pragmatic speaker has no real choice but to use word2.*

In [13]:
print("For referent 1")
s1_probabilities = logprobs_to_probs(s1(1,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("For referent 2")
s1_probabilities = logprobs_to_probs(s1(2,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])

For referent 1
word1 0.002988047808764942
word2 0.9970119521912352
For referent 2
word1 0.002988047808764942
word2 0.9970119521912352


Now we have the pragmatic speaker we can define our pragmatic listener, l1, who reasons about the behaviour of a pragmatic speaker to calculate a probability distribution over referents given a received word. The key line in the definition of our l1 listener is `s1_prob_of_word_given_meaning = s1(candidate_referent,lexicon)[word_index]`: how likely is a pragmatic speaker to use this word to convey some `candidate_referent`? That determines the pragmatic listener's interpretation of the word - they reason about how a pragmatic speaker would be likely to behave, and interpret the word they received accordingly.

In [14]:
def l1(received_word,lexicon):
    word_index = possible_words.index(received_word) #.index tells me the index of received_word in possible_words
    referent_probs = []
    for candidate_referent in possible_referents: #consider each possible referent
        s1_prob_of_word_given_meaning = s1(candidate_referent,lexicon)[word_index] #how likely is the pragmatic speaker
                                                                #to use this word to convey this candidate referent?
        referent_probs.append(s1_prob_of_word_given_meaning) #note that down
    return normalize_logprobs(referent_probs) #and normalise at the end

How does a pragmatic listener who uses `small_lexicon` interpret word1? You can get a nicely formatted list as follows:
```python
l1_probabilities = logprobs_to_probs(l1('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
```
And how does the pragmatic listener interpret word2? Again, make sense of these behaviours in terms of the behaviour of the pragmatic speaker, who l1 is reasoning about.

*OK, so here are the pragmatic listener's probabilities for each referent after hearing word1 and word2. Just like for the literal listener, word1 is unambiguous - it can only be used to label referent 0 by the pragmatic speaker (because it can only be interpreted as meaning referent 0 by the literal listener), so the pragmatic listener interprets it as conveying referent 0. Stuff gets more interesting for word2. Remember that for a literal listener this word was 3-ways ambiguous, equally likely to be interpreted as conveying referents 0, 1 or 2. However, a pragmatic speaker has a preferred way of conveying referent 0 - they will tend to use word1, and will mainly use word2 to refer to referents 1 and 2. That means, for a pragmatic listener reasoning about the pragmatic speaker, word2 becomes less ambiguous - the pragmatic speaker is unlikely to use it to refer to referent 0, so the pragmatic listener tends to interpret it as conveying referent 1 or referent 2 (equally likely).*

In [15]:
print("After hearing word1")
l1_probabilities = logprobs_to_probs(l1('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("After hearing word2")
l1_probabilities = logprobs_to_probs(l1('word2',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])

After hearing word1
referent 0 0.9920909364267967
referent 1 0.003954531786601677
referent 2 0.003954531786601677
After hearing word2
referent 0 0.11155555555555559
referent 1 0.4442222222222223
referent 2 0.4442222222222223


## Questions
The main aim for today's lab is to work through the questions above and make sure you understand why the RSA model behaves the way it does. Once you are happy with that, attempt the questions below.
1. How do synonymy (several ways of expressing a given meaning) and ambiguity (several meanings expressed using the same signal) in the lexicon affect communication between literal speakers and literal listeners? How does the availability of pragmatic inference (i.e. moving up to s1 and l1) change the consequences of ambiguity? 

*The `small_lexicon` example illustrates these various points. Ambiguity is generally a problem, and especially so for literal speakers and listeners. For example, imagine a literal speaker attempting to convey referent 1 to a literal listener. The literal speaker uses word2, because that's what's specified in their lexicon. However, the literal listener will often interpret word2 as conveying referent 0 or referent 2, because it's ambiguous; as a result, they will fail to communicate about referent 1 most of the time (on average, two thirds of the time), and the same for referent 2. They also don't do great with referent 0, but this time because of the combination of synonymy and ambiguity in `small_lexicon`. There are two ways of expressing referent 0, which in itself isn't necessarily a problem - you can construct a toy lexicon where every referent has several words associated with it and where perfect communication is possible. But in `small_lexicon` one of the synonymous forms, word2, is ambiguous; as a result, the literal speaker will often use word2 to label referent 0, and when that happens the literal listener is unlikely to arrive at the correct interpretation.*

*Some but not all of these problems are reduced for pragmatic speakers and listeners. In particular, the pragmatic speaker is aware of the ambiguity of word2, and as a result tends to avoid it when other options are available (i.e. for referent 0, where word1 is a better option). For referents 1 and 2 they have no choice but to use the highly ambiguous word2, because they have no other means available for expressing those referents. But the pragmatic listener can reduce the ambiguity a little, because they know that the pragmatic speaker has a better way of expressing referent 0, so word2 becomes effectively 2-and-a-bit-ways ambiguous. Note that it's not magic though - in particular, the problems posed by ambiguity are reduced, but not eliminated.*
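
*One way to put a number on "reduced, but not eliminated" is to compute how often a speaker-listener pair communicates successfully. The sketch below is a compact, self-contained re-implementation in plain probabilities rather than the lab's log-probability functions. It assumes all referents are equally likely to be the target, and the function name `success` and the table layout are choices made for this illustration only.*

```python
error_probability = 0.001
lexicon = [({0}, 'word1'), ({0, 1, 2}, 'word2')]
words = ['word1', 'word2']
referents = [0, 1, 2]

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def literal_weight(word, referent):
    # 1-e if the lexicon says this word can label this referent, otherwise e
    for meaning, entry_word in lexicon:
        if entry_word == word:
            return (1 - error_probability) if referent in meaning else error_probability

n_words, n_refs = len(words), len(referents)

# Speaker tables are indexed [referent][word]; listener tables are [word][referent]
s0 = [normalise([literal_weight(w, r) for w in words]) for r in referents]
l0 = [normalise([literal_weight(w, r) for r in referents]) for w in words]
s1 = [normalise([l0[wi][ri] for wi in range(n_words)]) for ri in range(n_refs)]
l1 = [normalise([s1[ri][wi] for ri in range(n_refs)]) for wi in range(n_words)]

def success(speaker, listener):
    # probability that the listener recovers the target, averaged over targets
    total = 0.0
    for ri in range(n_refs):
        for wi in range(n_words):
            total += speaker[ri][wi] * listener[wi][ri]
    return total / n_refs

literal_success = success(s0, l0)
pragmatic_success = success(s1, l1)
print('s0 talking to l0:', literal_success)
print('s1 talking to l1:', pragmatic_success)
```

*Under these assumptions the pragmatic pair does better than the literal pair (roughly 0.55 against 0.44), but is still a long way from perfect communication.*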

2. Are there cases where perfect communication between pragmatic individuals is possible even if their lexicon contains ambiguous entries? Is perfect communication possible when the lexicon *only* contains ambiguous entries? Play around with different lexicons to find out.

*The aim here was to encourage you to play around with different lexicons to explore this. In answer to the first part of the question: I don't think **perfect** communication is possible when there's ambiguity, but decent communication should be possible even when the lexicon contains ambiguity. For example, in `small_lexicon2` below (a minor extension to `small_lexicon`), there is one highly ambiguous word but in practice pragmatic speakers and listeners still communicate successfully most of the time.*

In [16]:
small_lexicon2 = [({0}, 'word1'), ({0, 1, 2}, 'word2'),({2}, 'word3')]
possible_words = ['word1','word2','word3']
possible_referents = [0,1,2]

In [17]:
print("S1, for referent 0")
s1_probabilities = logprobs_to_probs(s1(0,small_lexicon2))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 1")
s1_probabilities = logprobs_to_probs(s1(1,small_lexicon2))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 2")
s1_probabilities = logprobs_to_probs(s1(2,small_lexicon2))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])

S1, for referent 0
word1 0.7490627343164209
word2 0.25018745313671586
word3 0.0007498125468632841
S1, for referent 1
word1 0.002979145978152929
word2 0.9940417080436941
word3 0.002979145978152929
S1, for referent 2
word1 0.0007498125468632841
word2 0.2501874531367158
word3 0.7490627343164209


In [18]:
print("L1, after hearing word1")
l1_probabilities = logprobs_to_probs(l1('word1',small_lexicon2))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word2")
l1_probabilities = logprobs_to_probs(l1('word2',small_lexicon2))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word3")
l1_probabilities = logprobs_to_probs(l1('word3',small_lexicon2))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])

L1, after hearing word1
referent 0 0.9950464935247344
referent 1 0.003957463939204809
referent 2 0.0009960425360607953
L1, after hearing word2
referent 0 0.1674147963424772
referent 1 0.6651704073150456
referent 2 0.16741479634247713
L1, after hearing word3
referent 0 0.0009960425360607953
referent 1 0.003957463939204809
referent 2 0.9950464935247344


*Regarding the second part of the question, I couldn't come up with any cases where the lexicon contains **only** ambiguous entries but communication nonetheless succeeds most of the time - in the examples above, there is always at least one signal which is unambiguous, and reasoning about the use of that unambiguous signal reduces ambiguity elsewhere. So for instance, `lexicon3` should work OK despite using 3 ambiguous words, but it's reliant on word1 being unambiguous.*

In [19]:
lexicon3 = [({0}, 'word1'), ({0, 1, 2, 3}, 'word2'),({1,2,3}, 'word3'),({0,3}, 'word4')]
possible_words = ['word1','word2','word3','word4']
possible_referents = [0,1,2,3]

In [20]:
print("S1, for referent 0")
s1_probabilities = logprobs_to_probs(s1(0,lexicon3))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 1")
s1_probabilities = logprobs_to_probs(s1(1,lexicon3))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 2")
s1_probabilities = logprobs_to_probs(s1(2,lexicon3))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 3")
s1_probabilities = logprobs_to_probs(s1(3,lexicon3))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("L1, after hearing word1")
l1_probabilities = logprobs_to_probs(l1('word1',lexicon3))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word2")
l1_probabilities = logprobs_to_probs(l1('word2',lexicon3))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word3")
l1_probabilities = logprobs_to_probs(l1('word3',lexicon3))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word4")
l1_probabilities = logprobs_to_probs(l1('word4',lexicon3))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])

S1, for referent 0
word1 0.5707484649084265
word2 0.1431156060656265
word3 0.00019094810682538586
word4 0.2859449809191217
S1, for referent 1
word1 0.0017068062190903897
word2 0.4275549578821427
word3 0.5698831259830027
word4 0.0008551099157642853
S1, for referent 2
word1 0.0017068062190903897
word2 0.4275549578821427
word3 0.5698831259830027
word4 0.0008551099157642853
S1, for referent 3
word1 0.0009209056324094245
word2 0.23068686091856078
word3 0.30747988533374543
word4 0.46091234811528436
L1, after hearing word1
referent 0 0.9924627954592974
referent 1 0.002967930315463129
referent 2 0.002967930315463129
referent 3 0.0016013439097762804
L1, after hearing word2
referent 0 0.11645712751754302
referent 1 0.3479132962481121
referent 2 0.3479132962481121
referent 3 0.1877162799862328
L1, after hearing word3
referent 0 0.0001319215244314055
referent 1 0.3937187541542961
referent 2 0.3937187541542961
referent 3 0.2124305701669763
L1, after hearing word4
referent 0 0.38198954971040733
refe

3. [Mainly a coding problem] There's no reason to stop at `l1`, a pragmatic listener who reasons about a pragmatic speaker who reasons about a literal listener - we could happily model `s2` (a pragmatic speaker who reasons about an `l1` listener), `l2` (a pragmatic listener who reasons about `s2`), or even higher (s3, l3, s4, l4, etc). Either adapt the code for `s1` and `l1` to produce code for `s2` and `l2` (which should be relatively straightforward) or (more ambitiously), write two recursive functions, `sn` and `ln` which can handle *any* level of recursion. Using either of these techniques, what happens when we model higher and higher levels of pragmatic reasoning?

*I will just provide the non-recursive definitions for s2 and l2 and leave you the fun of writing the recursive one yourself - the only thing to remember if you attempt that is that sn models ln-1, but ln models sn, that's just the naming convention. Notice that my s2 and l2 definitions are literally just cut-and-paste of the s1 and l1 definitions, but s2 reasons about l1 rather than l0 and l2 reasons about s2 rather than s1.*

In [21]:
def s2(target_referent,lexicon):
    word_probs = []
    for candidate_word in possible_words: #consider each candidate word
        l1_prob_of_target_referent = l1(candidate_word,lexicon)[target_referent] #how likely is the pragmatic listener l1
                                                            #to choose target_referent if they hear this word?
        word_probs.append(l1_prob_of_target_referent)#note that down
    return normalize_logprobs(word_probs) #and normalise at the end

def l2(received_word,lexicon):
    word_index = possible_words.index(received_word) #.index tells me the index of received_word in possible_words
    referent_probs = []
    for candidate_referent in possible_referents: #consider each possible referent
        s2_prob_of_word_given_meaning = s2(candidate_referent,lexicon)[word_index] #how likely is the s2 speaker
                                                                #to use this word to convey this candidate referent?
        referent_probs.append(s2_prob_of_word_given_meaning) #note that down
    return normalize_logprobs(referent_probs) #and normalise at the end


*To see what difference this makes, let's try s2 and l2 on `small_lexicon`, which we already tested extensively above. Remember I have to redefine `possible_words` and `possible_referents` whenever I redefine the lexicon.*

In [22]:
small_lexicon = [({0}, 'word1'), ({0, 1, 2}, 'word2')]
possible_words = ['word1','word2']
possible_referents = [0,1,2]

In [23]:
print("S1, for referent 0")
s1_probabilities = logprobs_to_probs(s1(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 1")
s1_probabilities = logprobs_to_probs(s1(1,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S1, for referent 2")
s1_probabilities = logprobs_to_probs(s1(2,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S2, for referent 0")
s1_probabilities = logprobs_to_probs(s2(0,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S2, for referent 1")
s1_probabilities = logprobs_to_probs(s2(1,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])
print("S2, for referent 2")
s1_probabilities = logprobs_to_probs(s2(2,small_lexicon))
for i in range(len(possible_words)):
    print(possible_words[i],s1_probabilities[i])

S1, for referent 0
word1 0.7496248124062032
word2 0.2503751875937969
S1, for referent 1
word1 0.002988047808764942
word2 0.9970119521912352
S1, for referent 2
word1 0.002988047808764942
word2 0.9970119521912352
S2, for referent 0
word1 0.8989209349497581
word2 0.10107906505024204
S2, for referent 1
word1 0.00882359861645974
word2 0.9911764013835402
S2, for referent 2
word1 0.00882359861645974
word2 0.9911764013835402


*So the s2 speaker shows a stronger preference to avoid word2 when labelling referent 0 (using it about 10% of the time, compared with about 25% for s1), because they know that l1 is likely to interpret word2 as conveying something other than referent 0, despite its literal ambiguity.*

In [24]:
print("L1, after hearing word1")
l1_probabilities = logprobs_to_probs(l1('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L1, after hearing word2")
l1_probabilities = logprobs_to_probs(l1('word2',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l1_probabilities[i])
print("L2, after hearing word1")
l2_probabilities = logprobs_to_probs(l2('word1',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l2_probabilities[i])
print("L2, after hearing word2")
l2_probabilities = logprobs_to_probs(l2('word2',small_lexicon))
for i in range(len(possible_referents)):
    print('referent',possible_referents[i],l2_probabilities[i])

L1, after hearing word1
referent 0 0.9920909364267967
referent 1 0.003954531786601677
referent 2 0.003954531786601677
L1, after hearing word2
referent 0 0.11155555555555559
referent 1 0.4442222222222223
referent 2 0.4442222222222223
L2, after hearing word1
referent 0 0.9807464425029756
referent 1 0.009626778748512222
referent 2 0.009626778748512222
L2, after hearing word2
referent 0 0.04851565660082567
referent 1 0.47574217169958716
referent 2 0.47574217169958716


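*Similarly, l2 shows a slight sharpening of the pattern seen in l1 for word2: the probability of interpreting word2 as referent 0 drops from about 0.11 to about 0.05. (For word1 there is no sharpening: l2 is marginally less certain than l1, at about 0.98 rather than 0.99 for referent 0.) The gains don't look like they'd be huge though, so assuming there is some cost to extra levels of recursive reasoning, you might wonder whether and when it's worth the cognitive effort.*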
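
*One rough way to put a number on "the gains" is to ask how often the listener ends up with the referent the speaker intended. The sketch below is not part of the lab code: it copies (rounded) probabilities from the printed outputs above into plain dictionaries, and the function name `communicative_success`, the pairing of s1 with l1 and s2 with l2, and the assumption that all three referents are equally likely to be talked about are all my own choices for illustration.*

```python
possible_words = ['word1', 'word2']
possible_referents = [0, 1, 2]

# P(word | referent), rounded from the printed S1 and S2 outputs above
s1_table = {0: {'word1': 0.7496, 'word2': 0.2504},
            1: {'word1': 0.0030, 'word2': 0.9970},
            2: {'word1': 0.0030, 'word2': 0.9970}}
s2_table = {0: {'word1': 0.8989, 'word2': 0.1011},
            1: {'word1': 0.0088, 'word2': 0.9912},
            2: {'word1': 0.0088, 'word2': 0.9912}}

# P(referent | word), rounded from the printed L1 and L2 outputs above
l1_table = {'word1': {0: 0.9921, 1: 0.0040, 2: 0.0040},
            'word2': {0: 0.1116, 1: 0.4442, 2: 0.4442}}
l2_table = {'word1': {0: 0.9807, 1: 0.0096, 2: 0.0096},
            'word2': {0: 0.0485, 1: 0.4757, 2: 0.4757}}

def communicative_success(speaker_table, listener_table):
    """Probability that the listener recovers the intended referent,
    averaged over referents (assumed equally likely)."""
    total = 0.0
    for referent in possible_referents:
        for word in possible_words:
            # speaker picks this word, then listener picks the right referent
            total += speaker_table[referent][word] * listener_table[word][referent]
    return total / len(possible_referents)

print("s1 talking to l1:", communicative_success(s1_table, l1_table))
print("s2 talking to l2:", communicative_success(s2_table, l2_table))
```

*With these numbers the success rate goes from roughly 0.55 for the s1/l1 pair to roughly 0.61 for the s2/l2 pair. Note that there is a ceiling here: `small_lexicon` has no word that distinguishes referent 1 from referent 2, so a listener can do no better than guess between them, and no amount of extra recursion can push success much above two thirds.*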

4. [Hard] Our `s0` and `s1` functions are models of how a (literal or pragmatic) speaker would speak given a particular lexicon: in other words, they model a likelihood, p(data|lexicon), where the data is a set of pairings of intended referents and words. Think about how you could use this as the basis for a model of word learning where the learner's task is to infer an entire lexicon based on the productions of a speaker using that lexicon (Hint: to do this you need to decide what the hypothesis space is and what the prior is). Is learning easier from a literal speaker or from a pragmatic speaker? If you are feeling brave, implement the model; otherwise have a think about what kinds of challenges a learner learning from a pragmatic speaker might face when it comes to inferring the "correct" (i.e. same as the teacher's) word meanings.

*So in this model the hypothesis space would be the set of all possible lexicons (which might get very large!) and we'd have to define a prior over lexicons, which commits us to making a decision about which lexicons are more likely. You could just use a uniform prior, but we'll see in the next lab (when we model compositionality) that actually thinking about things like the complexity of the lexicon (how many distinct forms does it use?) makes sense in these cases.*

*I haven't provided an implementation of this full model, but just thinking about it: one of the challenges facing a learner who learns from pragmatic speakers is that the way they use words doesn't reflect their **real** meaning, which might make it hard for learners to figure out what that real meaning is. For instance, in the `small_lexicon` example above, both the speaker and listener think that word2 refers to referents 0, 1 and 2, but in practice when they use the word they tend to avoid using it for referent 0. As a result, and based on a small sample of observations of this behaviour, a learner might learn the wrong meaning for word2, namely `{1,2}`. That's the wrong thing to do if your goal is to infer the same lexicon as the person generating your data, but it does suggest that, if combined with iterated learning, this might work nicely as a model of language change where initial pragmatic flexibility becomes lexicalised/grammaticalised, resulting in changes in the meaning of words or expressions.*
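
*To make the idea a bit more concrete, here is a minimal sketch of what such a learner could look like. It is not the lab's implementation and makes several simplifying assumptions of its own: it works with plain probabilities rather than log probabilities, it uses a uniform prior over lexicons, and `literal_speaker`, `pragmatic_speaker`, `NOISE` and `posterior_over_lexicons` are hypothetical stand-ins written for this example rather than the `s0` and `s1` functions defined earlier. Lexicons are stored as tuples of `(meaning, word)` pairs with `frozenset` meanings so that they can be used as dictionary keys.*

```python
from itertools import combinations, product

possible_referents = [0, 1, 2]
possible_words = ['word1', 'word2']
NOISE = 0.001  # assumed: small weight given to literally false word uses

def all_meanings(referents):
    """Every non-empty set of referents: the candidate meanings for one word."""
    return [frozenset(c) for n in range(1, len(referents) + 1)
            for c in combinations(referents, n)]

def all_lexicons(words, referents):
    """Hypothesis space: every way of assigning one meaning to each word."""
    return [tuple(zip(meanings, words))
            for meanings in product(all_meanings(referents), repeat=len(words))]

def fit(referent, meaning):
    return 1.0 if referent in meaning else NOISE

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def literal_speaker(referent, lexicon):
    """P(word | referent): picks among words that are literally true."""
    return normalize([fit(referent, meaning) for meaning, _ in lexicon])

def literal_listener(word_index, lexicon):
    """P(referent | word): picks among referents the word literally covers."""
    meaning = lexicon[word_index][0]
    return normalize([fit(r, meaning) for r in possible_referents])

def pragmatic_speaker(referent, lexicon):
    """P(word | referent), proportional to how likely a literal listener
    is to recover the referent from each word."""
    r_index = possible_referents.index(referent)
    return normalize([literal_listener(i, lexicon)[r_index]
                      for i in range(len(lexicon))])

def posterior_over_lexicons(data, speaker):
    """Posterior over all lexicons given (referent, word) observations,
    for a learner who assumes the data came from `speaker`."""
    lexicons = all_lexicons(possible_words, possible_referents)
    likelihoods = []
    for lexicon in lexicons:
        likelihood = 1.0
        for referent, word in data:
            likelihood *= speaker(referent, lexicon)[possible_words.index(word)]
        likelihoods.append(likelihood)
    # with a uniform prior, the posterior is just the normalised likelihood
    return dict(zip(lexicons, normalize(likelihoods)))

true_lexicon = ((frozenset({0}), 'word1'), (frozenset({0, 1, 2}), 'word2'))
narrowed_lexicon = ((frozenset({0}), 'word1'), (frozenset({1, 2}), 'word2'))

# A hand-picked small sample of the kind a pragmatic speaker would tend to produce
small_sample = [(0, 'word1')] * 3 + [(1, 'word2'), (1, 'word2'), (2, 'word2')]

for name, speaker in [('literal', literal_speaker), ('pragmatic', pragmatic_speaker)]:
    posterior = posterior_over_lexicons(small_sample, speaker)
    print("Learner assuming a", name, "speaker")
    print("  posterior for true lexicon:", posterior[true_lexicon])
    print("  posterior for word2 = {1,2}:", posterior[narrowed_lexicon])
```

*On this hand-picked sample, both learners prefer the narrowed lexicon in which word2 means `{1,2}`, which is the kind of mislearning described above, although the learner who assumes a pragmatic speaker keeps more probability on the true lexicon. Adding even a single observation of word2 being used for referent 0, e.g. `small_sample + [(0, 'word2')]`, is enough to make the true lexicon the most probable one for the pragmatic-assuming learner in this sketch.*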