# COGS 2201: Problem set 3

## The Cohort Model of Word Recognition

For this problem set, you will first work with an implementation of the [Cohort Model of word recognition](http://psycholinguistics.wikifoundry.com/page/The+Cohort+Model+of+Word+Recognition), devised by William Marslen-Wilson. Read the linked page for the details, but in brief, it is a model of how people understand a spoken word as it unfolds over time. For example, as the word 'cat' is heard bit by bit -- 'c', 'ca', 'cat' -- which words are we activating in our heads (hopefully ending, eventually, with the right one)?

Below is a quick and dirty implementation of the Cohort Model that I wrote. Read through it, and see if you can understand what it's doing, and how it implements the description of the model in the link above. (It may be easier once you see an example output of the model a couple of cells below.)

In [1]:
def cohortModel(word, EnglishWords):
    soFar = ''                                # when we start listening to a word, we of course haven't heard anything yet, so we'll represent that as an empty string
    activated = list(EnglishWords)            # we'll maintain a list of the activated words...we'll start by assuming all the words we know are possible and thus activated
    timeCourse = [ list(activated) ]          # we'll make a list -- timeCourse -- of the activated words after each letter; so this will be a list of lists of strings!
    for letter in word:                       # then start listening to the word letter by letter
        soFar = soFar + letter                # add the newly heard letter to the portion of the word heard so far
        for candidate in list(activated):     # now look through (a copy of) the currently activated words, so we can safely remove from the original list
            if not candidate.startswith(soFar):  # if this particular currently activated word is inconsistent with the input so far
                activated.remove(candidate)   # ...then deactivate that word
        timeCourse.append( list(activated) )  # add currently activated words to timeCourse...and continue the loop  
    return timeCourse

After you run the cell above, test the cohort model with a fragment of English:

In [2]:
EnglishWords = ['cathedral', 'cat','dog','catheter']
context      = "I went to London and saw St. Paul's "
word         = 'cathedral'
testRun      = cohortModel(word, EnglishWords)

Now show what the `timeCourse` of the word recognition process is for our `testRun`. It is less critical to understand how this next chunk works; it just prints the output (and part of the input) in an interpretable fashion.

In [3]:
for letter, timeSlice in zip('_' + word, testRun):
    print("After letter {}, the activated words are".format(letter),timeSlice)

After letter _, the activated words are ['cathedral', 'cat', 'dog', 'catheter']
After letter c, the activated words are ['cathedral', 'cat', 'catheter']
After letter a, the activated words are ['cathedral', 'cat', 'catheter']
After letter t, the activated words are ['cathedral', 'cat', 'catheter']
After letter h, the activated words are ['cathedral', 'catheter']
After letter e, the activated words are ['cathedral', 'catheter']
After letter d, the activated words are ['cathedral']
After letter r, the activated words are ['cathedral']
After letter a, the activated words are ['cathedral']
After letter l, the activated words are ['cathedral']

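One way to summarize a time course like the one above is to ask how many letters must be heard before only a single word remains activated (often called the word's uniqueness point in discussions of the Cohort Model). The sketch below is only an illustration: `uniqueness_point` is a hypothetical helper name, not part of the implementation above, and the time course is copied by hand from the printed output so that the cell runs on its own.

```python
def uniqueness_point(time_course):
    """Return how many letters have been heard when exactly one word
    first remains activated, or None if that never happens."""
    for letters_heard, activated in enumerate(time_course):
        if len(activated) == 1:
            return letters_heard
    return None

# Time course copied from the output above
# (slice 0 is the state before any letter has been heard).
example_time_course = [
    ['cathedral', 'cat', 'dog', 'catheter'],
    ['cathedral', 'cat', 'catheter'],
    ['cathedral', 'cat', 'catheter'],
    ['cathedral', 'cat', 'catheter'],
    ['cathedral', 'catheter'],
    ['cathedral', 'catheter'],
    ['cathedral'],
    ['cathedral'],
    ['cathedral'],
    ['cathedral'],
]

point = uniqueness_point(example_time_course)
print(point)                # 6
print('cathedral'[:point])  # cathed
```

With this tiny lexicon, 'cathedral' is the only candidate left after 'cathed', three letters before the word ends.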

Now, answer some questions about the model!

**Question 1:** Describe the *above implementation* of the cohort model in terms of Marr's three levels. One or two sentences per level will suffice.

**Question 2:** How does the run-time of the model depend on the size of the lexicon, `EnglishWords`, that it is given? In other words, how does the length of time the computer takes to run `cohortModel` depend on the size of `EnglishWords`? You can answer this in an informal way -- no need to get mathematically precise. You can answer this just by analyzing the code, but you could also test your analysis by running the model with lexicons of different sizes. If you pursue this path, you may find what I wrote below helpful...

In [6]:
import time

def squares_to_n(n):
    # build a list of the squares 0**2, 1**2, ..., (n-1)**2
    squares = []
    for x in range(n):
        squares.append(x ** 2)
    return squares

start_time = time.time()
my_squares = squares_to_n(10**7)
duration = time.time() - start_time

print(duration)

1.0554327964782715

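The same timing pattern could be pointed at the cohort model itself. The following is a rough sketch under stated assumptions, not a definitive benchmark: `cohort_model` is a compact copy of `cohortModel` included only so the cell is self-contained, while `make_lexicon` and `time_model` are hypothetical helpers, and the filler "words" are made-up strings rather than real English. It uses `time.perf_counter()`, which is better suited than `time.time()` to measuring short intervals.

```python
import time

def cohort_model(word, lexicon):
    # Compact copy of cohortModel from above, so this cell runs on its own.
    so_far = ''
    activated = list(lexicon)
    time_course = [list(activated)]
    for letter in word:
        so_far += letter
        for candidate in list(activated):
            if not candidate.startswith(so_far):
                activated.remove(candidate)
        time_course.append(list(activated))
    return time_course

def make_lexicon(n):
    # Hypothetical synthetic lexicon: 'cathedral' plus n - 1 made-up fillers.
    return ['cathedral'] + ['w{}'.format(i) for i in range(n - 1)]

def time_model(n, word='cathedral'):
    # Time a single run of the model on a lexicon of size n.
    lexicon = make_lexicon(n)
    start = time.perf_counter()
    cohort_model(word, lexicon)
    return time.perf_counter() - start

# Lexicon sizes are arbitrary; try larger ones to see the trend more clearly.
timings = {n: time_model(n) for n in (1000, 2000, 4000)}
for n, seconds in timings.items():
    print(n, seconds)
```

Single runs are noisy, so repeating each measurement a few times and comparing the pattern across sizes gives a more trustworthy picture than any one number. Keep in mind that the makeup of the lexicon (here, fillers that are all ruled out by the first letter) may matter as well as its size.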

**Question 3:** If my implementation of the cohort model were an accurate model of human word recognition (it's not, but pretend it is!), then what empirical predictions does your answer to Question 2 make about human performance?

**Question 4:** Discuss the modularity of this model. What information passes between the model and the broader program? What information does not? Put another way, what information does the model expose, and what does it keep hidden?

## A silly but hopefully instructive model of categorization

We went through all that trouble of teaching you some Python basics. We should probably have you write something, too, huh? How about this:

Later in the semester we'll talk about concepts and categorization. That is, what exactly are concepts like 'water' or 'freedom', and when we come across things in the world, how do we decide what concept or category a thing belongs to?

Let's preview this a little bit, by imagining an incredibly simple world and model of categorization: to decide if something is a cat or a dog (supposing, for this moment, that no other concepts exist), you simply count up whether it has more cat-like features or more dog-like features. That is, if it meows, is small, and plays with string, it's a cat. If it's big, barks, and plays with bones, it's a dog. If it's some mix of those, then it's simply whatever animal it shares more features with.

**Question 5:** Write a `dog_or_cat` function that takes a list of features for a given object and determines whether that object is a cat or a dog.

I've gotten you started a bit below, and here's a hint to get you going: maybe you want to check each individual feature of the object and see whether it's in `cat_features` or `dog_features`? So you'll definitely need to write a loop and a conditional (if/else) or two!

In [7]:
def dog_or_cat(object_features):
    """
    object_features: list of features like 'meows', 'barks', 'small', 'big', etc.
    """
    cat_features = ['meows','small','plays-with-string']
    dog_features = ['barks','big',  'plays-with-bone']
    
    similarity_to_cat = 0
    similarity_to_dog = 0
    
    """
    Insert your code below
    """
    
    # maybe give them everything except the loop + embedded if/else,
    # as well as the second if/else?
    
    for feature in object_features:
        if feature in cat_features:
            similarity_to_cat = similarity_to_cat + 1
        elif feature in dog_features:   # only count known dog features; ignore unrecognized ones
            similarity_to_dog = similarity_to_dog + 1

    if similarity_to_cat > similarity_to_dog:
        category = 'cat'
    else:
        category = 'dog'
    
    return category

Test the model. Of course, if you programmed it right, this next cell should output `'cat'`.

In [8]:
test_object = ['meows','small','plays-with-string']
dog_or_cat(test_object)

'cat'

Try more test cases. Again, if your function was written appropriately, then running the next chunk should not generate any errors!

In [9]:
assert dog_or_cat(['meows','small','plays-with-string']) == 'cat'
assert dog_or_cat(['barks','big',  'plays-with-bone'])   == 'dog'
assert dog_or_cat(['meows','small','plays-with-bone'])   == 'cat'

**Question 6:** What about this model of categorization seems inaccurate or incomplete with respect to concepts and categorization? How might you modify this model to rectify these issues? You can just discuss this verbally (in a few sentences), but do feel free to try implementing some modifications with Python, if you are so inclined! (If you choose this latter route, you could copy and paste the `dog_or_cat` definition from above into a new cell below, and then modify, rename, and test it.)
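
As one possible starting point for such a modification (a sketch, not the intended answer), the variant below weights features by how diagnostic they are and declines to decide when the evidence is balanced or absent. Everything new here is an assumption made for illustration: the name `dog_or_cat_weighted`, the `margin` parameter, the `'unsure'` category, and especially the weight values, which are invented rather than measured.

```python
def dog_or_cat_weighted(object_features, margin=0.0):
    """
    object_features: list of features like 'meows', 'barks', 'small', 'big', etc.
    margin: how much one similarity must exceed the other before we commit.
    """
    # Hypothetical weights: larger means more diagnostic of the category.
    cat_weights = {'meows': 3.0, 'small': 0.5, 'plays-with-string': 1.0}
    dog_weights = {'barks': 3.0, 'big': 0.5, 'plays-with-bone': 1.0}

    # Features that appear in neither dictionary contribute nothing.
    similarity_to_cat = sum(cat_weights.get(f, 0.0) for f in object_features)
    similarity_to_dog = sum(dog_weights.get(f, 0.0) for f in object_features)

    if similarity_to_cat - similarity_to_dog > margin:
        return 'cat'
    if similarity_to_dog - similarity_to_cat > margin:
        return 'dog'
    return 'unsure'   # tie, or no recognized features at all

print(dog_or_cat_weighted(['meows', 'big', 'plays-with-bone']))  # cat (3.0 vs 1.5)
print(dog_or_cat_weighted(['small', 'big']))                     # unsure (0.5 vs 0.5)
print(dog_or_cat_weighted(['purple']))                           # unsure (0.0 vs 0.0)
```

Note that the first example has more dog-like features than cat-like ones, yet comes out as a cat because meowing is weighted heavily. Whether that is the right behavior is exactly the kind of question worth discussing in your answer.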