## COG403: Problem 2 of Problem Set 3: Bilingual Semantic Networks
### The 2 problems for Problem Set 3 are due 29 Nov. 2018, 2 pm

In this problem, you'll connect English and Dutch monolingual semantic networks to model how bilingual Dutch-English speakers respond to a free association (FA) task.

This problem draws on modeling ideas from Matusevych et al. (2018), and uses the human bilingual FA data from Van Hell & De Groot (1998) [VHDG], generously made available by the first author. 

**For each part of this problem, you'll find a "to do" list, and cells below it that indicate where to insert your code or text answer.  The cells are labeled "Part x.n", where x is the problem part (a, b, c, etc) and n is the numbered item from the to-do list (1, 2, etc) -- eg, "Part a.3".**

**If, for any answer, you want to run additional code to support your answer, create a new code cell and clearly mark the answer cell that refers to it.**

**References:**

Matusevych, Y., Kalantari Dehaghi, A. A., & Stevenson, S. (2018). Modeling bilingual word associations as connected monolingual networks. In Proceedings of the Eighth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL) (pp. 46–56).  Association for Computational Linguistics.  https://homepages.inf.ed.ac.uk/ymatusev/publications/CMCL_2018.pdf

Janet G. van Hell and Annette M. B. de Groot. 1998. Conceptual representation in bilingual memory: Effects of concreteness and cognate status in word association. Bilingualism: Language and Cognition 1(3):193–211.  (Available online through https://onesearch.library.utoronto.ca.)

### Initialization

The cell labeled Part 0 below contains the code you should run to read in the English and Dutch monolingual data from the Small World of Words (SWOW), which will create a semantic network for each language.  These two semantic networks, in `eng_graph` and `dut_graph`, will form the basis for your bilingual semantic network (which you'll join with translation links in part (a) below).

The code here also reads in the human bilingual FA data from VHDG, and creates a graph structure for that data as well.  These graphs **will not be used as part of your semantic networks**, but rather will serve as "gold standard" data reflecting the human bilingual associations to which you'll compare the output of the bilingual semantic network you create.

The data from VHDG will yield 3 graphs:

* `ee_bilingual_gold`: Includes English responses of bilinguals to English cues.
* `de_bilingual_gold`: Includes English responses of bilinguals to Dutch cues.
* `ed_bilingual_gold`: Includes Dutch responses of bilinguals to English cues.

Parts (b-c) below will consider responses in your bilingual network corresponding to the English-English data.

Part (d) below will consider responses that "cross" languages, corresponding to the Dutch-English and English-Dutch data.
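
As a rough illustration of the data layout, the sketch below parses tab-separated `cue`, `response`, `weight` lines (the layout that `read_graph` in Part 0 splits on) into a nested dictionary. The sample lines and the helper name `parse_edges` are made up for illustration. Plain dictionaries stand in for `networkx.DiGraph`, which supports the same `graph[cue][response]['weight']` lookup.

```python
from collections import defaultdict

# Hypothetical sample lines in the cue<TAB>response<TAB>weight layout.
sample_lines = [
    "apple\ttree\t0.5",
    "apple\tfruit\t0.25",
    "apple\tpear\t0.25",
]


def parse_edges(lines):
    """Map cue -> response -> {'weight': float}, like a DiGraph adjacency."""
    graph = defaultdict(dict)
    for line in lines:
        cue, response, weight = line.rstrip("\n").split("\t")
        graph[cue][response] = {"weight": float(weight)}
    return graph


toy_gold = parse_edges(sample_lines)

# The outgoing weights of a cue form a distribution over its responses.
total = sum(edge["weight"] for edge in toy_gold["apple"].values())
print(toy_gold["apple"]["tree"]["weight"], total)
```

In this toy data the weights leaving `apple` sum to 1, which is what lets a cue's responses be compared as a probability distribution.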


In [38]:
# Part 0: Run this code first.

import networkx as nx
from tqdm import tqdm


def read_graph(file_path):
    digraph = nx.DiGraph()
    with open(file_path, 'r') as f:
        for line in tqdm(f):
            line = line.split('\t')
            digraph.add_edge(line[0], line[1], weight=float(line[2]))
    return digraph

eng_graph = read_graph('data/en_swow.tsv')
dut_graph = read_graph('data/nl_swow.tsv')

ee_bilingual_gold = read_graph('data/biling_data_EE.tsv')
de_bilingual_gold = read_graph('data/biling_data_DE.tsv')
ed_bilingual_gold = read_graph('data/biling_data_ED.tsv')

print('\nEE Graph for bilinguals')
print(ee_bilingual_gold['apple'])

print('\nDE Graph for bilinguals')
print(de_bilingual_gold['appel'])

print('\nED Graph for bilinguals')
print(ed_bilingual_gold['apple'])

1204534it [00:07, 163570.88it/s]
1213599it [00:05, 216440.43it/s]
2120it [00:00, 156815.77it/s]
1584it [00:00, 169791.65it/s]
1919it [00:00, 144046.20it/s]


EE Graph for bilinguals
{'orange': {'weight': 0.025}, 'bite': {'weight': 0.05}, 'vegetable': {'weight': 0.025}, 'peach': {'weight': 0.025}, 'pie': {'weight': 0.025}, 'pear': {'weight': 0.1}, 'bean': {'weight': 0.025}, 'fruit': {'weight': 0.2}, 'nice': {'weight': 0.025}, 'banana': {'weight': 0.075}, 'green': {'weight': 0.1}, 'citrus': {'weight': 0.025}, 'tree': {'weight': 0.225}, 'eat': {'weight': 0.075}}

DE Graph for bilinguals
{'orange': {'weight': 0.02564102564102564}, 'fruit': {'weight': 0.20512820512820512}, 'pip': {'weight': 0.02564102564102564}, 'peach': {'weight': 0.05128205128205128}, 'eat': {'weight': 0.07692307692307693}, 'banana': {'weight': 0.15384615384615385}, 'red': {'weight': 0.02564102564102564}, 'pear': {'weight': 0.20512820512820512}, 'food': {'weight': 0.02564102564102564}, 'green': {'weight': 0.02564102564102564}, 'tree': {'weight': 0.1794871794871795}}

ED Graph for bilinguals
{'peer': {'weight': 0.4358974358974359}, 'eten': {'weight': 0.05128205128205128}, 'ban




### Part (a)

Here, you'll create a `BilingualGraph` class and associated methods, and use those methods to join `eng_graph` and `dut_graph` into a `bilingual_graph` which forms a connected bilingual semantic network.  

The `bilingual_graph` will consist of two types of edges: the **association edges** within the semantic network of each language (the edges in `eng_graph` and in `dut_graph`), and **translation edges** that you'll create here, which connect words that are 'translation equivalents' in the two languages.

The weights on translation edges will be based on machine translation probabilities for English words into Dutch and Dutch words into English, provided in `data/word_alignments.csv`.

In addition, translation weights between pairs of words that are automatically determined to be cognates will be upweighted, to reflect this more certain knowledge of a strong relation between the English and Dutch word pair.  The cognates list is found in `data/cognates.tsv`, which provides the Levenshtein ratio (a similarity score) between pairs of English and Dutch words, as long as that ratio is greater than or equal to 0.5.

See the docstrings below for precise instructions on creating bilingual edges to connect the two monolingual semantic networks.
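
To make the weighting concrete, here is a minimal sketch of the calculation for a single source word, assuming the alignment probabilities and cognate scores have already been read into dictionaries. The function name `translation_weights` and all numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def translation_weights(alignments, cognate_scores):
    """
    alignments: dict target_word -> alignment probability for one source word
    cognate_scores: dict target_word -> Levenshtein ratio (missing means 0)

    Return dict target_word -> normalized translation edge weight.
    """
    # Upweight cognates and drop targets with zero alignment probability.
    raw = {
        target: prob * (1 + cognate_scores.get(target, 0.0))
        for target, prob in alignments.items()
        if prob > 0
    }
    # Normalize so the outgoing weights sum to one.
    total = sum(raw.values())
    return {target: weight / total for target, weight in raw.items()}


# Hypothetical numbers for the English source word "apple".
weights = translation_weights(
    alignments={"appel": 0.5, "vrucht": 0.25, "peer": 0.0},
    cognate_scores={"appel": 0.5},
)
print(weights)
```

With these numbers the raw scores are 0.5 * 1.5 = 0.75 for `appel` and 0.25 for `vrucht`, so the cognate ends up with three quarters of the probability mass and `peer` gets no edge at all.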

**To do for Part (a):**

1. Write the methods for `BilingualGraph` according to the docstrings in cell Part a.1.  **Note:** Some code is provided; you only need to fill in methods marked as "TODO".  Call the test cases for the `BilingualGraph` in the cell labeled "Test Cases for Part a.1" to ensure your code is correct before proceeding.

2. Call the code in cell Part a.2 to show the results of some of the methods.

In [78]:
#### Part a.1  Write the methods for class BilingualGraph 
####           according to the docstrings below

from collections import defaultdict
import numpy as np


def load_cognate_links(file_path):
    """
    file_path: str -- path to file containing cognate pairs
    
    Return a dictionary mapping a tuple of str (english_str, dutch_str) to the
    Levenshtein ratio between them. Only word pairs with a
    Levenshtein ratio of 0.5 or more are included.
    """
    result = {}
    with open(file_path, 'r') as f:
        for line in f:
            line = line.strip('\n').split('\t')
            result[(line[0], line[1])] = float(line[2])
    return result


class BilingualGraph(object):
    
    def __init__(self, eng_graph, dut_graph,
                 alignments_file_path='data/word_alignments.csv',
                 cognates_file_path='data/cognates.tsv'):
        """
        eng_graph: networkx.DiGraph -- graph of English SWOW cues and responses,
            linked by association edges
        dut_graph: networkx.DiGraph -- graph of Dutch SWOW cues and responses,
            linked by association edges
        alignments_file_path: str -- path to a csv file containing machine translation word
            alignments. The lines in this file have the format:
            english_word,dutch_word,english_dutch_alignment,dutch_english_alignment
        cognates_file_path: str -- path to a tsv file containing cognate pairs and their
            Levenshtein ratio scores. The lines in this file will have the format
            english_word\tdutch_word\tlevenshtein_ratio
        
        Set attributes eng_graph and dut_graph to eng_graph and dut_graph passed
        as parameters. Set attributes eng_dut_links and dut_eng_links to translation
        edges generated by get_translation_edges.
        """
        self.eng_graph = eng_graph
        self.dut_graph = dut_graph
        self.eng_dut_links = self.get_translation_edges(
            alignments_file_path, cognates_file_path, 'dut')
        self.dut_eng_links = self.get_translation_edges(
            alignments_file_path, cognates_file_path, 'eng')       
    
    def get_translation_edges(self, alignments_file_path, cognates_file_path, target_lang):
        """
        alignments_file_path: str -- path to a csv file containing machine translation word
            alignments. The lines in this file have the format:
            english_word,dutch_word,english_dutch_alignment,dutch_english_alignment
        cognates_file_path: str -- path to a tsv file containing cognate pairs and their
            Levenshtein ratio scores. The lines in this file will have the format
            english_word\tdutch_word\tlevenshtein_ratio
        target_lang: str in {'eng', 'dut'} -- the target language
        
        Return a dict mapping str to dict. The inner dict should map str to
        float. The keys in the outer dict are strings in the source language.
        The keys in the inner dict are strings in the target language (target_lang).
        The values in the inner dict are the source-target translation edges.
        
        The source-target translation edges for a given word, source_word in the
        source language, should be computed as follows:
            1. Find all words in the target language that have a source-target
                alignment probability greater than zero.
            2. Set the translation edge weights from source_word to a given
                target_word found in step one to:
                    word_alignment * (1 + cognate_score)
                where word_alignment is the source-target machine translation word
                alignment from alignments_file_path, and cognate_score is the
                Levenshtein ratio between source_word and target_word from
                cognates_file_path (0 if the pair is not listed there).
            3. Normalize source_word's outgoing translation edge scores, so that they
                sum to one. Do this by dividing the translation edge weights from step
                2 by the sum of the translation edge weights for all target words that
                source_word has a translation edge to.
        """
        assert target_lang in ['eng', 'dut']
        cognate_links = load_cognate_links(cognates_file_path)
        result = defaultdict(dict)
        with open(alignments_file_path) as f:
            # skip headers
            headers = next(f)
            for line in tqdm(f):
                line = line.strip('\n').split(',')
                if len(line) != 4:
                    print(line)
                en = line[0]
                nl = line[1]
                
                # score is the alignment probability in the source-to-target direction
                if target_lang == 'eng':
                    score = float(line[3])
                else:
                    score = float(line[2])
                
                # skip cases where score is 0
                if score == 0:
                    continue
                
                # skip cases where words are not in SWOW graphs
                if nl not in self.dut_graph or en not in self.eng_graph:
                    continue
                
                # increase weighting for cognates
                if (en, nl) in cognate_links:
                    score = score * (cognate_links[(en, nl)] + 1)
                
                if target_lang == 'eng':
                    result[nl][en] = score
                else:
                    result[en][nl] = score
                    
        # normalize, so we get a probability distribution
        for l1_word in result:
            weight_sum = sum(result[l1_word].values())
            for l2_word in result[l1_word]:
                result[l1_word][l2_word] /= weight_sum
                
        return result
    
    def is_cue(self, language, word):
        """
        language: str in {'eng', 'dut'}
        word: str
        
        Return true if and only if word is a cue in the graph for language.
        """
        assert language in ['eng', 'dut']
        graph = self.eng_graph if language == 'eng' else self.dut_graph
        # A cue is a node with at least one outgoing association edge;
        # words missing from the graph are not cues (avoids a KeyError).
        return word in graph and len(graph[word]) > 0
        
    def is_translatable(self, target_language, word):
        """
        target_language: str in {'eng', 'dut'}
        word: str
        
        Return true if and only if word has a translation edge to a word in
        target_language. (Note that the language of word is the source language,
        not target_language.)
        """
        assert target_language in ['eng', 'dut']
        # Translation links are keyed by source-language word.
        links = self.dut_eng_links if target_language == 'eng' else self.eng_dut_links
        return word in links
        
    
    def translate(self, target_language, word, find_cue=False):
        """
        target_language: str in {'eng', 'dut'} -- the language to translate into
        word: str -- the word to translate
        find_cue: bool -- when set to True, the returned value must be a cue in
            target_language
            
        Return a translation of word in target_language. Randomly select
        the translation from the possible translation edges for word in
        target_language, weighted by the translation edge weights. When
        find_cue is set to True, restrict your search to words in target_language
        that are cues. (Note that this means that you will need to re-normalize
        the probabilities, since you will be considering a restricted set).
        
        Precondition: word must have a translation in target_language
        """
        assert (self.is_translatable(target_language, word))
        if find_cue is False:
            translates = []
            weights = []
            if target_language == 'dut':
                for k in list(self.eng_dut_links[word]):
                    translates.append(k)
                    weights.append(self.eng_dut_links[word][k])
            else:
                for k in list(self.dut_eng_links[word]):
                    translates.append(k)
                    weights.append(self.dut_eng_links[word][k])
            if len(translates) > 1:
                translation = np.random.choice(translates, p=weights)
            else:
                translation = translates.pop()
        else:
            cues = []
            weights = []
            weight_sum = 0
            if target_language == 'dut':
                for k in list(self.eng_dut_links[word]):
                    if self.is_cue('dut', k):
                        cues.append(k)
                        weights.append(self.eng_dut_links[word][k])
                        weight_sum += self.eng_dut_links[word][k]
            else:
                for k in list(self.dut_eng_links[word]):
                    if self.is_cue('eng', k):
                        cues.append(k)
                        weights.append(self.dut_eng_links[word][k])
                        weight_sum += self.dut_eng_links[word][k]
            for i in range(len(weights)):
                weights[i] = (weights[i]/weight_sum)
            if len(cues) > 1:
                translation = np.random.choice(cues, p=weights)
            else:
                translation = cues.pop()
        return translation
                    
    def free_association(self, language, cue, find_translatable=False):
        """
        language: str -- language to do free association in
        cue: str -- start word for free association
        find_translatable: bool -- when set to true, the returned value must
            have a translation from language to the other language
            
        Return a str found by a one-step weighted random walk from cue in 
        the graph for language. Randomly select the node for free
        association from among cue's outgoing edges in language, weighted
        by the association weights. When find_translatable is set to True,
        restrict your search to words in language that have translations in
        the other language. (Note that this means that you will need to
        re-normalize the probabilities, since you will be considering a restricted
        set)
        """
        assert (self.is_cue(language, cue))
        walks = []
        weights = []
        weight_sum = 0
        if language == 'eng':
            for step in self.eng_graph[cue]:
                if find_translatable is False or self.is_translatable('dut', step):
                    walks.append(step)
                    weights.append(self.eng_graph[cue][step]['weight'])
                    weight_sum += self.eng_graph[cue][step]['weight']
        elif language == 'dut':
            for step in self.dut_graph[cue]:
                if find_translatable is False or self.is_translatable('eng', step):
                    walks.append(step)
                    weights.append(self.dut_graph[cue][step]['weight'])
                    weight_sum += self.dut_graph[cue][step]['weight']
        if find_translatable:
            for i in range(len(weights)):
                weights[i] = (weights[i]/weight_sum)
        if len(walks) > 1:
            word = np.random.choice(walks, p=weights)
        else:
            word = walks.pop()
        return word

        

bilingual_graph = BilingualGraph(eng_graph, dut_graph)

109610it [00:00, 301319.92it/s]
109610it [00:00, 246302.37it/s]


In [74]:
### TEST CASES for Part a.1

import numpy as np
import networkx as nx


eng_graph_dummy = nx.DiGraph()
eng_graph_dummy.add_edge('winter', 'pie', weight=1.0)
eng_graph_dummy.add_node('snow')
eng_graph_dummy.add_node('apple')
eng_graph_dummy.add_node('party')
eng_graph_dummy.add_node('pumpkin')

dut_graph_dummy = nx.DiGraph()
dut_graph_dummy.add_edge('winter', 'sneeuw', weight=1.0)
dut_graph_dummy.add_edge('taart', 'appel', weight=1.0)
dut_graph_dummy.add_node('feest')
dut_graph_dummy.add_node('partij')
dummy_graph = BilingualGraph(
    eng_graph_dummy, dut_graph_dummy, cognates_file_path='data/dummy_cognates.tsv',
    alignments_file_path='data/dummy_alignments.csv')
assert dummy_graph.is_translatable('eng', 'pie') == False
assert dummy_graph.is_translatable('dut', 'pie') == True
assert dummy_graph.is_translatable('dut', 'pumpkin') == False
assert dummy_graph.is_translatable('dut', 'taart') == False
assert dummy_graph.is_translatable('eng', 'taart') == True


assert dummy_graph.is_cue('eng', 'winter') == True
assert dummy_graph.is_cue('eng', 'pie') == False
assert dummy_graph.is_cue('dut', 'winter') == True
assert dummy_graph.is_cue('dut', 'taart') == True
assert dummy_graph.is_cue('dut', 'sneeuw') == False


assert dummy_graph.translate('dut', 'pie') == 'taart'
assert dummy_graph.translate('eng', 'sneeuw') == 'snow'


assert dummy_graph.free_association('dut', 'winter') == 'sneeuw'
assert dummy_graph.free_association('eng', 'winter') == 'pie'

6it [00:00, 12366.50it/s]
6it [00:00, 31261.89it/s]


In [69]:
#### Part a.2  Call this code to show results of some of your methods.

print("Dut for horoscope: {}".format(bilingual_graph.translate('dut', 'horoscope')))
print("Eng for foefelen: {}".format(bilingual_graph.translate('eng', 'foefelen')))

print("Eng FA for penguin: {}".format(bilingual_graph.free_association('eng', 'penguin')))
print("Dut FA for pinguïn: {}".format(bilingual_graph.free_association('dut', 'pinguïn')))

Dut for horoscope: horoscoop
Eng for foefelen: fiddle
Eng FA for penguin: monogamy
Dut FA for pinguïn: noordpool


### Part (b)

Here, you'll write code to model the English-English task -- ie, where bilinguals are given cues in English, and provide responses in English.  You'll explore whether/how much implicit translation impacts the results by comparing a bilingual model to a monolingual model (ie, one that is equivalent to a monolingual English speaker).

You'll model the free association task as a random walk.  Unlike in Problem Set 2.2, where a random walk had many steps so it could generate a sequence of words in a category (ie, a list of 'animals'), **here a random walk will traverse a single association link**.  If you stayed in the English network and ran the random walk over, say, 1000 trials for each English cue, then you'd get a distribution of 1000 single responses to each cue, which would match those of monolingual speakers of English.

But, we're modeling bilinguals, and we suspect they don't only think in English responses to English cues!  This means they don't necessarily stay in their English network.

As a second alternative, a bilingual might implicitly translate the English cue to Dutch, traverse an association link in Dutch to a Dutch response to that translated cue, and then translate the Dutch response back to English, to get the final English response.

So, your random walks will have two possibilities:

i. Do a random "association" walk from the English cue (within the English network), and output the English response.  (This kind of walk will have length 1.)

ii. Do a random "translation" walk from the English cue into Dutch, then do a random "association" walk from the Dutch translation (within the Dutch network), then do a random "translation" walk from the Dutch response back to English, and output that English response.  (This kind of walk will have length 3.)

All choices of association or translation edges in the network should be made probabilistically according to the edge weight.

The choices between (i) and (ii) above will be made based on the translation probability -- a parameter to `random_walk` called `p_translate` -- which effects a simple (possibly biased) coin flip.
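
The two kinds of walk can be sketched on toy data as follows. This is only an illustration under simplifying assumptions: the four dictionaries, `weighted_step`, and `toy_walk` are hypothetical stand-ins, not the `BilingualGraph` methods you are asked to use, and the sketch ignores dead ends (words with no outgoing edge).

```python
import random

# Hypothetical toy networks: each maps a word to {neighbour: weight}.
ENG_ASSOC = {"winter": {"pie": 1.0}}
DUT_ASSOC = {"winter": {"sneeuw": 1.0}}
ENG_TO_DUT = {"winter": {"winter": 1.0}}
DUT_TO_ENG = {"sneeuw": {"snow": 1.0}}


def weighted_step(edges, word):
    """Pick one neighbour of word, with probability proportional to edge weight."""
    neighbours = list(edges[word])
    weights = [edges[word][n] for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]


def toy_walk(cue, p_translate):
    """Map an English cue to an English response via option (i) or (ii)."""
    # random.random() lies in [0, 1), so "<" never translates at 0.0
    # and always translates at 1.0.
    if random.random() < p_translate:
        dutch_cue = weighted_step(ENG_TO_DUT, cue)            # translate
        dutch_response = weighted_step(DUT_ASSOC, dutch_cue)  # associate
        return weighted_step(DUT_TO_ENG, dutch_response)      # translate back
    return weighted_step(ENG_ASSOC, cue)                      # associate only


print(toy_walk("winter", p_translate=0.0))
print(toy_walk("winter", p_translate=1.0))
```

Because every toy word has a single outgoing edge, the two extreme settings are deterministic here: the association-only walk stays in English, while the translation walk returns the English translation of the Dutch associate.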

**To do for Part (b):**

1. Write function `random_walk` in code cell b.1 according to the docstring.  Call the test cases in the following cell to check your code.

2. Call `random_walk` in code cell b.2 with the given parameters to show some results of calling your code.

In [75]:
#### Part b.1  Write function `random_walk` according to the docstring below.

def random_walk(graph, start_lang, start, p_translate=0.5):
    """
    graph: BilingualGraph -- the graph to use to do a random walk
    start_lang: str in {'eng', 'dut'} -- the language to start a random walk in 
    start: str -- the word in language to start at
    p_translate: float -- the probability of doing a "translation" walk (as
        opposed to an "association" walk).
    
    Do a translation walk or an association walk, starting at start in start_lang,
    and return the result. Randomly decide whether to do a translation or association
    walk, giving a translation walk a weight of p_translate. The definitions of
    association and translation walks are provided in the problem description above.
    
    Make sure to use the methods translate and free_association defined in the
    BilingualGraph class, rather than re-implementing this functionality here.
    Set find_cue and find_translatable to True where appropriate, to avoid running
    into dead ends.
    """
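    # Biased coin flip: np.random.rand() is in [0, 1), so the strict "<" never
    # translates when p_translate is 0 and always translates when it is 1.
    if np.random.rand() < p_translate:
        # (ii) translation walk: translate, associate, translate back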
        if start_lang == 'eng':
            opposite_lang = 'dut'
        else:
            opposite_lang = 'eng'
        start_opp_translate = graph.translate(opposite_lang, start, True)
        opp_opp_free_association = graph.free_association(opposite_lang, start_opp_translate, True)
        result = graph.translate(start_lang, opp_opp_free_association)
    else:
        #i, random free association walk
        result = graph.free_association(start_lang, start)
    return result
        


In [100]:
# TEST CASE for code in part b.1

assert random_walk(dummy_graph, 'eng', 'winter', p_translate=0) == 'pie'
assert random_walk(dummy_graph, 'eng', 'winter', p_translate=1.0) == 'snow'

In [106]:
#### Part b.2  Call function `random_walk` as follows to show the results of your code.
print(random_walk(bilingual_graph, 'eng', 'apple', p_translate=0))
print(random_walk(bilingual_graph, 'eng', 'apple', p_translate=0.5))
print(random_walk(bilingual_graph, 'eng', 'apple', p_translate=1.0))

pie
tree
moist


### Part (c)

Here you'll see how the model of bilingual association performs under different values of the probability of translating (`p_translate`) in the random walk.

You'll run your network on the set of cues used in the VHDG experiments, doing 1000 random walk trials for each cue.  You'll repeat this for each value of `p_translate` in the random walk, varying from 0 (equivalent to a monolingual) to 1 (ie, always translating) by .1 increments.
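
Turning repeated trials into a response distribution can be sketched as below. The sampler `hypothetical_walk` is a made-up stand-in for a single random walk from one cue, and `response_distribution` is an illustrative name rather than part of the provided code.

```python
import random
from collections import Counter


def response_distribution(sample_response, n_trials=1000):
    """Run sample_response n_trials times; return response -> relative frequency."""
    counts = Counter(sample_response() for _ in range(n_trials))
    return {response: count / n_trials for response, count in counts.items()}


# Hypothetical stand-in for one random walk from a fixed cue.
rng = random.Random(0)


def hypothetical_walk():
    return rng.choices(["tree", "fruit", "pear"], weights=[0.5, 0.3, 0.2], k=1)[0]


distribution = response_distribution(hypothetical_walk)
print(sorted(distribution.items(), key=lambda item: item[1], reverse=True))
```

With fewer trials the relative frequencies fluctuate more from run to run, which is why small trial counts are fine for debugging but not for drawing conclusions.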

For each of the 11 sets of results you produce, you'll compare the output of your network to the distribution over responses given by human bilinguals, which are stored in  `ee_bilingual_gold` from Part 0.  We'll provide the evaluation measure for you to use: it computes a distance between two probability distributions.  This is an error measure, so **smaller numbers indicate a better match**.
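
For intuition, here is a small worked example of a distance of this kind: half the summed absolute difference between two response distributions, after trimming the longer one to the size of the shorter, as the docstring of the provided `error` function describes. The name `truncated_error` and the numbers are hypothetical, and this sketch is not a replacement for the provided function.

```python
def truncated_error(model, gold):
    """
    model, gold: dict response -> probability.

    Keep only the k highest-weighted responses of each distribution, where
    k is the size of the smaller one, then return half the summed absolute
    difference over the union of the remaining responses.
    """
    k = min(len(model), len(gold))

    def top_k(dist):
        ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
        return dict(ranked[:k])

    model, gold = top_k(model), top_k(gold)
    responses = set(model) | set(gold)
    return 0.5 * sum(abs(model.get(r, 0.0) - gold.get(r, 0.0)) for r in responses)


model = {"snow": 0.75, "pie": 0.25}
gold = {"snow": 1.0}
print(truncated_error(model, gold))
```

Here the gold distribution has one response, so the model is trimmed to its single best response and the error is 0.5 * |0.75 - 1.0| = 0.125. Without the trimming step the same pair would score 0.25, since the 0.25 mass on `pie` would also count.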

**To do for Part (c):**

1. Write `evaluate_random_walks` in cell c.1 according to the docstring below.  Note that it will need to call the function `error` which we provide, to calculate the mismatch between the distribution of responses for each cue given by your runs of `random_walk`, and the human data in `ee_bilingual_gold`.

2. Call the code in cell c.2 to generate the 11 sets of results and print them.  When you are ready to run this to analyze the results, you must call it with the default of 1000 random walks per cue (as is done in the code cell as provided).  But this can take several minutes to run, so you can try it out on smaller amounts to make sure it's working.  Smaller amounts will not give you consistent results, however, so don't count on the patterns you see there for answering the questions below.

3. In cell c.3, answer the following question: What value (or range of values) for `p_translate` gives you the best performance (ie, closest match to human data)?  Explain what you think this might indicate about the nature of crosslinguistic transfer in the bilingual lexicon.  (Max 100 words.)

4. Call the code in cell c.4, which will evaluate the results you've obtained, split into VHDG's cue categories of cognates vs. non-cognates, and concrete vs. abstract.  (Both these designations were manually determined for each English-Dutch cue pair by VHDG.)

5. In cell c.5, answer the following question:  Compare the results of the bilingual model to the model that only uses the monolingual (English) network (ie, `p_translate` is 0).  Consider the difference between the two results for each category of cue pairs: cognate/non-cognate x concrete/abstract.  For which of the four categories of words does the bilingual model show the smallest difference from the monolingual model? the largest difference?  What does this say about the bilingual lexicon?  (Max 100 words.)
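
One way to organize the comparison across `p_translate` values is sketched below. It assumes you summarize each run by the mean error over cues, which is an assumption of this sketch rather than something the problem prescribes. `fake_evaluate` is a made-up stand-in for `evaluate_random_walks` with invented error values, so the "best" value it reports means nothing about the real data.

```python
def sweep_p_translate(evaluate, n_steps=10):
    """
    evaluate: callable taking p_translate and returning {cue: error}.

    Return {p_translate: mean error over cues} for p = 0.0, 0.1, ..., 1.0.
    """
    summary = {}
    for step in range(n_steps + 1):
        p = step / n_steps
        errors = evaluate(p)
        summary[p] = sum(errors.values()) / len(errors)
    return summary


# Hypothetical stand-in for evaluate_random_walks: invented errors only.
def fake_evaluate(p_translate):
    return {
        "apple": abs(p_translate - 0.3),
        "winter": 2 * abs(p_translate - 0.3),
    }


summary = sweep_p_translate(fake_evaluate)
best_p = min(summary, key=summary.get)
print(best_p, summary[best_p])
```

Building the values as `step / n_steps` avoids the drift that comes from repeatedly adding 0.1 to a float.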

In [103]:
#### Part c.1: Write code for evaluate_random_walks according to the docstring below.
from collections import Counter

def evaluate_random_walks(
        sample, start_lang, graph, gold_standard_graph,
        n_trials=1000, p_translate=0.5):
    """
    sample: list of str -- the list of cues to use as start points for
        random walks
    start_lang: str -- language to start random walks in
    graph: BilingualGraph -- graph to use for random walks
    gold_standard_graph: networkx.DiGraph -- graph of human responses, used
        to evaluate the results of the random walks
    n_trials: int -- the number of random walks to do for each
        cue word in sample
    p_translate: float -- the probability of doing a translation walk
        (as opposed to a pure association walk)
        
    Return a dict mapping str to float. The keys in the result should
    be the cues in sample, and the values are the error for this cue.
    The error for each cue in sample is calculated as follows:
        1. Call the function random_walk from part b n_trials times, with start
            equal to cue. You should also pass graph, start_lang and p_translate.
            Store the resulting responses.
        2. Generate a list of tuple of (response, score), where responses
            are unique responses from step 1. Compute the score for each
            response by dividing that response's count by n_trials (the total
            number of responses found).
        3. Generate a gold standard list of tuple of (response, score), where
            responses are the nodes that the cue has outgoing association edges
            to in gold_standard_graph, and score is the weights of these
            association edges.
        4. Call error (defined below) on the lists of tuples generated in
            steps 2 and 3.
    """
    result = {}
    opposite_lang = 'dut' if start_lang == 'eng' else 'eng'
    for cue in sample:
        # only evaluate cues that can take both kinds of walk
        if graph.is_cue(start_lang, cue) and graph.is_translatable(opposite_lang, cue):
            response_store = []
            for n in range(n_trials):
                response_store.append(random_walk(graph, start_lang, cue, p_translate))
            counts = Counter(response_store)
            tuple_list = []
            for word in counts:
                score = (counts[word]/n_trials)
                tuple_list.append((word, score))
            gold_tuple = []
            for response in gold_standard_graph[cue]:
                gold_tuple.append((response, gold_standard_graph[cue][response]['weight']))
            error_score = error(tuple_list, gold_tuple)
            result[cue] = error_score
    return result
        

def error(sample1, sample2):
    """
    sample1: list of tuple of (str, float) -- (response, score) tuples
    sample2: list of tuple of (str, float) -- (response, score) tuples
    
    Return the error of sample1 relative to sample2 based on formula (3)
    in Matusevych et al. (2018). This uses the following procedure:
        1. Make sure that sample1 and sample2 have the same length. If they
            do not have the same length, make them the same length by removing
            tuples from the longer sample. If tuples need to be removed, remove
            the lowest weighted tuples.
        2. Collect a list of words that are the union of the responses in
            sample1 and sample2.
        3. For each response in the list found in step 2, take the absolute value
            of the difference of the scores for the response in sample1 and sample2.
            If the response does not occur in one of the samples, default to a score
            of 0. Sum the absolute values of the score differences over the responses.
        4. Multiply the result from step 3 by 0.5.
    """
    sample1 = sorted(sample1, key=lambda x: x[1], reverse=True)
    sample2 = sorted(sample2, key=lambda x: x[1], reverse=True)
    if len(sample1) > len(sample2):
        sample1 = sample1[:len(sample2)]
    elif len(sample2) > len(sample1):
        sample2 = sample2[:len(sample1)]
    sample1 = {k: v for k, v in sample1}
    sample2 = {k: v for k, v in sample2}
    vals = set(list(sample1.keys()) + list(sample2.keys()))
    error = 0.5 * sum([abs(sample1.get(v, 0) - sample2.get(v, 0)) for v in vals])
    return error
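
As a quick sanity check of the procedure described in the docstring, here is a small worked example. The function `error_sketch` is a hypothetical, self-contained restatement of the same four steps (it is not part of the assignment code), and the response lists are made up for illustration.

```python
def error_sketch(sample1, sample2):
    """Half the summed absolute score difference, after trimming to equal length."""
    # Step 1: sort by score (highest first) and trim the longer list.
    s1 = sorted(sample1, key=lambda pair: pair[1], reverse=True)
    s2 = sorted(sample2, key=lambda pair: pair[1], reverse=True)
    n = min(len(s1), len(s2))
    d1, d2 = dict(s1[:n]), dict(s2[:n])
    # Steps 2-4: union of responses, absolute differences (missing = 0), times 0.5.
    return 0.5 * sum(abs(d1.get(w, 0) - d2.get(w, 0)) for w in set(d1) | set(d2))


# Made-up model output and gold standard for a single cue.
model = [('snow', 0.75), ('cold', 0.25)]
gold = [('snow', 1.0)]

# The model list is trimmed to its top response, so only 'snow' is compared:
# 0.5 * |0.75 - 1.0| = 0.125
partial_match = error_sketch(model, gold)
print(partial_match)

# Two single-response lists with no shared response give an error of 1.0.
no_match = error_sketch([('ice', 1.0)], [('snow', 1.0)])
print(no_match)
```

Note that the trimming in step 1 means low-scoring responses in the longer list are ignored entirely, so the scores being compared need not sum to 1.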


In [120]:
#### Test case for part c.1

ee_dummy_gold_standard_graph = nx.DiGraph()
ee_dummy_gold_standard_graph.add_edge('winter', 'snow', weight=1.0)

assert evaluate_random_walks(
    ['winter'], 'eng', dummy_graph, ee_dummy_gold_standard_graph, p_translate=0.0) == {'winter': 1.0}

assert evaluate_random_walks(
    ['winter'], 'eng', dummy_graph, ee_dummy_gold_standard_graph, p_translate=1.0) == {'winter': 0.0}

test_result = evaluate_random_walks(
        ['winter'], 'eng', dummy_graph, ee_dummy_gold_standard_graph, p_translate=0.75)['winter']
assert test_result < 0.14 and test_result > 0.11

# not translatable
assert evaluate_random_walks(
    ['pumpkin'], 'eng', dummy_graph, ee_dummy_gold_standard_graph, p_translate=0.5) == {}

# not a cue
assert evaluate_random_walks(
    ['snow'], 'eng', dummy_graph, ee_dummy_gold_standard_graph, p_translate=0.5) == {}

In [104]:
#### Part c.2: Call code to get cues to test, call evaluate_random_walks on
####           each value of the translation probability, and print results.

#WARNING: This code takes up to 10 mins to run.


# sample of words to compute errors for -- the cues in the human EE data
sample = [x for x in ee_bilingual_gold.nodes() if len(ee_bilingual_gold[x]) > 0]

result_0 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.0)
result_10 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.1)
result_20 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.2)
result_30 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.3)
result_40 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.4)
result_50 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.5)
result_60 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.6)
result_70 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.7)
result_80 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.8)
result_90 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=0.9)
result_100 = evaluate_random_walks(
        sample, 'eng', bilingual_graph, ee_bilingual_gold, p_translate=1.0)

# Print results:
print("0% translate mean error: {:.3f}".format(np.mean(list(result_0.values()))))
print("10% translate mean error: {:.3f}".format(np.mean(list(result_10.values()))))
print("20% translate mean error: {:.3f}".format(np.mean(list(result_20.values()))))
print("30% translate mean error: {:.3f}".format(np.mean(list(result_30.values()))))
print("40% translate mean error: {:.3f}".format(np.mean(list(result_40.values()))))
print("50% translate mean error: {:.3f}".format(np.mean(list(result_50.values()))))
print("60% translate mean error: {:.3f}".format(np.mean(list(result_60.values()))))
print("70% translate mean error: {:.3f}".format(np.mean(list(result_70.values()))))
print("80% translate mean error: {:.3f}".format(np.mean(list(result_80.values()))))
print("90% translate mean error: {:.3f}".format(np.mean(list(result_90.values()))))
print("100% translate mean error: {:.3f}".format(np.mean(list(result_100.values()))))

0% translate mean error: 0.574
10% translate mean error: 0.560
20% translate mean error: 0.548
30% translate mean error: 0.540
40% translate mean error: 0.534
50% translate mean error: 0.528
60% translate mean error: 0.528
70% translate mean error: 0.528
80% translate mean error: 0.536
90% translate mean error: 0.544
100% translate mean error: 0.556


#### Part c.3: Answer the following question: What value (or range of values) for `p_translate` gives you the best performance (ie, closest match to human data)?  Explain what you think this might indicate about the nature of crosslinguistic transfer in the bilingual lexicon.  (Max 100 words.)



The range 50%-70% for `p_translate` gives me the best performance (lowest mean error, 0.528, and so the closest match to human data), while the error rises toward both extremes (0.574 at 0% and 0.556 at 100%).

These results indicate that crosslinguistic transfer in the bilingual lexicon is neither a purely monolingual association walk nor a purely translation-based walk. Instead, the mean error is lowest when association steps and translation steps are mixed.

Indeed, crosslinguistic transfer appears to be influenced by both languages, even when only one language is being spoken. This is consistent with the findings of the Rabinovich et al. paper, which reported that the vocabulary choices of non-native speakers of English are heavily influenced by cognates in their native language.
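
The best-performing range can also be read off programmatically. The sketch below simply copies the mean errors printed by cell c.2 into a dictionary (the variable names are illustrative, not part of the assignment code). Because the printed values are rounded to three decimals, ties are only ties at that precision.

```python
# Mean errors copied from the printed output of cell c.2.
mean_error = {
    0.0: 0.574, 0.1: 0.560, 0.2: 0.548, 0.3: 0.540, 0.4: 0.534, 0.5: 0.528,
    0.6: 0.528, 0.7: 0.528, 0.8: 0.536, 0.9: 0.544, 1.0: 0.556,
}

lowest = min(mean_error.values())
# All p_translate values that reach the lowest (rounded) mean error.
best_p = sorted(p for p, err in mean_error.items() if err == lowest)
# The single p_translate value with the highest mean error.
worst_p = max(mean_error, key=mean_error.get)

print("lowest mean error:", lowest)
print("best p_translate values:", best_p)
print("worst p_translate value:", worst_p)
```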

In [105]:
#### Part c.4:  Code for evaluating error on cognates vs. non-cognate cues,
####            and abstract vs. concrete cues.

def get_error_subset(word_sample, error_dict):
    """
    word_sample: list of str -- list of cues. All cues must be in error_dict.
    error_dict: dict mapping str to float -- This maps cues to error values. (Should
        be in the format returned by evaluate_random_walks)
    
    Return the mean error for the words in word_sample.
    """
    return np.mean([v for k, v in error_dict.items() if k in word_sample])

def load_word_pairs(file_path):
    """
    file_path: str -- path to file containing word pairs
    
    Return a list of tuple of word pairs.
    """
    result = []
    with open(file_path, 'r') as f:
        for line in f:
            result.append(tuple(line.split()))
    return result

cognate_concrete = load_word_pairs('data/cognate_concrete.txt')
cognate_abstract = load_word_pairs('data/cognate_abstract.txt')
non_cognate_concrete = load_word_pairs('data/non_cognate_concrete.txt')
non_cognate_abstract = load_word_pairs('data/non_cognate_abstract.txt')

type_to_pairs = {
    'cognate_concrete': cognate_concrete,
    'cognate_abstract': cognate_abstract,
    'non_cognate_concrete': non_cognate_concrete,
    'non_cognate_abstract': non_cognate_abstract
}

print("description\tbilingual_error\tmonolingual_error")
for name in sorted(type_to_pairs.keys()):
    word_pairs = type_to_pairs[name]
    curr_sample = [x[1] for x in word_pairs]
    curr_bi_error = get_error_subset(curr_sample, result_50)
    curr_mon_error = get_error_subset(curr_sample, result_0)
    print("{}\t{:.3f}\t{:.3f}".format(name, curr_bi_error, curr_mon_error))

description	bilingual_error	monolingual_error
cognate_abstract	0.547	0.599
cognate_concrete	0.480	0.496
non_cognate_abstract	0.543	0.583
non_cognate_concrete	0.501	0.577


#### Part c.5:  Answer the following question:  Compare the results of the bilingual model to the model that only uses the monolingual (English) network (ie, `p_translate` is 0).  Consider the difference between the two results for each category of cue pairs: cognate/non-cognate x concrete/abstract.  For which of the four categories of words does the bilingual model show the smallest difference from the monolingual model? the largest difference?  What does this say about the bilingual lexicon?  (Max 100 words.)

Of the four categories of words, concrete cognates show the smallest difference between the bilingual and monolingual models (0.480 vs. 0.496), while concrete non-cognates show the largest (0.501 vs. 0.577).

This suggests that the other language's links add little for concrete cognates, whose associations the monolingual network already captures comparatively well (it has its lowest error, 0.496, on this category), and add the most for concrete non-cognates.

It also indicates that the bilingual lexicon is most similar to the monolingual lexicon for concrete cognates, and least similar for concrete non-cognates.

Additionally, in light of the Rabinovich et al. paper, these results may further support the hypothesis that cognates shape lexical choice in bilingual speakers.
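
The per-category differences behind this comparison can be computed directly from the table printed by cell c.4. The sketch below copies those printed values by hand; the variable names are illustrative only.

```python
# (bilingual_error, monolingual_error) copied from the printed table in cell c.4.
errors = {
    'cognate_abstract': (0.547, 0.599),
    'cognate_concrete': (0.480, 0.496),
    'non_cognate_abstract': (0.543, 0.583),
    'non_cognate_concrete': (0.501, 0.577),
}

# Reduction in mean error when moving from the monolingual to the bilingual model.
diff = {name: round(mono - bi, 3) for name, (bi, mono) in errors.items()}

# Print the categories from smallest to largest difference.
for name in sorted(diff, key=diff.get):
    print("{}\t{:.3f}".format(name, diff[name]))

smallest = min(diff, key=diff.get)
largest = max(diff, key=diff.get)
print("smallest difference:", smallest)
print("largest difference:", largest)
```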

### Part (d)

In this part, you'll explore how your bilingual model does on the tasks of giving Dutch responses to English cues, and English responses to Dutch cues, comparing those results to the human data from VHDG on these tasks.

We'll take a fairly simple approach here: Either the model will first translate the word from the cue language to the response language, and then do free association within the response language, or the model will first do free association in the cue language, and then translate the associate to the response language.
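
To make the two orderings concrete, here is a minimal sketch on a made-up toy lexicon. Everything in it is an assumption for illustration: the plain dictionaries stand in for `BilingualGraph`, and the helper names and argument conventions (`associate`, `translate`, `crosslang_walk`) are hypothetical rather than the class's real interface.

```python
import random

# Hypothetical toy data: weighted association edges per language and a
# translation table keyed by (source language, word).
ASSOCIATIONS = {
    'eng': {'winter': {'snow': 3, 'cold': 1}},
    'dut': {'winter': {'sneeuw': 1}},
}
TRANSLATIONS = {
    ('eng', 'winter'): 'winter',
    ('eng', 'snow'): 'sneeuw',
    ('eng', 'cold'): 'koud',
}


def associate(lang, word, rng):
    """Sample one associate of word, with probability proportional to edge weight."""
    neighbours = ASSOCIATIONS[lang][word]
    return rng.choices(list(neighbours), weights=list(neighbours.values()))[0]


def translate(lang, word):
    """Translate word out of lang using the toy table."""
    return TRANSLATIONS[(lang, word)]


def crosslang_walk(cue, fa_first, rng):
    """Map an English cue to a Dutch response using one of the two orderings."""
    if fa_first:
        # Associate within English, then translate the associate to Dutch.
        return translate('eng', associate('eng', cue, rng))
    # Translate the cue to Dutch, then associate within Dutch.
    return associate('dut', translate('eng', cue), rng)


rng = random.Random(0)
# Translation-first can only reach 'sneeuw' in this toy lexicon.
print(crosslang_walk('winter', fa_first=False, rng=rng))
# Association-first reaches 'sneeuw' or 'koud', depending on the sampled associate.
print(crosslang_walk('winter', fa_first=True, rng=rng))
```

The two orderings can give different response distributions for the same cue because the association step draws on a different monolingual network in each case.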

**To do for Part (d):**

1. Write `random_walk_crosslang_task` in cell d.1 according to the docstring below.  Call the test cases in the following cell to check your code.

2. Write `evaluate_random_walks_crosslang_task` according to the docstring below.  We suggest you copy your code for `evaluate_random_walks` from above into cell d.2 and make suitable modifications.  Call the test cases in the following cell to check your code.

3.  Call the code in cell d.3 to run your model on the crosslanguage association tasks, and print the results.

4.  In cell d.4, answer the following:  Compare the performance of free associating first vs. translating first, on the English-Dutch and Dutch-English crosslanguage association tasks.  What do these results say about the bilingual lexicon? (Max 150 words.)


In [112]:
#### Part d.1:  Write random_walk_crosslang_task according to the docstring below.


def random_walk_crosslang_task(
        graph, start_lang, start, fa_first):
    """
    graph: BilingualGraph -- the graph to use to do a random walk
    start_lang: str in {'eng', 'dut'} -- the language to start a random walk in 
    start: str -- the word in start_lang to start at
    fa_first: bool -- When set to True, do an association-first walk
        (as opposed to a translation-first walk).
    
    Do a translation-first or an association-first cross-language random walk,
    starting at start in start_lang, and return the result. If fa_first is True,
    do the association step first and then translate the associate; otherwise,
    translate start first and then do the association step.
    
    Make sure to use the methods translate and free_association defined in the
    BilingualGraph class, rather than re-implementing this functionality here.
    Set find_cue and find_translatable to True where appropriate, to avoid running
    into dead ends.
    """
    if start_lang == 'eng':
        opposite_lang = 'dut'
    else:
        opposite_lang = 'eng'
    if fa_first:
        association = graph.free_association(start_lang, start, True)
        translation = graph.translate(opposite_lang, association)
        report = translation
    else:
        translation = graph.translate(opposite_lang, start, True)
        association = graph.free_association(opposite_lang, translation)
        report = association
    return report

In [113]:
# TEST CASES FOR CODE IN CELL d.1

assert random_walk_crosslang_task(dummy_graph, 'eng', 'winter', fa_first=False) == 'sneeuw'
assert random_walk_crosslang_task(dummy_graph, 'eng', 'winter', fa_first=True) == 'taart'

In [117]:
#### Part d.2:  Write evaluate_random_walks_crosslang_task according to the docstring below.

def evaluate_random_walks_crosslang_task(
        sample, start_lang, graph, gold_standard_graph,
        fa_first, n_trials=1000):
    """
    sample: list of str -- the list of cues to use as start points for
        random walks
    start_lang: str -- language to start random walks in
    graph: BilingualGraph -- graph to use for random walks
    gold_standard_graph: BilingualGraph -- graph to use to evaluate
        the results of the random walks
    n_trials: int -- the number of random walks to do for each
        cue word in sample
    fa_first: bool -- when set to True, do association-first walk
        (otherwise, do a translation-first walk)
        
    Return a dict mapping str to float. The keys in the result should
    be the cues in sample, and the values are the error for this cue.
    The error for each cue in sample is calculated as follows:
        1. Call the function random_walk_crosslang_task from the cell above n_trials
            times, with start equal to cue. You should also pass graph, start_lang,
            and fa_first. Store the resulting responses.
        2. Generate a list of tuple of (response, score), where responses
            are unique responses from step 1. Compute the score for each
            response by dividing that response's count by n_trials (the total
            number of responses found).
        3. Generate a gold standard list of tuple of (response, score), where
            responses are the nodes that the cue has outgoing association edges
            to in gold_standard_graph, and score is the weights of these
            association edges.
        4. Call error (defined in part c) on the lists of tuples generated in
            steps 2 and 3.
    """
    result = {}
    for cue in sample:
        if start_lang == 'eng':
            opposite_lang = 'dut'
        else:
            opposite_lang = 'eng'
        if graph.is_cue(start_lang, cue) and graph.is_translatable(opposite_lang, cue):
            response_store = []
            for n in range(n_trials):
                response_store.append(random_walk_crosslang_task(graph, start_lang, cue, fa_first))
            counts = Counter(response_store)
            tuple_list = []
            for word in counts:
                score = (counts[word]/n_trials)
                tuple_list.append((word, score))
            gold_tuple = []
            for response in gold_standard_graph[cue]:
                gold_tuple.append((response, gold_standard_graph[cue][response]['weight']))
            error_score = error(tuple_list, gold_tuple)
            result[cue] = error_score
    return result
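
Steps 1 and 2 of the docstring (turning raw walk responses into scored tuples) can be illustrated in isolation. The responses below are made up, and the snippet is a standalone sketch rather than part of the assignment code.

```python
from collections import Counter

# Hypothetical responses from 8 random walks for a single cue.
responses = ['sneeuw', 'sneeuw', 'koud', 'sneeuw', 'ijs', 'koud', 'sneeuw', 'sneeuw']
n_trials = len(responses)

counts = Counter(responses)
# most_common() orders responses from most to least frequent;
# each score is the fraction of walks that ended at that response.
scores = [(word, count / n_trials) for word, count in counts.most_common()]
print(scores)
```

Because every walk contributes exactly one response, the scores sum to 1 before `error` trims the longer list.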
    

In [121]:
## TEST CASES FOR CODE IN CELL d.2

ed_dummy_gold_standard_graph = nx.DiGraph()
ed_dummy_gold_standard_graph.add_edge('winter', 'sneeuw', weight=1.0)

assert evaluate_random_walks_crosslang_task(
    ['winter'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=False) == {'winter': 0.0}
assert evaluate_random_walks_crosslang_task(
    ['winter'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=True) == {'winter': 1.0}

# not translatable
assert evaluate_random_walks_crosslang_task(
    ['pumpkin'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=False) == {}
assert evaluate_random_walks_crosslang_task(
    ['pumpkin'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=True) == {}

# not a cue
assert evaluate_random_walks_crosslang_task(
    ['snow'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=False) == {}
assert evaluate_random_walks_crosslang_task(
    ['snow'], 'eng', dummy_graph, ed_dummy_gold_standard_graph, fa_first=True) == {}

In [119]:
#### Part d.3:  Call the code below to run your model on the crosslanguage association
####            task, and print the results.

# test both hypotheses for english-dutch FA
en_sample = [x for x in ed_bilingual_gold.nodes() if len(ed_bilingual_gold[x]) > 0] 
en_nl_fa_first = evaluate_random_walks_crosslang_task(
    en_sample, 'eng', bilingual_graph, ed_bilingual_gold, fa_first=True)
en_nl_trans_first = evaluate_random_walks_crosslang_task(
    en_sample, 'eng', bilingual_graph, ed_bilingual_gold, fa_first=False)

# test both hypotheses for dutch-english FA
nl_sample = [x for x in de_bilingual_gold.nodes() if len(de_bilingual_gold[x]) > 0]
nl_en_fa_first = evaluate_random_walks_crosslang_task(
    nl_sample, 'dut', bilingual_graph, de_bilingual_gold, fa_first=True)
nl_en_trans_first = evaluate_random_walks_crosslang_task(
    nl_sample, 'dut', bilingual_graph, de_bilingual_gold, fa_first=False)

# Print results
print("FA first Eng-Dut: {:.3f}".format(np.mean(list(en_nl_fa_first.values()))))
print("Trans first Eng-Dut: {:.3f}".format(np.mean(list(en_nl_trans_first.values()))))
print("FA first Dut-Eng: {:.3f}".format(np.mean(list(nl_en_fa_first.values()))))
print("Trans first Dut-Eng: {:.3f}".format(np.mean(list(nl_en_trans_first.values()))))

FA first Eng-Dut: 0.565
Trans first Eng-Dut: 0.559
FA first Dut-Eng: 0.521
Trans first Dut-Eng: 0.549


#### Part d.4: Answer the following:  Compare the performance of free associating first vs. translating first, on the English-Dutch and Dutch-English crosslanguage association tasks.  What do these results say about the bilingual lexicon?  (Max 150 words.)

For the English-Dutch crosslanguage association task, association-first has a slightly higher mean error than translation-first (0.565 vs. 0.559).

For the Dutch-English crosslanguage association task, translation-first has a higher mean error than association-first (0.549 vs. 0.521).

These results may indicate that the bilingual lexicon is not 'symmetrical', in the sense that the same strategy does not give similar results in the forward and backward directions of the crosslanguage association task. Rather, they suggest that the 'connectivity' (the sequence of links traversed) is more indicative of performance than the direction of the task.

For example, 
FA first Eng-Dut AND Trans first Dut-Eng
or
(English -> FA -> English -> T -> Dutch) AND (English <- FA <- English <- T <- Dutch)

traverse the same sequence of connections in opposite directions, and they perform more similarly to each other than 
FA first Eng-Dut AND FA first Dut-Eng do.

And inversely,

Trans first Eng-Dut AND FA first Dut-Eng
or
(English -> T -> Dutch -> FA -> Dutch) AND (English <- T <- Dutch <- FA <- Dutch)

also traverse the same sequence of connections, and perform similarly.

Therefore, these findings suggest that the bilingual lexicon is likely connected by cognate links that are shared by BOTH languages rather than belonging to just one language, and that the only difference between the two tasks is the direction in which those connections are traversed.
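
The comparisons above can be checked against the numbers printed by cell d.3. The sketch below copies those four mean errors by hand and computes the gaps between the pairs of conditions discussed; the dictionary keys and the `gap` helper are illustrative names only.

```python
# Mean errors copied from the printed output of cell d.3.
mean_error = {
    ('eng-dut', 'fa_first'): 0.565,
    ('eng-dut', 'trans_first'): 0.559,
    ('dut-eng', 'fa_first'): 0.521,
    ('dut-eng', 'trans_first'): 0.549,
}


def gap(a, b):
    """Absolute difference in mean error between two conditions, to 3 decimals."""
    return round(abs(mean_error[a] - mean_error[b]), 3)


# Which ordering has the lower error within each task direction?
better = {}
for task in ('eng-dut', 'dut-eng'):
    fa, tf = (task, 'fa_first'), (task, 'trans_first')
    better[task] = 'fa_first' if mean_error[fa] < mean_error[tf] else 'trans_first'
    print("{}: {} is better by {:.3f}".format(task, better[task], gap(fa, tf)))

# Gaps between the cross-direction pairs compared in the answer.
mirrored_1 = gap(('eng-dut', 'fa_first'), ('dut-eng', 'trans_first'))
mirrored_2 = gap(('eng-dut', 'trans_first'), ('dut-eng', 'fa_first'))
same_strategy = gap(('eng-dut', 'fa_first'), ('dut-eng', 'fa_first'))
print("FA first Eng-Dut vs. Trans first Dut-Eng: {:.3f}".format(mirrored_1))
print("Trans first Eng-Dut vs. FA first Dut-Eng: {:.3f}".format(mirrored_2))
print("FA first Eng-Dut vs. FA first Dut-Eng: {:.3f}".format(same_strategy))
```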
