## Building the Cooc Table

In [2]:
import pprint as pp
from collections import defaultdict

In [3]:
def build_cooc_table(filepath_fr, filepath_en):
    # cooc_table maps each French word to a dict of English-word counts;
    # defaultdict(dict) creates an empty dict for any key that does not exist yet
    cooc_table = defaultdict(dict)

    # 'with' closes both files automatically; the corpora are assumed to be UTF-8 and
    # sentence-aligned (line i of one file is the translation of line i of the other)
    with open(filepath_fr, 'r', encoding='utf-8') as fr, open(filepath_en, 'r', encoding='utf-8') as en:
        for line_fr, line_en in zip(fr, en):
            line_fr, line_en = line_fr.split(), line_en.split()
            # use set to remove duplicates, so each word counts at most once per sentence pair
            for word_fr in set(line_fr):
                if word_fr in cooc_table:
                    # reuse (not copy) the count dict if this French word has already been seen
                    counts_en = cooc_table[word_fr]
                else:
                    # otherwise create a defaultdict(int): int() returns 0, so an unseen
                    # English word starts at a count of 0 and can be incremented directly
                    counts_en = defaultdict(int)

                # count every distinct English word of the aligned sentence
                for word_en in set(line_en):
                    counts_en[word_en] += 1

                cooc_table[word_fr] = counts_en

    return cooc_table
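The same counting logic can be sketched without files, on a few in-memory sentence pairs. This version (the name `build_cooc_from_pairs` and the toy sentences are hypothetical) uses a nested `defaultdict`, so the explicit `if word_fr in cooc_table` check is no longer needed: a missing French word automatically gets a fresh `defaultdict(int)`.

```python
from collections import defaultdict

def build_cooc_from_pairs(sentence_pairs):
    """Count, for each French word, the aligned sentences that contain each English word.

    sentence_pairs: iterable of (french_sentence, english_sentence) strings (hypothetical input shape).
    """
    # A missing French key gets its own defaultdict(int), whose missing keys start at 0
    cooc_table = defaultdict(lambda: defaultdict(int))
    for line_fr, line_en in sentence_pairs:
        words_en = set(line_en.split())  # distinct English words, computed once per pair
        for word_fr in set(line_fr.split()):
            for word_en in words_en:
                cooc_table[word_fr][word_en] += 1
    return cooc_table

# Hypothetical toy corpus of two aligned sentence pairs
pairs = [
    ("le chat dort", "the cat sleeps"),
    ("le chien dort", "the dog sleeps"),
]
table = build_cooc_from_pairs(pairs)
print(table["le"]["the"])              # 2: "le" and "the" co-occur in both pairs
print(table["chat"].get("dog", 0))     # 0: .get avoids inserting a new key
print(sorted(table["chat"].items()))   # [('cat', 1), ('sleeps', 1), ('the', 1)]
```

Sorting the items before printing keeps the output stable, since the iteration order of a set of strings can change between runs.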

## Sorting the cooc table and printing it to a file

In [12]:
def sorted_cooc(cooc_table):
    cooc_list = []
    for word_fr in cooc_table:
        # .items() returns a view of (key, value) pairs, here (English word, count)
        for word_en, freq in cooc_table[word_fr].items():
            cooc_list.append((word_fr, word_en, freq))  # append a (French, English, count) tuple

    # the .sort method has a key parameter that takes a function returning the value
    # each element should be compared on; since we order the tuples by frequency
    # (position 2 in each tuple), the function returns the element at index 2

    # lambda functions are a quick way of writing small functions:
    #   lambda cooc_tuple: cooc_tuple[2]
    # is equivalent to
    #   def return_freq(cooc_tuple):
    #       return cooc_tuple[2]
    # reverse=True puts the most frequent pairs first
    cooc_list.sort(key=lambda cooc_tuple: cooc_tuple[2], reverse=True)

    return cooc_list
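When only the strongest pairs are needed, sorting the whole list is not required. Here is a sketch using the standard library's `heapq.nlargest`, which accepts the same kind of `key` function (the helper name `top_cooc` and the toy table are hypothetical):

```python
import heapq

def top_cooc(cooc_table, k):
    """Return the k (word_fr, word_en, freq) tuples with the highest frequency."""
    # Generator of all (French, English, count) triples, never materialized as a full list
    triples = (
        (word_fr, word_en, freq)
        for word_fr, counts_en in cooc_table.items()
        for word_en, freq in counts_en.items()
    )
    # nlargest keeps only k candidates in a heap, compared on the count at index 2
    return heapq.nlargest(k, triples, key=lambda cooc_tuple: cooc_tuple[2])

# Hypothetical toy table with the same shape as the one built above
toy_table = {
    "le": {"the": 2, "cat": 1, "dog": 1},
    "chat": {"the": 1, "cat": 1},
}
print(top_cooc(toy_table, 1))  # [('le', 'the', 2)]
```

`heapq.nlargest(k, iterable, key=...)` returns the same result as `sorted(iterable, key=..., reverse=True)[:k]`, but avoids sorting every pair when k is small.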

In [14]:
cooc_table = build_cooc_table('./french.corpus', './english.corpus')

In [15]:
with open('./naive_lexicon.txt', 'w', encoding='utf-8') as f:
    # pformat returns a "prettier" string representation of the sorted list, which is then written to the file
    f.write(pp.pformat(sorted_cooc(cooc_table)))
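`pprint.pformat` produces a Python literal, which is easy to read but awkward to load into other tools. As an alternative sketch (the tab-separated layout and the helper name `write_lexicon_tsv` are assumptions, not part of the original notebook), each pair can be written on its own line:

```python
import io

def write_lexicon_tsv(cooc_list, out):
    """Write (word_fr, word_en, freq) tuples as tab-separated lines to a text file object."""
    for word_fr, word_en, freq in cooc_list:
        out.write(f"{word_fr}\t{word_en}\t{freq}\n")

# Hypothetical sorted list, written to an in-memory buffer instead of a real file
buffer = io.StringIO()
write_lexicon_tsv([("le", "the", 2), ("chat", "cat", 1)], buffer)
print(buffer.getvalue())
```

Because the function only calls `out.write`, the same code works with a real file opened via `open(..., 'w', encoding='utf-8')`.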

### Quick overview of lambda functions
Lambda functions can be very practical: they are a shortcut for declaring small, single-expression anonymous functions.
They behave just like regular functions declared with the "def" keyword.
Lambdas are restricted to a single expression, so there isn't even a return statement: the value of that expression is returned implicitly.
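A quick check of these claims: a lambda and an equivalent "def" function compute the same thing, but the lambda is anonymous (its `__name__` is the placeholder `'<lambda>'`), and its body is a single expression whose value is returned implicitly. The function name `square_def` is only illustrative.

```python
def square_def(x):
    return x * x

# The same computation as an anonymous lambda, called inline
print((lambda x: x * x)(4), square_def(4))  # 16 16

# Lambdas have no name of their own: __name__ is the placeholder '<lambda>'
print((lambda x: x * x).__name__)  # <lambda>
print(square_def.__name__)         # square_def

# A conditional expression is still a single expression, so it fits in a lambda body
labels = list(map(lambda x: "negative" if x < 0 else "non-negative", [-3, 2]))
print(labels)  # ['negative', 'non-negative']
```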

In practice, lambdas are most frequently used to write short, concise "key functions" for sorting iterables by an alternate key, as in the sorted_cooc function above.
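A key function can also return a tuple, in which case Python compares the tuples element by element. This sketch orders (word_fr, word_en, freq) tuples by decreasing frequency, then alphabetically, so ties come out in a predictable order. The tie-breaking rule and the toy list are illustrations, not something the notebook itself does.

```python
cooc_list = [("chat", "the", 1), ("le", "the", 2), ("chat", "cat", 1)]

# Negating the count gives descending frequency while the words stay in ascending order
ordered = sorted(cooc_list, key=lambda t: (-t[2], t[0], t[1]))
print(ordered)  # [('le', 'the', 2), ('chat', 'cat', 1), ('chat', 'the', 1)]
```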

In [2]:
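# Some examples:
# (assigning a lambda to a name works, although PEP 8 recommends a "def" statement for named functions)
add = lambda x, y: x + y
print(add(5, 3))

# Can be used directly inline as an expression:
(lambda x, y: x + y)(5, 3)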

8


8

In [6]:
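# For sorting:
tuples = [(1, 'd'), (2, 'b'), (3, 'a')]
print(sorted(tuples, key=lambda x: x[1]))  # sort by the letter at index 1

# sort by square; sorted() is stable, so -1 stays before 1 (their order in the input)
print(sorted(range(-5, 6), key=lambda x: x * x))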

[(3, 'a'), (2, 'b'), (1, 'd')]
[0, -1, 1, -2, 2, -3, 3, -4, 4, -5, 5]
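For the common case of sorting by a position or key, the standard library's `operator.itemgetter` builds the same kind of key function as `lambda x: x[1]`. A sketch comparing the two on the tuples above:

```python
from operator import itemgetter

tuples = [(1, 'd'), (2, 'b'), (3, 'a')]

by_lambda = sorted(tuples, key=lambda x: x[1])
by_itemgetter = sorted(tuples, key=itemgetter(1))  # itemgetter(1)(t) returns t[1]
print(by_lambda == by_itemgetter)  # True

# itemgetter with several indices returns a tuple, here (t[1], t[0])
print(itemgetter(1, 0)((1, 'd')))  # ('d', 1)
```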


In [9]:
# Caveat:
# Although it can look "cool" to use lambdas whenever you can, they are not always the clearest way to write your code.
# Take a second to consider whether a lambda function is really the best way to go:
# if you find yourself doing something even remotely complex with a lambda, a classic "def" function is usually a better idea.

# When filtering a list, for example:
print(list(filter(lambda x: x % 2 == 0, range(16))))  # not necessarily as readable

# vs.
print([x for x in range(16) if x % 2 == 0])  # usually a little clearer

# vs.
def filter_odd_numbers(nums_list):
    # filters out the odd numbers, i.e. keeps only the even ones
    only_evens = []
    for x in nums_list:
        if x % 2 == 0:
            only_evens.append(x)
    return only_evens

print(filter_odd_numbers(range(16)))

[0, 2, 4, 6, 8, 10, 12, 14]
[0, 2, 4, 6, 8, 10, 12, 14]
[0, 2, 4, 6, 8, 10, 12, 14]
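The same trade-off applies to `map`: a lambda passed to `map` can usually be replaced by a list comprehension, or by a generator expression when the values only need to be consumed once. A short comparison:

```python
# map + lambda vs. the equivalent list comprehension
squares_map = list(map(lambda x: x * x, range(6)))
squares_comp = [x * x for x in range(6)]
print(squares_map)                  # [0, 1, 4, 9, 16, 25]
print(squares_map == squares_comp)  # True

# A generator expression computes values on demand, e.g. when summing
print(sum(x * x for x in range(6)))  # 55
```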


In [10]:
# The "Zen of Python" Easter egg, written by Tim Peters (also published as PEP 20)
# A short set of guiding principles for Python's design that you can revisit as often as you like to become a better Pythonista
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
