<h2> Congratulations!!! </h2>

If you have made it to this point and opened the IPython notebook, you have successfully installed the tools required for DL4MT. 
To test that you can import all the tools that you've just downloaded, click on the next box and then press `shift` + `enter`.

In [None]:
import h5py, cython, pydot, numpy, scipy, theano, fuel, sklearn, nltk, gensim

You should not see any error when you press `shift` + `enter` on the cell above. 
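
If the import cell does raise an error, a loop like the sketch below can help to narrow down which package is the culprit. The helper name `find_missing` is hypothetical and is not part of any of the tools above; the module names are copied from the import cell. The `print` calls take a single argument so that the snippet runs on both Python 2 and Python 3.

```python
import importlib

# Module names taken from the import cell above.
REQUIRED = ['h5py', 'cython', 'pydot', 'numpy', 'scipy',
            'theano', 'fuel', 'sklearn', 'nltk', 'gensim']


def find_missing(module_names):
    """Return the names in *module_names* that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken install
            # may raise something else, so catch broadly here.
            missing.append(name)
    return missing


missing = find_missing(REQUIRED)
if missing:
    print('Could not import: ' + ', '.join(missing))
else:
    print('All tools imported successfully.')
```

Any name that is reported here needs to be (re)installed before you continue.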

In `ipython-notebook`, every "clickable" box is referred to as a cell. You can switch a cell between a text cell, where you can write text like this, and a code cell, where you can execute Python code on the fly (as above).

If you're already a fluent Parseltongue (Python) coder, please go ahead and 

 - work through the [summerschool exercises from MILA](https://github.com/mila-udem/summerschool2015) or
 - take a look at the nice introduction to [Neural Machine Translation](http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/) from Prof. Kyunghyun Cho or
 - read more about the [Unreasonable Effectiveness of Recurrent Neural Nets](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) or
 - enjoy the nice [Neural Net NLP primer](http://u.cs.biu.ac.il/~yogo/nnlp.pdf) from Yoav Goldberg or
 - hone your snake charming with [Fluent Python](http://shop.oreilly.com/product/0636920032519.do) 
 
 
Pythonic Fun
=====
 
The following cells show a few common Pythonic idioms that you might see in the coming days. 
To proceed, click on each IPython notebook code cell and press `shift` + `enter`.


File I/O
====

To open a text file in Python, it's best to use the [`io` module](https://docs.python.org/2/library/io.html) together with the `with` statement. `io.open` also lets you specify the encoding of your file. When you use the `with` statement, the file is closed automatically as soon as execution leaves the `with` block, so there is no need to close it explicitly.


For example:

In [None]:
import io # This is the io module.
# This is the with statement that controls the file I/O
with io.open('test.txt', 'r', encoding='utf8') as fin: 
    for line in fin: # Iterates through each line in the file.
        print line # prints the line to console.
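
To see the automatic closing in action, here is a minimal sketch. It writes its own scratch file (the name `example.txt` and the temporary directory are assumptions made for illustration) so that it does not depend on `test.txt`, and it uses single-argument `print` calls so that it runs on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical scratch file so that the sketch does not depend on 'test.txt'.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with io.open(path, 'w', encoding='utf8') as fout:
    fout.write(u'first line\nsecond line\n')

with io.open(path, 'r', encoding='utf8') as fin:
    # Each line keeps its trailing newline, so strip it off here.
    lines = [line.rstrip(u'\n') for line in fin]
    print(fin.closed)  # False: still inside the with block.

print(fin.closed)  # True: the with statement has closed the file.
print(lines)
```

The name `fin` is still defined after the block, but the file behind it is closed, so any further read on it would raise an error.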

Pickle (not Sauerkraut)
----

Pickling saves a Python object in a binary format so that you can read it back later without the hassle of packing and unpacking the object to and from a specific file format. The [`pickle` module](https://docs.python.org/2/library/pickle.html) is a nifty way to save your trained models and load them in the future. 

In [None]:
import io
import cPickle as pickle # Note that in python3, you simply do `import pickle`
from collections import Counter # This is a counter object, see https://docs.python.org/2/library/collections.html

word_counts = Counter() # Let's initialize a word counter object to keep count of words in a file.
with io.open('test.txt', 'r', encoding='utf8') as fin:
    text = fin.read() # Reads the whole file as a single string object.
    text = text.split() # Let's split the text into a list of words.
    word_counts.update(text) # Adds the words to the counter initialized above.
    
with io.open('wordcounts.pk', 'wb') as fout: # Note that 'wb' means a file for writing objects as binary objects.
    pickle.dump(word_counts, fout) # Dumps the *word_counts* object into a pickle file named 'wordcounts.pk'

You should now see a new file `wordcounts.pk` in the directory where this IPython notebook resides. You can restore `word_counts` by loading the pickled object rather than recounting from the text file:

In [None]:
import io
import cPickle as pickle

with io.open('wordcounts.pk', 'rb') as fin: # Opens a binary file with the 'rb' parameter/flag.
    word_counts = pickle.load(fin)
    
for word, count in word_counts.items(): # Iterates through the newly loaded *word_counts* object.
    print word, count # prints the word and its count.
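
The same round trip also works without touching the disk, which is handy for a quick check that an object survives pickling. The sketch below uses the plain `pickle` module (available on both Python 2 and Python 3) and a toy sentence that is an assumption standing in for the contents of `test.txt`.

```python
import pickle
from collections import Counter

# Hypothetical toy sentence standing in for the contents of 'test.txt'.
word_counts = Counter('the cat sat on the mat'.split())

# dumps() serializes to a byte string in memory instead of to a file.
# Protocol 2 is a binary protocol that both Python 2 and Python 3 support.
payload = pickle.dumps(word_counts, 2)
restored = pickle.loads(payload)

print(restored == word_counts)  # True: the restored Counter holds the same counts.
print(restored['the'])          # 2: 'the' occurs twice in the toy sentence.
```

`pickle.dump` and `pickle.load` do the same thing as `dumps` and `loads`, except that they write to and read from an open binary file.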

Some NLP (atlas)
====

Let's get down to some NLP work given the knowledge of pickles and file I/O in python. First let's start with some corpus access, Part-of-Speech (POS) and tokenization tools that `NLTK` provides:

In [None]:
import io

from nltk.corpus import brown # This is the tagged Brown corpus; a version of it is also distributed as part of the Penn Treebank.
from nltk import word_tokenize, sent_tokenize, pos_tag # The default sentence tokenizer, word tokenizer and POS tagger from NLTK


In [None]:
#############################################################
# TODO: Please fill in the code to read the `test.txt` file.
# (remember to press `shift` + `enter` after the code)
#############################################################

with io.open('test.txt', 'r', encoding='utf8') as fin:
    text = fin.read()
    print text

    
###########################
# Answer: DON'T PEEK!!!!
###########################
#with io.open('test.txt', 'r', encoding='utf8') as fin:
#    text = fin.read()


Now that you have read the file, let's try to tokenize the text:

In [None]:
for sentence in sent_tokenize(text):
    for word in word_tokenize(sentence):
        print word
    print '# END of sentence'
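
To get a feeling for what the two tokenization steps produce, here is a deliberately naive sketch based on regular expressions. It is not how NLTK tokenizes; NLTK's tokenizers are considerably more careful, for example with abbreviations and contractions. The function names and the sample text are assumptions made for illustration.

```python
import re


def simple_sent_tokenize(text):
    """Naively split *text* after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def simple_word_tokenize(sentence):
    """Split *sentence* into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "Hello there. How are you today?"  # Hypothetical sample text.
for sentence in simple_sent_tokenize(sample):
    print(simple_word_tokenize(sentence))
```

A sentence such as "Dr. Smith arrived." would be split in the wrong place by this sketch, which is one reason to prefer the NLTK functions for real work.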

In [None]:
for sentence in sent_tokenize(text):
    for word, tag in pos_tag(word_tokenize(sentence)):
        print word, tag
    print '########'

In [None]:
# To get the tagged words into a pickle-able object, 
# you can keep the whole tagged text as a list of 
# lists of (word, tag) tuples using a list comprehension, 
# see https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions

tagged_text = [pos_tag(word_tokenize(sentence)) for sentence in sent_tokenize(text)]
print tagged_text
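
If list comprehensions are new to you, the sketch below shows one next to the explicit loop it replaces. To keep it independent of NLTK, `str.split()` stands in for the tokenizer and the sentences are hypothetical toy data.

```python
# Hypothetical toy sentences; str.split() stands in for NLTK's tokenizer here.
sentences = ['the cat sat', 'dogs bark loudly']

# Explicit loop version.
tokenized_loop = []
for sentence in sentences:
    tokenized_loop.append(sentence.split())

# Equivalent list comprehension.
tokenized = [sentence.split() for sentence in sentences]

print(tokenized == tokenized_loop)  # True: both build the same list of lists.
print(tokenized)
```

The comprehension reads as "the split version of each sentence in sentences", which mirrors the `tagged_text` line above.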

In [None]:
##########################################################
# Now try to pickle the *tagged_text* into a pickle file,
# and then try to open the newly pickled file.
##########################################################




###########################
# Answer: DON'T PEEK!!!!
###########################

#import pickle
#with io.open('tagged_text.pk', 'wb') as fout:
#    tagged_text = [pos_tag(word_tokenize(sentence)) for sentence in sent_tokenize(text)]
#    pickle.dump(tagged_text, fout)
#    
#with io.open('tagged_text.pk', 'rb') as fin:
#    pickled_tagged_text = pickle.load(fin)
#    
#print pickled_tagged_text



Some More NLP
====

Alright, that's how NLTK tokenizes and POS-tags a corpus. Now, let's skip all the hard work of tagging a corpus and simply read a tagged one from NLTK:

In [None]:
from nltk.corpus import brown
tagged_text = brown.tagged_sents()
print tagged_text # Note: this has the same nested shape as the pickled tagged_text above (sentences of (word, tag) pairs).
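
Since the tagged sentences are nested lists of (word, tag) pairs, the `Counter` from the pickle section can be reused to count tags. The sketch below runs on a small hand-made sample rather than on the Brown corpus itself; the sample sentences and their tags are assumptions made for illustration and are not taken from the corpus.

```python
from collections import Counter

# Hypothetical hand-made sample with the same shape as tagged_text:
# a list of sentences, each a list of (word, tag) pairs.
sample_tagged = [
    [('The', 'DET'), ('cat', 'NOUN'), ('sleeps', 'VERB')],
    [('Dogs', 'NOUN'), ('bark', 'VERB')],
]

# Flatten the nested lists and count only the tags.
tag_counts = Counter(tag for sentence in sample_tagged for word, tag in sentence)

for tag, count in tag_counts.most_common():
    print('%s %d' % (tag, count))
```

Replacing `sample_tagged` with `brown.tagged_sents()` should give the tag distribution of the whole corpus, although that takes noticeably longer to run.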