# Assignment 0

This notebook will help verify that you're all set up with the Python packages we'll be using this semester.

**Your task:** just run the cells below, and verify that the output is as expected. If anything looks wrong, weird, or crashes, update your Python installation or contact the course staff. We don't want library issues to get in the way of the real coursework!

```python
# Version checks
import importlib
from itertools import zip_longest

def version_greater_equal(v1, v2):
    """Return True if dotted version string v1 is >= v2.

    Missing trailing components count as 0, so '3.7' compares like '3.7.0'.
    """
    for x, y in zip_longest(v1.split('.'), v2.split('.'), fillvalue='0'):
        if int(x) != int(y):
            return int(x) > int(y)
    return True

# Sanity checks for the comparison helper
assert version_greater_equal('1.2.3', '0.1.1')
assert version_greater_equal('1.2.3', '0.5.1')
assert version_greater_equal('1.2.3', '1.2.3')
assert version_greater_equal('0.22.0', '0.20.3')
assert not version_greater_equal('1.1.1', '1.2.3')
assert not version_greater_equal('0.5.1', '1.2.3')
assert not version_greater_equal('0.20.3', '0.22.0')
assert not version_greater_equal('3.7', '3.7.1')  # shorter version is padded with zeros

def version_check(libname, min_version):
    """Import libname and report whether its version is at least min_version."""
    m = importlib.import_module(libname)
    print(f"{libname} version {m.__version__} is ")
    print("OK"
          if version_greater_equal(m.__version__, min_version)
          else "out-of-date. Please upgrade!")

version_check("numpy", "1.21.5")
version_check("matplotlib", "3.5.2")
version_check("pandas", "1.4.4")
version_check("nltk", "3.7")
version_check("keras", "2.11.0")
version_check("tensorflow", "2.11.0")
```

```
numpy version 1.26.4 is 
OK
matplotlib version 3.7.1 is 
OK
pandas version 2.1.4 is 
OK
nltk version 3.8.1 is 
OK
keras version 3.4.1 is 
OK
tensorflow version 2.17.0 is 
OK
```
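
If importing a large library like TensorFlow just to read its version feels slow, the installed version can also be read from package metadata with the standard library's `importlib.metadata` (Python 3.8+). The sketch below is an optional alternative, not part of the assignment, and `installed_version` is a hypothetical helper name. Note that it looks up *distribution* names, which can differ from import names (TensorFlow, for instance, is also published as `tensorflow-cpu`), so a `None` result does not necessarily mean the library is missing.

```python
# Sketch: read installed versions from package metadata without importing
# the packages themselves. installed_version is a hypothetical helper.
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ["numpy", "matplotlib", "pandas", "nltk", "keras", "tensorflow"]:
    v = installed_version(name)
    print(f"{name}: {v if v is not None else 'not installed (under this name)'}")
```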


## TensorFlow

We'll be using [TensorFlow](https://www.tensorflow.org) to build deep learning models this semester. TensorFlow is a whole programming system in itself, built around the idea of a computation graph and deferred execution; in TensorFlow 2, operations run eagerly by default, but Python functions can be traced into graphs with `tf.function`. We'll talk much more about it in Assignment 1, but for now you should just test that it loads on your system.

Run the cell below; you should see:
```
Hello, TensorFlow!
42
```

```python
import tensorflow as tf

hello = tf.constant("Hello, TensorFlow!")
tf.print(hello)

a = tf.constant(10)
b = tf.constant(32)
tf.print(a + b)
```

```
Hello, TensorFlow!
42
```
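
The idea of building a computation graph first and executing it later can be seen in miniature in plain Python. The classes `Node`, `Const`, and `Add` and the function `run()` below are hypothetical, invented for this illustration and unrelated to TensorFlow's actual API: `a + b` only records an addition node, and no arithmetic happens until `run()` walks the graph. TensorFlow's real graphs (for example, those traced by `tf.function`) are far more elaborate.

```python
# Toy illustration of a computation graph with deferred execution.
# NOT TensorFlow's API: Node, Const, Add, and run() are hypothetical names.


class Node:
    """Base class: combining nodes records an operation but computes nothing."""

    def __add__(self, other):
        return Add(self, other)


class Const(Node):
    def __init__(self, value):
        self.value = value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node):
    """Evaluate a graph node recursively; the work happens only here."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Add):
        return run(node.left) + run(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


a = Const(10)
b = Const(32)
total = a + b                # builds an Add node; nothing has been added yet
print(type(total).__name__)  # Add
print(run(total))            # 42
```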


## NLTK

[NLTK](http://www.nltk.org/) is a large compilation of Python NLP packages. It includes implementations of a number of classic NLP models, as well as utilities for working with linguistic data structures, preprocessing text, and managing corpora.

NLTK is included with Anaconda, but the corpora need to be downloaded separately. Be warned that this will take up around 3.2 GB of disk space if you download everything! If this is too much, you can download individual corpora as you need them through the same interface.
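
To see how much space your downloaded data actually takes, a small standard-library sketch like the following can total up file sizes. It assumes the data lives in `~/nltk_data` (the outputs in this notebook show `/root/nltk_data`); NLTK can also look in other locations, which are listed in `nltk.data.path`. The helper name `dir_size_bytes` is hypothetical.

```python
# Sketch: total disk usage of downloaded NLTK data.
# Assumption: data is stored in ~/nltk_data (see nltk.data.path for others).
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


size = dir_size_bytes(Path.home() / "nltk_data")
print(f"nltk_data uses {size / 1e9:.2f} GB")
```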

Type the following into a Python shell (for example, one started by running `python` on the command line). It opens the downloader in a pop-up window, or as a text menu if no GUI is available:

```python
import nltk
nltk.download()
```

Alternatively, you can download individual packages by name. The cell below downloads the `punkt` tokenizer models and the famous [Reuters-21578 benchmark corpus](https://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html):

```python
import nltk
assert nltk.download('punkt')
assert nltk.download('reuters')  # should return True if successful, or already installed
```

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package reuters to /root/nltk_data...
```

Now we can look at the first two sentences of the corpus. Expect to see:
```
ASIAN EXPORTERS FEAR DAMAGE FROM U . S .- JAPAN RIFT Mounting trade friction between the U . S . And Japan has raised fears among many of Asia ' s exporting nations that the row could inflict far - reaching economic damage , businessmen and officials said .

They told Reuter correspondents in Asian capitals a U . S . Move against Japan might boost protectionist sentiment in the U . S . And lead to curbs on American imports of their products .
```

```python
from nltk.corpus import reuters
# Look at the first two sentences
for s in reuters.sents()[:2]:
    print(" ".join(s))
    print()
```

```
ASIAN EXPORTERS FEAR DAMAGE FROM U . S .- JAPAN RIFT Mounting trade friction between the U . S . And Japan has raised fears among many of Asia ' s exporting nations that the row could inflict far - reaching economic damage , businessmen and officials said .

They told Reuter correspondents in Asian capitals a U . S . Move against Japan might boost protectionist sentiment in the U . S . And lead to curbs on American imports of their products .
```

NLTK also includes a sample of the [Penn Treebank](https://www.cis.upenn.edu/~treebank/), which we'll use later in the course for parsing and part-of-speech tagging. Here's one sentence from the sample and its parse tree. Expect to see:
```
The top money funds are currently yielding well over 9 % .

(S
  (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
  (VP
    (VBP are)
    (ADVP-TMP (RB currently))
    (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
  (. .))
```

```python
assert nltk.download("treebank")  # should return True if successful, or already installed
print()
from nltk.corpus import treebank
# Look at the parse of a sentence.
# Don't worry about what this means yet!
idx = 45
print(" ".join(treebank.sents()[idx]))
print()
print(treebank.parsed_sents()[idx])
```

```
[nltk_data] Downloading package treebank to /root/nltk_data...
[nltk_data]   Unzipping corpora/treebank.zip.

The top money funds are currently yielding well over 9 % .

(S
  (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
  (VP
    (VBP are)
    (ADVP-TMP (RB currently))
    (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
  (. .))
```
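
The bracketed notation above is a nested list in disguise: each `(LABEL ...)` is a node whose children are either further nodes or words. NLTK already parses this format (`treebank.parsed_sents()` returns `nltk.Tree` objects, and `nltk.Tree.fromstring` reads a bracketed string), so you won't need your own parser. The hand-rolled sketch below, with hypothetical helper names `parse_tree` and `leaves`, just makes the structure explicit by reading a tree into `(label, children)` tuples.

```python
# Hand-rolled sketch of reading Penn Treebank bracket notation into nested
# (label, children) tuples. Illustration only; NLTK provides nltk.Tree.fromstring.
import re


def parse_tree(s):
    """Parse '(LABEL child child ...)' into (label, [children]); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' but found {tokens[pos]!r}")
        pos += 1
        if tokens[pos] in ("(", ")"):
            label = ""  # unlabeled node, e.g. the outer wrapper in "( (S ...) )"
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


example = """(S
  (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
  (VP
    (VBP are)
    (ADVP-TMP (RB currently))
    (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
  (. .))"""
tree = parse_tree(example)
print(tree[0])                 # S
print(" ".join(leaves(tree)))  # The top money funds are currently yielding well over 9 % .
```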


We can also look at the [Europarl corpus](http://www.statmt.org/europarl/), which consists of *parallel* text: each sentence paired with its translations into other languages. You should see:
```
ENGLISH: Resumption of the session I declare resumed the session of the European Parliament adjourned on Friday 17 December 1999 , and I would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period .
```
along with its translations into French and Spanish.

```python
assert nltk.download("europarl_raw")  # should return True if successful, or already installed
print()
from nltk.corpus import europarl_raw

idx = 0

print("ENGLISH: " + " ".join(europarl_raw.english.sents()[idx]))
print()
print("FRENCH: " + " ".join(europarl_raw.french.sents()[idx]))
print()
print("SPANISH: " + " ".join(europarl_raw.spanish.sents()[idx]))
```

```
[nltk_data] Downloading package europarl_raw to /root/nltk_data...
[nltk_data]   Unzipping corpora/europarl_raw.zip.

ENGLISH: Resumption of the session I declare resumed the session of the European Parliament adjourned on Friday 17 December 1999 , and I would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period .

FRENCH: Reprise de la session Je déclare reprise la session du Parlement européen qui avait été interrompue le vendredi 17 décembre dernier et je vous renouvelle tous mes vux en espérant que vous avez passé de bonnes vacances .

SPANISH: Reanudación del período de sesiones Declaro reanudado el período de sesiones del Parlamento Europeo , interrumpido el viernes 17 de diciembre pasado , y reitero a Sus Señorías mi deseo de que hayan tenido unas buenas vacaciones .
```
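
Parallel text is naturally represented as one list of tokenized sentences per language, where the same index refers to the same sentence. The hypothetical helper `format_parallel` below (not part of NLTK) generalizes the three `print` calls in the cell above to any number of languages. The sample data is the first few tokens of each sentence shown above, shortened for brevity. Pairing by index assumes the corpora are sentence-aligned, which is worth spot-checking on real data.

```python
# Sketch: format the idx-th sentence of several parallel corpora side by side.
# `corpora` maps a language name to a list of tokenized sentences (lists of
# strings). format_parallel is a hypothetical helper, not an NLTK function.
def format_parallel(corpora, idx):
    """Return one 'LANGUAGE: sentence' line per language for sentence idx."""
    return [f"{lang.upper()}: {' '.join(sents[idx])}" for lang, sents in corpora.items()]


# Shortened sample data taken from the output above.
corpora = {
    "english": [["Resumption", "of", "the", "session"]],
    "french": [["Reprise", "de", "la", "session"]],
    "spanish": [["Reanudación", "del", "período", "de", "sesiones"]],
}
for line in format_parallel(corpora, 0):
    print(line)
```

With the corpus downloaded, the same helper could be fed `{lang: getattr(europarl_raw, lang).sents() for lang in ("english", "french", "spanish")}`.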
