# Preface

This first exercise is primarily meant to ensure that everything is working fine. Complete the assignment by filling in code where necessary using Jupyter in Colab.

# NumPy tutorial
You'll need to know some NumPy to work with vectors. If you are not familiar with NumPy, we recommend working through the following Python/NumPy tutorial:

   http://cs231n.github.io/python-numpy-tutorial/

We recommend following it at least up to the SciPy section. The Matplotlib section is also helpful for plotting learning curves and similar figures.

# Making sure everything works

If you are planning to run the labs on your laptop, you'll need to make sure you have Python, Jupyter, and TensorFlow installed. For help with that, see the 'Getting set up' document in [Egela](https://egela.ehu.eus/course/view.php?id=43863). Please complete all installations outside of class.

Once you've done all of that, open this notebook in Jupyter and run the cell below:

In [None]:
import numpy as np

If that worked as expected, you should be able to run the cell below a few times and get a different outcome each time.

In [None]:
np.random.rand()

0.8353166254236456

Now let's try importing and testing TensorFlow. This can be a bit trickier to install properly. Even once it's installed, running this line should take a few seconds.

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
tf.__version__

'2.7.0'

First we define a tensor variable.

In [None]:
random_scalar = tf.random.uniform(())

Then we call a TensorFlow function to print its value:

In [None]:
tf.print(random_scalar)


0.711256742


Variables can depend on other variables...

In [None]:
double_random_scalar = 2 * random_scalar
double_random_scalar_gt_one = double_random_scalar > 1

In [None]:
tf.print(double_random_scalar_gt_one)

1
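
As a rough sketch of what is happening here, assuming TensorFlow 2.x with its default eager execution: each tensor already holds a concrete value when it is created, so tensors that depend on it are computed immediately and can be pulled into plain Python with `.numpy()`. The names `value` and `flag` are only illustrative.

```python
import tensorflow as tf

random_scalar = tf.random.uniform(())   # scalar float32 tensor drawn from [0, 1)
doubled = 2 * random_scalar             # computed right away in eager mode
is_gt_one = doubled > 1                 # boolean scalar tensor

# Convert the tensors to ordinary Python values
value = float(random_scalar.numpy())
flag = bool(is_gt_one.numpy())
print(value, flag)
```

The printed number changes on every run unless a seed is set first.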


Both TensorFlow and NumPy allow nearly any variable to take the form of a tensor (i.e., a vector, a matrix, or a higher-order array):

In [None]:
np.random.rand(2,3)

array([[0.27802865, 0.59786088, 0.90666511],
       [0.48350597, 0.13932249, 0.28518744]])

In [None]:
random_tensor = tf.random.uniform((2,3))
double_random_tensor = 2 * random_tensor
double_random_tensor_gt_one = double_random_tensor > 1
tf.print(random_tensor)
tf.print(double_random_tensor_gt_one)


[[0.606512547 0.325416565 0.817865968]
 [0.0228177309 0.417286277 0.20732224]]
[[1 0 1]
 [0 0 0]]
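
For comparison, here is a minimal NumPy-only sketch of the same elementwise pattern. It assumes a seeded `np.random.default_rng` generator (the seed value is arbitrary) and uses array names chosen to mirror the TensorFlow cell above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator; the seed value is arbitrary
random_array = rng.random((2, 3))       # 2x3 array of floats in [0, 1)
double_random_array = 2 * random_array  # the scalar 2 is broadcast over every entry
double_random_array_gt_one = double_random_array > 1  # elementwise comparison

print(random_array)
print(double_random_array_gt_one.astype(int))  # show True/False as 1/0
```

The comparison returns a boolean array with the same shape as its input, which is why the mask lines up entry by entry with the random values.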


## 1. Loading the data

Let's load the Stanford Sentiment Treebank. The data was originally downloaded from here: [the train/dev/test Stanford Sentiment Treebank distribution](http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip). If you have already copied the `dl4nlp_labs` folder to your `Colab Notebooks` folder, the data for this lab should be in `dl4nlp_labs/data/trees`.

To load the data, you'll first need to mount your Drive folder and give the notebook access to it. This requires a one-step authentication. When you run the cell below, please follow the instructions.

Once everything is mounted, make sure that `sst_home = 'drive/My Drive/Colab Notebooks/dl4nlp_labs/data/trees/'` is the correct path to the data.

Please run the cells below to load the following data files:

    dl4nlp_labs/data/trees/train.txt
    dl4nlp_labs/data/trees/dev.txt
    dl4nlp_labs/data/trees/test.txt


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# set seed for replicability of results
import numpy as np
import tensorflow as tf

np.random.seed(1)
tf.random.set_seed(2)
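
To see what the seed buys us, here is a small NumPy-only sketch (the variable names are illustrative): resetting the seed restarts the same pseudo-random sequence, so the two draws are identical.

```python
import numpy as np

np.random.seed(1)
first_draw = np.random.rand(3)

np.random.seed(1)               # resetting the seed restarts the same sequence
second_draw = np.random.rand(3)

print(np.array_equal(first_draw, second_draw))  # prints True
```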

In [None]:
# Load the data
import re

# Let's do 2-way positive/negative classification instead of 5-way    
def load_sst_data(path,
                  easy_label_map={0:0, 1:0, 2:None, 3:1, 4:1}):
    data = []
    with open(path) as f:
        for i, line in enumerate(f): 
            example = {}
            example['label'] = easy_label_map[int(line[1])]
            if example['label'] is None:
                continue
            
            # Strip out the parse information and the phrase labels---we don't need those here
            text = re.sub(r'\s*(\(\d)|(\))\s*', '', line)
            example['text'] = text[1:]
            data.append(example)
    return data

sst_home = 'drive/My Drive/Colab Notebooks/dl4nlp_labs/data/trees/'
training_set = load_sst_data(sst_home + 'train.txt')
dev_set = load_sst_data(sst_home + 'dev.txt')
test_set = load_sst_data(sst_home + 'test.txt')

print('Training size: {}'.format(len(training_set)))
print('Dev size: {}'.format(len(dev_set)))
print('Test size: {}'.format(len(test_set)))

Training size: 6920
Dev size: 872
Test size: 1821
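
To make the label mapping and the regular expression in `load_sst_data` more concrete, the sketch below applies the same two steps to a single line. `parse_sst_line` is a hypothetical helper written for this illustration, and the two input lines are made up in the treebank's bracketed format rather than taken from the dataset.

```python
import re

# Same 5-way to 2-way mapping as in load_sst_data above
EASY_LABEL_MAP = {0: 0, 1: 0, 2: None, 3: 1, 4: 1}

def parse_sst_line(line):
    """Hypothetical helper: return (binary_label, text) for one tree line."""
    # The root label is the digit right after the first '('
    label = EASY_LABEL_MAP[int(line[1])]
    # Remove every '(<digit>' and ')' together with adjacent whitespace
    text = re.sub(r'\s*(\(\d)|(\))\s*', '', line)
    return label, text[1:]  # drop the single leading space left behind

# Made-up example lines (not from the dataset)
print(parse_sst_line("(3 (2 It) (4 (2 's) (3 good)))\n"))
print(parse_sst_line("(2 (2 It) (2 (2 is) (2 fine)))\n"))
```

The first line has root label 3, so it maps to the positive class with the text `It 's good`. The second has the neutral root label 2, which maps to `None`; the loader skips such lines.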


## 2. Examining the data

In [None]:
# Print a sample of negative text chunks
[example["text"] for example in training_set if example["label"] == 0][:10]

["This is n't a new idea .",
 "... a sour little movie at its core ; an exploration of the emptiness that underlay the relentless gaiety of the 1920 's ... The film 's ending has a `` What was it all for ? ''",
 'Made me unintentionally famous -- as the queasy-stomached critic who staggered from the theater and blacked out in the lobby .',
 'The modern-day royals have nothing on these guys when it comes to scandals .',
 "It 's only in fairy tales that princesses that are married for political reason live happily ever after .",
 'An absurdist spider web .',
 'By no means a slam-dunk and sure to ultimately disappoint the action fans who will be moved to the edge of their seats by the dynamic first act , it still comes off as a touching , transcendent love story .',
 "It 's not a great monster movie .",
 "Too often , Son of the Bride becomes an exercise in trying to predict when a preordained `` big moment '' will occur and not `` if . ''",
 'A party-hearty teen flick that scalds like aci

In [None]:
# Print a sample of positive text chunks
[example["text"] for example in training_set if example["label"] == 1][:10]

["The Rock is destined to be the 21st Century 's new `` Conan '' and that he 's going to make a splash even greater than Arnold Schwarzenegger , Jean-Claud Van Damme or Steven Segal .",
 "The gorgeously elaborate continuation of `` The Lord of the Rings '' trilogy is so huge that a column of words can not adequately describe co-writer\\/director Peter Jackson 's expanded vision of J.R.R. Tolkien 's Middle-earth .",
 'Singer\\/composer Bryan Adams contributes a slew of songs -- a few potential hits , a few more simply intrusive to the story -- but the whole package certainly captures the intended , er , spirit of the piece .',
 'Yet the act is still charming here .',
 "Whether or not you 're enlightened by any of Derrida 's lectures on `` the other '' and `` the self , '' Derrida is an undeniably fascinating and playful fellow .",
 'Just the labour involved in creating the layered richness of the imagery in this chiaroscuro of madness and light is astonishing .',
 'Part of the charm of 

## Assignments
### Part 1:

Write a Python function using NumPy to compute the following function of `x`. You can set $\mu$ to 0 and $\sigma$ to 1. This happens to be the probability density function of a normal distribution, but we're just using it as an arbitrary demo, and you shouldn't use any preexisting code for this particular distribution. You'll likely need to search for relevant NumPy documentation.

![The PDF of the standard normal distribution.](https://drive.google.com/uc?id=11NpGnvDTRhnEkFwDsRYdLKMYDGHUMODM)
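
In case the image does not load: assuming it shows the usual normal density with mean $\mu$ and standard deviation $\sigma$, the function can be written as

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
$$

With $\mu = 0$ and $\sigma = 1$ this reduces to $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}$, so $f(0) \approx 0.39894228$.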



In [None]:
def np_fn(x):
  mu = 0
  sigma = 1
  sq_sigma = np.square(sigma)

  stp_one = 1/np.sqrt(2*sq_sigma*np.pi)
  stp_two = np.exp(-np.square(x-mu) / (2*sq_sigma))
  stp_three = stp_one * stp_two
  
  return stp_three

Assume `x` is a vector. You should be able to run the following command and get the subsequent result:

In [None]:
x = np.array([0, 1, 2, 3])
np_fn(x)

array([0.39894228, 0.24197072, 0.05399097, 0.00443185])

Expected output: `array([ 0.39894228,  0.24197072,  0.05399097,  0.00443185])`
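
One possible way to check a solution beyond eyeballing the printed digits is sketched below. `normal_density` is an illustrative restatement of the function rather than the required implementation, and the grid bounds and step are arbitrary choices. The sketch compares against the expected values with a tolerance and confirms that the density integrates to roughly 1.

```python
import numpy as np

def normal_density(x, mu=0.0, sigma=1.0):
    """Illustrative restatement of the function from Part 1."""
    return np.exp(-np.square(x - mu) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x = np.array([0, 1, 2, 3])
expected = np.array([0.39894228, 0.24197072, 0.05399097, 0.00443185])
matches = np.allclose(normal_density(x), expected)  # compare with a tolerance, not ==

# A density should integrate to about 1; approximate the integral with a Riemann sum
grid = np.linspace(-8.0, 8.0, 16001)
area = np.sum(normal_density(grid)) * (grid[1] - grid[0])

print(matches, area)
```

Using `np.allclose` rather than `==` matters here because the expected values are rounded to eight decimal places.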

### Part 2:
Now try to write the same function (`tf_fn(x)`) in TensorFlow.

In [None]:
import math as m

def tf_fn(x):
  mu = 0
  sigma = 1
  pi = tf.constant(m.pi)
  sq_sigma = sigma**2

  stp_one = 1/tf.sqrt(2*sq_sigma*pi)
  stp_two = tf.exp(-tf.square(x-mu) / (2*sq_sigma))
  stp_one = tf.cast(stp_one, tf.float32)
  stp_two = tf.cast(stp_two, tf.float32)

  print(stp_one, stp_two)
  stp_three = stp_one * stp_two
  
  return stp_three

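You should be able to run the command below and get the same output as above, up to float32 rounding in the last digits. The first line of the output comes from the `print` call inside `tf_fn`.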

In [None]:
tf.print(tf_fn(x))

tf.Tensor(0.39894223, shape=(), dtype=float32) tf.Tensor([1.         0.60653067 0.13533528 0.011109  ], shape=(4,), dtype=float32)
[0.398942232 0.241970703 0.0539909601 0.00443184795]
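
The small differences in the trailing digits compared with Part 1 are what one would expect from computing in float32 instead of float64. The NumPy-only sketch below illustrates this under that assumption; `density` is an illustrative helper that keeps every intermediate value in the dtype of its input.

```python
import math
import numpy as np

def density(x):
    """Standard normal density, computed in the dtype of x (illustrative helper)."""
    dtype = x.dtype.type
    norm = dtype(1.0 / math.sqrt(2.0 * math.pi))
    return norm * np.exp(-np.square(x) / dtype(2))

values = [0, 1, 2, 3]
result64 = density(np.array(values, dtype=np.float64))
result32 = density(np.array(values, dtype=np.float32))

print(result32.dtype, result64.dtype)
print(np.allclose(result32, result64, rtol=1e-5))  # equal within float32 precision
```

When comparing TensorFlow and NumPy results, a tolerance-based check like this is usually more appropriate than expecting identical digits.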


# Attribution:
Adapted by Oier Lopez de Lacalle, Olatz Perez de Viñaspre and Ander Barrena, based on a notebook by Sam Bowman at NYU.