## Training an RMN

In [25]:
import numpy as np
import pandas as pd
from scipy import stats
import os

In [26]:
os.chdir("../../../scripts/assembly")
from session_speaker_assembly import *
from preprocess import *
from document import *
from constant import SPEECHES, SPEAKER_MAP, HB_PATH

FileNotFoundError: [Errno 2] No such file or directory: '../../../scripts/assembly'

In [None]:
df = subject_docs(session = 111, path = HB_PATH, subject = "health", min_len_tokens=100)

In [None]:
df.head()

In [None]:
speaker_speeches = df.groupby("speakerid")

In [None]:
speaker_keys = list(speaker_speeches.groups.keys())

In [None]:
speaker_keys[:10]

In [None]:
len(speaker_keys)

There are 535 voting Members of Congress: 100 serve in the U.S. Senate and 435 in the U.S. House of Representatives. A length of 50 suggests that nearly everyone commented on "health" (in a speech long enough to pass the `min_len_tokens` filter) at some point.

In [None]:
from keras.preprocessing.text import Tokenizer

In [None]:
tokenizer = Tokenizer()

In [None]:
tokenizer.fit_on_texts(df["speech"].values)

In [None]:
tokenizer.word_index

In [None]:
vocab_size = len(tokenizer.word_index)
vocab_size

In [None]:
speaker_speeches.get_group(speaker_keys[0]).speech.values

In [None]:
x_train = tokenizer.texts_to_sequences(speaker_speeches.get_group(speaker_keys[0]).speech.values)

In [None]:
x_train

In [None]:
from keras.preprocessing.sequence import pad_sequences

In [None]:
max_len = WINDOW_DEFAULT + 1
x_train_padded = pad_sequences(x_train, maxlen=max_len, padding="post")

In [None]:
x_train_padded

I think that the sentences need to be in integer-tokenized form.
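
As a minimal illustration of what "integer-tokenized" means here, the sketch below builds a word index by frequency and post-pads each sequence with zeros. It is a plain-Python stand-in, not Keras' `Tokenizer` or `pad_sequences`: the helper names `build_word_index`, `to_sequences`, and `pad_post` and the toy sentences are hypothetical, and truncating from the end is an assumption.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer id, most frequent first; ids start at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}

def to_sequences(texts, word_index):
    """Replace every known word with its id; unknown words are dropped."""
    return [[word_index[w] for w in t.lower().split() if w in word_index]
            for t in texts]

def pad_post(sequences, maxlen, value=0):
    """Truncate or right-pad each sequence to exactly maxlen entries."""
    return [seq[:maxlen] + [value] * (maxlen - len(seq)) for seq in sequences]

texts = ["health care reform", "health insurance"]
word_index = build_word_index(texts)
padded = pad_post(to_sequences(texts, word_index), maxlen=4)
print(word_index)
print(padded)
```

Because id 0 is reserved for padding, an embedding table for these ids needs `len(word_index) + 1` rows.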

From Iyyer et al.:

"Each input to the RMN is a tuple that contains identifiers for a book and two characters, as well as the spans corresponding to their relationship: $(b, c_1, c_2, S_{c_1,c_2})$. Given one such input, our objective is to reconstruct $S_{c_1,c_2}$ using a linear combination of relationship descriptors from R as shown in Figure 2; we now describe this process formally."


### Needs for Baseline goal

Let...
* $s_{v_t}$ be the $t^{th}$ span of text in the span set $S_{c_1,c_2}$
* $v_{s_t}$ be the vector that results from taking the element-wise average of the word vectors in $s_{v_t}$
* $d$ be the dimension of the embedding
* $k$ be the number of descriptors


Compute Sequence: Given $s_{v_t}$, do the following steps:
1. compute avg speech vector, $v_{s_t}$,
    * $v_{s_t} \in \mathbb{R}^{d}$
2. compute hidden state with ReLU activation:
    * $h_t = \mathrm{relu}(W_h \cdot v_{s_t})$
    * $W_h \in \mathbb{R}^{d \times d}$
    * $h_t \in \mathbb{R}^{d}$
3. get distribution over descriptors using another hidden layer:
    * $d_t = \mathrm{softmax}(W_d \cdot h_t)$
    * $W_d \in  \mathbb{R}^{k \times d}$
    * $d_t \in  \mathbb{R}^{k}$
    * $d_{t,i} \in (0,1) \space \forall i$ 
4. recompose original sentence using the distribution over descriptors and the descriptor matrix:
    * $r_t = R^Td_t$
    * $R^T \in \mathbb{R}^{d \times k}$
    * $r_t \in \mathbb{R}^{d}$
5. score distance between $r_t$ and $v_{s_t}$
    * $distance = dist(r_t, v_{s_t})$
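
The five steps above can be sketched in NumPy as a single forward pass. This is only a shape check under stated assumptions: the weights are random rather than trained, `dist` is taken to be cosine distance (the notes above do not fix a particular choice), and the names `rmn_forward`, `softmax`, and `word_vectors` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def rmn_forward(word_vectors, W_h, W_d, R):
    """Run steps 1-5 for one span; returns (d_t, r_t, distance)."""
    v = word_vectors.mean(axis=0)        # step 1: v_{s_t} in R^d
    h = np.maximum(0.0, W_h @ v)         # step 2: h_t = relu(W_h v), in R^d
    d_t = softmax(W_d @ h)               # step 3: d_t in R^k, entries sum to 1
    r = R.T @ d_t                        # step 4: r_t = R^T d_t, in R^d
    # step 5: cosine distance (an assumed choice of dist)
    cos = (r @ v) / (np.linalg.norm(r) * np.linalg.norm(v) + 1e-12)
    return d_t, r, 1.0 - cos

rng = np.random.default_rng(0)
d, k, n_words = 100, 20, 7                     # embedding dim, descriptors, span length
word_vectors = rng.normal(size=(n_words, d))   # stand-in for looked-up word embeddings
W_h = rng.normal(size=(d, d)) * 0.1
W_d = rng.normal(size=(k, d)) * 0.1
R = rng.normal(size=(k, d))                    # descriptor matrix, one row per descriptor

d_t, r_t, distance = rmn_forward(word_vectors, W_h, W_d, R)
print(d_t.shape, r_t.shape, float(d_t.sum()), distance)
```

Each row of `R` plays the role of one descriptor, so `r_t` is a convex combination of descriptor vectors weighted by `d_t`.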
    
    
#### Notes on implementing it with keras
Every step above that uses a matrix multiplication can be implemented in Keras with a dense layer, written like this:
* `h = keras.layers.Dense(units = a, input_shape = (b, ), activation= "the_activation")(prev_layer)`
    * This gives the dense layer a weight matrix $W \in \mathbb{R}^{a \times b}$ and the activation "`the_activation`". Keras stores the kernel transposed, with shape $(b, a)$, and adds a bias vector unless `use_bias=False` is passed.
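
To make the shape bookkeeping concrete, here is a NumPy stand-in for what such a dense layer computes. It is a sketch, not Keras code: `dense` and its arguments are hypothetical names, and the kernel is passed with shape $(b, a)$, the transpose of $W$, to mirror how Keras lays out its weights.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(0.0, z)

def dense(x, kernel, bias=None, activation=None):
    """Compute activation(x @ kernel + bias) for a batch x of shape (n, b)."""
    out = x @ kernel                  # (n, b) @ (b, a) -> (n, a)
    if bias is not None:
        out = out + bias              # bias of shape (a,) broadcasts over the batch
    return activation(out) if activation is not None else out

rng = np.random.default_rng(1)
a, b, n = 20, 100, 3                  # units, input dimension, batch size
W = rng.normal(size=(a, b))           # W in R^{a x b}, as in the equations
x = rng.normal(size=(n, b))

h = dense(x, kernel=W.T, bias=np.zeros(a), activation=relu)
print(h.shape)
```

Row `i` of `h` equals `relu(W @ x[i])`, which is the per-example form used in the equations.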

In [21]:
# Imports
import keras
import tensorflow as tf
from keras.layers import Embedding, Dense, Lambda

In [22]:
d = 100
k = 20

In [None]:
wordids = keras.layers.Input(shape=(max_len,))

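# Embed the wordids. Keras' Tokenizer assigns ids 1..vocab_size and
# pad_sequences pads with 0, so the embedding table needs vocab_size + 1 rows.
e = keras.layers.Embedding(input_dim=vocab_size + 1, 
                           output_dim=d, 
                           input_length=max_len)(wordids)

# Take elementwise average over vectors (padded positions are included in this mean)
a = keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1))(e)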

# dense layer
ht = keras.layers.Dense(units = d, input_shape = (d, ), activation = "relu")(a)

# dense layer with softmax activation, (where previous states will eventually be inserted) 
dt = keras.layers.Dense(units = k, input_shape = (d, ), activation = "softmax")(ht)

# reconstruction layer
rt = keras.layers.Dense(units = d, input_shape = (k, ), activation = "linear")(dt)

# rt = keras.layers.Dense(units = d, input_shape = (k, ), activation = "linear")(a)
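
One thing to watch with the averaging step: with post-padding, a plain mean over the time axis also averages in the embedding row for the padding id. A hedged NumPy sketch of a masked mean follows. `masked_mean` and the toy arrays are hypothetical, and treating id 0 as padding is an assumption that matches the `pad_sequences` default.

```python
import numpy as np

def masked_mean(embedded, token_ids, pad_id=0):
    """Average embeddings over non-padding positions only.

    embedded: (n, max_len, d) array; token_ids: (n, max_len) array.
    """
    mask = (token_ids != pad_id).astype(embedded.dtype)[..., None]  # (n, max_len, 1)
    total = (embedded * mask).sum(axis=1)                           # (n, d)
    count = np.maximum(mask.sum(axis=1), 1.0)                       # avoid dividing by zero
    return total / count

token_ids = np.array([[1, 2, 0, 0],
                      [3, 0, 0, 0]])
table = np.array([[9.0, 9.0],     # row 0: embedding of the padding id
                  [1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
embedded = table[token_ids]       # embedding lookup, shape (2, 4, 2)

print(masked_mean(embedded, token_ids))   # ignores padding positions
print(embedded.mean(axis=1))              # plain mean, pulled toward the padding row
```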

In [None]:
print(rt)

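In [None]:
# Build and compile the model before inspecting it.
model = keras.Model(inputs=wordids, outputs=rt)
# NOTE: rt is a real-valued reconstruction in R^d, so "categorical_crossentropy"
# is a placeholder; a distance-based loss (step 5 above) fits the goal better.
model.compile(optimizer = 'adam', loss="categorical_crossentropy")

In [None]:
model.summary()

In [None]:
# NOTE: the target must match rt's shape (n, d). x_train_padded has shape
# (n, max_len), so this call only runs if max_len == d; the intended target
# is the averaged embedding computed inside the model.
model.fit(x=x_train_padded, y=x_train_padded, batch_size=1)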

In [None]:
for l in model.layers:
    print(l)
    print(50*"=")
    print("input shape", l.input_shape)
    print("output shape", l.output_shape)

In [None]:
from keras.models import Sequential
from keras.layers import Flatten, Dropout

In [None]:
model = Sequential()
model.add(Flatten(input_shape=(4,)))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='sigmoid'))

In [None]:
mo