## Implementing the neural network language model

### FNN architecture

The architecture of the feed-forward neural network (FNN) language model. Notation:

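* $n$: the context size; the model predicts word $w_t$ from the $n-1$ preceding words
* $m$: the number of features associated with each word (e.g. with $m = 100$, each word is represented by a vector of size 100)
* $C$: the embedding matrix, of size $|V|\times m$, where $V$ is the vocabulary; row $i$ is the feature vector of word $i$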

$$y = b + Wx + U\tanh(d + Hx)$$

Where:

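* $x = (C(w_{t-1}), C(w_{t-2}), \ldots, C(w_{t-n+1}))$ is the concatenation of the feature vectors of the $n-1$ context words, a vector of size $m(n-1)$
* $h$ is the number of hidden units
* $H$ is the weight matrix of the hidden (dense) layer, with $h$ rows and $m(n-1)$ columns
* $d$ is the bias of the hidden layer, a vector of size $h$
* $U$ is the weight matrix of the second (output) dense layer, with $|V|$ rows and $h$ columns
* $W$ is the matrix of direct input-to-output connections, with $|V|$ rows and $m(n-1)$ columns **(can be set to zero, i.e. no direct connections)**
* $b$ is the output bias, a vector of size $|V|$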
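A minimal NumPy sketch of one forward pass through the model defined above, for a single context. The function name `fnn_forward`, the toy sizes and the random initialisation are illustrative assumptions, not part of the original notebook; the final softmax turns the scores $y$ into a probability distribution over the vocabulary.

```python
import numpy as np

def fnn_forward(context, C, H, d, U, W, b):
    """Forward pass of y = b + W x + U tanh(d + H x), followed by a softmax.

    context: sequence of n-1 word indices (w_{t-1}, ..., w_{t-n+1}).
    Returns (y, p): the unnormalised scores and the softmax probabilities over V.
    """
    x = C[np.asarray(context)].reshape(-1)   # concatenated features, size m*(n-1)
    a = d + H @ x                            # hidden pre-activation, size h
    y = b + W @ x + U @ np.tanh(a)           # scores, size |V|
    p = np.exp(y - y.max())                  # subtract the max for numerical stability
    return y, p / p.sum()

# Toy dimensions (hypothetical, for illustration only)
rng = np.random.default_rng(0)
V, m, n, h = 5, 10, 4, 8
C = rng.standard_normal((V, m))
H = rng.standard_normal((h, m * (n - 1)))
d = np.zeros(h)
U = rng.standard_normal((V, h))
W = np.zeros((V, m * (n - 1)))               # W = 0: no direct input-to-output connections
b = np.zeros(V)

y, p = fnn_forward([2, 4, 3], C, H, d, U, W, b)
print(y.shape, p.sum())
```

Setting `W` to zeros reproduces the "no direct connections" variant; a non-zero `W` adds the linear shortcut from $x$ to the output.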


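Total number of parameters:

$$|V|(1 + nm + h) + h(1 + (n-1)m)$$

The $|V|$ factor collects $b$ ($|V|$), $W$ ($|V|(n-1)m$), $U$ ($|V|h$) and $C$ ($|V|m$); the $h$ factor collects $d$ ($h$) and $H$ ($h(n-1)m$).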
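As a sanity check on the formula, the sketch below sums the sizes of the six parameter blocks from their shapes and compares the result with the closed form. The helper names `count_params` and `closed_form` are hypothetical.

```python
import math

def count_params(V, m, n, h):
    """Sum the sizes of C, H, d, U, W and b from their shapes."""
    shapes = {
        "C": (V, m),
        "H": (h, (n - 1) * m),
        "d": (h,),
        "U": (V, h),
        "W": (V, (n - 1) * m),
        "b": (V,),
    }
    return sum(math.prod(shape) for shape in shapes.values())

def closed_form(V, m, n, h):
    """|V|(1 + nm + h) + h(1 + (n-1)m)."""
    return V * (1 + n * m + h) + h * (1 + (n - 1) * m)

print(count_params(V=5, m=10, n=4, h=8))  # → 493
print(closed_form(V=5, m=10, n=4, h=8))   # → 493
```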

### Input data

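Each training example is a tuple of $n$ word indices ($n-1$ context words plus the word to predict). For example, for $n=4$: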

$$D = [(2, 10, 3, 5), (8, 30, 2, 20), ...]$$
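One way such a dataset could be built from a tokenised text is sketched below. The helpers `build_vocab` and `make_ngrams` are hypothetical (the notebook itself imports `create_unique_word_dict` from its own `utility` module, whose behaviour is not shown here), and the tuples are produced in reading order, $(w_{t-3}, w_{t-2}, w_{t-1}, w_t)$; reverse the first $n-1$ entries if you want the $(w_{t-1}, \ldots, w_{t-n+1})$ order used in $x$.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def make_ngrams(tokens, vocab, n):
    """Return n-tuples of word indices: n-1 context words followed by the target word."""
    ids = [vocab[tok] for tok in tokens]
    return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)
D = make_ngrams(tokens, vocab, n=4)
print(vocab)  # → {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(D)      # → [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 4)]
```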

In [36]:
import os
import re
import csv
import itertools

import numpy as np
import pandas as pd
from tqdm import tqdm
%matplotlib inline
import matplotlib.pyplot as plt

# Project-local helper module
from utility import text_preprocessing, create_unique_word_dict

In [37]:
np.random.seed(0)
m = 10                          # number of features per word
sizeV = 5                       # vocabulary size |V|
C = np.random.randn(sizeV, m)   # embedding matrix C, shape (|V|, m)
a="ca|ca"
for i in range(len(a)):
    print(i)
a.replace(a[0],"")

0
1
2
3
4


'a|a'

In [34]:
a="http://arxiv.org/abs/1303.6933v1|Hans Grauert (1930-2011)|Alan Huckleberry|math.HO|Hans Grauert died in September of 2011. This article reviews his life in mathematics and recalls some detail his major accomplishments.|2013-03-27T19:23:57Z|2013-03-27T19:23:57Z|math"
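The sample record above is pipe-delimited with 8 fields (URL, title, author, category, abstract, two timestamps, subject), so the abstract is field index 4, the same index used in the commented-out loop of the next cell. The sketch below extracts it with `str.split` and, equivalently, with the standard `csv` module. pandas' `read_csv` also accepts a `sep` argument (e.g. `sep="|"`), but whether `input/arxiv_test.csv` uses this layout is an assumption the notebook does not confirm.

```python
import csv
import io

record = ("http://arxiv.org/abs/1303.6933v1|Hans Grauert (1930-2011)|Alan Huckleberry|math.HO|"
          "Hans Grauert died in September of 2011. This article reviews his life in mathematics "
          "and recalls some detail his major accomplishments.|2013-03-27T19:23:57Z|"
          "2013-03-27T19:23:57Z|math")

fields = record.split("|")
abstract = fields[4]          # field 4 holds the abstract in this record
print(len(fields))            # → 8

# Same split with the csv module, reading pipe-delimited lines from a file-like object
rows = list(csv.reader(io.StringIO(record + "\n"), delimiter="|"))
print(rows[0][4] == abstract)  # → True
```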

In [86]:
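# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# on_bad_lines="warn" reports and skips rows with too many fields
texts = pd.read_csv("input/arxiv_test.csv", on_bad_lines="warn")
texts = texts['text'].tolist()
#for i in range(len(texts)):
#    texts[i]=(texts[i].split('|'))[4]

texts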

b'Skipping line 30: expected 5 fields, saw 8\nSkipping line 46: expected 5 fields, saw 8\nSkipping line 54: expected 5 fields, saw 6\nSkipping line 64: expected 5 fields, saw 12\nSkipping line 71: expected 5 fields, saw 6\nSkipping line 86: expected 5 fields, saw 8\nSkipping line 106: expected 5 fields, saw 9\nSkipping line 115: expected 5 fields, saw 6\nSkipping line 126: expected 5 fields, saw 6\nSkipping line 132: expected 5 fields, saw 7\nSkipping line 134: expected 5 fields, saw 7\nSkipping line 136: expected 5 fields, saw 6\nSkipping line 169: expected 5 fields, saw 7\nSkipping line 190: expected 5 fields, saw 6\nSkipping line 199: expected 5 fields, saw 7\nSkipping line 203: expected 5 fields, saw 11\nSkipping line 208: expected 5 fields, saw 6\nSkipping line 215: expected 5 fields, saw 6\nSkipping line 217: expected 5 fields, saw 7\nSkipping line 227: expected 5 fields, saw 7\nSkipping line 230: expected 5 fields, saw 6\nSkipping line 242: expected 5 fields, saw 6\nSkipping lin

['2017] that every essentially countable equivalence relation that is induced by an action of abelian non-archimedean Polish group is essentially hyperfinite.|2020-01-16T15:09:02Z|2020-01-16T15:09:02Z',
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 ' independent of the dimension of the hypercube.|2018-08-28T18:32:05Z|2018-08-28T18:32:05Z',
 nan,
 nan,
 nan,
 ' can be extended from continuous maps between locally compact Hausdorff spaces to separated locally proper maps between arbitrary topological spaces.|2014-04-30T08:35:41Z|2014-11-05T20:38:23Z',
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 na

In [3]:
print(C)

[[ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799 -0.97727788
   0.95008842 -0.15135721 -0.10321885  0.4105985 ]
 [ 0.14404357  1.45427351  0.76103773  0.12167502  0.44386323  0.33367433
   1.49407907 -0.20515826  0.3130677  -0.85409574]
 [-2.55298982  0.6536186   0.8644362  -0.74216502  2.26975462 -1.45436567
   0.04575852 -0.18718385  1.53277921  1.46935877]
 [ 0.15494743  0.37816252 -0.88778575 -1.98079647 -0.34791215  0.15634897
   1.23029068  1.20237985 -0.38732682 -0.30230275]
 [-1.04855297 -1.42001794 -1.70627019  1.9507754  -0.50965218 -0.4380743
  -1.25279536  0.77749036 -1.61389785 -0.21274028]]


In [10]:
np.shape(C)

(5, 10)

In [14]:
C[[2, 4, 3], :]

array([[-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502,  2.26975462,
        -1.45436567,  0.04575852, -0.18718385,  1.53277921,  1.46935877],
       [-1.04855297, -1.42001794, -1.70627019,  1.9507754 , -0.50965218,
        -0.4380743 , -1.25279536,  0.77749036, -1.61389785, -0.21274028],
       [ 0.15494743,  0.37816252, -0.88778575, -1.98079647, -0.34791215,
         0.15634897,  1.23029068,  1.20237985, -0.38732682, -0.30230275]])

In [21]:
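X = [[1, 2, 4], [0, 3, 4]]   # batch of 2 contexts, 3 word indices each
temp = C[X, :]               # shape (2, 3, m): one row of C per index
# Flatten each example's 3 feature vectors into a single vector of size 3*m
result = np.reshape(temp, (np.shape(X)[0], m * np.shape(X)[1]))
print(np.shape(result))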

(2, 30)


In [22]:
np.random.seed(0)
C = np.random.randn(4, 10)

In [9]:
np.shape(np.ravel(C))

(40,)

In [10]:
C.shape

(4, 10)

In [11]:
X = np.array([[1, 2, 3],[0,2,1]])

In [11]:
X.shape

(2, 3)

In [23]:
np.shape(np.reshape(C[X,:],(np.shape(X)[0],10*np.shape(X)[1])))

(2, 30)

In [17]:
np.reshape((np.concatenate(C[X, :])),(2,30))

array([[ 0.14404357,  1.45427351,  0.76103773,  0.12167502,  0.44386323,
         0.33367433,  1.49407907, -0.20515826,  0.3130677 , -0.85409574,
        -2.55298982,  0.6536186 ,  0.8644362 , -0.74216502,  2.26975462,
        -1.45436567,  0.04575852, -0.18718385,  1.53277921,  1.46935877,
         0.15494743,  0.37816252, -0.88778575, -1.98079647, -0.34791215,
         0.15634897,  1.23029068,  1.20237985, -0.38732682, -0.30230275],
       [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
        -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ,
        -2.55298982,  0.6536186 ,  0.8644362 , -0.74216502,  2.26975462,
        -1.45436567,  0.04575852, -0.18718385,  1.53277921,  1.46935877,
         0.14404357,  1.45427351,  0.76103773,  0.12167502,  0.44386323,
         0.33367433,  1.49407907, -0.20515826,  0.3130677 , -0.85409574]])

In [14]:
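# C[:, idx] selects *columns* of C, i.e. treats C as an m x |V| matrix; with C stored
# as |V| x m (one row per word), the row lookup C[X, :] above is the one that matches the model
np.shape(np.concatenate(C[:, np.concatenate(X)]).reshape((X.shape[0], X.shape[1]*C.shape[0])))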

(2, 12)

In [122]:
np.shape(C[:, np.concatenate(X)])

(4, 6)
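The cells above turn a batch of index tuples into the concatenated input $x$. The sketch below checks two facts about that operation under toy sizes (assumed for illustration): looking up rows of $C$ is the same as multiplying one-hot vectors by $C$, and a row-major reshape concatenates the selected rows in the order the indices appear.

```python
import numpy as np

rng = np.random.default_rng(0)
V, m = 4, 10
C = rng.standard_normal((V, m))           # embedding matrix, one row per word
X = np.array([[1, 2, 3], [0, 2, 1]])      # batch of 2 contexts, n-1 = 3 word indices each

# Row lookup: shape (2, 3, m), flattened per example to (2, 3*m)
x_lookup = C[X].reshape(X.shape[0], -1)

# Same result as a product of one-hot vectors with C
one_hot = np.eye(V)[X]                    # shape (2, 3, V)
x_onehot = (one_hot @ C).reshape(X.shape[0], -1)

print(x_lookup.shape)                     # → (2, 30)
print(np.allclose(x_lookup, x_onehot))    # → True
# Row-major reshape concatenates C[X[i, 0]], C[X[i, 1]], C[X[i, 2]] in that order
print(np.array_equal(x_lookup[1], np.concatenate([C[0], C[2], C[1]])))  # → True
```

The one-hot form is what the equations describe; the indexing form avoids building the $|V|$-wide one-hot vectors and is what the notebook uses.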

In [128]:
class Project_and_concat:
    """
    Projection layer: looks up the feature vector of each context word in C
    and concatenates them.

    The input is a vector x = (w_{t-1}, w_{t-2}, ..., w_{t-n+1}) of word indices.
    For example, for n=4 the input vector x can be
    (4, 2, 10)
    where 4, 2 and 10 are the indexes of the corresponding words.
    A batch of such vectors (2-D array, one row per example) is also accepted.
    """
    def __init__(self, nb_features, dict_size):
        self.nb_features = nb_features
        self.dict_size = dict_size
        self.C = np.random.randn(dict_size, nb_features)  # shape |V| x m
        self.nb_params = nb_features * dict_size  # number of parameters of the layer
        self.save_X = None  # input saved by forward() for use in backward()

    def set_params(self, params):
        # Set the layer parameters from a flat vector of size self.nb_params
        self.C = np.reshape(params, (self.dict_size, self.nb_features))

    def get_params(self):
        # Return the layer parameters as a flat vector of size self.nb_params
        return np.ravel(self.C)

    def forward(self, X):
        # X: word indices, shape (n-1,) or (batch, n-1)
        X = np.asarray(X)
        self.save_X = np.copy(X)
        # Select the rows of C, then concatenate them per example:
        # output shape (m*(n-1),) or (batch, m*(n-1))
        return self.C[X].reshape(X.shape[:-1] + (-1,))

    def backward(self, grad_sortie):
        # Backpropagate the gradient through the layer.
        # grad_sortie: gradient w.r.t. the output of forward(), same shape as that output.
        # Returns:
        #   grad_local, a vector of size self.nb_params: the gradient w.r.t. the entries of C
        #   grad_entree, the gradient w.r.t. the input: None, since the input is word indices
        X = self.save_X
        grad_local = np.zeros_like(self.C)
        # Each selected row of C receives its slice of grad_sortie; np.add.at accumulates
        # correctly when the same word appears several times in a context
        np.add.at(grad_local, X, np.reshape(grad_sortie, X.shape + (self.nb_features,)))
        grad_entree = None
        return np.ravel(grad_local), grad_entree

# This layer does two things: it selects rows of C, then concatenates them.
# Because the inputs are discrete word indices, no gradient flows back to the input;
# the gradient only flows into the selected rows of C.
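The gradient of the row-selection layer can be checked numerically. The sketch below uses standalone functions `embed_forward` and `embed_backward` (hypothetical names, independent of the class above) and a toy linear loss $L = \sum G \odot \text{output}$, so that $\partial L / \partial \text{output} = G$; it compares the scatter-add gradient with central finite differences on every entry of $C$.

```python
import numpy as np

def embed_forward(C, X):
    """Look up rows of C for each index in X and concatenate them per example."""
    return C[X].reshape(X.shape[0], -1)

def embed_backward(C, X, grad_out):
    """Gradient w.r.t. C given grad_out = dL/d(output), shape (B, (n-1)*m)."""
    grad_C = np.zeros_like(C)
    # Scatter-add: np.add.at accumulates when an index repeats within X
    np.add.at(grad_C, X, grad_out.reshape(X.shape + (C.shape[1],)))
    return grad_C

rng = np.random.default_rng(1)
V, m = 5, 3
C = rng.standard_normal((V, m))
X = np.array([[1, 2, 1], [0, 4, 2]])           # index 1 repeats; index 3 never appears
G = rng.standard_normal((X.shape[0], X.shape[1] * m))

def loss(C_):
    """Toy loss whose gradient w.r.t. the layer output is exactly G."""
    return np.sum(G * embed_forward(C_, X))

analytic = embed_backward(C, X, G)

# Central finite differences on every entry of C
eps = 1e-6
numeric = np.zeros_like(C)
for i in range(V):
    for j in range(m):
        Cp, Cm = C.copy(), C.copy()
        Cp[i, j] += eps
        Cm[i, j] -= eps
        numeric[i, j] = (loss(Cp) - loss(Cm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # → True
```

Rows of $C$ for words absent from the batch (here row 3) get a zero gradient, which is why sparse updates of $C$ are possible in practice.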



In [3]:
A = np.array([[1, 2], [3, 4]])

In [4]:
A

array([[1, 2],
       [3, 4]])

In [8]:
np.dot(np.ones(4), np.concatenate(A))

10.0

In [28]:
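a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([2, 3, 4])

a = np.exp(a)
s = np.sum(a, axis=1)            # one normaliser per row
print(a)
print(a.T)
print(s)
print(a.T / s)                   # broadcasting divides column j of a.T by s[j]: a row-wise softmax, transposed
print(np.sum(a.T / s, axis=0))   # each column sums to 1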

[[  2.71828183   7.3890561   20.08553692]
 [ 54.59815003 148.4131591  403.42879349]]
[[  2.71828183  54.59815003]
 [  7.3890561  148.4131591 ]
 [ 20.08553692 403.42879349]]
[ 30.19287485 606.44010263]
[[0.09003057 0.09003057]
 [0.24472847 0.24472847]
 [0.66524096 0.66524096]]
[1. 1.]


In [45]:
a=np.array([[1,2,3],[4,5,6]])
b=np.array([1,2,3])

print(np.sum(a,axis=0))

print(a)

print(np.sum(a,axis=0))

print(np.exp(a)/np.sum(np.exp(a),axis=0))

[5 7 9]
[[1 2 3]
 [4 5 6]]
[5 7 9]
[[0.04742587 0.04742587 0.04742587]
 [0.95257413 0.95257413 0.95257413]]
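The two cells above compute the softmax along axis 1 and axis 0 with `np.exp` applied directly, which overflows for large scores. A common, mathematically equivalent variant subtracts the maximum along the chosen axis first; the sketch below is one way to write it, with an explicit `axis` argument (the function name `softmax` is illustrative).

```python
import numpy as np

def softmax(z, axis=-1):
    """Softmax along `axis`, shifted by the max so np.exp never overflows."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

a = np.array([[1, 2, 3], [4, 5, 6]])
print(softmax(a, axis=1))   # each row sums to 1
print(softmax(a, axis=0))   # each column sums to 1

big = np.array([1000.0, 1001.0, 1002.0])
print(softmax(big))         # np.exp(big) alone would overflow to inf
```

The shift does not change the result because multiplying numerator and denominator by $e^{-\max z}$ cancels.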


In [46]:
import Neuralword as Neur

print(np.shape(a))
print((Neur.ilogit(a)))

(2, 3)
[[0.04742587 0.04742587 0.04742587]
 [0.95257413 0.95257413 0.95257413]]


In [10]:
l=np.array([-5,-7,-10])
np.argmax(l)

0
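Predicting the next word amounts to taking the `argmax` of the output scores. Since the softmax divides every $e^{y_i}$ by the same normaliser, it preserves the ordering of the scores, so the `argmax` of $y$ and of the probabilities coincide. The sketch below also computes the negative log-likelihood of a target word, the standard per-example loss for this kind of model; the scores and the target index are made-up toy values.

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

y = np.array([-5.0, -7.0, -10.0])   # toy scores over a vocabulary of 3 words
p = softmax(y)

# Same predicted word whether we use the raw scores or the probabilities
predicted = int(np.argmax(y))
assert predicted == int(np.argmax(p))

# Negative log-likelihood of a (hypothetical) target word index
target = 1
nll = -np.log(p[target])
print(predicted, nll)
```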