# DEEP LEARNING WITH KERAS, AN EXAMPLE

This is an example of using deep learning (a subset of machine learning) to learn the rules and patterns that transform inputs into desired outputs.  

This post is written above an introductory level, while aiming to remain accessible, interesting and hands-on.

Let’s take a small data set and attempt to reach reasonable predictions…

The idea is to take a text snippet and identify a class by tagging a target label.

The text snippets are 3 to 7 word phrases, such as: end of the road, go for the green, hard at work.
The objective is to predict the language of the text, hence this is a classification problem.

The 15 language categories are:
French, Italian, Bangla, Tagalog, Spanish, Japanese, Korean, Hindi, German, Greek, Somali, Portuguese, Czech, Croatian and Romanian

The dataset: 20 phrases per language (300 in total) for training, which is what the model will learn from (`trainx.txt` & `trainy.txt`); 2 phrases per language (30 in total) for validation, which is the first chance to see how well the model generalizes to unseen data (`valx.txt` & `valy.txt`); and 1 phrase per language (15 in total) to test the accuracy of the classifier (`testx.txt` & `testy.txt`).

Using the Keras library in Python enables quick prototyping, and it has many built-in features for experimenting with neural network architectures.  The architecture is the network design describing how to go from input to output.  Neural nets can be constructed in a variety of ways, similar to the floor plan of an office or store, or the layout for orchestrating traffic (think jug-handles, turn lanes, overpass ramps, or roundabout systems).  Back on topic…


**Let’s access the data.**

In [1]:
def read_lines(path):
    # Read every line of a UTF-8 text file and close the file afterwards
    with open(path, 'r', encoding="utf8") as f:
        return f.readlines()

trainx = read_lines('trainx.txt')
valx = read_lines('valx.txt')
testx = read_lines('testx.txt')

trainy = read_lines('trainy.txt')
valy = read_lines('valy.txt')
testy = read_lines('testy.txt')

print(trainx[0:2])
print(trainy[0:2])

['{"text":"maître de la maison"}\n', '{"text":"padrone di casa"}\n']
['{"classification":"fr"}\n', '{"classification":"it"}\n']



Having opened the files and looked at the first few records, we can see that some cleanup is required to remove all text that is not part of the input phrase or target language label. This is handled by defining a function `multipleReplace` and specifying the characters to discard.

In [2]:
def multipleReplace(text, wordDict):
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

rep_x = {'{"text":"': '', '"}': '', '\n': ''} 

train_x=[]
for i in range(len(trainx)):
    w = multipleReplace(trainx[i], rep_x)
    train_x.append(w)
    
val_x=[]
for i in range(len(valx)):
    w = multipleReplace(valx[i], rep_x)
    val_x.append(w)
    
test_x=[]
for i in range(len(testx)):
    w = multipleReplace(testx[i], rep_x)
    test_x.append(w)

rep_y = {'{"classification":"': '', '"}\n': '', '"}': ''}

train_y=[]
for i in range(len(trainy)):
    w = multipleReplace(trainy[i], rep_y)
    train_y.append(w)
    
val_y=[]
for i in range(len(valy)):
    w = multipleReplace(valy[i], rep_y)
    val_y.append(w)

test_y=[]
for i in range(len(testy)):
    w = multipleReplace(testy[i], rep_y)
    test_y.append(w)

print(train_x[0])
print(train_y[0])

maître de la maison
fr
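Since each raw line shown above looks like a one-key JSON object, another option is to parse it with Python's standard `json` module rather than stripping characters by hand. This is only a sketch: it assumes every line in the files is valid JSON with a `text` or `classification` key, and the helper name `extract_field` is made up for illustration.

```python
import json

def extract_field(lines, key):
    """Parse each JSON line and pull out the value stored under `key`."""
    return [json.loads(line)[key] for line in lines]

# Sample lines copied from the output of the first code block
raw_x = ['{"text":"maître de la maison"}\n', '{"text":"padrone di casa"}\n']
raw_y = ['{"classification":"fr"}\n', '{"classification":"it"}\n']

print(extract_field(raw_x, "text"))
print(extract_field(raw_y, "classification"))
```

The trailing newline needs no special handling because `json.loads` ignores surrounding whitespace.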


Keras has a tokenizer, which is explored further below in the post, but first let’s use nltk (you may need to run `nltk.download('punkt')` once, after `import nltk`).  Tokenizing a phrase means splitting it into its individual words, often referred to as tokens.  

In [3]:
import nltk 

train_wrds = []
train_doc = []

for i in range(len(train_x)):
    wd = nltk.word_tokenize(train_x[i])
    train_wrds.extend(wd)
    train_doc.append((wd, train_y[i]))

train_wrds = sorted(list(set(train_wrds)))
train_doc[0:2]

[(['maître', 'de', 'la', 'maison'], 'fr'), (['padrone', 'di', 'casa'], 'it')]

This step merges the tokens of a phrase with the language label in a list called `train_doc`, and also creates a unique list of the individual tokens.

The next code block identifies the number of languages as class categories, and builds a dictionary to one-hot-encode the language labels with a binary indicator representation as to where among the language categories a particular language is.  One Hot Encoding is a way of numerically representing ‘yes’ and ‘no’.

In [4]:
classes = set(train_y)

one_hot_classes = []
empty_output = [0] * len(classes)

for i in range(len(classes)):
    en = list(empty_output)
    en[i] = 1
    one_hot_classes.append(en)

dds = {}  #create dictionary - alternate method discussed further below
dds = zip(classes, one_hot_classes)
dds = dict(dds)
tag = list(dds.keys())
dds

{'be': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'cr': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
 'cz': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'fr': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
 'ge': [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 'gr': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
 'in': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'it': [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'ja': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
 'ko': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'pg': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'ro': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'so': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
 'sp': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
 'ta': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]}
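A Python `set` does not promise the same iteration order from one run to the next, so the column assigned to each language above can change between sessions. A small sketch of an alternative, assuming the labels are simply sorted first (the `labels` list here is a shortened stand-in for `train_y`, and the variable names are illustrative):

```python
labels = ["fr", "it", "be", "fr", "it"]  # stand-in for train_y

# Sorting gives every run the same label -> column assignment
class_names = sorted(set(labels))

one_hot = {
    name: [1 if i == j else 0 for j in range(len(class_names))]
    for i, name in enumerate(class_names)
}

print(class_names)    # ['be', 'fr', 'it']
print(one_hot["fr"])  # [0, 1, 0]
```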

The following step is to change the `train_doc` from words to numbers.  

The phrase snippet will now be represented as a bag of words, where every word from every phrase creates a universe of words we will call a bag.  For each phrase, or record instance row in data talk, a 1 for yes will appear where that word is positioned in the bag, and 0’s in all other positions.  

As this bag is large, a simple illustration is useful.  

`['I like green', 'I like blue']`

If these are the only two phrases, there are 4 unique words in the bag, `'I'`, `'like'`, `'green'`, and `'blue'`.

For a new statement `'I like red'`, the column positions for `'I'` and `'like'` will be switched on to yes as 1’s, whereas the other two word positions in the bag, `'green'` and `'blue'`, will display 0.  The word `'red'` is not in the bag, so it is simply ignored.  

`['I like red'] >> [1, 1, 0, 0]`
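The toy illustration can be reproduced in a few lines. This is a simplified sketch: it splits on whitespace instead of calling `nltk.word_tokenize`, and the names `bag_words` and `to_bag` are chosen just for this example.

```python
phrases = ["I like green", "I like blue"]

# Build the bag: unique words in order of first appearance
bag_words = []
for phrase in phrases:
    for word in phrase.split():
        if word not in bag_words:
            bag_words.append(word)

def to_bag(phrase):
    """Return 1 for each bag word present in the phrase, else 0."""
    tokens = phrase.split()
    return [1 if word in tokens else 0 for word in bag_words]

print(bag_words)             # ['I', 'like', 'green', 'blue']
print(to_bag("I like red"))  # [1, 1, 0, 0]
```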

Similarly the target label is transformed from a text description of say 'be' for Bangla, to the [1, 0, 0, … 0] representation as in the dictionary previously created.

In [5]:
training= []
output_empty = [0] * len(classes)

for dc in train_doc:
    bag = []  
    token_words = dc[0] 
    for ws in train_wrds:
        bag.append(1 if ws in token_words else 0)
    output_row = list(output_empty)
    output_row[tag.index(dc[1])] = 1
    training.append([bag, output_row])

training[0][1]  #the [1] index signals the output below will display the target label
#training[0][0] will display only the phrase as a one hot encoded representation (one column for each word in the bag)
#training[0] will display both the phrase and target for the first record

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

We now have transformed the text in `train_doc` to a representation in numbers and call this list `training`.

The first record `(['maître', 'de', 'la', 'maison'], 'fr')` is now `([0, 0, ...,0], [0, 0, ...,0])`

The next code block shuffles the records and splits them into two lists: the phrases as `xtrain` and the corresponding labels as `ytrain`. 

In [6]:
import random
random.shuffle(training)

xtrain = []
ytrain = []

for i in range(len(training)):
    xx = training[i][0]
    yy = training[i][1]
    xtrain.append(xx)
    ytrain.append(yy)

The same steps are then repeated for the validation data phrases and the test data phrases.

In [7]:
val_wrds = []
val_doc = []

for i in range(len(val_x)):
    wd = nltk.word_tokenize(val_x[i])
    val_wrds.extend(wd)
    val_doc.append((wd, val_y[i]))
    
val_wrds = sorted(list(set(val_wrds)))

test_wrds = []
test_doc = []

for i in range(len(test_x)):
    wd = nltk.word_tokenize(test_x[i])
    test_wrds.extend(wd)
    test_doc.append((wd, test_y[i]))
    
test_wrds = sorted(list(set(test_wrds)))

val_doc[0:2]
test_doc[0:2]

[(['la', 'notte', 'dei', 'morti', 'viventi'], 'it'),
 (['noaptea', 'mortilor', 'vii'], 'ro')]

In [8]:
val = [] 
test = [] 

for dc in val_doc:
    bag = []  
    token_words = dc[0] 
    for ws in train_wrds:
        bag.append(1 if ws in token_words else 0)
    output_row = list(output_empty)
    output_row[tag.index(dc[1])] = 1
    val.append([bag, output_row])
    
for dc in test_doc:
    bag = []  
    token_words = dc[0] 
    for ws in train_wrds:
        bag.append(1 if ws in token_words else 0)
    output_row = list(output_empty)
    output_row[tag.index(dc[1])] = 1
    test.append([bag, output_row])
    
random.shuffle(val)

xval = []
yval = []

for i in range(len(val)):
    xx = val[i][0]
    yy = val[i][1]
    xval.append(xx)
    yval.append(yy)

xtest = []
ytest = []

for i in range(len(test)):
    xx = test[i][0]
    yy = test[i][1]
    xtest.append(xx)
    ytest.append(yy)

All of these elaborate steps are data pre-processing: preparing the data for modeling, in the format best accepted by the neural network.  

Let’s now construct the neural network architecture.

**Attention Please**

Keras provides a `Sequential` API, which operates like pushing dominoes, where one triggers the next in a sequential order.  There is also a Functional API in Keras which allows for multiple inputs, multiple output, and otherwise architectures that are not restricted to flow sequentially.  Do not let the term API scare you, as it merely implies a functionality that we import in to use.  

`Dense` and `Dropout` are used, where dense simply means the use of a fully connected layer and dropout is a form of regularization.  Fully connected implies each node is connected to each node in the previous and following layers. That’s it.  Dropout is a way to randomly deselect some nodes during training, so they do not influence the learning and subsequently the decision output of the model.  Think of dropout as when the smartest kids in class are absent, and the other students cannot depend on the absentees to quickly raise their hands and supply answers (lol).
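To make the dropout idea concrete, here is a plain-Python sketch of what happens to one layer's outputs during training with a rate of 0.5. It illustrates the technique and is not Keras's internal code; the function name and sample values are invented. The surviving values are scaled up by 1/(1 - rate), which is the same scaling the Keras `Dropout` layer applies during training.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [0.2, 1.5, 0.7, 0.9, 0.3, 1.1]
print(dropout(layer_output, rate=0.5, rng=rng))
```

Each call silences a different random subset of nodes, so the network cannot lean on any single one.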

The architecture can be altered with more or fewer layers, more or fewer neurons, a different optimizer, a different loss function, a different learning rate, varying learning rate decay, different activation functions and kernel initializers.  This is all considered hyperparameter tuning and model construction.  This post aims at demonstrating an example instead of going through each component of machine learning. 

The number of neurons is the number of nodes (units) in a layer.  In the first layer of the architecture this is 10000, and the second layer has 5000 neurons.  With respect to choosing the number of neurons, let’s interpret each layer as focusing on a specific aspect of learning, while each neuron is focusing on aspects of that specific aspect (apologies for getting all fortune cookie on you).  Onward…

The loss is simply how the neural network measures how well it is performing; the optimizer seeks to minimize the loss, and accuracy is tracked as a more readable check on progress.  The `compile` step concludes the definition of the network architecture.  
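For intuition about the numbers the loss produces, the sketch below computes a softmax and the categorical cross-entropy by hand for 15 classes. The scores are invented for illustration, and the helper functions are simplified stand-ins rather than the Keras implementations. A network with no preference among the 15 languages scores ln(15), about 2.71, which is in the same neighbourhood as the first-epoch training loss reported further down.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability given to the correct class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

n_classes = 15
target = [1] + [0] * (n_classes - 1)

# A network with no preference: equal scores for all 15 languages
uniform = softmax([0.0] * n_classes)
print(round(categorical_crossentropy(target, uniform), 4))  # 2.7081, i.e. ln(15)

# A network that favours the correct class gets a lower loss
confident = softmax([5.0] + [0.0] * (n_classes - 1))
print(round(categorical_crossentropy(target, confident), 4))
```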

Next, the model is `fit` to the data, which means trained with the input phrases and the associated language labels.  In this example, a separate data set has been reserved for validation, but the validation can be done by partitioning some of the training data instead.

`scores` looks at the test data, and evaluates how the model performs on unseen data.  Note, the `evaluate` method requires the inputs to be numpy arrays and not lists.  

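`predict_classes` returns the predicted class index for each test phrase, which enables the `classification_report` to display how well each category was predicted.  (In newer Keras releases, where `predict_classes` is no longer available, `model.predict(...).argmax(axis=1)` yields the same class indices.)  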

Let’s look at the actual predictions in the next code block and compare them with the answers.  The phrase in the test data set is “word of the day”.

In [9]:
#This is the architecture below

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
print("Keras backend : ", keras.backend.backend())

input_size = len(train_wrds)

model = Sequential()
model.add(Dense(10000,input_dim=input_size,kernel_initializer="uniform",activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(5000,kernel_initializer="uniform",activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(len(classes),kernel_initializer="uniform",activation="softmax"))
model_optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001 / 3)
model.compile(loss='categorical_crossentropy',
              optimizer=model_optimizer,
              metrics=['accuracy'])

#This is the architecture above

history = model.fit(xtrain, ytrain,
          epochs=3,
          validation_data=(xval, yval),
          batch_size=32,
          verbose=2,
          shuffle=True)
#history outputs the model training

import numpy as np

scores = model.evaluate(np.array(xtest), np.array(ytest), verbose=1)
print("On Test Data %s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

y_pred = model.predict_classes(xtest)
y_prd = keras.utils.to_categorical(y_pred, num_classes = len(classes))
target_names = list(dds.keys())

from sklearn.metrics import classification_report
print(classification_report(np.array(ytest), y_prd, target_names=target_names))

Using TensorFlow backend.


Keras backend :  tensorflow
Train on 300 samples, validate on 30 samples
Epoch 1/3
 - 4s - loss: 2.8009 - acc: 0.1200 - val_loss: 2.4462 - val_acc: 0.2667
Epoch 2/3
 - 1s - loss: 1.2266 - acc: 0.7200 - val_loss: 1.8725 - val_acc: 0.4000
Epoch 3/3
 - 1s - loss: 0.2990 - acc: 0.9733 - val_loss: 1.5209 - val_acc: 0.5667
On Test Data acc: 53.33%
             precision    recall  f1-score   support

         be       0.00      0.00      0.00         1
         pg       1.00      1.00      1.00         1
         it       0.00      0.00      0.00         1
         ro       0.00      0.00      0.00         1
         ko       0.20      1.00      0.33         1
         in       0.00      0.00      0.00         1
         cr       0.00      0.00      0.00         1
         ge       0.50      1.00      0.67         1
         ta       1.00      1.00      1.00         1
         gr       1.00      1.00      1.00         1
         fr       0.00      0.00      0.00         1
         so       0



**We have just run a neural network!**

Performance is okay for an initial pass, and in future posts we can refine this approach and explore more advanced methods.  

Three learning passes over the data (epochs) allowed the model to learn the training data well, but performance on the validation data did not mirror it.  With each pass the network improves, via reduced loss and rising accuracy for both the training and validation sets.  The test set is on par with the validation accuracy, which is reasonable given the small dataset.  

Let’s qualify performance.  On a previous assignment, the model I created reached 96% and I was asked if that was good.  It was terrible!  96% means roughly 1 out of 25 is incorrect.  99% would imply 1 out of 100 is incorrect, and 99.5% would imply 1 out of 200 is incorrect.  The standard for performance is high.  This data set is extremely small, with just 20 phrases per language to train on, so generally more data will help improve the learning.
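The "one mistake in N" arithmetic is easy to check: N is 1 divided by the error rate. The sketch below uses exact fractions to avoid floating-point rounding; the function name is made up for this example.

```python
from fractions import Fraction

def one_mistake_in(accuracy_percent):
    """Convert an accuracy percentage into 'one error every N predictions'."""
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return 1 / error_rate

for acc in ["96", "99", "99.5"]:
    print(acc + "% accuracy -> 1 error in", one_mistake_in(acc))
```

This prints 25, 100 and 200 for the three accuracies.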

Note: once a model has been trained, it can be saved and re-loaded later for use.  

In [10]:
#To see the literal predictions

y_p = model.predict(xtest)
y_p = y_p.argmax(axis=1)
real_Pred=[]
def label():
    for i in range(len(y_p)):
        p = y_p[i]
        pred = tag[int(p)]
        print(pred)
        real_Pred.append(pred)

lb = label() 

fr
ko
so
gr
ko
sp
ta
ko
ge
pg
ko
so
ja
ko
ge


In [11]:
#To see the actual test language labels to compare to above predictions
test_y

['it',
 'ro',
 'so',
 'gr',
 'ko',
 'sp',
 'ta',
 'be',
 'fr',
 'pg',
 'cr',
 'in',
 'ja',
 'cz',
 'ge']

Comparing the two code block outputs above shows that the first two predictions were wrong, the next five were correct, and ultimately 8 of 15 were correct.
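Rather than comparing the two lists by eye, the tally can be scripted. The lists below are copied from the two outputs above; the variable names are illustrative.

```python
predicted = ["fr", "ko", "so", "gr", "ko", "sp", "ta", "ko",
             "ge", "pg", "ko", "so", "ja", "ko", "ge"]
actual = ["it", "ro", "so", "gr", "ko", "sp", "ta", "be",
          "fr", "pg", "cr", "in", "ja", "cz", "ge"]

# True where the prediction matches the real label
hits = [p == a for p, a in zip(predicted, actual)]
correct = sum(hits)

print(correct, "of", len(actual), "correct")                  # 8 of 15 correct
print("accuracy: %.2f%%" % (100 * correct / len(actual)))     # accuracy: 53.33%
```

The result agrees with the test accuracy printed by `model.evaluate`.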

You may have noticed the bag will only get bigger with more samples, and the sparse matrix filled predominantly with zeros will take more space and memory, and quickly become inefficient.  Instead of ‘one hot encoding’ the input data, let’s try ‘integer encoding’ the input data so that a finite number of columns can be used, saving space and memory.  

Let’s use Keras’ `Tokenizer`, `pad_sequences` and `to_categorical` (for one hot encoding), and refer back to the uploaded data, stripped of everything but the phrases and labels, `train_x` and `train_y`.

The max phrase size is 7 words and let’s limit the integer encoding to 1000, which we call `top_words`.

`Tokenizer` has some useful functionality, such as the methods `fit_on_texts`, which creates a vocabulary, and `texts_to_sequences`, which integer encodes, plus the attribute `word_index`, which holds the word-to-integer dictionary. **Wow!**
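To see what integer encoding amounts to, here is a simplified pure-Python imitation. It is a sketch of the idea and not the Keras implementation: it only lowercases and splits on spaces, and it breaks frequency ties alphabetically, so the exact integers can differ from what `Tokenizer` assigns. The function names are invented.

```python
from collections import Counter

def fit_word_index(texts):
    """Map each word to an integer, most frequent first; 0 is left for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {word: i for i, word in enumerate(ranked, start=1)}

def to_sequence(text, word_index):
    """Replace known words by their integer; unknown words are skipped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

texts = ["maître de la maison", "dueño de la tienda"]
word_index_demo = fit_word_index(texts)
print(word_index_demo)
print(to_sequence("la maison de papier", word_index_demo))  # [2, 4, 1]
```

Note how `papier`, which was never seen while fitting, drops out of the encoded sequence.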

`pad_sequences` normalizes the numeric representation for phrases varying between 3 and 7 words, so that every integer encoded sequence ends up with the same number of columns. 
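The padding itself is simple enough to sketch by hand. The function below mimics the default behaviour of `pad_sequences` (zeros added at the front, and over-long sequences trimmed from the front); `pad_pre` is a made-up name and the input sequences are illustrative.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long."""
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed

print(pad_pre([98, 1, 4, 46], 7))              # [0, 0, 0, 98, 1, 4, 46]
print(pad_pre([1, 2, 3, 4, 5, 6, 7, 8, 9], 7)) # [3, 4, 5, 6, 7, 8, 9]
```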

Using the same methods again, with the exception of `fit_on_texts`, allows the validation and test data to reference the same vocabulary for indexing, thus creating the desired same relationships.

That takes care of the inputs.  For the outputs, the scikit-learn library provides `LabelEncoder` to transform the language label to an integer, and Keras’ `to_categorical` converts that integer to one-hot-encoded format (scikit-learn has `LabelBinarizer`, which does this as well).
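The label side can be imitated in plain Python as well. `LabelEncoder` numbers the classes in sorted order, so the sketch below sorts the 15 language codes and then builds the one-hot row; the helper names are invented and the list is just the set of codes used in this post.

```python
languages = ["fr", "it", "be", "ta", "sp", "ja", "ko", "in",
             "ge", "gr", "so", "pg", "cz", "cr", "ro"]

# Sorted unique labels define the integer codes
class_order = sorted(set(languages))
label_to_int = {label: i for i, label in enumerate(class_order)}

def one_hot(label):
    """Vector of zeros with a single 1 at the label's integer position."""
    row = [0] * len(class_order)
    row[label_to_int[label]] = 1
    return row

print(label_to_int["fr"], label_to_int["sp"])  # 3 13
print(one_hot("fr"))
```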

In [12]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences 
from keras.utils import to_categorical

max_phrase_size = 7 #maximum length of the sentence
embedding_vector_length = 3
top_words = 1000

#for inputs - integer encoding
tokenizer = Tokenizer(top_words)
tokenizer.fit_on_texts(train_x)

seqns = tokenizer.texts_to_sequences(train_x)
word_index = tokenizer.word_index
xtrain2 = pad_sequences(seqns, max_phrase_size)

seqns2 = tokenizer.texts_to_sequences(val_x)
xval2 = pad_sequences(seqns2, max_phrase_size)

seqns3 = tokenizer.texts_to_sequences(test_x)
xtest2 = pad_sequences(seqns3, max_phrase_size)

#for label outputs
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(train_y)
encoded_Y = encoder.transform(train_y)
ytrain2 = to_categorical(encoded_Y)

encoded_Y2 = encoder.transform(val_y)
yval2 = to_categorical(encoded_Y2)

encoded_Y3 = encoder.transform(test_y)
ytest2 = to_categorical(encoded_Y3)

print('Found %s unique tokens.' % len(word_index))

Found 668 unique tokens.


Let’s look at a training example in phrase format, padded sequence format, and one-hot-encoded label format to see we have the desired data structure for neural networks.  

In [13]:
print(train_x[0])
print(train_x[31])

print(xtrain2[0])  # can easily set to be a variable and divide by /len(word_index) to max min scale
print(xtrain2[31]) # can easily set to be a variable and divide by /len(word_index) to max min scale

print(ytrain2[0])
print(ytrain2[31])

maître de la maison
dueño de la tienda
[ 0  0  0 98  1  4 46]
[  0   0   0  48   1   4 146]
[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]


Surprisingly, the space-efficient integer encoding creates relationships of distance between numbers that should not exist (word 98 is not “closer” in meaning to word 99 than to word 1), hence the practice of one-hot-encoding.  

This data format as is, DOES NOT perform as well as the first attempt…in the current architecture format that is. (Even after max-min scaling, which reduced the integers to values between 0 and 1)

An embedding layer and matrix is a more advanced technique to explore in another post.

To test your knowledge so far I encourage you to re-run the neural network architecture with these integer encoded input data. 


**But we are not done yet.  One more approach, to provide maximum value to the readers.**

I pulled 50 commonly used words associated with the subject of school, then web scraped the language transformations to create a training data set.  Voila! This is `swlx.txt` & `swly.txt`

The objective is to create a vocabulary of these 50 school related words in each of the 15 languages, and train a model to take in any school related word, phrase, sentence or more and predict the language.  The expectation is that the network will perform well on a sentence that includes one of the 50 words it was trained on, regardless of the other words in the sentence.  The model should also learn some relationships for when a word is the same in multiple languages, but perhaps lose some accuracy.  And it will be interesting to see if the model learns to make decent predictions on non-school related inputs. 

Let’s get to it, before the editors cut me off (lol).

Run the same cleaning operation.

In [14]:
swx = open('swlx.txt', 'r', encoding = "utf8")
swx = swx.readlines()

rep_x = {'{"text":"': '', '"}': '', '\n': ''} 

sw_x=[]
for i in range(len(swx)):
    w = multipleReplace(swx[i], rep_x)
    sw_x.append(w)

print(sw_x[0])

swy = open('swly.txt', 'r', encoding = "utf8")
swy = swy.readlines()

rep_y = {'{"classification":"': '', '"}\n': '', '"}': ''}

sw_y=[]
for i in range(len(swy)):
    w = multipleReplace(swy[i], rep_y)
    sw_y.append(w)

print(sw_y[0])

examen
fr


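`CountVectorizer` has a method `fit_transform`, which learns the vocabulary of our school related tokens and returns a sparse matrix of word counts.  The method `toarray` converts that matrix to a dense array with one column per vocabulary word (effectively a one-hot-style encoding here), which is used as the input data. 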

The `LabelEncoder` class, used above, translates the language labels to numbers and we one-hot-encode the output data. You are getting the hang of it now.  To satisfy any curiosity we display the integer encoded labels and see the range is 0-14, corresponding to the 15 languages.
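As a rough picture of what the vectorizer does, here is a pure-Python imitation. It is a sketch under assumptions: it copies `CountVectorizer`'s default choices of lowercasing and keeping tokens of two or more word characters, the function names are invented, and the three words are a tiny hypothetical stand-in for `sw_x`.

```python
import re

TOKEN = re.compile(r"\b\w\w+\b")  # words of two or more characters

def tokenize(text):
    return TOKEN.findall(text.lower())

def fit_vocabulary(texts):
    """Sorted unique tokens, each mapped to a column index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}

def transform(texts, vocabulary):
    """Count how often each vocabulary word occurs; unseen words are ignored."""
    rows = []
    for t in texts:
        row = [0] * len(vocabulary)
        for w in tokenize(t):
            if w in vocabulary:
                row[vocabulary[w]] += 1
        rows.append(row)
    return rows

school_words = ["examen", "école", "escuela"]  # tiny stand-in for sw_x
vocab = fit_vocabulary(school_words)
print(vocab)
print(transform(["quelle école allez-vous fréquenter"], vocab))  # [[0, 0, 1]]
```

Only the one word the vocabulary knows lights up a column, which is why a sentence can still be classified when most of its words were never seen in training.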

In [15]:
import sklearn
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()
tags = vect.fit_transform(sw_x)

xtrains = tags.toarray()
xtrains.shape

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

encode = LabelEncoder()
encode.fit(sw_y)
encode_y = encode.transform(sw_y)
ytrains = to_categorical(encode_y)
ytrains.shape
encode_y

array([ 3,  7,  0, 14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,
        0, 14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14,
       13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,
        9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,
        4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,  4,  5,
       12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,  4,  5, 12, 10,
        2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1,
       11,  3,  7,  0, 14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,
        7,  0, 14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0,
       14, 13,  8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,
        8,  9,  6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,
        6,  4,  5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,  4,
        5, 12, 10,  2,  1, 11,  3,  7,  0, 14, 13,  8,  9,  6,  4,  5, 12,
       10,  2,  1, 11,  3

Training on the same neural net architecture as above… (no validation set)

In [16]:
modeln = Sequential()
modeln.add(Dense(10000,input_dim=xtrains.shape[1],kernel_initializer="uniform",activation="relu"))
modeln.add(Dropout(0.5))
modeln.add(Dense(5000,kernel_initializer="uniform",activation="relu"))
modeln.add(Dropout(0.5))
modeln.add(Dense(15,kernel_initializer="uniform",activation="softmax"))
model_optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.001 / 5)
modeln.compile(loss='categorical_crossentropy',
              optimizer=model_optimizer,
              metrics=['accuracy'])

historyn = modeln.fit(xtrains, ytrains,
          epochs=5,
          batch_size=32,
          verbose=2,
          shuffle=True)

Epoch 1/5
 - 2s - loss: 2.8778 - acc: 0.0523
Epoch 2/5
 - 1s - loss: 2.0551 - acc: 0.4915
Epoch 3/5
 - 1s - loss: 0.9212 - acc: 0.8902
Epoch 4/5
 - 1s - loss: 0.3421 - acc: 0.9124
Epoch 5/5
 - 1s - loss: 0.3367 - acc: 0.9020


Let’s go straight to prediction, but this time, enter a word, phrase or sentence in one of the languages and let’s evaluate.

In [17]:
trythis = ['quelle école allez-vous fréquenter']

#quelle école allez-vous fréquenter
#se poio scholeío tha parevretheíte
#eotteon haggyoe danil yejeong-ingayo

#Āmi ē'i sēmisṭārē bharti karaba ēbaṁ sam'māna saṅgē snātaka habē
#main is semestar mein naamaankan aur sammaan ke saath snaatak hone kee yojana bana raha hoon
#vou inscrever este semestre e planejo graduar com honras
#Ich werde dieses Semester einschreiben und planen, mit Ehren zu absolvieren

#permite studierea programelor și cumpărarea manualelor
#studia il programma e compra i libri di testo
#hinahayaan ang pag-aaral ng syllabus at bumili ng mga aklat-aralin

new_ = vect.transform(trythis)

use_ =new_.toarray()

y_n = modeln.predict(use_)

y_n = y_n.argmax(axis=1)
y_n

predictions_for_trythis = encode.inverse_transform(y_n)
predictions_for_trythis

array(['fr'],
      dtype='<U2')

How about “test your knowledge” (lol) by creating a test data set and setting up predict_classes.  

Some sample sentences have been commented out that you can copy-paste into `trythis = ['…']` to play with the tool. 


Spend your time getting smarter, follow me on Twitter @calcqu