# Train model and generate beer names
### by Thiago Akio Nakamura

This is the second part of the little project to train a neural network to create a beer name for my latest craft beer creation. The [first part can be found here]().

In this second part, we'll build the training dataset from the pre-processed data obtained in the first part, then create and train the neural network, and finally use the trained network to create a few beer names for us.

In [1]:
import pandas as pd
import numpy as np
import pickle
import h5py

from sklearn.model_selection import train_test_split

from keras.models import Model
from keras.layers import Input, Dense, Dropout
from keras.layers import LSTM, concatenate, Lambda
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import TensorBoard, EarlyStopping, ModelCheckpoint, Callback
from keras.layers.wrappers import Bidirectional
from keras import backend as K

import time

Using TensorFlow backend.


First we define a few hyperparameters. The network reads beer names through a window of 30 characters (`NAME_LENGTH`), padding or truncating when necessary.

In [2]:
NAME_LENGTH = 30
LSTM_SIZE = 128
DENSE_SIZE = 128
STYLE_SIZE = 16
DROPOUT_RATE = 0.3

BASE_DIR = './'
DATA_DIR = BASE_DIR + 'data/'
DATA_FILE = DATA_DIR + 'beers_data.csv'

STAMP = 'beer_name'
sample = None
if sample is not None:
    STAMP = 'sampled_' + STAMP
    
MODEL_FILE = BASE_DIR + STAMP + '.{epoch:02d}.hdf5'
LOGS_DIR = BASE_DIR + 'logs/' + STAMP

VAL_SPLIT = 0.1
MAX_EPOCHS = 100

MODEL_FILE

'./beer_name.{epoch:02d}.hdf5'

In [3]:
%%time
df = pd.read_csv(DATA_FILE)
if sample is not None:
    df = df.sample(sample)
    
# Drop rows with a missing name or style.
df = df.dropna(subset=['name', 'style'])
df.head()

CPU times: user 87.5 ms, sys: 14.5 ms, total: 102 ms
Wall time: 104 ms


We'll create a character-level tokenizer and add special tokens for `<NULL>` and `<END>`, which help delimit the beer name. We also create dictionaries mapping characters and styles to integer indices, which is what the neural network consumes.

In [4]:
tokenizer = Tokenizer(filters='', lower=False, char_level=True)
# With char_level=True every character of every name is counted.
tokenizer.fit_on_texts(df['name'].tolist())
with open(BASE_DIR + 'tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)

In [5]:
with open(BASE_DIR + 'tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
char_2_idx = tokenizer.word_index
char_2_idx['<NULL>'] = 0
char_2_idx['<END>'] = len(char_2_idx)
idx_2_char = { v: k for k, v in char_2_idx.items() }

NUM_CHARS = len(idx_2_char)
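The resulting index layout is: characters ranked by frequency start at 1 (the Keras `Tokenizer` reserves 0), `<NULL>` is mapped to 0, and `<END>` takes the next free index after the last character. The sketch below is a rough plain-Python illustration of that layout on two made-up names; it is not Keras's own code, and `build_char_index` is a hypothetical helper. The `toy_` prefix avoids overwriting the notebook's real dictionaries.

```python
from collections import Counter

def build_char_index(names):
    """Hypothetical helper mimicking the index layout used above:
    characters ranked by frequency start at 1, 0 is <NULL>, and <END>
    takes the next free index after the last character."""
    counts = Counter(ch for name in names for ch in name)
    # sorted() is stable, so ties keep first-appearance order
    ranked = sorted(counts, key=counts.get, reverse=True)
    char_2_idx = {ch: i for i, ch in enumerate(ranked, start=1)}
    char_2_idx['<NULL>'] = 0
    char_2_idx['<END>'] = len(char_2_idx)
    idx_2_char = {v: k for k, v in char_2_idx.items()}
    return char_2_idx, idx_2_char

toy_char_2_idx, toy_idx_2_char = build_char_index(['IPA', 'Pale Ale'])
print(toy_char_2_idx)
```

Because `<END>` is assigned `len(char_2_idx)` after `<NULL>` has been added, the indices stay contiguous from 0 to `NUM_CHARS - 1`, which is what the softmax output layer expects.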

In [6]:
style_2_idx = { s: i for i, s in enumerate(sorted(set(df['style'].tolist()))) }
idx_2_style = { v: k for k, v in style_2_idx.items() }
NUM_STYLES = len(idx_2_style)

In [7]:
print('Number of beer names:', df.shape[0])
print('Number of beer styles:', NUM_STYLES)
print('Number of distinct characters:', NUM_CHARS)

Number of beer names: 52505
Number of beer styles: 174
Number of distinct characters: 78


Next we create the actual dataset used to train the network. The network will be trained to "create" a beer name character by character, i.e., given the preceding characters of a beer name, it should predict the next character. The network also takes the beer style into consideration.

To build the dataset, we go through all the names in our data and, for each character in each name, create a data point with the "previous" sequence of characters and the beer style as input and the "next" character as the label.
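For a single name, this construction can be sketched as follows. The sketch assumes hypothetical indices (I=5, P=1, A=2, `<END>`=8) and a window width of 4 instead of 30 so the result is easy to read; `make_windows` is an illustrative helper, not part of the notebook.

```python
import numpy as np

def make_windows(tokens, end_idx, width):
    """Sketch of the (input window, next character) pairs for one name.

    tokens: integer indices of the name's characters (0 is <NULL>).
    Each row of the returned inputs is a window of `width` indices and
    the matching label is the character that follows that window.
    """
    seq = [0] * width + list(tokens) + [end_idx]  # left-pad with <NULL>, append <END>
    n = len(seq) - width
    inputs = [seq[i:i + width] for i in range(n)]
    labels = [seq[i + width] for i in range(n)]
    return np.array(inputs), np.array(labels)

# Toy name "IPA" with hypothetical indices and a window width of 4
toy_x, toy_y = make_windows([5, 1, 2], end_idx=8, width=4)
print(toy_x)
print(toy_y)
```

The first window is all `<NULL>`, so the network also learns how names start. The notebook's `get_training_seq` additionally emits one window ending in `<END>`, whose label is the last element of the following row (the next name's all-`<NULL>` window), so it is labelled `<NULL>`. Since generation stops at `<END>`, that extra example is harmless.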

In [8]:
%%time
from itertools import islice

def window(seq, n):
    """Return a sliding window (of width n) over data from the iterable:
    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ..."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def get_training_seq(names, styles, name_length=NAME_LENGTH):
    assert styles.shape[0] == names.shape[0]
    names = tokenizer.texts_to_sequences(names)
    names = [name + [char_2_idx['<END>']] for name in names]
    names = [pad_sequences([name], maxlen=name_length + len(name))[0] for name in names]
    
    x = []
    s = []
    for name, style in zip(names, styles):
        windows = list(window(name, name_length))
        window_style = [style_2_idx[style] for _ in windows]
        x.extend(windows)
        s.extend(window_style)
    x = np.array(x)
    s = np.array(s)
    # The label of each window is the character that follows it, i.e. the
    # last character of the next window.
    y = np.zeros(x.shape[0])
    y[:-1] = x[1:, -1]
    
    return x.astype('int32'), s.astype('int32'), y.astype('int32')

x, s, y = get_training_seq(df['name'], df['style'])

CPU times: user 13.8 s, sys: 758 ms, total: 14.5 s
Wall time: 15.2 s


In [9]:
%%time
x_t, x_v, s_t, s_v, y_t, y_v = train_test_split(x, s, y, test_size=VAL_SPLIT)

CPU times: user 337 ms, sys: 34.9 ms, total: 372 ms
Wall time: 469 ms


In [10]:
assert x_t.shape[0] == s_t.shape[0]
assert x_t.shape[0] == y_t.shape[0]
assert x_v.shape[0] == s_v.shape[0]
assert x_v.shape[0] == y_v.shape[0]
print('Number of training samples:', x_t.shape[0])
print('Number of validation samples:', x_v.shape[0])
print('Input sequence size:', x_t.shape[1])

Number of training samples: 887322
Number of validation samples: 98592
Input sequence size: 30


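Next we create the neural network model. The 30-character name window is one-hot encoded and fed to an LSTM, while the style index is one-hot encoded and passed through a small dense layer. The two outputs are concatenated, passed through dropout, and fed to a softmax layer that predicts the next character.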

In [11]:
one_hot_shape = (NAME_LENGTH, NUM_CHARS)

name_input = Input(shape=(NAME_LENGTH, ), dtype='int32', name='name_input')
input_one_hot = Lambda(K.one_hot,
                       arguments={'num_classes': NUM_CHARS},
                       output_shape=one_hot_shape)(name_input)

name_lstm = LSTM(LSTM_SIZE, dropout=DROPOUT_RATE, 
                 recurrent_dropout=DROPOUT_RATE,
                 name='name_lstm')(input_one_hot)

style_one_hot_shape = (NUM_STYLES,)
style_input = Input(shape=(1, ), dtype='int32', name='style_input')
style_one_hot = Lambda(K.one_hot,
                       arguments={'num_classes': NUM_STYLES},
                       output_shape=style_one_hot_shape)(style_input)
style_dense = Dense(STYLE_SIZE, activation='relu', name='style_dense')(style_one_hot)

merged = concatenate([style_dense, name_lstm])
merged = Dropout(DROPOUT_RATE)(merged)
prediction = Dense(NUM_CHARS, activation='softmax', name='prediction')(merged)

model = Model(inputs=[name_input, style_input], outputs=prediction)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
style_input (InputLayer)         (None, 1)             0                                            
____________________________________________________________________________________________________
name_input (InputLayer)          (None, 30)            0                                            
____________________________________________________________________________________________________
lambda_2 (Lambda)                (None, 174)           0           style_input[0][0]                
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 30, 78)        0           name_input[0][0]                 
___________________________________________________________________________________________
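The summary shows the `Lambda` layers turning integer indices into one-hot tensors, e.g. `(None, 30)` becomes `(None, 30, 78)` for the name input. A small NumPy sketch of the same transformation, using `np.eye` indexing as a stand-in for `K.one_hot` on a hypothetical random batch of two windows:

```python
import numpy as np

num_chars = 78    # number of distinct characters reported above
name_length = 30  # window width (NAME_LENGTH)

# Hypothetical batch of 2 name windows holding integer character indices
batch = np.random.randint(0, num_chars, size=(2, name_length))
# np.eye(n)[indices] picks one identity-matrix row per index, i.e. a one-hot
# vector with a single 1.0 at the position given by the index
one_hot = np.eye(num_chars, dtype='float32')[batch]
print(one_hot.shape)
```

Doing the encoding inside the model (rather than storing one-hot arrays) keeps the roughly 887k training windows as compact `int32` arrays in memory.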

In [12]:
# Callback that generates sample beer names at the end of every epoch.
class Sample(Callback): 
    def __init__(self, styles, diversity, max_sampled_length=NAME_LENGTH):
        super(Sample, self).__init__()
        self.styles = styles
        self.diversity = diversity
        self.max_sampled_length = max_sampled_length
        
    def sample(self, preds, temperature=1.0):
        # helper function to sample an index from a probability array
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)
            
    def on_epoch_end(self, epoch, logs={}):
        print('\n----- %d -----' % epoch)
        for test_style_idx in self.styles:
            print('----- %s -----' % idx_2_style[test_style_idx])
            for diversity in self.diversity:
                name_seed = np.zeros((1, NAME_LENGTH))
                generated = ''
                next_index = 0
                count = 0
                while (next_index != char_2_idx['<END>'] and count < self.max_sampled_length):
                    preds = self.model.predict([name_seed, np.array([test_style_idx])], verbose=0)[0]
                    next_index = self.sample(preds, diversity)
                    next_char = idx_2_char[next_index]

                    if next_index != char_2_idx['<NULL>']:
                        generated += next_char
                    name_seed = np.delete(np.concatenate((name_seed, [[next_index]]), axis=1), 0, 1)
                    count = count + 1

                print('(%f) %s' % (diversity, generated))
        print('-------------')
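The `sample` helper is temperature sampling: it divides the log-probabilities by a temperature, renormalises, and draws one index from the result. The "diversity" (later "creativity") values are these temperatures. Below 1.0 the distribution is sharpened toward the most likely characters, and above 1.0 it is flattened, giving stranger names. A minimal sketch of just the rescaling step on a hypothetical three-character distribution (`apply_temperature` is an illustrative name):

```python
import numpy as np

def apply_temperature(preds, temperature):
    """Rescale a probability vector the same way Sample.sample does:
    divide the log-probabilities by the temperature and renormalise."""
    logits = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

preds = np.array([0.6, 0.3, 0.1])  # hypothetical next-character distribution
for t in (0.5, 1.0, 1.5):
    print(t, np.round(apply_temperature(preds, t), 3))
```

At temperature 1.0 the distribution is unchanged, which is why a creativity of 1 samples directly from the model's predictions.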

In [13]:
# Callback to monitor training in TensorBoard.
tensor_board = TensorBoard(log_dir=LOGS_DIR)
# Stop training when the validation loss stops improving (patience of 3 epochs).
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
# Save all epochs so we can evaluate afterwards.
model_checkpoint = ModelCheckpoint(MODEL_FILE, monitor='val_loss')
# Sample at every epoch from styles 1 and 5 with 3 levels of creativity.
sample = Sample([1, 5], [0.5, 1.0, 1.2])
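The `patience=3` setting means training continues through a few epochs without improvement before stopping. A simplified sketch of that rule on hypothetical validation losses follows. It is not Keras's `EarlyStopping` code, which also supports options such as `min_delta` and `mode`, and exactly when its counter is checked has varied between Keras versions.

```python
def stopping_epoch(val_losses, patience):
    """Simplified patience rule (sketch): return the index of the epoch at
    which training stops, i.e. once `patience` consecutive epochs fail to
    beat the best validation loss so far. Returns None if it never stops."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical losses: best at epoch 2, then three epochs without improvement
print(stopping_epoch([2.1, 1.9, 1.8, 1.85, 1.82, 1.81, 1.7], patience=3))
```

Because `ModelCheckpoint` saves a file for every epoch, the epochs after the best one are still available for comparison.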

Finally, we train the model.

In [14]:
model.fit([x_t, s_t], y_t, validation_data=([x_v, s_v], y_v), batch_size=256,
          epochs=MAX_EPOCHS, callbacks=[tensor_board, early_stopping, model_checkpoint, sample])

Next, we sample a few beer names for several styles using the checkpoints saved at different training epochs.
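Each name is generated autoregressively: start from an all-`<NULL>` window, predict the next index, slide the window one step, and repeat until `<END>` or the length limit. The sketch below isolates that loop. It uses a hypothetical 5-symbol vocabulary and a fake predictor in place of `model.predict`, and picks the most likely index (greedy) instead of sampling with a temperature.

```python
import numpy as np

NULL_IDX, END_IDX = 0, 4  # hypothetical indices in a 5-symbol vocabulary
WIDTH = 6                 # stand-in for NAME_LENGTH

def fake_predict(window):
    """Hypothetical stand-in for model.predict: puts all probability on
    the symbol after the window's last one, capped at <END>."""
    probs = np.zeros(5)
    probs[min(window[-1] + 1, END_IDX)] = 1.0
    return probs

def generate(predict, width, max_len):
    window = np.zeros(width, dtype=int)  # all-<NULL> seed
    out = []
    for _ in range(max_len):
        next_idx = int(np.argmax(predict(window)))  # greedy pick for the sketch
        if next_idx == END_IDX:
            break
        if next_idx != NULL_IDX:
            out.append(next_idx)
        # Slide the window: drop the oldest index, append the new one
        window = np.append(window[1:], next_idx)
    return out

print(generate(fake_predict, WIDTH, max_len=30))
```

Unlike this sketch, the notebook's loop also appends the `<END>` token to the generated string, which is why most names below end in `<END>`; names without it stopped at the 30-character limit.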

In [15]:
from keras.models import load_model
import tensorflow as tf

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

# Test samples from epoch 0, 20, 40 and 60.
epochs_to_sample = ['00', '20', '40', '60']
model_names = [ STAMP + '.' + epoch + '.hdf5' for epoch in epochs_to_sample ]
# Get samples for styles: Saison, Belgian Dubbel, Hefeweizen, Porter and American IPA.
test_styles = [154, 30, 101, 144, 13]
# Get names for different levels of creativity.
test_creativity = [0.2, 0.5, 1.0, 1.2, 1.5]
# Max name length.
max_sampled_length = NAME_LENGTH
# Number of samples per situation.
sample_each = 3

for model_name, epoch in zip(model_names, epochs_to_sample):
    print('\n----- Epoch %s -----' % epoch)
    model.load_weights(model_name)
    for test_style_idx in test_styles:
        print('  ----- Style %s -----' % idx_2_style[test_style_idx])
        for creativity in test_creativity:
            print('    ----- Creativity %f -----' % creativity)
            for j in range(sample_each):
                name_seed = np.zeros((1, NAME_LENGTH))
                generated = ''
                next_index = 0
                count = 0
                while (next_index != char_2_idx['<END>'] and count < max_sampled_length):
                    preds = model.predict([name_seed, np.array([test_style_idx])], verbose=0)[0]
                    next_index = sample(preds, creativity)
                    next_char = idx_2_char[next_index]

                    if next_index != char_2_idx['<NULL>']:
                        generated += next_char
                    name_seed = np.delete(np.concatenate((name_seed, [[next_index]]), axis=1), 0, 1)
                    count = count + 1

                print('    %s' % generated)
        print('  -------------\n')
    print('-------------\n')


----- Epoch 00 -----
  ----- Style Saison / Farmhouse Ale -----
    ----- Creativity 0.200000 -----
    Saris Stout<END>
    Barrin Stout<END>
    Sarian Stout<END>
    ----- Creativity 0.500000 -----
    Boun Stout<END>
    Sous Gonder Stout<END>
    Suaniss<END>
    ----- Creativity 1.000000 -----
    Ostiico Balia Criphan Norr Sin
    Bhaachuste Perion<END>
    Alain Imperrial<END>
    ----- Creativity 1.200000 -----
    Kparuacobaccer<END>
    Unmingon<END>
    Speat Juelilunoke Cadortat Bri
    ----- Creativity 1.500000 -----
    7.15 IPi<END>
    Hernh Nilfy A<END>
    Tteppesporrcn Srack<END>
  -------------

  ----- Style Belgian Belgian Dubbel -----
    ----- Creativity 0.200000 -----
    Berre Darder Blend<END>
    Barrer Barrel Ale<END>
    Stout Breck Ale<END>
    ----- Creativity 0.500000 -----
    Souble Berres Ale<END>
    Barill Mangin<END>
    Barrion Berner<END>
    ----- Creativity 1.000000 -----
    Cazer Paperiy Grazc<END>
    My Musnenbans<END>
    Dopen Farke Pe

    Flun Special Wood<END>
    Paroke Fudzofa<END>
    Granty Vaje<END>
    ----- Creativity 1.200000 -----
    Porter Porter<END>
    Stout Alebarley Best<END>
    Plack Banjum Ecsleite<END>
    ----- Creativity 1.500000 -----
    Natoms & Whost Shitty Wonds #7
    Ab Pub Patiummer' Et Temo<END>
    Owy Stous<END>
  -------------

  ----- Style American IPA -----
    ----- Creativity 0.200000 -----
    Pale Ale<END>
    Strang Stout<END>
    Stout Ale<END>
    ----- Creativity 0.500000 -----
    Barrel Aged Coffee IPA<END>
    Perition India Pale Ale<END>
    Imperial Pale Ale<END>
    ----- Creativity 1.000000 -----
    Shishrong Fna<END>
    Dog Struffist Ale<END>
    Pontition Lager<END>
    ----- Creativity 1.200000 -----
    In Tra kjustei<END>
    Paleand Spphopmay<END>
    Allielua Wheat Lager<END>
    ----- Creativity 1.500000 -----
    Imperial Mout Pale Ale<END>
    Wh tzepn'a FramtypElanas<END>
    Un Wui I1 Siwtt Yeysos<END>
  -------------

-------------


----- Epoch 60 

Finally, we create a few names for the beer of interest.

In [17]:
americanipa = 13 # The style we want
creativity = 1 # Creativity we want
for j in range(50):
    name_seed = np.zeros((1, NAME_LENGTH))
    generated = ''
    next_index = 0
    count = 0
    while (next_index != char_2_idx['<END>'] and count < max_sampled_length):
        preds = model.predict([name_seed, np.array([americanipa])], verbose=0)[0]
        next_index = sample(preds, creativity)
        next_char = idx_2_char[next_index]

        if next_index != char_2_idx['<NULL>']:
            generated += next_char
        name_seed = np.delete(np.concatenate((name_seed, [[next_index]]), axis=1), 0, 1)
        count = count + 1

    print('%s' % generated)

Japivalr Pale Ale<END>
Gone Epa Peanco<END>
Decteni Ina<END>
Celthuart Special Rour 2014<END>
Detter Pale Ale<END>
G.R.P.A<END>
Scat IPA<END>
Whar One In Ficon<END>
Bourbon Barrel Aged Nustlurder
Soctomawartle<END>
Roos<END>
NEGNMGTIIN IIS Stout<END>
Iulundy Zows Mrjc<END>
Skmale Zei Porter<END>
Dix Red Saison<END>
Farmhouse Red Ale<END>
Anchites Norter Steut<END>
Cassion Golden Ale<END>
Freee Grested Black IPA<END>
Essnule<END>
Gurank Mtone Red<END>
Barrel Aged Buster Neseppy<END>
Heaver<END>
The Old Casch IPA<END>
Tillate India Pale Ale<END>
Aich Catan IPA<END>
Seasy<END>
India Pale Ale<END>
Strange Dunkel<END>
Freckel's Rye<END>
TOI<END>
Tuvelies Double IPA<END>
Rattern<END>
Imperial Chocolate Ale<END>
Upleess IPA<END>
IPA Stout<END>
Chipsgen - Cuvee Imperial<END>
Tutero IPA<END>
Nille Ale<END>
Red Jack IPA<END>
Simun Monush<END>
Tirter IPA<END>
S-ple Of A<END>
Aniginal Ipa<END>
Poxeconta Double IPA<END>
II Beike<END>
Verry IPA<END>
American Cider<END>
Puluggnving<END>
Hornady Jaccu