# Speeches by UK Members of Parliament

Other notebooks in this series:
* <a href='https://www.kaggle.com/code/andrewsale/speech-scraping'>Scraping notebook</a>, used to create <a href='https://www.kaggle.com/datasets/andrewsale/uk-political-speeches'>the dataset</a>.
* <a href='https://www.kaggle.com/code/andrewsale/speeches-data-wrangling'>Wrangling notebook</a>, tidying up the dataset.
* <a href='https://www.kaggle.com/code/andrewsale/speeches-sampling'>Sampling notebook</a>, creating the short speech samples used for training the model in this notebook.

## The model

The aim is to build an app in which you can enter the text of a real speech given by a politician, or make up your own. The app will then predict if the speech was given by someone from the left-leaning Labour Party, or the right-leaning Conservative Party.

The model behind the app is constructed using TensorFlow. However, since it will be exported to TensorFlow.js, we are limited as to which layers we can include.

The app is available <a href="https://andrewsale.github.io/MP-app.github.io/">here</a>, and there are two Github repositories: <a href="https://github.com/andrewsale/MP-app.github.io">one for the app</a>, and <a href="https://github.com/andrewsale/MPWhoSaidThat">another for the model building</a>.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.regularizers import l2
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, log_loss
from time import time
import re
import json

# Preprocessing: tokenizing the text

The Keras tokenizer converts text to lower case and removes punctuation before splitting on white space and converting words to integers. We fit the tokenizer to the full training set speech text (not the samples), limiting the vocabulary to 30,000 words.

We then tokenize the speech samples, loaded from the <a href='https://www.kaggle.com/code/andrewsale/speeches-sampling'>sampling notebook</a>. These samples are about 100 words each, and we use the `pad_sequences` function from Keras to pad or truncate the sequences to this length.

### A note on other tokenizing techniques

There are other effective tokenizing techniques, such as <a href="https://www.tensorflow.org/text/guide/subwords_tokenizer">subword tokenizing</a>. However, we needed a simple tokenizing method that could be reproduced in JavaScript.
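
To illustrate why this scheme is easy to port, here is a minimal pure-Python sketch of the steps described above: lower-casing, punctuation removal, whitespace splitting, integer lookup, and padding to a fixed length. The helper `simple_tokenize` and the toy dictionary are hypothetical and not part of the notebook. The sketch assumes the Keras defaults (the default `filters` string, and `pad_sequences` padding and truncating at the front), and it may differ from Keras on unusual whitespace characters.

```python
# Characters removed by the Keras Tokenizer's default `filters` argument.
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_tokenize(text, word_index, num_words, maxlen=100):
    """Hypothetical helper mimicking Tokenizer + pad_sequences with default settings."""
    # Replace every filtered character with a space, then split on whitespace.
    table = str.maketrans({ch: ' ' for ch in KERAS_DEFAULT_FILTERS})
    words = text.lower().translate(table).split()
    # Unknown words, and words ranked outside the vocabulary limit, are dropped.
    ids = [word_index[w] for w in words
           if w in word_index and word_index[w] < num_words]
    ids = ids[-maxlen:]                      # truncate at the front: keep the last tokens
    return [0] * (maxlen - len(ids)) + ids   # pad with zeros at the front

# Toy word-to-index dictionary, for illustration only.
toy_index = {'the': 1, 'house': 2, 'will': 3, 'vote': 4}
print(simple_tokenize('The House will vote, today!', toy_index, num_words=5, maxlen=6))
```

The same few steps (a character filter, a split, a dictionary lookup and a fixed-length pad) are what a JavaScript port would need to reproduce.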

In [None]:
# Load the training data to initialize the tokenizer
full_train_df = pd.read_csv('../input/speeches-sampling/full_train.csv')
# Load the sampled speeches for conversion
train_df = pd.read_csv('../input/speeches-sampling/train.csv')
val_df = pd.read_csv('../input/speeches-sampling/val.csv')
test_df = pd.read_csv('../input/speeches-sampling/test.csv')

In [None]:
vocab_size=30000
# Instantiate the tokenizer using the training data
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(full_train_df['Speech'])

In [None]:
# Convert speeches to sequences of tokens and put into tf dataset
BATCH_SIZE = 128
def tokenize_to_ds(df, batch_size=BATCH_SIZE):
    tokenized = tokenizer.texts_to_sequences(df.Speech)
    tokenized = pad_sequences(tokenized,maxlen=100)
    labels = df.Label.tolist()
    return tf.data.Dataset.from_tensor_slices((tokenized, labels)).cache().batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_ds = tokenize_to_ds(train_df)
val_ds = tokenize_to_ds(val_df)
test_ds = tokenize_to_ds(test_df)

In [None]:
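# Create the vocabulary dictionary.
# Note that the word_index attribute includes ALL words the tokenizer saw
# when fitting, even though vocab size was limited.
# The tokenizer only keeps words with index < num_words when converting text,
# so we apply the same cutoff here to keep the exported dictionary consistent.
vocab = {}
for word, index in tokenizer.word_index.items():
    if index < vocab_size:
        vocab[word] = index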
        
# Write the vocab dictionary to a json file for loading in javascript
with open('tokenizer_dictionary.json', 'w') as file:    
    json.dump(vocab, file)

# Global build and test functions

These functions are used to perform a grid search and obtain some useful performance statistics. This includes the performance on entire speeches in the validation set, obtained by predicting on 100-word samples from each speech and aggregating the results. They also visualize the results.
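
As a rough illustration of the aggregation idea, the sketch below splits a speech into overlapping 100-word windows and averages the per-window probabilities into a single prediction. The helpers `sliding_windows` and `aggregate` are hypothetical names used only for this example, and the probabilities are made up rather than produced by a model.

```python
def sliding_windows(words, size=100, overlap=50):
    """Split a list of words into overlapping windows of `size` words."""
    if len(words) < size:
        return [words]          # short speeches are kept as a single segment
    step = size - overlap
    return [words[i:i + size] for i in range(0, len(words) - size + 1, step)]

def aggregate(segment_probs, threshold=0.5):
    """Average the segment probabilities and threshold the mean to get a label."""
    mean_prob = sum(segment_probs) / len(segment_probs)
    return mean_prob, int(mean_prob >= threshold)

# A 250-word dummy speech gives windows starting at words 0, 50, 100 and 150.
windows = sliding_windows(list(range(250)))
print(len(windows), [w[0] for w in windows])

# Made-up probabilities for three segments of one speech.
print(aggregate([0.9, 0.6, 0.3]))
```

Averaging probabilities is only one possible aggregation rule; a majority vote over the segment labels would be an alternative.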

In [None]:
def fit_model(model_build_fn,
              X_train,
              X_val,
              y_train = None,
              y_val = None,
              n_repeats=1, 
              n_epochs=10, 
              early_stopping_patience = None,
              verbose=0,
              model_save_suffix='',
              **model_kwargs
             ):
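    '''
    Builds a model with model_build_fn and fits it n_repeats times (with the same parameters).
    
    Args:
        model_build_fn:  function that builds and compiles a TF model
        X_train:  input training data (a tf.data.Dataset yielding (inputs, labels) batches)
        X_val:  input validation data (same format as X_train)
        y_train:  training labels; leave as None when X_train is a tf.data.Dataset
        y_val:  validation labels; leave as None when X_val is a tf.data.Dataset
        n_repeats: number of times to fit the model
        n_epochs: number of epochs for each fit
        early_stopping_patience: int or None, patience for early stopping on val_loss
        verbose: int, verbosity passed to model.fit
        model_save_suffix: str, appended to the filename of the saved models
        **model_kwargs: specify any parameters to customize for the creation of the model 
        
    Returns:
        dict logging the parameters, predictions, training histories and summary statistics
        
    File Outputs:
        model  --- saving the best model from each run
    '''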
    # Prepare dict to log histories of the runs
    history_log = model_kwargs.copy()
    history_log['Parameters'] = str(model_kwargs)
    history_log['Parameter keys'] = model_kwargs.keys()
    history_log['Number of runs'] = n_repeats
    run_times = []
    
    for i in range(n_repeats):        
        # Build model
        model = model_build_fn(**model_kwargs)
        
        # Establish callbacks
        if early_stopping_patience is not None:
            early_stopping_monitor = EarlyStopping(patience=early_stopping_patience)
#             callbacks = [early_stopping_monitor, TensorBoard(run_logdir)]
            callbacks = [early_stopping_monitor]
        else:
#             callbacks = [TensorBoard(run_logdir)]
            callbacks = []
        model_filename = f'model{model_save_suffix}_run_{i}'
        model_save = ModelCheckpoint(model_filename, 
                                     save_best_only=True, 
                                     save_format='tf'
                                    )
        callbacks.append(model_save)
    
        # Fit model  
        t0=time()
        history = model.fit(X_train, 
                            y_train,
                            epochs=n_epochs, 
                            validation_data=X_val, #(X_val, y_val), 
                            callbacks=callbacks, 
                            verbose = verbose)
        # Append the time for this run to the list
        run_times.append((time()-t0)/60)
        
        # Get vocab size
#         history_log['Vocab size'] = model.get_layer(name='Vectorizer').vocabulary_size()
        
        # Make predictions on training and validation sets
        model = tf.keras.models.load_model(model_filename)  
#                                            custom_objects={"AddPositionalEncoding":AddPositionalEncoding,
#                                                            "positional_encoding":positional_encoding
#                                                           }
#                                          )
        history_log[f'y_train_pred_run_{i}'] = model.predict(X_train)
        history_log[f'y_val_pred_run_{i}'] = model.predict(X_val)
        
        # Get loss and accuracy on val set when using the full speeches
        full_val_loss, full_val_accuracy = val_performance(model, speech_segments_ds, speech_segments_df)
        history_log[f'y_full_val_loss_run_{i}'] = full_val_loss
        history_log[f'y_full_val_accuracy_run_{i}'] = full_val_accuracy
        
        # Extract the actual labels from the tf datasets when no label arrays are given
        if y_train is None:
            y_train1 = np.concatenate(list(X_train.map(lambda x,y:y).as_numpy_iterator()), axis=0)
        else:
            y_train1 = np.asarray(y_train)
        if y_val is None:
            y_val1 = np.concatenate(list(X_val.map(lambda x,y:y).as_numpy_iterator()), axis=0)
        else:
            y_val1 = np.asarray(y_val)
        history_log['y_train_actual'] = y_train1
        history_log['y_val_actual'] = y_val1
        # On the first run add the number of trainable parameters to the log dict
        if i==0:
            set_of_trainable_weights = model.trainable_weights
            trainable_count = int(sum([tf.keras.backend.count_params(p) for p in set_of_trainable_weights])) # counts trainable variables
            history_log['trainable parameters'] = trainable_count
            
        # Add the history of the current run to the log dict
        keys = history.history.keys()
        for key in keys:
            history_log[f'run_{i}_{key}'] = history.history[key]
        
        # Get val_loss trend over last 6 epochs
        actual_num_epochs = len(history.history['val_loss'])
        history_log['Number of epochs'] = actual_num_epochs
        first_epoch_for_slope = max(actual_num_epochs-6, 0)
        slopes = []
        for epoch in range(first_epoch_for_slope,actual_num_epochs):
            dy = history.history['val_loss'][actual_num_epochs-1] - history.history['val_loss'][first_epoch_for_slope]
            dx = actual_num_epochs - epoch
            slopes.append(dy/dx)
        history_log[f'Val_loss trend at end run {i}'] = np.mean(slopes)
    
    # Get mean end trend
    history_log[f'Val_loss mean trend at end'] = np.mean([history_log[f'Val_loss trend at end run {i}'] for i in range(n_repeats)])
    
    # Add the run time data to the log dict
    history_log['Mean run time (mins)'] = sum(run_times)/len(run_times)
    
    # Collect stats of the runs in the log dict
    for key in keys:        
        final_key_values = [history_log[f'run_{i}_{key}'][-1] for i in range(n_repeats)]
        # Lower is better for losses, higher is better for all other metrics
        best = min if 'loss' in key else max
        best_key_values = [best(history_log[f'run_{i}_{key}']) for i in range(n_repeats)]

        history_log[f'mean_{key}'] = sum(final_key_values) / n_repeats
        history_log[f'best_final_{key}'] = best(final_key_values)
        history_log[f'best_anytime_{key}'] = best(best_key_values)
        history_log[f'std_final_{key}'] = (sum(
            [(history_log[f'mean_{key}'] - x)**2 for x in final_key_values]) / n_repeats) ** 0.5  
    full_val_losses = [history_log[f'y_full_val_loss_run_{i}'] for i in range(n_repeats)]
    full_val_accuracies = [history_log[f'y_full_val_accuracy_run_{i}'] for i in range(n_repeats)]
    history_log['best_anytime_full_val_accuracy'] = max(full_val_accuracies)
    history_log['best_anytime_full_val_loss'] = min(full_val_losses)
    history_log['mean_full_val_loss'] = np.mean(full_val_losses)
    history_log['mean_full_val_accuracy'] = np.mean(full_val_accuracies)
        
    return history_log

In [None]:
def grid_search(model_build_fn,
                X_train,
                X_val,
                y_train=None,
                y_val=None,
                n_repeats=1, 
                n_epochs=10, 
                early_stopping_patience = None,
                verbose=0,
                **model_kwargs
               ):
    '''
    Runs fit_model for each parameter combination for those provided in model_kwargs.
    Provide each model_kwarg as a list of suitable objects.
    
    Args:
        model_build_fn: function that builds and compiles the model
        X_train: training data
        X_val: validation data
        y_train: if X_train is not a TF dataset, then the training labels should be here
        y_val: if X_val is not a TF dataset, then the validation labels should be here
        n_repeats: number of times to fit each parameter combination
        n_epochs: number of epochs for each fit
        early_stopping_patience: int, number of epochs after which fitting stops if no
                improvement in val_loss observed 
        verbose: int, controls how much is printed
        model_kwargs: provide keyword model args each as LIST.
        
    Returns:
        List of dictionaries with model performance and history data
    '''
    param_combinations = []
    total_combs = 1
    number = {}
    for param in model_kwargs.keys():
        number[param] = len(model_kwargs[param])
        total_combs *= number[param]
    for i in range(total_combs):
        this_comb = {}
        cum_num=1
        for param in number.keys():
            n = int(i/cum_num)
            this_comb[param] = model_kwargs[param][n % number[param]]
            cum_num *= number[param]
        param_combinations.append(this_comb)
    history_logs = []
    for i, kwargs in enumerate(param_combinations):
        history_log = fit_model(model_build_fn,
                                X_train,
#                                 y_train,
                                X_val,
#                                 y_val,
                                n_repeats=n_repeats, 
                                n_epochs=n_epochs, 
                                early_stopping_patience = early_stopping_patience,
                                verbose=verbose,
                                model_save_suffix=f'_params_{i}',
                                **kwargs
                               )
        history_logs.append(history_log)
    return history_logs

In [None]:
def confusion_matrices(history_log, title=''):
    '''
    Plots confusion matrices based on the given history dictionary 
    from the fit_model function
    '''
    def prob_to_pred(x):
        if x < 0.5:
            return 0
        else:
            return 1
        
    def confmat_to_plot(confmat, axis, subtitle=''):
        sns.heatmap(confmat, annot=True, fmt='d', cbar=False, linewidths=.5, ax=axis)
        axis.set_xticklabels(['Predicted Tory', 'Predicted Labour'])
        axis.set_yticklabels(['Actual Tory', 'Actual Labour'])
        axis.set_title(subtitle)
        
    # Extract predictions from log into df
    train_preds = { f'y_train_pred_run_{i}' : [prob_to_pred(x) for x in history_log[f'y_train_pred_run_{i}']]  for i in range(history_log['Number of runs']) }
    val_preds = { f'y_val_pred_run_{i}' : [prob_to_pred(x) for x in history_log[f'y_val_pred_run_{i}']]  for i in range(history_log['Number of runs']) }
    
    # Print classification reports
    for i,run in enumerate(train_preds.keys()):
        print('-----------------------')
        print(f'Training set, run {i}:')
        print('-----------------------')  
        print(classification_report(history_log[f'y_train_actual'], train_preds[run]))  # print classification report
    
    for i,run in enumerate(val_preds.keys()): 
        print('-------------------------')
        print(f'Validation set, run {i}:')
        print('-------------------------')  
        print(classification_report(history_log[f'y_val_actual'], val_preds[run]))  # print classification report
        
    # Make plots for train + val set runs
    fig, ax = plt.subplots(nrows=2,ncols=history_log['Number of runs'], figsize = (4*history_log['Number of runs'], 8) )
    if history_log['Number of runs'] == 1:
        ax = ax.reshape((2,1))
    # Make plots for train runs
    for i,run in enumerate(train_preds.keys()):
        confmat_to_plot(confusion_matrix(history_log[f'y_train_actual'], train_preds[run]), axis=ax[0,i], subtitle=f'Training set, run {i}')
            
    # Make plots for val set runs
    for i,run in enumerate(val_preds.keys()):        
        confmat_to_plot(confusion_matrix(history_log[f'y_val_actual'], val_preds[run]), axis=ax[1,i], subtitle=f'Validation set, run {i}')
    plt.suptitle(title)
    plt.show()

In [None]:
def visualize_log(history_log_list):
    '''
    Visualizes the history of the grid search.
    
    Args:
        history_log_list : list of data from grid search
    '''
    # Establish the df and display parameters for reader
    print('-----------------------\nParameter lookup table:\n-----------------------')
    log_df = pd.DataFrame(history_log_list)
    log_df.index.name = 'Parameter Index'
    params_cols = set()
    for i in range(len(log_df)):
        params_cols = params_cols.union(set(log_df.loc[i,'Parameter keys']))
    other_cols = ['trainable parameters','mean_val_accuracy','best_anytime_val_accuracy', 'Mean run time (mins)',
                 'best_anytime_full_val_loss','best_anytime_full_val_accuracy',
                 'mean_full_val_loss','mean_full_val_accuracy', 'Val_loss mean trend at end',
                 'Number of runs', 'Number of epochs']
    display(log_df[list(params_cols)+other_cols])
    log_df  = log_df.reset_index()
    log_df.to_csv('log_df.csv')
    
    # Bar charts with best accuracies    
    fig = px.bar(data_frame=log_df, x=range(len(log_df)), y='mean_val_accuracy', hover_name='Parameter Index', title='Comparing mean validation accuracies')
    fig.show()
    fig = px.bar(data_frame=log_df, x=range(len(log_df)), y='best_anytime_val_accuracy', hover_name='Parameter Index', title='Comparing optimum validation accuracies')
    fig.show()
    
    # Loss plots
    def metric_df(metric='loss', run=0):
        padded_losses_dict = {}
        length = max([len(log_df.loc[i,f'run_{run}_{metric}']) for i in range(len(log_df))])
        for i in range(len(log_df)):
            this_loss = log_df.loc[i,f'run_{run}_{metric}']
            while len(this_loss) < length:
                this_loss.append(np.nan)
            padded_losses_dict[log_df.loc[i,'Parameter Index']] = this_loss
        return pd.DataFrame( data=padded_losses_dict )
    
    for metric in ['loss', 'val_loss', 'accuracy', 'val_accuracy']:
        for run in range(log_df['Number of runs'].min()):
            fig = px.line(data_frame=metric_df(metric=metric, run=run),
                    title=f'Curves for {metric} of run {run} for each parameter set'
                   )
            fig.update_layout(legend_title_text='Parameter index')
            fig.show()
    
    # Show confusion matrices
    for history_log in history_log_list:
        confusion_matrices(history_log, 
                           title=f'Confusion matrix for {history_log["Parameters"]}'
                          )        

In [None]:
# This function is for preparing the data for testing the model performance 
# on the full speeches from the validation set. This will mimic the planned
# actual usage.

def full_val_samples(overlap = 50):
    '''
    Splits each full speech in the validation set into overlapping 100-word
    segments and tokenizes them.
    
    Args:
        overlap: int, number of words shared by consecutive segments
        
    Returns:
        tf dataset of the tokenized speech segments
        pandas dataframe of the speech segments, with the index of the speech
                each segment came from
    '''
    full_val_df = pd.read_csv('../input/speeches-sampling/full_val.csv', index_col=0)
    full_X_val = full_val_df['Speech']
    full_y_val = full_val_df['Label']
    speech_segments = []
    labels = []
    speech_indices = []
    for index, speech_label in enumerate(zip(full_X_val, full_y_val)):
        speech = speech_label[0]
        label = speech_label[1]
        split_speech = speech.split(' ')
        length = len(split_speech)
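        if length < 100:
            segments = [speech]
        else:
            segments = []
            i=0
            while i <= length - 100:
                # Join the words back into a string so the tokenizer applies the
                # same lower-casing and punctuation filtering as for the samples
                segments.append(' '.join(split_speech[i:i+100]))
                i+=(100-overlap)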
        speech_segments += segments
        labels += [label]*len(segments)
        speech_indices += [index]*len(segments)
    speech_segments_df = pd.DataFrame({'Speech index':speech_indices,
                              'Speech': speech_segments,
                              "Label":labels
                             })
    speech_segments_ds = tokenize_to_ds(speech_segments_df, batch_size=512)
    
    return speech_segments_ds, speech_segments_df

In [None]:
def val_performance(model, speech_segments_ds, speech_segments_df):
    '''
    Measures the performance of the given model on the full length speeches from
    the validation set.
    
    Args:
        model: the model object
        speech_segments_ds: tf dataset with the segmented full speeches (as output by
                the function full_val_samples)
        speech_segments_df: pandas dataframe with the same segmented full speeches 
                (as output by the function full_val_samples)
    
    Returns:
        float: loss
        float: accuracy
    '''
    # get predicted proba for each segment
    segment_probs = model.predict(speech_segments_ds)
    # add to dataframe
    speech_segments_df['Probability'] = segment_probs
    # get mean prediction proba for each speech
    pred_df = speech_segments_df.groupby('Speech index')[['Label','Probability']].mean()
    # get predicted label
    pred_df['Prediction'] = pred_df['Probability'].apply(lambda x: 0 if x<0.5 else 1)
    # get loss and accuracy
    loss = log_loss(pred_df['Label'], pred_df['Probability'])
    accuracy = accuracy_score(pred_df['Label'], pred_df['Prediction'])
    return loss, accuracy

# Model architecture<a id='architecture'></a>

The model is built in the function `build_model` below. The model can be customized with various arguments, but the general structure is as follows:

* **Input**: The model expects the input to be sequences of 100 integers. Speeches/speech segments should be tokenized first before being plugged into the model. It is possible to include this step inside the model with a TextVectorization layer. However, these layers cannot be exported to TensorFlow.js (as we intend to do).
* **Embedding layer**: the input sequences are embedded into $n$-dimensional space. You may choose $n$, and you may also try using embeddings pretrained using GloVe: for `embed_dim`, enter either "glove 50" or "glove 50 trainable" to use the 50-dimensional embeddings with the layer frozen or trainable, respectively. GloVe embeddings may be 50, 100, 200 or 300 dimensional.
* **Convolutional layers**: the embedded strings are passed through a sequence of three 1-dimensional convolutional layers. There is a skip connection (so we can propagate through three Conv1D layers or just one layer). The number of filters and filter size can be chosen with keywords `Conv_filters` and `Conv_filter_size`. Batch normalization is applied between each convolution layer and its activation function.
* **LSTM layers**: the user can specify the number of LSTM layers with the keyword `LSTM_layers`, whether they are bidirectional, with `BRNN` (a boolean), the number of units in each layer, `LSTM_units`, and what regularization to use: `LSTM_kernel_reg` requires an initialized Keras regularizer (e.g. l1 or l2; we used l2), `LSTM_dropout` expects a float between 0 and 1.
* **Dense layers to output**: before being passed through a single unit dense layer for the final output, we pass the data through some Dense layers of diminishing dimension. The initial dimension is given by `Dense_units`, the dimension is halved in each successive layer, with a total number of layers (excluding the final output layer) given by `Dense_layers`. Make `Dense_units` a power of 2, large enough so that each layer has an integer dimension! Regularization is applied through dropout layers.
* **Dropout layers**: dropout layers are scattered throughout the model, with a dropout rate controlled by `Dropout_rate`.

The model is compiled using an Adam optimizer, with a learning rate given by `learning_rate`.

If using the GloVe embeddings, then these need to be initialized too. Use the `get_glove_embedding_layer` to do this, which returns an embedding layer with pretrained weights.

Finally, there is an option to load an existing model for further training (useful when training takes more than the 12 hours permitted on Kaggle servers). Set `saved_model_path` to the model path and all other parameters are ignored.
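
Two of the conventions described above, the `embed_dim` string format and the halving of the dense layer widths, can be sketched in plain Python. The helpers `parse_embed_dim` and `dense_widths` are hypothetical and only illustrate the conventions; they are not how the notebook itself is organized.

```python
import re

GLOVE_DIMS = (50, 100, 200, 300)

def parse_embed_dim(embed_dim):
    """Return (dimension, use_glove, trainable) for an `embed_dim` setting."""
    if isinstance(embed_dim, int):
        # A plain integer means a freshly initialized, trainable embedding.
        return embed_dim, False, True
    match = re.fullmatch(r'glove (\d+)( trainable)?', embed_dim)
    if match is None or int(match.group(1)) not in GLOVE_DIMS:
        raise ValueError(f'Unrecognized embed_dim: {embed_dim!r}')
    return int(match.group(1)), True, match.group(2) is not None

def dense_widths(dense_units, dense_layers):
    """Widths of the hidden dense layers: each is half of the previous one."""
    return [dense_units // (2 ** i) for i in range(dense_layers)]

print(parse_embed_dim('glove 50 trainable'))
print(parse_embed_dim(32))
print(dense_widths(128, 3))
```

With `Dense_units=128` and `Dense_layers=3`, the hidden dense layers would have 128, 64 and 32 units, followed by the single-unit output layer.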

In [None]:
def get_glove_embedding_layer(vocab, embed_dim=300, trainable=False):
    '''
    Creates an embedding layer using GloVe embedding vectors.
    
    Args:
        vocab: dictionary of word to index for the tokenizer
        embed_dim: int, dimension of the embedding
        trainable: bool, whether the layer is trainable or not
        
    Returns:
        Embedding layer
    '''
    # First get the vectors            
    glove_dict = {}
    if embed_dim not in [50,100,200,300]:
        raise ValueError('Glove embedding dimensions are 50, 100, 200 or 300 only.')
    filename = f'../input/glove6b/glove.6B.{embed_dim}d.txt'
    with open(filename) as file:
        for line in file:
            values = line.split()
            word = values[0]
            vector = values[1:]
            glove_dict[word] = np.asarray(vector, dtype='float32')
    # Get vocab dict, word to index, from tokenizer
    vocab_dict = vocab
    # Next create the matrix
    embedding_matrix = np.zeros((len(vocab_dict)+1, embed_dim))
    for word, index in vocab_dict.items():
        embedding_vector = glove_dict.get(word)
        if embedding_vector is not None:
            embedding_matrix[index] = embedding_vector
    # Now create and return the layer
    return layers.Embedding(input_dim = len(vocab_dict)+1,
                     output_dim = embed_dim,
                     trainable = trainable,
                     weights = [embedding_matrix]
                    )

In [None]:
def build_model(embed_dim = 32,
               Conv_filters=None,
               Conv_filter_size=3,
               LSTM_layers=2, 
               LSTM_units=64, 
               LSTM_kernel_reg=None,
               LSTM_dropout=0,
               BRNN=False,
               Dense_layers=1, 
               Dense_units=128, 
               Dropout_rate=0,
               learning_rate=0.001,
               saved_model_path=None                
      ):
    '''
    Builds and compiles neural network.
    
    Args:
        embed_dim: int or string. If int then created embedding layer with this
                dimension. If string should either be 'glove 50' or 'glove 50 
                trainable', where 50 can be replaced by 100, 200 or 300 and
                refers to the dimension of the embedding using the GloVe
                word embeddings.
        Conv_filters: int, number of filters in each convolutional layer
        Conv_filter_size: int, size of each filter
        LSTM_layers: int, number of LSTM layers 
        LSTM_units: int, number of the units per LSTM layer
        LSTM_kernel_reg: initialized keras regularizer, or None
        LSTM_dropout: float, between 0 and 1. Dropout rate used in LSTM layers
        BRNN: bool, whether to use bidirectional LSTM layers or not
        Dense_layers: int, number of dense layers
        Dense_units: int, dimension of first dense layer. Following dense layer
                dimensions are halved each time
        Dropout_rate: float, between 0 and 1. Dropout rate for all non-LSTM
                dropout layers.
        learning_rate: float, determines the learning rate
        saved_model_path: None or string with model path. If None, then no model
                is loaded. If model path provided then all other params are
                ignored.        
        
    Returns:
        compiled model
    '''
    # Load an existing model instead of building a new one, if a path is given
    if saved_model_path is not None:
        model = tf.keras.models.load_model(saved_model_path)
        return model
    
    string_vec = tf.keras.Input(shape=(100,), dtype=tf.int32, name='input_layer')
    
    
    
    # create embedding layer
    if isinstance(embed_dim, int):
        embed_layer = layers.Embedding(vocab_size+1, embed_dim)
    else:
        if embed_dim in [f'glove {x}' for x in [50,100,200,300]]:
            glove_dim = int(re.findall(r'\d+', embed_dim)[0])
            embed_layer = get_glove_embedding_layer(vocab, embed_dim=glove_dim, trainable=False)
        elif embed_dim in [f'glove {x} trainable' for x in [50,100,200,300]]:
            glove_dim = int(re.findall(r'\d+', embed_dim)[0])
            embed_layer = get_glove_embedding_layer(vocab, embed_dim=glove_dim, trainable=True)
        else:
            raise ValueError('Embedding layer format not recognized. \
                             Should be either integer for trainable embedding of that dimension, \
                             or either "glove x" or "glove x trainable", for x = 50, 100, 200, 300.')
        
        embed_dim = glove_dim
            
    x = embed_layer(string_vec)
            
    x = layers.Dropout(Dropout_rate)(x)
    
    if Conv_filters is not None:
        x1 = layers.Conv1D(filters=Conv_filters, kernel_size=Conv_filter_size, padding = 'same', activation=None)(x)
        x1 = layers.BatchNormalization()(x1)
        x1 = layers.Activation('relu')(x1)
        x1 = layers.Conv1D(filters=Conv_filters, kernel_size=Conv_filter_size, padding = 'same', activation=None)(x1)
        x1 = layers.BatchNormalization()(x1)
        x1 = layers.Activation('relu')(x1)
        x1 = layers.Conv1D(filters=Conv_filters, kernel_size=Conv_filter_size, padding = 'same', activation=None)(x1)
        x1 = layers.BatchNormalization()(x1)
        # skip connection
        x2 = layers.Conv1D(filters=Conv_filters, kernel_size=Conv_filter_size, padding = 'same', activation=None)(x)
        x2 = layers.BatchNormalization()(x2)
        # add
        x = layers.Add()([x1,x2])
        x = layers.Activation('relu')(x)
    
    # add LSTM layers
    if BRNN:
        if LSTM_layers>1:
            for i in range(LSTM_layers-1):
                x = layers.Bidirectional(layers.LSTM(LSTM_units, 
                                                     return_sequences = True, 
                                                     kernel_regularizer = LSTM_kernel_reg, 
                                                     recurrent_dropout=LSTM_dropout, 
                                                     name = f'LSTM_{i}'))(x)
        else:
            i=0
        x = layers.Bidirectional(layers.LSTM(LSTM_units, recurrent_dropout=LSTM_dropout, name = f'LSTM_{i+1}'))(x)
        
    else:
        if LSTM_layers>1:
            for i in range(LSTM_layers-1):
                x = layers.LSTM(LSTM_units, 
                                return_sequences = True, 
                                kernel_regularizer = LSTM_kernel_reg, 
                                recurrent_dropout=LSTM_dropout, 
                                name = f'LSTM_{i}')(x)
        else:
            i=0
        x = layers.LSTM(LSTM_units, recurrent_dropout=LSTM_dropout, name = f'LSTM_{i+1}')(x)
        
    # Another dropout layer
    x = layers.Dropout(Dropout_rate)(x)
    
    # we pass it through a few dense layers, with diminishing dimension, to get the output
    for i in range(Dense_layers):
        x = layers.Dense(Dense_units // (2**i), activation = 'relu')(x)  # integer number of units
        x = layers.Dropout(Dropout_rate)(x)
        
    output = layers.Dense(1, activation = 'sigmoid')(x)

    model = tf.keras.Model(inputs = string_vec, outputs = output, name='Transformer_encoder')
    
    # compile
    Adam = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss='binary_crossentropy', optimizer=Adam, metrics=['accuracy'])
    
    return model

# The grid search begins

We build the model, display a summary with one set of parameters, and then fit it using all combinations of those given in the `grid_search` function.  Finally, performance data is given and visualized after fitting is completed.
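
For reference, enumerating "all combinations" of a set of parameter lists can be sketched with `itertools.product` from the standard library. The helper `param_grid` below is hypothetical. It yields the same set of combinations as a nested loop over the lists, although possibly in a different order than `grid_search` uses, so the parameter indices need not match.

```python
from itertools import product

def param_grid(**param_lists):
    """Return one dict per combination of the given parameter lists."""
    keys = list(param_lists)
    value_lists = [param_lists[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]

# Illustrative values only: 2 x 3 = 6 combinations.
combos = param_grid(LSTM_units=[32, 64], Dropout_rate=[0.1, 0.3, 0.5])
print(len(combos))
print(combos[0])
```

Because the number of combinations is the product of the list lengths, the total training time grows quickly as more values are added to each list.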

In [None]:
# Load data for full speech validation tests.
speech_segments_ds, speech_segments_df = full_val_samples()

In [None]:
# Build the model and view a summary.
model = build_model(
    embed_dim = 'glove 50 trainable',
    Conv_filters=128,
    Conv_filter_size=5,
    LSTM_layers=3, 
    LSTM_units=64, 
    LSTM_kernel_reg=None,
    LSTM_dropout=0.3,
    BRNN=True,
    Dense_layers=3, 
    Dense_units=128, 
    Dropout_rate=0.3,
    learning_rate = 0.00001
#     saved_model_path = '/kaggle/input/speeches-classification-model-trials-v5/model_params_0_run_0'
)
model.summary()

In [None]:
reg = l2(0.01)
reg2 = l2(0.1)
n_repeats = 1
history_log = grid_search(build_model, train_ds, val_ds, n_repeats = n_repeats, 
                          n_epochs=18, early_stopping_patience=8, verbose=1,
                                        embed_dim = ['glove 50 trainable'],
                                        Conv_filters=[128],
                                        Conv_filter_size=[5],
                                        LSTM_layers=[3], 
                                        LSTM_units=[64], 
                                        LSTM_kernel_reg=[reg],
                                        LSTM_dropout=[0.3],
                                        BRNN=[True],
                                        Dense_layers=[3], 
                                        Dense_units=[128], 
                                        Dropout_rate=[0.3],
                                        learning_rate = [0.00015]
#                                               saved_model_path = ['/kaggle/input/speeches-classification-model-trials-v5/model_params_0_run_0']
                                  )

In [None]:
visualize_log(history_log)