
Initialize BERT model

In [1]:
from transformers import TFAutoModel

bert = TFAutoModel.from_pretrained('bert-base-cased')

# we can view the model using the summary method
bert.summary()

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


Model: "tf_bert_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  108310272 
                                                                 
Total params: 108,310,272
Trainable params: 108,310,272
Non-trainable params: 0
_________________________________________________________________
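The 108,310,272 parameters reported by `bert.summary()` can be reproduced by hand. The sketch below assumes the hyperparameters of the published bert-base-cased configuration: a vocabulary of 28,996 tokens, hidden size 768, 12 encoder layers, feed-forward size 3,072, 512 positions and 2 token types. Under those assumptions the count matches the summary exactly. Most of the weights sit in the encoder stack, and about a fifth sit in the word embeddings.

```python
# Hyperparameters assumed from the published bert-base-cased config
VOCAB_SIZE = 28996
HIDDEN = 768
NUM_LAYERS = 12
INTERMEDIATE = 3072
MAX_POSITIONS = 512
TYPE_VOCAB = 2


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm_params(width: int) -> int:
    """LayerNorm has one scale (gamma) and one shift (beta) per feature."""
    return 2 * width


def bert_param_count() -> dict:
    embeddings = (
        VOCAB_SIZE * HIDDEN          # word embeddings
        + MAX_POSITIONS * HIDDEN     # position embeddings
        + TYPE_VOCAB * HIDDEN        # token-type (segment) embeddings
        + layer_norm_params(HIDDEN)
    )
    per_layer = (
        3 * dense_params(HIDDEN, HIDDEN)        # query, key, value projections
        + dense_params(HIDDEN, HIDDEN)          # attention output projection
        + layer_norm_params(HIDDEN)
        + dense_params(HIDDEN, INTERMEDIATE)    # feed-forward expansion
        + dense_params(INTERMEDIATE, HIDDEN)    # feed-forward contraction
        + layer_norm_params(HIDDEN)
    )
    pooler = dense_params(HIDDEN, HIDDEN)       # dense layer applied to the [CLS] token
    encoder = NUM_LAYERS * per_layer
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "pooler": pooler,
        "total": embeddings + encoder + pooler,
    }


counts = bert_param_count()
for name, n in counts.items():
    print(f"{name:>10}: {n:,}")   # total: 108,310,272, as in the summary
```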


In [2]:
import tensorflow as tf

# two input layers
input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(512,), name='attention_mask', dtype='int32')

# Access the transformer model within our bert object using the bert attribute (e.g. bert.bert instead of bert)
embeddings = bert.bert(input_ids, attention_mask=mask)[0]  # access final activations with [0]

# convert bert embeddings into 2 output classes
x = tf.keras.layers.LSTM(32, dropout=.3, recurrent_dropout=.3, return_sequences=True)(embeddings)
x = tf.keras.layers.LSTM(16, dropout=.4, recurrent_dropout=.4, return_sequences=False)(x)
# normalize
x = tf.keras.layers.BatchNormalization()(x)
# output
x = tf.keras.layers.Dense(64, activation='relu')(x)
y = tf.keras.layers.Dense(2, activation='softmax', name='outputs')(x)
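The classification head consumes the full `last_hidden_state`, a (batch, 512, 768) tensor, rather than only the pooled [CLS] vector. The sketch below traces the shapes through each head layer and counts its parameters with the standard Keras formulas for LSTM, BatchNormalization and Dense layers. The 768 hidden size is assumed from the bert-base-cased config, and the helper functions are illustrative, not part of Keras.

```python
# Illustrative helpers using the standard Keras parameter formulas (not Keras APIs)
SEQ_LEN = 512     # matches the Input(shape=(512,)) layers
BERT_DIM = 768    # hidden size of bert-base-cased (assumed from its config)


def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates (input, forget, cell, output), each with kernel, recurrent kernel and bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


def batchnorm_params(width: int) -> int:
    # gamma and beta are trainable; the moving mean and variance are not
    return 4 * width


# (layer, output shape, parameter count) for the head defined above
head = [
    ("LSTM(32, return_sequences=True)", (None, SEQ_LEN, 32), lstm_params(BERT_DIM, 32)),
    ("LSTM(16, return_sequences=False)", (None, 16), lstm_params(32, 16)),
    ("BatchNormalization", (None, 16), batchnorm_params(16)),
    ("Dense(64, relu)", (None, 64), dense_params(16, 64)),
    ("Dense(2, softmax)", (None, 2), dense_params(64, 2)),
]

print(f"{'BERT last_hidden_state':<34} {str((None, SEQ_LEN, BERT_DIM)):<16}")
for name, shape, params in head:
    print(f"{name:<34} {str(shape):<16} {params:>8,}")

head_total = sum(p for _, _, p in head)
print(f"head parameters: {head_total:,}")
```

The head adds roughly 107k parameters on top of BERT's 108M, so almost all trainable weights live in the transformer. One design consequence: because `recurrent_dropout` is non-zero, Keras cannot use its fused cuDNN LSTM kernel, so the first LSTM steps through all 512 positions with the slower generic implementation.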

Define model, specifying input and output layers.

In [4]:
# initialize model
model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)

# print out model summary
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_ids (InputLayer)         [(None, 512)]        0           []                               
                                                                                                  
 attention_mask (InputLayer)    [(None, 512)]        0           []                               
                                                                                                  
 bert (TFBertMainLayer)         TFBaseModelOutputWi  108310272   ['input_ids[0][0]',              
                                thPoolingAndCrossAt               'attention_mask[0][0]']         
                                tentions(last_hidde                                               
                                n_state=(None, 512,                                         

Initialize training parameters:

In [5]:
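# InverseTimeDecay reproduces the legacy `decay` argument: lr = 1e-4 / (1 + 1e-6 * step)
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=1e-4, decay_steps=1, decay_rate=1e-6)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)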
loss = tf.keras.losses.CategoricalCrossentropy()
acc = tf.keras.metrics.CategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[acc])
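The `1e-6` decay setting applies inverse-time decay once per optimizer update: lr = 1e-4 / (1 + 1e-6 * step). The short calculation below uses plain Python, so no TensorFlow is needed, and shows how slowly that shrinks the rate.

```python
# Inverse-time decay as applied by the legacy Keras `decay` argument
# (equivalently, InverseTimeDecay with decay_steps=1):
#   lr(step) = initial_lr / (1 + decay * step)
INITIAL_LR = 1e-4
DECAY = 1e-6


def decayed_lr(step: int, initial_lr: float = INITIAL_LR, decay: float = DECAY) -> float:
    return initial_lr / (1.0 + decay * step)


for step in (0, 1_000, 10_000, 100_000):
    print(f"step {step:>7,}: lr = {decayed_lr(step):.4e}")
```

Even after 100,000 update steps the rate is still about 91% of its starting value. On runs much shorter than that, the schedule is effectively a constant 1e-4.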



Load in training and validation datasets

In [6]:
element_spec = ({'input_ids': tf.TensorSpec(shape=(64, 512), dtype=tf.int32, name=None),
                 'attention_mask': tf.TensorSpec(shape=(64, 512), dtype=tf.int32, name=None)},
                tf.TensorSpec(shape=(64, 2), dtype=tf.float64, name=None))

# load the training and validation sets
train_ds = tf.data.Dataset.load('train', element_spec=element_spec)
val_ds = tf.data.Dataset.load('val', element_spec=element_spec)

# view the input format
train_ds.take(1)

<TakeDataset element_spec=({'input_ids': TensorSpec(shape=(64, 512), dtype=tf.int32, name=None), 'attention_mask': TensorSpec(shape=(64, 512), dtype=tf.int32, name=None)}, TensorSpec(shape=(64, 2), dtype=tf.float64, name=None))>
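The saved datasets already contain tokenized, padded inputs, and the preprocessing itself is not shown here. A minimal sketch of what each example presumably looks like before batching follows. It assumes a maximum length of 512, a padding id of 0 (the [PAD] id in the BERT vocabularies) and integer class labels 0/1. Token ids are truncated or padded to 512, the attention mask marks real tokens with 1 and padding with 0, and labels become float one-hot vectors (stored as float64 in the saved spec) to match `CategoricalCrossentropy`. In practice a Hugging Face tokenizer call such as `tokenizer(text, max_length=512, truncation=True, padding='max_length')` produces the ids and mask directly. The helpers below are hypothetical and only illustrate the format. The static batch dimension of 64 in `element_spec` suggests the data was batched with `drop_remainder=True` before saving, though that is an inference.

```python
from typing import List, Tuple

MAX_LEN = 512      # matches the TensorSpec shape (64, 512)
PAD_ID = 0         # assumed [PAD] token id
NUM_CLASSES = 2    # matches the (64, 2) label spec


def pad_and_mask(token_ids: List[int], max_len: int = MAX_LEN,
                 pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: truncate/pad ids to max_len and build the attention mask."""
    ids = token_ids[:max_len]            # naive truncation (a tokenizer would keep [SEP])
    mask = [1] * len(ids)                # 1 = real token the model attends to
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding   # 0 = padding, ignored


def one_hot(label: int, num_classes: int = NUM_CLASSES) -> List[float]:
    """Hypothetical helper: float one-hot target for CategoricalCrossentropy."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


# Hypothetical token ids for one short text
ids, mask = pad_and_mask([101, 1188, 1110, 170, 2774, 102])
print(len(ids), len(mask), sum(mask))   # 512 512 6
print(one_hot(1))                       # [0.0, 1.0]
```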

Train model

In [None]:
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=1
)

Finally, save the model!

In [None]:
model.save('hate_detection_model')