# NLP Model

This first model comes from TensorFlow Hub and embeds text as numeric vectors, which can then be fed into a machine learning algorithm down the line. Although it is hosted on TensorFlow Hub, we load it through Keras for ease of use. The model is called BERT, a deep neural network pretrained on the Wikipedia and BooksCorpus text corpora. More information may be found [here](https://tfhub.dev/google/collections/bert/1).

In our particular use case, this model takes a transaction description and converts it into a numeric vector, which is then fed into a classifier that predicts the transaction's category.
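The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the classifier used in this project: the category names, the random stand-in embeddings, and the untrained weights are all assumptions made for the example. It only shows how a batch of 768-dimensional embeddings could be mapped to category probabilities with a linear layer and a softmax.

```python
import numpy as np

# Hypothetical category labels; the real label set is not given in this notebook.
CATEGORIES = ["groceries", "transport", "utilities"]
EMBEDDING_DIM = 768  # width of the pooled BERT output

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def predict_category(embeddings, weights, bias):
    """Apply a linear layer plus softmax to a batch of embeddings."""
    probs = softmax(embeddings @ weights + bias)
    labels = [CATEGORIES[i] for i in probs.argmax(axis=-1)]
    return labels, probs

rng = np.random.default_rng(0)
# Assumption: random vectors stand in for real embeddings of two descriptions.
fake_embeddings = rng.uniform(-1.0, 1.0, size=(2, EMBEDDING_DIM))
# Assumption: untrained weights, used here only to illustrate the shapes.
weights = rng.normal(scale=0.02, size=(EMBEDDING_DIM, len(CATEGORIES)))
bias = np.zeros(len(CATEGORIES))

labels, probs = predict_category(fake_embeddings, weights, bias)
print(labels)       # one predicted category per description
print(probs.shape)  # (2, 3)
```

In practice the weights would be learned from labelled transactions, and the stand-in vectors would be replaced by the embeddings produced by the BERT model built below.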

In [5]:
# import dependencies
import tensorflow_text as text  # not called directly; importing it registers the ops the BERT preprocessor needs
import tensorflow as tf
import tensorflow_hub as hub

# construct our neural network
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/1")
encoder_inputs = preprocessor(text_input) # dict with keys: 'input_mask', 'input_type_ids', 'input_word_ids'
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3",
    trainable=True)
outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]      # [batch_size, 768].
sequence_output = outputs["sequence_output"]  # [batch_size, seq_length, 768].


In [6]:
embedding_model = tf.keras.Model(text_input, pooled_output)
sentences = tf.constant(["(your text here)"])
print(embedding_model(sentences))

tf.Tensor(
[[-8.97940934e-01 -4.21122402e-01 -7.14763403e-01  7.73539960e-01
   5.23894966e-01 -2.22132713e-01  8.59079480e-01  2.56452829e-01
  -6.35725319e-01 -9.99986947e-01 -3.78802389e-01  8.12412560e-01
   9.83724594e-01  2.40791187e-01  9.18967366e-01 -6.08404875e-01
  -2.06732392e-01 -5.68343461e-01  3.09819609e-01 -4.92150992e-01
   6.79648578e-01  9.99840379e-01  4.06327635e-01  3.17327738e-01
   5.78656793e-01  9.68130112e-01 -7.63651133e-01  9.27225471e-01
   9.53164160e-01  6.58056974e-01 -6.57289445e-01  2.50330448e-01
  -9.88110542e-01 -2.39297241e-01 -7.63933182e-01 -9.91781294e-01
   4.83871549e-01 -7.73471415e-01  7.10498425e-04 -1.34689547e-02
  -8.96146536e-01  3.51920128e-01  9.99937892e-01 -9.21945870e-02
   3.70956540e-01 -3.04825932e-01 -1.00000000e+00  2.35189036e-01
  -8.96254420e-01  7.68425107e-01  6.41086936e-01  6.17558002e-01
   1.59086347e-01  4.23474520e-01  4.94475842e-01  2.82409787e-01
   6.78794459e-05  1.54811814e-01 -2.84662157e-01 -6.06176078e-01