# Model Validation

This notebook loads the trained model and checks its predictions on sample input strings.
The following files must be in the same directory as the notebook:
- glove.6B.50d.txt
- model.json
- best_model.h5
- emails.txt (contains sample email bodies)

## Initial Setup

In [14]:
import numpy as np
import re
# NLP
from nltk.tokenize.regexp import RegexpTokenizer
from nltk.corpus import stopwords 
from nltk.stem.wordnet import WordNetLemmatizer
import string
from nltk.stem.porter import PorterStemmer

from keras.layers import *
from keras.models import Sequential

## Cleaning and vectorization of input string

In [15]:
def clean(text):
    """Lowercase the text and remove non-letters and stop words, then lemmatize."""
    stop = set(stopwords.words('english'))
    # Domain-specific stop words. Tokens are lowercased single words when they
    # are compared, so the entries "ID" and "thank you" never match.
    stop.update(("to","cc","subject","http","from", "gbp", "usd", "eur", "cad", "sent","thanks", "acc", "ID", "account", "regards", "hi", "hello", "thank you"))
    lemma = WordNetLemmatizer()

    text = text.rstrip()
    # Replace every non-letter (digits, punctuation) with a space.
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    stop_free = " ".join(i for i in text.lower().split() if i not in stop)
    # Lemmatize only; no stemming is applied to the returned text.
    normalized = " ".join(lemma.lemmatize(word) for word in stop_free.split())

    return normalized

# Load the 50-dimensional GloVe vectors into a word -> vector lookup table.
embeddings = {}
with open('./glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coeffs = np.asarray(values[1:], dtype='float32')
        embeddings[word] = coeffs
print(len(embeddings))

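def getOutputEmbeddings(X):
    """Map up to 100 tokens to GloVe vectors; returns a zero-padded (1, 100, 50) array."""
    X = X.split()
    embedding_matrix_output = np.zeros((1, 100, 50))
    for jx in range(min(100, len(X))):
        # Words missing from GloVe keep an all-zero vector instead of raising KeyError.
        vector = embeddings.get(X[jx].lower())
        if vector is not None:
            embedding_matrix_output[0][jx] = vector

    return embedding_matrix_output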


400000
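
The parsing and lookup steps in this cell can be illustrated on a toy table. This is a minimal sketch with made-up 3-dimensional vectors rather than the real GloVe file; the names `parse_glove_line` and `embed_tokens` are hypothetical, and leaving out-of-vocabulary words as zero vectors is one possible choice, not necessarily what the trained model expects.

```python
import numpy as np

MAX_LEN = 100   # time steps, as in getOutputEmbeddings
EMB_DIM = 50    # glove.6B.50d vectors have 50 components

def parse_glove_line(line):
    """Split one GloVe text line into (word, vector)."""
    values = line.split()
    return values[0], np.asarray(values[1:], dtype="float32")

def embed_tokens(text, table, max_len=MAX_LEN, dim=EMB_DIM):
    """Return a (1, max_len, dim) array; unknown words and padding stay zero."""
    out = np.zeros((1, max_len, dim))
    for i, token in enumerate(text.split()[:max_len]):
        vec = table.get(token.lower())
        if vec is not None:
            out[0, i] = vec
    return out

# Toy table with made-up vectors, for illustration only.
toy_lines = ["payment 0.1 0.2 0.3", "progress 0.4 0.5 0.6"]
toy_table = dict(parse_glove_line(line) for line in toy_lines)

batch = embed_tokens("payment unknownword progress", toy_table, max_len=5, dim=3)
print(batch.shape)  # (1, 5, 3)
```

The leading dimension of 1 makes the array a batch of one sequence, which is the shape a Keras model expects when predicting on a single input.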


## Class Labels used for Model Training

In [16]:
# Label order depends on the loaded model and must match the order used in training.
classes = ['BankFailed', 'BankProgress', 'BankComplete', 'BankRequest',
       'ClientProgress', 'ClientStatus', 'ClientComplete', 'ClientFailed']
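
The position of each label matters because the model outputs one score per class, and the index of the highest score is used to look up the label. A minimal sketch of that lookup, using made-up probabilities in place of real model output:

```python
import numpy as np

classes = ['BankFailed', 'BankProgress', 'BankComplete', 'BankRequest',
           'ClientProgress', 'ClientStatus', 'ClientComplete', 'ClientFailed']

# Made-up probabilities for one input, standing in for the model's softmax output.
probs = np.array([[0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05, 0.05]])

# argmax over the last axis picks the most probable class for each row.
index = int(np.argmax(probs, axis=-1)[0])
print(classes[index])  # BankProgress
```

If the list order differs from the label encoding used during training, every prediction is mapped to the wrong name even when the model itself is correct.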

## Load Trained Model

In [17]:
from keras.models import model_from_json

with open("model.json", "r") as file:
    model=model_from_json(file.read())
model.load_weights("best_model.h5")
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 100, 64)           29440     
_________________________________________________________________
dropout_3 (Dropout)          (None, 100, 64)           0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dropout_4 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 8)                 520       
_________________________________________________________________
activation_2 (Activation)    (None, 8)                 0         
Total params: 62,984
Trainable params: 62,984
Non-trainable params: 0
__________________________________________________
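
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard Keras LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and an input dimension of 50, inferred from the 50-dimensional GloVe vectors; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gates, each with (input_dim + units) weights per unit plus one bias."""
    return 4 * (input_dim + units + 1) * units

def dense_params(input_dim, units):
    """One weight per input per unit, plus one bias per unit."""
    return input_dim * units + units

first_lstm = lstm_params(50, 64)    # lstm_3: GloVe vectors in, 64 units
second_lstm = lstm_params(64, 64)   # lstm_4: 64 features in, 64 units
dense = dense_params(64, 8)         # dense_2: 64 features in, 8 classes

print(first_lstm, second_lstm, dense)    # 29440 33024 520
print(first_lstm + second_lstm + dense)  # 62984
```

These values agree with the `Param #` column and the total of 62,984, which suggests that `model.json` describes the expected input shape of 100 time steps with 50 features each.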

## Verify with sample strings

In [19]:
'''test_strings=["Greetings! I want to know about the status of transaction 111222. Please send me the details. Thanks!",
              "Hello, this is to inform you that due to lack of funds, cheque no. 938974814257534238634169 has bounced. Kindly ensure that you have enough funds and then submit a new cheque at the earliest.",
              "This is to inform you that your transaction 123456 is in progress and will be processed in 2-3 days.",
              "random test string"
             ] '''

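# Each line of emails.txt holds one email body.
with open("emails.txt", "r") as emails:
    for test_str in emails:
        print(f"Original input --> {test_str}")
        test_str = clean(test_str)
        print(f"Cleaned input --> {test_str}")
        emb_X = getOutputEmbeddings(test_str)
        #print (emb_X)
        # predict_classes is not available in newer Keras releases;
        # argmax over predict() gives the same class index for a softmax output.
        p = np.argmax(model.predict(emb_X), axis=-1)
        #print (p.shape)
        print(f'Output --> class {classes[p[0]]} ')
        print("\n\n")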


Original input --> Payment of 471862128 CAD to account id 101165 has been made on 19/02/2020 and is in progress, please acknowledge.

Cleaned input --> payment id made progress please acknowledge
Output --> class BankFailed 



Original input --> Hi Tom, This is to inform you that BNY Mellon has fully paid 80517212 USD to Account ID 104276 on 15-01-2020 .

Cleaned input --> tom inform bny mellon fully paid id
Output --> class BankRequest 



Original input --> Greetings Cho, Please find attached details Client Name : HSBC Account ID: 103285 Legal Entity: CitiBank HongKong Currency: CAD Payment Type: Receive Paid Amount: 56327540 Payment Date: 16-10-2019 Payment Status: Processing Pending Amount: 2564636 Thanks.

Cleaned input --> greeting cho please find attached detail client name hsbc id legal entity citibank hongkong currency payment type receive paid amount payment date payment status processing pending amount
Output --> class BankFailed 



Original input --> Joe, Please find atta