## Use BERT to obtain the hidden state (ht) of the sentence (CLS+text+SEP+aspect), concatenate it with the LSTM ht, then classify

#### Running tensorflow keras together with TFBert can cause problems

In [1]:
import numpy as np
import pandas as pd
from tqdm import tqdm
import re

### LSTM preprocessing for the processed laptop and restaurant train/test data

In [2]:
# Split each row's text into a left part and a right part around the aspect, then clean both parts
def split_text(df):
    df['left_text'] = 'N/A'
    df['right_text'] = 'N/A'
    
    for i in tqdm(range(len(df))):
        text = df.loc[i, 'text']
        aspect = df.loc[i, 'aspect']
        # Split the text on the aspect; if the aspect occurs more than once,
        # only the text up to its next occurrence is kept on the right side
        text_split = text.split(aspect)
        
        left_text = text_split[0]+aspect
        right_text = aspect+text_split[1]
        left_text = left_text.lower() # lowercase the string
        right_text = right_text.lower()
        left_text = re.sub('-', ' ', left_text) # replace hyphens with spaces
        right_text = re.sub('-', ' ', right_text)
        left_text = re.sub('[.,!"()#%&/:?~]', '', left_text) # remove some punctuation characters
        right_text = re.sub('[.,!"()#%&/:?~]', '', right_text)
        
        df.loc[i,'left_text'] = left_text
        df.loc[i,'right_text'] = right_text
        df.loc[i, 'left_right_text'] = left_text +' '+ right_text # used to fit the text encoding
        
    return df
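
The splitting step above can be illustrated on a single sentence. The following is a minimal sketch, not the notebook's function: `split_around_aspect` is a hypothetical helper that uses `str.partition`, so it always cuts at the first occurrence of the aspect, and it leaves out the punctuation cleanup.

```python
def split_around_aspect(text, aspect):
    """Return (left, right) contexts that both include the aspect term.

    Hypothetical helper for illustration only: it cuts at the first
    occurrence of `aspect` and lowercases both sides.
    """
    before, found, after = text.partition(aspect)
    if not found:
        raise ValueError(f"aspect {aspect!r} not found in text")
    left = (before + aspect).lower()   # everything up to and including the aspect
    right = (aspect + after).lower()   # the aspect and everything after it
    return left, right


left, right = split_around_aspect("The battery life is great", "battery life")
print(left)   # the battery life
print(right)  # battery life is great
```

Both halves end or start with the aspect term, which is what lets the two LSTMs later read toward the aspect from either side.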

In [3]:
laptop_train = pd.read_csv('dataset/laptop_train_processed.csv', encoding='utf-8')
restaurant_train = pd.read_csv('dataset/restaurant_train_processed.csv', encoding='utf-8')
laptop_test = pd.read_csv('dataset/laptop_test_processed.csv', encoding='utf-8')
restaurant_test = pd.read_csv('dataset/restaurant_test_processed.csv', encoding='utf-8')

# Concatenate the train data (DataFrame.append was removed in pandas 2.0, so use pd.concat)
train_data = pd.concat([laptop_train, restaurant_train])
train_data = train_data.reset_index(drop=True)

# Concatenate the test data
test_data = pd.concat([laptop_test, restaurant_test])
test_data = test_data.reset_index(drop=True)

# Concatenate the train and test data
data = pd.concat([train_data, test_data])
data = data.reset_index(drop=True)

# Split the text of every row
data = split_text(data)

print('Training set size:', len(train_data))
print('Test set size:', len(test_data))
print('Full data set size:', len(data))
data.head(10)

100%|██████████| 7673/7673 [00:02<00:00, 2786.88it/s]


Training set size: 5915
Test set size: 1758
Full data set size: 7673


Unnamed: 0,text,aspect,polarity,left_text,right_text,left_right_text
0,I charge it at night and skip taking the cord ...,cord,neutral,i charge it at night and skip taking the cord,cord with me because of the good battery life,i charge it at night and skip taking the cord ...
1,I charge it at night and skip taking the cord ...,battery life,positive,i charge it at night and skip taking the cord ...,battery life,i charge it at night and skip taking the cord ...
2,The tech guy then said the service center does...,service center,negative,the tech guy then said the service center,service center does not do 1 to 1 exchange and...,the tech guy then said the service center serv...
3,The tech guy then said the service center does...,"""sales"" team",negative,the tech guy then said the service center does...,sales team which is the retail shop which i bo...,the tech guy then said the service center does...
4,The tech guy then said the service center does...,tech guy,neutral,the tech guy,tech guy then said the service center does not...,the tech guy tech guy then said the service ce...
5,"it is of high quality, has a killer GUI, is ex...",quality,positive,it is of high quality,quality has a killer gui is extremely stable i...,it is of high quality quality has a killer gui...
6,"it is of high quality, has a killer GUI, is ex...",GUI,positive,it is of high quality has a killer gui,gui is extremely stable is highly expandable i...,it is of high quality has a killer gui gui is ...
7,"it is of high quality, has a killer GUI, is ex...",applications,positive,it is of high quality has a killer gui is extr...,applications is easy to use and is absolutely ...,it is of high quality has a killer gui is extr...
8,"it is of high quality, has a killer GUI, is ex...",use,positive,it is of high quality has a killer gui is extr...,use and is absolutely gorgeous,it is of high quality has a killer gui is extr...
9,Easy to start up and does not overheat as much...,start up,positive,easy to start up,start up and does not overheat as much as othe...,easy to start up start up and does not overhea...


In [4]:
# Print one example to inspect it
n = 5
print(data.loc[n, 'text'])
print()
print(data.loc[n, 'left_text'])
print()
print(data.loc[n, 'right_text'])
print()
print(data.loc[n, 'left_right_text'])

it is of high quality, has a killer GUI, is extremely stable, is highly expandable, is bundled with lots of very good applications, is easy to use, and is absolutely gorgeous.

it is of high quality

quality has a killer gui is extremely stable is highly expandable is bundled with lots of very good applications is easy to use and is absolutely gorgeous

it is of high quality quality has a killer gui is extremely stable is highly expandable is bundled with lots of very good applications is easy to use and is absolutely gorgeous


In [5]:
# Convert the text labels to numeric labels
data.loc[data['polarity'] == 'positive', 'label'] = 2
data.loc[data['polarity'] == 'neutral', 'label'] = 1
data.loc[data['polarity'] == 'negative', 'label'] = 0
data['label'] = data['label'].astype(int)

data.head()

Unnamed: 0,text,aspect,polarity,left_text,right_text,left_right_text,label
0,I charge it at night and skip taking the cord ...,cord,neutral,i charge it at night and skip taking the cord,cord with me because of the good battery life,i charge it at night and skip taking the cord ...,1
1,I charge it at night and skip taking the cord ...,battery life,positive,i charge it at night and skip taking the cord ...,battery life,i charge it at night and skip taking the cord ...,2
2,The tech guy then said the service center does...,service center,negative,the tech guy then said the service center,service center does not do 1 to 1 exchange and...,the tech guy then said the service center serv...,0
3,The tech guy then said the service center does...,"""sales"" team",negative,the tech guy then said the service center does...,sales team which is the retail shop which i bo...,the tech guy then said the service center does...,0
4,The tech guy then said the service center does...,tech guy,neutral,the tech guy,tech guy then said the service center does not...,the tech guy tech guy then said the service ce...,1


In [6]:
# Find the largest word count among all left_text and right_text strings
max_count = 0
for i in range(len(data)):
    left_text_word_count = len(data.loc[i,'left_text'].split())
    right_text_word_count = len(data.loc[i,'right_text'].split())
    big_count = max(left_text_word_count, right_text_word_count)
    if big_count>max_count:
        max_count = big_count
print('Max word count in left_text and right_text:', max_count)

Max word count in left_text and right_text: 72


### Encode the text

In [7]:
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from keras.preprocessing.text import Tokenizer

Using TensorFlow backend.


In [8]:
max_words = 7000 # maximum vocabulary size
max_seq_length = 80 # maximum sequence length
embedding_dim = 300 # dimension of each word vector

In [9]:
# Convert words to tokens
tokenizer = Tokenizer(num_words = max_words)
tokenizer.fit_on_texts(data['left_right_text'].to_numpy())

word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
# word_index is a dict built from left_right_text that maps each word to its token number

Found 6557 unique tokens.


In [10]:
# Inspect the word_index dictionary: the word comes first, its token number second
for x in list(word_index)[0:10]:
    print (x, ':', word_index[x])

the : 1
and : 2
a : 3
to : 4
is : 5
i : 6
of : 7
for : 8
food : 9
it : 10
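
The output above shows that the most frequent words receive the smallest token numbers. The sketch below approximates that ranking in plain Python under simplifying assumptions: `build_word_index` and `to_sequence` are hypothetical helpers, whitespace splitting stands in for the Keras `Tokenizer` filters, and words whose index is not below `num_words` are dropped when encoding.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to a rank starting at 1, most frequent word first."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() orders by count; ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequence(text, word_index, num_words=None):
    """Encode a text as token numbers, skipping unknown or too-rare words."""
    seq = []
    for word in text.lower().split():
        idx = word_index.get(word)
        if idx is None:
            continue  # unknown word
        if num_words is not None and idx >= num_words:
            continue  # outside the kept vocabulary
        seq.append(idx)
    return seq


texts = ["the food is good", "the service is slow", "the food"]
word_index = build_word_index(texts)
print(word_index)
print(to_sequence("the food is slow", word_index, num_words=5))
```

Index 0 is never assigned to a word, which is why it can later serve as the padding value.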


In [11]:
# Inspect the tokens of one string
n = 15 # index number
left_text = data['left_text'].to_numpy() # convert to a NumPy array
right_text = data['right_text'].to_numpy()
left_text_seq = tokenizer.texts_to_sequences(left_text)
right_text_seq = tokenizer.texts_to_sequences(right_text)
print(data.loc[n, 'left_text'])
print(data.loc[n, 'right_text'])
print(left_text_seq[n])
print(right_text_seq[n])
print(type(right_text_seq))
# Reverse the right-side token sequences, because the right context is read from the end back toward the aspect
print('right text reversed')
for i in range(len(right_text_seq)):
    right_text_seq[i] = right_text_seq[i][::-1]
print(left_text_seq[n])
print(right_text_seq[n])
print(type(right_text_seq))

one night i turned the freaking thing off after using it the next day i turn it on no gui
gui screen all dark power light steady hard drive light steady and not flashing as it usually does
[51, 267, 6, 1211, 1, 1648, 161, 236, 92, 292, 10, 1, 358, 315, 6, 1007, 10, 20, 59, 1530]
[1530, 55, 33, 719, 148, 410, 1781, 100, 101, 410, 1781, 2, 22, 2934, 30, 10, 448, 213]
<class 'list'>
right text reversed
[51, 267, 6, 1211, 1, 1648, 161, 236, 92, 292, 10, 1, 358, 315, 6, 1007, 10, 20, 59, 1530]
[213, 448, 10, 30, 2934, 22, 2, 1781, 410, 101, 100, 1781, 410, 148, 719, 33, 55, 1530]
<class 'list'>


In [12]:
# Pad a token sequence with zeros on the right
def text_seq_padding(text_seq):
    if len(text_seq) < max_seq_length:
        n = max_seq_length - len(text_seq)
        text_seq = np.pad(text_seq, (0, n), mode ='constant', constant_values=(0)) # append n zeros on the right of the array
    return text_seq
# Pad every left_text_seq and right_text_seq to the same length (zeros on the right)
left_text_seq = [text_seq_padding(i) for i in left_text_seq] # the [ ] is required so the output is a list
left_text_seq = np.array(left_text_seq)

right_text_seq = [text_seq_padding(i) for i in right_text_seq]
right_text_seq = np.array(right_text_seq)

n = 15 # index number
print(left_text_seq[n])
print(right_text_seq[n])

[  51  267    6 1211    1 1648  161  236   92  292   10    1  358  315
    6 1007   10   20   59 1530    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0]
[ 213  448   10   30 2934   22    2 1781  410  101  100 1781  410  148
  719   33   55 1530    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0]
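
The two sequence steps above (reverse the right context, then pad with zeros) can be sketched without NumPy. This is an illustration under assumptions, not the notebook's code: `pad_right` is a hypothetical helper, and unlike `text_seq_padding` it also truncates sequences longer than the target length.

```python
def pad_right(seq, length, pad_id=0):
    """Truncate `seq` to `length` items, then pad with `pad_id` on the right."""
    trimmed = list(seq[:length])
    return trimmed + [pad_id] * (length - len(trimmed))


left_ids = [51, 267, 1530]           # ... ends at the aspect token
right_ids = [1530, 55, 33]           # starts at the aspect token

left_padded = pad_right(left_ids, 5)
right_padded = pad_right(right_ids[::-1], 5)  # reversed so it also ends at the aspect

print(left_padded)   # [51, 267, 1530, 0, 0]
print(right_padded)  # [33, 55, 1530, 0, 0]
```

After reversing, both sequences finish on the aspect token (1530 here), so each LSTM's last unmasked step sees the aspect.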


### Use pretrained word vectors (crawl 300 dim)
#### https://fasttext.cc/docs/en/english-vectors.html

In [13]:
# import os
# import sys

In [14]:
# # Load the word vectors
# embeddings_index = {}
# file = open('dataset/crawl-300d-2M.vec', 'r', encoding='utf-8')
# for line in tqdm(file):
#     values = line.split()
#     word = values[0]
#     coefs = np.asarray(values[1:], dtype='float32')
#     embeddings_index[word] = coefs
# file.close()

# print('Found %s word vectors.' % len(embeddings_index))

In [15]:
# UNK = embeddings_index['UNK'] # unknown token
# print(UNK)

In [16]:
# # Build the word vector of every word in word_index from the loaded dictionary
# real_word = 0
# embedding_matrix = np.zeros((len(word_index)+1, embedding_dim))       # start from an all-zero matrix with one row per unique token
# for word, i in word_index.items():                                    # word_index starts at 1, so np.zeros() needs one extra row
#     embedding_vector = embeddings_index.get(word)
#     if embedding_vector is not None:
#         embedding_matrix[i] = embedding_vector         # copy the embedding vector that was found into its row
#         real_word = real_word + 1 # count how many words were actually found
#     else:
#         embedding_matrix[i] = UNK                      # words that are not found get the UNK vector
# print(embedding_matrix.shape)
# print(embedding_matrix)
# print('Number of unique words:', len(word_index))
# print('Number of words found in the dictionary:', real_word)
# # embedding_matrix maps every word in word_index to its word embedding, one row per word

In [17]:
# # Save embedding_matrix.npy so it can be loaded directly next time
# np.save('embedding_matrix', embedding_matrix)

In [18]:
# Load embedding_matrix
embedding_matrix = np.load('dataset/embedding_matrix.npy')
print(type(embedding_matrix))
print(embedding_matrix.shape)
print(embedding_matrix)

<class 'numpy.ndarray'>
(6558, 300)
[[ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.0231      0.017       0.0157     ...  0.0744     -0.1118
   0.0963    ]
 [-0.1081      0.0191      0.0354     ...  0.1104      0.0475
  -0.0599    ]
 ...
 [ 0.16580001 -0.0169     -0.4138     ...  0.0933     -0.1168
  -0.1777    ]
 [-0.1179      0.0726     -0.005      ...  0.2079      0.0322
  -0.26879999]
 [ 0.24439999  0.1206      0.1123     ... -0.147      -0.0186
  -0.3204    ]]


### BERT data preprocessing

In [19]:
import tensorflow as tf
from transformers import BertTokenizer, BertModel, TFBertForSequenceClassification, TFBertModel

In [20]:
# Load pre-trained model tokenizer, to convert our text into tokens that correspond to BERT’s vocabulary.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [22]:
# Practice: append the aspect term tokens after the SEP token
text = 'aspect term token'
aspect = 'aspect term'
text_tok = tokenizer.tokenize(text) # convert the text to tokens
aspect_tok = tokenizer.tokenize(aspect)
text_id = tokenizer.convert_tokens_to_ids(text_tok) # convert the tokens to ids
aspect_id = tokenizer.convert_tokens_to_ids(aspect_tok)
print(text_id)
print(aspect_id)
text_cls_sep = tokenizer.build_inputs_with_special_tokens(text_id) # add the CLS and SEP token ids
print(text_cls_sep)
text_sep_aspect = text_cls_sep + aspect_id
text_sep_aspect.append(102) # 102 is the SEP token id
print(text_sep_aspect)

[7814, 2744, 19204]
[7814, 2744]
[101, 7814, 2744, 19204, 102]
[101, 7814, 2744, 19204, 102, 7814, 2744, 102]
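
The notebook feeds only `input_ids` to BERT. As a hedged sketch of what a fuller sentence-pair input could look like, the hypothetical helper `build_pair_features` below also derives segment ids and an attention mask in plain Python. The ids 101, 102 and 0 are the CLS, SEP and padding ids of `bert-base-uncased`; adding the two extra inputs is an assumption here, not something the notebook does.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special token ids of bert-base-uncased


def build_pair_features(text_ids, aspect_ids, max_len):
    """Build padded (input_ids, token_type_ids, attention_mask) lists.

    Hypothetical helper: layout is CLS + text + SEP + aspect + SEP + padding.
    """
    input_ids = [CLS_ID] + text_ids + [SEP_ID] + aspect_ids + [SEP_ID]
    if len(input_ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    # Segment 0 covers CLS, the text and the first SEP; segment 1 the aspect and final SEP
    token_type_ids = [0] * (len(text_ids) + 2) + [1] * (len(aspect_ids) + 1)
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    n_pad = max_len - len(input_ids)
    return (
        input_ids + [PAD_ID] * n_pad,
        token_type_ids + [0] * n_pad,
        attention_mask + [0] * n_pad,      # 0 marks padding
    )


ids, segments, mask = build_pair_features([7814, 2744, 19204], [7814, 2744], max_len=10)
print(ids)       # [101, 7814, 2744, 19204, 102, 7814, 2744, 102, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
print(mask)      # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Without an attention mask the padded positions are treated like real tokens, which is one reason passing a mask is usually worth considering.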


### Find the largest token count in a single sentence

In [23]:
# Find the largest token count of text plus aspect, not counting CLS and SEP
def find_max_token(df): # parameter renamed from pd so it no longer shadows the pandas module
    max_token = 0
    for i in range(len(df)):
        text = df.loc[i, 'text']
        aspect = df.loc[i, 'aspect']
        text_aspect = text + aspect
        tokens_len = len(tokenizer.tokenize(text_aspect))
        if tokens_len>max_token:
            max_token = tokens_len
    return max_token

In [24]:
# Find the largest token count of text plus aspect, not counting CLS and SEP
train_max_token = find_max_token(train_data)
test_max_token = find_max_token(test_data)
print('Max token count in the training set:', train_max_token)
print('Max token count in the test set:', test_max_token)

Max token count in the training set: 91
Max token count in the test set: 99


### Convert the data to padded tokens

#### Function that converts a sentence to tokens (CLS+text+SEP+aspect) plus padding

In [25]:
# Fix the input length at 128 tokens
input_dim = 128
def input_ids_all(df): # parameter renamed from pd so it no longer shadows the pandas module
    df['input_ids'] = 'N/A'
    for i in range(len(df)):
        text = df.loc[i, 'text']
        aspect = df.loc[i, 'aspect']
        text_tokens = tokenizer.tokenize(text) # convert the text to tokens
        aspect_tokens = tokenizer.tokenize(aspect) # convert the aspect to tokens
        
        text_input_ids = tokenizer.convert_tokens_to_ids(text_tokens) # convert the text tokens to token ids
        aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens) # convert the aspect tokens to token ids
        
        text_input_ids_cls = tokenizer.build_inputs_with_special_tokens(text_input_ids) # add the CLS and SEP token ids around the text token ids
        input_ids = text_input_ids_cls + aspect_input_ids # append the aspect token ids after the text token ids (CLS+text+SEP+aspect)
        input_ids.append(102) # closing SEP token id
        input_ids = np.array(input_ids)
        
        if len(input_ids) < input_dim:
            n = input_dim - len(input_ids)
            input_ids = np.pad(input_ids, (0, n), mode ='constant', constant_values=(0)) # append n zeros on the right to pad to input_dim
        
        df['input_ids'][i] = input_ids # chained assignment; pandas may emit SettingWithCopyWarning here
    return df

In [26]:
# Convert the text to tokens, append the aspect tokens, and store the result in the dataframe
data = input_ids_all(data)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [27]:
# Fix the input length at 128 tokens
input_dim = 128
def input_ids_all_2(text, aspect):
    text_tokens = tokenizer.tokenize(text) # convert the text to tokens
    aspect_tokens = tokenizer.tokenize(aspect) # convert the aspect to tokens
        
    text_input_ids = tokenizer.convert_tokens_to_ids(text_tokens) # convert the text tokens to token ids
    aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens) # convert the aspect tokens to token ids
        
    text_input_ids_cls = tokenizer.build_inputs_with_special_tokens(text_input_ids) # add the CLS and SEP token ids around the text token ids
    input_ids = text_input_ids_cls + aspect_input_ids # append the aspect token ids after the text token ids (CLS+text+SEP+aspect)
    input_ids.append(102) # closing SEP token id
    input_ids = np.array(input_ids)
        
    if len(input_ids) < input_dim:
        n = input_dim - len(input_ids)
        input_ids = np.pad(input_ids, (0, n), mode ='constant', constant_values=(0)) # append n zeros on the right to pad to input_dim
    return input_ids

In [28]:
data['input_ids_2'] = data.apply(lambda column: input_ids_all_2(column['text'], column['aspect']), axis=1)

In [29]:
data.head(2)

Unnamed: 0,text,aspect,polarity,left_text,right_text,left_right_text,label,input_ids,input_ids_2
0,I charge it at night and skip taking the cord ...,cord,neutral,i charge it at night and skip taking the cord,cord with me because of the good battery life,i charge it at night and skip taking the cord ...,1,"[101, 1045, 3715, 2009, 2012, 2305, 1998, 1355...","[101, 1045, 3715, 2009, 2012, 2305, 1998, 1355..."
1,I charge it at night and skip taking the cord ...,battery life,positive,i charge it at night and skip taking the cord ...,battery life,i charge it at night and skip taking the cord ...,2,"[101, 1045, 3715, 2009, 2012, 2305, 1998, 1355...","[101, 1045, 3715, 2009, 2012, 2305, 1998, 1355..."


In [30]:
n = 4423
print(data.loc[n, 'text'])
print(data.loc[n, 'aspect'])
print(data.loc[n, 'input_ids'])
print(data.loc[n, 'input_ids_2'])

I would have gotten some cole slaw and a knish if my stomach had more space.
knish
[  101  1045  2052  2031  5407  2070  5624 22889 10376  1998  1037 14161
  4509  2065  2026  4308  2018  2062  2686  1012   102 14161  4509   102
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0]
[  101  1045  2052  2031  5407  2070  5624 22889 10376  1998  1037 14161
  4509  2065  2026  4308  2018  2062  2686  1012   102 14161  45

In [31]:
# Collect the input_ids column of data into a list
input_ids = list()
for i in range(len(data)):
    np_id = data.loc[i, 'input_ids']
    input_ids.append(np_id)
input_ids = np.array(input_ids)
print(input_ids.shape)
input_ids

(7673, 128)


array([[  101,  1045,  3715, ...,     0,     0,     0],
       [  101,  1045,  3715, ...,     0,     0,     0],
       [  101,  1996,  6627, ...,     0,     0,     0],
       ...,
       [  101, 24519, 10439, ...,     0,     0,     0],
       [  101, 24519, 10439, ...,     0,     0,     0],
       [  101, 24519, 10439, ...,     0,     0,     0]])

### Split into train and test data

#### X_train, X_test

In [32]:
# Split the data into train and test (the first 5915 rows are the training set)
X_left_train = left_text_seq[:5915]
X_right_train = right_text_seq[:5915]
X_left_test = left_text_seq[5915:]
X_right_test = right_text_seq[5915:]
print(len(X_left_train), len(X_right_train))
print(len(X_left_test), len(X_right_test))

5915 5915
1758 1758


In [33]:
train_input_ids = input_ids[:5915]
test_input_ids = input_ids[5915:]
print(len(train_input_ids))
print(len(test_input_ids))

5915
1758


#### Y_train, Y_test

In [34]:
Y = data['label'].to_numpy() # 1-D array of integer labels, the form sparse categorical cross-entropy expects
print('Shape of Y:', Y.shape)
Y_train = Y[:5915]
Y_test = Y[5915:]
print(len(Y_train))
print(len(Y_test))

Shape of Y: (7673,)
5915
1758
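
The split cells above hard-code the boundary 5915, which is the size of the training set. A small sketch of deriving the boundary from the data instead; `split_train_test` is a hypothetical helper, and in the notebook `n_train` would presumably be `len(train_data)`.

```python
def split_train_test(rows, n_train):
    """Split `rows` into the first `n_train` items and the remainder."""
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must be between 0 and len(rows)")
    return rows[:n_train], rows[n_train:]


rows = list(range(10))  # stand-in for any sequence or array of samples
train_rows, test_rows = split_train_test(rows, n_train=7)
print(len(train_rows), len(test_rows))  # 7 3
```

Because slicing works the same way on NumPy arrays, the same helper could be applied to the padded sequences, the BERT ids and the labels with one shared `n_train`.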


In [35]:
# Check that polarity and label agree
print('laptop_test', '         ','restaurant_test')
for i in range(20):
    print(laptop_test.loc[i, 'polarity'], data.loc[5915+i, 'label'], Y_test[i], '  ', restaurant_test.loc[i, 'polarity'], data.loc[6553+i, 'label'], Y_test[638+i])

laptop_test           restaurant_test
positive 2 2    positive 2 2
negative 0 0    positive 2 2
positive 2 2    positive 2 2
negative 0 0    positive 2 2
negative 0 0    positive 2 2
negative 0 0    positive 2 2
positive 2 2    positive 2 2
negative 0 0    positive 2 2
neutral 1 1    positive 2 2
positive 2 2    positive 2 2
positive 2 2    neutral 1 1
positive 2 2    positive 2 2
positive 2 2    positive 2 2
positive 2 2    positive 2 2
positive 2 2    negative 0 0
positive 2 2    positive 2 2
negative 0 0    neutral 1 1
negative 0 0    neutral 1 1
positive 2 2    positive 2 2
positive 2 2    positive 2 2


### Model

In [36]:
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding, Flatten, InputLayer, Bidirectional, concatenate, add, average, Reshape
from tensorflow.keras.optimizers import RMSprop, Adam
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import plot_model

### Merge the three inputs (left text ht, right text ht, BERT ht), with dropout

In [37]:
# first input: LSTM over the left context
input_layer_1 = Input(shape = (max_seq_length,), dtype='int64')
embedding_1 = Embedding(len(word_index) + 1, embedding_dim, weights=[embedding_matrix], mask_zero=True, trainable=True)(input_layer_1)
lstm_hidden_1 = LSTM(128, return_sequences=False, dropout=0.3)(embedding_1)

# second input: LSTM over the reversed right context
input_layer_2 = Input(shape = (max_seq_length,), dtype='int64')
embedding_2 = Embedding(len(word_index) + 1, embedding_dim, weights=[embedding_matrix], mask_zero=True, trainable=True)(input_layer_2)
lstm_hidden_2 = LSTM(128, return_sequences=False, dropout=0.3)(embedding_2)

# third input: BERT over CLS+text+SEP+aspect+SEP
input_layer_3= Input(shape = (128,), dtype='int64')
bert = TFBertModel.from_pretrained('bert-base-uncased')(input_layer_3)
bert = bert[0] # last hidden states, shape (batch, 128, 768)
dropout = Dropout(0.1)(bert)
flat = Flatten()(dropout)
bert_hidden = Dense(512)(flat)

# merge the three branches
# merge = concatenate([lstm_hidden_1, lstm_hidden_2, bert_hidden])
merge = concatenate([bert_hidden, lstm_hidden_1, lstm_hidden_2])
# merge = concatenate([lstm_hidden_1, lstm_hidden_2])
hidden_1 = Dense(256, activation='relu')(merge)
dropout_1 = Dropout(0.2)(hidden_1)
hidden_2 = Dense(64, activation='relu')(dropout_1)
dropout_2 = Dropout(0.2)(hidden_2)
output = Dense(3, activation='softmax')(dropout_2)
model = Model(inputs=[input_layer_1, input_layer_2, input_layer_3], outputs=output)
# model = Model(inputs=[input_layer_1, input_layer_2], outputs=output)
model.summary() # summary() prints by itself and returns None
adam = Adam(learning_rate=1e-3) # unused: model.compile below is given `optimizer`

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
# The output layer already applies softmax, so the loss receives probabilities, not logits
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(loss=loss, optimizer=optimizer, metrics=[metric])

early_stopping = EarlyStopping(monitor='val_loss', patience=8, verbose=1, restore_best_weights=True)

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            [(None, 128)]        0                                            
__________________________________________________________________________________________________
tf_bert_model (TFBertModel)     ((None, 128, 768), ( 109482240   input_3[0][0]                    
__________________________________________________________________________________________________
dropout_37 (Dropout)            (None, 128, 768)     0           tf_bert_model[0][0]              
__________________________________________________________________________________________________
input_1 (InputLayer)            [(None, 80)]         0                                            
______________________________________________________________________________________________
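
The BERT branch flattens the full (128, 768) output before `Dense(512)`, which makes that single layer very large. The arithmetic below is a sketch for comparison: `dense_params` is a hypothetical helper, and feeding only the CLS position (one 768-dimensional vector) is a common alternative named here as an assumption, not what the notebook does.

```python
def dense_params(n_inputs, n_units):
    """Parameter count of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


seq_len, hidden_size, units = 128, 768, 512

# Flatten() turns (128, 768) into 98304 features before Dense(512)
flatten_params = dense_params(seq_len * hidden_size, units)

# Using a single 768-dim vector (for example the CLS position) instead
cls_params = dense_params(hidden_size, units)

print(flatten_params)  # 50332160
print(cls_params)      # 393728
```

With roughly 50 million extra weights trained from scratch on 5915 samples, the flattened variant may be more prone to overfitting, so the smaller head is worth trying.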

In [None]:
model_fit = model.fit([X_left_train, X_right_train, train_input_ids],Y_train, batch_size=4,epochs=8,
                      validation_data=([X_left_test, X_right_test, test_input_ids], Y_test))
# model_fit = model.fit([X_left_train, X_right_train],Y_train, batch_size=64,epochs=30,
#                       validation_data=([X_left_test, X_right_test], Y_test), callbacks=[early_stopping])

Train on 5915 samples, validate on 1758 samples
Epoch 1/8
Epoch 2/8