**Concatenate.py** Trains and tests a concatenated CNN

In [None]:
from tensorflow.keras.layers import Dense, BatchNormalization, GlobalAveragePooling2D, Dropout, Activation
from tensorflow.keras.applications.resnet50 import preprocess_input, ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from sklearn.preprocessing import LabelEncoder
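# use the tensorflow.keras namespace throughout; mixing it with standalone keras can fail
from tensorflow.keras import utils as np_utils
from tensorflow.keras.layers import Input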
from os import listdir
import pandas as pd
import numpy as np
import cv2
import os

os.chdir(r"C:\Carabid_Data\Invert")

Import modules <br>
Set the dataset directory (adjust the path to your own directory)

In [None]:
df = pd.read_csv("shuffletrain.csv")

Y = df['AllTaxa']
X = df.drop(["AllTaxa"], axis=1)
# convert to numpy arrays
X = np.array(X)
# work with labels
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

Read in feature vector dataset <br>
Labels ('AllTaxa' for LITL dataset or 'Order' for order level dataset) are converted to one hot encoded labels as `dummy_y` <br>
Numeric data from `df` are set to `X`. To remove the contextual metadata or the morphometric data, run the corresponding line below before `X = np.array(X)`:

In [None]:
# For removing contextual metadata
X = X.drop(X.loc[:, 'decLat':'day'].columns, axis=1)
# For removing morphometric data
X = X.drop(X.loc[:, 'Area':'rawIntDensBlue'].columns, axis=1)
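
A toy illustration of the label-based column slice used above. The frame and its values are made up; only the column names `decLat`, `day`, `Area` and `rawIntDensBlue` come from the cell above, while `decLong` and `Perimeter` are hypothetical stand-ins for whatever sits between them in the real file. Note that `.loc` label slices include both endpoints.

```python
import pandas as pd

# Toy frame; values and the in-between columns are invented for illustration.
toy = pd.DataFrame({
    "decLat": [45.1, 46.2],
    "decLong": [-75.0, -76.3],      # hypothetical column
    "day": [12, 30],
    "Area": [3.4, 2.9],
    "Perimeter": [8.1, 7.7],        # hypothetical column
    "rawIntDensBlue": [101.0, 98.5],
})

# .loc label slices are inclusive of both endpoints, so 'decLat':'day'
# selects decLat, decLong and day.
context_cols = toy.loc[:, "decLat":"day"].columns
morph_only = toy.drop(context_cols, axis=1)

remaining = list(morph_only.columns)
print(remaining)
```

Because the slice depends on column order, it is worth printing the remaining columns once on the real file to confirm that only the intended block was dropped.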

In [None]:
ncol = X.shape[1]
num_class = dummy_y.shape[1]
inputs = Input(shape = (ncol,))
annx = Dense(128, activation = 'relu')(inputs)
ann = Model(inputs, annx)

Setting up ANN side of the concatenated model, with a single dense layer and inputs from `X`
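
As a rough sketch of what the single dense layer computes, the NumPy snippet below applies `relu(X @ W + b)` to toy data. The sizes and the random weights are assumptions for illustration only; in the notebook the weights are learned by Keras and `ncol` is `X.shape[1]`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; in the notebook ncol is X.shape[1] and units is 128.
n_samples, ncol, units = 4, 10, 128
X_toy = rng.normal(size=(n_samples, ncol))

# A dense layer holds a (ncol, units) weight matrix and a (units,) bias.
# Random values stand in for the learned weights here.
W = rng.normal(scale=0.1, size=(ncol, units))
b = np.zeros(units)

# Dense(128, activation='relu') computes relu(X @ W + b).
ann_out = np.maximum(0.0, X_toy @ W + b)
print(ann_out.shape)
```

Each input row becomes a 128-value vector, which is the shape later joined with the CNN branch.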

In [None]:
base_model = ResNet50(include_top = False, weights = 'imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.3)(x)
x = Dense(128)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.3)(x)
resnet = Model(inputs = base_model.input, outputs = x)

for layer in base_model.layers:
    layer.trainable = False

Setting up CNN side of the concatenated model. ResNet50 pretrained on ImageNet is used as the base model, with a global average pooling layer between it and the dense layers. Two dense layers (1024 and 128 units) are added, each followed by batch normalization, ReLU activation and 0.3 dropout. The ResNet50 layers are frozen (set to non-trainable), so only the added layers are trained
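
The global average pooling step can be sketched in NumPy as a mean over the two spatial axes. The array below is random toy data; the `(7, 7, 2048)` shape assumes 224x224 input images, which this notebook does not state, and the default channels-last layout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ResNet50 output for a batch of 2 images.
# Assumption: 224x224 inputs, for which include_top=False yields (batch, 7, 7, 2048).
feature_map = rng.normal(size=(2, 7, 7, 2048))

# GlobalAveragePooling2D averages over height and width (axes 1 and 2),
# leaving one value per channel.
pooled = feature_map.mean(axis=(1, 2))
print(pooled.shape)
```

The pooled vector has one entry per channel regardless of the spatial size, which is why the base model can be built without a fixed input shape.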

In [None]:
concat = concatenate([ann.output, resnet.output])

combined = Dense(128)(concat)
combined = BatchNormalization()(combined)
combined = Activation('relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(num_class, activation = "softmax")(combined)
model = Model(inputs = [ann.input, resnet.input], outputs = combined)
    
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

The 128-unit outputs of the ANN and CNN are joined in a concatenation layer. One more dense layer is used with batch normalization and dropout before the softmax classification output. <br>
The model is then compiled with the Adam optimizer and categorical cross-entropy loss
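
A minimal NumPy sketch of what the concatenation layer does to the two branch outputs. The arrays are toy placeholders (ones and zeros) so that the two halves are easy to tell apart; the batch size is arbitrary.

```python
import numpy as np

batch = 3
ann_out = np.ones((batch, 128))    # toy ANN branch output
cnn_out = np.zeros((batch, 128))   # toy CNN branch output

# Keras concatenate joins along the last axis by default,
# so two (batch, 128) tensors become one (batch, 256) tensor.
merged = np.concatenate([ann_out, cnn_out], axis=-1)
print(merged.shape)
```

The order of the list matters: the first 128 columns come from the ANN branch and the last 128 from the CNN branch.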

In [None]:
model.fit(x=[X,images], y=dummy_y,
    epochs=10, batch_size=128,
    verbose = 1)

The model is fit using `X` and `images` (from **LoadImages.py**) over 10 epochs with a batch size of 128. The rows of `X`, `images` and `dummy_y` must be in the same sample order
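
Since the two inputs are loaded separately, a quick length check before fitting can catch a mismatch early. `check_aligned` below is a hypothetical helper, not part of the original scripts, and the arrays are toy data. It only compares sample counts; it cannot verify that the rows are in the same order.

```python
import numpy as np

def check_aligned(tabular, images, labels):
    """Hypothetical helper: raise if the three arrays differ in sample count."""
    n = len(tabular)
    if len(images) != n or len(labels) != n:
        raise ValueError(
            f"sample counts differ: tabular={n}, "
            f"images={len(images)}, labels={len(labels)}"
        )
    return n

# Toy arrays standing in for X, images and dummy_y.
X_toy = np.zeros((5, 10))
images_toy = np.zeros((5, 8, 8, 3))
y_toy = np.zeros((5, 4))

n_samples = check_aligned(X_toy, images_toy, y_toy)
print(n_samples)
```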

In [None]:
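testdf = pd.read_csv("shuffletestlitl.csv")
# apply the same column drops used for X (if any) before converting
testX = np.array(testdf.drop(["AllTaxa"], axis = 1))
testY = testdf["AllTaxa"]
# reuse the encoder fitted on the training labels so that class indices
# match the model's output columns (refitting on testY could reorder them)
encoded_testY = encoder.transform(testY)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_testY = np_utils.to_categorical(encoded_testY, num_classes = num_class)

# testimages comes from LoadImages.py
preds = model.predict([testX, testimages])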

The model is tested on the test feature vectors and on `testimages` generated by **LoadImages.py**. `preds` holds one row of class probabilities per test sample
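
The cell above stops at the raw probabilities. One way to turn them into labels and an accuracy figure is sketched below with toy values; the class names, probabilities and true labels are all invented for illustration.

```python
import numpy as np

# Toy stand-ins. LabelEncoder stores its classes in sorted order,
# and column i of the softmax output corresponds to class i.
classes = np.array(["taxonA", "taxonB", "taxonC"])   # hypothetical labels
preds = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
])
true_labels = np.array(["taxonA", "taxonC", "taxonA", "taxonA"])

# Pick the most probable class per row, then map indices back to labels.
pred_idx = preds.argmax(axis=1)
pred_labels = classes[pred_idx]

accuracy = float((pred_labels == true_labels).mean())
print(pred_labels, accuracy)
```

With the notebook's objects, `encoder.inverse_transform(preds.argmax(axis=1))` would play the role of the `classes[pred_idx]` lookup.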