## SediNet: predict categorical population

This Jupyter notebook accompanies the [SediNet](https://github.com/MARDAScience/SediNet) package.

Written by Daniel Buscombe, MARDA Science

daniel@mardascience.com


> Demonstration of how to use an ensemble of three SediNet models to estimate sediment population

This notebook assumes you are working on a cloud computer such as Colab, so we first download the SediNet package from GitHub:


In [1]:
!git clone --depth 1 https://github.com/MARDAScience/SediNet.git

fatal: destination path 'SediNet' already exists and is not an empty directory.


In [0]:
import os, json
os.chdir('SediNet')

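Import everything we need from sedinet_models.py. The later cells use `np`, `pd`, `plt`, `mode` and `classification_report` without importing them, so they rely on this wildcard import to provide those names.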

In [4]:
from sedinet_models import *

Using TensorFlow backend.


In [0]:
configfile = 'config_pop.json'

Load the config file and parse out the variables we need

In [0]:
# load the user configs
with open(os.getcwd()+os.sep+'config'+os.sep+configfile) as f:    
  config = json.load(f)     

###===================================================
## user-defined variables read from the config file
base    = int(config["base"]) #minimum number of convolutions in a sedinet convolutional block
csvfile = config["csvfile"] #csvfile containing image names and class values
res_folder = config["res_folder"] #folder containing csv file and that will contain model outputs
var = config["var"] # column name in the csv you wish to estimate
numclass = config["numclass"] # number of classes
dropout = float(config["dropout"]) # dropout value passed to make_cat_sedinet

###==================================================
ID_MAP = dict(zip(np.arange(numclass), [str(k) for k in range(numclass)]))
csvfile = res_folder+os.sep+csvfile
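As a rough illustration of the parsing step above, the sketch below checks that a config contains the six keys the notebook reads and then builds the same kind of index-to-label map as `ID_MAP`. The config values, the `required` list and the `id_map` name are assumptions made up for this example; they are not taken from the real `config_pop.json`.

```python
import json

# Hypothetical config text: the keys match those read in the cell above,
# but every value here is made up for illustration.
example_text = """
{"base": 20, "csvfile": "example.csv", "res_folder": "example_folder",
 "var": "pop", "numclass": 6, "dropout": 0.5}
"""
config = json.loads(example_text)

# fail early if any key the notebook reads is absent
required = ["base", "csvfile", "res_folder", "var", "numclass", "dropout"]
missing = [key for key in required if key not in config]
if missing:
    raise KeyError(f"config is missing keys: {missing}")

numclass = int(config["numclass"])
# maps each integer class index to its string label, as ID_MAP does above
id_map = {k: str(k) for k in range(numclass)}
print(id_map)
```

Checking the keys up front gives a clearer error than a `KeyError` raised partway through the cell.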

This next part reads the data from the csv file into a pandas dataframe, gets an image generator, and then prepares three models with different base values (`base-2`, `base` and `base+2`), loading saved weights for each
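To preview which weights files the next cell will look for, the sketch below builds the three checkpoint filenames using the same naming pattern. The values `var = "pop"` and `base = 20` are assumptions chosen for illustration; the real values come from the config file.

```python
# Hypothetical values for illustration; the real ones come from the config file
var = "pop"
base = 20

# same naming pattern as the cell below: <var>_base<base>_model_checkpoint.hdf5
bases = [base - 2, base, base + 2]
weights_files = [f"{var}_base{b}_model_checkpoint.hdf5" for b in bases]
for name in weights_files:
    print(name)
```

Each of these files must already exist in the results folder, otherwise `load_weights` will fail.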

In [7]:
###===================================================
## read the data set in, clean and modify the pathnames so they are absolute
df = pd.read_csv(csvfile)
df['files'] = [k.strip() for k in df['files']]
df['files'] = [os.getcwd()+os.sep+f.replace('\\',os.sep) for f in df['files']]    

train_idx = np.arange(len(df))

train_gen = get_data_generator_1image(df, train_idx, True, ID_MAP, var, len(df))

models = []
for base in [base-2,base,base+2]:
  weights_path = var+"_base"+str(base)+"_model_checkpoint.hdf5"
  ##==============================================
  ## create a SediNet model to estimate sediment category
  model = make_cat_sedinet(base, ID_MAP, dropout)
  model.load_weights(os.getcwd()+os.sep+'res'+os.sep+res_folder+os.sep+weights_path)
  models.append(model)

W0729 17:38:08.879631 140230332524416 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0729 17:38:08.895930 140230332524416 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0729 17:38:08.900372 140230332524416 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0729 17:38:08.929696 140230332524416 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3976: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

W0729 17:38:08.962562 140230332524416 deprecation_wrapp

[INFORMATION] Model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 512, 512, 3)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 510, 510, 18)      504       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 508, 508, 36)      5868      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 254, 254, 36)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 252, 252, 54)      17550     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 126, 126, 54)      0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 124, 124, 7

Now that the models are set up, we use them below to make predictions on each image. This gives three estimates per image, and our final estimate is their mode (the most frequently predicted class).
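The voting step can be sketched on its own with a few made-up predictions. This is a minimal NumPy-only illustration, not the notebook's code (which uses `mode`); the function name `majority_vote` and the vote values are hypothetical. On ties it returns the smallest class index, which is also the documented behaviour of `scipy.stats.mode`.

```python
import numpy as np

def majority_vote(votes):
    """Return the most common class per image.

    votes: integer array of shape (n_models, n_images) holding class indices.
    """
    votes = np.asarray(votes)
    n_classes = votes.max() + 1
    # count the votes each class receives per image: shape (n_classes, n_images)
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    # argmax picks the smallest class index when counts are tied
    return counts.argmax(axis=0)

# made-up predictions from three models for four images
votes = np.array([[0, 1, 2, 3],
                  [0, 2, 2, 1],
                  [1, 2, 0, 2]])
ensemble = majority_vote(votes)
print(ensemble)  # [0 2 2 1]
```

The last image shows the tie case: all three models disagree, so the smallest class index wins. With three models this can happen whenever there are three or more classes.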

A classification report is printed to screen showing per-class precision, recall and F1 scores. Precision is the proportion of positive identifications that are correct (a precision of 1 means there are no false positives), and recall is the proportion of actual positives identified correctly (a recall of 1 means there are no false negatives). The F1 score is the harmonic mean of precision and recall, so it is high only when both are high.
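For a single class, these quantities follow directly from counts of true positives, false positives and false negatives. The sketch below uses made-up counts and a hypothetical helper name, `precision_recall_f1`, to show how F1 (the harmonic mean of precision and recall) sits below their arithmetic mean when the two differ.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# made-up counts: 8 true positives, 2 false positives, 8 false negatives
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here precision is 0.8 and recall is 0.5. Their arithmetic mean is 0.65, while F1 is about 0.615.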

In [0]:
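# take one batch of images and labels from the generator
x_train, trueT = next(train_gen)
# convert the label vectors to integer class indices
trueT = np.squeeze(np.asarray(trueT).argmax(axis=-1))

PT = []  # predicted class indices, one array per model
for model in models:
  predT = model.predict(x_train, batch_size=1)
  predT = np.asarray(predT).argmax(axis=-1)
  PT.append(predT)

# final estimate: the mode across the three models for each image
predT = np.squeeze(mode(np.asarray(PT), axis=0)[0])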

##==============================================
## print a classification report to screen, showing f1, precision, recall and accuracy
print("==========================================")
print("Classification report for "+var)
print(classification_report(trueT, predT))

Finally, we plot a confusion matrix showing normalized correspondences between actual and estimated labels, and save it as a png file

In [0]:
classes = np.arange(len(ID_MAP))
##==============================================
## create figures showing confusion matrices for data set
plot_confmat(predT, trueT, var+'T',classes)  
plt.savefig(weights_path.replace('.hdf5','_cm_predict.png'), dpi=300, bbox_inches='tight') 
plt.close('all')   
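To make "normalized" concrete, the sketch below builds a small confusion matrix with NumPy and divides each row by the number of true examples in that class. Row normalization is an assumption here; `plot_confmat` may normalize differently. The labels and the helper name `normalized_confusion` are made up for illustration.

```python
import numpy as np

def normalized_confusion(true, pred, n_classes):
    """Confusion matrix with rows (true classes) scaled to sum to 1."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # leave rows of zeros for classes that never appear in the true labels
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# made-up labels for six images and three classes
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = normalized_confusion(true, pred, 3)
print(cm)
```

Under this normalization, each diagonal entry is the recall for that class, and each off-diagonal entry is the fraction of that class mistaken for another.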

If you are in Google Colab, to view the file you just made, go to the File tab over on the left, expand SediNet, and you'll see the png file called "pop_base22_xxxx.png"