## SediNet: predict 9 percentiles of the grain size distribution from a small population of beach sands

This Jupyter notebook accompanies the [SediNet](https://github.com/MARDAScience/SediNet) package

Written by Daniel Buscombe, MARDA Science

daniel@mardascience.com


> Demonstration of how to use an ensemble of three SediNet models to estimate nine percentiles of the cumulative grain size distribution from a population of beach sands

First, this notebook assumes you are working on a cloud computer such as Colab, so we begin by downloading the SediNet package from GitHub:


In [1]:
!git clone --depth 1 https://github.com/MARDAScience/SediNet.git

fatal: destination path 'SediNet' already exists and is not an empty directory.


In [0]:
import os, json
os.chdir('SediNet')

Import everything we need from sedinet_models.py

In [3]:
from sedinet_models import *

Using TensorFlow backend.


In [0]:
configfile = 'config_sievedsand_9prcs.json'

Load the config file and parse out the variables we need

In [0]:
# load the user configs
with open(os.getcwd()+os.sep+'config'+os.sep+configfile) as f:    
  config = json.load(f)     

###===================================================
## user defined variables, read from the config file
base    = int(config["base"]) #minimum number of convolutions in a sedinet convolutional block
csvfile = config["csvfile"] #csvfile containing image names and class values
res_folder = config["res_folder"] #folder containing csv file and that will contain model outputs
name = config["name"] #name prefix for output files
dropout = float(config["dropout"]) 
add_bn = bool(config["add_bn"]) 

vars = [k for k in config.keys() if not np.any([k.startswith('base'), k.startswith('res_folder'), k.startswith('csvfile'), k.startswith('name'), k.startswith('dropout'), k.startswith('add_bn')])]

vars = sorted(vars)

###==================================================

csvfile = res_folder+os.sep+csvfile
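
The filter above treats every config key that is not one of the six known settings as the name of a variable to estimate. The following is a minimal sketch of that idea using a made-up config; the keys `P5`, `P10`, `P50` and all values are assumptions for illustration, and the real `config_sievedsand_9prcs.json` may differ.

```python
import json

# Hypothetical config contents; the real config file may differ.
config_text = """
{
  "base": 24,
  "csvfile": "example.csv",
  "res_folder": "example_results",
  "name": "example",
  "dropout": 0.5,
  "add_bn": false,
  "P5": null,
  "P50": null,
  "P10": null
}
"""
config = json.loads(config_text)

# Keys with these prefixes are settings; everything else is a variable to predict
reserved = ("base", "res_folder", "csvfile", "name", "dropout", "add_bn")
target_vars = sorted(k for k in config if not k.startswith(reserved))

print(target_vars)  # ['P10', 'P5', 'P50']
```

Note that `sorted` orders strings lexicographically, so `P10` comes before `P5`. The order of the variable list therefore need not follow the numeric order of the percentiles.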

This next part reads the data in from the csv file as a pandas dataframe, gets an image generator, and then prepares three models with different base values

In [0]:
###===================================================
## read the data set in, clean and modify the pathnames so they are absolute
df = pd.read_csv(csvfile)
df['files'] = [k.strip() for k in df['files']]
df['files'] = [os.getcwd()+os.sep+f.replace('\\',os.sep) for f in df['files']]    

train_idx = np.arange(len(df))

##==============================================
## create a data generator over all images, choosing the function that matches the number of variables
if len(vars)==1:        
  train_gen = get_data_generator_1vars(df, train_idx, True, vars, len(df))
elif len(vars)==2:        
  train_gen = get_data_generator_2vars(df, train_idx, True, vars, len(df))
elif len(vars)==3:        
  train_gen = get_data_generator_3vars(df, train_idx, True, vars, len(df))
elif len(vars)==4:        
  train_gen = get_data_generator_4vars(df, train_idx, True, vars, len(df))
elif len(vars)==5:        
  train_gen = get_data_generator_5vars(df, train_idx, True, vars, len(df))
elif len(vars)==6:        
  train_gen = get_data_generator_6vars(df, train_idx, True, vars, len(df))
elif len(vars)==7:        
  train_gen = get_data_generator_7vars(df, train_idx, True, vars, len(df))
elif len(vars)==8:        
  train_gen = get_data_generator_8vars(df, train_idx, True, vars, len(df))
elif len(vars)==9:        
  train_gen = get_data_generator_9vars(df, train_idx, True, vars, len(df))
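
The path clean-up at the start of the cell above strips stray whitespace, swaps backslashes for the local path separator, and prefixes the current working directory. Here is a small sketch of the same steps on made-up file names; the names are hypothetical, and the assumption that the csv file stores Windows-style paths is inferred only from the `replace('\\', os.sep)` call.

```python
import os

# Hypothetical entries as they might appear in the 'files' column
raw_files = ["  images\\sand1.jpg ", "images\\sub\\sand2.jpg"]

root = os.getcwd()

# Strip whitespace, normalise separators, then make the path absolute
clean_files = [
    root + os.sep + f.strip().replace("\\", os.sep)
    for f in raw_files
]

for path in clean_files:
    print(path)
```

Doing the strip and the separator swap in one pass gives the same result as the two list comprehensions in the notebook.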


In [7]:
x_train, tmp = next(train_gen)   
if len(vars)>1:    
  counter = 0
  for v in vars:
     exec(v+'_trueT = np.squeeze(tmp[counter])')
     counter +=1
else:
  exec(vars[0]+'_trueT = np.squeeze(tmp)')

models = []
for base in [base-2,base,base+2]:
  weights_path = name+"_base"+str(base)+"_model_checkpoint.hdf5"
  ##==============================================
  ## create a SediNet model to estimate continuous variables
  model = make_cont_sedinet(base, vars, add_bn, dropout)
  model.load_weights(os.getcwd()+os.sep+'res'+os.sep+res_folder+os.sep+weights_path)
  models.append(model)

[INFORMATION] Model summary:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 512, 1024, 1) 0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 510, 1022, 22 220         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 508, 1020, 44 8756        conv2d_1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 254, 510, 44) 0           conv2d_2[0][0]                   
________________________________________________________________________________
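
The loop that loads the three models builds one weights filename per base value. The sketch below reproduces only that naming step, with hypothetical values for `name` and `base` (in the notebook both come from the config file). It also shows a side effect of reusing `base` as the loop variable.

```python
# Hypothetical values; in the notebook both are read from the config file.
name = "example"
base = 24

original_base = base
weight_files = []
# The list is built once, before the loop starts rebinding 'base'
for base in [base - 2, base, base + 2]:
    weight_files.append(name + "_base" + str(base) + "_model_checkpoint.hdf5")

print(weight_files)
print(base)  # 26: the loop variable has overwritten the configured value
```

Because `base` keeps its last loop value, any later use of `base`, such as in the name of the saved figure, refers to the largest of the three values rather than the one in the config file.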

Now that the models are set up, we use them below to make predictions on each image. This gives three estimates per image, and our final estimate is their mean

In [0]:
for v in vars:
  exec(v+'_PT = []')
#PT = []      
for model in models:   
  tmp = model.predict(x_train, batch_size=1)
  if len(vars)>1:    
     counter = 0
     for v in vars:
        exec(v+'_PT.append(np.squeeze(tmp[counter]))')
        counter +=1
  else:
     exec(vars[0]+'_PT.append(np.asarray(np.squeeze(tmp)))') #.argmax(axis=-1))')

if len(vars)>1:
  for k in range(len(vars)):  
     exec(vars[k]+'_predT = np.squeeze(np.mean(np.asarray('+vars[k]+'_PT), axis=0))')
else:   
  exec(vars[0]+'_predT = np.squeeze(np.mean(np.asarray('+vars[0]+'_PT), axis=0))')
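
The averaging step can also be written without `exec` by keeping the per-model predictions in a dictionary keyed by variable name. This is only a sketch of an alternative layout, not the SediNet implementation; the variable names and all numbers are made up, and `model_outputs` stands in for what `model.predict` would return for each of the three models.

```python
import numpy as np

# Hypothetical predictions: 3 models x 2 variables x 4 images
target_vars = ["P50", "P90"]
model_outputs = [
    [np.array([100., 200., 300., 400.]), np.array([150., 250., 350., 450.])],
    [np.array([110., 190., 310., 390.]), np.array([160., 240., 360., 440.])],
    [np.array([90., 210., 290., 410.]), np.array([140., 260., 340., 460.])],
]

# Collect per-variable predictions in a dict rather than exec-generated names
per_model = {v: [] for v in target_vars}
for outputs in model_outputs:
    for v, out in zip(target_vars, outputs):
        per_model[v].append(np.squeeze(out))

# Ensemble estimate: mean across the model axis (axis 0)
ensemble = {v: np.mean(np.asarray(p), axis=0) for v, p in per_model.items()}

print(ensemble["P50"])
print(ensemble["P90"])
```

Each entry of `ensemble` has one value per image, which is the same shape the `_predT` arrays have in the notebook.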

Finally, we plot the estimated value against the actual value of each percentile on log-log axes, with a 1:1 line and the mean percentage error printed on each panel

In [0]:
if len(vars)==9:    
  nrows = 3; ncols = 3
elif len(vars)==8:    
  nrows = 4; ncols = 2
elif len(vars)==7:    
  nrows = 4; ncols = 2           
elif len(vars)==6:    
  nrows = 3; ncols = 2
elif len(vars)==5:    
  nrows = 3; ncols = 2       
elif len(vars)==4:    
  nrows = 2; ncols = 2       
elif len(vars)==3:    
  nrows = 3; ncols = 1      
elif len(vars)==2:    
  nrows = 2; ncols = 1      
elif len(vars)==1:    
  nrows = 1; ncols = 1

## make a plot                  
fig = plt.figure(figsize=(4*ncols,4*nrows)) # figsize is (width, height)
labs = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
for k in range(1,1+len(vars)): # one panel per variable, even if the grid has a spare cell
  plt.subplot(nrows,ncols,k)
  plt.plot(eval(vars[k-1]+'_trueT'), eval(vars[k-1]+'_predT'), 'ko', markersize=3)
  #plt.plot(eval(vars[k-1]+'_true'), eval(vars[k-1]+'_pred'), 'bx', markersize=5)
  plt.plot([5, 1000], [5, 1000], 'k', lw=2)
  plt.xscale('log'); plt.yscale('log')
  #plt.text(11,700,'Test : '+str(np.mean(100*(np.abs(eval(vars[k-1]+'_pred') - eval(vars[k-1]+'_true')) / eval(vars[k-1]+'_true'))))[:5]+' %',  fontsize=8, color='b')
  plt.text(11,1000,'Train : '+str(np.mean(100*(np.abs(eval(vars[k-1]+'_predT') - eval(vars[k-1]+'_trueT')) / eval(vars[k-1]+'_trueT'))))[:5]+' %', fontsize=8)
  plt.xlim(10,1300); plt.ylim(10,1300)
  plt.title(r''+labs[k-1]+') '+vars[k-1], fontsize=8, loc='left')

#plt.show()
plt.savefig(name+str(IM_HEIGHT)+'_batch'+str(batch_size)+'_xy-base'+str(base)+'_predict.png', dpi=300, bbox_inches='tight')
plt.close()
del fig  

If you are in Google Colab, to view the file you just made, go to the Files tab on the left, expand SediNet, and you'll see the png file called "sievesand_9prcs_xxxx-predict.png"
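
The "Train" percentage printed on each panel is the mean absolute percentage error between the estimated and actual values. A minimal, library-free sketch of that calculation follows; the function name and the sample values are hypothetical, and the two-decimal formatting replaces the notebook's string truncation.

```python
def mean_abs_percent_error(true_values, predicted_values):
    """Mean of 100 * |predicted - true| / true over all samples."""
    errors = [
        100 * abs(p - t) / t
        for t, p in zip(true_values, predicted_values)
    ]
    return sum(errors) / len(errors)

# Hypothetical actual and estimated values for one percentile
true_vals = [200.0, 400.0, 500.0]
pred_vals = [220.0, 380.0, 500.0]

score = mean_abs_percent_error(true_vals, pred_vals)
print(f"Train : {score:.2f} %")  # Train : 5.00 %
```

Because each error is divided by the actual value, the measure is relative, so the same absolute miss counts for more on a fine percentile than on a coarse one.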