## Calibrating the parameters of an ImageJ JavaScript script for low-contrast penetration measurement

The low_contast_penetration.js JavaScript code works by assessing whether the mean grey value of a row of 20 pixels is less than a specified threshold. This threshold is a proportion of the grey value of the pixels near 0 mm depth. To estimate this proportion, I analysed images that a Clinical Scientist had already classified and tried to identify the mean threshold implied by those calls. First, I recorded the following data from these images in a spreadsheet.

In [66]:
import pandas as pd

df = pd.read_csv('baseline_measurements.csv')
df[['image_no_marked','probe_type','called_depth_mm','mean_contrast_20px_at zero',
    'mean_contrast_20px_at_called_depth','proportion']]

IOError: File baseline_measurements.csv does not exist

I then obtained summary statistics of the proportion (each row's grey value at the called depth relative to the grey value at zero) as a crude measure, and plotted the results.

In [None]:
proportion_mean = df['proportion'].mean()
proportion_std = df['proportion'].std()

print("Mean of proportions = " + str(proportion_mean))
print("Standard Deviation of proportions = " + str(proportion_std))

xs = list(df.index)
ys = list(df.proportion)
y_errs = [(i * (1 - i)) for i in ys]

err_xs = []
err_ys = []

for x, y, yerr in zip(xs, ys, y_errs):
    err_xs.append((x, x))
    err_ys.append((y - yerr/2, y + yerr/2))

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import Range1d
output_notebook()

fig = figure(title='Threshold determined by Clinical Scientist',
            x_axis_label='Image Number',
            y_axis_label='Proportion of "zero" grey value')
fig.circle(xs,ys,radius=0.1)
#fig.multi_line(err_xs, err_ys,line_width=3)
fig.y_range = Range1d(0, 1)
fig.multi_line(xs=[[0,8],[0,8],[0,8]],
               ys=[[proportion_mean,proportion_mean],
                   [proportion_mean+proportion_std,proportion_mean+proportion_std],
                   [proportion_mean-proportion_std,proportion_mean-proportion_std]],
               color=['blue','grey','grey'],
               line_width=1)
#fig.multi_line(xs=[0,8],ys=[proportion_mean,proportion_mean])
show(fig);

I then implemented an algorithm which started at 0 mm and looked for a row of pixels in the 20 mm wide ROI with a grey value less than 0.8 of the grey value at 0 mm. I chose 0.8 because it seemed a sensible value given the baseline measurements.
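As a rough illustration, this first version of the algorithm could be sketched in Python as follows. This is not the ImageJ script itself: the function name `first_row_below`, the representation of the ROI as a 2-D array (one row per depth step, row 0 taken as the 'zero' depth) and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def first_row_below(roi, proportion=0.8):
    """Return the index of the first row whose mean grey value is below
    proportion * (mean grey value of row 0), or None if no row qualifies."""
    row_means = np.asarray(roi, dtype=float).mean(axis=1)
    threshold = proportion * row_means[0]  # row 0 is assumed to be the 'zero' depth
    below = np.nonzero(row_means < threshold)[0]
    return int(below[0]) if below.size else None

# Synthetic ROI: 20 rows of 20 pixels, brightness falling by 3 grey levels per row
profile = np.arange(100, 40, -3)               # 100, 97, ..., 43
roi = np.repeat(profile[:, None], 20, axis=1)
print(first_row_below(roi))                    # 7 (row mean 79 < 0.8 * 100)
```

The returned row index would still need converting to a depth in mm using the image's pixel spacing, which is not shown here.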

This algorithm was calling at depths much too shallow, so I coded in a number of other parameters. The parameters were:

* Threshold proportion
* Number of rows below threshold before call is made
* Number of rows to backtrack once sufficient number of rows below threshold are observed
* Minimum number of pixels from 'zero' point before call can be made
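One way these four parameters might fit together is sketched below in Python. It is an interpretation rather than the script's actual logic: I assume here that the rows below threshold must be consecutive, that backtracking simply subtracts rows from the row at which the run completes, and that the minimum distance is expressed in rows from row 0. The function and argument names are hypothetical.

```python
import numpy as np

def call_depth_row(roi, proportion, rows_required, backtrack_rows, min_rows_from_zero):
    """Return the row index at which a call is made, or None if no call is made."""
    row_means = np.asarray(roi, dtype=float).mean(axis=1)
    threshold = proportion * row_means[0]       # row 0 is assumed to be the 'zero' point
    run = 0                                     # consecutive rows below threshold so far
    for i in range(min_rows_from_zero, len(row_means)):
        if row_means[i] < threshold:
            run += 1
            if run >= rows_required:
                # Enough rows below threshold: step back towards the start of the run
                return max(i - backtrack_rows, 0)
        else:
            run = 0                             # a bright row resets the run
    return None

# Synthetic profile with two isolated dark rows before a sustained drop
profile = np.array([100, 70, 100, 100, 75, 100, 70, 70, 70, 70], dtype=float)
roi = np.repeat(profile[:, None], 20, axis=1)
print(call_depth_row(roi, 0.8, rows_required=3, backtrack_rows=2, min_rows_from_zero=2))  # 6
```

Requiring several rows in a row is what stops an isolated dark row from triggering a shallow call, which was the problem with the single-row version.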

After several iterations of algorithm re-design it was clear that different parameter values were needed for different types of probe (as can be seen from the table below). For each parameter, there are two possible values: one for linear and one for curved linear probes.

In [None]:
alg_df = pd.read_csv('algorithm1.csv')
alg_df

I then obtained summary statistics of the classification error of each algorithm.

In [None]:
from scipy.stats import ttest_rel
from math import sqrt
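alg_df_summary = pd.DataFrame(columns=['p_val','mean','std','se'])
for i in range(1,8):
    alg_name = 'algorithm' + str(i)
    # Paired difference between actual and estimated depth for this algorithm
    diff = alg_df['actual'] - alg_df[alg_name]
    p_val = ttest_rel(alg_df['actual'], alg_df[alg_name])[1]
    # Standard error of the mean difference: std / sqrt(number of paired values)
    se = diff.std() / sqrt(diff.count())
    alg_df_summary.loc[i] = [p_val, diff.mean(), diff.std(), se]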
alg_df_summary

Considering the p-values obtained from paired t-tests of the differences between actual and estimated depth, algorithms 1, 2, 6 and 7 all have p-values > 0.05 (no statistically significant difference between actual and estimated depth was detected) and are therefore plausible candidates.
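For reference, the quantities in the summary table follow the standard paired t-test definitions. Writing $d_i$ for the difference between actual and estimated depth on image $i$, and $n$ for the number of images (symbols introduced here for illustration only):

$$
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
\mathrm{SE} = \frac{s_d}{\sqrt{n}}, \qquad
t = \frac{\bar{d}}{\mathrm{SE}}
$$

The p-value is then taken from a t-distribution with $n - 1$ degrees of freedom. A mean difference close to 0 together with a small SE is what we would hope to see from a well-calibrated algorithm.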

In [None]:
xs = list(alg_df_summary.index)
ys = list(alg_df_summary['mean'])
y_errs = list(alg_df_summary['se'])

err_xs = []
err_ys = []

for x, y, yerr in zip(xs, ys, y_errs):
    err_xs.append((x, x))
    err_ys.append((y - yerr, y + yerr))

fig2 = figure(title = 'Algorithm Performance', y_axis_label = 'Mean difference from actual +/- SE', x_axis_label = 'Algorithm')
fig2.circle(xs,ys,radius=0.05)
fig2.multi_line(err_xs, err_ys, line_width=5)
fig2.y_range = Range1d(-10, 40)
show(fig2);

Algorithms 6 and 7 clearly have the smallest variation in the differences between actual and estimated values, and these differences are grouped around 0 (i.e. no difference).

I attempted to classify some unseen images using algorithm 6 and found that it did not generalise well: it failed to make a call on the majority of images.

Algorithm 7 is effectively Algorithm 6, tweaked so that it does make calls on unseen images. Some of the parameters have been 'relaxed', albeit without losing too much performance, as can be seen from the above plot.