## Random Forest machine learning on a sequence

First, set up a parent path that contains the images, then create folders beneath it for labels, features and predictions.

In [None]:
import os
import zarr
from tnia.io.io_helper import collect_all_images 
from tnia.nd.ndutil import pad_to_largest
import napari

# Parent directory that contains the images
parent_path = r'/home/bnorthan/besttestset/images/Semantic/'

ml_path = os.path.join(parent_path, 'ml2_')
ml_labels_path = os.path.join(ml_path, 'ml_labels')
ml_features_path = os.path.join(ml_path, 'ml_features')
ml_predictions_path = os.path.join(ml_path, 'ml_predictions')

# Create the output directories if they do not already exist
for path in (ml_labels_path, ml_features_path, ml_predictions_path):
    os.makedirs(path, exist_ok=True)


## Collect images

Collect the images and put the 2D image sequence into a padded ND array. This makes the sequence easy to display in Napari.

In [None]:
images = collect_all_images(str(parent_path))
padded_images = pad_to_largest(images)

padded_images.shape
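
To illustrate the idea of padding a sequence to a common shape, here is a minimal NumPy sketch. It is not the `tnia` implementation of `pad_to_largest`: the function name `pad_to_largest_sketch` is hypothetical, and it assumes zero padding at the trailing edge of each axis and that all images have the same number of dimensions.

```python
import numpy as np

def pad_to_largest_sketch(images):
    """Zero-pad a list of arrays to a common shape and stack them.

    Hypothetical stand-in for illustration; not the tnia implementation.
    """
    # Largest extent along each axis across all images
    target = np.max([im.shape for im in images], axis=0)
    padded = []
    for im in images:
        # Pad only at the trailing edge of each axis (an assumption)
        pad_width = [(0, int(t - s)) for s, t in zip(im.shape, target)]
        padded.append(np.pad(im, pad_width, mode="constant"))
    # Stack into a single (N, H, W, C) array
    return np.stack(padded)

# Two RGB images of different sizes
images = [np.ones((4, 6, 3)), np.ones((5, 3, 3))]
stack = pad_to_largest_sketch(images)
print(stack.shape)
```

Once every image has the same shape, the whole sequence can be handed to a viewer as one array.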

Figure out the number of channels (this logic won't work for a grayscale image), then calculate the label and feature shapes.

In [None]:
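# The last axis holds the channels; the remaining axes are (image, row, column)
num_channels = padded_images.shape[-1]
label_shape = padded_images.shape[:-1]
# 12 features per channel (must match what extract_features produces)
features_shape = padded_images.shape[:-1] + (num_channels*12,)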

num_channels, label_shape, features_shape
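
The shape calculation above assumes a trailing channel axis. One possible way to cover the grayscale case as well is sketched below. The helper `compute_shapes`, the constant `FEATURES_PER_CHANNEL`, and the explicit `has_channel_axis` flag are all hypothetical; whether `extract_features` itself accepts grayscale input is not confirmed here.

```python
# 12 is taken from the cell above; assumed to match extract_features
FEATURES_PER_CHANNEL = 12

def compute_shapes(stack_shape, has_channel_axis):
    """Return (num_channels, label_shape, features_shape) for an image stack.

    Hypothetical helper for illustration only.
    """
    if has_channel_axis:
        # (N, H, W, C): the last axis is the channel axis
        num_channels = stack_shape[-1]
        label_shape = tuple(stack_shape[:-1])
    else:
        # (N, H, W): grayscale, treat as a single channel
        num_channels = 1
        label_shape = tuple(stack_shape)
    features_shape = label_shape + (num_channels * FEATURES_PER_CHANNEL,)
    return num_channels, label_shape, features_shape

rgb_shapes = compute_shapes((10, 256, 256, 3), has_channel_axis=True)
gray_shapes = compute_shapes((10, 256, 256), has_channel_axis=False)
print(rgb_shapes)
print(gray_shapes)
```

The flag is passed explicitly because a shape alone cannot distinguish a stack of grayscale images from a single multichannel image.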

The labels, features and predictions (especially the features) could use a lot of memory for a large sequence, so use Zarr arrays for all three.

In [None]:
print(ml_labels_path)
print(ml_features_path)
print(ml_predictions_path)

In [None]:
ml_labels = zarr.open(
    ml_labels_path,
    mode='a',
    shape=label_shape,
    dtype='i4',
    dimension_separator="/",
)

ml_features = zarr.open(
    ml_features_path,
    mode='a',
    shape=features_shape,
    dtype='f4',
    dimension_separator="/",
)

ml_predictions = zarr.open(
    ml_predictions_path,
    mode='a',
    shape=label_shape,
    dtype='i4',
    dimension_separator="/",
)

ml_labels.shape, ml_labels.dtype, ml_features.shape, ml_features.dtype
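
To see why the features in particular motivate disk-backed arrays, the uncompressed size can be estimated from the shape and dtype. The sequence size below (100 images of 1024 x 1024 pixels with 3 channels) is a made-up example, not the data set used in this notebook, and `nbytes` is a hypothetical helper.

```python
import numpy as np

def nbytes(shape, dtype):
    """Uncompressed size in bytes of an array with this shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

# Hypothetical sequence: 100 RGB images of 1024 x 1024 pixels
label_shape = (100, 1024, 1024)
features_shape = label_shape + (3 * 12,)

labels_gib = nbytes(label_shape, 'i4') / 2**30
features_gib = nbytes(features_shape, 'f4') / 2**30
print(f'labels: {labels_gib:.2f} GiB, features: {features_gib:.2f} GiB')
```

In this example the features are 36 times larger than the labels, because each pixel stores one 4-byte value per feature.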

## View images, labels and predictions

View the images, labels and predictions. We can draw labels in Napari, and these labels will be recognized by the subsequent cells.

In [None]:
viewer = napari.Viewer()
viewer.add_image(padded_images, name='padded_images')
viewer.add_labels(ml_labels, name='ml_labels')
viewer.add_labels(ml_predictions.astype('uint32'), name='ml_predictions')

In [None]:
from tnia.machinelearning.random_forest_helper import extract_features_sequence, extract_features
padded_images.shape, padded_images.dtype, ml_labels.shape, ml_labels.dtype, ml_features.shape, ml_features.dtype


Now we extract features for the entire sequence. The `extract_features_sequence` function only computes features for images that have labels. It returns a label vector and a feature vector that can be used for pixel-based machine learning.

In [None]:
label_vector, features_vector = extract_features_sequence(padded_images, ml_labels, ml_features)
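
As a rough illustration of what a label vector and feature vector for pixel-based learning look like, the sketch below gathers only the labeled pixels with a boolean mask. This is an assumption about the general approach, not the actual code of `extract_features_sequence`; the array sizes and random features are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 4 x 4 images; 0 means "unlabeled"
labels = np.zeros((2, 4, 4), dtype=np.int32)
labels[0, 0, :2] = 1   # two pixels labeled 1
labels[0, 3, 1:] = 2   # three pixels labeled 2

# Made-up features: 5 values per pixel
features = rng.random((2, 4, 4, 5)).astype(np.float32)

# Keep only the pixels that carry a label
mask = labels > 0
label_vector = labels[mask]        # one label per labeled pixel
features_vector = features[mask]   # one feature row per labeled pixel

print(label_vector.shape, features_vector.shape)
```

Each row of `features_vector` lines up with the entry at the same index in `label_vector`, which is the layout scikit-learn classifiers expect.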

Now train a Random Forest Classifier to predict foreground and background.

In [None]:
from sklearn.ensemble import RandomForestClassifier

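# Each tree is trained on 5% of the labeled pixels (max_samples=0.05)
clf = RandomForestClassifier(
    n_estimators=50, n_jobs=-1, max_depth=10, max_samples=0.05
)

# Drawn labels start at 1 (0 means unlabeled), so shift them to start at 0
clf.fit(features_vector, label_vector-1)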
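
The sketch below runs the same kind of training on synthetic data, to show the label shift in isolation: labels 1 and 2 are shifted to classes 0 and 1 for fitting, and 1 is added back after prediction. The data is made up (the label depends only on the first feature), `random_state` is added for reproducibility, and `n_jobs` is left at its default.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data: 2000 "pixels" with 4 features each
features_vector = rng.random((2000, 4)).astype(np.float32)
# Label 2 where the first feature is large, label 1 otherwise
label_vector = np.where(features_vector[:, 0] > 0.5, 2, 1)

clf = RandomForestClassifier(
    n_estimators=50, max_depth=10, max_samples=0.05, random_state=0
)
# Shift labels 1, 2 to classes 0, 1
clf.fit(features_vector, label_vector - 1)

# Shift predicted classes back to labels 1, 2
predicted_labels = clf.predict(features_vector) + 1
accuracy = float((predicted_labels == label_vector).mean())
print(accuracy)
```

The accuracy here is measured on the training data, so it only confirms that the pipeline is wired correctly, not that the model generalizes.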


## Predict the entire sequence

In [None]:
import numpy as np
from skimage import future

for n in range(padded_images.shape[0]):
    print('predicting image', n)
    image = padded_images[n,:,:,:]
    # All-zero features mean this image has not been processed yet
    if ml_features[n,:,:,:].sum() == 0:
        ml_features[n,:,:,:] = extract_features(image)
    features = ml_features[n,:,:,:]

    # Flatten to (pixels, features), predict, restore the image shape,
    # then add 1 to map classes 0, 1 back to labels 1, 2
    prediction = future.predict_segmenter(features.reshape(-1, features.shape[-1]), clf).reshape(features.shape[:-1]) + 1
    prediction = np.squeeze(prediction).astype(np.uint32)
    ml_predictions[n,:,:] = prediction
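
The reshape round trip in the loop above can be seen in isolation with a toy classifier. `ThresholdClassifier` is a hypothetical stand-in with a scikit-learn style `predict` method; it is not part of any library and only exists to make the sketch self-contained.

```python
import numpy as np

class ThresholdClassifier:
    """Hypothetical stand-in with a scikit-learn style predict method."""
    def predict(self, X):
        # Class 1 where the first feature exceeds 0.5, else class 0
        return (X[:, 0] > 0.5).astype(np.int64)

# One 3 x 4 image with 2 features per pixel
features = np.zeros((3, 4, 2), dtype=np.float32)
features[1, :, 0] = 1.0   # middle row has a large first feature

# (rows, cols, features) -> (pixels, features)
flat = features.reshape(-1, features.shape[-1])
classes = ThresholdClassifier().predict(flat)

# (pixels,) -> (rows, cols), then shift classes 0, 1 to labels 1, 2
prediction = classes.reshape(features.shape[:-1]) + 1
prediction = prediction.astype(np.uint32)
print(prediction)
```

Because `reshape` preserves pixel order, the prediction for each pixel lands back at the position it came from.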

In [None]:
ml_predictions = ml_predictions.astype(np.uint32)


In [None]:
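# Map labels 1, 2 to 0, 5 so label 1 becomes 0 (unlabeled in Napari) and label 2 gets a different color
ml_predictions2 = (ml_predictions[:]-1)*5
viewer.add_labels(ml_predictions2, name='ml_predictions2')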

In [None]:
viewer.add_labels(ml_predictions, name='ml_predictions')

In [None]:
features_vector.min(), features_vector.max()