## Random Forest machine learning on a sequence

First set up a parent path that contains the images and folders for labels, features and predictions.

In [1]:
import os
import zarr
from tnia.io.io_helper import collect_all_images 
from tnia.nd.ndutil import pad_to_largest
import napari

parent_path = r'/home/bnorthan/bekonbits/images/Columbia_Semantic/'

ml_path = os.path.join(parent_path, 'ml3c')
ml_labels_path = os.path.join(ml_path, 'ml_labels')
ml_features_path = os.path.join(ml_path, 'ml_features')
ml_predictions_path = os.path.join(ml_path, 'ml_predictions')

# Create the output folders if they do not exist yet
for path in (ml_labels_path, ml_features_path, ml_predictions_path):
    os.makedirs(path, exist_ok=True)


## Collect images

Collect the images and put the 2D image sequence into a padded ND array. Padding every image to the size of the largest one makes the sequence easy to display in Napari.

In [2]:
images = collect_all_images(str(parent_path))
padded_images = pad_to_largest(images)

padded_images.shape

(26, 2076, 3088, 3)
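
As a rough illustration of what padding to the largest image means, the sketch below zero-pads a list of differently sized arrays to a common shape and stacks them. `pad_stack_to_largest` is a hypothetical helper written for this example. It is not the tnia implementation, which may place or fill the padding differently.

```python
import numpy as np

def pad_stack_to_largest(images):
    """Zero-pad each (H, W, C) image at the bottom/right and stack into (N, H_max, W_max, C)."""
    max_h = max(im.shape[0] for im in images)
    max_w = max(im.shape[1] for im in images)
    channels = images[0].shape[2]
    stack = np.zeros((len(images), max_h, max_w, channels), dtype=images[0].dtype)
    for n, im in enumerate(images):
        # Copy each image into the top-left corner; the rest stays zero
        stack[n, :im.shape[0], :im.shape[1], :] = im
    return stack

# Two small made-up RGB images of different sizes
example = [np.ones((4, 6, 3), dtype=np.uint8), np.ones((5, 3, 3), dtype=np.uint8)]
stacked = pad_stack_to_largest(example)
print(stacked.shape)  # (2, 5, 6, 3)
```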

Read the number of channels from the last axis (this logic won't work for a grayscale image, which has no channel axis) and then calculate the label and feature shapes. The feature array holds 12 features per channel.

In [3]:
num_channels = padded_images.shape[-1]
label_shape = padded_images.shape[:-1]
features_shape = padded_images.shape[:-1] + (num_channels*12,)

num_channels, label_shape, features_shape

(3, (26, 2076, 3088), (26, 2076, 3088, 36))

The labels, features and predictions (especially the features) could use a lot of memory for a large sequence, so we store them as Zarr arrays.

In [4]:
ml_labels = zarr.open(
    ml_labels_path,
    mode='a',
    shape=label_shape,
    dtype='i4',
    dimension_separator="/",
)

ml_features = zarr.open(
    ml_features_path,
    mode='a',
    shape=features_shape,
    dtype='f4',
    dimension_separator="/",
)

ml_predictions = zarr.open(
    ml_predictions_path,
    mode='a',
    shape=label_shape,
    dtype='i4',
    dimension_separator="/",
)

ml_labels.shape, ml_labels.dtype, ml_features.shape, ml_features.dtype

((26, 2076, 3088), dtype('int32'), (26, 2076, 3088, 36), dtype('float32'))
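
To put a number on the memory concern, the size of the feature array can be estimated from the shape and dtype reported above. This is plain arithmetic on those values; nothing here touches the Zarr stores.

```python
import numpy as np

features_shape = (26, 2076, 3088, 36)      # shape reported above
bytes_per_value = np.dtype('f4').itemsize  # float32 -> 4 bytes

n_values = int(np.prod(features_shape, dtype=np.int64))
n_bytes = n_values * bytes_per_value
print(f"{n_values:,} values, {n_bytes / 2**30:.1f} GiB")
```

Held as a dense in-memory array, the features for this 26-image sequence would need roughly 22 GiB, which is why indexing one image at a time from a chunked on-disk array is attractive.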

## View images, labels and predictions

Open the images, labels and predictions in Napari. Labels drawn in Napari will be recognized by the subsequent cells.

In [5]:
viewer = napari.Viewer()
viewer.add_image(padded_images, name='padded_images')
viewer.add_labels(ml_labels, name='ml_labels')
viewer.add_labels(ml_predictions.astype('uint32'), name='ml_predictions')

<Labels layer 'ml_predictions' at 0x7a85ac3ce7e0>

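## Aside: PyTorch loss with 3 classes

The snippet below is a note on the PyTorch side of this workflow. Targets equal to -1 are ignored by the loss. `logits` and `targets` are not defined in this notebook.

import torch.nn as nn

# Define the CrossEntropyLoss with ignore_index set to -1
criterion = nn.CrossEntropyLoss(ignore_index=-1)

# Compute the loss
loss = criterion(logits, targets)

print(f"Loss: {loss.item()}")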

In [10]:
from tnia.machinelearning.random_forest_helper import extract_features_sequence, extract_features
padded_images.shape, padded_images.dtype, ml_labels.shape, ml_labels.dtype, ml_features.shape, ml_features.dtype


((26, 2076, 3088, 3),
 dtype('uint8'),
 (26, 2076, 3088),
 dtype('int32'),
 (26, 2076, 3088, 36),
 dtype('float32'))

Now we extract features for the entire sequence. The `extract_features_sequence` function only computes features for images that have labels. It returns a label vector and a feature vector that can be used for pixel-based machine learning.

In [11]:
label_vector, features_vector = extract_features_sequence(padded_images, ml_labels, ml_features)

image 0 has shape (2076, 3088, 3)
labels 0 has sum 515
features 0 already exist
image 4 has shape (2076, 3088, 3)
labels 4 has sum 643
features 4 already exist
image 9 has shape (2076, 3088, 3)
labels 9 has sum 2104
features 9 already exist
image 12 has shape (2076, 3088, 3)
labels 12 has sum 12812
features 12 already exist
image 13 has shape (2076, 3088, 3)
labels 13 has sum 7175
features 13 already exist
image 14 has shape (2076, 3088, 3)
labels 14 has sum 3804
features 14 already exist
image 16 has shape (2076, 3088, 3)
labels 16 has sum 2346
features 16 already exist
image 22 has shape (2076, 3088, 3)
labels 22 has sum 13199
features 22 already exist
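
The idea of a label vector and a feature vector can be sketched with plain NumPy: keep only the pixels whose label is nonzero and collect their labels and feature rows. This is an assumption about what the helper does conceptually, not the tnia code, and `labels` and `features` below are small synthetic arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.zeros((2, 4, 5), dtype=np.int32)           # (N, H, W), 0 = unlabeled
features = rng.random((2, 4, 5, 6), dtype=np.float32)  # (N, H, W, F)

labels[0, 1, 1:3] = 1   # two pixels drawn as class 1
labels[1, 2, 0:3] = 2   # three pixels drawn as class 2

# A boolean mask over (N, H, W) selects the labeled pixels in both arrays
mask = labels > 0
label_vector = labels[mask]        # shape (n_labeled,)
features_vector = features[mask]   # shape (n_labeled, F)
print(label_vector.shape, features_vector.shape)  # (5,) (5, 6)
```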


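Now train a Random Forest classifier on the labeled pixels to predict foreground and background. The labels are shifted down by one before fitting (`label_vector-1`), so a drawn label of 1 becomes class 0.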

In [12]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=50, n_jobs=-1, max_depth=10, max_samples=0.05
)

clf.fit(features_vector, label_vector-1)
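
In the cell above, `max_samples=0.05` means each tree is fit on a bootstrap sample of 5% of the rows, which keeps training fast when there are many labeled pixels. The sketch below repeats the same train-on-shifted-labels pattern on synthetic data. The data and the names `X`, `y` and `demo_clf` are made up for illustration, and the hyperparameters are scaled down for the tiny data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Two synthetic classes with drawn labels 1 and 2, well separated in every feature
X = np.vstack([rng.normal(0.0, 0.1, (200, 4)), rng.normal(1.0, 0.1, (200, 4))])
y = np.repeat([1, 2], 200)

demo_clf = RandomForestClassifier(
    n_estimators=10, max_depth=5, max_samples=0.5, random_state=0
)
demo_clf.fit(X, y - 1)               # train on 0-based classes

predicted = demo_clf.predict(X) + 1  # shift back to the drawn label values
print(demo_clf.classes_, np.unique(predicted))
```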


## Now predict the entire sequence

In [13]:
import numpy as np
from skimage import future

for n in range(padded_images.shape[0]):
    print('predicting image', n)
    image = padded_images[n, :, :, :]

    # Read the stored features once; compute them only if the block is still empty (all zeros)
    features = ml_features[n, :, :, :]
    if not features.any():
        ml_features[n, :, :, :] = extract_features(image)
        features = ml_features[n, :, :, :]

    # Flatten to (pixels, features), predict, restore the image shape and shift back to 1-based labels
    flat_features = features.reshape(-1, features.shape[-1])
    prediction = future.predict_segmenter(flat_features, clf).reshape(features.shape[:-1]) + 1
    ml_predictions[n, :, :] = prediction.astype(np.uint32)

predicting image 0
predicting image 1
predicting image 2
predicting image 3
predicting image 4
predicting image 5
predicting image 6
predicting image 7
predicting image 8
predicting image 9
predicting image 10
predicting image 11
predicting image 12
predicting image 13
predicting image 14
predicting image 15
predicting image 16
predicting image 17
predicting image 18
predicting image 19
predicting image 20
predicting image 21
predicting image 22
predicting image 23
predicting image 24
predicting image 25
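
The per-image prediction boils down to a reshape round trip: flatten the `(H, W, F)` feature image to one row per pixel, predict, and fold the result back to `(H, W)`. The sketch below shows that pattern with scikit-learn alone on a made-up feature image. `toy_clf` and its training rule are hypothetical and stand in for the classifier trained earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
height, width, n_features = 8, 10, 4
feature_image = rng.random((height, width, n_features))

# One row per pixel: (H*W, F)
flat = feature_image.reshape(-1, n_features)

# Toy classifier: class 0 when the first feature is below 0.5, class 1 otherwise
toy_clf = RandomForestClassifier(n_estimators=10, random_state=0)
toy_clf.fit(flat, (flat[:, 0] >= 0.5).astype(int))

# Predict per pixel, restore the image shape and shift to 1-based labels
label_image = toy_clf.predict(flat).reshape(height, width) + 1
label_image = label_image.astype(np.uint32)
print(label_image.shape, label_image.dtype)
```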


In [12]:
ml_predictions = ml_predictions.astype(np.uint32)


In [15]:
ml_predictions2 = (ml_predictions[:]-1)*5
viewer.add_labels(ml_predictions2, name='ml_predictions2')

<Labels layer 'ml_predictions2' at 0x7fa6fbd089b0>
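
The arithmetic in the cell above remaps the 1-based predictions: class 1 becomes 0, which a napari Labels layer treats as background, and the remaining classes land on 5, 10 and so on, which presumably gives them more distinct colors than consecutive values. A small NumPy check of the mapping on made-up values:

```python
import numpy as np

predictions = np.array([[1, 2], [3, 1]], dtype=np.uint32)  # 1-based class labels
remapped = (predictions - 1) * 5
print(remapped.tolist())  # [[0, 5], [10, 0]]
```

Note that with an unsigned dtype a value of 0 (a pixel that was never predicted) would wrap around to a very large number instead of becoming negative, so this remap assumes every pixel already holds a prediction of 1 or more.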

In [13]:
viewer.add_labels(ml_predictions, name='ml_predictions')

<Labels layer 'ml_predictions [1]' at 0x7fdac25aa4b0>

In [42]:
features_vector.min(), features_vector.max()

(-0.04112321510910988, 1.0)