# Validate dataset

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)

This notebook checks that the dataset loading functions work as intended. We check the frame loading, the features, and the image preprocessing.

In [None]:
!pip install --upgrade wormpose

If using Google Colab, please **restart the runtime** after installing the package (menu Runtime > Restart runtime).

We first download some utility functions to display images:

In [None]:
!wget https://raw.githubusercontent.com/iteal/wormpose/main/examples/ipython_utils.py

## Download sample data
Download the sample data, or skip to use another dataset.

In [None]:
sample_data_root = 'wormpose_data'
import os, shutil
if os.path.exists(sample_data_root):
    shutil.rmtree(sample_data_root)
os.mkdir(sample_data_root)
!git clone https://github.com/iteal/wormpose_data.git

## Set inputs

Set "dataset_loader" and "dataset_path" to match your dataset. The tutorial data uses the "sample_data" loader.

In [1]:
from wormpose.dataset.loader import load_dataset

# Each dataset type has its own loader: "sample_data" for the tutorial data,
# "tierpsy" for Tierpsy tracker data (used here), or the name of your custom dataset loader
dataset_loader = 'tierpsy'

# Set the path to the dataset,
# for Tierpsy tracker data this will be the root path of a folder containing one subfolder per video
dataset_path = "/bucket/StephensU/kosmas/workstreams/other_trackings/celegans/test_approach/data/wormpose_train_data"

# Set if the worm is lighter than the background in the image
# In the sample data, the worm is darker so we set this variable to False
worm_is_lighter = False

# Load the dataset.
# Optional arguments let you resize the images,
# or select specific videos from the dataset instead of loading them all
dataset = load_dataset(dataset_loader, dataset_path, worm_is_lighter=worm_is_lighter)

The sample data contains only one video. For another dataset, update "video_name" to choose a specific video in the dataset.

We choose which frames to display. Update the variables "start", "end" and "step" to visualize a different frame range.


In [2]:
video_names = dataset.video_names
print(f"There are {len(video_names)} video(s) in the dataset: \n{video_names}")

if len(video_names) == 0:
    raise ValueError("No video found in dataset, check the path or the loading functions.")
    
video_name = video_names[0]
print(f"\nWe now inspect one video: \"{video_name}\", change the value of video_name to inspect another video.")

MAX_FRAMES = 100
with dataset.frames_dataset.open(video_name) as frames:
    step = max(1, len(frames) // MAX_FRAMES)
    start, end = 0, len(frames)   
print(f"\nWe inspect the frame range [{start}:{end}:{step}], change the value of start, end or step to inspect another frame range.")

There are 1 video(s) in the dataset: 
['video0']

We now inspect one video: "video0", change the value of video_name to inspect another video.

We inspect the frame range [0:11779:117], change the value of start, end or step to inspect another frame range.
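
The step is chosen so that roughly MAX_FRAMES frames are shown whatever the video length. A minimal sketch of that arithmetic, using the frame count printed above; the helper name `subsample_range` is hypothetical and not part of wormpose:

```python
def subsample_range(num_frames, max_frames=100):
    """Return (start, end, step) so that about max_frames frames are selected."""
    # Integer division gives the stride; max() keeps it at 1 for short videos
    step = max(1, num_frames // max_frames)
    return 0, num_frames, step


start_demo, end_demo, step_demo = subsample_range(11779)
selected = range(start_demo, end_demo, step_demo)
print(f"step={step_demo}, frames selected={len(selected)}")
```

Because the step is rounded down, the number of selected frames can slightly exceed MAX_FRAMES (101 frames for this video).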


## Check frames reader

Run this cell to check that the frame loading works as intended. It should display the raw frames from the dataset for the frame range defined above.

In [3]:
from ipython_utils import ImagesViewer

img_viewer = ImagesViewer()
with dataset.frames_dataset.open(video_name) as frames:
    for frame in frames[start:end:step]:
        img_viewer.add_image(frame)
        
img_viewer.view_as_slider()


## Check features

Run the following cells to check that the features are consistent.
First, we look at the average worm length in each video and check that the values are all about the same. The algorithm will be more accurate if all worms in the dataset have similar properties.

In [4]:
import numpy as np

print("Listing worm lengths for all videos (pixels):")

# Use a separate loop variable so the "video_name" selected above is not overwritten
for name in video_names:
    features = dataset.features_dataset[name]
    worm_length = features.measurements['worm_length']
    average_worm_length = np.nanmean(worm_length)
    print(f"{name}: {average_worm_length:.1f}")
    
print(f"\nThe global image size is set to : {dataset.image_shape} pixels. \nWe will crop real images to this size and generate synthetic images of this size.")

Listing worm lengths for all videos (pixels):
video0: 101.5

The global image size is set to : (102, 102) pixels. 
We will crop real images to this size and generate synthetic images of this size.
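
With several videos, one way to summarize how similar the worm sizes are is a relative spread of the per-video averages. The sketch below uses made-up values and a hypothetical helper `length_spread`; what counts as too large a spread is a judgment call for your data.

```python
import numpy as np


def length_spread(mean_lengths):
    """Relative spread (max - min) / median of per-video mean worm lengths."""
    values = np.asarray(list(mean_lengths.values()), dtype=float)
    return float((np.nanmax(values) - np.nanmin(values)) / np.nanmedian(values))


# Made-up per-video averages in pixels, for illustration only
example_lengths = {"video0": 100.0, "video1": 110.0, "video2": 90.0}
spread = length_spread(example_lengths)
print(f"Relative spread of mean worm lengths: {spread:.2f}")
```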


Run the next cell to check if the skeleton and worm width are accurate.

The skeleton should be displayed on top of the worm body in gray. The head position should be displayed as a red dot. The worm width at three positions (head, midbody, tail) should be displayed as yellow circles with the radius corresponding to the width.

This only displays frames where features are available.

In [5]:
import numpy as np
import cv2
from wormpose.images.worm_drawing import draw_skeleton, draw_measurements
from ipython_utils import ImagesViewer


def is_valid(skel, measurements):
    return not np.any(np.isnan(skel)) and not np.any([np.isnan(x) for x in measurements[0]])

img_viewer = ImagesViewer()
VIEW_MAX = 100
with dataset.frames_dataset.open(video_name) as frames:
       
    features = dataset.features_dataset[video_name]
    for frame, skel, measurements in zip(frames, features.skeletons, features.measurements):
        if is_valid(skel, measurements):
            colored_frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
            draw_skeleton(colored_frame, skel, color=(200, 200, 200), head_color=(0, 0, 255))
            draw_measurements(colored_frame, skel, measurements, color=(0, 255, 255))    
            img_viewer.add_image(colored_frame)
            if img_viewer.count >= VIEW_MAX:
                break
        
img_viewer.view_as_slider()
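
The cell above skips frames whose features contain NaN values. To count how many frames are usable, the same test can be vectorized. This is a sketch on made-up data, assuming skeletons are stored as an array of shape (frames, joints, 2) with NaN marking missing coordinates:

```python
import numpy as np

# Made-up skeletons: 4 frames, 3 joints, (x, y) coordinates; illustration only
skeletons_demo = np.array([
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[np.nan, np.nan], [np.nan, np.nan], [np.nan, np.nan]],
    [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]],
    [[0.0, 1.0], [np.nan, 2.0], [2.0, 3.0]],
])

# A frame is usable only if none of its coordinates is NaN
valid_mask = ~np.isnan(skeletons_demo).any(axis=(1, 2))
print(f"{valid_mask.sum()} of {len(valid_mask)} frames have a complete skeleton")
```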


## Check image preprocessing

We check that the image preprocessing is accurate.
First, we check that it can be pickled, which is necessary for multiprocessing.

In [6]:
import pickle
try:
    pickle.dumps(dataset.frame_preprocessing)
    print('frame_preprocessing test passed successfully')
except Exception as e:
    print(f'ERROR: frame_preprocessing is not picklable ({e}). This is needed for multiprocessing. Remove inner functions and classes from frame_preprocessing.')

frame_preprocessing test passed successfully
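
To see why inner functions are a problem, here is a small standalone sketch that is independent of wormpose (all names are hypothetical). Functions are pickled by reference to their module-level name, so a function defined inside another function cannot be pickled.

```python
import pickle


def top_level_threshold(frame):
    """Module-level function: pickled by reference to its qualified name."""
    return frame


def make_local_threshold():
    def local_threshold(frame):  # defined inside another function
        return frame
    return local_threshold


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print("top-level function:", can_pickle(top_level_threshold))
print("local function:", can_pickle(make_local_threshold()))
```

The local function fails the check. Moving such helpers to module level is usually enough to make the preprocessing object picklable.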


Now we run the preprocessing on actual frames. The processed image should show a yellow bounding box around the worm, and all pixels that do not belong to the worm should be set to a uniform color.

In [7]:
import cv2
from wormpose.dataset.image_processing import frame_preprocessor
from ipython_utils import ImagesViewer, display_as_slider

orig_img_viewer, processed_img_viewer = ImagesViewer(), ImagesViewer()

with dataset.frames_dataset.open(video_name) as frames:
    for index, frame in enumerate(frames[start:end:step]): 
        processed_frame, _, worm_roi = frame_preprocessor.run(dataset.frame_preprocessing, frame)
        frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
        processed_frame = cv2.cvtColor(processed_frame, cv2.COLOR_GRAY2BGR)
        cv2.rectangle(processed_frame, 
                      (worm_roi[1].start, worm_roi[0].start),  
                      (worm_roi[1].stop, worm_roi[0].stop),
                      color=(0, 255, 255))
        orig_img_viewer.add_image(frame)
        processed_img_viewer.add_image(processed_frame)

display_as_slider(orig_img_viewer, processed_img_viewer)
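
The rectangle corners in the cell above swap the order of the region of interest: the cell indexes `worm_roi` as a (row slice, column slice) pair, while drawing functions expect (x, y) points, that is (column, row). A sketch on a made-up image, using NumPy only, with the ROI computed here as a simple bounding box rather than by the wormpose preprocessing:

```python
import numpy as np

# Made-up 8x10 "processed frame": background is 0, the worm blob is 255
frame_demo = np.zeros((8, 10), dtype=np.uint8)
frame_demo[2:5, 3:9] = 255

# Bounding box of the non-background pixels as (row slice, column slice)
rows, cols = np.nonzero(frame_demo)
roi_demo = (slice(rows.min(), rows.max() + 1), slice(cols.min(), cols.max() + 1))

# Convert to (x, y) corner points: x comes from the column slice, y from the row slice
top_left = (int(roi_demo[1].start), int(roi_demo[0].start))
bottom_right = (int(roi_demo[1].stop), int(roi_demo[0].stop))
print(top_left, bottom_right)
```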


## All good?

If every check looks OK, you can proceed with using the dataset. The notebook tutorial_sample_data goes through the training and prediction process.