# Experiment Initialization

Here, I define the terms of my experiment, among them the location of the files in S3 (bucket and folder name), and each of the video prefixes (everything before the file extension) that I want to track. 

Note that these videos should be similar-ish: while we can account for differences in mean intensities between videos, particle sizes should be approximately the same, and (slightly less important) particles should be moving at speeds of about the same order of magnitude. In this experiment, these videos were taken in 0.4% agarose gel at 100x magnification and 100.02 fps with nanoparticles of about 100 nm in diameter.

In [1]:
to_track = []
result_futures = {}
start_knot = 16 #Must be unique number for every run on Cloudknot.

remote_folder = 'Tissue_Studies/09_11_18_Regional' #Folder in AWS S3 containing files to be analyzed
bucket = 'ccurtis.data'
vids = 15
types = ['PEG', 'PS']
pups = [2, 3]
slices = [1, 2, 3]
for typ in types:
    for pup in pups:
        for slic in slices:
            for num in range(1, vids+1):
                #to_track.append('100x_0_4_1_2_gel_{}_bulk_vid_{}'.format(vis, num))
                to_track.append('{}_P{}_S{}_XY{:02d}'.format(typ, pup, slic, num))
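
As a compact alternative to the nested loops, the same list of prefixes can be built with `itertools.product`. This is only a sketch of the naming scheme: the name `to_track_alt` is introduced here for illustration, and the values of `types`, `pups`, `slices`, and `vids` are copied from the cell above.

```python
from itertools import product

types = ['PEG', 'PS']
pups = [2, 3]
slices = [1, 2, 3]
vids = 15

# product() iterates in the same order as the nested for-loops:
# the last argument (video number) varies fastest.
to_track_alt = ['{}_P{}_S{}_XY{:02d}'.format(typ, pup, slic, num)
                for typ, pup, slic, num in product(types, pups, slices, range(1, vids + 1))]

print(len(to_track_alt), to_track_alt[0], to_track_alt[-1])
```

With 2 particle types, 2 pups, 3 slices, and 15 videos each, this gives 180 prefixes.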

In [2]:
to_track

['PEG_P2_S1_XY01',
 'PEG_P2_S1_XY02',
 'PEG_P2_S1_XY03',
 'PEG_P2_S1_XY04',
 'PEG_P2_S1_XY05',
 'PEG_P2_S1_XY06',
 'PEG_P2_S1_XY07',
 'PEG_P2_S1_XY08',
 'PEG_P2_S1_XY09',
 'PEG_P2_S1_XY10',
 'PEG_P2_S1_XY11',
 'PEG_P2_S1_XY12',
 'PEG_P2_S1_XY13',
 'PEG_P2_S1_XY14',
 'PEG_P2_S1_XY15',
 'PEG_P2_S2_XY01',
 'PEG_P2_S2_XY02',
 'PEG_P2_S2_XY03',
 'PEG_P2_S2_XY04',
 'PEG_P2_S2_XY05',
 'PEG_P2_S2_XY06',
 'PEG_P2_S2_XY07',
 'PEG_P2_S2_XY08',
 'PEG_P2_S2_XY09',
 'PEG_P2_S2_XY10',
 'PEG_P2_S2_XY11',
 'PEG_P2_S2_XY12',
 'PEG_P2_S2_XY13',
 'PEG_P2_S2_XY14',
 'PEG_P2_S2_XY15',
 'PEG_P2_S3_XY01',
 'PEG_P2_S3_XY02',
 'PEG_P2_S3_XY03',
 'PEG_P2_S3_XY04',
 'PEG_P2_S3_XY05',
 'PEG_P2_S3_XY06',
 'PEG_P2_S3_XY07',
 'PEG_P2_S3_XY08',
 'PEG_P2_S3_XY09',
 'PEG_P2_S3_XY10',
 'PEG_P2_S3_XY11',
 'PEG_P2_S3_XY12',
 'PEG_P2_S3_XY13',
 'PEG_P2_S3_XY14',
 'PEG_P2_S3_XY15',
 'PEG_P3_S1_XY01',
 'PEG_P3_S1_XY02',
 'PEG_P3_S1_XY03',
 'PEG_P3_S1_XY04',
 'PEG_P3_S1_XY05',
 'PEG_P3_S1_XY06',
 'PEG_P3_S1_XY07',
 'PEG_P3_S1_

In [None]:
to_track.append('PS_P2_S3_rep')

The videos used with this analysis are fairly large (2048 x 2048 pixels and 651 frames), and in cases like this, the tracking algorithm can quickly eat up RAM. In this case, we chose to crop the videos to 512 x 512 images such that we can run our jobs on smaller EC2 instances with 16GB of RAM. 
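
To make the cropping concrete, here is a rough sketch of how a 2048 x 2048 frame divides into a 4 x 4 grid of 512 x 512 tiles, and how much smaller each tile stack is. It is not the implementation of `kn.split`: the helper names `tile_bounds` and `stack_bytes` are hypothetical, and the 2 bytes per pixel (16-bit images) is an assumption used only for the size estimate.

```python
def tile_bounds(height=2048, width=2048, rows=4, cols=4):
    """Return {(row, col): (y0, y1, x0, x1)} pixel bounds for each tile."""
    tile_h, tile_w = height // rows, width // cols
    return {(row, col): (row * tile_h, (row + 1) * tile_h,
                         col * tile_w, (col + 1) * tile_w)
            for row in range(rows) for col in range(cols)}


def stack_bytes(height, width, frames=651, bytes_per_pixel=2):
    """Uncompressed size of an image stack (bytes_per_pixel is an assumption)."""
    return height * width * frames * bytes_per_pixel


bounds = tile_bounds()
full_gib = stack_bytes(2048, 2048) / 2 ** 30
tile_gib = stack_bytes(512, 512) / 2 ** 30
print('{} tiles, full stack {:.2f} GiB, one tile {:.2f} GiB'.format(
    len(bounds), full_gib, tile_gib))
```

Each tile holds one sixteenth of the pixels of the full video, which is what lets the tracking step fit on a smaller instance. The `(row, col)` keys correspond to the `prefix_row_col` names used for the split videos later in this notebook.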

Note that jobs with more memory can be set up with user-defined functions so that splitting isn't necessary. Alternatively, with an intermediate amount of memory, a single user-defined function could perform splitting, tracking, and MSD calculation all on one EC2 instance.

The compiled functions in the knotlets module require access to buckets on AWS. In this case, we will be using a public (read-only) bucket. Users who want to run this notebook on their own will have to transfer the files from nancelab.publicfiles to their own bucket, because the workflow requires writing to S3 buckets.

In [None]:
import diff_classifier.knotlets as kn

In [None]:
for prefix in to_track:
    kn.split(prefix, remote_folder=remote_folder, bucket=bucket)

## Tracking predictor

Tracking normally requires user input in the form of tracking parameters, e.g. particle radius, linking max distance, max frame gap, etc. When large datasets aren't required, each video can be manageably tracked by hand using the TrackMate GUI. However, when datasets get large (e.g. >20 videos), this can become extremely arduous. For videos that are fairly similar, you can get away with using similar tracking parameters across all videos. However, one parameter that is a little noisier than the others is the quality filter value. Quality is a numerical value that approximates how likely a particle is to be "real."

In this case, I built a predictor that estimates the quality filter value based on intensity distributions from the input images. Using a relatively small training dataset (5-20 videos), users can get fairly good estimates of quality filter values that can be used in parallelized tracking workflows.
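
The general idea can be illustrated with a deliberately simplified stand-in: fit a regression from an image-intensity summary to the manually chosen quality cutoffs, then predict the cutoff for a new video. This is not how `regress_sys` works internally (its features and model are not shown here). The single mean-intensity feature, the ordinary least-squares line, and the training numbers below are all made up for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic training set (NOT real data): one intensity summary per
# manually tracked video, paired with the quality cutoff chosen by hand.
train_intensity = [100.0, 200.0, 300.0, 400.0]
train_quality = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(train_intensity, train_quality)

# Predict a quality cutoff for a new, untracked video.
predicted_quality = slope * 250.0 + intercept
print(round(predicted_quality, 2))
```

In the real workflow, the manually determined cutoffs play the role of `train_quality`, and the predicted value replaces the `quality` entry in the tracking parameters for each video.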

Note: in the current setup, the predictor should be run in Python 3. While the code will run in Python 2, there are differences between the random number generators in Python 2 and Python 3 that I was not able to control for.

In [3]:
import os
import diff_classifier.imagej as ij
import boto3
import os.path as op
import diff_classifier.aws as aws
import diff_classifier.knotlets as kn
import numpy as np
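try:
    import joblib  # standalone package; sklearn.externals.joblib was removed in newer scikit-learn
except ImportError:
    from sklearn.externals import joblib  # fallback for older scikit-learn installs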

The regress_sys function should be run twice. When have_output is set to False, it generates a list of files that the user should manually track using TrackMate. Once the quality filter values are found, they can be used as input (y) to generate a regress object that can predict quality filter values for additional videos. Once y is assigned, set have_output to True and re-run the cell.

In [None]:
tnum=15 #number of training datasets
pref = []
for num in to_track:                    
    for row in range(0, 4):
        for col in range(0, 4):
            pref.append("{}_{}_{}".format(num, row, col))

y = np.array([2.5, 0.63, 3.55, 2.2, 3.3, 2.3, 1.73, 1.44, 1.4, 1.9, 1.53, 5.3, 1.82, 3.76, 2.6])

# Creates a regression object based on a training dataset composed of input images and
# manually calculated quality cutoffs from tracking with the TrackMate GUI.
regress = ij.regress_sys(remote_folder, pref, y, tnum, have_output=False, bucket_name=bucket)
#Read up on how regress_sys works before running.

In [None]:
#Pickle object
filename = 'regress.obj'
with open(filename,'wb') as fp:
    joblib.dump(regress,fp)

import boto3
s3 = boto3.client('s3')
aws.upload_s3(filename, remote_folder+'/'+filename, bucket_name=bucket)

Users should input all tracking parameters into the tparams object. Note that the quality value will be overwritten by the values estimated with the quality predictor above.

In [4]:
tparams = {'radius': 7.5, 'threshold': 0.0, 'do_median_filtering': False,
           'quality': 10.0, 'xdims': (0, 511), 'ydims': (1, 511),
           'median_intensity': 300.0, 'snr': 0.0, 'linking_max_distance': 15.0,
           'gap_closing_max_distance': 25.0, 'max_frame_gap': 6,
           'track_duration': 20.0}

## Cloudknot setup

Cloudknot requires the user to define a function that will be sent to multiple computers to run. In this case, the function knotlets.tracking will be used. We create a Docker image that has the required installations (defined by the requirements.txt file from diff_classifier on GitHub) and is built on the base Docker image below, which has Fiji pre-installed in the correct location.

Note that I modify the Docker image below such that the correct version of boto3 is installed. For some reason, versions later than 1.5.28 error out, so I specified 1.5.28 as the correct version. Run my_image.build below to double-check that the Docker image is successfully built prior to submitting the job to Cloudknot.

In [5]:
import cloudknot as ck
import os.path as op

github_installs=('https://github.com/ccurtis7/diff_classifier.git@Chad')
#my_image = ck.DockerImage(func=kn.tracking, base_image='arokem/python3-fiji:0.3', github_installs=github_installs)
my_image = ck.DockerImage(func=kn.assemble_msds, base_image='arokem/python3-fiji:0.3', github_installs=github_installs)
# Read the generated Dockerfile so it can be inspected before building
with open(my_image.docker_path) as docker_file:
    docker_string = docker_file.read()

req_path = op.join(op.split(my_image.docker_path)[0], 'requirements.txt')
with open(req_path) as req:
    req_string = req.read()

#Make sure to double-check that the cloudknot requirements file has boto3 version 1.5.28
# This slice assumes the first line of requirements.txt is the boto3 pin and that the
# last four characters of that line are the part of the version to replace with '5.28'.
first_newline = req_string.find('\n')
new_req = req_string[:first_newline - 4] + '5.28' + req_string[first_newline:]
with open(req_path, 'w') as req_overwrite:
    req_overwrite.write(new_req)
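
The slice-based edit above only works if the boto3 pin is the first line of requirements.txt and has the expected length. A slightly more defensive sketch is shown here, using a hypothetical helper `pin_requirement` (not part of cloudknot or diff_classifier) that rewrites whichever line names the package. The example requirements string is made up for illustration.

```python
import re


def pin_requirement(req_string, package, version):
    """Return req_string with `package` pinned to exactly `version`."""
    pinned_line = '{}=={}'.format(package, version)
    lines, found = [], False
    for line in req_string.splitlines():
        # The leading run of name characters is the distribution name.
        match = re.match(r'\s*([A-Za-z0-9_.\-]+)', line)
        if match and match.group(1).lower() == package.lower():
            lines.append(pinned_line)
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append(pinned_line)  # package was not listed at all
    return '\n'.join(lines) + '\n'


example = 'boto3==1.9.30\nnumpy==1.15.0\n'  # illustrative contents only
print(pin_requirement(example, 'boto3', '1.5.28'))
```

The result could be written back to the requirements file in place of `new_req`. It is still worth opening the file to confirm the pin before building.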

In [6]:
my_image.build("0.1", image_name="test_image")

The object all_maps is an iterable containing all the inputs sent to Cloudknot. This is useful, because if the user needs to modify some of the tracking parameters for a single video, this can be done prior to submission to Cloudknot.

In [None]:
names = []
all_maps = []
for prefix in to_track:    
    for i in range(0, 4):
        for j in range(0, 4):
            names.append('{}_{}_{}'.format(prefix, i, j))
            all_maps.append(('{}_{}_{}'.format(prefix, i, j), remote_folder, bucket, 'regress.obj',
                             4, 4, (512, 512), tparams))

all_maps
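
One detail to watch when adjusting parameters for a single video: every tuple in all_maps holds a reference to the same tparams dictionary, so editing that dictionary in place would change the parameters for all videos. A minimal sketch of a per-video override follows. The helper `override_tparams` is hypothetical, and the shortened tparams and two-entry all_maps are stand-ins for the real objects.

```python
tparams = {'radius': 7.5, 'quality': 10.0, 'linking_max_distance': 15.0}
remote_folder = 'Tissue_Studies/09_11_18_Regional'
bucket = 'ccurtis.data'
names = ['PEG_P2_S1_XY01_0_0', 'PEG_P2_S1_XY01_0_1']
all_maps = [(name, remote_folder, bucket, 'regress.obj', 4, 4, (512, 512), tparams)
            for name in names]


def override_tparams(all_maps, name, **changes):
    """Return a new list where only `name` gets a modified copy of its tparams."""
    updated = []
    for entry in all_maps:
        if entry[0] == name:
            new_params = dict(entry[-1], **changes)  # copy, then apply changes
            entry = entry[:-1] + (new_params,)
        updated.append(entry)
    return updated


all_maps = override_tparams(all_maps, 'PEG_P2_S1_XY01_0_1', radius=6.0)
print(all_maps[0][-1]['radius'], all_maps[1][-1]['radius'])
```

Because the override works on a copy, the shared tparams dictionary and every other entry keep their original values.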

In [7]:
names = []
all_maps = []
for prefix in to_track:
    all_maps.append((prefix, remote_folder, bucket, (512, 512), 651, 4, 4))
    
all_maps

[('PEG_P2_S1_XY01',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY02',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY03',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY04',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY05',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY06',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY07',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY08',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY09',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S1_XY10',
  'Tissu

The Cloudknot knot object sets up the compute environment which will run the code. Note that the name must be unique. Every time you submit a new knot, you should change the name. I do this with the variable start_knot, which I vary for each run.
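
One way to avoid reusing a name by accident is to keep a local record of names already submitted and pick the next free run number. This is only bookkeeping on the user's side: the helper `next_knot_name` and the `used` set are hypothetical, and nothing here queries AWS or Cloudknot.

```python
def next_knot_name(user, used_names, start=1):
    """Return (name, run_number) for the first name not already in used_names."""
    run = start
    while True:
        name = 'download_and_track_{}_b{}'.format(user, run)
        if name not in used_names:
            return name, run
        run += 1


# Names from earlier submissions (illustrative values).
used = {'download_and_track_chad19_b16', 'download_and_track_chad19_b17'}
name, start_knot = next_knot_name('chad19', used, start=16)
used.add(name)
print(name, start_knot)
```

The returned name can then be passed as the `name` argument when creating the knot.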

If larger jobs are anticipated, users can adjust both RAM and storage with the memory and image_id variables. The memory variable specifies the amount of RAM to be used. Users can build a customized AMI with as much space as they need, and enter its ID into image_id. Read the Cloudknot documentation for more details.

In [14]:
ck.aws.set_region('us-east-1')

In [15]:
knot.clobber()

In [8]:
knot = ck.Knot(name='download_and_track_{}_b{}'.format('chad19', start_knot),
               docker_image = my_image,
               memory = 144000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

In [9]:
result_futures = knot.map(all_maps, starmap=True)

In [24]:
ck.aws.set_region('us-west-1')

In [11]:
knot2 = ck.Knot(name='download_and_track_{}_b{}'.format('chad20', start_knot),
               docker_image = my_image,
               memory = 144000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

In [13]:
result_futures2 = knot2.map(all_maps[90:180], starmap=True)

In [16]:
ck.aws.set_region('eu-central-1')

In [17]:
knot3 = ck.Knot(name='download_and_track_{}_b{}'.format('chad21', start_knot),
               docker_image = my_image,
               memory = 144000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

In [18]:
result_futures3 = knot3.map(all_maps[0:90], starmap=True)

In [25]:
ck.aws.set_region('us-west-1')

In [28]:
knot4 = ck.Knot(name='download_and_track_{}_b{}'.format('chad22', start_knot),
               docker_image = my_image,
               memory = 144000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

In [29]:
result_futures4 = knot4.map(all_maps2, starmap=True)

In [32]:
knot4.clobber()

In [34]:
knot5 = ck.Knot(name='download_and_track_{}_b{}'.format('chad25', start_knot),
               docker_image = my_image,
               memory = 144000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

In [35]:
result_futures5 = knot5.map(all_maps2, starmap=True)

In [None]:
result_futures = knot.map(all_maps2, starmap=True)

In [None]:
knot.clobber()

In [None]:
tparams2 = {'radius': 3.5, 'threshold': 0.0, 'do_median_filtering': False,
           'quality': 10.0, 'xdims': (0, 511), 'ydims': (1, 511),
           'median_intensity': 300.0, 'snr': 0.0, 'linking_max_distance': 12.0,
           'gap_closing_max_distance': 18.0, 'max_frame_gap': 3,
           'track_duration': 20.0}

In [30]:
missing = []
all_maps2 = []
import boto3
import botocore

s3 = boto3.resource('s3')


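for name in to_track:
    try:
        # load() issues a HEAD request and raises ClientError if the object does not exist
        s3.Object(bucket, '{}/features_{}.csv'.format(remote_folder, name)).load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            # No features file yet: queue this video for resubmission
            missing.append(name)
            all_maps2.append((name, remote_folder, bucket, (512, 512), 651, 4, 4))
        else:
            print('Something else has gone wrong with {}: {}'.format(name, e))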

In [31]:
all_maps2

[('PEG_P2_S2_XY12',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PEG_P2_S2_XY13',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PS_P2_S2_XY07',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PS_P3_S1_XY06',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PS_P3_S1_XY09',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4),
 ('PS_P3_S2_XY04',
  'Tissue_Studies/09_11_18_Regional',
  'ccurtis.data',
  (512, 512),
  651,
  4,
  4)]

In [None]:
missing

In [None]:
import diff_classifier.aws as aws

In [None]:
old_folder = 'Gel_Studies/08_14_18_gel_validation/old_msds2'

for name in missing:
    filename = 'Traj_{}.csv'.format(name)
    aws.download_s3('{}/{}'.format(old_folder, filename), filename, bucket_name=bucket)
    aws.upload_s3(filename, '{}/{}'.format(remote_folder, filename), bucket_name=bucket)

Users can monitor the progress of their job in the Batch interface. Once the code is complete, users should clobber their knot to make sure that all AWS resources are removed.

In [None]:
knot.clobber()

## Downstream analysis and visualization

The knotlets.assemble_msds function (which can also be submitted to Cloudknot for large jobs) calculates the mean squared displacements and trajectory features from the raw trajectory csv files produced by the Cloudknot submission. It accesses them from the S3 bucket to which they were saved.

In [None]:
for prefix in to_track[5:7]:
    kn.assemble_msds(prefix, remote_folder, bucket='ccurtis.data')
    print('Successfully output msds for {}'.format(prefix))
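
For readers unfamiliar with the quantity being assembled, the mean squared displacement of a single trajectory at lag τ is the average of |r(t + τ) - r(t)|² over all start times t. A minimal sketch follows. It is the textbook time-averaged definition, not necessarily the exact routine used inside diff_classifier, and the function name and toy trajectory are illustrative.

```python
def mean_squared_displacement(xs, ys, max_lag=None):
    """Time-averaged MSD of one 2D trajectory for lags 1..max_lag (in frames)."""
    n = len(xs)
    if max_lag is None:
        max_lag = n - 1
    msd = []
    for lag in range(1, max_lag + 1):
        # Squared displacement for every pair of positions `lag` frames apart.
        sq_disp = [(xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2
                   for i in range(n - lag)]
        msd.append(sum(sq_disp) / len(sq_disp))
    return msd


# Toy trajectory: a particle moving one pixel per frame along x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 0.0, 0.0]
print(mean_squared_displacement(xs, ys))
```

For this straight-line motion the MSD grows with the square of the lag, whereas purely diffusive motion would give an MSD that grows roughly linearly with lag.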

In [None]:
all_maps2 = []
for prefix in to_track:    
    all_maps2.append((prefix, remote_folder, bucket, 'regress100.obj',
                     4, 4, (512, 512), tparams))

In [None]:
knot = ck.Knot(name='download_and_track_{}_b{}'.format('chad', start_knot),
               docker_image = my_image,
               memory = 16000,
               resource_type = "SPOT",
               bid_percentage = 100,
               #image_id = 'ami-0e00afdf500081a0d', #May need to change this line
               pars_policies=('AmazonS3FullAccess',))

diff_classifier also includes some useful visualization tools for checking trajectories and for plotting heatmaps of trajectory features, distributions of diffusion coefficients, and MSDs.

In [None]:
import diff_classifier.heatmaps as hm
import diff_classifier.aws as aws

In [None]:
prefix = to_track[1]

msds = 'msd_{}.csv'.format(prefix)
feat = 'features_{}.csv'.format(prefix)
aws.download_s3('{}/{}'.format(remote_folder, msds), msds, bucket_name=bucket)
aws.download_s3('{}/{}'.format(remote_folder, feat), feat, bucket_name=bucket)

In [None]:
hm.plot_trajectories(prefix, upload=False, figsize=(8, 8))

In [None]:
geomean, geoSEM = hm.plot_individual_msds(prefix, x_range=10, y_range=300, umppx=1, fps=1, upload=False)

In [None]:
hm.plot_heatmap(prefix, upload=False)

In [None]:
hm.plot_particles_in_frame(prefix, y_range=6000, upload=False)

In [None]:
kn.assemble_msds(to_track[-1], remote_folder, bucket='ccurtis.data')