# Description
This notebook extracts reference metrics from the plaques dataset, or a subset
of it, using TrackMate results. Three metrics are considered: the number of
infected cells, the radius of the plaque, and the radial velocity of infected
cells. Only infected cells are considered, because they are the only cells
visible in the microscopy images at hand. Each metric is summarized as a mean
and standard deviation at every time point for which an image exists. Time is
in hours post infection (hpi). The results are saved to a CSV file that serves
as a reference for evaluating simulations in infectio.

# Implementation

## Part 1: choose set of files

### For WR virus

Note: **Quick Fix**

The dataset cannot be used here as is. In many of the experiments, two or more
initial spots are infected, so the center and radius computations are not
correct. As a quick fix for now, I am considering only the few experiments that
consist of a single plaque. For M061-WR these are: 6, 8, 9, 11, 13, 14.

In [46]:
import os
import pandas as pd
import numpy as np

dataset_name = 'M061_WR_handpicked'
CSV_ROOT = "../dataset/plaques-ashkan/trackmate_output/dVGF_dF11_viruses/M061"
# The basic WR files are numbered 1 to 15. As a quick fix, keep only the
# single-plaque experiments among them.
single_plaque_files = [6, 8, 9, 11, 13, 14]
csv_files = [f for f in os.listdir(CSV_ROOT) if f.endswith(".csv") and int(f.split("-")[0]) in single_plaque_files]
csv_files

['8-spots.csv',
 '11-spots.csv',
 '9-spots.csv',
 '13-spots.csv',
 '6-spots.csv',
 '14-spots.csv']

### For dVGF/dF11

In [35]:
import os
import pandas as pd
import numpy as np

dataset_name = 'M061_dVGFdF11_handpicked'
CSV_ROOT = "../dataset/plaques-ashkan/trackmate_output/dVGF_dF11_viruses/M061"
# The dVGF/dF11 files are numbered 46 to 60. As a quick fix, keep only the
# single-plaque experiments among them: 54, 56 and 59 are excluded, and so is
# 47 because its first few frames contain fewer than 3 spots.
single_plaque_files = [46, 48, 49, 50, 51, 52, 53, 55, 57, 58, 60]
csv_files = [f for f in os.listdir(CSV_ROOT) if f.endswith(".csv") and int(f.split("-")[0]) in single_plaque_files]
csv_files

['55-spots.csv',
 '60-spots.csv',
 '58-spots.csv',
 '53-spots.csv',
 '46-spots.csv',
 '52-spots.csv',
 '57-spots.csv',
 '51-spots.csv',
 '48-spots.csv',
 '50-spots.csv',
 '49-spots.csv']
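
The listings above come back in arbitrary order because `os.listdir` does not sort its results. The order does not affect the means and standard deviations computed later, but a numeric sort makes the selection easier to check by eye. A minimal sketch, assuming the `<number>-spots.csv` naming seen above; `experiment_number`, `listing` and `wanted` are hypothetical names used only for illustration:

```python
def experiment_number(filename):
    """Return the leading integer of names such as '11-spots.csv'."""
    return int(filename.split("-")[0])

# Hypothetical directory listing in arbitrary order, as os.listdir may return it.
listing = ["8-spots.csv", "11-spots.csv", "9-spots.csv", "notes.txt", "6-spots.csv"]
wanted = {6, 8, 9, 11}

# Keep only the wanted CSV files and sort them by experiment number,
# not alphabetically (which would put '11' before '6').
selected = sorted(
    (f for f in listing if f.endswith(".csv") and experiment_number(f) in wanted),
    key=experiment_number,
)
print(selected)
```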

## Part 2: add the time stamps of the time series data

In [47]:
# Imaging starts at 20 hpi and ends at 48 hpi with 10-minute intervals,
# i.e. 28 h * 6 frames/h + 1 = 169 frames
time_stamps = [round(x, 2) for x in np.linspace(20.0, 48.0, 169).tolist()]
refdf = pd.DataFrame({'t': time_stamps})

print(refdf)

         t
0    20.00
1    20.17
2    20.33
3    20.50
4    20.67
..     ...
164  47.33
165  47.50
166  47.67
167  47.83
168  48.00

[169 rows x 1 columns]
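
The frame count of 169 follows from the imaging window and interval: 28 hours at 6 frames per hour gives 168 intervals, hence 169 time points. The sketch below derives that number instead of hard-coding it. `make_time_stamps` is a hypothetical helper, not part of the notebook, and it assumes the window is an exact multiple of the interval:

```python
import numpy as np

def make_time_stamps(start_hpi, end_hpi, interval_min):
    """Evenly spaced time stamps in hpi, rounded to 2 decimals."""
    # Number of intervals between the first and the last frame.
    n_intervals = round((end_hpi - start_hpi) * 60 / interval_min)
    return np.round(np.linspace(start_hpi, end_hpi, n_intervals + 1), 2)

ts = make_time_stamps(20.0, 48.0, 10)
print(len(ts), ts[:3], ts[-1])
```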


## Part 3: infected count metrics

In [48]:
unique_track_id_counts = []

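for file in csv_files:
    # skiprows drops the three extra header rows that follow the column names
    df = pd.read_csv(os.path.join(CSV_ROOT, file), skiprows=[1, 2, 3], low_memory=False)
    # Number of distinct tracks (infected cells) present in each frame
    unique_counts = df.groupby('FRAME')['TRACK_ID'].nunique()
    unique_track_id_counts.append(unique_counts)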

all_counts_df = pd.concat(unique_track_id_counts, axis=1)

# Calculate the mean and standard deviation across experiments for each frame
# (pandas .std() is the sample standard deviation, ddof=1)
average_counts = all_counts_df.mean(axis=1)
std_dev_counts = all_counts_df.std(axis=1)

print(average_counts, std_dev_counts)

FRAME
0         7.833333
1         8.500000
2         9.166667
3         9.500000
4         9.666667
          ...     
164    1416.000000
165    1432.666667
166    1450.500000
167    1472.666667
168    1450.333333
Length: 169, dtype: float64 FRAME
0        3.920034
1        4.636809
2        4.708149
3        5.009990
4        5.202563
          ...    
164    428.224707
165    430.744781
166    435.876473
167    445.093099
168    435.813110
Length: 169, dtype: float64
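
To make the counting step concrete, here is a small sketch on synthetic data. The two tiny data frames stand in for two experiments and are invented for illustration; only the column names `FRAME` and `TRACK_ID` come from the TrackMate files used above.

```python
import pandas as pd

# Two hypothetical experiments; a track may contribute several rows per frame.
exp_a = pd.DataFrame({"FRAME": [0, 0, 0, 1, 1], "TRACK_ID": [1, 2, 2, 1, 3]})
exp_b = pd.DataFrame({"FRAME": [0, 1, 1, 1], "TRACK_ID": [7, 7, 8, 9]})

# Distinct tracks per frame, one Series per experiment.
counts = [df.groupby("FRAME")["TRACK_ID"].nunique() for df in (exp_a, exp_b)]

# Columns are experiments, rows are frames (aligned on the FRAME index).
all_counts = pd.concat(counts, axis=1)
mean_counts = all_counts.mean(axis=1)
std_counts = all_counts.std(axis=1)  # sample standard deviation (ddof=1)
print(mean_counts.tolist(), std_counts.tolist())
```

Because `pd.concat` aligns on the `FRAME` index, a frame missing from one experiment would show up as `NaN` in that column and be skipped by `mean` and `std`.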


In [49]:
# Adding count values to refdf
refdf['inf-count-mean'] = average_counts
refdf['inf-count-std'] = std_dev_counts
refdf

         t  inf-count-mean  inf-count-std
0    20.00        7.833333       3.920034
1    20.17        8.500000       4.636809
2    20.33        9.166667       4.708149
3    20.50        9.500000       5.009990
4    20.67        9.666667       5.202563
..     ...             ...            ...
164  47.33     1416.000000     428.224707
165  47.50     1432.666667     430.744781
166  47.67     1450.500000     435.876473
167  47.83     1472.666667     445.093099


## Part 4: radius reference metrics

In [50]:
from scipy.spatial import ConvexHull

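def get_convex_radius(points):
    """Mean distance from the convex-hull vertices to their centroid.

    Returns 0 when fewer than 3 points are given, since no hull exists.
    """
    if len(points) < 3:
        return 0
    hull = ConvexHull(points)
    boundary_points = points[hull.vertices]
    # The center is the mean of the hull vertices, not of all spots
    center = np.mean(boundary_points, axis=0)
    radii = np.linalg.norm(boundary_points - center, axis=1)
    return radii.mean()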
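
As a quick sanity check of the radius definition, the sketch below applies the same function to a square with one interior point. The interior point is not a hull vertex, so it affects neither the center nor the radius, and the result should be the half-diagonal of the square. The test points are made up for illustration. Note that `ConvexHull` raises an error for degenerate input such as three or more collinear points, which this function does not guard against.

```python
import numpy as np
from scipy.spatial import ConvexHull

def get_convex_radius(points):
    """Mean distance from the convex-hull vertices to their centroid."""
    if len(points) < 3:
        return 0
    hull = ConvexHull(points)
    boundary_points = points[hull.vertices]
    center = np.mean(boundary_points, axis=0)
    radii = np.linalg.norm(boundary_points - center, axis=1)
    return radii.mean()

# Corners of a 2 x 2 square plus one interior point.
square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [1.0, 1.0]])
r = get_convex_radius(square)
r_too_few = get_convex_radius(square[:2])
print(r, r_too_few)
```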

In [51]:
all_radii_stats = []

for file in csv_files:
    df = pd.read_csv(os.path.join(CSV_ROOT, file), skiprows=[1, 2, 3], low_memory=False)
    points_vs_frame = df.groupby('FRAME').apply(lambda x: x[['POSITION_X', 'POSITION_Y']].values)
    radii_vs_frame = [get_convex_radius(points) for points in points_vs_frame]
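    # Drop frames with fewer than 3 spots (radius 0). This shortens the list, so
    # every file must lose the same frames for the arrays below to stay aligned.
    radii_vs_frame = [r for r in radii_vs_frame if r != 0]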
    all_radii_stats.append(radii_vs_frame)
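
Removing zeros shifts the remaining radii to earlier positions in the list, which only works if every file has a valid radius for every frame. One possible alternative, sketched here and not what the notebook does, is to keep the frame number as an index and let pandas fill missing frames with `NaN`. The radii values and the names `radii_a`, `radii_b` are hypothetical:

```python
import pandas as pd

# Hypothetical per-file radii indexed by frame; frame 0 is missing in file B.
radii_a = pd.Series({0: 10.0, 1: 12.0, 2: 14.0})
radii_b = pd.Series({1: 8.0, 2: 10.0})

# Align on the frame index; frames absent from a file become NaN.
aligned = pd.concat([radii_a, radii_b], axis=1)

# pandas skips NaN by default, so each frame uses only the files that have it.
mean_radius = aligned.mean(axis=1)
std_radius = aligned.std(axis=1, ddof=0)
print(mean_radius.tolist(), std_radius.tolist())
```

With this layout a file such as experiment 47, whose first frames have too few spots, would contribute `NaN` for those frames instead of misaligning the rest.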

In [52]:
# Convert pixels to microns: in this dataset the pixel width and height are both 3.1746 um
array_of_lists = np.array(all_radii_stats) * 3.1746
mean_radii = array_of_lists.mean(axis=0)
std_radii = array_of_lists.std(axis=0)

refdf['radius-mean(um)'] = mean_radii
refdf['radius-std(um)'] = std_radii

refdf

         t  inf-count-mean  inf-count-std  radius-mean(um)  radius-std(um)
0    20.00        7.833333       3.920034       174.592976       56.883914
1    20.17        8.500000       4.636809       180.940681       62.234864
2    20.33        9.166667       4.708149       192.655702       75.722809
3    20.50        9.500000       5.009990       201.052841       75.361281
4    20.67        9.666667       5.202563       203.102415       77.412856
..     ...             ...            ...              ...             ...
164  47.33     1416.000000     428.224707      2668.476926      309.684605
165  47.50     1432.666667     430.744781      2782.897672      233.847700
166  47.67     1450.500000     435.876473      2841.379969      215.759468
167  47.83     1472.666667     445.093099      2841.155109      207.897807
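
One detail worth keeping in mind when comparing the two metrics: the count columns use pandas `.std()`, which defaults to the sample standard deviation (`ddof=1`), while the radius columns use NumPy `.std()`, which defaults to the population standard deviation (`ddof=0`). The sketch below shows the difference on made-up numbers; passing `ddof` explicitly would make the two consistent.

```python
import numpy as np
import pandas as pd

# Rows are experiments, columns are frames (hypothetical values).
values = np.array([[2.0, 4.0], [1.0, 5.0]])

numpy_std = values.std(axis=0)                            # ddof=0 by default
pandas_std = pd.DataFrame(values).std(axis=0).to_numpy()  # ddof=1 by default
numpy_sample_std = values.std(axis=0, ddof=1)             # matches pandas
print(numpy_std, pandas_std, numpy_sample_std)
```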


## Part 5: Radial velocity reference metrics

**TODO:**

The definition is not yet clear: what is meant by the maximum radial component
over the whole track of a cell? This metric is left out for now.
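
For discussion only, here is one possible reading of that definition, offered as an assumption and not as the intended metric: for each track, compute the distance of the cell from the plaque center at every frame, take the frame-to-frame change divided by the elapsed time, and keep the largest value. `max_radial_velocity`, the center, and the sample track are all hypothetical; the result is in pixels per hour.

```python
import numpy as np
import pandas as pd

def max_radial_velocity(track_df, center, dt_hours):
    """Largest outward speed of one track, under the reading described above."""
    track_df = track_df.sort_values("FRAME")
    xy = track_df[["POSITION_X", "POSITION_Y"]].to_numpy(dtype=float)
    r = np.linalg.norm(xy - center, axis=1)  # distance from the plaque center
    if len(r) < 2:
        return np.nan                        # a velocity needs two positions
    frames = track_df["FRAME"].to_numpy()
    # Divide by the elapsed time so gaps in the track are handled.
    velocities = np.diff(r) / (np.diff(frames) * dt_hours)
    return velocities.max()

# Hypothetical track: moves from r=5 to r=10 in one frame, then stays put.
track = pd.DataFrame({
    "FRAME": [0, 1, 2],
    "POSITION_X": [3.0, 6.0, 6.0],
    "POSITION_Y": [4.0, 8.0, 8.0],
})
v = max_radial_velocity(track, center=np.array([0.0, 0.0]), dt_hours=1 / 6)
print(v)
```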

## Part 6: Saving the reference metrics

In [53]:
save_path = os.path.join('..', 'output', f'reference_metrics_for_{dataset_name}.csv')
refdf.to_csv(save_path, index=False)
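
`to_csv` fails if the target directory does not exist, so it can help to create it first. The sketch below shows that step plus a round-trip check, using a temporary directory and a tiny stand-in data frame; `refdf_demo` and `save_path_demo` are hypothetical names chosen to avoid clashing with the notebook's variables.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for refdf with two of its columns.
refdf_demo = pd.DataFrame({"t": [20.0, 20.17], "inf-count-mean": [7.8, 8.5]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "output")
    os.makedirs(out_dir, exist_ok=True)  # no error if it already exists
    save_path_demo = os.path.join(out_dir, "reference_metrics_for_demo.csv")
    refdf_demo.to_csv(save_path_demo, index=False)

    # Read the file back to confirm the columns and row count survived.
    reloaded = pd.read_csv(save_path_demo)

print(reloaded.columns.tolist(), len(reloaded))
```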