**Purpose:** Investigate synchrony between the cell division and metabolic cycles.

This is inspired by Fig. 1 of Bieler et al. (2014), shown below.  It shows the relationship between the cell division and circadian cycles.

![title](msb0010-0739-f1.jpg)

Figure 1. Circadian and cell cycle oscillators are tightly synchronized in NIH3T3 cells
- A. Single-cell time traces showing the circadian YFP signal (black, identified maxima in blue denoted as p), together with cell division events (nuclear envelope breakdown, red, denoted as d). The top trace is typical and shows three divisions before the circadian peaks, the second trace shows an early first division.
- B. Raster plot showing 3,160 traces (with at least two circadian peaks) aligned on the second circadian peak (blue), and sorted according to the interval between the first and second circadian peaks. Divisions (red) show a clear tendency to occur, on average, 5 h before the circadian peaks. A sparse group of early division events associated usually with longer circadian intervals is also visible.
- C. Division times measured with respect to the subsequent circadian peak show a unimodal distribution centered at −5 h. Inset: longer circadian intervals correlate with mitosis occurring, on average, closer to the next peak (also visible in B).
- D. Circadian phases at division (normalized division times) show a unimodal distribution. Inset: longer circadian intervals correlate with mitosis occurring at later circadian phases.

**Aims:**
- Import flavin signals and birth events from a population of same-strain cells subject to the same nutrient conditions.
    - Start from the more reliable BY4741 or FY4 time series first, to iron out any issues.
    - Real purpose is to do this with the mutants, especially with the `swe1_Del` from the Causton lab.  It's only going to be a matter of switching the dataset file.
- Process data: cut time series to duration of interest (births matrix should be cut too), detrend flavin signals.
- Align time series to the first birth event.
- Locate the subsequent metabolic cycle.
    - This may be through finding a peak, using the appropriate tolerance.
    - Fitting an autoregressive model or Gaussian process may be helpful.
    - I will also need to identify the troughs so that I only get one metabolic cycle (see heatmap in next main item).
    - I expect this to be the most technically difficult.  If all else fails, plotting a heatmap with birth times overlaid 'for now' may still help produce insight.
- Plot a heatmap that shows:
    - Where the second birth event is in relation to the first birth event.
    - Where the subsequent metabolic cycle is in relation to the first metabolic cycle.
- Plot histograms that show the distributions of:
    - Time differences between first and second birth events.
    - Time differences between first birth event and peak of metabolic cycle.

**Paradigms:**
- Use `aliby` data structures, i.e. `pandas` `DataFrames` with multi-indexing.
    - Births were only recently (2022-01-18) added to the `DataFrame` output.  Need to look at how it is done there first, and try to replicate.
- Use `postprocessor` processes e.g. `fft` (Fourier transform), `autoreg` (autoregressive model -- add methods as appropriate).
- Ultimate goal to put all the cells together in a script to put in `skeletons`, but some people prioritise having plots fast over having clean code.
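
As a rough illustration of the multi-indexed layout meant here (a sketch with made-up strain names and values, not actual `aliby` output), rows are indexed by `(strain, cellID)` and columns are time points:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 cells, 4 time points
index = pd.MultiIndex.from_tuples(
    [("strainA", 1), ("strainA", 2), ("strainB", 1)],
    names=["strain", "cellID"],
)
toy_signal = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=index)

# Selecting one strain drops that level, leaving cellID as the index
toy_strainA = toy_signal.loc["strainA"]
```

This is the same selection pattern used further down when a strain is picked as working data.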

In [None]:
import PyQt5
%matplotlib qt

**When going through this again: if I repeat lines of code, put them in a function to make life easier.**

# Import data

**Important note: must be mindful of how I'm dealing with TWO sets of signals -- flavin and births.  Potentially THREE if I include mCherry.**

**Current code, copied from `svm_sandbox.ipynb`, deals with ONE set of signals; having two sets is another can of worms.  Name the data variables and manage them wisely.**

Note: lots of redundancy & spaghettification introduced with mCherry signals.  In particular, a _lot_ of try-except structures that wouldn't be there if this is all in one pipeline with a good logical flow.  Organise this later.

In [None]:
import numpy as np
import pandas as pd
import csv

# PARAMETERS
filename_prefix = './data/arin/Omero19979_'
#filename_prefix = './data/arin/Omero20016_'
#

# Import flavin signals
signal_flavin = pd.read_csv(filename_prefix+'flavin.csv')
signal_flavin.replace(0, np.nan, inplace=True) # because the CSV is constructed like that :/

# Import birth signals
signal_births = pd.read_csv(filename_prefix+'births.csv')
#signal_births.replace(0, np.nan, inplace=True)

def convert_df_to_aliby(
    signal,
    strainlookup_df,
):
    # Import look-up table for strains (would prefer to directly CSV -> dict)
    strainlookup_dict = dict(zip(strainlookup_df.position, strainlookup_df.strain))
    
    # Positions -> Strain (more informative)
    signal = signal.replace({'position': strainlookup_dict})
    signal.rename(columns = {"position": "strain"}, inplace = True)
    signal = signal.drop(['distfromcentre'], axis = 1)

    # Convert to multi-index dataframe
    signal_temp = signal.iloc[:,2:]
    multiindex = pd.MultiIndex.from_frame(signal[['strain', 'cellID']])
    signal = pd.DataFrame(signal_temp.to_numpy(),
                          index = multiindex)
    
    return signal

strainlookup_df = pd.read_csv(filename_prefix+'strains.csv')
signal_flavin = convert_df_to_aliby(signal_flavin, strainlookup_df)
signal_births = convert_df_to_aliby(signal_births, strainlookup_df)

# If there is mCherry, import that too
try:
    signal_mCherry = pd.read_csv(filename_prefix+'mCherry.csv')
    signal_mCherry.replace(0, np.nan, inplace=True)
    signal_mCherry = convert_df_to_aliby(signal_mCherry, strainlookup_df)
except FileNotFoundError:
    # Only a missing file means 'no mCherry'; other errors should still surface
    print('No mCherry signal')
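
One possible way to cut down on the try-except structures noted above is to load optional signals through a helper that returns `None` when the file is absent. This is only a sketch; `read_optional_signal` and the path used below are hypothetical names, not part of the existing pipeline.

```python
from pathlib import Path

import numpy as np
import pandas as pd

def read_optional_signal(csv_path):
    """Return the CSV as a DataFrame, or None if the file does not exist."""
    csv_path = Path(csv_path)
    if not csv_path.exists():
        return None
    signal = pd.read_csv(csv_path)
    # Zeros stand in for missing values in these CSVs
    return signal.replace(0, np.nan)

# Hypothetical usage: downstream code can check `is not None` instead of try-except
missing = read_optional_signal("./does_not_exist_mCherry.csv")
```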

In [None]:
signal_flavin

In [None]:
signal_mCherry

In [None]:
signal_births

# Choose a list of cells as working data

List strains

In [None]:
signal_flavin.index.get_level_values(0).unique().to_list()

Define `signal_flavin_wd` as working data

In [None]:
strain = 'tsa1_Del_tsa2_Del'

signal_flavin_wd = signal_flavin.loc[strain]
signal_births_wd = signal_births.loc[strain]

try:
    signal_mCherry_wd = signal_mCherry.loc[strain]
    # REPLACE this with proper handling later!
    # If df is full of NaNs, re-define as empty so that subsequent try-except routines
    # raise errors.
    if np.sum(~signal_mCherry_wd.isnull().values) == 0:
        signal_mCherry_wd = []
except (NameError, KeyError):
    # NameError: no mCherry file was imported; KeyError: strain not in mCherry data
    print('No mCherry signal')

In [None]:
signal_mCherry_wd

# Processing time series

## Range

Chop up time series according to `interval_start` and `interval_end`, then remove cells that have NaNs.  Print number of cells.

In [None]:
# PARAMETERS
interval_start = 25
interval_end = 168
#

signal_flavin_processed = signal_flavin_wd.iloc[:, interval_start:interval_end].dropna()
signal_births_processed = signal_births_wd.iloc[:, interval_start:interval_end].dropna() # don't expect NaN here, but just for consistency

try:
    signal_mCherry_processed = signal_mCherry_wd.iloc[:, interval_start:interval_end].dropna()
except:
    print('No mCherry signal')

# Note: number of rows from the two DataFrames may be different from now on.
# This is expected because the flavin DataFrame has NaNs but the births doesn't.
# This is okay because matching the two will be based on the 'cellID' index, which never changes.
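
A small sketch of the chop-then-drop step and of matching by `cellID` afterwards, using made-up toy values (the interval and cell IDs are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are cellIDs, columns are time points
toy_flavin = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [np.nan, np.nan, 1.0, np.nan], [5.0, 6.0, 7.0, 8.0]],
    index=pd.Index([10, 11, 12], name="cellID"),
)
toy_births = pd.DataFrame(
    [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]],
    index=toy_flavin.index,
)

# Keep time points 1 and 2, then drop cells with any NaN in that range
flavin_cut = toy_flavin.iloc[:, 1:3].dropna()
births_cut = toy_births.iloc[:, 1:3].dropna()

# Match births to the surviving flavin rows by cellID
births_matched = births_cut.loc[flavin_cut.index]
```

Here cell 11 is dropped from the flavin data but not from the births data, and indexing by the flavin `cellID`s brings the two back in line.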

## Detrend

Using sliding window

In [None]:
# POTENTIAL ISSUE: This removes some time points.  Need to find a way to re-align births so that it makes sense.
# I expect that there is a constant shift that is a function of the size of the sliding window.
# ----> seems like no re-aligning needed???  After a cursory look at some time series & where their peaks should be.

import matplotlib.pyplot as plt
import seaborn as sns

# PARAMETERS
window = 45
#

fig, ax = plt.subplots()
sns.heatmap(signal_flavin_processed)
plt.title('Before detrending')
plt.show()

def moving_average(input_timeseries,
                  window = 3):
    processed_timeseries = np.cumsum(input_timeseries, dtype=float)
    processed_timeseries[window:] = processed_timeseries[window:] - processed_timeseries[:-window]
    return processed_timeseries[window - 1 :] /  window

def detrend(signal, window):
    signal = signal.div(signal.mean(axis = 1), axis = 0)
    signal_movavg = signal.apply(lambda x: pd.Series(moving_average(x.values, window)), axis = 1)
    signal_norm = signal.iloc(axis = 1)[window//2: -window//2] / signal_movavg.iloc[:,0:signal_movavg.shape[1]-1].values
    return signal_norm

signal_flavin_processed = detrend(signal_flavin_processed, window)

fig, ax = plt.subplots()
sns.heatmap(signal_flavin_processed)
plt.title('After detrending')
plt.show()

# Re-align births because some time points were removed/shifted
# Doing it this way so that column headers are consistent with the (modified) flavin DataFrame
signal_births_processed = signal_births_processed.iloc(axis=1)[window//2: -window//2]

try:
    signal_mCherry_processed = detrend(signal_mCherry_processed, window)
except:
    print('No mCherry signal')
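
On the re-alignment question in the comments above: a quick check, assuming an odd window, is to detrend a linear ramp. The moving average of a ramp equals the value at the centre of each window, so if the trimmed signal and the moving average are aligned, their ratio should be exactly 1 everywhere. The toy ramp and `window = 5` below are illustrative choices.

```python
import numpy as np

def moving_average(input_timeseries, window=3):
    processed = np.cumsum(input_timeseries, dtype=float)
    processed[window:] = processed[window:] - processed[:-window]
    return processed[window - 1:] / window

window = 5
ramp = np.arange(1.0, 21.0)           # 20 time points
movavg = moving_average(ramp, window)  # 20 - 5 + 1 = 16 values

# Same trimming as in detrend(): -window//2 is (-5)//2 = -3, so this is ramp[2:-3]
trimmed = ramp[window // 2: -window // 2]
ratio = trimmed / movavg[:-1]
```

Applying the same `[window//2: -window//2]` slice to the births matrix therefore keeps births and detrended flavin on the same time points, which is consistent with the cursory observation that no further shift is needed.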

Plot heatmap before any alignment

**(I really should have heatmap as a function...)**

In [None]:
from sklearn.preprocessing import StandardScaler
import matplotlib.cm as cm

# Produce masked array for births
births_array = signal_births_processed.loc[signal_flavin_processed.index].to_numpy()
births_heatmap_mask = np.ma.masked_where(births_array == 0, births_array)

# Scale flavin signals
flavin_array = signal_flavin_processed.to_numpy()
scaler = StandardScaler().fit(flavin_array.transpose())
signal_flavin_scaled = scaler.transform(flavin_array.transpose())
signal_flavin_scaled = signal_flavin_scaled.transpose()

# Define horizontal axis ticks and labels
sampling_period = 5
xtick_step = 60

xtick_min = 0
xtick_max = sampling_period * signal_flavin_scaled.shape[1]
xticklabels = np.arange(xtick_min, xtick_max, xtick_step)
xticks = [
    int(
        np.where(
            (sampling_period * np.arange(signal_flavin_scaled.shape[1])) == label
        )[0].item()
    )
    for label in xticklabels
]

flavin_heatmap = plt.imshow(
    signal_flavin_scaled,
    cmap = cm.RdBu_r,
    interpolation='none',
)
births_heatmap = plt.imshow(
    births_heatmap_mask,
    interpolation='none',
)
plt.colorbar(flavin_heatmap, label='normalised flavin fluorescence (AU)')

# Labelling
plt.xticks(xticks, xticklabels)
plt.xlabel('Time (min)')
plt.ylabel('Cell')
plt.title(strain)
plt.show()
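
The tick computation above is repeated in every heatmap cell. A first step towards a heatmap function could be a small helper like the one sketched here; `time_ticks` is a hypothetical name, and it assumes `xtick_step` is a multiple of `sampling_period`.

```python
import numpy as np

def time_ticks(n_timepoints, sampling_period=5, xtick_step=60):
    """Return (tick positions in time points, tick labels in minutes)."""
    xticklabels = np.arange(0, sampling_period * n_timepoints, xtick_step)
    # Assumes xtick_step is a multiple of sampling_period
    xticks = xticklabels // sampling_period
    return xticks, xticklabels

xticks, xticklabels = time_ticks(n_timepoints=30)
```

The outputs can be passed to `plt.xticks(xticks, xticklabels)` as in the cells here.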

In [None]:
plt.plot(signal_flavin_scaled[6])

In [None]:
# Plot one flavin trace with births
cell_index = 1

time_axis = np.array(signal_births_processed.columns)
birth_mask = signal_births_processed.loc[cell_index].to_numpy().astype(bool)

plt.plot(signal_flavin_processed.columns,
    signal_flavin_processed.loc[cell_index])
for birth_time in time_axis[birth_mask]:
    plt.axvline(birth_time, color='k', linestyle='--')
plt.xlabel('Time point')
plt.ylabel('Normalised flavin fluorescence (AU)')
plt.title('Representative flavin signal with birth times indicated')

# Alignment

## To first birth

Save dataframes (I plan to remove this step later...)

In [None]:
# .copy() so that the saved versions are independent of later in-place changes
signal_flavin_save = signal_flavin_processed.copy()
signal_births_save = signal_births_processed.copy()

try:
    signal_mCherry_save = signal_mCherry_processed
except:
    print('No mCherry signal')

### Align

In [None]:
signal_births_aligned = signal_births_save
signal_flavin_aligned = signal_flavin_save

common_index = signal_flavin_aligned.index

try:
    signal_mCherry_aligned = signal_mCherry_save
    common_index = common_index.intersection(signal_mCherry_aligned.index)
except:
    print('No mCherry signal 1')

births_shifted_rows = []
flavin_shifted_rows = []
mCherry_shifted_rows = []
# Match flavin and birth signals by cellID (shared and not changed)
for cellID in common_index:
    # Identify first birth and define shift
    birth_locs = np.where(signal_births_processed.loc[cellID].to_numpy() == 1)[0]
    if birth_locs.size > 0:
        shift = birth_locs[0]
    else:
        shift = 0
    # Make first birth the first time point
    # When shifted, the rest of the df is NaNs
    births_shifted_rows.append(
        signal_births_aligned.loc[[cellID]].shift(periods = -shift, axis = 'columns')
    )
    # Shift flavin signals accordingly
    flavin_shifted_rows.append(
        signal_flavin_aligned.loc[[cellID]].shift(periods = -shift, axis = 'columns')
    )
    # If exists, shift mCherry signal accordingly
    try:
        mCherry_shifted_rows.append(
            signal_mCherry_aligned.loc[[cellID]].shift(periods = -shift, axis = 'columns')
        )
    except:
        print('No mCherry signal 2')
# Re-construct dataframes
signal_births_aligned = pd.concat(births_shifted_rows, ignore_index = True)
signal_births_aligned.set_index(common_index, inplace = True)

signal_flavin_aligned = pd.concat(flavin_shifted_rows, ignore_index = True)
signal_flavin_aligned.set_index(common_index, inplace = True)

try:
    signal_mCherry_aligned = pd.concat(mCherry_shifted_rows, ignore_index = True)
    signal_mCherry_aligned.set_index(common_index, inplace = True)
except:
    print('No mCherry signal 3')
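
The alignment loop above can also be written on plain arrays, which may be easier to reuse for births, peaks and mCherry alike. This is a sketch under the same convention (rows without events are left unshifted, vacated entries become NaN); `align_to_first_event` is a hypothetical helper name.

```python
import numpy as np

def align_to_first_event(events, signal):
    """Shift each row left so that its first event sits at column 0.

    events: 2D array of 0/1; signal: 2D float array of the same shape.
    Rows without events are left unshifted; vacated entries become NaN.
    """
    events = np.asarray(events, dtype=float)
    signal = np.asarray(signal, dtype=float)
    events_aligned = np.full(events.shape, np.nan)
    signal_aligned = np.full(signal.shape, np.nan)
    for row in range(events.shape[0]):
        event_locs = np.flatnonzero(events[row] == 1)
        shift = event_locs[0] if event_locs.size > 0 else 0
        n_kept = events.shape[1] - shift
        events_aligned[row, :n_kept] = events[row, shift:]
        signal_aligned[row, :n_kept] = signal[row, shift:]
    return events_aligned, signal_aligned

toy_events = np.array([[0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 0, 0, 0, 0]])
toy_signal = np.arange(15.0).reshape(3, 5)
events_aligned, signal_aligned = align_to_first_event(toy_events, toy_signal)
```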

In [None]:
signal_births_aligned

Remove cells without births

In [None]:
signal_births_drop = signal_births_processed.loc[signal_births_processed.sum(axis = 1) != 0, :]

signal_births_aligned = signal_births_aligned.loc[
    signal_births_drop.index.intersection(signal_births_aligned.index)
]
signal_flavin_aligned = signal_flavin_aligned.loc[
    signal_births_drop.index.intersection(signal_flavin_aligned.index)
]

try:
    signal_mCherry_aligned = signal_mCherry_aligned.loc[
        signal_births_drop.index.intersection(signal_mCherry_aligned.index)
    ]
except:
    print('No mCherry signal')

Arrange cells by interval between first and second birth

In [None]:
# Find intervals between first and second births
second_births_intervals = []
for cellID in signal_births_aligned.index:
    birth_locs = np.where(signal_births_aligned.loc[cellID].to_numpy() == 1)[0]
    birth_locs = birth_locs[1:] # drop the first birth, which is at time point zero after alignment
    if birth_locs.size > 0:
        second_births_intervals.append(birth_locs[0]) # this is when the 2nd birth is in relation to the 1st
    else:
        second_births_intervals.append(None)
# Absence of second birth represented by nan
second_births_intervals = np.array(second_births_intervals, dtype=float)

# Rearrange order of rows in dataframes according to time of second birth
def rearrange_by_sorted_list(
    df,
    my_list,
):
    return df.reindex(df.index[np.argsort(my_list)].to_list())

signal_births_aligned = rearrange_by_sorted_list(signal_births_aligned, second_births_intervals)
signal_flavin_aligned = rearrange_by_sorted_list(signal_flavin_aligned, second_births_intervals)
try:
    signal_mCherry_aligned = rearrange_by_sorted_list(signal_mCherry_aligned, second_births_intervals)
except:
    print('No mCherry signal')
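
A toy check of how the sorting treats cells without a second birth: `np.argsort` places NaN last, so those cells end up at the bottom of the heatmap. The cell IDs and intervals below are made up.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame(
    {"t0": [1, 1, 1, 1]},
    index=pd.Index([101, 102, 103, 104], name="cellID"),
)
# Hypothetical intervals (in time points); NaN = no second birth
toy_intervals = np.array([12.0, np.nan, 7.0, 20.0])

order = np.argsort(toy_intervals)  # NaN sorts to the end
toy_sorted = toy_df.iloc[order]
```

Positional indexing with `.iloc[order]` is used here as an alternative to `reindex`; it gives the same row order when the index is unique.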

### Visualise in heatmaps

Align flavin time series by births

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.preprocessing import StandardScaler

# Apparently matplotlib.pyplot.imshow() is better?
# https://stackoverflow.com/questions/44012488/overlay-two-heatmaps-in-seaborn-one-being-frames-around-cells-in-the-other-fro

# Produce masked array for births
births_array = signal_births_aligned.to_numpy()
births_heatmap_mask = np.ma.masked_where(births_array == 0, births_array)

# Scale flavin signals
flavin_array = signal_flavin_aligned.to_numpy()
scaler = StandardScaler().fit(flavin_array.transpose())
signal_flavin_aligned_scaled = scaler.transform(flavin_array.transpose())
signal_flavin_aligned_scaled = signal_flavin_aligned_scaled.transpose()

# Define horizontal axis ticks and labels
sampling_period = 5
xtick_step = 60

xtick_min = 0
xtick_max = sampling_period * signal_flavin_aligned_scaled.shape[1]
xticklabels = np.arange(xtick_min, xtick_max, xtick_step)
xticks = [
    int(
        np.where(
            (sampling_period * np.arange(signal_flavin_aligned_scaled.shape[1])) == label
        )[0].item()
    )
    for label in xticklabels
]

# Superimpose flavin and births heatmaps
flavin_heatmap = plt.imshow(
    signal_flavin_aligned_scaled,
    cmap = cm.RdBu_r,
    interpolation='none',
)
births_heatmap = plt.imshow(
    births_heatmap_mask,
    interpolation='none',
)
plt.colorbar(flavin_heatmap, label='normalised flavin fluorescence (AU)')

# Labelling
plt.xticks(xticks, xticklabels)
plt.xlabel('Time (min)')
plt.ylabel('Cell')
plt.title(strain)
plt.show()

mCherry

In [None]:
# Scale mCherry signals
mCherry_array = signal_mCherry_aligned.to_numpy()
scaler = StandardScaler().fit(mCherry_array.transpose())
signal_mCherry_aligned_scaled = scaler.transform(mCherry_array.transpose())
signal_mCherry_aligned_scaled = signal_mCherry_aligned_scaled.transpose()

# Superimpose mCherry and births heatmaps
mCherry_heatmap = plt.imshow(
    signal_mCherry_aligned_scaled,
    cmap = cm.RdBu_r,
    interpolation='none',
)
births_heatmap = plt.imshow(
    births_heatmap_mask,
    interpolation='none',
)

# Labelling
plt.xticks(xticks, xticklabels)
plt.xlabel('Time (min)')
plt.ylabel('Cell')
plt.title(strain)
plt.show()

## To metabolic cycle peaks

### Locate metabolic cycle peaks

Sandbox: `scipy.signal.find_peaks` for one signal

In [None]:
# Isn't this just basically brute-forcing/tuning parameters until it makes sense?
# I'm basically training a neural net (i.e. my brain) on finding parameters
# for these peaks, and then applying these parameters to other time series,
# hoping it sticks.

# Would it make sense to:
# (1) use a classifier to determine whether this is oscillating in the first place, then
# (2a) use a smoothing method (e.g. Savitzky-Golay) then find_peaks, or
# (2b) use a model-fitting method (e.g. AR)?

from scipy.signal import find_peaks

test_time_series = signal_flavin_processed.iloc[4].to_numpy() # is a good one
# Otherwise:
test_time_series = signal_flavin_processed.iloc[1].to_numpy()

# For now, these parameters work well for the obviously oscillating BY4741 time series.
# They don't work as well for the non-oscillating ones, but maybe there's no point,
# especially given that we're good at identifying which ones are oscillating.
peak_indices = find_peaks(
    test_time_series,
    distance = 10,
    prominence = 0.035,
)[0]

plt.plot(test_time_series)
plt.scatter(
    peak_indices,
    np.take(test_time_series, peak_indices),
)
plt.xlabel('Time point')
plt.ylabel('Normalised flavin fluorescence (AU)')
plt.show()
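
A sketch of option (2a) from the comments above, on a synthetic noisy oscillation rather than real flavin data. The `window_length`, `polyorder`, `distance` and `prominence` values are illustrative and untuned; they would need adjusting for real time series.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

rng = np.random.default_rng(0)
timepoints = np.arange(120)
# Synthetic oscillation with a 30-time-point period plus noise
noisy = np.sin(2 * np.pi * timepoints / 30) + 0.05 * rng.standard_normal(len(timepoints))

# Smooth first, then look for peaks in the smoothed trace
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)
peak_indices = find_peaks(smoothed, distance=10, prominence=0.5)[0]
```

Smoothing suppresses small noise-driven local maxima, so `find_peaks` should depend less on hand-tuned `prominence`.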

`scipy.signal.find_peaks` for a whole dataframe, produce a binary matrix stored in a new dataframe

(This can become a post-process later)

In [None]:
from scipy.signal import find_peaks

def find_peaks_mask(
    timeseries,
    distance,
    prominence,
):
    peak_indices = find_peaks(timeseries, distance = distance, prominence = prominence)[0]
    mask = np.zeros(len(timeseries), dtype = int)
    mask[peak_indices] = 1
    
    return mask

signal_flavin_peaks = signal_flavin_processed.apply(
    lambda x: find_peaks_mask(x, distance = 10, prominence = 0.035),
    axis = 1, result_type = 'expand'
)
signal_flavin_peaks.columns = signal_flavin_processed.columns

### Align

In [None]:
signal_peaks_save = signal_flavin_peaks
signal_flavin_save = signal_flavin_processed

In [None]:
signal_peaks_aligned = signal_peaks_save
signal_flavin_aligned = signal_flavin_save

common_index = signal_flavin_aligned.index

peaks_shifted_rows = []
flavin_shifted_rows = []
# Match flavin and peak signals by cellID (shared and not changed)
for cellID in common_index:
    # Identify first peak and define shift
    peak_locs = np.where(signal_peaks_aligned.loc[cellID].to_numpy() == 1)[0]
    if peak_locs.size > 0:
        shift = peak_locs[0]
    else:
        shift = 0
    # Make first peak the first time point
    # When shifted, the rest of the df is NaNs
    peaks_shifted_rows.append(
        signal_peaks_aligned.loc[[cellID]].shift(periods = -shift, axis = 'columns')
    )
    # Shift flavin signals accordingly
    flavin_shifted_rows.append(
        signal_flavin_aligned.loc[[cellID]].shift(periods = -shift, axis = 'columns')
    )

# Re-construct dataframes
signal_peaks_aligned = pd.concat(peaks_shifted_rows, ignore_index = True)
signal_peaks_aligned.set_index(common_index, inplace = True)

signal_flavin_aligned = pd.concat(flavin_shifted_rows, ignore_index = True)
signal_flavin_aligned.set_index(common_index, inplace = True)

Arrange cells by interval between first and second flavin cycle peak

In [None]:
# Find intervals between first and second peaks
second_peaks_intervals = []
for cellID in signal_peaks_aligned.index:
    peak_locs = np.where(signal_peaks_aligned.loc[cellID].to_numpy() == 1)[0]
    peak_locs = peak_locs[1:] # drop the first peak (at time point zero after alignment); safe if there are no peaks
    if peak_locs.size > 0:
        second_peaks_intervals.append(peak_locs[0]) # this is when the 2nd peak is in relation to the 1st
    else:
        second_peaks_intervals.append(None)
# Absence of second peak represented by nan
second_peaks_intervals = np.array(second_peaks_intervals, dtype=float)

# Rearrange order of rows in dataframes according to time of second peak
def rearrange_by_sorted_list(
    df,
    my_list,
):
    return df.reindex(df.index[np.argsort(my_list)].to_list())

signal_peaks_aligned = rearrange_by_sorted_list(signal_peaks_aligned, second_peaks_intervals)
signal_flavin_aligned = rearrange_by_sorted_list(signal_flavin_aligned, second_peaks_intervals)

**Visualise:** average flavin time series

NB: the time series here are *not* scaled; they will be scaled in the heatmap.

In [None]:
plt.plot(signal_flavin_aligned.mean(axis = 0))

Scaled, and with error bars

In [None]:
# Scale
flavin_array = signal_flavin_aligned.to_numpy()
scaler = StandardScaler().fit(flavin_array.transpose())
signal_flavin_aligned_scaled = scaler.transform(flavin_array.transpose())
signal_flavin_aligned_scaled = signal_flavin_aligned_scaled.transpose()

# Axes
sampling_period = 5
time_axis = np.array(signal_flavin_aligned.columns) * sampling_period

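# Aligned rows are padded with NaNs, so use NaN-aware statistics
mean_ts = np.nanmean(signal_flavin_aligned_scaled, axis = 0)
n_cells = np.sum(~np.isnan(signal_flavin_aligned_scaled), axis = 0)
stderr = np.nanstd(signal_flavin_aligned_scaled, axis = 0) / np.sqrt(n_cells)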
plt.plot(time_axis, mean_ts, label = 'Mean')
plt.fill_between(time_axis, mean_ts - stderr, mean_ts + stderr, color='lightblue', label = 'Standard error')
plt.xlabel('Time (min)')
plt.ylabel('Normalised flavin fluorescence (AU)')
plt.title('Mean flavin signal, after alignment; ' + str(strain))
plt.legend(loc = 'upper right')

### Visualise in heatmaps

Flavin time series and peaks, **not aligned**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.preprocessing import StandardScaler

# Produce masked array for peaks
peaks_array = signal_flavin_peaks.to_numpy()
peaks_heatmap_mask = np.ma.masked_where(peaks_array == 0, peaks_array)

# Scale flavin signals
flavin_array = signal_flavin_processed.to_numpy()
scaler = StandardScaler().fit(flavin_array.transpose())
signal_flavin_processed_scaled = scaler.transform(flavin_array.transpose())
signal_flavin_processed_scaled = signal_flavin_processed_scaled.transpose()

# Define horizontal axis ticks and labels
sampling_period = 5
xtick_step = 60

xtick_min = 0
xtick_max = sampling_period * signal_flavin_processed_scaled.shape[1]
xticklabels = np.arange(xtick_min, xtick_max, xtick_step)
xticks = [
    int(
        np.where(
            (sampling_period * np.arange(signal_flavin_processed_scaled.shape[1])) == label
        )[0].item()
    )
    for label in xticklabels
]

# Superimpose flavin and peaks heatmaps
flavin_heatmap = plt.imshow(
    signal_flavin_processed_scaled,
    cmap = cm.RdBu_r,
    interpolation='none',
)
peaks_heatmap = plt.imshow(
    peaks_heatmap_mask,
    interpolation='none',
)
plt.colorbar(flavin_heatmap, label='normalised flavin fluorescence (AU)')

# Labelling
plt.xticks(xticks, xticklabels)
plt.xlabel('Time (min)')
plt.ylabel('Cell')
plt.title(strain)
plt.show()

Flavin time series and peaks, **aligned**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.preprocessing import StandardScaler

# Produce masked array for peaks
peaks_array = signal_peaks_aligned.to_numpy()
peaks_heatmap_mask = np.ma.masked_where(peaks_array == 0, peaks_array)

# Scale flavin signals
flavin_array = signal_flavin_aligned.to_numpy()
scaler = StandardScaler().fit(flavin_array.transpose())
signal_flavin_aligned_scaled = scaler.transform(flavin_array.transpose())
signal_flavin_aligned_scaled = signal_flavin_aligned_scaled.transpose()

# Define horizontal axis ticks and labels
sampling_period = 5
xtick_step = 60

xtick_min = 0
xtick_max = sampling_period * signal_flavin_aligned_scaled.shape[1]
xticklabels = np.arange(xtick_min, xtick_max, xtick_step)
xticks = [
    int(
        np.where(
            (sampling_period * np.arange(signal_flavin_aligned_scaled.shape[1])) == label
        )[0].item()
    )
    for label in xticklabels
]

# Superimpose flavin and peaks heatmaps
flavin_heatmap = plt.imshow(
    signal_flavin_aligned_scaled,
    cmap = cm.RdBu_r,
    interpolation='none',
)
peaks_heatmap = plt.imshow(
    peaks_heatmap_mask,
    interpolation='none',
)
plt.colorbar(flavin_heatmap, label='normalised flavin fluorescence (AU)')

# Labelling
plt.xticks(xticks, xticklabels)
plt.xlabel('Time (min)')
plt.ylabel('Cell')
plt.title(strain)
plt.show()

# Plot histograms

## Births

Duration between 1st and 2nd births

In [None]:
sampling_period = 5
binsize = 5

plt.hist(
    second_births_intervals * sampling_period,
    np.arange(0, binsize * (np.nanmax(second_births_intervals * sampling_period)//binsize + 2), binsize)
)
plt.xlabel('Time (min)')
plt.ylabel('Frequency')
plt.title('Distribution of duration between 1st and 2nd births')
plt.show()

All births

In [None]:
df = signal_births_aligned.apply(
    lambda x: np.diff(np.where(np.array(x)>0)),
    axis = 1
)
births_intervals = np.hstack(df.to_list()).ravel()

sampling_period = 5
binsize = 5

plt.hist(
    births_intervals * sampling_period,
    np.arange(0, binsize * (np.nanmax(births_intervals * sampling_period)//binsize + 2), binsize)
)
plt.xlabel('Time (min)')
plt.ylabel('Frequency')
plt.title('Distribution of birth intervals')
plt.show()
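
The interval extraction used above can be checked on a toy binary matrix. This sketch uses `np.flatnonzero` instead of `np.where` so that each row yields a 1D array; `event_intervals` is a hypothetical helper name.

```python
import numpy as np

def event_intervals(events):
    """Pool the gaps (in time points) between consecutive events over all rows."""
    events = np.asarray(events, dtype=float)
    gaps = [np.diff(np.flatnonzero(row > 0)) for row in events]
    return np.concatenate(gaps)

toy_events = np.array([
    [1, 0, 0, 1, 0, 1],       # events at 0, 3, 5 -> gaps 3, 2
    [1, 0, 0, 0, 0, np.nan],  # one event, NaN padding -> no gaps
    [0, 1, 0, 0, 0, 1],       # events at 1, 5 -> gap 4
])
toy_intervals = event_intervals(toy_events)
```

Multiplying the result by `sampling_period` converts the gaps to minutes, as in the histograms here.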

## Flavin signal peaks

Duration between 1st and 2nd peaks

In [None]:
sampling_period = 5
binsize = 5

plt.hist(
    second_peaks_intervals * sampling_period,
    np.arange(0, binsize * (np.nanmax(second_peaks_intervals * sampling_period)//binsize + 2), binsize)
)
plt.xlabel('Time (min)')
plt.ylabel('Frequency')
plt.title('Distribution of first flavin cycle lengths')
plt.show()

All peaks

In [None]:
df = signal_flavin_peaks.apply(
    lambda x: np.diff(np.where(np.array(x)>0)),
    axis = 1
)
peaks_intervals = np.hstack(df.to_list()).ravel()

sampling_period = 5
binsize = 5

plt.hist(
    peaks_intervals * sampling_period,
    np.arange(0, binsize * (np.nanmax(peaks_intervals * sampling_period)//binsize + 2), binsize)
)
plt.xlabel('Time (min)')
plt.ylabel('Frequency')
plt.title('Distribution of flavin cycle lengths')
plt.show()

## Overlay

with median

In [None]:
#births_intervals = np.log(births_intervals)
#peaks_intervals = np.log(peaks_intervals)

common_max_duration = np.nanmax(np.concatenate((births_intervals, peaks_intervals)))
sampling_period = 5
binsize = 5
bins = np.arange(0, binsize * ((common_max_duration * sampling_period)//binsize + 2), binsize)

plt.hist(
    peaks_intervals * sampling_period,
    bins,
    alpha = 0.5,
    label = 'Flavin cycle lengths'
)
plt.hist(
    births_intervals * sampling_period,
    bins,
    alpha = 0.5,
    label = 'Cell division cycle lengths'
)
plt.axvline(
    np.median(peaks_intervals * sampling_period),
    color='b',
    alpha = 0.5,
    label = 'Median flavin cycle length'
)
plt.axvline(
    np.median(births_intervals * sampling_period),
    color='orange',
    alpha = 0.5,
    label = 'Median cell division cycle length'
)
plt.legend(loc = 'upper right')
plt.xlabel('Time (min)')
plt.ylabel('Frequency')
plt.title('Distribution of cycle lengths')
plt.show()

## Difference between flavin signal peaks and corresponding first births

In [None]:
# Seems technically difficult...
# especially when there's a risk of no 1:1 correspondence due to identification errors
# Need pretty extensive error handling and curation to pull this off...

# OR...
# Align the flavin peak binary matrix by births
# And then get the distribution of where the first peaks are
# This is how Bieler et al. (2014) did it -- they didn't exactly super-curate things.
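
A minimal sketch of the second option in the comments above, working directly on the binary births and peaks matrices: for each cell, take the first birth and measure the delay to the first peak at or after it. `first_peak_delay` is a hypothetical helper, and the rule "first peak at or after the first birth" is an assumption about how to pair events, not an established method.

```python
import numpy as np

def first_peak_delay(births, peaks):
    """For each cell, time points from the first birth to the next peak.

    births, peaks: 2D 0/1 arrays of the same shape (cells x time points).
    Returns NaN where a cell has no birth or no peak at/after that birth.
    """
    births = np.asarray(births)
    peaks = np.asarray(peaks)
    delays = np.full(births.shape[0], np.nan)
    for row in range(births.shape[0]):
        birth_locs = np.flatnonzero(births[row] == 1)
        if birth_locs.size == 0:
            continue
        peak_locs = np.flatnonzero(peaks[row] == 1)
        later_peaks = peak_locs[peak_locs >= birth_locs[0]]
        if later_peaks.size > 0:
            delays[row] = later_peaks[0] - birth_locs[0]
    return delays

toy_births = np.array([[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0]])
toy_peaks = np.array([[0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]])
toy_delays = first_peak_delay(toy_births, toy_peaks)
```

A histogram of the non-NaN delays (times `sampling_period`) would then give the distribution described in the aims.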

# Cross-correlation

In [None]:
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate.html

from scipy import signal
from sklearn.preprocessing import StandardScaler

# Choose cell
cellID = 476

# Scale time series (NB: individually!  And I don't know whether it's a good idea)
mCherry_transpose = signal_mCherry_processed.loc[cellID].to_numpy().reshape((-1,1))
flavin_transpose = signal_flavin_processed.loc[cellID].to_numpy().reshape((-1,1))
signal_mCherry_scaled = StandardScaler().fit_transform(mCherry_transpose)
signal_flavin_scaled = StandardScaler().fit_transform(flavin_transpose)

corr = signal.correlate(
    signal_mCherry_scaled,
    signal_flavin_scaled
)
lags = signal.correlation_lags(
    len(signal_mCherry_scaled),
    len(signal_flavin_scaled)
)


fig, axs = plt.subplots(3, 1)
fig.subplots_adjust(hspace = 1)

axs[0].plot(signal_mCherry_scaled)
axs[0].set_title('mCherry, scaled')
axs[0].set_xlabel('Time point')

axs[1].plot(signal_flavin_scaled)
axs[1].set_title('Flavin, scaled')
axs[1].set_xlabel('Time point')

axs[2].plot(lags, corr)
axs[2].spines['left'].set_position('zero')
axs[2].spines['bottom'].set_position('zero')
axs[2].spines['top'].set_color('none')
axs[2].spines['right'].set_color('none')
axs[2].set_title('Cross-correlated signal')
axs[2].set_xlabel('Lag')

In [None]:
# for ref
signal_flavin_processed.index.to_numpy()