# Table of Contents
* [1 Method section:](#Method-section:)
    * [1.1 Clock-controlled expression simulation based on peak phase](#Clock-controlled--expression-simulation-based-on-peak-phase)
    * [1.2 Metabolic overview posters](#Metabolic-overview-posters)
        * [1.2.1 Clock-controlled mRNA peak phase](#Clock-controlled-mRNA-peak-phase)
        * [1.2.2 Clock-controlled protein peak phase](#Clock-controlled-protein-peak-phase)
        * [1.2.3 Number of hours the protein peak lags behind the mRNA peak](#Number-of-hours-the-protein-peak-lags-behind-the-mRNA-peak)
    * [1.3 Metabolic overview animation](#Metabolic-overview-animation)
        * [1.3.1 Clock-controlled mRNA expression simulation](#Clock-controlled-mRNA-expression-simulation)
        * [1.3.2 Clock-controlled protein expression simulation](#Clock-controlled-protein-expression-simulation)
    * [1.4 Omics dashboard](#Omics-dashboard)
        * [1.4.1 Alternating Protein and RNA Oscillations for the Omics Dashboard charts](#Alternating-Protein-and-RNA-Oscillations-for-the-Omics-Dashboard-charts)
        * [1.4.2 Interleaving Protein and RNA Oscillations for the Omics Dashboard charts](#Interleaving-Protein-and-RNA-Oscillations-for-the-Omics-Dashboard-charts)
    * [1.5 SmartTables](#SmartTables)

# Method section: 

## Clock-controlled  expression simulation based on peak phase


Using the peak phase data, we simulated the expression of each clock-controlled mRNA and protein over a 24-hour period starting at DD (constant darkness). Expression is simulated in arbitrary units that vary between 0 and 2, with the maximum of 2 reached at the peak phase ($\theta$), according to the following function:

$$f(t; \theta, T, A, D) = A\cos\left(\frac{2\pi(t-\theta)}{T}\right)+D$$

where the period $T=24$ hours, and the amplitude $A$ and the offset $D$ both equal $1$. Time $t$ is sampled every 2 hours, from 0h (the start of DD) to 22h.
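
As a quick check of the function above, the following minimal sketch evaluates it at the sampled time points for a hypothetical peak phase of 6 hours. The value 6 and the helper name `simulated_expression` are illustrative assumptions, not taken from the data. The curve should reach its maximum of 2 at $t=\theta$ and its minimum of 0 half a period later.

```python
import numpy as np

def simulated_expression(t, theta, period=24.0, amplitude=1.0, offset=1.0):
    """Cosine model: amplitude * cos(2*pi*(t - theta) / period) + offset."""
    return amplitude * np.cos(2 * np.pi * (t - theta) / period) + offset

# Sample every 2 hours from 0h to 22h, as in the text.
t = np.arange(0, 24, 2)
theta = 6  # hypothetical peak phase, in hours

expression = simulated_expression(t, theta)
peak_time = t[np.argmax(expression)]    # time point with the highest value
trough_time = t[np.argmin(expression)]  # time point with the lowest value
print(peak_time, trough_time)
```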

In [32]:
%matplotlib notebook
import pandas as pd
import os, numpy as np
from math import pi
from numpy import cos
import matplotlib.pyplot as plt

class PDF(object):
    def __init__(self, pdf, size=(1200,600)):
        self.pdf = pdf
        self.size = size

    def _repr_html_(self):
        return '<iframe src={0} width={1[0]} height={1[1]}></iframe>'.format(self.pdf, self.size)

    def _repr_latex_(self):
        return r'\includegraphics[width=1.0\textwidth]{{{0}}}'.format(self.pdf)

class Animation(object):
    def __init__(self, url, size=(200,200)):
        self.url = url
        self.size = size

    def _repr_html_(self):
        return '<iframe src={0} width={1[0]} height={1[1]}></iframe>'.format(self.url, self.size)

    def _repr_latex_(self):
        return r'\includegraphics[width=1.0\textwidth]{{{0}}}'.format(self.url)

def create_oscillations_from_peak_phase(peak_phase, out, period=24, interval=2, amplitude=1, offset=1):
    """Simulate cosine expression for every gene in `peak_phase` (indexed by gene ID, with a 'Phase' column) and write it to `out` as TSV."""
    # Sample times in hours: 0, interval, ..., period - interval
    t = np.linspace(0,int(period-interval), int(period/interval))
    time = ['{}h'.format(i) for i in range(0,int(period),interval)]
    oscillate = pd.DataFrame(0, index=time, columns = ['{}.7'.format(i) for i in peak_phase.index])
    for gene in peak_phase.index:
        theta = peak_phase.loc[gene,'Phase']
        oscillate['{}.7'.format(gene)] = amplitude*cos(2*pi*(t - theta)/float(period)) + offset
    # Transpose so that rows are genes and columns are time points
    oscillate.T.to_csv(out,sep='\t',index_label='$Gene')
    return oscillate.T

peakdir = os.path.join('.')
os.listdir(peakdir)

['#Protein_lags_RNA_by.tsv#',
 '.DS_Store',
 '.git',
 '.ipynb_checkpoints',
 'AmineSyn.png',
 'CentralDogmaSubcategories.pdf',
 'dashboard.html.pdf',
 'genes-of-pwys.tsv',
 'genes-of-pwys.tsv~',
 'genes-of-rxns.tsv',
 'genes-of-rxns.tsv~',
 'OmicsDashboard',
 'OmicsDashboardView.png',
 'oscillations.tar.bz2',
 'PathwayCollageColors.tsv',
 'PeakPhaseAnalysis.ipynb',
 'PeakPhaseColors.txt',
 'PeakPhaseColors.txt~',
 'Protein_and_RNA_Peak_phases.tsv',
 'Protein_and_RNA_Peak_phases.tsv~',
 'Protein_lag.pdf',
 'Protein_lag_poster.pdf',
 'Protein_lags_RNA_by.tsv',
 'Protein_lags_RNA_by.tsv~',
 'Protein_oscillations.html~',
 'Protein_Peak_Phase_for_U01.txt',
 'Protein_Peak_Phase_for_U01.txt~',
 'ProteinAndRNAOscillations.tsv',
 'ProteinAndRNAOscillations12.tsv',
 'ProteinAndRNAOscillations24.tsv',
 'ProteinLagsRNA.pdf',
 'ProteinOscillations.tsv',
 'ProteinOscillations.tsv~',
 'ProteinOscillations120',
 'ProteinOscillations120.tar.bz2',
 'ProteinOscillations150',
 'ProteinOscillations250',
 '

## Metabolic overview posters

* Clock-controlled mRNA peak phase
* Clock-controlled protein peak phase
* Number of hours the protein peak lags behind the mRNA peak
* Citation:

```
S.M. Paley and P.D. Karp.
The Pathway Tools Cellular Overview Diagram and Omics Viewer,
Nucleic Acids Research 34:3771-8 (2006)
```


### Clock-controlled mRNA peak phase

In [24]:
rna_peak = pd.read_table(os.path.join(peakdir,'RNA_Peak_Phase_for_U01.txt'),index_col='ID')
rna_peak.index = ['{}.7'.format(i) for i in rna_peak.index]
PDF('https://cyc.agilebiofoundry.org/PeakPhases/RNAPeakPhase/RNAPeakPhasePortrait.pdf')

### Clock-controlled protein peak phase

In [23]:
protein_peak = pd.read_table(os.path.join(peakdir,'Protein_Peak_Phase_for_U01.txt'),index_col='ID')
protein_peak.index = ['{}.7'.format(i) for i in protein_peak.index]
PDF('https://cyc.agilebiofoundry.org/PeakPhases/ProteinPeakPhase/ProteinPeakPhasePortrait.pdf')

### Number of hours the protein peak lags behind the mRNA peak

In [22]:
# Combine RNA and protein peak phases; genes missing from one table get NaN in its column
prot_and_rna_peak = rna_peak.join(protein_peak, how='outer',lsuffix='_rna', rsuffix='_protein')
prot_and_rna_peak.to_csv(os.path.join(peakdir, 'Protein_and_RNA_Peak_phases.tsv'),sep='\t',index_label='Gene')
# Lag in hours, wrapped into [0, 24); genes lacking either phase are dropped
protein_lag = (prot_and_rna_peak['Phase_protein'] - prot_and_rna_peak['Phase_rna']).dropna().apply(lambda x: np.mod(x, 24))
protein_lag.to_csv(os.path.join(peakdir,'Protein_lags_RNA_by.tsv'),sep='\t', index_label='$Gene',header=True)
PDF('https://cyc.agilebiofoundry.org/PeakPhases/ProteinLagsRNA.pdf')
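
Because phases are circular, the lag is taken modulo 24, so a protein that peaks after the 24h wrap-around relative to its mRNA still gets a positive lag. A minimal sketch with made-up gene IDs and phases (every value below is hypothetical):

```python
import pandas as pd

# Hypothetical peak phases in hours; gene IDs are placeholders.
rna = pd.DataFrame({'Phase': [22.0, 4.0, 10.0]}, index=['geneA', 'geneB', 'geneC'])
protein = pd.DataFrame({'Phase': [2.0, 9.0]}, index=['geneA', 'geneB'])

# Outer join keeps genes present in either table; suffixes disambiguate 'Phase'.
phases = rna.join(protein, how='outer', lsuffix='_rna', rsuffix='_protein')

# Drop genes lacking either phase, then wrap the difference into [0, 24).
lag = (phases['Phase_protein'] - phases['Phase_rna']).dropna() % 24
print(lag)
```

Here `geneA` has an mRNA peak at 22h and a protein peak at 2h: the raw difference is -20, which wraps to a lag of 4 hours. `geneC` has no protein phase and is dropped.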


## Metabolic overview animation
 * mRNA - Using the `PwyRNAOscillations` dataset, we overlaid a 24-hour time series of simulated clock-controlled mRNA expression onto the cellular overview.
 * Protein - Using the `PwyProteinOscillations` dataset, we generated the corresponding animation of simulated clock-controlled protein expression with the NeurosporaCyc Omics Viewer.
 * Citation:

```
Mario Latendresse and Peter D. Karp
Web-based metabolic network visualization with a zooming user interface,
BMC Bioinformatics 12:176, (2011)
```





### Clock-controlled mRNA expression simulation 
* `RNAOscillations.tsv` contains simulated expression data for all clock-controlled mRNAs. This dataset is used to generate the mRNA Omics Dashboard charts.
* `PwyRNAOscillations.tsv` contains simulated expression data only for clock-controlled mRNAs whose product is an enzyme in a known pathway. This dataset is used to generate the mRNA cellular overview animation.

In [16]:
rna_peak = pd.read_table(os.path.join(peakdir, 'RNA_Peak_Phase_for_U01.txt'),index_col='ID')
rna_oscillations = create_oscillations_from_peak_phase( rna_peak, os.path.join(peakdir,'RNAOscillations.tsv'))
pwy_genes = pd.read_table(os.path.join(peakdir,'genes-of-pwys.tsv'))
pwy_rna_oscillations = pwy_genes.join(rna_oscillations,on='$Gene', how='inner')
pwy_rna_oscillations.to_csv(os.path.join(peakdir,'PwyRNAOscillations.tsv'),sep='\t',index=False)
Animation('https://cyc.agilebiofoundry.org/PeakPhases/RNAOscillations150/index.html',size=(1000,1000))
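
The restriction to pathway genes is an inner join on the `$Gene` column. The sketch below uses toy tables; the pathway names, the gene IDs, and the `Pathway` column are hypothetical, since the actual layout of `genes-of-pwys.tsv` is not shown here.

```python
import pandas as pd

# Toy simulated expression, indexed by gene (two time points for brevity).
oscillations = pd.DataFrame(
    {'0h': [2.0, 1.0], '2h': [1.87, 1.5]},
    index=['geneA', 'geneB'],
)

# Toy pathway membership table; a gene may appear in several pathways.
pwy_genes = pd.DataFrame({
    'Pathway': ['pwy1', 'pwy2', 'pwy2'],
    '$Gene': ['geneA', 'geneA', 'geneC'],
})

# Match the '$Gene' column against the index of `oscillations`.
# how='inner' drops geneC (no simulated expression) and geneB (no pathway).
pwy_oscillations = pwy_genes.join(oscillations, on='$Gene', how='inner')
print(pwy_oscillations)
```

A gene that belongs to several pathways appears once per pathway in the result, which is what an overlay by pathway needs.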

### Clock-controlled protein expression simulation 

* `ProteinOscillations.tsv` contains simulated expression data for all clock-controlled proteins. This dataset is used to generate the protein Omics Dashboard charts.
* `PwyProteinOscillations.tsv` contains simulated expression data only for clock-controlled proteins that are enzymes in a known pathway. This dataset is used to generate the protein cellular overview animation.

In [18]:
protein_peak = pd.read_table(os.path.join(peakdir, 'Protein_Peak_Phase_for_U01.txt'),index_col='ID')
protein_oscillations = create_oscillations_from_peak_phase( protein_peak, os.path.join(peakdir,'ProteinOscillations.tsv'))
#display(protein_oscillations)
pwy_genes = pd.read_table(os.path.join(peakdir,'genes-of-pwys.tsv'))
pwy_protein_oscillations = pwy_genes.join(protein_oscillations,on='$Gene', how='inner')
pwy_protein_oscillations.to_csv(os.path.join(peakdir,'PwyProteinOscillations.tsv'),sep='\t',index=False)
Animation('https://cyc.agilebiofoundry.org/PeakPhases/ProteinOscillations150/index.html',size=(1200,600))

## Omics dashboard

In the Omics Dashboard charts:
 * Black represents mRNA expression.
 * Red represents protein expression.
 * Each small dot represents the expression of a single gene in the pathway class (amino acid synthesis, for example) at a specific time point.
 * The large dot represents the average expression across all genes in that pathway class at that time point.
 * The line connecting the dots represents the "spread" of expression levels among the genes in the pathway class.
 * `RNAandProteinOscillations12.pdf` displays 12 data points per clock-controlled gene, where simulated mRNA and protein expression alternate every 2 hours: `0h_mRNA, 2h_protein, 4h_mRNA, 6h_protein,...,20h_mRNA, 22h_protein`.
 * `RNAandProteinOscillations24.pdf` displays 24 data points per gene, where simulated mRNA and protein expression are interleaved: `0h_mRNA, 0h_protein, 2h_mRNA, 2h_protein,..., 22h_mRNA, 22h_protein`.
 * The Omics Dashboard [[2]](http://academic.oup.com/nar/article/doi/10.1093/nar/gkx910/4508872/The-Omics-Dashboard-for-interactive-exploration-of) can be cited as:
```
Paley, S., Parker, K., Spaulding, A., Tomb, J.-F., O’Maille, P., & Karp, P. D. (2017). The Omics Dashboard for interactive exploration of gene-expression data. Nucleic Acids Research. https://doi.org/10.1093/nar/gkx910
``` 
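
To make the dot-and-line encoding concrete, the sketch below computes, for one hypothetical pathway class, the per-time-point mean (the large dot) and the minimum and maximum across genes, which is one plausible reading of the "spread" line. The gene IDs and values are made up, and this is not the Omics Dashboard's own code.

```python
import pandas as pd

# Hypothetical simulated expression for three genes in one pathway class.
expression = pd.DataFrame(
    {'0h': [2.0, 1.0, 0.0], '2h': [1.5, 1.5, 0.6]},
    index=['geneA', 'geneB', 'geneC'],
)

summary = pd.DataFrame({
    'mean': expression.mean(axis=0),  # large dot: average across genes
    'low': expression.min(axis=0),    # assumed lower end of the connecting line
    'high': expression.max(axis=0),   # assumed upper end of the connecting line
})
print(summary)
```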




### Alternating Protein and RNA Oscillations for the Omics Dashboard charts

`ProteinAndRNAOscillations12.tsv` contains 12 data points per clock-controlled gene, where simulated mRNA and protein expression alternate every 2 hours: `0h_RNA, 2h_protein, 4h_RNA, 6h_protein,...,20h_RNA, 22h_protein`.

In [21]:
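# Join the RNA and protein simulations here so that this cell also runs in document order;
# shared time-point columns get the '_RNA' / '_protein' suffixes
protein_and_rna_oscillations = rna_oscillations.join(protein_oscillations,how='inner',lsuffix='_RNA', rsuffix='_protein')
# Alternate every 2 hours: RNA at 0h, 4h, ..., 20h and protein at 2h, 6h, ..., 22h
rna_or_protein = {0: 'RNA', 2:'protein'}
alternating_columns = ['{}h_{}'.format(t, rna_or_protein[t % 4]) for t in range(0,24,2)]
protein_and_rna_oscillations[alternating_columns].to_csv(os.path.join(peakdir,'ProteinAndRNAOscillations12.tsv'),sep='\t',index_label='$Gene')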
PDF('https://cyc.agilebiofoundry.org/PeakPhases/RNAandProteinOscillationDashboard12.pdf')
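
The column selection relies on `t % 4`: time points divisible by 4 take the RNA column, and the remaining even time points take the protein column. A small sketch that only builds the labels (no data needed):

```python
# Map the remainder of t / 4 to the data type sampled at that time point.
rna_or_protein = {0: 'RNA', 2: 'protein'}

# Even time points from 0h to 22h; t % 4 is 0 or 2 for every one of them.
alternating = ['{}h_{}'.format(t, rna_or_protein[t % 4]) for t in range(0, 24, 2)]
print(alternating[:4])  # ['0h_RNA', '2h_protein', '4h_RNA', '6h_protein']
```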

### Interleaving Protein and RNA Oscillations for the Omics Dashboard charts

`ProteinAndRNAOscillations24.tsv` contains 24 data points per gene, where simulated mRNA and protein expression are interleaved: `0h_RNA, 0h_protein, 2h_RNA, 2h_protein,..., 22h_RNA, 22h_protein`.

In [20]:
protein_and_rna_oscillations = rna_oscillations.join(protein_oscillations,how='inner',lsuffix='_RNA', rsuffix='_protein')
# Interleave the columns: 0h_RNA, 0h_protein, 2h_RNA, 2h_protein, ...
interleaved_columns = ['{}h_{}'.format(t,rna_or_protein) for t in range(0,24,2) for rna_or_protein in ['RNA','protein']]
protein_and_rna_oscillations[interleaved_columns].to_csv(os.path.join(peakdir,'ProteinAndRNAOscillations24.tsv'),sep='\t',index_label='$Gene')
PDF('https://cyc.agilebiofoundry.org/PeakPhases/dashboard.html.pdf')
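
The suffixes added by the join are what make the interleaved labels resolvable, because both tables share the same time-point column names. A minimal sketch with one hypothetical gene and two time points (the gene ID and values are made up):

```python
import pandas as pd

# Toy tables sharing the same time-point column names.
rna = pd.DataFrame({'0h': [2.0], '2h': [1.87]}, index=['geneA'])
protein = pd.DataFrame({'0h': [1.0], '2h': [1.5]}, index=['geneA'])

# Overlapping column names receive the suffixes: '0h_RNA', '0h_protein', ...
both = rna.join(protein, how='inner', lsuffix='_RNA', rsuffix='_protein')

# Interleave: for each time point, RNA first, then protein.
interleaved = ['{}h_{}'.format(t, kind) for t in range(0, 4, 2) for kind in ['RNA', 'protein']]
print(both[interleaved].columns.tolist())
```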

## SmartTables
  * Hours the protein peak lags behind the mRNA peak
  * Clock-controlled mRNA and protein peak phases for *N. crassa*
  * 24-hour simulated expression based on peak phase:
      * all clock-controlled mRNAs in *N. crassa*
      * all clock-controlled proteins in *N. crassa*
      * clock-controlled proteins that are enzymes in a known *N. crassa* pathway
      * clock-controlled mRNAs whose product is an enzyme in a known *N. crassa* pathway
      * `RNAandProteinOscillations12` contains 12 data points per clock-controlled gene, where simulated mRNA and protein expression alternate every 2 hours: `0h_mRNA, 2h_protein, 4h_mRNA, 6h_protein,...,20h_mRNA, 22h_protein`
      * `RNAandProteinOscillations24` contains 24 data points per gene, where simulated mRNA and protein expression are interleaved: `0h_mRNA, 0h_protein, 2h_mRNA, 2h_protein,..., 22h_mRNA, 22h_protein`
  * Citation:

```
[PTools13] M. Travers, S.M. Paley, J. Shrager, T.A. Holland, and P.D. Karp.
Groups: knowledge spreadsheets for symbolic biocomputing,
Database, doi:10.1093/database/bat061 (2013)
```


 