# Tempo accepts the following parameters:

- adata (REQUIRED) ([num_cells x num_genes] AnnData object): AnnData object / Scanpy object<br>
- folder_out (REQUIRED) (str): path for results folder<br>
- gene_acrophase_prior_path (REQUIRED) (str): path to gene acrophase prior file, which is a CSV<br>
- core_clock_gene_path (REQUIRED) (str): path to file for core clock genes, which is a plain text file listing one core clock gene name per line<br>
- cell_phase_prior_path (str): path to cell phase prior file, which is a CSV<br>
- reference_gene (str): name of reference core clock gene<br>
- min_gene_prop (float; 0 to 1): minimum proportion of transcripts in pseudobulk that the gene must have<br>
- min_amp (float, positive real): minimum allowed gene amplitude<br>
- max_amp (float, positive real): maximum allowed gene amplitude<br>
- init_mesor_scale_val (float, positive real): value to initialize variational mesor scale for all genes<br>
- prior_mesor_scale_val (float, positive real): value to set prior mesor scale for all genes<br>
- init_amp_loc_val (float, positive real lying in [min_amp,max_amp]): value to initialize the location of the variational amplitude for all genes<br>
- init_amp_scale_val (float, positive real): number of pseudotrials of the Beta distribution used to initialize the variational amplitude. Larger values indicate more certainty.<br>
- prior_amp_alpha_val (float, positive real): alpha values of Beta distribution for prior amplitude<br>
- prior_amp_beta_val (float, positive real): beta values of Beta distribution for prior amplitude<br>
- known_cycler_init_shift_95_interval (float in [0,pi]): 95% interval to initialize the acrophase distribution scale, specifically for genes that are known cycling genes<br>
- unknown_cycler_init_shift_95_interval (float in [0,pi]): 95% interval to initialize the acrophase distribution scale, specifically for genes that are not known cycling genes<br>
- known_cycler_prior_shift_95_interval: 95% interval to set acrophase prior distribution scale, specifically for genes that are known cycling genes; note: values in gene_acrophase_prior_path take precedence<br>
- init_clock_Q_prob_alpha (float, positive real): alpha value of Beta distribution for variational gamma (probability gene has non-zero amplitude), specifically for user-supplied clock genes<br>
- init_clock_Q_prob_beta (float, positive real): beta value of Beta distribution for variational gamma (probability gene has non-zero amplitude), specifically for user-supplied clock genes<br>
- init_non_clock_Q_prob_alpha (float, positive real): alpha value of Beta distribution for variational gamma (probability gene has non-zero amplitude), specifically for non-clock genes<br>
- init_non_clock_Q_prob_beta (float, positive real): beta value of Beta distribution for variational gamma (probability gene has non-zero amplitude), specifically for non-clock genes<br>
- prior_clock_Q_prob_alpha (float, positive real): alpha value of Beta distribution for prior gamma (probability gene has non-zero amplitude), specifically for user-supplied clock genes<br>
- prior_clock_Q_prob_beta (float, positive real): beta value of Beta distribution for prior gamma (probability gene has non-zero amplitude), specifically for user-supplied clock genes<br>
- prior_non_clock_Q_prob_alpha (float, positive real): alpha value of Beta distribution for prior gamma (probability gene has non-zero amplitude), specifically for non-clock genes<br>
- prior_non_clock_Q_prob_beta (float, positive real): beta value of Beta distribution for prior gamma (probability gene has non-zero amplitude), specifically for non-clock genes<br>
- use_noninformative_phase_prior (boolean): if true, all cell phase priors are set to noninformative priors, except that any cell phase priors specified by cell_phase_prior_path take precedence. If false, the user must supply cell_phase_prior_path.<br>
- use_nb (boolean): if true, uses negative binomial likelihood model for gene transcript counts. if false, uses poisson.<br>
- mean_disp_init_coef (list of floats): initial values to set the log transcript proportion - log dispersion coefficients to (zeta parameterizing function g in the paper supplement) when use_nb is true, since the coefficients are fit using a gradient optimizer<br>
- est_mean_disp_relationship (boolean): if true, optimizes the log transcript proportion - log dispersion coefficients; if false, directly treats the user-supplied coefficients in mean_disp_init_coef as zeta<br>
- mean_disp_log10_prop_bin_marks (list of log10 transformed fractions / proportions): where to place the bins over gene log10 proportions from which genes are sampled to estimate zeta (parameters of the global log proportion - log dispersion relationship)<br>
- mean_disp_max_num_genes_per_bin (int, positive): maximum number of genes to sample per bin when estimating zeta (parameters of the global log proportion - log dispersion relationship)<br>
- hv_std_residual_threshold (float): threshold on the Pearson residuals of gene variances vs. expected variances (given their means), used to restrict the highly variable genes that the algorithm considers as potential cycling genes<br>
- mu_loc_lr (positive float):  learning rate for variational mesor loc parameter<br>
- mu_log_scale_lr (positive float):  learning rate for variational mesor scale parameter<br>
- A_log_alpha_lr (positive float):  learning rate for variational amplitude alpha parameter<br>
- A_log_beta_lr (positive float):  learning rate for variational amplitude beta parameter<br>
- phi_euclid_loc_lr (positive float):  learning rate for variational acrophase loc parameter<br>
- phi_log_scale_lr (positive float):  learning rate for variational acrophase scale parameter<br>
- Q_prob_log_alpha_lr (positive float):  learning rate for variational non-zero amplitude probability alpha parameter<br>
- Q_prob_log_beta_lr (positive float):  learning rate for variational non-zero amplitude probability beta parameter<br>
- num_phase_grid_points (positive int): number of grid points to use to approximate the conditional posterior cell phase distribution<br>
- num_phase_est_cell_samples (positive int): number of Monte Carlo samples of the cell phases to use to compute the ELBO expectation term when estimating cell phase (step 1 of the algorithm)<br>
- num_phase_est_gene_samples (positive int): number of Monte Carlo samples of the gene parameters to use to compute the ELBO expectation term when estimating cell phase (step 1 of the algorithm)<br>
- num_harmonic_est_cell_samples (positive int): number of monte carlo samples of the cell phases to use when fitting gene parameters of non-cycling genes in step 2 of the algorithm<br>
- num_harmonic_est_gene_samples (positive int): number of monte carlo samples of the gene parameters to use to compute the ELBO expectation term when fitting gene parameters of non-cycling genes in step 2 of the algorithm <br>
- vi_max_epochs (positive int): maximum number of epochs used to optimize parameters (for either step 1 or 2 of the algorithm)<br>
- vi_print_epoch_loss (boolean): if true, prints the ELBO at each epoch for each step of the algorithm<br>
- vi_improvement_window (int): size of the window of epochs to compare ELBO progress to (i.e. for vi_improvement_window = 10, the mean ELBO in the last 10 epochs is compared to the previous non-overlapping 10 epoch window)<br>
- vi_convergence_criterion (positive float): threshold improvement of current epoch window's mean ELBO to previous epoch window's mean ELBO at which to say the algorithm has converged<br>
- vi_lr_scheduler_patience (positive int): number of epochs of the ELBO getting worse before the scheduler decreases the learning rate<br>
- vi_lr_scheduler_factor (positive float): the multiplicative factor to apply to the current learning rate if the scheduler has "run out of patience"; values < 1 will lead to a decreased learning rate<br>
- vi_batch_size (positive int): cell batch size used to compute the objective function<br>
- test_mode (boolean): if true, uses pytorch profilers which can slow down computation<br>
- use_clock_input_only (boolean): if true, algorithm only uses the core clock genes to estimate cell phase (i.e. it only runs Step 1 of the algorithm using the core clock genes, and then the algorithm halts.)<br>
- use_clock_output_only (boolean): if true, the algorithm only uses the core clock genes to compute the expectation of the ELBO in Step 1<br>
- frac_pos_cycler_samples_threshold (float, [0,1]): threshold for the MAP of a gene's non-zero amplitude probability to call it a de novo cycler<br>
- A_loc_pearson_residual_threshold (float): threshold for the difference between a gene's MAP amplitude and expected amplitude (given its mesor), reported in terms of a Pearson residual, in order to call the gene a de novo cycler; larger values mean a stricter threshold<br>
- confident_cell_interval_size_threshold (float in [0 to 24]): threshold on the size of a cell's 95% posterior interval for the cell to be considered when computing the expectation of the ELBO in Step 2 (to identify de novo cyclers); this can be used to improve computational efficiency, since cells with high uncertainty will contribute little information to the estimation of gene parameters<br>
- max_num_alg_steps (int): maximum number of times to run Steps 1 and 2 of the algorithm<br>
- opt_phase_est_gene_params (boolean): if False, does not optimize the variational gene distributions in step 1 -- the variational gene distributions are set to the priors, and Step 1 uses these and the observed counts to compute the conditional posterior of the cell phases<br>
- init_variational_dist_to_prior (boolean): if True, always sets the gene variational distributions to prior distributions at initialization of Steps 1 and 2. <br>
- log10_bf_tempo_vs_null_threshold (positive float): threshold of the log10 bayes factor comparing Tempo's core clock evidence (from step 1) to random core clock evidence; if the bayes factor does not exceed this threshold, the algorithm halts<br>
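
The interaction between vi_improvement_window and vi_convergence_criterion can be illustrated with a small sketch. This is not Tempo's implementation: the helper name `has_converged` is hypothetical, the per-epoch values are assumed to be losses where lower is better, and the improvement is assumed to be measured relative to the previous window's mean.

```python
from statistics import mean


def has_converged(losses, window=10, criterion=1e-3):
    """Return True once the windowed mean loss stops improving.

    losses    : per-epoch loss values (lower is better), oldest first
    window    : plays the role of vi_improvement_window
    criterion : plays the role of vi_convergence_criterion
    """
    if len(losses) < 2 * window:
        # Not enough epochs yet for two non-overlapping windows.
        return False
    prev_mean = mean(losses[-2 * window:-window])
    curr_mean = mean(losses[-window:])
    # ASSUMPTION: improvement is relative to the previous window's mean.
    improvement = (prev_mean - curr_mean) / max(abs(prev_mean), 1e-12)
    return improvement < criterion


# A nearly flat loss curve versus one that is still dropping quickly.
steady = [100.0 - 0.001 * i for i in range(20)]
falling = [100.0 - 5.0 * i for i in range(20)]
print(has_converged(steady), has_converged(falling))
```

With a window of 10, no decision is possible before epoch 20, so vi_max_epochs should comfortably exceed twice the window size.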



# Core clock list
The core clock list should be a plain text file with one core clock gene name per line.

In [1]:
# --- LOAD CORE CLOCK LIST ---

core_clock_gene_path = '/users/benauerbach/desktop/tempo/test_data/core_clock_genes.txt'
with open(core_clock_gene_path) as file_obj:
    print(file_obj.read())



Gene_0
Gene_1
Gene_2
Gene_3
Gene_4
Gene_5
Gene_6
Gene_7
Gene_8
Gene_9
Gene_10
Gene_11
Gene_12
Gene_13
Gene_14
Gene_15
Gene_16
Gene_17
Gene_18
Gene_19
Gene_20
Gene_21
Gene_22
Gene_23
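
Before running Tempo, it can help to confirm that every listed core clock gene is actually present in the dataset. The sketch below uses two hypothetical helpers, `read_gene_list` and `missing_genes`, and a throwaway demo file; in practice the path would be core_clock_gene_path and the available names would come from the AnnData object (for example `adata.var_names`).

```python
import os
import tempfile


def read_gene_list(path):
    """Read one gene name per line, skipping blank lines and stray whitespace."""
    with open(path) as file_obj:
        return [line.strip() for line in file_obj if line.strip()]


def missing_genes(core_clock_genes, available_genes):
    """Return the core clock genes absent from the dataset, in file order."""
    available = set(available_genes)
    return [gene for gene in core_clock_genes if gene not in available]


# Demo with a temporary file (hypothetical contents, not the real test data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "core_clock_genes.txt")
    with open(demo_path, "w") as file_obj:
        file_obj.write("Gene_0\nGene_1\n\nGene_2\n")
    demo_genes = read_gene_list(demo_path)

print(demo_genes)
print(missing_genes(demo_genes, ["Gene_0", "Gene_2", "Gene_99"]))
```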


# Prior knowledge about the gene acrophases is specified as a CSV file with the following format
- gene column specifies the name of the gene (which must be an index in the .var field of the AnnData object supplied by the user)
- prior_acrophase_loc specifies the location of the gene's prior acrophase, in radians
- prior_acrophase_95_interval specifies the size of the 95% interval of the prior distribution in radians, and should have values of 1e-3 (most certain) to pi (least certain)
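
A file in this format can be assembled with pandas. The following is a minimal sketch under stated assumptions: the helpers `hours_to_radians` and `build_gene_acrophase_prior` are hypothetical, the peak times are made-up values given in hours on a 24-hour cycle, and the default interval of pi/8 is chosen only because it matches the 0.392699 value in the example file.

```python
import math

import pandas as pd


def hours_to_radians(hours, period=24.0):
    """Map a time of day in hours onto [0, 2*pi)."""
    return (hours % period) / period * 2.0 * math.pi


def build_gene_acrophase_prior(peak_hours, interval_95=math.pi / 8):
    """Build the prior table from a {gene name: peak hour} mapping.

    interval_95 is the size of the 95% prior interval in radians and
    must lie in [1e-3, pi], as described in the format above.
    """
    if not 1e-3 <= interval_95 <= math.pi:
        raise ValueError("interval_95 must lie in [1e-3, pi]")
    return pd.DataFrame({
        "gene": list(peak_hours),
        "prior_acrophase_loc": [hours_to_radians(h) for h in peak_hours.values()],
        "prior_acrophase_95_interval": interval_95,
    })


# Hypothetical peak times; gene names must match the AnnData .var index.
prior_df = build_gene_acrophase_prior({"Gene_0": 0.0, "Gene_1": 6.0, "Gene_2": 18.0})
print(prior_df.to_csv(index=False))
```

Passing a file path to `to_csv` instead of printing writes the table to disk, ready to be supplied as gene_acrophase_prior_path.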



In [2]:
# --- LOAD GENE PRIOR KNOWLEDGE ---

import pandas as pd
gene_acrophase_prior_path = '/users/benauerbach/desktop/tempo/test_data/core_clock_acrophase_prior.csv'
pd.read_csv(gene_acrophase_prior_path, index_col='gene')



| gene | prior_acrophase_loc | prior_acrophase_95_interval |
|---|---|---|
| Gene_0 | 0.0 | 0.392699 |
| Gene_1 | 0.018052 | 0.392699 |
| Gene_2 | 2.796874 | 0.392699 |
| Gene_3 | 0.032643 | 0.392699 |
| Gene_4 | 0.13098 | 0.392699 |
| Gene_5 | 4.85914 | 0.392699 |
| Gene_6 | 3.290857 | 0.392699 |
| Gene_7 | 0.471431 | 0.392699 |
| Gene_8 | 5.975873 | 0.392699 |
| Gene_9 | 3.981588 | 0.392699 |


# Prior knowledge about the cell phases can be specified as a CSV file with the following format 
- Of note, prior knowledge about the cell phases is optional
- prior_theta_euclid_cos and prior_theta_euclid_sin are used to specify the location value of the cell phase, in terms of euclidean coordinates
- prior_theta_95_interval specifies the size of the 95% interval of the prior distribution in radians, and should have values of 1e-3 (most certain) to pi (least certain)
- barcode column specifies the barcode of the cell (which must be an index in the .obs field of the AnnData object supplied by the user)
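
Because the location is given in Euclidean coordinates, a phase angle in radians has to be converted to its cosine and sine before it is written to the file. The sketch below shows one way to do that with pandas; the helper `build_cell_phase_prior` is hypothetical, and the phase values are invented for illustration (the barcodes are borrowed from the example output).

```python
import math

import pandas as pd


def build_cell_phase_prior(phases, interval_95):
    """Build the prior table from a {barcode: phase in radians} mapping.

    interval_95 is the size of the 95% prior interval in radians and
    must lie in [1e-3, pi], as described in the format above.
    """
    if not 1e-3 <= interval_95 <= math.pi:
        raise ValueError("interval_95 must lie in [1e-3, pi]")
    barcodes = list(phases)
    return pd.DataFrame({
        "barcode": barcodes,
        # A phase theta maps to the point (cos theta, sin theta) on the unit circle.
        "prior_theta_euclid_cos": [math.cos(phases[b]) for b in barcodes],
        "prior_theta_euclid_sin": [math.sin(phases[b]) for b in barcodes],
        "prior_theta_95_interval": interval_95,
    })


# Hypothetical phases; barcodes must match the AnnData .obs index.
cell_prior_df = build_cell_phase_prior({"4637": 0.0, "4754": math.pi / 2}, interval_95=3.1)
print(cell_prior_df.to_csv(index=False))
```

A phase of 0 yields (cos, sin) = (1, 0), which together with an interval of 3.1 reproduces the wide prior rows shown in the example file.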



In [4]:
# --- LOAD CELL PRIOR KNOWLEDGE ---

cell_phase_prior_path = '/users/benauerbach/desktop/tempo/test_data/cell_phase_prior_df.csv'
pd.read_csv(cell_phase_prior_path, index_col='barcode')


| barcode | prior_theta_euclid_cos | prior_theta_euclid_sin | prior_theta_95_interval |
|---|---|---|---|
| 4637 | 1 | 0 | 3.1 |
| 4754 | 1 | 0 | 3.1 |
| 1433 | 1 | 0 | 3.1 |
| 1889 | 1 | 0 | 3.1 |
| 4799 | 1 | 0 | 3.1 |
| ... | ... | ... | ... |
| 790 | 1 | 0 | 3.1 |
| 133 | 1 | 0 | 3.1 |
| 1376 | 1 | 0 | 3.1 |
| 1896 | 1 | 0 | 3.1 |
