
Roy-lab/drmn


DRMN: Dynamic Regulatory Module Networks

GPLv3 license

Dynamic Regulatory Module Networks (DRMN) is a computational framework for inferring context-specific regulatory networks for a cell lineage or a time course. It can incorporate context-specific features (such as histone modifications) and context-independent features (such as motif networks). To handle the small number of expression samples (one per time point or cell line), DRMN first clusters genes into modules of co-expressed genes and then infers a regulatory program for each module, with the additional constraint that time points and cell lines that are close to each other should have similar regulatory programs and modules.
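The constraint that neighbouring cell types share modules enters through the p_diagonal_nonleaf prior described in the usage section. As an illustration only, assume that the remaining probability is spread uniformly over the other k - 1 modules (this page does not document how DRMN actually distributes it, or how the inittype option affects it). Writing s_c(g) for the module of gene g in cell type c, p for p_diagonal_nonleaf, and k for the number of states (notation introduced here), the prior for moving between adjacent cell types c and c' would then be:

```latex
P\bigl(s_{c'}(g) = j \mid s_{c}(g) = i\bigr) =
\begin{cases}
  p, & i = j, \\
  \dfrac{1 - p}{k - 1}, & i \neq j.
\end{cases}
```

With the default p = 0.8 and k = 3, staying in the same module has prior 0.8 and each of the two other modules has prior (1 - 0.8)/2 = 0.1, so each row sums to 1.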


The main steps of the pipeline are:

1. Prepare feature files

The DRMN program needs a feature file for each cell line/time point. These features can be context-specific (like histone modification signals) or context-independent (like motif instances). For a brief description of how to prepare these features, see feature generation.

2. Prepare DRMN input files

The DRMN program needs several input files, including the feature files generated in the first step, a lineage tree, and a list of genes. For an explanation of these input files, see other input files for DRMN.

3. Apply DRMN

Usage of the DRMN program is described at the end of this page. See the enclosed example script run_example.sh for how to run DRMN on an example dataset.

4. Find transitioning gene sets

The first post-processing step in the DRMN pipeline is identifying genes that change their module assignment across cell lines/time points. See the find_transitioning_genesets_DRMN program in DRMN utils.

5. Predict regulators for transitioning gene sets

The second post-processing step in the DRMN pipeline is identifying regulatory features associated with the transitioning gene sets. See mtg_lasso in DRMN utils.

Usage and other parameters

Usage: ./learnDRMN celltype_order ogids_file null k lineage_tree config rand[none|yes|<int>] outputDir mode[learn|learnCV|learnCV:<fold>:<seed>[:<nfolds>]|generate|visualize] srcnode inittype[uniform|branchlength] p_diagonal_nonleaf [selfInit leasttype[LEASTFUSED|GREEDY] p1 p2 p3]
  1. celltype_order: the cell type order file (see other input files for DRMN).

  2. ogids_file: the ogids file (see other input files for DRMN).

  3. null: not used. This is a legacy argument that we should remove but haven't yet; pass null.

  4. k: number of states (modules). This likely needs to match the number of clusters in the initial cluster assignments.

  5. lineage_tree: the lineage tree file (see other input files for DRMN).

  6. config: the config file (see other input files for DRMN).

  7. rand (none, yes, or an integer): whether to randomize the membership of the initial clusters. We have been using none for DRMN.

  8. outputDir: the output directory, which you MUST create in advance.

  9. mode: one of the following.
     • learn: train DRMN on all data.
     • learnCV: run 3-fold cross-validation serially.
     • learnCV:fold:seed[:nfolds]: run only the cross-validation fold given by fold. Without nfolds, DRMN does 3-fold CV and accepts fold = 0, 1, or 2; for more folds, set nfolds higher. To obtain disjoint test sets across folds, use the same seed for every fold.
     • generate: not implemented.
     • visualize: not implemented.

  10. srcnode: Cell type to use as the reference cell type. The gene names for this cell type will be used in some output files.

  11. inittype: either uniform or branchlength. Both examples below use uniform.

  12. p_diagonal_nonleaf: the prior probability that a gene keeps its state assignment between two adjacent cell types. We use 0.8 by default; so far it has not seemed to affect DRMN much.

  13. selfInit (optional): if you include this exact string, DRMN initializes the module parameters from each cell type's own data. Otherwise, it initializes all module parameters from the srcnode cell type. If the distributions of your data differ between cell types (say, you used log-transformed, zero-mean expression), you will definitely need this option. If the distributions match, it probably doesn't matter. I use it anyway.

  14. leasttype (optional): the multitask regression algorithm. LEASTFUSED runs fused lasso; GREEDY, or omitting this argument, runs the greedy hill-climbing algorithm.

  • LEASTFUSED takes three hyperparameters, p1 p2 p3 (sparsity, fused penalty, and group penalty).
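For intuition about the three LEASTFUSED hyperparameters, a generic multi-task fused-lasso objective with a group penalty is sketched below for a single module. This is a textbook form written under assumptions, not DRMN's exact formulation: X_c and y_c are the features and expression values of the module's genes in cell type c, beta_c are the regression weights, E is the set of edges in the lineage tree, F is the number of features, and lambda_1, lambda_2, lambda_3 stand for the sparsity, fused, and group penalties (presumably supplied as p1 p2 p3).

```latex
\min_{\beta_1, \dots, \beta_C} \;
  \sum_{c=1}^{C} \lVert y_c - X_c \beta_c \rVert_2^2
  + \lambda_1 \sum_{c=1}^{C} \lVert \beta_c \rVert_1
  + \lambda_2 \sum_{(c, c') \in E} \lVert \beta_c - \beta_{c'} \rVert_1
  + \lambda_3 \sum_{f=1}^{F} \bigl\lVert (\beta_{1f}, \dots, \beta_{Cf}) \bigr\rVert_2
```

In this form, a larger lambda_1 zeroes out more individual weights, a larger lambda_2 pulls the weights of adjacent cell types toward each other, and a larger lambda_3 tends to keep or drop each feature jointly across all cell types.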

Example for DRMN-FUSED (fused lasso regression):

./learnDRMN order.txt ogids.txt null 3 tree.txt config.txt none out/ learnCV:0:12345 esc uniform 0.8 selfInit LEASTFUSED 25 50 50

Example for DRMN-ST (greedy regression, since leasttype is omitted):

./learnDRMN order.txt ogids.txt null 3 tree.txt config.txt none out/ learnCV:0:12345 esc uniform 0.8 selfInit
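The learnCV:fold:seed[:nfolds] mode runs one fold per call, so the folds can be run as separate jobs. The bash sketch below loops over all folds with a shared seed, so that the test sets are disjoint, and creates each output directory first because learnDRMN requires it to exist. The remaining arguments are copied from the DRMN-ST example above; the function name run_cv_folds, the per-fold directory layout, and the DRY_RUN switch are hypothetical conveniences, not part of DRMN.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): run each fold of an n-fold CV as its own learnDRMN call.
# All folds share one seed so that their test sets are disjoint.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_cv_folds() {
  local nfolds=$1 seed=$2 outroot=${3%/}
  local fold outdir
  local -a cmd
  for ((fold = 0; fold < nfolds; fold++)); do
    outdir="${outroot}/fold${fold}/"
    mkdir -p "$outdir"  # learnDRMN expects the output directory to exist already
    # Arguments other than outputDir and mode are copied from the DRMN-ST example.
    cmd=(./learnDRMN order.txt ogids.txt null 3 tree.txt config.txt none
         "$outdir" "learnCV:${fold}:${seed}:${nfolds}" esc uniform 0.8 selfInit)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1  # stop at the first failing fold
    fi
  done
}
```

For example, DRY_RUN=1 run_cv_folds 5 12345 out/ prints the five commands without running them; without DRY_RUN the folds run one after another, and each iteration could instead be submitted as a separate cluster job.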

About

DRMN with multi-task learning regression
