In [None]:
# For installation
%pip install /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/

In [1]:
%matplotlib inline
import pycistarget
pycistarget.__version__

'0.1.dev47+g689662b'

<a class="anchor" id="top"></a>
# PycisTarget on mouse liver ChIP-seq data

* [0. Getting your input region sets](#1)
* [1. Homer](#2)
    * [A. Running Homer](#3)
    * [B. Exploring Homer results](#4)
* [2. cisTarget](#5)
    * [A. Creating cisTarget databases](#6)
    * [B. Running cisTarget](#7)
    * [C. Exploring cisTarget results](#8)
* [3. Differential Motif Enrichment (DEM)](#9)
    * [A. Creating DEM databases](#10)
    * [B. Running DEM](#11)
    * [C. Exploring DEM results](#12)
    * [D. Advanced usage](#13)
        * [1. Thresholding on the mean foreground signal](#14)
        * [2. Using a fixed threshold for the motif hits](#15)
        * [3. Using a shuffled background](#16)
        * [4. Specifying contrasts](#17)

**pycisTarget** is a Python module for motif enrichment analysis and for deriving genome-wide cistromes, implementing **cisTarget** (Herrmann et al., 2012; Imrichova et al., 2015). It can also derive *de novo* cistromes (via **Homer** (Heinz et al., 2010)), and it includes a novel approach, named **Differentially Enriched Motifs (DEM)**, to derive differentially enriched motifs and cistromes between one or more groups of regions.

<a class="anchor" id="1"></a>
## 0. Getting your input region sets

**pycisTarget** takes as input a dictionary with region set names as keys and the regions (as PyRanges objects) as values. In this tutorial we use 4 region sets, corresponding to the top 5K ChIP-seq peaks of Hnf4a, Foxa1, Cebpa and Onecut1 in mouse liver. We can easily read the data in the correct format using a dictionary comprehension.

In [1]:
import pyranges as pr
import os
path_to_region_sets = '/staging/leuven/stg_00002/lcb/cbravo/Liver/Multiome/pycistopic/GEMSTAT/ChIP/All_summits'
region_sets_files = ['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K.bed', 'Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K.bed', 'Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K.bed', 'Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K.bed']
region_sets = {x.replace('.bed', ''):pr.read_bed(os.path.join(path_to_region_sets, x)) for x in region_sets_files}
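The file names indicate that each region set consists of peak summits ordered by score, extended by 250 bp and limited to the top 5K. The following is a minimal stdlib sketch of how such windows could be derived from a list of summits. The tuple layout, the helper name `top_extended_summits` and the window convention (summit ±250 bp, 0-based half-open, i.e. 501 bp, which matches the width of the region names shown later) are assumptions, not the authors' preprocessing script.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, 0-based summit position, peak score)
Summit = Tuple[str, int, float]


def top_extended_summits(summits: List[Summit], n: int = 5000,
                         flank: int = 250) -> List[Tuple[str, int, int]]:
    """Keep the n highest-scoring summits and extend each by `flank` bp.

    Returns BED-style (chrom, start, end) tuples (0-based, half-open), so each
    window covers 2 * flank + 1 bp centred on the summit base. Windows are
    clipped at position 0 but not at the chromosome end.
    """
    ranked = sorted(summits, key=lambda s: s[2], reverse=True)[:n]
    return [(chrom, max(0, pos - flank), pos + flank + 1)
            for chrom, pos, _score in ranked]


if __name__ == "__main__":
    demo = [("chr1", 1000, 5.0), ("chr2", 200, 9.5), ("chr3", 5000, 1.2)]
    for region in top_extended_summits(demo, n=2):
        print(region)
```

In practice you would write these tuples to a BED file and read it with `pr.read_bed`, as in the cell above.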

Apart from the cisTarget method, pycisTarget includes wrapper functions to run Homer (for *de novo* motif enrichment) and a new method based on statistical testing of Cluster-Buster scores between sets of regions (DEM). We will first describe how to perform motif enrichment and derive cistromes with Homer.

<a class="anchor" id="2"></a>
## 1. Homer

First we need to load the functions needed for Homer:

In [2]:
# Load homer functions
from pycistarget.motif_enrichment_homer import *

<a class="anchor" id="3"></a>
#### A. Running Homer

The most relevant parameters for running Homer are:
- **homer_path**: Path to the Homer executables. Homer must also be available on your `PATH`.
- **region_sets**: The input region sets.
- **outdir**: Output directory.
- **genome**: Genome assembly (equivalent to the genome parameter in Homer). Several species and genomes are supported, including human (hg18, hg19, hg38) and mouse (mm8, mm9, mm10), among others. Alternatively, it can be a path to custom genome fasta files.
- **size**: Fragment size to use for motif finding (by default, 'given', which uses the whole region).
- **mask**: Whether to mask repeat regions.
- **denovo**: Whether to perform *de novo* motif discovery. This increases the running time considerably. When running *de novo* motif enrichment, you can use MEME (tomtom) with a motif collection of interest to identify potential TFs linked to the *de novo* motifs. If False, Homer is only run for known motifs.
- **length**: Motif length(s) for *de novo* motif discovery.
- **n_cpu**: Number of cores to use.
- **meme_path**: Path to the MEME executables. MEME must also be available on your `PATH`.
- **meme_collection_path**: Path to the motif collection in MEME format. We recommend using the cisTarget motif collection.
- **cistrome_annotation**: Annotations used to assign motifs to TFs (direct, and/or by motif similarity or orthology).
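The run log below shows the Homer call that these parameters produce for each region set: `findMotifsGenome.pl <bed> mm10 <outdir>/<name> -preparsedDir <outdir>/<name> -size given -len 8,10,12 -mask -keepFiles`. A minimal sketch reproducing that mapping is shown here. `build_homer_command` is a hypothetical helper, not part of pycisTarget, and appending Homer's `-nomotif` flag (which skips *de novo* discovery) when `denovo=False` is an assumption about how the wrapper behaves.

```python
import os
import shlex
from typing import List


def build_homer_command(homer_path: str, bed_file: str, genome: str,
                        outdir: str, size: str = "given",
                        length: str = "8,10,12", mask: bool = True,
                        denovo: bool = True) -> List[str]:
    """Assemble a findMotifsGenome.pl call shaped like the one in the run log."""
    name = os.path.splitext(os.path.basename(bed_file))[0]
    out = os.path.join(outdir, name)
    cmd = [os.path.join(homer_path, "findMotifsGenome.pl"), bed_file, genome,
           out, "-preparsedDir", out, "-size", size, "-len", length]
    if mask:
        cmd.append("-mask")  # mask repeat regions
    if not denovo:
        cmd.append("-nomotif")  # Homer flag to skip de novo search (assumed mapping)
    cmd.append("-keepFiles")
    return cmd


if __name__ == "__main__":
    cmd = build_homer_command("/opt/homer/bin/", "/tmp/Homer/regions_bed/Cebpa.bed",
                              "mm10", "/tmp/Homer/")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list (rather than a single string) makes it safe to pass to `subprocess.run` without shell quoting issues.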

In [3]:
# Set correct paths to run HOMER (os.environ also updates the environment seen by Python)
import os
os.environ['HOMER_HOME'] = '/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a'
os.environ['PATH'] += os.pathsep + '/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin'
homer_path = '/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin/'
# Choose the output directory for the results
outdir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/'
# Select your genome
genome = 'mm10'
# Set correct paths to MEME for de novo motif annotation
os.environ['MEME_HOME'] = '/data/leuven/software/biomed/haswell_centos7/2018a/software/MEME/5.1.1-foss-2018a-Perl-5.28.1-Python-3.6.4'
os.environ['PATH'] += os.pathsep + '/data/leuven/software/biomed/haswell_centos7/2018a/software/MEME/5.1.1-foss-2018a-Perl-5.28.1-Python-3.6.4/bin'
meme_collection_path = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/ctx_v9_motif_collection.meme'
meme_path = '/data/leuven/software/biomed/haswell_centos7/2018a/software/MEME/5.1.1-foss-2018a-Perl-5.28.1-Python-3.6.4/bin/'
# Run Homer on all region sets in parallel (_temp_dir is the temporary directory used by Ray)
homer_dict = run_homer(homer_path,
                       region_sets,
                       outdir,
                       genome,
                       size='given',
                       mask=True,
                       denovo=True,
                       length='8,10,12',
                       n_cpu=4,
                       meme_path=meme_path,
                       meme_collection_path=meme_collection_path,
                       cistrome_annotation=['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'],
                       _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-21 16:31:31,863 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

2021-05-21 16:32:07,912	ERROR services.py:1276 -- Failed to start the dashboard: Failed to start the dashboard. The last 10 lines of /scratch/leuven/313/vsc31305/ray_spill/session_2021-05-21_16-31-36_998091_12802/logs/dashboard.log:
2021-05-21 16:31:54,846	INFO dashboard.py:92 -- Setup static dir for dashboard: /user/leuven/313/vsc31305/.local/lib/python3.7/site-packages/ray/new_dashboard/client/build
2021-05-21 16:31:54,887	INFO head.py:162 -- Connect to GCS at b'10.118.230.159:33619'
2021-05-21 16:31:54,908	INFO utils.py:202 -- Get all modules by type: DashboardHeadModule

(pid=12165) 2021-05-21 16:32:25,143 Homer        INFO     Running Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
(pid=12165) 2021-05-21 16:32:25,169 Homer        INFO     Running Homer for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K with /data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin/findMotifsGenome.pl /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/regions_bed/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K.bed mm10 /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K -preparsedDir /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K -size given -len 8,10,12 -mask -keepFiles
(pid=12163) 2021-05-21 17:42:35,710 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
(pid=12163) 2021-05-21 17:42:37,073 Homer        INFO     Annotating motifs for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
(pid=12163) 2021-05-21 17:42:37,075 Homer        INFO     Annotating known motifs
(pid=12163) b'Skipping line 310: expected 12 fields, saw 13\n'
(pid=12163) 2021-05-21 17:43:06,130 Homer        INFO     Comparing de novo motifs with given motif collection with tomtom
(pid=12164) 2021-05-21 17:43:15,678 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
(pid=12164) 2021-05-21 17:43:16,095 Homer        INFO     Annotating motifs for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
(pid=12164) 2021-05-21 17:43:16,096 Homer        INFO     Annotating known motifs
(pid=12164) b'Skipping line 310: expected 12 fields, saw 13\n'
(pid=12164) 2021-05-21 17:43:25,846 Homer        INFO     Comparing de novo motifs with given motif collection with tomtom
(pid=12165) 2021-05-21 17:45:43,442 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
(pid=12165) 2021-05-21 17:45:44,655 Homer        INFO     Annotating motifs for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
(pid=12165) 2021-05-21 17:45:44,657 Homer        INFO     Annotating known motifs
(pid=12165) b'Skipping line 310: expected 12 fields, saw 13\n'
(pid=12165) 2021-05-21 17:46:01,956 Homer        INFO     Comparing de novo motifs with given motif collection with tomtom
(pid=12166) 2021-05-21 17:46:22,825 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
(pid=12166) 2021-05-21 17:46:23,038 Homer        INFO     Annotating motifs for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
(pid=12166) 2021-05-21 17:46:23,038 Homer        INFO     Annotating known motifs
(pid=12166) b'Skipping line 310: expected 12 fields, saw 13\n'
(pid=12166) 2021-05-21 17:46:56,667 Homer        INFO     Comparing de novo motifs with given motif collection with tomtom
(pid=12164) 2021-05-21 18:01:52,520 Homer        INFO     Finding motif hits for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
(pid=12164) 2021-05-21 18:01:52,522 Homer        INFO     Retrieving enriched regions per known motif
(pid=12163) 2021-05-21 18:02:38,037 Homer        INFO     Finding motif hits for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
(pid=12163) 2021-05-21 18:02:38,038 Homer        INFO     Retrieving enriched regions per known motif
(pid=12164) 2021-05-21 18:03:16,611 Homer        INFO     Retrieving enriched regions per de novo motif
(pid=12164) 2021-05-21 18:03:51,871 Homer        INFO     Getting cistromes for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
(pid=12164) 2021-05-21 18:03:53,970 Homer        INFO     Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K done!
(pid=12163) 2021-05-21 18:04:17,076 Homer        INFO     Retrieving enriched regions per de novo motif
(pid=12163) 2021-05-21 18:04:53,143 Homer        INFO     Getting cistromes for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
(pid=12163) 2021-05-21 18:04:55,294 Homer        INFO     Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K done!
(pid=12165) 2021-05-21 18:07:16,943 Homer        INFO     Finding motif hits for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
(pid=12165) 2021-05-21 18:07:16,947 Homer        INFO     Retrieving enriched regions per known motif
(pid=12165) 2021-05-21 18:08:34,283 Homer        INFO     Retrieving enriched regions per de novo motif
(pid=12166) 2021-05-21 18:08:36,668 Homer        INFO     Finding motif hits for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
(pid=12166) 2021-05-21 18:08:36,669 Homer        INFO     Retrieving enriched regions per known motif
(pid=12165) 2021-05-21 18:09:07,412 Homer        INFO     Getting cistromes for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
(pid=12165) 2021-05-21 18:09:09,586 Homer        INFO     Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K done!
(pid=12166) 2021-05-21 18:10:23,813 Homer        INFO     Retrieving enriched regions per de novo motif
(pid=12166) 2021-05-21 18:11:00,941 Homer        INFO     Getting cistromes for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
(pid=12166) 2021-05-21 18:11:03,760 Homer        INFO     Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K done!


In [4]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/Homer_dict.pkl', 'wb') as f:
  pickle.dump(homer_dict, f)

[[Back to top]](#top)

<a class="anchor" id="4"></a>
#### B. Exploring Homer results

We can load the results for exploration. 

In [5]:
# Load
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/Homer_dict.pkl', 'rb') as f:
    homer_dict = pickle.load(f)

To visualize motif enrichment results, we can use the `homer_results()` function:

In [6]:
homer_results(homer_dict, 'Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K', results='known')

| Rank | Name | P-value | log P-pvalue | q-value (Benjamini) | # Target Sequences with Motif | % of Targets Sequences with Motif | # Background Sequences with Motif | % of Background Sequences with Motif |
|---|---|---|---|---|---|---|---|---|
| 1 | CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer | 1e-1324 | -3.049e+03 | 0.0000 | 2719.0 | 59.17% | 5054.3 | 11.31% |
| 2 | HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer | 1e-658 | -1.516e+03 | 0.0000 | 2216.0 | 48.23% | 6333.9 | 14.17% |
| 3 | NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer | 1e-641 | -1.478e+03 | 0.0000 | 1926.0 | 41.92% | 4807.9 | 10.75% |
| 4 | CEBP:AP1(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer | 1e-531 | -1.225e+03 | 0.0000 | 2013.0 | 43.81% | 6170.9 | 13.80% |
| 5 | Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer | 1e-332 | -7.645e+02 | 0.0000 | 947.0 | 20.61% | 2029.1 | 4.54% |
| 6 | Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer | 1e-217 | -4.998e+02 | 0.0000 | 684.0 | 14.89% | 1575.0 | 3.52% |
| 7 | PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer | 1e-96 | -2.212e+02 | 0.0000 | 1548.0 | 33.69% | 9150.0 | 20.47% |
| 8 | HNF4a(NR),DR1/HepG2-HNF4a-ChIP-Seq(GSE25021)/Homer | 1e-87 | -2.009e+02 | 0.0000 | 875.0 | 19.04% | 4218.8 | 9.44% |
| 9 | RARa(NR)/K562-RARa-ChIP-Seq(Encode)/Homer | 1e-79 | -1.829e+02 | 0.0000 | 2930.0 | 63.76% | 22301.1 | 49.88% |
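The table's own numbers indicate that the "log P-pvalue" column is the natural logarithm of the p-value (ln(1e-1324) ≈ -3049, matching -3.049e+03). A small sketch for turning one row into more readable statistics, a base-10 exponent and a fold enrichment of targets over background, follows. `homer_row_stats` is a hypothetical helper that works on values copied from the table, not a pycisTarget function.

```python
import math


def homer_row_stats(log_p: float, pct_target: str, pct_background: str) -> dict:
    """Derive readable statistics from one row of Homer's known-motif table.

    Assumes `log_p` is the natural log of the p-value (the 'log P-pvalue'
    column) and the percentages are strings such as '59.17%'.
    """
    target = float(pct_target.rstrip("%"))
    background = float(pct_background.rstrip("%"))
    return {
        "log10_p": log_p / math.log(10),  # convert ln(p) to log10(p)
        "fold_enrichment": target / background,  # % targets / % background
    }


row = homer_row_stats(-3.049e03, "59.17%", "11.31%")  # CEBP(bZIP) row above
print(round(row["log10_p"]), round(row["fold_enrichment"], 2))  # -1324 5.23
```

Fold enrichment can rank motifs differently from the p-value: RARa (63.76% vs 49.88%) has a small p-value mainly because many sequences carry the motif, not because it is strongly enriched.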


In [32]:
homer_results(homer_dict, 'Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K', results='denovo')

| Rank | P-value | log P-pvalue | % of Targets | % of Background | STD(Bg STD) | Best Match/Details |
|---|---|---|---|---|---|---|
| 1 | 1e-1782 | -4.104e+03 | 65.31% | 10.04% | 55.6bp (153.5bp) | NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer(0.919) |
| 2 | 1e-330 | -7.619e+02 | 55.56% | 28.10% | 126.1bp (155.1bp) | MF0006.1_bZIP_cEBP-like_subclass/Jaspar(0.856) |
| 3 | 1e-116 | -2.686e+02 | 21.48% | 9.96% | 112.9bp (146.0bp) | PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer(0.957) |
| 4 | 1e-69 | -1.590e+02 | 41.37% | 29.15% | 129.4bp (153.5bp) | Foxo1(Forkhead)/RAW-Foxo1-ChIP-Seq(Fan_et_al.)/Homer(0.917) |
| 5 | 1e-53 | -1.242e+02 | 44.22% | 33.18% | 127.2bp (151.9bp) | NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq(Unpublished)/Homer(0.968) |
| 6 | 1e-42 | -9.745e+01 | 39.50% | 29.99% | 131.7bp (151.3bp) | NFIX/MA0671.1/Jaspar(0.763) |
| 7 | 1e-38 | -8.809e+01 | 10.53% | 5.61% | 129.6bp (146.9bp) | HNF6(Homeobox)/Liver-Hnf6-ChIP-Seq(ERP000394)/Homer(0.877) |
| 8 | 1e-34 | -7.953e+01 | 9.97% | 5.39% | 132.4bp (147.2bp) | Tbox:Smad(T-box,MAD)/ESCd5-Smad2_3-ChIP-Seq(GSE29422)/Homer(0.717) |
| 9 | 1e-31 | -7.347e+01 | 41.65% | 33.27% | 137.4bp (148.0bp) | PB0134.1_Hnf4a_2/Jaspar(0.756) |


You can also access the regions enriched for each motif (use `known_motif_hits` for known motifs and `denovo_motif_hits` for *de novo* motifs):

In [7]:
homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].known_motif_hits['CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer'][0:10]

['chr18:60495354-60495855',
 'chr11:19018746-19019247',
 'chr1:161070421-161070922',
 'chr1:82238130-82238631',
 'chr13:73626916-73627417',
 'chr10:37275799-37276300',
 'chr7:89423090-89423591',
 'chr2:167687611-167688112',
 'chr13:52981068-52981569',
 'chr15:96735639-96736140']

You can also access the cistromes (use `known_cistromes` for cistromes based on known motifs and `denovo_cistromes` for cistromes based on *de novo* motifs):

In [10]:
homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].known_cistromes['Cebpa'][0:10]

['chr18:60495354-60495855',
 'chr11:19018746-19019247',
 'chr1:161070421-161070922',
 'chr1:82238130-82238631',
 'chr13:73626916-73627417',
 'chr10:37275799-37276300',
 'chr7:89423090-89423591',
 'chr2:167687611-167688112',
 'chr13:52981068-52981569',
 'chr15:96735639-96736140']
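Conceptually, a cistrome groups all regions that carry a hit for any motif annotated to a given TF. The following is a minimal sketch of that idea, assuming a cistrome is the union of the motif hits of every motif linked to the TF. The input dictionaries and `build_cistromes` are hypothetical stand-ins, not pycisTarget internals (which additionally track the annotation level used).

```python
from typing import Dict, Iterable, List


def build_cistromes(motif_hits: Dict[str, Iterable[str]],
                    motif_to_tfs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Union, per TF, of the regions hit by every motif annotated to that TF.

    Motifs without an annotation are ignored. Region names are returned
    sorted as strings (lexicographic, not genomic order).
    """
    cistromes: Dict[str, set] = {}
    for motif, regions in motif_hits.items():
        for tf in motif_to_tfs.get(motif, ()):
            cistromes.setdefault(tf, set()).update(regions)
    return {tf: sorted(regions) for tf, regions in cistromes.items()}


if __name__ == "__main__":
    hits = {"motif_A": ["chr1:100-601", "chr2:50-551"],
            "motif_B": ["chr2:50-551", "chr3:10-511"],
            "motif_C": ["chr4:0-501"]}
    annotation = {"motif_A": ["Cebpa"], "motif_B": ["Cebpa", "Cebpb"]}
    print(build_cistromes(hits, annotation))
```

Using sets removes regions that are hit by several motifs of the same TF, so each region appears once per cistrome.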

You can easily export cistromes to a BED file:

In [11]:
from pycistarget.utils import *
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].known_cistromes['Cebpa']))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/Homer/cebpa_cistrome_example.bed')
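For a quick export without PyRanges, the region names can also be parsed directly. The following stdlib-only sketch assumes names follow the `chrom:start-end` pattern seen in the outputs above and keeps the coordinates unchanged. `region_names_to_bed_lines` is a hypothetical helper, not the pycisTarget `region_names_to_coordinates` function.

```python
import re
from typing import Iterable, List

# Region names in the results look like 'chr18:60495354-60495855'
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_names_to_bed_lines(names: Iterable[str]) -> List[str]:
    """Convert 'chrom:start-end' names into tab-separated BED lines."""
    lines = []
    for name in names:
        match = REGION_RE.match(name)
        if match is None:
            raise ValueError(f"not a 'chrom:start-end' region name: {name!r}")
        lines.append("\t".join((match["chrom"], match["start"], match["end"])))
    return lines


if __name__ == "__main__":
    for line in region_names_to_bed_lines(["chr18:60495354-60495855",
                                           "chr11:19018746-19019247"]):
        print(line)
```

The lines can then be written with `open(path, "w").write("\n".join(lines) + "\n")`.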

[[Back to top]](#top)

<a class="anchor" id="5"></a>
## 2. cisTarget

<a class="anchor" id="6"></a>
#### A. Creating cisTarget databases

To run **cisTarget** you need a **ranking database**: a Feather file containing a data frame with motifs as rows, genomic regions as columns and, as values, the rank of each region for each motif, based on its cis-regulatory module (CRM) score (Frith et al., 2003). We provide these databases for human (hg38, hg19), mouse (mm10, mm9) and fly (dm3, dm6) at https://resources.aertslab.org/cistarget/.
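To make the ranking idea concrete: for each motif, regions are ordered by CRM score and each region receives its position in that order. The following is a minimal sketch assuming 0-based ranks (0 = highest score) and seeded random tie-breaking for reproducibility. `scores_to_rankings` is a hypothetical helper; it is not the conversion script used below, whose exact tie handling is not described here.

```python
import random
from typing import Dict, List


def scores_to_rankings(scores: Dict[str, List[float]],
                       seed: int = 555) -> Dict[str, List[int]]:
    """Turn per-motif region scores into per-motif region ranks (0 = best)."""
    rng = random.Random(seed)
    rankings = {}
    for motif, region_scores in scores.items():
        order = list(range(len(region_scores)))
        rng.shuffle(order)  # random starting order decides ties
        # list.sort is stable (also with reverse=True), so tied regions
        # keep their shuffled relative order
        order.sort(key=lambda i: region_scores[i], reverse=True)
        ranks = [0] * len(region_scores)
        for rank, region_idx in enumerate(order):
            ranks[region_idx] = rank
        rankings[motif] = ranks
    return rankings


if __name__ == "__main__":
    print(scores_to_rankings({"motif_1": [0.1, 3.0, 1.5]}))  # {'motif_1': [2, 0, 1]}
```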

In addition, **if you want to use other regions or genomes to build your databases**, we provide a step-by-step tutorial and scripts at https://github.com/aertslab/create_cisTarget_databases. Below you can find the basic steps to do so:

In [None]:
%%bash
#### Variables (replace the placeholders with your own paths; no spaces around '=' in bash)
region_bed='PATH_TO_BED_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
region_fasta='PATH_TO_FASTA_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
database_suffix='SUFFIX_FOR_DATABASE_FILE'
output_dir='PATH_TO_OUTPUT_DIRECTORY'
#### Get fasta sequences
module load BEDTools # In our system, load BEDTools
bedtools getfasta -fi /staging/leuven/stg_00002/lcb/resources/mouse/mm10/mm10.fa -bed ${region_bed} > ${region_fasta}
#### Activate environment
my_conda_initialize # In our system, initialize conda
conda activate /staging/leuven/stg_00002/lcb/ghuls/software/miniconda3/envs/create_cistarget_databases
#### Set ${create_cistarget_databases_dir} to your local copy of https://github.com/aertslab/create_cisTarget_databases
create_cistarget_databases_dir='/staging/leuven/stg_00002/lcb/ghuls/software/create_cisTarget_databases'
#### Score the motifs in 10 chunks (backslashes continue the command over several lines)
for current_part in {1..10} ; do
    ${create_cistarget_databases_dir}/create_cistarget_motif_databases.py \
        -f ${region_fasta} \
        -M /staging/leuven/stg_00002/lcb/icistarget/data/motifCollection/v9/singletons/ \
        -m /staging/leuven/stg_00002/lcb/icistarget/data/motifCollection/v9/motifs.txt \
        -p ${current_part} 10 \
        -o ${database_suffix} \
        -t 20
done
#### Merge scores
${create_cistarget_databases_dir}/combine_partial_regions_or_genes_vs_motifs_or_tracks_cistarget_dbs.py -i ${database_suffix} ${output_dir}
#### Remove chunks
rm ${database_suffix}*part*
#### Create rankings
motifs_vs_regions_scores_feather='PATH_TO_MOTIFS_VS_REGIONS_SCORES_DATABASE'
${create_cistarget_databases_dir}/convert_motifs_or_tracks_vs_regions_or_genes_scores_to_rankings_cistarget_dbs.py -i ${motifs_vs_regions_scores_feather} -s 555

[[Back to top]](#top)

<a class="anchor" id="7"></a>
#### B. Running cisTarget

The most relevant parameters for running cisTarget are:
- **ctx_db**: Path to the cisTarget database to use, or a preloaded cisTargetDatabase object (loaded with the same region sets that will be analyzed).
- **region_sets**: The input region sets.
- **specie**: Species to which the region coordinates and the database belong. To annotate motifs to TFs using the cisTarget annotations, possible values are 'mus_musculus', 'homo_sapiens' or 'drosophila_melanogaster'. For any other value, motifs are not annotated to TFs unless a custom annotation is provided.
- **fraction_overlap**: Minimum overlap fraction (in either direction) required to map input regions to regions in the database. Default: 0.4.
- **auc_threshold**: Fraction of the ranking (top-ranked regions) used to calculate the AUC. We recommend 0.005 (default) for human and mouse, and 0.01 for fly.
- **nes_threshold**: Minimum Normalized Enrichment Score (NES) for a motif to be considered significant. Default: 3.0.
- **rank_threshold**: Fraction of the total number of regions in the database used as the maximum rank for the region enrichment recovery curve. Default: 0.05 (5% of the regions).
- **annotation**: Annotations to use to form the cistromes. Default: ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot']
- **n_cpu**: Number of cpus to use during calculations.
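To illustrate how `auc_threshold` and `nes_threshold` interact, the following is a minimal sketch of a recovery-curve AUC and an NES computed across motifs. It assumes 0-based ranks (0 = best), normalises the area by `cutoff * number_of_regions`, and uses the population standard deviation; the exact normalisation in pycisTarget may differ, and `recovery_auc` / `normalized_enrichment_scores` are hypothetical names.

```python
import statistics
from typing import Dict, List


def recovery_auc(region_ranks: List[int], total_regions: int,
                 auc_threshold: float = 0.005) -> float:
    """Normalised area under the recovery curve of one motif ranking.

    region_ranks: 0-based ranks (0 = best) of the input regions in the ranking.
    Only ranks below cutoff = auc_threshold * total_regions contribute.
    """
    cutoff = int(round(auc_threshold * total_regions))
    if cutoff == 0 or not region_ranks:
        return 0.0
    # Recovery curve: at each rank x < cutoff, count input regions with rank <= x.
    # Summing the curve equals summing (cutoff - r) over the recovered ranks r.
    area = sum(cutoff - r for r in region_ranks if r < cutoff)
    return area / (cutoff * len(region_ranks))


def normalized_enrichment_scores(aucs: Dict[str, float]) -> Dict[str, float]:
    """NES = (AUC - mean AUC) / standard deviation of the AUCs across motifs.

    Assumes the AUCs are not all identical (otherwise the deviation is 0).
    """
    mean = statistics.mean(aucs.values())
    sd = statistics.pstdev(aucs.values())
    return {motif: (auc - mean) / sd for motif, auc in aucs.items()}


if __name__ == "__main__":
    total = 1000  # toy database size, so the AUC cutoff is 5 regions
    aucs = {"motif_a": recovery_auc([0, 1, 900], total),
            "motif_b": recovery_auc([4, 500, 700], total),
            "motif_c": recovery_auc([300, 600, 999], total),
            "motif_d": recovery_auc([10, 20, 30], total)}
    nes = normalized_enrichment_scores(aucs)
    # With only four toy motifs the NES cannot reach 3.0, so a lower cutoff is used here
    print({m: round(s, 2) for m, s in nes.items() if s >= 1.5})
```

Because the NES is standardised across the whole motif collection, a threshold of 3.0 is meaningful for real databases with thousands of motifs, not for a toy set like this one.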

In [3]:
# Load cistarget functions
from pycistarget.motif_enrichment_cistarget import *

In [5]:
# Run cisTarget on all region sets (_temp_dir is the temporary directory used by Ray)
cistarget_dict = run_cistarget(ctx_db='/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.rankings.feather',
                               region_sets=region_sets,
                               specie='mus_musculus',
                               auc_threshold=0.005,
                               nes_threshold=3.0,
                               rank_threshold=0.05,
                               annotation=['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'],
                               n_cpu=4,
                               _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-21 15:56:46,117 cisTarget    INFO     Reading cisTarget database


2021-05-21 16:04:29,659	ERROR services.py:1276 -- Failed to start the dashboard: Failed to start the dashboard. The last 10 lines of /scratch/leuven/313/vsc31305/ray_spill/session_2021-05-21_16-04-02_048332_31050/logs/dashboard.log:
2021-05-21 16:04:20,268	INFO dashboard.py:92 -- Setup static dir for dashboard: /user/leuven/313/vsc31305/.local/lib/python3.7/site-packages/ray/new_dashboard/client/build
2021-05-21 16:04:20,669	INFO head.py:162 -- Connect to GCS at b'10.118.230.159:41212'
2021-05-21 16:04:20,727	INFO utils.py:202 -- Get all modules by type: DashboardHeadModule



[2m[36m(pid=7422)[0m 2021-05-21 16:04:52,480 cisTarget    INFO     Running cisTarget for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7422)[0m 2021-05-21 16:04:52,480 cisTarget    INFO     Running cisTarget for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7421)[0m 2021-05-21 16:04:52,480 cisTarget    INFO     Running cisTarget for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7421)[0m 2021-05-21 16:04:52,480 cisTarget    INFO     Running cisTarget for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7420)[0m 2021-05-21 16:04:52,498 cisTarget    INFO     Running cisTarget for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7420)[0m 2021-05-21 16:04:52,498 cisTarget    INFO     Running cisTarget for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7419)[0m 2021-05-21 16:04:52,480 cisTarget    INFO     Running cisTar

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


[2m[36m(pid=7421)[0m 2021-05-21 16:07:06,446 cisTarget    INFO     Annotating motifs for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7421)[0m 2021-05-21 16:07:06,446 cisTarget    INFO     Annotating motifs for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7419)[0m 2021-05-21 16:07:06,946 cisTarget    INFO     Annotating motifs for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7419)[0m 2021-05-21 16:07:06,946 cisTarget    INFO     Annotating motifs for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7420)[0m 2021-05-21 16:07:09,965 cisTarget    INFO     Annotating motifs for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7420)[0m 2021-05-21 16:07:09,965 cisTarget    INFO     Annotating motifs for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=7421)[0m 2021-05-21 16:07:10,386 numexpr.utils INFO     Note: NumExpr

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-21 16:07:47,191 cisTarget    INFO     Done!


In [7]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/cisTarget/cisTarget_dict.pkl', 'wb') as f:
  pickle.dump(cistarget_dict, f)

[[Back to top]](#top)

<a class="anchor" id="8"></a>
#### C. Exploring cisTarget results

We can load the results for exploration. 

In [8]:
# Load
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/cisTarget/cisTarget_dict.pkl', 'rb') as infile:
    cistarget_dict = pickle.load(infile)

To visualize motif enrichment results, we can use the `cistarget_results()` function:

In [9]:
cistarget_results(cistarget_dict, name='Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Region_set | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | NES | AUC | Rank_at_max | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| cisbp__M5317 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd, Cebpg | Cebpe | Ep300, Ppargc1a | 18.341869 | 0.091825 | 60113.0 | 2446 |
| taipale__CEBPE_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd, Cebpg | Cebpe | Ep300, Ppargc1a | 18.21821 | 0.091232 | 60594.0 | 2443 |
| transfac_pro__M07414 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpa, Cebpb | Cebpd, Cebpe, Cebpg | Ep300 | Atf3, Bmyc, Hlf, Myc, Nfil3 | 18.055576 | 0.090453 | 60353.0 | 2606 |
| cisbp__M5314 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg | Cebpb, Cebpe | Ep300, Ppargc1a | 17.952745 | 0.08996 | 60640.0 | 2417 |
| taipale__CEBPB_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg | Cebpb, Cebpe | Ep300, Ppargc1a | 17.655014 | 0.088534 | 60617.0 | 2389 |
| taipale_cyt_meth__CEBPD_NRTTGCGYAAYN_eDBD | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg | Cebpd, Cebpe | Dbp, Ep300, Hlf, Ppargc1a, Tef | 17.425034 | 0.087432 | 60524.0 | 2314 |
| hocomoco__CEBPB_MOUSE.H11MO.0.A | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpa, Cebpb | Cebpd, Cebpe, Cebpg |  | Ep300 | 17.383533 | 0.087233 | 60630.0 | 2604 |
| cisbp__M0315 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpa, Cebpb | Cebpd, Cebpe, Cebpg, Hlf |  | Ep300, Nfil3, Ppargc1a | 17.377502 | 0.087204 | 60187.0 | 2617 |
| transfac_pro__M08910 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg | Cebpb, Cebpe | Dbp, Ep300, Hlf, Ppargc1a, Tef | 17.132434 | 0.08603 | 60626.0 | 2477 |
| cisbp__M5316 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg | Cebpd, Cebpe | Ep300, Ppargc1a | 17.111585 | 0.08593 | 60530.0 | 2392 |
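In cisTarget (Herrmann et al., 2012), the `NES` column is a normalized version of `AUC`: each motif's AUC is standardized against the AUCs of all motifs scored for that region set. The NumPy sketch below illustrates that normalization on a hypothetical AUC vector. It assumes a plain z-score with the population standard deviation; pycisTarget computes NES internally and may differ in such details.

```python
import numpy as np

def normalized_enrichment_scores(aucs):
    """Standardize per-motif AUC values into NES (z-scores across all motifs)."""
    aucs = np.asarray(aucs, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0)
    return (aucs - aucs.mean()) / aucs.std()

# Hypothetical AUCs for five motifs; a real run scores thousands of motifs
aucs = [0.010, 0.012, 0.011, 0.009, 0.050]
nes = normalized_enrichment_scores(aucs)
print(np.round(nes, 2))
```

Because NES is relative to the other motifs, a motif stands out only when its AUC is far above the bulk of the collection, which is why the top Cebp motifs above have NES values around 17 to 18 despite AUCs below 0.1.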


This table can also be easily exported to an HTML file:

In [10]:
out_file = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/cisTarget/Cebpa_motif_enrichment.html'
# Use a context manager so the HTML file is flushed and closed
with open(out_file, 'w') as f:
    cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].motif_enrichment.to_html(f, escape=False, col_space=80)

You can also access the regions enriched for each motif. `motif_hits` (and likewise `cistromes`) contains two entries: 'Region_set', with the coordinates of the input regions, and 'Database', with the coordinates of the corresponding regions in the database:
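The two coordinate systems exist because each input region is matched to overlapping regions of the database. The sketch below shows one way to perform such a mapping in plain Python. It assumes a match is accepted when the overlap covers at least a given fraction of either region (mirroring the `fraction_overlap` parameter described in the DEM section) and treats coordinates as half-open intervals; the helper names and the database regions are hypothetical, and pycisTarget's own mapping may differ.

```python
def parse_region(name):
    """Split a 'chrom:start-end' name into (chrom, start, end)."""
    chrom, coords = name.split(':')
    start, end = coords.split('-')
    return chrom, int(start), int(end)

def overlap_fraction(a, b):
    """Largest fraction of region a or region b covered by their overlap (0 if none)."""
    (ca, sa, ea), (cb, sb, eb) = parse_region(a), parse_region(b)
    if ca != cb:
        return 0.0
    overlap = min(ea, eb) - max(sa, sb)
    if overlap <= 0:
        return 0.0
    # "In either direction": accept if the overlap is large relative to either region
    return max(overlap / (ea - sa), overlap / (eb - sb))

def map_to_database(input_regions, db_regions, min_fraction=0.4):
    """Map each input region to the database regions it overlaps sufficiently."""
    return {
        region: [db for db in db_regions if overlap_fraction(region, db) >= min_fraction]
        for region in input_regions
    }

# Input region from the Cebpa set above; the database regions are hypothetical
mapping = map_to_database(
    ['chr15:85231705-85232206'],
    ['chr15:85231500-85232500', 'chr15:85232100-85233100', 'chr1:100-600'],
)
print(mapping)
```

Here only the first database region is kept: it fully contains the input region, whereas the second one overlaps it by only about 21%.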

In [11]:
cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].motif_hits['Region_set']['cisbp__M5317'][0:10]

['chr15:85231705-85232206',
 'chr12:83543908-83544409',
 'chr15:81523371-81523872',
 'chr1:138606672-138607173',
 'chr17:87263386-87263887',
 'chr15:77196848-77197349',
 'chr15:25914953-25915454',
 'chr2:73046931-73047432',
 'chr1:31093894-31094395',
 'chr9:50834945-50835446']

To access cistromes (only available if motifs have been annotated):

In [12]:
cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].cistromes['Region_set']['Cebpa'][0:10]

['chr15:85231705-85232206',
 'chr11:75472531-75473032',
 'chr12:83543908-83544409',
 'chr15:81523371-81523872',
 'chr5:147726931-147727432',
 'chr1:60859768-60860269',
 'chr1:138606672-138607173',
 'chr17:87263386-87263887',
 'chr9:14398068-14398569',
 'chr15:77196848-77197349']
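Conceptually, a cistrome gathers the regions hit by the enriched motifs annotated to a TF. The sketch below assumes a cistrome is the union of motif hits over all enriched motifs whose annotation lists that TF; pycisTarget's own implementation may filter, order or name cistromes differently. The motif names and annotations follow the `Direct_annot` column of the table above, but the hit lists are hypothetical.

```python
from collections import defaultdict

def build_cistromes(motif_hits, annotations):
    """Union the hit regions of every motif annotated to each TF."""
    cistromes = defaultdict(set)
    for motif, tfs in annotations.items():
        for tf in tfs:
            cistromes[tf].update(motif_hits.get(motif, []))
    return {tf: sorted(regions) for tf, regions in cistromes.items()}

# Hypothetical hit lists; annotations as in the Direct_annot column above
motif_hits = {
    'cisbp__M5317': ['chr15:85231705-85232206', 'chr12:83543908-83544409'],
    'transfac_pro__M07414': ['chr15:85231705-85232206', 'chr11:75472531-75473032'],
}
annotations = {
    'cisbp__M5317': ['Cebpb'],
    'transfac_pro__M07414': ['Cebpa', 'Cebpb'],
}
cistromes = build_cistromes(motif_hits, annotations)
print(len(cistromes['Cebpb']))  # 3 unique regions
```

Taking a union explains why cistromes are usually larger than the hit list of any single motif.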

You can easily export cistromes to a BED file:

In [13]:
from pycistarget.utils import region_names_to_coordinates
cebpa_cistrome = cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].cistromes['Region_set']['Cebpa']
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(cebpa_cistrome))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/cisTarget/cebpa_cistrome_example.bed')

[[Back to top]](#top)

<a class="anchor" id="9"></a>
## 3. Differential Motif Enrichment (DEM)

<a class="anchor" id="10"></a>
#### A. Creating your DEM databases

To run **DEM** you will need to provide a **CRM scores database** (that is, a Feather file containing a dataframe with motifs as rows, genomic regions as columns and their cis-regulatory module (CRM) scores (Frith et al., 2003) as values). We provide these databases for human (hg38, hg19), mouse (mm10, mm9) and fly (dm3, dm6) at https://resources.aertslab.org/cistarget/.
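To make this layout concrete, here is a small in-memory stand-in for a CRM scores matrix. The motif names are taken from the results in this tutorial, but the regions and scores are hypothetical, and on-disk details of the real Feather files (such as the column that stores motif names) are not reproduced; pycisTarget reads those files itself.

```python
import pandas as pd

# Hypothetical CRM scores: motifs as rows, genomic regions as columns
crm_scores = pd.DataFrame(
    [[3.2, 0.0, 1.1],
     [0.4, 2.7, 0.0]],
    index=['cisbp__M5318', 'transfac_pro__M07413'],
    columns=['chr1:100-600', 'chr2:200-700', 'chr3:300-800'],
)

# Mean score of one motif over a subset of regions (e.g. a foreground set)
foreground = ['chr1:100-600', 'chr3:300-800']
print(crm_scores.loc['cisbp__M5318', foreground].mean())
```

Comparing such per-motif means between a region set and its background is the kind of quantity reported in the `Mean_fg` and `Mean_bg` columns of the DEM results.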

In addition, **if you want to use other regions or genomes to build your databases**, we provide a step-by-step tutorial and scripts at https://github.com/aertslab/create_cisTarget_databases. The steps are the same as for creating a cisTarget database, except that the final step (ranking the regions) is skipped. The basic steps are shown below:

In [None]:
%%bash
#### Variables (replace the placeholders with your own paths; no spaces around '=' in bash)
region_bed='PATH_TO_BED_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
region_fasta='PATH_TO_FASTA_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
database_suffix='SUFFIX_FOR_DATABASE_FILE'
output_dir='PATH_TO_OUTPUT_DIRECTORY' # Used by the merge step below
#### Get fasta sequences
module load BEDTools # In our system, load BEDTools
bedtools getfasta -fi /staging/leuven/stg_00002/lcb/resources/mouse/mm10/mm10.fa -bed "${region_bed}" > "${region_fasta}"
#### Activate environment
my_conda_initialize # In our system, initialize conda
conda activate /staging/leuven/stg_00002/lcb/ghuls/software/miniconda3/envs/create_cistarget_databases
#### Set ${create_cistarget_databases_dir} to your local clone of https://github.com/aertslab/create_cisTarget_databases
create_cistarget_databases_dir='/staging/leuven/stg_00002/lcb/ghuls/software/create_cisTarget_databases'
#### Score the motifs in 10 chunks (trailing backslashes continue the command)
for current_part in {1..10} ; do
    "${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
        -f "${region_fasta}" \
        -M /staging/leuven/stg_00002/lcb/icistarget/data/motifCollection/v9/singletons/ \
        -m /staging/leuven/stg_00002/lcb/icistarget/data/motifCollection/v9/motifs.txt \
        -p "${current_part}" 10 \
        -o "${database_suffix}" \
        -t 20
done
#### Merge scores
"${create_cistarget_databases_dir}/combine_partial_regions_or_genes_vs_motifs_or_tracks_cistarget_dbs.py" -i "${database_suffix}" "${output_dir}"
#### Remove chunks
rm ${database_suffix}*part*

[[Back to top]](#top)

<a class="anchor" id="11"></a>
#### B. Running DEM

The most relevant parameters for running DEM are:
- **dem_db**: Path to the DEM database to use, or a preloaded DEMDatabase object (loaded using the same region sets that will be analyzed).
- **region_sets**: The input sets of regions.
- **specie**: Species to which the region coordinates and the database belong. To annotate motifs to TFs using the cisTarget annotations, possible values are 'mus_musculus', 'homo_sapiens' or 'drosophila_melanogaster'. For any other value, motifs will not be annotated to TFs unless a custom annotation is provided.
- **contrasts**: Type of contrast to perform. If 'Other', background regions are taken from the other region sets; if 'Shuffle', the background consists of the scores of shuffled input sequences. You can also provide a list specifying the exact contrasts to make. We will show some examples of these modes below. When using 'Shuffle', the Cluster-Buster binary (`cluster_buster_path`), the genome FASTA (`path_to_genome_fasta`) and the folder with the motifs to score in Cluster-Buster format (`path_to_motifs`) must be provided.
- **fraction_overlap**: Minimum overlap fraction (in either direction) required to map input regions to regions in the database. Default: 0.4.
- **max_bg_regions**: Maximum number of background regions to use. Default: None (all regions).
- **adjpval_thr**: Maximum adjusted p-value to select motifs. Default: 0.05.
- **log2fc_thr**: Minimum log2 fold change between the region set and the background to consider a motif differentially enriched. Default: 1.
- **mean_fg_thr**: Minimum mean CRM score in the foreground (region set) to consider a motif differentially enriched. Default: 0.
- **motif_hit_thr**: Minimum CRM score to consider a region a motif hit. If None (default), an optimal threshold is calculated per motif by comparing foreground and background.
- **annotation**: Annotations to use to form the cistromes. Here we only use the direct annotation as an example. Default: ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'].
- **n_cpu**: Number of CPUs to use during calculations.
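With `contrasts='Other'`, each region set is compared against regions drawn from the remaining sets. The sketch below shows one way such one-vs-rest contrasts could be assembled; it is not pycisTarget's implementation. It assumes region sets given as lists of `chrom:start-end` names (the tutorial itself uses pyranges), pools the other sets, removes regions shared with the foreground, and randomly subsamples the pool to `max_bg_regions`. The helper name and toy sets are hypothetical.

```python
import random

def one_vs_rest_contrasts(region_sets, max_bg_regions=None, seed=0):
    """Build (foreground, background) pairs, pooling all other sets as background."""
    rng = random.Random(seed)
    contrasts = {}
    for name, foreground in region_sets.items():
        # Pool regions from every other set (assumption: shared regions are dropped)
        pooled = {r for other, regions in region_sets.items() if other != name for r in regions}
        background = sorted(pooled - set(foreground))
        if max_bg_regions is not None and len(background) > max_bg_regions:
            background = rng.sample(background, max_bg_regions)
        contrasts[name] = (list(foreground), background)
    return contrasts

toy_sets = {
    'Cebpa': ['chr1:100-600', 'chr1:1000-1500'],
    'Foxa1': ['chr2:100-600', 'chr1:1000-1500'],
    'Hnf4a': ['chr3:100-600', 'chr3:900-1400'],
}
contrasts = one_vs_rest_contrasts(toy_sets, max_bg_regions=2)
fg, bg = contrasts['Cebpa']
print(len(fg), len(bg))
```

Capping the background size mainly trades statistical power for speed, which is why the examples below use a small `max_bg_regions`.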

In [3]:
# Load DEM functions
from pycistarget.motif_enrichment_dem import *

In [14]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.scores.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 0,
    motif_hit_thr = None,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation = ['Direct_annot'],
    n_cpu = 4,
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-20 14:50:06,004 DEM          INFO     Reading DEM database
2021-05-20 14:55:31,347 DEM          INFO     Creating contrast groups


2021-05-20 14:55:34,272	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265


[2m[36m(pid=22837)[0m 2021-05-20 14:55:44,096 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22837)[0m 2021-05-20 14:55:44,096 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22836)[0m 2021-05-20 14:55:45,161 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22836)[0m 2021-05-20 14:55:45,161 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22834)[0m 2021-05-20 14:55:45,297 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22834)[0m 2021-05-20 14:55:45,297 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=22835)[0m 2021-05-20 14:55:45,610 DEM          INFO     Computing DEM for Foxa1_ERR2357

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 14:57:00,359 DEM          INFO     Forming cistromes
2021-05-20 14:57:00,710 DEM          INFO     Done!


In [16]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_B.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

[[Back to top]](#top)

<a class="anchor" id="12"></a>
#### C. Exploring DEM results

We can load the results for exploration. 

In [14]:
# Load
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_B.pkl', 'rb') as infile:
    DEM_dict = pickle.load(infile)

To visualize motif enrichment results, we can use the `DEM_results()` method:

In [21]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Number_of_regions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| taipale_tf_pairs__FLI1_CEBPD_RNCGGANNTTGCGCAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  |  | Cebpd, Fli1 |  | 2.627919 | 0.008487 | 0.314952 | 0.050952 | 1.1 | 428.0 |
| taipale_tf_pairs__ATF4_TEF_RNMTGATGCAATN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  |  | Atf4, Tef |  | 2.449714 | 0.000787 | 0.455437 | 0.083366 | 2.38 | 363.0 |
| taipale__CEBPG_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.435591 | 0.0 | 1.327401 | 0.245367 | 2.11 | 1413.0 |
| taipale__Atf4_DBD_NGGATGATGCAATM_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Atf4 | Cebpg, Ddit3 |  | Cebpb, Ep300, Jun | 2.367688 | 0.003386 | 0.378252 | 0.073289 | 2.92 | 249.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD_meth_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.333029 | 0.0 | 1.289099 | 0.255844 | 1.61 | 1610.0 |
| taipale_tf_pairs__TEAD4_CEBPD_NTTRCGYAANNNNNNRGWATGY_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  |  | Cebpd, Tead4 |  | 2.332689 | 1e-06 | 0.400207 | 0.079447 | 0.647 | 767.0 |
| cisbp__M5318 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.330356 | 0.0 | 1.42071 | 0.282487 | 2.2 | 1478.0 |
| transfac_pro__M09010 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Atf2, Cebpa, Cebpb, Cebpg, Nfil3 | Dbp, Hlf, Tef | Cebpe, Crebl2 | 2.293414 | 0.0 | 0.655187 | 0.133653 | 1.0 | 1067.0 |
| taipale__CEBPG_full_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.279199 | 0.0 | 1.511894 | 0.311468 | 1.12 | 2187.0 |
| taipale_cyt_meth__TEF_NRTTAYGTAAYN_eDBD | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Atf2, Atf4, Cebpa, Creb1, Dbp, Hlf, Nfil3 | Tef | Cebpb, Cebpd, Cebpe, Cebpg, Creb5, Crem, Jdp2, Xbp1 | 2.266265 | 0.0 | 0.664474 | 0.138122 | 2.33 | 665.0 |
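The `Log2FC` values in this table are consistent with the log2 ratio of the mean foreground and background CRM scores. The snippet below recomputes it from the `Mean_fg` and `Mean_bg` values of two rows copied from the table; it is a check on the displayed numbers, not a description of pycisTarget internals.

```python
import math

# (Mean_fg, Mean_bg, reported Log2FC) copied from the DEM table above
rows = {
    'taipale_tf_pairs__FLI1_CEBPD_RNCGGANNTTGCGCAAN_CAP': (0.314952, 0.050952, 2.627919),
    'taipale__CEBPG_DBD_NTTRCGCAAY': (1.327401, 0.245367, 2.435591),
}
for motif, (mean_fg, mean_bg, reported) in rows.items():
    log2fc = math.log2(mean_fg / mean_bg)
    print(f'{motif}: recomputed {log2fc:.4f}, reported {reported:.4f}')
```

Because it is a ratio, a large Log2FC can arise even when both means are small, as in the first row (Mean_fg of about 0.31).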


This table can also be easily exported to an HTML file:

In [23]:
out_file = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/Cebpa_motif_enrichment.html'
# Use a context manager so the HTML file is flushed and closed
with open(out_file, 'w') as f:
    DEM_dict.motif_enrichment['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].to_html(f, escape=False, col_space=80)

You can also access the regions enriched for each motif. `motif_hits` (and likewise `cistromes`) contains two entries: 'Region_set', with the coordinates of the input regions, and 'Database', with the coordinates of the corresponding regions in the database:

In [24]:
DEM_dict.motif_hits['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['taipale__CEBPG_DBD_NTTRCGCAAY'][0:10]

['chr3:104601631-104602132',
 'chr4:48132714-48133215',
 'chr5:37960182-37960683',
 'chr14:12104181-12104682',
 'chr3:118477970-118478471',
 'chr8:11088866-11089367',
 'chr7:66311206-66311707',
 'chr13:55361124-55361625',
 'chr8:22878215-22878716',
 'chr15:36394865-36395366']

To access cistromes (only available if motifs have been annotated):

In [25]:
DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa'][0:10]

['chr10:128964974-128965475',
 'chr14:55560165-55560666',
 'chr3:104601631-104602132',
 'chr4:156124035-156124536',
 'chr2:168003318-168003819',
 'chr5:87147990-87148491',
 'chr11:90240601-90241102',
 'chr18:51130579-51131080',
 'chr4:48132714-48133215',
 'chr9:75764709-75765210']

How many regions does this cistrome contain? We will compare this number across the different settings below:

In [16]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa'])

3819

You can easily export cistromes to a BED file:

In [26]:
from pycistarget.utils import region_names_to_coordinates
cebpa_cistrome = DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa']
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(cebpa_cistrome))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/cebpa_cistrome_example.bed')

[[Back to top]](#top)

<a class="anchor" id="13"></a>
#### D. Advanced usage

<a class="anchor" id="14"></a>
##### 1. Thresholding on the mean foreground signal

Above, you may have noticed some motifs with high Log2FC values but low signal in both foreground and background (for example, taipale_tf_pairs__FLI1_CEBPD_RNCGGANNTTGCGCAAN_CAP, with a mean foreground score of about 0.31). To filter them out, you can set a threshold on the mean CRM score in the foreground with `mean_fg_thr`. Here we set it to 1:
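Conceptually, `mean_fg_thr` acts as an extra filter on the results table, alongside `adjpval_thr` and `log2fc_thr`. The pandas sketch below applies all three thresholds to a few rows copied from the DEM table above. It assumes inclusive comparisons; whether pycisTarget uses strict or inclusive inequalities is not stated in this tutorial, and `filter_dem` is a hypothetical helper.

```python
import pandas as pd

# A few rows (Log2FC, Adjusted_pval, Mean_fg) copied from the DEM results above
results = pd.DataFrame(
    {'Log2FC': [2.627919, 2.449714, 2.435591, 2.330356],
     'Adjusted_pval': [0.008487, 0.000787, 0.0, 0.0],
     'Mean_fg': [0.314952, 0.455437, 1.327401, 1.42071]},
    index=['taipale_tf_pairs__FLI1_CEBPD_RNCGGANNTTGCGCAAN_CAP',
           'taipale_tf_pairs__ATF4_TEF_RNMTGATGCAATN_CAP',
           'taipale__CEBPG_DBD_NTTRCGCAAY',
           'cisbp__M5318'],
)

def filter_dem(df, adjpval_thr=0.05, log2fc_thr=1, mean_fg_thr=0):
    """Keep motifs passing all three thresholds (inclusive comparisons assumed)."""
    keep = ((df['Adjusted_pval'] <= adjpval_thr)
            & (df['Log2FC'] >= log2fc_thr)
            & (df['Mean_fg'] >= mean_fg_thr))
    return df[keep]

print(filter_dem(results, mean_fg_thr=1).index.tolist())
```

With the default `mean_fg_thr=0` all four rows pass; with `mean_fg_thr=1` only the two Cebp motifs with stronger foreground signal remain.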

In [27]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.scores.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = None,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation = ['Direct_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-20 15:02:37,680 DEM          INFO     Reading DEM database
2021-05-20 15:03:45,343 DEM          INFO     Creating contrast groups


2021-05-20 15:03:48,130	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265


[2m[36m(pid=27254)[0m 2021-05-20 15:03:57,085 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27254)[0m 2021-05-20 15:03:57,085 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27254)[0m 2021-05-20 15:03:57,085 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27253)[0m 2021-05-20 15:03:58,535 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27253)[0m 2021-05-20 15:03:58,535 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27253)[0m 2021-05-20 15:03:58,535 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=27252)[0m 2021-05-20 15:03:58,555 DEM          INFO     Computing DEM for Foxa1_ERR23

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 15:04:50,815 DEM          INFO     Forming cistromes
2021-05-20 15:04:51,010 DEM          INFO     Done!


These low-signal motifs are now gone:

In [29]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Number_of_regions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| taipale__CEBPG_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.435591 | 0.0 | 1.327401 | 0.245367 | 2.11 | 1413.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD_meth_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.333029 | 0.0 | 1.289099 | 0.255844 | 1.61 | 1610.0 |
| cisbp__M5318 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.330356 | 0.0 | 1.42071 | 0.282487 | 2.2 | 1478.0 |
| taipale__CEBPG_full_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.279199 | 0.0 | 1.511894 | 0.311468 | 1.12 | 2187.0 |
| cisbp__M5319 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.174326 | 0.0 | 1.573493 | 0.3486 | 0.779 | 2613.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpb, Cebpg | Cebpd, Cebpe | Dbp, Ep300, Hlf, Ppargc1a, Tef | 2.171451 | 0.0 | 1.662524 | 0.369059 | 2.62 | 1480.0 |
| taipale_cyt_meth__CEBPB_NRTTGCGYAAYN_eDBD_meth | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg, Dbp | Cebpb, Cebpe | Ep300, Hlf, Nfil3, Tef | 2.165566 | 0.0 | 1.46913 | 0.327461 | 2.45 | 1381.0 |
| transfac_pro__M07413 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd, Cebpg |  | Cebpe, Ep300 | 2.109829 | 0.0 | 1.89574 | 0.439195 | 1.49 | 2463.0 |
| transfac_pro__M07080 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg | Cebpb | Bmyc, Ep300, Myc | 2.082265 | 0.0 | 2.516838 | 0.594335 | 2.16 | 2582.0 |
| transfac_pro__M07689 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.079418 | 0.0 | 1.738856 | 0.41143 | 2.52 | 1561.0 |


Let's check the length of the cistrome:

In [31]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa'])

3819

And save this object:

In [30]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_D1.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

[[Back to top]](#top)

<a class="anchor" id="15"></a>
##### 2. Using a fixed threshold for the motif hits

You may also have noticed that DEM cistromes are larger than those obtained with Homer or cisTarget. Their size largely depends on the background: by default, a region is considered a motif hit if its CRM score exceeds a per-motif threshold optimized by comparing foreground and background, so cistromes contain the regions that are more enriched for the motif than the background. Alternatively, you can set a fixed CRM score threshold for motif hits with `motif_hit_thr`. Here we set it to 3:
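With a fixed `motif_hit_thr`, a region counts as a hit when its CRM score reaches the threshold, so raising the threshold can only shrink the hit list. A minimal sketch with hypothetical scores for a single motif, assuming an inclusive cut-off; `motif_hits` here is a hypothetical helper, not the pycisTarget attribute of the same name.

```python
import pandas as pd

# Hypothetical CRM scores of one motif in foreground regions
scores = pd.Series({'chr1:100-600': 3.4, 'chr2:200-700': 2.1,
                    'chr3:300-800': 3.0, 'chr4:400-900': 0.7})

def motif_hits(scores, motif_hit_thr):
    """Regions whose CRM score is at least the threshold, best-scoring first."""
    return scores[scores >= motif_hit_thr].sort_values(ascending=False).index.tolist()

print(motif_hits(scores, 2.0))  # lower threshold: more hits
print(motif_hits(scores, 3.0))  # fixed threshold of 3, as in the call below
```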

In [32]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.scores.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = 3,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation = ['Direct_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-20 15:06:10,529 DEM          INFO     Reading DEM database
2021-05-20 15:07:15,600 DEM          INFO     Creating contrast groups


2021-05-20 15:07:18,434	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265


[2m[36m(pid=29491)[0m 2021-05-20 15:07:27,651 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29491)[0m 2021-05-20 15:07:27,651 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29491)[0m 2021-05-20 15:07:27,651 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29491)[0m 2021-05-20 15:07:27,651 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29493)[0m 2021-05-20 15:07:29,285 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29493)[0m 2021-05-20 15:07:29,285 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=29493)[0m 2021-05-20 15:07:29,285 DEM          INFO     Computing DEM for Foxa1_ERR

The number of motif hits per motif (`Number_of_regions`) is now generally lower:

In [34]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Number_of_regions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| taipale__CEBPG_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.435591 | 0.0 | 1.327401 | 0.245367 | 3.0 | 1006.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD_meth_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.333029 | 0.0 | 1.289099 | 0.255844 | 3.0 | 977.0 |
| cisbp__M5318 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.330356 | 0.0 | 1.42071 | 0.282487 | 3.0 | 1098.0 |
| taipale__CEBPG_full_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.279199 | 0.0 | 1.511894 | 0.311468 | 3.0 | 1163.0 |
| cisbp__M5319 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.174326 | 0.0 | 1.573493 | 0.3486 | 3.0 | 1172.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpb, Cebpg | Cebpd, Cebpe | Dbp, Ep300, Hlf, Ppargc1a, Tef | 2.171451 | 0.0 | 1.662524 | 0.369059 | 3.0 | 1274.0 |
| taipale_cyt_meth__CEBPB_NRTTGCGYAAYN_eDBD_meth | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg, Dbp | Cebpb, Cebpe | Ep300, Hlf, Nfil3, Tef | 2.165566 | 0.0 | 1.46913 | 0.327461 | 3.0 | 1158.0 |
| transfac_pro__M07413 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd, Cebpg |  | Cebpe, Ep300 | 2.109829 | 0.0 | 1.89574 | 0.439195 | 3.0 | 1386.0 |
| transfac_pro__M07080 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg | Cebpb | Bmyc, Ep300, Myc | 2.082265 | 0.0 | 2.516838 | 0.594335 | 3.0 | 2052.0 |
| transfac_pro__M07689 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.079418 | 0.0 | 1.738856 | 0.41143 | 3.0 | 1304.0 |


The cistrome is smaller too:

In [35]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa'])

3465

Let's save this object:

In [36]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_D2.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

[[Back to top]](#top)

<a class="anchor" id="16"></a>
##### 3. Using a shuffled background

You may not have a suitable background (for example, if you only have a single ChIP-seq experiment). In that case, you can use shuffled versions of your input sequences as background by setting `contrasts` to 'Shuffle'. This option requires Cluster-Buster to be installed.
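A shuffled background keeps the nucleotide composition of the input sequences while destroying the arrangement that forms motifs. The sketch below shows a simple mononucleotide shuffle as an illustration of that idea. It assumes nothing about pycisTarget's actual shuffling scheme, and the Cluster-Buster scoring step is not reproduced; `shuffle_sequence` is a hypothetical helper.

```python
import random
from collections import Counter

def shuffle_sequence(seq, seed=None):
    """Randomly permute the nucleotides of seq (preserves base composition only)."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return ''.join(bases)

seq = 'TTGCGCAATTGCGCAAGGGCCC'
shuffled = shuffle_sequence(seq, seed=1)
print(shuffled)
print(Counter(shuffled) == Counter(seq))  # True: same base counts
```

A mononucleotide shuffle does not preserve dinucleotide frequencies (such as CpG content), which is one reason low-complexity motifs like G repeats can behave differently against a shuffled background.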

In [41]:
# Make Cluster-Buster available to DEM (these paths are specific to our cluster)
import os
cbust_dir = '/data/leuven/software/biomed/haswell_centos7/2018a/software/Cluster-Buster/20180705-GCCcore-6.4.0'
os.environ['CBUST_HOME'] = cbust_dir  # Unlike os.putenv, this also updates os.environ
os.environ['PATH'] += os.pathsep + os.path.join(cbust_dir, 'bin')
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.scores.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Shuffle',
    name = 'DEM',
    max_bg_regions = 100,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 2.5, # A higher foreground threshold may be needed here; otherwise many G-repeat motifs may show up
    n_cpu = 4,
    fraction_overlap = 0.4,
    cluster_buster_path = os.path.join(cbust_dir, 'bin', 'cbust'),
    path_to_genome_fasta = '/staging/leuven/res_00001/genomes/mus_musculus/mm10_ucsc/fasta/mm10.fa',
    path_to_motifs = '/staging/leuven/stg_00002/lcb/icistarget/data/motifCollection/v9/singletons/',
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-20 15:47:08,935 DEM          INFO     Reading DEM database
2021-05-20 15:48:23,826 DEM          INFO     Creating contrast groups
2021-05-20 15:48:23,831 DEM          INFO     Generating and scoring shuffled background
2021-05-20 15:48:24,004 Cluster-Buster INFO     Scoring sequences


2021-05-20 15:48:27,052	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 15:51:16,809 Cluster-Buster INFO     Done!
2021-05-20 15:51:17,184 DEM          INFO     Generating and scoring shuffled background
2021-05-20 15:51:17,364 Cluster-Buster INFO     Scoring sequences


2021-05-20 15:51:20,443	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 15:54:12,203 Cluster-Buster INFO     Done!
2021-05-20 15:54:12,474 DEM          INFO     Generating and scoring shuffled background
2021-05-20 15:54:12,635 Cluster-Buster INFO     Scoring sequences


2021-05-20 15:54:15,775	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 15:57:05,877 Cluster-Buster INFO     Done!
2021-05-20 15:57:06,199 DEM          INFO     Generating and scoring shuffled background
2021-05-20 15:57:06,350 Cluster-Buster INFO     Scoring sequences


2021-05-20 15:57:09,364	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


2021-05-20 15:59:56,516 Cluster-Buster INFO     Done!


2021-05-20 15:59:59,941	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265


[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=23920)[0m 2021-05-20 16:00:08,922 DEM          INFO     Computing DEM for Onecu

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe


[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2m[36m(pid=23920)[0m 2021-05-20 16:00:23,040 numexpr.utils INFO     Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing saf

[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(raylet)[0m cut: write error: Broken pipe
[2m[33m(ra

2021-05-20 16:01:11,481 DEM          INFO     Forming cistromes
2021-05-20 16:01:14,332 DEM          INFO     Done!


Let's see the results now:

In [42]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Number_of_regions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| dbcorrdb__CEBPB__ENCSR000EHE_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg, Ddit3 | Cebpb | Dbp, Ep300 | 3.109886 | 0.0 | 2.555773 | 0.296042 | 1.29 | 3020.0 |
| factorbook__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Atf4, Cebpa, Cebpd, Cebpg, Dbp | Cebpb | Cebpe, Ep300, Hlf | 3.105516 | 0.0 | 2.632374 | 0.30584 | 3.03 | 2225.0 |
| dbcorrdb__CEBPB__ENCSR000BRQ_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg | Cebpb | Ep300 | 3.085453 | 0.0 | 2.872691 | 0.338435 | 2.48 | 2856.0 |
| dbcorrdb__CEBPB__ENCSR000BQI_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg, Ddit3 | Cebpb |  | 2.994965 | 0.0 | 2.675362 | 0.335589 | 2.98 | 2360.0 |
| hocomoco__CEBPB_HUMAN.H11MO.0.A | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpa | Cebpd, Cebpe, Cebpg | Cebpb | Ep300 | 2.985242 | 0.0 | 2.938602 | 0.371102 | 1.4 | 3337.0 |
| transfac_pro__M07080 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg | Cebpb | Bmyc, Ep300, Myc | 2.982557 | 0.0 | 2.516838 | 0.318432 | 0.816 | 3237.0 |
| dbcorrdb__CEBPB__ENCSR000DYI_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg, Dbp | Cebpb, Ep300 | Jun | 2.930386 | 0.0 | 2.601177 | 0.341221 | 2.34 | 2661.0 |
| dbcorrdb__CEBPB__ENCSR000EDA_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Atf4, Cebpa, Cebpd, Cebpe, Cebpg, Dbp, Ddit3, Nfil3 | Cebpb, Ep300 |  | 2.92885 | 0.0 | 3.012026 | 0.395537 | 1.3 | 3504.0 |
| dbcorrdb__CEBPB__ENCSR000EEE_1__m1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpe, Cebpg | Cebpb, Ep300 | Atf4 | 2.915596 | 0.0 | 2.600302 | 0.344621 | 0.965 | 3193.0 |
| jaspar__MA0102.3 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpd, Cebpe, Cebpg | Cebpa | Ep300 | 2.912782 | 0.0 | 2.79246 | 0.370811 | 1.3 | 3250.0 |
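Reading the table, the `Log2FC` column appears to be the base-2 logarithm of `Mean_fg` divided by `Mean_bg`. This is our interpretation of the output, not a formula documented here. The minimal sketch below recomputes it for a few rows copied from the table above; `log2_fold_change` is a hypothetical helper, not a pycisTarget function.

```python
import math

# (motif, reported Log2FC, Mean_fg, Mean_bg) copied from the Cebpa table above
rows = [
    ("dbcorrdb__CEBPB__ENCSR000EHE_1__m1", 3.109886, 2.555773, 0.296042),
    ("hocomoco__CEBPB_HUMAN.H11MO.0.A", 2.985242, 2.938602, 0.371102),
    ("jaspar__MA0102.3", 2.912782, 2.79246, 0.370811),
]


def log2_fold_change(mean_fg: float, mean_bg: float) -> float:
    """Assumed definition: log2 of the mean foreground score over the mean background score."""
    return math.log2(mean_fg / mean_bg)


for motif, reported, fg, bg in rows:
    recomputed = log2_fold_change(fg, bg)
    print(f"{motif}: reported={reported:.4f} recomputed={recomputed:.4f}")
```

If the recomputed and reported values agree, a large `Log2FC` simply means that the motif scores much higher, on average, in the foreground regions than in the background.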


The number of regions in the Cebpa cistrome is lower too:

In [43]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa'])

4037

Let's save this object:

In [44]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_D3.pkl', 'wb') as f:
    pickle.dump(DEM_dict, f)
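To reuse a saved object in a later session, it can be read back with `pickle.load`. Because pickle stores references to classes by module name, pycisTarget generally has to be importable when you unpickle a DEM object. The sketch below shows the round trip on a placeholder dictionary written to a temporary file so that it runs without the tutorial data; `save_object` and `load_object` are hypothetical helpers, and you would substitute your own path and `DEM_dict`.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize any picklable object (for example a DEM result) to `path`."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load an object previously written with save_object."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Placeholder object standing in for DEM_dict
example = {'example_region_set': [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'DEM_dict_example.pkl')
save_object(example, path)
restored = load_object(path)
print(restored)
```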

[[Back to top]](#top)

<a class="anchor" id="17"></a>
##### 4. Specifying contrasts

Finally, you may want to define specific contrasts between region sets. You can do this by passing a list to the `contrasts` parameter: each element is one contrast, given as two lists, where the first list holds the foreground region set(s) and the second the background region set(s). For example, here we perform two contrasts: 1) Cebpa versus Onecut1 and 2) Cebpa versus Onecut1 and Hnf4a.
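As a sketch, the helpers below assemble the nested `contrasts` list (one `[foreground, background]` pair per contrast) and derive the key under which each contrast's results appear to be stored. The key format (foreground names joined by `_`, then `_VS_`, then background names joined by `_`) is inferred from the log messages and the `DEM_results` call in this section, not from documentation; `make_contrasts` and `contrast_label` are hypothetical helpers, not part of pycisTarget.

```python
from typing import List

# Region set names used in this tutorial (keys of the region_sets dictionary)
CEBPA = 'Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'
ONECUT1 = 'Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K'
HNF4A = 'Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K'

Contrast = List[List[str]]  # [foreground_names, background_names]


def make_contrasts(pairs) -> List[Contrast]:
    """Hypothetical helper: turn (foreground, background) tuples into the nested list format."""
    contrasts = []
    for foreground, background in pairs:
        # A region set should not be compared against itself
        if set(foreground) & set(background):
            raise ValueError('A region set cannot be both foreground and background')
        contrasts.append([list(foreground), list(background)])
    return contrasts


def contrast_label(contrast: Contrast) -> str:
    """Hypothetical helper: result key format inferred from the DEM log messages."""
    foreground, background = contrast
    return '_'.join(foreground) + '_VS_' + '_'.join(background)


contrasts = make_contrasts([
    ([CEBPA], [ONECUT1]),         # 1) Cebpa versus Onecut1
    ([CEBPA], [ONECUT1, HNF4A]),  # 2) Cebpa versus Onecut1 and Hnf4a
])
for c in contrasts:
    print(contrast_label(c))
```

The resulting `contrasts` list has the same structure as the one passed to `DEM` in the cell that follows.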

In [5]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v9/CTX_mm10/CTX_mm10_SCREEN3_1kb_bg_with_mask/CTX_mm10_SCREEN3_1kb_bg_with_mask.regions_vs_motifs.scores.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = [
        # 1) Cebpa (foreground) versus Onecut1 (background)
        [['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'],
         ['Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K']],
        # 2) Cebpa (foreground) versus Onecut1 and Hnf4a (background)
        [['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'],
         ['Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K',
          'Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K']]],
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = 3,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation = ['Direct_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2021-05-20 17:45:35,805 DEM          INFO     Reading DEM database
2021-05-20 17:50:54,828 DEM          INFO     Creating contrast groups


2021-05-20 17:51:11,304	INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265


[2m[36m(pid=24771)[0m 2021-05-20 17:51:28,026 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K_Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=24771)[0m 2021-05-20 17:51:28,026 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K_Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=24772)[0m 2021-05-20 17:51:28,189 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=24772)[0m 2021-05-20 17:51:28,189 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(pid=24771)[0m

Let's look at the results of the comparison with Onecut1:

In [6]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Motif_similarity_annot | Orthology_annot | Motif_similarity_and_Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Number_of_regions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD_meth_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp, Hlf | Cebpd, Cebpe | Ep300, Nfil3, Ppargc1a, Tef | 2.260583 | 0.0 | 1.289099 | 0.269019 | 3.0 | 977.0 |
| taipale_cyt_meth__CEBPB_NRTTGCGYAAYN_eDBD_meth | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpd, Cebpg, Dbp | Cebpb, Cebpe | Ep300, Hlf, Nfil3, Tef | 2.152441 | 0.0 | 1.46913 | 0.330454 | 3.0 | 1158.0 |
| taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpb, Cebpg | Cebpd, Cebpe | Dbp, Ep300, Hlf, Ppargc1a, Tef | 2.150738 | 0.0 | 1.662524 | 0.374396 | 3.0 | 1274.0 |
| taipale__CEBPG_DBD_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.076253 | 0.0 | 1.327401 | 0.314766 | 3.0 | 1006.0 |
| taipale__CEBPG_full_NTTRCGCAAY | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 2.06123 | 0.0 | 1.511894 | 0.362267 | 3.0 | 1163.0 |
| taipale_cyt_meth__CEBPD_NRTTGCGYAAYN_eDBD_meth | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpg, Dbp | Cebpd, Cebpe | Ep300, Hlf, Nfil3, Tef | 2.030317 | 0.0 | 1.776707 | 0.43494 | 3.0 | 1417.0 |
| cisbp__M5275 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Cebpa, Cebpb, Cebpd, Cebpg, Tef |  | Cebpe | 2.006358 | 0.0 | 1.396739 | 0.347649 | 3.0 | 988.0 |
| transfac_pro__M07413 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd, Cebpg |  | Cebpe, Ep300 | 2.00377 | 0.0 | 1.89574 | 0.472698 | 3.0 | 1386.0 |
| cisbp__M5319 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 1.994111 | 0.0 | 1.573493 | 0.394982 | 3.0 | 1172.0 |
| cisbp__M5318 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb | Cebpa, Cebpd | Cebpe, Cebpg | Ep300 | 1.98843 | 0.0 | 1.42071 | 0.358037 | 3.0 | 1098.0 |
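Assuming DEM keeps only motifs with an adjusted p-value below `adjpval_thr`, a `Log2FC` above `log2fc_thr` and a mean foreground score above `mean_fg_thr` (our reading of the parameter names, not a documented rule), every row shown above should pass the thresholds used in this run, and with a fixed `motif_hit_thr = 3` every row should report that same hit threshold. The sketch below checks this on a few rows copied from the table; `passes_thresholds` is a hypothetical helper, not a pycisTarget function.

```python
# Thresholds passed to DEM in the call above
ADJPVAL_THR = 0.05
LOG2FC_THR = 1
MEAN_FG_THR = 1
MOTIF_HIT_THR = 3

# (motif, Log2FC, Adjusted_pval, Mean_fg, Motif_hit_thr) copied from the table above
rows = [
    ('taipale_cyt_meth__CEBPE_NRTTGCGYAAYN_eDBD_meth_repr', 2.260583, 0.0, 1.289099, 3.0),
    ('taipale__CEBPG_DBD_NTTRCGCAAY', 2.076253, 0.0, 1.327401, 3.0),
    ('cisbp__M5318', 1.98843, 0.0, 1.42071, 3.0),
]


def passes_thresholds(log2fc, adj_pval, mean_fg):
    """Hypothetical filter mirroring the assumed meaning of the DEM thresholds."""
    return adj_pval < ADJPVAL_THR and log2fc > LOG2FC_THR and mean_fg > MEAN_FG_THR


for motif, log2fc, adj_pval, mean_fg, hit_thr in rows:
    # Second value: passes the assumed filter; third: reports the fixed hit threshold
    print(motif, passes_thresholds(log2fc, adj_pval, mean_fg), hit_thr == MOTIF_HIT_THR)
```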


Let's save this object:

In [9]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tutorial/DEM/DEM_dict_D4.pkl', 'wb') as f:
    pickle.dump(DEM_dict, f)

[[Back to top]](#top)