# Motif enrichment with pycisTarget using mouse liver ChIP-seq regions

In [1]:
%matplotlib inline
import pycistarget
pycistarget.__version__

**pycisTarget** is a Python module for performing motif enrichment analysis and deriving genome-wide cistromes with **cisTarget** (Herrmann et al., 2012; Imrichova et al., 2015). It can also derive *de novo* cistromes via **Homer** (Heinz et al., 2010), and it includes a novel approach, named **Differentially Enriched Motifs (DEM)**, to derive differentially enriched motifs and cistromes between one or more groups of regions.

## 0. Getting your input region sets

**pycisTarget** takes as input a dictionary with region set names as keys and the regions (as PyRanges objects) as values. In this tutorial we will use 4 region sets, corresponding to the top 5K ChIP-seq peaks of Hnf4a, Foxa1, Cebpa and Onecut1 in the mouse liver (Ballester et al., 2014). We can easily read the data in the correct format using a dictionary comprehension.

In [2]:
import pyranges as pr
import os
path_to_region_sets = '/staging/leuven/stg_00002/lcb/cbravo/Liver/Multiome/pycistopic/GEMSTAT/ChIP/All_summits'
region_sets_files = ['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K.bed', 'Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K.bed', 'Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K.bed', 'Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K.bed']
region_sets = {x.replace('.bed', ''):pr.read_bed(os.path.join(path_to_region_sets, x)) for x in region_sets_files}

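The file names indicate how these region sets were built: peak summits ordered by score, extended by 250 bp, and the top 5,000 kept. The sketch below illustrates that kind of preprocessing on plain tuples. The `(chrom, summit, score)` input layout, the hypothetical helper `top_extended_summits` and the "summit ± 250 bp" convention (which gives the 501-bp regions seen in the outputs of this tutorial) are assumptions, not the exact procedure used to produce these files.

```python
from typing import List, Tuple

Summit = Tuple[str, int, float]  # (chromosome, 0-based summit position, peak score)


def top_extended_summits(summits: List[Summit], flank: int = 250,
                         top_n: int = 5000) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: keep the top_n summits by score and extend each by `flank` bp.

    Returns half-open BED-style intervals (chrom, start, end). A 1-bp summit
    [pos, pos + 1) becomes [pos - flank, pos + 1 + flank), i.e. 2 * flank + 1 bp.
    """
    ranked = sorted(summits, key=lambda s: s[2], reverse=True)[:top_n]
    regions = []
    for chrom, pos, _score in ranked:
        start = max(0, pos - flank)  # clip at the chromosome start
        regions.append((chrom, start, pos + 1 + flank))
    return regions


toy_summits = [("chr1", 1000, 5.0), ("chr2", 200, 9.5), ("chr3", 5000, 1.2)]
result = top_extended_summits(toy_summits, top_n=2)
print(result)  # [('chr2', 0, 451), ('chr1', 750, 1251)]
```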

Apart from the cisTarget method, pycisTarget includes wrapper functions to use Homer (for *de novo* motif enrichment) and a new implementation relying on statistical testing between sets of regions using Cluster-Buster scores (DEM). We will first describe how to perform motif enrichment and form cistromes using cisTarget.

## 1. cisTarget

### A. Creating cisTarget databases

To run **cisTarget** you will need to provide a **ranking database** (that is, a feather file with a dataframe with motifs as rows, genomic regions as columns and their ranked position [based on cis-regulatory module (CRM) score (Frith et al., 2003)] as values). We provide those databases for human (hg38, hg19), mouse (mm10, mm9) and fly (dm3, dm6) at https://resources.aertslab.org/cistarget/. 

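To make the relation between CRM scores and rankings concrete, here is a toy sketch: for each motif, regions are sorted by decreasing CRM score and each region receives its position (0 = best). The dictionary layout and the helper `scores_to_rankings` are illustrative assumptions; the real databases are feather files produced by the create_cisTarget_databases scripts, and their tie handling may differ.

```python
from typing import Dict, List


def scores_to_rankings(scores: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Toy conversion of per-motif CRM scores into per-motif region ranks.

    `scores[motif][i]` is the CRM score of region i; the returned
    `ranks[motif][i]` is the position of region i when regions are sorted by
    decreasing score (0 = highest score). Ties keep the original region order.
    """
    rankings = {}
    for motif, region_scores in scores.items():
        order = sorted(range(len(region_scores)), key=lambda i: -region_scores[i])
        ranks = [0] * len(region_scores)
        for position, region_index in enumerate(order):
            ranks[region_index] = position
        rankings[motif] = ranks
    return rankings


crm_scores = {"motif_A": [0.1, 3.2, 1.5], "motif_B": [2.0, 0.0, 2.0]}
rankings = scores_to_rankings(crm_scores)
print(rankings)  # {'motif_A': [2, 0, 1], 'motif_B': [0, 2, 1]}
```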
In addition, **if you want to use other regions or genomes to build your databases**, we provide a step-by-step tutorial and scripts at https://github.com/aertslab/create_cisTarget_databases. Below you can find the basic steps to do so:

In [None]:
%%bash
#### Variables (bash assignments take no spaces around '=')
genome_fasta='PATH_TO_GENOME_FASTA'
region_bed='PATH_TO_BED_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
region_fasta='PATH_TO_FASTA_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
database_suffix='SUFFIX_FOR_DATABASE_FILE'
path_to_motif_collection='PATH_TO_MOTIF_COLLECTION_IN_CLUSTER_BUSTER_FORMAT'
motif_list='PATH_TO_FILE_WITH_MOTIFS_TO_SCORE'
n_cpu='NUMBER_OF_CORES'
#### Get fasta sequences
module load BEDTools # In our system, load BEDTools
bedtools getfasta -fi "${genome_fasta}" -bed "${region_bed}" > "${region_fasta}"
#### Activate environment
my_conda_initialize # In our system, initialize conda
conda activate /staging/leuven/stg_00002/lcb/ghuls/software/miniconda3/envs/create_cistarget_databases
#### Set ${create_cistarget_databases_dir} to your local copy of https://github.com/aertslab/create_cisTarget_databases
create_cistarget_databases_dir='/staging/leuven/stg_00002/lcb/ghuls/software/create_cisTarget_databases'
#### Score the motifs
"${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
    -f "${region_fasta}" \
    -M "${path_to_motif_collection}" \
    -m "${motif_list}" \
    -o "${database_suffix}" \
    -t "${n_cpu}" \
    -l \
    -s 555
#### Create rankings
motifs_vs_regions_scores_feather='PATH_TO_MOTIFS_VS_REGIONS_SCORES_DATABASE'
"${create_cistarget_databases_dir}/convert_motifs_or_tracks_vs_regions_or_genes_scores_to_rankings_cistarget_dbs.py" \
    -i "${motifs_vs_regions_scores_feather}" \
    -s 555

### B. Running cisTarget

The most relevant parameters for running cisTarget are:

- **ctx_db**: Path to the cisTarget database to use, or a preloaded cisTargetDatabase object. In this tutorial we will use the precomputed mm10 database (using SCREEN regions), available at https://resources.aertslab.org/cistarget/.
- **region_sets**: The input region sets (a dictionary of PyRanges objects).
- **specie**: Species to which the region coordinates and the database belong. To annotate motifs to TFs using the cisTarget annotations, possible values are 'mus_musculus', 'homo_sapiens' or 'drosophila_melanogaster'. For any other value, motifs will not be annotated to TFs unless a custom annotation is provided.
- **fraction_overlap**: Minimum overlap fraction (in any direction) to map input regions to regions in the database. Default: 0.4.
- **auc_threshold**: Fraction of the ranked database regions taken into account to calculate the AUC of the recovery curve. For human and mouse we recommend setting it to 0.005 (default); for fly, 0.01.
- **nes_threshold**: Minimum Normalized Enrichment Score (NES) for a motif to be considered significant. Default: 3.0.
- **rank_threshold**: Fraction of the total number of regions in the database to use as maximum rank for the region enrichment recovery curve. Default: 0.05 (5%).
- **annotation**: Annotation to use to form the cistromes. Default: ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot']. Since we are using the clustered motif database, we will not use the motif similarity annotations (which rely only on Tomtom q-values), as motif similarity is already implicit in the clusters.
- **annotation_version** : Motif collection version. Here we use the clustered v10 database ('v10nr_clust').
- **path_to_motif_annotations** : File with motif annotations. These files are available at https://resources.aertslab.org/cistarget/motif2tf . 
- **n_cpu**: Number of CPUs to use during calculations.

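The interplay of `auc_threshold` and `nes_threshold` can be illustrated with a simplified sketch of the cisTarget scoring idea. For each motif, the recovery curve counts how many input regions are found among the top-ranked database regions; its area up to a rank cutoff (`auc_threshold` times the number of database regions) is the AUC, and the NES is that AUC standardized across all motifs. This is a sketch under those assumptions (toy rank lists, population standard deviation, hypothetical motif names), not pycisTarget's implementation.

```python
import statistics
from typing import Dict, List


def recovery_auc(fg_ranks: List[int], n_db_regions: int, auc_threshold: float = 0.005) -> float:
    """Normalized area under the recovery curve up to the rank cutoff.

    `fg_ranks` are the 0-based ranks (for one motif) of the database regions that
    overlap the input region set. A region at rank r counts as recovered at every
    position r..cutoff-1, so it contributes (cutoff - r) to the area.
    """
    cutoff = max(1, round(auc_threshold * n_db_regions))  # round() avoids float truncation
    area = sum(cutoff - r for r in fg_ranks if r < cutoff)
    return area / (cutoff * len(fg_ranks))


def nes_scores(aucs: Dict[str, float]) -> Dict[str, float]:
    """Standardize AUCs across motifs: NES = (AUC - mean) / std (assumed definition)."""
    mean = statistics.mean(aucs.values())
    std = statistics.pstdev(aucs.values())
    return {motif: (auc - mean) / std for motif, auc in aucs.items()}


n_db = 100_000  # toy database size: cutoff = 0.005 * 100000 = 500 ranks
fg = {  # toy ranks of the input regions for four hypothetical motifs
    "enriched_motif": [0, 5, 10, 40, 90, 300, 2000, 5000],
    "background_1": [450, 20000, 30000, 45000, 60000, 70000, 80000, 90000],
    "background_2": [480, 15000, 25000, 35000, 55000, 65000, 75000, 85000],
    "background_3": [490, 12000, 22000, 32000, 52000, 62000, 72000, 82000],
}
aucs = {motif: recovery_auc(ranks, n_db) for motif, ranks in fg.items()}
nes = nes_scores(aucs)
# With only four toy motifs the largest possible NES is sqrt(3) ~ 1.73,
# so this demo uses 1.5 instead of the default nes_threshold of 3.0.
significant = [motif for motif, score in nes.items() if score >= 1.5]
print(significant)  # ['enriched_motif']
```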
In [3]:
# Load cistarget functions
from pycistarget.motif_enrichment_cistarget import *

In [5]:
# Run, using precomputed database
cistarget_dict = run_cistarget(
    ctx_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.rankings.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    auc_threshold = 0.005,
    nes_threshold = 3.0,
    rank_threshold = 0.05,
    annotation = ['Direct_annot', 'Orthology_annot'],
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    n_cpu = 4,
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:14:15,645 cisTarget    INFO     Reading cisTarget database
[2m[36m(ctx_internal_ray pid=30473)[0m 2022-08-04 09:14:41,873 cisTarget    INFO     Running cisTarget for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K which has 4924 regions
[2m[36m(ctx_internal_ray pid=30476)[0m 2022-08-04 09:14:41,925 cisTarget    INFO     Running cisTarget for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K which has 4715 regions
[2m[36m(ctx_internal_ray pid=30475)[0m 2022-08-04 09:14:42,008 cisTarget    INFO     Running cisTarget for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K which has 5019 regions
[2m[36m(ctx_internal_ray pid=30474)[0m 2022-08-04 09:14:42,100 cisTarget    INFO     Running cisTarget for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K which has 3777 regions
[2m[36m(ctx_internal_ray pid=30473)[0m 2022-08-04 09:14:54,544 cisTarget    INFO     Annotating motifs for Cebpa_ERR235722_summits_order_by_score_exte

In [5]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/cisTarget/cisTarget_dict.pkl', 'wb') as f:
  pickle.dump(cistarget_dict, f)

### C. Exploring cisTarget results

We can load the results for exploration. 

In [6]:
# Load
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/cisTarget/cisTarget_dict.pkl', 'rb') as infile:
    cistarget_dict = pickle.load(infile)

To visualize motif enrichment results, we can use the `cistarget_results()` function:

In [7]:
cistarget_results(cistarget_dict, name='Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Region_set | Direct_annot | Orthology_annot | NES | AUC | Rank_at_max | Motif_hits |
|---|---|---|---|---|---|---|---|
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe, Cebpb, Cebpd, Cebpg, Hlf, Cebpa | Cebpe, Hes2, Cebpb, Ep300, Cebpd, Cebpg, Gatad2a, Cebpa, Dbp | 29.343196 | 0.097521 | 55485.0 | 2661 |
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | | 24.940297 | 0.083402 | 55526.0 | 2148 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 18.255857 | 0.061966 | 55529.0 | 1913 |
| swissregulon__mm__Cebpe | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 13.044653 | 0.045254 | 55394.0 | 1454 |
| swissregulon__hs__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpb | 11.735067 | 0.041054 | 55112.0 | 1303 |
| transfac_pro__M01869 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpg | | 10.872791 | 0.038289 | 55512.0 | 1477 |
| transfac_pro__M04761 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hsf1 | 10.672929 | 0.037648 | 55521.0 | 1433 |
| taipale_tf_pairs__GCM1_CEBPB_MTRSGGGNNNNNTTRCGYAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Gcm1, Cebpb | 10.544153 | 0.037235 | 9621.0 | 496 |
| taipale_tf_pairs__GCM1_CEBPB_MTRSGGGNNNNNNTTRCGYAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Gcm1, Cebpb | 9.96071 | 0.035364 | 7159.0 | 372 |
| taipale_tf_pairs__ATF4_CEBPB_NNATGAYGCAAYN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpb, Atf4 | 9.523663 | 0.033963 | 5333.0 | 266 |


This table can also easily be exported to an HTML file:

In [8]:
out_file = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/cisTarget/Cebpa_motif_enricment.html'
# Use a context manager so the HTML file is flushed and closed
with open(out_file, 'w') as f:
    cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].motif_enrichment.to_html(f, escape=False, col_space=80)

You can also access the regions enriched for each motif. `motif_hits` (and likewise `cistromes`) has two entries: 'Region_set' contains the coordinates as in the input regions, and 'Database' contains the coordinates as in the database:

In [9]:
cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].motif_hits['Region_set']['metacluster_46.4'][0:10]

['chr7:88310722-88311223',
 'chr4:132078352-132078853',
 'chr7:16525901-16526402',
 'chr6:99266056-99266557',
 'chr1:20820207-20820708',
 'chr15:58214791-58215292',
 'chr7:99181713-99182214',
 'chr7:46719487-46719988',
 'chr13:49681875-49682376',
 'chr5:150599840-150600341']

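The two coordinate systems exist because input regions are first mapped onto the database regions. The sketch below shows one reading of a minimum overlap fraction "in any direction": a pair is kept if the shared base pairs cover at least `fraction_overlap` of either the input region or the database region. This interpretation, the naive all-pairs loop and the helper names `overlap_bp` and `map_to_database` are assumptions for illustration, not pycisTarget's code.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two half-open intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def map_to_database(inputs: List[Region], database: List[Region],
                    fraction_overlap: float = 0.4) -> List[Tuple[str, str]]:
    """Pair input regions with database regions when the overlap covers at least
    `fraction_overlap` of either interval (assumed meaning of 'in any direction')."""
    pairs = []
    for q in inputs:
        for t in database:
            ov = overlap_bp(q, t)
            if ov == 0:
                continue
            if ov / (q[2] - q[1]) >= fraction_overlap or ov / (t[2] - t[1]) >= fraction_overlap:
                pairs.append((f"{q[0]}:{q[1]}-{q[2]}", f"{t[0]}:{t[1]}-{t[2]}"))
    return pairs


inputs = [("chr7", 1000, 1501)]
database = [
    ("chr7", 1100, 1400),  # covers 60% of the input region -> kept
    ("chr7", 1450, 2450),  # 51 bp: ~10% of input, ~5% of database region -> dropped
    ("chr7", 1480, 1520),  # 21 bp: ~4% of input but ~52% of database region -> kept
    ("chr8", 1000, 1501),  # different chromosome -> dropped
]
pairs = map_to_database(inputs, database)
print(pairs)
```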
To access cistromes (only available if motifs have been annotated):

In [10]:
cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].cistromes['Region_set']['Cebpa_(2809r)'][0:10]

['chr7:88310722-88311223',
 'chr4:132078352-132078853',
 'chr7:16525901-16526402',
 'chr6:99266056-99266557',
 'chr1:20820207-20820708',
 'chr15:58214791-58215292',
 'chr7:99181713-99182214',
 'chr7:46719487-46719988',
 'chr13:49681875-49682376',
 'chr5:150599840-150600341']

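Cistrome names such as `Cebpa_(2809r)` encode the TF and the number of regions. Conceptually, a cistrome collects the motif hits of all enriched motifs annotated to that TF. A minimal sketch, assuming motif hits and motif-to-TF annotations are plain dictionaries; the helper `form_cistromes` and the motif names are hypothetical, and the sketch ignores details such as how the different annotation types are combined.

```python
from typing import Dict, List, Set


def form_cistromes(motif_hits: Dict[str, List[str]],
                   motif_to_tfs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Union the hits of all motifs annotated to each TF; name keys 'TF_(Nr)'."""
    regions_per_tf: Dict[str, Set[str]] = {}
    for motif, hits in motif_hits.items():
        for tf in motif_to_tfs.get(motif, []):  # unannotated motifs are skipped
            regions_per_tf.setdefault(tf, set()).update(hits)
    return {f"{tf}_({len(regions)}r)": sorted(regions)
            for tf, regions in regions_per_tf.items()}


hits = {  # hypothetical motifs and toy region names
    "motif_1": ["chr1:10-20", "chr2:30-40"],
    "motif_2": ["chr2:30-40", "chr3:50-60"],
    "unannotated_motif": ["chr4:70-80"],
}
annotation = {"motif_1": ["Cebpa", "Cebpb"], "motif_2": ["Cebpa"]}
cistromes = form_cistromes(hits, annotation)
print(cistromes)
# {'Cebpa_(3r)': ['chr1:10-20', 'chr2:30-40', 'chr3:50-60'], 'Cebpb_(2r)': ['chr1:10-20', 'chr2:30-40']}
```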
You can easily export cistromes to a bed file:

In [11]:
from pycistarget.utils import *
cebpa_cistrome = cistarget_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].cistromes['Region_set']['Cebpa_(2809r)']
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(cebpa_cistrome))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/cisTarget/cebpa_cistrome_example.bed')

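If PyRanges is not at hand, the same export can be sketched with the standard library only: each `chrom:start-end` name is split into the three BED columns. The helper `write_regions_as_bed` is a hypothetical stand-in for `region_names_to_coordinates` plus `to_bed`, not their implementation; it assumes names follow the `chrom:start-end` pattern shown above. With a real file, pass the handle from `open('cebpa_cistrome_example.bed', 'w')` instead of the in-memory buffer.

```python
import io
from typing import Iterable, TextIO


def write_regions_as_bed(region_names: Iterable[str], handle: TextIO) -> int:
    """Write 'chrom:start-end' names as tab-separated BED lines; return the count."""
    n = 0
    for name in region_names:
        chrom, span = name.rsplit(":", 1)  # split on the last ':' only
        start, end = span.split("-")
        handle.write(f"{chrom}\t{int(start)}\t{int(end)}\n")
        n += 1
    return n


buffer = io.StringIO()
n_written = write_regions_as_bed(["chr7:88310722-88311223", "chr4:132078352-132078853"], buffer)
print(buffer.getvalue(), end="")
```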
## 2. DEM

### A. Creating your DEM databases

To run **DEM** you will need to provide a **CRM scores database** (that is, a feather file with a dataframe with motifs as rows, genomic regions as columns and their cis-regulatory module (CRM) score (Frith et al., 2003) as values). We provide those databases for human (hg38, hg19), mouse (mm10, mm9) and fly (dm3, dm6) at https://resources.aertslab.org/cistarget/. 

In addition, **if you want to use other regions or genomes to build your databases**, we provide a step-by-step tutorial and scripts at https://github.com/aertslab/create_cisTarget_databases. The steps are the same as for creating a cisTarget database, without running the last step for ranking the regions. Below you can find the basic steps to do so:

In [12]:
%%bash
#### Variables (bash assignments take no spaces around '=')
genome_fasta='PATH_TO_GENOME_FASTA'
region_bed='PATH_TO_BED_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
region_fasta='PATH_TO_FASTA_FILE_WITH_GENOMIC_REGIONS_FOR_DATABASE'
database_suffix='SUFFIX_FOR_DATABASE_FILE'
path_to_motif_collection='PATH_TO_MOTIF_COLLECTION_IN_CLUSTER_BUSTER_FORMAT'
motif_list='PATH_TO_FILE_WITH_MOTIFS_TO_SCORE'
n_cpu='NUMBER_OF_CORES'
#### Get fasta sequences
module load BEDTools # In our system, load BEDTools
bedtools getfasta -fi "${genome_fasta}" -bed "${region_bed}" > "${region_fasta}"
#### Activate environment
my_conda_initialize # In our system, initialize conda
conda activate /staging/leuven/stg_00002/lcb/ghuls/software/miniconda3/envs/create_cistarget_databases
#### Set ${create_cistarget_databases_dir} to your local copy of https://github.com/aertslab/create_cisTarget_databases
create_cistarget_databases_dir='/staging/leuven/stg_00002/lcb/ghuls/software/create_cisTarget_databases'
#### Score the motifs (no ranking step is needed for DEM)
"${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
    -f "${region_fasta}" \
    -M "${path_to_motif_collection}" \
    -m "${motif_list}" \
    -o "${database_suffix}" \
    -t "${n_cpu}" \
    -l \
    -s 555

### B. Running DEM

The most relevant parameters for running DEM are:

- **dem_db**: Path to the DEM database to use, or a preloaded DEMDatabase object (loaded using the same region sets that will be analyzed).
- **region_sets**: The input region sets (a dictionary of PyRanges objects).
- **specie**: Species to which the region coordinates and the database belong. To annotate motifs to TFs using the cisTarget annotations, possible values are 'mus_musculus', 'homo_sapiens' or 'drosophila_melanogaster'. For any other value, motifs will not be annotated to TFs unless a custom annotation is provided.
- **contrasts**: Type of contrast to perform. If 'Other', background regions are taken from the other region sets; if 'Shuffle', the background consists of the scores on shuffled input sequences. You can also provide a list specifying the contrasts to make. We will show examples of these modalities below. When using 'Shuffle', the Cluster-Buster path (`cluster_buster_path`), the genome FASTA (`path_to_genome_fasta`) and the path to the folder with the motifs to score in Cluster-Buster format (`path_to_motifs`) have to be provided.
- **fraction_overlap**: Minimum overlap fraction (in any direction) to map input regions to regions in the database. Default: 0.4.
- **max_bg_regions**: Maximum number of background regions to use. Default: None (all regions).
- **adjpval_thr**: Maximum adjusted p-value to select motifs. Default: 0.05
- **log2fc_thr**: Minimum log2 fold change between the region set and the background to consider the motif differentially enriched. Default: 1.
- **mean_fg_thr**: Minimum mean CRM value in the foreground (region set) to consider the motif differentially enriched. Default: 0
- **motif_hit_thr**: Minimum CRM value to consider a region a motif hit. If None (default), an optimal threshold will be calculated per motif by comparing foreground and background.
- **annotation_version** : Motif collection version. Here we use the clustered v10 database ('v10nr_clust').
- **path_to_motif_annotations** : File with motif annotations. These files are available at https://resources.aertslab.org/cistarget/motif2tf . 
- **motif_annotation**: Annotation to use to form the cistromes. Here we will only use the direct and orthology annotation as example. Default: ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot']
- **n_cpu**: Number of CPUs to use during calculations.

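For intuition on `log2fc_thr` and `adjpval_thr`: the `Log2FC` values in the DEM tables of this tutorial are consistent with log2(Mean_fg / Mean_bg), e.g. log2(0.480499 / 0.040211) ≈ 3.579. The sketch below computes that ratio and applies a Benjamini-Hochberg correction to per-motif p-values. The statistical test behind the raw p-values and the correction pycisTarget actually uses are not described here, so the p-values are made-up inputs and Benjamini-Hochberg is an assumption; the inclusive comparisons are assumed too.

```python
import math
from typing import List


def log2_fold_change(mean_fg: float, mean_bg: float) -> float:
    """log2 ratio of the mean CRM score in the foreground vs. the background."""
    return math.log2(mean_fg / mean_bg)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# motif: (mean_fg, mean_bg, raw p-value). The means of motif_1 and motif_2 are copied
# from the Cebpa DEM tables; motif_3 and all p-values are made up for illustration.
motifs = {
    "motif_1": (0.480499, 0.040211, 0.0001),
    "motif_2": (2.547318, 0.584035, 0.0002),
    "motif_3": (0.90, 0.80, 0.04),
}
names = list(motifs)
adj = benjamini_hochberg([motifs[n][2] for n in names])
selected = [
    n for n, a in zip(names, adj)
    if a <= 0.05 and log2_fold_change(motifs[n][0], motifs[n][1]) >= 1  # adjpval_thr, log2fc_thr
]
print(selected)  # ['motif_1', 'motif_2']
```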
In [6]:
# Load DEM functions
from pycistarget.motif_enrichment_dem import *

In [7]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 0,
    motif_hit_thr = None,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    n_cpu = 4,
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:15:22,876 DEM          INFO     Reading DEM database
2022-08-04 09:17:26,334 DEM          INFO     Creating contrast groups
[2m[36m(DEM_internal_ray pid=1603)[0m 2022-08-04 09:17:33,557 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=1605)[0m 2022-08-04 09:17:33,648 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=1606)[0m 2022-08-04 09:17:33,672 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=1604)[0m 2022-08-04 09:17:33,791 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
2022-08-04 09:17:46,089 DEM          INFO     Forming cistromes
2022-08-04 09:17:46,411 DEM          INFO     Done!

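With `contrasts = 'Other'`, each region set is compared against regions from the remaining sets, capped at `max_bg_regions` (the "Creating contrast groups" step in the log). The sketch below shows one way to assemble such contrast groups. Random sampling with a fixed seed, excluding regions shared with the foreground, the helper `other_contrasts` and the toy region names are assumptions for illustration, not a description of pycisTarget's internals.

```python
import random
from typing import Dict, List, Optional, Tuple


def other_contrasts(region_sets: Dict[str, List[str]], max_bg_regions: Optional[int] = None,
                    seed: int = 0) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {name: (foreground, background)} using all other sets as background."""
    rng = random.Random(seed)
    contrasts = {}
    for name, fg in region_sets.items():
        fg_set = set(fg)
        # Regions from the other sets, excluding any also present in the foreground
        bg = sorted({r for other, regions in region_sets.items() if other != name
                     for r in regions if r not in fg_set})
        if max_bg_regions is not None and len(bg) > max_bg_regions:
            bg = rng.sample(bg, max_bg_regions)  # cap the background size
        contrasts[name] = (list(fg), bg)
    return contrasts


toy_sets = {
    "Cebpa": ["chr1:1-10", "chr1:20-30"],
    "Foxa1": ["chr2:1-10", "chr1:20-30"],
    "Hnf4a": ["chr3:1-10", "chr3:20-30", "chr3:40-50"],
}
groups = other_contrasts(toy_sets, max_bg_regions=2)
for name, (fg, bg) in groups.items():
    print(name, len(fg), len(bg))
```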

In [14]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_B.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

### C. Exploring DEM results

We can load the results for exploration. 

In [15]:
# Load
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_B.pkl', 'rb') as infile:
    DEM_dict = pickle.load(infile)

To visualize motif enrichment results, we can use the `DEM_results()` method:

In [16]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| taipale_tf_pairs__ATF4_TEF_RNMTGATGCAATN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Tef, Atf4 | 3.57887 | 1.7e-05 | 0.480499 | 0.040211 | 1.15 | 496.0 |
| taipale_tf_pairs__CEBPG_ATF4_NNATGAYGCAAT_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpg, Atf4 | 3.574535 | 1e-06 | 0.522313 | 0.043842 | 1.5 | 521.0 |
| taipale_tf_pairs__GCM1_CEBPB_MTRSGGGNNNNNTTRCGYAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Gcm1, Cebpb | 3.409684 | 0.033062 | 0.172734 | 0.016254 | 0.487 | 343.0 |
| taipale_tf_pairs__TEAD4_CEBPD_NTTRCGYAANNNNNNRGWATGY_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Tead4, Cebpd | 3.361205 | 0.0 | 0.420022 | 0.040874 | 1.55 | 493.0 |
| tfdimers__MD00123 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | E2f1, Sox17 | 2.677306 | 8e-06 | 0.536723 | 0.083908 | 2.15 | 476.0 |
| taipale_tf_pairs__TEAD4_CEBPD_NTTRCGYAANNNNNNNRGWATGY_CAP_repr | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Tead4, Cebpd | 2.605569 | 3.2e-05 | 0.32604 | 0.053569 | 2.45 | 239.0 |
| taipale_tf_pairs__TEAD4_CEBPD_RGWATGYNNTTRCGYAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Tead4, Cebpd | 2.596006 | 0.0 | 0.64943 | 0.107413 | 0.395 | 1433.0 |
| taipale_tf_pairs__ERF_CEBPD_RSMGGAANTTGCGYAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpd, Erf | 2.277135 | 0.034172 | 0.243821 | 0.050302 | 1.04 | 310.0 |
| tfdimers__MD00288 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hmga1b, Sry, Hmga2 | 2.157576 | 0.009955 | 0.300005 | 0.067241 | 2.42 | 235.0 |
| taipale_tf_pairs__FLI1_CEBPD_RNCGGANNTTGCGCAAN_CAP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Fli1, Cebpd | 2.149682 | 0.000405 | 0.325289 | 0.073308 | 1.29 | 386.0 |


This table can also easily be exported to an HTML file:

In [17]:
out_file = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/Cebpa_motif_enricment.html'
# Use a context manager so the HTML file is flushed and closed
with open(out_file, 'w') as f:
    DEM_dict.motif_enrichment['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].to_html(f, escape=False, col_space=80)

You can also access the regions enriched for each motif. `motif_hits` (and likewise `cistromes`) has two entries: 'Region_set' contains the coordinates as in the input regions, and 'Database' contains the coordinates as in the database:

In [18]:
DEM_dict.motif_hits['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['homer__ATTGCGCAAC_CEBP'][0:10]

['chr4:53196410-53196911',
 'chr9:95477249-95477750',
 'chr17:53580191-53580692',
 'chr1:106267982-106268483',
 'chr5:99283569-99284070',
 'chr8:22054603-22055104',
 'chr5:102537694-102538195',
 'chr4:48132714-48133215',
 'chr4:156124035-156124536',
 'chr15:59643719-59644220']

To access cistromes (only available if motifs have been annotated):

In [19]:
DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(3360r)'][0:10]

['chr4:53196410-53196911',
 'chr9:25570286-25570787',
 'chr9:95477249-95477750',
 'chr1:106267982-106268483',
 'chr5:99283569-99284070',
 'chr4:76344051-76344552',
 'chr8:22054603-22055104',
 'chr17:53580191-53580692',
 'chr5:102537694-102538195',
 'chr12:7978369-7978870']

How many regions does this cistrome contain? We will compare how this number changes with different settings below:

In [20]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(3360r)'])

3360

You can easily export cistromes to a bed file:

In [21]:
from pycistarget.utils import *
cebpa_cistrome = DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(3360r)']
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(cebpa_cistrome))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/cebpa_cistrome_example.bed')

### D. Advanced usage

#### 1. Thresholding on the mean foreground signal

You may have noticed above some motifs with high log2FC values but low signal in both foreground and background. To filter them out, you can set a threshold on the mean CRM value in the foreground with `mean_fg_thr`. Here we will set it to 1:

In [8]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = None,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:18:24,202 DEM          INFO     Reading DEM database
2022-08-04 09:18:47,321 DEM          INFO     Creating contrast groups
[2m[36m(DEM_internal_ray pid=3083)[0m 2022-08-04 09:18:54,925 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=3084)[0m 2022-08-04 09:18:54,911 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=3085)[0m 2022-08-04 09:18:54,941 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=3086)[0m 2022-08-04 09:18:55,092 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
2022-08-04 09:19:07,185 DEM          INFO     Forming cistromes
2022-08-04 09:19:07,446 DEM          INFO     Done!


You will now observe that these motifs are gone:

In [23]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | | 2.124851 | 0.0 | 2.547318 | 0.584035 | 2.38 | 2299.0 |
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Hlf, Cebpd, Cebpe, Cebpg, Cebpb, Cebpa | Hes2, Cebpe, Cebpd, Ep300, Cebpg, Gatad2a, Dbp, Cebpb, Cebpa | 2.058157 | 0.0 | 2.910528 | 0.698883 | 1.98 | 3165.0 |
| dbtfbs__HLF_HepG2_ENCSR528PSI_merged_N1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hlf | 1.885549 | 0.0 | 1.106676 | 0.299512 | 1.8 | 1293.0 |
| swissregulon__hs__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpb | 1.869025 | 0.0 | 1.894034 | 0.518508 | 1.17 | 2173.0 |
| transfac_pro__M04761 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hsf1 | 1.644374 | 0.0 | 1.911942 | 0.611602 | 0.887 | 2782.0 |
| metacluster_156.2 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Ddit3, Cebpg, Atf4 | Ddit3, Atf3, Cebpg, Atf4, Myc | 1.639006 | 0.0 | 1.453472 | 0.466677 | 1.9 | 1389.0 |
| metacluster_46.5 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hlf, Tef | 1.586788 | 0.0 | 1.297947 | 0.432102 | 1.55 | 1661.0 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 1.483739 | 0.0 | 2.501856 | 0.894566 | 2.16 | 2614.0 |
| metacluster_156.3 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Dbp, Hlf, Tef, Nfil3 | Gm4125, Hlf, Tef, Dbp, Nfil3 | 1.436273 | 0.0 | 1.31915 | 0.487453 | 1.33 | 1852.0 |
| swissregulon__mm__Cebpe | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 1.434853 | 0.0 | 2.007923 | 0.742699 | 1.81 | 2326.0 |

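The effect of `mean_fg_thr` can be reproduced on a few rows of the two Cebpa tables above: a motif is dropped when its mean foreground CRM score falls below the threshold, even if its log2FC is high. This sketch covers that filter only (assuming inclusive `>=` comparisons and the hypothetical helper `filter_by_mean_fg`); the values are copied from the tables, and the other DEM criteria, such as the adjusted p-value, are not re-evaluated.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (motif, Log2FC, Mean_fg)

rows: List[Row] = [  # copied from the Cebpa DEM tables above
    ("taipale_tf_pairs__ATF4_TEF_RNMTGATGCAATN_CAP", 3.57887, 0.480499),
    ("tfdimers__MD00123", 2.677306, 0.536723),
    ("taipale_tf_pairs__TEAD4_CEBPD_RGWATGYNNTTRCGYAAN_CAP", 2.596006, 0.64943),
    ("homer__ATTGCGCAAC_CEBP", 2.124851, 2.547318),
    ("metacluster_46.4", 2.058157, 2.910528),
]


def filter_by_mean_fg(rows: List[Row], mean_fg_thr: float, log2fc_thr: float = 1.0) -> List[str]:
    """Keep motifs passing both the log2FC and the mean-foreground thresholds."""
    return [motif for motif, log2fc, mean_fg in rows
            if log2fc >= log2fc_thr and mean_fg >= mean_fg_thr]


print(filter_by_mean_fg(rows, mean_fg_thr=0))  # all five motifs
print(filter_by_mean_fg(rows, mean_fg_thr=1))  # ['homer__ATTGCGCAAC_CEBP', 'metacluster_46.4']
```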

The Cebpa cistrome has the same length:

In [24]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(3360r)'])

3360

And save this object:

In [25]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_D1.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

#### 2. Using a fixed threshold for the motif hits

You may also have noticed that DEM cistromes are larger than those obtained with Homer or cisTarget. This largely depends on your background: cistromes are formed by the regions that are more enriched for a motif compared to that background. You can also set a fixed CRM score threshold for a region to count as a motif hit with `motif_hit_thr`. Here we will set it to 3.

In [9]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = 3,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:19:30,235 DEM          INFO     Reading DEM database
2022-08-04 09:19:53,620 DEM          INFO     Creating contrast groups
[2m[36m(DEM_internal_ray pid=19721)[0m 2022-08-04 09:20:01,295 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=19720)[0m 2022-08-04 09:20:01,401 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=19722)[0m 2022-08-04 09:20:01,378 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=19723)[0m 2022-08-04 09:20:01,544 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
2022-08-04 09:20:13,338 DEM          INFO     Forming cistromes
2022-08-04 09:20:13,567 DEM          INFO     Done!


You will notice now that the number of motif hits per motif is generally lower.

In [27]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb | | 2.124851 | 0.0 | 2.547318 | 0.584035 | 3.0 | 1940.0 |
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Hlf, Cebpd, Cebpe, Cebpg, Cebpb, Cebpa | Hes2, Cebpe, Cebpd, Ep300, Cebpg, Gatad2a, Dbp, Cebpb, Cebpa | 2.058157 | 0.0 | 2.910528 | 0.698883 | 3.0 | 2340.0 |
| dbtfbs__HLF_HepG2_ENCSR528PSI_merged_N1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hlf | 1.885549 | 0.0 | 1.106676 | 0.299512 | 3.0 | 780.0 |
| swissregulon__hs__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Cebpb | 1.869025 | 0.0 | 1.894034 | 0.518508 | 3.0 | 1379.0 |
| transfac_pro__M04761 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hsf1 | 1.644374 | 0.0 | 1.911942 | 0.611602 | 3.0 | 1377.0 |
| metacluster_156.2 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Ddit3, Cebpg, Atf4 | Ddit3, Atf3, Cebpg, Atf4, Myc | 1.639006 | 0.0 | 1.453472 | 0.466677 | 3.0 | 846.0 |
| metacluster_46.5 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | | Hlf, Tef | 1.586788 | 0.0 | 1.297947 | 0.432102 | 3.0 | 775.0 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 1.483739 | 0.0 | 2.501856 | 0.894566 | 3.0 | 1911.0 |
| metacluster_156.3 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Dbp, Hlf, Tef, Nfil3 | Gm4125, Hlf, Tef, Dbp, Nfil3 | 1.436273 | 0.0 | 1.31915 | 0.487453 | 3.0 | 689.0 |
| swissregulon__mm__Cebpe | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe | | 1.434853 | 0.0 | 2.007923 | 0.742699 | 3.0 | 1315.0 |

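Why a fixed `motif_hit_thr` shrinks the hit lists: a region counts as a motif hit when its CRM score reaches the threshold, so raising the cutoff (here from per-motif values such as 1.98 or 2.38 to 3) can only remove regions. A toy sketch with made-up scores and a hypothetical helper `motif_hits`, assuming an inclusive comparison; it does not reproduce how pycisTarget picks the per-motif threshold when `motif_hit_thr = None`.

```python
from typing import Dict, List


def motif_hits(crm_scores: Dict[str, float], threshold: float) -> List[str]:
    """Regions whose CRM score for one motif is >= threshold (assumed inclusive)."""
    return [region for region, score in crm_scores.items() if score >= threshold]


scores = {  # made-up CRM scores of one motif in five toy foreground regions
    "chr1:100-601": 4.1,
    "chr2:200-701": 2.5,
    "chr3:300-801": 3.0,
    "chr4:400-901": 1.9,
    "chr5:500-1001": 0.7,
}
print(len(motif_hits(scores, 1.98)), len(motif_hits(scores, 3)))  # 3 2
```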

The cistromes are smaller too:

In [28]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(2488r)'])

2488

Let's save this object:

In [29]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_D2.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

#### 3. Using a shuffled background

If you do not have a natural background (for example, when you only have a single ChIP-seq experiment), you can use shuffled versions of your input regions as background by setting `contrasts = 'Shuffle'`. Cluster-Buster must be installed for this option, because the shuffled sequences have to be scored (see the `Scoring sequences` messages in the log).
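Conceptually, a shuffled background keeps the nucleotide composition of each input region while destroying the order, and therefore the motif instances. As a minimal sketch, assuming a simple mononucleotide shuffle (pycisTarget's actual shuffling procedure may differ, for example by preserving dinucleotide frequencies), the hypothetical helpers below shuffle one sequence and check that its GC content is unchanged:

```python
import random


def shuffle_sequence(seq, seed=None):
    """Mononucleotide shuffle: same letters (hence same GC content), random order."""
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)
    return ''.join(letters)


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq)


region = 'ATTGCGCAACTTGCGCAATA'
shuffled = shuffle_sequence(region, seed=42)
print(shuffled, gc_content(region), gc_content(shuffled))
```

Because the letters are only permuted, the background matches the foreground in base composition, so motif differences are less likely to reflect GC bias alone.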

In [10]:
# Assign through os.environ (os.putenv does not update os.environ); avoid a trailing ':' in PATH
os.environ['CBUST_HOME'] = '/data/leuven/software/biomed/skylake_centos7/2018a/software/Cluster-Buster/20220421-GCCcore-6.4.0'
os.environ['PATH'] += os.pathsep + '/data/leuven/software/biomed/skylake_centos7/2018a/software/Cluster-Buster/20220421-GCCcore-6.4.0/bin'
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Shuffle',
    name = 'DEM',
    max_bg_regions = 100,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 2.5, #You may need to increase the detection threshold here, otherwise you may see a lot of G repeats
    n_cpu = 4,
    fraction_overlap = 0.4,
    cluster_buster_path = '/data/leuven/software/biomed/skylake_centos7/2018a/software/Cluster-Buster/20220421-GCCcore-6.4.0/bin/cbust',
    path_to_genome_fasta = '/staging/leuven/res_00001/genomes/mus_musculus/mm10_ucsc/fasta/mm10.fa',
    path_to_motifs = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/cluster_buster/',
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:20:27,317 DEM          INFO     Reading DEM database
2022-08-04 09:20:52,439 DEM          INFO     Creating contrast groups
2022-08-04 09:20:52,443 DEM          INFO     Generating and scoring shuffled background
2022-08-04 09:20:58,295 Cluster-Buster INFO     Scoring sequences
2022-08-04 09:22:07,487 Cluster-Buster INFO     Done!
2022-08-04 09:22:07,543 DEM          INFO     Generating and scoring shuffled background
2022-08-04 09:22:12,910 Cluster-Buster INFO     Scoring sequences
2022-08-04 09:22:41,457 Cluster-Buster INFO     Done!
2022-08-04 09:22:41,512 DEM          INFO     Generating and scoring shuffled background
2022-08-04 09:22:46,153 Cluster-Buster INFO     Scoring sequences
2022-08-04 09:23:12,511 Cluster-Buster INFO     Done!
2022-08-04 09:23:12,567 DEM          INFO     Generating and scoring shuffled background
2022-08-04 09:23:15,634 Cluster-Buster INFO     Scoring sequences
2022-08-04 09:23:43,120 Cluster-Buster INFO     Done!
[2m[36m(DEM_internal_ray

Let's see the results now:

In [11]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb |  | 2.532608 | 0.0 | 2.547322 | 0.440243 | 1.56 | 2771.0 |
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpg, Cebpa, Cebpb, Cebpd, Cebpe, Hlf | Ep300, Cebpg, Cebpa, Cebpb, Hes2, Gatad2a, Cebpd, Dbp, Cebpe | 2.422879 | 0.0 | 2.910528 | 0.542766 | 1.6 | 3416.0 |
| transfac_pro__M12588 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Ddit3 | 1.860392 | 0.0 | 2.645379 | 0.728541 | 1.43 | 3269.0 |
| transfac_pro__M09737 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Zfp644 | 1.824506 | 0.0 | 2.768588 | 0.781677 | 1.88 | 2792.0 |
| swissregulon__hs__EZH2 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Ezh2 | 1.775441 | 0.0 | 2.676924 | 0.781943 | 1.58 | 3074.0 |
| hocomoco__SMAD3_HUMAN.H11MO.0.B | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Smad3 | 1.712154 | 0.0 | 2.745296 | 0.837876 | 2.06 | 2667.0 |
| transfac_pro__M12659 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Rora | 1.707419 | 0.0 | 2.525895 | 0.773448 | 1.31 | 3244.0 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe |  | 1.639925 | 0.0 | 2.501861 | 0.80278 | 1.41 | 3217.0 |
| transfac_pro__M01721 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Pura |  | 1.609624 | 0.0 | 2.591059 | 0.849048 | 1.55 | 3244.0 |
| swissregulon__hs__CUX1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cux1 | 1.588562 | 0.0 | 2.901514 | 0.964761 | 1.74 | 3205.0 |


With the shuffled background, the Cebpa cistrome is larger than in the previous run (3378 versus 2488 regions):

In [13]:
len(DEM_dict.cistromes['Region_set']['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']['Cebpa_(3378r)'])

3378

Let's save this object:

In [15]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_D3.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

#### 4. Specifying contrasts

You can also define specific contrasts between region sets by passing a list to `contrasts`. Each element of the list is one contrast, written as two lists of region set names: the first list is the foreground and the second the background. For example, here we run two contrasts: 1) Cebpa versus Onecut1 and 2) Cebpa versus Onecut1 and Hnf4a together.
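Writing the full region set names by hand is error-prone. The hypothetical helpers below (not part of pycisTarget) expand short TF names into the nested-list format described above and rebuild the contrast label used to retrieve results. The label format, `<foreground>_VS_<background 1>_<background 2>`, is inferred from the DEM log messages in this tutorial and may not hold in every case.

```python
def build_contrasts(region_set_names, pairs):
    """Expand [(foreground TFs, background TFs), ...] into DEM-style contrasts.

    Hypothetical helper: a TF name matches the region set whose name starts
    with '<TF>_', as with the file names used in this tutorial.
    """
    def full_name(tf):
        matches = [n for n in region_set_names if n.startswith(tf + '_')]
        if len(matches) != 1:
            raise ValueError(f'{tf!r} matches {len(matches)} region sets')
        return matches[0]

    return [[[full_name(tf) for tf in fg], [full_name(tf) for tf in bg]]
            for fg, bg in pairs]


def contrast_name(contrast):
    """Contrast label as it appears in the DEM logs (inferred, not documented)."""
    foreground, background = contrast
    return '_'.join(foreground) + '_VS_' + '_'.join(background)


# Usage with the region sets loaded at the start of the tutorial:
# contrasts = build_contrasts(list(region_sets.keys()),
#                             [(['Cebpa'], ['Onecut1']), (['Cebpa'], ['Onecut1', 'Hnf4a'])])
# DEM_dict.DEM_results(contrast_name(contrasts[0]))
```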

In [16]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = [[['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'], ['Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K']], [['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'], ['Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K', 'Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K']]],
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = 3,
    n_cpu = 4,
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:24:59,453 DEM          INFO     Reading DEM database
2022-08-04 09:25:21,915 DEM          INFO     Creating contrast groups
[2m[36m(DEM_internal_ray pid=24473)[0m 2022-08-04 09:25:29,749 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=24472)[0m 2022-08-04 09:25:29,796 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K_Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
2022-08-04 09:25:40,199 DEM          INFO     Forming cistromes
2022-08-04 09:25:40,293 DEM          INFO     Done!


Let's look at the results of the Cebpa versus Onecut1 contrast:

In [17]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| dbtfbs__HLF_HepG2_ENCSR528PSI_merged_N1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Hlf | 2.210142 | 0.0 | 1.106676 | 0.239167 | 3.0 | 780.0 |
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpg, Cebpa, Cebpb, Cebpd, Cebpe, Hlf | Ep300, Cebpg, Cebpa, Cebpb, Hes2, Gatad2a, Cebpd, Dbp, Cebpe | 2.018376 | 0.0 | 2.910528 | 0.718423 | 3.0 | 2340.0 |
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpb |  | 1.823287 | 0.0 | 2.547322 | 0.719813 | 3.0 | 1940.0 |
| metacluster_46.5 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Tef, Hlf | 1.760758 | 0.0 | 1.297947 | 0.383015 | 3.0 | 775.0 |
| swissregulon__hs__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Cebpb | 1.74152 | 0.0 | 1.894033 | 0.566419 | 3.0 | 1379.0 |
| transfac_pro__M04761 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K |  | Hsf1 | 1.720374 | 0.0 | 1.911941 | 0.580217 | 3.0 | 1377.0 |
| metacluster_156.2 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Atf4, Ddit3, Cebpg | Cebpg, Myc, Atf4, Ddit3, Atf3 | 1.665894 | 0.0 | 1.45347 | 0.45806 | 3.0 | 846.0 |
| metacluster_156.3 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Dbp, Tef, Nfil3, Hlf | Gm4125, Nfil3, Dbp, Tef, Hlf | 1.62854 | 0.0 | 1.319148 | 0.426633 | 3.0 | 689.0 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpe |  | 1.514219 | 0.0 | 2.501861 | 0.875866 | 3.0 | 1911.0 |
| swissregulon__mm__Cebpe | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K_VS_Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K | Cebpe |  | 1.40163 | 0.0 | 2.007918 | 0.76 | 3.0 | 1315.0 |


Let's save this object:

In [18]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_D4.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

#### 5. Balancing promoter content

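Finally, you can balance the proportion of promoter regions between foreground and background, so that signal coming from promoter sequences is not overrepresented. To do so, pass a gene annotation with transcription start sites as `genome_annotation`, together with the promoter window size (`promoter_space`).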
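pycisTarget performs the balancing internally once `genome_annotation` and `promoter_space` are given. To illustrate the idea only, here is a minimal pure-Python sketch, assuming that a region counts as a promoter when it overlaps a TSS ± `promoter_space`, and that balancing means subsampling the background to the promoter fraction of the foreground. The helpers `is_promoter` and `balance_background` are hypothetical, and pycisTarget's actual procedure may differ.

```python
import math
import random


def is_promoter(region, tss_by_chrom, promoter_space):
    """True if a (chrom, start, end) region overlaps any TSS +/- promoter_space."""
    chrom, start, end = region
    return any(start < tss + promoter_space and tss - promoter_space < end
               for tss in tss_by_chrom.get(chrom, ()))


def balance_background(fg, bg, tss_by_chrom, promoter_space=500, seed=0):
    """Subsample bg so that its promoter fraction matches that of fg (fg non-empty)."""
    frac = sum(is_promoter(r, tss_by_chrom, promoter_space) for r in fg) / len(fg)
    bg_prom = [r for r in bg if is_promoter(r, tss_by_chrom, promoter_space)]
    bg_dist = [r for r in bg if not is_promoter(r, tss_by_chrom, promoter_space)]
    # Largest background size that the available promoter/distal regions allow
    limits = []
    if frac > 0:
        limits.append(len(bg_prom) / frac)
    if frac < 1:
        limits.append(len(bg_dist) / (1 - frac))
    n_total = math.floor(min(limits))
    n_prom = min(round(frac * n_total), len(bg_prom))
    n_dist = min(n_total - n_prom, len(bg_dist))
    rng = random.Random(seed)
    return rng.sample(bg_prom, n_prom) + rng.sample(bg_dist, n_dist)


# Toy example: half of the foreground regions are promoters
tss_by_chrom = {'chr1': [1000, 5000]}
fg = [('chr1', 900, 1100), ('chr1', 3000, 3200)]
bg = [('chr1', 950, 1150), ('chr1', 4800, 5000), ('chr2', 0, 200), ('chr2', 500, 700)]
print(balance_background(fg, bg, tss_by_chrom))
```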

In [34]:
# Retrieve the promoter annotation (TSSs of protein-coding transcripts) from BioMart
import pybiomart as pbm
promoter_space = 500
dataset = pbm.Dataset(name='mmusculus_gene_ensembl',  host='http://nov2020.archive.ensembl.org/')
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand', 'external_gene_name', 'transcript_biotype'])
annot.columns = ['Chromosome', 'Start', 'Strand', 'Gene', 'Transcript_type']
annot['Chromosome'] = annot['Chromosome'].astype('str')
# Drop patches (CHR_*), unplaced scaffolds (GL*, JH*) and the mitochondrial genome
filterf = annot['Chromosome'].str.contains('CHR|GL|JH|MT')
annot = annot[~filterf].copy()
# Add the UCSC-style 'chr' prefix (e.g. '1' -> 'chr1'); regex=True is required in pandas >= 2.0
annot['Chromosome'] = annot['Chromosome'].str.replace(r'(\b\S)', r'chr\1', regex=True)
annot = annot[annot.Transcript_type == 'protein_coding']
annot = annot.dropna(subset = ['Chromosome', 'Start'])

In [39]:
DEM_dict = DEM(dem_db = '/staging/leuven/stg_00002/icistarget-data/make_rankings/v10_clust/CTX_mm10/CTX_mm10_SCREEN3_no_bg_with_mask/CTX_mm10_SCREEN3_no_bg_with_mask.regions_vs_motifs.scores.v2.feather',
    region_sets = region_sets,
    specie = 'mus_musculus',
    contrasts = 'Other',
    name = 'DEM',
    fraction_overlap = 0.4,
    max_bg_regions = 500,
    adjpval_thr = 0.05,
    log2fc_thr = 1,
    mean_fg_thr = 1,
    motif_hit_thr = None,
    genome_annotation= annot, # Add genome_annotation
    promoter_space = promoter_space, # Defined in the previous cell (500)
    cluster_buster_path = None,
    path_to_genome_fasta = None,
    path_to_motifs = None,
    annotation_version = 'v10nr_clust',
    path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/cbravo/cluster_motif_collection_V10_no_desso_no_factorbook/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
    motif_annotation = ['Direct_annot', 'Orthology_annot'],
    n_cpu = 4,
    tmp_dir = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget/tmp',
    _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

2022-08-04 09:39:55,535 DEM          INFO     Reading DEM database
2022-08-04 09:40:17,093 DEM          INFO     Creating contrast groups
[2m[36m(DEM_internal_ray pid=33197)[0m 2022-08-04 09:40:26,597 DEM          INFO     Computing DEM for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=33194)[0m 2022-08-04 09:40:26,722 DEM          INFO     Computing DEM for Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=33195)[0m 2022-08-04 09:40:26,721 DEM          INFO     Computing DEM for Foxa1_ERR235786_summits_order_by_score_extended_250bp_top5K
[2m[36m(DEM_internal_ray pid=33196)[0m 2022-08-04 09:40:26,824 DEM          INFO     Computing DEM for Hnf4a_ERR235763_summits_order_by_score_extended_250bp_top5K
2022-08-04 09:40:37,990 DEM          INFO     Forming cistromes
2022-08-04 09:40:38,190 DEM          INFO     Done!


Let's look at the Cebpa results, now with promoter content balanced between foreground and background:

In [40]:
DEM_dict.DEM_results('Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K')

| Motif | Contrast | Direct_annot | Orthology_annot | Log2FC | Adjusted_pval | Mean_fg | Mean_bg | Motif_hit_thr | Motif_hits |
|---|---|---|---|---|---|---|---|---|---|
| dbtfbs__HLF_HepG2_ENCSR528PSI_merged_N1 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Hlf | 2.195518 | 0.0 | 1.106676 | 0.241604 | 2.25 | 1058.0 |
| homer__ATTGCGCAAC_CEBP | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpb |  | 2.173246 | 0.0 | 2.547322 | 0.56477 | 2.4 | 2284.0 |
| metacluster_46.4 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpg, Cebpa, Cebpb, Cebpd, Cebpe, Hlf | Ep300, Cebpg, Cebpa, Cebpb, Hes2, Gatad2a, Cebpd, Dbp, Cebpe | 2.116813 | 0.0 | 2.910528 | 0.671039 | 1.9 | 3225.0 |
| transfac_pro__M04761 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Hsf1 | 1.891702 | 0.0 | 1.911941 | 0.515247 | 1.06 | 2598.0 |
| swissregulon__hs__CEBPB | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Cebpb | 1.859936 | 0.0 | 1.894033 | 0.521784 | 2.15 | 1760.0 |
| metacluster_46.5 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K |  | Tef, Hlf | 1.847229 | 0.0 | 1.297947 | 0.360733 | 2.03 | 1364.0 |
| metacluster_156.3 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Dbp, Tef, Nfil3, Hlf | Gm4125, Nfil3, Dbp, Tef, Hlf | 1.667336 | 0.0 | 1.319148 | 0.415313 | 0.984 | 2184.0 |
| metacluster_156.2 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Atf4, Ddit3, Cebpg | Cebpg, Myc, Atf4, Ddit3, Atf3 | 1.575318 | 0.0 | 1.45347 | 0.48774 | 1.68 | 1523.0 |
| cisbp__M01815 | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe |  | 1.504281 | 0.0 | 2.501861 | 0.88192 | 1.83 | 2843.0 |
| swissregulon__mm__Cebpe | Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K | Cebpe |  | 1.482787 | 0.0 | 2.007918 | 0.718427 | 1.65 | 2475.0 |


Let's save this object:

In [41]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/DEM/DEM_dict_D5.pkl', 'wb') as f:
  pickle.dump(DEM_dict, f)

## 3. Homer

First we need to load the functions needed for Homer:

In [42]:
# Load homer functions
from pycistarget.motif_enrichment_homer import *

### A. Running Homer

The most relevant parameters for running Homer are:

- **homer_path**: Path to the directory containing the Homer executables. Homer must also be accessible through your `PATH` environment variable.
- **region_sets**: The input sets of regions 
- **outdir**: Output directory
- **genome**: Genome assembly (equivalent to the genome parameter in Homer). Several species and genomes are supported, including human (hg18, hg19, hg38) and mouse (mm8, mm9, mm10), among others. Alternatively, it can be a path to custom genome fasta files.
- **size**: Fragment size to use for motif finding (by default, 'given', which is the whole region).
- **mask**: Whether to mask repeat regions
- **denovo**: Whether to perform de novo motif discovery. This increases the running time considerably. If running de novo motif discovery, you can use MEME (Tomtom) with a motif collection of interest to identify TFs potentially linked to the de novo motifs. If False, Homer is only run with known motifs.
- **length**: Motif length for the de novo motif discovery.
- **n_cpu**: Number of cores to use
- **meme_path**: Path to the directory containing the MEME executables. MEME must also be accessible through your `PATH` environment variable.
- **meme_collection_path** : Path to the motif collection in MEME format. We recommend using the cisTarget motif collection.
- **annotation_version** : Motif collection version. Here we use the unclustered v10 database ('v10').
- **path_to_motif_annotations** : File with motif annotations. These files are available at https://resources.aertslab.org/cistarget/motif2tf . 
- **cistrome_annotation** : Annotations to assign motifs to TFs (direct, and/or by motif similarity or orthology)
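Several of these parameters map directly onto the `findMotifsGenome.pl` call that `run_homer` launches; the log printed in the next cell shows the exact command for a de novo run. The hypothetical helper `homer_command` below (not part of pycisTarget) rebuilds that command from the parameters, which can be handy for rerunning a single region set by hand. It only mirrors the logged de novo call; how `denovo=False` changes the command is not shown in this tutorial and is left out.

```python
import os


def homer_command(homer_path, outdir, genome, name,
                  size='given', length='8,10,12', mask=True):
    """Rebuild the findMotifsGenome.pl call shown in the run_homer log.

    Hypothetical helper reconstructed from the log of a de novo run.
    """
    bed_file = os.path.join(outdir, 'regions_bed', name + '.bed')
    out = os.path.join(outdir, name)
    cmd = [os.path.join(homer_path, 'findMotifsGenome.pl'), bed_file, genome, out,
           '-preparsedDir', out, '-size', str(size), '-len', length]
    if mask:
        cmd.append('-mask')
    cmd.append('-keepFiles')
    return cmd


print(' '.join(homer_command('/opt/homer/bin/', '/tmp/Homer/', 'mm10', 'Cebpa')))
```

Returning a list of arguments rather than a single string makes it easy to pass the command to `subprocess.run` without shell quoting issues.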

In [43]:
# Set correct path to run HOMER
import os
# Assign through os.environ (os.putenv does not update os.environ); avoid a trailing ':' in PATH
os.environ['HOMER_HOME'] = '/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a'
os.environ['PATH'] += os.pathsep + '/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin'
homer_path='/data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin/'
# Choose the output directory for the results
outdir='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/'
# Select your genome
genome='mm10'
# Set correct path to MEME for de novo motif annotation - Only needed if using de novo annotation!
# We have Tomtom installed in our image, so we don't need to add additional paths
meme_collection_path = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/scenicplus_motif_collection.meme'
meme_path='/opt/meme/bin/'
# Run
homer_dict=run_homer(homer_path,
                     region_sets,
                     outdir,
                     genome,
                     size='given',
                     mask=True,
                     denovo=True,
                     length='8,10,12',
                     n_cpu=4,
                     meme_path = meme_path,
                     meme_collection_path = meme_collection_path,
                     annotation_version = 'v10',
                     path_to_motif_annotations = '/staging/leuven/stg_00002/lcb/icistarget/data/motif2tf_project/motif_to_tf_db_data/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl',
                     cistrome_annotation = ['Direct_annot', 'Orthology_annot'],
                     _temp_dir='/scratch/leuven/313/vsc31305/ray_spill')

[2m[36m(homer_ray pid=33838)[0m 2022-08-04 09:41:08,646 Homer        INFO     Running Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K
[2m[36m(homer_ray pid=33838)[0m 2022-08-04 09:41:08,647 Homer        INFO     Running Homer for Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K with /data/leuven/software/biomed/haswell_centos7/2018a/software/HOMER/4.10.4-foss-2018a/bin/findMotifsGenome.pl /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/regions_bed/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K.bed mm10 /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K -preparsedDir /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/Onecut1_ERR235752_summits_order_by_score_extended_250bp_top5K -size given -len 8,10,12 -mask -keepFiles
[2m[36m(homer_ray pid=33839)[0m 2022-08-04 09:41:08,790 H

In [44]:
# Save
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/Homer_dict.pkl', 'wb') as f:
  pickle.dump(homer_dict, f)

### B. Exploring Homer results

We can load the results for exploration. 

In [4]:
# Load the object saved above
import pickle
with open('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/Homer_dict.pkl', 'rb') as f:
  homer_dict = pickle.load(f)

To visualize motif enrichment results, we can use the `homer_results()` function:

In [45]:
homer_results(homer_dict, 'Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K', results='known')

| Rank | Name | P-value | log P-value | q-value (Benjamini) | # Target Sequences with Motif | % of Target Sequences with Motif | # Background Sequences with Motif | % of Background Sequences with Motif |
|---|---|---|---|---|---|---|---|---|
| 1 | CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer | 1e-1326 | -3.054e+03 | 0.0000 | 2719.0 | 59.17% | 5045.7 | 11.28% |
| 2 | HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer | 1e-669 | -1.541e+03 | 0.0000 | 2216.0 | 48.23% | 6249.0 | 13.97% |
| 3 | NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer | 1e-646 | -1.489e+03 | 0.0000 | 1926.0 | 41.92% | 4777.9 | 10.68% |
| 4 | CEBP:AP1(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer | 1e-532 | -1.227e+03 | 0.0000 | 2013.0 | 43.81% | 6161.9 | 13.78% |
| 5 | Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer | 1e-335 | -7.727e+02 | 0.0000 | 947.0 | 20.61% | 2008.9 | 4.49% |
| 6 | Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer | 1e-217 | -5.017e+02 | 0.0000 | 684.0 | 14.89% | 1569.1 | 3.51% |
| 7 | PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer | 1e-94 | -2.176e+02 | 0.0000 | 1548.0 | 33.69% | 9195.6 | 20.56% |
| 8 | HNF4a(NR),DR1/HepG2-HNF4a-ChIP-Seq(GSE25021)/Homer | 1e-86 | -1.996e+02 | 0.0000 | 875.0 | 19.04% | 4230.7 | 9.46% |
| 9 | RARa(NR)/K562-RARa-ChIP-Seq(Encode)/Homer | 1e-82 | -1.908e+02 | 0.0000 | 2930.0 | 63.76% | 22170.5 | 49.58% |


In [46]:
homer_results(homer_dict, 'Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K', results='denovo')

| Rank | P-value | log P-value | % of Targets | % of Background | STD (Bg STD) | Best Match/Details |
|---|---|---|---|---|---|---|
| 1 | 1e-1721 | -3.963e+03 | 62.81% | 9.40% | 55.1bp (152.9bp) | NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer(0.918) |
| 2 | 1e-196 | -4.516e+02 | 22.00% | 7.81% | 123.0bp (154.6bp) | Ddit3::Cebpa/MA0019.1/Jaspar(0.770) |
| 3 | 1e-133 | -3.076e+02 | 38.02% | 21.94% | 116.4bp (147.3bp) | PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer(0.938) |
| 4 | 1e-85 | -1.964e+02 | 38.48% | 25.29% | 129.6bp (152.7bp) | FOXM1(Forkhead)/MCF7-FOXM1-ChIP-Seq(GSE72977)/Homer(0.922) |
| 5 | 1e-50 | -1.152e+02 | 49.84% | 38.95% | 131.9bp (150.3bp) | NFY(CCAAT)/Promoter/Homer(0.860) |
| 6 | 1e-48 | -1.121e+02 | 27.73% | 18.78% | 135.3bp (146.2bp) | Hnf4a/MA0114.3/Jaspar(0.793) |
| 7 | 1e-46 | -1.080e+02 | 46.79% | 36.36% | 134.5bp (150.6bp) | HIC1(Zf)/Treg-ZBTB29-ChIP-Seq(GSE99889)/Homer(0.760) |
| 8 | 1e-46 | -1.061e+02 | 13.56% | 7.43% | 128.4bp (152.5bp) | HNF1b(Homeobox)/PDAC-HNF1B-ChIP-Seq(GSE64557)/Homer(0.893) |
| 9 | 1e-36 | -8.461e+01 | 3.29% | 0.95% | 128.6bp (150.7bp) | Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer(0.838) |


You can also access the regions containing each motif (use `known_motif_hits` for known motifs and `denovo_motif_hits` for de novo motifs):

In [47]:
homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].known_motif_hits['CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer'][0:10]

['chr10:89748570-89749071',
 'chr10:111335980-111336481',
 'chr4:45495781-45496282',
 'chr19:30170213-30170714',
 'chr10:121129224-121129725',
 'chr2:103492434-103492935',
 'chr2:26600492-26600993',
 'chr4:145280844-145281345',
 'chr13:81329746-81330247',
 'chr13:96742830-96743331']

To access the cistromes, use `known_cistromes` for cistromes based on known motifs and `denovo_cistromes` for cistromes based on de novo motifs:

In [52]:
homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].denovo_cistromes['Cebpa_(2886r)'][0:10]

['chr10:89748570-89749071',
 'chr10:111335980-111336481',
 'chr8:70544122-70544623',
 'chr19:30170213-30170714',
 'chr10:121129224-121129725',
 'chr2:103492434-103492935',
 'chr2:26600492-26600993',
 'chr4:145280844-145281345',
 'chr1:193289929-193290430',
 'chr13:81329746-81330247']
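Since motif hits and cistromes are both plain lists of region names, they can be compared with Python sets, for example to see how much the known CEBP motif hits and the de novo Cebpa cistrome overlap. The helper `overlap_summary` below is a hypothetical sketch; it matches region names exactly, which works here because all regions come from the same input set, but it would not detect partially overlapping coordinates.

```python
def overlap_summary(regions_a, regions_b):
    """Compare two lists of region names ('chrom:start-end') by exact name match."""
    a, b = set(regions_a), set(regions_b)
    shared = a & b
    union = a | b
    return {
        'n_a': len(a),
        'n_b': len(b),
        'shared': len(shared),
        'jaccard': len(shared) / len(union) if union else 0.0,
    }


# Usage (needs the homer_dict loaded above):
# res = homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K']
# overlap_summary(res.known_motif_hits['CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer'],
#                 res.denovo_cistromes['Cebpa_(2886r)'])
```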

You can easily export cistromes to a bed file:

In [53]:
from pycistarget.utils import *
cebpa_cistrome_pr = pr.PyRanges(region_names_to_coordinates(homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].denovo_cistromes['Cebpa_(2886r)']))
cebpa_cistrome_pr.to_bed(path='/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycistarget_tutorial/Homer/cebpa_cistrome_example.bed')
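`region_names_to_coordinates` (from `pycistarget.utils`) turns names like `'chr10:89748570-89749071'` into a coordinate table that `pr.PyRanges` accepts. If you want to do the same without pycisTarget, a minimal stand-in could look like the following. `region_names_to_columns` is a hypothetical helper that assumes every name has the form `chrom:start-end`; the resulting dictionary of columns can be passed to `pr.from_dict`.

```python
import re

# Region names of the form 'chrom:start-end'
_REGION = re.compile(r'^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$')


def region_names_to_columns(names):
    """Split 'chrom:start-end' names into Chromosome/Start/End columns."""
    cols = {'Chromosome': [], 'Start': [], 'End': []}
    for name in names:
        match = _REGION.match(name)
        if match is None:
            raise ValueError(f'not a chrom:start-end region name: {name!r}')
        cols['Chromosome'].append(match['chrom'])
        cols['Start'].append(int(match['start']))
        cols['End'].append(int(match['end']))
    return cols


# Usage:
# regions = homer_dict['Cebpa_ERR235722_summits_order_by_score_extended_250bp_top5K'].denovo_cistromes['Cebpa_(2886r)']
# cebpa_cistrome_pr = pr.from_dict(region_names_to_columns(regions))
print(region_names_to_columns(['chr10:89748570-89749071', 'chr4:45495781-45496282']))
```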