# Task 4: Determine the Significance of Regulatory Elements in the ACE2-Spanning TAD
## Overview of Goals:
1) Perform a TF Enrichment Analysis (TFEA)  
2) Tag super-enhancers (SEs), i.e. large clusters of enhancers with unusually high levels of TF binding  
3) Identify ACE2 promoters  
4) Compare the selected region to literature reports on ACE2 regulation  

**Note:** The precise genomic coordinates of the region (± 50 kb) are:  
*3'-edge (chromStart):* chrX:15,200,000 (i.e. chrX:15,509,068 - 259,068 bp - 50,000 bp)  
*5'-edge (chromEnd):* chrX:15,800,000 (i.e. chrX:15,509,068 + 240,932 bp + 50,000 bp)  
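
The edge arithmetic in the note above can be double-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the anchor position, offsets, and padding are copied from the note.

```python
# Anchor position and offsets copied from the note above (all in bp).
ANCHOR = 15_509_068      # reference position on chrX used in the note
LEFT_OFFSET = 259_068    # distance from the anchor to the lower edge
RIGHT_OFFSET = 240_932   # distance from the anchor to the upper edge
PADDING = 50_000         # extra 50 kb added on each side

chrom_start = ANCHOR - LEFT_OFFSET - PADDING
chrom_end = ANCHOR + RIGHT_OFFSET + PADDING

print(f"chrX:{chrom_start:,}-{chrom_end:,}")
print(f"length: {chrom_end - chrom_start:,} bp")
```

The resulting length matches the 600,000 bp region size used for the random control regions in Step 1.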

### Step 1: TF Enrichment Analysis (TFEA):
- The [ENCODE ChIP-Seq Significance Tool](http://encodeqt.simple-encode.org/) and [ChEA3](https://amp.pharm.mssm.edu/chea3/) are common resources for transcription factor enrichment analysis of a gene set of interest.  

- Resources:  

1) http://jordan.biology.gatech.edu/lulu/tf-enrichment.html  
2) https://www.youtube.com/watch?v=JD1Hw676SxA  
3) https://europepmc.org/backend/ptpmcrender.fcgi?accid=PMC2944209&blobtype=pdf  
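
To make the idea of "enrichment" concrete, the sketch below computes a one-sided hypergeometric p-value, which is one common way to ask whether a TF's targets are over-represented in a gene set. It is not necessarily the statistic used by the tools listed above, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are targets of the TF."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts, for illustration only.
N = 20_000  # background genes
K = 300     # background genes bound by the TF
n = 10      # genes in the region of interest
k = 3       # genes in the region bound by the TF

p_value = hypergeom_upper_tail(k, n, K, N)
print(f"P(X >= {k}) = {p_value:.3g}")
```

With only about 10 genes in the region, such a test has little power, which is one reason the region-length-based comparison below is a useful complement.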

##### Analysis based on existing algorithms:
***ChEA3***
- From [Landscape of X chromosome inactivation across human tissues](https://www.nature.com/articles/nature24265) and filtering for ACE2-spanning TAD:

![X-Escapees-ACE2-TAD.PNG](attachment:X-Escapees-ACE2-TAD.PNG)
![ACE2-X-Escapee.PNG](attachment:ACE2-X-Escapee.PNG)


##### Analysis based on the length of the region (600,000 bp):
1) Choose random regions on the X chromosome and autosomes with approximately the same number of genes as the ACE2-spanning TAD (10):  

Location|Region ID|Number of Genes
:------:|:-------:|:-------------:
chrX:13,490,000-14,090,000|Region #1|7
chrX:12,776,138-13,376,138|Region #2|8
chrX:100,492,766-101,092,766|Region #3|10
chr12:21,140,000-21,740,000|Region #4|9
chr5:157,066,922-157,666,922|Region #5|8
chr9:5,424,554-6,024,554|Region #6|8
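
Since the comparison is based on region length, it is worth confirming that every control region spans exactly 600,000 bp. The sketch below parses the location strings from the table above; the helper name `parse_region` is illustrative.

```python
import re

# Locations copied from the table above.
REGIONS = {
    "Region #1": "chrX:13,490,000-14,090,000",
    "Region #2": "chrX:12,776,138-13,376,138",
    "Region #3": "chrX:100,492,766-101,092,766",
    "Region #4": "chr12:21,140,000-21,740,000",
    "Region #5": "chr5:157,066,922-157,666,922",
    "Region #6": "chr9:5,424,554-6,024,554",
}


def parse_region(text: str):
    """Split 'chrN:start-end' (commas allowed) into (chrom, start, end)."""
    match = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", text)
    if match is None:
        raise ValueError(f"unrecognised region string: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


widths = {}
for name, location in REGIONS.items():
    _, start, end = parse_region(location)
    widths[name] = end - start

for name, width in widths.items():
    print(f"{name}: {width:,} bp")
```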


2) Perform the following for each region:  
    - Retrieve UniBind TF data for each region, as outlined in Task3. Follow the instructions to obtain the transcription factor names, and save each region in a separate sheet of the same Excel file.  
    
![t4-4.PNG](attachment:t4-4.PNG)

    - Statistically summarize the data using R, as outlined in Task3.  

![t4-plot1.png](attachment:t4-plot1.png)

## Option 1:
![t4-plot2.png](attachment:t4-plot2.png)
## Option 2:
![t4-plot3.png](attachment:t4-plot3.png)
## Option 3:
![t4-plot5.png](attachment:t4-plot5.png)

![t4-plot4.png](attachment:t4-plot4.png)

![t4-plot10.png](attachment:t4-plot10.png)
![t4-plot7.png](attachment:t4-plot7.png)  
![t4-plot8.png](attachment:t4-plot8.png)  
![t4-plot9.png](attachment:t4-plot9.png)  

![t4-img6.PNG](attachment:t4-img6.PNG)

## Analysis:
- There are no TFs that appear only in the ACE2-spanning TAD and nowhere else.

### Step 2: [Super-Enhancer (SE) Labelling](https://omictools.com/super-enhancer-data-category):
1) [SEdb](http://www.licpathway.net/sedb/index.php) & [SEAnalysis](http://www.licpathway.net/SEanalysis/?tdsourcetag=s_pctim_aiomsg)  
2) [dbSUPER](http://asntech.org/dbsuper/index.php)  
3) [EnhancerDB](http://lcbb.swjtu.edu.cn/EnhancerDB/)  
4) Super-Enhancer Archive [(SEA)](http://sea.edbc.org/)

**dbSUPER:**  
- Using the dbSUPER data for individual cell/tissue type [(hg19)](http://asntech.org/dbsuper/download.php), custom tracks were created in UCSC for each sample:  

dbSUPER Tracks in UCSC Genome Browser (1)|dbSUPER Tracks in UCSC Genome Browser (2)
:------------------------------------:|:----------------------------:
![t4-img8.PNG](attachment:t4-img8.PNG)|![t4-img9.PNG](attachment:t4-img9.PNG)  

- All other tracks were set to "hide" except for those with SEs in the ACE2-spanning TAD: 
    - CD3  
    - CD4 Naive Primary 8pool  
    - CD56  
    - HMEC  

![t4-img10.PNG](attachment:t4-img10.PNG)  

**SEdb:**  
- Using the [SEdb Genome Browser](http://licpathway.net/sedb/genome-growser.php), all other *Super-enhancer* tracks were set to "hide" except for those with SEs in the ACE2-spanning TAD:  
    - B-cell_ENCODE  
    - CD4donorE  
    - CD8_T_cells  
    - CLB-MA  
    - Colo741  
    - HCC1954_LCC2  
    - IMR32   
    - LAN1   
    - SET2  
    - SH-SY5Y_DMSO_3h  
    - T-cell  
    - UACC257  
    - endothelial-cell-of-umbilical-vein  
    - large-intestine_108d  
    - mammary-epithelial-cell  
    - patient_119  
    - spleen  
    
![T4-IMG11.PNG](attachment:T4-IMG11.PNG)

- Using SEdb's CGI interface, we can also locate SEs that overlap with a user-submitted genomic location (**http://www.licpathway.net/sedb/search/overlap_cgi.php?chr=(Chromosome number)&start=(Genome start position)&end=(Genome end position)**)  

![2020-06-01%20%282%29.png](attachment:2020-06-01%20%282%29.png)  
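
The SEdb overlap query can be assembled programmatically from the URL template quoted above. This sketch only builds the URL and does not contact the server. Whether the `chr` parameter expects `chrX` or `X`, and which genome assembly SEdb assumes, are not stated in these notes, so the values used here are assumptions.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL template above.
SEDB_OVERLAP_ENDPOINT = "http://www.licpathway.net/sedb/search/overlap_cgi.php"


def sedb_overlap_url(chrom: str, start: int, end: int) -> str:
    """Build an SEdb overlap query URL for one genomic interval."""
    if start >= end:
        raise ValueError("start must be smaller than end")
    query = urlencode({"chr": chrom, "start": start, "end": end})
    return f"{SEDB_OVERLAP_ENDPOINT}?{query}"


# The "chrX" spelling is an assumption; the coordinates are the padded TAD edges.
url = sedb_overlap_url("chrX", 15_200_000, 15_800_000)
print(url)
```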

- Using UCSC Genome Browser, combine SE data from both dbSUPER and SEdb into a custom track:
![t4-img12.PNG](attachment:t4-img12.PNG)  
![t4-img13.PNG](attachment:t4-img13.PNG)

### Step 3: ACE2 Promoter Identification:
- From [Human promoters in EPDnew](https://epd.epfl.ch/search_EPDnew.php?query=ACE2&db=human):  
1) ACE2_1: chrX:15600950-15601009 (chrX:15,600,960)  
2) ACE2_2: chrX:15602145-15602204 (chrX:15,602,155)  

### Step 4: Literature Review:
**KEY CONCEPTS:**
- Identification of relative location: positions upstream are negative numbers counting back from -1 (for example, -100 is a position 100 base pairs upstream).  
    - Upstream (5', -ve, ->)  
    - Downstream (3', +ve, <-)  
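
Because ACE2 is transcribed from the minus strand (consistent with the arrow directions listed above), upstream positions map to larger genomic coordinates. The sketch below converts a TSS-relative position into a genomic coordinate. It assumes the convention that the TSS is +1 and that there is no position 0; individual papers may number positions differently (for example, relative to the translation start site), so this is an illustration rather than a universal rule. The TSS value is only an example.

```python
def relative_to_genomic(tss: int, rel: int, strand: str) -> int:
    """Convert a TSS-relative position to a genomic coordinate.

    Assumed convention: the TSS is +1, the base immediately upstream
    is -1, and position 0 does not exist.
    """
    if rel == 0:
        raise ValueError("position 0 is not defined in this convention")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Signed distance from the TSS, measured along the direction of transcription.
    offset = rel - 1 if rel > 0 else rel
    return tss + offset if strand == "+" else tss - offset


EXAMPLE_TSS = 15_602_148  # example TSS position on chrX
print(relative_to_genomic(EXAMPLE_TSS, -100, "-"))  # 100 bp upstream on the minus strand
```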

1) [Epigenetic regulation of angiotensin-converting enzyme 2 (ACE2) by SIRT1 under conditions of cell energy stress](https://portlandpress-com.ezproxy.library.ubc.ca/clinsci/article/126/7/507/69210/Epigenetic-regulation-of-angiotensinconverting)  
> - ChIP analysis demonstrated that SIRT1 bound to the ACE2 promoter and the expression of the ACE2 transcript is controlled by the activity of SIRT1 under conditions of energy stress.    
> - There is a CREB-binding site (cAMP-response element) at position 14455 bp upstream of the start of the ACE2 gene. This binding site is within a region of conserved transcription-factor-binding sites. A further 2662 bp upstream of the CREB-binding site is a region containing both a p300- (a CREB co-activator that relaxes the chromatin at the promoter region and recruits RNA polymerase II) and CREB-binding region.  
> - Binding of both CREB and SIRT1 to the promoter region of ACE2 was examined. No significant change in CREB binding to the promoter was seen in response to treatment. Treatment with AICAR increased the binding of SIRT1 to the promoter region of ACE2 approximately 10-fold compared with the control. Conversely, IL-1β treatment decreased the binding of SIRT1 to the promoter region of ACE2 approximately 8-fold.  
> - IL-1β up-regulates ACE2 transcription, but SIRT1 binding to the promoter region of ACE2 in IL-1β-treated cells was reduced. This suggests that the mechanism of SIRT1-mediated ACE2 regulation is dependent on co-ordinate conditions such as cofactor expression or the precise amount of SIRT1-mediated deacetylation.

2) [Epigenetic regulation of somatic angiotensin-converting enzyme by DNA methylation and histone acetylation](https://www-tandfonline-com.ezproxy.library.ubc.ca/doi/abs/10.4161/epi.6.4.14961)  
> - The expression level of somatic ACE is modulated by CpG-methylation and histone deacetylases inhibition. The basal methylation pattern of the promoter of the ace-1 gene is cell-type specific and correlates to sACE transcription. DNMT inhibition is associated with altered methylation of the ace-1 promoter and a cell-type and tissue-specific increase of sACE mRNA levels.

3) [Identifying the regulatory element for human angiotensin-converting enzyme 2 (ACE2) expression in human cardiofibroblasts](https://www-sciencedirect-com.ezproxy.library.ubc.ca/science/article/pii/S0196978111003287)
> - The sequence ATTTGGA within the −516/−481 domain of the ace2 promoter is a significant binding element for which Ang II is the responsive element in HCFs.  
> - The sequence ATTTGGA is a potential binding domain for the transcriptional factor Ikaros. Ikaros was originally found to function as a key regulator of lymphocyte differentiation.  

4) [DNA Methylation Analysis of the COVID-19 Host Cell Receptor, Angiotensin I Converting Enzyme 2 Gene (ACE2) in the Respiratory System Reveal Age and Gender Differences](https://www.preprints.org/manuscript/202003.0295/v1)  
> -  DNA methylation levels at loci related to the ACE2 gene was varied across tissue cell types. Notably, DNA methylation across three CpGs (cg04013915, cg08559914, cg03536816) was lowest in lung epithelial cells compared to the other tissue cell types.
>> - **Transcription and expression were highest in cell types where DNA methylation was lowest. Conversely, hypermethylation was observed  in cell types with excluded ACE2 transcription and protein expression.**  
>> - **Analysis of DNA methylation at two CpG sites related to the ACE2 gene showed that females were significantly hypomethylated compared to males.  Notably, the ACE2 gene is located on the X chromosome raising the possibility of methylation differences due to X chromosome activation**   
> - In human lung tissues, gender differences in DNA methylation at 2 sites related to the ACE2 gene were identified.  
> - In freshly isolated airway epithelial cells, DNA methylation near the transcription start site of the ACE2 gene associated with biological age.  
> - Estrogen receptor signaling is critical for protection in females.  

5) [EZH2-mediated H3K27me3 inhibits ACE2 expression](https://www-sciencedirect-com.ezproxy.library.ubc.ca/science/article/pii/S0006291X20307087)  
> - EZH2 represses ACE2 expression in human embryonic stem cells by mediating H3K27me3 at human ACE2 promoter region.  

6) [Applied bioinformatics for the identification of regulatory elements](https://www.nature.com/articles/nrg1315)
> - Sequences near a TSS are more likely to contain functionally important regulatory controls than those that are more distal.  

# Analysis:

## GeneCards:
1) Promoter:  
- ACE2_1 (chrX:15600950-15601009)  
- ACE2_2 (chrX:15602145-15602204)  
**Transcription Factor Binding Sites:**

1|2|3|4
:-------:|:-------:|:-------:|:-------:
NCOR1|TCF7|FOXA2|CEBPB
MAFF|USF1|SAP130|JUN
GATA3|FOXP1|SMAD4|JUND
BCL6|HOMEZ|RXRA|RARA
KDM1A|NR2F6|HMG20A|HNF4A
RFX1|TEAD3|ZNF614|ARID3A
SOX5|MIXL1|SMARCE1|CEBPG
ZNF384|SP1|GATAD2A|EP300
RXRB|FOXA1|FOXA3|SOX13
MNT|CEBPA

*Top Transcription factor binding sites by QIAGEN in the ACE2 gene promoter:*  
> AP-1 | c-Fos | c-Jun | FOXO1 | FOXO1a | GATA-1 | IRF-2 | MRF-2 | Nkx2-5 | YY1  

2) Transcription Start Site: chrX:15602148-15602148  
3) Interactions Region: chrX:15,579,745-15,602,148 

## [Bioinformatic characterization of angiotensin-converting enzyme 2, the entry receptor for SARS-CoV-2](https://www.biorxiv.org/content/10.1101/2020.04.13.038752v1.full)
> - The small intestine expressed higher levels of ACE2 than any other organ. The large intestine, kidney and testis showed moderate signals, whereas the signal was weak in the lung specimens.  
> - A novel tool for the prediction of transcription factor binding sites identified several putative sites for determined transcription factors within the ACE2 gene promoter. **Our results also confirmed that age and gender play no significant role in the regulation of ACE2 mRNA expression in the lung.** 
>
> TFBS analysis of the ACE2 intestinal transcript promoter (ENST00000252519) revealed several candidate binding sites which occur in a cluster extending from 400 bp upstream of the transcription start site; CDX2, HNF1A, FOXA1, SOX4, TP63, HNF4A, DUX4, FOXA2, NR2F6, and SOX11.  
>> -  In several tissues these TFs are found to be highly positively correlated (>0.7) with expression of ACE2: CDX2 (colon, terminal ileum), HNF1A (colon, kidney, terminal ileum), FOXA1 (cervix, colon, terminal ileum), HNF4A (colon, terminal ileum), FOXA2 (colon, kidney), NR2F6 (colon, kidney, terminal ileum), and SOX11 (kidney). In addition, two of the TFs are highly negatively correlated with ACE2 expression DUX4 (kidney) and FOXA1 (kidney).  
>
> Analysis of the ACE2 lung transcript promoter (ENST00000427411) produced putative TFBS predictions for ESRRA, HNF4A, CDX2, CEBPA, ESRRB, MEF2B, TCF7, TCF7L2, JUN, and LEF1.
>> - The TFs corresponding to predicted TFBSs, which are positively correlated (>0.7) with ACE2 expression, are ESRRA (terminal ileum, colon), HNF4A (terminal ileum, colon), CDX2 (colon, terminal ileum), CEBPA (colon, terminal ileum), ESRRB (cervix), TCF7L2 (testis). Those TFBSs with TFs which strongly (<-0.7) negatively correlate with ACE2 are ESRRA (kidney) and TCF7L2 (kidney).  
>
> - Common between the two tissue-specific transcripts, are predictions for CDX2 and HNF-family transcription factors.  

## [The transcription factor HNF1α induces expression of angiotensin-converting enzyme 2 (ACE2) in pancreatic islets from evolutionarily conserved promoter motifs](https://www-sciencedirect-com.ezproxy.library.ubc.ca/science/article/pii/S1874939913001405?via%3Dihub)
> - In this study, we describe the structure of the human ACE2 gene promoter and demonstrate that there are three functional, evolutionarily conserved motifs in the proximal part of the promoter, capable of binding both HNF1α and HNF1β. Both transcription factors induce expression of ACE2 mRNA, leading to elevated levels of ACE2 protein and ACE2 enzymatic activity in insulinoma cells. Overexpression of HNF1α dose-dependently increases ACE2 expression in primary cells from pancreatic islets.  
>> - As there are at least two transcription start sites for both the human and the mouse ACE2 genes, the bases were numbered relative to the translation start site. Two regions of the human sequence, − 1509/− 928 and − 454/− 1, have homologous regions with genomic regions of all the analyzed placental mammals. The homologous sequences were therefore identified as distal and proximal promoter regions. In the human genome, the ACE2 promoter is bipartite with the two promoter regions separated by an Alu element.  
>> -  For both humans and mice, there exist ACE2 mRNAs with transcriptional initiation in the proximal promoter region and mRNAs with initiation in the distal promoter region.  
>> - According to the UCSC genome browser for the ENCODE project, the X-chromosome coordinates for exon 1a, exon 1, and the splice acceptor site in exon 1 are 15620278–15620079, 15619202–15618849, and 15619137 for the human genome (Feb. 2009, GRCh37/hg19 assembly).  
>
> ![ACE2-Promoter.jpg](attachment:ACE2-Promoter.jpg)
> (E) Sequences of the conserved distal and proximal promoter regions with potential transcription factor binding motifs. Green shading indicates bases conserved among all tested mammalian species. Pink shading indicates bases conserved among the tested placental mammals. Hyphens indicate positions, where other species have additional bases.  

## [Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5656262/)
*In the current study, we have demonstrated that several of the additional evolutionarily conserved motifs in the ACE2 gene promoter are cis-regulatory elements. Highly conserved motifs at positions −153/−144 and –101/−79 of the human ACE2 promoter are required for basal transcription in 832/13 cells. Although the −101/−79 motif can bind COUP-TFII, it binds a potential transcriptional activator in 832/13 cells whose identity remains elusive. On the other hand, the –153/−144 motif is a functional binding site for FOXA transcription factors. The major factor binding the −153/−144 motif in both 832/13 insulinoma cells and mouse pancreatic islets is FOXA2. We further observed a correlation between the FOXA2 mRNA and ACE2 mRNA in mouse pancreatic islets. Although we cannot exclude the possibility of other factors affecting both ACE2 and FOXA2 expression, the correlation is consistent with the notion that FOXA2 stimulates ACE2 expression.*
> - Most tissues, including pancreatic islets, express ACE2 mainly from the proximal promoter region.  
>> **Proximal promoter region** is the most active promoter in tissues such as **heart, pancreas, brain, and kidney**, whereas the **lung** has expression primarily from the **distal promoter region**.  
>> - We previously defined the distal (−1509/−928) and proximal (−454/−1) human ACE2 promoter regions based on homology with other mammalian species, yet there are other conserved elements outside these regions that may affect transcription.  
>> - Only the proximal ACE2 promoter region (−454/−1) shows strong promoter activity, suggesting that regions upstream of the proximal promoter region contain elements inhibiting transcription.  
>
> - By systematic mutation of conserved elements, we identified five regions affecting ACE2 expression, of which two regions bound transcriptional activators.  
> - The only independently verified induction of transcription by interaction between promoter elements and transcription factors binding to these elements are hepatocyte nuclear factors 1α and 1β (**HNF1α and HNF1β**), which bind to evolutionarily conserved motifs in the proximal ACE2 promoter region.  
>> - HNF1α and HNF1β stimulate ACE2 gene expression by binding to three highly conserved promoter motifs in the proximal promoter region.  
>
> - In several but not all tissues, the expression of ACE2 is higher in female than in male animals. The mechanism in mice was reported to be mediated by estrogen and an estrogen response element (ERE).
>> - An example is the DNA sequence AGGTCAAACTCTCTG at position −2986/−2972 of the mouse ACE2 promoter that has been identified as an ERE. The homologous sequence for the human ACE2 promoter is AGGTCAAACTTCCCT (−2639/−2625). *We conclude that these regions from the mouse and human ACE2 promoters are not bona fide EREs.*  
>
> - The AMP-activated protein kinase activator AICAR led to increased ACE2 expression and binding of the histone deacetylase SIRT1 to a well-conserved DNA element far upstream (~14.5 kb) of the ACE2 coding region.  
> - FOXA2 is the major transcription factor from 832/13 insulinoma cells and mouse pancreatic islets binding to the site.
>> - The region R6 contains the region −153/−144, which has high similarity to FOXA and FOXO binding motifs. The EMSA clearly demonstrates that FOXA1, FOXA2, and FOXA3 are all capable of binding to the R6 region and that FOXA2 is the most pronounced protein in 832/13 nuclear extracts that bind to the motif.  
>> - We conclude that the proximal ACE2 promoter region contains a **functional** FOXA binding site. We conclude that **FOXA2** is the major protein in mouse islets binding to the FOXA motif in the proximal ACE2 promoter region.   
>> - FOXA2 is a substrate for SIRT1 deacetylation. The AICAR effect in Huh7 cells was claimed to be mediated by a conserved element capable of binding SIRT1 that is located 14 kb upstream of the ACE2 promoter regions.  
>> - FOXA transcription factors have been reported to be involved in sexual dimorphism in gene regulation in liver cancer.
>> For a given concentration of FOXA2, there thus seems to be a higher expression of ACE2 mRNA in females than in males.  
>
> - In the context of −454/−1 human ACE2 promoter driving ACE2 expression, we mutated 14 conserved motifs.  
>> - Three of these mutations (−83/−88 in R4, −148/−153 in R6, and −178/−185 in R8) led to significant downregulation of expression, whereas two mutations (−39/−44 in R3 and −282/−287 in R12) increased expression.  
>> - The −101/−79 motif is highly conserved among mammalian species. According to the BKL TRANSFAC program, several transcription factors have binding motifs with similarity to this conserved ACE2 promoter region, including COUP-TFII, HNF4α, PPARγ, and MafA. 

## [High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6300699/)
> - Both enhancers and promoters have high DNA accessibility and low H3K27me3.  
>> - Distal enhancers show relatively higher H3K27ac and H3K4me1.  
>> - Promoters show relatively higher H3K9ac and H3K4me3.

# LINKING CIS-REGULATORY REGIONS USING TRANSCRIPTION FACTOR BINDING SIGNATURES by Michelle Kang
- In the context of this thesis, **promoters** are classified as regulatory regions overlapping the TSS(s) of a gene and **enhancers** are distal regulatory regions.  
- The tri-methylation (Me3) at lysine 4 (K4) of histone H3 (i.e. H3K4Me3) marks regions proximal to TSSs, while H3K4me1 marks enhancer regions, and the presence of H3K27ac distinguishes active from inactive enhancers.  
- Experimentally validated enhancer regions overlap a DNase I hypersensitive site (DHS).  
- The promoter region of each gene was generated by centering a 2 kb region around the strongest TSS of that gene.  
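
The promoter definition in the last bullet (a 2 kb region centred on the strongest TSS) is simple to reproduce. The sketch below is an assumption-laden illustration: the function name is hypothetical, the thesis may treat interval ends differently, and the example TSS is just an approximate centre of a CAGE peak rather than a verified "strongest" TSS.

```python
from typing import Tuple


def promoter_window(tss: int, width: int = 2_000) -> Tuple[int, int]:
    """Return (start, end) of a window of the given width centred on the TSS."""
    half = width // 2
    start = max(0, tss - half)  # never run off the start of the chromosome
    return start, tss + half


# Example TSS: roughly the centre of a CAGE peak at chrX:15619060..15619073 (hg19).
example_tss = 15_619_066
start, end = promoter_window(example_tss)
print(f"chrX:{start}-{end}")
```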

1) TSS regions: ***From FANTOM5 & [Bioinformatic characterization of angiotensin-converting enzyme 2, the entry receptor for SARS-CoV-2](https://www.biorxiv.org/content/10.1101/2020.04.13.038752v1.full):***  
-  Previous studies identified two distinct tissue specific transcription start sites (TSS) for intestine and lung expression, which correspond to primary protein-coding Ensembl transcripts ENST00000252519 and ENST00000427411, respectively.  
- The lung-specific transcript TSS aligns with the p3@ACE2 FANTOM5 dataset CAGE peak, which indicates that the expression of this transcript is much lower than the intestinal transcript, which corresponds with p1@ACE2 and p2@ACE2 FANTOM5 CAGE peaks.  

TSS peak ID|Short description|Tissue|Ensembl
:--------:|:--------------:|:------:|:--------:
Hg19::chrX:15619060..15619073|p1@ACE2|Intestine|ENST00000252519
Hg19::chrX:15619076..15619089|p2@ACE2|Intestine|ENST00000252519
Hg19::chrX:15620273..15620294|p3@ACE2|Lung|ENST00000427411

2) Regions of Interest (ROIs):  

- ACE2-spanning TAD: **chrX:15200000-15800000** 
- Upstream ACE2-spanning TAD:  
    - **chrX:15600960-15800000** (excluding ACE2)  
    - **chrX:15561033-15800000** (including ACE2)    
- Downstream ACE2-spanning TAD:  
    - **chrX:15200000-15561033** (excluding ACE2)  
    - **chrX:15200000-15600960** (including ACE2)  
- Excluding SEs:  

ID|SE Region|SE Source|Downstream from TAD|Upstream from TAD|Downstream from SE|Upstream from SE
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
SE_20567|chrX:15766841-15779105|dbSUPER & SEdb|**chrX:15200000-15766841**|**chrX:15779105-15800000**|**chrX:15695650-15766841**|**chrX:15779105-15800000**
SE_36605|chrX:15680930-15695650|dbSUPER & SEdb|**chrX:15200000-15680930**|**chrX:15695650-15800000**|**chrX:15652280-15680930**|**chrX:15695650-15766841**
SE_02_16700684|chrX:15614888-15652280|SEdb|**chrX:15200000-15614888**|**chrX:15652280-15800000**|**chrX:15548243-15614888**|**chrX:15652280-15680930**
SE_02_33600312|chrX:15510060-15548243|SEdb|**chrX:15200000-15510060**|**chrX:15548243-15800000**|**chrX:15486318-15510060**|**chrX:15548243-15614888**
SE_02_30300099|chrX:15423865-15486318|SEdb|**chrX:15200000-15423865**|**chrX:15486318-15800000**|**chrX:15386782-15423865**|**chrX:15486318-15510060**
SE_02_28400489|chrX:15315472-15386782|SEdb|**chrX:15200000-15315472**|**chrX:15386782-15800000**|**chrX:15200000-15315472**|**chrX:15386782-15423865**
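
The flanking intervals in the table above follow mechanically from the TAD edges and the SE coordinates, so they can be regenerated rather than typed by hand. The sketch below assumes that the SEs do not overlap one another; the function and key names are illustrative, and "downstream" is used as in the table (the lower-coordinate side).

```python
TAD = (15_200_000, 15_800_000)

# SE coordinates copied from the table above (all on chrX).
SUPER_ENHANCERS = {
    "SE_20567": (15_766_841, 15_779_105),
    "SE_36605": (15_680_930, 15_695_650),
    "SE_02_16700684": (15_614_888, 15_652_280),
    "SE_02_33600312": (15_510_060, 15_548_243),
    "SE_02_30300099": (15_423_865, 15_486_318),
    "SE_02_28400489": (15_315_472, 15_386_782),
}


def flanking_regions(tad, super_enhancers):
    """For each SE, return the intervals to the TAD edges and to the neighbouring SEs."""
    ordered = sorted(super_enhancers.items(), key=lambda item: item[1][0])
    flanks = {}
    for i, (name, (start, end)) in enumerate(ordered):
        # Fall back to the TAD edge when there is no neighbouring SE on that side.
        prev_end = ordered[i - 1][1][1] if i > 0 else tad[0]
        next_start = ordered[i + 1][1][0] if i < len(ordered) - 1 else tad[1]
        flanks[name] = {
            "downstream_from_tad": (tad[0], start),
            "upstream_from_tad": (end, tad[1]),
            "downstream_from_se": (prev_end, start),
            "upstream_from_se": (end, next_start),
        }
    return flanks


flanks = flanking_regions(TAD, SUPER_ENHANCERS)
for name, regions in flanks.items():
    print(name, regions)
```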

*Breaking down ROIs into 200-300 bp regions based on overlap with H3K4Me1/H3K27Ac/DHS peaks:*  
1) Identify peak centres visually;  
2) Compute ± 300 bp for enhancer regions;  

![ROIs_TFBS.PNG](attachment:ROIs_TFBS.PNG)

**Using Michelle's method, only regions #10, #11, and #18 appear to show TF binding activity. Note that the regions in the Excel files also include peaks within SEs, but these were not flagged by this analysis.**  
![ROIs-Analysis-Results.PNG](attachment:ROIs-Analysis-Results.PNG)
![tsv-ROIs-Result.PNG](attachment:tsv-ROIs-Result.PNG)
![excel-ROIs-Result.PNG](attachment:excel-ROIs-Result.PNG)

Region ID|#chrom|chromStart|chromEnd|TFs
:-:|:-:|:-:|:-:|:-:
10|X|15548221|15548521|NFE2, CEBPB, CREB3, L3MBTL2, MAFG, USF1
11|X|15529517|15529817|IKZF1, MAFG, SMARCA4, TEAD4
18|X|15737860|15738160|MAFF, MAFK, NFE2

# Results:
- Common to all three enhancers are TFBSs for members of the MAF transcription factor family.
- Common between regions #10 and #18 is NFE2.
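
These overlaps can be reproduced with plain set operations on the TF lists from the table above. Note that grouping TFs into the MAF family by the `MAF` name prefix is a shortcut used only for this sketch, not a curated family annotation.

```python
from itertools import combinations

# TFs per region, copied from the results table above.
REGION_TFS = {
    10: {"NFE2", "CEBPB", "CREB3", "L3MBTL2", "MAFG", "USF1"},
    11: {"IKZF1", "MAFG", "SMARCA4", "TEAD4"},
    18: {"MAFF", "MAFK", "NFE2"},
}

# Individual TFs present in every region (empty: no single TF is shared by all three).
shared_by_all = set.intersection(*REGION_TFS.values())

# TFs shared by each pair of regions.
pairwise = {
    (a, b): REGION_TFS[a] & REGION_TFS[b]
    for a, b in combinations(sorted(REGION_TFS), 2)
}

# MAF-family members per region, identified by name prefix (a simplification).
maf_members = {
    region: sorted(tf for tf in tfs if tf.startswith("MAF"))
    for region, tfs in REGION_TFS.items()
}

print("shared by all:", shared_by_all)
print("pairwise:", pairwise)
print("MAF members:", maf_members)
```

The MAF result holds at the family level only: regions #10 and #11 share MAFG, while region #18 carries MAFF and MAFK instead.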


# TargetFinder:
1) Install Anaconda on Ubuntu: [How to Install Miniconda on Ubuntu 18.04/16.04 Linux](https://www.osetc.com/en/how-to-install-miniconda-on-ubuntu-18-04-16-04-linux.html)  
2) Install bedtools and other required packages: [Installers](https://anaconda.org/bioconda/bedtools)  

conda install scikit-learn pandas numexpr pytables  
conda install -c bioconda bedtools


3) Run TargetFinder according to instructions from [GitHub](https://github.com/shwhalen/targetfinder):

**As root user (i.e. sudo -s):**  

apt-get update -y  
apt-get install -y python3-venv  
mkdir targetfinder_task4  
cd targetfinder_task4  
python3 -m venv task4_venv  
source task4_venv/bin/activate  

**This creates a [virtual environment](https://www.liquidweb.com/kb/creating-virtual-environment-ubuntu-16-04/), but you must deactivate it to continue (?).**  

deactivate  
cd /mnt/c/Users/bkirs/Documents/Wasserman/Task4  

*sudo apt-get install python3-pandas  
sudo apt-get install python3-sklearn python3-sklearn-lib* 

python3  
import urllib.request  
url = 'https://github.com/shwhalen/targetfinder'  
urllib.request.urlretrieve(url, '/mnt/c/Users/bkirs/Documents/Wasserman/Task4/targetfinder')  

**This did not work. I downloaded the files from GitHub into C:\Users\bkirs\Documents\Wasserman\Task4 instead and unzipped it using the command line.**  
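
A likely reason the `urlretrieve` call failed is that the URL points to the repository's web page rather than to a downloadable archive. The sketch below uses GitHub's branch-archive URL pattern instead. The branch name `master` is an assumption inferred from the `targetfinder-master.zip` file name, the destination folder is hypothetical, and the download itself is left commented out because it needs network access.

```python
import urllib.request
import zipfile
from pathlib import Path


def github_archive_url(owner: str, repo: str, branch: str = "master") -> str:
    """URL of the zip archive GitHub serves for one branch of a repository."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def download_and_unpack(url: str, dest_dir: str) -> Path:
    """Download a zip archive and extract it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / "repo.zip"
    urllib.request.urlretrieve(url, str(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


ARCHIVE_URL = github_archive_url("shwhalen", "targetfinder")
print(ARCHIVE_URL)
# Uncomment to download; "targetfinder_download" is a hypothetical folder name.
# download_and_unpack(ARCHIVE_URL, "targetfinder_download")
```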

exit()  
*cd /mnt/c/Users/bkirs/Documents/Wasserman/Task4  
unzip targetfinder-master.zip  
cd targetfinder-master*  
sudo apt-get upgrade

*python3  
import pandas as pd  
from sklearn.model_selection import StratifiedKFold, cross_val_score  
from sklearn.ensemble import GradientBoostingClassifier  
nonpredictors = ['enhancer_chrom', 'enhancer_start', 'enhancer_end', 'promoter_chrom', 'promoter_start', 'promoter_end', 'window_chrom', 'window_start', 'window_end', 'window_name', 'active_promoters_in_window', 'interactions_in_window', 'enhancer_distance_to_promoter', 'bin', 'label']  
training_df = pd.read_hdf('paper/targetfinder/K562/output-epw/training.h5', 'training').set_index(['enhancer_name', 'promoter_name'])  
predictors_df = training_df.drop(nonpredictors, axis = 1)  
labels = training_df['label']  
estimator = GradientBoostingClassifier(n_estimators = 4000, learning_rate = 0.1, max_depth = 5, max_features = 'log2', random_state = 0)  
cv = StratifiedKFold(n_splits = 10, shuffle = True, random_state = 0)  
scores = cross_val_score(estimator, predictors_df, labels, scoring = 'f1', cv = cv, n_jobs = -1)  
print('{:2f} {:2f}'.format(scores.mean(), scores.std()))  
estimator.fit(predictors_df, labels)  
importances = pd.Series(estimator.feature_importances_, index = predictors_df.columns).sort_values(ascending = False)  
print(importances.head(16))*  

**Must use python3...**  

![Example-targetfinder.PNG](attachment:Example-targetfinder.PNG)

*Each cell line has 3 training datasets with their own subdirectories: one with features generated for the enhancer and promoter only (EP), one for promoters and extended enhancers (EEP), and one for promoters, enhancers, and the window between (EPW). For example, paper/targetfinder/HeLa-S3/output-eep contains training data for the HeLa-S3 cell line using promoter and extended enhancer features.*  
**Running EP:**
![Example-targetfinder.EP.PNG](attachment:Example-targetfinder.EP.PNG)
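
Switching between the three training datasets only changes one path component, so a small helper avoids editing the `read_hdf` path by hand. The `output-epw` and `output-eep` directory names appear in the text above; `output-ep` is inferred by analogy and should be checked against the repository layout.

```python
from pathlib import Path

# "ep" is inferred by analogy with the "eep" and "epw" directories named above.
FEATURE_SETS = ("ep", "eep", "epw")


def training_path(cell_line: str, feature_set: str, root: str = "paper/targetfinder") -> Path:
    """Path to the training.h5 file for one cell line and feature set."""
    if feature_set not in FEATURE_SETS:
        raise ValueError(f"feature_set must be one of {FEATURE_SETS}")
    return Path(root) / cell_line / f"output-{feature_set}" / "training.h5"


paths = {feature_set: training_path("K562", feature_set) for feature_set in FEATURE_SETS}
for feature_set, path in paths.items():
    print(feature_set, path.as_posix())
```

Each path can then be passed to `pd.read_hdf(path, 'training')` in place of the hard-coded string in the example above.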

# [UniBind Enrichment Analysis](https://unibind.uio.no/enrichment/)
- Enrichment computations are performed using the [LOLA tool](http://code.databio.org/LOLA/).  

1) Install [BioConductor](https://www.bioconductor.org/install/) and [LOLA](https://bioconductor.org/packages/release/bioc/html/LOLA.html).  
2) Follow these instructions:  
https://bioconductor.org/packages/release/bioc/vignettes/LOLA/inst/doc/gettingStarted.html  
https://www.rdocumentation.org/packages/LOLA/versions/1.2.2
https://www.bioconductor.org/packages/release/bioc/vignettes/LOLA/inst/doc/usingLOLACore.html  
3) Download [examples](https://bitbucket.org/CBGR/unibind_enrichment/src/master/)  


# Compilation of Potential Cis-Regulatory Elements:
*Results derived mainly from [NCBI](https://www.ncbi.nlm.nih.gov/gene/59272):*

1) Promoters (hg38):

Genomic Coordinates|Source
:---:|:---:
chrX:15,600,911-15,603,000|[Identifying the Regulatory Element for Human Angiotensin-Converting Enzyme 2 (ACE2) Expression in Human Cardiofibroblasts](https://pubmed.ncbi.nlm.nih.gov/21864606/)
chrX:15,600,911-15,601,448|[Identifying the Regulatory Element for Human Angiotensin-Converting Enzyme 2 (ACE2) Expression in Human Cardiofibroblasts](https://pubmed.ncbi.nlm.nih.gov/21864606/)
chrX:15,600,912-15,601,365|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/) **&** [The Transcription Factor HNF1α Induces Expression of Angiotensin-Converting Enzyme 2 (ACE2) in Pancreatic Islets From Evolutionarily Conserved Promoter Motifs](https://pubmed.ncbi.nlm.nih.gov/24100303/)
chrX:15,600,911-15,601,092|[Identifying the Regulatory Element for Human Angiotensin-Converting Enzyme 2 (ACE2) Expression in Human Cardiofibroblasts](https://pubmed.ncbi.nlm.nih.gov/21864606/)
chrX:15,600,950-15,601,009|[EPDnew](https://epd.epfl.ch/search_EPDnew.php?query=ACE2&db=human)
chrX:15,602,145-15,602,204|[EPDnew](https://epd.epfl.ch/search_EPDnew.php?query=ACE2&db=human)
chrX:15,596,221-15,603,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,507,261-15,507,320|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,499,609-15,501,801|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,567,600-15,567,801|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)

2) Transcription Start Site (TSS):

TSS peak ID|Source
:--------:|:--------------:
Hg19::chrX:15,619,060-15,619,073|FANTOM5 **&** [Bioinformatic characterization of angiotensin-converting enzyme 2, the entry receptor for SARS-CoV-2](https://www.biorxiv.org/content/10.1101/2020.04.13.038752v1)
Hg19::chrX:15,619,076-15,619,089|FANTOM5 **&** [Bioinformatic characterization of angiotensin-converting enzyme 2, the entry receptor for SARS-CoV-2](https://www.biorxiv.org/content/10.1101/2020.04.13.038752v1)
Hg19::chrX:15,620,273-15,620,294|FANTOM5 **&** [Bioinformatic characterization of angiotensin-converting enzyme 2, the entry receptor for SARS-CoV-2](https://www.biorxiv.org/content/10.1101/2020.04.13.038752v1)
chrX:15,600,960|[Ensembl BioMart](https://www.ensembl.org/biomart/martview/95e9de53337916f91126c7627d1a6018) - Human genes (GRCh38.p13)
chrX:15,566,287|[Ensembl BioMart](https://www.ensembl.org/biomart/martview/95e9de53337916f91126c7627d1a6018) - Human genes (GRCh38.p13)
chrX:15,566,709|[Ensembl BioMart](https://www.ensembl.org/biomart/martview/95e9de53337916f91126c7627d1a6018) - Human genes (GRCh38.p13)
chrX:15,602,148|[Ensembl BioMart](https://www.ensembl.org/biomart/martview/95e9de53337916f91126c7627d1a6018) - Human genes (GRCh38.p13)
chrX:15,602,069|[Ensembl BioMart](https://www.ensembl.org/biomart/martview/95e9de53337916f91126c7627d1a6018) - Human genes (GRCh38.p13)
chrX:15,600,937-15,600,950|[refTSS](http://reftss.clst.riken.jp/reftss/EntrezGene:59272)
chrX:15,600,953-15,600,966|[refTSS](http://reftss.clst.riken.jp/reftss/EntrezGene:59272)
chrX:15,601,077-15,601,112|[refTSS](http://reftss.clst.riken.jp/reftss/EntrezGene:59272)
chrX:15,602,150-15,602,171|[refTSS](http://reftss.clst.riken.jp/reftss/EntrezGene:59272)

![lift_TSS.PNG](attachment:lift_TSS.PNG)

3) Enhancers:

Genomic Coordinates|Source|Type
:---:|:---:|:---:
chrX:15,674,387-15,674,890|[Genome-wide Prediction of Conserved and Nonconserved Enhancers by Histone Acetylation Patterns](https://genome.cshlp.org/content/17/1/74.long)|Acetylation island
chrX:15,602,801-15,603,200|[Ensembl](http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000130234;r=X:15561033-15602148;time=1591338453074.074)
chrX:15,596,221-15,603,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,579,724-15,581,292|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,545,002-15,550,172|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,356,201-15,357,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,511,001-15,512,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,451,801-15,452,600|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,513,395-15,517,400|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,564,910-15,565,947|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,499,609-15,501,801|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,504,278-15,505,808|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,529,001-15,529,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,523,401-15,525,400|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,512,601-15,513,200|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,520,601-15,520,800|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)
chrX:15,513,218-15,513,367|[GeneHancer](https://www.genecards.org/cgi-bin/carddisp.pl?gene=ACE2&keywords=ace2&prefilter=genomic_location#genomic_location)

4) Cis-Regulatory Regions:

Genomic Coordinates|Construct|Source|Additional Information
:---:|:---:|:---:|:---:
chrX:15,601,412-15,601,447|-516/-481|[Identifying the Regulatory Element for Human Angiotensin-Converting Enzyme 2 (ACE2) Expression in Human Cardiofibroblasts](https://pubmed.ncbi.nlm.nih.gov/21864606/)|Sequence is 5′-ATTTGGA-3′, required for Ang II-activated transcription; potential binding domain for the transcription factor Ikaros.
chrX:15,601,089-15,601,096|-454/-1|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|R8: m(-178/-185)
chrX:15,600,993-15,600,999|-454/-1|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|R4: m(-83/-88) - Binds COUP-TFII, HNF4α, Pax6, and MafA at chrX:15,600,987-15,601,013
chrX:15,600,950-15,600,955|-454/-1|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|R3: m(-39/-44)
chrX:15,601,059-15,601,064|-454/-1|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|R6: m(-148/-153) - Binds FOXA1, FOXA2, and FOXA3 at chrX:15,601,047-15,601,073
chrX:15,601,155-15,601,160|-454/-1|[The Transcription Factor HNF1α Induces Expression of Angiotensin-Converting Enzyme 2 (ACE2) in Pancreatic Islets From Evolutionarily Conserved Promoter Motifs](https://pubmed.ncbi.nlm.nih.gov/24100303/)
chrX:15,601,193-15,601,198|-454/-1|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|R12: m(-282/-287)
chrX:15,601,225-15,601,230|-454/-1|[The Transcription Factor HNF1α Induces Expression of Angiotensin-Converting Enzyme 2 (ACE2) in Pancreatic Islets From Evolutionarily Conserved Promoter Motifs](https://pubmed.ncbi.nlm.nih.gov/24100303/)
chrX:15,601,246-15,601,251|-454/-1|[The Transcription Factor HNF1α Induces Expression of Angiotensin-Converting Enzyme 2 (ACE2) in Pancreatic Islets From Evolutionarily Conserved Promoter Motifs](https://pubmed.ncbi.nlm.nih.gov/24100303/)
chrX:15,601,431-15,601,435|-516/+20|[Identifying the Regulatory Element for Human Angiotensin-Converting Enzyme 2 (ACE2) Expression in Human Cardiofibroblasts](https://pubmed.ncbi.nlm.nih.gov/21864606/)|Precise region within the -516/-481 construct
chrX:15,600,912-15,601,365|−101/−79|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|Region in the proximal promoter region (−454/−1) that is conserved in placental mammals. Several transcription factors have binding motifs with similarity to this conserved ACE2 promoter region, including COUP-TFII, HNF4α, PPARγ, and MafA.
chrX:15,601,839-15,602,420||[The Transcription Factor HNF1α Induces Expression of Angiotensin-Converting Enzyme 2 (ACE2) in Pancreatic Islets From Evolutionarily Conserved Promoter Motifs](https://pubmed.ncbi.nlm.nih.gov/24100303/)|Region in the distal promoter region (-1509/-928) that is conserved in placental mammals
|−153/−144|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|Region in the proximal promoter region (-454/-1) that has high similarity to FOXA binding sites and is indeed a functional FOXA binding site
|−1363/−1337|[Forkhead Box Transcription Factors of the FOXA Class Are Required for Basal Transcription of Angiotensin-Converting Enzyme 2](https://pubmed.ncbi.nlm.nih.gov/29082356/)|Region in the distal promoter region (-1509/-928) that has high similarity to FOXA binding sites

For **Identifying the regulatory element for human angiotensin-converting enzyme 2 (ACE2) expression in human cardiofibroblasts**: *The promoter region was defined according to the position relative to the transcription start site (+1) in ACE2 mRNA sequence (GenBank no. AF_291820).*
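
The promoter-relative construct names in the table (e.g. -516/-481) can be mapped to genomic coordinates once a TSS anchor is fixed. The sketch below assumes the +1/-1 convention (no position 0) and a minus-strand gene, so upstream positions have higher genomic coordinates. The two anchor values are not given by the papers; they are back-calculated from the table rows above and should be treated as assumptions. The function name `promoter_to_genomic` is hypothetical.

```python
def promoter_to_genomic(rel_a, rel_b, tss, strand="-"):
    """Map two promoter-relative positions to a (start, end) genomic interval.

    tss is the genomic coordinate of position +1. Position -1 is the base
    immediately upstream of it; position 0 does not exist in this convention.
    """
    def one(rel):
        if rel == 0:
            raise ValueError("position 0 does not exist in the +1/-1 convention")
        # Signed distance from the TSS, positive meaning downstream.
        offset = rel if rel < 0 else rel - 1
        # On the minus strand, downstream means decreasing genomic coordinate.
        return tss + offset if strand == "+" else tss - offset

    a, b = one(rel_a), one(rel_b)
    return min(a, b), max(a, b)


# Anchors inferred from the table rows (assumptions, not stated in the papers):
TSS_CARDIOFIBROBLAST_PAPER = 15_600_931  # makes -516/-481 match its table row
TSS_FOXA_PAPER = 15_600_911              # makes -454/-1 match its table row

print(promoter_to_genomic(-516, -481, TSS_CARDIOFIBROBLAST_PAPER))
print(promoter_to_genomic(-454, -1, TSS_FOXA_PAPER))
```

The two anchors differ by 20 bp, which suggests the papers number the promoter from different reference transcripts; coordinates from different sources should therefore not be compared without checking the anchor first.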

# [SCREEN](https://screen.wenglab.org/): Search Candidate cis-Regulatory Elements by ENCODE
- Download SCREEN data and filter for chrX:15,200,000-15,800,000
- Add regions to Excel sheet and run Michelle's model once again
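
A minimal sketch of the filtering step, assuming the SCREEN download is a tab-separated BED-like file whose first three columns are chromosome, start, and end (0-based, half-open). The layout of any further columns is not assumed, and the toy rows and their identifiers are made up purely for illustration.

```python
# Region of interest in UCSC notation (1-based, inclusive).
REGION = ("chrX", 15_200_000, 15_800_000)


def filter_bed(lines, region=REGION):
    """Keep BED rows that overlap the region by at least one base."""
    chrom, start_1based, end_1based = region
    region_start = start_1based - 1  # convert to 0-based, half-open
    region_end = end_1based
    kept = []
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        c, s, e = fields[0], int(fields[1]), int(fields[2])
        if c == chrom and s < region_end and e > region_start:
            kept.append(fields)
    return kept


# Toy input; the names in column 4 are placeholders, not real accessions.
toy_bed = [
    "chrX\t15199000\t15199999\ttoy_1",  # ends just before the region
    "chrX\t15199990\t15200010\ttoy_2",  # straddles the left edge
    "chrX\t15500000\t15500350\ttoy_3",  # fully inside
    "chrX\t15799999\t15800100\ttoy_4",  # straddles the right edge
    "chrX\t15800000\t15800200\ttoy_5",  # starts just after the region
    "chr1\t15300000\t15300100\ttoy_6",  # wrong chromosome
]

for row in filter_bed(toy_bed):
    print(row)
```

For a real file, the same function can be called on an open file handle, e.g. `filter_bed(open(path))`, where `path` points to the downloaded file.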

# Enhancer-Promoter Interaction Prediction Tools:
1. [TargetFinder](https://github.com/shwhalen/targetfinder)  
2. [FOCS](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1432-2): a novel method for analyzing enhancer and gene activity patterns infers an extensive enhancer–promoter map   
    - [FOCS R Script](http://acgt.cs.tau.ac.il/focs/)  
3. [EPIP](https://academic.oup.com/bioinformatics/article-abstract/35/20/3877/5549495?redirectedFrom=fulltext)  
4. [ABC Predictions](https://osf.io/uhnb4/)  
    - [Code](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  
5. Custom model: `(dhs_activity * h3k27_activity * hi_frequency) / enhancer_promoter_distance`  

***Activity Level:*** Higher DHS and H3K27Ac enhancer activity is related to a greater likelihood of linkage to a promoter.  

***Hi-C Interaction Frequency:*** Higher levels of enhancer-promoter (EP) interaction in 3D are related to a greater likelihood of linkage.  

***Distance:*** Enhancers that are in closer proximity to a promoter are more likely to be linked to that promoter.  
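
The custom model above can be written as a small scoring function. This is only a sketch of the formula as stated: the units and normalization of each input are not specified in these notes, and the candidate names and values below are invented to show how the ranking would work.

```python
def custom_ep_score(dhs_activity, h3k27_activity, hi_frequency,
                    enhancer_promoter_distance):
    """Score = (DHS * H3K27ac * Hi-C frequency) / enhancer-promoter distance."""
    if enhancer_promoter_distance <= 0:
        raise ValueError("enhancer_promoter_distance must be positive")
    return (dhs_activity * h3k27_activity * hi_frequency) / enhancer_promoter_distance


# Hypothetical candidates: (name, DHS, H3K27ac, Hi-C frequency, distance in bp).
candidates = [
    ("enh_A", 2.0, 3.0, 4.0, 12_000),
    ("enh_B", 2.0, 3.0, 4.0, 120_000),  # same activity, 10x farther away
    ("enh_C", 4.0, 3.0, 4.0, 12_000),   # doubled DHS signal
]

# Rank from most to least likely to be linked to the promoter.
ranked = sorted(candidates, key=lambda c: custom_ep_score(*c[1:]), reverse=True)
for name, *features in ranked:
    print(name, custom_ep_score(*features))
```

Because the score is a plain product divided by distance, an enhancer with zero signal in any one track scores zero regardless of the others, and an enhancer overlapping the promoter (distance 0) has to be handled separately.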

6. [PSYCHIC](https://www.cs.huji.ac.il/~tommy/PSYCHIC/) 

**[omicX](https://omictools.com/psychic-tool):** Evaluates Hi-C data to identify enriched DNA-DNA interactions. PSYCHIC analyzes promoter-enhancer interactions through three steps: (1) it finds an optimal segmentation of each chromosome into topological domains via a unified probabilistic model and a dynamic programming algorithm; (2) it iteratively combines neighboring domains into hierarchical structures; and finally (3) it matches each domain by using a topologically associating domain (TAD)-specific background model.  

**[Promoter-enhancer interactions identified from Hi-C data using probabilistic models and hierarchical topological domains](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5740158/)**
> Putative enhancer-promoter interactions for human (hg19), from `grep ACE2` on Supplementary Data 1 and 2  
>> - Uses Hi-C data from 15 conditions and cell types in mouse and human to create a tissue-specific database of putative interactions between enhancers and their target genes (i.e. over-represented interactions up to 1 Mb from promoter regions).  
>> - Includes mouse cortex and embryonic stem cells, mouse embryonic stem cells, neural progenitor cells (NPC), and neurons, and mouse B-lymphoblast (CH12LX) cells, as well as human embryonic stem cells and lung fibroblast IMR-90 cells, GM12878 B-lymphoblastoid cells, and HMEC, HUVEC, IMR-90, K562, KBM7, and NHEK cell lines.  

Human|Mouse
:--:|:--:
hESC, IMR-90, GM12878, HUVEC, HMEC, K562, KBM7, NHEK| Cortex, mESC, CH12LX, MESC, MNPC, Neuron

>> - Overall, ***49% of the predicted enhancers are located within 120 Kb of their target promoters***, with ***only about 15% regulating the nearest gene*** (56% regulate one of the 5 nearest genes).  
>> - Naming convention: **.enh_p.bed bed file of over-represented pairs with FDR value < p, each line is of the format [chr start end], [gene, distance to enhancer, FDR, p-value, expected # of interactions, observed # of interactions]**

Name|FDR threshold
:--:|:--:
hES.enh_1e-10.bed|1e-10

Name|Gene promoter|Distance to putative enhancer|FDR|p-value|Expected number of interactions|Observed number of interactions
:-:|:-:|:-:|:-:|:-:|:-:|:-:
Shh:840Kb:3.2e-14:7.8e-16:2:15|Shh|840Kb|3.2e-14|7.8e-16|2|15
Foxg1:-40Kb:0:0:7.9:29|Foxg1|-40Kb|0|0|7.9|29
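
The colon-separated name field shown in the table can be split into typed values, which makes it easier to filter the PSYCHIC output by gene or FDR. This is a sketch based only on the two example names above; it assumes the distance always carries a `Kb` suffix, and the names `PsychicPair` and `parse_psychic_name` are hypothetical.

```python
from typing import NamedTuple


class PsychicPair(NamedTuple):
    gene: str
    distance_kb: float  # signed distance from promoter to putative enhancer
    fdr: float
    p_value: float
    expected: float     # expected number of interactions
    observed: float     # observed number of interactions


def parse_psychic_name(name):
    """Parse 'gene:distance:FDR:p-value:expected:observed' into a PsychicPair."""
    # Split from the right so a gene symbol containing ':' would stay intact.
    gene, dist, fdr, p_value, expected, observed = name.rsplit(":", 5)
    if not dist.lower().endswith("kb"):
        raise ValueError(f"unexpected distance unit in {dist!r}")
    return PsychicPair(gene, float(dist[:-2]), float(fdr), float(p_value),
                       float(expected), float(observed))


for example in ("Shh:840Kb:3.2e-14:7.8e-16:2:15", "Foxg1:-40Kb:0:0:7.9:29"):
    print(parse_psychic_name(example))
```

With the fields parsed, selecting ACE2 rows below a chosen threshold is a one-line filter, e.g. `[p for p in pairs if p.gene == "ACE2" and p.fdr < 1e-4]`, where `pairs` is a list of parsed records.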

![PSYCHIC_Excel.PNG](attachment:PSYCHIC_Excel.PNG)

**Converting from hg19 to hg38:**  
>> - FDR Threshold = 1e-10:
![PSYCHIC_UCSC_1e-10.PNG](attachment:PSYCHIC_UCSC_1e-10.PNG)
>> - FDR Threshold = 1e-4:
![PSYCHIC_UCSC_1e-4.PNG](attachment:PSYCHIC_UCSC_1e-4.PNG)
>> - FDR Threshold = 1e-2:
![PSYCHIC_UCSC_Unique.PNG](attachment:PSYCHIC_UCSC_Unique.PNG)

7. [Inter-chrom](http://promoter.bx.psu.edu/hi-c/iview.php) - Visualize Hi-C interactions between genes/regions  

8. [EpiRegio](https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaa382/5847772?guestAccessKey=c129a37f-20ce-4c8f-806e-549ea8a7bb64): analysis and retrieval of regulatory elements linked to genes  
> - Web Server: https://epiregio.de/


[The Hitchhiker's Guide to Hi-C Analysis: Practical guidelines](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347522/) - Guidelines for analyzing and interpreting data obtained with genome-wide 3C methods such as Hi-C and 3C-seq that rely on deep sequencing to detect and quantify pairwise chromatin interactions genome-wide.  
> Large-scale genomic conformations (e.g. using Hi-C and a traditional 6bp-cutting enzyme) define genomic compartments, while specific small-scale interaction patterns (using 3C/4C/5C and a 4bp-cutting enzyme) define promoter-enhancer looping.
>


# Super-Enhancer Identification (cont.) - [Blacklisted genomic regions for functional genomics analysis](https://sites.google.com/site/anshulkundaje/projects/blacklists)
- Human (hg38) - https://www.encodeproject.org/files/ENCFF356LFX/  
***No "blacklisted" regions overlap the ACE2-spanning TAD.***

# Comparing CREs by Source:

![HIst_ACE2_CREs.png](attachment:HIst_ACE2_CREs.png)

# UCSC View:
- Compilation of predicted ACE2-related enhancers/enhancer regions from Ensembl, GeneHancer, ENCODE, NCBI, Regulatory Elements DB (ENCODE), FANTOM5, PSYCHIC, EpiRegio, and SCREEN.  
- Super-enhancer regions are highlighted in yellow.  

![CREs_minus_SEs.PNG](attachment:CREs_minus_SEs.PNG)

## Removing all SEs (from SEdb and dbSUPER):
- The "super-enhancer-free block" that almost entirely overlaps ACE2 contains the putative enhancer region stipulated by PSYCHIC at an FDR threshold of 1e-10. It also contains all of the NCBI-predicted ACE2 enhancers.

**Region 1:** - chrX:15487047-15509502  
**Region 2:** - chrX:15548243-15614139  
**Region 3:** - chrX:15652280-15670954  
**Region 4:** - chrX:15695650-15766841  

![CREs-Minus-SEs.png](attachment:CREs-Minus-SEs.png)
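
If the regions listed above are read as the blocks that remain after subtracting super-enhancer intervals from the overall CRE span, the operation is a standard interval subtraction. The sketch below shows that operation on small toy coordinates; it does not use the actual SEdb or dbSUPER intervals, and the function name `subtract_intervals` is hypothetical.

```python
def subtract_intervals(span, blockers):
    """Return the parts of span (start, end) not covered by any blocker.

    Intervals are half-open (start, end) tuples on one chromosome.
    Blockers may overlap each other and may extend beyond the span.
    """
    start, end = span
    free = []
    cursor = start  # left edge of the next candidate free block
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor:
            continue  # blocker lies entirely in already-processed territory
        if b_start >= end:
            break     # remaining blockers start after the span
        if b_start > cursor:
            free.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < end:
        free.append((cursor, end))
    return free


# Toy example: a span of 100 bases with three blockers, two of them overlapping.
toy_span = (0, 100)
toy_super_enhancers = [(10, 20), (15, 40), (90, 120)]
print(subtract_intervals(toy_span, toy_super_enhancers))
```

Running the same function once with the SEs from both databases and once with those from a single database would reproduce the kind of comparison made in the sections here.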

## Removing only SEs from dbSUPER:

**Region 1:** - chrX:15331879-15680930  
**Region 2:** - chrX:15695650-15766841  

![CREs-Minus-SEdb.png](attachment:CREs-Minus-SEdb.png)

## Without removing SEs:

**Region:** - chrX:15331879-15701302  

![CREs-Plus-SEs.png](attachment:CREs-Plus-SEs.png)

![Rplot-Pct_CREs-3_Regions-Bubble.png](attachment:Rplot-Pct_CREs-3_Regions-Bubble.png)

![Rplot-Pct_CREs-3_Regions-Segment.png](attachment:Rplot-Pct_CREs-3_Regions-Segment.png)