# Task 2: Quantify Sample Differences in ACE2-Spanning TAD
*Determine the median distance between the 3' and 5' edges of individual, tissue-specific TADs (FIREs) and a reference point, for both males and females separately.*  
**Direction of Transcription:** In the UCSC Genome Browser, arrowheads point in the direction of transcription (5' → 3').  
- Upstream: Towards the **5′**-end  
- Downstream: Towards the **3′**-end  

**Tissue Samples:** From the [FIREs study](https://www.cell.com/cell-reports/pdfExtended/S2211-1247(16)31481-4), custom tracks were generated for the following tissue samples:  

Male                | Female          
:------------------:|:---------------------:
Bladder Tissue      | Adrenal Tissue  
Cortex Tissue       | Aorta         
Hippocampus Tissue  | Lung Tissue            
Left Ventricle      | Ovary Tissue           
Liver               | Pancreas Tissue        
Lung Tissue         | Small Intestine Tissue 
Psoas Muscle Tissue |                        
Right Ventricle     |                        
Spleen              |                        
Thymus              |                        
Pancreas Tissue     |                        

![Custom Tracks](attachment:custom.PNG)  

## Step-By-Step Walkthrough:
### 1. Define a reference point:
> Using UCSC Genome Browser's Hi-C track, the midpoint of the ACE2-spanning TAD was estimated and chosen as the reference point.  
![Reference Region](attachment:2020-05-14.png)  
> *Midpoint of chrX:15,261,567-15,756,569:* **chrX:15,509,068**  
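
The midpoint arithmetic can be checked in R. This is a minimal sketch; the variable names `tad_start`, `tad_end`, and `ref_point` are illustrative and do not come from the original notebook.

```r
# Boundaries of the ACE2-spanning TAD as estimated from the Hi-C track
tad_start <- 15261567
tad_end   <- 15756569

# Midpoint of the interval, rounded down to a whole base-pair position
ref_point <- floor((tad_start + tad_end) / 2)

# Print with thousands separators for comparison with the browser coordinates
format(ref_point, big.mark = ",")
```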

### 2. Obtain the coordinates for the 5'- and 3'-edges of each TAD:
> Refer back to the original BED files and obtain the *chromStart* and *chromEnd* chromosomal coordinates.  
> *This was done by visual inspection in conjunction with the command line:* The file was first filtered to chrX, the approximate start position of the feature was then located, and the match was confirmed by checking the feature score.
>
> **Example:**  
>
![Aorta Example](attachment:2020-05-15.png)  
>
> In this case, we are looking for the TAD beginning and ending at approximately 15,300,000 bp and 15,600,000 bp, respectively. The match is confirmed by checking that the feature has a score of 1.  
>
![Capture-aorta-example.PNG](attachment:Capture-aorta-example.PNG)
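
The same lookup can also be sketched in R instead of on the command line. Treat the example as an illustration under stated assumptions: the five-column layout (chrom, chromStart, chromEnd, name, score) is assumed rather than taken from the study's files, every row except the aorta TAD is made up for demonstration, and selecting by overlap with the reference point stands in for the visual inspection described above.

```r
# Toy BED-like records. Only the chrX 15,300,000-15,650,000 row reflects the
# aorta TAD reported in this walkthrough; the other rows are invented.
bed_text <- "
chrX 14900000 15300000 feature_a 0
chrX 15300000 15650000 feature_b 1
chr7 15300000 15650000 feature_c 1
chrX 15650000 16100000 feature_d 0
"
bed <- read.table(
  text = bed_text,
  col.names = c("chrom", "chromStart", "chromEnd", "name", "score"),
  stringsAsFactors = FALSE
)

ref_point <- 15509068  # reference point defined in step 1

# Keep chrX features with a score of 1 that span the reference point
hits <- subset(
  bed,
  chrom == "chrX" & score == 1 &
    chromStart <= ref_point & chromEnd >= ref_point
)
print(hits[, c("chrom", "chromStart", "chromEnd", "score")])
```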

#### Female Sample Analysis:
Tissue                   | 5'-edge | 3'-edge    
:-----------------------:|:-------:|:-------:
Adrenal Tissue           |chrX:15,750,000|chrX:15,300,000
Aorta                    |chrX:15,650,000|chrX:15,300,000
Lung Tissue              |chrX:17,100,000|chrX:15,200,000
Ovary Tissue             |chrX:15,900,000|chrX:15,300,000
Pancreas Tissue          |chrX:15,850,000|chrX:15,350,000
Small Intestine Tissue   |chrX:16,400,000|chrX:15,250,000

#### Male Sample Analysis:
Tissue              | 5'-edge | 3'-edge    
:------------------:|:-------:|:-------:
Bladder Tissue      |chrX:16,700,000|chrX:15,300,000 
Cortex Tissue       |chrX:15,650,000|chrX:15,250,000         
Hippocampus Tissue  |chrX:15,750,000|chrX:15,250,000
Left Ventricle      |chrX:15,750,000|chrX:15,250,000            
Liver               |chrX:15,700,000|chrX:15,250,000      
Lung Tissue         |chrX:15,750,000|chrX:15,250,000
Psoas Muscle Tissue |chrX:15,750,000|chrX:15,250,000
Right Ventricle     |chrX:15,700,000|chrX:15,250,000                        
Spleen              |chrX:15,650,000|chrX:15,300,000
Thymus              |chrX:15,650,000|chrX:15,250,000
Pancreas Tissue     |chrX:15,800,000|chrX:15,250,000

### 3. Plot the data in RStudio:
*Basic R Graphing Resources:*  
1) https://www.r-graph-gallery.com/index.html  
2) http://www.sthda.com/english/wiki/ggplot2-box-plot-quick-start-guide-r-software-and-data-visualization  
3) https://ggplot2.tidyverse.org/reference/geom_boxplot.html

#### Upload the Excel file with chromosomal coordinates and sample information (Tissue, Sex, chrom, chromStart, chromEnd): 
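
The spreadsheet itself is not part of this document, so the sketch below rebuilds an equivalent table directly from the female and male coordinate tables above. A package such as `readxl` could be used to read an actual Excel file instead. One assumption is made: `chromStart` holds the 3'-edge and `chromEnd` the 5'-edge, because the 5'-edge is the larger coordinate in every row.

```r
# Coordinates transcribed from the tables above, entered in kb and
# converted to bp. Rows 1-6 are female samples, rows 7-17 are male samples.
tads <- data.frame(
  Tissue = c("Adrenal Tissue", "Aorta", "Lung Tissue", "Ovary Tissue",
             "Pancreas Tissue", "Small Intestine Tissue",
             "Bladder Tissue", "Cortex Tissue", "Hippocampus Tissue",
             "Left Ventricle", "Liver", "Lung Tissue", "Psoas Muscle Tissue",
             "Right Ventricle", "Spleen", "Thymus", "Pancreas Tissue"),
  Sex = rep(c("Female", "Male"), times = c(6, 11)),
  chrom = "chrX",
  # Assumed mapping: chromStart = 3'-edge (lower coordinate)
  chromStart = c(15300, 15300, 15200, 15300, 15350, 15250,
                 15300, 15250, 15250, 15250, 15250, 15250, 15250,
                 15250, 15300, 15250, 15250) * 1e3,
  # Assumed mapping: chromEnd = 5'-edge (higher coordinate)
  chromEnd = c(15750, 15650, 17100, 15900, 15850, 16400,
               16700, 15650, 15750, 15750, 15700, 15750, 15750,
               15700, 15650, 15650, 15800) * 1e3,
  stringsAsFactors = FALSE
)
str(tads)
```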

#### Add reference point data:

#### Split the table for male and female data:

#### Add 5'- and 3'-distance data:
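
The three preceding steps (reference point, split by sex, distances) can be sketched together in base R. The column names `ref`, `dist5`, and `dist3` and the helper `add_distances` are hypothetical. Distances are taken as positive offsets from the reference point, with the 5'-edge assumed to be the higher coordinate; the original notebook may have used a different sign convention.

```r
# Edge coordinates from the tables above (kb converted to bp);
# the first 6 rows are female samples, the remaining 11 are male.
tads <- data.frame(
  Sex = rep(c("Female", "Male"), times = c(6, 11)),
  chromStart = c(15300, 15300, 15200, 15300, 15350, 15250,
                 15300, 15250, 15250, 15250, 15250, 15250, 15250,
                 15250, 15300, 15250, 15250) * 1e3,
  chromEnd = c(15750, 15650, 17100, 15900, 15850, 16400,
               16700, 15650, 15750, 15750, 15700, 15750, 15750,
               15700, 15650, 15650, 15800) * 1e3
)

# Add the reference point chosen in step 1 as a constant column
tads$ref <- 15509068

# Hypothetical helper: distance of each edge from the reference point
add_distances <- function(df) {
  df$dist5 <- df$chromEnd - df$ref    # 5'-edge assumed at the higher coordinate
  df$dist3 <- df$ref - df$chromStart  # 3'-edge assumed at the lower coordinate
  df
}

# Split by sex, then add the distance columns to each table
female <- add_distances(subset(tads, Sex == "Female"))
male   <- add_distances(subset(tads, Sex == "Male"))

# Median distance per sex and edge
medians <- data.frame(
  Sex   = c("Female", "Male"),
  dist5 = c(median(female$dist5), median(male$dist5)),
  dist3 = c(median(female$dist3), median(male$dist3))
)
print(medians)
```

Under these assumptions, the table values give median 5'-edge distances of 365,932 bp (female) and 240,932 bp (male), and median 3'-edge distances of 209,068 bp (female) and 259,068 bp (male).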

#### Generate plots:
*Histograms:*

Sex-Independent | Male | Female
:--------------: |:----------------: | :----------------: 
![Rplot-h1.png](attachment:Rplot-h1.png)|![Rplot-h3.png](attachment:Rplot-h3.png)|![Rplot-h5.png](attachment:Rplot-h5.png)
![Rplot-h2.png](attachment:Rplot-h2.png)|![Rplot-h4.png](attachment:Rplot-h4.png)|![Rplot-h6.png](attachment:Rplot-h6.png)
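
A grid of histograms like the one above can be sketched with base R graphics. This is not necessarily how the original figures were produced: the panel order (rows for the 5'- and 3'-edge, columns for all samples, male, and female) and the use of `hist` rather than ggplot2 are assumptions.

```r
ref_point <- 15509068
sex <- rep(c("Female", "Male"), times = c(6, 11))

# Distances in kb, derived from the coordinate tables above
dist5 <- (c(15750, 15650, 17100, 15900, 15850, 16400,
            16700, 15650, 15750, 15750, 15700, 15750, 15750,
            15700, 15650, 15650, 15800) * 1e3 - ref_point) / 1e3
dist3 <- (ref_point - c(15300, 15300, 15200, 15300, 15350, 15250,
                        15300, 15250, 15250, 15250, 15250, 15250, 15250,
                        15250, 15300, 15250, 15250) * 1e3) / 1e3

# 2 x 3 grid: one row per edge, one column per sample subset
op <- par(mfrow = c(2, 3))
for (edge in c("5'", "3'")) {
  d <- if (edge == "5'") dist5 else dist3
  hist(d, main = paste0(edge, "-edge, all samples"), xlab = "Distance (kb)")
  hist(d[sex == "Male"], main = paste0(edge, "-edge, male"),
       xlab = "Distance (kb)")
  hist(d[sex == "Female"], main = paste0(edge, "-edge, female"),
       xlab = "Distance (kb)")
}
par(op)
```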

*Basic Boxplot of Distance Distribution:* 

Sex-Independent | Sex-Dependent
:--------------: |:----------------: 
![Boxplot 2 Samples](attachment:Rplot-offlala.png)| ![Boxplot 4 Samples](attachment:Rplotoff.png)
![Summary 2 Samples](attachment:stat2.PNG)        | ![Summary 4 Samples](attachment:stat1.PNG) 
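
The two basic boxplots can be approximated in base R by stacking both distance types into one long table. The data frame `long` and its column names are illustrative, and it is an assumption that the two-sample plot compares 5'- and 3'-edge distances while the four-sample plot further splits them by sex.

```r
ref_point <- 15509068
sex <- rep(c("Female", "Male"), times = c(6, 11))
edge3 <- c(15300, 15300, 15200, 15300, 15350, 15250,
           15300, 15250, 15250, 15250, 15250, 15250, 15250,
           15250, 15300, 15250, 15250) * 1e3
edge5 <- c(15750, 15650, 17100, 15900, 15850, 16400,
           16700, 15650, 15750, 15750, 15700, 15750, 15750,
           15700, 15650, 15650, 15800) * 1e3

# Long format: one row per sample and edge, distances in kb
long <- data.frame(
  Sex = rep(sex, times = 2),
  Edge = rep(c("5'", "3'"), each = length(sex)),
  distance_kb = c(edge5 - ref_point, ref_point - edge3) / 1e3
)

op <- par(mfrow = c(1, 2))
# Two boxes: 3'- and 5'-edge distances, sexes pooled
boxplot(distance_kb ~ Edge, data = long,
        ylab = "Distance to reference (kb)", main = "Sex-independent")
# Four boxes: each edge split by sex
boxplot(distance_kb ~ Edge + Sex, data = long, las = 2, xlab = "",
        ylab = "Distance to reference (kb)", main = "Sex-dependent")
par(op)

# Median of each of the four groups
group_medians <- aggregate(distance_kb ~ Edge + Sex, data = long, FUN = median)
print(group_medians)
```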

*Annotated Boxplots of Distance Distribution:* Includes mean value (red), outliers (blue), and individual observations (black).  

1. Create new columns for distance data:

2. Calculate means: 

3. Plot graphs:

5'-Edge Distance | 3'-Edge Distance
:--------------: |:----------------: 
![5' Distance](attachment:Rplotoff2.png)| ![3' Distance](attachment:Rplotoff3.png)  
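
The three steps above (distance columns, means, plots) can be sketched with base R graphics. The helper `annotated_boxplot` and the column names are hypothetical, and base `boxplot`, `stripchart`, and `points` are used here as a stand-in; the original figures may have been drawn with ggplot2.

```r
ref_point <- 15509068
d <- data.frame(Sex = factor(rep(c("Female", "Male"), times = c(6, 11))))

# 1. New columns with the distance data (kb)
d$dist5_kb <- (c(15750, 15650, 17100, 15900, 15850, 16400,
                 16700, 15650, 15750, 15750, 15700, 15750, 15750,
                 15700, 15650, 15650, 15800) * 1e3 - ref_point) / 1e3
d$dist3_kb <- (ref_point - c(15300, 15300, 15200, 15300, 15350, 15250,
                             15300, 15250, 15250, 15250, 15250, 15250, 15250,
                             15250, 15300, 15250, 15250) * 1e3) / 1e3

# 2. Mean distance per sex
means5 <- tapply(d$dist5_kb, d$Sex, mean)
means3 <- tapply(d$dist3_kb, d$Sex, mean)

# 3. Boxplot with outliers (blue), observations (black), and means (red)
annotated_boxplot <- function(values, groups, means, title) {
  boxplot(values ~ groups, outcol = "blue", outpch = 16, main = title,
          xlab = "Sex", ylab = "Distance to reference (kb)")
  stripchart(values ~ groups, vertical = TRUE, method = "jitter",
             pch = 1, col = "black", add = TRUE)
  points(seq_along(means), means, col = "red", pch = 18, cex = 1.5)
}

op <- par(mfrow = c(1, 2))
annotated_boxplot(d$dist5_kb, d$Sex, means5, "5'-Edge Distance")
annotated_boxplot(d$dist3_kb, d$Sex, means3, "3'-Edge Distance")
par(op)
```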

#### Compute variances:
- Use `var(x)` for sample variance (divides by *n* − 1)   
- Use `var_pop(x)` for population variance (divides by *n*). This is a user-defined function, not part of base R: `var_pop <- function(x) {mean((x - mean(x))^2)}`  
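
A short check of how the two variance definitions relate, using the male 5'-edge distances derived from the table above. The choice of this subset is illustrative only.

```r
# Population variance: mean squared deviation from the mean (divides by n)
var_pop <- function(x) {
  mean((x - mean(x))^2)
}

# Male 5'-edge distances to the reference point, in kb
ref_point <- 15509068
x <- (c(16700, 15650, 15750, 15750, 15700, 15750, 15750,
        15700, 15650, 15650, 15800) * 1e3 - ref_point) / 1e3
n <- length(x)

sample_variance     <- var(x)      # divides by n - 1
population_variance <- var_pop(x)  # divides by n

# The two differ only by the factor (n - 1) / n
all.equal(population_variance, sample_variance * (n - 1) / n)
```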

#### Perform an analysis of variance (ANOVA, Levene's, and Kruskal-Wallis tests):
- Useful resource: http://www.sthda.com/english/wiki/one-way-anova-test-in-r

![Capturelala.PNG](attachment:Capturelala.PNG)
![summary.PNG](attachment:summary.PNG)
![summary2.PNG](attachment:summary2.PNG)
![summary3.PNG](attachment:summary3.PNG)
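
The three tests can be sketched in base R as follows. Several assumptions apply: the response is taken to be the 5'-edge distance and the grouping factor to be sex, which may differ from the model used for the output above. Levene's test is written out as an ANOVA on absolute deviations from the group medians (the Brown-Forsythe variant), so that no extra package is needed; `car::leveneTest` provides a packaged version.

```r
ref_point <- 15509068
d <- data.frame(
  Sex = factor(rep(c("Female", "Male"), times = c(6, 11))),
  dist5_kb = (c(15750, 15650, 17100, 15900, 15850, 16400,
                16700, 15650, 15750, 15750, 15700, 15750, 15750,
                15700, 15650, 15650, 15800) * 1e3 - ref_point) / 1e3
)

# One-way ANOVA: do mean distances differ between sexes?
fit <- aov(dist5_kb ~ Sex, data = d)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis: rank-based alternative without a normality assumption
kw <- kruskal.test(dist5_kb ~ Sex, data = d)

# Levene-type test: ANOVA on absolute deviations from each group's median
d$abs_dev <- abs(d$dist5_kb - ave(d$dist5_kb, d$Sex, FUN = median))
levene_fit <- aov(abs_dev ~ Sex, data = d)
levene_p <- summary(levene_fit)[[1]][["Pr(>F)"]][1]

print(c(anova = anova_p, kruskal = kw$p.value, levene = levene_p))
```

With only 6 female and 11 male samples, any of these p-values should be read with caution.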

# FANTOM5
### A database describing the regulatory landscape of the mammalian transcriptome.
- Provides an atlas of promoters, enhancers, and transcription start sites (TSSs) across diverse cell types.  
- Includes shortcut links to COVID-19/SARS-CoV-2 related genes, including [ACE2](https://fantom.gsc.riken.jp/5/sstar/EntrezGene:59272).  

**Tutorial Links:**
1. FANTOM5:
    - https://fantom.gsc.riken.jp/5/  
    - https://www.slideshare.net/SumitMiddha/nature-article-a-promoterlevel-mammalian-expression-atlas-fantom5  
2. Zenbu: https://fantom.gsc.riken.jp/zenbu/  
3. SSTAR:  

**ACE2 Analysis:**  
- There are three (3) TSS regions, all with their highest CAGE expression in **small intestine** samples.

# GTEx
### A database for tissue-specific gene expression and regulation.
**ACE2 Analysis:**  
- Expression is high in **testis**, a possible indicator of sex bias.  
- Gene expression can be split by subset (*Sex*); male and female samples of the same tissue show approximately identical median TPM.  
