# DML Analysis

In this notebook, I will examine the location of differentially methylated loci (DMLs) in the *C. virginica* genome. The DMLs were identified using methylKit in [this R script](https://github.com/RobertsLab/project-virginica-oa/blob/master/analyses/2018-05-29-MethylKit-Full-Samples/2018-05-29-MethylKit-Analysis-Full-Samples.R) and then written out as a [bedfile](https://github.com/RobertsLab/project-virginica-oa/blob/master/analyses/2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed). Using this file, I will work through the following steps, which are derived from [Steven's notebook](https://github.com/sr320/nb-2018/blob/master/C_virginica/21-Bedtools.ipynb):

1. Locate DMLs
2. Identify gene ontology
3. Locate CGs
4. Identify transposable elements
5. Gene flanking

## 0. Set working directory

In [1]:
pwd

/Users/yaamini/Documents/project-virginica-oa/notebooks


In [11]:
cd ../analyses/

/Users/yaamini/Documents/project-virginica-oa/analyses


In [12]:
pwd

'/Users/yaamini/Documents/project-virginica-oa/analyses'

In [13]:
!mkdir 2018-06-11-DML-Analysis

In [14]:
ls

0516_bismark.err                     2018-05-04-Bismark-Full-Samples/
0516_bme.err                         2018-05-22-Bismark-Full-Samples/
0516_dedup.err                       2018-05-22-Bismark-Subset/
2018-01-23-MBDSeq-Labwork/           2018-05-29-MethylKit-Full-Samples/
2018-04-26-Gonad-Methylation-FastQC/ 2018-06-11-DML-Analysis/
2018-04-27-Bismark/                  README.md
2018-05-01-MethylKit/


In [15]:
cd 2018-06-11-DML-Analysis/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-06-11-DML-Analysis
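
Because `mkdir` complains when the directory already exists, re-running the notebook from the top produces an error message at that cell. A minimal sketch of a repeatable alternative in Python, assuming the same directory name as above; the helper `make_analysis_dir` is a hypothetical name and not part of the original workflow:

```python
import os

def make_analysis_dir(parent, name="2018-06-11-DML-Analysis"):
    """Create parent/name if it is missing and return its path."""
    path = os.path.join(parent, name)
    # exist_ok=True makes the call safe to repeat when the notebook is re-run
    os.makedirs(path, exist_ok=True)
    return path
```

From the `analyses` directory, `os.chdir(make_analysis_dir("."))` would then create the folder if needed and move into it.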


## 1. Locate DMLs

To identify the location of DMLs in the *C. virginica* genome, I will use `intersect` from `bedtools`. [The BEDtools suite](http://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html) allows me to easily find overlapping regions of different bed files.
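
To make the overlap test concrete, here is a minimal pure-Python sketch of the idea behind an interval intersection. It is not how `bedtools` is implemented and is no substitute for it; the function name `overlapping_features` and all coordinates in the example are made up for illustration. Intervals are treated as 0-based and half-open, as in the BED format.

```python
def overlapping_features(loci, features):
    """Return (locus, feature) pairs that share at least one base.

    Both inputs are lists of (chrom, start, end) tuples using
    BED-style 0-based, half-open coordinates.
    """
    hits = []
    for l_chrom, l_start, l_end in loci:
        for f_chrom, f_start, f_end in features:
            # Two half-open intervals on the same chromosome overlap
            # when each one starts before the other one ends
            if l_chrom == f_chrom and l_start < f_end and f_start < l_end:
                hits.append(((l_chrom, l_start, l_end), (f_chrom, f_start, f_end)))
    return hits

# One 2 bp locus and two hypothetical exons (illustrative coordinates only)
dmls = [("NC_035780.1", 265027, 265029)]
exons = [("NC_035780.1", 265000, 265100), ("NC_035780.1", 300000, 300500)]
print(overlapping_features(dmls, exons))
```

This nested loop compares every locus with every feature, so it is only suitable for small examples; for whole-genome tracks, `bedtools` remains the right tool.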

### 1a. Locate bedfiles for analysis

The bedfile with DMLs can be viewed below. Columns are the chromosome, start position, end position, strand, and percent methylation difference, with the sign indicating direction. This file only has DMLs that were at least 50% different between the two treatments (control and elevated pCO2).

In [17]:
!head ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed

NC_035780.1	265027	265029	-	63
NC_035780.1	346071	346073	-	54
NC_035780.1	549842	549844	+	-51
NC_035780.1	571093	571095	-	53
NC_035780.1	571138	571140	+	58
NC_035780.1	620088	620090	+	-52
NC_035780.1	635912	635914	-	-51
NC_035780.1	990995	990997	-	-50
NC_035780.1	993014	993016	+	-59
NC_035780.1	1887091	1887093	+	53
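
As a quick sanity check on these columns, the sketch below parses tab-delimited bedfile lines and tallies loci by the sign of the difference. The helper names `parse_dml_line` and `count_by_direction` are hypothetical, and the three sample lines are copied from the output above. It assumes every line has exactly five tab-separated fields.

```python
from collections import Counter

def parse_dml_line(line):
    """Split one tab-delimited bedfile line into typed fields."""
    chrom, start, end, strand, diff = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), strand, float(diff)

def count_by_direction(lines):
    """Count loci with a positive versus a negative difference."""
    counts = Counter()
    for line in lines:
        diff = parse_dml_line(line)[4]
        # The file only holds differences of at least 50 in magnitude,
        # so a value of exactly zero is not expected here
        counts["positive" if diff > 0 else "negative"] += 1
    return counts

# Sample lines copied from the head output above
sample = [
    "NC_035780.1\t265027\t265029\t-\t63",
    "NC_035780.1\t346071\t346073\t-\t54",
    "NC_035780.1\t549842\t549844\t+\t-51",
]
print(count_by_direction(sample))
```

Passing an open file handle for the full bedfile instead of `sample` would tally every DML the same way.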


I will be using the following Genome Feature Tracks:

1. Exon
2. Intron
3. mRNA
4. CG locations

The links to these feature tracks can be found on the [Roberts Lab Genomic Resources wiki page](https://github.com/RobertsLab/resources/wiki/Genomic-Resources).

In [19]:
!curl http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_exon.bed > C_virginica-3.0_Gnomon_exon.bed

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.7M  100 20.7M    0     0  57.9M      0 --:--:-- --:--:-- --:--:-- 58.7M


In [20]:
!curl http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_intron.bed > C_virginica-3.0_intron.bed

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9260k  100 9260k    0     0  37.9M      0 --:--:-- --:--:-- --:--:-- 38.4M


In [18]:
!curl http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_mRNA.gff3 > C_virginica-3.0_Gnomon_mRNA.gff3

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 26.4M  100 26.4M    0     0  69.4M      0 --:--:-- --:--:-- --:--:-- 70.3M


In [21]:
!curl http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_CG-motif.bed > C_virginica-3.0_CG-motif.bed

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  533M  100  533M    0     0  52.4M      0  0:00:10  0:00:10 --:--:-- 48.8M
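
The four downloads above follow the same pattern, so they could also be driven from one list. A minimal sketch in Python using only the standard library: the base URL and file names are taken from the `curl` commands above, while the helper names `track_url` and `download_tracks` are hypothetical. I have not run this against the server.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "http://eagle.fish.washington.edu/Cvirg_tracks/"
TRACKS = (
    "C_virginica-3.0_Gnomon_exon.bed",
    "C_virginica-3.0_intron.bed",
    "C_virginica-3.0_Gnomon_mRNA.gff3",
    "C_virginica-3.0_CG-motif.bed",
)

def track_url(filename, base_url=BASE_URL):
    """Build the download URL for one feature track."""
    return base_url + filename

def download_tracks(filenames=TRACKS, dest_dir="."):
    """Download each track into dest_dir, skipping files already present."""
    for name in filenames:
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # avoid re-fetching large files such as the CG track
        urlretrieve(track_url(name), target)
```

Calling `download_tracks()` from the analysis directory would fetch any track that is not already there, which avoids repeating the 533M CG download when the notebook is re-run.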
