# Genomic Location of DML

In this notebook, I will identify the genomic locations of [sex-specific DML identified with `methylKit`](https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/06-methylKit.R). 

2. Create BEDfiles for DML
3. Characterize genomic locations for DML
4. Identify overlaps between female- and indeterminate-DML
5. Identify overlaps between SNPs and sex-specific DML

## 0. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/project-gigas-oa-meth/code'

In [2]:
cd ../output/

/Users/yaamini/Documents/project-gigas-oa-meth/output


In [4]:
mkdir 10_DML-characterization

In [5]:
cd 10_DML-characterization/

/Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization
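Because `mkdir` prints an error if `10_DML-characterization` already exists, re-running this section is not quite idempotent. Below is a minimal sketch of an alternative that assumes the notebook starts in the `code` directory shown by `pwd` above. The function name `setup_output_dir` is hypothetical.

```shell
# Hypothetical helper: create an output subdirectory if needed and print its path.
# mkdir -p does not fail when the directory already exists, so the cell can be re-run.
setup_output_dir() {
    local base="$1"   # e.g. ../output
    local dir="$2"    # e.g. 10_DML-characterization
    mkdir -p "${base}/${dir}" && printf '%s\n' "${base}/${dir}"
}

# Usage, starting from the code directory:
# cd "$(setup_output_dir ../output 10_DML-characterization)"
```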


## 2. Create BEDfiles for DML

My `methylKit` DML lists are `.csv` files. I will use `bedtools intersect` to identify genomic locations, and it takes BEDfiles as input, so I first need to convert each list to a BEDfile.

In [7]:
#Look at csv file to determine what modifications need to be made
#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff
!head ../06-methylKit/DML/DML-pH-25-Cov5-Fem.csv 

"","chr","start","end","strand","pvalue","qvalue","meth.diff"
"375","NC_047559.1",7867,7869,"*",2.33980960791286e-15,4.08615261059822e-12,64.5753773413348
"2817","NC_047559.1",576880,576882,"*",1.33914604111036e-08,4.4952472775104e-06,36.8918918918919
"3089","NC_047559.1",606381,606383,"*",4.823350456729e-12,3.95503174159914e-09,32.8677433435181
"3093","NC_047559.1",607024,607026,"*",6.74665560838222e-05,0.00675150143112196,42.6966292134831
"3170","NC_047559.1",612843,612845,"*",1.49061330412156e-10,8.4130667805953e-08,-36.2951549392227
"3207","NC_047559.1",616708,616710,"*",2.03411781947518e-14,2.92674131627326e-11,-35
"4610","NC_047559.1",733435,733437,"*",1.05743863872163e-05,0.00142612432702515,46.1430423509075
"4704","NC_047559.1",742817,742819,"*",1.86287098909016e-16,4.09099699134841e-13,39.0844459857812
"4705","NC_047559.1",742820,742822,"*",8.66263989196498e-16,1.65869573168142e-12,35.5772357723577


In [48]:
!find ../06-methylKit/DML/DML*csv

../06-methylKit/DML/DML-pH-100-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-25-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-25-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-50-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-50-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-75-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-75-Cov5-Ind.csv


In [56]:
%%bash

#Replace , with tabs
#Remove extraneous quotes (can also be done in R)
#Print chr, start, end, meth.diff
#Remove header
#Save as BEDfile

for f in ../06-methylKit/DML/DML*csv
do
    tr "," "\t" < "${f}" \
    | tr -d '"' \
    | awk '{print $2"\t"$3"\t"$4"\t"$8}' \
    | tail -n +2 \
    > "${f}.bed"
done
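The same conversion can also be written as a single `awk` call that splits on commas, strips the quotes, and skips the header in one pass. This is an alternative sketch, not what was run here. `csv_to_bed` is a hypothetical helper, and like the pipeline above it assumes no field contains an embedded comma.

```shell
# Hypothetical helper: convert one methylKit DML csv to a 4-column BEDfile
# (chr, start, end, meth.diff) on stdout. Splits on commas, removes all
# double quotes, and skips the header row, in a single awk pass.
csv_to_bed() {
    awk -F',' -v OFS='\t' 'NR > 1 { gsub(/"/, ""); print $2, $3, $4, $8 }' "$1"
}

# Usage, keeping the .csv.bed names produced by the loop above:
# for f in ../06-methylKit/DML/DML*csv; do csv_to_bed "$f" > "${f}.bed"; done
```

This starts one process per file instead of four piped commands. For these files the output should match the original pipeline, because none of the fields contain whitespace (the original `awk` splits on whitespace after the tabs are inserted).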

In [61]:
%%bash

#Move BEDfiles to current working directory
mv ../06-methylKit/DML/*bed .
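Before running `bedtools intersect`, it may help to check that every BEDfile has four tab-separated columns with an integer start smaller than the end. The sketch below uses a hypothetical `validate_bed` helper. The start < end rule is an assumption that fits these CpG-sized intervals; this is not a full BED validator.

```shell
# Hypothetical helper: print any line that is not four tab-separated fields
# with integer start < end; return 1 if any such line is found.
validate_bed() {
    awk -F'\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}

# Usage, from 10_DML-characterization:
# validate_bed *bed && echo "All BEDfiles look well-formed"
```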

In [64]:
!head *bed

==> DML-pH-100-Cov5-Ind.csv.bed <==
NC_047559.1	738014	738016	-100
NC_047559.1	1006145	1006147	100
NC_047559.1	1011405	1011407	100
NC_047559.1	1715466	1715468	100
NC_047559.1	2193954	2193956	-100
NC_047559.1	3595157	3595159	-100
NC_047559.1	3613450	3613452	-100
NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100

==> DML-pH-25-Cov5-Fem.csv.bed <==
NC_047559.1	7867	7869	64.5753773413348
NC_047559.1	576880	576882	36.8918918918919
NC_047559.1	606381	606383	32.8677433435181
NC_047559.1	607024	607026	42.6966292134831
NC_047559.1	612843	612845	-36.2951549392227
NC_047559.1	616708	616710	-35
NC_047559.1	733435	733437	46.1430423509075
NC_047559.1	742817	742819	39.0844459857812
NC_047559.1	742820	742822	35.5772357723577
NC_047559.1	744037	744039	-53.2666983975884

==> DML-pH-25-Cov5-Ind.csv.bed <==
NC_047559.1	4791	4793	66.8650793650794
NC_047559.1	4835	4837	66.8831168831169
NC_047559.1	4843	4845	88.6486486486486
NC_0475