# Genomic Location of DML

In this notebook, I will identify the genomic locations of [sex-specific DML identified with `methylKit`](https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/06-methylKit.R). 

2. Create BEDfiles for DML
3. Identify overlaps between female- and indeterminate-DML
4. Characterize genomic locations for DML
5. Identify overlaps between SNPs and sex-specific DML

## 0. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/project-oyster-oa/code/Haws'

In [2]:
cd ../output/

/Users/yaamini/Documents/project-gigas-oa-meth/output


In [3]:
#mkdir 10_DML-characterization

In [3]:
cd 10_DML-characterization/

/Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization


In [4]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

## 2. Create BEDfiles for DML

My `methylKit` DML lists are `.csv` files. To identify genomic locations with `bedtools intersect`, I need BEDfiles.

In [7]:
#Look at csv file to determine what modifications need to be made
#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff
!head ../06-methylKit/DML/DML-pH-25-Cov5-Fem.csv 

"","chr","start","end","strand","pvalue","qvalue","meth.diff"
"375","NC_047559.1",7867,7869,"*",2.33980960791286e-15,4.08615261059822e-12,64.5753773413348
"2817","NC_047559.1",576880,576882,"*",1.33914604111036e-08,4.4952472775104e-06,36.8918918918919
"3089","NC_047559.1",606381,606383,"*",4.823350456729e-12,3.95503174159914e-09,32.8677433435181
"3093","NC_047559.1",607024,607026,"*",6.74665560838222e-05,0.00675150143112196,42.6966292134831
"3170","NC_047559.1",612843,612845,"*",1.49061330412156e-10,8.4130667805953e-08,-36.2951549392227
"3207","NC_047559.1",616708,616710,"*",2.03411781947518e-14,2.92674131627326e-11,-35
"4610","NC_047559.1",733435,733437,"*",1.05743863872163e-05,0.00142612432702515,46.1430423509075
"4704","NC_047559.1",742817,742819,"*",1.86287098909016e-16,4.09099699134841e-13,39.0844459857812
"4705","NC_047559.1",742820,742822,"*",8.66263989196498e-16,1.65869573168142e-12,35.5772357723577


In [48]:
!find ../06-methylKit/DML/DML*csv

../06-methylKit/DML/DML-pH-100-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-25-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-25-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-50-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-50-Cov5-Ind.csv
../06-methylKit/DML/DML-pH-75-Cov5-Fem.csv
../06-methylKit/DML/DML-pH-75-Cov5-Ind.csv


In [56]:
%%bash

#Replace , with tabs
#Remove extraneous quotes (can also be done in R)
#Print chr, start, end, meth.diff
#Remove header
#Save as BEDfile

for f in ../06-methylKit/DML/DML*csv
do
    tr "," "\t" < "${f}" \
    | tr -d '"' \
    | awk '{print $2"\t"$3"\t"$4"\t"$8}' \
    | tail -n+2 \
    > "${f}.bed"
done
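
The shell pipeline above splits on every comma, which works for these files because no field contains one. As a hedged alternative, the same conversion could be sketched in Python with the standard `csv` module, which parses quoted fields directly. The function name `dml_csv_to_bed` is hypothetical, and the example rows are copied from the `head` output earlier in this section.

```python
import csv
import io


def dml_csv_to_bed(csv_text):
    """Convert methylKit DML CSV text to BED-style lines: chr, start, end, meth.diff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        "\t".join([row["chr"], row["start"], row["end"], row["meth.diff"]])
        for row in reader
    ]
    # One line per DML, no header, newline-terminated like the shell output
    return "".join(line + "\n" for line in lines)


example = (
    '"","chr","start","end","strand","pvalue","qvalue","meth.diff"\n'
    '"375","NC_047559.1",7867,7869,"*",2.33980960791286e-15,4.08615261059822e-12,64.5753773413348\n'
    '"3170","NC_047559.1",612843,612845,"*",1.49061330412156e-10,8.4130667805953e-08,-36.2951549392227\n'
)

print(dml_csv_to_bed(example), end="")
```

Reading columns by name rather than by position means the output would not silently change if the column order in the CSV did.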

In [61]:
%%bash

#Move BEDfiles to current working directory
mv ../06-methylKit/DML/*bed .

In [64]:
!head *bed

==> DML-pH-100-Cov5-Ind.csv.bed <==
NC_047559.1	738014	738016	-100
NC_047559.1	1006145	1006147	100
NC_047559.1	1011405	1011407	100
NC_047559.1	1715466	1715468	100
NC_047559.1	2193954	2193956	-100
NC_047559.1	3595157	3595159	-100
NC_047559.1	3613450	3613452	-100
NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100

==> DML-pH-25-Cov5-Fem.csv.bed <==
NC_047559.1	7867	7869	64.5753773413348
NC_047559.1	576880	576882	36.8918918918919
NC_047559.1	606381	606383	32.8677433435181
NC_047559.1	607024	607026	42.6966292134831
NC_047559.1	612843	612845	-36.2951549392227
NC_047559.1	616708	616710	-35
NC_047559.1	733435	733437	46.1430423509075
NC_047559.1	742817	742819	39.0844459857812
NC_047559.1	742820	742822	35.5772357723577
NC_047559.1	744037	744039	-53.2666983975884

==> DML-pH-25-Cov5-Ind.csv.bed <==
NC_047559.1	4791	4793	66.8650793650794
NC_047559.1	4835	4837	66.8831168831169
NC_047559.1	4843	4845	88.6486486486486
NC_0475

I imported the BEDfiles into [this IGV session](https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/output/10_DML-characterization/dml.xml) to visualize them.

## 3. Identify overlaps between sex-specific DML

In [28]:
!head DML-pH-75-Cov5-Fem.csv.bed

NC_047559.1	2799230	2799232	83.0864530029626
NC_047559.1	6531650	6531652	76.7441860465116
NC_047559.1	8166690	8166692	77.3809523809524
NC_047559.1	9400974	9400976	-81.6326530612245
NC_047559.1	11117667	11117669	-77.0833333333333
NC_047559.1	18206975	18206977	80.1851851851852
NC_047559.1	22293933	22293935	75.2770673486786
NC_047559.1	24170063	24170065	76.8102073365231
NC_047559.1	24173162	24173164	77.8471602735653
NC_047559.1	24173203	24173205	82.0861678004535


In [29]:
!head DML-pH-100-Cov5-Ind.csv.bed

NC_047559.1	738014	738016	-100
NC_047559.1	1006145	1006147	100
NC_047559.1	1011405	1011407	100
NC_047559.1	1715466	1715468	100
NC_047559.1	2193954	2193956	-100
NC_047559.1	3595157	3595159	-100
NC_047559.1	3613450	3613452	-100
NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100


In [34]:
#Find overlaps between female- and indeterminate-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b DML-pH-100-Cov5-Ind.csv.bed \
> DML-pH-Cov5-Overlaps.bed
!head DML-pH-Cov5-Overlaps.bed
!wc -l DML-pH-Cov5-Overlaps.bed

NC_047559.1	41431870	41431872	-89.5833333333333
NC_047561.1	46632231	46632233	80.7509881422925
NC_047562.1	10069178	10069180	-94.8212495823588
NC_047563.1	15549479	15549481	-83.5616438356164
NC_047563.1	62370986	62370988	80.5555555555556
NC_047563.1	62739104	62739106	-81.7953546767106
NC_047565.1	7110085	7110087	-83.3130699088146
NC_047568.1	295645	295647	-75.4901960784314
       8 DML-pH-Cov5-Overlaps.bed
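
The `-u` flag tells `intersectBed` to write each `-a` entry once if it overlaps at least one `-b` entry. To make that behavior concrete, here is a minimal Python sketch of the same idea for plain intervals. It is not how bedtools is implemented. `intersect_unique` and the toy `chr1`/`chr2` coordinates are made up for illustration, and intervals are treated as half-open `[start, end)`.

```python
from collections import defaultdict


def intersect_unique(a_records, b_records):
    """Return each A record once if it overlaps at least one B record."""
    b_by_chrom = defaultdict(list)
    for rec in b_records:
        b_by_chrom[rec[0]].append((rec[1], rec[2]))

    hits = []
    for rec in a_records:
        chrom, start, end = rec[0], rec[1], rec[2]
        # Two half-open intervals overlap when each starts before the other ends
        if any(start < b_end and b_start < end
               for b_start, b_end in b_by_chrom.get(chrom, [])):
            hits.append(rec)
    return hits


# Toy records: (chr, start, end, meth.diff)
a = [("chr1", 100, 102, 80.0), ("chr1", 200, 202, -75.0), ("chr2", 100, 102, 90.0)]
b = [("chr1", 101, 103, 100.0), ("chr1", 100, 102, -100.0), ("chr2", 500, 502, 100.0)]

for rec in intersect_unique(a, b):
    print(*rec, sep="\t")
```

The first `a` record overlaps two `b` records but is reported only once, and the score column that is kept is the one from `a`. This matches the overlap file above, which carries the female-DML `meth.diff` values.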


## 4. Characterize genomic locations of DML

I will look at overlaps between genome features and three sets of DML: female-DML, indeterminate-DML, and the DML common to both.
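
Every cell in this section runs the same `intersectBed -u` call with a different DML set and feature track. As a sketch of how those calls could be generated in one place, the Python below builds the argument list and output name for each combination without running anything. File names and paths are taken from this notebook. The helper `build_intersect_jobs` and the dictionary names are hypothetical.

```python
BEDTOOLS_DIR = "/Users/Shared/bioinformatics/bedtools2/bin/"
FEATURE_DIR = "../../genome-feature-files/"

# Output prefix -> input BEDfile
DML_SETS = {
    "DML-pH-75-Cov5-Fem": "DML-pH-75-Cov5-Fem.csv.bed",
    "DML-pH-100-Cov5-Ind": "DML-pH-100-Cov5-Ind.csv.bed",
    "DML-pH-Cov5-Overlaps": "DML-pH-Cov5-Overlaps.bed",
}

# Output suffix -> feature track
FEATURES = {
    "Gene": "cgigas_uk_roslin_v1_gene.gff",
    "exonUTR": "cgigas_uk_roslin_v1_exonUTR.gff",
    "CDS": "cgigas_uk_roslin_v1_CDS.gff",
    "intron": "cgigas_uk_roslin_v1_intron.bed",
    "upstream": "cgigas_uk_roslin_v1_upstream.gff",
    "downstream": "cgigas_uk_roslin_v1_downstream.gff",
    "intergenic": "cgigas_uk_roslin_v1_intergenic.bed",
    "lncRNA": "cgigas_uk_roslin_v1_lncRNA.gff",
    "TE": "cgigas_uk_roslin_v1_rm.te.bed",
}


def build_intersect_jobs():
    """Return (argv, output_name) pairs, one per DML set and feature track."""
    jobs = []
    for prefix, dml_bed in DML_SETS.items():
        for label, feature_file in FEATURES.items():
            argv = [BEDTOOLS_DIR + "intersectBed", "-u",
                    "-a", dml_bed, "-b", FEATURE_DIR + feature_file]
            jobs.append((argv, f"{prefix}-{label}.bed"))
    return jobs


for argv, out_name in build_intersect_jobs():
    print(" ".join(argv), ">", out_name)
```

To execute a job rather than print it, each `argv` could be passed to `subprocess.run(argv, stdout=handle, check=True)` with `handle` opened on the output name.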

### 4a. Gene

In [10]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-75-Cov5-Fem-Gene.bed
!head DML-pH-75-Cov5-Fem-Gene.bed
!wc -l DML-pH-75-Cov5-Fem-Gene.bed

NC_047559.1	2799230	2799232	83.0864530029626
NC_047559.1	6531650	6531652	76.7441860465116
NC_047559.1	8166690	8166692	77.3809523809524
NC_047559.1	11117667	11117669	-77.0833333333333
NC_047559.1	18206975	18206977	80.1851851851852
NC_047559.1	22293933	22293935	75.2770673486786
NC_047559.1	24170063	24170065	76.8102073365231
NC_047559.1	24173162	24173164	77.8471602735653
NC_047559.1	24173203	24173205	82.0861678004535
NC_047559.1	24177562	24177564	86.3445378151261
     287 DML-pH-75-Cov5-Fem-Gene.bed


In [11]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-100-Cov5-Ind-Gene.bed
!head DML-pH-100-Cov5-Ind-Gene.bed
!wc -l DML-pH-100-Cov5-Ind-Gene.bed

NC_047559.1	738014	738016	-100
NC_047559.1	1006145	1006147	100
NC_047559.1	1715466	1715468	100
NC_047559.1	2193954	2193956	-100
NC_047559.1	3595157	3595159	-100
NC_047559.1	3613450	3613452	-100
NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100
NC_047559.1	4996260	4996262	-100
    2519 DML-pH-100-Cov5-Ind-Gene.bed


In [35]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-Cov5-Overlaps-Gene.bed
!head DML-pH-Cov5-Overlaps-Gene.bed
!wc -l DML-pH-Cov5-Overlaps-Gene.bed

NC_047559.1	41431870	41431872	-89.5833333333333
NC_047561.1	46632231	46632233	80.7509881422925
NC_047562.1	10069178	10069180	-94.8212495823588
NC_047563.1	15549479	15549481	-83.5616438356164
NC_047563.1	62370986	62370988	80.5555555555556
NC_047563.1	62739104	62739106	-81.7953546767106
NC_047565.1	7110085	7110087	-83.3130699088146
NC_047568.1	295645	295647	-75.4901960784314
       8 DML-pH-Cov5-Overlaps-Gene.bed


### 4b. Exon UTR

In [12]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-75-Cov5-Fem-exonUTR.bed
!head DML-pH-75-Cov5-Fem-exonUTR.bed
!wc -l DML-pH-75-Cov5-Fem-exonUTR.bed

NC_047560.1	21542121	21542123	-81.0526315789474
NC_047561.1	11056239	11056241	79.6594982078853
NC_047561.1	24739219	24739221	77.2727272727273
NC_047562.1	3908637	3908639	-87.9571248423707
NC_047563.1	34553577	34553579	76.334453154162
NC_047563.1	38157425	38157427	-75.609756097561
NC_047563.1	40259243	40259245	82.2745901639344
NC_047564.1	16004308	16004310	-84.375
NC_047564.1	16004667	16004669	77
NC_047564.1	35202625	35202627	78.3699059561129
      20 DML-pH-75-Cov5-Fem-exonUTR.bed


In [13]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-100-Cov5-Ind-exonUTR.bed
!head DML-pH-100-Cov5-Ind-exonUTR.bed
!wc -l DML-pH-100-Cov5-Ind-exonUTR.bed

NC_047559.1	3595157	3595159	-100
NC_047559.1	8699968	8699970	100
NC_047559.1	10073766	10073768	-100
NC_047559.1	19937373	19937375	100
NC_047559.1	22269695	22269697	-100
NC_047559.1	22868972	22868974	100
NC_047559.1	24389916	24389918	100
NC_047559.1	26043316	26043318	-100
NC_047559.1	26447530	26447532	-100
NC_047559.1	26870650	26870652	100
     181 DML-pH-100-Cov5-Ind-exonUTR.bed


In [36]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-Cov5-Overlaps-exonUTR.bed
!head DML-pH-Cov5-Overlaps-exonUTR.bed
!wc -l DML-pH-Cov5-Overlaps-exonUTR.bed

NC_047568.1	295645	295647	-75.4901960784314
       1 DML-pH-Cov5-Overlaps-exonUTR.bed


### 4c. CDS

In [14]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-75-Cov5-Fem-CDS.bed
!head DML-pH-75-Cov5-Fem-CDS.bed
!wc -l DML-pH-75-Cov5-Fem-CDS.bed

NC_047559.1	6531650	6531652	76.7441860465116
NC_047559.1	8166690	8166692	77.3809523809524
NC_047559.1	41431870	41431872	-89.5833333333333
NC_047559.1	54053386	54053388	-85.1851851851852
NC_047560.1	19919194	19919196	78.4216772151899
NC_047560.1	48674436	48674438	78.0762250453721
NC_047560.1	50062255	50062257	88.4615384615385
NC_047560.1	50369208	50369210	-85.0022153300842
NC_047560.1	63552053	63552055	-82.32861936721
NC_047561.1	2480996	2480998	-87.4329501915709
      78 DML-pH-75-Cov5-Fem-CDS.bed


In [15]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-100-Cov5-Ind-CDS.bed
!head DML-pH-100-Cov5-Ind-CDS.bed
!wc -l DML-pH-100-Cov5-Ind-CDS.bed

NC_047559.1	738014	738016	-100
NC_047559.1	1715466	1715468	100
NC_047559.1	3613450	3613452	-100
NC_047559.1	5430962	5430964	100
NC_047559.1	5980653	5980655	-100
NC_047559.1	6327898	6327900	100
NC_047559.1	6601946	6601948	-100
NC_047559.1	7071341	7071343	-100
NC_047559.1	9371990	9371992	100
NC_047559.1	9537680	9537682	100
     789 DML-pH-100-Cov5-Ind-CDS.bed


In [37]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-Cov5-Overlaps-CDS.bed
!head DML-pH-Cov5-Overlaps-CDS.bed
!wc -l DML-pH-Cov5-Overlaps-CDS.bed

NC_047559.1	41431870	41431872	-89.5833333333333
NC_047563.1	15549479	15549481	-83.5616438356164
NC_047563.1	62739104	62739106	-81.7953546767106
       3 DML-pH-Cov5-Overlaps-CDS.bed


### 4d. Intron

In [16]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-75-Cov5-Fem-intron.bed
!head DML-pH-75-Cov5-Fem-intron.bed
!wc -l DML-pH-75-Cov5-Fem-intron.bed

NC_047559.1	2799230	2799232	83.0864530029626
NC_047559.1	11117667	11117669	-77.0833333333333
NC_047559.1	18206975	18206977	80.1851851851852
NC_047559.1	22293933	22293935	75.2770673486786
NC_047559.1	24170063	24170065	76.8102073365231
NC_047559.1	24173162	24173164	77.8471602735653
NC_047559.1	24173203	24173205	82.0861678004535
NC_047559.1	24177562	24177564	86.3445378151261
NC_047559.1	24182519	24182521	76.6666666666667
NC_047559.1	24185437	24185439	77.5862068965517
     190 DML-pH-75-Cov5-Fem-intron.bed


In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-100-Cov5-Ind-intron.bed
!head DML-pH-100-Cov5-Ind-intron.bed
!wc -l DML-pH-100-Cov5-Ind-intron.bed

NC_047559.1	1006145	1006147	100
NC_047559.1	2193954	2193956	-100
NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100
NC_047559.1	4996260	4996262	-100
NC_047559.1	5402041	5402043	-100
NC_047559.1	6435921	6435923	-100
NC_047559.1	6539611	6539613	100
NC_047559.1	6615055	6615057	-100
    1552 DML-pH-100-Cov5-Ind-intron.bed


In [38]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-Cov5-Overlaps-intron.bed
!head DML-pH-Cov5-Overlaps-intron.bed
!wc -l DML-pH-Cov5-Overlaps-intron.bed

NC_047561.1	46632231	46632233	80.7509881422925
NC_047562.1	10069178	10069180	-94.8212495823588
NC_047563.1	62370986	62370988	80.5555555555556
NC_047565.1	7110085	7110087	-83.3130699088146
       4 DML-pH-Cov5-Overlaps-intron.bed


### 4e. Upstream flanks

In [18]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-75-Cov5-Fem-upstream.bed
!head DML-pH-75-Cov5-Fem-upstream.bed
!wc -l DML-pH-75-Cov5-Fem-upstream.bed

NC_047562.1	31533368	31533370	75.974025974026
NC_047568.1	47299814	47299816	-78.397921628058
       2 DML-pH-75-Cov5-Fem-upstream.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-100-Cov5-Ind-upstream.bed
!head DML-pH-100-Cov5-Ind-upstream.bed
!wc -l DML-pH-100-Cov5-Ind-upstream.bed

NC_047559.1	16416254	16416256	-100
NC_047559.1	38681370	38681372	100
NC_047559.1	38681375	38681377	100
NC_047559.1	38681405	38681407	100
NC_047559.1	46969439	46969441	100
NC_047561.1	50281842	50281844	100
NC_047561.1	50939108	50939110	100
NC_047561.1	52983330	52983332	-100
NC_047563.1	37046184	37046186	-100
NC_047564.1	22408957	22408959	100
      33 DML-pH-100-Cov5-Ind-upstream.bed


In [39]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-Cov5-Overlaps-upstream.bed
!head DML-pH-Cov5-Overlaps-upstream.bed
!wc -l DML-pH-Cov5-Overlaps-upstream.bed

       0 DML-pH-Cov5-Overlaps-upstream.bed


### 4f. Downstream flanks

In [20]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-75-Cov5-Fem-downstream.bed
!head DML-pH-75-Cov5-Fem-downstream.bed
!wc -l DML-pH-75-Cov5-Fem-downstream.bed

NC_047559.1	9400974	9400976	-81.6326530612245
NC_047561.1	40177842	40177844	75.4727474972191
NC_047562.1	26532161	26532163	-79.147235176549
NC_047562.1	31533368	31533370	75.974025974026
NC_047563.1	2827931	2827933	81.25
NC_047564.1	49356109	49356111	82.5
NC_047565.1	53019025	53019027	-89.3314366998577
NC_047565.1	53019065	53019067	-86.2068965517241
NC_047566.1	4629852	4629854	76.6575529733424
NC_047566.1	38181604	38181606	79.9723279142165
      16 DML-pH-75-Cov5-Fem-downstream.bed


In [21]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-100-Cov5-Ind-downstream.bed
!head DML-pH-100-Cov5-Ind-downstream.bed
!wc -l DML-pH-100-Cov5-Ind-downstream.bed

NC_047559.1	15592671	15592673	-100
NC_047559.1	16416254	16416256	-100
NC_047559.1	18755703	18755705	-100
NC_047559.1	22210027	22210029	-100
NC_047559.1	31887469	31887471	100
NC_047559.1	36248111	36248113	100
NC_047559.1	49469926	49469928	-100
NC_047559.1	49469940	49469942	-100
NC_047559.1	49470144	49470146	-100
NC_047559.1	52023706	52023708	100
     140 DML-pH-100-Cov5-Ind-downstream.bed


In [40]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-Cov5-Overlaps-downstream.bed
!head DML-pH-Cov5-Overlaps-downstream.bed
!wc -l DML-pH-Cov5-Overlaps-downstream.bed

       0 DML-pH-Cov5-Overlaps-downstream.bed


### 4g. Intergenic regions

In [22]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-75-Cov5-Fem-intergenic.bed
!head DML-pH-75-Cov5-Fem-intergenic.bed
!wc -l DML-pH-75-Cov5-Fem-intergenic.bed

NC_047559.1	27982072	27982074	76.056338028169
NC_047560.1	7407587	7407589	-76.0997732426304
NC_047560.1	36856613	36856615	-78.5714285714286
NC_047561.1	23167283	23167285	-76.6423357664234
NC_047562.1	21005172	21005174	-80.5555555555556
NC_047562.1	48518001	48518003	-79.4623059866962
NC_047563.1	41758693	41758695	-77.2791023842917
NC_047564.1	21775476	21775478	78.7878787878788
NC_047564.1	40557391	40557393	78.5714285714286
NC_047566.1	24457862	24457864	82.1143617021277
      12 DML-pH-75-Cov5-Fem-intergenic.bed


In [23]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-100-Cov5-Ind-intergenic.bed
!head DML-pH-100-Cov5-Ind-intergenic.bed
!wc -l DML-pH-100-Cov5-Ind-intergenic.bed

NC_047559.1	1011405	1011407	100
NC_047559.1	15764238	15764240	-100
NC_047559.1	15764243	15764245	-100
NC_047559.1	15764260	15764262	-100
NC_047559.1	22369287	22369289	-100
NC_047559.1	24148954	24148956	-100
NC_047559.1	27983414	27983416	-100
NC_047559.1	40612979	40612981	100
NC_047559.1	45266940	45266942	100
NC_047559.1	45266965	45266967	100
     178 DML-pH-100-Cov5-Ind-intergenic.bed


In [41]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-Cov5-Overlaps-intergenic.bed
!head DML-pH-Cov5-Overlaps-intergenic.bed
!wc -l DML-pH-Cov5-Overlaps-intergenic.bed

       0 DML-pH-Cov5-Overlaps-intergenic.bed


### 4h. lncRNA

In [24]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-75-Cov5-Fem-lncRNA.bed
!head DML-pH-75-Cov5-Fem-lncRNA.bed
!wc -l DML-pH-75-Cov5-Fem-lncRNA.bed

NC_047562.1	3908637	3908639	-87.9571248423707
NC_047565.1	33994740	33994742	-79.1666666666667
NC_047565.1	33997347	33997349	-78.0265748031496
NC_047565.1	33997363	33997365	-90.2097902097902
NC_047565.1	34000238	34000240	-77.9411764705882
       5 DML-pH-75-Cov5-Fem-lncRNA.bed


In [25]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-100-Cov5-Ind-lncRNA.bed
!head DML-pH-100-Cov5-Ind-lncRNA.bed
!wc -l DML-pH-100-Cov5-Ind-lncRNA.bed

NC_047559.1	19182907	19182909	-100
NC_047559.1	26447530	26447532	-100
NC_047559.1	26674601	26674603	-100
NC_047559.1	51192786	51192788	100
NC_047560.1	20508852	20508854	-100
NC_047560.1	25538094	25538096	100
NC_047560.1	51759717	51759719	100
NC_047560.1	51759824	51759826	100
NC_047560.1	51759841	51759843	100
NC_047560.1	51760104	51760106	100
      36 DML-pH-100-Cov5-Ind-lncRNA.bed


In [42]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-Cov5-Overlaps-lncRNA.bed
!head DML-pH-Cov5-Overlaps-lncRNA.bed
!wc -l DML-pH-Cov5-Overlaps-lncRNA.bed

       0 DML-pH-Cov5-Overlaps-lncRNA.bed


### 4i. Transposable elements

In [26]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-75-Cov5-Fem-TE.bed
!head DML-pH-75-Cov5-Fem-TE.bed
!wc -l DML-pH-75-Cov5-Fem-TE.bed

NC_047559.1	2799230	2799232	83.0864530029626
NC_047559.1	22293933	22293935	75.2770673486786
NC_047559.1	24170063	24170065	76.8102073365231
NC_047559.1	24182519	24182521	76.6666666666667
NC_047559.1	24185437	24185439	77.5862068965517
NC_047559.1	24702767	24702769	77.9661016949153
NC_047559.1	24987135	24987137	-80.6034482758621
NC_047559.1	27982072	27982074	76.056338028169
NC_047559.1	28456701	28456703	78.4313725490196
NC_047559.1	42438404	42438406	82.258064516129
      91 DML-pH-75-Cov5-Fem-TE.bed


In [27]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-100-Cov5-Ind-TE.bed
!head DML-pH-100-Cov5-Ind-TE.bed
!wc -l DML-pH-100-Cov5-Ind-TE.bed

NC_047559.1	3734205	3734207	-100
NC_047559.1	3874606	3874608	-100
NC_047559.1	4996260	4996262	-100
NC_047559.1	6435921	6435923	-100
NC_047559.1	6539611	6539613	100
NC_047559.1	8269426	8269428	-100
NC_047559.1	9523443	9523445	100
NC_047559.1	10082509	10082511	100
NC_047559.1	10846851	10846853	-100
NC_047559.1	13235175	13235177	-100
     829 DML-pH-100-Cov5-Ind-TE.bed


In [43]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-Cov5-Overlaps-TE.bed
!head DML-pH-Cov5-Overlaps-TE.bed
!wc -l DML-pH-Cov5-Overlaps-TE.bed

       0 DML-pH-Cov5-Overlaps-TE.bed
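
The per-feature counts in this section come from separate `wc -l` calls. One way to gather them in a single step is sketched below in Python. The helper `count_bed_lines` is hypothetical, and the demonstration uses throwaway files with toy content rather than the real outputs.

```python
import tempfile
from pathlib import Path


def count_bed_lines(directory, pattern="*.bed"):
    """Map each matching file name to its number of lines."""
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as handle:
            counts[path.name] = sum(1 for _ in handle)
    return counts


# Demonstration on temporary files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "toy-Gene.bed").write_text("chr1\t100\t102\t80\nchr1\t200\t202\t-75\n")
    Path(tmp, "toy-TE.bed").write_text("")
    for name, n in count_bed_lines(tmp).items():
        print(f"{n:8d} {name}")
```

Pointed at the working directory with a pattern such as `DML-pH-75-Cov5-Fem-*.bed`, this would list one count per feature for the female-DML.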


## 5. SNP overlap

I will now look at overlaps between sex-specific DML and SNPs, using both the merged SNP list (`CT-SNPs.tab`) and the SNP lists for individual samples (`zr3616_*_CT-SNPs.tab`).

### 5a. Create BEDfiles

In [29]:
!find /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/*CT-SNPs.tab

/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_1_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_2_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_3_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_4_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_5_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_6_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_7_CT-SNPs.tab
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_8_CT-SNPs.tab


In [30]:
%%bash

for f in /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/*CT-SNPs.tab
do
    #Print chr and SNP position as both start and end
    awk '{print $1"\t"$2"\t"$2}' "${f}" > "${f}.bed"
done
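
The loop above writes the SNP position as both the start and the end of each interval. BED coordinates are 0-based with an exclusive end, so if the positions in the `.tab` files are 1-based (an assumption that is not checked in this notebook), a conventional single-base BED interval would run from `pos - 1` to `pos`. The sketch below shows that alternative conversion. It is not what the cell above does, and `snp_to_bed_interval` is a hypothetical helper.

```python
def snp_to_bed_interval(chrom, pos_1based):
    """Return a 0-based, half-open interval covering one 1-based position.

    Assumes the input position is 1-based; this is not verified here.
    """
    pos = int(pos_1based)
    if pos < 1:
        raise ValueError("a 1-based position must be >= 1")
    return (chrom, pos - 1, pos)


# First position from the merged SNP list shown in this section
print(*snp_to_bed_interval("NC_047559.1", "22443"), sep="\t")
```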

In [31]:
!find /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/*CT-SNPs.tab.bed

/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_1_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_2_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_3_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_4_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_5_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_6_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_7_CT-SNPs.tab.bed
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_8_CT-SNPs.tab.bed


In [32]:
%%bash

for f in /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/*CT-SNPs.tab.bed
do
    #Skip anything that is not a regular file, then drop ".tab" from the name
    [ -f "${f}" ] || continue
    mv "${f}" "${f//.tab/}"
done
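
The shell expansion `${f//.tab/}` removes every occurrence of `.tab` in the full path, directory names included. A narrower rename that only touches a trailing `.tab.bed` in the file name could be sketched in Python as follows. `bed_name_without_tab` is a hypothetical helper that computes the new name without moving anything, and the `/data/` directory in the example is made up.

```python
from pathlib import PurePosixPath


def bed_name_without_tab(path):
    """Return the path with a trailing '.tab.bed' in the file name replaced by '.bed'."""
    p = PurePosixPath(path)
    suffix = ".tab.bed"
    if p.name.endswith(suffix):
        return p.with_name(p.name[: -len(suffix)] + ".bed")
    # Leave any other file name unchanged
    return p


print(bed_name_without_tab("/data/BS-Snper/zr3616_1_CT-SNPs.tab.bed"))
```

The returned path could then be passed to `pathlib.Path.rename` to perform the move.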

In [38]:
!head /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/*CT-SNPs.bed

==> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.bed <==
NC_047559.1	22443	22443
NC_047559.1	34836	34836
NC_047559.1	36674	36674
NC_047559.1	38038	38038
NC_047559.1	44211	44211
NC_047559.1	48352	48352
NC_047559.1	49472	49472
NC_047559.1	82690	82690
NC_047559.1	83012	83012
NC_047559.1	87321	87321

==> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_1_CT-SNPs.bed <==
NC_047559.1	22443	22443
NC_047559.1	34836	34836
NC_047559.1	36674	36674
NC_047559.1	38038	38038
NC_047559.1	42517	42517
NC_047559.1	48352	48352
NC_047559.1	49472	49472
NC_047559.1	108375	108375
NC_047559.1	197299	197299
NC_047559.1	205322	205322

==> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/zr3616_2_CT-SNPs.bed <==
NC_047559.1	82690	82690
NC_047559.1	83012	83012
NC_047559.1	83513	83513
NC_047559.1	100873	100873
NC_047559.1	125852	125852
NC_047559.1	183845	183845
NC_047559.1	191463	191463
NC_047559.1	211764	211764
NC_047559.1	219023	

### 5b. Merged SNPs

In [64]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-75-Cov5-Fem.csv.bed \
-b /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.bed \
> DML-pH-75-Cov5-Fem-mergedSNP.bed
!head DML-pH-75-Cov5-Fem-mergedSNP.bed
!wc -l DML-pH-75-Cov5-Fem-mergedSNP.bed

NC_047559.1	24859718	24859720	-79.2
NC_047559.1	24987135	24987137	-80.6034482758621
NC_047559.1	54053386	54053388	-85.1851851851852
NC_047560.1	19919194	19919196	78.4216772151899
NC_047560.1	22806263	22806265	-80.787037037037
NC_047561.1	11056239	11056241	79.6594982078853
NC_047561.1	11068665	11068667	87.8787878787879
NC_047561.1	16819572	16819574	-79.537750385208
NC_047561.1	34755530	34755532	78
NC_047561.1	38103260	38103262	-75.6010230179028
      38 DML-pH-75-Cov5-Fem-mergedSNP.bed


In [65]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-100-Cov5-Ind.csv.bed \
-b /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.bed \
> DML-pH-100-Cov5-Ind-mergedSNP.bed
!head DML-pH-100-Cov5-Ind-mergedSNP.bed
!wc -l DML-pH-100-Cov5-Ind-mergedSNP.bed

NC_047559.1	3874606	3874608	-100
NC_047559.1	6601946	6601948	-100
NC_047559.1	9548686	9548688	100
NC_047559.1	11690479	11690481	-100
NC_047559.1	15578806	15578808	-100
NC_047559.1	15989137	15989139	-100
NC_047559.1	16409823	16409825	-100
NC_047559.1	17519147	17519149	-100
NC_047559.1	21493774	21493776	-100
NC_047559.1	22269695	22269697	-100
     298 DML-pH-100-Cov5-Ind-mergedSNP.bed


In [66]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-Cov5-Overlaps.bed \
-b /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CT-SNPs.bed \
> DML-pH-Cov5-Overlaps-mergedSNP.bed
!head DML-pH-Cov5-Overlaps-mergedSNP.bed
!wc -l DML-pH-Cov5-Overlaps-mergedSNP.bed

       0 DML-pH-Cov5-Overlaps-mergedSNP.bed


### 5c. Individual SNPs

In [68]:
cd /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/

/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper


In [44]:
!find zr3616*CT-SNPs.bed

zr3616_1_CT-SNPs.bed
zr3616_2_CT-SNPs.bed
zr3616_3_CT-SNPs.bed
zr3616_4_CT-SNPs.bed
zr3616_5_CT-SNPs.bed
zr3616_6_CT-SNPs.bed
zr3616_7_CT-SNPs.bed
zr3616_8_CT-SNPs.bed


#### Female-DML

In [57]:
%%bash
for f in zr3616*CT-SNPs.bed
do
    /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
    -u \
    -a /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-75-Cov5-Fem.csv.bed \
    -b ${f} \
    > DML-pH-75-Cov5-Fem-${f}
done

In [58]:
#Move to GitHub repository
!mv DML-pH-75-Cov5-Fem* /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/.

In [70]:
!head /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-75-Cov5-Fem*SNPs.bed
!wc -l /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-75-Cov5-Fem*SNPs.bed

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-75-Cov5-Fem-zr3616_1_CT-SNPs.bed <==
NC_047559.1	24859718	24859720	-79.2
NC_047559.1	51876614	51876616	-84.7826086956522
NC_047559.1	54053386	54053388	-85.1851851851852
NC_047560.1	51286421	51286423	-76.9230769230769
NC_047561.1	803620	803622	-81.4285714285714
NC_047561.1	16819572	16819574	-79.537750385208
NC_047561.1	34755530	34755532	78
NC_047561.1	38103260	38103262	-75.6010230179028
NC_047561.1	55590869	55590871	-76.1420698101653
NC_047562.1	21005172	21005174	-80.5555555555556

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-75-Cov5-Fem-zr3616_2_CT-SNPs.bed <==
NC_047559.1	24859718	24859720	-79.2
NC_047559.1	54053386	54053388	-85.1851851851852
NC_047560.1	17822823	17822825	80.8080808080808
NC_047560.1	19919194	19919196	78.4216772151899
NC_047560.1	51286421	51286423	-76.9230769230769
NC_047561.1	11068665	11068667	87.8787878787879
NC_047561.1	40076495	40076

#### Indeterminate-DML

In [52]:
%%bash
for f in zr3616*CT-SNPs.bed
do
    /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
    -u \
    -a /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-100-Cov5-Ind.csv.bed \
    -b ${f} \
    > DML-pH-100-Cov5-Ind-${f}
done

In [53]:
#Move to GitHub repository
!mv DML-pH-100-Cov5-Ind* /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/.

In [71]:
!head /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-100-Cov5-Ind*SNPs.bed
!wc -l /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-100-Cov5-Ind*SNPs.bed

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-100-Cov5-Ind-zr3616_1_CT-SNPs.bed <==
NC_047559.1	3874606	3874608	-100
NC_047559.1	4907314	4907316	-100
NC_047559.1	9548686	9548688	100
NC_047559.1	11690479	11690481	-100
NC_047559.1	15989137	15989139	-100
NC_047559.1	18777239	18777241	100
NC_047559.1	22269695	22269697	-100
NC_047559.1	23978734	23978736	100
NC_047559.1	40795880	40795882	100
NC_047559.1	52046157	52046159	100

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-100-Cov5-Ind-zr3616_2_CT-SNPs.bed <==
NC_047559.1	738014	738016	-100
NC_047559.1	6601946	6601948	-100
NC_047559.1	9811935	9811937	-100
NC_047559.1	11690479	11690481	-100
NC_047559.1	15989137	15989139	-100
NC_047559.1	22269695	22269697	-100
NC_047559.1	22353593	22353595	-100
NC_047559.1	24967631	24967633	-100
NC_047559.1	27983414	27983416	-100
NC_047559.1	29060021	29060023	-100

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DM

#### Common DML

In [54]:
%%bash
for f in zr3616*CT-SNPs.bed
do
    /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
    -u \
    -a /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps.bed \
    -b ${f} \
    > DML-pH-Cov5-Overlaps-${f}
done

In [55]:
#Move to GitHub repository
!mv DML-pH-Cov5-Overlaps* /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/.

In [72]:
!head /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps*SNPs.bed
!wc -l /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps*SNPs.bed

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_1_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_2_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_3_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_4_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_5_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_6_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Overlaps-zr3616_7_CT-SNPs.bed <==

==> /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-Cov5-Ove
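
The per-sample outputs in section 5c list, for each individual, the DML that overlap one of that individual's SNPs. One possible way to summarize them is to count how many samples each DML appears in. The Python sketch below does this for BED lines held in memory. `tally_dml_across_samples` is a hypothetical helper, and the example uses only the first few female-DML lines shown above for samples 1 and 2, so the counts are illustrative rather than results.

```python
from collections import Counter


def tally_dml_across_samples(per_sample_lines):
    """Count, for each (chr, start, end), the number of samples that contain it."""
    tally = Counter()
    for lines in per_sample_lines.values():
        seen = set()
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            seen.add((fields[0], int(fields[1]), int(fields[2])))
        # Each sample contributes at most one count per DML
        tally.update(seen)
    return tally


example = {
    "zr3616_1": [
        "NC_047559.1\t24859718\t24859720\t-79.2",
        "NC_047559.1\t51876614\t51876616\t-84.7826086956522",
        "NC_047559.1\t54053386\t54053388\t-85.1851851851852",
    ],
    "zr3616_2": [
        "NC_047559.1\t24859718\t24859720\t-79.2",
        "NC_047559.1\t54053386\t54053388\t-85.1851851851852",
        "NC_047560.1\t17822823\t17822825\t80.8080808080808",
    ],
}

for (chrom, start, end), n in sorted(tally_dml_across_samples(example).items()):
    print(chrom, start, end, n, sep="\t")
```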