# Genomic Location of DML

In this notebook, I will identify the genomic locations of [DML identified with `methylKit`](https://github.com/RobertsLab/project-oyster-oa/blob/master/code/Haws/04-methylKit.R). 

2. Create BEDfiles for DML
3. Identify overlaps between DML lists
4. Characterize genomic locations of DML
5. Identify overlaps between SNPs and DML

## 0. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/project-oyster-oa/code/Haws'

In [2]:
cd ../../analyses/

/Users/yaamini/Documents/project-oyster-oa/analyses


In [3]:
#mkdir Haws_07-DML-characterization

In [4]:
cd Haws_07-DML-characterization/

/Users/yaamini/Documents/project-oyster-oa/analyses/Haws_07-DML-characterization


In [5]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

## 2. Create BEDfiles for DML

My DML lists are `.csv` files. To identify genomic locations with `bedtools intersect`, I need BEDfiles.

### 2a. `methylKit`

In [8]:
#Look at csv file to determine what modifications need to be made
#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff
!head ../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv

,chr,start,end,strand,pvalue,qvalue,meth.diff
49115,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,1.13190244626751e-07,40.2560083594566
162616,NC_047559.1,15801827,15801829,*,7.35840565483495e-09,0.000872504096156049,-45.6918238993711
890333,NC_047560.1,65604843,65604845,*,3.34714016321879e-07,0.00940017301493494,49.4839101396478
1014648,NC_047561.1,7843128,7843130,*,5.49971909095006e-08,0.00313909989423398,-26.3157894736842
1041384,NC_047561.1,10147466,10147468,*,5.73605741393552e-08,0.00313909989423398,-30.4647676161919
1041599,NC_047561.1,10166213,10166215,*,1.68763140575909e-09,0.000371694309881221,-29.1507066437723
1053918,NC_047561.1,11783086,11783088,*,1.4461592764831e-09,0.000371694309881221,-44.1576698155646
1060146,NC_047561.1,12279075,12279077,*,3.2020995626083e-09,0.000514406178679344,-26.890756302521
1109777,NC_047561.1,16521359,16521361,*,1.50728082250528e-09,0.000371694309881221,28.8444735692442


In [12]:
#Will use 25% meth diff cutoff for DML definition
!find ../Haws_04-methylKit/DML/DML*25*

../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv
../Haws_04-methylKit/DML/DML-ploidy-25-Cov5.csv


In [13]:
%%bash

#Replace , with tabs
#Print chr, start, end, meth.diff
#Remove header
#Save as BEDfile

for f in ../Haws_04-methylKit/DML/DML*25*
do
    tr "," "\t" < ${f} \
    | awk '{print $2"\t"$3"\t"$4"\t"$8}' \
    | tail -n+2 \
    > ${f}.bed
done
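
The same conversion can be done in a single `awk` call. The sketch below is a minimal, self-contained version that runs on a two-row stand-in for the `methylKit` CSV. The temporary directory, the `demo-DML.csv` file name, and the choice to drop the `.csv` suffix from the output name are assumptions for illustration, not what this notebook did.

```shell
# Work in a throwaway directory so nothing in the project is touched
workdir=$(mktemp -d)

# Two-row stand-in for a methylKit DML CSV (rows copied from the head above)
cat > "${workdir}/demo-DML.csv" <<'EOF'
,chr,start,end,strand,pvalue,qvalue,meth.diff
49115,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,1.13190244626751e-07,40.2560083594566
162616,NC_047559.1,15801827,15801829,*,7.35840565483495e-09,0.000872504096156049,-45.6918238993711
EOF

for f in "${workdir}"/*.csv
do
    # -F"," splits on commas, NR > 1 skips the header, OFS="\t" writes tabs
    # ${f%.csv}.bed names the output demo-DML.bed instead of demo-DML.csv.bed
    awk -F"," 'BEGIN {OFS="\t"} NR > 1 {print $2, $3, $4, $8}' "${f}" \
    > "${f%.csv}.bed"
done

cat "${workdir}/demo-DML.bed"
```

Splitting on commas inside `awk` only works while no field contains a quoted comma, which holds for the columns shown in the head above.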

In [14]:
%%bash

#Move BEDfiles to current working directory
mv ../Haws_04-methylKit/DML/*bed .

In [15]:
!head *bed

==> DML-pH-25-Cov5.csv.bed <==
NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19286180	19286182	-55.4137931034483

==> DML-ploidy-25-Cov5.csv.bed <==
NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426

### 2b. `DSS`

In [6]:
#Check format: chr, pos, stat, pvals, fdrs
!head ../Haws_04-DSS/DML/DML-pH-DSS.csv

,chr,pos,stat,pvals,fdrs
10950,NC_047559.1,520576,5.36772390879675,7.97364876094437e-08,0.00669831008311927
280929,NC_047559.1,13702829,5.86875710812115,4.39074111576282e-09,0.000889552056257843
817563,NC_047559.1,41205913,5.9624742480836,2.4844681633341e-09,0.000541334814760626
880189,NC_047559.1,44191406,5.35003134621439,8.7938998655662e-08,0.00720229317625906
934243,NC_047559.1,47000336,-5.41718434198197,6.05449093413689e-08,0.00563850981052605
993302,NC_047559.1,50090321,-6.64621526435129,3.00725354774864e-11,1.57854060369563e-05
1089838,NC_047559.1,54761361,-5.54042150746187,3.01744466452751e-08,0.00368342194971688
1203367,NC_047560.1,4561420,7.79831465653462,6.27394139573504e-15,1.44903490034857e-08
1203368,NC_047560.1,4561429,7.36822858834852,1.72910124667661e-13,1.99677355479751e-07


In [12]:
%%bash

#Replace , with tabs
#Print chr, start (pos), end (pos + 2)
#Remove header
#Save as BEDfile

for f in ../Haws_04-DSS/DML/DML*csv
do
    tr "," "\t" < ${f} \
    | awk '{print $2"\t"$3"\t"$3+2}' \
    | tail -n+2 \
    > ${f}.bed
done

In [13]:
!head ../Haws_04-DSS/DML/*bed

==> ../Haws_04-DSS/DML/DML-pH-DSS.csv.bed <==
NC_047559.1	520576	520578
NC_047559.1	13702829	13702831
NC_047559.1	41205913	41205915
NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047559.1	54761361	54761363
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494

==> ../Haws_04-DSS/DML/DML-ploidy-DSS.csv.bed <==
NC_047559.1	3159595	3159597
NC_047559.1	3159620	3159622
NC_047559.1	22732543	22732545
NC_047559.1	30739063	30739065
NC_047559.1	43886947	43886949
NC_047559.1	44191406	44191408
NC_047559.1	44850822	44850824
NC_047559.1	45984057	45984059
NC_047559.1	47884062	47884064
NC_047559.1	48771720	48771722

==> ../Haws_04-DSS/DML/DML-ploidypH-DSS.csv.bed <==
NC_047559.1	3022288	3022290
NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047560.1	55499797	55499799
NC_047560.1	59701557	5970155

In [14]:
%%bash

#Move BEDfiles to current working directory
mv ../Haws_04-DSS/DML/*bed .
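
Before using the BEDfiles downstream, a quick format check can catch malformed lines. The sketch below is one possible check, run on a stand-in file rather than the real BEDfiles. The file name `demo.bed` and the three rules (at least 3 fields, integer coordinates, end greater than start) are assumptions about what counts as a usable line here.

```shell
workdir=$(mktemp -d)

# Stand-in BEDfile: one valid line, one line whose end is smaller than its start
printf 'NC_047560.1\t55499797\t55499799\n' > "${workdir}/demo.bed"
printf 'NC_047560.1\t59701557\t5970155\n' >> "${workdir}/demo.bed"

# Flag lines with fewer than 3 fields, non-integer coordinates, or end <= start
bad_lines=$(awk -F"\t" '
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $3 + 0 <= $2 + 0 {
        print FILENAME ":" FNR ": " $0
    }' "${workdir}/demo.bed")

echo "${bad_lines}"
```

An empty `bad_lines` would mean every line passed the three rules.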

I imported the BEDfiles into [this IGV session]() to visualize them.

## 3. Identify overlaps between DML lists

### 3a. `methylKit`

In [5]:
#Count hypomethylated DML
#Count hypermethylated DML
!grep "-" DML-pH-25-Cov5.csv.bed | wc -l
!grep -v "-" DML-pH-25-Cov5.csv.bed | wc -l

      30
      12


In [7]:
#Count hypomethylated DML
#Count hypermethylated DML
!grep "-" DML-ploidy-25-Cov5.csv.bed | wc -l
!grep -v "-" DML-ploidy-25-Cov5.csv.bed | wc -l

      10
      19
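
`grep "-"` counts any line that contains a hyphen anywhere, which works here because the chromosome names have no hyphens. A column-specific alternative is sketched below. It tests the sign of the fourth column (meth.diff) on a three-row stand-in file, and the file name `demo.bed` is an assumption for illustration.

```shell
workdir=$(mktemp -d)

# Stand-in for a methylKit BEDfile: chr, start, end, meth.diff
printf 'NC_047559.1\t5294172\t5294174\t40.2560083594566\n' > "${workdir}/demo.bed"
printf 'NC_047559.1\t15801827\t15801829\t-45.6918238993711\n' >> "${workdir}/demo.bed"
printf 'NC_047561.1\t7843128\t7843130\t-26.3157894736842\n' >> "${workdir}/demo.bed"

# Tally the sign of column 4 only
# Adding 0 makes sure a 0 is printed if one class has no DML
counts=$(awk -F"\t" '
    $4 < 0 {hypo++}
    $4 > 0 {hyper++}
    END {
        print "hypomethylated\t" (hypo + 0)
        print "hypermethylated\t" (hyper + 0)
    }' "${workdir}/demo.bed")

echo "${counts}"
```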


In [10]:
#Find overlaps between pH- and ploidy-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b DML-ploidy-25-Cov5.csv.bed \
> DML-Cov5-Overlaps.bed
!head DML-Cov5-Overlaps.bed
!wc -l DML-Cov5-Overlaps.bed

NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	9520723	9520725	-45.7492354740061
       2 DML-Cov5-Overlaps.bed
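
With `-u`, `intersectBed` writes the `-a` entry, so the meth.diff values in `DML-Cov5-Overlaps.bed` come from the pH list. For example, position 40362698 on NC_047561.1 is -31.03 here but 29.41 in the ploidy BEDfile above. To see both values side by side, one option is an exact-coordinate join in `awk`, sketched below on stand-in files. This is a different technique from `bedtools intersect`: it only matches identical chr, start, and end, not partial overlaps, and the file names are assumptions for illustration.

```shell
workdir=$(mktemp -d)

# Stand-ins for the pH- and ploidy-DML BEDfiles (rows copied from outputs in this notebook)
printf 'NC_047561.1\t7843128\t7843130\t-26.3157894736842\n' > "${workdir}/pH.bed"
printf 'NC_047561.1\t40362698\t40362700\t-31.0344827586207\n' >> "${workdir}/pH.bed"
printf 'NC_047559.1\t12799610\t12799612\t27.7297297297297\n' > "${workdir}/ploidy.bed"
printf 'NC_047561.1\t40362698\t40362700\t29.4117647058824\n' >> "${workdir}/ploidy.bed"

# First file (NR == FNR): store ploidy meth.diff keyed by chr:start:end
# Second file: print pH rows with a stored key, followed by the ploidy meth.diff
awk -F"\t" 'BEGIN {OFS="\t"}
    NR == FNR {ploidy[$1 ":" $2 ":" $3] = $4; next}
    ($1 ":" $2 ":" $3) in ploidy {print $1, $2, $3, $4, ploidy[$1 ":" $2 ":" $3]}
' "${workdir}/ploidy.bed" "${workdir}/pH.bed" > "${workdir}/both.bed"

cat "${workdir}/both.bed"
```

The output columns are chr, start, end, pH meth.diff, and ploidy meth.diff.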


### 3b. `DSS`

In [16]:
#Find overlaps between pH- and ploidy-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidy-DSS.csv.bed \
> DML-DSS-pHploidy-Overlaps.bed
!head DML-DSS-pHploidy-Overlaps.bed
!wc -l DML-DSS-pHploidy-Overlaps.bed

NC_047559.1	44191406	44191408
NC_047559.1	50090321	50090323
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
NC_047562.1	13501413	13501415
NC_047563.1	33073757	33073759
NC_047565.1	41071596	41071598
NC_047565.1	43573693	43573695
      21 DML-DSS-pHploidy-Overlaps.bed


In [17]:
#Find overlaps between pH- and interaction-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidypH-DSS.csv.bed \
> DML-DSS-pHint-Overlaps.bed
!head DML-DSS-pHint-Overlaps.bed
!wc -l DML-DSS-pHint-Overlaps.bed

NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047564.1	48296668	48296670
NC_047565.1	41071596	41071598
NC_047567.1	31560080	31560082
NC_047567.1	31560110	31560112
NC_047567.1	31560120	31560122
      11 DML-DSS-pHint-Overlaps.bed


In [18]:
#Find overlaps between ploidy- and interaction-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b DML-ploidypH-DSS.csv.bed \
> DML-DSS-ploidyint-Overlaps.bed
!head DML-DSS-ploidyint-Overlaps.bed
!wc -l DML-DSS-ploidyint-Overlaps.bed

NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047563.1	6395389	6395391
NC_047565.1	41071596	41071598
NC_047566.1	15683888	15683890
NC_047566.1	15685674	15685676
NC_047567.1	3077162	3077164
NC_047567.1	31559112	31559114
NC_047567.1	31559989	31559991
NC_047567.1	31560004	31560006
      17 DML-DSS-ploidyint-Overlaps.bed


In [20]:
#Find pH-DML that overlap ploidy- or interaction-DML
#With -u and two -b files, a pH-DML is reported once if it overlaps either list
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidy-DSS.csv.bed DML-ploidypH-DSS.csv.bed \
> DML-DSS-all-Overlaps.bed
!head DML-DSS-all-Overlaps.bed
!wc -l DML-DSS-all-Overlaps.bed

NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047562.1	13501413	13501415
      25 DML-DSS-all-Overlaps.bed
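
Because `-u` reports a pH-DML when it overlaps either `-b` file, the 25 loci above are a union, not the set shared by all three lists. Taking the counts above at face value, inclusion-exclusion gives the number of pH-DML that overlap both other lists: 21 + 11 - 25 = 7. The sketch below shows one way to extract such a three-way set by chaining two exact-coordinate filters in `awk`. It runs on stand-in coordinates, the `keep_shared` helper is hypothetical, and exact matching is an assumption that differs from the overlap test `bedtools` uses.

```shell
workdir=$(mktemp -d)

# Stand-in coordinate lists (chr, start, end) for the three DSS contrasts
printf 'chr1\t10\t12\nchr1\t20\t22\nchr1\t30\t32\n' > "${workdir}/pH.bed"
printf 'chr1\t10\t12\nchr1\t20\t22\n' > "${workdir}/ploidy.bed"
printf 'chr1\t20\t22\nchr1\t30\t32\n' > "${workdir}/interaction.bed"

# Hypothetical helper: keep rows of the second file whose chr:start:end
# also appears in the first file
keep_shared() {
    awk -F"\t" 'NR == FNR {seen[$1 ":" $2 ":" $3] = 1; next}
                ($1 ":" $2 ":" $3) in seen' "$1" "$2"
}

# pH rows shared with ploidy, then narrowed to rows also shared with interaction
keep_shared "${workdir}/ploidy.bed" "${workdir}/pH.bed" > "${workdir}/pH-ploidy.bed"
keep_shared "${workdir}/interaction.bed" "${workdir}/pH-ploidy.bed" > "${workdir}/all-three.bed"

cat "${workdir}/all-three.bed"
```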


### 3c. `methylKit` and `DSS`

In [21]:
#Find overlaps between pH DML lists
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b DML-pH-DSS.csv.bed \
> DML-pH-method-Overlaps.bed
!head DML-pH-method-Overlaps.bed
!wc -l DML-pH-method-Overlaps.bed

NC_047567.1	16984837	16984839	42.8241335044929
       1 DML-pH-method-Overlaps.bed


In [22]:
#Find overlaps between ploidy DML lists
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b DML-ploidy-DSS.csv.bed \
> DML-ploidy-method-Overlaps.bed
!head DML-ploidy-method-Overlaps.bed
!wc -l DML-ploidy-method-Overlaps.bed

NC_047561.1	40362698	40362700	29.4117647058824
NC_047564.1	23049738	23049740	29.2845880961766
NC_047565.1	14899959	14899961	32.5955265610438
       3 DML-ploidy-method-Overlaps.bed


## 4. Characterize genomic locations of DML

I will look at overlaps between genome features and either pH- or ploidy-DML.

### 4a. Gene

#### `methylKit`

In [5]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-25-Cov5-Gene.bed
!head DML-pH-25-Cov5-Gene.bed
!wc -l DML-pH-25-Cov5-Gene.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
      36 DML-pH-25-Cov5-Gene.bed


In [11]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-25-Cov5-Gene.bed
!head DML-ploidy-25-Cov5-Gene.bed
!wc -l DML-ploidy-25-Cov5-Gene.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047564.1	25380708	25380710	-40.1414677276746
      25 DML-ploidy-25-Cov5-Gene.bed


In [15]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-Cov5-Overlaps-Gene.bed
!head DML-Cov5-Overlaps-Gene.bed
!wc -l DML-Cov5-Overlaps-Gene.bed

NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	9520723	9520725	-45.7492354740061
       2 DML-Cov5-Overlaps-Gene.bed


#### `DSS`

In [23]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-DSS-Gene.bed
!head DML-pH-DSS-Gene.bed
!wc -l DML-pH-DSS-Gene.bed

NC_047559.1	41205913	41205915
NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	4561508	4561510
NC_047560.1	4565018	4565020
NC_047560.1	19948171	19948173
     123 DML-pH-DSS-Gene.bed


In [27]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-DSS-Gene.bed
!head DML-ploidy-DSS-Gene.bed
!wc -l DML-ploidy-DSS-Gene.bed

NC_047559.1	3159595	3159597
NC_047559.1	3159620	3159622
NC_047559.1	30739063	30739065
NC_047559.1	43886947	43886949
NC_047559.1	44191406	44191408
NC_047559.1	45984057	45984059
NC_047559.1	47884062	47884064
NC_047559.1	48771720	48771722
NC_047559.1	50090321	50090323
NC_047559.1	53771128	53771130
     145 DML-ploidy-DSS-Gene.bed


In [25]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidypH-DSS-Gene.bed
!head DML-ploidypH-DSS-Gene.bed
!wc -l DML-ploidypH-DSS-Gene.bed

NC_047559.1	3022288	3022290
NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	55499797	55499799
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047562.1	19799003	19799005
NC_047563.1	6395389	6395391
      48 DML-ploidypH-DSS-Gene.bed


### 4b. Exon UTR

#### `methylKit`

In [6]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-25-Cov5-exonUTR.bed
!head DML-pH-25-Cov5-exonUTR.bed
!wc -l DML-pH-25-Cov5-exonUTR.bed

NC_047561.1	10147466	10147468	-30.4647676161919
NC_047563.1	11760749	11760751	-34.033180778032
NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	4762558	4762560	-26.7316669176329
NC_047566.1	9548317	9548319	-34.3623481781376
       5 DML-pH-25-Cov5-exonUTR.bed


In [12]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-ploidy-25-Cov5-exonUTR.bed
!head DML-ploidy-25-Cov5-exonUTR.bed
!wc -l DML-ploidy-25-Cov5-exonUTR.bed

       0 DML-ploidy-25-Cov5-exonUTR.bed


In [20]:
#Remove empty file
!rm DML-ploidy-25-Cov5-exonUTR.bed
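
Empty outputs come up several times in this notebook, so the removal step could be automated. The sketch below is one way to do that, shown on stand-in files in a temporary directory. The file names are assumptions for illustration.

```shell
workdir=$(mktemp -d)

# Stand-in outputs: one BEDfile with a line, one empty BEDfile
printf 'NC_047561.1\t10147466\t10147468\t-30.4647676161919\n' > "${workdir}/demo-pH-exonUTR.bed"
: > "${workdir}/demo-ploidy-exonUTR.bed"

for f in "${workdir}"/*.bed
do
    # [ -s file ] is true only when the file exists and is larger than 0 bytes
    if [ ! -s "${f}" ]
    then
        echo "Removing empty file: ${f}"
        rm "${f}"
    fi
done

ls "${workdir}"
```

Printing the name of each removed file keeps a record of which feature had no overlaps.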

### 4c. CDS

In [7]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-25-Cov5-CDS.bed
!head DML-pH-25-Cov5-CDS.bed
!wc -l DML-pH-25-Cov5-CDS.bed

NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	39008886	39008888	-35.8974358974359
NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	15896903	15896905	-28.3455405508507
NC_047567.1	22295946	22295948	-26.9118276501641
NC_047568.1	46593770	46593772	-26.1194029850746
       7 DML-pH-25-Cov5-CDS.bed


In [13]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-ploidy-25-Cov5-CDS.bed
!head DML-ploidy-25-Cov5-CDS.bed
!wc -l DML-ploidy-25-Cov5-CDS.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047561.1	40362698	40362700	29.4117647058824
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047565.1	11970715	11970717	46.6938636749958
NC_047566.1	46447078	46447080	37.3155447746109
       7 DML-ploidy-25-Cov5-CDS.bed


In [16]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-Cov5-Overlaps-CDS.bed
!head DML-Cov5-Overlaps-CDS.bed
!wc -l DML-Cov5-Overlaps-CDS.bed

NC_047561.1	40362698	40362700	-31.0344827586207
       1 DML-Cov5-Overlaps-CDS.bed


### 4d. Intron

In [8]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-25-Cov5-intron.bed
!head DML-pH-25-Cov5-intron.bed
!wc -l DML-pH-25-Cov5-intron.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
NC_047561.1	31290734	31290736	-30.2791262135922
NC_047561.1	46808693	46808695	-27.2727272727273
NC_047563.1	66794619	66794621	-29.651103651714
      24 DML-pH-25-Cov5-intron.bed


In [14]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-ploidy-25-Cov5-intron.bed
!head DML-ploidy-25-Cov5-intron.bed
!wc -l DML-ploidy-25-Cov5-intron.bed

NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	25380708	25380710	-40.1414677276746
NC_047565.1	10523508	10523510	38.0689469431726
NC_047565.1	13203393	13203395	41.1725955204216
NC_047565.1	14899959	14899961	32.5955265610438
NC_047566.1	27129225	27129227	37.7269975786925
NC_047566.1	35988011	35988013	-53.0531425651507
      18 DML-ploidy-25-Cov5-intron.bed


In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-Cov5-Overlaps-intron.bed
!head DML-Cov5-Overlaps-intron.bed
!wc -l DML-Cov5-Overlaps-intron.bed

NC_047567.1	9520723	9520725	-45.7492354740061
       1 DML-Cov5-Overlaps-intron.bed


### 4e. Upstream flanks

In [9]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-25-Cov5-upstream.bed
!head DML-pH-25-Cov5-upstream.bed
!wc -l DML-pH-25-Cov5-upstream.bed

       0 DML-pH-25-Cov5-upstream.bed


In [15]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-ploidy-25-Cov5-upstream.bed
!head DML-ploidy-25-Cov5-upstream.bed
!wc -l DML-ploidy-25-Cov5-upstream.bed

       0 DML-ploidy-25-Cov5-upstream.bed


In [14]:
#Remove empty files
!rm *upstream.bed

### 4f. Downstream flanks

In [10]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-25-Cov5-downstream.bed
!head DML-pH-25-Cov5-downstream.bed
!wc -l DML-pH-25-Cov5-downstream.bed

NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047567.1	16984837	16984839	42.8241335044929
NW_022994991.1	19672	19674	36.769801980198
       4 DML-pH-25-Cov5-downstream.bed


In [16]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-ploidy-25-Cov5-downstream.bed
!head DML-ploidy-25-Cov5-downstream.bed
!wc -l DML-ploidy-25-Cov5-downstream.bed

NC_047566.1	24265305	24265307	-26.1261261261261
       1 DML-ploidy-25-Cov5-downstream.bed


### 4g. Intergenic regions

In [11]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-25-Cov5-intergenic.bed
!head DML-pH-25-Cov5-intergenic.bed
!wc -l DML-pH-25-Cov5-intergenic.bed

NC_047563.1	61114616	61114618	-30.8823529411765
NC_047565.1	44521815	44521817	-30.3333333333333
       2 DML-pH-25-Cov5-intergenic.bed


In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-ploidy-25-Cov5-intergenic.bed
!head DML-ploidy-25-Cov5-intergenic.bed
!wc -l DML-ploidy-25-Cov5-intergenic.bed

NC_047559.1	53732861	53732863	25.8426966292135
NC_047566.1	24266096	24266098	-29.4736842105263
NC_047566.1	24266109	24266111	-27.7777777777778
       3 DML-ploidy-25-Cov5-intergenic.bed


### 4h. lncRNA

In [12]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-25-Cov5-lncRNA.bed
!head DML-pH-25-Cov5-lncRNA.bed
!wc -l DML-pH-25-Cov5-lncRNA.bed

NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	9548317	9548319	-34.3623481781376
       3 DML-pH-25-Cov5-lncRNA.bed


In [18]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-ploidy-25-Cov5-lncRNA.bed
!head DML-ploidy-25-Cov5-lncRNA.bed
!wc -l DML-ploidy-25-Cov5-lncRNA.bed

       0 DML-ploidy-25-Cov5-lncRNA.bed


In [18]:
#Remove empty file
!rm DML-ploidy-25-Cov5-lncRNA.bed

### 4i. Transposable elements

In [13]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-25-Cov5-TE.bed
!head DML-pH-25-Cov5-TE.bed
!wc -l DML-pH-25-Cov5-TE.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047563.1	61114616	61114618	-30.8823529411765
NC_047564.1	2678443	2678445	-45.6953642384106
NC_047565.1	10619872	10619874	-25.6880733944954
NC_047565.1	44521815	44521817	-30.3333333333333
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	23226898	23226900	25.3731343283582
      16 DML-pH-25-Cov5-TE.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-ploidy-25-Cov5-TE.bed
!head DML-ploidy-25-Cov5-TE.bed
!wc -l DML-ploidy-25-Cov5-TE.bed

NC_047559.1	44801744	44801746	34.0988480118915
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047563.1	39926052	39926054	42.6872058194266
NC_047566.1	50117081	50117083	32.0492517222266
NC_047566.1	51204319	51204321	35.812086064308
NC_047567.1	21017447	21017449	34.8875423641779
       8 DML-ploidy-25-Cov5-TE.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-Cov5-Overlaps-TE.bed
!head DML-Cov5-Overlaps-TE.bed
!wc -l DML-Cov5-Overlaps-TE.bed

       0 DML-Cov5-Overlaps-TE.bed


In [20]:
#Remove empty file
!rm DML-Cov5-Overlaps-TE.bed
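
The per-feature counts above are spread across many cells. A small loop can collect them into one table, as sketched below on stand-in files. The `DML-demo-` prefix, the feature list, and the rule that a missing file counts as zero overlaps are assumptions for illustration.

```shell
workdir=$(mktemp -d)

# Stand-in overlap files named like the outputs above (contents are placeholders)
printf 'a\nb\nc\n' > "${workdir}/DML-demo-Gene.bed"
printf 'a\n' > "${workdir}/DML-demo-CDS.bed"

# One row per feature: feature name, then the number of lines in its overlap file
summary=$(for feature in Gene CDS intron
do
    f="${workdir}/DML-demo-${feature}.bed"
    if [ -f "${f}" ]
    then
        n=$(wc -l < "${f}" | tr -d ' ')
    else
        # A removed or never-created file is treated as zero overlaps
        n=0
    fi
    printf '%s\t%s\n' "${feature}" "${n}"
done)

echo "${summary}"
```

Treating missing files as zero matches the practice above of deleting empty overlap files.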

## 5. SNP overlap

I will now look at overlaps between pH- or ploidy-DML and unique C/T SNPs.

### 5a. Create BEDfiles

In [37]:
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab

NC_001276.1	12440	.	C	T
NC_001276.1	7226	.	C	T
NC_047559.1	10001065	.	C	T
NC_047559.1	10001128	.	C	T
NC_047559.1	1000226	.	C	T
NC_047559.1	10004318	.	C	T
NC_047559.1	100045	.	C	T
NC_047559.1	10004558	.	C	T
NC_047559.1	10005322	.	C	T
NC_047559.1	10005684	.	C	T


In [38]:
!awk '{print $1"\t"$2"\t"$2}' /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab \
> /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed

NC_001276.1	12440	12440
NC_001276.1	7226	7226
NC_047559.1	10001065	10001065
NC_047559.1	10001128	10001128
NC_047559.1	1000226	1000226
NC_047559.1	10004318	10004318
NC_047559.1	100045	100045
NC_047559.1	10004558	10004558
NC_047559.1	10005322	10005322
NC_047559.1	10005684	10005684
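
In BED format the start coordinate is 0-based and the end is exclusive, so a line with start equal to end describes a zero-length interval. A common alternative is to write a single base at 1-based position `pos` as start = `pos - 1` and end = `pos`. The sketch below shows that conversion on a two-row stand-in file. It assumes the second column of the `.tab` file is a 1-based position, and the file names are for illustration only. Whether this would change the overlap counts depends on the coordinate convention of the DML BEDfiles, which would need to be checked.

```shell
workdir=$(mktemp -d)

# Stand-in for unique-CT-SNPs.tab: chr, position, id, ref, alt (rows copied from the head above)
printf 'NC_047559.1\t10001065\t.\tC\tT\n' > "${workdir}/demo-SNPs.tab"
printf 'NC_047559.1\t1000226\t.\tC\tT\n' >> "${workdir}/demo-SNPs.tab"

# ASSUMPTION: column 2 is a 1-based position, so the 1 bp BED interval is [pos - 1, pos)
# sort -k1,1 -k2,2n orders by chromosome, then numerically by start
awk -F"\t" '{printf "%s\t%d\t%d\n", $1, $2 - 1, $2}' "${workdir}/demo-SNPs.tab" \
| sort -k1,1 -k2,2n \
> "${workdir}/demo-SNPs.bed"

cat "${workdir}/demo-SNPs.bed"
```

Sorting is optional for `intersectBed`, but it puts positions in numeric rather than text order, which makes the file easier to inspect.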


### 5b. Overlaps with Unique C/T SNPs

In [39]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-pH-25-Cov5-unique-CT-SNPs.bed
!head DML-pH-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-pH-25-Cov5-unique-CT-SNPs.bed

NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	39008886	39008888	-35.8974358974359
NC_047567.1	15896903	15896905	-28.3455405508507
NC_047568.1	46593770	46593772	-26.1194029850746
       6 DML-pH-25-Cov5-unique-CT-SNPs.bed


In [40]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!head DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-ploidy-25-Cov5-unique-CT-SNPs.bed

NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047565.1	11970715	11970717	46.6938636749958
NC_047568.1	46583284	46583286	-33.1582332761578
       5 DML-ploidy-25-Cov5-unique-CT-SNPs.bed


In [22]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-Cov5-Overlaps-unique-CT-SNPs.bed
!head DML-Cov5-Overlaps-unique-CT-SNPs.bed
!wc -l DML-Cov5-Overlaps-unique-CT-SNPs.bed

       0 DML-Cov5-Overlaps-unique-CT-SNPs.bed


In [23]:
#Remove empty file
!rm DML-Cov5-Overlaps-unique-CT-SNPs.bed