This notebook contains the instructions for reproducing the results presented in "*Environmental and genealogical signals on DNA methylation in a widespread apomictic dandelion lineage*" by V.N. Ibañez, M. van Antro, C. Peña Ponton, S. Ivanovic, C.A.M. Wagemaker, F. Gawehns, K.J.F. Verhoeven.

## Load data and set R environment

In this section, we download the input files needed by the scripts, set up the working directory, and configure the R environment.

In [1]:
#@title Load files
%load_ext rpy2.ipython
!rm -r *
!mkdir results rawData annotation scripts plots tmp

!wget -c -O scripts/commonFunctions.R https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/Rscripts/commonFunctions.R
!wget -c -O rawData/AseI-NsiI_Design_withPlotInfos.txt https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_Design_withPlotInfos.txt
!wget -c -O rawData/Csp6I-NsiI_Design_withPlotInfos.txt https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_Design_withPlotInfos.txt

!wget -c -O rawData/AseI-NsiI_methylation.filtered https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_petite.methylation.filtered
!wget -c -O rawData/Csp6I-NsiI_methylation.filtered https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_petite.methylation.filtered



!wget -c -O annotation/Csp6I-NsiI_mergedAnnot.csv https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_mergedAnnot.csv
!wget -c -O annotation/AseI-NsiI_mergedAnnot.csv https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_mergedAnnot.csv


--2022-09-17 14:12:08--  https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/Rscripts/commonFunctions.R
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18802 (18K) [text/plain]
Saving to: ‘scripts/commonFunctions.R’


2022-09-17 14:12:08 (5.99 MB/s) - ‘scripts/commonFunctions.R’ saved [18802/18802]

--2022-09-17 14:12:08--  https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_Design_withPlotInfos.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2691 (2.6K) [text/plain]
Saving to: ‘rawData/AseI-NsiI

In [2]:
%%R
#@title Set R environment
rm(list=ls())
wd<-getwd()
baseDir <- gsub("/results", "", wd)
scriptDir <- file.path(baseDir, "scripts")


In [3]:
%%R
#@title Install R packages
install.packages(c("data.table","vegan"), quiet=TRUE)





In [4]:
%%R
#@title Load packages silently
## Load packages silently:
suppressPackageStartupMessages({
  library(data.table)
  library(vegan)
  source(file.path(scriptDir, "commonFunctions.R"), local=TRUE)
})
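
Before moving on, it can help to confirm that every download landed where the later cells expect it. The sketch below is not part of the original notebook: the helper name `f.check.inputs` is hypothetical, and it assumes the directory layout created in the first cell. It would be run in a `%%R` cell.

```r
# Hypothetical helper (not in commonFunctions.R): report which expected
# input files are missing or empty under a given base directory.
f.check.inputs <- function(baseDir, RE = c("AseI-NsiI", "Csp6I-NsiI")) {
  expected <- c(
    file.path(baseDir, "scripts", "commonFunctions.R"),
    file.path(baseDir, "rawData", paste0(RE, "_Design_withPlotInfos.txt")),
    file.path(baseDir, "rawData", paste0(RE, "_methylation.filtered")),
    file.path(baseDir, "annotation", paste0(RE, "_mergedAnnot.csv"))
  )
  sizes <- file.size(expected)   # NA when the file does not exist
  ok <- !is.na(sizes) & sizes > 0
  data.frame(file = expected, ok = ok, stringsAsFactors = FALSE)
}

status <- f.check.inputs(getwd())
# Zero rows printed here means all inputs are present and non-empty.
print(status[!status$ok, , drop = FALSE])
```

If any row is printed, re-running the corresponding `wget` line is usually enough.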


# Analyzing data step-by-step

In this section, we will go through the code chunk by chunk, using a single dataset: *AseI-NsiI*.


## Load and explore data

In [5]:
%%R
#@title
RE<-"AseI-NsiI"
designTable <- file.path(paste0(baseDir, "/rawData/",RE, "_Design_withPlotInfos.txt"))
infileName <- file.path(paste0(baseDir,"/rawData/",RE,"_methylation.filtered"))
annotationFile <- file.path(paste0(baseDir, "/annotation/",RE, "_mergedAnnot.csv"))
## Load data
colNamesForGrouping <- c("Treat")
sampleTab <- f.read.sampleTable(designTable, colNamesForGrouping) # see commonFunctions.R
Data <- f.load.methylation.bed(infileName) # see commonFunctions.R
ctxt <- c("CHH", "CG", "CHG")#, "all")


===  2022 Sep 17 02:15:04 PM === Removing 0 samples due to the sampleRemovalInfo column 
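
The loaded table is split per sample in the next section using column-name suffixes. As a toy illustration of that step, the sketch below uses invented counts and hypothetical sample names (`S1`, `S2`). It assumes `Data` is a data frame with one `<sample>_total` and one `<sample>_methylated` column per sample plus a `context` column, which is what the later `grep` calls imply; the real layout is defined by `f.load.methylation.bed` in `commonFunctions.R`.

```r
# Toy table mimicking the assumed layout; all values are invented.
toy <- data.frame(
  context       = c("CG", "CG", "CHH"),
  S1_total      = c(10, 20, 8),
  S1_methylated = c(9, 4, 0),
  S2_total      = c(12, 18, 0),
  S2_methylated = c(11, 9, 0)
)

# Keep one sequence context, then separate coverage from methylated counts.
cg <- subset(toy, context == "CG")
totCov  <- cg[, grep("_total$", colnames(cg), value = TRUE)]
methCov <- cg[, grep("_methylated$", colnames(cg), value = TRUE)]

# Strip the suffixes so both tables share plain sample names.
colnames(totCov)  <- sub("_total$", "", colnames(totCov))
colnames(methCov) <- sub("_methylated$", "", colnames(methCov))

# Per-site methylation proportion; sites without coverage become NA.
prop <- as.matrix(methCov) / as.matrix(totCov)
prop[as.matrix(totCov) == 0] <- NA
print(round(prop, 2))
```

After this step, both tables have one column per sample, so they can be matched against the row names of the sample table.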


## Make distance files for each context



In [7]:
%%R
#@title
#loop for all context
for (j in seq_along(ctxt)){
  myD <- subset(Data, context == ctxt[j])
  totalCols <- grep("_total$", colnames(myD), value = TRUE)
  methCols <- grep("_methylated$", colnames(myD), value = TRUE)
  totCov <- myD[,totalCols]
  methCov <- myD[,methCols]
  colnames(totCov) <- gsub("_total$", "", colnames(totCov))
  colnames(methCov) <- gsub("_methylated$", "", colnames(methCov))
  # match the samples
  commonSamples <- sort(intersect(colnames(totCov), rownames(sampleTab)))
  out<-f.meth.distances(methCov, totCov)
  #plot the non-Metric MDS
  m<-as.matrix(t(out))
  d<-as.dist(m)
  forPlot <- tryCatch(MASS::isoMDS(d)$points, error = function(e) {NA}, finally = cat("###\n"))
  pdf(paste0(baseDir,"/plots/", RE, "_",ctxt[j],"_isoMDS", ".pdf"), height = 4, width = 4)
  plot(forPlot, pch=sampleTab[rownames(forPlot), "pch"], col = sampleTab[rownames(forPlot), "color"], main = ctxt[j])
  invisible(dev.off())
    
  #permanova in distance matrix
  eti<-sampleTab[order(rownames(sampleTab)),1:3]
  eti[, 1:3]<-lapply(eti[,1:3], as.factor)
  fit<-adonis(d ~eti$Acc+eti$Treat, data=eti, permutations=10000)
  #save results
  write.csv(fit$aov.tab, file = paste0(baseDir,"/results/",RE,"_",ctxt[j], "_adonis.csv"), row.names = TRUE)
  write.csv(out, file = paste0(baseDir,"/results/",RE,"_",ctxt[j], "_Distances.csv"), row.names = TRUE)
}

initial  value 39.396397 
iter   5 value 26.148687
iter  10 value 25.097781
iter  15 value 23.951141
final  value 23.428136 
converged
###





initial  value 35.749204 
iter   5 value 25.127177
iter  10 value 24.543975
iter  10 value 24.534435
iter  10 value 24.534435
final  value 24.534435 
converged
###





initial  value 34.478253 
iter   5 value 25.318136
final  value 25.085618 
converged
###
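
To make the two analysis steps in the loop easier to follow, here is a minimal, self-contained sketch on invented data. It is not the notebook's method: the internals of `f.meth.distances` are not shown here, so a Euclidean distance on random proportions stands in for it, and the factor levels are made up. It also uses `adonis2` rather than `adonis`; `adonis2` returns the results table directly instead of in `$aov.tab`, and `by = "terms"` asks for sequential tests of each term.

```r
library(MASS)   # isoMDS
library(vegan)  # adonis2

set.seed(1)
# Invented data: 12 samples x 50 sites of methylation proportions,
# with two hypothetical grouping factors.
nSamples <- 12
prop <- matrix(runif(nSamples * 50), nrow = nSamples,
               dimnames = list(paste0("S", 1:nSamples), NULL))
meta <- data.frame(
  Acc   = factor(rep(c("A", "B", "C"), each = 4)),
  Treat = factor(rep(c("ctrl", "stress"), times = 6)),
  row.names = rownames(prop)
)

# Stand-in distance (Euclidean); the notebook uses f.meth.distances instead.
d <- dist(prop)

# Non-metric MDS in two dimensions; $points holds the sample coordinates.
nmds <- isoMDS(d, k = 2, trace = FALSE)
plot(nmds$points, col = as.integer(meta$Acc), pch = as.integer(meta$Treat),
     xlab = "NMDS1", ylab = "NMDS2")

# PERMANOVA: share of the distance variation attributed to each factor.
fit <- adonis2(d ~ Acc + Treat, data = meta, permutations = 999, by = "terms")
print(fit)
```

With random input, neither factor is expected to explain much of the variation; the point is only the shape of the inputs and outputs.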





# A non-metric MDS was used to visualise epigenetic distances 

In this section, the code repeats the previous steps for both datasets: *AseI-NsiI* and *Csp6I-NsiI*.

In [8]:
%%R
#@title Characterize both datasets: AseI-NsiI and Csp6I-NsiI
## process both datasets
RE<-c("AseI-NsiI", "Csp6I-NsiI")
for (r in seq_along(RE)){
  designTable <- file.path(paste0(baseDir, "/rawData/",RE[r], "_Design_withPlotInfos.txt"))
  infileName <- file.path(paste0(baseDir,"/rawData/",RE[r],"_methylation.filtered"))
  annotationFile <- file.path(paste0(baseDir, "/annotation/",RE[r], "_mergedAnnot.csv"))
  
  colNamesForGrouping <- c("Treat")
  sampleTab <- f.read.sampleTable(designTable, colNamesForGrouping) # see commonFunctions.R
  Data <- f.load.methylation.bed(infileName) # see commonFunctions.R
  ctxt <- c("CHH", "CG", "CHG")#, "all")
  #loop for all context
  for (j in seq_along(ctxt)){
    myD <- subset(Data, context == ctxt[j])
    totalCols <- grep("_total$", colnames(myD), value = TRUE)
    methCols <- grep("_methylated$", colnames(myD), value = TRUE)
    totCov <- myD[,totalCols]
    methCov <- myD[,methCols]
    colnames(totCov) <- gsub("_total$", "", colnames(totCov))
    colnames(methCov) <- gsub("_methylated$", "", colnames(methCov))
    # match the samples
    commonSamples <- sort(intersect(colnames(totCov), rownames(sampleTab)))
    out<-f.meth.distances(methCov, totCov)
    #plot the non-Metric MDS
    m<-as.matrix(t(out))
    d<-as.dist(m)
    forPlot <- tryCatch(MASS::isoMDS(d)$points, error = function(e) {NA}, finally = cat("###\n"))
    pdf(paste0(baseDir,"/plots/", RE[r], "_",ctxt[j],"_isoMDS", ".pdf"), height = 4, width = 4)
    plot(forPlot, pch=sampleTab[rownames(forPlot), "pch"], col = sampleTab[rownames(forPlot), "color"], main = ctxt[j])
    invisible(dev.off())
    #permanova in distance matrix
    eti<-sampleTab[order(rownames(sampleTab)),1:3]
    eti[, 1:3]<-lapply(eti[,1:3], as.factor)
    fit<-adonis(d ~eti$Acc+eti$Treat, data=eti, permutations=10000)
    #save results
    write.csv(fit$aov.tab, file = paste0(baseDir,"/tmp/",RE[r],"_",ctxt[j], "_adonis.csv"), row.names = TRUE)
    write.csv(out, file = paste0(baseDir,"/tmp/",RE[r],"_",ctxt[j], "_Distances.csv"), row.names = TRUE)
  }
}  

===  2022 Sep 17 02:34:11 PM === Removing 0 samples due to the sampleRemovalInfo column 
initial  value 39.396397 
iter   5 value 26.148687
iter  10 value 25.097781
iter  15 value 23.951141
final  value 23.428136 
converged
###





initial  value 35.749204 
iter   5 value 25.127177
iter  10 value 24.543975
iter  10 value 24.534435
iter  10 value 24.534435
final  value 24.534435 
converged
###





initial  value 34.478253 
iter   5 value 25.318136
final  value 25.085618 
converged
###





===  2022 Sep 17 02:34:15 PM === Removing 0 samples due to the sampleRemovalInfo column 
initial  value 37.899099 
iter   5 value 25.409275
iter  10 value 24.399581
final  value 23.749193 
converged
###





initial  value 36.161017 
iter   5 value 25.662068
iter  10 value 25.074448
iter  10 value 25.062868
final  value 24.824763 
converged
###





initial  value 51.958994 
iter   5 value 30.861031
final  value 29.338499 
converged
###
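
The loop above writes one `<RE>_<context>_adonis.csv` file per dataset and context to `tmp/`. A possible way to gather them into a single table is sketched below. The helper name `f.collect.adonis` and the column names `RE`, `context` and `term` are hypothetical, and the sketch assumes the file naming used in the `write.csv` calls above.

```r
# Hypothetical helper: stack every "<RE>_<context>_adonis.csv" found in a
# directory into one data frame, tagging rows with dataset and context.
f.collect.adonis <- function(resultDir) {
  files <- list.files(resultDir, pattern = "_adonis\\.csv$", full.names = TRUE)
  tabs <- lapply(files, function(f) {
    tab <- read.csv(f, check.names = FALSE)
    colnames(tab)[1] <- "term"   # first column holds the row names from write.csv
    # File names look like "<RE>_<context>_adonis.csv"; RE itself has no "_".
    parts <- strsplit(sub("_adonis\\.csv$", "", basename(f)), "_")[[1]]
    data.frame(RE = parts[1], context = parts[2], tab,
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  do.call(rbind, tabs)           # NULL when no matching files are found
}

allFits <- f.collect.adonis(file.path(getwd(), "tmp"))
print(allFits)
```

Keeping `check.names = FALSE` preserves column headers such as `Pr(>F)` exactly as they were written.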



