This notebook contains the instructions for reproducing results presented in "*Environmental and genealogical signals on DNA methylation in a widespread apomictic dandelion lineage*" by V.N. Ibañez, M. van Antro, C. Peña Ponton, S. Ivanovic, C.A.M. Wagemaker, F. Gawehns, K.J.F. Verhoeven.

## Load data and set R environment

In this section, we download the data, set up the working directories and configure the R environment.

In [3]:
#@title Load files
%load_ext rpy2.ipython
# Start from a clean slate by removing only the folders this notebook creates
!rm -rf results rawData annotation scripts plots tmp
!mkdir -p results rawData annotation scripts plots tmp

# Helper functions used throughout the analysis
!wget -c -O scripts/commonFunctions.R https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/Rscripts/commonFunctions.R
# Sample design tables (treatment group and plot color per sample), one per enzyme combination
!wget -c -O rawData/AseI-NsiI_Design_withPlotInfos.txt https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_Design_withPlotInfos.txt
!wget -c -O rawData/Csp6I-NsiI_Design_withPlotInfos.txt https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_Design_withPlotInfos.txt

# Filtered methylation tables; the reduced "petite" files are saved without "_petite" in the name
!wget -c -O rawData/AseI-NsiI_methylation.filtered https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_petite.methylation.filtered
!wget -c -O rawData/Csp6I-NsiI_methylation.filtered https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_petite.methylation.filtered

# Merged annotation files used to assign positions to genomic features
!wget -c -O annotation/Csp6I-NsiI_mergedAnnot.csv https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/Csp6I-NsiI_mergedAnnot.csv
!wget -c -O annotation/AseI-NsiI_mergedAnnot.csv https://raw.githubusercontent.com/VeronicaNoe/epiTree/main/data4r/AseI-NsiI_mergedAnnot.csv


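The download commands above follow one naming pattern per enzyme combination. As a plain-Python sketch (the notebook kernel is Python; `rpy2` only adds the `%%R` cells), the snippet below rebuilds the same URL and local-path pairs, which makes it easier to add another enzyme combination or to check that every file arrived. The helper `download_plan` is hypothetical; it only builds the list and does not download anything.

```python
# Hypothetical helper: rebuilds the (URL, local path) pairs of the wget commands above.
BASE_URL = "https://raw.githubusercontent.com/VeronicaNoe/epiTree/main"


def download_plan(enzymes):
    """Return (url, local_path) tuples for the files this notebook downloads."""
    plan = [(f"{BASE_URL}/Rscripts/commonFunctions.R", "scripts/commonFunctions.R")]
    for re_name in enzymes:
        data = f"{BASE_URL}/data4r/{re_name}"
        plan += [
            (f"{data}_Design_withPlotInfos.txt", f"rawData/{re_name}_Design_withPlotInfos.txt"),
            # The reduced "petite" table is stored locally without "_petite" in its name
            (f"{data}_petite.methylation.filtered", f"rawData/{re_name}_methylation.filtered"),
            (f"{data}_mergedAnnot.csv", f"annotation/{re_name}_mergedAnnot.csv"),
        ]
    return plan


# Print the equivalent wget commands (nothing is downloaded here)
for url, path in download_plan(["AseI-NsiI", "Csp6I-NsiI"]):
    print(f"wget -c -O {path} {url}")
```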

In [4]:
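%%R
#@title Set R environment
rm(list=ls())  # start from an empty R workspace
wd <- getwd()
# Drop "/results" from the path (e.g. when started inside results/) to get the project root
baseDir <- gsub("/results", "", wd)
scriptDir <- file.path(baseDir, "scripts")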


In [5]:
%%R
#@title Install R packages
install.packages(c("data.table","vioplot","vegan","reshape2"), quiet=TRUE)

In [6]:
%%R
#@title Load packages silently
## Load packages silently:
suppressPackageStartupMessages({
  library(data.table)
  library(vioplot) # plotting
  library(vegan)
  library(reshape2)
  source(file.path(scriptDir, "commonFunctions.R"), local=TRUE)
})


# Analyzing data step-by-step

In this section, we walk through the analysis step by step for a single dataset, *AseI-NsiI*.


## Load and explore data

In [31]:
%%R
#@title
## Dataset (enzyme combination) to analyze
RE <- "AseI-NsiI"
designTable <- file.path(baseDir, "rawData", paste0(RE, "_Design_withPlotInfos.txt"))
infileName <- file.path(baseDir, "rawData", paste0(RE, "_methylation.filtered"))
annotationFile <- file.path(baseDir, "annotation", paste0(RE, "_mergedAnnot.csv"))
## Load data (helper functions are defined in commonFunctions.R)
sampleTab <- f.read.sampleTable(designTable)
mePerc <- f.load.methylation.bed(infileName, percentages = TRUE)
## The first three columns describe the position; all other columns are samples
infoColumns <- c("chr", "pos", "context")
allSamples <- setdiff(colnames(mePerc), infoColumns)
sampleTab <- sampleTab[allSamples,]  # keep design rows in the same order as the data columns
print(mePerc[1:10, 1:10])

===  2022 Sep 17 01:44:57 PM === Removing 0 samples due to the sampleRemovalInfo column 
   chr pos context sample_1_AseI sample_10_AseI sample_11_AseI sample_12_AseI
1   38  23     CHG        95.238         84.000         93.939         87.097
2   38  28     CHH         4.762          0.000          0.000          0.000
3   38  36     CHH         0.000          4.000          0.000          0.000
4   38  61     CHG        45.000         60.000         57.576         48.387
5   38  81      CG       100.000         96.000        100.000         87.097
6   38 108     CHH         9.524         12.000          9.375         19.355
7   38 112     CHG       100.000        100.000         84.848         93.548
8   38 135     CHG        95.000         83.333         90.323         93.333
9   38 142      CG       100.000         95.833        100.000        100.000
10  38 149     CHH            NA          0.000          0.000          0.000
   sample_13_AseI sample_15_AseI sample_16_AseI
1    
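
The printed values look like read-count ratios expressed as percentages (for example, 95.238 and 4.762 match 20/21 and 1/21). Assuming that `f.load.methylation.bed(..., percentages = TRUE)` turns methylated and total read counts into percentages (an assumption; `commonFunctions.R` holds the exact rule), the conversion can be sketched in plain Python. `methylation_percentage` is a hypothetical name, and returning `None` for positions without coverage is our assumption about where `NA` comes from.

```python
def methylation_percentage(methylated, total, digits=3):
    """Percentage of methylated reads at one position for one sample.

    Assumption: the input stores methylated and total read counts;
    positions without coverage (total == 0) become None, the analogue of NA in R.
    """
    if total == 0:
        return None
    return round(100 * methylated / total, digits)


# Hypothetical counts that give the first two values printed for sample_1_AseI
print(methylation_percentage(20, 21), methylation_percentage(1, 21))  # → 95.238 4.762
```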

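## Descriptive analysis with overall methylation levels

First, we average the methylation level of each position within each treatment group (*Control* and *Shade*, from the `Treat` column of the design table) and set the group order and colors used in all plots.
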
In [32]:
%%R
#@title
## Average methylation per position within each treatment group, ignoring missing values
sampleGroups <- data.frame(sample = rownames(sampleTab), group = sampleTab$Treat, stringsAsFactors = FALSE)
aveData <- f.summarize.columns(mePerc, sampleGroups, function(x) mean(x, na.rm = TRUE))
## Keep the position information and use "chr<chr>_<pos>" as row names
aveDataInfo <- mePerc[, infoColumns]
rownames(aveDataInfo) <- paste0("chr", aveDataInfo$chr, "_", aveDataInfo$pos)
rownames(aveData) <- rownames(aveDataInfo)
## Choose a group order for the plots and set the colors
forPlotOrder <- c("Control", "Shade")
aveData <- aveData[, match(forPlotOrder, colnames(aveData))]
temp <- unique(sampleTab[, c("Treat", "color")])
plotColors <- temp$color; names(plotColors) <- temp$Treat
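
Conceptually, the `f.summarize.columns` call with `mean(x, na.rm = TRUE)` collapses the sample columns into one column per treatment group. The plain-Python sketch below assumes exactly that behavior (one mean per position and group, missing values skipped, `NaN` when a group has no data, as in R); it is not the code in `commonFunctions.R`, and `group_means` and the sample names `s1` to `s4` are hypothetical.

```python
import math
from collections import defaultdict


def group_means(rows, sample_to_group):
    """Average each position's sample values within groups, skipping missing ones.

    rows: list of dicts {sample name: methylation % or None}, one per position.
    sample_to_group: dict {sample name: group label}, e.g. the Treat column.
    Returns one dict {group: mean} per position; NaN if a group has no value,
    as mean(x, na.rm = TRUE) returns NaN in R.
    """
    groups = sorted(set(sample_to_group.values()))
    result = []
    for row in rows:
        values = defaultdict(list)
        for sample, value in row.items():
            if value is not None and sample in sample_to_group:
                values[sample_to_group[sample]].append(value)
        result.append({g: sum(values[g]) / len(values[g]) if values[g] else math.nan
                       for g in groups})
    return result


rows = [{"s1": 90.0, "s2": 80.0, "s3": 10.0, "s4": None}]
groups = {"s1": "Control", "s2": "Control", "s3": "Shade", "s4": "Shade"}
print(group_means(rows, groups))  # → [{'Control': 85.0, 'Shade': 10.0}]
```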

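## Draw a histogram

For each cytosine context (CG, CHG, CHH), the histogram pools the methylation percentages of all samples and positions into 10% bins and saves the three panels to `plots/`.
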
In [36]:
%%R
#@title
## Draw histograms: one panel per context, all samples pooled
allContexts <- c("CG", "CHG", "CHH")
numPlotRows <- 1
numPlotCols <- length(allContexts)
pdf(file.path(baseDir, "plots", paste0(RE, "_histo.pdf")))
par(oma = c(2, 2, 2, 2))
layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
for (ctxt in allContexts) {
  subData <- subset(mePerc, mePerc$context == ctxt)
  ## Long format: one row per position and sample
  toPlot <- reshape2::melt(subData, id.vars = infoColumns)
  histo <- hist(toPlot$value, breaks = seq(0, 100, 10), plot = FALSE)  # NAs are dropped
  ymax <- max(histo$counts)
  plot(histo, main = ctxt, xlab = "", ylim = c(0, ymax*1.5))
}
invisible(dev.off())
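
To see how values fall into the 10% bins, here is a plain-Python sketch of the counting done by `hist(..., breaks = seq(0, 100, 10), plot = FALSE)`. With R's defaults (`right = TRUE`, `include.lowest = TRUE`) the bins are [0, 10], (10, 20], ..., (90, 100], and missing values are dropped; the sketch ignores the small tolerance R applies at bin edges. `histogram_counts` is a hypothetical name, and the example input is the CHH values visible in the printed preview (first four sample columns only).

```python
import math


def histogram_counts(values, width=10, lo=0, hi=100):
    """Count values in the bins [lo, lo+width], (lo+width, lo+2*width], ..., (hi-width, hi].

    Follows the defaults of R's hist() (right = TRUE, include.lowest = TRUE),
    without R's small tolerance at bin edges. None and non-finite values are dropped.
    """
    counts = [0] * int((hi - lo) / width)
    for v in values:
        if v is None or not math.isfinite(v):
            continue
        if v < lo or v > hi:
            raise ValueError(f"value {v} is outside [{lo}, {hi}]")
        # ceil(...) - 1 gives the right-closed bin; lo itself joins the first bin
        counts[max(math.ceil((v - lo) / width) - 1, 0)] += 1
    return counts


# CHH rows 2, 3, 6 and 10 of the preview, first four samples (None = NA)
chh = [4.762, 0.0, 0.0, 0.0,
       0.0, 4.0, 0.0, 0.0,
       9.524, 12.0, 9.375, 19.355,
       None, 0.0, 0.0, 0.0]
print(histogram_counts(chh))  # → [13, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```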

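## Draw a violin plot

For each context, the violin plots show the distribution of group-averaged methylation levels across positions, with a black bar at the group mean. A violin is only drawn when more than four positions have a methylation level above zero. The means are also written to a CSV file in `tmp/`.
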
In [37]:
%%R
#@title
## Draw violin plots (append "all" to allContexts to also plot all contexts pooled)
allContexts <- c("CG", "CHG", "CHH")
numPlotRows <- 1
numPlotCols <- length(allContexts)
allMeans <- matrix(NA, nrow = length(forPlotOrder), ncol = numPlotCols, dimnames = list(forPlotOrder, allContexts))
aveData <- aveData[rownames(aveDataInfo),]  # rows follow the order of aveDataInfo

pdf(file.path(baseDir, "plots", paste0(RE, "_Context_methylationLevelsViolinPlot.pdf")), height = 5*numPlotRows, width = 2+length(forPlotOrder)*numPlotCols)
par(oma = c(12, 8, 3, 0), mar = c(0, 0, 0, 0))
layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
for (ctxt in allContexts) {
  if (ctxt == "all") {
    subData <- aveData
  } else {
    subData <- aveData[aveDataInfo$context == ctxt,]
  }
  ## Empty panel; the violins are added to it below
  plot(NA, main = ctxt, bty = "n", xaxs = "r", yaxs = "r", xlab = "", ylab = "", las = 1, cex = 0.2, tck = 0.01, xlim = c(0.5, length(forPlotOrder)+0.5), ylim = c(0, 100), xaxt = "n", yaxt = "n")
  curPos <- 1
  for (curGroup in forPlotOrder) {
    toPlot <- subData[,curGroup]
    toPlot <- toPlot[!is.na(toPlot)]
    curCol <- plotColors[curGroup]
    ## Only draw a violin when more than 4 values are above zero
    if (sum(toPlot > 0) > 4) {
      vioplot(toPlot, names = c(curGroup), col = curCol, ylim = c(0,100), drawRect = TRUE, add = TRUE, at = curPos)
    }
    ## Mark the group mean with a black bar
    curMean <- mean(toPlot)
    lines(c(curPos-0.3,curPos+0.3), c(curMean, curMean), col = "black", lwd = 4, lty = 1)
    curPos <- curPos + 1
    allMeans[curGroup,ctxt] <- curMean # add the mean to the collection
  }
  if (ctxt != "all") { axis(2, at = seq(0, 100, by = 20), labels = seq(0, 100, by = 20), outer = TRUE, las = 1, line=2, lwd=2, cex.axis=2) }
  axis(1, at = 1:length(forPlotOrder), labels = forPlotOrder, outer = TRUE, las = 2, line=2, lwd=2, cex.axis=3)
}
invisible(dev.off())
write.csv(round(allMeans, 3), file.path(baseDir, "tmp", paste0(RE, "_Context_methylationLevelsViolinPlot_means.csv")))
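
The numbers behind each violin panel can be sketched in plain Python. The function mirrors the cell above: missing values are dropped, the mean is taken over what remains (the black bar, and the value written to the `_means.csv` file after rounding to three decimals), and a violin is drawn only when more than four values are above zero. `summarize_group` is a hypothetical name and the input values are made up.

```python
import math


def summarize_group(values, digits=3):
    """Return (rounded mean, draw_violin) for one group in one context.

    values: group-averaged methylation levels (None or NaN = missing).
    Mirrors the R cell: NAs removed, mean of the rest (NaN if nothing is left),
    violin only if sum(values > 0) > 4.
    """
    kept = [v for v in values if v is not None and not math.isnan(v)]
    mean = round(sum(kept) / len(kept), digits) if kept else math.nan
    draw_violin = sum(v > 0 for v in kept) > 4
    return mean, draw_violin


# Made-up values: mostly unmethylated positions, only three above zero
print(summarize_group([0.0, 0.0, 2.5, 5.0, None, 10.0]))  # → (3.5, False)
```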

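## Draw a heatmap

The heatmap shows the mean methylation level of each treatment group per context and genomic feature (`gene`, `transposon`, `repeat` and `nothing`). A position is assigned to a feature when its contig (`chr`) appears in the annotation loaded for that feature from the merged annotation file.
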
In [39]:
%%R
#@title
## Get the average methylation level per group, context and feature
allFeatures <- c("gene", "transposon", "repeat", "nothing")
## A position belongs to a feature when its contig ("chr<chr>") is listed in that feature's annotation
forMask <- paste0("chr", aveDataInfo$chr)
annoMasks <- list()
for (feature in allFeatures) {
  mergedAnno <- f.load.merged.annotation(annotationFile, feature)
  annoMasks[[feature]] <- forMask %in% rownames(mergedAnno)
}
listForPlot <- list()
for (ctxt in allContexts) {
  if (ctxt == "all") {
    contextMask <- rep(TRUE, nrow(aveDataInfo))
  } else {
    contextMask <- aveDataInfo$context == ctxt
  }
  featureMeans <- matrix(NA, nrow = length(allFeatures), ncol = length(forPlotOrder), dimnames = list(allFeatures, forPlotOrder))
  for (feature in allFeatures) {
    ## drop = FALSE keeps a matrix even when a single position matches
    subData <- aveData[annoMasks[[feature]] & contextMask, , drop = FALSE]
    featureMeans[feature, colnames(aveData)] <- colMeans(subData, na.rm = TRUE)
  }
  listForPlot[[ctxt]] <- featureMeans
}

## Draw the heatmap: one panel per context, groups (x) by features (y)
dirOut <- file.path(baseDir, "plots", paste0(RE, "_Context_methylationLevelsPerFeature.pdf"))
imageColors <- f.blackblueyellowredpinkNICE(51)
pdf(dirOut, height = 5*numPlotRows, width = 2+length(forPlotOrder)*numPlotCols)
layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
for (ctxt in allContexts) {
  temp <- listForPlot[[ctxt]]
  f.image.without.text(forPlotOrder, allFeatures, t(temp), xLabel = "", yLabel = "", mainLabel = ctxt, useLog = FALSE, col = imageColors, zlim = c(0, 100))
  #write.csv(round(temp, 3), file.path(baseDir, "tmp", paste0(RE, "_methylationLevelsPerFeature_", ctxt, ".csv")))
}
invisible(dev.off())
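
The per-feature averaging can be sketched in plain Python. As in the R code, a position counts for a feature when its contig label (`"chr"` followed by `chr`) appears in that feature's annotation, and each heatmap cell is the mean over matching positions with missing values skipped (`NaN` if none remain, like `colMeans(..., na.rm = TRUE)`). The annotation is represented by a hypothetical mapping from feature to a set of contig labels; in the notebook it comes from `f.load.merged.annotation`, whose format we only infer from the `rownames(mergedAnno)` lookup. `feature_means` and the data values are made up for illustration.

```python
import math


def feature_means(ave_rows, feature_contigs, context, groups):
    """Mean methylation per feature and group for one context ("all" = every context).

    ave_rows: dicts with "chr", "context" and one value per group (None = NA).
    feature_contigs: hypothetical {feature: set of contig labels like "chr38"}.
    Returns {feature: {group: mean}}, NaN where no value is available.
    """
    table = {}
    for feature, contigs in feature_contigs.items():
        selected = [r for r in ave_rows
                    if f"chr{r['chr']}" in contigs
                    and (context == "all" or r["context"] == context)]
        table[feature] = {}
        for g in groups:
            vals = [r[g] for r in selected if r[g] is not None and not math.isnan(r[g])]
            table[feature][g] = sum(vals) / len(vals) if vals else math.nan
    return table


rows = [
    {"chr": 38, "context": "CG", "Control": 100.0, "Shade": 90.0},
    {"chr": 38, "context": "CG", "Control": 80.0, "Shade": None},
    {"chr": 40, "context": "CG", "Control": 20.0, "Shade": 30.0},
    {"chr": 40, "context": "CHH", "Control": 5.0, "Shade": 0.0},
]
annotation = {"gene": {"chr38"}, "nothing": {"chr40"}}
print(feature_means(rows, annotation, "CG", ["Control", "Shade"]))
# → {'gene': {'Control': 90.0, 'Shade': 90.0}, 'nothing': {'Control': 20.0, 'Shade': 30.0}}
```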

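# Descriptive analysis of both datasets

In this section, we run all of the previous steps in a single loop for both datasets, *AseI-NsiI* and *Csp6I-NsiI*. Unlike the step-by-step version, the loop also saves the per-feature means of each context as CSV files in `tmp/`.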

In [7]:
%%R
#@title Characterize both datasets: AseI-NsiI and Csp6I-NsiI
## Process both datasets
RE <- c("AseI-NsiI", "Csp6I-NsiI")
for (r in seq_along(RE)) {
  designTable <- file.path(paste0(baseDir, "/rawData/",RE[r], "_Design_withPlotInfos.txt"))
  infileName <- file.path(paste0(baseDir,"/rawData/",RE[r],"_methylation.filtered"))
  annotationFile <- file.path(paste0(baseDir, "/annotation/",RE[r], "_mergedAnnot.csv"))
  ## Load data
  sampleTab <- f.read.sampleTable(designTable)
  mePerc <- f.load.methylation.bed(infileName, percentages = TRUE) # see commonFunctions.R
  infoColumns <- c("chr", "pos", "context")
  allSamples <- setdiff(colnames(mePerc), infoColumns)
  sampleTab <- sampleTab[allSamples,]

  ## Average within groups 
  #Select on what to average
  aveData <- f.summarize.columns(mePerc, data.frame(sample = rownames(sampleTab), group = sampleTab$Treat, stringsAsFactors = FALSE), function(x) mean(x, na.rm = TRUE))
  aveDataInfo <- mePerc[,infoColumns]
  rownames(aveDataInfo) <- paste0("chr", aveDataInfo$chr, "_", aveDataInfo$pos)
  rownames(aveData) <- rownames(aveDataInfo)
  ## Choose a group order for the plot and set the colors
  forPlotOrder <- c("Control", "Shade")
  aveData <- aveData[, match(forPlotOrder, colnames(aveData))]
  temp <- unique(sampleTab[,c("Treat","color")])
  plotColors <- temp$color; names(plotColors) <- temp$Treat
  ## Draw histograms
  allContexts <- c("CG", "CHG", "CHH")
  numPlotRows <- 1
  numPlotCols <- length(allContexts)
  pdf(file.path(paste0(baseDir,"/plots/", RE[r], "_histo.pdf")))
  par(oma = c(2, 2, 2, 2))
  layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
  for (ctxt in allContexts) {
    subData <- subset(mePerc, mePerc$context==ctxt)
    toPlot<-reshape2::melt(subData, id=infoColumns)
    histo<-hist(toPlot$value, breaks=seq(0,100,10),  plot=FALSE)
    ymax<-max(histo$counts)
    plot(histo, main=ctxt, xlab="", ylim=c(0, ymax*1.5))
  }
  invisible(dev.off())

  ## Draw violin plots
  allContexts <- c("CG", "CHG", "CHH")#, "all") 
  numPlotRows <- 1 
  numPlotCols <- length(allContexts)
  allMeans <- matrix(NA, nrow = length(forPlotOrder), ncol = numPlotCols, dimnames = list(forPlotOrder, allContexts))
  aveData <- aveData[rownames(aveDataInfo),] 
  
  pdf(paste0(baseDir,"/plots/", RE[r], "_Context_methylationLevelsViolinPlot.pdf"), height = 5*numPlotRows, width = 2+length(forPlotOrder)*numPlotCols)
  par(oma = c(12, 8, 3, 0), mar = c(0, 0, 0, 0))
  layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
  for (ctxt in allContexts) {
    if (ctxt == "all") {
      subData <- aveData
    } else {
      subData <- aveData[aveDataInfo$context == ctxt,]
    }
    plot(NA, main = ctxt, bty = "n", xaxs = "r", yaxs = "r", xlab = "", ylab = "", las = 1, cex = 0.2, tck = 0.01, xlim = c(0.5, length(forPlotOrder)+0.5), ylim = c(0, 100), xaxt = "n", yaxt = "n")
    curPos <- 1
    for (curGroup in forPlotOrder) {
      toPlot <- subData[,curGroup]
      toPlot <- toPlot[!is.na(toPlot)]
      curCol <- plotColors[curGroup]
      if (sum(toPlot > 0) > 4) {
        vioplot(toPlot, names = c(curGroup), col = curCol, ylim = c(0,100), drawRect = TRUE, add = TRUE, at = curPos)
      }
      curMean <- mean(toPlot)
      lines(c(curPos-0.3,curPos+0.3), c(curMean, curMean), col = "black", lwd = 4, lty = 1)
      curPos <- curPos + 1
      allMeans[curGroup,ctxt] <- curMean # add the mean to the collection
    }
    if (ctxt != "all") { axis(2, at = seq(0, 100, by = 20), labels = seq(0, 100, by = 20), outer = TRUE, las = 1, line=2, lwd=2, cex.axis=2) }
    axis(1, at = 1:length(forPlotOrder), labels = forPlotOrder, outer = TRUE, las = 2, line=2, lwd=2, cex.axis=3)
  }
  invisible(dev.off())
  write.csv(round(allMeans, 3), file.path(paste0(baseDir,"/tmp/",RE[r], "_Context_methylationLevelsViolinPlot_means.csv")))

  ## Get the average methylation level per group, context, feature
  allFeatures <- c("gene", "transposon", "repeat", "nothing")
  forMask <- paste0("chr", aveDataInfo$chr)
  listForPlot <- list()
  
  for (ctxt in allContexts) {
    if (ctxt == "all") {
      contextMask <- rep(TRUE, nrow(aveDataInfo))
    } else {
      contextMask <- aveDataInfo$context == ctxt
    }
    featureMeans <- matrix(NA, nrow = length(allFeatures), ncol = length(forPlotOrder), dimnames = list(allFeatures, forPlotOrder))
    for (feature in allFeatures) {
      mergedAnno <- f.load.merged.annotation(annotationFile, feature)
      annoMask <- forMask %in% rownames(mergedAnno)
      subData <- aveData[annoMask & contextMask, , drop = FALSE]  # stay a matrix even for one match
      featureMeans[feature, colnames(aveData)] <- colMeans(subData, na.rm = TRUE)
    }
    listForPlot[[ctxt]] <- featureMeans
  }
  
  ## Do the plot
  dirOut<-paste0(baseDir,"/plots/",RE[r], "_Context_methylationLevelsPerFeature.pdf")
  imageColors <- f.blackblueyellowredpinkNICE(51) 
  pdf(dirOut, height = 5*numPlotRows, width = 2+length(forPlotOrder)*numPlotCols)
  layout(matrix(1:(numPlotRows*numPlotCols), nrow = numPlotRows, byrow = TRUE))
  for (ctxt in allContexts) {
    temp <- listForPlot[[ctxt]]
    f.image.without.text(forPlotOrder, allFeatures, t(temp), xLabel = "", yLabel = "", mainLabel = ctxt, useLog = FALSE, col = imageColors, zlim = c(0, 100))
    write.csv(round(temp, 3), paste0(baseDir,"/tmp/",RE[r],"_methylationLevelsPerFeature_",ctxt, ".csv"))
  }
  invisible(dev.off())
}

===  2022 Sep 17 02:06:01 PM === Removing 0 samples due to the sampleRemovalInfo column 
===  2022 Sep 17 02:06:06 PM === Removing 0 samples due to the sampleRemovalInfo column 
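
After the loop finishes, each dataset should have produced three PDFs in `plots/` and four CSV files in `tmp/`. The small plain-Python check below lists those paths, following the file names used in the loop, and reports any that are missing. `expected_outputs` is a hypothetical helper, and the relative paths assume the Python cell runs from the project root (`baseDir`).

```python
import os


def expected_outputs(enzymes, contexts=("CG", "CHG", "CHH")):
    """Paths written by the R loop for each enzyme combination."""
    paths = []
    for re_name in enzymes:
        paths += [
            f"plots/{re_name}_histo.pdf",
            f"plots/{re_name}_Context_methylationLevelsViolinPlot.pdf",
            f"tmp/{re_name}_Context_methylationLevelsViolinPlot_means.csv",
            f"plots/{re_name}_Context_methylationLevelsPerFeature.pdf",
        ]
        # One per-feature means table per context
        paths += [f"tmp/{re_name}_methylationLevelsPerFeature_{c}.csv" for c in contexts]
    return paths


missing = [p for p in expected_outputs(["AseI-NsiI", "Csp6I-NsiI"]) if not os.path.exists(p)]
print("missing files:", missing)
```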
