<div style="background-color:rgb(255, 250, 240); padding:10px 0;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

# Binder Tutorial QC Analysis

### <font color='red'>To begin: Click the top cell and press 'Run' on the toolbar (or Shift+Enter). Alternatively, click Kernel > Restart & Run All.</font>


## Table of Contents:
1. [Import data](#1.) <br>
2. [Visualisation](#2.)<br>
   2.1. [Histogram of RSD](#2.1.)<br>
   2.2. [Density plot of RSD vs. D-Ratio](#2.2.)<br>
   2.3. [PCA score plot of QC vs. Sample](#2.3.)<br>
   2.4. [Scatter plot of Molecular Weights vs. Retention Time (sized by RSD)](#2.4.)<br>

<a id="1."></a>
## 1.  Import Data

1. Import the readxl package (https://readxl.tidyverse.org/).<br>
2. Import the sheet "Data" from the Excel file "data.xlsx" into a data frame called "data".<br>
3. Display the first 10 rows at the top (head) of the data frame.<br>
4. Display the number of rows and columns.<br>
5. Repeat steps 2 to 4 for the sheet "Peak", importing it into a data frame called "peak".<br>

</div>

In [None]:
library(readxl) # load readxl for reading Excel files

data <- read_excel('data.xlsx', sheet='Data') # import the "Data" sheet
head(data, 10) # view the first 10 rows of the data table
cat("Data Table:", nrow(data), "rows &", ncol(data), "columns\n") # print table dimensions

peak <- read_excel('data.xlsx', sheet='Peak') # import the "Peak" sheet
head(peak, 10) # view the first 10 rows of the peak table
cat("Peak Table:", nrow(peak), "rows &", ncol(peak), "columns\n") # print table dimensions
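
Once both tables are imported, a quick consistency check can confirm that every peak listed in the peak table has a matching column in the data table, which Section 2.3 relies on. The sketch below uses small made-up tables (`data_toy`, `peak_toy` and their values are hypothetical stand-ins, not the contents of "data.xlsx"); it assumes, as the PCA cell does, that the `Name` column of the peak table holds the column names of the data table.

```r
# Toy stand-ins for the "Data" and "Peak" sheets (hypothetical values)
data_toy <- data.frame(
  SampleType = c("QC", "Sample", "Sample"),
  M1 = c(1.0, 2.0, 3.0),
  M2 = c(4.0, 5.0, 6.0),
  stringsAsFactors = FALSE
)
peak_toy <- data.frame(
  Name = c("M1", "M2", "M3"),
  RSD = c(5, 12, 30),
  stringsAsFactors = FALSE
)

# Peak names that have no matching column in the data table
missing_cols <- setdiff(peak_toy$Name, colnames(data_toy))
cat("Peaks without a data column:", missing_cols, "\n")
```

With the real tables, replacing `data_toy` and `peak_toy` by `data` and `peak` would run the same check; an empty result means the two sheets line up.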

<div style="background-color:rgb(255, 250, 240); padding:10px 0;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<a id="2."></a>
## 2. Visualisation

#### <font color='red'>Note: The cells in the Visualisation section can be run in any order, provided the data has been imported in Section 1.</font>
<br>

<a id="2.1."></a>
### 2.1. Histogram of RSD
<br>
</div>

In [None]:
library(ggplot2)
options(repr.plot.width=4, repr.plot.height=3)

# Histogram of RSD with 0.5-wide bins
ggplot(peak, aes(x=RSD)) +
    geom_histogram(binwidth=0.5, fill="lightgreen", colour="black") +
    xlab("RSD")
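
For context on what the histogram shows: RSD (relative standard deviation) is the standard deviation expressed as a percentage of the mean. The sketch below computes it per metabolite for a made-up table of repeated QC measurements (`qc_toy` and its values are hypothetical). It is only an illustration of the general formula; the RSD column in the Peak sheet is taken as given here and may have been calculated with different options.

```r
# Hypothetical QC intensities: rows are QC injections, columns are metabolites
qc_toy <- data.frame(
  M1 = c(98, 102, 100, 101, 99),
  M2 = c(50, 70, 60, 40, 80)
)

# RSD (%) = 100 * standard deviation / mean, computed column by column
rsd <- sapply(qc_toy, function(x) 100 * sd(x) / mean(x))
print(round(rsd, 2))
```

A metabolite measured consistently across QC injections (like `M1`) gets a low RSD, while a noisy one (like `M2`) gets a high RSD.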

<div style="background-color:rgb(255, 250, 240); padding:10px 0;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<a id="2.2."></a>
### 2.2. Density plot of RSD vs. D-Ratio

<br>

</div>

In [None]:
library(ggplot2)
options(repr.plot.width=5, repr.plot.height=4)

# Area + contour
ggplot(peak, aes(x=RSD, y=D_Ratio)) +
    stat_density_2d(aes(fill=after_stat(level)), geom="polygon", colour="white") +
    xlim(0,20) +
    ylim(-0.05, 0.5)

# Pearson's correlation
corr_stats <- cor.test(peak$RSD, peak$D_Ratio, method="pearson")
estimate <- format(corr_stats$estimate, digits=2) # format to 2 significant digits
p_val <- format(corr_stats$p.value, digits=3) # format to 3 significant digits
cat("pearsonr=", estimate, "; p=", p_val, "\n") # print
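
The correlation estimate printed above is the Pearson coefficient, that is, the covariance of the two variables divided by the product of their standard deviations. As a minimal sketch on made-up numbers (`x` and `y` are hypothetical, not taken from the peak table), the value returned by `cor.test` can be reproduced by hand:

```r
# Hypothetical paired measurements
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Pearson's r by its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The same coefficient as reported by cor.test()
r_test <- unname(cor.test(x, y, method = "pearson")$estimate)

cat("manual r =", r_manual, "; cor.test r =", r_test, "\n")
```

Note that `xlim()` and `ylim()` in the plot only restrict what is drawn; `cor.test` in the cell above still uses every row of the peak table.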

<div style="background-color:rgb(255, 250, 240); padding:10px 0;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<a id="2.3."></a>
### 2.3. PCA score plot of QC vs. Sample

<br>
</div>

In [None]:
library(dplyr, warn.conflicts=FALSE) # Suppress 'the following object is masked' warning
library(ggplot2)

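# Extract X matrix (the columns of data named in the peak table)
peak_names <- peak$Name
X <- select(data, all_of(peak_names))

# Fit PCA (scale. = TRUE scales each variable to unit variance)
pca <- prcomp(X, scale. = TRUE)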

# Get scores
scores = pca$x
scores_df <- as.data.frame(scores)
scores_df$SampleType = data$SampleType # add SampleType to scores table

# Plot scores
options(repr.plot.width=5, repr.plot.height=4)
ggplot(scores_df, aes(x=PC1, y=PC2, color=SampleType)) + 
    geom_point(alpha=0.7, size=2) + 
    xlab('PC1') +
    ylab('PC2') + 
    ggtitle('Quality Control PCA plot') 
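
A score plot is easier to interpret when the axes report how much variance each component explains. The sketch below shows one way to derive that from a `prcomp` fit, using a random toy matrix (`X_toy` is hypothetical, not the tutorial data); the resulting labels could be passed to `xlab()` and `ylab()` in place of the plain 'PC1' and 'PC2'.

```r
# Hypothetical data: 10 samples by 4 variables
set.seed(1)
X_toy <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Fit PCA on unit-variance scaled variables
pca_toy <- prcomp(X_toy, scale. = TRUE)

# Proportion of variance explained by each component
explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)

# Axis labels such as "PC1 (xx.x%)"
axis_labels <- sprintf("PC%d (%.1f%%)", seq_along(explained), 100 * explained)
print(axis_labels)
```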

<div style="background-color:rgb(255, 250, 240); padding:10px 0;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<a id="2.4."></a>
### 2.4. Scatter plot of Molecular Weights vs. Retention Time (sized by RSD)

<br>

</div>

In [None]:
library(ggplot2)
options(repr.plot.width=7, repr.plot.height=5)

ggplot(peak, aes(x=Mol_Weight, y=RT_minutes)) + 
    geom_point(color='red', alpha=0.2, size=peak$RSD^2/120) + 
    xlab('Molecular Weight') +
    ylab('RT minutes') + 
    ggtitle('Metabolites Detected (sized by RSD)')
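
The point size in the plot above is the square of RSD divided by 120, so size grows quadratically rather than linearly with RSD. A small worked example on made-up RSD values (`rsd_toy` is hypothetical) shows the effect:

```r
# Hypothetical RSD values
rsd_toy <- c(5, 10, 20, 30)

# Same transformation as in the scatter plot: RSD squared, divided by 120
point_size <- rsd_toy^2 / 120

print(data.frame(RSD = rsd_toy, size = round(point_size, 2)))
```

Doubling RSD quadruples the size value, which makes high-RSD metabolites stand out strongly while low-RSD ones stay small.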