In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.3
✔ tibble  3.0.0     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Read in GO Terms

This is a filtered data frame derived from the full InterProScan output. It was generated by keeping only the lines of the full output that contain the string "GO:". Column names come from the [InterProScan documentation](https://interproscan-docs.readthedocs.io/en/latest/OutputFormats.html#tab-separated-values-format-tsv). The cleanup of gene_name just removes the ".t01" suffix, which is present in the annotation file but missing from the VST-transformed counts loaded below.

In [2]:
old_path <- "~/Documents/MtVernon/2017/GigaScience/analysis_post_review/"

In [3]:
go_terms <- read.csv(paste(old_path, 'vitvi.vcostv3.clean.pep.tsv', sep='/'), sep='\t', header=F)
colnames(go_terms) <- c('gene_name', 'seq_MD5', 'seq_len', 'analysis', 'signature_accession',
                        'signature_description', 'start_chord', 'end_chord', 'score', 'status', 
                        'date', 'InterPro_annotation_accesion', 'InterPro_annotation_description', 'GO', 'pathway_anotation')

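# fixed() matches ".t01" literally; a bare '.t01' is a regex in which '.' matches any character
go_terms$gene_name <- go_terms$gene_name %>% str_remove(pattern=fixed('.t01'))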
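A note on the suffix removal: stringr interprets patterns as regular expressions by default, so a literal match is the safer way to strip ".t01". The sketch below shows the difference on made-up IDs; the second ID is contrived purely to trigger the wildcard behaviour and is not from the annotation.

```r
library(stringr)

# Hypothetical IDs for illustration only
ids <- c("Vitvi01g00001.t01", "Vitvi01gXt01abc")

# As a regex, '.' matches any character, so "Xt01" is removed from the second ID
regex_result <- str_remove(ids, ".t01")

# fixed() matches the four characters ".t01" literally, so the second ID is untouched
literal_result <- str_remove(ids, fixed(".t01"))

regex_result
literal_result
```

For IDs that always end in ".t01" the two forms give the same result; the literal form just avoids surprises.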

It looks like there are some low-scoring hits in the data frame that will need to be filtered out. No filtering happens in this cell; it only reports the total number of genes and how many of them have at least one hit with e-value > 1e-05.

In [4]:
length(unique(go_terms$gene_name))
go_terms[go_terms$score > 1e-5,] %>% 
  .[['gene_name']] %>% 
  unique() %>% 
  length()
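One thing to check before trusting these counts: the InterProScan score column may hold a non-numeric placeholder such as "-" for analyses that report no e-value, in which case `read.csv` will not read it as numeric and the comparison above will not behave as a numeric threshold. A minimal sketch of the same count on toy data, assuming that placeholder is present (the data frame and the `score_num` column are hypothetical):

```r
library(dplyr)

# Toy stand-in for go_terms; "-" marks a hit with no reported e-value
toy <- data.frame(
  gene_name = c("g1", "g1", "g2", "g3", "g4"),
  score     = c("1e-20", "0.5", "-", "3e-07", "2.0"),
  stringsAsFactors = FALSE
)

# Coerce to numeric; "-" becomes NA (the coercion warning is expected here)
toy$score_num <- suppressWarnings(as.numeric(toy$score))

# Total genes, and genes with at least one numeric score above the cutoff
n_total <- n_distinct(toy$gene_name)
n_weak  <- toy %>%
  filter(!is.na(score_num), score_num > 1e-5) %>%
  pull(gene_name) %>%
  n_distinct()

n_total
n_weak
```

Running `str(go_terms$score)` on the real data would show whether this coercion is needed at all.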

## Read in v1 to v3 naming map
Will be used for the circadian genes. This file is a TSV with one column for the current (v3) annotation name and one for the older names, which are still recorded in the last column of the annotation.

In [5]:
name_map <- read.csv(paste(old_path, 'v3_mapNamesToAlias.tsv', sep='/'), sep='\t', header=F)
colnames(name_map) <- c('v3', 'olderNames')
head(name_map)

v3,olderNames
Vitvi01g00001,"VIT_01s0011g00010,LOC104879287"
Vitvi01g00002,"VIT_01s0011g00030,LOC100259472"
Vitvi01g00003,
Vitvi01g00004,
Vitvi01g01833,VIT_01s0011g00040
Vitvi01g00005,"VIT_01s0011g00050,LOC100257674"
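Since `olderNames` packs several aliases into one comma-separated string, looking up a v3 name from an older ID is easier in long format. A minimal sketch on a few rows copied from the output above, using `tidyr::separate_rows()`; the object names `name_map_toy` and `name_map_long` are placeholders:

```r
library(dplyr)
library(tidyr)

# A few rows in the same shape as name_map
name_map_toy <- data.frame(
  v3         = c("Vitvi01g00001", "Vitvi01g00003", "Vitvi01g01833"),
  olderNames = c("VIT_01s0011g00010,LOC104879287", "", "VIT_01s0011g00040"),
  stringsAsFactors = FALSE
)

# Drop genes without aliases, then split to one alias per row
name_map_long <- name_map_toy %>%
  filter(olderNames != "") %>%
  separate_rows(olderNames, sep = ",")

# Look up the v3 name for an older identifier
v3_hit <- name_map_long$v3[name_map_long$olderNames == "VIT_01s0011g00040"]
v3_hit
```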


## Read in metadata and filtered counts

In [56]:
treatments <- read.csv('1719_treatments.csv')
head(treatments)

X,sampleName,Year,Tissue,Phenology,indexer,Rootstock,Irrigation,Row,Block
A1Y1_001_L,A1Y1_001_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_A_2,1103P,,8,A
A1Y1_002_L,A1Y1_002_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_A_3,1103P,,8,A
A1Y1_003_L,A1Y1_003_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_B_2,3309C,,8,A
A1Y1_004_L,A1Y1_004_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_B_2,3309C,,8,A
A1Y1_005_L,A1Y1_005_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_C_2,SO4,,8,A
A1Y1_006_L,A1Y1_006_L,2017,Leaf,Anthesis,2017_Anthesis_Leaf_8_C_3,SO4,,8,A


In [11]:
load('1719_VSD_counts_varFilt.Rdata')
vsd_counts <- as.data.frame(vsd_counts_varFilt)
dim(vsd_counts)

#### Constitutively Expressed GO Terms
Some common genes are often used as markers of constitutive expression. Here we use two classes of genes, actin-family and ubiquitin-domain, to test general stability. [This paper](https://plantmethods.biomedcentral.com/articles/10.1186/s13007-018-0311-x) suggests these genes can be dynamic with respect to experimental manipulation; however, they should be generally stable within a timepoint.

Here I look for those in our data set and plot them. 
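The reshaping behind these plots can be sketched on toy data: a samples-by-genes table is turned into one row per sample and gene, then annotated with the gene class. This is only an illustration with made-up values and names (`toy_counts`, `toy_classes`), and it uses `pivot_longer()` and `left_join()` rather than the `gather()` and `merge()` calls in the cells that follow.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy samples-by-genes table (values are invented)
toy_counts <- data.frame(
  geneA = c(10.1, 10.3, 9.8),
  geneB = c(7.2, 7.0, 7.5),
  row.names = c("s1", "s2", "s3")
)

# Toy gene-to-class lookup
toy_classes <- data.frame(
  gene_name  = c("geneA", "geneB"),
  Gene_Class = c("Actin", "Ubiquitin"),
  stringsAsFactors = FALSE
)

# One row per sample and gene, with the class attached
toy_long <- toy_counts %>%
  rownames_to_column("sampleName") %>%
  pivot_longer(-sampleName, names_to = "gene_name", values_to = "expr") %>%
  left_join(toy_classes, by = "gene_name")

# Per-gene spread as a quick numeric companion to the boxplots
toy_sd <- toy_long %>%
  group_by(gene_name, Gene_Class) %>%
  summarise(sd_expr = sd(expr)) %>%
  ungroup()

toy_sd
```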

In [12]:
x <- go_terms[go_terms$InterPro_annotation_accesion %in% c("IPR000626","IPR004000"),]
x <- x[x$score < 1e-05,]
dim(x)

# One row per gene / InterPro accession pair
genes_to_search <- x %>% select(gene_name, InterPro_annotation_accesion) %>% distinct()
dim(genes_to_search)

In [39]:
genes_to_search_str <- genes_to_search$gene_name[genes_to_search$gene_name %in% colnames(vsd_counts_varFilt)]

In [88]:
d <- vsd_counts %>%   # the data-frame copy made above; select() needs a data frame
  select(genes_to_search_str)

d$sampleName <- rownames(d)
d <- d %>% gather('gene_name', 'expr', -sampleName)
d <- merge(d, genes_to_search, by='gene_name')
d$Gene_Class <- factor(d$InterPro_annotation_accesion, levels=c('IPR004000', 'IPR000626'),
                      labels=c('Actin', 'Ubiquitin'))

d <- merge(d, treatments, by='sampleName')
d$Tissue <- factor(d$Tissue, labels=c('L', 'R'))
d$Year <- factor(d$Year, labels=c('17', '18', '19'))
d$Phenology <- factor(d$Phenology, levels=c('Anthesis', 'Veraison', 'Harvest'), labels=c('A', 'V', 'H'))
d$Rootstock <- factor(d$Rootstock, levels=c('Ungrafted', '1103P', '3309C', 'SO4'), labels=c('U', '1', '3', 'S'))

# p <- ggplot(d, aes(x=sampleName, y=expr, group=gene_name, color=Gene_Class)) + 
#   geom_line() + 
#   facet_wrap('gene_name') + 
#   ylab("VST-Transformed Expression") +
#   theme_bw() + 
#   theme(axis.text.x=element_blank(), axis.title.x=element_blank(), axis.ticks.x = element_blank())
# p

# Boxplots of expression per gene, split by one design factor at a time.
# plot_by() is a small helper so the four panels share identical styling.
plot_by <- function(data, x_var) {
  ggplot(data, aes(x=.data[[x_var]], y=expr, color=Gene_Class)) +
    geom_boxplot() + 
    facet_wrap('gene_name') + 
    xlab(x_var) +
    ylab("VST-Transformed Expression") +
    theme_bw()
}

p1 <- plot_by(d, 'Tissue')
p2 <- plot_by(d, 'Year')       # Year was already converted to a factor above
p3 <- plot_by(d, 'Phenology')
p4 <- plot_by(d, 'Rootstock')

p <- ggpubr::ggarrange(p1, p2, p3, p4, nrow=2, ncol=2, common.legend=TRUE)

In [89]:
pdf('1719_geneExpression_ActinUbiquitin.pdf', height=12, width=14)
print(p)  # explicit print so the figure is drawn even when this is not run at top level
dev.off()

# p