Analysis Mode should now be able to retrieve cDNA sequences for user-provided stable transcript IDs, via biomart
astrasb committed Oct 14, 2020
1 parent 6ae964e commit 1169515
Showing 4 changed files with 164 additions and 136 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -38,7 +38,7 @@ The *Strongyloides* Codon Adapter Shiny App adapts and automates that process of

1. **Optimization Mode:** This tab optimizes genetic sequences for expression in *Strongyloides* species. It accepts either nucleotide or amino acid sequences, and will generate an optimized nucleotide sequence with and without the desired number of artificial introns. Users may input sequences using the text box provided, or may upload sequences as .fasta/.gb/.txt files. Optimized sequences with or without artificial introns may be downloaded as .txt files.

2. **Analysis Mode:** This tab reports the endogenous codon optimization for a given gene relative to the codon usage weights of highly expressed *Strongyloides ratti* transcripts (1) or *C. elegans* genes (2). Stable Gene IDs with prefixes "SSTP", "SRAE", "SPAL", "SVE", or "WB" can be provided either through direct input via the provided textbox, or in bulk as a comma-separated text file. Users may also provide a *C. elegans* gene name. Finally, users may directly provide cDNA sequences for analysis, either as a 2-column .csv file listing geneIDs and cDNA sequences, or a .fa file containing named cDNA sequences.
2. **Analysis Mode:** This tab reports the endogenous codon optimization for a given gene relative to the codon usage weights of highly expressed *Strongyloides ratti* transcripts (1) or *C. elegans* genes (2). Stable Gene or Transcript IDs with prefixes "SSTP", "SRAE", "SPAL", "SVE", or "WB" can be provided either through direct input via the provided textbox, or in bulk as a comma-separated text file. Users may also provide a *C. elegans* gene name, provided it is prefaced with the string "Ce-", or *C. elegans* stable transcript IDs as is. Finally, users may directly provide cDNA sequences for analysis, either as a 2-column .csv file listing geneIDs and cDNA sequences, or a .fa file containing named cDNA sequences.

Users may download an Excel file containing the codon adaptation index and cDNA sequences for the user-provided genes. The app also generates a scatter plot displaying, for each gene, codon adaptiveness values relative to *S. ratti* vs *C. elegans* usage weights. Users may download this plot as a PDF file.
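
As a rough illustration of the input convention described above, the following base R sketch splits a comma-separated ID string and strips the "Ce-" prefix from *C. elegans* gene names. The helper name `parse_id_input` and the example IDs are hypothetical placeholders (not checked against WormBase ParaSite); the app itself uses its own parsing code.

```r
# Hypothetical helper, not part of the app: split a comma-separated
# ID string and flag entries that use the "Ce-" gene-name prefix.
parse_id_input <- function(txt) {
  # Remove spaces, then split on commas
  ids <- strsplit(gsub(" ", "", txt, fixed = TRUE), ",", fixed = TRUE)[[1]]
  ids <- ids[nzchar(ids)]  # drop empty entries, e.g. from a trailing comma
  data.frame(
    input = ids,
    is_ce_gene_name = grepl("^Ce-", ids),  # TRUE only for a leading "Ce-"
    query = sub("^Ce-", "", ids),          # value that would be searched
    stringsAsFactors = FALSE
  )
}

# Placeholder IDs for illustration only
parsed <- parse_id_input("SSTP_0001203100, SRAE_2000311000, Ce-unc-22")
print(parsed)
```

Anchoring the pattern with `^` means only a leading "Ce-" is treated as the gene-name marker, so IDs that merely contain the letters "Ce" are left untouched.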

288 changes: 158 additions & 130 deletions Server/analyze_geneID_list.R
@@ -1,137 +1,165 @@
# This script includes the primary computation for analyzing a list of geneIDs
# for the Strongyloides Codon Adapter App in Analyze Sequences Mode
# If user has provided a list of geneIDs, pull cDNA sequence from BioMart and analyse
# If user has provided a list of geneIDs/transcriptIDs, pull cDNA sequence from BioMart and analyse
# GC content and CAI values for each gene using calls to `calc_sequence_stats.R`.
#

analyze_geneID_list <- function(genelist, vals){
# Get cDNA sequences for given geneIDs from BioMaRT
Sspp.seq <- NULL
Sr.seq <- NULL
Ce.seq <- NULL
withProgress(message = "Accessing BioMaRT",expr = {
setProgress(.05)
if (any(grepl('SSTP|SVE|SPAL|WB', genelist$geneID))) {
Sspp.seq <- getBM(attributes=c('wbps_gene_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'wbps_gene_id'),
values = list(c('strattprjeb125',
'ststerprjeb528',
'stpapiprjeb525',
'stveneprjeb530',
'caelegprjna13758'),
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = wbps_gene_id, cDNA = cdna)
Sspp.seq$cDNA <- tolower(Sspp.seq$cDNA)
}
setProgress(.3)
if (any(grepl('SRAE', genelist$geneID))) {
Sr.seq <- getBM(attributes=c('wbps_transcript_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'wbps_transcript_id'),
values = list('strattprjeb125',
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = wbps_transcript_id, cDNA = cdna)
Sr.seq$cDNA <- tolower(Sr.seq$cDNA)
}
setProgress(0.5)
if (any(grepl('Ce', genelist$geneID))) {
genelist$geneID <- genelist$geneID %>%
gsub("^Ce-", "",.)
Ce.seq <- getBM(attributes=c('external_gene_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'gene_name'),
values = list('caelegprjna13758',
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = external_gene_id, cDNA = cdna)
Ce.seq$cDNA <- tolower(Ce.seq$cDNA)
}

setProgress(0.7)
gene.seq <- dplyr::bind_rows(Sspp.seq,Sr.seq,Ce.seq) %>%
dplyr::left_join(genelist, . , by = "geneID")

## Calculate info each sequence (S. ratti index) ----
temp<- lapply(gene.seq$cDNA, function (x){
if (!is.na(x)) {
s2c(x) %>%
calc_sequence_stats(.,w)}
else {
list(GC = NA, CAI = NA)
}
})

setProgress(0.8)
# Strongyloides CAI values ----
info.gene.seq<- temp %>%
map("GC") %>%
unlist() %>%
as_tibble_col(column_name = 'GC')

info.gene.seq<- temp %>%
map("CAI") %>%
unlist() %>%
as_tibble_col(column_name = 'Sr_CAI') %>%
add_column(info.gene.seq, .)

info.gene.seq <- info.gene.seq %>%
add_column(geneID = gene.seq$geneID, .before = 'GC')


# C. elegans CAI values ----
# Only run this under certain conditions
#
setProgress(0.9)
## Calculate info each sequence (C. elegans index) ----
Ce.temp<- lapply(gene.seq$cDNA, function (x){
if (!is.na(x)) {
s2c(x) %>%
calc_sequence_stats(.,Ce.w)}
else {
list(GC = NA, CAI = NA)
}
})

ce.info.gene.seq<- Ce.temp %>%
map("CAI") %>%
unlist() %>%
as_tibble_col(column_name = 'Ce_CAI')

setProgress(0.95)
## Merge both tibbles
info.gene.seq <- add_column(info.gene.seq,
Ce_CAI = ce.info.gene.seq$Ce_CAI, .after = "Sr_CAI")

vals$geneIDs <- info.gene.seq %>%
left_join(.,gene.seq, by = "geneID") %>%
rename('cDNA sequence' = cDNA)

setProgress(1)
info.gene.seq

})
# Get cDNA sequences for given geneIDs from BioMaRT
Sspp.seq <- NULL
Sr.seq <- NULL
Ce.seq <- NULL
transcript.seq <- NULL

withProgress(message = "Accessing BioMaRT",expr = {
setProgress(.05)
# If any of the items in genelist contain the strings `SSTP`, `SVE`, `SPAL`, or `WB` check if they are geneIDs
if (any(grepl('SSTP|SVE|SPAL|WB', genelist$geneID))) {
Sspp.seq <- getBM(attributes=c('wbps_gene_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'wbps_gene_id'),
values = list(c('strattprjeb125',
'ststerprjeb528',
'stpapiprjeb525',
'stveneprjeb530',
'caelegprjna13758'),
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = wbps_gene_id, cDNA = cdna)
Sspp.seq$cDNA <- tolower(Sspp.seq$cDNA)
}
setProgress(.2)
# If any of the items in genelist contain the string `SRAE` check if they are external geneIDs
if (any(grepl('SRAE', genelist$geneID))) {
Sr.seq <- getBM(attributes=c('external_gene_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'external_gene_id'),
values = list(c('strattprjeb125'),
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = external_gene_id, cDNA = cdna)
Sr.seq$cDNA <- tolower(Sr.seq$cDNA)
}
setProgress(0.4)
# Check all items in genelist to see if they are transcript IDs
transcript.seq <- getBM(attributes=c('wbps_transcript_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'wbps_transcript_id'),
values = list(c('strattprjeb125',
'ststerprjeb528',
'stpapiprjeb525',
'stveneprjeb530',
'caelegprjna13758'),
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = wbps_transcript_id, cDNA = cdna)
transcript.seq$cDNA <- tolower(transcript.seq$cDNA)

setProgress(0.6)
# If any of the items in genelist start with the string `Ce-`, remove that string and search as gene names
if (any(grepl('^Ce-', genelist$geneID))) {
genelist$geneID <- genelist$geneID %>%
gsub("^Ce-", "",.)
Ce.seq <- getBM(attributes=c('external_gene_id', 'cdna'),
# grab the cDNA sequences for the given genes from WormBase Parasite
mart = useMart(biomart="parasite_mart",
dataset = "wbps_gene",
host="https://parasite.wormbase.org",
port = 443),
filters = c('species_id_1010',
'gene_name'),
values = list('caelegprjna13758',
genelist$geneID),
useCache = F) %>%
as_tibble() %>%
#we need to rename the columns retrieved from biomart
dplyr::rename(geneID = external_gene_id, cDNA = cdna)
Ce.seq$cDNA <- tolower(Ce.seq$cDNA)
}

setProgress(0.7)
gene.seq <- dplyr::bind_rows(Sspp.seq,Sr.seq,transcript.seq,Ce.seq) %>%
dplyr::left_join(genelist, . , by = "geneID")

## Calculate info for each sequence (S. ratti index) ----
temp<- lapply(gene.seq$cDNA, function (x){
if (!is.na(x)) {
s2c(x) %>%
calc_sequence_stats(.,w)}
else {
list(GC = NA, CAI = NA)
}
})

setProgress(0.8)
# Strongyloides CAI values ----
info.gene.seq<- temp %>%
map("GC") %>%
unlist() %>%
as_tibble_col(column_name = 'GC')

info.gene.seq<- temp %>%
map("CAI") %>%
unlist() %>%
as_tibble_col(column_name = 'Sr_CAI') %>%
add_column(info.gene.seq, .)

info.gene.seq <- info.gene.seq %>%
add_column(geneID = gene.seq$geneID, .before = 'GC')


# C. elegans CAI values ----
# Only run this under certain conditions
#
setProgress(0.9)
## Calculate info for each sequence (C. elegans index) ----
Ce.temp<- lapply(gene.seq$cDNA, function (x){
if (!is.na(x)) {
s2c(x) %>%
calc_sequence_stats(.,Ce.w)}
else {
list(GC = NA, CAI = NA)
}
})

ce.info.gene.seq<- Ce.temp %>%
map("CAI") %>%
unlist() %>%
as_tibble_col(column_name = 'Ce_CAI')

setProgress(0.95)
## Merge both tibbles
info.gene.seq <- add_column(info.gene.seq,
Ce_CAI = ce.info.gene.seq$Ce_CAI, .after = "Sr_CAI")

vals$geneIDs <- info.gene.seq %>%
left_join(.,gene.seq, by = "geneID") %>%
rename('cDNA sequence' = cDNA)

setProgress(1)
info.gene.seq

})
}
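
The merge step at the end of `analyze_geneID_list` can be sketched without a BioMart connection. The example below is a minimal base R approximation, assuming mock data frames in place of `getBM()` results: queries that never ran stay `NULL` and are dropped, the remaining hits are stacked, and the user's list is left-joined so that IDs with no hit keep an `NA` sequence. The IDs, sequences, and the `gc_fraction` helper are made up for illustration; the app itself uses `dplyr::bind_rows()`, `dplyr::left_join()`, and `calc_sequence_stats()`.

```r
# Mock results standing in for getBM() output; IDs and sequences are made up.
gene_hits <- data.frame(geneID = "SSTP_A", cDNA = "atggct",
                        stringsAsFactors = FALSE)
srae_hits <- NULL  # no SRAE IDs supplied, so that query never ran
tx_hits   <- data.frame(geneID = "SSTP_B.1", cDNA = "atgaaa",
                        stringsAsFactors = FALSE)

genelist <- data.frame(geneID = c("SSTP_A", "SSTP_B.1", "not_found"),
                       stringsAsFactors = FALSE)

# Drop NULL results, then stack whatever queries returned rows
hits     <- Filter(Negate(is.null), list(gene_hits, srae_hits, tx_hits))
all_hits <- do.call(rbind, hits)

# Left join: keep every user-supplied ID, with NA where nothing was found
gene_seq <- merge(genelist, all_hits, by = "geneID", all.x = TRUE, sort = FALSE)
# merge() may reorder rows, so restore the user's input order
gene_seq <- gene_seq[match(genelist$geneID, gene_seq$geneID), ]

# Hypothetical stand-in for the per-sequence statistics: GC fraction,
# returning NA when no sequence was retrieved.
gc_fraction <- function(seq) {
  if (is.na(seq)) return(NA_real_)
  bases <- strsplit(seq, "")[[1]]
  mean(bases %in% c("g", "c"))
}
gene_seq$GC <- vapply(gene_seq$cDNA, gc_fraction, numeric(1), USE.NAMES = FALSE)
print(gene_seq)
```

Keeping unmatched IDs in the table, rather than dropping them, lets the user see which inputs returned no sequence.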
2 changes: 1 addition & 1 deletion UI/README/README_Features.md
@@ -10,7 +10,7 @@ Optimized sequences with or without artificial introns may be downloaded as .txt
### Analyze Sequences Mode
This tab reports the endogenous codon optimization for a given gene relative to the codon usage weights of highly expressed *Strongyloides ratti* transcripts [(Mitreva *et al* 2006)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779591/) or highly expressed *C. elegans* genes [(Sharp and Bradnam, 1997)](https://www.ncbi.nlm.nih.gov/books/NBK20194/).

Stable Gene IDs with prefixes "SSTP", "SRAE", "SPAL", "SVE", or "WB" can be provided either through direct input via the provided textbox, or in bulk as a comma-separated text file. Users may also provide a *C. elegans* gene name. Finally, users may directly provide cDNA sequences for analysis, either as a 2-column .csv file listing geneIDs and cDNA sequences, or a .fa file containing named cDNA sequences.
Stable Gene or Transcript IDs with prefixes "SSTP", "SRAE", "SPAL", "SVE", or "WB" can be provided either through direct input via the provided textbox, or in bulk as a comma-separated text file. Users may also provide a *C. elegans* gene name, provided it is prefaced with the string "Ce-", or *C. elegans* stable transcript IDs as is. Finally, users may directly provide cDNA sequences for analysis, either as a 2-column .csv file listing geneIDs and cDNA sequences, or a .fa file containing named cDNA sequences.

Users may download an Excel file containing the codon adaptation index and cDNA sequences for the user-provided genes. The app also generates a scatter plot displaying, for each gene, codon adaptiveness values relative to *S. ratti* vs *C. elegans* usage weights. Users may download this plot as a PDF file.

8 changes: 4 additions & 4 deletions app.R
@@ -277,13 +277,13 @@ server <- function(input, output, session) {
# Primary reactive element in the Analysis Mode
analyze_sequence <- eventReactive(input$goAnalyze, {
validate(
need({isTruthy(input$idtext) | isTruthy(input$loadfile)}, "Please input geneIDs or sequences for analysis")
need({isTruthy(input$idtext) | isTruthy(input$loadfile)}, "Please input stable gene/transcript IDs or sequences for analysis")
)

isolate({
if (isTruthy(input$idtext)){
# If user provides input using the textbox,
# assume they have provided a list of geneIDs
# assume they have provided a list of gene/transcript IDs
genelist <- input$idtext %>%
gsub(" ", "", ., fixed = TRUE) %>%
str_split(pattern = ",") %>%
@@ -325,9 +325,9 @@ server <- function(input, output, session) {
strip.white = T)) %>%
as_tibble()

# Remove input rows where the geneID includes the word "gene" - we are assuming
# Remove input rows where the geneID includes the word "gene" or "transcript" - we are assuming
# that such rows will be header rows.
genelist <- dplyr::filter(genelist, !grepl('gene', V1))
genelist <- dplyr::filter(genelist, !grepl('gene|transcript', V1))

# Assume that an input with two columns and more than one
# row is a list of geneID/cDNA pairs
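
The header-row filter in this hunk can be sketched in isolation. The example below is a base R approximation, assuming a single-column input named `V1` (as produced by reading a file without a header) and assuming the intended pattern is "gene" or "transcript". The toy IDs are placeholders.

```r
# Toy single-column input mimicking a user file read without a header.
# The first and third rows stand in for header-like rows.
genelist <- data.frame(
  V1 = c("geneID", "SSTP_A", "transcriptID", "SRAE_B.1"),
  stringsAsFactors = FALSE
)

# Flag rows that look like headers; the match is case-sensitive
is_header <- grepl("gene|transcript", genelist$V1)

# Keep only rows that are not header-like
filtered <- genelist[!is_header, , drop = FALSE]
print(filtered$V1)
```

One caveat of this heuristic: any real ID that happens to contain the substring "gene" or "transcript" would also be dropped.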
