Updating App About tab
astrasb committed Oct 13, 2020
1 parent aa1f011 commit 2025ea2
Showing 14 changed files with 245 additions and 349 deletions.
4 changes: 1 addition & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -34,7 +34,7 @@ To run a specific release locally use the following commands in R/RStudio:
Please note: the download step for runURL/runGitHub may take a substantial amount of time. We recommend downloading this archive and running the application locally.

## App Features
The *Strongyloides* Codon Adapter Shiny App adapts and automates that process of codon adaptation for *Strongyloides* species, and enables users to query codon adaptiveness of select genes of interest. The app has two two modes:
The *Strongyloides* Codon Adapter Shiny App adapts and automates that process of codon adaptation for *Strongyloides* species, and enables users to query codon adaptiveness of select genes of interest. The app has two modes:

1. **Optimization Mode:** This tab optimizes genetic sequences for expression in *Strongyloides* species. It accepts either nucleotide or amino acid sequences, and will generate an optimized nucleotide sequence with and without the desired number of artificial introns. Users may input sequences using the text box provided, or may upload sequences as .fasta/.gb/.txt files. Optimized sequences with or without artificial introns may be downloaded as .txt files.

@@ -78,8 +78,6 @@ Introns are placed between the 3rd and 4th nucleotide of one of the following se
### User Interface for *Strongyloides* Codon Adapter App in Analyze Sequences Mode
![An example of the User Interface for the Strongyloides Codon Adapter Shiny App in Analyze Sequences Mode](/Static/Str_Codon_Adapter_AnalysisMode.png)



## Sources
* [Shiny](https://shiny.rstudio.com/) - UI framework
* [WormbaseParasite](https://parasite.wormbase.org/index.html) - GeneIDs and cDNA sequences
Binary file removed Static/AnalysisModeResultExample.png
Binary file not shown.
Binary file removed Static/OptimizationModeResultExample.png
Binary file not shown.
Binary file removed Static/Str_Codon_Adapter_AboutTab.png
Binary file not shown.
62 changes: 0 additions & 62 deletions Static/codon_usage_chart_all.csv

This file was deleted.

16 changes: 16 additions & 0 deletions UI/README/README_Analysis_Methods.md
@@ -0,0 +1,16 @@
### CAI (Codon Adaptation Index)
The primary non-responsive data input to the *Strongyloides* Codon Adapter App is a .csv file containing codon usage rules for highly expressed *S. ratti* transcripts and *C. elegans* genes (`codon_usage_chart.csv`, located in the `Static` subfolder). This multi-species codon usage chart is loaded by the Shiny server function and used to create relative adaptiveness lookup tables.

For each sequence provided using responsive Shiny inputs, individual codons are scored by calculating their relative adaptiveness: (the frequency that codon "i" encodes amino acid "AA") / (the frequency of the codon most often used for encoding amino acid "AA"). Genes are scored by calculating their Codon Adaptation Index: the geometric average of relative adaptiveness of all codons in the gene sequence (3,4). The CAI is calculated via the `seqinr` library. Codon bias in nematode transcripts can vary as a function of gene expression levels such that highly expressed genes appear to have the greatest degree of codon bias. Therefore, optimization rules used to generate sequences codon optimized for expression in *Strongyloides* species are based on the codon usage weights of highly expressed *S. ratti* transcripts (1).
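
The scoring described above can be sketched in a few lines of Python. This is an illustration only, not the app's code (the app computes CAI in R via `seqinr`). The function names `relative_adaptiveness` and `cai`, the toy four-codon usage table, and the choice to skip codons missing from the table are all assumptions made for the example.

```python
import math
from collections import defaultdict


def relative_adaptiveness(codon_freq, codon_to_aa):
    """Weight of each codon: its frequency divided by the frequency of the
    most-used synonymous codon for the same amino acid.
    Assumes every amino acid has at least one codon with frequency > 0."""
    max_by_aa = defaultdict(float)
    for codon, freq in codon_freq.items():
        aa = codon_to_aa[codon]
        max_by_aa[aa] = max(max_by_aa[aa], freq)
    return {c: f / max_by_aa[codon_to_aa[c]] for c, f in codon_freq.items()}


def cai(sequence, weights):
    """Geometric mean of the weights of all codons in the sequence.
    Codons absent from the table, or with zero weight, are skipped
    (an assumption of this sketch)."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy usage table (hypothetical values, not taken from codon_usage_chart.csv).
toy_codon_to_aa = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}
toy_freq = {"AAA": 0.75, "AAG": 0.25, "GAA": 0.5, "GAG": 0.5}
toy_weights = relative_adaptiveness(toy_freq, toy_codon_to_aa)

best = cai("AAAGAA", toy_weights)   # both codons are the preferred ones
worse = cai("AAGGAA", toy_weights)  # AAG carries weight 1/3
```

Working in log space keeps the geometric mean numerically stable for long sequences, where a direct product of many weights below 1 would underflow.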

### GC Content
The fraction of G+C bases of the nucleic acid sequences. Calculated using the `seqinr` library.
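
As a minimal sketch of the quantity being reported (not the `seqinr` implementation the app uses), GC content can be computed as below. The function name `gc_content` and the decision to ignore characters other than A, C, G, T, and U are assumptions of this example.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a sequence.
    Characters other than A, C, G, T, U are ignored (assumption)."""
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


half = gc_content("ATGC")
none = gc_content("aattaa")
```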

### Inserting Introns
Including synthetic introns in cDNA sequences can significantly increase gene expression. Intron-mediated enhancement of gene expression can be due to a variety of mechanisms, including an increased rate of transcription. Intron-mediated enhancement occurs in *C. elegans* (5). Although intron-mediated enhancement has yet to be specifically studied in *Strongyloides spp.*, there is evidence that the presence of introns does not prevent gene expression (e.g. intron-inclusive eGFP) (6). Here, the desired number of introns are inserted within the DNA sequence, up to a maximum of 3 unique introns. Intron sequences and order are taken from the Fire Lab Vector Kit (1995) (7).

#### Intron Number and Spacing
The Fire lab established three unique introns, spaced equidistantly within a gene as canon (7); this configuration is thus set as default, and is recommended. In *C. elegans*, the location of the intron site influences the degree of intron mediated enhancement, such that a single 5′-intron is more effective than a single 3′-intron. Therefore when only 1 or 2 introns are desired, 3 possible intron insertion sites are identified, and filled as needed, starting from the 5′ site.

#### Identifying Intron Insertion Sites
Introns are placed between the 3rd and 4th nucleotide of one of the following sequences: "aagg", "aaga", "cagg", "caga", as in Redemann *et al* (2011) (8). If those sequences are not present, introns are placed between the 2nd and 3rd nucleotide of one of the following minimal *C. elegans* splice site consensus sequences: "aga", "agg" (9).
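
The site-identification rule above can be sketched as follows. This is an illustration under stated assumptions, not the app's R code: the names `STRINGENT`, `MINIMAL`, `insertion_points`, and `candidate_sites` are hypothetical, and the fallback to minimal sites is applied only when no stringent site exists anywhere in the sequence. Each returned position is the number of nucleotides that precede the intron.

```python
import re

# Lookaheads are used so that overlapping motifs are all reported.
STRINGENT = re.compile(r"(?=[ac]ag[ga])")  # aagg, aaga, cagg, caga
MINIMAL = re.compile(r"(?=ag[ga])")        # agg, aga


def insertion_points(seq, pattern, offset):
    """0-based cut positions: motif start plus the offset within the motif."""
    return [m.start() + offset for m in pattern.finditer(seq.lower())]


def candidate_sites(seq):
    """Return (site class, cut positions), preferring stringent motifs."""
    stringent = insertion_points(seq, STRINGENT, 3)  # between 3rd and 4th nt
    if stringent:
        return "stringent", stringent
    return "minimal", insertion_points(seq, MINIMAL, 2)  # between 2nd and 3rd nt


strict_hit = candidate_sites("ttaaggtt")   # "aagg" starts at index 2
fallback_hit = candidate_sites("ttaggtt")  # only the minimal "agg" is present
```

In both motif classes the cut falls immediately after the "ag" dinucleotide, so the two offsets differ only because the stringent motifs carry one extra leading base.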
9 changes: 9 additions & 0 deletions UI/README/README_Features.md
@@ -0,0 +1,9 @@

The *Strongyloides* Codon Adapter Shiny App adapts and automates the process of codon adaptation for *Strongyloides* species, and enables users to query codon adaptiveness of select genes of interest. The app has two modes:

1. **Optimization Mode:** This tab optimizes genetic sequences for expression in *Strongyloides* species. It accepts either nucleotide or amino acid sequences, and will generate an optimized nucleotide sequence with and without the desired number of artificial introns. Users may input sequences using the text box provided, or may upload sequences as .fasta/.gb/.txt files. Optimized sequences with or without artificial introns may be downloaded as .txt files.

2. **Analysis Mode:** This tab reports the endogenous codon optimization for a given gene relative to the codon usage weights of highly expressed *Strongyloides ratti* transcripts [(Mitreva *et al* 2006)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779591/) or highly expressed *C. elegans* genes [(Sharp and Bradnam, 1997)](https://www.ncbi.nlm.nih.gov/books/NBK20194/). Stable Gene IDs with prefixes "SSTP", "SRAE", "SPAL", "SVE", or "WB" can be provided either through direct input via the provided textbox, or in bulk as a comma separated text file. Users may also provide a *C. elegans* gene name. Finally, users may directly provide cDNA sequences for analysis, either as a 2-column .csv file listing geneIDs and cDNA sequences, or a .fa file containing named cDNA sequences.

Users may download an Excel file containing the codon adaptation index and cDNA sequences for the user-provided genes. The app also generates a scatter plot displaying, for each gene, codon adaptiveness values relative to *S. ratti* vs *C. elegans* usage weights. Users may download this plot as a PDF file.

3 changes: 3 additions & 0 deletions UI/README/README_Methods_CAI.md
@@ -0,0 +1,3 @@
The primary non-responsive data input to the *Strongyloides* Codon Adapter App is a .csv file containing codon usage rules for highly expressed *S. ratti* transcripts and *C. elegans* genes (`codon_usage_chart.csv`, located in the `Static` subfolder). This multi-species codon usage chart is loaded by the Shiny server function and used to create relative adaptiveness lookup tables.

For each sequence provided using responsive Shiny inputs, individual codons are scored by calculating their relative adaptiveness: (the frequency that codon "i" encodes amino acid "AA") / (the frequency of the codon most often used for encoding amino acid "AA"). Genes are scored by calculating their Codon Adaptation Index: the geometric average of relative adaptiveness of all codons in the gene sequence ([Sharp and Li, 1987](https://pubmed.ncbi.nlm.nih.gov/3547335/), [Jansen *et al* 2003](http://www.ncbi.nlm.nih.gov/pubmed/12682375)). The CAI is calculated via the `seqinr` library. Codon bias in nematode transcripts can vary as a function of gene expression levels such that highly expressed genes appear to have the greatest degree of codon bias. Therefore, optimization rules used to generate sequences codon optimized for expression in *Strongyloides* species are based on the codon usage weights of highly expressed *S. ratti* transcripts [(Mitreva *et al* 2006)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779591/).
3 changes: 3 additions & 0 deletions UI/README/README_Methods_GC.md
@@ -0,0 +1,3 @@
The fraction of G+C bases of the user-provided nucleic acid sequences. Calculated using the `seqinr` library.

In general, sequences optimized for expression in *Strongyloides* species will have lower GC content than sequences optimized for expression in *C. elegans*; *Strongyloides* genomes are notably more AT rich than *C. elegans* genomes. Furthermore, a previous study reported that *C. elegans* shows overall lower codon bias compared to *Strongyloides* species, and that this difference is greatly affected by GC content, with more AT-rich species like *Strongyloides spp.* displaying greater codon usage biases [(Mitreva *et al* 2006)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779591/). Here, we report GC ratios of submitted and optimized sequences; users should observe an overall decrease in GC ratio after optimization.
7 changes: 7 additions & 0 deletions UI/README/README_Methods_Introns.md
@@ -0,0 +1,7 @@
Including artificial/synthetic introns in cDNA sequences can significantly increase gene expression. Intron-mediated enhancement of gene expression can be due to a variety of mechanisms, including an increased rate of transcription. Intron-mediated enhancement occurs in *C. elegans* [(Crane *et al* 2019)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6591249/). Although intron-mediated enhancement has yet to be specifically studied in *Strongyloides spp.*, there is evidence that the presence of introns does not prevent gene expression (e.g. intron-inclusive eGFP; [Junio *et al* 2008](https://pubmed.ncbi.nlm.nih.gov/17945217/)).

In Optimize Sequences mode, users may input a desired number of introns, up to a maximum of three unique introns. The Fire lab established three unique introns, spaced equidistantly within a gene as canon [(Fire Lab Vector Kit 1995)](https://media.addgene.org/cms/files/Vec95.pdf); this configuration is thus set as default, and is recommended.

Intron sequences for insertion into the optimized sequence are the three canonical artificial intron sequences established by the Fire lab. This app divides the optimized cDNA sequence at 3 hypothetical intron insertion sites spaced approximately equidistantly. Insertion sites are *C. elegans* exon splice sites: preferably the stringent consensus sequences (‘AAG^G’, ‘AAG^A’, ‘CAG^G’, ‘CAG^A’), but minimal consensus sequences (‘AG^G’, ‘AG^A’) are used if more stringent sites are not present or if there are fewer than 3 possible insertion sites ([Redemann *et al* 2011](https://pubmed.ncbi.nlm.nih.gov/21278743/), [*Cis-* Splicing in Worms *in* *C. elegans* II, 1997](https://www.ncbi.nlm.nih.gov/books/NBK20075/)). For all insertion sites, the '^' symbol indicates the exact insertion site.

Once hypothetical intron insertion sites have been identified, the application inserts the user-specified number of introns, using the 5′ insertion site first and continuing in the 3′ direction. In *C. elegans*, the location of the intron site influences the degree of intron mediated enhancement, such that a single 5′-intron is more effective than a single 3′-intron. Therefore when only 1 or 2 introns are desired, 3 possible intron insertion sites are identified, and filled as needed, starting from the 5′ site.
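
The selection and filling order described above can be sketched as follows. This is a hedged illustration, not the app's implementation. It assumes candidate cut positions have already been found, and it interprets "approximately equidistant" as choosing the candidate nearest to each quarter point of the sequence. The names `pick_equidistant` and `insert_introns` are hypothetical, and the intron strings are placeholders rather than the Fire lab sequences.

```python
def pick_equidistant(candidates, seq_len, n_sites=3):
    """Choose up to n_sites candidates, each the closest unused candidate to
    an evenly spaced target position (1/4, 2/4, 3/4 of the length for 3 sites)."""
    targets = [seq_len * k / (n_sites + 1) for k in range(1, n_sites + 1)]
    chosen = []
    for target in targets:
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        chosen.append(min(remaining, key=lambda c: abs(c - target)))
    return sorted(chosen)


def insert_introns(seq, sites, introns, n):
    """Insert the first n introns at the n most 5' sites, in order."""
    if not 0 <= n <= min(len(sites), len(introns)):
        raise ValueError("n exceeds the available sites or introns")
    used = list(zip(sorted(sites)[:n], introns[:n]))
    out = seq
    # Apply edits from the 3' end so earlier cut positions remain valid.
    for pos, intron in reversed(used):
        out = out[:pos] + intron + out[pos:]
    return out


exon = "A" * 20
placeholder_introns = ["[i1]", "[i2]", "[i3]"]  # stand-ins, not real introns
sites = pick_equidistant([2, 5, 9, 11, 15, 18], len(exon))
one_intron = insert_introns(exon, sites, placeholder_introns, 1)
two_introns = insert_introns(exon, sites, placeholder_introns, 2)
```

Identifying all three sites before inserting anything means that a request for one or two introns reuses the same positions as the default three-intron layout, filled from the 5′ end.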