Virus-Host Codon Usage Co-Adaptation Analysis
vhcub is an R package to analyze the co-adaptation of codon usage between a virus and its host.
vhcub is written in R and is available on CRAN.
The following measures are implemented in the package:
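Because the package is distributed through CRAN, installation would normally use the standard R mechanism. A minimal sketch, assuming the CRAN package name is `vhcub` and that a CRAN mirror is reachable from your machine:

```r
# Install the released version from CRAN (package name assumed to be "vhcub")
install.packages("vhcub")

# Load the package for the current session
library(vhcub)
```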
- ENc, effective number of codons (Novembre, 2002),
- SCUO, synonymous codon usage orderliness (Wan et al., 2004),
- Codon Adaptation Index, CAI (Sharp and Li, 1987),
- Relative Codon Deoptimization Index, RCDI (Puigbò et al., 2010),
- Relative Synonymous Codon Usage, RSCU (Sharp and Li, 1987),
- Statistical analysis of dinucleotide over- and underrepresentation under three different models (base, codon and syncodon).
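To make one of these measures concrete, here is a minimal base-R sketch of RSCU for a single coding sequence: each codon's count divided by the mean count of its synonymous family. It is an illustration only, not vhcub's implementation; the helper name `rscu_sketch` and the use of the standard genetic code are assumptions made for this example.

```r
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  ""
)[[1]]

# Hypothetical helper: RSCU of one in-frame DNA coding sequence
rscu_sketch <- function(seq) {
  seq <- toupper(seq)
  n <- nchar(seq) %/% 3                       # number of complete codons
  starts <- seq(1, by = 3, length.out = n)
  cod <- substring(seq, starts, starts + 2)
  counts <- as.numeric(table(factor(cod, levels = codons)))
  # Mean count within each synonymous family (grouped by amino acid)
  family_mean <- ave(counts, aa, FUN = mean)
  rscu <- counts / family_mean                # NaN if a family is absent
  names(rscu) <- codons
  rscu
}

# Leucine has six codons; here CTG is used three times and CTA once
r <- rscu_sketch("CTGCTGCTGCTA")
r[c("CTG", "CTA", "TTA")]
```

Values above 1 mark codons used more often than expected under uniform synonymous usage, and values below 1 mark codons used less often.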
Using vhcub to study the codon usage bias (CUB) of a virus and its host, and the co-adaptation between them, is straightforward.
In the example below, coding sequences for both Escherichia virus T4 and its host Escherichia coli were downloaded in FASTA format from the NCBI database.
    # Read the virus and host FASTA files (virus first, host second)
    fasta <- fasta.read("EscherichiavirusT4.fasta", "Escherichiacoli.fasta")
    fasta.virus <- fasta[[1]]
    fasta.host <- fasta[[2]]

    # Calculate the overall GC content as well as GC at the first, second
    # and third codon positions for the virus
    gc.df <- GC.content(fasta.virus)

    # Calculate z-scores for dinucleotide over- and underrepresentation
    # using the syncodon, base and codon models
    syncodon <- dinuc.syncodon(fasta.virus, permutations = 100)
    base <- dinuc.base(fasta.virus, permutations = 100)
    codon <- dinuc.codon(fasta.virus, permutations = 100)

    # Calculate ENc values for the virus and its host
    enc.df.virus <- ENc.values(fasta.virus)
    enc.df.host <- ENc.values(fasta.host)

    # Calculate SCUO values for the virus
    scuo.df <- SCUO.values(fasta.virus)

    # Calculate CAI values for the virus, using the host sequences as the
    # reference gene set
    cai.df <- CAI.values(fasta.virus, enc.df.host, fasta.host, genetic.code = "11")

    # Calculate RSCU values for the virus and its host
    rscu.virus <- RSCU.values(fasta.virus)
    rscu.host <- RSCU.values(fasta.host)

    # Calculate the SiD value for the virus
    SiD <- SiD.value(rscu.host, rscu.virus)

    # Calculate RCDI values for the virus
    rcdi.df <- RCDI.values(fasta.virus, fasta.host, enc.df.host)
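For readers unfamiliar with the SiD (similarity index) step, the sketch below shows the form in which this measure is commonly defined in the codon usage literature: a cosine-type similarity R between the virus and host RSCU vectors, turned into a distance D = (1 - R) / 2. The function name `sid_sketch` is hypothetical, and whether `SiD.value` follows exactly this definition or codon selection is an assumption, not something taken from the package source.

```r
# Hypothetical helper, not part of vhcub.
# a, b: numeric RSCU vectors for virus and host, in the same codon order
sid_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  # Cosine-type similarity between the two RSCU profiles
  r <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  # Map similarity to a distance between 0 and 0.5 for non-negative input
  (1 - r) / 2
}

# Toy RSCU vectors (made-up values for illustration only)
virus_rscu <- c(1.5, 0.5, 1.0)
host_rscu <- c(1.2, 0.8, 1.0)
sid_sketch(virus_rscu, host_rscu)
```

Under this definition, values close to 0 mean the two codon usage profiles are nearly proportional, while larger values mean they diverge.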
Furthermore, vhcub uses ggplot2 to draw two diagnostic plots, the ENc-GC3 plot and the PR2 plot, which help to identify the factors that shape a virus's codon usage bias.
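ENc-GC3 plots are usually read against a reference curve giving the ENc expected when codon usage is determined by GC3 composition alone (Wright, 1990); genes falling well below the curve suggest additional pressures such as selection. The base-R sketch below computes that curve. The function name `expected_enc` is hypothetical, and whether vhcub draws exactly this curve is an assumption.

```r
# Expected ENc under GC3 composition alone (Wright, 1990):
#   ENc = 2 + s + 29 / (s^2 + (1 - s)^2), with s = GC3 as a fraction in [0, 1]
expected_enc <- function(gc3) {
  stopifnot(all(gc3 >= 0 & gc3 <= 1))
  2 + gc3 + 29 / (gc3^2 + (1 - gc3)^2)
}

# Reference curve over the full GC3 range, ready to overlay on a scatter plot
gc3 <- seq(0, 1, by = 0.01)
curve_df <- data.frame(GC3 = gc3, ENc = expected_enc(gc3))
head(curve_df)
```

A data frame in this shape can be passed to any plotting layer to add the curve to a scatter of observed ENc against GC3.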
Contributions to the package are welcome.
For bugs and suggestions, the most effective route is to raise an issue on the GitHub issue tracker. GitHub lets you label each issue, so that we know whether it is a bug report, a feature request or feedback to the authors.