- This webpage: https://lawlessgenomics.com/inspire2022lawless.github.io
- Manuscript and source code: https://github.com/DylanLawless/inspire2022lawless.github.io
- Pre-print on Overleaf: https://www.overleaf.com/project/61718a4e077acc3d20ee68f1
- Final publication DOI [citation].
- Pre-print on medRxiv: https://doi.org/10.1101/2022.06.22.22276752.
- Git commit ID and md5sum of publication version:
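
The commit ID and checksum referred to in the last item can be generated along the following lines. This is a minimal sketch rather than the procedure used for the publication: the helper names `file_md5` and `record_version` and the file name `manuscript.pdf` are assumptions for illustration, and it assumes `git` plus either `md5sum` (GNU coreutils) or `md5` (BSD/macOS) are installed.

```shell
#!/bin/sh
# Print only the MD5 hash of a file (hypothetical helper name).
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1   # GNU coreutils prints "<hash>  <file>"
    else
        md5 -q "$1"                     # BSD/macOS equivalent, prints the hash only
    fi
}

# Print the commit ID and checksum for one file; run inside the cloned repository.
record_version() {
    printf 'commit: %s\n' "$(git rev-parse HEAD)"
    printf 'md5sum: %s  %s\n' "$(file_md5 "$1")" "$1"
}

# Usage (manuscript.pdf is a placeholder file name):
#   record_version manuscript.pdf
```

Recording both values ties the published file to one exact state of the repository: the commit ID identifies the source, and the checksum identifies the built file.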
Figure 1: INSPIRE cohort summary. Natural, quasi-random RSV infection events were monitored by RT-qPCR, serology, and sequencing.
<iframe scrolling="no" seamless="seamless" src=" https://lawlessgenomics.com/inspire2022lawless.github.io/pages/update_G_pca.html" style="border:none;" height=400 width="100"> </iframe>To add images (B), (C) and (D).
Figure 2: Population structure. (A) Protein coding genes in RSV. (B) Phylogenetic tree based on multiple sequence alignment (MSA) of amino acid G protein sequences. (C) Principal component analysis (PCA) PCs1-3 with labels indicating repeat/persistent infections from different phylogenetic clades. (D) Panel [i] summarises every pairwise genetic distance between every viral sequence. Genetic invariance in repeat/persistent infections separated by at least 15 days is compared with other genetic variation within clades (panel [ii]) and within all possible pairs (panel [iii]).
<iframe scrolling="no" seamless="seamless" src=" https://lawlessgenomics.com/inspire2022lawless.github.io/pages/manplot.html" style="border:none;" height=300 width="60%"> </iframe>Figure 3: Genetic association with persistent infection. (A) Amino acid association with persistent infection after multiple testing correction (significance threshold shown by dotted line).
This figure uses the relative MSA positions. I will update positions if we keep iterative results. For example, p.221 in relative position is p.218 in the strain B reference sequence (as seen in the signal variant of the genetic association plot).
<iframe scrolling="no" seamless="seamless" src=" https://lawlessgenomics.com/inspire2022lawless.github.io/pages/pca_variance_explaied.html" style="border:none;" height=400 width="100"> </iframe>Figure 3 (B) Variance explained (VE) within cohort. The effect of each variant on cohort structure is shown for PCs1-2. A large % VE for a significantly associated variant would indicate a false positive. (C) Variants in strong correlation were clumped for association testing using proxies for r2 ≥ 0.8. One significant association was identified (shown in A); the r2 values for all other variants show a single highly correlated variant with the lead proxy (red).
<iframe scrolling="no" seamless="seamless" src=" https://lawlessgenomics.com/inspire2022lawless.github.io/pages/gene_illustrate_rsv_Pval.html" style="border:none;" height=1000 width="100"> </iframe>Figure 3 (D) Evidence for biological interpretation for every amino acid position is summarised.
Figure 4: Supplemental: Variant clumping for reduction in association testing. [Left] Correlation between all positions. [Right] Correlation between proxy variants after clumping to remove r2 ≥ 0.8.
Figure 5: Supplemental: Publicly available RSV sequence data for > 30 years. (A) Global sample collection per year. (B) Variant associated with persistent infection tracked in public data. (C) % variance explained per year for all G protein amino acid variants from 1990-2022.
- R v4.1.0 was used for data preparation and analysis http://www.r-project.org.
- R package caret was used for analysis: genetic correlations.
- R package dplyr was used for data curation.
- R package factoextra was used for analysis: PCA, and to visualise eigenvalues and variance.
- R package ggplot2 was used for data visualisation.
- R package MASS was used for analysis: logistic regression model data.
- R package stats was used for analysis: including glm for logistic regressions.
- R package stringr was used for data curation.
- R package tidyr was used for data curation.
- clc_novo_assemble qiagenbioinformatics.com
- Clustal Omega https://www.ebi.ac.uk/Tools/msa/clustalo/
- IQ-Tree https://www.iqtree.org/
- MAFFT https://mafft.cbrc.jp/alignment/software/ [citation katoh2013mafft]
- Viral Genome ORF Reader, VIGOR 3.0 https://sourceforge.net/projects/jcvi-vigor/files/
- RCSB PDB https://www.rcsb.org
- UniProt https://www.uniprot.org
- Dataset https://www.ncbi.nlm.nih.gov/bioproject/267583.
- Dataset https://www.ncbi.nlm.nih.gov/bioproject/225816.
- J. Craig Venter Institute https://www.jcvi.org.
- GenBank:NC_001989 Bovine orthopneumovirus, complete genome https://www.ncbi.nlm.nih.gov/nuccore/NC_001989.
- Reference data https://www.ncbi.nlm.nih.gov/gene/?term=1489824. G attachment glycoprotein [Human orthopneumovirus]; ID: 1489824; Location: NC_001781.1 (4675..5600); Aliases: HRSVgp07.
- Reference data https://www.ncbi.nlm.nih.gov/gene/?term=37607642. G attachment glycoprotein [Human orthopneumovirus]; ID: 37607642; Location: NC_038235.1 (4673..5595); Aliases: DZD21_gp07.
- Reference data for all public NCBI Virus https://www.ncbi.nlm.nih.gov/labs/virus/vssi/ for species: Human orthopneumovirus; genus: orthopneumovirus; family: Pneumoviridae.
- Reference data https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Human%20orthopneumovirus,%20taxid:11250
- contains sequence data for Virus Lineage ss=Human orthopneumovirus, taxid:11250 nucleotide: 26’965, protein: 53’804, RefSeq Genomes: 2.
- Reference https://www.ncbi.nlm.nih.gov/protein/NP_056862.1
- GCF_002815475.1 (release 2018-08-19) Nucleotide Accessions: NC_038235.1, protein: YP_009518856.1
- Reference https://www.ncbi.nlm.nih.gov/protein/YP_009518856.1
- GCF_000855545.1 (release 2015-02-12) Nucleotide Accessions: NC_001781.1, protein: NP_056862.1 (strain B1).
Using a custom host and ssh key is recommended for maintenance.
## Set up the ssh config file (~/.ssh/config is a file, so cd into its directory and edit it there)
cd ~/.ssh
## Set such that Host and User are custom
# lawlessgenomics repo
Host dylanlawless.github.com
  HostName github.com
  User DylanLawless
  PreferredAuthentications publickey
  IdentityFile ~/.ssh/key1_rsa
  IdentitiesOnly yes
# Clone using the correct Host as per config.
git clone git@dylanlawless.github.com:DylanLawless/inspire2022lawless.github.io.git
# Set the local user here (instead of global, i.e. /Users/user/.gitconfig)
cd inspire2022lawless.github.io
git config user.email personemail@address.com
git config user.name DylanLawless
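
After cloning, it can help to confirm that the repository really uses the custom Host from the ssh config. The following is a minimal sketch under stated assumptions: `remote_uses_host` is a hypothetical helper, not part of git, and the commented usage lines assume the remote is named `origin` and that network access is available.

```shell
#!/bin/sh
# Hypothetical check: does a remote URL use the Host alias defined in ~/.ssh/config?
remote_uses_host() {
    url="$1"
    host="$2"
    case "$url" in
        "git@${host}:"*) return 0 ;;   # URL starts with git@<alias>:
        *) return 1 ;;                 # any other host, e.g. plain github.com
    esac
}

# Usage inside the cloned repository:
#   remote_uses_host "$(git remote get-url origin)" dylanlawless.github.com && echo "remote OK"
#   ssh -T git@dylanlawless.github.com   # tests authentication with the key from the config
#   git config --get user.name           # prints the repository-local user name
```

If the check fails, the clone was probably made with the default `github.com` host, so the key named in `IdentityFile` will not be selected for that remote.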
This page uses the body font Switzer-Light, a neo-grotesk Latin-script typeface with 18 styles, designed by Jérémie Hornus. Body text color is #0a2e4a and additional colors are found in _sass/variables.scss. The layout is based on jekyll-theme-cayman. It was developed using a local installation of the Jekyll Ruby gem and published using GitHub Pages. Add the methods section for all data access and software here.