baymlab/2023_QuinonesOlvera-Owen

2023_QuinonesOlvera-Owen

About

Code corresponding to the paper:

Diverse and abundant viruses exploit conjugative plasmids (2023)

[bioRxiv]

Natalia Quinones-Olvera*, Siân V. Owen*, Lucy M. McCully, Maximillian G. Marin, Eleanor A. Rand, Alice C. Fan, Oluremi J. Martins Dosumu, Kay Paul, Cleotilde E. Sanchez Castaño, Rachel Petherbridge, Jillian S. Paull, Michael Baym

Table of Contents


Genomes


Figure 1

1a, 1c: Phage DisCo

Figure 2

2a: Tree

  • Code
    • Jupyter notebook: Whole genome alignment and tree building
      • Key commands:
      # whole genome alignment
      clustalo -i <alphatv.fasta> -o <alphatv.msa.fasta> --outfmt=fa
      
      # tree building
      iqtree -st DNA -m MFP -bb 1000 -alrt 1000 -s <alphatv.msa.trim.fasta>
  • Data

2b: Map

2c: Nucleotide diversity

  • Code
    • Snakemake: Pipeline producing the alignments and nucleotide diversity calculation.
      • Key commands:
      # align each assembly to reference
      minimap2 -ax asm20 -B2 -O6,26 --end-bonus 100 --cs <NC_001421.fasta> <assembly> > <output.sam>
      
      # calculate nucleotide diversity
      vcftools --vcf <merged_vcf> --window-pi 100 --window-pi-step 1 --out <NucDiv.100bp.slideby1.windowed.pi>
    • Jupyter notebook: Plot, heatmap, and genome map.
  • Data
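
To illustrate what a windowed nucleotide diversity value represents, here is a minimal Python sketch that computes the average number of pairwise differences per site in sliding windows over a set of aligned sequences. It is a simplified illustration of the quantity, not the computation vcftools performs on the merged VCF. The function name `windowed_pi` and the toy sequences are hypothetical, and gaps are treated like any other character.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def windowed_pi(seqs: Sequence[str], window: int = 100, step: int = 1) -> List[Tuple[int, float]]:
    """Return (window_start, pi) pairs for equal-length aligned sequences.

    pi is the mean pairwise difference per site, averaged over the window.
    Simplified illustration only; not the vcftools implementation.
    """
    length = len(seqs[0])
    pairs = list(combinations(range(len(seqs)), 2))

    # Fraction of sequence pairs that differ at each alignment column.
    per_site = []
    for pos in range(length):
        diffs = sum(seqs[a][pos] != seqs[b][pos] for a, b in pairs)
        per_site.append(diffs / len(pairs))

    # Slide the window along the alignment and average the per-site values.
    results = []
    for start in range(0, length - window + 1, step):
        results.append((start, sum(per_site[start:start + window]) / window))
    return results


# Toy alignment (made up for illustration): one variable site at the last column.
toy = ["AAAA", "AAAT", "AAAT"]
for start, pi in windowed_pi(toy, window=2, step=1):
    print(start, round(pi, 3))
```

With a window of 100 and a step of 1, as in the vcftools command above, each output row would describe a 100 bp window shifted by one base from the previous one.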

Figure 3

3b,c: Host-range heatmap

  • Code

    • Jupyter notebook: Processing growth curves, calculating area under the curve and liquid assay score, producing heatmap.
    • Jupyter notebook: Plotting sample curves and heatmap.
    • Custom functions imported in notebooks: EOL_tools.py
      • Area under the curve calculation (line): $auc = \sum_{i=1}^{120}\frac{OD_{i+1} + OD_{i}}{2}$
      • Liquid assay score calculation (line): $las = \frac{(auc_{no\ phage} - auc_{phage})}{auc_{no\ phage}} \times 100$
  • Data
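
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code in EOL_tools.py: the function names `area_under_curve` and `liquid_assay_score` are hypothetical, and the sketch assumes evenly spaced OD readings (121 readings would give the 120 trapezoids in the formula).

```python
from typing import Sequence


def area_under_curve(od: Sequence[float]) -> float:
    """Trapezoidal sum with unit spacing: sum of (OD[i+1] + OD[i]) / 2."""
    return sum((od[i + 1] + od[i]) / 2 for i in range(len(od) - 1))


def liquid_assay_score(auc_no_phage: float, auc_phage: float) -> float:
    """Percent reduction in AUC caused by the phage, relative to the no-phage control."""
    return (auc_no_phage - auc_phage) / auc_no_phage * 100


# Toy growth curves (made up for illustration; real curves have 121 readings).
od_no_phage = [0.0, 2.0, 4.0]
od_phage = [0.0, 0.5, 1.0]

auc_control = area_under_curve(od_no_phage)
auc_treated = area_under_curve(od_phage)
print(liquid_assay_score(auc_control, auc_treated))
```

A score near 0 means the phage had little effect on growth, while a score near 100 means growth was almost completely suppressed.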

Figure 4

4a: Abundance

4b: Genome maps of uncultivated tectiviruses

4c: Tree of the DNA packaging ATPase of tectiviruses

  • Code
    • Jupyter notebook: Build ATPase tree
      • Key commands:
        # align ATPase sequences from all tectiviruses with ATPase hmm model
        hmmalign --trim <IX.2.hmm> <P9.faa> | esl-reformat --gapsym='-' afa - > <P9.afa>
        
        # build tree 
        phyml -d aa -m LG -b -4 -v 0.0 -c 4 -a e -f e --no_memory_check -i <P9.phy>
  • Data

4d: Alphatectivirus metagenomic reads in wastewater datasets

  • Code

    • Jupyter notebook: Build kraken database with viral database + tectiviruses from this study.
    • Snakemake pipeline: To run kraken on metagenomic datasets.
      • Key commands:
        kraken2 --paired --report <kraken_report> --db <custom_db> <fastq_1> <fastq_2> > <kraken_results>
    • Jupyter notebook: Extract kraken results and produce plot.
  • Data
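
As a rough idea of what extracting kraken results can look like, the sketch below reads a kraken2 report and returns the number of fragments assigned to a named clade. It assumes the standard six-column, tab-separated report layout (percentage, clade fragments, directly assigned fragments, rank code, taxid, indented name). The function name `clade_reads`, the toy report, and the placeholder taxid 999999 are made up for illustration and are not taken from the notebook.

```python
def clade_reads(report_text: str, taxon_name: str) -> int:
    """Return the clade-level fragment count for taxon_name, or 0 if absent."""
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Column 6 is the taxon name, indented to show its depth in the tree.
        if fields[5].strip() == taxon_name:
            # Column 2 counts fragments in the clade rooted at this taxon.
            return int(fields[1])
    return 0


# Toy report with invented counts; 999999 is a placeholder taxid.
toy_report = "\n".join([
    "\t".join([" 90.00", "900", "900", "U", "0", "unclassified"]),
    "\t".join([" 10.00", "100", "0", "R", "1", "root"]),
    "\t".join([" 10.00", "100", "95", "D", "10239", "  Viruses"]),
    "\t".join(["  0.50", "5", "5", "G", "999999", "    Alphatectivirus"]),
])

print(clade_reads(toy_report, "Alphatectivirus"))
```

Running the same function over the report of each dataset would give one count per dataset, which can then be normalized and plotted.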

4e: Mapped alphatectivirus metagenomic reads

Figure 5

5a, b, c: Trees

5e: FtMidnight genome map


How to replicate these figures

Notebooks

Everything in the notebooks should be able to run after installing this conda environment.

conda env create -f envs/pdep.yml

I have tried to include all the raw files in this repository, except for large files such as sequencing runs, which can be accessed through the SRA (see the section on accessions). Some intermediate files might also be absent, but everything should be obtainable by running the code in the notebooks.

Snakemake pipelines

The Snakemake pipelines should also run from the same conda environment. Additional dependencies of each pipeline are listed in the envs/ directory next to the corresponding Snakefile and are handled by Snakemake. I've included a run_snakemake.sh and a run_snakemake.loc.sh file for each pipeline, which show how to run it on a computer cluster or locally, respectively.

Questions?

If you have trouble finding or running anything shown here, please do get in contact. You can submit an issue or send me an email: nquinones@g.harvard.edu
