This repository contains the Snakemake workflow used for the bioinformatic analyses in the paper Hososhima et al. (2022), "Proton-transporting heliorhodopsins from marine giant viruses", eLife.
All of the dependencies are taken care of with conda, so it is recommended to run snakemake with `--use-conda`.
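A typical invocation might look like the sketch below. It assumes that snakemake and conda are already installed and that the command is run from the repository root; the core count is an arbitrary example value, not a requirement of the workflow.

```shell
# Number of CPU cores for Snakemake; 4 is an arbitrary example value.
CORES=4

# Run from the repository root so that workflow/Snakefile is found.
if command -v snakemake >/dev/null 2>&1; then
    # Dry run first: list the jobs that would be executed without running them.
    snakemake --use-conda --cores "$CORES" --dry-run
    # Actual run, letting Snakemake create the conda environments it needs.
    snakemake --use-conda --cores "$CORES"
else
    echo "snakemake is not on PATH; install it before running the workflow" >&2
fi
```

The dry run is optional, but it is a cheap way to check that all inputs are in place before any environment is built.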
This repository is organized as follows:
- `analysis` -- contains the intermediate files
- `annotations` -- manually curated data
- `databases` -- includes Pfam databases and algal protein sequences. To run the workflow from scratch, the soft link `databases/Pfam` should point to the Pfam database folder and the soft links in `databases/algae` should point to the corresponding FASTA files.
- `output` -- final output files
- `proteins` -- curated sequence data for algal and viral heliorhodopsins
- `viruses` -- GenBank files with the viral genomes
- `workflow` -- workflow files, including:
    - `envs` -- conda environment files
    - `Snakefile` -- the snakemake file
    - `scripts` -- folder with scripts
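The soft links under `databases` could be set up along these lines. This is only a sketch: `PFAM_DIR` and `ALGAE_DIR` are hypothetical placeholders for wherever the Pfam database and the algal protein files live on your system, and the `*.fasta` pattern and resulting link names are assumptions, so check the link names already present in `databases/algae` to see which files the workflow expects.

```shell
# Assumed locations (placeholders); replace with the real paths on your system.
PFAM_DIR="${PFAM_DIR:-/path/to/Pfam}"
ALGAE_DIR="${ALGAE_DIR:-/path/to/algal_fasta_files}"

# Run from the repository root.
mkdir -p databases/algae

# Point databases/Pfam at the Pfam database folder
# (-n replaces an existing link instead of descending into it).
ln -sfn "$PFAM_DIR" databases/Pfam

# Link every FASTA file from the source folder into databases/algae.
# The *.fasta pattern is an assumption about how the files are named.
for f in "$ALGAE_DIR"/*.fasta; do
    [ -e "$f" ] || continue   # skip when the pattern matches nothing
    ln -sf "$f" databases/algae/
done
```

Using absolute paths for the link targets keeps the links valid regardless of the directory from which the workflow is launched.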
The output files are as follows:
- `cat_phylogeny.pdf` -- concatenation phylogeny of the viruses
- `EhVHeRs.tsv` -- distribution of heliorhodopsin genes among EhVs
- `HeR_tree.jtree` -- phylogenetic tree of viral and algal heliorhodopsins in `.jtree` format
- `HeR_tree.pdf` -- image version of the same tree
- `miniset_chronos.pdf` -- small HeR tree with alignment of critical positions
- `orthogroups.tsv` -- EhV orthogroups