A set of scripts organized as a toolkit for sub-sampling analyses based on the Augur tool, focused on Brazilian states.
Requirements: Python 3.9+
git clone https://github.com/dezordiPhD/gist.git
cd gist
conda env create -f env/gist_ubuntu.yml
conda activate gist
pip install .
## check installation
gist --help
## clone ncov nextstrain repository
git clone https://github.com/nextstrain/ncov.git
Mac users should install NCBI BLAST+.
This mode subsamples data for specific lineages in specific Brazilian states and other countries. The input JSON file should be configured following the template in templates/get_by_states.json
gist get-states --ncov_dir ncov --sequences <gisaid_genomes.tar.xz> --metadata <gisaid_metadata.tar.xz> --threads <number_of_threads> templates/get_by_states.json
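The kind of selection this mode performs can be pictured with a minimal Python sketch. This is not the gist implementation: the column names (`country`, `division`, `pango_lineage`) and the function name `filter_metadata` are assumptions made for illustration, and the real selection criteria come from the JSON template.

```python
import csv
import io


def filter_metadata(tsv_text, country, divisions, lineages):
    """Return rows matching one country, a set of divisions (states) and a set of lineages.

    Column names are assumed for illustration, not taken from the gist source.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if row["country"] == country
        and row["division"] in divisions
        and row["pango_lineage"] in lineages
    ]


# Tiny in-memory example with hypothetical records.
example = (
    "strain\tcountry\tdivision\tpango_lineage\n"
    "s1\tBrazil\tPernambuco\tP.1\n"
    "s2\tBrazil\tSao Paulo\tP.2\n"
    "s3\tArgentina\tBuenos Aires\tP.1\n"
)
selected = filter_metadata(example, "Brazil", {"Pernambuco"}, {"P.1", "P.2"})
print([row["strain"] for row in selected])  # ['s1']
```

Only `s1` is kept: `s2` is from a state outside the requested set and `s3` is from another country.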
Get GISAID genomes based on BLAST analysis. The input JSON file should be configured following the template in templates/get_similar_genomes.json
gist get-genomes --input <query_genomes.fasta> --sequences <gisaid_genomes.fasta> --metadata <gisaid_metadata.tsv> templates/get_similar_genomes.json
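As a rough illustration of BLAST-based selection, the sketch below picks the best-scoring subject genomes for each query from BLAST tabular output (`-outfmt 6`, where the bit score is the 12th column). It is an assumption about the general approach, not a description of how gist ranks hits; `top_hits` and the sample records are hypothetical.

```python
from collections import defaultdict


def top_hits(blast_tab_text, n=2):
    """Return up to n best subject IDs per query, ranked by bit score (column 12)."""
    hits = defaultdict(list)
    for line in blast_tab_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        hits[query].append((bitscore, subject))

    best = {}
    for query, scored in hits.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep each subject once, preserving rank order.
        ranked = []
        for _, subject in scored:
            if subject not in ranked:
                ranked.append(subject)
        best[query] = ranked[:n]
    return best


# Hypothetical outfmt 6 records: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
example = "\n".join([
    "q1\tg3\t98.0\t29000\t580\t0\t1\t29000\t1\t29000\t0.0\t50000",
    "q1\tg1\t99.9\t29000\t10\t0\t1\t29000\t1\t29000\t0.0\t53000",
    "q1\tg2\t99.5\t29000\t50\t0\t1\t29000\t1\t29000\t0.0\t52000",
])
print(top_hits(example, n=2))  # {'q1': ['g1', 'g2']}
```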
Perform a mafft --add --keeplength alignment and, if --mask_pos is passed, mask alignment positions
gist get-algn --input sequences.fa --reference reference.fa --threads 8 --mask_pos templates/mask_pos.tsv
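The masking step can be sketched in a few lines of Python. This is an illustration only: it assumes the mask file lists one 1-based position per line in its first column and that masked sites are replaced by `N`, neither of which is confirmed by the template; `read_positions` and `mask_positions` are hypothetical names. Because --keeplength keeps the alignment at the length of the existing alignment, positions are treated here as coordinates in that alignment.

```python
def read_positions(tsv_text):
    """Parse one integer position per line, skipping blanks and non-numeric lines (e.g. a header)."""
    positions = []
    for line in tsv_text.splitlines():
        first = line.split("\t")[0].strip()
        if first.isdigit():
            positions.append(int(first))
    return positions


def mask_positions(sequence, positions, mask_char="N"):
    """Replace the given 1-based alignment positions with mask_char."""
    chars = list(sequence)
    for pos in positions:
        # Ignore positions that fall outside the sequence.
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


positions = read_positions("position\n2\n5\n")
print(mask_positions("ACGTACGT", positions))  # ANGTNCGT
```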