### Anvio - Contig profiling

Contig profiling creates a database of your contigs (the anvi'o contigs database). It calculates k-mer frequencies for each contig (the default k-mer size is 4; you can change it with the `--kmer-size` parameter, but DON'T unless you have a good reason), soft-splits long contigs into shorter segments ("splits", 20,000 bp by default), and identifies open reading frames (this gene calling can be skipped with `--skip-gene-calling`). The database is generated with `anvi-gen-contigs-database` in the Slurm script below.

You can then add further layers of analysis to your contigs database. The following are available:

Augustus + Prodigal gene calls: adds open reading frames predicted by Augustus (eukaryotes) and Prodigal (bacteria and archaea). WORKING ON THIS: NOT SURE IF IT WORKS.

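Hidden Markov Models (HMMs): searches your genes and contigs against HMM profiles of single-copy core genes (Archaea_76, Bacteria_71, Protista_83) and ribosomal RNA genes. HMMs are widely used in bioinformatics and are particularly good at detecting homology.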

NCBI Clusters of Orthologous Genes (COGs): annotates your genes with functions from the NCBI COG database. Current version: 2020 (COG20).

KOfam metabolism calls: uses KEGG KOfam profiles to annotate genes with KEGG Orthologs, which can then be used to estimate the metabolic pathways of your community. Currently used KEGG version: KEGG_build_2020-12-23.

Kaiju Taxonomy calls:

Each of these adds a new layer of information to your dataset that may be worth exploring.
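The k-mer frequencies computed during contig profiling are the relative counts of every overlapping k-letter substring in a contig (tetranucleotides when k = 4). The sketch below is a minimal illustration, assuming a single DNA string as input; `kmer_freqs` is a hypothetical helper, not part of anvi'o. It counts forward-strand k-mers only, and anvi'o's own computation may differ (for example in how reverse complements are treated).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of anvi'o): print the relative frequency of every
# overlapping k-mer in one DNA sequence, forward strand only.
kmer_freqs() {
  local seq=$1 k=${2:-4}          # k defaults to 4 (tetranucleotides)
  awk -v s="$seq" -v k="$k" 'BEGIN {
    s = toupper(s); n = 0
    for (i = 1; i <= length(s) - k + 1; i++) {
      kmer = substr(s, i, k)
      if (kmer ~ /^[ACGT]+$/) { count[kmer]++; n++ }   # skip k-mers containing N or other symbols
    }
    for (kmer in count) printf "%s\t%.4f\n", kmer, count[kmer] / n
  }' | sort
}

# A 10 bp sequence has 10 - 4 + 1 = 7 overlapping 4-mers
kmer_freqs ACGTACGTAC 4
```

For `ACGTACGTAC`, the 4-mers ACGT, CGTA and GTAC each occur twice (0.2857) and TACG once (0.1429).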

In [None]:
conda activate anvio-8

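# set up the SCG taxonomy database (uses DIAMOND; check the installed version first)
diamond --version
conda install diamond=0.9.14
anvi-setup-scg-taxonomy

# download and set up the NCBI COG database
anvi-setup-ncbi-cogs --num-threads 11

# download and set up the KEGG data used for KOfam/metabolism annotation
anvi-setup-kegg-data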

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 20:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o //project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/bash_scripts/slurm-%j.out  # %j = job ID

module load miniconda/22.11.1-1
conda activate anvio-8

# The contigs database built from the assembly stores information about your sequences: positions of open reading frames, k-mer frequencies for each contig, functional and taxonomic annotations of genes, etc.
#set parameters:
SAMPLENAME=mcav
CONTIGPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo
CONTIGFILE="$SAMPLENAME".contigs-fixed.fa

#generate the contigs database:
#default k-mer size is 4
anvi-gen-contigs-database -f $CONTIGPATH/$CONTIGFILE --project-name $SAMPLENAME -o ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db  

#integrate HMMs into the database:
anvi-run-hmms -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db --num-threads 6

#this runs NCBI COGs against your contigs.db, integrating gene functions.
anvi-run-ncbi-cogs -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db -T 4

#ADD KEGG-KOFAM
anvi-run-kegg-kofams -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db \
                     -T 4 # number of threads anvi'o is allowed to use

#ADD CONTIG STATS
anvi-display-contigs-stats ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db --report-as-text --as-markdown -o working/mcav_assembly_redo/anvio_stats.txt

# This generates the contigs database from the merged, fixed contig FASTA file created in the previous step; it is needed for downstream analysis.
#JOB ID: 15910210
#bash script: mcav_db.txt
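The script above reserves 24 cores (`#SBATCH -c 24`), but the anvi'o steps run with 4 to 6 threads, so much of the allocation may sit idle. A minimal sketch, assuming you want each step to use the whole allocation: read the thread count from `SLURM_CPUS_PER_TASK`, which Slurm sets when a job is submitted with `-c`/`--cpus-per-task`, and fall back to a default elsewhere. `pick_threads` is a hypothetical helper; more threads can also mean more memory, so check that the 50G request still fits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: thread count for anvi'o's -T/--num-threads.
# Uses SLURM_CPUS_PER_TASK (set by Slurm when -c/--cpus-per-task is given),
# otherwise the fallback passed as $1 (default 4).
pick_threads() {
  local fallback=${1:-4}
  echo "${SLURM_CPUS_PER_TASK:-$fallback}"
}

THREADS=$(pick_threads 4)
echo "anvi'o steps would use $THREADS threads"
# e.g. anvi-run-hmms -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db --num-threads "$THREADS"
```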

#### Results using contigs-fixed-1000bp
| contigs_db | mcav_contigs_fixed |
|---|---|
| Total Length | 57097694 |
| Num Contigs | 39414 |
| Num Contigs > 100 kb | 0 |
| Num Contigs > 50 kb | 0 |
| Num Contigs > 20 kb | 2 |
| Num Contigs > 10 kb | 35 |
| Num Contigs > 5 kb | 370 |
| Num Contigs > 2.5 kb | 2156 |
| Longest Contig | 24857 |
| Shortest Contig | 1000 |
| Num Genes (prodigal) | 58029 |
| L50 | 14211 |
| L75 | 25911 |
| L90 | 33821 |
| N50 | 1339 |
| N75 | 1127 |
| N90 | 1044 |
| Archaea_76 | 9 |
| Bacteria_71 | 8 |
| Protista_83 | 5 |
| Ribosomal_RNA_12S | 0 |
| Ribosomal_RNA_16S | 0 |
| Ribosomal_RNA_18S | 2 |
| Ribosomal_RNA_23S | 0 |
| Ribosomal_RNA_28S | 2 |
| Ribosomal_RNA_5S | 0 |
| bacteria (Bacteria_71) | 0 |
| eukarya (Protista_83) | 0 |
| archaea (Archaea_76) | 0 |
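N50 and L50 summarise how contiguous the assembly is. Assuming anvi'o uses the standard definitions, the N50 of 1339 and L50 of 14211 above mean that the 14,211 longest contigs, each at least 1,339 bp long, together make up half of the 57,097,694 bp total. A minimal bash/awk sketch of the calculation, taking one contig length per line; `n50_l50` is a hypothetical helper, and tools can differ slightly in edge cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute N50 and L50 from contig lengths (one integer per line on stdin).
# N50: length of the contig at which the running total, summing from the longest
#      contig down, first reaches half of the total assembly length.
# L50: how many contigs it took to get there.
n50_l50() {
  sort -nr | awk '
    { len[NR] = $1; total += $1 }
    END {
      if (NR == 0) exit 1              # no input
      half = total / 2
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= half) { printf "N50=%d L50=%d\n", len[i], i; exit 0 }
      }
    }'
}

# For a FASTA file, contig lengths could be produced with e.g.:
# awk '/^>/ { if (len) print len; len = 0; next } { len += length($0) } END { if (len) print len }' contigs.fa | n50_l50
printf '%s\n' 2 10 4 8 6 | n50_l50
```

With lengths 10, 8, 6, 4 and 2 (total 30), the running sum first reaches 15 at the second contig, so the example prints `N50=8 L50=2`.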

In [None]:
#below: create sample profiles
module load miniconda/22.11.1-1
conda activate anvio-8

#set parameters
SAMPLENAME=mcav
OUTDIR=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo/profiles
BAMPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo
mkdir -p "$OUTDIR"

for f in T1_12_2022 T1_13_2022 T1_16_2019 T1_20_2019 T1_24_2019 T1_40_2022 T1_57_2022 T1_70_2022 T2_10_2022 T2_16_2019 T3_13_2022 T3_14_2019 T3_15_2019 T3_19_2022 T3_1_2019 T3_40_2022 T3_48_2022 T3_49_2022 T3_51_2022 T3_60_2022 T3_8_2019 T3_9_2019
do
  # Use the contigs database and each sample's BAM file to create a single-sample profile.
  # --min-percent-identity 95 ignores reads mapping at < 95% identity. --min-contig-length is not set
  # because contigs < 1000 bp were already removed with 'anvi-script-reformat-fasta'.
  # Keep parameters consistent across samples so the profiles can be merged later.
  # Each sample gets its own output directory; the merge step reads $OUTDIR/*/PROFILE.db.
  anvi-profile -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db \
               -i $BAMPATH/"$f".bam \
               --min-percent-identity 95 \
               --sample-name "$f" \
               --output-dir $OUTDIR/"$f"
done

#individual sample profiling and merging were separated into different bash scripts for troubleshooting
#bash script: profiles.txt
#JOB ID: 13298522
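Before submitting the profiling loop above, a dry run can catch two problems: sample names anvi'o will reject, and samples sharing one output directory (the later `anvi-merge` step reads `$OUTDIR/*/PROFILE.db`, i.e. one directory per sample). A minimal sketch that only prints the commands; it assumes anvi'o sample names must start with a letter and contain only ASCII letters, digits and underscores, and `valid_sample_name` and `profile_cmds` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Assumption: anvi'o sample names start with a letter and use only [A-Za-z0-9_].
valid_sample_name() {
  [[ $1 =~ ^[A-Za-z][A-Za-z0-9_]*$ ]]
}

# Hypothetical dry run: print (do not run) one anvi-profile command per valid sample,
# each writing to its own output directory.
profile_cmds() {
  local db=$1 bamdir=$2 outdir=$3; shift 3
  local s
  for s in "$@"; do
    if ! valid_sample_name "$s"; then
      echo "skipping invalid sample name: $s" >&2
      continue
    fi
    echo "anvi-profile -c $db -i $bamdir/$s.bam --min-percent-identity 95 --sample-name $s --output-dir $outdir/$s"
  done
}

profile_cmds mcav.contigs.db bams profiles T1_12_2022 T3_8_2019
```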

**Do not use the code below; still troubleshooting.**

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 6:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o //project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/bash_scripts/slurm-%j.out  # %j = job ID

module load miniconda/22.11.1-1
conda activate anvio-8
SAMPLENAME=mcav
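OUTDIR=//project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/index
DIR=//project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/results/MetaBAT_mcav_bins
# NOTE: OUTDIR (working/index) and the contigs database path used below (./working/mcav.contigs.db)
# differ from the profiling step above (working/mcav_assembly_redo/...); make them match before running.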


anvi-merge -c ./working/"$SAMPLENAME".contigs.db \
            $OUTDIR/*/PROFILE.db \
            -o $OUTDIR/"$SAMPLENAME"_profile_merged
#merge single sample profiles to one profile

# convert MetaBAT results to anvi'o format:
# MetaBAT writes one FASTA file per bin, while an anvi'o collection needs a two-column,
# tab-separated text file listing each contig and the bin it belongs to
: > metabins4anvio.txt   # start from an empty file so re-running does not append duplicate rows
for f in "$DIR"/*.fa; do
  NAME=$(basename "$f" .fa | tr '.' '_')   # replace dots in bin names, e.g. bin.1 -> bin_1
  grep '^>' "$f" | sed 's/^>//' | sed -e "s/$/\t$NAME/" >> metabins4anvio.txt
done

anvi-import-collection metabins4anvio.txt \
                       -p $OUTDIR/"$SAMPLENAME"_profile_merged/PROFILE.db \
                       -c ./working/$SAMPLENAME.contigs.db \
                       --contigs-mode \
                       -C "$SAMPLENAME"_collection
# import the MetaBAT binning results (from the 3Binning step) into anvi'o as a collection
# --contigs-mode specifies that the input file lists contig names, not split names

#individual sample profiling and merging were separated into different bash scripts for troubleshooting
#bash script: binning
#job ID: 
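The conversion loop above can be tested on toy data before pointing it at the MetaBAT output. A minimal sketch, assuming bin files named like `bin.1.fa`; `bins_to_collection` is a hypothetical helper. It truncates the output file first (so re-runs do not append duplicate rows), replaces dots in bin names with underscores as the original `sed` step did, and keeps only the first word of each FASTA header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a contig<TAB>bin file from a directory of bin FASTA files.
bins_to_collection() {
  local bindir=$1 out=$2 f name
  : > "$out"                          # truncate so re-runs do not append duplicates
  for f in "$bindir"/*.fa; do
    [ -e "$f" ] || continue           # no .fa files: the glob stays literal, skip it
    name=$(basename "$f" .fa | tr '.' '_')
    # keep only the first word of each header (the contig name)
    grep '^>' "$f" | sed 's/^>//' | awk -v bin="$name" '{ print $1 "\t" bin }' >> "$out"
  done
}

# Toy example in a temporary directory
toydir=$(mktemp -d)
printf '>c_000000000001\nACGT\n>c_000000000002\nGGCC\n' > "$toydir/bin.1.fa"
printf '>c_000000000003 len=4\nTTAA\n' > "$toydir/bin.2.fa"
bins_to_collection "$toydir" "$toydir/metabins4anvio.txt"
cat "$toydir/metabins4anvio.txt"
```

The toy run should list the two contigs of `bin.1.fa` under `bin_1` and the third contig under `bin_2`.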

**Troubleshooting code below**
- Trying to figure out clustering in anvio itself
- Skip for now

In [None]:
anvi-cluster-contigs -p $OUTDIR/"$SAMPLENAME"_profile_merged/PROFILE.db \
                    -c ./working/$SAMPLENAME.contigs.db \
                    --log-file //project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/bash_scripts/cluster_log \
                    -C "$SAMPLENAME"_collection \
                    -T 3 \
                    --driver metabat2 \
                    --just-do-it
#binning: clusters contigs into bins. Available drivers: concoct, metabat2, maxbin2, dastool, binsanity.

#without --just-do-it, anvi'o stops with an error saying this clustering module isn't fully developed and that dedicated binning software is the better choice

#### Results from importing collection:
- Low coverage. Should the MetaBAT parameters be made less stringent?

In [None]:
anvi-migrate --migrate-safely ./working/"$SAMPLENAME".contigs.db
#migrate the anvi'o database after updating anvi'o to version 8 (only needed once per database)

#### Taxonomy 
https://merenlab.org/2019/10/08/anvio-scg-taxonomy/
- Uses single-copy core genes (SCGs) and the taxonomy (as defined by GTDB) of the genomes these genes come from.

- Results were very poor; this method may not be suitable here.

In [None]:
mkdir -p ./results/taxonomy
SAMPLENAME=mcav 

In [None]:
anvi-estimate-scg-taxonomy -c ./working/$SAMPLENAME.contigs.db -p working/index/"$SAMPLENAME"_profile_merged/PROFILE.db -T 3 --metagenome-mode --compute-scg-coverages -o ./results/taxonomy/"$SAMPLENAME"-scg-taxonomy.tsv
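#estimate taxa from the assembled metagenome: --metagenome-mode reports taxonomy for individual SCG hits, with percent identity, rather than one genome-level estimate. Requires anvi-run-scg-taxonomy to have been run on the contigs database first.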

In [None]:
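anvi-run-scg-taxonomy -c working/"$SAMPLENAME".contigs.db
#assigns GTDB taxonomy to the SCGs in the contigs database; must be run before anvi-estimate-scg-taxonomy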

In [None]:
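anvi-run-scg-taxonomy -c working/"$SAMPLENAME".contigs.db --min-percent-identity 60
#same as above, but with a relaxed minimum percent identity (60%) for matches to the reference database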

In [None]:
anvi-estimate-scg-taxonomy -c ./working/"$SAMPLENAME".contigs.db \
                           -p ./working/index/mcav_profile_merged/PROFILE.db \
                           --num-threads 3 \
                           --metagenome-mode \
                           --compute-scg-coverages \
                           --output-file ./results/taxonomy/"$SAMPLENAME"-taxa-abundance.tsv
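#with -p and --compute-scg-coverages, the output also reports the coverage of each matched SCG taxon across all samples (similar to an OTU table) alongside percent identity. Note: the earlier anvi-estimate-scg-taxonomy command already uses the same flags and differs only in its output file name.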

In [None]:
ssh -L 8080:localhost:8080 unity
#connect to Unity this way from your local terminal to use the interactive browser below (this forwards port 8080, anvi-interactive's default)

In [None]:
anvi-interactive -p $OUTDIR/"$SAMPLENAME"_profile_merged/PROFILE.db -c ./working/$SAMPLENAME.contigs.db -C "$SAMPLENAME"_collection
#the collection is needed because clustering was skipped in the merging step
#results were also very poor; this does not seem helpful

## Redoing taxonomy with contigs filtered to 50 bp instead of 1000 bp
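This run differs from the first one in the minimum contig length used when filtering the assembly (50 bp here, 1000 bp before; the 1000 bp filter was applied with `anvi-script-reformat-fasta`, as noted in the profiling step). To see what such a threshold does, here is a minimal awk sketch that keeps FASTA records of at least a given length; `filter_fasta_by_length` is a hypothetical helper and not a replacement for `anvi-script-reformat-fasta`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only FASTA records whose sequence is at least MIN_LEN bp.
# Multi-line sequences are joined per record before measuring; output is one line per sequence.
filter_fasta_by_length() {
  local min_len=$1 fasta=$2
  awk -v min="$min_len" '
    function emit() { if (hdr != "" && length(seq) >= min + 0) print hdr "\n" seq }
    /^>/ { emit(); hdr = $0; seq = ""; next }
    { seq = seq $0 }
    END { emit() }
  ' "$fasta"
}

fastadir=$(mktemp -d)
printf '>short\nACGT\n>long\nACGTACGT\nACGT\n' > "$fastadir/toy.fa"
filter_fasta_by_length 10 "$fastadir/toy.fa"
```

With a 10 bp minimum, only the 12 bp record `long` is kept; lowering the threshold (as in the 50 bp run) keeps many more short contigs.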

In [None]:
conda activate anvio-8

# The contigs database built from the assembly stores information about your sequences: positions of open reading frames, k-mer frequencies for each contig, functional and taxonomic annotations of genes, etc.
#set parameters:
SAMPLENAME=mcav
CONTIGPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo
CONTIGFILE="$SAMPLENAME".contigs-fixed.fa

#generate the contigs database:
#default k-mer size is 4
anvi-gen-contigs-database -f $CONTIGPATH/$CONTIGFILE --project-name $SAMPLENAME -o ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db  

#integrate HMMs into the database:
anvi-run-hmms -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db --num-threads 6

#this runs NCBI COGs against your contigs.db, integrating gene functions.
anvi-run-ncbi-cogs -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db -T 4

#ADD KEGG-KOFAM
anvi-run-kegg-kofams -c ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db \
                     -T 4 # number of threads anvi'o is allowed to use

#ADD CONTIG STATS
anvi-display-contigs-stats ./working/mcav_assembly_redo/$SAMPLENAME.contigs.db --report-as-text --as-markdown -o ./working/mcav_assembly_redo/anvio_stats.txt

# This generates the contigs database from the merged, fixed contig FASTA file created in the previous step; it is needed for downstream analysis.


### Results

In [None]:
# db creation:
Input FASTA file .............................: /project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo/mcav.contigs-fixed.fa
Name .........................................: mcav
Description ..................................: No description is given
Num threads for gene calling .................: 1                                                                         

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmpu8mzs_bk/contigs.genes
Amino acid sequences .........................: /tmp/tmpu8mzs_bk/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmpu8mzs_bk/00_log.txt

CITATION
===============================================
Anvi'o will use 'prodigal' by Hyatt et al (doi:10.1186/1471-2105-11-119) to
identify open reading frames in your data. When you publish your findings,
please do not forget to properly credit their work.

Result .......................................: Prodigal (v2.6.3) has identified 707308 genes.                            

                                                                                                                          
CONTIGS DB CREATE REPORT
===============================================
Split Length .................................: 20,000
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: False
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: True
Contigs with at least one gene call ..........: 617514 of 839850 (73.5%)                                                  
Contigs database .............................: A new database, ./working/mcav_assembly_redo/mcav.contigs.db, has been
                                                created.
Number of contigs ............................: 839,850
Number of splits .............................: 839,850
Total number of nucleotides ..................: 435,829,081
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: False
Desired split length (what the user wanted) ..: 20,000
Average split length (what anvi'o gave back) .: (Anvi'o did not create any splits)

✓ anvi-gen-contigs-database took 1:11:13.341861

In [None]:
# run hmms

Contigs DB ...................................: ./working/mcav_assembly_redo/mcav.contigs.db
HMM sources ..................................: Archaea_76, Bacteria_71, Protista_83, Ribosomal_RNA_12S,
                                                Ribosomal_RNA_16S, Ribosomal_RNA_18S, Ribosomal_RNA_23S,
                                                Ribosomal_RNA_28S, Ribosomal_RNA_5S
Alphabet/context target found ................: AA:GENE
Alphabet/context target found ................: RNA:CONTIG                                                                
Target sequences determined ..................: 707,308 sequences for AA:GENE; 839,850 sequences for RNA:CONTIG           

HMM Profiling for Archaea_76
===============================================
Reference ....................................: Lee, https://doi.org/10.1093/bioinformatics/btz188
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /tmp/tmp86o8o987/Archaea_76.hmm
Number of genes in HMM model .................: 76
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: hmmscan
Temporary work dir ...........................: /tmp/tmp1t46_s5r
Log file for thread 0 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.5_log
                                                                                                                          
Done with Archaea_76 🎊

Number of raw hits in table file .............: 71                                                                        
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 71

HMM Profiling for Bacteria_71
===============================================
Reference ....................................: Lee modified, https://doi.org/10.1093/bioinformatics/btz188
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /tmp/tmp86o8o987/Bacteria_71.hmm
Number of genes in HMM model .................: 71
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: hmmscan
Temporary work dir ...........................: /tmp/tmp1t46_s5r
Log file for thread 0 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.5_log
                                                                                                                          
Done with Bacteria_71 🎊

Number of raw hits in table file .............: 62                                                                        
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 62
                                                                                                                          
HMM Profiling for Protista_83
===============================================
Reference ....................................: Delmont, http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /tmp/tmp86o8o987/Protista_83.hmm
Number of genes in HMM model .................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: hmmscan
Temporary work dir ...........................: /tmp/tmp1t46_s5r
Log file for thread 0 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmp1t46_s5r/AA_gene_sequences.fa.5_log
                                                                                                                          
Done with Protista_83 🎊

Number of raw hits in table file .............: 14                                                                        
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 14
                                                                                                                          
HMM Profiling for Ribosomal_RNA_12S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_12S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_12S.hmm
Number of genes in HMM model .................: 1
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_12S 🎊

Number of raw hits in table file .............: 0                                                                         

* The HMM source 'Ribosomal_RNA_12S' returned 0 hits. SAD (but it's OK).
                                                                                                                          
HMM Profiling for Ribosomal_RNA_16S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_16S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_16S.hmm
Number of genes in HMM model .................: 3
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_16S 🎊

Number of raw hits in table file .............: 0                                                                         

* The HMM source 'Ribosomal_RNA_16S' returned 0 hits. SAD (but it's OK).
                                                                                                                          
HMM Profiling for Ribosomal_RNA_18S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_18S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_18S.hmm
Number of genes in HMM model .................: 1
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_18S 🎊

Number of raw hits in table file .............: 2                                                                         
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 2
Gene calls added to db .......................: 2 (from source "Ribosomal_RNA_18S")                                       
                                                                                                                          
HMM Profiling for Ribosomal_RNA_23S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_23S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_23S.hmm
Number of genes in HMM model .................: 2
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_23S 🎊

Number of raw hits in table file .............: 0                                                                         

* The HMM source 'Ribosomal_RNA_23S' returned 0 hits. SAD (but it's OK).
                                                                                                                          
HMM Profiling for Ribosomal_RNA_28S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_28S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_28S.hmm
Number of genes in HMM model .................: 1
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_28S 🎊

Number of raw hits in table file .............: 2                                                                         
Number of weak hits removed by HMMER parser ..: 0
Number of hits in annotation dict  ...........: 2
Gene calls added to db .......................: 2 (from source "Ribosomal_RNA_28S")                                       
                                                                                                                          
HMM Profiling for Ribosomal_RNA_5S
===============================================
Reference ....................................: Seeman T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNA_5S
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N/A
HMM model path ...............................: /tmp/tmp86o8o987/Ribosomal_RNA_5S.hmm
Number of genes in HMM model .................: 5
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 6
HMMer program used for search ................: nhmmscan
Temporary work dir ...........................: /tmp/tmpabg4juzo
Log file for thread 0 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.0_log
Log file for thread 1 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.1_log
Log file for thread 2 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.2_log
Log file for thread 3 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.3_log
Log file for thread 4 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.4_log
Log file for thread 5 ........................: /tmp/tmpabg4juzo/RNA_contig_sequences.fa.5_log
                                                                                                                          
Done with Ribosomal_RNA_5S 🎊

Number of raw hits in table file .............: 0                                                                         

* The HMM source 'Ribosomal_RNA_5S' returned 0 hits. SAD (but it's OK).
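The HMM log above is long; the number of hits per HMM source is easier to compare as a two-column table. A minimal sketch, assuming the log was saved to a file and keeps the line formats shown above (for the Slurm jobs, the `slurm-%j.out` file from `#SBATCH -o` should hold it, since Slurm writes stderr to the same file when only `-o` is set); `hmm_hit_summary` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<HMM source><TAB><raw hits>" for each source in an anvi-run-hmms log.
hmm_hit_summary() {
  awk '
    /^HMM Profiling for / { src = $4 }                          # e.g. "HMM Profiling for Archaea_76"
    /^Number of raw hits in table file/ { print src "\t" $NF }  # last field is the hit count
  ' "$1"
}

# Toy log using two entries from the output above
hmmlog=$(mktemp)
cat > "$hmmlog" <<'EOF'
HMM Profiling for Archaea_76
Number of raw hits in table file .............: 71
HMM Profiling for Ribosomal_RNA_12S
Number of raw hits in table file .............: 0
EOF
hmm_hit_summary "$hmmlog"
```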

In [None]:
#  NCBI COGS 

COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /home/brooke_sienkiewicz_student_uml_edu/.conda/envs/anvio-8/lib/python3.10/site-packages/anvio/data/misc/COG
COG data directory ...........................: /home/brooke_sienkiewicz_student_uml_edu/.conda/envs/anvio-8/lib/python3.10/site-packages/anvio/data/misc/COG/COG20
Searching with ...............................: diamond
Directory to store temporary files ...........: /tmp/tmphiw_yu82
Directory will be removed after the run ......: True
                                                                                                                          
DIAMOND BLASTP
===============================================
Additional params for blastp .................: 
Search results ...............................: /tmp/tmphiw_yu82/diamond-search-results.txt                               

DIAMOND VIEW
===============================================
Diamond  tabular output file .................: /tmp/tmphiw_yu82/diamond-search-results.txt                               
COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /home/brooke_sienkiewicz_student_uml_edu/.conda/envs/anvio-8/lib/python3.10/site-packages/anvio/data/misc/COG
Gene functions ...............................: 10,860 function calls from 3 sources (COG20_PATHWAY, COG20_FUNCTION,      
                                                COG20_CATEGORY) for 4,796 unique gene calls have been added to the contigs
                                                database.

WARNING
===============================================
Well. Your COGs were successfully added to the database, but there were some
garbage anvi'o brushed off under the rug. There were 2 genes in your database
that hit 2 protein IDs in NCBIs COGs database, but since NCBI did not release
what COGs they correspond to in the database they made available (that helps us
to resolve protein IDs to COG ids), we could not annotate those genes with
functions. Anvi'o apologizes on behalf of all computer scientists for half-done
stuff we often force biologists to deal with. If you want to do some Googling,
these were the offending protein IDs: 'ADB95264.1, WP_148215877.1'.


✓ anvi-run-ncbi-cogs took 0:08:29.765873

In [None]:
# KEGG KOfam

