## Anvi'O 7.1
Anvi'O is a powerful piece of metagenomics software that occupies an unusual place in this project's workflow: it is both a binning tool and a visualization platform. I will spell out what goes where, but be aware that of all the software presented here, Anvi'O is likely the most powerful and has the steepest learning curve. It assumes a great deal of prior knowledge. If you want to read up on the software, much of its functionality is documented on its website:

 https://anvio.org/

For now, know that the Anvi'O workflow consists of three separate pieces:

* Contig profiling: this creates a database of your contigs and does part of the binning for you
* Single profiling: this creates a database of your reads matched with your contigs
* Processing: this is where Anvi'O becomes powerful. You can collect all the data from previous workflows and start defining your findings in Anvi'O (taxonomy, representation, etc.)

This last step is clearly more oriented towards results interpretation, while the first two are more geared towards data generation. The workflow file that is on this GitHub should show you clearly how to work this data!

### Contig profiling
Contig profiling creates a database of your contigs *per sample*. It calculates k-mer frequencies for your contigs (the default k is 4; you can change it with the --kmer-size parameter, but DON'T unless you have a good reason), soft-splits long contigs, and identifies open reading frames (which can be skipped using --skip-gene-calling). Run the following code to generate your database:


In [6]:
anvi-gen-contigs-database -f ../../02_assembly/contigs/"$f".contigs.fa -o ../data/working/"$f".contigs.db -n "$f"

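The command above relies on a shell variable `$f` that holds the sample name. As a minimal sketch of how it could be driven over several samples, the loop below only prints the command for each sample (a dry run), so it can be checked before anything is executed. The function name `print_contig_db_commands` and the sample names are hypothetical; the paths are the ones used in this workflow.

```shell
# Dry run: print one anvi-gen-contigs-database command per sample name.
# Nothing is executed; remove the "echo" to actually run the commands.
print_contig_db_commands() {
    for f in "$@"; do
        echo anvi-gen-contigs-database \
            -f ../../02_assembly/contigs/"$f".contigs.fa \
            -o ../data/working/"$f".contigs.db \
            -n "$f"
    done
}

# sample1 and sample2 are placeholder sample names
print_contig_db_commands sample1 sample2
```

Printing the commands first makes it easy to spot a wrong path or sample name before a long job is submitted.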

##### HMMs

In [1]:
anvi-run-hmms -c ../data/working/"$f".contigs.db


##### Contig-stats
After running the HMMs, you can look at your contig stats. This is the first interactive part of Anvi'O, so take care here. If you are running this workflow on a computing cluster, you are unlikely to be able to run the interactive mode (most computing clusters are not set up for graphical output). If you want the full Anvi'O experience, you can download the database to your local machine, run Anvi'O on it as normal, and get the interactive output. For a full explanation of the results, consult this page:

https://anvio.org/help/main/programs/anvi-display-contigs-stats/

If you choose to remain in the cluster environment, you can get the stats as .txt or .md output and see what's in there. 

In [5]:
anvi-display-contigs-stats "$f".contigs.db --report-as-text --as-markdown -o ../data/results/"$f"_contigstats.md
    
# Alternatively, you could download your contigs database onto your local computer and run it as follows:
anvi-display-contigs-stats "$f".contigs.db


##### NCBI COGs
This step matches your contigs database against the NCBI Clusters of Orthologous Genes (COG) database. To do so, you first need to set up this database. Be careful where you set it up, because you have to point Anvi'O to that location whenever you use it. Once it's set up, you probably will not need to do it again. Just check every once in a while that it is still up to date with the current COG release (current version is COG20).

In [7]:
anvi-setup-ncbi-cogs --cog-data-dir /scratch/genomics/stegmannt/metagenomes/first_data-CC-revisited/04_binning/data/DATABASE/
#you'll only have to do this once, after this you should be able to copy this database
#sometimes gives an error based on internet speeds: https://github.com/merenlab/anvio/issues/1738 

anvi-run-ncbi-cogs -c ../data/working/"$f".contigs.db --cog-data-dir /scratch/genomics/stegmannt/metagenomes/first_data-CC/06_anvio/data/DATABASE/
#this annotates your contigs with functions from the NCBI COG database that you generated; --cog-data-dir must point to the directory where that database (or a copy of it) lives


### Single profiling
Unlike the contigs database, the profile that is about to be generated contains sample-specific information about your contigs, based on the results of your mapping step. Each profile database links to a contigs database. It's important to generate all your profiles with the same parameters, since you're quite likely to *merge* them later. For more information on profiling:

https://anvio.org/help/main/programs/anvi-profile/
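Because the sample name is passed to `--sample-name` and ends up in the merged profile, it can help to check the names before profiling. The sketch below assumes that Anvi'O accepts only ASCII letters, digits, and underscores in sample names and rejects names that start with a digit; treat that rule as an assumption and adjust it if your version behaves differently. The function name `is_valid_sample_name` and the example names are hypothetical.

```shell
# Returns 0 if the name looks acceptable under the assumed rule, 1 otherwise.
is_valid_sample_name() {
    case "$1" in
        ""|[0-9]*)        return 1 ;;  # empty, or starts with a digit
        *[!A-Za-z0-9_]*)  return 1 ;;  # contains a character outside A-Z, a-z, 0-9, _
        *)                return 0 ;;
    esac
}

# Coral_1, 2nd_sample and Coral-3 are placeholder names
for f in Coral_1 2nd_sample Coral-3; do
    if is_valid_sample_name "$f"; then
        echo "$f: ok"
    else
        echo "$f: rename before profiling"
    fi
done
```

Renaming a sample at this point is much cheaper than re-running the profiling for every sample later.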


In [1]:
for f in <sample1> <sample2> <sample3> <sample4>; do
    anvi-profile -i ../../03_mapping/data/results/"$f".bam -c ../data/working/contigs.db \
        --min-contig-length 1000 \
        --output-dir ../data/working/"$f"-singleprofile \
        --sample-name "$f"
done
#these are mostly optional flags, but you'll want to incorporate them, as they standardize annotation and contents
#--cluster-contigs is only needed when you keep a single profile instead of merging, which you'll probably not do


#### Merging of profiles
To work any further with Anvi'O, you need to merge your profiles into a single profile. There is more to this step, but that is all very neatly explained here:

https://anvio.org/help/main/programs/anvi-merge/

In [2]:
#this piece will merge your single profiles into an overarching one. At this point, take extreme care that your samples carry some similarity!
anvi-merge ../data/working/Coral*-singleprofile/PROFILE.db \
-o ../data/working/samples-merged \
-c ../data/working/co-assembly1_contigs.db \
-S M_cavernosa \
--enforce-hierarchical-clustering #only run this if you're sure, it's not really necessary


Anvi'O no longer bins results on its own. You can ask it to, but it will use the same algorithms as above, so you might just want to do it manually, as that lets you oversee your results a little better. You need to provide your binning results as a tab-delimited text file in which each contig name is assigned to a bin (contigs can also be left out). The easiest way to get that file is to create a .tsv file from your DAS_tool results and convert it into a .txt file.
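As a rough sketch of that conversion, the snippet below turns a two-column contig-to-bin table into a tab-delimited collection file. It runs on a toy table written to a temporary directory; the file name `dastool_contig2bin.tsv`, the contig names, and the bin names are made up for illustration. It also replaces characters such as dots in bin names with underscores, on the assumption that Anvi'O only accepts letters, digits, and underscores there.

```shell
workdir=$(mktemp -d)

# Toy stand-in for a DAS_tool contig-to-bin table (contig <TAB> bin).
printf 'contig_1\tmaxbin.001\ncontig_2\tmaxbin.001\ncontig_3\tmetabat.12\n' \
    > "$workdir/dastool_contig2bin.tsv"

# Keep the two columns, but replace every character in the bin name that is
# not a letter, digit, or underscore with an underscore.
awk 'BEGIN { FS = OFS = "\t" }
     NF >= 2 { gsub(/[^A-Za-z0-9_]/, "_", $2); print $1, $2 }' \
    "$workdir/dastool_contig2bin.tsv" > "$workdir/binning_results.txt"

cat "$workdir/binning_results.txt"
```

For real data, point the `awk` call at your own DAS_tool output and write the result to `../data/working/binning_results.txt`.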

In [None]:
#after that, you can import your binning results into your profile as such:
anvi-import-collection ../data/working/binning_results.txt -p ../data/working/PROFILE_<name>.db -c contigs.db --contigs-mode -C <collection name>

##### Taxonomy estimates
Assessing the taxonomy of your bins can be very helpful in the long run, especially when you combine manual binning with your automated binning. There are two ways of calling taxonomy on your samples: Anvi'O SCG taxonomy and Kaiju. I have done both, since they take slightly different approaches. Additionally, while SCG taxonomy is great, it only uses bacterial and archaeal genomes for taxonomy calling, leaving a host of important organisms out. Kaiju can help you later in assessing the quality of your bins and seeing what belongs where. Use the following links to get a little streetwise in this step:

Kaiju: https://github.com/bioinformatics-centre/kaiju

Anvi'O taxonomy: https://merenlab.org/2019/10/08/anvio-scg-taxonomy/

Combining Anvi'O with Kaiju: https://merenlab.org/2016/06/18/importing-taxonomy/

In [None]:
#this runs the Anvi'O SCG script:
anvi-run-scg-taxonomy -c ../data/working/contigs.db
#and this allows you to integrate said information with your profiles and your bins
anvi-estimate-scg-taxonomy -c ../data/working/contigs.db \
                           -p ../data/working/<name>/PROFILE.db \
                           -C <Collection name>

#### Kaiju
Kaiju allows you to add gene-level taxonomy calls to your collection. Unfortunately, these calls are not inherently reliable, but they might help you in later steps. Kaiju does not work like most programs that you run (it may not be available through conda in your environment, for example), so you might need to do some reading on their GitHub. Most important is that you know where the Kaiju scripts are saved. I personally saved them in the overarching 'metagenomes' directory so all my projects could use the same Kaiju database.

In [None]:
#and this is the Kaiju code (currently interactive):
mkdir KAIJU-DB
cd KAIJU-DB
kaiju-makedb -s mar #this creates a Kaiju database from the Marine Metagenomics Portal (MAR databases)
#Kaiju only needs nodes.dmp kaiju_db_mar.fmi and names.dmp


anvi-get-sequences-for-gene-calls -c ../data/working/co-assembly1.contigs.db -o ../data/working/gene_calls.fa
#this extracts the gene call sequences without any bells and whistles, so Kaiju doesn't get confused

kaiju -t ../../../KAIJU-DB/nodes.dmp \
      -f ../../../KAIJU-DB/kaiju_db_mar.fmi \
      -i ../data/working/gene_calls.fa \
      -o ../data/working/gene_calls_mar.out \
      -z $NSLOTS \
      -v

#the following script is also from Kaiju (newer Kaiju releases call it kaiju-addTaxonNames) AND MUST BE EXECUTED EXACTLY AS THIS, NO CHANGE TO OPTIONS
addTaxonNames -t /path/to/nodes.dmp \
              -n /path/to/names.dmp \
              -i ../data/working/gene_calls_mar.out \
              -o ../data/working/gene_calls_mar.names \
              -r superkingdom,phylum,order,class,family,genus,species

#and now we bring it back to anvi'o for processing
anvi-import-taxonomy-for-genes -i ../data/working/gene_calls_mar.names \
                               -c ../data/working/co-assembly1.contigs.db \
                               -p kaiju \
                               --just-do-it #if you don't add this, you'll get a protective error

#at this point, Anvi'O will give you a quick peek behind the scenes: if your phylum names are wrong, use ctrl+C to kill the whole thing before it lays eggs
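Before importing, it can be worth looking at the phylum names yourself. The sketch below summarizes a Kaiju names file: it counts classified (`C`) and unclassified (`U`) lines from the first column and tallies the phylum of each classified gene call. It assumes the taxon path is the last tab-separated column, with ranks separated by `; ` in the order given to `-r`. The three input lines are toy data written to a temporary file, not real results.

```shell
workdir=$(mktemp -d)

# Toy stand-in for gene_calls_mar.names (tab-separated; last column = taxon path).
printf 'C\t0\t1224\tBacteria; Proteobacteria; NA; NA; NA; NA; NA;\n'  > "$workdir/toy.names"
printf 'C\t1\t1117\tBacteria; Cyanobacteria; NA; NA; NA; NA; NA;\n' >> "$workdir/toy.names"
printf 'U\t2\t0\n'                                                   >> "$workdir/toy.names"

# Count C/U lines and tally the second rank (phylum) of every classified line.
awk -F '\t' '
    $1 == "C" { classified++; split($NF, ranks, "; "); phylum[ranks[2]]++ }
    $1 == "U" { unclassified++ }
    END {
        printf "classified\t%d\nunclassified\t%d\n", classified, unclassified
        for (p in phylum) printf "phylum\t%s\t%d\n", p, phylum[p]
    }' "$workdir/toy.names" | LC_ALL=C sort > "$workdir/summary.txt"

cat "$workdir/summary.txt"
```

If the phylum column of the summary shows order or class names instead, the `-r` option was probably changed, and the import is likely to go wrong.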


### Data interpretation


One of the more powerful features of Anvi'O is the interactive interface. To use it, you'll probably need to download your contigs database and your merged profile to your local machine. Alternatively, you can run the interface from the computing cluster you have been working on, which is super useful if you just want to take a quick peek!

https://merenlab.org/2015/11/28/visualizing-from-a-server/


In [None]:
anvi-interactive -p profile-db \
                 -c contigs-db \
                 -C collection #run this if you specifically want to run your bins in the interface

Anvi'O will automatically try to open your browser at this point (but not if you are running it from the server). If your browser doesn't pop up, try entering this into your browser:

http://localhost:8080

This should show you the results! You can kill the session at any moment by pressing CTRL+C in the command line. This tutorial shows you some of the power of the interactive interface:
https://merenlab.org/tutorials/interactive-interface/

In [None]:
#this line should allow you to add metadata to your samples (see below for the format of metadata.txt):
anvi-import-misc-data ../data/working/metadata.txt \
                         --target-data-table layers \
                         --pan-or-profile-db ../data/working/<samplename>.profile.db

Using this piece of code, you can insert metadata into your Anvi'O graph. Super useful, but the metadata file has to adhere to some principles:

* It is a tab-delimited text file containing information about the samples you're displaying
* The first column of each row should match the name of a sample
* The following columns can contain all sorts of information

https://anvio.org/help/main/programs/anvi-import-misc-data/
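As an illustration of those principles, the sketch below writes a small metadata file and checks that every row has the same number of tab-separated columns as the header. The header name `samples`, the columns `depth_m` and `site`, and the sample names are assumptions made up for this example; use the sample names from your own merged profile.

```shell
workdir=$(mktemp -d)
meta="$workdir/metadata.txt"

# Header row: the first column holds the sample names, the rest is free-form.
printf 'samples\tdepth_m\tsite\n'   >  "$meta"
printf 'Coral_1\t5\treef_flat\n'    >> "$meta"
printf 'Coral_2\t20\treef_slope\n'  >> "$meta"

# Sanity check: count rows whose number of fields differs from the header.
bad_rows=$(awk -F '\t' 'NR == 1 { n = NF } NF != n { c++ } END { print c + 0 }' "$meta")
echo "rows with a wrong number of columns: $bad_rows"
```

A count other than zero usually means a missing value or a stray space where a tab should be.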