## Anvi'O 7.1
Anvi'O is a powerful metagenomics platform that occupies an unusual place in the workflow of this project: it is both a binning tool and a visualization environment. I will spell out which step does what, but be aware that of all the software presented here, Anvi'O is probably the most powerful and also has the steepest learning curve. It assumes a great deal of prior knowledge, so keep that in mind. If you want to read up on the software, most of its functionality is documented on the project website:

 https://anvio.org/

For now, know that the Anvi'O workflow consists of three separate pieces:

* Contig profiling: this creates a database of your contigs and does part of the binning for you
* Single profiling: this creates a database of your reads matched with your contigs
* Processing: this is where Anvi'O becomes powerful. You can collect all the data from previous workflows and start defining your findings in Anvi'O (taxonomy, representation, etc.)

This last step is oriented towards interpreting results, while the first two are geared towards generating data. The workflow file in this GitHub repository shows how to work with this data.

### Contig profiling
Contig profiling creates a database of your contigs *per sample*. It calculates k-mer frequencies for your contigs (the default k is 4; you can change it with the --kmer-size parameter, but DON'T unless you have a good reason), soft-splits long contigs, and identifies open reading frames (which you can skip with --skip-gene-calling). Run the following code to generate your database:


In [6]:
anvi-gen-contigs-database -f ../../02_assembly/contigs/"$f".contigs.fa -o ../data/working/"$f".contigs.db -n "$f"

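To make the k-mer step less abstract, here is a minimal sketch of what counting overlapping 4-mers (tetranucleotides) in a single sequence looks like. It is an illustration only: the function name `count_kmers` is made up for this example, and Anvi'O's own implementation is not shown here and may well differ in its details.

```shell
# Illustration only: count overlapping k-mers in one DNA string.
# count_kmers is a hypothetical helper, not part of Anvi'O.
count_kmers() {
    # $1 = sequence, $2 = k (defaults to 4, the tetranucleotide setting)
    printf '%s\n' "$1" |
        awk -v k="${2:-4}" '{
            for (i = 1; i + k - 1 <= length($0); i++) counts[substr($0, i, k)]++
        }
        END { for (kmer in counts) print kmer "\t" counts[kmer] }' |
        sort
}

# Example: a 9-base toy sequence contains six overlapping 4-mers,
# printed as one "k-mer<TAB>count" line per distinct k-mer.
count_kmers "ATGCATGCA" 4
```

Contigs from the same genome tend to have similar k-mer profiles, which is why these frequencies are useful for binning later on.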

##### HMMs
This step searches your contigs for matches to the hidden Markov model (HMM) profiles that ship with Anvi'O, such as collections of single-copy core genes, and stores the hits in your contigs database.

In [4]:
anvi-run-hmms -c ../data/working/"$f".contigs.db

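The commands above rely on `$f` holding the name of one sample. A minimal sketch of how you could run both steps for several samples is shown below. The names `DRY_RUN`, `run_cmd`, and `prepare_contigs`, as well as the sample names, are assumptions made for this example; the paths are the ones used in the commands above. By default the sketch only prints the commands, so you can check them before anything is executed.

```shell
# Sketch: run the contig database and HMM steps for every sample.
# DRY_RUN, run_cmd and prepare_contigs are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = print and execute them

run_cmd() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

prepare_contigs() {
    # Every argument is treated as one sample name.
    for f in "$@"; do
        run_cmd anvi-gen-contigs-database \
            -f ../../02_assembly/contigs/"$f".contigs.fa \
            -o ../data/working/"$f".contigs.db \
            -n "$f"
        run_cmd anvi-run-hmms -c ../data/working/"$f".contigs.db
    done
}

# Replace these placeholder names with your own samples.
prepare_contigs sampleA sampleB
```

Once the printed commands look right, set `DRY_RUN=0` to actually run them.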

##### Contig-stats
After running the HMMs, you can look at your contig stats. This is the first interactive part of Anvi'O, so take care here. If you are running this workflow on a computing cluster, you are unlikely to be able to use the interactive mode (most clusters are not set up for graphical output). If you want the full Anvi'O experience, you can download the database to your local machine, run Anvi'O on it as normal, and get the interactive output. For a full explanation of the results, consult this page:

https://anvio.org/help/main/programs/anvi-display-contigs-stats/

If you choose to remain in the cluster environment, you can get the stats as .txt or .md output and see what's in there. 

In [5]:
anvi-display-contigs-stats ../data/working/"$f".contigs.db --report-as-text --as-markdown -o ../data/results/"$f"_contigstats.md

# Alternatively, download your contigs database to your local computer and run it interactively:
anvi-display-contigs-stats "$f".contigs.db


##### NCBI C.O.G.s
This step annotates your contigs database with functions from the NCBI Clusters of Orthologous Genes (COG) database. To do so, you first need to set up that database. Choose its location carefully, because you have to point to that directory every time you use it. Once it's set up, you probably will not need to do it again; just check every once in a while that it is still up to date with the current COG release (COG20 at the time of writing).

In [7]:
anvi-setup-ncbi-cogs --cog-data-dir /scratch/genomics/stegmannt/metagenomes/first_data-CC-revisited/04_binning/data/DATABASE/
# You only have to do this once; after this you should be able to copy this database.

anvi-run-ncbi-cogs -c ../data/working/"$f".contigs.db --cog-data-dir /scratch/genomics/stegmannt/metagenomes/first_data-CC/06_anvio/data/DATABASE/
# This annotates your contigs with functions from the NCBI COG database that you set up. Make sure --cog-data-dir points to the directory where the COG data actually lives.

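Since the COG setup only has to happen once, it can help to check whether the data directory is already there before running it again. The sketch below does that in a deliberately simple way. The helper `cog_dir_ready`, the variable `COG_DIR`, and its default path are assumptions for this example, and "the directory exists and is not empty" is only a rough stand-in for a real validity check, not what Anvi'O itself verifies.

```shell
# cog_dir_ready is a hypothetical helper: it only checks that the directory
# exists and is not empty, which is an assumption, not Anvi'O's own validation.
cog_dir_ready() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Placeholder path; point this at your own COG data directory.
COG_DIR="${COG_DIR:-./DATABASE}"

if cog_dir_ready "$COG_DIR"; then
    echo "COG data found in $COG_DIR, skipping setup"
else
    echo "No COG data in $COG_DIR, run: anvi-setup-ncbi-cogs --cog-data-dir $COG_DIR"
fi
```

Keeping the path in a single variable also makes it harder to accidentally pass different directories to the setup and annotation commands.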

##### Taxonomy estimates

### Single profiling
Unlike the contigs database, the profile that is about to be generated contains sample-specific information about your contigs, based on the results of your mapping step. Each profile database links to a contigs database. It's important to make sure that all the profiles you generate are generated with the same parameters, since you're quite likely to *merge* them later. For more information on profiling:

https://anvio.org/help/main/programs/anvi-profile/


In [1]:
for f in <sample1> <sample2> <sample3> <sample4>   # replace the placeholders with your own sample names
do
    anvi-profile -i ../../03_mapping/data/results/"$f".bam -c ../data/working/contigs.db \
        --min-contig-length 1000 \
        --output-dir ../data/working/"$f"_singleprofile \
        --sample-name "$f"
done
# These flags are mostly optional, but you'll want to include them, as they standardize annotation and contents.
# --cluster-contigs is only needed when you work with a single profile on its own, which you'll probably not do.

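Because all profiles should be generated with the same parameters, one option is to define those parameters once and reuse them for every sample. The sketch below builds the profiling command per sample and refuses to do so when the BAM file is missing. The names `PROFILE_ARGS`, `BAM_DIR`, `CONTIGS_DB`, `OUT_DIR`, and `profile_cmd` are assumptions for this example; the default paths mirror the command above. It only prints the commands rather than running them.

```shell
# Sketch: build the anvi-profile call for each sample from one shared parameter set.
# PROFILE_ARGS, BAM_DIR, CONTIGS_DB, OUT_DIR and profile_cmd are hypothetical names.
PROFILE_ARGS="--min-contig-length 1000"   # defined once so every sample gets identical settings
BAM_DIR="${BAM_DIR:-../../03_mapping/data/results}"
CONTIGS_DB="${CONTIGS_DB:-../data/working/contigs.db}"
OUT_DIR="${OUT_DIR:-../data/working}"

profile_cmd() {
    # Print the anvi-profile command for sample $1, or fail if its BAM file is missing.
    bam="$BAM_DIR/$1.bam"
    if [ ! -f "$bam" ]; then
        echo "missing BAM for $1: $bam" >&2
        return 1
    fi
    echo "anvi-profile -i $bam -c $CONTIGS_DB $PROFILE_ARGS --output-dir $OUT_DIR/${1}_singleprofile --sample-name $1"
}

# Replace these placeholder names with your own samples.
for f in sampleA sampleB; do
    profile_cmd "$f" || echo "skipping $f"
done
```

If you later change a parameter, you change it in one place and re-profile all samples, so the profiles stay compatible for merging.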

#### Merging of profiles
To work any further with Anvi'O, you need to merge your profiles into a single profile. There is more to this step, but that is all very neatly explained here:

https://anvio.org/help/main/programs/anvi-merge/

In [2]:
# This merges your single profiles into one overarching profile. At this point, take extreme care that your samples are similar enough to be analysed together!
anvi-merge ../data/working/Coral*_singleprofile/PROFILE.db -o ../data/working/samples-merged -c ../data/working/co-assembly1_contigs.db -S M_cavernosa

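Before merging, it is worth checking that the single profiles you expect are really there, since a glob that matches nothing or only one profile will not give you a useful merge. A minimal sketch of such a check follows. The helper `list_profiles` and the variable `WORK_DIR` are assumptions for this example; it simply looks for PROFILE.db files one directory level below the working directory.

```shell
# list_profiles is a hypothetical helper: it prints every PROFILE.db found one
# level below the given directory and fails when fewer than two exist.
list_profiles() {
    count=0
    for db in "$1"/*/PROFILE.db; do
        [ -f "$db" ] || continue      # skip the unexpanded pattern when nothing matches
        echo "$db"
        count=$((count + 1))
    done
    [ "$count" -ge 2 ]
}

# Default mirrors the working directory used above; adjust as needed.
WORK_DIR="${WORK_DIR:-../data/working}"

if profiles=$(list_profiles "$WORK_DIR"); then
    echo "ready to merge:"
    echo "$profiles"
else
    echo "need at least two single profiles in $WORK_DIR before running anvi-merge"
fi
```

The printed list is also a quick way to confirm that only the samples you intend to merge are picked up.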

### Data interpretation
