# Quality Check
**MetaQUAST**
- quality assessment of metagenome assemblies (here, the assembled contigs); no reference genome was supplied

https://quast.sourceforge.net/docs/manual.html#sec1
How to interpret the quality results:
- check how many large contigs (>1000 bp) there are
- the contigs were not mapped to a reference genome, so only reference-free statistics are reported
- for now, this is mainly useful for checking the length and quality of the contigs; it may be worth reassessing after mapping the reads back to the metagenome assembly
Cite MetaQUAST: https://quast.sourceforge.net/publications.html
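
To check the number of large contigs (>1000 bp) directly from the assembly file, the relevant counts can be reproduced with awk. This is a minimal sketch, assuming a plain (possibly multi-line) FASTA file; `count_long_contigs` is a hypothetical helper, not part of QUAST. Its output is meant to line up with the `# contigs (>= N bp)` and `Total length (>= N bp)` rows of the MetaQUAST report.

```bash
# count_long_contigs FASTA [MIN_LEN]
# Prints "<number of contigs> <total length>" for contigs >= MIN_LEN bp
# (default 1000). Hypothetical helper; handles multi-line FASTA records.
count_long_contigs() {
    local fasta="$1" min_len="${2:-1000}"
    awk -v min="$min_len" '
        # header line: close out the previous record (if any), start a new one
        /^>/ { if (seen && len >= min) { n++; total += len }
               seen = 1; len = 0; next }
        # sequence line: add its length to the current record
        { len += length($0) }
        # close out the last record and report
        END { if (seen && len >= min) { n++; total += len }
              print n + 0, total + 0 }
    ' "$fasta"
}

# usage: count_long_contigs mcav.contigs.fa 1000
```

Passing `0` as the minimum length counts every contig, which corresponds to the `(>= 0 bp)` rows.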

In [None]:
metaquast mcav.contigs.fa -o quast_output

All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

| Assembly | mcav.contigs |
|---|---|
| # contigs (>= 0 bp) | 839850 |
| # contigs (>= 1000 bp) | 39414 |
| # contigs (>= 5000 bp) | 370 |
| # contigs (>= 10000 bp) | 35 |
| # contigs (>= 25000 bp) | 0 |
| # contigs (>= 50000 bp) | 0 |
| Total length (>= 0 bp) | 435829081 |
| Total length (>= 1000 bp) | 57097694 |
| Total length (>= 5000 bp) | 2623425 |
| Total length (>= 10000 bp) | 443897 |
| Total length (>= 25000 bp) | 0 |
| Total length (>= 50000 bp) | 0 |
| # contigs | 312757 |
| Largest contig | 24857 |
| Total length | 237368824 |
| GC (%) | 42.80 |
| N50 | 729 |
| N75 | 595 |
| L50 | 113047 |
| L75 | 203599 |
| N's per 100 kbp | 0.00 |
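
N50 is the contig length such that contigs of that length or longer together hold at least half of the total assembly length; L50 is the number of those contigs (N75/L75 use 75% instead). In the report above, for example, the 113047 longest contigs, each at least 729 bp long, contain at least half of the 237368824 bp in contigs >= 500 bp. A minimal sketch of the calculation, assuming the report's 500 bp cutoff; `contig_lengths` and `n_stat` are hypothetical helpers, and QUAST's own implementation may differ in edge cases.

```bash
# contig_lengths FASTA -> one contig length per line (multi-line FASTA OK)
contig_lengths() {
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1"
}

# n_stat FASTA [MIN_LEN] [FRACTION] -> prints "<Nxx> <Lxx>"
# FRACTION 0.5 gives N50/L50, 0.75 gives N75/L75. Only contigs >= MIN_LEN bp
# (default 500, as in the report) are considered. Lengths are sorted longest
# first, then summed until the running total reaches FRACTION of the total.
n_stat() {
    local fasta="$1" min_len="${2:-500}" frac="${3:-0.5}"
    contig_lengths "$fasta" |
        awk -v min="$min_len" '$1 >= min' |
        sort -rn |
        awk -v f="$frac" '
            { len[NR] = $1; total += $1 }
            END {
                for (i = 1; i <= NR; i++) {
                    cum += len[i]
                    if (cum >= f * total) { print len[i], i; exit }
                }
            }'
}

# usage: n_stat mcav.contigs.fa 500 0.5   # N50 and L50
```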

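# Mapping
- maps each sample's reads back onto the assembled contigs (contigs.fa); these contigs are the metagenome assembly, not yet MAGs (metagenome-assembled genomes), which are obtained by binning contigs
- allows you to quantify genes/taxa in each sample from how many reads map to each contig (coverage)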
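
Quantifying per sample comes down to counting how many reads map to each contig. As a minimal sketch, assuming an uncompressed SAM file like the ones produced in the mapping loop below, awk can count primary, mapped alignments per contig by testing the SAM FLAG bits; `reads_per_contig` is a hypothetical helper. On the sorted, indexed BAM files, `samtools idxstats` reports per-contig mapped read counts directly.

```bash
# reads_per_contig SAM -> "<contig>\t<read count>", sorted by contig name
# Counts primary, mapped alignments only; in paired-end data each mate
# is counted separately. Hypothetical helper for illustration.
reads_per_contig() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        int($2 / 4)    % 2 == 1 { next }       # flag 0x4: read unmapped
        int($2 / 256)  % 2 == 1 { next }       # flag 0x100: secondary alignment
        int($2 / 2048) % 2 == 1 { next }       # flag 0x800: supplementary alignment
        { count[$3]++ }                        # column 3 = contig name
        END { for (c in count) printf "%s\t%d\n", c, count[c] }
    ' "$1" | LC_ALL=C sort
}

# usage: reads_per_contig T1_12_2022.sam
```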

## Anvi'o

https://anvio.org/
- used for further (downstream) analysis
- Here, we use it (together with bowtie2 and samtools) for mapping, ...

In [None]:
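#create a conda environment for anvi'o (conda create alone makes an empty environment; anvi'o still has to be installed into it)
#note: the mapping script below activates anvio-8, so the environment names need to match
conda create -n anvio-7.1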
#dir=/home/brooke_sienkiewicz_student_uml_edu/.conda/envs/anvio-7.1
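
Before submitting the mapping job, it can help to confirm that every program the script calls is available in the activated environment. A minimal sketch; `check_tools` is a hypothetical helper, and the tool list in the usage line is simply the set of commands used in the mapping script below.

```bash
# check_tools TOOL... -> returns 0 if every tool is on PATH,
# otherwise prints each missing tool to stderr and returns 1.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# usage, after activating the environment (e.g. conda activate anvio-8):
# check_tools anvi-script-reformat-fasta anvi-init-bam bowtie2-build bowtie2 samtools
```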

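**Anvi'o mapping workflow**
https://merenlab.org/tutorials/assembly-based-metagenomics/
- reformats the contig FASTA file (simplifies the deflines) and filters contigs by length; the script below uses `-l 50` (minimum contig length of 50 bp), so it would need `-l 1000` to drop contigs shorter than 1000 bp
- aligns the reads to the contigs (bowtie2), then converts, sorts and indexes the alignments as BAM files (samtools, `anvi-init-bam`)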
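
Conceptually, the reformatting step keeps contigs above a minimum length and replaces their deflines with short, simple names, recording the old-to-new mapping in a report. The awk sketch below illustrates that logic only: `reformat_fasta` is hypothetical, it is not how `anvi-script-reformat-fasta` is implemented, and the `c_%012d` naming scheme is an assumption.

```bash
# reformat_fasta IN OUT MIN_LEN REPORT
# Hypothetical stand-in: drop contigs shorter than MIN_LEN, rename the rest
# to simple IDs (assumed c_%012d format), and write a
# "<new name>\t<original defline>" report.
reformat_fasta() {
    local in="$1" out="$2" min_len="$3" report="$4"
    : > "$out"; : > "$report"                   # start with empty output files
    awk -v min="$min_len" -v out="$out" -v report="$report" '
        function flush(   id) {
            if (seen && length(seq) >= min) {
                kept++
                id = sprintf("c_%012d", kept)
                print ">" id > out               # simplified defline
                print seq > out                  # sequence on a single line
                print id "\t" name > report      # new name -> original defline
            }
        }
        /^>/ { flush(); seen = 1; name = substr($0, 2); seq = ""; next }
        { seq = seq $0 }
        END { flush() }
    ' "$in"
}

# usage: reformat_fasta mcav.contigs.fa mcav.contigs-fixed.fa 1000 rename-report.txt
```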

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 20:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o slurm-%j.out  # %j = job ID

module load miniconda/22.11.1-1
conda activate anvio-8

SAMPLENAME=mcav
READSPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/trimmed
CONTIGPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/results/mcav_assembly3
CONTIGFILE="$SAMPLENAME".contigs.fa
newCONTIGPATH=/project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/working/mcav_assembly_redo

#fix deflines (simplify contig names for later steps) and filter contigs by length
#-l sets the minimum contig length to keep (50 bp here; -l 1000 would drop contigs shorter than 1000 bp)
anvi-script-reformat-fasta $CONTIGPATH/$CONTIGFILE -o $newCONTIGPATH/$SAMPLENAME.contigs-fixed.fa -l 50 --simplify-names --report-file contig-rename-report.txt
#need to experiment with the length cutoff
#deflines = sequence definition lines: the ">" header line that comes directly before its sequence in a FASTA file


FIXEDCON="$SAMPLENAME".contigs-fixed.fa

bowtie2-build $newCONTIGPATH/$FIXEDCON $newCONTIGPATH/"$SAMPLENAME"_contigs
#this builds an index of your contigs, which only needs to happen once

for f in T1_12_2022 T1_13_2022 T1_16_2019 T1_20_2019 T1_24_2019 T1_40_2022 T1_57_2022 T1_70_2022 T2_10_2022 T2_16_2019 T3_13_2022 T3_14_2019 T3_15_2019 T3_19_2022 T3_1_2019 T3_40_2022 T3_48_2022 T3_49_2022 T3_51_2022 T3_60_2022 T3_8_2019 T3_9_2019
do
bowtie2 --threads 11 -x $newCONTIGPATH/"$SAMPLENAME"_contigs -1 $READSPATH/"$f"_MCAV_R1_001_val_1.fq -2 $READSPATH/"$f"_MCAV_R2_001_val_2.fq -S $newCONTIGPATH/"$f".sam
#align each read pair to the contigs and write the alignments to a .sam file (uses 11 threads, although the job requests 24 cores via -c 24)

samtools view -F 4 -b -S $newCONTIGPATH/"$f".sam -o $newCONTIGPATH/"$f"-RAW.bam
#convert the SAM file to BAM, keeping only mapped reads (-F 4 excludes unmapped reads); the BAM is neither sorted nor indexed yet, so an anvi'o script does that next:

anvi-init-bam $newCONTIGPATH/"$f"-RAW.bam -o $newCONTIGPATH/"$f".bam
#index and sort your bam file

rm $newCONTIGPATH/"$f"-RAW.bam
#note: in the original run this removal failed (it pointed to the wrong directory); the raw BAM files were kept anyway
done
#result: one aligned, sorted and indexed BAM file per sample, needed for downstream analysis

#bash script: mapping.txt
#JOB ID: 15804854
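
The sample loop above hard-codes every sample name. As an alternative sketch, the names could be derived from the trimmed R1 read files, assuming every file in `$READSPATH` follows the `<sample>_MCAV_R1_001_val_1.fq` / `<sample>_MCAV_R2_001_val_2.fq` pattern used in the loop; `list_samples` is a hypothetical helper. This would map every sample in the directory, so a hard-coded list remains the safer choice if some samples are meant to be excluded.

```bash
# list_samples DIR -> one sample ID per line, taken from the R1 file names
# (<sample>_MCAV_R1_001_val_1.fq). Samples without a matching R2 file are
# skipped with a warning on stderr. Hypothetical helper.
list_samples() {
    local dir="$1" r1 id
    for r1 in "$dir"/*_MCAV_R1_001_val_1.fq; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        id="${r1##*/}"                           # drop the directory part
        id="${id%_MCAV_R1_001_val_1.fq}"         # drop the suffix -> sample ID
        if [ -e "$dir/${id}_MCAV_R2_001_val_2.fq" ]; then
            echo "$id"
        else
            echo "warning: no R2 file for $id, skipping" >&2
        fi
    done
}

# usage inside the mapping script, in place of the hard-coded list:
# mapfile -t SAMPLES < <(list_samples "$READSPATH")
# for f in "${SAMPLES[@]}"; do ... done
```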