## *De novo* assembly 
This section covers the generation of contigs from the quality-trimmed Illumina reads. Assembly is memory-intensive and can take a long time, even on a powerful computing cluster. Two pieces of software are used, one for the assembly itself and one for assessing the result:

* MegaHIT: https://github.com/voutcn/megahit
* QUAST: http://quast.sourceforge.net/

MegaHIT offers quite a few options for how the assembly is performed, so it is worth familiarizing yourself with its options and with how the software works.

#### MegaHIT
MegaHIT requires nothing besides the quality-trimmed reads from the quality control step. Be aware that assembly is fairly sensitive to leftover adapters, so if adapter trimming has not yet been performed, it must be done before continuing.
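A quick way to spot leftover adapters is to count the reads that still contain an adapter sequence. The following is a minimal sketch, not a replacement for a proper QC tool: the function name `count_adapter_reads` is hypothetical, the default sequence `AGATCGGAAGAGC` is an assumption (the common start of Illumina TruSeq-style adapters, your library kit may differ), and the FASTQ file is assumed to use plain four-line records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence line contains an adapter.
# Assumes 4-line FASTQ records (sequences not wrapped over several lines).
# The default adapter is an assumption; pass your kit's adapter as argument 2.
count_adapter_reads() {
    local fastq="$1"
    local adapter="${2:-AGATCGGAAGAGC}"
    # Line 2 of every 4-line record is the sequence; quality lines are ignored.
    awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }' "$fastq"
}
```

For example, `count_adapter_reads Coral1_host_removed_R1.fastq` prints the number of reads with an adapter hit. A count far above zero suggests that trimming should be repeated before assembly.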

In [1]:
megahit --presets meta-large \
-1 /scratch/genomics/stegmannt/metagenomes/first_data-CC/01_quality/data/results/Coral1_host_removed_R1.fastq \
-2 /scratch/genomics/stegmannt/metagenomes/first_data-CC/01_quality/data/results/Coral1_host_removed_R2.fastq \
-o ../data/working/Coral1_assembly_complex --out-prefix Coral1_complex_HM

# --presets meta-large selects MegaHIT's parameter preset for large, complex metagenomes (it adjusts the k-mer sizes used during assembly)

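Since several samples are assembled the same way, the command above can be generated per sample. The sketch below only prints the commands (a dry run), so they can be reviewed before anything is launched. The function names and the `reads_dir` and `out_root` arguments are hypothetical; the file-name pattern and the flags are taken from the Coral1 example, and the sketch assumes all samples follow that pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MegaHIT command for one sample instead of running it.
# reads_dir and out_root are placeholders; file names follow the Coral1 example.
megahit_cmd() {
    local sample="$1" reads_dir="$2" out_root="$3"
    printf 'megahit --presets meta-large -1 %s -2 %s -o %s --out-prefix %s\n' \
        "${reads_dir}/${sample}_host_removed_R1.fastq" \
        "${reads_dir}/${sample}_host_removed_R2.fastq" \
        "${out_root}/${sample}_assembly_complex" \
        "${sample}_complex_HM"
}

# Dry run: print one command per sample name given after the two directories.
print_all_megahit_cmds() {
    local reads_dir="$1" out_root="$2"
    shift 2
    local s
    for s in "$@"; do
        megahit_cmd "$s" "$reads_dir" "$out_root"
    done
}
```

For example, `print_all_megahit_cmds /path/to/reads ../data/working Coral1 Coral2 Coral4 Coral5` prints four commands. Paths containing spaces would need extra quoting before the printed lines can be executed.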

#### QUAST
QUAST computes quality metrics for your contigs, which is a good indicator of whether a set of contigs will be useful to you. If there are very few large contigs (>1000 bp), binning may not work very well.

In [1]:
for f in Coral1 Coral2 Coral4 Coral5
do
    # the contig file name follows the --out-prefix passed to MegaHIT above
    quast ../data/working/"$f"_assembly_complex/"$f"_complex_HM.contigs.fa -o ../data/results/quast_output_"$f"_complex
done

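The number of large contigs can also be cross-checked directly from the FASTA file, without waiting for the QUAST report. This is a minimal sketch: the function name `count_long_contigs` is hypothetical, and it counts contigs of at least the given length (1000 bp by default), which is an assumption about where exactly the cutoff should lie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count contigs of at least min_len bp in a FASTA file.
# Handles sequences that are wrapped over several lines.
count_long_contigs() {
    local fasta="$1"
    local min_len="${2:-1000}"
    awk -v min="$min_len" '
        # A header line closes the previous record and starts a new one.
        /^>/ { if (seen && len >= min) n++; len = 0; seen = 1; next }
        # Any other line adds to the length of the current contig.
             { len += length($0) }
        # The last record has no following header, so close it here.
        END  { if (seen && len >= min) n++; print n + 0 }
    ' "$fasta"
}
```

For example, `count_long_contigs contigs.fa` prints the number of contigs of 1000 bp or more, and `count_long_contigs contigs.fa 2500` uses a stricter cutoff.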

#### Deflines and size
For binning, only contigs of at least 1000 base pairs are considered. Most binning tools rely on k-mer frequencies, and shorter contigs do not give an accurate k-mer signature for k = 4 (tetranucleotides). The following piece of code uses a script from anvi'o to do two things: simplify the deflines of your contigs so that anvi'o can use them (which also makes them uniform), and *remove* all contigs under 1000 bp. It is strongly recommended to keep a copy of your original contigs, so that you can go back to them if you decide to take a different direction.

In [2]:
anvi-script-reformat-fasta contigs.fa -o contigs-fixed.fa -l 1000 --simplify-names

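One simple way to keep the original contigs safe is to make a write-protected copy before filtering. The sketch below is one possible approach; the function name `backup_contigs` and the `.orig` suffix are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep an untouched, write-protected copy of the assembly.
# Refuses to overwrite an existing backup.
backup_contigs() {
    local src="$1"
    local dest="${2:-${src}.orig}"
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    # Copy first, then drop write permission so the backup is not edited by accident.
    cp -- "$src" "$dest" && chmod a-w "$dest"
}
```

For example, running `backup_contigs contigs.fa` before the reformatting step leaves `contigs.fa.orig` next to the input. Because `--simplify-names` renames the contigs, it is also worth checking `anvi-script-reformat-fasta --help` for an option that records the old and new names in your version.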