### Preparing assembly files for binning
To create a depth file, the reads must first be mapped back to the contigs. This has been done using bowtie2 (it can also be done using BWA). The next step is to create a depth file with MetaBat2, convert it to a format suitable for CONCOCT and MaxBin2, and then process the contigs into bins.

This all assumes you have installed the software mentioned here. Use conda for a quick install. If needed, the documentation for everything can be found here:

Metabinner: https://github.com/ziyewang/MetaBinner

MetaBAT: https://bitbucket.org/berkeleylab/metabat/src/master/README.md

CONCOCT: https://github.com/BinPro/CONCOCT

CheckM: https://github.com/Ecogenomics/CheckM/wiki

Das_tool: https://github.com/cmks/DAS_Tool

#### MetaBat2
The first cell below gathers the coverage information from all mapping files into a single, fairly simple text file (the depth file), so you can use that one file in the next analysis. The second cell runs MetaBat2 (v2.10.2). Its defaults are minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200, and a minimum bin size of 200000 base pairs. The command below lowers the minimum contig length to 1500 and the minimum bin size to 20000. The minimum bin size (the -s flag) is an important parameter to play around with: left at 200000, it will severely limit the number of bins you get, especially if your samples aren't perfect. It is therefore wise to run MetaBAT several times with slight alterations to -s to find your optimal setting (you don't want 3 bins, but you also don't want 1000).

For a reference on how to do this accurately, use: https://bitbucket.org/berkeleylab/metabat/wiki/Best%20Binning%20Practices
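
The repeated runs described above can be sketched as a small loop. This is only an illustration: `sweep_metabat` and `count_bins` are hypothetical helper names, the list of sizes is up to you, and the bin count assumes MetaBAT2 names its bins `<prefix>.<number>.fa`. By default the helper only prints the commands it would run.

```shell
# Sketch: try several minimum bin sizes (-s) and report how many bins each gives.
# sweep_metabat and count_bins are hypothetical helpers, not part of MetaBAT2.
# Set DRY_RUN=0 to actually run metabat2; by default commands are only printed.

count_bins() {
    # count files named <prefix>.<number>.fa in directory $1
    # (assumed MetaBAT2 naming; the .unbinned.fa file is not counted)
    n=0
    for bin_file in "$1"/*.[0-9]*.fa; do
        if [ -e "$bin_file" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

sweep_metabat() {
    contigs=$1; depth=$2; outroot=$3
    shift 3
    for size in "$@"; do
        outdir="$outroot/s$size"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "metabat2 -i $contigs -a $depth -o $outdir/metabat_bin -m 1500 -s $size --unbinned"
        else
            mkdir -p "$outdir"
            metabat2 -i "$contigs" -a "$depth" -o "$outdir/metabat_bin" -m 1500 -s "$size" --unbinned
            echo "-s $size: $(count_bins "$outdir") bins"
        fi
    done
}
```

For example, `sweep_metabat contigs.fa depth.txt ../data/working/metabat_sweep 20000 50000 100000 200000` prints one command per size, each writing to its own folder so the runs do not overwrite each other.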

In [1]:
#this creates a depth file for MetaBat from all (sorted) BAM files of the mapping step
jgi_summarize_bam_contig_depths --outputDepth ../data/working/metabat_depth_co-assembly1.txt ../../03_mapping/data/results/*.bam


In [11]:
#this is the actual MetaBat2 script
#-m is the minimum contig length, standard at 2500
#-s is the minimum bin size, standard at 200000
metabat2 -i ../../02_assembly/data/contigs_fixed/co-assembly1.contigs-fixed.fa -a ../data/working/metabat_depth_co-assembly1.txt \
-o ../data/results/bins_MetaBAT/metabat_bin \
-v \
-m 1500 \
-s 20000 \
--unbinned



#### CONCOCT
This set of commands runs CONCOCT in its standard mode. It first creates a coverage file for itself to use and then runs CONCOCT with the standard settings. This means the k-mer length is set to 4, the minimum contig length is 1000, and CONCOCT uses exactly the number of slots given to it by Hydra.

CONCOCT creates its coverage file from the BAM files produced in the mapping step. It is key that these are all in the correct places before proceeding with binning. The result is a single file, which is then used for the complete binning process. Do keep in mind that binning might take a while, so be prepared to let this run overnight.
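
Since a missing input only shows up after the job has started, it can help to check the files up front. A minimal sketch, assuming nothing beyond standard shell tools; `require_files` is a hypothetical helper name.

```shell
# Sketch: check that every input exists and is non-empty before a long run.
# require_files is a hypothetical helper, not part of CONCOCT.
require_files() {
    missing=0
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            missing=$((missing + 1))
        fi
    done
    # succeed only if nothing was missing
    [ "$missing" -eq 0 ]
}
```

Calling `require_files ../../03_mapping/data/results/*.bam || exit 1` at the top of a job script stops the job before any binning starts. A glob that matches nothing is passed through literally and is reported as missing.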

IMPORTANT: the current version of CONCOCT is missing a vital file called libmkl.so. Without this file, CONCOCT will not be able to start. You can fix this issue by installing another package through conda:

conda install mkl

Additionally, samtools will not work properly after a fresh CONCOCT install. The easiest way to fix this is to go to your environment where you installed CONCOCT and force an update through conda. 

In [1]:
#this creates the CONCOCT depth file

#this part cuts up the contigs into 10kb pieces for CONCOCT to use
cut_up_fasta.py ../../02_assembly/data/results/contigs_fixed/co-assembly1.contigs-fixed.fa -c 10000 -o 0 --merge_last -b ../data/working/co-assembly1_contigs_cut.bed > ../data/working/co-assembly1_contigs_cut.fa
#this part estimates contig coverage, using the bed file written by the previous command
concoct_coverage_table.py ../data/working/co-assembly1_contigs_cut.bed ../../03_mapping/data/results/*.bam > ../data/working/coverage_table_co-assembly1.tsv




In [9]:
#CONCOCT script
#f is the assembly name used in the file names below
f=co-assembly1
#make the output directories (-p skips the ones that already exist)
mkdir -p ../data/results/concoct_bins/"$f"_concoct_bins
mkdir -p ../data/working/concoct_temp/"$f"_concoct_temp

#this next bit actually runs CONCOCT itself
concoct --composition_file ../data/working/"$f"_contigs_cut.fa --coverage_file ../data/working/coverage_table_"$f".tsv -t $NSLOTS -b ../data/working/concoct_temp/"$f"_concoct_temp/
#merge the clustering of the cut-up pieces back onto the original contigs
merge_cutup_clustering.py ../data/working/concoct_temp/"$f"_concoct_temp/clustering_gt1000.csv > ../data/working/concoct_temp/"$f"_concoct_temp/"$f"_clustering_merged.csv
#write every bin to its own fasta file, using the original (uncut) contigs
extract_fasta_bins.py ../../02_assembly/data/results/contigs_fixed/"$f".contigs-fixed.fa ../data/working/concoct_temp/"$f"_concoct_temp/"$f"_clustering_merged.csv --output_path ../data/results/concoct_bins/"$f"_concoct_bins


#### Metabinner
Metabinner is another binning tool that can be used. It relies on scripts rather than executable commands, so you have to point it to where the scripts are located. If you installed it using conda, you will find them in the environment's folder under ~/.conda/envs. First, you'll want to generate a coverage file using the script Metabinner provides. This script is based on the MetaWRAP script and uses 1000 bp as the minimum contig length. You can also tweak some memory settings. In the same cell, you can calculate the k-mer composition.
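
Rather than hard-coding a home directory, the script location can be derived from the active conda environment. A minimal sketch, assuming the Metabinner environment is activated (conda then sets `CONDA_PREFIX`) and that the scripts sit under `bin/scripts` as described above; `metabinner_bin` is a hypothetical helper name.

```shell
# Sketch: find the Metabinner install folder from the active conda environment.
# metabinner_bin is a hypothetical helper; CONDA_PREFIX is set by `conda activate`.
metabinner_bin() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active" >&2
        return 1
    fi
    echo "$CONDA_PREFIX/bin"
}
```

With this, `MB_BIN=$(metabinner_bin) || exit 1` gives a path you can reuse, for instance as `"$MB_BIN/scripts/gen_coverage_file.sh"`, so the commands keep working for other users.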

In [1]:
#first you have to generate a coverage file using the script that Metabinner has. It doesn't locate these on its own so you have to point it in the correct direction
bash /home/stegmannt/.conda/envs/metabinner_env/bin/scripts/gen_coverage_file.sh -a ../../02_assembly/data/results/contigs_fixed/contig_file \
-o ../data/working/depth_metabinner \
-f ../../01_quality/data/results/*_host_removed_R1.fastq \
-r ../../01_quality/data/results/*_host_removed_R2.fastq \
-t $NSLOTS \
-m 8

#gen_kmer.py takes the contig file, a length threshold (999, which gives a minimum contig length of 1000) and the kmer size (4)
python /home/stegmannt/.conda/envs/metabinner_env/bin/scripts/gen_kmer.py ../../02_assembly/data/results/contigs_fixed/co-assembly1.contigs-fixed.fa 999 4
#this puts the kmer file in the same folder as the contig file, which is super annoying, so move it
mv ../../02_assembly/data/results/contigs_fixed/kmer_4_f999.csv ../data/working/kmer_4_f999_<samplename>.csv


You can now proceed to actually running Metabinner. 

In [None]:
#Metabinner runs a simplified version of CheckM that still requires the database to be set correctly
export CHECKM_DATA_PATH=/scratch/genomics/stegmannt/metagenomes/first_data-CC-revisit/04_binning/data/DATABASE
checkm data setRoot /scratch/genomics/stegmannt/metagenomes/first_data-CC-revisit/04_binning/data/DATABASE
bash /home/stegmannt/.conda/envs/metabinner_env/bin/run_metabinner.sh \
-a ../../02_assembly/data/results/contigs_fixed/co-assembly1.contigs-fixed.fa \
-o ../data/results/bins_Metabinner \
-d ../data/working/depth_metabinner/coverage_profile.tsv \
-k ../data/working/kmer_4_f999_<samplename>.csv \
-p /home/stegmannt/.conda/envs/metabinner_env/bin \
-t $NSLOTS



#The file "metabinner_result.tsv" in the "${output_dir}/metabinner_res" is the final output.
#You probably don't need to convert to fasta, but if you do: 

### DAS_Tool
This tool combines the bins from several different algorithms into a single, non-redundant set. It requires a .tsv input that links every contig to a bin, whereas most binners produce .fa bins. It comes with a script to convert your .fa bins to this file type.
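
To make the expected input concrete: the table has one line per contig, with the contig ID and the bin name separated by a tab. The sketch below builds such a table with standard tools. `fasta_to_contigs2bin` is a hypothetical stand-in written for illustration, not the converter that ships with DAS_Tool, and it assumes the bin name is the file name without its extension.

```shell
# Sketch of the contig-to-bin table: "<contig id><TAB><bin name>", one line per contig.
# fasta_to_contigs2bin is a hypothetical illustration, not a DAS_Tool script.
fasta_to_contigs2bin() {
    bindir=$1; ext=${2:-fa}
    for fasta in "$bindir"/*."$ext"; do
        [ -e "$fasta" ] || continue
        bin=$(basename "$fasta" ".$ext")
        # keep the first word of each header line, without the leading ">"
        awk -v bin="$bin" '/^>/ { sub(/^>/, "", $1); print $1 "\t" bin }' "$fasta"
    done
}
```

Looking at the first few lines of a table made this way (or by the bundled script) is a quick check that the contig IDs match the headers in your assembly file.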

In [None]:
#this finds the bins of one binner and converts them for further use
Fasta_to_Contig2Bin.sh -i ../data/results/<outputfolder> -e fa > ../data/working/<software>_contigs2bin.tsv

In [2]:
DAS_Tool --write_bins -t $NSLOTS -i ../data/working/<software1>_contigs2bin.tsv,../data/working/<software2>_contigs2bin.tsv,../data/working/<software3>_contigs2bin.tsv \
-c ../../02_assembly/data/results/contigs_fixed/<samplename>.fa \
-o ../data/results/DAS_bins/bin_FINAL
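
Because -i takes a comma-separated list, a small helper can build it from whatever tables exist instead of typing each path. A sketch only; `join_comma` is a hypothetical helper name.

```shell
# Sketch: join all arguments with commas, e.g. for the DAS_Tool -i option.
# join_comma is a hypothetical helper, not part of DAS_Tool.
join_comma() {
    joined=""
    for item in "$@"; do
        if [ -z "$joined" ]; then
            joined=$item
        else
            joined="$joined,$item"
        fi
    done
    echo "$joined"
}
```

For example, `join_comma ../data/working/*_contigs2bin.tsv` prints all matching tables as one comma-separated string that can be passed to -i.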


### Continuing
You should now have 3 sets of bins, each created with a slightly different algorithm, consolidated into a single set of bins by DAS_Tool. It is now important to run CheckM with the command below and generate output files for all of them. This will tell you about the quality of your bins and their completeness and contamination. After this, you can proceed to the "Refine Bins" part of the workflow.

CheckM checks your bins against a database to estimate their completeness and contamination. These statistics are vital in deciding how you want to proceed in the refinement process. Mind you, CheckM will run without the database being set, but the results will be very confusing, so make sure you set it correctly before running it.

CheckM has an insane number of options, which could keep you occupied forever. The most direct workflow, however, is the following command:

In [1]:
#you need to run these every time in the current version of checkm
export CHECKM_DATA_PATH=/scratch/genomics/stegmannt/metagenomes/first_data-CC-revisit/04_binning/data/DATABASE
checkm data setRoot /scratch/genomics/stegmannt/metagenomes/first_data-CC-revisit/04_binning/data/DATABASE
checkm lineage_wf -x fa -t $NSLOTS --pplacer_threads $NSLOTS ../data/results/bins_<software> ../data/results/<software>_bins.stats
#checkm can do a lot more. most of these functionalities work better and safer in Anvi'O

#Checkm can use an absolutely insane amount of memory: make sure you account for this!
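
Once CheckM has finished, the bins worth keeping can be listed from its summary table. The sketch below assumes a tab-separated table with the header names "Bin Id", "Completeness" and "Contamination" (the kind CheckM writes when asked for a tab-separated table). `filter_checkm` is a hypothetical helper, and the default cutoffs of 50% completeness and 10% contamination are assumptions you should adjust to your own standards.

```shell
# Sketch: list bins that pass completeness/contamination cutoffs.
# filter_checkm is a hypothetical helper; columns are looked up by header name,
# and the default cutoffs (50 and 10) are assumptions, not CheckM settings.
filter_checkm() {
    table=$1; min_comp=${2:-50}; max_cont=${3:-10}
    awk -F '\t' -v minc="$min_comp" -v maxc="$max_cont" '
        NR == 1 {
            # remember the position of every column by its header
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ($(col["Completeness"]) + 0 >= minc + 0) && ($(col["Contamination"]) + 0 <= maxc + 0) {
            print $(col["Bin Id"]) "\t" $(col["Completeness"]) "\t" $(col["Contamination"])
        }
    ' "$table"
}
```

Calling `filter_checkm <table> 90 5` applies stricter cutoffs. The output has three tab-separated columns: bin name, completeness and contamination.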


Congratulations! You have finished binning. The bins you have produced are considered putative genomes and can be used for a fair number of analyses, some of which I have listed in the Anvi'O notebook and others in the Analysis notebook. Good luck!