# ARG Analysis 
ARG: antibiotic resistance genes

### Goal of the analysis
- Screening of the MAGs for ARGs with DeepARG and for biosynthetic gene clusters with the DeepBGC and antiSMASH pipelines
- Detection of genes and gene clusters relevant to antibiotic resistance, and their orthology


### First steps
1. Extraction of complete CDSs and ORFs
    - using the provided Perl script
    - Script: 5_0
    - If required, compute statistics for the FASTA files containing the CDSs and ORFs with the custom Perl script

In [None]:
cd /gxfs_work/geomar/smomw681/DATA/MAG_Illumina/
## STATS for each directory below

## CTG_Illumina
sbatch -c 3 -p base --mem=50G --job-name=StatCTG_Illumina \
     --wrap="perl /gxfs_work/geomar/smomw681/DATA/MAG_Files/SL_Fasta_Files_stats_modified_jobname.pl \
     CONTIGs_renamed"
##
## CTG_PROKS
sbatch -c 3 -p base --mem=50G --job-name=StatCTG_PROKS \
     --wrap="perl /gxfs_work/geomar/smomw681/DATA/MAG_Files/SL_Fasta_Files_stats_modified_jobname.pl \
     CLASS_CONTIGs/PROKS"
##
## ORFs
sbatch -c 3 -p base --mem=50G --job-name=StatORF_PROKS \
     --wrap="perl /gxfs_work/geomar/smomw681/DATA/MAG_Files/SL_Fasta_Files_stats_modified_jobname.pl \
     PRODIGAL/ORFs_ORIGINAL"
##
## CDS
sbatch -c 3 -p base --mem=50G --job-name=StatCDS_PROKS \
     --wrap="perl /gxfs_work/geomar/smomw681/DATA/MAG_Files/SL_Fasta_Files_stats_modified_jobname.pl \
     PRODIGAL/CDS_ORIGINAL"
##
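
For a quick check of a single file without submitting a job, the kind of statistics meant here can be sketched in a few lines of awk. This is only an illustration and not the `SL_Fasta_Files_stats_modified_jobname.pl` script; the function name `fasta_stats` and the choice of metrics (sequence count, total length, mean length) are assumptions.

```shell
# Hypothetical helper, NOT the Perl script used above:
# prints file name, number of sequences, total length and mean length.
fasta_stats() {
    awk -v name="$1" '
        /^>/ { n++; next }                               # header line: one more sequence
        { gsub(/[ \t\r]/, ""); total += length($0) }     # sequence line: add its residues
        END {
            mean = (n > 0) ? total / n : 0
            printf "%s\t%d\t%d\t%.1f\n", name, n, total, mean
        }
    ' "$1"
}

# Example on a throwaway file with 2 sequences and 10 bases in total
tmp=$(mktemp)
printf '>seq1\nATGC\nAT\n>seq2\nGGGG\n' > "$tmp"
fasta_stats "$tmp"
rm -f "$tmp"
```

The output is tab-separated, so several files can be concatenated into one table with a simple loop.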

2. Downstream analysis using different bioinformatic pipelines
2.1. DeepARG: 
- A deep learning-based approach to predict ARGs from metagenomes
    - provides two models:
    - deepARG-LS for long (gene-like) sequences:
        - Annotate gene-like sequences when the input is a nucleotide FASTA file:
            - deeparg predict --model LS --type nucl --input /path/file.fasta --out /path/to/out/file.out
        - Annotate gene-like sequences when the input is an amino acid FASTA file:
            - deeparg predict --model LS --type prot --input /path/file.fasta --out /path/to/out/file.out
    - deepARG-SS for short sequence reads
- Script: 5_1
- After the run: 
    - Search for ARG hits and count the number of hits
    - Summarize the results with the Perl script below
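
Running the LS model over many files could look like the sketch below. It is not Script 5_1: the function name `run_deeparg_ls`, the `.faa` extension and the `DRY_RUN` switch are assumptions made for illustration. Only the `deeparg predict` options listed above are used; some DeepARG versions may expect further arguments (for example a data directory), so check `deeparg predict --help` first.

```shell
# Hypothetical wrapper around the deeparg predict call listed above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
run_deeparg_ls() {
    in_dir=$1     # directory with amino acid FASTA files (assumed to end in .faa)
    out_dir=$2    # directory for the DeepARG output prefixes
    for fasta in "$in_dir"/*.faa; do
        [ -e "$fasta" ] || continue      # the glob matched nothing
        out="$out_dir/$(basename "$fasta" .faa).deeparg.out"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "deeparg predict --model LS --type prot --input $fasta --out $out"
        else
            mkdir -p "$out_dir"
            deeparg predict --model LS --type prot --input "$fasta" --out "$out"
        fi
    done
}

# Dry run on a throwaway directory with one empty placeholder file
demo=$(mktemp -d)
: > "$demo/sample1.faa"
run_deeparg_ls "$demo" "$demo/DeepARGs"
rm -rf "$demo"
```

The output prefix ends in `.deeparg.out` so that the resulting files match the `*.deeparg.out.mapping.ARG` pattern used when counting hits.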

In [None]:
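## count the number of hits (the header line of each file is not a hit)
## general pattern: wc -l FILES | awk '$1 > 1 {print $1,$2}'
## the total hits; FNR restarts at 1 for every file, so each header is skipped
awk 'FNR > 1' DeepARGs/*.deeparg.out.mapping.ARG | wc -l
## hits per file, excluding the header and the "total" line that wc prints for several files
wc -l DeepARGs/*.deeparg.out.mapping.ARG | awk '$1 > 1 && $2 != "total" {print $1-1,$2}' > DeepARGs_hits_perSample.txt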

## Summary using the Perl script:
INPUT_FILE="/gxfs_work/geomar/smomw681/DATA/MAG_Illumina/PRODIGAL/DeepARG/DeepARGs_hits_perSample.txt"  
OUTPUT_FILE="/gxfs_work/geomar/smomw681/DATA/MAG_Illumina/PRODIGAL/DeepARG/deeparg_PROKS_summary.txt"
perl /gxfs_work/geomar/smomw681/DATA/MAG_Files/SL_summarize_deeparg.pl "$INPUT_FILE" "$OUTPUT_FILE"
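
Beyond the number of hits per sample, a tally per resistance class is often useful. The sketch below is not `SL_summarize_deeparg.pl`; the function name `arg_class_counts` is hypothetical, and it assumes tab-separated `mapping.ARG` files whose header line contains a column named `predicted_ARG-class`. The column is located by name rather than by position, so files without that column simply contribute nothing.

```shell
# Hypothetical helper: tally hits per predicted ARG class across mapping.ARG files.
# Assumes tab-separated input with a "predicted_ARG-class" column in each header.
arg_class_counts() {
    awk -F '\t' '
        FNR == 1 {                        # header of each file: locate the class column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "predicted_ARG-class") col = i
            next
        }
        col > 0 { count[$col]++ }         # data row: one hit for its class
        END { for (c in count) printf "%s\t%d\n", c, count[c] }
    ' "$@" | sort
}

# Example with two toy files (made-up rows, for illustration only)
demo=$(mktemp -d)
printf '#ARG\tread_id\tpredicted_ARG-class\nA\tr1\tbeta-lactam\nB\tr2\ttetracycline\n' > "$demo/s1.mapping.ARG"
printf '#ARG\tread_id\tpredicted_ARG-class\nC\tr3\tbeta-lactam\n' > "$demo/s2.mapping.ARG"
arg_class_counts "$demo"/*.mapping.ARG
rm -rf "$demo"
```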


2.2. DeepBGC
- Script: 5_2
- Detection and classification of biosynthetic gene clusters (BGCs)
    - Product class and activity of detected BGCs are predicted using a Random Forest classifier
 
2.3. antiSMASH
- Script: 5_3
- Search for secondary metabolite biosynthesis gene clusters
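
The two BGC searches (2.2 and 2.3) can be submitted in the same way as the statistics jobs. The sketch below only prints one `sbatch` line per input file and tool, so nothing is submitted until the lines are reviewed and run. It does not reproduce Scripts 5_2 and 5_3: the function name `bgc_jobs`, the `.fasta` extension, the output layout and the choice of `prodigal` for antiSMASH gene finding are assumptions, and the resource settings are copied from the `sbatch` calls earlier in this notebook.

```shell
# Hypothetical helper: print one sbatch line per input FASTA and tool.
# Paths containing spaces are not handled in this sketch.
bgc_jobs() {
    in_dir=$1     # directory with contig FASTA files (assumed to end in .fasta)
    out_dir=$2    # parent directory for the results
    for fasta in "$in_dir"/*.fasta; do
        [ -e "$fasta" ] || continue      # the glob matched nothing
        s=$(basename "$fasta" .fasta)
        echo "sbatch -c 3 -p base --mem=50G --job-name=DeepBGC_$s --wrap=\"deepbgc pipeline --output $out_dir/deepbgc/$s $fasta\""
        echo "sbatch -c 3 -p base --mem=50G --job-name=antiSMASH_$s --wrap=\"antismash --genefinding-tool prodigal --output-dir $out_dir/antismash/$s $fasta\""
    done
}

# Print the job lines for a throwaway directory with one placeholder file
demo=$(mktemp -d)
: > "$demo/MAG_001.fasta"
bgc_jobs "$demo" "$demo/BGC"
rm -rf "$demo"
```

Redirecting the output to a file, for example `bgc_jobs IN OUT > jobs.sh`, gives a record of exactly what was submitted.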