# ARG Profiling

### Indexing the ResFinder Database with Bowtie2
The command below uses Bowtie2 to create a searchable index from the ResFinder database's nucleotide sequences. This index is essential for aligning sequence reads to the database during downstream analyses,

In [44]:
!bowtie2-build /resfinder_db/all.fsa /resfinder_db/resfinder

Settings:
  Output files: "/resfinder_db/resfinder.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /resfinder_db/all.fsa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 749187
Using parameters --bmax 561891 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 561891 --dcv 1024
Construct

### Mapping Sequencing Reads to the ResFinder Database with Bowtie2
This command processes multiple sequencing samples to identify antibiotic resistance genes by aligning them against the ResFinder database using Bowtie2, followed by conversion of the output to BAM format using Samtools. 

The alignment was performed  using the command bowtie2 -x CARD-db / Resfinder-db -1 reads1 -2 reads2 -D 20 -R 3 -N 1 -L 20 -i S,1,0.50, which corresponds to Bowtie2's "very sensitive" preset. This preset was chosen to enhance alignment accuracy by maximizing read capture and achieving a search depth capable of identifying , targeted to increase search depth by capturing up to 95-99% of potential read matches to the reference sequences

In [53]:
!mkdir -p /home/jovyan/mapped_resfinder

!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed "s/.*\\///" | sed "s/_R1_.*//"); do \
    echo "Processing sample: ${sample}"; \
    bowtie2 -x /resfinder_db/resfinder \
        -1 /home/jovyan/data/${sample}_R1_trimmed.fastq.gz \
        -2 /home/jovyan/data/${sample}_R2_trimmed.fastq.gz \
        -D 20 -R 3 -N 1 -L 20 -i S,1,0.50 | \
    samtools view -Sb - > /home/jovyan/mapped_resfinder/${sample}_unfiltered.bam 2>&1; \
    echo "Completed sample: $sample"; \
done

Processing sample: BFH10_S128
3129692 reads; of these:
  3129692 (100.00%) were paired; of these:
    3129048 (99.98%) aligned concordantly 0 times
    69 (0.00%) aligned concordantly exactly 1 time
    575 (0.02%) aligned concordantly >1 times
    ----
    3129048 pairs aligned concordantly 0 times; of these:
      21 (0.00%) aligned discordantly 1 time
    ----
    3129027 pairs aligned 0 times concordantly or discordantly; of these:
      6258054 mates make up the pairs; of these:
        6257400 (99.99%) aligned 0 times
        84 (0.00%) aligned exactly 1 time
        570 (0.01%) aligned >1 times
0.03% overall alignment rate
Completed sample: BFH10_S128
Processing sample: BFH33_S151
3129692 reads; of these:
  3129692 (100.00%) were paired; of these:
    3127275 (99.92%) aligned concordantly 0 times
    540 (0.02%) aligned concordantly exactly 1 time
    1877 (0.06%) aligned concordantly >1 times
    ----
    3127275 pairs aligned concordantly 0 times; of these:
      115 (0.00%) a

#### Sorting Reads by Position: 

The command sorts the reads in the BAM file by their position on the reference genome.

In [60]:
!mkdir -p /home/jovyan/sorted_reads_resfinder

!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed "s/.*\\///" | sed "s/_R1_.*//"); do \
    echo "Processing sample: ${sample}"; \
    samtools sort -T /home/jovyan/sorted_reads_resfinder/${sample} -O bam \
    -o /home/jovyan/sorted_reads_resfinder/${sample}.bam \
    /home/jovyan/mapped_resfinder/${sample}_unfiltered.bam; \
done


Processing sample: BFH10_S128
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
Processing sample: BFH33_S151
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
Processing sample: BH02_S77
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
Processing sample: BH03_S78
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
Processing sample: FH1_S162
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
Processing sample: FH2_S163
[bam_sort_core] merging from 2 files and 1 in-memory blocks...


#### Indexing Sorted BAM Files
This command processes the sorted BAM files generated previously and creates index files for them using Samtools. 

In [61]:
!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//'); do \
    echo "Processing sample: ${sample}"; \
    samtools index /home/jovyan/sorted_reads_resfinder/${sample}.bam;\
done

Processing sample: BFH10_S128
Processing sample: BFH33_S151
Processing sample: BH02_S77
Processing sample: BH03_S78
Processing sample: FH1_S162
Processing sample: FH2_S163


#### Extracting Gene Names and Count of Mapped Reads Using samtools idxstats
This command extracts gene names (or reference names) from the sorted BAM files using samtools idxstats

In [62]:
!mkdir -p /home/jovyan/results
!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//'); do \
    echo "Processing sample: ${sample}";\
    samtools idxstats /home/jovyan/sorted_reads_resfinder/${sample}.bam | grep -v "\*" | cut -f1 | \
    (echo "GENE"; cat) > /home/jovyan/results/gene_names;\
done

Processing sample: BFH10_S128
Processing sample: BFH33_S151
Processing sample: BH02_S77
Processing sample: BH03_S78
Processing sample: FH1_S162
Processing sample: FH2_S163


Using samtools to get the counts of mapped reads for each reference gene from the aligned BAM file.

In [63]:
!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//'); do \
    echo "Processing sample: ${sample}"; \
    echo -n "${sample}" > /home/jovyan/results/${sample}_counts; \
    samtools idxstats /home/jovyan/sorted_reads_resfinder/${sample}.bam | grep -v "\\*" | cut -f3 >> /home/jovyan/results/${sample}_counts; \
done

Processing sample: BFH10_S128
Processing sample: BFH33_S151
Processing sample: BH02_S77
Processing sample: BH03_S78
Processing sample: FH1_S162
Processing sample: FH2_S163


#### Combine into one file
This command combines the count of mapped reads from all samples into a single tab-delimited file, where each row represents a gene and each column represents a sample. 

In [64]:
!echo -e "GENE\t$(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//' | tr '\n' '\t' | sed 's/\t$//')" > /home/jovyan/results/ARG_genemat.txt

!paste /home/jovyan/results/gene_names $(ls /home/jovyan/data/*_R1_*.fastq.gz | \
sed 's/.*\///' | sed 's/_R1_.*//' | sed 's|^|/home/jovyan/results/|g' | sed 's|$|_counts|') |\
tail -n +2 >> /home/jovyan/results/ARG_genemat.txt


In [65]:
!head /home/jovyan/results/ARG_genemat.txt

GENE	BFH10_S128	BFH33_S151	BH02_S77	BH03_S78	FH1_S162	FH2_S163
aac(6')-Ib_2_M23634	1	3	14	0	49	5
aac(6')-Ib11_1_AY136758	6	22	21	4	101	50
aac(6')-30-aac(6')-Ib'_1_AJ584652	0	0	0	0	0	0
aac(6')-Iaj_1_AB709942	0	0	0	0	0	0
aac(6')-Ian_1_AP014611	0	0	0	0	0	0
aac(6')-Iak_1_AB894482	4	16	10	1	86	21
aac(6')-Ib-Hangzhou_1_FJ503047	0	0	0	0	0	0
aac(6')-Iid_1_AJ584700	0	0	0	0	0	0
aac(6')-Iih_1_AJ584701	0	0	0	0	0	0


## Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data
- Used for normalization of the counts of mapped reads later on.


For each sample, the script runs Metaxa2, a tool for detecting ribosomal RNA (rRNA) in metagenomic data with the following parameters:

- -1 and -2 specify the paired-end read files (forward and reverse)
- --align none disables alignment (typically used when only taxonomic classification is desired).
- --graphical F disables graphical output (if you don't want plots).
- --plus tells Metaxa2 to use the BLAST+ for alignment.


In [31]:
!mkdir -p /home/jovyan/metaxa2
!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//'); do \
    echo "Metaxa Processing Sample $sample";\
    metaxa2 -1 /home/jovyan/data/${sample}_R1_trimmed.fastq.gz -2 /home/jovyan/data/${sample}_R2_trimmed.fastq.gz \
    -f fastq -z gzip -t b -o /home/jovyan/metaxa2/${sample} --align none --graphical F --plus 2>/dev/null;\
done

Metaxa Processing Sample BFH10_S128
Metaxa Processing Sample BFH33_S151
Metaxa Processing Sample BH02_S77
Metaxa Processing Sample BH03_S78
Metaxa Processing Sample FH1_S162
Metaxa Processing Sample FH2_S163


#### Processing Metaxa Output:

The command metaxa2_ttt processes the taxonomy output from each sample, extracting counts of identified taxa (e.g., genus-level counts) for each sample's rRNA sequences.
This generates raw counts of each identified genus in each sample.
Combining Counts at the Genus Level with metaxa2_dc:

The metaxa2_dc step combines these raw taxonomy counts across all samples at the genus level.
It generates a summary file (metaxa_genus.txt) that includes the counts of each genus detected in every sample.
Normalization Using Genus-Level Counts:

The genus-level counts produced by metaxa2_dc can be used as input for normalization. Since sequencing depth varies across samples, normalization ensures that the observed counts are comparable.
Normalization involves scaling counts to a common factor (e.g., total reads, sequencing depth, or per million reads).

In [32]:
!for sample in $(ls /home/jovyan/data/*_R1_*.fastq.gz | sed 's/.*\///' | sed 's/_R1_.*//'); do \
    metaxa2_ttt -i /home/jovyan/metaxa2/${sample}.taxonomy.txt -t b -o /home/jovyan/metaxa2/${sample};\
done


Metaxa Taxonomic Traversal Tool -- Processes Taxonomic Output From Metaxa
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
-----------------------------------------------------------------
Metaxa Taxonomic Traversal Tool -- Processes Taxonomic Output From Metaxa
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
-----------------------------------------------------------------
Metaxa Taxonomic Traversal Tool -- Processes Taxonomic Output From Metaxa
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
-----------------------------------------------------------------
Metaxa Taxonomic Traversal Tool -- Processes Taxonomic Output From Metaxa
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
-----------------------------------------------------------------
Metaxa Taxonomic Traversal Tool -- Processes Taxonomic Output From Metaxa
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
----------------------------------------------

In [39]:
!mkdir -p /home/jovyan/results
!metaxa2_dc -o /home/jovyan/results/metaxa_genus.txt /home/jovyan/metaxa2/*.level_6.txt

Metaxa2 Diversity Tools - Data Collector
by Johan Bengtsson-Palme, University of Gothenburg
Version: 2.2
This program is distributed under the GNU GPL 3 license, use the --license option for more information on this license.
-----------------------------------------------------------------


#### Reformat Sample Name Headers to Exclude File Path

In [42]:
!sed -i '1s|/home/jovyan/metaxa2/||g' /home/jovyan/results/metaxa_genus.txt

In [43]:
!head /home/jovyan/results/metaxa_genus.txt

Taxa	BFH10_S128	BFH33_S151	BH02_S77	BH03_S78	FH1_S162	FH2_S163
Bacteria;Acidobacteria;Acidobacteria;Acidobacteriales;Acidobacteriaceae;Unclassified Acidobacteriaceae	0	0	4	0	0	0
Bacteria;Acidobacteria;Acidobacteria;Acidobacteriales;Unclassified Acidobacteriales;	0	1	0	0	0	0
Bacteria;Acidobacteria;Acidobacteria;DA023;Unclassified DA023;	1	0	0	1	0	0
Bacteria;Acidobacteria;Acidobacteria;Order Incertae Sedis;Family Incertae Sedis;Bryobacter	0	0	0	1	0	0
Bacteria;Acidobacteria;Acidobacteria;Unclassified Acidobacteria;;	3	0	5	0	0	0
Bacteria;Acidobacteria;Holophagae;Holophagales;Holophagaceae;Geothrix	0	1	7	0	0	0
Bacteria;Acidobacteria;Holophagae;Holophagales;Holophagaceae;Holophaga	0	0	1	0	1	0
Bacteria;Acidobacteria;Holophagae;Holophagales;Holophagaceae;Unclassified Holophagaceae	0	0	4	0	0	0
Bacteria;Acidobacteria;Unclassified Acidobacteria;;;	0	1	27	0	0	0
