### __NCBI Antimicrobial Resistance Gene Finder (AMRFinderPlus)__

Find AMR genes in genome assemblies

__Docs:__ https://github.com/ncbi/amr/wiki

#### __Libraries & requirements__

```bash
conda install -c conda-forge -c bioconda -n amrfinder ncbi-amrfinderplus -y
```

In [1]:
import polars as pl

R = "/home/npilquinao/BEM-ISP/group_storage/amrfinder/results"

__Get database__

In [4]:
!amrfinder -u

Running: amrfinder -u
Software directory: '/home/npilquinao/miniconda3/envs/ISP/bin/'
Software version: 3.12.8
Running: /home/npilquinao/miniconda3/envs/ISP/bin/amrfinder_update -d /home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data
Looking up the published databases at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/
Looking for the target directory /home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1/
Downloading AMRFinder database version 2024-01-31.1 into '/home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1/'
Running: /home/npilquinao/miniconda3/envs/ISP/bin/amrfinder_index /home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1/
Indexing
Database directory: '/home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1'
Database version: 2024-01-31.1


#### __Tool testing__

In [14]:
%%bash

amrfinder\
    --nucleotide /home/npilquinao/BEM-ISP/group_storage/npilquinao/ISP/2carbapenemase/assembly/VA61_2022/assembly.fasta\
    --name VA61_2022\
    --plus\
    --threads 40\
    -o ~/BEM-ISP/AMRfinder/VA61_2022_amrfinder.tsv

Running: amrfinder --nucleotide /home/npilquinao/BEM-ISP/group_storage/npilquinao/ISP/2carbapenemase/assembly/VA61_2022/assembly.fasta --name VA61_2022 --plus --threads 40 -o /home/npilquinao/BEM-ISP/AMRfinder/VA61_2022_amrfinder.tsv
Software directory: '/home/npilquinao/miniconda3/envs/ISP/bin/'
Software version: 3.12.8
Database directory: '/home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1'
Database version: 2024-01-31.1
AMRFinder translated nucleotide search
  - include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins
Running tblastn
Making report
AMRFinder took 4 seconds to complete
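
The log above notes that `--organism` adds point-mutation searches and suppresses common proteins. A minimal sketch of assembling the same command from Python, with the organism as an optional argument, could look like this. The helper name `build_amrfinder_cmd` and the relative file names are hypothetical, and `Klebsiella_pneumoniae` is only an assumed example because the notebook does not state the species; the valid names for the installed database can be listed with `amrfinder --list_organisms`.

```python
from typing import List, Optional


def build_amrfinder_cmd(fasta: str, name: str, out_tsv: str,
                        threads: int = 4,
                        organism: Optional[str] = None) -> List[str]:
    """Return the amrfinder argument list used in this notebook.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "amrfinder",
        "--nucleotide", fasta,
        "--name", name,
        "--plus",
        "--threads", str(threads),
        "-o", out_tsv,
    ]
    if organism is not None:
        # Assumed organism name; check `amrfinder --list_organisms` first.
        cmd += ["--organism", organism]
    return cmd


cmd = build_amrfinder_cmd("assembly.fasta", "VA61_2022",
                          "VA61_2022_amrfinder.tsv",
                          organism="Klebsiella_pneumoniae")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a failed run raises an error instead of passing silently.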


__Output format overview__

In [5]:
VA61_2022 = pl.read_csv(R + "/VA61_2022_amrfinder.tsv",
                        separator="\t")
print(f"Table dimensions: {VA61_2022.shape}")
display(VA61_2022.head(4))

#display(VA61_2022["Gene symbol"].unique().to_list())

Table dimensions: (40, 23)


Name,Protein identifier,Contig id,Start,Stop,Strand,Gene symbol,Sequence name,Scope,Element type,Element subtype,Class,Subclass,Method,Target length,Reference sequence length,% Coverage of reference sequence,% Identity to reference sequence,Alignment length,Accession of closest sequence,Name of closest sequence,HMM id,HMM description
str,str,i64,i64,i64,str,str,str,str,str,str,str,str,str,i64,i64,f64,f64,i64,str,str,str,str
"""VA61_2022""","""NA""",1,369761,370873,"""+""","""iroB""","""salmochelin bi…","""plus""","""VIRULENCE""","""VIRULENCE""","""NA""","""NA""","""BLASTX""",371,371,100.0,92.99,371,"""EOW04219.1""","""salmochelin bi…","""NA""","""NA"""
"""VA61_2022""","""NA""",1,371018,374662,"""+""","""iroC""","""salmochelin/en…","""plus""","""VIRULENCE""","""VIRULENCE""","""NA""","""NA""","""BLASTX""",1215,1219,99.75,90.62,1216,"""AUH19662.1""","""salmochelin/en…","""NA""","""NA"""
"""VA61_2022""","""NA""",1,377096,379267,"""-""","""iroN""","""siderophore sa…","""plus""","""VIRULENCE""","""VIRULENCE""","""NA""","""NA""","""BLASTX""",724,725,100.0,95.03,725,"""AAN76093.1""","""siderophore sa…","""NA""","""NA"""
"""VA61_2022""","""NA""",2,92428,93600,"""+""","""oqxA""","""multidrug effl…","""core""","""AMR""","""AMR""","""PHENICOL/QUINO…","""PHENICOL/QUINO…","""BLASTX""",391,391,100.0,91.3,391,"""WP_002914189.1…","""multidrug effl…","""NA""","""NA"""
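
The report mixes several kinds of hits (here `VIRULENCE` and `AMR` in the `Element type` column), so a quick tally per type helps before filtering. The sketch below uses only the standard library and a reduced, three-column copy of the four rows shown above; the helper name `count_by_column` is hypothetical, and a real report has all 23 columns.

```python
import csv
import io
from collections import Counter

# Reduced copy of the four rows displayed above (three of the 23 columns).
rows = [
    ("Gene symbol", "Scope", "Element type"),
    ("iroB", "plus", "VIRULENCE"),
    ("iroC", "plus", "VIRULENCE"),
    ("iroN", "plus", "VIRULENCE"),
    ("oqxA", "core", "AMR"),
]
tsv_text = "\n".join("\t".join(row) for row in rows)


def count_by_column(handle, column):
    """Count how often each value of `column` occurs in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(record[column] for record in reader)


counts = count_by_column(io.StringIO(tsv_text), "Element type")
print(counts)
```

The same function accepts an open file handle, so it can be pointed at a full `*_amrfinder.tsv` report.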


__Filter by ISP-reported carbapenemases__

- CP genes: KPC, VIM, NDM

In [8]:
VA61_F = VA61_2022.filter(pl.col("Gene symbol").str.contains("kpc|ndm|vim|KPC|NDM|VIM"))

print("ISP-reported proteins found = ", VA61_F.shape[0])
display(VA61_F.head(4))

ISP-reported proteins found =  1


Name,Protein identifier,Contig id,Start,Stop,Strand,Gene symbol,Sequence name,Scope,Element type,Element subtype,Class,Subclass,Method,Target length,Reference sequence length,% Coverage of reference sequence,% Identity to reference sequence,Alignment length,Accession of closest sequence,Name of closest sequence,HMM id,HMM description
str,str,i64,i64,i64,str,str,str,str,str,str,str,str,str,i64,i64,f64,f64,i64,str,str,str,str
"""VA61_2022""","""NA""",25,19154,20032,"""-""","""blaKPC-2""","""carbapenem-hyd…","""core""","""AMR""","""AMR""","""BETA-LACTAM""","""CARBAPENEM""","""ALLELEX""",293,293,100.0,100.0,293,"""WP_004199234.1…","""carbapenem-hyd…","""NA""","""NA"""
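
The filter pattern spells out each family in lower and upper case. A single case-insensitive pattern expresses the same intent more compactly; the sketch below shows the idea with the standard `re` module on a few gene symbols taken from the outputs above. The function name `is_reported_carbapenemase` is hypothetical.

```python
import re

# One case-insensitive pattern for the three ISP-reported families.
TARGET = re.compile(r"kpc|vim|ndm", re.IGNORECASE)


def is_reported_carbapenemase(gene_symbol: str) -> bool:
    """True if the gene symbol contains KPC, VIM or NDM in any letter case."""
    return TARGET.search(gene_symbol) is not None


symbols = ["iroB", "oqxA", "blaKPC-2"]
hits = [s for s in symbols if is_reported_carbapenemase(s)]
print(hits)
```

Polars regexes accept the inline flag `(?i)`, so a pattern such as `"(?i)kpc|vim|ndm"` should behave the same way inside `str.contains`.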


#### __Run AMRFinderPlus on all samples__

##### __Unicycler assemblies__

Assemblies built from Illumina short reads only

In [1]:
%%bash

# Samples assembled with Unicycler
samples=("VA1046_2020" "VA1184_2021" "VA1788_2021" "VA2464_2020" "VA418_2022" "VA692_2022"
         "VA1101_2021" "VA1565_2021" "VA2067_2020" "VA2588_2020" "VA585_2022")

# Run AMRFinderPlus on each assembly
for dir in "${samples[@]}"
do
    amrfinder \
        --nucleotide "/home/npilquinao/BEM-ISP/group_storage/npilquinao/ISP/2carbapenemase/assembly/${dir}/assembly.fasta" \
        --name "${dir}" \
        --plus \
        --threads 40 \
        -o ~/BEM-ISP/AMRfinder/"${dir}"_amrfinder.tsv
done

# Write the header once, taken from the first sample's report
head -n 1 ~/BEM-ISP/AMRfinder/VA1046_2020_amrfinder.tsv > ~/BEM-ISP/AMRfinder/complete_amrfinder.tsv

# Append the data rows (everything after the header) of every report
for dir in "${samples[@]}"
do
    tail -n +2 ~/BEM-ISP/AMRfinder/"${dir}"_amrfinder.tsv >> ~/BEM-ISP/AMRfinder/complete_amrfinder.tsv
done

Running: amrfinder --nucleotide /home/npilquinao/BEM-ISP/group_storage/npilquinao/ISP/2carbapenemase/assembly/VA1046_2020/assembly.fasta --name VA1046_2020 --plus --threads 40 -o /home/npilquinao/BEM-ISP/AMRfinder/VA1046_2020_amrfinder.tsv
Software directory: '/home/npilquinao/miniconda3/envs/ISP/bin/'
Software version: 3.12.8
Database directory: '/home/npilquinao/miniconda3/envs/ISP/share/amrfinderplus/data/2024-01-31.1'
Database version: 2024-01-31.1
AMRFinder translated nucleotide search
  - include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins
Running tblastn
Making report
AMRFinder took 2 seconds to complete
Running: amrfinder --nucleotide /home/npilquinao/BEM-ISP/group_storage/npilquinao/ISP/2carbapenemase/assembly/VA1184_2021/assembly.fasta --name VA1184_2021 --plus --threads 40 -o /home/npilquinao/BEM-ISP/AMRfinder/VA1184_2021_amrfinder.tsv
Software directory: '/home/npilquinao/miniconda3/envs/ISP/bin/'
Software version: 3.12.8
Da
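
The `head`/`tail` merge in the cell above can also be written in Python. The sketch below uses only the standard library: it keeps the header of the first report it finds and appends the data rows of every report. The function name `merge_tsv_reports` is hypothetical, and skipping missing or empty reports is a design choice of this sketch, not something the bash loop does.

```python
from pathlib import Path


def merge_tsv_reports(report_paths, merged_path):
    """Concatenate TSV reports, keeping only the first header line.

    Returns the number of data rows written to `merged_path`.
    """
    rows_written = 0
    header_written = False
    with open(merged_path, "w", newline="") as out:
        for path in report_paths:
            report = Path(path)
            if not report.is_file():
                continue  # no report for this sample
            lines = report.read_text().splitlines()
            if not lines:
                continue  # empty file, nothing to copy
            if not header_written:
                out.write(lines[0] + "\n")
                header_written = True
            for line in lines[1:]:
                out.write(line + "\n")
                rows_written += 1
    return rows_written
```

Called with the list of per-sample `*_amrfinder.tsv` paths, it produces the same kind of file as `complete_amrfinder.tsv`, and the returned row count gives a quick sanity check against the table dimensions read later.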

In [1]:
AMRfinder = pl.read_csv(R + "/complete_amrfinder.tsv",
                        separator="\t")

### Carbapenemase gene identifiers ###

carbapenemase = ["KPC",     # KPC (Klebsiella pneumoniae carbapenemase)
                 "VIM",     # VIM (Verona integron-encoded metallo-β-lactamase)
                 "IMP",     # IMP (Imipenemase)
                 "NDM",     # NDM (New Delhi metallo-β-lactamase)
                 "OXA-48",  # OXA-48 (Oxacillinase-48)
                 "AIM",     # AIM (Australian Imipenemase)
                 "SPM",     # SPM (São Paulo metallo-β-lactamase)
                 "SIM",     # SIM (Seoul Imipenemase)
                 "GES"]     # GES (Guiana Extended-Spectrum β-lactamase)

# Keep rows whose gene symbol contains any of the identifiers above (case-sensitive)
AMRfinder_filtered = AMRfinder.filter(pl.col("Gene symbol").str.contains("|".join(carbapenemase)))
print(f"Dimensions: {AMRfinder_filtered.shape}")
display(AMRfinder_filtered["Name"].to_list())

Dimensions: (15, 23)


['VA1046_2020',
 'VA1184_2021',
 'VA1788_2021',
 'VA2464_2020',
 'VA418_2022',
 'VA692_2022',
 'VA692_2022',
 'VA1101_2021',
 'VA1101_2021',
 'VA1565_2021',
 'VA2067_2020',
 'VA2067_2020',
 'VA2588_2020',
 'VA585_2022',
 'VA585_2022']
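
Several sample names appear twice in the list above, meaning more than one matching gene was reported for those assemblies. A short tally makes this explicit; the sketch below copies the 15 names from the output and counts them with `collections.Counter`.

```python
from collections import Counter

# Sample names copied from the filtered table above (one entry per hit).
names = [
    "VA1046_2020", "VA1184_2021", "VA1788_2021", "VA2464_2020",
    "VA418_2022", "VA692_2022", "VA692_2022", "VA1101_2021",
    "VA1101_2021", "VA1565_2021", "VA2067_2020", "VA2067_2020",
    "VA2588_2020", "VA585_2022", "VA585_2022",
]

hits_per_sample = Counter(names)

# Samples with more than one matching gene, in order of first appearance.
multi_hit = [sample for sample, n in hits_per_sample.items() if n > 1]

print(len(hits_per_sample), "samples with at least one hit")
print("More than one hit:", multi_hit)
```

For these names the tally gives 11 samples, four of which (`VA692_2022`, `VA1101_2021`, `VA2067_2020`, `VA585_2022`) have two hits each.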

In [2]:
AMRfinder_filtered.write_csv("~/BEM-ISP/AMRfinder/AMRfinder_only_carbapenemase.tsv", separator="\t")