# Module 10: AMR 

## Overview

One of the benefits of whole genome sequencing of bacterial pathogens is that you capture the genomic inventory of the organism. Clinical microbiology has capitalised on this for the in silico prediction of antibiotic resistance directly from whole genome sequencing data. In laboratories where microorganisms are routinely sequenced, this approach is being developed as an alternative to phenotypic sensitivity testing.

For many microorganisms the genetic basis of antibiotic resistance has been extensively studied. The genes responsible for resistance have been identified and sequenced, so they can be compiled into a database of resistance determinants, which is then used to query an organism’s genome and define its resistome. Based on the presence or absence of genes or mutations, it is possible to predict the antibiotic sensitivities of an organism.
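The presence/absence idea described above can be illustrated with a minimal Python sketch. The lookup table and the function name `predict_resistance` are hypothetical and exist only for illustration; real resistance databases are much larger and curated, and a real prediction also has to account for point mutations and sequence identity.

```python
# Hypothetical, deliberately tiny lookup table: gene name -> antibiotic class.
# Real databases (for example ResFinder or CARD) are far larger and curated.
RESISTANCE_DETERMINANTS = {
    "tet(M)": "tetracycline",
    "mef(A)": "macrolide",
    "msr(D)": "macrolide",
}


def predict_resistance(detected_genes):
    """Return the sorted antibiotic classes predicted from the detected genes."""
    classes = set()
    for gene in detected_genes:
        drug_class = RESISTANCE_DETERMINANTS.get(gene)
        if drug_class is not None:  # genes missing from the table are ignored
            classes.add(drug_class)
    return sorted(classes)


# "gyrA" is not in the toy table, so it contributes nothing to the prediction.
print(predict_resistance(["tet(M)", "mef(A)", "gyrA"]))
# → ['macrolide', 'tetracycline']
```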

### Install condacolab

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

### Install software

In [None]:
# Install ABRicate
!conda install -c conda-forge -c bioconda -c defaults abricate

In [None]:
# Install any2fasta
!conda install -c bioconda any2fasta

In [None]:
# Install blast
!conda install bioconda::blast

### Download data

In [None]:
!wget https://zenodo.org/records/13750987/files/Module_10.tar.gz

### Extract the .tar.gz file 

In [None]:
!tar xvf Module_10.tar.gz

## Screening for AMR genes using ABRicate

[ABRicate](https://github.com/tseemann/abricate/tree/master) carries out mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: NCBI, CARD, ARG-ANNOT, Resfinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB.

When using ABRicate, keep in mind that:

- It only supports contigs, not FASTQ reads
- It only detects acquired resistance genes, NOT point mutations
- It uses a DNA sequence database, not protein
- It needs BLAST+ >= 2.7 and any2fasta to be installed
- It's written in Perl

ABRicate accepts any sequence file that the tool any2fasta can convert to FASTA (e.g. GenBank, EMBL), and the files can optionally be gzip or bzip2 compressed. ABRicate comes with some pre-downloaded databases, which can be listed using the command:

In [None]:
# View the list of available databases
!abricate --list

The default database is ncbi but you can choose a different database using the `--db` option, for example:

In [None]:
# Select the database to use
!abricate --db ncbi --quiet contigs.fasta

In this section, we will analyze data from the Whole Genome Shotgun Sequencing Project at Universidade Federal do Rio de Janeiro under project [PRJNA1086968](https://www.ebi.ac.uk/ena/browser/view/PRJNA1086968). In this case, we will use the genome assembly contig set with accession number [SAMN41581986](https://www.ebi.ac.uk/ena/browser/view/SAMN41581986?show=related-records).

### To execute ABRicate on contigs of a single strain, we will use the command: 

In [None]:
# Run ABRicate
!abricate --db resfinder --quiet contigs.fasta > results.tab

An explanation of this command is as follows:

`abricate` is the tool

`--db resfinder` specifies the database

`--quiet` suppresses progress messages on the screen

`contigs.fasta` is the input file

`> results.tab` redirects the output to the file `results.tab`

View the output of the above command (open the results.tab file):
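One way to inspect the file from Python is sketched below. It assumes that `results.tab` is in the current working directory and that it uses ABRicate's tab-separated layout, with a header line starting with `#FILE` and columns such as `GENE`, `%COVERAGE` and `%IDENTITY`. The helper name `read_abricate_table` is hypothetical.

```python
import csv
import os


def read_abricate_table(lines):
    """Parse ABRicate tab-separated output into a list of dicts (one per hit)."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        # Drop the leading '#' of the first header (#FILE) so keys are uniform.
        rows.append(
            {key.lstrip("#"): value for key, value in row.items() if key is not None}
        )
    return rows


# Assumption: results.tab was written to the current working directory.
if os.path.exists("results.tab"):
    with open("results.tab", newline="") as handle:
        hits = read_abricate_table(handle)
    for hit in hits:
        # Print one line per hit: gene name, percent coverage, percent identity.
        print(hit.get("GENE"), hit.get("%COVERAGE"), hit.get("%IDENTITY"), sep="\t")
```

Each row of the table is one hit of a database gene against a contig, so a strain with three resistance genes produces three rows.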

These results indicate that this strain carries the Tet(M), Msr(D) and mef(A) genes. Tet(M) is a marker for tetracycline resistance, while Msr(D) and mef(A) are markers for macrolide resistance. You can read more [here](https://www.sciencedirect.com/topics/medicine-and-dentistry/penicillin-binding-protein).

### To execute ABRicate on contigs of multiple strains, navigate to the contigs folder and run the command below:

>**Note**: In this module, we will not run the multi-strain analysis because of the limited resources in Colab. However, here is an example of how to do it.

In [None]:
# Do not execute
# Run ABRicate on all contigs files
#!abricate --db resfinder --quiet *contigs.fasta > results.tab

`*contigs.fasta` indicates multiple contig files. 
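When several strains are screened at once, each row of the combined output records which file a hit came from, so the results can be condensed into a gene presence/absence table per strain. The sketch below shows one way to do this in Python. The function name `presence_absence` and the file names in the example are hypothetical. ABRicate also provides a `--summary` option for combining reports, which is likely the more convenient route in practice.

```python
from collections import defaultdict


def presence_absence(hits):
    """Build {file: {gene: True/False}} from an iterable of (file, gene) pairs."""
    genes_by_file = defaultdict(set)
    for file_name, gene in hits:
        genes_by_file[file_name].add(gene)

    # Collect every gene seen in any file so all rows share the same columns.
    all_genes = sorted(set().union(*genes_by_file.values()))
    return {
        file_name: {gene: gene in genes for gene in all_genes}
        for file_name, genes in genes_by_file.items()
    }


# Hypothetical example input: (file name, gene) pairs taken from a combined report.
example_hits = [
    ("strainA_contigs.fasta", "tet(M)"),
    ("strainA_contigs.fasta", "mef(A)"),
    ("strainB_contigs.fasta", "tet(M)"),
]
matrix = presence_absence(example_hits)
for file_name in sorted(matrix):
    print(file_name, matrix[file_name])
```

Note that a strain with no hits at all does not appear in the combined report, so it would also be missing from this table.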

*Adapted from:*

- Advanced Bioinformatics Course developed for the GPS and JUNO projects - Wellcome Sanger Institute

*Modified by Luisa Sacristán (Universidad de los Andes-CABANA)*