![](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/7eae1fa5d5c73edb3bc1b5ee92ca23c1cb070db4/course_data/WCS_ACORN_Logo.png)




# WCS ACORN - Bioinformatics for Antimicrobial Resistance

Hello! I hope you found the 'Practical Insights into Bacterial Genomic Annotation' lecture useful. Make sure to reach out if you have **any** questions.

We are now going to test our bioinformatics skills by annotating three *Enterococcus faecium* genomes and comparing their AMR potential.

# Table of contents <!-- omit in toc -->
- [Command Line](#command-line)
  - [Setting up our computational environment](#setting-up-our-computational-environment)
  - [Download data](#download-data)
  - [Annotate genomes using Prokka](#annotate-genomes-using-prokka)
  - [Find genes associated with Vancomycin](#find-genes-associated-with-vancomycin)
  - [Discussion](#discussion)
- [Web Based](#web-based)
  - [Loading Genomes](#loading-genomes)
  - [Running Bakta](#running-bakta)
  - [Visualising the Annotation](#visualising-the-annotation)
  - [Final Discussion](#final-discussion)

# Command Line
## Setting up our computational environment




In [None]:
!python --version

Python 3.10.12


Let's get started by setting up `conda`, our software installation manager.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:14
🔁 Restarting kernel...


Now we can easily install [`prokka`](https://github.com/tseemann/prokka) using [`conda`](https://anaconda.org/bioconda/prokka).

In [None]:
!conda install -c bioconda prokka

Channels:
 - bioconda
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json):

## Download data
Let's retrieve the genomes we are going to annotate. They come as a compressed archive, so we need to extract them before we can use them.

In [None]:
!wget https://wcs_data_transfer.cog.sanger.ac.uk/genomes.tar.gz
!tar -zxvf /content/genomes.tar.gz

--2024-05-28 22:29:18--  https://wcs_data_transfer.cog.sanger.ac.uk/genomes.tar.gz
Resolving wcs_data_transfer.cog.sanger.ac.uk (wcs_data_transfer.cog.sanger.ac.uk)... 193.62.203.62, 193.62.203.63, 193.62.203.61
Connecting to wcs_data_transfer.cog.sanger.ac.uk (wcs_data_transfer.cog.sanger.ac.uk)|193.62.203.62|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2705193 (2.6M) [application/gzip]
Saving to: ‘genomes.tar.gz’


2024-05-28 22:29:18 (28.6 MB/s) - ‘genomes.tar.gz’ saved [2705193/2705193]

genomes/
genomes/SRR16146183.fasta
genomes/SRR11910820.fasta
genomes/SRR16866601.fasta
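
Before annotating, it can help to check that each file extracted correctly. The following is a minimal sketch, assuming the archive was unpacked into a `genomes` folder in the current working directory as shown above. `count_contigs` and `summarise_genomes` are hypothetical helper names, not part of any tool used in this course. It counts the contigs in each assembly by counting FASTA header lines.

```python
from pathlib import Path


def count_contigs(fasta_path):
    """Count sequence records in a FASTA file by counting '>' header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarise_genomes(genome_dir="genomes"):
    """Return {file name: contig count} for every .fasta file in genome_dir."""
    return {
        path.name: count_contigs(path)
        for path in sorted(Path(genome_dir).glob("*.fasta"))
    }


if __name__ == "__main__":
    # Assumes the extracted folder is ./genomes; prints nothing if it is absent.
    for name, n_contigs in summarise_genomes().items():
        print(f"{name}\t{n_contigs} contigs")
```

A file that reports zero contigs is probably empty or truncated and worth downloading again.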


## Annotate genomes using Prokka
We have to annotate these three *E. faecium* genomes. This step will take a while, so feel free to look over the subsequent steps while it runs.

In [None]:
!prokka --outdir SRR16146183 /content/genomes/SRR16146183.fasta
!prokka --outdir SRR11910820 /content/genomes/SRR11910820.fasta
!prokka --outdir SRR16866601 /content/genomes/SRR16866601.fasta

[22:30:34] This is prokka 1.14.6
[22:30:34] Written by Torsten Seemann <torsten.seemann@gmail.com>
[22:30:34] Homepage is https://github.com/tseemann/prokka
[22:30:34] Local time is Tue May 28 22:30:34 2024
[22:30:34] You are not telling me who you are!
[22:30:34] Operating system is linux
[22:30:34] You have BioPerl 1.7.8
Argument "1.7.8" isn't numeric in numeric lt (<) at /usr/local/bin/prokka line 259.
[22:30:34] System has 2 cores.
[22:30:34] Option --cpu asked for 8 cores, but system only has 2
[22:30:34] Will use maximum of 2 cores.
[22:30:34] Annotating as >>> Bacteria <<<
[22:30:34] Generating locus_tag from '/content/genomes/SRR16146183.fasta' contents.
[22:30:34] Setting --locustag KIEFAGIL from MD5 42efa025cabb28705138cb0bc6e6f60b
[22:30:34] Creating new output folder: SRR16146183
[22:30:34] Running: mkdir -p SRR16146183
[22:30:34] Using filename prefix: PROKKA_05282024.XXX
[22:30:34] Setting HMMER_NCPU=1
[22:30:34] Writing log to: SRR16146183/PROKKA_05282024.log
[22:30:34] 
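
The log above shows Prokka naming its output files with a date-based prefix (`PROKKA_05282024`), so the file names depend on the day the cell is run. The following is a minimal sketch, assuming you would rather name the outputs after each sample. It builds one Prokka command per genome using Prokka's `--outdir`, `--prefix` and `--cpus` options. `build_prokka_command` is a hypothetical helper, and the sketch only prints the commands rather than running them.

```python
from pathlib import Path


def build_prokka_command(fasta_path, cpus=2):
    """Build a Prokka command whose output folder and file prefix are the sample name."""
    sample = Path(fasta_path).stem  # e.g. "SRR16146183"
    return [
        "prokka",
        "--outdir", sample,
        "--prefix", sample,  # files become <sample>/<sample>.gff, not PROKKA_<date>.gff
        "--cpus", str(cpus),
        str(fasta_path),
    ]


if __name__ == "__main__":
    # Assumes the genomes were extracted into ./genomes; prints one command per file.
    for fasta in sorted(Path("genomes").glob("*.fasta")):
        print(" ".join(build_prokka_command(fasta)))
```

If you adopted this naming, the `grep` commands in the next section would need to point at `<sample>/<sample>.gff` instead of `PROKKA_05282024.gff`.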

## Find genes associated with Vancomycin

Resistance to vancomycin requires carriage of both the gene encoding the resistance protein (e.g. *vanA*) and the gene encoding the vancomycin resistance response regulator transcription factor (e.g. *vanR-A*).

Let's see whether our sample genomes carry these genes.

In [None]:
!grep vanR-A /content/SRR16866601/PROKKA_05282024.gff
!grep vanA /content/SRR16866601/PROKKA_05282024.gff

grep: /content/SRR16866601/PROKKA_05282024.gff: No such file or directory
grep: /content/SRR16866601/PROKKA_05282024.gff: No such file or directory
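
A `No such file or directory` message means `grep` found no GFF file at that exact path. Possible reasons include the Prokka run for that genome not having finished, or the annotation having been run on a different date, since the default file prefix contains the run date. The following is a minimal sketch, assuming the Prokka output folders sit directly under `/content`. It locates the GFF files by pattern instead of by a hard-coded name. `find_gff_files` and `lines_matching` are hypothetical helpers.

```python
from pathlib import Path


def find_gff_files(base_dir="/content"):
    """Map each sample folder to the Prokka GFF files found inside it."""
    found = {}
    for gff in sorted(Path(base_dir).glob("*/PROKKA_*.gff")):
        found.setdefault(gff.parent.name, []).append(gff)
    return found


def lines_matching(gff_path, term):
    """Return the lines of a GFF file that contain term (similar to grep)."""
    with open(gff_path) as handle:
        return [line.rstrip("\n") for line in handle if term in line]


if __name__ == "__main__":
    # Prints nothing if no matching files exist under base_dir.
    for sample, gff_files in find_gff_files().items():
        for gff in gff_files:
            for term in ("vanR-A", "vanA"):
                for hit in lines_matching(gff, term):
                    print(sample, term, hit, sep="\t")
```

A sample folder that is missing from the result has no GFF file yet, which is a cue to check whether its annotation completed.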


In [None]:
!grep vanR-A /content/SRR11910820/PROKKA_05282024.gff
!grep vanA /content/SRR11910820/PROKKA_05282024.gff

In [None]:
!grep vanR-A /content/SRR16146183/PROKKA_05282024.gff
!grep vanA /content/SRR16146183/PROKKA_05282024.gff

NODE_80_length_7887_cov_70.835003	Prodigal:002006	CDS	1254	2285	.	+	0	ID=JOBCPLHH_02544;eC_number=6.1.2.1;Name=vanA;gene=vanA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P25051;locus_tag=JOBCPLHH_02544;product=Vancomycin/teicoplanin A-type resistance protein VanA
NODE_93_length_5335_cov_35.576806	Prodigal:002006	CDS	3191	4222	.	+	0	ID=KIEFAGIL_02668;eC_number=6.1.2.1;Name=vanA;gene=vanA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P25051;locus_tag=KIEFAGIL_02668;product=Vancomycin/teicoplanin A-type resistance protein VanA
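
Each hit above is a single GFF3 line: nine tab-separated columns, with the last column holding `key=value` attributes separated by semicolons. The following is a minimal sketch of pulling out the contig, coordinates and gene name, using one of the lines printed above as input. `parse_gff_line` is a hypothetical helper, and it assumes attribute values are not URL-encoded.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into its main fields plus selected attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    attributes = dict(
        item.split("=", 1) for item in fields[8].split(";") if "=" in item
    )
    return {
        "contig": fields[0],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "gene": attributes.get("gene"),
        "locus_tag": attributes.get("locus_tag"),
        "product": attributes.get("product"),
    }


# One of the vanA hits shown above, with tabs written out explicitly.
example = (
    "NODE_93_length_5335_cov_35.576806\tProdigal:002006\tCDS\t3191\t4222\t.\t+\t0\t"
    "ID=KIEFAGIL_02668;eC_number=6.1.2.1;Name=vanA;gene=vanA;"
    "inference=ab initio prediction:Prodigal:002006,"
    "similar to AA sequence:UniProtKB:P25051;"
    "locus_tag=KIEFAGIL_02668;"
    "product=Vancomycin/teicoplanin A-type resistance protein VanA"
)
record = parse_gff_line(example)
print(record["contig"], record["gene"], record["start"], record["end"], sep="\t")
```

The `locus_tag` prefix can help trace a hit back to its genome: the Prokka log earlier shows the prefix `KIEFAGIL` being assigned to `SRR16146183`.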


## Discussion

It seems that two genomes in the sample dataset carry *vanA* but not the regulator *vanR-A*.

This may indicate that none of these genomes carry resistance to vancomycin. Nevertheless, Prokka is a generalist annotation tool; perhaps we should annotate these genomes with Bakta, a tool developed specifically to annotate bacterial genomes.

Bakta, like Prokka, can be installed with conda, but it requires downloading a large database. We will therefore use Proksee (proksee.ca), a web-based tool, to run Bakta.

1. Which genomes carry *vanA*?
2. On which contigs do they carry *vanA*?


# Web Based

## Loading Genomes

Open your web browser and go to [Proksee](https://proksee.ca/). Here you will be able to load your genomes (you will need to create a new project for each genome).

> Note: If you do not create an account, your projects will be removed after 7 days.

* Click on `Browse`
* Once the genome FASTA file is selected, click on `Create Map`



![Browse](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/main/course_data/4_June_Day_5/browse.jpg)



After processing your genome files, Proksee will show the contigs in circular form.

> **This does not mean the software has circularised your genome. The order of the contigs in the circle is meaningless**

## Running Bakta

In the right panel, you will find Bakta at the top under `Genome Annotation` tools. Click on the `Start` button.




![](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/main/course_data/4_June_Day_5/bakta.jpg)

A window will appear with options.

* Select Gram positive under `Gram type`
* Click on `OK`

![](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/main/course_data/4_June_Day_5/start_bakta.jpg)

This will start the annotation process for your input genome. You can wait for the process to finish, or open a new tab and start the annotation process for the other genomes.

Upon completion you will be shown the Bakta results. You can choose to download the annotation results in `gff3`, `tsv`, `json`, or `gbff` formats.

## Visualising the Annotation

For this exercise, we will add the annotation results to the circular genome view by clicking on `Add Features to Map`.

![](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/main/course_data/4_June_Day_5/add_map.jpg)

Then give the annotation results a Track name of your choice. Next click on `OK`.

![](https://raw.githubusercontent.com/WCSCourses/ACORN-ClinAMR/main/course_data/4_June_Day_5/track.jpg)

This will load the results of the Bakta annotation over the contigs. Now you can explore the genetic content of the input genomes by zooming in and out of the regions of interest.

## Final Discussion
When you annotated your genomes using Prokka, you were able to identify which contigs carry vanA.

1. Can you find *vanA* in the genomes annotated with Bakta?
2. Do the results from Bakta validate the Prokka results? Hint: You can also run Prokka in Proksee.
3. Can you confirm whether any of these genomes carry resistance to vancomycin? Hint: Sometimes the regulator *vanR-A* can be misannotated as *walR*.
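
If you download an annotation in `gff3` format, the questions above can also be explored outside the browser. The following is a minimal sketch, assuming the downloaded file is standard GFF3 with `gene` (or `Name`) and `product` attributes. The file name `bakta_annotation.gff3` and the helper `find_features` are hypothetical. Because a regulator may carry a different gene name, it matches on gene names and on a keyword in the product description.

```python
from pathlib import Path


def find_features(gff_lines, genes=("vanA", "vanR-A", "walR"), keywords=("vancomycin",)):
    """Yield (contig, gene, product) for features matching a gene name or product keyword."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section begins; no more feature lines follow
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        gene = attrs.get("gene") or attrs.get("Name", "")
        product = attrs.get("product", "")
        if gene in genes or any(k.lower() in product.lower() for k in keywords):
            yield fields[0], gene, product


if __name__ == "__main__":
    gff_path = Path("bakta_annotation.gff3")  # hypothetical name for the downloaded file
    if gff_path.exists():
        with open(gff_path) as handle:
            for contig, gene, product in find_features(handle):
                print(contig, gene, product, sep="\t")
```

Reporting the contig for each hit makes it possible to see whether a `walR` annotation sits on the same contig as *vanA*, which is one thing worth checking when following the hint in question 3.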