# Data Environment Setup

Open the terminal and make a directory for the results of your analyses. Replace `{your_name_here}` with your user name:

```
$ mkdir -p ~/storage/user_lab/{your_name_here}/annotations
```


In my case this would be:
```
$ mkdir -p ~/storage/user_lab/mpachiadaki/annotations
```


Within this directory, create two more directories: one for prokka and one for DRAM.

```
$ mkdir -p ~/storage/user_lab/{your_name_here}/annotations/prokka

$ mkdir -p ~/storage/user_lab/{your_name_here}/annotations/DRAM
```
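
The directory setup above can also be done in one step. The sketch below is one possible way to do it, assuming a POSIX shell. The variable names `LAB_USER` and `ANNOT_DIR` are hypothetical and only serve to avoid retyping the path.

```shell
# Hypothetical variable names; set LAB_USER to your own user name.
LAB_USER="${LAB_USER:-your_name_here}"
ANNOT_DIR="${ANNOT_DIR:-$HOME/storage/user_lab/$LAB_USER/annotations}"

# mkdir -p creates missing parent directories and does not fail
# if a directory already exists, so this is safe to re-run.
mkdir -p "$ANNOT_DIR/prokka" "$ANNOT_DIR/DRAM"

# List the subdirectories to confirm that both are there.
ls -d "$ANNOT_DIR"/*/
```

Storing the path in a variable means later commands can refer to `"$ANNOT_DIR/prokka"` instead of repeating the full path.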

# Software Environment Setup

The software needed for this tutorial is listed in the conda environment config files `./prokka_env.yml` and `./dram_env.yml`.

Both environments are pre-installed on the JupyterHub.

To activate prokka, type the following in the terminal:

```
$ source activate prokka
```
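
To confirm that the activation worked, you can check that the program is on your `PATH`. This is a minimal sketch using the standard shell builtin `command -v`. The helper name `check_tool` is hypothetical.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; is the right conda environment active?" >&2
        return 1
    fi
}

# Check for prokka; "|| true" keeps the shell session going if it is missing.
check_tool prokka || true
```

The same helper can be reused later for DRAM with `check_tool DRAM.py`.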

# Run the analysis

## Prokka

We are going to analyse a small subset of SAGs with prokka: the 19 genomes, belonging to the Alphaproteobacterial groups Pelagibacterales (SAR11) and HIMB59, that you are going to use for the pangenomic analysis with Florian. The identifiers for these genomes are in `~/storage/data/identifiers/AG-910_forpan.ids`. To see what is there you can type:

```
less ~/storage/data/identifiers/AG-910_forpan.ids
```
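
Besides paging through the file, it can help to count the identifiers before looping over them. The sketch below assumes the file holds one identifier per line. `IDS_FILE` and `count_ids` are hypothetical names.

```shell
# Hypothetical variable; points at the identifier file used in this tutorial.
IDS_FILE="${IDS_FILE:-$HOME/storage/data/identifiers/AG-910_forpan.ids}"

# Hypothetical helper: count the non-empty lines of a file.
count_ids() {
    grep -c . "$1"
}

if [ -f "$IDS_FILE" ]; then
    echo "identifiers: $(count_ids "$IDS_FILE")"
    # Show the first few identifiers.
    head -n 5 "$IDS_FILE"
else
    echo "not found: $IDS_FILE"
fi
```

If the file matches the description above, the count should equal the number of genomes in the subset.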

Before we start, we can see the full list of prokka options by typing:

```
prokka --help
```

We will write a small bash loop to run one SAG at a time. In the following command, `${i}` is the loop variable: in every iteration, it takes the value of one of the identifiers listed in `~/storage/data/identifiers/AG-910_forpan.ids`.

```
$ for i in $(cat ~/storage/data/identifiers/AG-910_forpan.ids); do prokka --outdir ~/storage/user_lab/mpachiadaki/annotations/prokka/${i} --prefix ${i} --locustag ${i} --cpus 4 ~/storage/data/contigs/AG-910/${i}_contigs.fasta; done
```
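
The loop above can also be written as a dry run that prints each command instead of executing it, which makes it easy to check the paths first. The sketch below is one way to do that, under the assumption that you want to skip SAGs that already have an output directory (prokka refuses to write into an existing `--outdir` unless `--force` is given). The helper name `print_prokka_cmds` is hypothetical.

```shell
# Hypothetical helper: print one prokka command per identifier (dry run).
# Arguments: identifier file, contigs directory, output directory.
print_prokka_cmds() {
    ids_file=$1
    contig_dir=$2
    out_dir=$3
    # Read one identifier per line; also handle a last line without a newline.
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines in the identifier file.
        [ -n "$id" ] || continue
        # Skip SAGs whose output directory already exists.
        if [ -d "$out_dir/$id" ]; then
            echo "# skipping $id: $out_dir/$id already exists"
            continue
        fi
        echo "prokka --outdir $out_dir/$id --prefix $id --locustag $id --cpus 4 $contig_dir/${id}_contigs.fasta"
    done < "$ids_file"
}
```

Called as `print_prokka_cmds ~/storage/data/identifiers/AG-910_forpan.ids ~/storage/data/contigs/AG-910 ~/storage/user_lab/{your_name_here}/annotations/prokka`, it prints the commands for review. Once they look right, the printed lines can be pasted into the terminal.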

While we are waiting for the results we can check out the various file formats that prokka produces.


In class, we will run the command for our favorite SAG only. Here is an example for mine, which is AG-910-K02:

```
$ prokka --outdir ~/storage/user_lab/mpachiadaki/annotations/prokka/AG-910-K02 --prefix AG-910-K02 --locustag AG-910-K02 --cpus 4 ~/storage/data/contigs/AG-910/AG-910-K02_contigs.fasta
```
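
Once a run has finished, a quick sanity check is to count what prokka predicted. The sketch below assumes the standard prokka outputs `PREFIX.faa` (protein FASTA) and `PREFIX.gff` (annotations). The helper names `count_proteins` and `count_cds` are hypothetical.

```shell
# Hypothetical helper: count sequences in a FASTA file (header lines start with ">").
count_proteins() {
    grep -c '^>' "$1"
}

# Hypothetical helper: count CDS features in a GFF file (third tab-separated column).
count_cds() {
    awk -F '\t' '$3 == "CDS" { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_proteins ~/storage/user_lab/{your_name_here}/annotations/prokka/AG-910-K02/AG-910-K02.faa` prints the number of predicted proteins for that SAG. The two counts would normally be expected to agree.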

## DRAM

Before we start this analysis, let's deactivate the previous conda environment and activate the DRAM one:

```
$ conda deactivate
$ source activate DRAM
```

DRAM takes a long time to run, so each of you will only run it on one SAG. Pick your favorite or a random SAG identifier (in my case AG-910-K02) and type the following command. We set the `--threads` parameter to 1 so that all our jobs can run in parallel; you can increase this if you are working on your own cluster. We also lower `--min_contig_size` to 2000 (the default is 2500) so that contigs between 2000 and 2500 bp are annotated as well.

```
$ DRAM.py annotate -i ~/storage/data/contigs/AG-910/AG-910-K02_contigs.fasta --min_contig_size 2000 --threads 1 -o ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02
```
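
If you would rather let the computer pick a SAG at random, the sketch below shows one way to do it with `awk` and then print the matching DRAM command as a dry run. Both helper names, `pick_random_id` and `print_dram_annotate_cmd`, are hypothetical. The flags are the ones used in the command above.

```shell
# Hypothetical helper: print one random non-empty line from a file.
# srand() seeds from the clock, so two calls in the same second may agree.
pick_random_id() {
    awk 'BEGIN { srand() } NF { ids[++n] = $0 } END { if (n) print ids[int(rand() * n) + 1] }' "$1"
}

# Hypothetical helper: print (not run) the DRAM annotate command for one SAG.
# Arguments: SAG identifier, contigs directory, DRAM output directory.
print_dram_annotate_cmd() {
    id=$1
    contig_dir=$2
    out_dir=$3
    echo "DRAM.py annotate -i $contig_dir/${id}_contigs.fasta --min_contig_size 2000 --threads 1 -o $out_dir/$id"
}
```

A possible usage is `id=$(pick_random_id ~/storage/data/identifiers/AG-910_forpan.ids)` followed by `print_dram_annotate_cmd "$id" ~/storage/data/contigs/AG-910 ~/storage/user_lab/{your_name_here}/annotations/DRAM`.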

After the annotation is done, we can run the distillation step, which produces summaries of our results.

```
$ DRAM.py distill -i ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02/annotations.tsv -o  ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02/summary --rrna_path  ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02/AG-910-K02/rrnas.tsv --trna_path  ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02/AG-910-K02/trnas.tsv
```

If the annotation of your SAG did not produce an `rrnas.tsv` file, omit the corresponding flag (`--rrna_path ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02/AG-910-K02/rrnas.tsv`). It is always good practice to check your output files by typing `ls -lh ~/storage/user_lab/mpachiadaki/annotations/DRAM/AG-910-K02` before proceeding. Also remember to check the help menu before writing your command. For DRAM you can write:

```
DRAM.py annotate --help
DRAM.py distill --help
```
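
The advice above about omitting `--rrna_path` when there is no `rrnas.tsv` can be automated. The sketch below builds the distill command and adds each optional flag only if its file exists. It prints the command rather than running it. The helper name `print_distill_cmd` is hypothetical, and treating `--trna_path` the same way as `--rrna_path` is an assumption, not something the tutorial states.

```shell
# Hypothetical helper: print (not run) a DRAM distill command.
# Arguments: annotations.tsv, summary output directory, rrnas.tsv path, trnas.tsv path.
print_distill_cmd() {
    annotations=$1
    summary_dir=$2
    rrnas=$3
    trnas=$4
    cmd="DRAM.py distill -i $annotations -o $summary_dir"
    # Add --rrna_path only when the rRNA table was actually produced.
    if [ -f "$rrnas" ]; then
        cmd="$cmd --rrna_path $rrnas"
    fi
    # Assumption: handle the tRNA table the same way.
    if [ -f "$trnas" ]; then
        cmd="$cmd --trna_path $trnas"
    fi
    echo "$cmd"
}
```

Pass the same four paths that appear in the distill command above, review the printed line, and then paste it into the terminal.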