# Miscellaneous code and commands used in our study of intrinsic resistance in enterococci

### Data retrieval, preprocessing and quality checks

- Download metadata for the *Enterococcus* genome entries in the NCBI Assembly database

```bash
./utils/download_ncbi_assembly_metadata.sh enterococcus enterococcus_ncbi_assembly_metadata.tsv
```

- Download the genome (FASTA), protein (FASTA) and annotation (GFF) files for each assembly

```python
import subprocess

# Output folder for each NCBI file type (suffix appended to the file prefix)
TARGETS = {
    'data/ncbi/genome': '_genomic.fna.gz',
    'data/ncbi/protein': '_protein.faa.gz',
    'data/ncbi/annotation': '_genomic.gff.gz',
}

with open('data/enterococcus_ncbi_assembly_metadata.tsv', 'r') as f:
    for line in f:
        if not line.strip():
            continue
        flds = line.rstrip('\n').split('\t')
        acc = flds[6]
        asm = flds[4]
        asm_name = acc + '_' + asm  # file prefix: <accession>_<assembly name>
        ftp_url = flds[17]
        for out_dir, suffix in TARGETS.items():
            # e.g. <ftp_url>/<acc>_<asm>_genomic.fna.gz (single underscore before the suffix)
            url = '{}/{}{}'.format(ftp_url, asm_name, suffix)
            subprocess.run(['wget', url, '-P', out_dir])
```
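
Each NCBI assembly directory also provides an `md5checksums.txt` file listing the MD5 digest of every file in it. The loop above does not fetch it, so this is an optional integrity check: a minimal sketch, assuming you additionally download `md5checksums.txt` for an assembly into the same folder as the files you want to verify. The helper names `md5sum` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(md5_file, download_dir):
    """Return names of files in download_dir whose MD5 differs from md5_file.

    Lines are assumed to look like '<md5>  ./<file name>', as in NCBI's
    md5checksums.txt; files listed there but absent locally are skipped.
    """
    mismatches = []
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        local = Path(download_dir) / Path(name.strip()).name
        if local.exists() and md5sum(local) != expected.lower():
            mismatches.append(local.name)
    return mismatches
```

An empty list means every locally present file matches its listed checksum; any returned name should be downloaded again.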

- Run CheckM with a Firmicutes marker set to assess assembly completeness and contamination
```bash
checkm analyze firmicutes.ms data/ncbi/genome checkm_out -t 8 --ali --nt -x fa
```
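
`checkm analyze` only identifies the marker genes; completeness and contamination are reported by the follow-up `checkm qa` step, for example `checkm qa firmicutes.ms checkm_out --tab_table -f checkm_qa.tsv`. Assuming a tab-separated report with CheckM's `Bin Id`, `Completeness` and `Contamination` columns, genomes could be screened as below. The function name and the 90% / 5% thresholds are hypothetical placeholders, not necessarily the cut-offs used in this study.

```python
import csv


def passing_genomes(qa_table, min_completeness=90.0, max_contamination=5.0):
    """Return genome IDs from a CheckM `qa --tab_table` report meeting both thresholds.

    The default thresholds are illustrative placeholders.
    """
    keep = []
    with open(qa_table, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            completeness = float(row["Completeness"])
            contamination = float(row["Contamination"])
            if completeness >= min_completeness and contamination <= max_contamination:
                keep.append(row["Bin Id"])
    return keep
```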

- Search the proteomes against the PhyloSift marker HMMs with HMMER (`hmmsearch`)
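```python
import os
import subprocess

protein_dir = 'data/ncbi/protein'
out_dir = 'output/phylosift_out'
os.makedirs(out_dir, exist_ok=True)
for file_name in os.listdir(protein_dir):
    # os.listdir() returns bare names, so join with the directory for hmmsearch
    faa_path = os.path.join(protein_dir, file_name)
    genome = file_name.split('.faa')[0]  # works for .faa and .faa.gz
    cmd = ['hmmsearch', '--tblout', os.path.join(out_dir, genome + '.out'),
           'data/PhyloSift/PhyloSift.hmm', faa_path]
    subprocess.run(cmd, check=True)
```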
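
The `--tblout` files contain one line per protein/HMM hit. A minimal sketch for tallying which proteins hit each PhyloSift marker, so that markers found in exactly one copy can be identified. It assumes the standard whitespace-delimited `--tblout` layout (target name in column 1, query name in column 3, full-sequence E-value in column 5); the function name and the E-value cutoff are hypothetical.

```python
from collections import defaultdict


def marker_hits(tblout_path, max_evalue=1e-5):
    """Map each query HMM (marker) to the set of proteins it hits in an hmmsearch --tblout file.

    Lines starting with '#' are comments. The E-value cutoff is an illustrative placeholder.
    """
    hits = defaultdict(set)
    with open(tblout_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            target, query, evalue = fields[0], fields[2], float(fields[4])
            if evalue <= max_evalue:
                hits[query].add(target)
    return dict(hits)
```

Usage: `[m for m, prots in marker_hits(path).items() if len(prots) == 1]` lists the markers present as a single copy in that genome.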

### Genome annotation 

- Run Prokka on the genome assemblies, using *Enterococcus*-specific reference proteins (`--proteins`) as the first-priority annotation source

```python
import os
import subprocess

genome_dir = 'data/ncbi/genome'
for fna_file in os.listdir(genome_dir):
    genome = fna_file.split('.fna')[0]
    cmd = ['prokka', '--outdir', genome, '--prefix', genome, '--centre', 'X',
           '--addgenes', '--locustag', genome, '--cpus', '12', '--compliant',
           '--proteins', 'data/enterococcus_refgenes.faa',  # 1st-priority reference proteins
           os.path.join(genome_dir, fna_file)]  # listdir() returns bare names
    subprocess.run(cmd, check=True)
```

- Run HMMER (`hmmscan`) against the Pfam-A database (release 32)

```bash
hmmscan -E 1 --cpu 2 --domtblout genome.pfam.out data/pfam/Pfam-A.hmm data/annotation/prokka/genome/genome.faa 
```
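
With `-E 1` the `hmmscan` output is very permissive, so domain hits are usually filtered afterwards. A sketch, assuming the standard `--domtblout` layout in which, for `hmmscan`, the target is the Pfam model and the query is the protein, and column 13 holds the domain's independent E-value (i-Evalue). The 1e-3 cutoff and the function name are illustrative assumptions.

```python
def pfam_domains(domtblout_path, max_ievalue=1e-3):
    """Return (protein, Pfam accession, Pfam name, i-Evalue) tuples from hmmscan --domtblout.

    Column 1 = target (Pfam) name, 2 = target accession, 4 = query (protein) name,
    13 = domain i-Evalue. Lines starting with '#' are comments.
    """
    domains = []
    with open(domtblout_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            f = line.split()
            pfam_name, pfam_acc, protein, ievalue = f[0], f[1], f[3], float(f[12])
            if ievalue <= max_ievalue:
                domains.append((protein, pfam_acc, pfam_name, ievalue))
    return domains
```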

### Build the single-copy core (SCC) gene phylogeny

- Run OrthoFinder
```bash
orthofinder -f output/orthofinder/input -t 32 -og
```
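
OrthoFinder summarizes orthogroup sizes in `Orthogroups.GeneCount.tsv` (and also lists single-copy orthogroups directly in `Orthogroups_SingleCopyOrthologues.txt`). A sketch of selecting orthogroups present in exactly one copy in every genome from the count table; the function name is hypothetical, and it assumes an `Orthogroup` column, one count column per genome and a final `Total` column.

```python
import csv


def single_copy_orthogroups(gene_count_tsv):
    """Return orthogroups with exactly one gene in every genome."""
    scc = []
    with open(gene_count_tsv, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        genome_cols = [i for i, name in enumerate(header) if name not in ("Orthogroup", "Total")]
        for row in reader:
            if not row:
                continue  # skip blank lines
            if all(int(row[i]) == 1 for i in genome_cols):
                scc.append(row[0])
    return scc
```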
- Create SCC gene alignments and concatenate them
```bash
./build_scc_phylogeny.py
```
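
The contents of `build_scc_phylogeny.py` are not shown here. As a hedged illustration of the concatenation step only, the sketch below joins per-gene alignments into a supermatrix and records each gene's coordinates as a RAxML-style partition line (`DNA, name = start-end`), a format IQ-TREE accepts through `-spp`. The input structure and the function name are assumptions, and the alignment step itself is not covered.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments into a supermatrix plus RAxML-style partitions.

    `alignments` maps a gene (orthogroup) name to {genome: aligned sequence}; every
    alignment must contain exactly one sequence per genome, as expected for
    single-copy core genes. Returns ({genome: concatenated sequence}, partition lines).
    """
    if not alignments:
        return {}, []
    genomes = sorted(next(iter(alignments.values())))
    supermatrix = {g: [] for g in genomes}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1 or set(aln) != set(genomes):
            raise ValueError(f"{gene}: unequal lengths or missing genomes")
        length = lengths.pop()
        for g in genomes:
            supermatrix[g].append(aln[g])
        partitions.append(f"DNA, {gene} = {start}-{start + length - 1}")
        start += length
    return {g: "".join(parts) for g, parts in supermatrix.items()}, partitions
```

Writing the supermatrix as FASTA and the partition lines one per line would give files analogous to `SCCorthogroups_aligned_nuc.fasta` and `SCCorthogroups_partition_file.txt`, although the actual files may have been produced differently.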

- Run IQ-TREE on the concatenated SCC gene alignment, using ModelFinder Plus (`-m MFP`) to select the best-fit model and ultrafast bootstrap (`-bb 1000`) to compute branch support
```bash
iqtree -s SCCorthogroups_aligned_nuc.fasta --prefix iqtree_mfp -bb 1000 -wbtl -nt 32 -m MFP --seed 1 -spp SCCorthogroups_partition_file.txt
```

### Find mobile genetic elements

To detect as many mobile genetic elements (MGEs) as possible, we combined several methods and databases.

- TnFinder Database

We ran `blastn` against the `tncentraldb` and `Tn3R` databases (the `tncentraldb` search is shown):
```bash
blastn -db data/tncentraldb/tncentraldb -query data/genome/genome.fna \
    -outfmt '6 qaccver ssciname stitle saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs' \
    -evalue 0.001 -num_threads 4 -out output/mge/blastn_tncentraldb_genome.out
```
- ISfinder database

We ran `blastn` against the ISfinder database:
```bash
blastn -db data/isfinderdb/IS -query data/genome/genome.fna \
    -outfmt '6 qaccver ssciname stitle saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs' \
    -evalue 0.001 -num_threads 4 -out output/mge/blastn_isfinderis_genome.out
```
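
Both BLAST searches write the same 15 custom columns, so one parser can serve for the Tn and IS results. A minimal sketch that loads such a file and keeps strong hits; the function name and the 90% identity / 500 bp length cutoffs are hypothetical.

```python
import csv

# Column order of the custom -outfmt '6 ...' string used in the blastn commands above.
BLAST_COLUMNS = [
    "qaccver", "ssciname", "stitle", "saccver", "pident", "length", "mismatch",
    "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qcovs",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send", "qcovs"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def read_mge_hits(path, min_pident=90.0, min_length=500):
    """Parse tabular blastn output with the columns above and keep strong hits.

    The identity and alignment-length cutoffs are illustrative placeholders.
    """
    hits = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, fieldnames=BLAST_COLUMNS, delimiter="\t",
                                quoting=csv.QUOTE_NONE)
        for row in reader:
            for col in INT_COLUMNS:
                row[col] = int(row[col])
            for col in FLOAT_COLUMNS:
                row[col] = float(row[col])
            if row["pident"] >= min_pident and row["length"] >= min_length:
                hits.append(row)
    return hits
```

Because the query here is a whole assembly, `qcovs` measures how much of the query sequence a subject covers, not how much of the element was found; adding `slen` to `-outfmt` would allow filtering on `length / slen` instead.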

- Run MOB-suite (v3.0.3)

Because each assembly consists of multiple contigs/scaffolds, we ran `mob_typer` with the `--multi` option, which types each sequence independently instead of treating the whole file as a single plasmid:
```bash
mob_typer --multi --infile data/genome/genome.fna --out_file output/mge/mob_genome.out -n 2
```

- Run PlasmidFinder (v2.1.1)

```bash
plasmidfinder.py -i data/genome/genome.fna -o output/mge/plasmidfinder -x -p data/plasmidfinderdb
```
- Run DeepMicrobeFinder

```bash
python bin/DeepMicrobeFinder/predict.py -i data/genome/genome.fna -e one-hot -d data/DeepMicrobeFinder/models/one-hot-models/ -m single -o genome -l 2000
```

- Run ProphET (v0.5.1) to predict prophage sequences

```bash
perl bin/ProphET-0.5.1/ProphET_standalone.pl --fasta_in data/genome/genome.fna --gff_in data/annotation/prokka/genome.gff --outdir output/mge/prophet
```
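
Since several tools were combined to maximize sensitivity, their predictions overlap. A minimal sketch of pooling them into non-redundant regions per contig while recording which tools support each region. It assumes every tool's output has already been converted to `(contig, start, end, tool)` tuples with 1-based inclusive coordinates; that conversion is tool-specific and not shown, and the function name is hypothetical.

```python
from collections import defaultdict


def merge_mge_calls(calls):
    """Merge overlapping MGE predictions from several tools into union regions.

    `calls` is an iterable of (contig, start, end, tool) tuples. Returns
    (contig, start, end, sorted tools) tuples ordered by contig and start.
    Intervals that only touch end-to-end (no shared base) stay separate.
    """
    by_contig = defaultdict(list)
    for contig, start, end, tool in calls:
        by_contig[contig].append((min(start, end), max(start, end), tool))

    merged = []
    for contig in sorted(by_contig):
        current = None  # [start, end, set of tools] of the region being extended
        for start, end, tool in sorted(by_contig[contig]):
            if current and start <= current[1]:
                current[1] = max(current[1], end)
                current[2].add(tool)
            else:
                if current:
                    merged.append((contig, current[0], current[1], sorted(current[2])))
                current = [start, end, {tool}]
        if current:
            merged.append((contig, current[0], current[1], sorted(current[2])))
    return merged
```

Regions supported by several tools can then be prioritized, while single-tool regions remain in the union.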