# Miscellaneous code and commands used in our study of intrinsic resistance in enterococci

### Data retrieval, preprocessing and quality checks

- Download metadata for *Enterococcus* genome entries in the NCBI Assembly database

```bash
./utils/download_ncbi_assembly_metadata.sh enterococcus enterococcus_ncbi_assembly_metadata.tsv
```

- Download the assembly FASTA (genome and protein) and GFF files

```python
import os
with open('enterococcus_ncbi_assembly_metadata.tsv', 'r') as f:
    for line in f:
        flds = line.strip().split('\t')
        acc = flds[6]       # assembly accession
        asm = flds[4]       # assembly name
        asm_name = acc + '_' + asm
        ftp_url = flds[17]  # FTP directory of the assembly
        # File names have the form <accession>_<assembly name>_<type>.gz
        fna_url = ftp_url + '/' + asm_name + '_genomic.fna.gz'
        faa_url = ftp_url + '/' + asm_name + '_protein.faa.gz'
        gff_url = ftp_url + '/' + asm_name + '_genomic.gff.gz'

        os.system('wget {} -P data/ncbi/genome'.format(fna_url))
        os.system('wget {} -P data/ncbi/protein'.format(faa_url))
        os.system('wget {} -P data/ncbi/annotation'.format(gff_url))
```
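
Some downloads may fail silently because `os.system` does not stop on errors. The following is a minimal sketch for listing expected files that are absent after the download loop. It assumes the same column positions as above (4, 6), the three target directories used by `wget`, file names of the form `<accession>_<assembly name>_genomic.fna.gz`, and a metadata file without a header row. The helper name `missing_downloads` and the `root` parameter are hypothetical and not part of the original workflow.

```python
import os

# Assumed layout: target directory -> file suffix, as in the download loop.
TARGETS = {
    'data/ncbi/genome': '_genomic.fna.gz',
    'data/ncbi/protein': '_protein.faa.gz',
    'data/ncbi/annotation': '_genomic.gff.gz',
}

def missing_downloads(metadata_lines, root='.'):
    """Return the expected file paths that do not exist under `root`."""
    missing = []
    for line in metadata_lines:
        flds = line.rstrip('\n').split('\t')
        if len(flds) < 18:
            continue  # skip short or malformed rows
        asm_name = flds[6] + '_' + flds[4]
        for directory, suffix in TARGETS.items():
            path = os.path.join(root, directory, asm_name + suffix)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

A possible use is `missing_downloads(open('enterococcus_ncbi_assembly_metadata.tsv'))`, followed by re-running `wget` for the listed files.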

- Run CheckM to calculate assembly metrics

```bash
checkm analyze firmicutes.ms data/ncbi/genome checkm_out -t 8 --ali --nt -x fa
```

- Run HMMER against the PhyloSift database

```python
import os
for f in os.listdir('data/ncbi/protein'):
    # os.listdir returns bare file names, so rebuild the full path for hmmsearch
    faa_path = os.path.join('data/ncbi/protein', f)
    genome = f.replace('.faa', '')
    cmd = "hmmsearch --tblout phylosift_out/{}.out PhyloSift.hmm {}".format(genome, faa_path)
    os.system(cmd)
```
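
The `--tblout` files can then be summarised per marker. The following is a minimal sketch, not necessarily the approach used in the study, that keeps the highest-scoring protein for each HMM. It relies on the `hmmsearch` tabular layout, where the whitespace-separated columns start with target name, target accession, query name, query accession, full-sequence E-value, and full-sequence score. The function name `best_hits` is hypothetical.

```python
def best_hits(tblout_lines):
    """Map each HMM (query name) to its best (target, evalue, score) hit."""
    best = {}
    for line in tblout_lines:
        # Comment and blank lines carry no hits
        if not line.strip() or line.startswith('#'):
            continue
        cols = line.split()
        target, query = cols[0], cols[2]
        evalue, score = float(cols[4]), float(cols[5])
        # Keep the hit with the highest full-sequence bit score per HMM
        if query not in best or score > best[query][2]:
            best[query] = (target, evalue, score)
    return best
```

For example, `best_hits(open('phylosift_out/<genome>.out'))` would return one entry per marker HMM with at least one reported hit, where `<genome>` stands for one of the output prefixes written by the loop above.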

### Genome annotation using Prokka

- Run Prokka on genome assemblies with *Enterococcus*-specific reference genes

```python
import os
for f in os.listdir('data/ncbi/genome'):
    # os.listdir returns bare file names, so rebuild the full path for prokka
    fna_path = os.path.join('data/ncbi/genome', f)
    genome = f.replace('.fna', '')
    cmd = 'prokka --outdir {} --prefix {} --centre X --addgenes --locustag {} --cpus 12 --compliant --proteins data/enterococcus_refgenes.faa {}'.format(genome, genome, genome, fna_path)
    os.system(cmd)
```

