# Notes:

In building this model, I moved the code for running predictions on **new** phages into a separate package called `bacphlip`. To ensure that the `bacphlip` code base gives the same results as the code assembled here, I'm running a brief triple check: if I treat my "test set" of phages as if they were totally new, the results should be equivalent.

It's probably a good idea to rerun this check after any update to either code base.
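
As a rough illustration of what "equivalent" means here, the comparison could be sketched as below. The function name `predictions_match`, the column names `Virulent` and `Temperate`, and the toy values are all assumptions made for this example; they are not taken from this notebook's later cells or from the `bacphlip` package itself.

```python
import numpy as np
import pandas as pd


def predictions_match(reference_df, new_df, atol=1e-8):
    """Return True if two prediction tables agree within a tolerance.

    Both tables are assumed (hypothetically) to be indexed by phage
    identifier and to hold the same numeric probability columns.
    """
    # The two tables must describe the same phages and the same columns
    if set(reference_df.index) != set(new_df.index):
        return False
    if set(reference_df.columns) != set(new_df.columns):
        return False
    # Align rows and columns so that ordering differences do not matter
    new_aligned = new_df.loc[reference_df.index, reference_df.columns]
    return bool(np.allclose(reference_df.values, new_aligned.values, atol=atol))


# Toy data only; these numbers are made up for the example
reference = pd.DataFrame(
    {'Virulent': [0.9, 0.2], 'Temperate': [0.1, 0.8]},
    index=['phage_a', 'phage_b'],
)
shuffled = reference.iloc[::-1]           # same values, different row order
changed = reference.copy()
changed.loc['phage_a', 'Virulent'] = 0.5  # one prediction differs

print(predictions_match(reference, shuffled))
print(predictions_match(reference, changed))
```

Comparing with a small tolerance rather than exact equality is a deliberate choice in this sketch, since floating-point values written to disk and read back may differ in the last digits.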

# Imports

In [1]:
import pandas as pd
import os
import shutil
import glob
import numpy as np
import subprocess

import bacphlip

**Create a temporary directory to house everything**

In [2]:
temp_dir = '/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/'
# Create the directory (and any missing parents) if it does not already exist
os.makedirs(temp_dir, exist_ok=True)

**Copy over all of the test set `fasta` files**

In [3]:
orig_fasta_path = '../Data/phage_data_nmicro2017/phage_fasta_files/'
test_df = pd.read_csv('../Data/classifier_data/test_df.csv', index_col=0)
# Copy each test-set genome into the temporary directory
for i in test_df['Identifier_AJH']:
    shutil.copyfile(os.path.join(orig_fasta_path, i+'.fasta'), os.path.join(temp_dir, i+'.fasta'))
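
Before running the pipeline, it may be worth confirming that every identifier actually has a `fasta` file in the target directory. The following is a minimal sketch; `find_missing_fastas` is a hypothetical helper written for this example, and the identifiers and throwaway directory in the demonstration are made up.

```python
import os
import tempfile


def find_missing_fastas(identifiers, directory):
    """Return the identifiers with no '<identifier>.fasta' file in directory."""
    return [i for i in identifiers
            if not os.path.isfile(os.path.join(directory, i + '.fasta'))]


# Demonstration with a throwaway directory and made-up identifiers
with tempfile.TemporaryDirectory() as demo_dir:
    # Create an empty file for only one of the two identifiers
    open(os.path.join(demo_dir, 'phage_a.fasta'), 'w').close()
    missing = find_missing_fastas(['phage_a', 'phage_b'], demo_dir)

print(missing)
```

In this notebook the equivalent call would presumably be `find_missing_fastas(test_df['Identifier_AJH'], temp_dir)`, with an empty list indicating that the copy step was complete.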

**Run `bacphlip` as a Python library**

In [4]:
for i in test_df['Identifier_AJH']:
    fasta_file = temp_dir+i+'.fasta'
    print(fasta_file)
    bacphlip.run_pipeline(fasta_file, force_overwrite=False)

/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagleeye.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagleeye.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagleeye.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagleeye.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagleeye.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/seagreen.fasta
#########

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/michellemybell.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/michellemybell.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/michellemybell.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/arturo.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/arturo.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024125.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024125.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024125.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/graduation.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/graduation.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Project

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_015296.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_015296.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_015296.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_015296.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/pg1.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenbe

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018831.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018831.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018831.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_003313.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_003313.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/acadian.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/acadian.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/acadian.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_006552.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_006552.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphl

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/stinger.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/stinger.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/stinger.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kratio.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kratio.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-mod

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_002747.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_002747.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_002747.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/makemake.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/makemake.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/ba

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007021.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007021.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007021.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007021.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/rap15.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhocken

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024121.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024121.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024121.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019457.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019457.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_009382.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_009382.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_009382.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024124.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_024124.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/athena.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/athena.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/athena.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/athena.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/trouble.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Proj

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/donovan.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/donovan.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/donovan.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/trixie.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/trixie.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-mod

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kita.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kita.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kita.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/kita.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_009811.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/b

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/meezee.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/meezee.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/meezee.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_021531.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_021531.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019916.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019916.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019916.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011534.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011534.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/unionjack.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/unionjack.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/unionjack.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_013693.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_013693.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/murucutumbu.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/murucutumbu.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/murucutumbu.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/murucutumbu.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/sneeze.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/a

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/fang.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/fang.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/fang.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_020078.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_020078.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011356.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011356.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011356.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/conspiracy.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/conspiracy.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Project

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019399.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019399.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019399.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_008717.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_008717.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/snenia.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/snenia.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/snenia.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/snenia.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_004827.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Pr

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/florinda.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/florinda.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/florinda.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019542.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019542.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bac

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/shilan.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/shilan.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/shilan.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/shilan.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005354.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Pr

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_014661.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_014661.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_014661.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_016073.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_016073.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/solon.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/solon.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/solon.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/trike.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/trike.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/D

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001416.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001416.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001416.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wheeler.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wheeler.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacp

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001835.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001835.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001835.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_023743.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_023743.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007804.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007804.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007804.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005135.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005135.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/aeneas.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/aeneas.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/aeneas.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/holli.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/holli.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-de

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/poptart.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/poptart.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/poptart.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/nasiatalie.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/nasiatalie.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacp

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagle.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagle.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/eagle.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/cheetobro.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/cheetobro.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-mod

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/myrna.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/myrna.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/myrna.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/myrna.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/lamina13.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Project

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/ava3.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/ava3.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/ava3.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/arv1.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/arv1.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/b

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/zoej.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/zoej.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/zoej.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/hertubise.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/hertubise.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019501.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019501.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019501.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_003291.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_003291.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/momo.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/momo.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/momo.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/momo.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/cooper.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacp

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007581.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007581.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_007581.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wayne.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wayne.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/gumbie.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/gumbie.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/gumbie.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001901.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_001901.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_000872.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_000872.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_000872.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011308.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_011308.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005178.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005178.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_005178.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/jobu08.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/jobu08.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphl

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019709.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019709.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_019709.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_020204.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_020204.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/

Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018087.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018087.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018087.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018087.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_023856.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamho

Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018837.fasta.hmmsearch
Finished converting hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018837.fasta.hmmsearch.tsv
Finished with BACPHLIP predictions! Final output file stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/NC_018837.fasta.bacphlip
#################################################################################
/Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wally.fasta
#################################################################################
Beginning BACPHLIP pipeline
Finished six frame translation of genome (nucleotide) with output stored in /Users/adamhockenberry/Projects/bacphlip-model-dev/Data/bacphlip_test/wally.fasta.6frame
Finished outside call to hmmsearch with output stored in /Users/adamhockenberry/Projects/bacphlip

# Compare results!
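
Before comparing numbers, it may be worth confirming that every test-set phage actually produced a prediction file. The pipeline log above reports final outputs named `<identifier>.fasta.bacphlip`. The helper below is a minimal sketch built on that naming pattern. `missing_outputs` and its `suffix` parameter are hypothetical names introduced here and are not part of `bacphlip`.

```python
import os

def missing_outputs(identifiers, out_dir, suffix=".fasta.bacphlip"):
    """Return the identifiers that have no prediction file in out_dir.

    The suffix is an assumption taken from the file names in the pipeline log.
    """
    return [
        ident for ident in identifiers
        if not os.path.exists(os.path.join(out_dir, ident + suffix))
    ]
```

In this notebook it could be called as `missing_outputs(test_df['Identifier_AJH'], '../Data/bacphlip_test/')`. An empty list would mean that every phage has an output file.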

**First, load the test set dataframe and predict the lifestyle of each phage with the pre-trained classifier**

In [5]:
### Load the pre-trained classifier model and the test set
import joblib
clf = joblib.load('../Data/classifier_data/rf_highMinAJH.joblib')
test_df = pd.read_csv('../Data/classifier_data/test_df.csv', index_col=0)
# The first 23 columns are metadata; the remaining columns are passed to the model as features
test_res = clf.predict_proba(test_df[test_df.columns[23:]])
# Column 0 is treated as P(virulent) and column 1 as P(temperate)
test_df['Virulent'] = test_res[:, 0]
test_df['Temperate'] = test_res[:, 1]
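
The cell above treats the first probability column as virulent and the second as temperate. In scikit-learn, the columns returned by `predict_proba` follow the order of the fitted estimator's `classes_` attribute, so that order can be checked rather than assumed. The helper below is a rough sketch of the check. `proba_by_class` is a hypothetical name, and the labels `0` and `1` in the example are placeholders, since the notebook does not show which labels this model stores.

```python
def proba_by_class(proba_rows, classes):
    """Split rows of class probabilities into one list per class label.

    proba_rows: iterable of per-sample probability rows (e.g. predict_proba output).
    classes: class labels in column order (e.g. a fitted estimator's classes_).
    """
    columns = list(zip(*proba_rows))
    if len(columns) != len(classes):
        raise ValueError("expected one probability column per class")
    return {label: list(col) for label, col in zip(classes, columns)}

# Hypothetical labels and probabilities, for illustration only
example = proba_by_class([[0.9, 0.1], [0.2, 0.8]], classes=[0, 1])
```

With the objects in this notebook, `proba_by_class(test_res, clf.classes_)` would make explicit which label each column belongs to.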

**Quickly double-check the accuracy of those predictions**

In [13]:
test_df.columns[:23]

Index(['Virus identifier used for the analysis', 'Database source',
       'RefSeq header source description', 'RefSeq accession number',
       'Genome type', 'Order', 'Family', 'Host domain', 'Host phylum',
       'Host class', 'Host order', 'Host family', 'Host genus', 'Cluster',
       'Subcluster', 'Genome size',
       'Number of genes used by Phamerator for creating phams',
       'Used to optimize Mash parameters',
       'Known to encode toxins or virulence factors', 'Temperate (empirical)',
       'Temperate (bioinformatically predicted)', 'Evolutionary mode',
       'Identifier_AJH'],
      dtype='object')

In [14]:
# Call a phage temperate ('yes') when its temperate probability exceeds its virulent probability
test_df['my_pred'] = 'no'
test_df.loc[test_df['Virulent'] < test_df['Temperate'], 'my_pred'] = 'yes'

In [20]:
# Accuracy: 1 minus the fraction of phages whose prediction disagrees with the empirical label
1-(test_df[test_df['Temperate (empirical)'] != test_df['my_pred']].shape[0] / test_df.shape[0])

0.983451536643026
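
A single accuracy figure hides which way the errors go. The sketch below counts each (empirical, predicted) label pair, assuming both columns hold `'yes'`/`'no'` strings as in the comparison above. `confusion_counts` and `accuracy` are hypothetical helper names, and the labels in the example are made up.

```python
from collections import Counter

def confusion_counts(empirical, predicted):
    """Count how often each (empirical, predicted) label pair occurs."""
    return Counter(zip(empirical, predicted))

def accuracy(empirical, predicted):
    """Fraction of samples whose predicted label equals the empirical label."""
    counts = confusion_counts(empirical, predicted)
    correct = sum(n for (emp, pred), n in counts.items() if emp == pred)
    return correct / sum(counts.values())

# Made-up labels, for illustration only
toy_empirical = ['yes', 'no', 'no', 'yes']
toy_predicted = ['yes', 'no', 'yes', 'yes']
toy_counts = confusion_counts(toy_empirical, toy_predicted)
```

Here it would be called as `confusion_counts(test_df['Temperate (empirical)'], test_df['my_pred'])`, which separates temperate phages predicted as virulent from the reverse error.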

**Now ensure that the `bacphlip` results match (to within floating-point tolerance)**

In [6]:
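bacphlip_dir = '../Data/bacphlip_test/'
successes = 0
for index in test_df.index:
    ident = test_df.loc[index, 'Identifier_AJH']
    # Each pipeline run writes its predictions to <identifier>.fasta.bacphlip
    bacphlip_out = os.path.join(bacphlip_dir, ident + '.fasta.bacphlip')
    res_df = pd.read_csv(bacphlip_out, sep='\t')
    vals = res_df.loc[0].values
    # The first two values are compared against the Virulent and Temperate probabilities
    assert np.isclose(vals[0], test_df.loc[index, 'Virulent'])
    assert np.isclose(vals[1], test_df.loc[index, 'Temperate'])
    successes += 1
# Number of matching phages, followed by the size of the test set
print(successes, len(test_df.index))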

423 423
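
The loop above stops at the first failed `assert`, which is fine when everything matches but gives little to go on after a regression. One possible alternative is to collect every disagreement and inspect them together. The sketch below uses only the standard library. `find_mismatches` is a hypothetical helper, and its tolerances are set to the default values of `np.isclose`, although `math.isclose` applies them slightly differently.

```python
import math

def find_mismatches(expected, observed, rel_tol=1e-5, abs_tol=1e-8):
    """Return (identifier, expected, observed) for every disagreeing entry.

    expected and observed map identifier -> (virulent, temperate) probabilities.
    An identifier missing from observed is reported with observed set to None.
    """
    mismatches = []
    for ident, exp in expected.items():
        obs = observed.get(ident)
        if obs is None or not all(
            math.isclose(e, o, rel_tol=rel_tol, abs_tol=abs_tol)
            for e, o in zip(exp, obs)
        ):
            mismatches.append((ident, exp, obs))
    return mismatches

# Made-up identifiers and probabilities, for illustration only
toy_expected = {'phage_a': (0.9, 0.1), 'phage_b': (0.2, 0.8), 'phage_c': (0.5, 0.5)}
toy_observed = {'phage_a': (0.9, 0.1), 'phage_b': (0.3, 0.7)}
toy_mismatches = find_mismatches(toy_expected, toy_observed)
```

An empty result would correspond to the `423 423` outcome above. A non-empty one names the phages to investigate.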
