Compared with the original script, 2_generate_TP, this script adds a blast search of selected_tp_genes against temp_neg_genomes and filters out all genomes that have a hit. Compared with 2.1_generate_TP, it also adds an hmm search against the negative genomes, using an hmm built from an alignment of selected_tp_genes as the query. It combines the results of the blast and hmm searches and removes all samples with hits from the pool of neg_genomes to choose from. One further parameter, hmm_evalue_cutoff, is required: samples with an hmm hit at or below this e-value are removed. Ideally, this would be 0, but I decided to keep a cutoff, as the original negative genomes from the Sugimoto paper also have positive hits for an oxyN hmm.
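As a rough illustration of how the two hit lists might be combined, here is a minimal sketch in plain Python. The function names and the accessions are made up for this example and do not appear in the notebook; the only rule taken from the notebook is that an hmm hit counts when its e-value is at or below the cutoff.

```python
def genomes_to_remove(blast_hits, hmm_hits, hmm_evalue_cutoff):
    """Return the sorted accessions to exclude from the negative pool.

    blast_hits: iterable of accessions with at least one blastn hit.
    hmm_hits:   iterable of (accession, evalue) pairs from the hmm search.
    """
    flagged = set(blast_hits)
    # An hmm hit only counts if its e-value is at or below the cutoff.
    flagged.update(acc for acc, evalue in hmm_hits if evalue <= hmm_evalue_cutoff)
    return sorted(flagged)


def clean_negative_pool(candidates, flagged):
    """Keep the candidates that are not flagged, preserving their order."""
    flagged = set(flagged)
    return [acc for acc in candidates if acc not in flagged]


# Hypothetical accessions, for illustration only.
candidates = ['GCF_A', 'GCF_B', 'GCF_C', 'GCF_D']
flagged = genomes_to_remove(
    blast_hits=['GCF_B'],
    hmm_hits=[('GCF_C', 0.5), ('GCF_D', 12.0)],
    hmm_evalue_cutoff=3,
)
print(clean_negative_pool(candidates, flagged))  # ['GCF_A', 'GCF_D']
```

Using a set for the flagged accessions means a genome found by both searches is only counted once.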


This script generates the 2 remaining files required for input:
- True positive genes (nucleotide fasta)
- protein alignment (amino acid fasta)
IMPORTANTLY, these 2 files are created from non-overlapping sequences!


It also moves the corresponding files to new directories so that coverage tables can be generated from them.
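The non-overlapping split can be pictured with a small sketch. This is only an illustration under the assumption that the positive assemblies are already ranked; the function name and the accessions are hypothetical.

```python
def split_positive_genomes(ranked_assemblies, select_pos_genomes):
    """Split a ranked list of assemblies into two non-overlapping bins."""
    # Deduplicate while keeping rank order (one genome can carry several hits).
    unique = list(dict.fromkeys(ranked_assemblies))
    tp_genomes = unique[:select_pos_genomes]          # source of the tp genes
    alignment_genomes = unique[select_pos_genomes:]   # source of the aa sequences
    return tp_genomes, alignment_genomes


# Hypothetical accessions, for illustration only.
ranked = ['GCF_1', 'GCF_1', 'GCF_2', 'GCF_3', 'GCF_4']
tp, aln = split_positive_genomes(ranked, 2)
print(tp)   # ['GCF_1', 'GCF_2']
print(aln)  # ['GCF_3', 'GCF_4']
```

Because both bins are slices of the same deduplicated list, no genome can end up in both files.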

In the first cell specify:
- BGC type (This name must stay constant throughout the scripts)
- select_neg_genomes, i.e. the number of negative genomes to be transferred to the neg_genomes directory
- select_pos_genomes, i.e. the number of positive genomes to be transferred to the pos_genomes directory and to generate the tp_genes file from (the surplus genomes will be used to generate the protein alignment from)
- hmm_evalue_cutoff, i.e. the e-value at or below which an hmm hit marks a negative genome as contaminated
- pos_isolation_source_filter. If these terms are found in the isolation_source column of the positive samples in the summary file, the samples are scored higher in a scoring column, i.e. samples from a known and desired isolation source will be used preferentially.
- neg_isolation_source_filter, accordingly for the negative samples
- avoid_list. These terms are scored with a 0, end up at the bottom of the table, and will be picked last. This is useful when an uncommon gene is searched for and more, and/or more tenuous, isolation sources have been allowed during download. These are generally words that contain one of the search terms, e.g. 'sea' in 'diseased'.
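The scoring idea behind these parameters can be sketched as follows. This is a simplified stand-in and not the notebook's own function: the name score_isolation_source and the example sources are made up. It assumes, as the code further down does, that preferred terms are matched as substrings while an avoid_list entry only applies when it equals the whole isolation source.

```python
def score_isolation_source(source, preferred_terms, avoid_list):
    """Score an isolation source: higher scores are picked first, 0 is picked last."""
    # Assumption: an avoid_list entry must equal the whole isolation source.
    if source in avoid_list:
        return 0
    # Base score of 1, plus 1 for every preferred term contained in the source.
    return 1 + sum(term in source for term in preferred_terms)


preferred = ['marine', 'sea', 'sponge']
avoid = ['', 'diseased']
sources = ['soil', 'diseased', 'sea water', '', 'marine sponge']

# Sort so that the best-scoring isolation sources come first.
ranked = sorted(sources, key=lambda s: score_isolation_source(s, preferred, avoid), reverse=True)
print(ranked)  # ['marine sponge', 'sea water', 'soil', 'diseased', '']
```

Note that under this exact-match assumption a source such as 'diseased fish' would not be scored 0, because it is not identical to the avoid_list entry 'diseased'.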

Modify in such a way that the TP genes are used as a query against each individual negative genome. Negative genomes are only moved from the temp directory to the neg_genomes directory if the blast search comes back negative.
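One way to decide which negative genomes came back with a hit is to look at the per-genome result files. The sketch below is only an illustration: it assumes one tabular result file per genome named '<accession>.blastout', and the function name, accessions and hit line are invented for the example.

```python
import tempfile
from pathlib import Path


def genomes_with_blast_hits(results_dir):
    """Return sorted accessions whose .blastout file holds at least one hit line."""
    contaminated = set()
    for path in Path(results_dir).glob('*.blastout'):
        # Assumed file naming: '<accession>.blastout', e.g. 'GCF_000000001.1.blastout'
        accession = '.'.join(path.name.split('.')[:2])
        with open(path) as handle:
            # A result file without any non-empty line means no hit for this genome.
            if any(line.strip() for line in handle):
                contaminated.add(accession)
    return sorted(contaminated)


# Toy demonstration with made-up accessions and a made-up hit line.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'GCF_000000001.1.blastout').write_text('gene_1\tcontig_7\t98.5\t1e-50\n')
    Path(tmp, 'GCF_000000002.1.blastout').write_text('')
    print(genomes_with_blast_hits(tmp))  # ['GCF_000000001.1']
```

Only genomes that are absent from the returned list would then be moved to the neg_genomes directory.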

In [1]:
BGC_type = 'nitrile_hydratase_alpha'
select_neg_genomes = 140
select_pos_genomes = 10
hmm_evalue_cutoff = 3

avoid_list = ['', 'isolation_source not annotated', 'diseased', 'mice', 'spice', 'septicemic', 'research', 'crevice']
#these are identical to first script, but don't have to be
pos_isolation_source_filter =  ['marine', 'sea', 'sponge', 'ocean', 'porifera', 'seafloor','sediment', 'water', 'tidal', 'coral', 'reef', 'coast', 'ship', 'fish', 'aquaculture', 'atlantic', 'pacific', 'mediterranean', 'baltic', 'pond', 'river', 'ice', 'carribean', 'lake', 'fjord', 'marina', 'hydro', 'algal', 'algae']
neg_isolation_source_filter = ['marine', 'sea', 'sponge', 'ocean', 'porifera', 'seafloor', 'sediment', 'water', 'tidal', 'coral', 'reef', 'coast', 'ship', 'fish', 'aquaculture', 'atlantic', 'pacific', 'mediterranean', 'baltic', 'pond', 'river', 'ice', 'carribean', 'lake', 'fjord', 'marina', 'hydro', 'algal', 'algae']

In [2]:
import os
from os import listdir, mkdir
from os.path import isfile, join
from pathlib import Path
import pandas as pd
from pandas.errors import EmptyDataError
import random
import glob
import warnings
from Bio import SearchIO

In [3]:
def makedir(dirpath):
    if os.path.isdir(dirpath):
        print(dirpath,'exists already')
    else:
        print('Making', dirpath)    
        os.mkdir(dirpath)

        
# Defining paths for required directory structure for input and output files relative to parent directory
#parent_dir='/media/manu/RiPP_Prioritiser/'
#will make directories relative to the path the notebook was opened in
parent_dir= !echo $(pwd)
BGC_path=os.path.join(parent_dir[0], BGC_type)
neg_genomes_path=os.path.join(BGC_path, 'base_genomes/temp_neg_genomes')
pos_genomes_path=os.path.join(BGC_path, 'base_genomes/temp_pos_genomes')
output_neg_path=os.path.join(BGC_path, 'base_genomes/neg_genomes')
output_pos_path=os.path.join(BGC_path, 'base_genomes/pos_genomes')
#qc-paths. Put all in one directory eventually
neg_blast_path=os.path.join(BGC_path, 'base_genomes/neg_blast')
neg_blast_db_path=os.path.join(BGC_path, 'base_genomes/neg_blast/databases')
neg_blast_results_path=os.path.join(BGC_path, 'base_genomes/neg_blast/results')
neg_hmm_path=os.path.join(BGC_path, 'base_genomes/neg_hmm')
neg_hmm_db_path=os.path.join(BGC_path, 'base_genomes/neg_hmm/databases')
neg_hmm_results_path=os.path.join(BGC_path, 'base_genomes/neg_hmm/results')


# Calling function to make directories if they don't exist yet
makedir(output_neg_path)
makedir(output_pos_path)
makedir(neg_blast_path)
makedir(neg_blast_db_path)
makedir(neg_blast_results_path)
makedir(neg_hmm_path)
makedir(neg_hmm_db_path)
makedir(neg_hmm_results_path)

os.chdir(BGC_path)

Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/results
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases
Making /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results


In [4]:
# Generating a report file for this script
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'w') as f:
    f.write('Output directory is: '+BGC_path+'\n')
    f.write('\nBGC_type = '+BGC_type)
    f.write('\nselect_neg_genomes = '+str(select_neg_genomes))
    f.write('\nselect_pos_genomes = '+str(select_pos_genomes))
    f.write('\nhmm_evalue_cutoff = '+str(hmm_evalue_cutoff))
    f.write('\navoid_list = '+str(avoid_list))
    f.write('\nneg_isolation_source_filter = '+str(neg_isolation_source_filter))
    f.write('\npos_isolation_source_filter = '+str(pos_isolation_source_filter)+'\n')

In [5]:
# Load the summary table (output of script 1) into a data frame
summary_file = pd.read_csv('summary.tsv', sep='\t')

Change order of tables to prioritize samples that have an isolation source

In [6]:
warnings.filterwarnings('ignore')

#filter positives and drop protein sequences that occur in more than one organism
#(keep=False removes every copy of a duplicated protein_id, not just the repeats)
pos_mask = (summary_file['dir'] == '+')
pos_df = summary_file[pos_mask].copy()
pos_df.drop_duplicates(subset='protein_id', keep=False, inplace=True)


#filter negatives
neg_mask = (summary_file['dir'] == '-')
neg_df = summary_file[neg_mask].copy()

#scoring words in isolation source so as to preferentially pick samples with chosen isolation sources

def custom_sorting(source,isolation_source_filter):
    # pick the term list that matches the requested filter ('pos' or 'neg')
    if isolation_source_filter=='pos':
        terms = pos_isolation_source_filter
    elif isolation_source_filter=='neg':
        terms = neg_isolation_source_filter
    else:
        return 1
    # base score of 1, plus 1 for every preferred term found in the isolation source
    score = 1 + sum(word in source for word in terms)
    # an isolation source that exactly equals an avoid_list entry is scored 0
    if source in avoid_list:
        score = 0
    return score


pos_df['scoring_column'] = pos_df.apply(lambda x: custom_sorting(x['isolation_source'],'pos'),axis=1)
neg_df['scoring_column'] = neg_df.apply(lambda x: custom_sorting(x['isolation_source'],'neg'),axis=1)

pos_df.sort_values(by=['scoring_column'], axis=0, ascending=False, inplace=True)
neg_df.sort_values(by=['scoring_column'], axis=0, ascending=False, inplace=True)

In [7]:
#Split positive genomes into 2 bins, one goes towards tp-genes and is the pos-genomes used for synthesising metagenomes
#the other one constitutes a source of protein sequences for alignment as an input file

# Genomes selected in such a way that they are from the top of the pre-sorted pos_df
unique_pos_df = pos_df.drop_duplicates(subset='assembly', inplace=False)
selected_tp_genomes = list(unique_pos_df.iloc[:,1])[0:select_pos_genomes]
remaining_pos_genomes = list(unique_pos_df.iloc[:,1])[select_pos_genomes:]

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\nselected_tp_genomes are:\n')
    f.write(str(selected_tp_genomes)+'\n')

#select genomes and isolate GCF number from them, move selected tp genomes to final pos_genomes directory
for genome in selected_tp_genomes:
    print('moving positive', genome, 'to', output_pos_path)
    !mv "{pos_genomes_path}"/"{genome}"* "{output_pos_path}"
    
#generate dataframe containing all tp-genomes and all the tp-genes contained in it
filtered_pos_df = pos_df[pos_df['assembly'].isin(selected_tp_genomes)]
remaining_pos_df = pos_df[~pos_df['assembly'].isin(selected_tp_genomes)]

#isolate all the headers and transfer them to the selected_tp_genes file
full_header_list = []
for i in range(0,len(filtered_pos_df)):
    full_header=str('>')+filtered_pos_df.iloc[i,1]+str('_')+filtered_pos_df.iloc[i,3]+str('_')+filtered_pos_df.iloc[i,5]
    full_header_list.append(full_header)

# generate fasta file with selected tp genes found in the selected genomes
print('generating selected_tp_genes.fasta')
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\nselected_tp_genes in positive genomes are:\n')
tp_gene_counter=0
with open(BGC_path+'/'+BGC_type+'_tp_genes.fasta') as fh:
    lines=fh.readlines()
    for i in range(0,len(lines)):
        for j in range(0,len(full_header_list)):
            if full_header_list[j] in lines[i]:
                tp_gene_counter+=1
                with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
                    f.write(lines[i][1:-1]+'\n')
                with open(BGC_path+'/'+BGC_type+'_selected_tp_genes.fasta', 'a') as outfile:
                    outfile.write(lines[i]+lines[i+1])    
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+str(len(selected_tp_genomes))+' unique genomes with '+ str(tp_gene_counter)+' unique tp genes.\n\n')
                    
                    
# transfer all amino acid sequences that are not part of the tp-genomes to a fasta file
print('generating selected_tp_aa.fasta')
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\nselected_tp_aa sequences for muscle alignment are:\n')
tp_aa_counter = 0
with open(BGC_path+'/'+BGC_type+'_selected_tp_aa.fasta', 'a') as outfile:
    for i in range(0,len(remaining_pos_df)):
        tp_aa_counter+=1
        fasta_header=str('>')+remaining_pos_df.iloc[i,1]+str('_')+remaining_pos_df.iloc[i,3]+str('_')+remaining_pos_df.iloc[i,5]+'\n'
        sequence = remaining_pos_df.iloc[i,6][2:-2]+'\n'
        with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
            f.write(fasta_header[1:-1]+'\n')
        outfile.write(fasta_header)
        outfile.write(sequence)
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+str(len(remaining_pos_genomes))+' unique genomes with '+ str(tp_aa_counter)+' unique aa sequences.\n\n')

    
print('Done')

moving positive GCF_002983865.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_009363795.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_009363595.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_011290545.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_011040495.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_000332275.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_000511355.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_002215215.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/pos_genomes
moving positive GCF_004006175.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_gen

In [8]:
#Use blast to find tp_gene contamination in negative samples
unique_neg_df = neg_df.drop_duplicates(subset='assembly', inplace=False)
# list of all unique negative genome accessions (select_neg_genomes is only applied after decontamination)
neg_genomes_list = list(unique_neg_df.iloc[:,1])

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+'Using blastn to check negative genomes for contamination'+'\n')

# makes blast databases of all individual neg genomes (easier to keep track of accession numbers than when combining)
for genomes in neg_genomes_list:
    !makeblastdb -in "{neg_genomes_path}"/"{genomes}"* -dbtype nucl -out "{neg_blast_db_path}"/"{genomes}"_db

# runs blastn search of all TP genes against all blast databases of negative genomes
for genomes in neg_genomes_list:
    !blastn -db "{neg_blast_db_path}"/"{genomes}"_db -query "{BGC_path}"/"{BGC_type}"_selected_tp_genes.fasta -out "{neg_blast_results_path}"/"{genomes}".blastout -outfmt "6 qseqid sseqid pident evalue"

# use pandas to concatenate all blast output tables
df_list = []
for outfile in os.listdir(neg_blast_results_path):
    try:
        blast_df = pd.read_csv(neg_blast_results_path+'/'+outfile, sep='\t', names=['qseqid', 'sseqid', 'pident', 'evalue'], index_col=None)
        blast_df['subject_accession'] = '.'.join(outfile.split('.')[0:2])
        df_list.append(blast_df)
    except EmptyDataError:
        continue
        
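# Generate a list of contaminated negative genomes
# (pd.concat raises a ValueError on an empty list, so handle the case of no blast hits at all)
if df_list:
    remove_df = pd.concat(df_list)
else:
    remove_df = pd.DataFrame(columns=['qseqid', 'sseqid', 'pident', 'evalue', 'subject_accession'])
unique_remove_df = remove_df.drop_duplicates(subset='subject_accession', inplace=False)
blast_remove_list = list(unique_remove_df.iloc[:,4])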

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+'Contamination with tp_seqs:\n'+str(list(remove_df.iloc[:,0]))+'\nsequences identified in negative samples:\n'+str(blast_remove_list)+'\n\n')

print('Done')



Building a new DB, current time: 07/26/2021 12:52:29
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_000520615.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000520615.1_SOAPdenovo_v1.05_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 64 sequences in 0.0507581 seconds.


Building a new DB, current time: 07/26/2021 12:52:29
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_000355675.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000355675.1_ASM35567v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0281951 seconds.


Building a new DB, current time: 07/26/2021 12:52:29
New DB name:   /media/manu/RiPP_Pr

Adding sequences from FASTA; added 33 sequences in 0.045681 seconds.


Building a new DB, current time: 07/26/2021 12:52:33
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_004378355.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_004378355.1_ASM437835v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 116 sequences in 0.0368278 seconds.


Building a new DB, current time: 07/26/2021 12:52:33
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_003050175.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003050175.1_ASM305017v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 53 sequences in 0.044523 seconds.


Building a new DB, cu

Adding sequences from FASTA; added 1 sequences in 0.0276239 seconds.


Building a new DB, current time: 07/26/2021 12:52:36
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_001015055.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_001015055.1_ASM101505v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 136 sequences in 0.0537672 seconds.


Building a new DB, current time: 07/26/2021 12:52:36
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_004360145.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_004360145.1_ASM436014v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 33 sequences in 0.032711 seconds.


Building a new DB, cu



Building a new DB, current time: 07/26/2021 12:52:40
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_013363245.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_013363245.1_ASM1336324v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 30 sequences in 0.0219841 seconds.


Building a new DB, current time: 07/26/2021 12:52:40
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_002893095.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002893095.1_ASM289309v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 36 sequences in 0.0380611 seconds.


Building a new DB, current time: 07/26/2021 12:52:40
New DB name:   /media/manu/RiPP_Prio



Building a new DB, current time: 07/26/2021 12:52:44
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_004022565.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_004022565.1_ASM402256v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0255692 seconds.


Building a new DB, current time: 07/26/2021 12:52:44
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_002127625.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002127625.1_ASM212762v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 106 sequences in 0.0678141 seconds.


Building a new DB, current time: 07/26/2021 12:52:44
New DB name:   /media/manu/RiPP_Prior

Adding sequences from FASTA; added 4 sequences in 0.041625 seconds.


Building a new DB, current time: 07/26/2021 12:52:48
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_000593345.2_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000593345.2_vpVPTS2010_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 75 sequences in 0.058589 seconds.


Building a new DB, current time: 07/26/2021 12:52:48
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_008632455.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_008632455.1_ASM863245v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0330241 seconds.


Building a new DB, curren



Building a new DB, current time: 07/26/2021 12:52:51
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_001870285.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_001870285.1_ASM187028v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9 sequences in 0.024852 seconds.


Building a new DB, current time: 07/26/2021 12:52:52
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_002156885.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002156885.1_ASM215688v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 203 sequences in 0.058526 seconds.


Building a new DB, current time: 07/26/2021 12:52:52
New DB name:   /media/manu/RiPP_Priorit

Adding sequences from FASTA; added 297 sequences in 0.059077 seconds.


Building a new DB, current time: 07/26/2021 12:52:55
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_003351365.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003351365.1_ASM335136v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0506411 seconds.


Building a new DB, current time: 07/26/2021 12:52:56
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_018282115.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_018282115.1_ASM1828211v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4 sequences in 0.033469 seconds.


Building a new DB, cur



Building a new DB, current time: 07/26/2021 12:52:59
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_003854195.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003854195.1_ASM385419v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 129 sequences in 0.0743492 seconds.


Building a new DB, current time: 07/26/2021 12:52:59
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_015276975.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_015276975.1_ASM1527697v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 25 sequences in 0.022264 seconds.


Building a new DB, current time: 07/26/2021 12:53:00
New DB name:   /media/manu/RiPP_Prio

Adding sequences from FASTA; added 93 sequences in 0.0509231 seconds.


Building a new DB, current time: 07/26/2021 12:53:03
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_010367435.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_010367435.1_ASM1036743v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 74 sequences in 0.020951 seconds.


Building a new DB, current time: 07/26/2021 12:53:03
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_003574085.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003574085.1_ASM357408v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 224 sequences in 0.038991 seconds.


Building a new DB, c



Building a new DB, current time: 07/26/2021 12:53:07
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_005502615.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_005502615.1_ASM550261v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 103 sequences in 0.0479181 seconds.


Building a new DB, current time: 07/26/2021 12:53:07
New DB name:   /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_blast/databases/GCF_003265965.1_db
New DB title:  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003265965.1_ASM326596v1_genomic.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 324 sequences in 0.0467019 seconds.


Building a new DB, current time: 07/26/2021 12:53:08
New DB name:   /media/manu/RiPP_Pri

In [9]:
# Use hmmer to find tp_gene contamination in negative samples
# This is to some extent dependent on which tp genes are chosen. If fewer neg_genomes result from the decontamination
#     process than are specified by select_neg_genomes, it might be worth trying a different set of tp_genes

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+'Using hmmer to check negative genomes for contamination'+'\n\n')

# Build nucleotide alignment of tp_genes
!muscle -in "{BGC_path}"/"{BGC_type}"_selected_tp_genes.fasta -out "{neg_hmm_db_path}"/"{BGC_type}"_selected_tp_gene_alignment.fasta -loga "{BGC_path}"/report_2_generate_tp.txt

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+'Building nucleotide hmm of selected true positive genes'+'\n\n'+'Searching neg genomes with nucleotide hmm.\n\n')

# Build nucleotide based hmm of tp_genes alignment
!hmmbuild -n tp_nucl_aln --dna "{neg_hmm_db_path}"/"{BGC_type}"_tp_nucl.hmm "{neg_hmm_db_path}"/"{BGC_type}"_selected_tp_gene_alignment.fasta

# Search neg genomes with hmm
for genomes in neg_genomes_list:
    !hmmsearch --F1 0.02 --F2 0.02 --F3 0.02 --tblout "{neg_hmm_results_path}"/"{genomes}"_hmm_result.tbl "{neg_hmm_db_path}"/"{BGC_type}"_tp_nucl.hmm "{neg_genomes_path}"/"{genomes}"*
#    !nhmmer --F1 0.02 --F2 0.02 --F3 0.02 --tblout "{neg_hmm_results_path}"/"{genomes}"_hmm_result.tbl "{neg_hmm_db_path}"/"{BGC_type}"_tp_nucl.hmm "{neg_genomes_path}"/"{genomes}"*

# Parse hmm output
#https://stackoverflow.com/questions/62012615/convert-a-hmmer-tblout-output-to-a-pandas-dataframe

sample_list = []
id_list=[]
evalue_list=[]

for filename in os.listdir(neg_hmm_results_path):
    samplename = '_'.join(filename.split('_')[0:2])
    with open(neg_hmm_results_path+'/'+filename) as handle:
        for queryresult in SearchIO.parse(handle, 'hmmer3-tab'):
            for hit in queryresult.hits:
                sample_list.append(samplename)
                id_list.append(hit.id)
                evalue_list.append(hit.evalue)

# Generate hit_df based on found tp_hits in neg genomes
hmm_dict = {'sample': sample_list, 'target_name': id_list, 'evalue': evalue_list}             
hit_df = pd.DataFrame.from_dict(hmm_dict)
hit_df.to_csv(neg_hmm_path+'/all_hmm_hits.csv', index=False)

#filter hit_df by evalue cutoff
evalue_filter = hit_df['evalue'] <= hmm_evalue_cutoff
cutoff_hit_df = hit_df[evalue_filter]
cutoff_hit_df.to_csv(neg_hmm_path+'/below_cutoff_hits.csv', index=False)

#deduplicate samples
unique_hit_df = cutoff_hit_df.drop_duplicates(subset='sample', inplace=False)
unique_hit_df.to_csv(neg_hmm_path+'/removed_hmm_hits.csv', index=False)

# Generate a list of contaminated negative genomes with combined results from blast and hmm hits
hmm_remove_list = list(unique_hit_df.iloc[:,0])
# copy the list so that extending remove_list does not also modify blast_remove_list
remove_list = list(blast_remove_list)
remove_list.extend(x for x in hmm_remove_list if x not in remove_list)

with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n'+'Contamination with tp_seqs found in samples:\n'+str(hmm_remove_list)+'\n\n'+'Found '+str(len(remove_list))+' total samples with tp contamination in neg genomes.')

# Remove these contaminated samples from the possible pool of negative sequences
cleaned = ~unique_neg_df['assembly'].isin(remove_list)
cleaned_unique_neg_df = unique_neg_df[cleaned]

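# Select a preset number of negative genomes. After decontamination, fewer genomes than
# select_neg_genomes may be available; in that case all remaining genomes are used.
selected_neg_genomes = list(cleaned_unique_neg_df.iloc[:,1])[0:select_neg_genomes]
if len(selected_neg_genomes) < select_neg_genomes:
    print('Warning: only', len(selected_neg_genomes), 'of', select_neg_genomes, 'requested negative genomes are available')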

# report block
with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
    f.write('\n\n decontaminated selected_neg_genomes are:\n')
    f.write(str(selected_neg_genomes))

for genome in selected_neg_genomes:
    print('moving negative', genome, 'to', output_neg_path)
    !mv "{neg_genomes_path}"/"{genome}"* "{output_neg_path}"
    
print('Done')


MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

nitrile_hydratase_alpha_selecte 11 seqs, lengths min 597, max 672, avg 627
00:00:00    15 MB(-3%)  Iter   1  100.00%  K-mer dist pass 1
00:00:00    15 MB(-3%)  Iter   1  100.00%  K-mer dist pass 2
00:00:00    22 MB(-4%)  Iter   1  100.00%  Align node       
00:00:00    22 MB(-4%)  Iter   1  100.00%  Root alignment
00:00:00    22 MB(-4%)  Iter   2  100.00%  Refine tree   
00:00:00    22 MB(-4%)  Iter   2  100.00%  Root alignment
00:00:00    22 MB(-4%)  Iter   2  100.00%  Root alignment
00:00:00    22 MB(-4%)  Iter   3  100.00%  Refine biparts
00:00:01    22 MB(-4%)  Iter   4  100.00%  Refine biparts
00:00:01    22 MB(-4%)  Iter   5  100.00%  Refine biparts
00:00:01    22 MB(-4%)  Iter   5  100.00%  Refine biparts
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3 (Nov 2019); http://

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000353285.1_ASM35328v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_000353285.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   -

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_013249065.1_ASM1324906v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_013249065.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_018199995.1_ASM1819999v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_018199995.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_009846755.1_ASM984675v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_009846755.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_016837255.1_ASM1683725v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_016837255.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003050175.1_ASM305017v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003050175.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         24  (3337074 residues searched)
Passed MSV filter:                         8  (0.333333); expected 0.5 (0.02)
Passed bias filter:                        0  (0); expected 0.5 (0.02)
Passed Vit filter:                         0  (0); expected 0.5 (0.02)
Passed Fwd filter:                         0  (0); expected 0.5 (0.02)
Initial search space (Z):                 24 

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_004803475.1_ASM480347v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_004803475.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002021755.1_ASM202175v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002021755.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_006246515.1_ASM624651v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_006246515.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_017948225.1_ASM1794822v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_017948225.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_018500205.1_ASM1850020v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_018500205.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         11  (3871907 residues searched)
Passed MSV filter:                         5  (0.454545); expected 0.2 (0.02)
Passed bias filter:                        0  (0); expected 0.2 (0.02)
Passed Vit filter:                         0  (0); expected 0.2 (0.02)
Passed Fwd filter:                         0  (0); expected 0.2 (0.02)
Initial search space (Z):                 11 

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_009377985.1_ASM937798v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_009377985.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         77  (4583336 residues searched)
Passed MSV filter:                         1  (0.012987); expected 1.5 (0.02)
Passed bias filter:                        1  (0.012987); expected 1.5 (0.02)
Passed Vit filter:                         1  (0.012987); expected 1.5 (0.02)
Passed Fwd filter:                         0  (0); expected 1.5 (0.02)
Initial search space (Z):      

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         22  (3888276 residues searched)
Passed MSV filter:                        11  (0.5); expected 0.4 (0.02)
Passed bias filter:                        0  (0); expected 0.4 (0.02)
Passed Vit filter:                         0  (0); expected 0.4 (0.02)
Passed Fwd filter:                         0  (0); expected 0.4 (0.02)
Initial search space (Z):                 22  [actual number of targets]

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence          Description
    ------- ------ -----    ------- ------ -----   ---- --  --------          -----------
    1.1e-95  312.5   4.3    4.3e-95  310.6   4.3    2.0  1  NZ_WQBL01000006.1  Ruegeria arenilitoris strain HKCCD8755 NOD


Domain annotation for each sequence (and alignments):
>> NZ_WQBL01000006.1  Ruegeria arenilitoris strain HKCCD8755 NODE_6, whole genome shotgun sequence
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  310.6   4.3   1.3e-96   4.3e-95      34     604 ..  171829  172399 ..  171817  172422 .. 0.96

  Alignments for each domain:
  == domain 1  score: 310.6 bits;  conditional E-value: 1.3e-96
        tp_nucl_aln    

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002882915.1_GW456-12-1-14-LB3_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002882915.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_004214755.1_ASM421475v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_004214755.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                          3  (5684538 residues searched)
Passed MSV filter:                         0  (0); expected 0.1 (0.02)
Passed bias filter:                        0  (0); expected 0.1 (0.02)
Passed Vit filter:                         0  (0); expected 0.1 (0.02)
Passed Fwd filter:                         0  (0); expected 0.1 (0.02)
Initial search spa

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_006374735.1_ASM637473v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_006374735.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_007859875.1_ASM785987v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_007859875.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002127625.1_ASM212762v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002127625.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_016812975.1_ASM1681297v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_016812975.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                          1  (6520772 residues searched)
Passed MSV filter:                         0  (0); expected 0.0 (0.02)
Passed bias filter:                        0  (0); expected 0.0 (0.02)
Passed Vit filter:                         0  (0); expected 0.0 (0.02)
Passed Fwd filter:                         0  (0); expected 0.0 (0.02)
Initial search space (Z):                  1  [actual number of targets]

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                          1  (3671901 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        0  (0); expected 0.0 (0.02)
Passed Vit filter:                         0  (0); expected 0.0 (0.02)
Passed Fwd filter:                         0  (0); expected 0.0 (0.02)
Initial search space (Z):                  1  [actual number of targets]

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_005280335.1_ASM528033v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_005280335.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         72  (1734238 residues searched)
Passed MSV filter:                         0  (0); expected 1.4 (0.02)
Passed bias filter:                        0  (0); expected 1.4 (0.02)
Passed Vit filter:                         0  (0); expected 1.4 (0.02)
Passed Fwd filter:                         0  (0); expected 1.4 (0.02)
Initial search space (Z):                 72  [actual number of targets]

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_008632455.1_ASM863245v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_008632455.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_006369625.1_ASM636962v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_006369625.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002929435.1_ASM292943v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002929435.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003340555.1_ASM334055v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003340555.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000769825.1_ASM76982v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_000769825.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   -

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003296495.1_ASM329649v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003296495.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_001870285.1_ASM187028v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_001870285.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_013032445.1_ASM1303244v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_013032445.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                          2  (5036404 residues searched)
Passed MSV filter:                         0  (0); expected 0.0 (0.02)
Passed bias filter:                        0  (0); expected 0.0 (0.02)
Passed Vit filter:                         0  (0); expected 0.0 (0.02)
Passed Fwd filter:                         0  (0); expected 0.0 (0.02)
Initial search space (Z):                  2  [actua

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                         29  (4934421 residues searched)
Passed MSV filter:                         3  (0.103448); expected 0.6 (0.02)
Passed bias filter:                        0  (0); expected 0.6 (0.02)
Passed Vit filter:                         0  (0); expected 0.6 (0.02)
Passed Fwd filter:                         0  (0); expected 0.6 (0.02)
Initial search space (Z):                 29 

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_001707555.1_ASM170755v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_001707555.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000400485.1_k0-k1250_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_000400485.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   ---

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_018282115.1_ASM1828211v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_018282115.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002881895.1_GW704-F2_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002881895.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   ---

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_900167585.1_16852_2_94_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_900167585.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   -

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002883245.1_GW101-3F01_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002883245.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   -

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002241775.2_ASM224177v2_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002241775.2_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003854195.1_ASM385419v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003854195.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_015680345.1_ASM1568034v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_015680345.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_014526385.1_ASM1452638v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_014526385.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_006369235.1_ASM636923v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_006369235.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_001507335.1_ASM150733v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_001507335.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_900379955.1_ICMP_6278_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_900379955.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   --

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_000143845.1_ASM14384v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_000143845.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   -

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003574085.1_ASM357408v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003574085.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (624 nodes)
Target sequences:                          6  (2224823 residues searched)
Passed MSV filter:                         0  (0); expected 0.1 (0.02)
Passed bias filter:                        0  (0); expected 0.1 (0.02)
Passed Vit filter:                         0  (0); expected 0.1 (0.02)
Passed Fwd filter:                         0  (0); expected 0.1 (0.02)
Initial search space (Z):                  6  [actua

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003698745.1_ASM369874v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003698745.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence          Description
    ------- ------ -----    ------- ------ -----   ---- --  --------          -----------
  ------ inclusion threshold ------
       0.11    3.6   0.0        0.2    2.8   0.0    1.5  1  NZ_SDRE01000107.1  Staphylococcus sp. SNAZ 75 s75_Ayuni_conti
        2.7   -1.0   0.0        4.7   -1.8   0.0    1.1  1  NZ_SDRE01000002.1  Staphylococcus sp. SNAZ 75 s75_Ayuni_conti


Domain annotation for each sequence (and alignments):
>> NZ_SDRE01000107.1  Staphylococcus sp. SNAZ 75 s75_Ayuni_contig_107, whole genome shotgun sequence
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 ?    2.8   0.0    0.0037       0.2     546     591 ..  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_013735435.1_ASM1373543v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_013735435.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_005502615.1_ASM550261v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_005502615.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_014534785.1_ASM1453478v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_014534785.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
  

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003839525.1_ASM383952v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003839525.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_002849795.1_ASM284979v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_002849795.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3 (Nov 2019); http://hmmer.org/
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/databases/nitrile_hydratase_alpha_tp_nucl.hmm
# target sequence database:        /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/temp_neg_genomes/GCF_003353065.1_ASM335306v1_genomic.fna
# per-seq hits tabular output:     /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_hmm/results/GCF_003353065.1_hmm_result.tbl
# Vit filter P threshold:       <= 0.02
# Fwd filter P threshold:       <= 0.02
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       tp_nucl_aln  [M=624]
Scores for complete sequences (score includes all domains):
   

moving negative GCF_000294775.2 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_001951155.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_009789075.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_002573675.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_013363245.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_002893095.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_014164675.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_001888265.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_001296255.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_gen

moving negative GCF_003449015.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_001507335.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_003710255.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_900379955.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_000143845.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_010367435.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_003574085.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_015594835.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_genomes/neg_genomes
moving negative GCF_003967135.1 to /media/manu/RiPP_Prioritiser/nitrile_hydratase_alpha/base_gen

In [10]:
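# Check the positive genomes against the HMM built from the TP genes, for confirmation.
# NOTE: as written, the loop below iterates over selected_neg_genomes and reads from output_neg_path;
# substitute the positive-genome list and path before uncommenting.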
#for genomes in selected_neg_genomes:
#    print(genomes)
#    !nhmmer --tblout "{neg_hmm_path}"/pos_results/"{genomes}"_hmm_result.tbl "{neg_hmm_db_path}"/"{BGC_type}"_tp_nucl.hmm "{output_neg_path}"/"{genomes}"*
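
The per-sequence tables written with --tblout (the *_hmm_result.tbl files named in the output above) can be screened against the hmm E-value cutoff. The following is a minimal sketch, not the notebook's actual implementation. The function names, the example table and its contig names are hypothetical. It assumes the hmmsearch --tblout layout, where the full-sequence E-value is the fifth whitespace-separated field; nhmmer tables use a different column order (the E-value is expected at index 12 there), so `evalue_col` would need adjusting.

```python
def min_evalue_from_tblout(text, evalue_col=4):
    """Return the smallest E-value in a --tblout table, or None if it has no hits.

    evalue_col=4 assumes the hmmsearch per-sequence table layout.
    """
    best = None
    for line in text.splitlines():
        # Comment and blank lines carry no hit data.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) <= evalue_col:
            continue
        evalue = float(fields[evalue_col])
        if best is None or evalue < best:
            best = evalue
    return best


def genomes_to_remove(tables, cutoff):
    """Return, sorted, the genomes whose best hit has an E-value <= cutoff.

    tables maps a genome accession to the text of its --tblout file.
    """
    flagged = []
    for genome, text in tables.items():
        best = min_evalue_from_tblout(text)
        if best is not None and best <= cutoff:
            flagged.append(genome)
    return sorted(flagged)


# Hypothetical table for illustration only (not real output).
EXAMPLE_TBL = (
    "# target name  accession  query name  accession  E-value  score  bias\n"
    "contig_A  -  tp_nucl_aln  -  0.11  3.6  0.0  0.2  2.8  0.0  1.5  1  0  0  1  1  1  0  demo\n"
    "contig_B  -  tp_nucl_aln  -  2.7  -1.0  0.0  4.7  -1.8  0.0  1.1  1  0  0  1  1  0  0  demo\n"
)
```

In the notebook, `tables` could be filled by reading each result file from the neg_hmm results directory, and the returned accessions dropped from the pool of negative genomes before the final selection.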

The cell below generates a MUSCLE alignment of the protein sequences from genomes that were not chosen as pos_genomes. However, to follow the procedure of the 2019 Sugimoto paper more closely, download the seed alignment for the protein of interest from Pfam in FASTA format instead, and save it as {BGC_type}_selected_tp_alignment.fasta.

In [11]:
#!module load MUSCLE/3.8.1551
#not sure why full path is required?
#with open(BGC_path+'/'+'report_2_generate_tp.txt', 'a') as f:
#    f.write('\n\nMUSCLE alignment details:\n')

#!muscle -in "{BGC_path}"/"{BGC_type}"_selected_tp_aa.fasta -out "{BGC_path}"/"{BGC_type}"_selected_tp_alignment.fasta -loga "{BGC_path}"/report_2_generate_tp.txt
#!module unload MUSCLE/3.8.1551

Introduce a step here that concatenates all negative genomes, builds a BLAST database from the concatenated file, and queries that database with all selected TP genes. If a genome has any hit, remove it from selected_neg_genomes_list, add a replacement genome, and run the check again.
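
The planned step could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested part of the pipeline. All function names are hypothetical. Genome file names are assumed to start with the accession (e.g. GCF_000143845.1_...), and the BLAST results are assumed to be in tabular format (-outfmt 6), with the subject ID in the second column equal to the first word of the FASTA header. The two command lists use standard makeblastdb and blastn options and would be run with `subprocess.run` on a machine where BLAST+ is installed.

```python
from pathlib import Path


def concatenate_fastas(fasta_paths, out_path):
    """Write all FASTA files into one file; return a contig -> genome mapping."""
    contig_to_genome = {}
    with open(out_path, "w") as out:
        for path in fasta_paths:
            path = Path(path)
            # Assumption: the file name begins with the accession, e.g. GCF_000143845.1_...
            genome = "_".join(path.name.split("_")[:2])
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        contig_to_genome[line[1:].split()[0]] = genome
                    # Guard against files that lack a trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return contig_to_genome


def blast_commands(concat_fasta, db_prefix, query_fasta, result_tsv, evalue=1e-5):
    """Build (but do not run) the makeblastdb and blastn command lists."""
    make_db = ["makeblastdb", "-in", str(concat_fasta),
               "-dbtype", "nucl", "-out", str(db_prefix)]
    search = ["blastn", "-query", str(query_fasta), "-db", str(db_prefix),
              "-outfmt", "6", "-evalue", str(evalue), "-out", str(result_tsv)]
    return make_db, search


def genomes_with_hits(blast_tab_text, contig_to_genome):
    """Map subject IDs (column 2 of -outfmt 6) back to genome accessions."""
    hits = set()
    for line in blast_tab_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] in contig_to_genome:
            hits.add(contig_to_genome[fields[1]])
    return hits


def replace_hit_genomes(selected, hit_genomes, reserve):
    """Drop genomes with hits and refill the selection from the reserve pool.

    Returns the new selection and the remaining reserve.
    """
    kept = [g for g in selected if g not in hit_genomes]
    reserve = [g for g in reserve if g not in hit_genomes and g not in kept]
    needed = len(selected) - len(kept)
    return kept + reserve[:needed], reserve[needed:]
```

Repeating the cycle (concatenate, search, replace) until `genomes_with_hits` returns an empty set corresponds to the "check again" part of the note. If the reserve pool runs out, the selection simply ends up smaller than requested.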