We identified that the majority of the BFClust runtime is spent during Boundary-Tree construction, specifically during the sequence comparison step.     
In order to see whether the runtime of BFClust can be further optimized, we compare the time it takes to make a 500x500 pairwise comparison (using the first 500 CDS from *Streptococcus pneumoniae* Strain TIGR4) using either Biopython (what BFClust currently uses in v.0.1.26.2) or blastp. Blastp appears to be 2 orders of magnitude faster. Therefore, swapping the sequence comparison step to include the compiled blastp application can lead to reduced runtime. 

In [1]:
from Bio import SeqIO
from Bio import Align
from Bio.Align import substitution_matrices
from Bio import pairwise2 as pw2
import os
import time

In [2]:
records = list(SeqIO.parse("TIGR4.fasta", "fasta"))
n = len(records)

In [3]:
aligner = Align.PairwiseAligner()
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -0.5
aligner.extend_gap_score = -0.1

Runtime using Biopython 

In [4]:
start_time = time.time()
for i in range(n):
    for j in range(n):
        score = aligner.score(records[i].seq, records[j].seq)
runtime = time.time() - start_time
print(runtime)

132.70679783821106


Runtime using blastp

In [5]:
start_time = time.time()
os.system('blastp -query TIGR.fasta -subject TIGR.fasta')
runtime = time.time() - start_time
print(runtime)

0.12857365608215332
