# Assay binding analysis
Analyze how well my PCR primers and probes match the sequences I've found in my samples.

## Initialization, configuration and utility functions

In [None]:
from Bio.Seq import Seq
from Bio import SeqIO
from Bio import Align

%load_ext autoreload
%autoreload 1
%aimport RCUtils

primers = RCUtils.readPrimers("RespiCovPrimers.fasta", display=True)

aligner = Align.PairwiseAligner(mode='local', match_score=1, mismatch_score=0, gap_score=-1)

def printSeqBinding(path, format="fastq"):
    """Print every primer hit for the single record in `path`, plus an alignment for each imperfect hit."""
    # TODO: Try to print a semi-global alignment. Can use global with end_gap_score=0
    record = SeqIO.read(path, format)
    hits = RCUtils.computePrimerHits(record, primers, allowOverlaps=True)
    for hit in sorted(hits, key=lambda hit: hit.primer.id):
        print("%s len=%d match=%d%% [%d:%d]" % (hit.primer.id, len(hit.primer.seq), 100*hit.mr, hit.start, hit.end))
        if hit.mr < 1:
            # Imperfect hit: show the best local alignment of the primer against the sequence.
            # Row 0 of a.coordinates indexes the sequence, row 1 indexes the primer.
            a = aligner.align(record.seq, hit.primer.seq, strand="-" if hit.rev else "+")[0]
            if a.coordinates[1][0] > 0:
                if a.coordinates[0][0] == 0:
                    print("  Primer falls %d bases off the start of the sequence" % a.coordinates[1][0])
                else:
                    print("  Primer mismatch in first %d bases" % a.coordinates[1][0])
            # Number of primer bases left unaligned after the end of the local alignment
            pt = len(hit.primer.seq) - a.coordinates[1][-1]
            if pt > 0:
                if a.coordinates[0][-1] == len(record):
                    print("  Primer falls %d bases off the end of the sequence" % pt)
                else:
                    print("  Primer mismatch in the last %d bases" % pt)
            print(a)
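The TODO above mentions a semi-global alignment, where gaps at the ends of either sequence are free, so a primer that hangs off the end of a read is not penalized for the overhang. As a rough illustration of the idea only, here is a plain-Python scoring sketch that does not use Biopython. The function name `semiglobal_score` is hypothetical, and its default scores are chosen to mirror the `aligner` settings above (match 1, mismatch 0, gap -1).

```python
def semiglobal_score(target, query, match=1, mismatch=0, gap=-1):
    """Best alignment score when gaps at either end of either sequence are free.

    Illustrative dynamic-programming sketch; plain string comparison only,
    so degenerate (IUPAC) bases are NOT treated as matches.
    """
    # Row 0: skipping any prefix of the target costs nothing.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        # Column 0: skipping any prefix of the query costs nothing either.
        cur = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        # Ending at the last target base leaves the rest of the query as a free end gap.
        best = max(best, cur[-1])
        prev = cur
    # Ending anywhere in the last row leaves the rest of the target as a free end gap.
    return max(best, max(prev))


# Made-up sequences, for illustration only.
print(semiglobal_score("TTACGTTT", "ACGT"))
print(semiglobal_score("GGAC", "ACGT"))
```

In the second call the primer overlaps only the last two bases of the target, and the overhanging primer bases do not reduce the score.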

## Summary


In [None]:
import glob
import pandas as pd

# Show a table of primer match scores for each sequence
table = dict()
for path in glob.glob("myseqs/*.fastq"):
    record = SeqIO.read(path, "fastq")
    hits = RCUtils.computePrimerHits(record, primers, allowOverlaps=True)
    scores = dict()
    for hit in hits:
        # Get the primer name without the suffix
        pname = hit.primer.id.split("-")[0]
        # Combine all hits that share a name by multiplying their match ratios
        if pname in scores:
            scores[pname] *= hit.mr
        else:
            scores[pname] = hit.mr

    table[record.id] = scores

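df = pd.DataFrame.from_dict(table, orient='index')
# Convert match ratios to whole percentages. The nullable Int64 dtype avoids a
# conversion error when a primer has no hit on some sequence (a NaN cell).
df = (df * 100).round(0).astype("Int64")
df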

Overall, HRVMa is a poor match for every sample except S28, which is consistent with the qPCR experimental results. ENTng and ENTrc are both generally good matches.
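Each cell in the table is the product of the match ratios of all hits whose primer ID shares a prefix before the first `-`, so one weak primer or probe pulls the whole assay's score down. A minimal sketch of that aggregation, using made-up IDs and ratios (`demoA-f` and the others are hypothetical, not taken from `RespiCovPrimers.fasta`, and `combine_scores` is a name invented for this example):

```python
def combine_scores(hits):
    """Multiply match ratios of hits that share a primer-name prefix.

    `hits` is an iterable of (primer_id, match_ratio) pairs; the prefix is
    everything before the first "-" in the ID.
    """
    scores = {}
    for primer_id, ratio in hits:
        name = primer_id.split("-")[0]
        scores[name] = scores.get(name, 1.0) * ratio
    return scores


# Hypothetical hits: forward primer, reverse primer and probe for two assays.
example_hits = [
    ("demoA-f", 0.5),
    ("demoA-r", 1.0),
    ("demoA-p", 0.75),
    ("demoB-f", 1.0),
    ("demoB-r", 1.0),
]
print(combine_scores(example_hits))
```

With this scheme an assay scores 100% only when every one of its oligos matches perfectly.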

## S28 - Rhinovirus A-23

In [None]:
printSeqBinding("myseqs/S28-RVA-23.fastq")

## S44 - Rhinovirus A-56

In [None]:
printSeqBinding("myseqs/S44-RVA-56.fastq")

## S48 - Rhinovirus C-1

In [None]:
printSeqBinding("myseqs/S48-RVC-1.fastq")

Here the critical final base of the HRVMA-f primer mismatches our S48 RVC-1 sequence, and there are three other mismatches. This explains why I couldn't get S48 to test positive via HRV qPCR. However, the sequence does seem to be a perfect match for ENTng, so I still don't know why that assay was so unreliable for this sample.
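The final base matters because primers are written 5' to 3' and the polymerase extends from the 3' end, so a mismatch in the last position hurts amplification more than one elsewhere. A small sketch of how that check could be automated, assuming the primer and its binding site are given as equal-length strings in the primer's orientation. The function `mismatch_report` and the sequences are made up for illustration, and degenerate (IUPAC) bases are not handled.

```python
def mismatch_report(primer, site):
    """Compare a primer with an equal-length binding site, both written 5'->3'.

    Returns the mismatch count, the 0-based mismatch positions, and whether
    the final (3') base of the primer is among them.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    positions = [
        i for i, (p, s) in enumerate(zip(primer.upper(), site.upper())) if p != s
    ]
    return {
        "mismatches": len(positions),
        "positions": positions,
        "three_prime_mismatch": (len(primer) - 1) in positions,
    }


# Made-up sequences: the only difference is in the last base.
print(mismatch_report("ACGTAC", "ACGTAG"))
```

A hit flagged with `three_prime_mismatch` could then be highlighted in the summary table even when its overall match ratio looks acceptable.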

## RefSeq Rhinovirus C-1

Test against the full C-1 genome, since the earlier primers and probes weren't designed for Rhinovirus C.

In [None]:
printSeqBinding("refseq/Rhinovirus-C1.gb", "gb")

The KRV ka primer sets match perfectly.