# RespiCoV sequencing analysis by primer and known sequence

Analyze fastq file(s) from nanopore sequencing output using [RespiCoV](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0264855) primers, matching reads against known sequences. Designed for ligation sequencing chemistry.

This run analyzes my second RespiCoV sequencing attempt, where I knew (from the gel) that I had a lot of mis-priming, and where my flow cell clogged and could not generate many reads.

**Goals:**
 * Precisely identify target matches per input sample given known sequences
 * Run quickly on a single machine and scale linearly with input sequence
 * Enable iterative exploration of the data (cache the most expensive operations)
 * Support multiple targets per sample (pooled samples)

**TODO:**
 
**Non-goals / future work elsewhere:**
 * Use reads that include only one (or even zero) primers, e.g. for tagmentation chemistry
 * Analyze mis-priming or PCR efficiency (see RCMatchPrimers).

## Initialization and configuration

In [None]:
from Bio.Seq import Seq
from Bio import SeqIO
import matplotlib_inline.backend_inline
import os
import pandas as pd
import RCUtils

# Get high-dpi output for retina displays
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')

fastQBaseDir = "../RespiCov-2/20230430_1642_MN41817_APC888_8b249272/fastq_pass/"

pd.options.display.max_rows = 50
pd.options.display.min_rows = 25

primers = RCUtils.readPrimers("RespiCovPrimers.fasta")
print("Read %i primers" % (len(primers)))

# Store primer indices for efficient serialization
for i, primer in enumerate(primers):
    primer.index = i
    primer.baseName = primer.description[:primer.description.rindex(' ')]
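
The `baseName` computed above is the primer description with its last space-separated token removed, so that primers differing only in that final token share one base name. Below is a minimal sketch of that string operation on plain strings. The descriptions are hypothetical (the real ones come from `RespiCovPrimers.fasta`), and `base_name` is an illustrative helper, not part of `RCUtils`. It uses `str.rpartition` rather than `str.rindex` so that a description without any space is returned unchanged instead of raising `ValueError`.

```python
def base_name(description: str) -> str:
    """Drop the final space-separated token from a primer description."""
    head, sep, _tail = description.rpartition(" ")
    # With no space present, rpartition returns ('', '', description);
    # str.rindex(' ') would raise ValueError in that case.
    return head if sep else description


# Hypothetical descriptions, used only for illustration.
examples = ["Hypothetical virus 1 F", "Hypothetical virus 1 R", "NoSpaceName"]
base_names = [base_name(d) for d in examples]
print(base_names)
```

If every description in the primer file is known to contain a space, the two approaches give the same result and the original slice is fine as written.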