# Eventalign_collapse usage

`Eventalign_collapse` collapses the raw output of `nanopolish eventalign` so that it contains one line per kmer rather than one line per event.
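As an illustration of what collapsing by kmer means, the minimal sketch below groups a few hand-written events by reference position. It is not the NanopolishComp implementation: the tuple layout `(ref_pos, ref_kmer, model_kmer, event_length)`, the function name `collapse_events`, and the rule that "NNNNN" events are counted separately from mismatching events are assumptions made for this example.

```python
from itertools import groupby

def collapse_events(events):
    """Collapse consecutive events sharing a reference position into one row per kmer.

    `events` is an iterable of (ref_pos, ref_kmer, model_kmer, event_length) tuples,
    a hypothetical simplification of the nanopolish eventalign columns.
    """
    collapsed = []
    for (ref_pos, ref_kmer), group in groupby(events, key=lambda e: (e[0], e[1])):
        group = list(group)
        collapsed.append({
            "ref_pos": ref_pos,
            "ref_kmer": ref_kmer,
            "num_events": len(group),
            # Total time spent on this kmer, all events included
            "dwell_time": sum(e[3] for e in group),
            # Time spent in events whose model sequence is "NNNNN"
            "NNNNN_dwell_time": sum(e[3] for e in group if e[2] == "NNNNN"),
            # Assumption: "NNNNN" events are not also counted as mismatches
            "mismatch_dwell_time": sum(
                e[3] for e in group if e[2] not in ("NNNNN", ref_kmer)
            ),
        })
    return collapsed

# Made-up events: three of them map to the same reference position 1579
events = [
    (1578, "CCACC", "CCACC", 0.00365),
    (1579, "CACCC", "CACCC", 0.00500),
    (1579, "CACCC", "NNNNN", 0.00250),
    (1579, "CACCC", "CACCA", 0.00125),
    (1580, "ACCCT", "ACCCT", 0.00232),
]
rows = collapse_events(events)
for row in rows:
    print(row)
```

Five events become three rows, and the row for position 1579 keeps track of how much of its dwell time came from "NNNNN" and mismatching events.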

## Output file format

Unlike the text output of `nanopolish eventalign`, the `Eventalign_collapse` output separates reads with a header line starting with `#` that contains the read_id and ref_id. This reduces redundancy and makes it easy to find the start and end of each read.

Example: `#7ef1d7b9-5824-4382-b23b-78d82c07ebbd	YHR055C`

The main data file contains the following fields:

* **ref_pos**: Position of the kmer on the reference sequence.
* **ref_kmer**: Sequence of the reference kmer.
* **num_events**: Number of events for this kmer before collapsing.
* **dwell_time**: Dwell time for this kmer in seconds.
* **NNNNN_dwell_time**: Dwell time of the events for this kmer with a model sequence "NNNNN" (events ignored by the nanopolish HMM).
* **mismatch_dwell_time**: Dwell time of the events for this kmer with a model sequence different from the reference kmer.
* **start_idx**: Only if `nanopolish eventalign` was called with `--signal-index`. Start coordinate on the original raw signal in the fast5 file.
* **end_idx**: Only if `nanopolish eventalign` was called with `--signal-index`. End coordinate on the original raw signal in the fast5 file.
* **mean**: Only if `nanopolish eventalign` was called with `--samples`. Mean of the normalised signal values provided by `nanopolish eventalign`.
* **median**: Only if `nanopolish eventalign` was called with `--samples`. Median of the normalised signal values provided by `nanopolish eventalign`.
* **std**: Only if `nanopolish eventalign` was called with `--samples`. Standard deviation of the normalised signal values provided by `nanopolish eventalign`.
* **mad**: Only if `nanopolish eventalign` was called with `--samples`. Median absolute deviation of the normalised signal values provided by `nanopolish eventalign`.
* **num_signals**: Only if `nanopolish eventalign` was called with `--samples`. Number of raw signal points.
* **samples**: Only if `nanopolish eventalign` was called with `--samples` and `Eventalign_collapse` with `--write_samples`. List of normalised signal intensity values for this kmer.
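Because every read starts with a `#` header line followed by a column-name line, the main file can be split back into reads with a small parser. The sketch below runs on an in-memory sample adapted from the example output. The function name `iter_reads` is hypothetical and not part of NanopolishComp.

```python
import io

def iter_reads(handle):
    """Yield (read_id, ref_id, rows) for each read block of a collapsed file.

    Assumes each block is a '#read_id<TAB>ref_id' header, then a column-name
    line, then one tab-separated line per kmer.
    """
    read_id = ref_id = fields = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # A new header closes the previous read, if any
            if read_id is not None:
                yield read_id, ref_id, rows
            read_id, ref_id = line[1:].split("\t")
            fields, rows = None, []
        elif fields is None:
            # First line after the header holds the column names
            fields = line.split("\t")
        else:
            rows.append(dict(zip(fields, line.split("\t"))))
    if read_id is not None:
        yield read_id, ref_id, rows

sample = (
    "#0\tYGR240C\n"
    "ref_pos\tref_kmer\tnum_events\tdwell_time\tNNNNN_dwell_time\tmismatch_dwell_time\n"
    "1656\tGAAAA\t1\t0.00266\t0.0\t0.0\n"
    "1657\tAAAAC\t1\t0.00764\t0.0\t0.0\n"
    "#1\tYCR030C\n"
    "ref_pos\tref_kmer\tnum_events\tdwell_time\tNNNNN_dwell_time\tmismatch_dwell_time\n"
    "1578\tCCACC\t1\t0.00365\t0.0\t0.0\n"
)
reads = list(iter_reads(io.StringIO(sample)))
for read_id, ref_id, rows in reads:
    print(read_id, ref_id, len(rows))
```

Values are kept as strings here. Since the set of columns depends on the options used, reading the column names from each block is safer than hard-coding them.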

In addition, `Eventalign_collapse` generates an index file containing read-level information. It contains the following fields:

* **read_id**: Name or index of the read
* **ref_id**: Name of the reference sequence the read was aligned on (contig)
* **ref_start**: Start coordinate of the alignment on the reference sequence
* **ref_end**: End coordinate of the alignment on the reference sequence
* **dwell_time**: Cumulative dwell time in seconds for the entire resquiggled sequence
* **kmers**: Overall number of resquiggled kmers
* **NNNNN_kmers**: Number of resquiggled kmers containing at least 1 event for which the model sequence was "NNNNN"
* **mismatch_kmers**: Number of resquiggled kmers containing at least 1 event for which the model sequence diverged from the reference sequence
* **missing_kmers**: Number of skipped/missing reference positions in nanopolish output
* **byte_offset**: Number of characters before the start of the sequence in the main output file. **This can be used in conjunction with file.seek() to directly access the start of a read**. An example is provided in the Usage notebook.
* **byte_len**: Number of characters from byte_offset to the end of the read, excluding the last newline. **This can be used in conjunction with read() to read the whole text chunk corresponding to the read**.
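The relationship between `byte_offset` and `byte_len` can be sketched without NanopolishComp by writing a tiny file in the same layout and building the two values by hand. Everything below (the temporary file, the `blocks` list and the hand-built `index`) is illustrative only. The file is opened in binary mode so that offsets are counted in bytes, which for plain ASCII output coincides with the character counts described above.

```python
import os
import tempfile

# Two read blocks in the collapsed layout (header, column names, kmer lines)
blocks = [
    "#0\tYGR240C\nref_pos\tref_kmer\tnum_events\n1656\tGAAAA\t1\n1657\tAAAAC\t1",
    "#1\tYCR030C\nref_pos\tref_kmer\tnum_events\n1578\tCCACC\t1",
]

# Write the blocks and record a hand-built index; byte_len excludes the final newline
index = []
offset = 0
with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as fp:
    path = fp.name
    for block in blocks:
        data = block.encode("ascii")
        fp.write(data + b"\n")
        index.append({"byte_offset": offset, "byte_len": len(data)})
        offset += len(data) + 1  # +1 for the newline separating two reads

# Random access: jump to the second read and read exactly its text chunk
with open(path, "rb") as fp:
    fp.seek(index[1]["byte_offset"])
    chunk = fp.read(index[1]["byte_len"]).decode("ascii")
os.remove(path)

print(chunk)
```

The example index files further down follow the same relation: the first read has a `byte_offset` of 0 and a `byte_len` of 38028, and the second read starts at offset 38029.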

## Bash command line usage

### Command line help

In [1]:
%%bash
source ~/.bashrc && workon Nanopolish_0.11.1

NanopolishComp Eventalign_collapse --help

usage: NanopolishComp Eventalign_collapse [-h] -o OUTPUT_FN [-i INPUT_FN] [-s]
                                          [-r MAX_READS]
                                          [-f STAT_FIELDS [STAT_FIELDS ...]]
                                          [-v | -q] [-t THREADS]

Collapse the nanopolish eventalign output at kmers level and compute kmer
level statistics

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)

Input/Output options:
  -o OUTPUT_FN, --output_fn OUTPUT_FN
                        Path the output eventalign collapsed tsv file
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to a nanopolish eventalign tsv output file. If
                        '0' read from std input (default: 0)

Run parameters options:
  -s, --write_samples   If given, will write the raw sample if nanopolish
                        eventalig

### Example usage

#### From an existing nanopolish eventalign output to a file

In [2]:
%%bash
source ~/.bashrc && workon Nanopolish_0.11.1

NanopolishComp Eventalign_collapse -i ./data/nanopolish_reads.tsv -o ./output/nanopolish_collapsed_reads.tsv
head ./output/nanopolish_collapsed_reads.tsv
head ./output/nanopolish_collapsed_reads.tsv.idx

#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	38028
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	38029	29704
YHR174W	0	839	2	825	9.659210000000002	15	0	14	67734	24231
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	91966	30801
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	122768	24589
YLR441C	173	764	5	554	5.556939999999999	20	0	37	147358	16437
YGR192C	1	989	6	927	11.731470000000003	37	0	61	163796	27388
YDR500C	9	252	8	231	2.9179100000000027	7	0	14	191185	6751
YDR224C	0	389	9	376	4.2840300000000004	8	0	13	197937	10935


Checking arguments
Starting to process files
0 reads [00:00, ? reads/s]13 reads [00:00, 115.97 reads/s]21 reads [00:00, 140.37 reads/s]
[Eventalign_collapse] total reads: 21 [138.37 reads/s]



#### From standard input to a file

In [3]:
%%bash
source ~/.bashrc && workon Nanopolish_0.10.1

cat ./data/nanopolish_reads_index.tsv | NanopolishComp Eventalign_collapse -o ./output/nanopolish_collapsed_reads.tsv --verbose
head ./output/nanopolish_collapsed_reads.tsv
head ./output/nanopolish_collapsed_reads.tsv.idx

#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	start_idx	end_idx
1656	GAAAA	1	0.00266	0.0	0.0	63446	63454
1657	AAAAC	1	0.00764	0.0	0.0	63423	63446
1658	AAACA	1	0.00398	0.0	0.0	63411	63423
1659	AACAA	1	0.00432	0.0	0.0	63398	63411
1660	ACAAA	1	0.00498	0.0	0.0	63383	63398
1661	CAAAG	1	0.00564	0.0	0.0	63366	63383
1662	AAAGA	1	0.00963	0.0	0.0	63337	63366
1663	AAGAT	1	0.00299	0.0	0.0	63328	63337
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	53046
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	53047	41374
YHR174W	0	839	2	825	9.659210000000002	15	0	14	94422	33595
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	128018	43155
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	171174	34423
YLR441C	173	764	5	554	5.556939999999999	20	0	37	205598	23103
YGR192C	1	989	6	927	11.731470000000003	37	0	61	228702	38529
YDR500C	9	252	8	231	2.917910

Checking arguments
	Testing input file readability
	Testing output file writability
	Checking number of threads
	Checking if stat_fields names are valid
Starting to process files
	[split_reads] Start reading input file/stream
	[process_read 1] Starting processing reads
	[process_read 2] Starting processing reads
	[write_output] Start rwriting output
0 reads [00:00, ? reads/s]5 reads [00:00, 49.68 reads/s]13 reads [00:00, 50.50 reads/s]	[split_reads] Done
	[process_read 1] Done
	[process_read 2] Done
21 reads [00:00, 67.87 reads/s]
	[write_output] Done
[Eventalign_collapse] total reads: 21 [67.34 reads/s]



#### On the fly, from nanopolish eventalign to a file

In [4]:
%%bash
source ~/.bashrc && workon Nanopolish_0.10.1

nanopolish eventalign -t 4 --samples --scale-events --print-read-name --reads ./data/reads.fastq --bam ./data/aligned_reads.bam --genome ./data/reference.fa | NanopolishComp Eventalign_collapse  -o ./output/nanopolish_collapsed_reads.tsv
head ./output/nanopolish_collapsed_reads.tsv
head ./output/nanopolish_collapsed_reads.tsv.idx

#9a1c5296-2ab1-4abd-8d50-e059754cf332	YCR030C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	mean	median	num_signals
1578	CCACC	1	0.00365	0.0	0.0	70.06341552734375	70.67729949951172	11
1579	CACCC	15	0.09761000000000002	0.0	0.0	74.59468841552734	75.08149719238281	294
1580	ACCCT	1	0.00232	0.0	0.0	67.23785400390625	67.88800048828125	7
1581	CCCTC	1	0.00232	0.0	0.0	72.0615005493164	72.58580017089844	7
1582	CCTCA	5	0.0156	0.0	0.0	73.5634994506836	73.76029968261719	47
1583	CTCAA	1	0.00398	0.0	0.0	77.58948516845703	78.16450500488281	12
1584	TCAAG	1	0.01096	0.0	0.0	83.52072143554688	84.03679656982422	33
1585	CAAGG	1	0.00232	0.0	0.0	105.23970794677734	104.2959976196289	7
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YCR030C	1578	2576	9a1c5296-2ab1-4abd-8d50-e059754cf332	971	11.487010000000005	23	0	27	0	67812
YHR174W	0	839	3784283c-47cc-48ac-8d7b-7efd32123b56	825	9.659210000000002	15	0	14	67813	56569
YG

Checking arguments
Starting to process files
0 reads [00:00, ? reads/s]1 reads [00:00,  4.05 reads/s]2 reads [00:00,  4.30 reads/s]3 reads [00:00,  4.39 reads/s]4 reads [00:00,  4.36 reads/s]6 reads [00:00,  4.68 reads/s]7 reads [00:01,  4.90 reads/s]10 reads [00:01,  5.32 reads/s]12 reads [00:01,  5.61 reads/s]14 reads [00:01,  5.92 reads/s][post-run summary] total reads: 21, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 0
17 reads [00:01,  6.44 reads/s]20 reads [00:01,  6.98 reads/s]21 reads [00:01, 12.12 reads/s]
[Eventalign_collapse] total reads: 21 [12.1 reads/s]



## Python API usage

### Import the package

In [5]:
from NanopolishComp.Eventalign_collapse import Eventalign_collapse
from NanopolishComp.common import jhelp

### Python API help

In [6]:
jhelp(Eventalign_collapse)

---

**NanopolishComp.Eventalign_collapse.__init__**

Collapse the nanopolish eventalign output by kmers rather than by events. Kmer-level statistics (mean, median, std, mad) are only computed if nanopolish is run with the --samples option

---

* **output_fn** *: str (required)*

Path to the output eventalign collapsed tsv file

* **input_fn** *: str (required)*

Path to a nanopolish eventalign tsv output file.

* **max_reads** *: int (default = None)*

Maximum number of reads to parse. 0 to deactivate (default = 0)

* **write_samples** *: bool (default = False)*

If given, will write the raw samples if nanopolish eventalign was run with the --samples option

* **stat_fields** *: list of str (default = ['mean', 'median', 'num_signals'])*

List of statistical fields to compute if nanopolish eventalign was run with the --samples option. Valid values = "mean", "std", "median", "mad", "num_signals"

* **threads** *: int (default = 4)*

Total number of threads. 1 thread is used for the reader and 1 for the writer (default = 4)

* **verbose** *: bool (default = False)*

Increase verbosity

* **quiet** *: bool (default = False)*

Reduce verbosity



### Example usage

#### Example with minimal file

In [7]:
input_fn = "./data/nanopolish_reads.tsv"
output_fn = "./output/collapsed_nanopolish.tsv"
index_fn = "./output/collapsed_nanopolish.tsv.idx"

Eventalign_collapse (input_fn=input_fn, output_fn=output_fn, threads=6, verbose=True)

print ("Collapsed eventalign file")
! head {output_fn} -n 10

print ("Index file")
! head {index_fn} -n 10

Checking arguments
	Testing input file readability
	Testing output file writability
	Checking number of threads
	Checking if stat_fields names are valid
Starting to process files
	[process_read 1] Starting processing reads
	[process_read 2] Starting processing reads
	[split_reads] Start reading input file/stream
	[process_read 3] Starting processing reads
	[process_read 4] Starting processing reads
	[write_output] Start rwriting output
8 reads [00:00, 66.82 reads/s]	[split_reads] Done
	[process_read 4] Done
	[process_read 2] Done
	[process_read 3] Done
	[process_read 1] Done
21 reads [00:00, 70.23 reads/s]
	[write_output] Done
[Eventalign_collapse] total reads: 21 [91.02 reads/s]



Collapsed eventalign file
#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0
Index file
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	38028
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	38029	29704
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	67734	30801
YHR174W	0	839	2	825	9.659210000000002	15	0	14	98536	24231
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	122768	24589
YLR441C	173	764	5	554	5.556939999999999	20	0	37	147358	16437
YGR192C	1	989	6	927	11.731470000000003	37	0	61	163796	27388
YDR500C	9	252	8	231	2.9179100000000027	7	0	14	191185	6751
YGR192C	3	995	7	946	12.30464000000001	31	0	50	19

#### Example with indexes

In [8]:
input_fn = "./data/nanopolish_reads_index.tsv"
output_fn = "./output/collapsed_nanopolish_index.tsv"
index_fn = "./output/collapsed_nanopolish_index.tsv.idx"

Eventalign_collapse (input_fn=input_fn, output_fn=output_fn, threads=6)

print ("\nCollapsed eventalign file")
! head {output_fn} -n 10

print ("\nIndex file")
! head {index_fn} -n 10

Checking arguments
Starting to process files
21 reads [00:00, 71.87 reads/s]
[Eventalign_collapse] total reads: 21 [69.6 reads/s]




Collapsed eventalign file
#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	start_idx	end_idx
1656	GAAAA	1	0.00266	0.0	0.0	63446	63454
1657	AAAAC	1	0.00764	0.0	0.0	63423	63446
1658	AAACA	1	0.00398	0.0	0.0	63411	63423
1659	AACAA	1	0.00432	0.0	0.0	63398	63411
1660	ACAAA	1	0.00498	0.0	0.0	63383	63398
1661	CAAAG	1	0.00564	0.0	0.0	63366	63383
1662	AAAGA	1	0.00963	0.0	0.0	63337	63366
1663	AAGAT	1	0.00299	0.0	0.0	63328	63337

Index file
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	53046
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	53047	41374
YHR174W	0	839	2	825	9.659210000000002	15	0	14	94422	33595
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	128018	43155
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	171174	34423
YLR441C	173	764	5	554	5.556939999999999	20	0	37	205598	23103
YGR192C	1	989	6	927	11.731470000000003	37	0	61	22

#### Example including samples

In [9]:
input_fn = "./data/nanopolish_reads_samples.tsv"
output_fn = "./output/collapsed_nanopolish_samples.tsv"
index_fn = "./output/collapsed_nanopolish_samples.tsv.idx"

Eventalign_collapse (input_fn=input_fn, output_fn=output_fn, threads=6, quiet=True, stat_fields=["mean", "std", "median", "mad", "num_signals"])

print ("\nCollapsed eventalign file")
! head {output_fn} -n 10

print ("\nIndex file")
! head {index_fn} -n 10

[Eventalign_collapse] total reads: 21 [12.43 reads/s]




Collapsed eventalign file
#1	YCR030C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	mean	std	median	mad	num_signals
1578	CCACC	1	0.00365	0.0	0.0	70.06341552734375	2.593737840652466	70.67729949951172	1.4680023193359375	11
1579	CACCC	15	0.09761000000000002	0.0	0.0	74.59468841552734	3.2702603340148926	75.08149719238281	2.4958038330078125	294
1580	ACCCT	1	0.00232	0.0	0.0	67.23785400390625	1.7063778638839722	67.88800048828125	2.2021026611328125	7
1581	CCCTC	1	0.00232	0.0	0.0	72.0615005493164	1.1475614309310913	72.58580017089844	1.0277023315429688	7
1582	CCTCA	5	0.0156	0.0	0.0	73.5634994506836	2.3691813945770264	73.76029968261719	1.614898681640625	47
1583	CTCAA	1	0.00398	0.0	0.0	77.58948516845703	1.5072672367095947	78.16450500488281	0.7339935302734375	12
1584	TCAAG	1	0.01096	0.0	0.0	83.52072143554688	3.1505558490753174	84.03679656982422	2.3488998413085938	33
1585	CAAGG	1	0.00232	0.0	0.0	105.23970794677734	3.632479667663574	104.2959976196289	1.7620010375976562	7


#### Example including samples and writing sample values in the output file

In [10]:
input_fn = "./data/nanopolish_reads_samples.tsv"
output_fn = "./output/collapsed_nanopolish_samples_all.tsv"
index_fn = "./output/collapsed_nanopolish_samples_all.tsv.idx"

Eventalign_collapse (input_fn=input_fn, output_fn=output_fn, threads=6, write_samples=True)

print ("\nCollapsed eventalign file")
! head {output_fn} -n 10

print ("\nIndex file")
! head {index_fn} -n 10

Checking arguments
Starting to process files
21 reads [00:01,  4.82 reads/s]
[Eventalign_collapse] total reads: 21 [20.24 reads/s]




Collapsed eventalign file
#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	mean	median	num_signals	samples
1656	GAAAA	1	0.00266	0.0	0.0	105.49774932861328	105.4800033569336	8	104.343,106.049,106.617,105.196,105.48,107.47,103.347,105.48
1657	AAAAC	1	0.00764	0.0	0.0	111.09274291992188	111.0250015258789	23	112.873,106.617,110.03,114.579,111.736,113.726,111.309,109.745,113.157,109.034,111.025,112.162,112.162,110.172,110.598,110.74,111.025,109.461,110.883,112.02,111.878,110.314,109.887
1658	AAACA	1	0.00398	0.0	0.0	100.11283111572266	99.50865173339844	12	100.077,97.5182,98.0869,97.0917,97.5182,99.3665,97.9447,99.6508,100.362,105.764,102.636,105.338
1659	AACAA	1	0.00432	0.0	0.0	90.71566009521484	89.12989807128906	13	86.4285,87.1394,86.8551,89.1299,87.9925,91.5468,90.9781,95.2434,95.6699,97.376,89.1299,94.3903,87.4238
1660	ACAAA	1	0.00498	0.0	0.0	81.67988586425781	81.59459686279297	15	81.7368,79.6041,80.7415,82.5898,84.7224,80.8837,86.2864,82.3055,81.5946
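When samples are written, the summary columns can be recomputed from the `samples` list. The sketch below does this with the Python standard library for the first kmer of the output above. The choice of a population standard deviation and of an unscaled median absolute deviation is an assumption; NanopolishComp may define `std` and `mad` differently.

```python
import statistics

# Samples of the first kmer (ref_pos 1656, GAAAA), copied from the output above
samples_field = "104.343,106.049,106.617,105.196,105.48,107.47,103.347,105.48"
samples = [float(x) for x in samples_field.split(",")]

mean = statistics.mean(samples)
median = statistics.median(samples)
std = statistics.pstdev(samples)  # assumption: population standard deviation
mad = statistics.median(abs(x - median) for x in samples)  # assumption: unscaled MAD

print(f"mean={mean:.5f} median={median:.5f} std={std:.5f} mad={mad:.5f} num_signals={len(samples)}")
```

The recomputed mean (105.49775) and median (105.48) agree with the `mean` and `median` columns of that row, up to the rounding of the printed samples.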

## Using the index to random access a specific entry in the file

In [11]:
output_fn = "./output/collapsed_nanopolish_samples.tsv"
index_fn = "./output/collapsed_nanopolish_samples.tsv.idx"

# Import the index in a pandas dataframe (because it is simple)
import pandas as pd 
index_df = pd.read_csv (index_fn, sep="\t")

# Select random lines
random_lines = index_df.sample(5)
print ("Random index lines")
display (random_lines)

# Open the collapsed eventalign file
with open (output_fn) as fp:
    for idx, read in random_lines.iterrows():

        # Move to the offset indicated in the index file (seek is absolute, so no rewind is needed)
        fp.seek(read.byte_offset)
        print (fp.readline().rstrip()) # Print read header
        df = pd.read_csv (fp, nrows=read.kmers, sep="\t") # Read the kmer lines of this read
        with pd.option_context("display.max_rows",4): # Display first and last lines
            display(df)

Random index lines


Unnamed: 0,ref_id,ref_start,ref_end,read_id,kmers,dwell_time,NNNNN_kmers,mismatch_kmers,missing_kmers,byte_offset,byte_len
17,YKL060C,897,1076,19,171,4.61332,8,0,8,1220041,18486
8,YGR192C,3,995,7,946,12.30464,31,0,50,700611,100085
19,YGL076C,10,731,18,695,8.95742,21,0,26,1316719,73568
4,YLR441C,173,764,5,554,5.55694,20,0,37,432770,58654
3,YHR174W,218,1309,3,1028,11.06325,36,0,63,323818,108951


#19	YKL060C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,mean,std,median,mad,num_signals
0,897,ATCAG,1,0.00863,0.00000,0.0,79.211464,3.275715,79.250900,1.757504,26
1,898,TCAGA,1,0.00299,0.00000,0.0,111.056442,4.227791,111.398003,1.757996,9
...,...,...,...,...,...,...,...,...,...,...,...
169,1074,TTATA,1,0.01295,0.00000,0.0,90.905457,2.273109,90.454803,1.611000,39
170,1075,TATAA,3,0.03054,0.01959,0.0,117.605080,8.927938,120.039001,5.418999,92


#7	YGR192C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,mean,std,median,mad,num_signals
0,3,GTTAG,1,0.00564,0.0000,0.0,82.788704,2.058451,83.222099,1.155602,17
1,4,TTAGA,1,0.00730,0.0000,0.0,100.523682,3.785173,99.689751,1.661201,22
...,...,...,...,...,...,...,...,...,...,...,...
944,989,CAAGG,7,0.05709,0.0073,0.0,102.884529,6.507542,101.422997,3.466698,172
945,994,CTTAA,1,0.00232,0.0000,0.0,98.905594,9.529233,94.200500,6.211502,7


#18	YGL076C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,mean,std,median,mad,num_signals
0,10,AAAAA,1,0.00697,0.00000,0.0,107.097427,1.975842,106.890999,1.443001,21
1,11,AAAAA,1,0.01959,0.00000,0.0,109.596329,2.148095,109.777000,1.298004,59
...,...,...,...,...,...,...,...,...,...,...,...
693,729,AACTA,6,0.03187,0.00000,0.0,91.310394,4.251484,90.007103,1.587395,96
694,730,ACTAA,6,0.04283,0.02523,0.0,100.259720,8.799801,103.861000,5.772003,129


#5	YLR441C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,mean,std,median,mad,num_signals
0,173,AGATG,4,0.02655,0.0,0.0,131.997757,8.953995,134.468002,5.272995,80
1,174,GATGC,1,0.00365,0.0,0.0,75.093773,3.819872,74.248596,2.497498,11
...,...,...,...,...,...,...,...,...,...,...,...
552,762,GTGTA,7,0.03618,0.0,0.0,93.325340,8.111859,92.980400,4.578796,109
553,763,TGTAA,4,0.01560,0.0,0.0,104.724258,3.458026,104.774002,2.082001,47


#3	YHR174W


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,mean,std,median,mad,num_signals
0,218,TGCTG,1,0.00730,0.0,0.0,104.062309,5.644166,103.998001,3.615997,22
1,219,GCTGC,1,0.00498,0.0,0.0,80.874825,1.808162,80.600700,1.134399,15
...,...,...,...,...,...,...,...,...,...,...,...
1026,1307,GTTGT,2,0.00863,0.0,0.0,83.054901,2.571079,83.365799,1.843399,26
1027,1308,TTGTA,10,0.05312,0.0,0.0,91.716957,6.146588,90.455803,2.197903,160
