# Eventalign_collapse usage

This program collapses the raw file generated by `nanopolish eventalign` by kmers rather than by event.

## Options

* **input_fn**

Path to a `nanopolish eventalign` tsv output file (read access required). In command line mode it is also possible to pipe the output of the `nanopolish eventalign` directly into `Eventalign_collapse`.

* **outdir**

Path to an existing directory where to write all the output files generated by `Eventalign_collapse`(write access required). If the directory does not exist an error is raised.

* **outprefix**

Prefix for all the files generated by the program

* **write_samples**

Write the concatenated sample data values corresponding to each kmer in the output data file. This options make only sense if `nanopolish eventalign` was ran with `--samples` option.
  
* **max_reads**

Controls the maximum number of read to parse before stopping to read the input file. This could be usefull for testing or downsampling. 

* **stat_fields**

Specify the list of statistical analyses to perform on a kmer basis and add in the output data file. This will only be performed if `nanopolish eventalign` was ran with `--samples` option.

!!! note "Valid statistics fields"
    * mean (mean of signal intensity)
    * median (mean of signal intensity)
    * std (standard deviation of signal intensity))
    * mad (median absolute deviation of signal intensity))
    * num_signals (Number of raw signal data point)

* **threads**

`Eventalign_collapse` is multi threaded to speed up the data processing and keep pace with Nanopolish if using the direct piping strategy. Take advantage of many threads if you have access to a large compute cluster

## Output files format

Contrary to `nanopolish eventalign` output text file, in `Eventalign_collapse` the reads are separated by a hashtag headers containing the read_id and ref_id. This reduces the redundancy and makes it easier to find the start and end of a read.

Example : ```#7ef1d7b9-5824-4382-b23b-78d82c07ebbd	YHR055C.```

The main data file contains the following fields:

* **ref_pos**: Reference sequence ID (contig).
* **ref_kmer**: Sequence of the reference kmers.
* **num_events**: Number of events for this kmer before collapsing.
* **dwell_time**: dwell time for this kmer in seconds
* **NNNNN_dwell_time**: dwell time of events for this kmers with a model sequence "NNNNN" (events ignored by nanopolish HMM).
* **mismatch_dwell_time**:  dwell time of events for this kmers with a model sequence different from the reference kmer
* **start_idx**: Only if nanopolish eventalign called with --signal_idx. Start coordinate on original raw signal in fast5 file
* **end_idx**: Only if nanopolish eventalign called with --signal_idx. End coordinate on original raw signal in fast5 file
* **mean**: Only if nanopolish eventalign called with --samples. Mean of the normalised signal values provided by Nanopolish eventalign
* **median**: Only if nanopolish eventalign called with --samples. Median of the normalised signal values provided by Nanopolish eventalign
* **std**: Only if nanopolish eventalign called with --samples. Standard deviation of the normalised signal values provided by Nanopolish eventalign
* **mad**: Only if nanopolish eventalign called with --samples. Median absolute deviation of the normalised signal values provided by Nanopolish eventalign
* **num_signals**: Only if nanopolish eventalign called with --samples. Number of raw signal points.
* **samples**: Only if nanopolish eventalign called with --samples and Eventalign_collapse called with --write_samples. List of normalised signal intensity values for this kmer

In addition `Eventalign_collapse` also generates an useful index file containing reads level information. It contains the following fields:

* **read_id**: Name or index of the read
* **ref_id**: Name of the reference sequence the read was aligned on (contig)
* **ref_start**: Start coordinate of the alignment on the reference sequence
* **ref_end**: End coordinate of the alignment on the reference sequence
* **dwell_time**: Cumulative dwell time in seconds for the entire resquiggled sequence
* **kmers**: Overall number of resquiggled kmers
* **NNNNN_kmers**: Number of resquiggled kmers containing at least 1 event for which the model sequence was "NNNNN"
* **mismatching_kmers**: Number of resquiggled kmers containing at least 1 event for which the model sequence diverged from the reference sequence
* **missing_kmers**: Number of skipped/missing reference positions in nanopolish output
* **byte_offset**: Number of characters before the start of the sequence in the main output file. **This can be used in conjunction with file.seek() to directly access the start of a read**. An example is provided in the Usage notebook.
* **byte_len**: Length of characters after byte_offset to the end of the read, excluding the last newline. **This can be used in conjunction with read() to read all the text chunk corresponding to the read**.

## Bash command line usage

### Command line help

In [1]:
%%bash

# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

NanopolishComp Eventalign_collapse --help

usage: NanopolishComp Eventalign_collapse [-h] [-i INPUT_FN] [-o OUTDIR]
                                          [-p OUTPREFIX] [-s] [-r MAX_READS]
                                          [-f STAT_FIELDS [STAT_FIELDS ...]]
                                          [-t THREADS] [-v | -q]

Collapse the nanopolish eventalign output at kmers level and compute kmer
level statistics

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)

Input/Output options:
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to a nanopolish eventalign tsv output file. If
                        '0' read from std input (default: 0)
  -o OUTDIR, --outdir OUTDIR
                        Path to the output folder (default: ./)
  -p OUTPREFIX, --outprefix OUTPREFIX
                        text outprefix for all the files generated (default:
                  

### Example usage

#### From an existing nanopolish eventalign output to a file

In [2]:
%%bash

# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

NanopolishComp Eventalign_collapse -i ./data/eventalign_collapse//nanopolish_reads.tsv -o ./output/eventalign_collapse/
head ./output/eventalign_collapse/out_eventalign_collapse.tsv
head ./output/eventalign_collapse/out_eventalign_collapse.tsv.idx

#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	38028
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	38029	29704
YHR174W	0	839	2	825	9.659210000000002	15	0	14	67734	24231
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	91966	30801
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	122768	24589
YLR441C	173	764	5	554	5.556939999999999	20	0	37	147358	16437
YGR192C	1	989	6	927	11.731470000000003	37	0	61	163796	27388
YDR500C	9	252	8	231	2.9179100000000027	7	0	14	191185	6751
YGR192C	3	995	7	946	12.30464000000001	31	0	50	197937	28123


Checking arguments
Testing output dir writability
Starting to process files
0 reads [00:00, ? reads/s]13 reads [00:00, 106.55 reads/s]21 reads [00:00, 129.30 reads/s]
[Eventalign_collapse] total reads: 21 [127.11 reads/s]



#### From standard input to a file

In [3]:
%%bash

# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

cat ./data/eventalign_collapse//nanopolish_reads_index.tsv | NanopolishComp Eventalign_collapse -o ./output --verbose
head ./output/eventalign_collapse/out_eventalign_collapse.tsv
head ./output/eventalign_collapse/out_eventalign_collapse.tsv.idx

#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	38028
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	38029	29704
YHR174W	0	839	2	825	9.659210000000002	15	0	14	67734	24231
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	91966	30801
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	122768	24589
YLR441C	173	764	5	554	5.556939999999999	20	0	37	147358	16437
YGR192C	1	989	6	927	11.731470000000003	37	0	61	163796	27388
YDR500C	9	252	8	231	2.9179100000000027	7	0	14	191185	6751
YGR192C	3	995	7	946	12.30464000000001	31	0	50	197937	28123


Checking arguments
	Testing input file readability
Testing output dir writability
	Checking number of threads
	Checking if stat_fields names are valid
Starting to process files
	[split_reads] Start reading input file/stream
	[process_read 1] Starting processing reads
	[process_read 2] Starting processing reads
	[write_output] Start rwriting output
0 reads [00:00, ? reads/s]8 reads [00:00, 78.08 reads/s]19 reads [00:00, 80.03 reads/s]	[split_reads] Done
	[process_read 1] Done
	[process_read 2] Done
21 reads [00:00, 96.20 reads/s]
	[write_output] Done
[Eventalign_collapse] total reads: 21 [95.11 reads/s]



#### On the fly, from nanopolish eventalign to a file

In [4]:
%%bash

# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

nanopolish eventalign -t 4 --samples --scale-events --print-read-name --reads ./data/eventalign_collapse//reads.fastq --bam ./data/eventalign_collapse//aligned_reads.bam --genome ./data/eventalign_collapse//reference.fa | NanopolishComp Eventalign_collapse -o ./output
head ./output/eventalign_collapse/out_eventalign_collapse.tsv
head ./output/eventalign_collapse/out_eventalign_collapse.tsv.idx

#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0
ref_id	ref_start	ref_end	read_id	kmers	dwell_time	NNNNN_kmers	mismatch_kmers	missing_kmers	byte_offset	byte_len
YGR240C	1656	2960	0	1250	13.788570000000009	35	0	54	0	38028
YCR030C	1578	2576	1	971	11.487010000000005	23	0	27	38029	29704
YHR174W	0	839	2	825	9.659210000000002	15	0	14	67734	24231
YHR174W	218	1309	3	1028	11.06325000000001	36	0	63	91966	30801
YHR174W	462	1309	4	818	10.73776000000001	18	0	29	122768	24589
YLR441C	173	764	5	554	5.556939999999999	20	0	37	147358	16437
YGR192C	1	989	6	927	11.731470000000003	37	0	61	163796	27388
YDR500C	9	252	8	231	2.9179100000000027	7	0	14	191185	6751
YGR192C	3	995	7	946	12.30464000000001	31	0	50	197937	28123


HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140014833698560:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1546 in H5F_open(): unable to open file: time = Wed May  1 10:26:17 2019
, name = './data/fast5//20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_348_strand.fast5', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #004: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = './data/fast5//20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_348_strand.fast5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibilty
    minor: Unable 

## Python API usage

### Import the package

In [14]:
# Import main program
from NanopolishComp.Eventalign_collapse import Eventalign_collapse

# Import helper functions
from NanopolishComp.common import jhelp, head

### python API help

In [6]:
jhelp(Eventalign_collapse)

---

**NanopolishComp.Eventalign_collapse.__init__**

Collapse the nanopolish eventalign output by kmers rather that by events. kmer level statistics (mean, median, std, mad) are only computed if nanopolish is run with --samples option

---

* **input_fn** *: str (required)*

Path to a nanopolish eventalign tsv output file.

* **outdir** *: str (default = ./)*

Path to the output folder

* **outprefix** *: str (default = out)*

text outprefix for all the files generated

* **max_reads** *: int (default = None)*

Maximum number of read to parse. 0 to deactivate (default = 0)

* **write_samples** *: bool (default = False)*

If given, will write the raw sample if nanopolish eventalign was ran with --samples option

* **stat_fields** *: list of str (default = ['mean', 'median', 'num_signals'])*

List of statistical fields to compute if nanopolish eventalign was ran with --samples option. Valid values = "mean", "std", "median", "mad", "num_signals"

* **threads** *: int (default = 4)*

Total number of threads. 1 thread is used for the reader and 1 for the writer (default = 4)

* **verbose** *: bool (default = False)*

Increase verbosity

* **quiet** *: bool (default = False)*

Reduce verbosity



### Example usage

#### Example with minimal file

In [7]:
input_fn = "./data/eventalign_collapse//nanopolish_reads.tsv"
outdir = "./output/eventalign_collapse"
outprefix = "basic"

Eventalign_collapse (input_fn=input_fn, outdir=outdir, outprefix=outprefix, threads=6, verbose=True)

head("./output/eventalign_collapse/basic_eventalign_collapse.tsv")
head("./output/eventalign_collapse/basic_eventalign_collapse.tsv.idx")

Checking arguments
	Testing input file readability
Testing output dir writability
	Checking number of threads
	Checking if stat_fields names are valid
Starting to process files
	[process_read 1] Starting processing reads
	[split_reads] Start reading input file/stream
	[process_read 2] Starting processing reads
	[process_read 4] Starting processing reads
	[write_output] Start rwriting output
	[process_read 3] Starting processing reads
11 reads [00:00, 101.86 reads/s]	[split_reads] Done
	[process_read 3] Done
	[process_read 1] Done
	[process_read 4] Done
	[process_read 2] Done
21 reads [00:00, 100.25 reads/s]
	[write_output] Done
[Eventalign_collapse] total reads: 21 [82.03 reads/s]



#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time
1656	GAAAA	1	0.00266	0.0	0.0
1657	AAAAC	1	0.00764	0.0	0.0
1658	AAACA	1	0.00398	0.0	0.0
1659	AACAA	1	0.00432	0.0	0.0
1660	ACAAA	1	0.00498	0.0	0.0
1661	CAAAG	1	0.00564	0.0	0.0
1662	AAAGA	1	0.00963	0.0	0.0
1663	AAGAT	1	0.00299	0.0	0.0

ref_id  ref_start ref_end read_id kmers dwell_time         NNNNN_kmers mismatch_kmers missing_kmers byte_offset byte_len 
YGR240C 1656      2960    0       1250  13.788570000000009 35          0              54            0           38028    
YCR030C 1578      2576    1       971   11.487010000000005 23          0              27            38029       29704    
YHR174W 0         839     2       825   9.659210000000002  15          0              14            67734       24231    
YHR174W 218       1309    3       1028  11.06325000000001  36          0              63            91966       30801    
YHR174W 462       1309    4       818   10.73776000000001  18         

#### Example with indexes

In [8]:
input_fn = "./data/eventalign_collapse//nanopolish_reads_index.tsv"
outdir = "./output/eventalign_collapse"
outprefix = "index"

Eventalign_collapse (input_fn=input_fn, outdir=outdir, outprefix=outprefix, threads=6)

head("./output/eventalign_collapse/index_eventalign_collapse.tsv")
head("./output/eventalign_collapse/index_eventalign_collapse.tsv.idx")

Checking arguments
	Testing input file readability
Testing output dir writability
	Checking number of threads
	Checking if stat_fields names are valid
Starting to process files
	[split_reads] Start reading input file/stream
	[process_read 1] Starting processing reads
	[process_read 2] Starting processing reads
	[process_read 3] Starting processing reads
	[write_output] Start rwriting output
0 reads [00:00, ? reads/s]	[process_read 4] Starting processing reads
17 reads [00:00, 69.40 reads/s]	[split_reads] Done
	[process_read 1] Done
	[process_read 2] Done
	[process_read 4] Done
	[process_read 3] Done
21 reads [00:00, 73.41 reads/s]
	[write_output] Done
[Eventalign_collapse] total reads: 21 [69.34 reads/s]



#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	start_idx	end_idx
1656	GAAAA	1	0.00266	0.0	0.0	63446	63454
1657	AAAAC	1	0.00764	0.0	0.0	63423	63446
1658	AAACA	1	0.00398	0.0	0.0	63411	63423
1659	AACAA	1	0.00432	0.0	0.0	63398	63411
1660	ACAAA	1	0.00498	0.0	0.0	63383	63398
1661	CAAAG	1	0.00564	0.0	0.0	63366	63383
1662	AAAGA	1	0.00963	0.0	0.0	63337	63366
1663	AAGAT	1	0.00299	0.0	0.0	63328	63337

ref_id  ref_start ref_end read_id kmers dwell_time         NNNNN_kmers mismatch_kmers missing_kmers byte_offset byte_len 
YGR240C 1656      2960    0       1250  13.788570000000009 35          0              54            0           53046    
YCR030C 1578      2576    1       971   11.487010000000005 23          0              27            53047       41374    
YHR174W 0         839     2       825   9.659210000000002  15          0              14            94422       33595    
YLR441C 173       764     5       554   5.556939999999999  20          0      

#### Example including samples

In [9]:
input_fn = "./data/eventalign_collapse//nanopolish_reads_index.tsv"
outdir = "./output/eventalign_collapse"
outprefix = "stats"

Eventalign_collapse (input_fn=input_fn, outdir=outdir, outprefix=outprefix, threads=6, quiet=True, stat_fields=["mean", "std", "median", "mad", "num_signals"])

head("./output/eventalign_collapse/stats_eventalign_collapse.tsv")
head("./output/eventalign_collapse/stats_eventalign_collapse.tsv.idx")

[Eventalign_collapse] total reads: 21 [75.34 reads/s]



#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	start_idx	end_idx
1656	GAAAA	1	0.00266	0.0	0.0	63446	63454
1657	AAAAC	1	0.00764	0.0	0.0	63423	63446
1658	AAACA	1	0.00398	0.0	0.0	63411	63423
1659	AACAA	1	0.00432	0.0	0.0	63398	63411
1660	ACAAA	1	0.00498	0.0	0.0	63383	63398
1661	CAAAG	1	0.00564	0.0	0.0	63366	63383
1662	AAAGA	1	0.00963	0.0	0.0	63337	63366
1663	AAGAT	1	0.00299	0.0	0.0	63328	63337

ref_id  ref_start ref_end read_id kmers dwell_time         NNNNN_kmers mismatch_kmers missing_kmers byte_offset byte_len 
YGR240C 1656      2960    0       1250  13.788570000000009 35          0              54            0           53046    
YCR030C 1578      2576    1       971   11.487010000000005 23          0              27            53047       41374    
YHR174W 0         839     2       825   9.659210000000002  15          0              14            94422       33595    
YHR174W 462       1309    4       818   10.73776000000001  18          0      

#### Example including samples and writing samples values in ouput file

In [13]:
input_fn = "./data/eventalign_collapse//nanopolish_reads_samples.tsv"
outdir = "./output/eventalign_collapse"
outprefix = "samples"

Eventalign_collapse (input_fn=input_fn, outdir=outdir, outprefix=outprefix, threads=6, quiet=True, stat_fields=["mean", "std", "median", "mad", "num_signals"], write_samples=True)

head("./output/eventalign_collapse/samples_eventalign_collapse.tsv")
head("./output/eventalign_collapse/samples_eventalign_collapse.tsv.idx")

[Eventalign_collapse] total reads: 21 [12.6 reads/s]



#0	YGR240C
ref_pos	ref_kmer	num_events	dwell_time	NNNNN_dwell_time	mismatch_dwell_time	mean	std	median	mad	num_signals	samples
1656	GAAAA	1	0.00266	0.0	0.0	105.49774932861328	1.1988129615783691	105.4800033569336	0.852996826171875	8	104.343,106.049,106.617,105.196,105.48,107.47,103.347,105.48
1657	AAAAC	1	0.00764	0.0	0.0	111.09274291992188	1.667485237121582	111.0250015258789	0.9950027465820312	23	112.873,106.617,110.03,114.579,111.736,113.726,111.309,109.745,113.157,109.034,111.025,112.162,112.162,110.172,110.598,110.74,111.025,109.461,110.883,112.02,111.878,110.314,109.887
1658	AAACA	1	0.00398	0.0	0.0	100.11283111572266	2.855329990386963	99.50865173339844	1.7771987915039062	12	100.077,97.5182,98.0869,97.0917,97.5182,99.3665,97.9447,99.6508,100.362,105.764,102.636,105.338
1659	AACAA	1	0.00432	0.0	0.0	90.71566009521484	3.648205518722534	89.12989807128906	2.2747955322265625	13	86.4285,87.1394,86.8551,89.1299,87.9925,91.5468,90.9781,95.2434,95.6699,97.376,89.1299,94.3903,87.4238
1660	ACAAA

## Using the index to random access a specific entry in the file

In [11]:
output_fn = "./output/eventalign_collapse/stats_eventalign_collapse.tsv"
index_fn = "./output/eventalign_collapse/stats_eventalign_collapse.tsv.idx"

# Import the index in a pandas dataframe (because it is simple)
import pandas as pd 
index_df = pd.read_csv (index_fn, sep="\t")

# Select random lines
random_lines = index_df.sample(5)
print ("Random index lines")
display (random_lines)

# Open the collapsed event align file
with open (output_fn) as fp:
    for id, read in random_lines.iterrows():
        
        # Access the header corresponding to the randomly selected index line using seek 
        fp.seek(0) # Return to file start
        fp.seek(read.byte_offset) # Move to the offset indicated in the index file
        print (fp.readline().rstrip()) # Print read header
        df = pd.read_csv (fp, nrows=read.kmers, sep="\t") # Read lines corresponding to the read
        with pd.option_context("display.max_rows",4): # display first and last lines
            display(df)

Random index lines


Unnamed: 0,ref_id,ref_start,ref_end,read_id,kmers,dwell_time,NNNNN_kmers,mismatch_kmers,missing_kmers,byte_offset,byte_len
9,YDR224C,0,389,9,376,4.28403,8,0,13,316268,15465
10,YIL117C,27,953,10,889,13.64007,23,0,37,331734,37225
7,YDR500C,9,252,8,231,2.91791,7,0,14,267232,9541
14,YDR382W,5,329,14,308,7.05602,13,0,16,437025,13152
13,YPL198W,10,719,13,667,8.79275,30,0,44,409081,27943


#9	YDR224C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,start_idx,end_idx
0,0,ATGTC,1,0.00299,0.0,0.0,24307,24316
1,1,TGTCT,1,0.00498,0.0,0.0,24292,24307
...,...,...,...,...,...,...,...,...
374,387,CAAGC,11,0.04947,0.0,0.0,11435,11584
375,388,AAGCA,1,0.00764,0.0,0.0,11412,11435


#10	YIL117C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,start_idx,end_idx
0,27,GGCCT,4,0.03287,0.00498,0.0,57497,57596
1,31,TTCCA,1,0.00398,0.00000,0.0,57485,57497
...,...,...,...,...,...,...,...,...
887,951,GAGTA,3,0.01163,0.00432,0.0,16588,16623
888,952,AGTAA,5,0.02590,0.00000,0.0,16510,16588


#8	YDR500C


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,start_idx,end_idx
0,9,GGTAC,2,0.01095,0.00000,0.0,22637,22670
1,10,GTACT,1,0.00631,0.00000,0.0,22618,22637
...,...,...,...,...,...,...,...,...
229,248,TAAGG,2,0.01560,0.01228,0.0,13899,13946
230,251,GGCTA,1,0.00598,0.00000,0.0,13881,13899


#14	YDR382W


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,start_idx,end_idx
0,5,ATACT,2,0.01494,0.00000,0.0,48813,48858
1,6,TACTT,2,0.00631,0.00000,0.0,48794,48813
...,...,...,...,...,...,...,...,...
306,327,GATTA,2,0.02191,0.00000,0.0,27833,27899
307,328,ATTAA,14,0.07603,0.01029,0.0,27604,27833


#13	YPL198W


Unnamed: 0,ref_pos,ref_kmer,num_events,dwell_time,NNNNN_dwell_time,mismatch_dwell_time,start_idx,end_idx
0,10,AAAAA,1,0.00697,0.00000,0.0,44620,44641
1,11,AAAAA,1,0.01959,0.00000,0.0,44561,44620
...,...,...,...,...,...,...,...,...
665,715,TGGTT,4,0.02557,0.00797,0.0,18191,18268
666,718,TTAAG,1,0.01162,0.00000,0.0,18156,18191
