# Seqsum API Usage

## Import package

In [1]:
from pyBioTools import Seqsum
from pyBioTools.common import jhelp, head

## Merge

In [4]:
jhelp(Seqsum.Merge)

**Merge** (input_fn, output_fn, old_filename_synthax, verbose, quiet, progress, kwargs)



---

* **input_fn** (required) [list(str)]

Path to a sequencing summary file, a directory containing sequencing summary files, a list of files, or a regex or list of regexes. Input files can be gzipped.

* **output_fn** (required) [str]

Destination sequencing summary file. The output is gzipped automatically if the file name ends with the .gz extension.

* **old_filename_synthax** (default: False) [bool]

Replace the `filename_fast5` field by `filename` as in older versions. Useful for nanopolish index compatibility

* **verbose** (default: False) [bool]

* **quiet** (default: False) [bool]

* **progress** (default: False) [bool]

* **kwargs**
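
The `output_fn` and `old_filename_synthax` behaviours described above can be pictured with a small standard-library sketch. This is not the pyBioTools implementation: `open_by_extension` and `rename_legacy_field` are hypothetical helper names introduced only for illustration.

```python
import gzip


def open_by_extension(path, mode="rt"):
    """Hypothetical helper: open `path` through gzip when it ends with .gz,
    otherwise as a plain text file."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def rename_legacy_field(header_fields):
    """Hypothetical helper: return a copy of the header in which
    `filename_fast5` is replaced by the older `filename` field name."""
    return ["filename" if field == "filename_fast5" else field
            for field in header_fields]
```

Choosing the writer from the file extension keeps the calling code identical for plain and compressed output, and renaming only the header field leaves the data rows untouched.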



### Basic usage

In [5]:
Seqsum.Merge(["./data/seqsum_new1.tsv", "./data/seqsum_new2.tsv"], "./output/seqsum_merged_1.tsv", verbose=True)
head("./output/seqsum_merged_1.tsv", 20)

## Running Seqsum Merge ##
	Parsing reads
	[DEBUG]: Reading file ./data/seqsum_new1.tsv
	[DEBUG]: End of file ./data/seqsum_new1.tsv
	[DEBUG]: Reading file ./data/seqsum_new2.tsv
	[DEBUG]: End of file ./data/seqsum_new2.tsv
	Read counts summary
	 Files found: 2
	 Valid files: 2


Only 19 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 
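
A merged file like the one previewed above can be sanity-checked without pyBioTools. The sketch below assumes the file is tab-separated (suggested by the `.tsv` extension, but still an assumption) and uses the `passes_filtering` column visible in the header; `count_passes` is a hypothetical name.

```python
import csv
import gzip
from collections import Counter


def count_passes(path):
    """Tally the values of the `passes_filtering` column in a
    (possibly gzipped) tab-separated sequencing summary file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return Counter(row["passes_filtering"] for row in reader)
```

Calling `count_passes("./output/seqsum_merged_1.tsv")` would return a `Counter` keyed by the values found in that column, such as `TRUE` and `FALSE`.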

### Using a regex instead

In [6]:
Seqsum.Merge(["./data/seqsum*"], "./output/seqsum_merged_2.tsv.gz", verbose=True)
head("./output/seqsum_merged_2.tsv.gz", 20)

## Running Seqsum Merge ##
	Parsing reads
	[DEBUG]: Reading file ./data/seqsum_new2.tsv
	[DEBUG]: End of file ./data/seqsum_new2.tsv
	[DEBUG]: Reading file ./data/seqsum_new1.tsv
	[DEBUG]: End of file ./data/seqsum_new1.tsv
	Read counts summary
	 Files found: 2
	 Valid files: 2


Only 19 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 
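
In the log above, `seqsum_new2.tsv` is read before `seqsum_new1.tsv`, so the order in which a pattern is expanded should not be relied upon. The sketch below shows one way such flexible inputs could be resolved deterministically. It assumes the pattern is treated as a shell-style wildcard, which is a guess about the library's behaviour rather than a documented fact, and `expand_inputs` is a hypothetical name.

```python
import glob
import os


def expand_inputs(inputs):
    """Expand file paths, directories and wildcard patterns into a
    sorted, de-duplicated list of existing files."""
    if isinstance(inputs, str):
        inputs = [inputs]
    found = set()
    for item in inputs:
        if os.path.isdir(item):
            # A directory contributes every regular file it contains
            candidates = glob.glob(os.path.join(item, "*"))
        else:
            # A plain path matches itself; a pattern matches zero or more paths
            candidates = glob.glob(item)
        found.update(path for path in candidates if os.path.isfile(path))
    return sorted(found)
```

Sorting the result makes repeated runs produce the same row order in the merged file, at the cost of ignoring the order in which the caller listed the inputs.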

### Files with a non-matching header are skipped

In [8]:
Seqsum.Merge("./data/*", "./output/seqsum_merged_3.tsv.gz", verbose=True)
head("./output/seqsum_merged_3.tsv.gz", 20)

## Running Seqsum Merge ##
	Parsing reads
	[DEBUG]: Reading file ./data/seqsum_new2.tsv
	[DEBUG]: End of file ./data/seqsum_new2.tsv
	[DEBUG]: Reading file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
	[DEBUG]: Skipping file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
	[DEBUG]: Reading file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
	[DEBUG]: Skipping file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
	[DEBUG]: Reading file ./data/seqsum_new1.tsv
	[DEBUG]: End of file ./data/seqsum_new1.tsv
	[DEBUG]: Reading file ./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz
ERROR: Header of file `.

Only 19 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 
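
The skipping behaviour shown in the log above can be approximated with a header comparison. The sketch below is not the pyBioTools implementation: it assumes the first file read defines the reference header and that headers must match exactly, and `merge_tsv` is a hypothetical name.

```python
import gzip


def _open(path, mode):
    """Open gzipped or plain text files depending on the extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def merge_tsv(paths, output_path):
    """Concatenate files that share the header of the first file.

    Returns two lists: the paths that were merged and the paths that
    were skipped because their header differed (assumed rule)."""
    reference = None
    merged, skipped = [], []
    with _open(output_path, "wt") as out:
        for path in paths:
            with _open(path, "rt") as fh:
                header = fh.readline().rstrip("\n")
                if reference is None:
                    # The first file sets the reference header, written once
                    reference = header
                    out.write(header + "\n")
                elif header != reference:
                    skipped.append(path)
                    continue
                for line in fh:
                    # Guard against a missing newline at the end of a file
                    out.write(line if line.endswith("\n") else line + "\n")
            merged.append(path)
    return merged, skipped
```

Returning the skipped paths lets the caller report them, much as the log above reports each file whose header is not consistent.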