afp — Another Fastx Parser

A tiny, dependency-free Python reader/writer for FASTA and FASTQ files with transparent gzip / bzip2 / zip / zstandard decompression. The core uses only the standard library; zstandard is an optional extra that is needed only for .zst inputs.

The whole module is a single file: afp.py.

Install

Three ways to use it:

Drop the file into your project:

curl -O https://raw.githubusercontent.com/conchoecia/afp/main/afp.py
# put afp.py somewhere on your python path

Or pip-install:

pip install run-afp               # core, no extras
pip install "run-afp[zstd]"       # also read .zst-compressed files
pip install "run-afp[dev]"        # pytest + zstandard for development

The PyPI distribution is run-afp (the bare afp name was already taken). The import name stays import afp.

Or vendor inside another repo: copy afp.py into your dependencies/ directory, add that directory to sys.path, then import afp.
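
For the vendoring route, the sys.path step could look roughly like the sketch below. The helper name add_vendor_dir and the ./dependencies layout are assumptions made for illustration; they are not part of afp.

```python
import sys
from pathlib import Path


def add_vendor_dir(directory):
    """Prepend a directory to sys.path once, so modules inside it can be imported."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed layout: ./dependencies/afp.py relative to the working directory.
vendor_dir = add_vendor_dir(Path.cwd() / "dependencies")

# Once afp.py is present in that directory, the import works as usual:
# import afp
```

In a real project you would more likely anchor the path on Path(__file__).parent than on the working directory, so the import does not depend on where the script is launched from.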

Quick start

import afp

# Auto-detects FASTA vs FASTQ from the first byte, and gzip/bzip2/zip/zstd
# compression from the file's magic bytes (not its extension).
for rec in afp.parse("reads.fq.gz"):
    print(rec.id, len(rec.seq), rec.qual[:10])

for rec in afp.parse("genome.fa"):
    print(rec.id, rec.desc, rec.seq[:50])
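
To make the first-byte detection concrete, here is a minimal sketch of how FASTA and FASTQ can be told apart from the first character of the (already decompressed) text. It is not afp's actual code: sniff_format is a hypothetical name, and the sketch assumes a seekable stream with no leading blank lines.

```python
import io


def sniff_format(handle):
    """Guess 'fasta' or 'fastq' from the first character of a seekable text stream."""
    first = handle.read(1)
    handle.seek(0)  # rewind so a parser can still read the whole stream
    if first == ">":
        return "fasta"
    if first == "@":
        return "fastq"
    raise ValueError(f"unrecognised first character: {first!r}")


# In-memory stand-in for an opened FASTA file.
fmt = sniff_format(io.StringIO(">chr1 example\nACGT\n"))
```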

Force a specific format if needed:

for rec in afp.parse("weirdly_named_file", format="fasta"):
    ...

# Or use the explicit parsers:
afp.parse_fasta("genome.fa")
afp.parse_fastq("reads.fq")

The Record object

class Record:
    id: str             # token after '>' or '@', up to first whitespace
    seq: str            # sequence, newlines stripped
    desc: str | None    # everything after id on the header line (or None)
    qual: str | None    # quality string (FASTQ only; None for FASTA)

Records are mutable. You can rewrite record.id in place.

rec.format()           # back to FASTA / FASTQ text
rec.format(wrap=80)    # FASTA only: wrap sequence at 80 columns
len(rec)               # length of seq
"ACGT" in rec          # membership on the sequence string
list(rec)              # iterate over letters
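
The behaviour described above can be imitated with a small stand-in class, which may help when reading the field and method list. This is a sketch under assumptions, not the library's implementation: SketchRecord and split_header are hypothetical names, and the exact output of afp's format() may differ in details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchRecord:
    """Hypothetical stand-in mirroring the documented Record fields."""
    id: str
    seq: str
    desc: Optional[str] = None
    qual: Optional[str] = None

    def __len__(self):
        return len(self.seq)

    def __contains__(self, motif):
        return motif in self.seq

    def __iter__(self):
        return iter(self.seq)

    def format(self, wrap=None):
        header = self.id if self.desc is None else f"{self.id} {self.desc}"
        if self.qual is not None:
            # FASTQ: four lines per record, never wrapped.
            return f"@{header}\n{self.seq}\n+\n{self.qual}\n"
        if wrap:
            chunks = [self.seq[i:i + wrap] for i in range(0, len(self.seq), wrap)]
            body = "\n".join(chunks)
        else:
            body = self.seq
        return f">{header}\n{body}\n"


def split_header(line):
    """Split a '>' or '@' header line into (id, desc); desc is None if absent."""
    parts = line[1:].strip().split(None, 1)
    rec_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else None
    return rec_id, desc


rec_id, desc = split_header(">r1 sample one")
rec = SketchRecord(rec_id, "ACGTACGT", desc=desc)
rec.id = "renamed"  # mutable, as described above
```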

Writing

afp.write(records, "out.fa")          # plain
afp.write(records, "out.fa.gz")       # auto-gzip from .gz extension
afp.write(records, "out.fq.gz")       # FASTQ if the first record has `qual`
afp.write(records, "out.fa", wrap=80) # wrapped FASTA

Mixing FASTA and FASTQ records in a single output stream is rejected.
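
As an illustration of the rules above (gzip chosen from the .gz extension, output format taken from the first record, mixed streams rejected), a writer could be sketched as follows. This is not afp's code: write_sketch is a hypothetical name, records are plain (id, seq, qual) tuples, and raising ValueError on a mixed stream is an assumption.

```python
import gzip
import os
import tempfile


def write_sketch(records, path):
    """Write (id, seq, qual) tuples as FASTA or FASTQ; gzip if path ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    is_fastq = None
    with opener(path, "wt") as handle:
        for rec_id, seq, qual in records:
            has_qual = qual is not None
            if is_fastq is None:
                is_fastq = has_qual  # the first record decides the format
            elif has_qual != is_fastq:
                raise ValueError("cannot mix FASTA and FASTQ records in one output")
            if is_fastq:
                handle.write(f"@{rec_id}\n{seq}\n+\n{qual}\n")
            else:
                handle.write(f">{rec_id}\n{seq}\n")


# Round trip through a gzip-compressed temporary file.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.fa.gz")
    write_sketch([("r1", "ACGT", None), ("r2", "TTGA", None)], out_path)
    with gzip.open(out_path, "rt") as handle:
        round_trip = handle.read()
```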

Compression helpers

afp.detect_compression("file")   # 'gzip' | 'bzip2' | 'zip' | 'zstd' | 'none'
afp.get_open_func("file.gz")     # returns gzip.open

detect_compression reads only the first 4 bytes — it's cheap to call.
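
Four bytes are enough because each supported format starts with a well-known signature. The sketch below shows the general magic-byte technique using those published signatures; it is not afp's implementation, and classify_magic and sniff_compression are hypothetical names.

```python
# Leading signatures of each container format.
MAGIC_NUMBERS = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),        # local file header of a non-empty archive
    (b"\x28\xb5\x2f\xfd", "zstd"),
)


def classify_magic(head):
    """Map the first bytes of a file to a compression name, or 'none'."""
    for magic, name in MAGIC_NUMBERS:
        if head.startswith(magic):
            return name
    return "none"


def sniff_compression(path):
    """Read only the first 4 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_magic(handle.read(4))


kind = classify_magic(b"\x1f\x8b\x08\x00")
```

Detecting from content rather than from the file name means a gzip file without a .gz suffix is still handled correctly.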

Why

afp is built for projects that want to vendor a single Python file rather than pull in a multi-megabyte sequence toolkit, under a permissive license they can carry through to their own code. The whole module is one ~400-line file with no external runtime dependencies and no compiled extensions.

License

MIT — see LICENSE.
