ABFCode/Spine

Spine

Fast, tolerant EPUB parser for Go.

Quick start

book, err := spine.ParseFile("book.epub")
if err != nil {
    // book may still contain partial data + warnings
    log.Fatal(err)
}

fmt.Println(book.Metadata.Title)
fmt.Println("Landmarks:", len(book.Landmarks))
fmt.Println("Pages:", len(book.PageList))

chunks, err := book.Chunks(spine.ChunkingOptions{Mode: spine.ChunkByParagraph})
if err != nil {
    log.Fatal(err)
}
for _, c := range chunks {
    fmt.Println(c.ID, c.Text)
}

chapters, err := book.Chapters(spine.ChapterOptions{TitleSource: spine.TitleAuto})
if err != nil {
    log.Fatal(err)
}
for _, ch := range chapters {
    fmt.Println(ch.SpineIndex, ch.Title, len(ch.Text))
}

// Stream chapters (memory-friendly for very large books).
err = book.ForEachChapter(spine.ChapterOptions{TitleSource: spine.TitleAuto}, func(ch spine.Chapter) error {
    fmt.Println(ch.SpineIndex, ch.Title, len(ch.Text))
    return nil
})
if err != nil {
    log.Fatal(err)
}

// Resolve anchors (after building chunks).
if ref, ok := book.ResolveAnchor("OEBPS/chapter1.xhtml#c1"); ok {
    fmt.Println(ref.ChunkID, ref.Offset)
}

// Cover extraction.
if cover, err := book.Cover(); err == nil {
    fmt.Println(cover.ContentType, len(cover.Bytes))
}

Parser configuration

cfg := spine.DefaultConfig()
cfg.Strict = false
cfg.Fallbacks.GenerateTOC = true
cfg.Chunking = spine.ChunkingOptions{Mode: spine.ChunkBySize, MaxChars: 2000}

parser := spine.NewParser(cfg)
book, err := parser.ParseFile("book.epub")
if err != nil {
    log.Fatal(err)
}

For a full API reference, see docs/README.md (index) or docs/API.md. Error details and examples are in docs/errors.md.

Typed errors

The parser returns sentinel errors you can check with errors.Is, such as ErrMissingContainer, ErrMalformedOPF, and ErrNoSpine.
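As a rough illustration of the errors.Is pattern, the sketch below maps sentinel errors to user-facing messages. The import path of the module is not stated here, so the three sentinels are local stand-ins with made-up messages; in real code they would presumably be spine.ErrMissingContainer, spine.ErrMalformedOPF, and spine.ErrNoSpine. The helpers describe and simulateParse are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in sentinels for illustration only. Real code would use the
// exported values from the spine package instead of defining its own.
var (
	ErrMissingContainer = errors.New("missing container")
	ErrMalformedOPF     = errors.New("malformed OPF")
	ErrNoSpine          = errors.New("no spine")
)

// describe maps a parse error to a short message. errors.Is walks the
// wrap chain, so a sentinel wrapped with extra context still matches.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrMissingContainer):
		return "not a valid EPUB container"
	case errors.Is(err, ErrMalformedOPF):
		return "package document could not be parsed"
	case errors.Is(err, ErrNoSpine):
		return "book has no reading order"
	default:
		return "unexpected error: " + err.Error()
	}
}

// simulateParse stands in for a parser call that fails and wraps a
// sentinel with context (hypothetical; not part of the library).
func simulateParse() error {
	return fmt.Errorf("parse book.epub: %w", ErrNoSpine)
}

func main() {
	fmt.Println(describe(simulateParse()))
}
```

Comparing with == would miss the wrapped error in this sketch, which is why errors.Is is the safer check.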

Notes

  • The parser is streaming-first: content is parsed on demand.
  • Parse(io.Reader) spools to a temp file when needed. Call Close() to release resources.
  • Anchor keys are normalized as path#id inside the EPUB; use ResolveAnchor for convenience lookups.
  • Strict disables best-effort recovery; fallbacks stay off unless you explicitly set Fallbacks.
  • Use OpenCover / Cover to retrieve the cover image (if present).
  • TOC targets resolve after chunking; use TOCWithTargets if you need chunk offsets.
  • Run go mod tidy to resolve module dependencies.
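The note about Parse(io.Reader) spooling to a temp file can be illustrated with the standard library alone. Go's archive/zip needs random access (an io.ReaderAt plus a size), so a plain stream has to be buffered somewhere first. The sketch below is one way to do that and is not taken from the library's source; spoolToTemp, countEntries, and sampleArchive are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
)

// spoolToTemp copies r into a temporary file and returns the open file
// and its size. The caller must close and remove the file.
func spoolToTemp(r io.Reader) (*os.File, int64, error) {
	f, err := os.CreateTemp("", "epub-spool-*")
	if err != nil {
		return nil, 0, err
	}
	n, err := io.Copy(f, r)
	if err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, 0, err
	}
	return f, n, nil
}

// countEntries spools a ZIP stream to disk, opens it with random access,
// and returns the number of entries in the archive.
func countEntries(r io.Reader) (int, error) {
	f, size, err := spoolToTemp(r)
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order: close first, then remove.
	defer os.Remove(f.Name())
	defer f.Close()

	zr, err := zip.NewReader(f, size)
	if err != nil {
		return 0, err
	}
	return len(zr.File), nil
}

// sampleArchive builds a tiny two-entry ZIP in memory for the demo.
func sampleArchive() io.Reader {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range []string{"mimetype", "META-INF/container.xml"} {
		w, err := zw.Create(name)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write([]byte("placeholder")); err != nil {
			log.Fatal(err)
		}
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	return &buf
}

func main() {
	n, err := countEntries(sampleArchive())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("entries:", n)
}
```

The deferred close and remove play the role that Close() plays in the note above: without them the temp file would outlive the parse.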

Compatibility fixtures

Add .epub files to testdata/fixtures (tracked) or testdata/fixtures/external (ignored), then generate golden outputs with the commands below. Tracked fixtures are processed by default; pass -external for the external ones:

go run ./cmd/spine-golden
go run ./cmd/spine-golden -external

The TestCompatibilityFixtures test compares parser output against the generated JSON files in testdata/expected.

Fixture provenance and licenses are documented in testdata/fixtures/SOURCES.md.

Benchmarks

Benchmarks pick the largest .epub in testdata/fixtures by default. Override with SPINE_BENCH_FIXTURE=/path/to/book.epub.

go test -run '^$' -bench BenchmarkOpenLargestFixture -benchmem
go test -run '^$' -bench BenchmarkParseBytesLargestFixture -benchmem
go test -run '^$' -bench BenchmarkParseAndChunkLargestFixture -benchmem
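The fixture selection described above (largest .epub unless SPINE_BENCH_FIXTURE is set) might look roughly like the following. This is a guess at the behaviour, not the repository's actual benchmark helper: the function names are hypothetical, and the sketch assumes a non-recursive scan of the directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// largestEPUB returns the path of the largest .epub file directly inside
// dir. Subdirectories are not scanned (an assumption of this sketch).
func largestEPUB(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	best, bestSize := "", int64(-1)
	for _, e := range entries {
		if e.IsDir() || !strings.EqualFold(filepath.Ext(e.Name()), ".epub") {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return "", err
		}
		if info.Size() > bestSize {
			best, bestSize = filepath.Join(dir, e.Name()), info.Size()
		}
	}
	if best == "" {
		return "", fmt.Errorf("no .epub files in %s", dir)
	}
	return best, nil
}

// pickFixture prefers the SPINE_BENCH_FIXTURE override and otherwise
// falls back to the largest .epub in dir.
func pickFixture(dir string) (string, error) {
	if p := os.Getenv("SPINE_BENCH_FIXTURE"); p != "" {
		return p, nil
	}
	return largestEPUB(dir)
}

// makeDemoDir creates a temp directory with files of different sizes so
// the example runs without any real fixtures.
func makeDemoDir() (string, error) {
	dir, err := os.MkdirTemp("", "fixtures-*")
	if err != nil {
		return "", err
	}
	files := map[string]int{"small.epub": 10, "large.epub": 100, "notes.txt": 1000}
	for name, size := range files {
		if err := os.WriteFile(filepath.Join(dir, name), make([]byte, size), 0o644); err != nil {
			return "", err
		}
	}
	return dir, nil
}

func main() {
	dir, err := makeDemoDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	path, err := pickFixture(dir)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("benchmark fixture:", filepath.Base(path))
}
```

The larger notes.txt is skipped because only files with an .epub extension are considered.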

Generate CPU/memory profiles:

make bench-cpu
make bench-mem

Starter fixtures (synthetic)

Generate a small set of synthetic edge-case EPUBs with:

go run ./cmd/spine-fixtures

Project Gutenberg fetcher

Download a limited set of Project Gutenberg EPUBs (respect Project Gutenberg's Terms of Use). Files go to testdata/fixtures/external by default:

go run ./cmd/spine-fetch-gutenberg -ids 11,84,1342 -variant epub3.images -yes

W3C/EPUBCheck test fixtures

Download a limited set of fixtures from the W3C epub-tests and EPUBCheck test suites. Files go to testdata/fixtures/external by default:

go run ./cmd/spine-fetch-tests -yes
