
Releases: dotcommander/defuddle

v0.3.1

06 Apr 17:14

Defuddle Go v0.3.1

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.3.1

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Bug fixes

  • 4736085: fix: rename test fixtures to avoid colons in file paths

🔍 Usage Examples

# Extract content from URL
defuddle parse https://example.com/article

# Convert to markdown
defuddle parse https://example.com/article --markdown

# Get JSON output with metadata
defuddle parse https://example.com/article --json

# Extract specific property
defuddle parse https://example.com/article --property title

Full Changelog: v0.3.0...v0.3.1

v0.3.0

05 Apr 21:15

Defuddle Go v0.3.0

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.3.0

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Features

  • 3c36dc8: feat(cli): add extractors and batch commands, 3 parse flags, ldflags version
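The "ldflags version" item presumably refers to the standard Go technique of stamping a version string into the binary at link time. A minimal sketch, assuming a package-level `version` variable in package `main` (a hypothetical name; these notes do not show the project's actual variable):

```go
package main

import "fmt"

// version is a hypothetical package-level variable. Its "dev" default is
// replaced at link time, for example:
//
//	go build -ldflags "-X main.version=v0.3.0" ./cmd/defuddle
var version = "dev"

func main() {
	// Prints "defuddle dev" unless the build overrides the variable.
	fmt.Println("defuddle", version)
}
```

The linker's `-X` flag only takes effect on package-level string variables that are uninitialized or initialized to a constant string, which is why the default is a plain literal.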

Others

  • 7231ab4: chore(ci): add bench job with submodule fixtures, drop windows from matrix

Full Changelog: v0.2.3...v0.3.0

v0.2.3

05 Apr 17:54

Defuddle Go v0.2.3

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.3

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Others

  • b47d46e: chore(taskfile): replace sudo cp with symlink in install-cli task

Full Changelog: v0.2.2...v0.2.3

v0.2.2

05 Apr 17:49

Defuddle Go v0.2.2

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.2

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Bug fixes

  • 8094a2f: fix(cli): apply 9 UX fixes from audit, bump to v0.2.2

Full Changelog: v0.2.1...v0.2.2

v0.2.1

05 Apr 17:35

Defuddle Go v0.2.1

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.1

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Performance improvements

  • d25c2c5: perf(elements): restrict detectTextFootnotes DOM scope
  • dc4a0a2: perf: pre-compile regexes and hoist repeated allocations
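The "pre-compile regexes and hoist repeated allocations" change follows a common Go pattern: compile each pattern once at package level and size allocations before entering a loop. A minimal sketch, with a hypothetical whitespace-normalizing helper standing in for the library's own code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Compiled once at package initialization. Calling regexp.MustCompile inside
// the loop below would recompile the pattern on every iteration.
// The pattern itself is illustrative, not taken from the library.
var whitespaceRun = regexp.MustCompile(`\s+`)

// normalizeAll collapses runs of whitespace in each input string.
func normalizeAll(inputs []string) []string {
	// Allocated once with its final capacity, so append never regrows it.
	out := make([]string, 0, len(inputs))
	for _, s := range inputs {
		out = append(out, strings.TrimSpace(whitespaceRun.ReplaceAllString(s, " ")))
	}
	return out
}

func main() {
	fmt.Printf("%q\n", normalizeAll([]string{"  a \n b  ", "c\t\td"})) // ["a b" "c d"]
}
```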

Refactors

  • f055ed8: refactor(elements): split footnotes.go into 3 files
  • 1d6c1f8: refactor(extractors): extract shared comment thread rendering
  • 28b809b: refactor: rename module to dotcommander/defuddle, fix golangci-lint v2 config, update docs

Full Changelog: v0.2.0...v0.2.1

v0.1.0

05 Apr 03:14

v0.1.0 — Initial Release

First tagged release of defuddle-go. This version brings the Go port to full parity with the TypeScript original and adds Go-specific capabilities such as Unicode-aware CJK word counting, sentinel errors, and bounded concurrent URL parsing.

New features

  • CJK-aware word counting (internal/text): each Han, Hangul, Hiragana, or Katakana character counts as one word, using Go's unicode range tables, which also cover supplementary-plane characters that the TypeScript charCodeAt approach cannot reach
  • Sentinel errors: ErrNotHTML, ErrTooLarge, ErrTimeout, and ErrNoContent, so callers can branch on failures with errors.Is()
  • Relative URL resolution (internal/urlutil) — resolves href/src/srcset/poster against the page URL, respects <base href>
  • XSS sanitization (internal/urlutil) — strips event handlers, srcdoc, and dangerous URL schemes from extracted content
  • Shadow DOM flattening — inlines <template shadowrootmode> content into the main document
  • React SSR streaming resolution — resolves $RC() Suspense boundaries so streamed content is visible
  • Charset detection — converts non-UTF-8 responses to UTF-8 using golang.org/x/net/html/charset
  • Concurrent URL parsing: ParseFromURLs() with a configurable MaxConcurrency and semaphore-bounded goroutines
  • Substack extractor — parses window._preloads JSON and falls back to DOM selectors
  • Markdown converter rewrite: 15 custom renderers, including code blocks with language detection, figures, highlights (==text==), YouTube/Twitter embeds, footnote refs, GitHub Alert callouts, data-callout blockquotes, strikethrough, and tables
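The CJK word-counting rule can be sketched with the standard `unicode` package. This is an illustration, not the code in internal/text; counting each whitespace-separated run of non-CJK text as one word is an assumption. Ranging over a Go string yields whole code points, so a supplementary-plane ideograph such as U+20000 is a single rune, whereas `charCodeAt` in TypeScript sees two UTF-16 surrogate units:

```go
package main

import (
	"fmt"
	"unicode"
)

// cjkScripts lists the scripts whose characters each count as one word.
var cjkScripts = []*unicode.RangeTable{
	unicode.Han, unicode.Hangul, unicode.Hiragana, unicode.Katakana,
}

// countWords counts every CJK character as one word and every maximal run
// of other non-space characters as one word (an assumed rule for non-CJK text).
func countWords(s string) int {
	count := 0
	inWord := false
	for _, r := range s { // r is a full code point, never a surrogate half
		switch {
		case unicode.IsOneOf(cjkScripts, r):
			count++
			inWord = false
		case unicode.IsSpace(r):
			inWord = false
		default:
			if !inWord {
				count++
				inWord = true
			}
		}
	}
	return count
}

func main() {
	fmt.Println(countWords("Hello world")) // 2
	fmt.Println(countWords("日本語のテキスト"))  // 8
	fmt.Println(countWords("𠀀𠀁"))         // 2 (supplementary-plane Han)
}
```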
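The URL resolution and sanitization items can be approximated with `net/url` and `strings`. In this sketch, `resolveURL`, `isDangerousURL`, and `isEventHandler` are hypothetical helpers, the blocked-scheme list is an assumption, and a `srcset` value would first need splitting into its individual URL candidates:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveURL resolves ref against the page URL. A non-empty <base href>
// is itself resolved against the page URL first and then used as the base.
func resolveURL(pageURL, baseHref, ref string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	if baseHref != "" {
		b, err := url.Parse(baseHref)
		if err != nil {
			return "", err
		}
		base = base.ResolveReference(b)
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(r).String(), nil
}

// isDangerousURL reports whether a URL uses a script-capable scheme.
// Whitespace and control characters are dropped first, because browsers
// ignore them inside schemes (e.g. "java\tscript:").
func isDangerousURL(raw string) bool {
	s := strings.ToLower(strings.Map(func(r rune) rune {
		if r <= ' ' {
			return -1
		}
		return r
	}, raw))
	for _, scheme := range []string{"javascript:", "vbscript:", "data:text/html"} {
		if strings.HasPrefix(s, scheme) {
			return true
		}
	}
	return false
}

// isEventHandler reports whether an attribute is an inline handler like onclick.
func isEventHandler(attr string) bool {
	return strings.HasPrefix(strings.ToLower(attr), "on")
}

func main() {
	u1, _ := resolveURL("https://example.com/blog/post", "", "../img/a.png")
	u2, _ := resolveURL("https://example.com/blog/post", "/static/", "img/a.png")
	fmt.Println(u1) // https://example.com/img/a.png
	fmt.Println(u2) // https://example.com/static/img/a.png
	fmt.Println(isDangerousURL("java\tscript:alert(1)"), isEventHandler("onClick")) // true true
}
```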

Improvements

  • Pipeline reordered to match the TypeScript original: hidden removal → scoring → selector removal → standardize
  • Entry point scoring now evaluates all candidates and picks the most specific matching child
  • mainContent ancestor protection prevents selectors from removing parent elements of the main content
  • Link density now uses a text-length multiplier instead of a link-count ratio
  • Navigation indicator matching uses pre-compiled word-boundary regexes (no false positives from substring matching)
  • Card grid detection avoids treating article listing sections as content
  • mergeOptions refactored into an applyOptions helper, replacing ~80 lines of copy-pasted code
  • aside:not([class*="callout"]) preserves callout components; data-callout added to allowed attributes
  • Pre-compiled combined regex for partial selector matching (O(n) per element instead of O(n*m))

Testing

  • 213 tests across 17 packages
  • 6 real-world benchmark fixtures (19KB–664KB HTML)