Releases: dotcommander/defuddle
v0.3.1
Defuddle Go v0.3.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.3.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- 4736085: fix: rename test fixtures to avoid colons in file paths
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Defuddle Go v0.3.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.3.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- 3c36dc8: feat(cli): add extractors and batch commands, 3 parse flags, ldflags version
Others
- 7231ab4: chore(ci): add bench job with submodule fixtures, drop windows from matrix
Full Changelog: v0.2.3...v0.3.0
v0.2.3
Defuddle Go v0.2.3
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.3
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Others
- b47d46e: chore(taskfile): replace sudo cp with symlink in install-cli task
Full Changelog: v0.2.2...v0.2.3
v0.2.2
Defuddle Go v0.2.2
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.2
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- 8094a2f: fix(cli): apply 9 UX fixes from audit, bump to v0.2.2
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Defuddle Go v0.2.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.2.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Performance improvements
- d25c2c5: perf(elements): restrict detectTextFootnotes DOM scope
- dc4a0a2: perf: pre-compile regexes and hoist repeated allocations
Refactors
- f055ed8: refactor(elements): split footnotes.go into 3 files
- 1d6c1f8: refactor(extractors): extract shared comment thread rendering
- 28b809b: refactor: rename module to dotcommander/defuddle, fix golangci-lint v2 config, update docs
Full Changelog: v0.2.0...v0.2.1
v0.1.0
v0.1.0 — Initial Release
First tagged release of defuddle-go. This version brings the Go port to full parity with the TypeScript original and adds Go-specific advantages.
New features
- CJK-aware word counting (internal/text): Han, Hangul, Hiragana, Katakana each count as one word via Go's unicode range tables, covering supplementary planes the TypeScript charCodeAt approach cannot reach
- Sentinel errors: ErrNotHTML, ErrTooLarge, ErrTimeout, ErrNoContent for caller-branching via errors.Is()
- Relative URL resolution (internal/urlutil): resolves href/src/srcset/poster against the page URL, respects <base href>
- XSS sanitization (internal/urlutil): strips event handlers, srcdoc, and dangerous URL schemes from extracted content
- Shadow DOM flattening: inlines <template shadowrootmode> content into the main document
- React SSR streaming resolution: resolves $RC() Suspense boundaries so streamed content is visible
- Charset detection: converts non-UTF-8 responses to UTF-8 using golang.org/x/net/html/charset
- Concurrent URL parsing: ParseFromURLs() with configurable MaxConcurrency and semaphore-bounded goroutines
- Substack extractor: parses window._preloads JSON and falls back to DOM selectors
- Markdown converter rewrite: 15 custom renderers covering code blocks with language detection, figures, highlights (==text==), YouTube/Twitter embeds, footnote refs, GitHub Alert callouts, data-callout blockquotes, strikethrough, tables
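As an illustration of the CJK-aware word counting item above, here is a minimal sketch of the general technique. It is not the code in internal/text: the names isCJK and countWords are hypothetical, and the rule it applies (one word per CJK character, one word per whitespace-separated run otherwise) is an assumption based on the description in these notes. Iterating a Go string with range yields full runes, so supplementary-plane characters are handled without surrogate-pair logic.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCJK reports whether r belongs to one of the scripts named in the
// release notes. (Hypothetical helper, not the library's API.)
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hangul, unicode.Hiragana, unicode.Katakana)
}

// countWords counts every CJK rune as one word and every
// whitespace-separated run of other characters as one word.
func countWords(s string) int {
	count := 0
	inWord := false // true while inside a run of non-CJK, non-space runes
	for _, r := range s {
		switch {
		case isCJK(r):
			count++
			inWord = false
		case unicode.IsSpace(r):
			inWord = false
		default:
			if !inWord {
				count++
				inWord = true
			}
		}
	}
	return count
}

func main() {
	fmt.Println(countWords("hello world")) // 2
	fmt.Println(countWords("Go 言語"))       // 3
	fmt.Println(countWords("𠀀𠀁"))          // 2 (U+20000 and U+20001, outside the BMP)
}
```

This sketch treats punctuation as part of the adjacent word and does not special-case script-neutral marks, so a real implementation would likely need more rules.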
Improvements
- Pipeline reordered to match TypeScript: hidden removal → scoring → selector removal → standardize
- Entry point scoring now evaluates all candidates and picks the most specific matching child
- mainContent ancestor protection prevents selectors from removing parent elements of the main content
- Link density uses text-length multiplier instead of link-count ratio
- Navigation indicator matching uses pre-compiled word-boundary regexes (no false positives from substring matching)
- Card grid detection avoids treating article listing sections as content
- mergeOptions refactored to applyOptions helper, replacing ~80 lines of copy-paste
- aside:not([class*="callout"]) preserves callout components; data-callout added to allowed attributes
- Pre-compiled combined regex for partial selector matching (O(n) per element instead of O(n*m))
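The two regex-related items above (word-boundary matching for navigation indicators, and a single combined regex for partial selectors) can be sketched as follows. This is an assumed illustration of the technique, not the library's code: the function names and the fragment lists are hypothetical. Joining the fragments into one alternation lets each attribute value be scanned once instead of once per fragment, and the \b anchors avoid substring false positives.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quoteAll escapes regex metacharacters in each literal fragment.
func quoteAll(fragments []string) []string {
	quoted := make([]string, len(fragments))
	for i, f := range fragments {
		quoted[i] = regexp.QuoteMeta(f)
	}
	return quoted
}

// compileCombined builds one case-insensitive alternation that matches
// any fragment anywhere in the input. Fragments must be non-empty.
func compileCombined(fragments []string) *regexp.Regexp {
	return regexp.MustCompile(`(?i)` + strings.Join(quoteAll(fragments), "|"))
}

// compileWordBoundary builds an alternation that only matches whole words.
func compileWordBoundary(words []string) *regexp.Regexp {
	return regexp.MustCompile(`(?i)\b(?:` + strings.Join(quoteAll(words), "|") + `)\b`)
}

// Compiled once at package initialization, then reused for every element.
// The fragment lists are placeholders chosen for this example.
var (
	partialSelectors = compileCombined([]string{"sidebar", "share", "comment-form"})
	navIndicators    = compileWordBoundary([]string{"menu", "nav"})
)

func main() {
	fmt.Println(partialSelectors.MatchString("post-sidebar-left")) // true
	fmt.Println(partialSelectors.MatchString("article-body"))      // false
	fmt.Println(navIndicators.MatchString("main nav"))             // true
	fmt.Println(navIndicators.MatchString("unavailable"))          // false: "nav" is only a substring
}
```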
Testing
- 213 tests across 17 packages
- 6 real-world benchmark fixtures (19KB–664KB HTML)