Library to extract text from HTML files

Library that uses Readability-like heuristics to extract text from an HTML document.

Example:

import "golang.org/x/net/html"

node, err := html.Parse(bytes.NewReader(raw_html))
if err != nil {
	log.Fatal("Parsing error: ", err)
}
title, text := sandblast.Extract(node)
fmt.Printf("Title: %s\n%s", title, text)
…

See also example/extract.go, a command line utility to extract text from a URL.
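
To give a rough feel for what "Readability-like heuristics" can mean, here is a minimal sketch that picks the element most likely to hold the main text. It is not sandblast's actual algorithm: the `Node` type, `Score`, and `BestCandidate` are hypothetical names invented for this illustration, and the tree is built by hand rather than parsed from HTML. The scoring rule (paragraph text length, scaled down by link density) is one common simplification and only an assumption about how such extractors work.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a parsed HTML element.
// It is not the golang.org/x/net/html node type.
type Node struct {
	Tag      string
	Text     string // text directly inside this element
	Children []*Node
}

// textLen returns the number of text bytes under n (after trimming
// surrounding whitespace) and how many of them sit inside <a> elements.
func textLen(n *Node, inLink bool) (total, linked int) {
	inLink = inLink || n.Tag == "a"
	t := len(strings.TrimSpace(n.Text))
	total = t
	if inLink {
		linked = t
	}
	for _, c := range n.Children {
		ct, cl := textLen(c, inLink)
		total += ct
		linked += cl
	}
	return total, linked
}

// Score rewards text held in direct <p> children and penalizes
// elements whose text is mostly links (navigation, footers).
func Score(n *Node) float64 {
	paragraphChars := 0
	for _, c := range n.Children {
		if c.Tag == "p" {
			t, _ := textLen(c, false)
			paragraphChars += t
		}
	}
	total, linked := textLen(n, false)
	if total == 0 {
		return 0
	}
	linkDensity := float64(linked) / float64(total)
	return float64(paragraphChars) * (1 - linkDensity)
}

// BestCandidate walks the tree and returns the highest-scoring element.
// On ties, the element visited first (closest to the root) wins.
func BestCandidate(root *Node) *Node {
	best, bestScore := root, Score(root)
	var walk func(*Node)
	walk = func(n *Node) {
		if s := Score(n); s > bestScore {
			best, bestScore = n, s
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return best
}

func main() {
	page := &Node{Tag: "body", Children: []*Node{
		{Tag: "div", Children: []*Node{ // navigation: links only
			{Tag: "a", Text: "Home"},
			{Tag: "a", Text: "About"},
		}},
		{Tag: "div", Children: []*Node{ // article body: paragraphs
			{Tag: "p", Text: "First paragraph of the article body."},
			{Tag: "p", Text: "Second paragraph with more prose."},
		}},
	}}
	best := BestCandidate(page)
	fmt.Println(best.Tag, Score(best))
}
```

In this toy tree the navigation block scores zero because it has no paragraphs, so the second `div` is selected. A real extractor would also have to deal with nested containers, class and id hints, and text cleanup.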