aarzilli/sandblast

Library that uses Readability-like heuristics to extract text from an HTML document.

Example:

import "golang.org/x/net/html"

node, err := html.Parse(bytes.NewReader(raw_html))
if err != nil {
	log.Fatal("Parsing error: ", err)
}
title, text := sandblast.Extract(node)
fmt.Printf("Title: %s\n%s", title, text)
…

See also example/extract.go, a command line utility to extract text from a URL.
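
For readers unfamiliar with what "Readability-like heuristics" usually means, here is a minimal sketch of one common idea: score each block of a page by how much plain text it holds and penalise blocks that are mostly links. This is an illustration only and is not taken from sandblast's source. The Block type, the function names, and the weights are all hypothetical, and the sketch works on pre-split blocks rather than a parsed *html.Node tree so that it needs only the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, simplified stand-in for one block-level HTML
// element (for example a <div> or <p>) after parsing. It is not a sandblast type.
type Block struct {
	Name     string // label used only for printing
	Text     string // all visible text inside the block
	LinkText string // the part of Text that sits inside <a> elements
}

// linkDensity returns the fraction of a block's text that is link text.
// Navigation menus and footers tend to score close to 1.
func linkDensity(b Block) float64 {
	total := len(strings.TrimSpace(b.Text))
	if total == 0 {
		return 0
	}
	return float64(len(strings.TrimSpace(b.LinkText))) / float64(total)
}

// score rewards long, comma-rich text and penalises link-heavy blocks.
// The weights are illustrative assumptions, not values used by sandblast.
func score(b Block) float64 {
	words := len(strings.Fields(b.Text))
	commas := strings.Count(b.Text, ",")
	return (float64(words) + 5*float64(commas)) * (1 - linkDensity(b))
}

// bestBlock returns the highest-scoring block, or false if blocks is empty.
func bestBlock(blocks []Block) (Block, bool) {
	if len(blocks) == 0 {
		return Block{}, false
	}
	best := blocks[0]
	for _, b := range blocks[1:] {
		if score(b) > score(best) {
			best = b
		}
	}
	return best, true
}

func main() {
	blocks := []Block{
		{Name: "nav", Text: "Home About Contact", LinkText: "Home About Contact"},
		{Name: "article", Text: "Sandblasting removes paint, rust, and scale from metal surfaces."},
		{Name: "footer", Text: "Privacy Terms", LinkText: "Privacy Terms"},
	}
	if best, ok := bestBlock(blocks); ok {
		fmt.Printf("%s: %s\n", best.Name, best.Text)
	}
}
```

In this sketch the navigation and footer blocks consist entirely of link text, so their score drops to zero and the article block is selected. A real extractor would walk the parsed node tree and combine several such signals.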
