This is a Go package which strip HTML tags from a string. Also, you can provide an array of allowableTags
that can be
skipped.
Strip HTML tags library is very useful if you work with web crawlers, or just want to strip all or specific tags from
a string.
nodes, err := Strip(content string, allowableTags []string, stripInlineAttributes bool) (Nodes, error)
nodes.Elements //HTML nodes structure of type *html.Node
nodes.ToString() //returns stripped HTML string
$ go get github.com/darkoatanasovski/htmltags
input - string
allowableTags - []string{} //array of strings e.g. []string{"p", "span"}
removeInlineAttributes - bool // true/false
Returns node
structure. You can get the stripped string with nodes.ToString()
. If there are errors, it will return
the first error message
If you want to keep the inline attributes of the tags, set the third parameter to false
stripped, err := htmltags.Strip("<h1>Header text with <span style=\"color:red\">color</span></h1>", []string{"span"}, false)
Or if you want to strip all tags from the string, and get a pure text, the second parameter has to be empty array
stripped, err := htmltags.Strip("<h1>Header text with <span style=\"color:red\">color</span></h1>", []string{}, false)
A working example
package main
import(
"fmt"
"github.com/darkoatanasovski/htmltags"
)
func main() {
original := "<div>This is <strong style=\"font-size:50px\">complex</strong> text with <span>children <i>nodes</i></span></div>"
allowableTags := []string{"strong", "i"}
removeInlineAttributes := false
stripped, _ := htmltags.Strip(original, allowableTags, removeInlineAttributes)
fmt.Println(stripped) //output: Node structure
fmt.Println(stripped.ToString()) //output string: This is <strong>complex</strong> text with children <i>nodes</i>
}
If you have cloned this repo you will probably need the dependency:
go get golang.org/x/net/html
The broken or partial html will be fixed. If your input HTML string is
<p>Content <i>italic
, the fixed string will be<p>Content <i>italic</i></p>