
goquery - a little like that j-thing, only in Go


GoQuery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns a tree of nodes, and not a full-featured DOM object, jQuery's manipulation and modification functions have been left out (there is no point in modifying data in the parsed tree of the HTML, since it has no effect).

Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML.
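Since goquery leaves the encoding to the caller, a quick pre-flight check can catch non-UTF-8 input before it ever reaches the parser. The sketch below uses only the standard library's unicode/utf8 package. The helper name firstInvalidUTF8 is hypothetical (it is not part of goquery), and it only detects bad input; it does not transcode documents served in other encodings.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidUTF8 is a hypothetical helper: it returns the byte offset of
// the first invalid UTF-8 sequence in b, or -1 if b is entirely valid UTF-8.
func firstInvalidUTF8(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// DecodeRune reports an invalid encoding as (RuneError, 1); a real,
		// correctly encoded U+FFFD comes back with size 3 and is accepted.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	good := []byte("<p>Café</p>")     // Go source literals are UTF-8
	bad := []byte("<p>Caf\xE9</p>")   // "é" as a single ISO-8859-1 byte
	fmt.Println(firstInvalidUTF8(good)) // -1
	fmt.Println(firstInvalidUTF8(bad))  // 6
}
```

A check like this could run on the raw bytes of a response body before they are wrapped in a reader and handed to goquery.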

Supported functions are query-oriented features (hasClass(), attr() and the like), as well as traversal functions that make sense given what we have to work with. This makes GoQuery a great library for scraping web pages.
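To make "query-oriented" concrete: a hasClass()-style query only reads attribute values from the parsed nodes, it never changes them. The sketch below shows one way such a check can work on a raw class attribute string, matching whole whitespace-separated tokens rather than substrings. It is a standalone illustration with a hypothetical helper name, not goquery's HasClass() source.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass is a hypothetical helper: it reports whether name is one of the
// whitespace-separated tokens of an HTML class attribute value.
// strings.Fields splits on any Unicode whitespace, a slight superset of the
// ASCII whitespace HTML uses for class lists.
func hasClass(classAttr, name string) bool {
	for _, c := range strings.Fields(classAttr) {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	attr := "review  review-rhs\tfeatured"
	fmt.Println(hasClass(attr, "review-rhs")) // true
	fmt.Println(hasClass(attr, "rhs"))        // false: substrings do not match
}
```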

Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that a similar HTML-manipulating library was better off following its API than starting anew (in the same spirit as Go's fmt package), even though some of its methods are less than intuitive (looking at you, index()...).

Installation

Please note that because of the net/html dependency, goquery requires Go1.1+.

$ go get github.com/PuerkitoBio/goquery

(optional) To run unit tests:

$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test

(optional) To run benchmarks (warning: it runs for a few minutes):

$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test -bench=".*"

Changelog

Note that goquery's API is now stable, and will not break.

  • v0.3.2 : Add NewDocumentFromReader() (thanks jweir) which allows creating a goquery document from an io.Reader.
  • v0.3.1 : Add NewDocumentFromResponse() (thanks assassingj) which allows creating a goquery document from an http response.
  • v0.3.0 : Add EachWithBreak() which allows breaking out of an Each() loop by returning false. This function was added instead of changing the existing Each() to avoid breaking compatibility.
  • v0.2.1 : Make go-getable, now that go.net/html is Go1.0-compatible (thanks to @matrixik for pointing this out).
  • v0.2.0 : Add support for negative indices in Slice(). BREAKING CHANGE: Document.Root is removed, Document is now a Selection itself (a selection of one, the root element, just like Document.Root was before). Add jQuery's Closest() method.
  • v0.1.1 : Add benchmarks to use as baseline for refactorings, refactor Next...() and Prev...() methods to use the new html package's linked list features (Next/PrevSibling, FirstChild). Good performance boost (40+% in some cases).
  • v0.1.0 : Initial release.
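Two of the changelog entries above describe behavior rather than new names: EachWithBreak() stops as soon as the callback returns false, and Slice() accepts negative indices counted from the end, as jQuery's slice() does. The sketch below models both on a hypothetical selection type that holds plain strings instead of *html.Node values. Clamping out-of-range indices is a choice made for this sketch only; it does not describe what goquery does with such indices.

```go
package main

import "fmt"

// selection is a hypothetical stand-in for goquery's Selection.
type selection struct {
	items []string
}

// eachWithBreak calls f for every item until f returns false.
func (s *selection) eachWithBreak(f func(int, string) bool) *selection {
	for i, it := range s.items {
		if !f(i, it) {
			break
		}
	}
	return s
}

// slice returns items[start:end]; a negative index counts back from the
// end. Out-of-range indices are clamped (an assumption of this sketch).
func (s *selection) slice(start, end int) *selection {
	n := len(s.items)
	norm := func(i int) int {
		if i < 0 {
			i += n
		}
		if i < 0 {
			i = 0
		}
		if i > n {
			i = n
		}
		return i
	}
	start, end = norm(start), norm(end)
	if start > end {
		start = end
	}
	return &selection{items: s.items[start:end]}
}

func main() {
	s := &selection{items: []string{"a", "b", "c", "d"}}
	fmt.Println(s.slice(1, -1).items) // [b c]

	// Prints "0 a" and "1 b", then stops because the callback returns false.
	s.eachWithBreak(func(i int, it string) bool {
		fmt.Println(i, it)
		return it != "b"
	})
}
```

Returning the receiver from eachWithBreak keeps the call chainable, in the spirit of the jQuery-style interface.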

API

GoQuery exposes two types, Document and Selection. Unlike jQuery, which is loaded as part of a DOM document and thus acts on its containing document, GoQuery doesn't know which HTML document to act upon. So it needs to be told, and that's what the Document type is for. It holds the root document node as the initial Selection object to manipulate.
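Since v0.2.0 a Document is itself a Selection, which Go expresses through struct embedding: the methods of the embedded *Selection are promoted onto Document. The sketch below is a simplified, hypothetical model of that relationship (the node type stands in for *html.Node, and newDocument is an illustrative constructor); goquery's actual fields and constructors may differ.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node.
type node struct {
	tag      string
	children []*node
}

// Selection is a simplified model holding a set of matched nodes.
type Selection struct {
	Nodes []*node
}

// Length returns the number of nodes in the selection.
func (s *Selection) Length() int { return len(s.Nodes) }

// Children returns a new Selection with the children of every matched node.
func (s *Selection) Children() *Selection {
	var out []*node
	for _, n := range s.Nodes {
		out = append(out, n.children...)
	}
	return &Selection{Nodes: out}
}

// Document embeds *Selection, so it is a selection of one node (the root)
// and every Selection method can be called on it directly.
type Document struct {
	*Selection
	rootNode *node
}

// newDocument is a hypothetical constructor wrapping a root node.
func newDocument(root *node) *Document {
	return &Document{Selection: &Selection{Nodes: []*node{root}}, rootNode: root}
}

func main() {
	root := &node{tag: "html", children: []*node{{tag: "head"}, {tag: "body"}}}
	doc := newDocument(root)
	fmt.Println(doc.Length())            // 1: the root node
	fmt.Println(doc.Children().Length()) // 2: head and body
}
```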

jQuery often has many variants for the same function (no argument, a selector string argument, a jQuery object argument, a DOM element argument, ...). Instead of exposing the same features in GoQuery as a single method with variadic empty interface arguments, I use statically-typed signatures following this naming convention:

  • When the jQuery equivalent can be called with no argument, it has the same name as jQuery for the no argument signature (e.g.: Prev()), and the version with a selector string argument is called XxxFiltered() (e.g.: PrevFiltered())
  • When the jQuery equivalent requires one argument, the same name as jQuery is used for the selector string version (e.g.: Is())
  • The signatures accepting a jQuery object as argument are defined in GoQuery as XxxSelection() and take a *Selection object as argument (e.g.: FilterSelection())
  • The signatures accepting a DOM element as argument in jQuery are defined in GoQuery as XxxNodes() and take a variadic argument of type *html.Node (e.g.: FilterNodes())
  • Finally, the signatures accepting a function as argument in jQuery are defined in GoQuery as XxxFunction() and take a function as argument (e.g.: FilterFunction())
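Applied to a single jQuery method, the convention above yields a small family of statically-typed methods. The sketch below models the filter() family on a hypothetical Selection type: Filter takes a selector string, FilterSelection a *Selection, FilterNodes a variadic node list, and FilterFunction a predicate. To stay self-contained it treats a "selector" as a bare tag name, whereas real goquery matches CSS selectors through cascadia; the node type is a stand-in for *html.Node.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node.
type node struct{ tag string }

// Selection is a simplified model of a set of matched nodes.
type Selection struct{ Nodes []*node }

// FilterFunction keeps the nodes for which f returns true.
func (s *Selection) FilterFunction(f func(int, *Selection) bool) *Selection {
	var out []*node
	for i, n := range s.Nodes {
		if f(i, &Selection{Nodes: []*node{n}}) {
			out = append(out, n)
		}
	}
	return &Selection{Nodes: out}
}

// Filter is the selector-string variant; here a selector is just a tag name.
func (s *Selection) Filter(tag string) *Selection {
	return s.FilterFunction(func(_ int, one *Selection) bool {
		return one.Nodes[0].tag == tag
	})
}

// FilterNodes keeps only the nodes that appear in the variadic list.
func (s *Selection) FilterNodes(nodes ...*node) *Selection {
	keep := make(map[*node]bool, len(nodes))
	for _, n := range nodes {
		keep[n] = true
	}
	return s.FilterFunction(func(_ int, one *Selection) bool {
		return keep[one.Nodes[0]]
	})
}

// FilterSelection keeps only the nodes that are also in sel.
func (s *Selection) FilterSelection(sel *Selection) *Selection {
	return s.FilterNodes(sel.Nodes...)
}

func main() {
	p1, div, p2 := &node{tag: "p"}, &node{tag: "div"}, &node{tag: "p"}
	all := &Selection{Nodes: []*node{p1, div, p2}}
	onlyP2 := &Selection{Nodes: []*node{p2}}

	fmt.Println(len(all.Filter("p").Nodes))                         // 2
	fmt.Println(len(all.FilterNodes(div).Nodes))                    // 1
	fmt.Println(len(all.Filter("p").FilterSelection(onlyP2).Nodes)) // 1
	afterFirst := all.FilterFunction(func(i int, _ *Selection) bool { return i > 0 })
	fmt.Println(len(afterFirst.Nodes)) // 2
}
```

Each variant returns a new *Selection, which is what keeps calls such as Filter("p").FilterSelection(...) chainable without any variadic interface{} arguments.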

GoQuery's complete godoc reference documentation can be found here.

Please note that Cascadia does NOT necessarily support every selector supported by jQuery (Sizzle). See the cascadia project for details.

Examples

Adapted from example_test.go, written as it would appear in an external test package that imports goquery:

package goquery_test

import (
    "fmt"
    "log"

    "github.com/PuerkitoBio/goquery"
)

// This example scrapes the reviews shown on the home page of metalsucks.net.
func Example_metalSucks() {
    // Load the HTML document.
    doc, err := goquery.NewDocument("http://metalsucks.net")
    if err != nil {
        log.Fatal(err)
    }

    // Find the review items.
    doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
        // For each item found, get the band and title.
        band := s.Find("h3").Text()
        title := s.Find("i").Text()
        fmt.Printf("Review %d: %s - %s\n", i, band, title)
    })

    // To see the output of the Example while running the test suite (go test),
    // remove the leading "x" before Output on the next line. This will cause
    // the example to fail (all the "real" tests should pass).

    // xOutput: voluntarily fail the Example output.
}

License

The BSD 3-Clause license, the same as the Go language. Cascadia's license is here.
