Release 0.2.0

@yorickpeterse yorickpeterse released this 17 Nov 22:39

Alternative version: https://github.com/YorickPeterse/oga/blob/b8f9d04b17b2f56eed6e8a5d1ab77a6123fd985e/doc/changelog.md#020---2014-11-17

CSS Selector Support

Probably the biggest feature of this release is support for querying documents
using CSS selectors. Oga supports a subset of the CSS3 selector specification.
In particular, the following selectors are supported:

  • Element, class and ID selectors
  • Attribute selectors (e.g. foo[x ~= "y"])

The following pseudo classes are supported:

  • :root
  • :nth-child(n)
  • :nth-last-child(n)
  • :nth-of-type(n)
  • :nth-last-of-type(n)
  • :first-child
  • :last-child
  • :first-of-type
  • :last-of-type
  • :only-child
  • :only-of-type
  • :empty

CSS selectors can be used via the css and at_css methods on an instance of
Oga::XML::Document or Oga::XML::Element. For example:

document = Oga.parse_xml('<people><person>Alice</person></people>')

document.css('people person') # => NodeSet(Element(name: "person" ...))

The architecture behind this is quite similar to parsing XPath. There's a lexer
(Oga::CSS::Lexer) and a parser (Oga::CSS::Parser). Unlike Nokogiri (and
perhaps other libraries) the parser does not output XPath expressions as a
String or a CSS-specific AST. Instead it directly emits an XPath AST. This
allows the resulting AST to be directly evaluated by Oga::XPath::Evaluator.

See #11 for more information.

Multi-line Attribute Support

Oga can now lex/parse elements that have attributes with newlines in them.
Previously this would trigger memory allocation errors.

See #58 for more information.

SAX after_element

The after_element method in the SAX parsing API now always takes two
arguments: the namespace name and element name. Previously this method would
always receive a single nil value as its argument, which is rather pointless.

See #54 for more information.
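
As a rough illustration of the new signature, the sketch below defines a handler whose after_element accepts the namespace name and the element name. The ClosingLogger class is hypothetical, and the callbacks are invoked by hand here to stand in for what a SAX parser would do; only the two-argument shape comes from the notes above.

```ruby
# Hypothetical handler; only the two-argument after_element signature
# is taken from the release notes.
class ClosingLogger
  attr_reader :closed

  def initialize
    @closed = []
  end

  # namespace is nil for elements without a namespace prefix.
  def after_element(namespace, name)
    @closed << (namespace ? "#{namespace}:#{name}" : name)
  end
end

handler = ClosingLogger.new

# Simulated callbacks, standing in for what a SAX parser would invoke.
handler.after_element(nil, 'person')
handler.after_element('foo', 'people')
```

After these calls, handler.closed holds the names of the closed elements in the order they were reported.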

XPath Grouping

XPath expressions can now be grouped together using parentheses. This allows one
to specify a custom operator precedence.

Enumerator Parsing Input

Enumerator instances can now be used as input for Oga.parse_xml and friends.
This can be used to download and parse XML files on the fly. For example:

enum = Enumerator.new do |yielder|
  HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
    yielder << chunk
  end
end

document = Oga.parse_xml(enum)

See #48 for more information.
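
The chunked-input idea can also be tried without a network connection. The sketch below assumes only Ruby's standard library: it wraps a StringIO in an Enumerator that yields fixed-size chunks, which is the same shape of input the HTTP example produces. The chunked helper and the chunk size are hypothetical, and the call into Oga is left as a comment.

```ruby
require 'stringio'

# Hypothetical helper: yields the contents of an IO in fixed-size chunks.
def chunked(io, size = 16)
  Enumerator.new do |yielder|
    # IO#read with a positive length returns nil at end of input.
    while (chunk = io.read(size))
      yielder << chunk
    end
  end
end

xml  = '<people><person>Alice</person></people>'
enum = chunked(StringIO.new(xml))

# With Oga loaded, the Enumerator could be passed in directly:
#   document = Oga.parse_xml(enum)
chunks = enum.to_a
```

Joining the chunks gives back the original string. Note that this Enumerator consumes the underlying IO, so it can only be iterated once.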

Removing Attributes

Element attributes can now be removed using Oga::XML::Element#unset:

element = Oga::XML::Element.new(:name => 'foo')

element.set('class', 'foo')
element.unset('class')

XPath Attributes

XPath predicates are now evaluated for every context node, as opposed to being
evaluated once for the entire context. This ensures that expressions such as
descendant-or-self::node()/foo[1] are evaluated correctly.
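
The difference between the two strategies can be shown with a toy model. This is not Oga's evaluator: the contexts Hash below is a made-up stand-in that maps each context node to its foo children, and the predicate [1] is modelled as "take the first item".

```ruby
# Toy model (not Oga's evaluator): each context node maps to its <foo> children.
contexts = {
  'a' => ['a/foo1', 'a/foo2'],
  'b' => ['b/foo1']
}

# Predicate applied once to the combined result: only one node survives.
once = [contexts.values.flatten.first]

# Predicate applied per context node: the first <foo> of every node survives.
per_node = contexts.values.map(&:first)
```

Here once contains a single node while per_node contains one node per context, which is the behaviour an expression such as foo[1] calls for.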

Available Namespaces

When calling Oga::XML::Element#available_namespaces the Hash returned by
Oga::XML::Element#namespaces would be modified in place. This was a bug that
has been fixed in this release.
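
One way this class of bug can arise is by merging inherited entries with a mutating method. The sketch below is an assumption about the general pattern, not Oga's actual code: the Node class and both method names are hypothetical.

```ruby
# Hypothetical stand-in for an element; not Oga's actual code.
class Node
  attr_reader :namespaces, :parent

  def initialize(namespaces, parent = nil)
    @namespaces = namespaces
    @parent     = parent
  end

  # Buggy variant: merge! writes inherited entries into @namespaces.
  def available_namespaces_in_place
    inherited = parent ? parent.available_namespaces_in_place : {}
    namespaces.merge!(inherited) { |_key, own, _inherited| own }
  end

  # Safe variant: merge returns a new Hash and leaves @namespaces untouched.
  def available_namespaces
    inherited = parent ? parent.available_namespaces : {}
    inherited.merge(namespaces)
  end
end

root  = Node.new({ 'x' => 'http://example.com/x' })
child = Node.new({ 'y' => 'http://example.com/y' }, root)

safe = child.available_namespaces
```

After the safe call, child.namespaces still holds only its own entry. Calling the in-place variant instead would leave the inherited 'x' entry in the child's own Hash.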

NodeSets

NodeSet instances can now be compared with each other using ==. Previously
this would always consider two instances to be different from each other due to
the usage of the default Object#== method.
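
The effect of relying on the default == can be reproduced with two small classes. Both are hypothetical and much simpler than Oga::XML::NodeSet; they only show why identity comparison reports two sets with the same contents as different.

```ruby
# Hypothetical minimal collection that inherits the default ==.
class PlainSet
  attr_reader :nodes

  def initialize(nodes)
    @nodes = nodes
  end
end

class ComparableSet < PlainSet
  # Two sets are equal when they wrap equal node lists.
  def ==(other)
    other.is_a?(ComparableSet) && nodes == other.nodes
  end
end

# Default ==: compares object identity, so two distinct instances differ.
by_identity = PlainSet.new([1, 2]) == PlainSet.new([1, 2])

# Overridden ==: compares the wrapped contents.
by_content = ComparableSet.new([1, 2]) == ComparableSet.new([1, 2])
```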

XML Entities

XML entities such as &amp; and &lt; are now encoded/decoded by the lexer,
string and text nodes.

See #49 for more information.
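
For readers unfamiliar with entity handling, the round trip looks roughly like the sketch below. The table and helper names are hypothetical and cover only three entities; Oga's own implementation may differ. Each helper uses a single gsub pass so that already-encoded text is not decoded twice.

```ruby
# Hypothetical table and helpers; Oga's own entity handling may differ.
ENCODE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
DECODE = ENCODE.invert.freeze

# Replaces special characters with their entities in one pass.
def encode_entities(text)
  text.gsub(/[&<>]/, ENCODE)
end

# Replaces the known entities with their characters in one pass.
def decode_entities(text)
  text.gsub(/&(?:amp|lt|gt);/, DECODE)
end

encoded = encode_entities('a < b && c')
decoded = decode_entities(encoded)
```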

General

Source lines are no longer included in error messages generated by the XML
parser. This simplifies the code and removes the need to re-read the input
(in case of IO/Enumerable inputs).

XML Lexer Newlines

Newlines in the XML lexer are now counted in native code (C/Java). On MRI and
JRuby the improvement is quite small, but on Rubinius it's a massive
improvement. See commit 8db77c0a09bf6c996dd2856a6dbe1ad076b1d30a for more
information.

HTML Void Element Performance

Performance for detecting HTML void elements (e.g. <br> and <link>) has been
improved by removing String allocations that were not needed.