Gumbo (a C library to parse HTML5) Perl 6 binding
Latest commit 3009506 Jun 16, 2019


Name Gumbo

Synopsis

use Gumbo;
use LWP::Simple;

my $xml = parse-html(LWP::Simple.get(""));
say $xml.lookfor(:TAG<title>); # Google;

Description

From the Gumbo project page:

Gumbo is an implementation of the HTML5 parsing algorithm implemented as a pure C99 library with no outside dependencies. It's designed to serve as a building block for other tools and libraries such as linters, validators, templating languages, and refactoring and analysis tools.

This module is a binding to this library. It provides a parse-html routine that parses a given HTML string and returns an XML::Document object. To access everything the Gumbo library has to offer, look at the Gumbo::Binding module.

Installation

Simply use zef. For the tests to pass, you need to install the Gumbo library (see your distribution's documentation). If you installed the Gumbo library in a custom location, you can set the PERL6_GUMBOLIB environment variable to the path of the library file.

zef install Gumbo

or

PERL6_GUMBOLIB="/home/user/gumbo/" zef install Gumbo

Usage

parse-html(Str $html) : XML::Document

Parses an HTML string and returns an XML::Document.

parse-html(Str $html, :$nowhitespace, *%filters) : XML::Document

This is the full signature of the parse-html routine.

  • nowhitespace

Tells Gumbo not to include the extra whitespace that can exist around tags, such as the indentation in front of HTML tags.

  • *%filters

The module offers basic filtering if you want to restrict the XML::Document returned. You can only filter on elements (that is, tags), not on content such as the text of a p tag.

It is inspired by the elements method of the XML::Element class. The main purpose is to reduce the time spent parsing unnecessary content and to decrease the memory footprint of the XML::Document.

IMPORTANT: the root will always be the html tag.

  • TAG: limits the result to elements with the given tag name

  • SINGLE: if set, only the first match is returned

  • attrib: you can filter on one attribute name with its given value (for example :class<fancier>)

All the children of the matched element(s) are kept. For example, if you search for all the links, you will also get any additional tags placed around their text.
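
The options above can be combined in one call. The following is a minimal sketch, assuming Gumbo and the Gumbo C library are installed; the HTML snippet is made up for illustration. It keeps only the links (dropping indentation-only whitespace), walks them under the html root, and then uses TAG, an attribute filter and SINGLE together, in the same way as the example further down.

```raku
use Gumbo;

# A small, made-up document: indentation plus two links, the first of
# which wraps part of its text in a <b> tag.
my $html = q:to/END_HTML/;
    <html>
      <body>
        <p>Intro</p>
        <a href="/one">First <b>link</b></a>
        <a href="/two" class="ext">Second link</a>
      </body>
    </html>
    END_HTML

# Keep only <a> elements and skip whitespace-only text nodes.
my $links = parse-html($html, :TAG<a>, :nowhitespace);

# The root is still the html tag; the matched links sit below it.
say $links.root.name;

for $links.root.elements -> $link {
    # Children of a matched element (here the <b> tag) are kept.
    say $link.attribs<href>, ': ', $link.elements.elems, ' child element(s)';
}

# TAG + attribute filter + SINGLE: only the first matching link is kept.
my $ext = parse-html($html, :TAG<a>, :class<ext>, :SINGLE);
say $ext[0][0].text;
```

Passing :nowhitespace here is optional; it only avoids creating the whitespace nodes that the indentation would otherwise produce.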

$gumbo_last_c_parse_duration && $gumbo_last_xml_creation_duration

These two variables hold the time (a Duration) spent in each of the two steps parse-html performs: the C parsing done by Gumbo and the creation of the XML::Document.
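
A minimal sketch of reading these timings after a call, assuming both variables are exported by `use Gumbo` (the section above names them but does not show how they are imported) and that they are refreshed on every parse-html call.

```raku
use Gumbo;

my $doc = parse-html('<html><body><p>Timing</p></body></html>');

# Assumption: both variables are exported by Gumbo and hold Duration
# objects describing the most recent parse-html call.
say "C parsing:              $gumbo_last_c_parse_duration s";
say "XML::Document creation: $gumbo_last_xml_creation_duration s";

# Adding two Durations gives the total time spent in parse-html.
my $total = $gumbo_last_c_parse_duration + $gumbo_last_xml_creation_duration;
say "Total:                  $total s";
```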

PERL6_GUMBOLIB

Set the PERL6_GUMBOLIB environment variable to specify where the module can find the Gumbo library if it is not in the usual system library path.

Example

use Gumbo;

my $html = q:to/END_HTML/;
       <p>It's fancy</p>
       <p class="fancier">It's fancier</p>
       END_HTML

my $xmldoc = parse-html($html);

say $xmldoc.root.elements(:TAG<p>, :RECURSE)[0][0].text; #It's fancy

$xmldoc = parse-html($html, :TAG<p>, :SINGLE);

say $xmldoc[0][0].text; #It's still fancy

$xmldoc = parse-html($html, :TAG<p>, :class<fancier>, :SINGLE);

say $xmldoc[0][0].text; # It's fancier

Gumbo::Parser

This module provides a Gumbo::Parser class that does the role defined by the HTML::Parser module. It also provides some additional attributes that contain information about the last parse. It works exactly like the parse-html routine, with the same extra optional arguments.

use Gumbo::Parser;

my $html = '<html><body><p>Hello</p></body></html>'; # any HTML string
my $parser = Gumbo::Parser.new;
my $xmldoc = $parser.parse($html);
say $parser.c-parse-duration;
say $parser.xml-creation-duration;
say $parser.stats<xml-objects>; # the number of XML::* objects created (excluding the XML::Document)
say $parser.stats<whitespaces>; # the number of whitespace nodes (created or not)
say $parser.stats<elements>; # the number of XML::Element (including root)

See Also

XML, HTML::Parser::XML

Author

Sylvain "Skarsnik" Colinet

License

The modules provided by Gumbo are under the same licence as Rakudo; see the LICENSE file.
