Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors — all in pure GDScript. No native dependencies, works on every platform Godot supports.
- Engine: Godot 4.2+
- License: MIT
- Status: 0.1.0 — usable, forgiving HTML parser, subset of CSS selectors.
- Copy the `addons/godot_dom_parser/` folder into your project's `addons/` directory. (Or install via the AssetLib tab in the editor.)
- Open **Project → Project Settings → Plugins** and enable **GodotDOMParser**.
All public classes register a global `class_name`, so you can use `DOMParser`, `DOMDocument`, `DOMNode`, `HTMLParser`, and `CSSSelector` from anywhere without `preload()`.
```gdscript
extends Node

func _ready() -> void:
	var parser := DOMParser.new()
	add_child(parser)  # DOMParser is a Node; add it to the tree before fetching.

	var doc: DOMDocument = await parser.fetch("https://example.com")
	if doc == null:
		push_error("fetch failed")
		return

	print("Title: ", doc.get_title())
	for link in doc.query_selector_all("a[href]"):
		print(link.get_attribute("href"), " -> ", link.get_text_content())
```

To parse an HTML string you already have, no parser instance or network request is needed:

```gdscript
var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)  # static method
print(doc.query_selector("p.hi").get_text_content())  # "hello world"
```

`DOMParser`:

| Member | Description |
|---|---|
| `fetch(url: String) -> DOMDocument` | Awaitable. GETs the URL and returns a parsed document, or `null` on error. |
| `static parse_html(html: String) -> DOMDocument` | Parses an HTML string directly. |
| `user_agent: String` | User-Agent string sent with requests. |
| `extra_headers: PackedStringArray` | Extra request headers, in `"Name: value"` format. |
| `timeout_seconds: float` | Request timeout, in seconds. |
| `max_redirects: int` | Maximum number of redirects to follow. |
| `signal document_loaded(document)` | Emitted after a successful fetch. |
| `signal fetch_failed(error, response_code)` | Emitted on a network or HTTP error. |
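
The request settings and signals above can be combined as follows. This is a minimal sketch: the User-Agent, header, timeout, and redirect values are illustrative, and the helper `make_parser()` is hypothetical. It assumes the settings are read when `fetch()` runs, so they are assigned before the call.

```gdscript
extends Node

# Hypothetical helper: builds a DOMParser with illustrative request settings.
func make_parser() -> DOMParser:
	var parser := DOMParser.new()
	parser.user_agent = "MyGodotGame/1.0"  # illustrative UA string
	parser.extra_headers = PackedStringArray(["Accept-Language: en-US"])
	parser.timeout_seconds = 10.0
	parser.max_redirects = 3
	return parser

func _ready() -> void:
	var parser := make_parser()
	add_child(parser)
	parser.document_loaded.connect(_on_document_loaded)
	parser.fetch_failed.connect(_on_fetch_failed)
	# fetch() still returns the document (or null); the signals fire as well.
	await parser.fetch("https://example.com")

func _on_document_loaded(document) -> void:
	print("Loaded ", document.source_url, ": ", document.get_title())

func _on_fetch_failed(error, response_code) -> void:
	push_warning("Fetch failed: %s (HTTP %s)" % [error, response_code])
```

Signals suit code that reacts to loads elsewhere in the scene; the awaited return value is simpler when one function owns the whole request.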

`DOMDocument`:

| Member | Description |
|---|---|
| `source_url: String` | URL this document was fetched from (if any). |
| `raw_html: String` | The original HTML text. |
| `get_document_element()` | The `<html>` element (or the first element child). |
| `get_head()` / `get_body()` | Convenience accessors for `<head>` and `<body>`. |
| `get_title() -> String` | Text of the `<title>` element. |
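
The `DOMDocument` accessors can be tried on a parsed string, with no network access. A minimal sketch with made-up markup; it assumes `get_head()` and `get_body()` return `DOMNode` instances, as the `DOMNode` members suggest.

```gdscript
extends Node

func _ready() -> void:
	var html := "<html><head><title>Demo page</title></head><body><h1>Hi</h1></body></html>"
	var doc := DOMParser.parse_html(html)

	print(doc.get_title())                      # text of <title>
	print(doc.get_document_element().tag_name)  # lowercase tag of the root element
	var head: DOMNode = doc.get_head()
	var body: DOMNode = doc.get_body()
	print(head.children.size(), " child node(s) in <head>")
	print(body.get_inner_html())                # serialized contents of <body>
	print(doc.raw_html == html)                 # raw_html holds the original text
```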

`DOMNode`:

| Member | Description |
|---|---|
| `tag_name: String` | Lowercase tag name (e.g. `"div"`); empty for text and comment nodes. |
| `attributes: Dictionary` | Attribute map (keys lowercased). |
| `children: Array[DOMNode]` | Child nodes. |
| `parent: DOMNode` | Parent node (may be `null`). |
| `text: String` | Text content of text and comment nodes. |
| `is_element()` / `is_text()` / `is_void()` | Type predicates. |
| `get_attribute(name, default="")` | Reads an attribute, returning `default` if it is absent. |
| `has_attribute(name)` / `set_attribute(name, value)` / `remove_attribute(name)` | Check, set, or remove an attribute. |
| `get_id()` / `get_classes()` / `has_class(cls)` | Shortcuts for the `id` and `class` attributes. |
| `get_text_content()` | Concatenated text of this node and its descendants. |
| `get_inner_html()` / `get_outer_html()` | Serialize back to HTML. |
| `append_child(n)` / `remove_child(n)` / `remove()` | Tree mutation; `remove()` detaches this node from its parent. |
| `get_element_by_id(id)` | First descendant element with that `id`. |
| `get_elements_by_tag_name(tag)` | All descendant elements with that tag (`"*"` for all). |
| `get_elements_by_class_name(cls)` | All descendant elements with that class. |
| `query_selector(sel)` | First descendant matching the selector. |
| `query_selector_all(sel)` | All descendants matching the selector. |
| `matches(sel)` | Whether this node matches the selector. |
| `walk()` / `walk_elements()` | Pre-order traversal helpers. |
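
A short offline sketch of the read-side `DOMNode` API on made-up markup. It assumes, as the quick-start examples suggest, that the lookup methods in this table can also be called on a `DOMDocument`. The markup has no whitespace between tags, so every child of the `<ul>` is an element.

```gdscript
extends Node

func _ready() -> void:
	var doc := DOMParser.parse_html(
		"<html><body><ul id='menu'>"
		+ "<li class='item active'><a href='/home'>Home</a></li>"
		+ "<li class='item'><a href='/about' target='_blank'>About</a></li>"
		+ "</ul></body></html>")

	var menu: DOMNode = doc.get_element_by_id("menu")
	for li in menu.get_elements_by_class_name("item"):
		var link: DOMNode = li.query_selector("a")
		# get_attribute() falls back to the given default when the attribute is missing.
		var target: String = link.get_attribute("target", "_self")
		var state := " (active)" if li.has_class("active") else ""
		print(link.get_text_content(), " -> ", link.get_attribute("href"), " [", target, "]", state)

	# matches() tests one node against a selector instead of searching below it.
	var first_li: DOMNode = menu.children[0]
	print(first_li.matches("li.item:first-child"))
```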

Supported selector syntax:

- Type / universal: `div`, `*`
- ID: `#main`
- Class: `.title`, `.a.b` (multiple)
- Attribute:
  - `[disabled]`: present
  - `[type="text"]`: exact match
  - `[class~="hero"]`: whitespace-separated word
  - `[href^="https"]`: prefix
  - `[href$=".pdf"]`: suffix
  - `[href*="foo"]`: substring
  - `[lang|="en"]`: exactly `en`, or an `en-` prefix
- Combinators: descendant (space), child (`>`), adjacent sibling (`+`), general sibling (`~`)
- Selector lists: `a, b, c`
- Pseudo-classes: `:first-child`, `:last-child`, `:only-child`, `:first-of-type`, `:last-of-type`, `:not(<simple>)`, `:nth-child(<an+b>)`, `:nth-last-child(<an+b>)`, `:nth-of-type(<an+b>)`, `:nth-last-of-type(<an+b>)` (these accept integers, `odd`, `even`, and full `an+b` notation such as `2n+1` or `-n+3`)
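
To see the `an+b` and `|=` rules on concrete data, here is an offline sketch that parses a made-up list. `texts_of()` is a hypothetical printing helper, not part of the addon.

```gdscript
extends Node

# Hypothetical helper: joins the text content of each node with ", ".
func texts_of(nodes: Array) -> String:
	var parts := PackedStringArray()
	for node in nodes:
		parts.append(node.get_text_content())
	return ", ".join(parts)

func _ready() -> void:
	var doc := DOMParser.parse_html(
		"<html><body><ol>"
		+ "<li lang='en-US'>one</li><li lang='fr'>two</li><li lang='en'>three</li>"
		+ "<li>four</li><li lang='de'>five</li>"
		+ "</ol></body></html>")

	# 2n+1 picks the 1st, 3rd, 5th, ... child; "odd" selects the same set.
	print(texts_of(doc.query_selector_all("li:nth-child(2n+1)")))
	# -n+3 picks only the first three children.
	print(texts_of(doc.query_selector_all("li:nth-child(-n+3)")))
	# |= matches "en" exactly or any value that starts with "en-".
	print(texts_of(doc.query_selector_all("li[lang|='en']")))
```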
Examples:

```gdscript
doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
```

The tree is fully mutable, and changes are reflected by `get_outer_html()`.
```gdscript
var body: DOMNode = doc.get_body()

# Build <p class="added">injected from Godot</p> and append it to <body>.
var new_p: DOMNode = DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)

# Remove every element with class "advert".
for node in doc.query_selector_all(".advert"):
	node.remove()

print(doc.get_outer_html())
```

Known limitations:

- Not a spec-compliant HTML5 parser. It is forgiving enough for typical pages (void elements, unquoted attributes, implicit `<p>`/`<li>` closing, raw-text handling for `<script>`/`<style>`), but edge cases such as table foster-parenting, `<template>`, and malformed markup are handled heuristically.
- Entity decoding covers the numeric forms (`&#...;`, `&#x...;`) plus a small table of named entities. Uncommon named entities pass through as-is.
- Selectors do not (yet) support namespaces or case-insensitive attribute matching (`[attr=val i]`).
- JavaScript is not executed. If a page renders its content client-side, you will only see the initial HTML.
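
Until `[attr=val i]` is supported, one workaround is to let the selector engine match on attribute presence and compare values in GDScript. A minimal sketch; `query_attr_ci()` is a hypothetical helper, not part of the addon, and `to_lower()` folds Unicode case, which is slightly broader than the ASCII-only `i` flag in CSS.

```gdscript
extends Node

# Hypothetical helper: returns elements under `root` (a DOMDocument or DOMNode)
# with tag `tag` whose attribute `attr` equals `value`, ignoring letter case.
func query_attr_ci(root, tag: String, attr: String, value: String) -> Array[DOMNode]:
	var result: Array[DOMNode] = []
	# Narrow to elements that have the attribute at all, e.g. "input[type]".
	for node in root.query_selector_all("%s[%s]" % [tag, attr]):
		if node.get_attribute(attr).to_lower() == value.to_lower():
			result.append(node)
	return result

func _ready() -> void:
	var doc := DOMParser.parse_html(
		"<html><body><input type='TEXT' name='a'><input type='text' name='b'>"
		+ "<input type='checkbox' name='c'></body></html>")
	for input in query_attr_ci(doc, "input", "type", "text"):
		print(input.get_attribute("name"))
```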
Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.
The test suite lives in its own repository: codeWonderland/godot-dom-parser-tests. It's kept separate from the addon so the AssetLib download stays small and clutter-free — users who just want to drop the addon into their project shouldn't have to pull test fixtures, a test runner, and extra scenes.
If you're submitting a PR against this addon, please clone the tests repo alongside it, add/update tests for your change, and confirm the full suite still passes. The tests repo uses this repo as a git submodule and explains its setup in its own README.