Lightweight HTML processor

LtGt

LtGt is a minimalistic library for working with HTML. It parses any HTML5-compliant code into an object model that you can use to traverse nodes or locate specific elements. The library is meant as a foundation to build upon and ships with many extension methods that make navigating the DOM easier. It also supports HTML rendering, so you can turn any HTML object tree into code.

Currently, the object model in LtGt is immutable, so it cannot be used to manipulate the DOM directly.

Download

Features

  • Parse any HTML5-compliant code
  • Use LINQ to traverse HTML nodes
  • Find elements using JS-like functions (GetElementById(), GetElementsByTagName(), etc.)
  • Find elements using CSS selectors
  • Convert HTML nodes to a Linq2Xml representation (XNode, XElement, etc.)
  • Render HTML nodes as code
  • Easily extensible with custom methods
  • Targets .NET Framework 4.5+ and .NET Standard 1.0+

Usage

Parse a document

To parse an HTML document, create a new instance of HtmlParser or use the singleton HtmlParser.Default.

const string html = @"<!doctype html>
<html>
  <head>
    <title>Document</title>
  </head>
  <body>
    <div>Content</div>
  </body>
</html>";

var document = HtmlParser.Default.ParseDocument(html);

Parse a fragment

Besides a full document, you can also parse any other type of node.

const string html = "<div id=\"some-element\"><a href=\"https://example.com\">Link</a></div>";

// Parse an element node
var element = HtmlParser.Default.ParseElement(html);

// Parse any node
var node = HtmlParser.Default.ParseNode(html);

Find specific elements

Many extension methods are available to help you locate the elements you need.

var element1 = document.GetElementById("menu-bar");
var element2 = document.GetElementsByTagName("div").FirstOrDefault();
var element3 = document.GetElementsByClassName("floating-button floating-button--enabled").FirstOrDefault();

// FirstOrDefault() yields null when nothing matches, so guard each lookup
var element1Data = element1?.GetAttribute("data")?.Value;
var element2Id = element2?.GetId();
var element3Text = element3?.GetInnerText();

You can leverage the full power of CSS selectors as well.

var element = document.GetElementsBySelector("div#main > span.container:empty").FirstOrDefault();
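To make the selector above easier to read, here is a rough sketch of what `div#main > span.container:empty` asks for, written as a plain LINQ query over System.Xml.Linq elements. It does not use LtGt and says nothing about how LtGt evaluates selectors; the SelectorSketch class, its helper names and the sample markup are all hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of LtGt: spells out the selector
// "div#main > span.container:empty" as an explicit LINQ query.
public static class SelectorSketch
{
    // Sample markup invented for this illustration.
    public static XElement BuildSample() => XElement.Parse(
        @"<body>
  <div id=""main"">
    <span class=""container"" data-n=""1""></span>
    <span class=""container"" data-n=""2"">text</span>
    <span class=""other"" data-n=""3""></span>
    <p><span class=""container"" data-n=""4""></span></p>
  </div>
</body>");

    // ".container": the class attribute is a whitespace-separated list of names.
    static bool HasClass(XElement e, string name) =>
        ((string) e.Attribute("class") ?? "")
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(name);

    // ":empty": no child elements and no text nodes.
    static bool IsEmpty(XElement e) =>
        !e.Nodes().Any(n => n is XElement || n is XText);

    public static XElement[] Match(XElement root) =>
        root.DescendantsAndSelf("div")                         // div
            .Where(d => (string) d.Attribute("id") == "main")  // #main
            .SelectMany(d => d.Elements("span"))               // > span (direct children only)
            .Where(s => HasClass(s, "container") && IsEmpty(s))
            .ToArray();
}
```

Calling `SelectorSketch.Match(SelectorSketch.BuildSample())` on this sample returns only the first span: the second has text content, the third has a different class, and the fourth is not a direct child of the div.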

Convert to Linq2Xml

You can convert LtGt's objects to System.Xml.Linq objects (XNode, XElement, etc). This can be useful if you need to convert HTML to XML or if you want to use XPath to select nodes.

var htmlDocument = HtmlParser.Default.ParseDocument(html);

var xmlDocument = htmlDocument.ToXDocument();

// XPathSelectElements() is an extension method from the System.Xml.XPath namespace
var elements = xmlDocument.XPathSelectElements("//input[@type=\"submit\"]");
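As a rough illustration of the XPath route, the sketch below runs the same kind of query against a hand-written XDocument using only System.Xml.Linq. The XPathSketch class, its method names and the sample markup are hypothetical. The document built here only stands in for the result of ToXDocument(); the real conversion may differ, for example in how it handles namespaces.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings XPathSelectElements() into scope

// Hypothetical helper, not part of LtGt.
public static class XPathSketch
{
    // Stand-in for the XDocument you would get from htmlDocument.ToXDocument().
    public static XDocument BuildSample() => XDocument.Parse(
        @"<html>
  <body>
    <form>
      <input type=""text"" name=""query"" />
      <input type=""submit"" value=""Search"" />
    </form>
  </body>
</html>");

    // Collects the value attribute of every submit input in the document.
    public static string[] SubmitValues(XDocument xml) =>
        xml.XPathSelectElements("//input[@type='submit']")
           .Select(e => (string) e.Attribute("value"))
           .ToArray();
}
```

For the sample document, `XPathSketch.SubmitValues(XPathSketch.BuildSample())` returns a single entry, "Search".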

Render nodes

You can turn any node or hierarchy of nodes into HTML code.

var element = new HtmlElement("div",
    new HtmlAttribute("id", "main"),
    new HtmlText("Hello world"));

var html = HtmlRenderer.Default.RenderNode(element); // <div id="main">Hello world</div>

Libraries used

Donate

If you really like my projects and want to support me, consider donating to me on Patreon or BuyMeACoffee. All donations are optional and are greatly appreciated. 🙏
