A full-featured, reliable HTML parser for .NET that implements the XmlReader interface.
Properties
.gitattributes
.gitignore
App.config
ConsoleHelper.cs
HtmlReader.cs
HtmlReader.csproj
HtmlReader.sln
LICENSE.txt
Program.cs
README.md
UnitTest_00.html
UnitTest_00.xml
UnitTest_01.html
UnitTest_01.xml
UnitTest_02.html
UnitTest_02.xml
UnitTest_03.html
UnitTest_03.xml
UnitTest_04.html
UnitTest_04.xml
UnitTests.cs

README.md

HtmlReader

HtmlReader is a simple but full-featured HTML parser that implements the .NET XmlReader interface. This allows a programmer to use the rich XML features in .NET on HTML documents.

The software is distributed as a CodeBit located here.

This project includes the master copy of HtmlReader.cs plus a set of unit tests that may also be examined as sample code.

Potential Applications for HtmlReader

  • Translate arbitrary HTML into well-formatted and indented XHTML.
  • Automated HTML processing such as templated content, link processing, and so forth.
  • Check HTML for adherence to practices such as WCAG compliance.
  • Screen-scraping websites.
  • Automated reprocessing of HTML.
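
As a rough sketch of the first application above: because HtmlReader is an XmlReader, the standard `XmlWriter.WriteNode` method should be able to copy its output into an indented document. The class and method names below (`XhtmlFormatter`, `ToIndentedXml`) are hypothetical and not part of HtmlReader. The method accepts any `XmlReader`, so it can be tried with `XmlReader.Create` before passing in an HtmlReader.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of HtmlReader.cs.
public static class XhtmlFormatter
{
    // Copies every node from any XmlReader into an indented XML string.
    public static string ToIndentedXml(XmlReader reader)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            OmitXmlDeclaration = true
        };

        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // When the reader is still in its initial state, WriteNode
            // copies the whole document and advances the reader to the end.
            writer.WriteNode(reader, true);
        }
        return output.ToString();
    }
}
```

To use it with HtmlReader, construct the reader as in the sample under "Sample Use" and pass it to `XhtmlFormatter.ToIndentedXml`. Note that `XmlWriter` stops indenting inside elements with mixed text and element content, so inline markup is left on one line.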

HtmlReader follows the HTML5 parsing rules but tolerates malformed HTML whenever possible. In this, it's similar to the parsers built into web browsers. Future enhancements may include configurable tolerance and reporting of syntax errors.

Sample Use

Here's an example of loading HTML into a .NET XmlDocument:

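// Requires: using System.IO; using System.Text; using System.Xml;
XmlDocument doc = new XmlDocument();
HtmlReaderSettings settings = new HtmlReaderSettings();
// Ask the reader to close the underlying StreamReader when it is closed.
settings.CloseInput = true;
// Read the file as UTF-8 unless a byte order mark indicates another encoding.
using (HtmlReader reader = new HtmlReader(new StreamReader("sample.htm", Encoding.UTF8, true), settings))
{
  doc.Load(reader);
}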
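
Once the document is loaded, the usual XML tools apply. The following is a minimal sketch of link extraction with XPath. `LinkLister` and `ListLinks` are hypothetical names, not part of HtmlReader. The query matches on `local-name()` so that it works whether or not the elements carry a namespace, and it assumes element names are lowercase.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of HtmlReader.cs.
public static class LinkLister
{
    // Returns the href value of every <a> element, in document order.
    public static List<string> ListLinks(XmlDocument doc)
    {
        var links = new List<string>();

        // local-name() ignores any namespace on the element.
        XmlNodeList nodes = doc.SelectNodes("//*[local-name()='a']/@href");
        foreach (XmlNode node in nodes)
        {
            links.Add(node.Value);
        }
        return links;
    }
}
```

Calling `LinkLister.ListLinks(doc)` on the `doc` loaded above would return the link targets found in sample.htm.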

About CodeBits

A CodeBit is a way to share common code that's lighter weight than NuGet. Each CodeBit consists of a single source code file. A structured comment at the beginning of the file indicates where to find the master copy so that automated tools can retrieve and update CodeBits to the latest version.

License

Offered under the MIT Open Source License.
