jsoup: Java HTML Parser, with best of DOM, CSS, and jQuery
This branch is 399 commits ahead, 859 commits behind jhy:master.
.gitignore Updated ignore list
CHANGES Changelog prep for 1.5.2
LICENSE Updated copyright date
README Readme update
pom.xml Added support for XPath 1.0 with Jaxen


jsoup: Java HTML parser that makes sense of real-world HTML soup.

jsoup is a Java library for working with real-world HTML. It provides a convenient API
for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

* parse HTML from a URL, file, or string
* find and extract data, using DOM traversal or CSS selectors
* manipulate the HTML elements, attributes, and text
* clean user-submitted content against a safe white-list, to prevent XSS
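
The first three capabilities above can be sketched roughly as follows. This is a minimal illustration rather than an excerpt from the jsoup documentation: the sample markup, the base URI http://example.com/, and the class and method names (JsoupSketch, parse, firstLink, markLinks) are all assumptions made for the example, and the jsoup jar is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSketch {
    // Hypothetical sample markup, invented for this sketch.
    static final String HTML =
        "<html><head><title>Sample</title></head>"
        + "<body><p class='intro'>Hello <a href='/docs'>docs</a></p></body></html>";

    // Parse HTML from a string. The second argument is the base URI
    // that jsoup uses to resolve relative links (an assumed value here).
    static Document parse() {
        return Jsoup.parse(HTML, "http://example.com/");
    }

    // Find and extract data with a CSS selector, then resolve the
    // link target against the base URI.
    static String firstLink(Document doc) {
        Element link = doc.select("p.intro a[href]").first();
        return link.absUrl("href");
    }

    // Manipulate attributes: set rel="nofollow" on every link and
    // read the value back from the first one.
    static String markLinks(Document doc) {
        Elements links = doc.select("a");
        links.attr("rel", "nofollow");
        return doc.select("a").first().attr("rel");
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(doc.title());
        System.out.println(firstLink(doc));
        System.out.println(markLinks(doc));
    }
}
```

Cleaning user-submitted content goes through Jsoup.clean, which is left out of the sketch because the name of the allow-list class it takes depends on the jsoup release in use.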

jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating
to invalid tag soup; whatever the input, jsoup will create a sensible parse tree.
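
As a rough illustration of the tag-soup handling, the sketch below feeds jsoup a fragment with unclosed tags and then queries the resulting tree. The input string and the names TagSoupSketch and parseSoup are hypothetical, the jsoup jar is assumed to be on the classpath, and the exact tree shape may differ between jsoup releases.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TagSoupSketch {
    // Hypothetical malformed input: no html/body wrapper, and neither
    // the <p> tags nor the <b> tag are ever closed.
    static final String SOUP = "<p>One<p>Two <b>bold";

    // Parse the fragment; jsoup builds a full document around it.
    static Document parseSoup() {
        return Jsoup.parse(SOUP);
    }

    public static void main(String[] args) {
        Document doc = parseSoup();
        // Number of paragraph elements recovered from the input.
        System.out.println(doc.select("p").size());
        // Text of the unclosed <b> element.
        System.out.println(doc.select("b").text());
        // Normalized markup of the body, as jsoup reconstructed it.
        System.out.println(doc.body().html());
    }
}
```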

See for downloads and documentation.