
PythonLinks/html-to-etree

 
 

html to etree


Parse html to lxml etree

Convenience methods for parsing html documents to lxml etree.

lxml has limited support for handling different character encodings. This library is intended as a reusable utility for parsing HTML responses received as raw bytes into ElementTrees using sane character decoding.
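To make "sane character decoding" concrete, here is a minimal sketch of one common way to choose an encoding for a raw HTML body: byte order mark first, then the Content-Type header, then a meta tag, then a default. It is not taken from this library's source; `guess_encoding`, `decode_html`, the 1024-byte scan window and the utf-8 default are all assumptions made for illustration, and the sketch targets Python 3 only.

```python
import codecs
import re
from email.message import Message

# Hypothetical helper pattern: finds charset=... inside a <meta ...> tag.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def _known(name):
    """Return Python's codec name for `name`, or None if it is missing or unknown."""
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def guess_encoding(body, content_type=None, default="utf-8"):
    """Guess the encoding of `body` (bytes). Illustrative only."""
    # 1. A UTF-8 byte order mark is the strongest signal.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["content-type"] = content_type
        found = _known(msg.get_content_charset())
        if found:
            return found
    # 3. A <meta> charset declaration near the start of the document
    #    (the 1024-byte window is an assumption of this sketch).
    match = _META_CHARSET.search(body[:1024])
    if match:
        found = _known(match.group(1).decode("ascii"))
        if found:
            return found
    # 4. Fall back to the default.
    return default


def decode_html(body, content_type=None):
    """Decode `body` to text, replacing undecodable bytes instead of failing."""
    return body.decode(guess_encoding(body, content_type), errors="replace")
```

For example, `decode_html(body_bytes, "text/html; charset=ISO-8859-1")` decodes the body as Latin-1, while a body with no header and no meta tag falls back to utf-8.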

  • Free software: BSD license
  • Python versions: 2.7, 3.4+

Features

  • Parse html to lxml etree
  • Handle character decoding

Quickstart

Parse HTML given as byte strings:

tree = parse_html_bytes(body=body_bytes, content_type=res.headers.get('content-type'))

Parse HTML given as an already decoded unicode string:

tree = parse_html_unicode(uni_string=body_unicode)
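For orientation, the following is a minimal sketch of what parsing bytes and parsing an already decoded string might look like in plain lxml. It does not call html-to-etree: `sketch_parse_bytes` and `sketch_parse_unicode` are hypothetical stand-ins, and the encoding is supplied by hand here, whereas the library is meant to work it out from the body and the Content-Type header.

```python
import lxml.html


def sketch_parse_bytes(body, encoding):
    """Parse raw bytes, telling lxml explicitly which encoding to use."""
    parser = lxml.html.HTMLParser(encoding=encoding)
    return lxml.html.document_fromstring(body, parser=parser)


def sketch_parse_unicode(text):
    """Parse a string that has already been decoded to unicode."""
    return lxml.html.document_fromstring(text)


# Sample document, invented for this example.
body_unicode = (
    "<html><head><title>Caf\u00e9</title></head>"
    "<body><p>na\u00efve</p></body></html>"
)
body_bytes = body_unicode.encode("latin-1")

tree_from_bytes = sketch_parse_bytes(body_bytes, "latin-1")
tree_from_text = sketch_parse_unicode(body_unicode)

# Both trees expose the usual ElementTree API.
title_from_bytes = tree_from_bytes.findtext(".//title")
title_from_text = tree_from_text.findtext(".//title")
```

Both calls return the same kind of element tree, so downstream code using `findtext` or `xpath` does not need to know which form the input arrived in.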

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.
