bpabel/html5charref

html5charref

Python library for escaping/unescaping HTML5 Named Character References.

The standard library's HTMLParser module can unescape HTML named entities and numeric Unicode character escapes. Unfortunately, it does not cover the named character references defined in HTML5. This library aims to provide escaping and unescaping for the character references defined in HTML5.

Installation

You can install this project from PyPI:

pip install html5charref

Usage

The main purpose of html5charref is to unescape HTML named entities. It also handles numeric Unicode character escapes.

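import html5charref
# A named entity, a second named entity, and a numeric character escape
html = u'This has &copy; and &lt; and &#169; symbols'
print(html5charref.unescape(html))
# u'This has \xa9 and < and \xa9 symbols'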
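To illustrate what unescaping involves, here is a minimal sketch written for Python 3 that uses only the standard library. It is not html5charref's implementation: the function name unescape_sketch is hypothetical, and the lookup table comes from the standard html.entities.html5 mapping rather than from this library.

```python
import re
from html.entities import html5  # maps names such as 'copy;' to characters

# Matches &#169; (decimal), &#xa9; (hexadecimal), or &copy; (named) references.
_CHARREF = re.compile(r'&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)')


def unescape_sketch(text):
    """Replace named and numeric character references with their characters.

    Hypothetical helper for illustration only; unknown references are left
    untouched.
    """
    def replace(match):
        ref = match.group(1)
        if ref.startswith('#'):
            digits = ref[1:-1]
            if digits[0] in 'xX':
                codepoint = int(digits[1:], 16)
            else:
                codepoint = int(digits)
            try:
                return chr(codepoint)
            except (ValueError, OverflowError):
                # Code point outside the Unicode range: keep the original text.
                return match.group(0)
        # Named reference: look it up, falling back to the original text.
        return html5.get(ref, match.group(0))

    return _CHARREF.sub(replace, text)


print(unescape_sketch('This has &copy; and &lt; and &#169; symbols'))
```

The sketch only recognizes references that end with a semicolon, which keeps the pattern simple at the cost of ignoring the legacy semicolon-free forms.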

You can also use html5charref to find the HTML5 named entity for a given Unicode character.

import html5charref
# The copyright character
print(html5charref.escape_char(u'\u00a9'))
# u'&copy;'
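The reverse lookup can be sketched in a similar way. The example below is a Python 3 illustration built on the standard html.entities.html5 table, not the code html5charref uses; the names build_reverse_table and escape_char_sketch are hypothetical. Because several names can map to the same character (for example copy; and COPY;), the sketch assumes a tie-breaking rule: prefer the shortest name, then an all-lowercase one.

```python
from html.entities import html5  # maps names such as 'copy;' to characters


def build_reverse_table():
    """Map each single character to one preferred '&name;' reference.

    Assumed tie-break (not taken from html5charref): shortest name first,
    then all-lowercase names, then alphabetical order.
    """
    best = {}
    for name, chars in html5.items():
        # Skip multi-character expansions and legacy names without ';'.
        if len(chars) != 1 or not name.endswith(';'):
            continue
        key = (len(name), not name.islower(), name)
        if chars not in best or key < best[chars][0]:
            best[chars] = (key, name)
    return {char: '&' + name for char, (_, name) in best.items()}


_REVERSE = build_reverse_table()


def escape_char_sketch(char):
    """Return the named reference for char, or char itself if none exists."""
    return _REVERSE.get(char, char)


print(escape_char_sketch('\u00a9'))
```

Characters without a named reference, such as plain ASCII letters, are returned unchanged in this sketch.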

Updating Named Entity References

Additional named entity references may be added to the HTML5 spec over time. You can refresh the list maintained by html5charref with the update_charrefs() function, which fetches the latest named entity definitions from the W3C HTML5 site.

import html5charref
html5charref.update_charrefs()
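For a sense of what such an update might involve, the following Python 3 sketch parses a JSON list of named character references. It rests on assumptions that do not come from this README: the URL is the WHATWG entities.json file, which may differ from the source html5charref queries, and parse_entities and fetch_entities are hypothetical names rather than part of the library.

```python
import json
from urllib.request import urlopen

# Assumption: the WHATWG JSON list of named character references. This may
# not be the source that html5charref's update_charrefs() actually uses.
ENTITIES_URL = 'https://html.spec.whatwg.org/entities.json'


def parse_entities(json_text):
    """Turn entities JSON text into a {'copy;': '\u00a9', ...} mapping.

    Each entry is expected to look like
    "&copy;": {"codepoints": [169], "characters": "\u00A9"}.
    """
    data = json.loads(json_text)
    return {name.lstrip('&'): info['characters'] for name, info in data.items()}


def fetch_entities(url=ENTITIES_URL):
    """Download and parse the entity list (hypothetical helper, needs network)."""
    with urlopen(url) as response:
        return parse_entities(response.read().decode('utf-8'))


sample = r'{"&copy;": {"codepoints": [169], "characters": "\u00A9"}}'
print(parse_entities(sample))
```

Keeping the parsing step separate from the download makes it possible to test the mapping logic without network access.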

Licensing

This project is licensed under the MIT license.

Documentation

View the full documentation.
