amplafi/htmlcleaner

NOTE: This fork of htmlcleaner is now merged back into the http://htmlcleaner.sourceforge.net/ project as of version 2.4

2.4 is officially released!

This fork is kept only to help with patch submission to the official version.

==========================================================================


* omitHtmlEnvelope behavior change:
 * outputs all the HTML contained in the body, not just the contents of the first TagNode (useful for cleaning HTML fragments). A new blank TagNode is created to hold the nodes to be output.
 * omitHtmlEnvelope also triggers omitDoctype

* TagNodes that can be reopened after their parent is closed (e.g. <b><i></b> would result in <b><i></i></b><i>): if the reopened tag (<i> in this example) is immediately closed, the reopened tag is pruned. This is accomplished by checking the autoGenerated boolean on TagNode.
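
The pruning rule above can be illustrated with a small standalone sketch. It is not the HtmlCleaner implementation: the `Token` class and `prune` method are hypothetical names invented for illustration, and only the idea of an `autoGenerated` flag comes from the note above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; Token and prune are hypothetical, not HtmlCleaner classes.
class ReopenedTagPruneSketch {
    static final class Token {
        final String name;
        final boolean open;          // true for <name>, false for </name>
        final boolean autoGenerated; // true if the cleaner reopened this tag itself

        Token(String name, boolean open, boolean autoGenerated) {
            this.name = name;
            this.open = open;
            this.autoGenerated = autoGenerated;
        }

        @Override
        public String toString() {
            return (open ? "<" : "</") + name + ">";
        }
    }

    // Drops every auto-generated open tag that is immediately followed by its own close tag.
    static List<Token> prune(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token t = tokens.get(i);
            boolean emptyReopen = t.open && t.autoGenerated
                    && i + 1 < tokens.size()
                    && !tokens.get(i + 1).open
                    && tokens.get(i + 1).name.equals(t.name);
            if (emptyReopen) {
                i += 2; // skip both the reopened tag and its immediate close
            } else {
                out.add(t);
                i++;
            }
        }
        return out;
    }
}
```

Tags written by the author are never pruned, even when empty, because their `autoGenerated` flag is false.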

* refactoring template methods from Utils to TagTransformer.

* CleanerTransformations changes:
 * Utils.updateTagTransformations is now a member function.
 * handles the transformation work so that multiple TagTransformations can be applied to a given tag (sets up for regular expression matching).
 * now owns responsibility for determining the transformed tag name.
 * concept of global AttributeTransformations, used for example to strip all attributes that start with "on" (e.g. "onclick", "onblur").
 * added regular expression matching on attribute names and values.
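
As a rough illustration of what a global attribute transformation with regular expression matching does, the sketch below filters a plain attribute map. It deliberately avoids the CleanerTransformations API: the class and method names are hypothetical, and the pattern `^on.*` is an assumed way of expressing "starts with on".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; not the CleanerTransformations implementation.
class GlobalAttributeFilterSketch {
    // Assumed pattern: any attribute name beginning with "on" (onclick, onblur, ...).
    private static final Pattern EVENT_HANDLER =
            Pattern.compile("^on.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the attributes without the names that match namePattern.
    static Map<String, String> stripMatching(Map<String, String> attributes, Pattern namePattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!namePattern.matcher(e.getKey()).matches()) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    static Map<String, String> stripEventHandlers(Map<String, String> attributes) {
        return stripMatching(attributes, EVENT_HANDLER);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "/home");
        attrs.put("onclick", "track()");
        attrs.put("onBlur", "leave()");
        System.out.println(stripEventHandlers(attrs)); // prints {href=/home}
    }
}
```

The match is on the attribute name only, so a value such as `title="on top"` is left alone.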

* XmlSerializer/HtmlCleaner: removed the IOException that was thrown when reading from strings.

* work on spotting "tricky" encoding: unencode normal ASCII characters.

 * get the default output charset from CleanerProperties

 * handle badly encoded numbers better; for example &x0fx , &0A; were parsed badly before

 * added a bunch of HTML special entities

 * convert &apos; in an HTML context to &#39;
 * added regex attribute/value matching

 * random spelling corrections
 * additional documentation
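
Two of the entity changes above, unencoding ordinary ASCII characters and rewriting &apos; as &#39;, can be sketched in isolation. This is not the library's code: the class name is hypothetical, and treating only letters and digits as "normal" ASCII is an assumption made here so that markup-significant characters such as < stay encoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not HtmlCleaner's entity handling.
class EntityNormalizerSketch {
    // Well-formed numeric references: &#NNN; or &#xHH;
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String normalize(String html) {
        // &apos; is an XML entity; older HTML parsers may not know it, so use the numeric form.
        String s = html.replace("&apos;", "&#39;");
        Matcher m = NUMERIC.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Decode only plain letters and digits; leave every other reference untouched.
            String replacement = isPlainAscii(code) ? String.valueOf((char) code) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumption: "normal" ASCII means 0-9, A-Z, a-z.
    static boolean isPlainAscii(int code) {
        return (code >= '0' && code <= '9')
                || (code >= 'A' && code <= 'Z')
                || (code >= 'a' && code <= 'z');
    }
}
```

With these assumptions, `normalize("&#106;&#x61;va &apos;x&apos; &#60;b&#62;")` yields `java &#39;x&#39; &#60;b&#62;`: the obfuscated letters are decoded, while the quote and angle-bracket references stay encoded.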
 
* added Greek and math symbols

* cleanup change: if a tag was closed because of an improperly placed child, it is reopened after the child.
  See ClosedTagReopenTest.java for examples
  
* added audit code: it is now possible to hook in code that is notified about the changes htmlcleaner makes.
  See CleanerProperties#addHtmlModificationListener.
  
* added unit tests for the escapeXml function in Utils

* JDom generation updated so that it does not fail on attributes starting with 'xml'.

* unit test TODOs added