Releases: kata198/AdvancedHTMLParser
9.0.2
9.0.1 - XPath Engine-er!!!
- 9.0.1 - Feb 12 2020
- Fix installation issue under some conditions
9.0.0 - XPath Engine!
- 9.0.0 - Jan 16 2020
- (8.9.9 - beta release 1)
- XPath engine. See new function "getElementsByXPathExpression" on parser, tags, and tag collections.
- Implement many XPath features; some less-used items are not yet implemented (will raise an exception if you try to use them)
9.0.0 Beta1 (8.9.9) - XPath Engine!
- 9.0.0 - ??? ?? ????
- (8.9.9 - beta release 1)
- XPath engine. See new function "getElementsByXPathExpression" on parser, tags, and tag collections.
The README describes this in more detail. Most XPath usages I have seen or used myself work. I have not implemented all functions and obscure usages yet; you will know a feature is not implemented because an exception is raised. If you want a specific feature, let me know and I will make it a priority!
Otherwise, enjoy this beta release with XPath support!
The new method is .getElementsByXPathExpression (or its alias .getElementsByXPath), available on the parser, tags, and tag collections (the same places as getElementsByName, for example).
8.1.8 - Fixed re-release of 8.1.7
- 8.1.8 - Jul 22 2019
- Fix accidental re-release of 8.1.6 to GitHub; bump the version to signify the fix
8.1.7 - Sorry For Delay
Sorry guys for the delay, got a lot going on right now.
- 8.1.7 - Jul 20 2019
- Update all forms of getElementsByClassName to support multiple classes in a single call, space-separated in a string, per update to spec.
8.1.6 - Coming Back Around
Alright guys, I'm coming back to this. I've got a couple of enhancement requests I'm going to implement, and will also add an XPath engine soon. Stay tuned. As always -- backwards compatible.
- 8.1.6 - Jun 21 2019
- Added AdvancedHTMLParser.AdvancedHTMLParser.setDoctype method, which can be used to set the doctype, or clear the doctype, from the output .getHTML will produce
- Added related doctype tests, assert we parse it correctly, and that setDoctype works correctly
8.1.5 - Still Alive
- 8.1.5 - May 3 2019
- Expand some docstrings, fix copyright notices
- Add attribute-name validation. The base HTML parser will feed us invalid names, for example <div id="abc"; name="hello"> will feed us a name ';'.
  - The standard AdvancedHTMLParser remains best-effort, and will ignore any invalid attribute names when parsing a file/string, but will raise KeyError if you use the .setAttribute method with an invalid name. This allows us to survive parsing more error-ridden files.
  - ValidatingAdvancedHTMLParser will now raise a new kind of exception - InvalidAttributeNameException if an invalid attribute name is encountered during parsing.
- Update tests for new validating and attribute name issues
- Strip trailing whitespace from all files
8.1.4 - Glorious Golden Lobster Edition
- 8.1.4 - Nov 14 2018
- Expand documentation in README
- Add "slim" option to formatHTML, available with either -s or --slim. This will use either the AdvancedHTMLSlimTagFormatter (if in pretty mode, default) or AdvancedHTMLSlimTagMiniFormatter (if in mini mode, -m or --mini)
- Intercept control+c in formatHTML when reading from stdin and exit cleanly instead of displaying error message
- Add "--version" switch to formatHTML to print the AdvancedHTMLParser suite version
8.1.3 - Stablest and Ablest
- 8.1.3 - Oct 16 2018
- Fix python2 inheritance issue with new SlimTag formatters
- 8.1.2 - Oct 16 2018
- Add two new formatters to AdvancedHTMLParser.Formatter - AdvancedHTMLSlimTagFormatter and AdvancedHTMLSlimTagMiniFormatter
  - These represent the pretty-printer and mini-printer respectively, but will omit the trailing space on start tags (the space before the closing ">").
  - By default, self-closing tags will retain their trailing space (the space before the "/"). This is for xhtml compatibility so that the "/" does not become part of the previous attribute or its own attribute. This can be toggled off by passing "slimSelfClosing=True" to either of the new formatters, and your output will omit that space as well.
- Added tests and documentation for the two new formatter types
- Ensure that AdvancedHTMLMiniFormatter is exported by the AdvancedHTMLParser.__init__.py
- Add both new SlimTag formatters to be exported by AdvancedHTMLParser.__init__.py
- Update the version reference within the pydoc url within the READMEs to bypass caching of previous versions
- 8.1.1 - Oct 15 2018
- Add AdvancedHTMLMiniFormatter to top-module level (so "from AdvancedHTMLParser import AdvancedHTMLMiniFormatter" works as well as "from AdvancedHTMLParser.Formatter import AdvancedHTMLMiniFormatter")
- Update "formatHTML" program
  - Expand --help - Now documents the options better.
  - Document the previously-implemented but unadvertised --indent=' ' argument to formatHTML, to set the level-indentation
  - Add "-p" or "--pretty" to toggle pretty-printer on formatHTML program (default mode)
  - Add "-m" or "--mini" to toggle the mini-printer on formatHTML program (new)
- 8.1.0 - Oct 15 2018
- Fix an issue where .classNames became no longer an attribute. [ Bug report and solution validation by github user mninc [ https://github.com/mninc ] ]
- Fix an issue where under certain conditions binary attributes would have a value of string 'None' (like hidden="None" instead of just hidden in the output) [ Bug report and solution validation by github user UntoSten [ https://github.com/UntoSten ] ]
- Expand unit tests to explicitly test the above two scenarios
- Fixed IndexedAdvancedHTMLParser not working in some conditions due to a typo in a previous change
- Added a new formatter to AdvancedHTMLParser.Formatter - AdvancedHTMLMiniFormatter which will output mini html.
  This will have all non-functional whitespace removed (keeping single-spaces which take up 1 character width), and provide no indentation.
  For example, the following:
  '''<title>Hello World</title>
  If parsed and run through AdvancedHTMLMiniFormatter would come out as:
  '<title >Hello World</title>
  retaining a single space wherever whitespace is significant, but removing all whitespace that would be ignored anyway.
  This feature is available on an AdvancedHTMLParser.AdvancedHTMLParser object via the new method "getMiniHTML".
  As a reminder, "getHTML()" on a parser will retain all original whitespace, "getFormattedHTML()" with an optional "indent" parameter (default 4 spaces per line) will pretty-print your HTML, and now "getMiniHTML()" will minify it.