Releases · kata198/AdvancedHTMLParser

17 Apr 23:44

kata198

9.0.2

35cec4d

9.0.2 Latest

9.0.2 - Not Dead! Python 3.9 Updates

9.0.2 - Apr 17 2023

Fixed a compatibility issue with Python 3.9 in the XPath engine
Fixed all warnings with Python > 3.6
Fixed some tests that reported a failure when there was no problem

9.0.1 - Feb 12 2020

Fix installation issue under some conditions

12 Feb 22:10

kata198

9.0.1

53d5a75

9.0.1 - XPath Engine-er!!!

9.0.1 - Feb 12 2020

Fix installation issue under some conditions

9.0.0 - Jan 16 2020
(8.9.9 - beta release 1)

XPath engine. See new function "getElementsByXPathExpression" on parser,
tags, and tag collections.
Implement many XPath features; some less-used items are not yet implemented
(an exception will be raised if you try to use them)

16 Jan 21:45

kata198

9.0.0

3fe4cae

9.0.0 - XPath Engine!

9.0.0 - Jan 16 2020
(8.9.9 - beta release 1)

XPath engine. See new function "getElementsByXPathExpression" on parser,
tags, and tag collections.
Implement many XPath features; some less-used items are not yet implemented
(an exception will be raised if you try to use them)

03 Dec 22:32

kata198

8.9.9

d3b6549

9.0.0 Beta1 (8.9.9) - XPath Engine! Pre-release

9.0.0 - ??? ?? ????
(8.9.9 - beta release 1)

XPath engine. See new function "getElementsByXPathExpression" on parser,
tags, and tag collections.

The README describes some more. Most XPath usages I have seen or used myself work. I have not implemented all functions and obscure usages yet. You will know a feature is not implemented because an exception is raised. If you want a specific feature, let me know and I will make it a priority!

Otherwise, enjoy this beta release with XPath support!

Call .getElementsByXPathExpression (or its alias .getElementsByXPath) on the parser, tags, and tag collections (the same places as getElementsByName, for example).
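
For readers new to XPath, here is a minimal sketch of the kind of query such an expression describes. It deliberately uses Python's standard xml.etree.ElementTree (which supports a limited XPath subset) as a stand-in, not AdvancedHTMLParser itself. The sample markup and names are made up for illustration, and the exact expressions this library accepts may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs XML-style input).
SAMPLE = """
<html>
  <body>
    <div class="item"><span name="price">10</span></div>
    <div class="item"><span name="price">25</span></div>
    <div class="other"><span name="price">99</span></div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# Select every <span> that is a direct child of a <div> whose class is "item".
spans = root.findall(".//div[@class='item']/span")
prices = [span.text for span in spans]
print(prices)
```

With AdvancedHTMLParser the same idea would be expressed by passing an XPath string to getElementsByXPathExpression on a parser, tag, or tag collection.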

22 Jul 08:01

kata198

8.1.8

8e0b2be

8.1.8 - Fixed re-release of 8.1.7

8.1.8 - Jul 22 2019

Fix accidental re-release of 8.1.6 to GitHub, and bump the version to mark the corrected release

8.1.7 - Jul 20 2019

Update all forms of getElementsByClassName to support multiple classes in a single call, space-separated in a string, per update to spec.

20 Jul 06:08

kata198

8.1.7

0036f64

8.1.7 - Sorry For Delay

Sorry guys for the delay, got a lot going on right now.

8.1.7 - Jul 20 2019

Update all forms of getElementsByClassName to support multiple classes in a single call, space-separated in a string, per update to spec.
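
The matching rule described above (an element matches only if it carries every class named in the space-separated string) can be sketched as follows. This is an illustration of the rule, not the library's implementation; has_all_classes and the sample data are hypothetical.

```python
def has_all_classes(element_classes, query):
    """Return True if the element carries every class named in `query`.

    `element_classes` is the element's class attribute value (a string);
    `query` is a space-separated string of class names.
    """
    wanted = query.split()
    present = set(element_classes.split())
    # An empty query matches nothing.
    return bool(wanted) and all(name in present for name in wanted)


# Hypothetical elements, represented as (id, class attribute) pairs.
elements = [
    ("a", "btn primary large"),
    ("b", "btn"),
    ("c", "primary btn"),
]

matches = [el_id for el_id, classes in elements
           if has_all_classes(classes, "btn primary")]
print(matches)
```

Note that the order of classes in the query does not matter; only membership does.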

22 Jun 02:31

kata198

8.1.6

0036f64

8.1.6 - Coming Back Around

Alright guys, I'm coming back to this. I've got a couple of enhancement requests I'm going to implement, and I will also add an XPath engine soon. Stay tuned. As always -- backwards compatible.

8.1.6 - Jun 21 2019

Added AdvancedHTMLParser.AdvancedHTMLParser.setDoctype method, which can be used to set or clear the doctype in the output that .getHTML will produce
Added related doctype tests, asserting that we parse the doctype correctly and that setDoctype works correctly

03 May 16:49

kata198

8.1.5

9d49f4d

8.1.5 - Still Alive

8.1.5 - May 3 2019

Expand some docstrings, fix copyright notices
Add attribute-name validation. The base HTML parser can feed us invalid names; for example, <div id="abc"; name="hello"> will feed us the name ';'.
- The standard AdvancedHTMLParser remains best-effort, and will ignore any invalid attribute names when parsing a file/string, but will raise KeyError if you use the .setAttribute method with an invalid name. This allows us to survive parsing more error-ridden files.
- ValidatingAdvancedHTMLParser will now raise a new kind of exception, InvalidAttributeNameException, if an invalid attribute name is encountered during parsing.
Update tests for new validating and attribute name issues
Strip trailing whitespace from all files
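
The two behaviours described for 8.1.5 (best-effort parsing that skips bad names versus validating parsing that raises) can be sketched like this. Everything here is an assumption for illustration: the regular expression is a guess at a reasonable rule, not the library's actual one, and InvalidAttributeName and filter_attributes are local stand-ins rather than library names.

```python
import re

# Assumed rule, for illustration only: a letter, underscore or colon, followed
# by letters, digits, underscores, colons, dots or hyphens.
VALID_ATTR_NAME = re.compile(r"^[A-Za-z_:][A-Za-z0-9_:.\-]*$")


class InvalidAttributeName(Exception):
    """Local stand-in for the exception a validating parser would raise."""


def filter_attributes(pairs, strict=False):
    """Keep valid (name, value) pairs; skip or reject invalid names."""
    kept = []
    for name, value in pairs:
        if VALID_ATTR_NAME.match(name):
            kept.append((name, value))
        elif strict:
            # Validating mode: refuse the input outright.
            raise InvalidAttributeName(name)
        # Best-effort mode: silently drop the invalid name and carry on.
    return kept


# What a tokenizer might report for <div id="abc"; name="hello">
raw = [("id", "abc"), (";", None), ("name", "hello")]
print(filter_attributes(raw))
```

Calling filter_attributes(raw, strict=True) raises instead of skipping, which mirrors the split between the standard and validating parsers.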

15 Nov 00:01

kata198

8.1.4

ab6c3eb

8.1.4 - Glorious Golden Lobster Edition

8.1.4 - Nov 14 2018

Expand documentation in README
Add "slim" option to formatHTML, available with either -s or --slim. This will use either the AdvancedHTMLSlimTagFormatter (if in pretty mode, default) or AdvancedHTMLSlimTagMiniFormatter (if in mini mode, -m or --mini)
Intercept control+c in formatHTML when reading from stdin and exit cleanly instead of displaying error message
Add "--version" switch to formatHTML to print the AdvancedHTMLParser suite version

8.1.3 - Oct 16 2018

Fix python2 inheritance issue with new SlimTag formatters

8.1.2 - Oct 16 2018

Add two new formatters to AdvancedHTMLParser.Formatter - AdvancedHTMLSlimTagFormatter and AdvancedHTMLSlimTagMiniFormatter
- These represent the pretty-printer and mini-printer respectively, but will omit the trailing space on start tags,
  so a start tag that the standard formatters emit as <title > comes out as <title>
- By default, self-closing tags will retain their trailing space before the "/".
  This is for xhtml compatibility so that the "/" does not become part of the previous attribute or its own attribute
  
  This can be toggled-off by passing "slimSelfClosing=True" to either of the new formatters, and the trailing space will then be omitted from self-closing tags as well
Added tests and documentation for the two new formatter types
Ensure that AdvancedHTMLMiniFormatter is exported by the AdvancedHTMLParser.__init__.py
Add both new SlimTag formatters to be exported by AdvancedHTMLParser.__init__.py
Update the version reference within the pydoc url within the READMEs to bypass caching of previous versions
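
To make the "slim" behaviour from 8.1.2 concrete, here is a naive sketch that post-processes an HTML string with a regular expression. It is not how the SlimTag formatters are implemented (they work on the parsed tree), it will misbehave on attribute values containing ">", and slim_tags and its slim_self_closing parameter are hypothetical names chosen to echo the real "slimSelfClosing" option.

```python
import re

# A start tag (not an end tag) that has whitespace right before ">" or "/>".
_TRAILING_SPACE = re.compile(r"<([A-Za-z][^<>]*?)\s+(/?)>")


def slim_tags(html, slim_self_closing=False):
    """Drop the trailing space inside start tags.

    Self-closing tags keep a single space before "/" unless
    slim_self_closing is True.
    """
    def repl(match):
        body, slash = match.group(1), match.group(2)
        if slash and not slim_self_closing:
            return "<%s />" % body
        return "<%s%s>" % (body, slash)

    return _TRAILING_SPACE.sub(repl, html)


sample = '<div id="abc" ><br /></div>'
print(slim_tags(sample))
print(slim_tags(sample, slim_self_closing=True))
```

The first call only slims the <div> start tag; the second also removes the space before the "/" in the self-closing tag.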

8.1.1 - Oct 15 2018

Add AdvancedHTMLMiniFormatter to top-module level (so from AdvancedHTMLParser import AdvancedHTMLMiniFormatter works as well as from AdvancedHTMLParser.Formatter import AdvancedHTMLMiniFormatter)
Update "formatHTML" program
- Expand --help - Now documents the options better.
- Document the previously-implemented but unadvertised --indent=' ' argument to formatHTML, to set the level-indentation
- Add "-p" or "--pretty" to toggle pretty-printer on formatHTML program (default mode)
- Add "-m" or "--mini" to toggle the mini-printer on formatHTML program (new)

8.1.0 - Oct 15 2018

Fix an issue where .classNames was no longer an attribute. [ Bug report and solution validation by github user mninc [ https://github.com/mninc ] ]
Fix an issue where under certain conditions binary attributes would have a value of string 'None' (like hidden="None" instead of just hidden in the output) [ Bug report and solution validation by github user UntoSten [ https://github.com/UntoSten ] ]
Expand unit tests to explicitly test the above two scenarios
Fixed IndexedAdvancedHTMLParser not working in some conditions due to a typo in a previous change
Added a new formatter to AdvancedHTMLParser.Formatter, AdvancedHTMLMiniFormatter, which will output minified HTML.

This will have all non-functional whitespace removed (keeping single-spaces which take up 1 character width), and provide no indentation.

For example, the following:

'''<title>Hello World</title>

Hello world And welcome to the show.

'''

If parsed and run through AdvancedHTMLMiniFormatter would come out as:

'<title >Hello World</title>

Hello world And welcome to the show.

retaining a single space wherever whitespace is significant, but removing all whitespace that would be disregarded.

This feature is available on an AdvancedHTMLParser.AdvancedHTMLParser object via the new method "getMiniHTML"

As a reminder, "getHTML()" on a parser will retain all original whitespace, "getFormattedHTML()" with an optional "indent" parameter (default 4 spaces per line) will pretty-print your HTML, and now "getMiniHTML()" will minify it.
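
The core idea behind minifying text content (each run of whitespace collapses to one space, since that is all a browser would render) can be sketched in a few lines. This is only the text-level rule, under the assumption that it matches the description above; the real AdvancedHTMLMiniFormatter operates on the parsed document, and collapse_whitespace is a hypothetical helper name.

```python
import re


def collapse_whitespace(text):
    """Collapse each run of whitespace (spaces, tabs, newlines) to one space."""
    return re.sub(r"\s+", " ", text)


original = "Hello world\n      And   welcome to the show."
minified = collapse_whitespace(original)
print(minified)
```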

17 Oct 04:10

kata198

8.1.3

01fa4fa

8.1.3 - Stablest and Ablest

8.1.3 - Oct 16 2018

Fix python2 inheritance issue with new SlimTag formatters
