Releases: kata198/AdvancedHTMLParser

9.0.2

17 Apr 23:44

9.0.2 - Not Dead! Python 3.9 Updates

  • 9.0.2 - Apr 17 2023
  • Fixed a compatibility issue with Python 3.9 in the XPath engine
  • Fixed all warnings under Python > 3.6
  • Fixed some tests that reported failures when nothing was wrong
  • 9.0.1 - Feb 12 2020
  • Fix installation issue under some conditions

9.0.1 - XPath Engine-er!!!

12 Feb 22:10
  • 9.0.1 - Feb 12 2020
  • Fix installation issue under some conditions
  • 9.0.0 - Jan 16 2020
  • (8.9.9 - beta release 1)
  • XPath engine. See new function "getElementsByXPathExpression" on parser,
    tags, and tag collections.

  • Implement many XPath features; some less-used items are not yet implemented
    (they raise an exception if you try to use them)

9.0.0 - XPath Engine!

16 Jan 21:45
  • 9.0.0 - Jan 16 2020
  • (8.9.9 - beta release 1)
  • XPath engine. See new function "getElementsByXPathExpression" on parser,
    tags, and tag collections.

  • Implement many XPath features; some less-used items are not yet implemented
    (they raise an exception if you try to use them)

9.0.0 Beta1 (8.9.9) - XPath Engine!

03 Dec 22:32
Pre-release
  • 9.0.0 - ??? ?? ????
  • (8.9.9 - beta release 1)
  • XPath engine. See new function "getElementsByXPathExpression" on parser,
    tags, and tag collections.

The README describes more. Most XPath usages I have seen or used myself work. I have not yet implemented every function or the more obscure usages; an unimplemented feature raises an exception, so you will know. If you want a specific feature, let me know and I will make it a priority!

Otherwise, enjoy this beta release with XPath support!

Use .getElementsByXPathExpression (or its alias .getElementsByXPath) on the parser, tags, and tag collections (the same places as getElementsByName, for example).
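
For readers new to XPath, here is a rough illustration of the kind of expression such an engine evaluates. This sketch deliberately uses Python's standard-library xml.etree.ElementTree, a different library with its own limited XPath subset, so it says nothing about which features getElementsByXPathExpression supports; the sample markup and expression are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup; kept well-formed because ElementTree is an XML parser.
doc = ET.fromstring(
    "<html><body>"
    "<div class='item'><span>one</span></div>"
    "<div class='other'><span>two</span></div>"
    "<div class='item'><span>three</span></div>"
    "</body></html>"
)

# Select every <span> that is a direct child of a <div class="item">,
# at any depth below the root.
spans = doc.findall(".//div[@class='item']/span")
texts = [span.text for span in spans]
print(texts)  # ['one', 'three']
```

With AdvancedHTMLParser, a similar expression would presumably be passed to getElementsByXPathExpression; check the README for the supported syntax.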

8.1.8 - Fixed re-release of 8.1.7

22 Jul 08:01
  • 8.1.8 - Jul 22 2019
  • Fix accidental re-release of 8.1.6 to GitHub; bump the version to mark the corrected release
  • 8.1.7 - Jul 20 2019
  • Update all forms of getElementsByClassName to support multiple classes in a single call, space-separated in a string, per update to spec.

8.1.7 - Sorry For Delay

20 Jul 06:08

Sorry guys for the delay, got a lot going on right now.

  • 8.1.7 - Jul 20 2019
  • Update all forms of getElementsByClassName to support multiple classes in a single call, space-separated in a string, per update to spec.
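
As a minimal sketch of the matching rule this implies, assuming the DOM-spec behavior that an element matches only when it carries every class named in the query: the helper name below is hypothetical and is not part of the library, which may implement the check differently.

```python
def matches_class_query(class_attr, query):
    """Return True if the element's class attribute has every class in query.

    class_attr: the element's raw "class" attribute value, e.g. "btn large".
    query: one or more class names separated by whitespace.
    Hypothetical helper for illustration only.
    """
    wanted = set(query.split())
    present = set(class_attr.split())
    # An empty query matches nothing; otherwise all wanted classes must be present.
    return bool(wanted) and wanted <= present


print(matches_class_query("btn btn-primary large", "btn large"))  # True
print(matches_class_query("btn btn-primary", "btn large"))        # False
```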

8.1.6 - Coming Back Around

22 Jun 02:31

Alright guys, I'm coming back to this. I've got a couple of enhancement requests I'm going to implement, and I will also add an XPath engine soon. Stay tuned. As always, backwards compatible.

  • 8.1.6 - Jun 21 2019
  • Added the AdvancedHTMLParser.AdvancedHTMLParser.setDoctype method, which sets or clears the doctype in the output that .getHTML produces

  • Added related doctype tests, asserting that the doctype is parsed correctly and that setDoctype works correctly

8.1.5 - Still Alive

03 May 16:49
  • 8.1.5 - May 3 2019
  • Expand some docstrings, fix copyright notices

  • Add attribute-name validation. The base HTML parser can feed us invalid names; for example, <div id="abc"; name="hello"> yields an attribute named ';'.

    • The standard AdvancedHTMLParser remains best-effort: it ignores any invalid attribute names when parsing a file/string, but raises KeyError if you call the .setAttribute method with an invalid name. This allows us to survive parsing more error-ridden files.

    • ValidatingAdvancedHTMLParser now raises a new kind of exception, InvalidAttributeNameException, if an invalid attribute name is encountered during parsing.

  • Update tests for new validating and attribute name issues

  • Strip trailing whitespace from all files
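
The two parsing modes described in the attribute-name validation notes above can be sketched roughly as follows. Everything here is an assumption for illustration: the XML-style naming rule, the function names, and the local exception class standing in for InvalidAttributeNameException. The library's real check may accept or reject different names.

```python
import re

# Assumed rule (XML-style names); the library's actual check may differ.
_ATTR_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


class InvalidAttributeNameError(Exception):
    """Hypothetical stand-in for the library's InvalidAttributeNameException."""


def is_valid_attr_name(name):
    """Return True if name satisfies the assumed naming rule."""
    return _ATTR_NAME_RE.fullmatch(name) is not None


def filter_attrs(attrs, strict=False):
    """Return the (name, value) pairs whose names are valid.

    strict=False mimics best-effort parsing: invalid names are dropped.
    strict=True mimics validating parsing: the first invalid name raises.
    """
    kept = []
    for name, value in attrs:
        if is_valid_attr_name(name):
            kept.append((name, value))
        elif strict:
            raise InvalidAttributeNameError(name)
    return kept


# Attribute pairs as a base parser might report them for
# <div id="abc"; name="hello">
raw = [("id", "abc"), (";", None), ("name", "hello")]
print(filter_attrs(raw))  # [('id', 'abc'), ('name', 'hello')]
```

Calling filter_attrs(raw, strict=True) raises InvalidAttributeNameError for the ';' entry instead of skipping it.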

8.1.4 - Glorious Golden Lobster Edition

15 Nov 00:01
  • 8.1.4 - Nov 14 2018
  • Expand documentation in README

  • Add "slim" option to formatHTML, available with either -s or --slim. This uses either the AdvancedHTMLSlimTagFormatter (in pretty mode, the default) or the AdvancedHTMLSlimTagMiniFormatter (in mini mode, -m or --mini).

  • Intercept Ctrl+C in formatHTML when reading from stdin and exit cleanly instead of displaying an error message

  • Add "--version" switch to formatHTML to print the AdvancedHTMLParser suite version

  • 8.1.3 - Oct 16 2018
  • Fix python2 inheritance issue with new SlimTag formatters
  • 8.1.2 - Oct 16 2018
  • Add two new formatters to AdvancedHTMLParser.Formatter - AdvancedHTMLSlimTagFormatter and AdvancedHTMLSlimTagMiniFormatter

    • These represent the pretty-printer and mini-printer respectively, but omit the trailing space on start tags (the space before the closing ">").

    • By default, self-closing tags retain their trailing space (the space before "/>"). This is for XHTML compatibility, so that the "/" does not become part of the previous attribute or an attribute of its own.

      This can be toggled off by passing "slimSelfClosing=True" to either of the new formatters, in which case that space is omitted as well.


  • Added tests and documentation for the two new formatter types

  • Ensure that AdvancedHTMLMiniFormatter is exported by AdvancedHTMLParser.__init__.py

  • Add both new SlimTag formatters to the exports of AdvancedHTMLParser.__init__.py

  • Update the version reference within the pydoc url within the READMEs to bypass caching of previous versions

  • 8.1.1 - Oct 15 2018
  • Add AdvancedHTMLMiniFormatter to top-module level (so from AdvancedHTMLParser import AdvancedHTMLMiniFormatter works as well as from AdvancedHTMLParser.Formatter import AdvancedHTMLMiniFormatter)

  • Update "formatHTML" program

    • Expand --help - Now documents the options better.

    • Document the previously-implemented but unadvertised --indent=' ' argument to formatHTML, to set the level-indentation

    • Add "-p" or "--pretty" to toggle pretty-printer on formatHTML program (default mode)

    • Add "-m" or "--mini" to toggle the mini-printer on formatHTML program (new)

  • 8.1.0 - Oct 15 2018
  • Fix an issue where .classNames was no longer an attribute. [ Bug report and solution validation by GitHub user mninc [ https://github.com/mninc ] ]

  • Fix an issue where, under certain conditions, binary attributes would have the string value 'None' (like hidden="None" instead of just hidden in the output) [ Bug report and solution validation by GitHub user UntoSten [ https://github.com/UntoSten ] ]

  • Expand unit tests to explicitly test the above two scenarios

  • Fixed IndexedAdvancedHTMLParser not working in some conditions due to a typo in a previous change

  • Added a new formatter to AdvancedHTMLParser.Formatter - AdvancedHTMLMiniFormatter which will output mini html.

    This will have all non-functional whitespace removed (keeping single-spaces which take up 1 character width), and provide no indentation.

    For example, the following:

'''<title>Hello World</title>

Hello world And welcome to the show.
'''

If parsed and run through AdvancedHTMLMiniFormatter, this would come out as:

'<title >Hello World</title>

Hello world And welcome to the show.
'

retaining a single space wherever whitespace is significant to rendering, and removing all whitespace that would otherwise be disregarded.

This feature is available on an AdvancedHTMLParser.AdvancedHTMLParser object via the new method "getMiniHTML".

As a reminder, "getHTML()" on a parser retains all original whitespace, "getFormattedHTML()" with an optional "indent" parameter (default 4 spaces per level) pretty-prints your HTML, and now "getMiniHTML()" minifies it.
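
The whitespace rule described above for the mini formatter (drop non-functional whitespace, keep single spaces) can be approximated on a plain text run like this. This is only a sketch of the idea using a regular expression; the function name is hypothetical, and the real AdvancedHTMLMiniFormatter works on parsed tags and may treat some content differently.

```python
import re


def collapse_whitespace(text):
    """Collapse each run of spaces, tabs, and newlines into one space.

    Hypothetical helper: illustrates the "keep a single space" rule only.
    """
    return re.sub(r"\s+", " ", text)


sample = "Hello world\n    And welcome   to the show."
print(collapse_whitespace(sample))  # Hello world And welcome to the show.
```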

8.1.3 - Stablest and Ablest

17 Oct 04:10
  • 8.1.3 - Oct 16 2018
  • Fix python2 inheritance issue with new SlimTag formatters