Element inheritance change

Abel Cheung edited this page Apr 11, 2023 · 4 revisions

Summary

Release 2023.3.28 introduced incompatible changes to the inheritance of lxml Element classes, primarily to simplify type checking.

TL;DR

  1. lxml.html.HtmlMixin methods and properties are merged into HtmlElement.
  2. HtmlComment and HtmlEntity used to be subclasses of HtmlMixin; they now inherit from HtmlElement instead (marked in pink in the second diagram below).
  3. HtmlProcessingInstruction is completely removed.

Rationale

One of the biggest surprises for annotation is that, although _Comment, _PI and _Entity inherit from _Element, their HTML counterparts do not inherit from HtmlElement; they exist independently of it.

Their common denominator is a mixin class (HtmlMixin), which unfortunately contains quite a few methods and properties that only make sense on HtmlElement (for example, an HtmlComment never has a class attribute, nor is it ever associated with a label element).
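The problem can be sketched with a toy model. The classes below are stand-ins written for illustration only; none of them are imported from lxml, and the single `classes` property merely represents the element-only members that the mixin carries.

```python
# Stand-in classes only; nothing here is imported from lxml.
class Element:
    """Plays the role of etree._Element."""


class Comment(Element):
    """Plays the role of etree._Comment."""


class HtmlMixin:
    """Shared mixin, mirroring the layout used in the lxml.html source."""

    @property
    def classes(self) -> set:
        # Only meaningful for a node that can carry a class attribute.
        return set()


class HtmlElement(HtmlMixin, Element):
    pass


class HtmlComment(HtmlMixin, Comment):
    pass


# The HTML comment is an Element, but it is not an HtmlElement.
comment_is_element = issubclass(HtmlComment, Element)
comment_is_html_element = issubclass(HtmlComment, HtmlElement)

# It still exposes the element-only member, because the mixin provides it.
comment_has_classes = hasattr(HtmlComment, "classes")

print(comment_is_element, comment_is_html_element, comment_has_classes)
```

Annotating such a mixin faithfully would advertise `classes` on comments as well, which is the kind of noise the change tries to avoid.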

Inheritance in source

lxml.etree._Element and its descendants in the source code can be illustrated by the following abridged diagram.

  • The most prominent element classes from each package are colored in teal.
  • The lxml.html and lxml.objectify packages are stripped down for simplicity; subelements from these subpackages play no important role in the topic discussed here.

Diagram showing inheritance of lxml Element in source code

Inheritance in stub

The following diagram illustrates how inheritance is changed in the types-lxml stubs.

Diagram showing inheritance of lxml Element in types-lxml stub
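Stubs only affect what a type checker sees, so the classes lxml creates at runtime still follow the source hierarchy. The following is a minimal sketch of what that means for `isinstance` checks, assuming lxml is installed; the sample markup and variable names are made up for illustration.

```python
from lxml import etree, html

# Parse a fragment that holds a comment next to a regular element.
root = html.fromstring("<div><!-- note --><p>text</p></div>")
comment, paragraph = root[0], root[1]

# At runtime a comment is not an HtmlElement instance, even though the
# stubs declare HtmlComment as a subclass of HtmlElement.
comment_is_html_element = isinstance(comment, html.HtmlElement)
paragraph_is_html_element = isinstance(paragraph, html.HtmlElement)

# The etree base class still identifies comments at runtime.
comment_is_etree_comment = isinstance(comment, etree._Comment)

print(comment_is_html_element, paragraph_is_html_element, comment_is_etree_comment)
```

Code that relies on `isinstance(node, HtmlElement)` to filter out comments therefore keeps working at runtime, but a type checker using the stubs will not treat that check as excluding HtmlComment.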

Notes about diagrams

Please visit this gist to access the UML source of all diagrams, as well as their descriptions.