Update dependency lxml to v4 [SECURITY] #5
This PR contains the following updates:
lxml ==3.8.0 -> ==4.9.1
Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
GitHub Vulnerability Alerts
CVE-2020-27783
An XSS vulnerability was discovered in python-lxml's clean module. The module's parser didn't properly imitate browsers, which caused different behaviors between the sanitizer and the user's page. A remote attacker could exploit this flaw to run arbitrary HTML/JS code.
CVE-2021-28957
An XSS vulnerability was discovered in the python lxml clean module versions before 4.6.3. When disabling the safe_attrs_only and forms arguments, the Cleaner class does not remove the formaction attribute, allowing for JS to bypass the sanitizer. A remote attacker could exploit this flaw to run arbitrary JS code on users who interact with incorrectly sanitized HTML. This issue is patched in lxml 4.6.3.
CVE-2021-43818
Impact
The HTML Cleaner in lxml.html lets certain crafted script content pass through, as well as script content in SVG files embedded using data URIs.
Users that employ the HTML cleaner in a security relevant context should upgrade to lxml 4.6.5.
Patches
The issue has been resolved in lxml 4.6.5.
Workarounds
None.
References
The issues are tracked under the report IDs GHSL-2021-1037 and GHSL-2021-1038.
CVE-2018-19787
An issue was discovered in lxml before 4.2.5. lxml/html/clean.py in the lxml.html.clean module does not remove javascript: URLs that use escaping, allowing a remote attacker to conduct XSS attacks, as demonstrated by "j a v a s c r i p t:" in Internet Explorer. This is a similar issue to CVE-2014-3146.
CVE-2022-2309
NULL Pointer Dereference allows attackers to cause a denial of service (or application crash). This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14. libxml2 2.9.9 and earlier are not affected. It allows triggering crashes through forged input data, given a vulnerable code sequence in the application. The vulnerability is caused by the iterwalk function (also used by the canonicalize function). Such code shouldn't be in wide-spread use, given that parsing + iterwalk would usually be replaced with the more efficient iterparse function. However, an XML converter that serialises to C14N would also be vulnerable, for example, and there are legitimate use cases for this code sequence. If untrusted input is received (also remotely) and processed via iterwalk function, a crash can be triggered.
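The advisories above each name the release that closes them, so a deployment can compare its pinned version against those releases. A minimal sketch, assuming plain dotted numeric version strings; `FIXED_IN`, `as_tuple` and `unpatched` are hypothetical names for illustration and are not part of lxml or Renovate. CVE-2020-27783 is left out because its advisory text here does not state a fixed version.

```python
# Fix versions as stated in this PR's advisories and release notes.
# The 4.9.1 entry is taken from the v4.9.1 release notes (iterwalk() crash fix).
FIXED_IN = {
    "CVE-2018-19787": "4.2.5",
    "CVE-2021-28957": "4.6.3",
    "CVE-2021-43818": "4.6.5",
    "CVE-2022-2309": "4.9.1",
}


def as_tuple(version):
    """Turn '4.6.5' into (4, 6, 5); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def unpatched(installed):
    """Return the CVE IDs from FIXED_IN whose fix is newer than `installed`.

    This is a version comparison only. It does not check the libxml2 version
    that CVE-2022-2309 also depends on.
    """
    have = as_tuple(installed)
    return sorted(cve for cve, fixed in FIXED_IN.items() if have < as_tuple(fixed))
```

With these assumptions, `unpatched("3.8.0")` lists all four entries, while `unpatched("4.9.1")` returns an empty list.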
Release Notes
lxml/lxml (lxml)
v4.9.1
Compare Source
==================
Bugs fixed
iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.
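The CVE-2022-2309 advisory notes that parsing followed by iterwalk() would usually be replaced with iterparse. A minimal sketch of that streaming form follows. It is written against the standard library's `xml.etree.ElementTree` as a stand-in so that it runs without lxml installed, on the assumption that `lxml.etree.iterparse` is called with the same `source` and `events` arguments. `count_tags` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch


def count_tags(xml_bytes):
    """Count element tags while streaming, without a separate tree walk."""
    counts = {}
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()  # release content that is no longer needed
    return counts
```

Because each element is handled as it is parsed, no second pass over a finished tree is needed.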
v4.9.0
Compare Source
==================
Bugs fixed
lxml.html was corrected.
Patch by xmo-odoo.
Other changes
Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.
Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35
(libxml2 2.9.12+ and libxslt 1.1.34 on Windows).
GH#343: Windows-AArch64 build support in Visual Studio.
Patch by Steve Dower.
v4.8.0
Compare Source
==================
Features added
GH#337: Path-like objects are now supported throughout the API instead of just strings.
Patch by Henning Janssen.
The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.
Bugs fixed
lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.
Patch by Tobias Deiminger.
Other changes
v4.7.1
Compare Source
==================
Features added
parser.feed() now encodes the input data to the native UTF-8 encoding directly, instead of going through Py_UNICODE / wchar_t encoding first, which previously required duplicate recoding in most cases.
Bugs fixed
The standard namespace prefixes were mishandled during "C14N2" serialisation on Python 3.
See https://mail.python.org/archives/list/lxml@python.org/thread/6ZFBHFOVHOS5GFDOAMPCT6HM5HZPWQ4Q/
lxml.objectify previously accepted non-XML numbers with underscores (like "1_000") as integers or float values in Python 3.6 and later. It now adheres to the number format of the XML spec again.
LP#1939031: Static wheels of lxml now contain the header files of zlib and libiconv
(in addition to the already provided headers of libxml2/libxslt/libexslt).
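The lxml.objectify note above comes down to a difference between Python's numeric literals and XML's. A minimal sketch of that difference, assuming the XML Schema integer form (an optional sign followed by decimal digits); `is_xsd_integer` is a hypothetical helper and not lxml's actual implementation:

```python
import re

# XML Schema integers: an optional sign followed by ASCII digits only.
_XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xsd_integer(text):
    """Return True if `text` matches the assumed XML integer form exactly."""
    return _XSD_INTEGER.fullmatch(text) is not None


# Python 3.6+ accepts underscores as digit separators in int().
python_value = int("1_000")
```

Here `int("1_000")` succeeds in Python, while `is_xsd_integer("1_000")` is false, which is the kind of mismatch the fix addresses.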
Other changes
v4.6.5
Compare Source
==================
Bugs fixed
A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script
content through SVG images (CVE-2021-43818).
A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script
content through CSS imports and other crafted constructs (CVE-2021-43818).
v4.6.4
Compare Source
==================
Features added
GH#317: A new property system_url was added to DTD entities.
Patch by Thirdegree.
GH#314: The STATIC_* variables in setup.py can now be passed via env vars.
Patch by Isaac Jurado.
v4.6.3
Compare Source
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.
v4.6.2
Compare Source
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes more sneaky
"style" content.
v4.6.1
Compare Source
==================
Bugs fixed
JavaScript to pass through. The cleaner now removes more sneaky "style" content.
v4.6.0
Compare Source
==================
Features added
GH#310: lxml.html.InputGetter supports __len__() to count the number of input fields.
Patch by Aidan Woolley.
lxml.html.InputGetter has a new .items() method to ease processing all input fields.
lxml.html.InputGetter.keys() now returns the field names in document order.
GH-309: The API documentation is now generated using sphinx-apidoc.
Patch by Chris Mayo.
Bugs fixed
LP#1869455: C14N 2.0 serialisation failed for unprefixed attributes
when a default namespace was defined.
TreeBuilder.close() raised AssertionError in some error cases where it should have raised XMLSyntaxError. It now raises a combined exception to keep up backwards compatibility, while switching to XMLSyntaxError as an interface.
v4.5.2
Compare Source
==================
Bugs fixed
Cleaner() now validates that only known configuration options can be set.
LP#1882606: Cleaner.clean_html() discarded comments and PIs regardless of the corresponding configuration option, if remove_unknown_tags was set.
LP#1880251: Instead of globally overwriting the document loader in libxml2, lxml now sets it per parser run, which improves the interoperability with other users of libxml2 such as libxmlsec.
LP#1881960: Fix build in CPython 3.10 by using Cython 0.29.21.
The setup options "--with-xml2-config" and "--with-xslt-config" were accidentally renamed
to "--xml2-config" and "--xslt-config" in 4.5.1 and are now available again.
v4.5.1
Compare Source
==================
Bugs fixed
LP#1570388: Fix failures when serialising documents larger than 2GB in some cases.
LP#1865141, GH#298: QName values were not accepted by the el.iter() method.
Patch by xmo-odoo.
LP#1863413, GH#297: The build failed to detect libraries on Linux that are only
configured via pkg-config.
Patch by Hugh McMaster.
v4.5.0
Compare Source
==================
Features added
indent() was added to insert tail whitespace for pretty-printing an XML tree.
Bugs fixed
deletion disappeared silently instead of sticking with the node that was removed.
Other changes
MacOS builds are 64-bit-only by default.
Set CFLAGS and LDFLAGS explicitly to override it.
Linux/MacOS Binary wheels now use libxml2 2.9.10 and libxslt 1.1.34.
LP#1840234: The package version number is now available as lxml.__version__.
v4.4.3
Compare Source
==================
Bugs fixed
itertext() was missing tail text of comments and PIs since 4.4.0.
v4.4.2
Compare Source
==================
Bugs fixed
ElementInclude incorrectly rejected repeated non-recursive includes as recursive.
Patch by Rainer Hausdorf.
v4.4.1
Compare Source
==================
Bugs fixed
LP#1838252: The order of an OrderedDict was lost in 4.4.0 when passing it as
attrib mapping during element creation.
LP#1838521: The package metadata now lists the supported Python versions.
v4.4.0
Compare Source
==================
Features added
Element.clear() accepts a new keyword argument keep_tail=True to clear everything but the tail text. This is helpful in some document-style use cases and for clearing the current element in iterparse() and pull parsing.
When creating attributes or namespaces from a dict in Python 3.6+, lxml now preserves the original insertion order of that dict, instead of always sorting the items by name. A similar change was made for ElementTree in CPython 3.8.
See https://bugs.python.org/issue34160
Integer elements in lxml.objectify implement the __index__() special method.
GH#269: Read-only elements in XSLT were missing the nsmap property.
Original patch by Jan Pazdziora.
ElementInclude can now restrict the maximum inclusion depth via a max_depth argument to prevent content explosion. It is limited to 6 by default.
The target object of the XMLParser can have start_ns() and end_ns() callback methods to listen to namespace declarations.
The TreeBuilder has new arguments comment_factory and pi_factory to pass factories for creating comments and processing instructions, as well as flag arguments insert_comments and insert_pis to discard them from the tree when set to false.
A C14N 2.0 (https://www.w3.org/TR/xml-c14n2/) implementation was added as etree.canonicalize(), a corresponding C14NWriterTarget class, and a c14n2 serialisation method.
Bugs fixed
When writing to file paths that contain the URL escape character '%', the file
path could wrongly be mangled by URL unescaping and thus write to a different
file or directory. Code that writes to file paths that are provided by untrusted
sources, but that must work with previous versions of lxml, should best either
reject paths that contain '%' characters, or otherwise make sure that the path
does not contain maliciously injected '%XX' URL hex escapes for paths like '../'.
Assigning to Element child slices with negative step could insert the slice at
the wrong position, starting too far on the left.
Assigning to Element child slices with overly large step size could take very
long, regardless of the length of the actual slice.
Assigning to Element child slices of the wrong size could sometimes fail to
raise a ValueError (like a list assignment would) and instead assign outside
of the original slice bounds or leave parts of it unreplaced.
The comment and pi events in iterwalk() were never triggered, and instead, comments and processing instructions in the tree were reported as start elements. Also, when walking an ElementTree (as opposed to its root element), comments and PIs outside of the root element are now reported.
LP#1827833: The RelaxNG compact syntax support was broken with recent versions of rnc2rng.
LP#1758553: The HTML elements source and track were added to the list of empty tags in lxml.html.defs.
Registering a prefix other than "xml" for the XML namespace is now rejected.
Failing to write XSLT output to a file could raise a misleading exception. It now raises IOError.
Other changes
Support for Python 3.4 was removed.
When using Element.find*() with prefix-namespace mappings, the empty string is now accepted to define a default namespace, in addition to the previously supported None prefix. Empty strings are more convenient since they keep all prefix keys in a namespace dict strings, which simplifies sorting etc.
The ElementTree.write_c14n() method has been deprecated in favour of the long preferred ElementTree.write(f, method="c14n"). It will be removed in a future release.
v4.3.5
Compare Source
==================
v4.3.4
Compare Source
==================
v4.3.3
Compare Source
==================
Bugs fixed
_XSLTResultTree.write_output().
v4.3.2
Compare Source
==================
Bugs fixed
Other changes
v4.3.0
Compare Source
==================
Features added
The module lxml.sax is compiled using Cython in order to speed it up.
GH#267: lxml.sax.ElementTreeProducer now preserves the namespace prefixes. If two prefixes point to the same URI, the first prefix in alphabetical order is used. Patch by Lennart Regebro.
Updated ISO-Schematron implementation to 2013 version (now MIT licensed)
and the corresponding schema to the 2016 version (with optional "properties").
Other changes
GH#270, GH#271: Support for Python 2.6 and 3.3 was removed.
Patch by hugovk.
The minimum dependency versions were raised to libxml2 2.9.2 and libxslt 1.1.27,
which were released in 2014 and 2012 respectively.
Built with Cython 0.29.2.
v4.2.6
Compare Source
==================
Bugs fixed
LP#1799755: Fix a DeprecationWarning in Py3.7+.
Import warnings in Python 3.6+ were resolved.
v4.2.5
Compare Source
==================
Bugs fixed
Security problem found by Omar Eissa. (CVE-2018-19787)
v4.2.4
Compare Source
==================
Features added
pkg-config for build configuration.
Patch by Patrick Griffis.
Bugs fixed
Element.insert().
Patch by Alexander Weggerle.
v4.2.3
Compare Source
==================
Bugs fixed
v4.2.2
Compare Source
==================
Bugs fixed
GH#266: Fix sporadic crash during GC when parse-time schema validation is used
and the parser participates in a reference cycle.
Original patch by Julien Greard.
GH#265: lxml no longer links against zlib as a shared library, only on static builds.
Patch by Nehal J Wani.
v4.2.1
Compare Source
==================
Bugs fixed
LP#1755825: iterwalk() failed to return the 'start' event for the initial element if a tag selector is used.
LP#1756314: Failure to import 4.2.0 into PyPy due to a missing library symbol.
LP#1727864, GH#258: Add "-isysroot" linker option on MacOS as needed by XCode 9.
v4.2.0
Compare Source
==================
Features added
GH#255: SelectElement.value returns more standard-compliant and browser-like defaults for non-multi-selects. If no option is selected, the value of the first option is returned (instead of None). If multiple options are selected, the value of the last one is returned (instead of that of the first one). If no options are present (not standard-compliant), SelectElement.value still returns None.
GH#261: The HTMLParser() now supports the huge_tree option.
Patch by stranac.
Bugs fixed
LP#1551797: Some XSLT messages were not captured by the transform error log.
LP#1737825: Crash at shutdown after an interrupted iterparse run with XMLSchema
validation.
Other changes
v4.1.1
Compare Source
==================
v4.1.0
Compare Source
==================
Features added
ElementPath supports text predicates for current node, like "[.='text']".
ElementPath allows spaces in predicates.
Custom Element classes and XPath functions can now be registered with a
decorator rather than explicit dict assignments.
Static Linux wheels are now built with link time optimisation (LTO) enabled.
This should have a beneficial impact on the overall performance by providing
a tighter compiler integration between lxml and libxml2/libxslt.
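A minimal sketch of the text predicate for the current node mentioned in the 4.1.0 features. It uses the standard library's `xml.etree.ElementTree` (Python 3.7+, which supports the same `[.='text']` form) as a stand-in so that it runs without lxml installed; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree; needs Python 3.7+

root = ET.fromstring(
    "<list><item>keep</item><item>drop</item><item>keep</item></list>"
)

# "[.='keep']" filters on the text of the candidate element itself,
# rather than on the text of one of its children.
matches = root.findall("item[.='keep']")
```

Only the two `item` elements whose own text is `keep` are selected.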
Bugs fixed
PythonElementClassLookup could fail with a TypeError.
v4.0.0
Compare Source
==================
Features added
The ElementPath implementation is now compiled using Cython, which speeds up the .find*() methods quite significantly.
The modules lxml.builder, lxml.html.diff and lxml.html.clean are also compiled using Cython in order to speed them up.
xmlfile() supports async coroutines using async with and await.
iterwalk() has a new method skip_subtree() that prevents walking into the descendants of the current element.
RelaxNG.from_rnc_string() accepts a base_url argument to allow relative resource lookups.
The XSLT result object has a new method .write_output(file) that serialises output data into a file according to the <xsl:output> configuration.
Bugs fixed
GH#251: HTML comments were handled incorrectly by the soupparser.
Patch by mozbugbox.
LP#1654544: The html5parser no longer passes the useChardet option if the input is a Unicode string, unless explicitly requested. When parsing files, the default is to enable it when a URL or file path is passed (because the file is then opened in binary mode), and to disable it when reading from a file(-like) object.
Note: This is a backwards incompatible change of the default configuration. If your code parses byte strings/streams and depends on character detection, please pass the option guess_charset=True explicitly, which already worked in older lxml versions.
LP#1703810: etree.fromstring() failed to parse UTF-32 data with BOM.
LP#1526522: Some RelaxNG errors were not reported in the error log.
LP#1567526: Empty and plain text input raised a TypeError in soupparser.
LP#1710429: Uninitialised variable usage in HTML diff.
LP#1415643: The closing tags context manager in xmlfile() could continue to output end tags even after writing failed with an exception.
LP#1465357: xmlfile.write() now accepts and ignores None as input argument.
Compilation under Py3.7-pre failed due to a modified function signature.
Other changes
lxml.*.pyx to plain *.pyx (e.g. etree.pyx) to simplify their handling in the build process. Care was taken to keep the old header files as fallbacks for code that compiles against the public C-API of lxml, but it might still be worth validating that third-party code does not notice this change.
Configuration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.