Releases: scrapy/w3lib

v2.1.2

03 Aug 08:49
  • Fix test failures on Python 3.11.4+ (#212, #213).
  • Fix an incorrect type hint (#211).
  • Add project URLs to setup.py (#215).

v2.1.1

09 Dec 11:11

What's Changed

  • safe_url_string, canonicalize_url: apply stripping from the URL living standard by @Gallaecio in #207
  • Changelog by @kmike in #208
  • fix tox 4 compatibility by @kmike in #209
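
The stripping mentioned for #207 refers to the input clean-up that the URL living standard performs before parsing: leading and trailing C0 control characters and spaces are removed, then every ASCII tab and newline is removed from the rest of the input. A minimal standard-library sketch of that rule follows. The name `strip_url_input` is hypothetical, and this is not necessarily how w3lib implements it.

```python
# Sketch of the input stripping described by the URL living standard.
# `strip_url_input` is a hypothetical helper, not part of w3lib.

# "C0 control or space": code points U+0000 through U+0020 inclusive.
C0_CONTROL_OR_SPACE = "".join(chr(code) for code in range(0x21))

# "ASCII tab or newline": U+0009, U+000A and U+000D.
ASCII_TAB_OR_NEWLINE = {ord("\t"): None, ord("\n"): None, ord("\r"): None}


def strip_url_input(url: str) -> str:
    # Step 1: trim C0 controls and spaces from both ends only.
    url = url.strip(C0_CONTROL_OR_SPACE)
    # Step 2: drop tabs and newlines wherever they appear.
    return url.translate(ASCII_TAB_OR_NEWLINE)


cleaned = strip_url_input("  \x00http://exa\nmple.com/a b\t \r\n")
```

Interior spaces are left alone by this step; escaping them is a separate concern.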

Full Changelog: v2.1.0...v2.1.1

v2.1.0

09 Dec 11:11

What's Changed

  • update type annotation of auto_detect_fun param in html_to_unicode() by @BurnzZ in #190
  • update html_to_unicode() so that the BOM is used first to check the e… by @BurnzZ in #191
  • changing basic_auth_header flavor to b64encode instead of urlsafe_b64encode by @gsweiz in #192
  • Add support for Python 3.11. by @wRAR in #195
  • Use latest version of Ubuntu by @Laerte in #197
  • safe_url_string: handle an already percent-encoded user and password (issue #187) by @felipeboffnunes in #196
  • Strip spaces in canonicalize_url by @Gallaecio in #136
  • unit test Issue 91 is fixed by @felipeboffnunes in #198
  • Drop Python 3.6 support by @Laerte in #200
  • Remove comments before extracting base URLs. Not a solution to #70, but it helps in some cases. By @starrify in #77
  • Handle OverflowError exception on convert_entity by @Laerte in #202
  • safe_url_string: escape additional characters by @Gallaecio in #203
  • Add contributing.md and code_of_conduct.md files, plus some very minor changes to the Readme.rst file, by @VanshajPoonia in #194
  • Full typing by @wRAR in #206
  • Add release notes for version 2.1.0 by @Gallaecio in #205
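
The basic_auth_header change in #192 matters because the two Base64 alphabets differ in two characters: the standard alphabet uses `+` and `/`, while the URL-safe one uses `-` and `_`. The sketch below uses only the standard library to show the difference. `basic_auth_value` is a hypothetical stand-in, and its `ISO-8859-1` default encoding is an assumption rather than a statement about w3lib.

```python
import base64


def basic_auth_value(username: str, password: str, encoding: str = "ISO-8859-1") -> bytes:
    # Hypothetical helper: builds an Authorization header value using the
    # standard Base64 alphabet, the flavor named in the changelog entry.
    credentials = f"{username}:{password}".encode(encoding)
    return b"Basic " + base64.b64encode(credentials)


# Credentials chosen so that the encoded form needs a character that
# differs between the two alphabets.
credentials = b"???:???"
standard = base64.b64encode(credentials)
url_safe = base64.urlsafe_b64encode(credentials)

# The two encodings differ only where "/" is swapped for "_" (and "+" for "-").
differs = standard != url_safe
```

A server that decodes the header with the standard alphabet would reject or misread the URL-safe form for credentials like these.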

Full Changelog: v2.0.1...v2.1.0

v2.0.1

21 Oct 08:31

Backwards incompatible changes:

  • Python 2 is no longer supported; Python 3.6+ is required now (#168, #175).
  • w3lib.url.safe_url_string and w3lib.url.canonicalize_url
    no longer convert "%23" to "#" when it appears in the URL path. This is a bug
    fix. It is listed as a backwards-incompatible change because the output of
    w3lib.url.canonicalize_url changes in some cases. If that output is used to
    generate URL fingerprints, new fingerprints might be incompatible with those
    created with previous w3lib versions (#141).
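
To see why decoding "%23" in a path is a bug, and why fixing it can change fingerprints, compare how the two spellings parse. This sketch uses only `urllib.parse` and `hashlib`. The `fingerprint` helper and the example URL are hypothetical, and SHA-1 over the URL string is just one plausible fingerprinting scheme.

```python
import hashlib
from urllib.parse import urlsplit

original = "http://example.com/a%23b?x=1"
# What the old behavior effectively produced: "%23" turned into "#".
decoded = original.replace("%23", "#")

original_parts = urlsplit(original)
decoded_parts = urlsplit(decoded)


def fingerprint(url: str) -> str:
    # Hypothetical fingerprint: a hash over the URL string.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


fingerprints_match = fingerprint(original) == fingerprint(decoded)
```

In the decoded form, everything after `#` becomes a fragment, so the path and query no longer identify the same resource, and any hash over the string changes with it.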

Deprecation removals (#169):

  • The w3lib.form module is removed.
  • The w3lib.html.remove_entities function is removed.
  • The w3lib.url.urljoin_rfc function is removed.

The following functions are deprecated, and will be removed in future releases
(#170):

  • w3lib.util.str_to_unicode
  • w3lib.util.unicode_to_str
  • w3lib.util.to_native_str

Other improvements and bug fixes:

  • Type annotations are added (#172, #184).
  • Added support for Python 3.9 and 3.10 (#168, #176).
  • Fixed w3lib.html.get_meta_refresh for <meta> tags where
    http-equiv is written after content (#179).
  • Fixed w3lib.url.safe_url_string for IDNA domains with ports (#174).
  • w3lib.url.url_query_cleaner no longer adds an unneeded # when
    keep_fragments=True is passed, and the URL doesn't have a fragment
    (#159).
  • Removed a workaround for an ancient pathname2url bug (#142).
  • CI is migrated to GitHub Actions (#166, #177); other CI improvements (#160,
    #182).
  • The code is formatted using black (#173).
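
The get_meta_refresh fix (#179) concerns attribute order inside a `<meta>` tag. As an illustration of order-independent extraction, the sketch below uses the standard library's `html.parser` instead of w3lib. `MetaRefreshFinder` and `find_meta_refresh` are hypothetical names, and the parsing of the `content` value is deliberately simplified.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaRefreshFinder(HTMLParser):
    """Records the first <meta http-equiv="refresh"> as (delay, url)."""

    def __init__(self) -> None:
        super().__init__()
        self.refresh: Optional[Tuple[float, str]] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        # A dict lookup does not care which attribute was written first.
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        delay, _, target = attributes.get("content", "").partition(";")
        target = target.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip()
        try:
            self.refresh = (float(delay), target)
        except ValueError:
            pass  # Not a numeric delay; ignore this tag.


def find_meta_refresh(html: str) -> Optional[Tuple[float, str]]:
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refresh


content_first = find_meta_refresh(
    '<meta content="5; url=http://example.com/next" http-equiv="refresh">'
)
equiv_first = find_meta_refresh(
    '<meta http-equiv="refresh" content="5; url=http://example.com/next">'
)
```

Both spellings yield the same delay and target here, which is the behavior the changelog entry describes for w3lib after the fix.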

v1.22.0

13 May 19:36
  • Python 3.4 is no longer supported (issue #156)
  • w3lib.url.safe_url_string now supports an optional quote_path
    parameter to disable the percent-encoding of the URL path (issue #119)
  • w3lib.url.add_or_replace_parameter and
    w3lib.url.add_or_replace_parameters no longer remove duplicate
    parameters from the original query string that are not being added or
    replaced (issue #126)
  • w3lib.html.remove_tags now raises a ValueError exception
    instead of AssertionError when using both the which_ones and the
    keep parameters (issue #154)
  • Test improvements (issues #143, #146, #148, #149)
  • Documentation improvements (issues #140, #144, #145, #151, #152, #153)
  • Code cleanup (issue #139)
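
The add_or_replace_parameter entry (issue #126) says that duplicate parameters unrelated to the one being set are now left alone. The standard-library sketch below illustrates that behavior. `add_or_replace` is a hypothetical function, and collapsing repeated occurrences of the parameter being replaced into one is an assumption made for this sketch, not a documented w3lib guarantee.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_or_replace(url: str, name: str, value: str) -> str:
    # Hypothetical helper: set `name` to `value` while keeping every other
    # parameter, duplicates included, in its original order.
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    result = []
    replaced = False
    for key, old_value in pairs:
        if key != name:
            result.append((key, old_value))  # untouched, even if repeated
        elif not replaced:
            result.append((key, value))
            replaced = True
        # Further occurrences of `name` are dropped (assumption).
    if not replaced:
        result.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(result)))


replaced_url = add_or_replace("http://example.com/?a=1&a=2&b=3", "b", "4")
added_url = add_or_replace("http://example.com/?a=1&a=2", "c", "5")
```

The repeated `a` parameter survives in both calls because it is neither added nor replaced.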

v1.15.0

29 Jul 17:12

v1.14.3

15 Jul 13:08

Bugfix release:

  • Handle IDNA encoding failures in safe_url_string() (issue #62)
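
IDNA encoding can fail, for example when a host label is longer than 63 characters. The sketch below shows one way to tolerate such failures with Python's built-in `idna` codec. `encode_host` is a hypothetical helper, keeping the host unchanged on failure is an assumption about a reasonable fallback, and user information in the URL is ignored for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_host(url: str) -> str:
    # Hypothetical helper: IDNA-encode the host, keeping it as-is on failure.
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        # Labels that are empty or too long cannot be IDNA-encoded.
        ascii_host = host
    # Re-attach the port, if any (userinfo is dropped in this sketch).
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


encoded = encode_host("http://b\u00fccher.example:8080/x")
too_long = "http://" + "a" * 64 + ".example/"
unchanged = encode_host(too_long)
```

The second call would raise without the `except` clause, since its first label has 64 characters.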

v1.14.2

11 Apr 15:28

Bugfix release:

  • fix function import for (deprecated) urljoin_rfc (issue #51)
  • only expose wanted functions from w3lib.url, via __all__
    (see issue #54, scrapy/scrapy#1917)

v1.14.1

08 Apr 17:21

Bugfix release:

  • For bytes URLs, when the supplied encoding (or the default, UTF-8) is wrong,
    safe_url_string falls back to percent-encoding the offending bytes.
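
A fallback of this kind can be expressed with a custom codec error handler: bytes that do not decode under the chosen encoding are replaced by their percent-encoded form instead of raising. The following is a standard-library sketch only. The handler name `percent_escape_sketch` and the function `decode_url_bytes` are hypothetical, and w3lib's own mechanism may differ.

```python
import codecs


def _percent_encode_invalid(error):
    # Only decoding errors are handled; anything else is re-raised.
    if not isinstance(error, UnicodeDecodeError):
        raise error
    offending = error.object[error.start:error.end]
    replacement = "".join(f"%{byte:02X}" for byte in offending)
    # Return the replacement text and the position at which to resume.
    return replacement, error.end


codecs.register_error("percent_escape_sketch", _percent_encode_invalid)


def decode_url_bytes(url: bytes, encoding: str = "utf-8") -> str:
    # Hypothetical helper: decode a bytes URL, percent-encoding bad bytes.
    return url.decode(encoding, errors="percent_escape_sketch")


# b"\xe9" is "é" in Latin-1 but is not valid UTF-8 on its own.
latin1_bytes = decode_url_bytes(b"http://example.com/caf\xe9")
utf8_bytes = decode_url_bytes(b"http://example.com/caf\xc3\xa9")
```

Valid input decodes normally, and only the bytes that cannot be decoded are escaped.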

v1.14.0

06 Apr 17:18

Changes to safe_url_string:

  • proper handling of non-ASCII characters in Python 2 and Python 3
  • support for IDNs
  • new path_encoding parameter to override the default UTF-8 when serializing
    non-ASCII characters before percent-encoding
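
The effect of choosing a path encoding can be shown with `urllib.parse.quote`, which takes an `encoding` argument in the same spirit: the same character serializes to different bytes, and so to different percent-escapes, depending on the encoding. This is an illustration with the standard library, not a call into w3lib, and the example path is made up.

```python
from urllib.parse import quote

path = "/caf\u00e9"  # "/café"

# UTF-8 serializes "é" as two bytes, Latin-1 as one.
utf8_path = quote(path, encoding="utf-8")
latin1_path = quote(path, encoding="latin-1")
```

A server that expects Latin-1 paths would not recognize the UTF-8 form, which is the kind of case an encoding override is for.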

html_body_declared_encoding also detects the encoding when it is not the sole attribute in <meta>.

Package is now properly marked as zip_safe.