scrapy / w3lib Public

Releases: scrapy/w3lib

v2.1.2

03 Aug 08:49

wRAR

Fix test failures on Python 3.11.4+ (#212, #213).
Fix an incorrect type hint (#211).
Add project URLs to setup.py (#215).

v2.1.1

09 Dec 11:11

kmike

What's Changed

safe_url_string, canonicalize_url: apply stripping from the URL living standard by @Gallaecio in #207
Changelog by @kmike in #208
fix tox 4 compatibility by @kmike in #209
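
The stripping mentioned for #207 can be illustrated with a minimal standard-library sketch. It assumes the relevant rules of the URL living standard are: remove leading and trailing C0 control characters and spaces, then remove ASCII tabs and newlines anywhere in the input. `strip_url` is a hypothetical helper for illustration, not a w3lib function.

```python
# C0 control characters (U+0000 to U+001F) plus space (U+0020),
# stripped from both ends of the input.
_C0_CONTROL_OR_SPACE = "".join(chr(code) for code in range(0x21))

# ASCII tab and newline characters, removed from anywhere in the input.
_TAB_OR_NEWLINE = str.maketrans("", "", "\t\n\r")


def strip_url(url: str) -> str:
    """Hypothetical helper (not part of w3lib): apply both stripping steps."""
    return url.strip(_C0_CONTROL_OR_SPACE).translate(_TAB_OR_NEWLINE)


print(strip_url("  https://example.com/a\n/b\t "))  # https://example.com/a/b
```

Inner spaces are left alone in this sketch; only the characters listed above are touched.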

Full Changelog: v2.1.0...v2.1.1

Contributors

kmike and Gallaecio

2.1.0

09 Dec 11:11

kmike

What's Changed

update type annotation of auto_detect_fun param in html_to_unicode() by @BurnzZ in #190
update html_to_unicode() so that the BOM is used first to check the e… by @BurnzZ in #191
changing basic_auth_header flavor to b64encode instead of urlsafe_b64encode by @gsweiz in #192
Add support for Python 3.11. by @wRAR in #195
Use latest version of Ubuntu by @Laerte in #197
187 safe url string already encoded user pass by @felipeboffnunes in #196
Strip spaces in canonicalize_url by @Gallaecio in #136
unit test Issue 91 is fixed by @felipeboffnunes in #198
Drop Python 3.6 support by @Laerte in #200
[MRG+1] Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases. by @starrify in #77
Handle OverflowError exception on convert_entity by @Laerte in #202
safe_url_string: escape additional characters by @Gallaecio in #203
I have added contributing.md file and code_of_conduct.md file along with some very minor changes in the Readme.rst file by @VanshajPoonia in #194
Full typing by @wRAR in #206
Add release notes for version 2.1.0 by @Gallaecio in #205
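
To show what the basic_auth_header change in #192 amounts to, here is a small standard-library sketch of the difference between the two Base64 alphabets. `basic_auth_value` is a hypothetical helper, and its default encoding is an assumption made for this example; it is not w3lib's implementation.

```python
import base64


def basic_auth_value(username: str, password: str, encoding: str = "ISO-8859-1") -> bytes:
    """Hypothetical helper (not w3lib's code): build a Basic Authorization value."""
    credentials = f"{username}:{password}".encode(encoding)
    # Standard alphabet, which uses "+" and "/" rather than "-" and "_".
    return b"Basic " + base64.b64encode(credentials)


# The two alphabets only differ when the output contains "+" or "/".
sample = b"???"
print(base64.b64encode(sample))          # b'Pz8/'
print(base64.urlsafe_b64encode(sample))  # b'Pz8_'
print(basic_auth_value("user", "pass"))  # b'Basic dXNlcjpwYXNz'
```

A server that decodes the header with the standard alphabet would not read "-" or "_" as intended, which is presumably why the flavor was changed.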

New Contributors

@BurnzZ made their first contribution in #190
@gsweiz made their first contribution in #192
@felipeboffnunes made their first contribution in #196
@starrify made their first contribution in #77
@VanshajPoonia made their first contribution in #194

Full Changelog: v2.0.1...v2.1.0

Contributors

wRAR, starrify, and 6 other contributors

2.0.1

21 Oct 08:31

kmike

Backwards incompatible changes:

Python 2 is no longer supported; Python 3.6+ is required now (#168, #175).
w3lib.url.safe_url_string and w3lib.url.canonicalize_url
no longer convert "%23" to "#" when it appears in the URL path. This is a bug
fix. It is listed as a backward-incompatible change because in some cases the
output of w3lib.url.canonicalize_url will change, so if
this output is used to generate URL fingerprints, the new fingerprints might be
incompatible with those created with previous w3lib versions
(#141).
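
Why decoding "%23" in the path is a bug can be seen with the standard library alone. The sketch below uses only urllib.parse and made-up example URLs; it does not call w3lib.

```python
from urllib.parse import urlsplit

# Example URLs invented for this illustration.
encoded = "http://example.com/a%23b?x=1"
decoded = "http://example.com/a#b?x=1"  # what results if "%23" becomes "#"

print(urlsplit(encoded).path)      # /a%23b
print(urlsplit(encoded).query)     # x=1
print(urlsplit(decoded).path)      # /a
print(urlsplit(decoded).fragment)  # b?x=1
```

Once "%23" is turned into a literal "#", the rest of the path and the whole query are reinterpreted as a fragment, so the two strings name different resources.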

Deprecation removals (#169):

The w3lib.form module is removed.
The w3lib.html.remove_entities function is removed.
The w3lib.url.urljoin_rfc function is removed.

The following functions are deprecated, and will be removed in future releases
(#170):

w3lib.util.str_to_unicode
w3lib.util.unicode_to_str
w3lib.util.to_native_str

Other improvements and bug fixes:

Type annotations are added (#172, #184).
Added support for Python 3.9 and 3.10 (#168, #176).
Fixed w3lib.html.get_meta_refresh for <meta> tags where
http-equiv is written after content (#179).
Fixed w3lib.url.safe_url_string for IDNA domains with ports (#174).
w3lib.url.url_query_cleaner no longer adds an unneeded # when
keep_fragments=True is passed, and the URL doesn't have a fragment
(#159).
Removed a workaround for an ancient pathname2url bug (#142).
CI is migrated to GitHub Actions (#166, #177); other CI improvements (#160,
#182).
The code is formatted using black (#173).
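
The url_query_cleaner fix in #159 concerns URLs without a fragment. The sketch below shows one way to avoid a stray "#" with the standard library. `clean_query` is a hypothetical stand-in written for this example and makes no claim about how w3lib implements the function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_query(url: str, keep: set, keep_fragments: bool = False) -> str:
    """Hypothetical stand-in (not w3lib's code): keep only the named parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode([(key, value) for key, value in pairs if key in keep])
    fragment = parts.fragment if keep_fragments else ""
    # urlunsplit() only emits "#" when the fragment is non-empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, fragment))


print(clean_query("http://example.com/p?id=1&utm=x", {"id"}, keep_fragments=True))
# http://example.com/p?id=1
print(clean_query("http://example.com/p?id=1&utm=x#top", {"id"}, keep_fragments=True))
# http://example.com/p?id=1#top
```

Building the result by unconditionally concatenating "#" and the fragment would produce the trailing "#" described above; delegating to urlunsplit avoids it.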

v1.22.0

13 May 19:36

Gallaecio

Python 3.4 is no longer supported (issue #156)
w3lib.url.safe_url_string now supports an optional quote_path
parameter to disable the percent-encoding of the URL path (issue #119)
w3lib.url.add_or_replace_parameter and
w3lib.url.add_or_replace_parameters no longer remove duplicate
parameters from the original query string that are not being added or
replaced (issue #126)
w3lib.html.remove_tags now raises a ValueError exception
instead of AssertionError when using both the which_ones and the
keep parameters (issue #154)
Test improvements (issues #143, #146, #148, #149)
Documentation improvements (issues #140, #144, #145, #151, #152, #153)
Code cleanup (issue #139)
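
The duplicate-parameter behavior described for issue #126 can be sketched with the standard library. `set_parameter` is a hypothetical helper for illustration; in particular, how repeated occurrences of the replaced parameter are handled here is an assumption of this sketch, not a statement about w3lib.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_parameter(url: str, name: str, value: str) -> str:
    """Hypothetical helper (not w3lib's code): add or replace one query parameter."""
    parts = urlsplit(url)
    result = []
    replaced = False
    for key, old_value in parse_qsl(parts.query, keep_blank_values=True):
        if key != name:
            # Parameters that are not being replaced are kept, duplicates included.
            result.append((key, old_value))
        elif not replaced:
            result.append((key, value))
            replaced = True
        # Assumption: later occurrences of the replaced parameter are dropped.
    if not replaced:
        result.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(result)))


print(set_parameter("http://example.com/?a=1&a=2&b=3", "b", "9"))
# http://example.com/?a=1&a=2&b=9
```

The duplicated "a" survives because it is neither added nor replaced.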

v1.15.0

29 Jul 17:12

redapple

Add canonicalize_url() to w3lib.url

v1.14.3

15 Jul 13:08

redapple

Bugfix release:

Handle IDNA encoding failures in safe_url_string() (issue #62)
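
As an illustration of what an IDNA encoding failure looks like, the sketch below uses Python's built-in "idna" codec. `encode_host` is a hypothetical helper, and leaving the host unchanged on failure is an assumption made for this example rather than a description of safe_url_string.

```python
def encode_host(host: str) -> str:
    """Hypothetical helper (not w3lib's code): IDNA-encode a host name if possible."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # For example, a label longer than 63 characters cannot be encoded.
        # Assumption for this sketch: return the host unchanged.
        return host


print(encode_host("bücher.example"))        # xn--bcher-kva.example
print(encode_host("a" * 64 + ".example"))   # returned unchanged
```

Without the `try`/`except`, the second call would raise instead of returning a usable value.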

v1.14.2

11 Apr 15:28

redapple

Bugfix release:

fix function import for (deprecated) urljoin_rfc (issue #51)
only expose wanted functions from w3lib.url, via __all__
(see issue #54, scrapy/scrapy#1917)

v1.14.1

08 Apr 17:21

redapple

Bugfix release:

For bytes URLs, when the supplied encoding (or the default, UTF-8) is wrong,
safe_url_string falls back to percent-encoding the offending bytes.
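
One way to get this kind of fallback with the standard library is a custom codec error handler, sketched below. The handler name `percentencode_sketch` and the helper `decode_url` are hypothetical and chosen for this example; this is not presented as w3lib's implementation.

```python
import codecs


def _percent_encode_bad_bytes(error):
    """Replace undecodable bytes with their percent-encoded form."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    # Return the replacement text and the position where decoding resumes.
    return "".join(f"%{byte:02X}" for byte in bad), error.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("percentencode_sketch", _percent_encode_bad_bytes)


def decode_url(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode a bytes URL, percent-encoding offending bytes."""
    return raw.decode(encoding, errors="percentencode_sketch")


print(decode_url(b"http://example.com/caf\xe9/x"))  # http://example.com/caf%E9/x
print(decode_url(b"http://example.com/caf\xc3\xa9"))  # http://example.com/café
```

Bytes that decode cleanly are untouched; only the bytes the codec rejects are percent-encoded.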

v1.14.0

06 Apr 17:18

redapple

Changes to safe_url_string:

proper handling of non-ASCII characters in Python 2 and Python 3
support for IDNs
new path_encoding parameter to override the default UTF-8 when serializing
non-ASCII characters before percent-encoding
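
The effect of an IDN host and of a non-default path encoding can be illustrated with the standard library. The sketch below does not call safe_url_string; the host, the path, and the choice of Latin-1 are made up for this example.

```python
from urllib.parse import quote

# An internationalized domain name is converted to its ASCII (Punycode) form.
host = "bücher.example".encode("idna").decode("ascii")
print(host)  # xn--bcher-kva.example

# The same path percent-encoded after serializing with two different encodings.
utf8_path = quote("/café")                         # UTF-8 is quote()'s default
latin1_path = quote("/café", encoding="latin-1")
print(utf8_path)    # /caf%C3%A9
print(latin1_path)  # /caf%E9
```

The two encoded paths differ, which matters when a server expects path bytes in an encoding other than UTF-8.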

html_body_declared_encoding also detects the encoding when it is not the sole attribute in <meta>.

Package is now properly marked as zip_safe.
