Cover Scrapy 2.3 in the release notes

Gallaecio committed Jul 30, 2020
1 parent a783f15 commit d6a3753
Showing 5 changed files with 128 additions and 4 deletions.
114 changes: 114 additions & 0 deletions docs/news.rst
@@ -3,6 +3,120 @@
Release notes
=============

.. _release-2.3.0:

Scrapy 2.3.0 (2020-0?-??)
-------------------------

Highlights:

* :ref:`Feed exports <topics-feed-exports>` support :ref:`Google Cloud
Storage <topics-feed-storage-gcs>`
* New :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` setting for batch deliveries
* The base implementation of :ref:`item loaders <topics-loaders>` has been
moved into :doc:`itemloaders <itemloaders:index>`


Deprecation removals
~~~~~~~~~~~~~~~~~~~~

* Removed the following classes and their parent modules from
``scrapy.linkextractors``:

* ``htmlparser.HtmlParserLinkExtractor``
* ``regex.RegexLinkExtractor``
* ``sgml.BaseSgmlLinkExtractor``
* ``sgml.SgmlLinkExtractor``

Use
:class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
instead (:issue:`4356`, :issue:`4679`)


Deprecations
~~~~~~~~~~~~

* The ``scrapy.utils.python.retry_on_eintr`` function is now deprecated
(:issue:`4683`)


New features
~~~~~~~~~~~~

* :ref:`Feed exports <topics-feed-exports>` now support :ref:`Google Cloud
Storage <topics-feed-storage-gcs>` as a storage backend (:issue:`685`,
:issue:`3608`)

* The new :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` setting allows delivering
  output items in batches, split across separate files; see the example after
  this list (:issue:`4250`, :issue:`4434`)

* The :command:`parse` command now allows specifying an output file
(:issue:`4317`, :issue:`4377`)

* :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
:func:`~scrapy.utils.curl.curl_to_request_kwargs` now also support
``--data-raw`` (:issue:`4612`)

* A ``parse`` callback may now be used in built-in spider subclasses, such
  as :class:`~scrapy.spiders.CrawlSpider` (:issue:`712`, :issue:`732`,
  :issue:`781`, :issue:`4254`)
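
A minimal sketch of how the new :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
setting might be configured, assuming the ``FEEDS`` dictionary syntax
introduced in Scrapy 2.1; the file path and the use of the ``%(batch_id)d``
placeholder are purely illustrative::

    # settings.py of a hypothetical project
    FEED_EXPORT_BATCH_ITEM_COUNT = 100  # start a new output file every 100 items

    FEEDS = {
        # %(batch_id)d is replaced with the sequence number of each batch,
        # so every batch is written to its own file.
        "export/items-%(batch_id)d.json": {
            "format": "json",
        },
    }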


Bug fixes
~~~~~~~~~

* Fixed the :ref:`CSV exporting <topics-feed-format-csv>` of
:ref:`dataclass items <dataclass-items>` (:issue:`4667`, :issue:`4668`)

* :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
  :func:`~scrapy.utils.curl.curl_to_request_kwargs` now set the request
  method to ``POST`` when a request body is given and no request method is
  specified (:issue:`4612`)

* The processing of ANSI escape sequences is now enabled on Windows 10.0.14393
  and later, where it is required for colored output (:issue:`4393`,
  :issue:`4403`)


Documentation
~~~~~~~~~~~~~

* Updated the `OpenSSL cipher list format`_ link in the documentation about
the :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` setting (:issue:`4653`)

* Simplified the code example in :ref:`topics-loaders-dataclass`
(:issue:`4652`)

.. _OpenSSL cipher list format: https://www.openssl.org/docs/manmaster/man1/openssl-ciphers.html#CIPHER-LIST-FORMAT


Quality assurance
~~~~~~~~~~~~~~~~~

* The base implementation of :ref:`item loaders <topics-loaders>` has been
moved into a separate library, :doc:`itemloaders <itemloaders:index>`,
allowing usage from outside Scrapy and a separate release schedule
(:issue:`4005`, :issue:`4516`)

* Fixed a silenced error in some scheduler tests (:issue:`4644`,
:issue:`4645`)

* Renewed the localhost certificate used for SSL tests (:issue:`4650`)

* Removed cookie-handling code specific to Python 2 (:issue:`4682`)

* Stopped using a backslash for line continuation (:issue:`4673`)

* Removed unneeded entries from the MyPy exception list (:issue:`4690`)

* Automated tests now pass on Windows as part of our continuous integration
system (:issue:`4458`)

* Automated tests now pass on the latest PyPy version for supported Python
versions in our continuous integration system (:issue:`4504`)


.. _release-2.2.1:

Scrapy 2.2.1 (2020-07-17)
2 changes: 2 additions & 0 deletions docs/topics/commands.rst
@@ -493,6 +493,8 @@ Supported options:

* ``--output`` or ``-o``: dump scraped items to a file

.. versionadded:: 2.3

.. skip: start
Usage example::
6 changes: 4 additions & 2 deletions docs/topics/developer-tools.rst
@@ -289,8 +289,10 @@ request::
"://quotes.toscrape.com/scroll' -H 'Cache-Control: max-age=0'")

Alternatively, if you want to know the arguments needed to recreate that
request you can use the :func:`scrapy.utils.curl.curl_to_request_kwargs`
function to get a dictionary with the equivalent arguments.
request you can use the :func:`~scrapy.utils.curl.curl_to_request_kwargs`
function to get a dictionary with the equivalent arguments:

.. autofunction:: scrapy.utils.curl.curl_to_request_kwargs

Note that to translate a cURL command into a Scrapy request,
you may use `curl2scrapy <https://michael-shub.github.io/curl2scrapy/>`_.
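
A brief usage sketch, loosely based on the cURL command from the example above
and shortened to a single header; the exact keys in the returned dictionary
depend on the cURL options used::

    from scrapy import Request
    from scrapy.utils.curl import curl_to_request_kwargs

    # Convert a copied cURL command into keyword arguments for Request.
    kwargs = curl_to_request_kwargs(
        "curl 'http://quotes.toscrape.com/scroll' -H 'Cache-Control: max-age=0'"
    )
    request = Request(**kwargs)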
7 changes: 6 additions & 1 deletion docs/topics/feed-exports.rst
@@ -100,6 +100,7 @@ The storages backends supported out of the box are:
* :ref:`topics-feed-storage-fs`
* :ref:`topics-feed-storage-ftp`
* :ref:`topics-feed-storage-s3` (requires botocore_)
* :ref:`topics-feed-storage-gcs` (requires `google-cloud-storage`_)
* :ref:`topics-feed-storage-stdout`

Some storage backends may be unavailable if the required external libraries are
@@ -199,14 +200,16 @@ You can also define a custom ACL for exported feeds using this setting:
Google Cloud Storage (GCS)
--------------------------

.. versionadded:: 2.3

The feeds are stored on `Google Cloud Storage`_.

* URI scheme: ``gs``
* Example URIs:

* ``gs://mybucket/path/to/export.csv``

* Required external libraries: `google-cloud-storage <https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python>`_.
* Required external libraries: `google-cloud-storage`_.

For more information about authentication, please refer to `Google Cloud documentation <https://cloud.google.com/docs/authentication/production>`_.

@@ -215,6 +218,8 @@ You can set a *Project ID* and *Access Control List (ACL)* through the following
* :setting:`FEED_STORAGE_GCS_ACL`
* :setting:`GCS_PROJECT_ID`
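
A short illustrative snippet, assuming the ``FEEDS`` dictionary syntax; the
bucket, path, project ID and ACL values below are placeholders::

    # settings.py of a hypothetical project
    GCS_PROJECT_ID = "my-project-id"
    FEED_STORAGE_GCS_ACL = "publicRead"  # optional; omit to keep the bucket default

    FEEDS = {
        "gs://mybucket/path/to/export.csv": {"format": "csv"},
    }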

.. _google-cloud-storage: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python

.. _topics-feed-storage-stdout:

Standard output
3 changes: 2 additions & 1 deletion scrapy/utils/curl.py
@@ -39,7 +39,8 @@ def curl_to_request_kwargs(curl_command, ignore_unknown_options=True):
:param str curl_command: string containing the curl command
:param bool ignore_unknown_options: If true, only a warning is emitted when
cURL options are unknown. Otherwise raises an error. (default: True)
cURL options are unknown. Otherwise
raises an error. (default: True)
:return: dictionary of Request kwargs
"""

