
Commit

to cookies
farsene committed Apr 4, 2022
1 parent ab063fd commit b60cb1e
Showing 10 changed files with 152 additions and 52 deletions.
4 changes: 4 additions & 0 deletions docs/index.rst
@@ -170,6 +170,7 @@ Solving specific problems
topics/jobs
topics/coroutines
topics/asyncio
topics/class-methods

:doc:`faq`
Get answers to most frequently asked questions.
@@ -216,6 +217,9 @@ Solving specific problems
:doc:`topics/asyncio`
Use :mod:`asyncio` and :mod:`asyncio`-powered libraries.

:doc:`topics/class-methods`
Instantiate Scrapy objects from the crawler or from settings.

.. _extending-scrapy:

Extending Scrapy
17 changes: 7 additions & 10 deletions docs/topics/api.rst
@@ -167,16 +167,6 @@ SpiderLoader API
the :class:`scrapy.interfaces.ISpiderLoader` interface to guarantee an
errorless execution.

.. method:: from_settings(settings)

This class method is used by Scrapy to create an instance of the class.
It's called with the current project settings, and it loads the spiders
found recursively in the modules of the :setting:`SPIDER_MODULES`
setting.

:param settings: project settings
:type settings: :class:`~scrapy.settings.Settings` instance

.. method:: load(spider_name)

Get the Spider class with the given name. It'll look into the previously
@@ -198,6 +188,13 @@ SpiderLoader API
:param request: queried request
:type request: :class:`~scrapy.Request` instance

:meth:`from_settings`:

This class method is used by Scrapy to create an instance of the class.
It's called with the current project settings, and it loads the spiders
found recursively in the modules of the :setting:`SPIDER_MODULES`
setting.
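
For example, a minimal usage sketch, assuming it runs inside a project whose
:setting:`SPIDER_MODULES` define a spider named ``example_spider`` (a
hypothetical name)::

    from scrapy.spiderloader import SpiderLoader
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    loader = SpiderLoader.from_settings(settings)
    print(loader.list())  # all spider names found in SPIDER_MODULES
    spider_cls = loader.load('example_spider')  # the Spider class with that name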

.. _topics-api-signals:

Signals API
123 changes: 123 additions & 0 deletions docs/topics/class-methods.rst
@@ -0,0 +1,123 @@
===========================
Class Factory Methods
===========================

Factory methods create an instance of the implementer class by
extracting the components it needs from the single argument the method receives.
Throughout Scrapy, the most common factory methods are ``from_crawler`` and ``from_settings``,
which each take one parameter: a crawler or a settings object, respectively.


The ``from_crawler`` class method is implemented in the following objects:

* ItemPipeline
* DownloaderMiddleware
* SpiderMiddleware
* Scheduler
* BaseScheduler
* Spider

The ``from_settings`` class method is implemented in the following objects:

* MailSender
* SpiderLoader


.. py:classmethod:: from_crawler(cls, crawler)

   Factory method that, if present, is used to create an instance of the
   implementer class using a :class:`~scrapy.crawler.Crawler`. It must
   return a new instance of the implementer class. The Crawler object
   provides access to all Scrapy core components, like settings and signals;
   it is a way for the implementer class to access them and hook its
   functionality into Scrapy.

   :param crawler: crawler that uses this component
   :type crawler: :class:`~scrapy.crawler.Crawler` object


.. py:classmethod:: from_settings(cls, settings)

   This class method is used by Scrapy to create an instance of the
   implementer class using the settings passed as its argument.
   This class method will not be called at all if ``from_crawler`` is defined.

   :param settings: project settings
   :type settings: :class:`~scrapy.settings.Settings` instance
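
For instance, a minimal sketch of the precedence rule, using a hypothetical
component that defines both methods; only ``from_crawler`` would be called
by Scrapy::

    class MyComponent:
        def __init__(self, timeout):
            self.timeout = timeout

        @classmethod
        def from_crawler(cls, crawler):
            # Preferred: the crawler also exposes signals, stats, etc.
            return cls(crawler.settings.getint('MYCOMPONENT_TIMEOUT', 180))

        @classmethod
        def from_settings(cls, settings):
            # Ignored here, because from_crawler is defined
            return cls(settings.getint('MYCOMPONENT_TIMEOUT', 180))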



Implementing Factory Methods
============================

The goal when implementing these factory methods should be: given the argument
passed in, whether a crawler, a settings object, or something else, create and
return a class instance. The main reason to pass the Crawler or Settings
object is the amount of information these objects hold, which can be used
when instantiating the class.

``Crawler`` specifically gives access to ``settings``, ``signals``, ``stats``, ``extensions``,
``engine``, and ``spider``, which may be very useful when instantiating a class.

For example, let's say we want to create a new spider; ``TestSpider`` will look like this::

    import scrapy
    from scrapy import signals
    from scrapy.exceptions import NotConfigured


    class TestSpider(scrapy.Spider):
        name = 'testspider'  # spiders need a name

        def __init__(self, ex1, ex2, ex3, name=None, **kwargs):
            super().__init__(name, **kwargs)
            self.extra_param1: str = ex1
            self.extra_param2: int = ex2
            self.extra_param3: bool = ex3
        # Other methods are omitted for the sake of the example

        @classmethod
        def from_crawler(cls, crawler, ex1, ex2, ex3):
            # Do some configuration if needed.
            # For example, first check that the spider should be enabled
            # and raise NotConfigured otherwise:
            if not crawler.settings.getbool('MYEXT_ENABLED'):
                raise NotConfigured
            # E.g. get the number of items from the settings
            item_count = crawler.settings.getint('MYEXT_ITEMCOUNT', 1000)

            # Instantiate the spider
            spider = cls(ex1, ex2, ex3)
            spider.item_count = item_count

            # Maybe connect the spider to signals (spider_opened is assumed
            # to be defined among the omitted methods)
            crawler.signals.connect(spider.spider_opened,
                                    signal=signals.spider_opened)

            # Validate some more settings
            my_settings_dict = crawler.settings.getdict('MYEXT_DICT')
            if 'some_key' not in my_settings_dict:
                raise NotConfigured('MYEXT_DICT must contain some_key')
            # ...
            # Do some more configuration if needed
            # ...
            # Finally, return the spider object
            return spider
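
A usage sketch: :meth:`~scrapy.crawler.CrawlerProcess.crawl` forwards its
extra arguments to ``from_crawler``, so the spider above could be started like
this (hypothetical setting values; assumes the omitted methods exist)::

    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings={
        'MYEXT_ENABLED': True,
        'MYEXT_DICT': {'some_key': 'some_value'},
    })
    process.crawl(TestSpider, 'value1', 2, True)  # ex1, ex2, ex3
    process.start()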

Similarly, when you want to extend a class that implements the ``from_settings`` method,
it will look similar to the following example.
Say you want to create a new sender, ``MyNewSender``::

    class MyNewSender:
        def __init__(self, is_enabled, send_at):
            self.is_enabled = is_enabled
            self.send_at = send_at
        # Some more methods...

        @classmethod
        def from_settings(cls, settings):
            # Get the values needed to instantiate the class from the settings object
            is_enabled = settings.getbool('MY_SENDER_ENABLED')
            send_at = settings.get('DATETIME_OF_SENDING')

            # ...
            # Maybe some more configuration
            # ...
            # Finally, return the sender object
            return cls(is_enabled, send_at)
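
A short usage sketch, assuming ``MY_SENDER_ENABLED`` and
``DATETIME_OF_SENDING`` are defined in the project settings::

    from scrapy.utils.project import get_project_settings

    sender = MyNewSender.from_settings(get_project_settings())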
12 changes: 3 additions & 9 deletions docs/topics/downloader-middleware.rst
@@ -163,16 +163,10 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
:param spider: the spider for which this request is intended
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)
:meth:`from_crawler`:

If present, this classmethod is called to create a middleware instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the middleware. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for middleware to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this middleware
:type crawler: :class:`~scrapy.crawler.Crawler` object
Class method that, if present, is used to create a middleware
instance using a :class:`~scrapy.crawler.Crawler`.
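
A minimal sketch of such a middleware, using a hypothetical setting name::

    from scrapy.exceptions import NotConfigured

    class MyProxyMiddleware:
        def __init__(self, proxy_url):
            self.proxy_url = proxy_url

        @classmethod
        def from_crawler(cls, crawler):
            proxy_url = crawler.settings.get('MY_PROXY_URL')
            if not proxy_url:
                raise NotConfigured
            return cls(proxy_url)

        def process_request(self, request, spider):
            # Returning None lets the request continue through the chain
            request.meta['proxy'] = self.proxy_url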

.. _topics-downloader-middleware-ref:

14 changes: 5 additions & 9 deletions docs/topics/email.rst
@@ -67,14 +67,6 @@ rest of the framework.
:param smtpssl: enforce using a secure SSL connection
:type smtpssl: bool

.. classmethod:: from_settings(settings)

Instantiate using a Scrapy settings object, which will respect
:ref:`these Scrapy settings <topics-email-settings>`.

:param settings: the e-mail recipients
:type settings: :class:`scrapy.settings.Settings` object

.. method:: send(to, subject, body, cc=None, attachs=(), mimetype='text/plain', charset=None)

Send email to the given recipients.
@@ -102,8 +94,12 @@ rest of the framework.
:type mimetype: str

:param charset: the character encoding to use for the e-mail contents
:type charset: str
:type charset: str

:meth:`from_settings`:

Instantiate a MailSender using a Scrapy settings object, which will respect
:ref:`these Scrapy settings <topics-email-settings>`.
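
For example, a minimal sketch using the project settings::

    from scrapy.mail import MailSender
    from scrapy.utils.project import get_project_settings

    mailer = MailSender.from_settings(get_project_settings())
    mailer.send(to=['someone@example.com'], subject='Some subject',
                body='Some body')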

.. _topics-email-settings:

12 changes: 3 additions & 9 deletions docs/topics/item-pipeline.rst
@@ -60,16 +60,10 @@ Additionally, they may also implement the following methods:
:param spider: the spider which was closed
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)
:meth:`from_crawler`:

If present, this classmethod is called to create a pipeline instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the pipeline. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for pipeline to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this pipeline
:type crawler: :class:`~scrapy.crawler.Crawler` object
Class method that, if present, is used to create a pipeline
instance using a :class:`~scrapy.crawler.Crawler`.
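
A minimal sketch of such a pipeline, using a hypothetical setting name::

    from scrapy.exceptions import DropItem

    class DropLongNamesPipeline:
        def __init__(self, max_length):
            self.max_length = max_length

        @classmethod
        def from_crawler(cls, crawler):
            return cls(max_length=crawler.settings.getint('MAX_NAME_LENGTH', 100))

        def process_item(self, item, spider):
            if len(item.get('name', '')) > self.max_length:
                raise DropItem('name too long')
            return item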


Item pipeline example
14 changes: 4 additions & 10 deletions docs/topics/spider-middleware.rst
@@ -169,16 +169,10 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
:param spider: the spider to whom the start requests belong
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)

If present, this classmethod is called to create a middleware instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the middleware. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for middleware to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this middleware
:type crawler: :class:`~scrapy.crawler.Crawler` object
:meth:`from_crawler`:

Class method that, if present, is used to create a middleware
instance using a :class:`~scrapy.crawler.Crawler`.
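
A minimal sketch of such a middleware, connecting to a signal through the
crawler::

    from scrapy import signals

    class SpiderOpenedLogMiddleware:
        @classmethod
        def from_crawler(cls, crawler):
            mw = cls()
            crawler.signals.connect(mw.spider_opened, signal=signals.spider_opened)
            return mw

        def spider_opened(self, spider):
            spider.logger.info('Spider opened: %s', spider.name)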

.. _topics-spider-middleware-ref:

2 changes: 1 addition & 1 deletion scrapy/downloadermiddlewares/cookies.py
@@ -183,4 +183,4 @@ def process_response(self, request, response, spider):

spider.set_cookie_jar(jar)

return response
return response
4 changes: 1 addition & 3 deletions scrapy/spiders/__init__.py
@@ -1,6 +1,5 @@
"""
Base class for Scrapy spiders
See documentation in docs/topics/spiders.rst
"""
import logging
@@ -38,7 +37,6 @@ def logger(self):

def log(self, message, level=logging.DEBUG, **kw):
"""Log the given message at the given log level
This helper wraps a log call to the logger within the spider, but you
can use it directly (e.g. Spider.logger.info('msg')) or use any other
Python logger too.
@@ -126,4 +124,4 @@ def clear_cookies(self):
# Top-level imports
from scrapy.spiders.crawl import CrawlSpider, Rule
from scrapy.spiders.feed import XMLFeedSpider, CSVFeedSpider
from scrapy.spiders.sitemap import SitemapSpider
from scrapy.spiders.sitemap import SitemapSpider
2 changes: 1 addition & 1 deletion tests/test_downloadermiddleware_cookies.py
@@ -4,7 +4,7 @@

import pytest

from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware, AccessCookiesMiddleware
from scrapy.downloadermiddlewares.defaultheaders import DefaultHeadersMiddleware
from scrapy.downloadermiddlewares.redirect import RedirectMiddleware
from scrapy.exceptions import NotConfigured
