
Commit

to cookies
farsene committed Apr 4, 2022
1 parent ab063fd commit b60cb1e
Showing 10 changed files with 152 additions and 52 deletions.
4 changes: 4 additions & 0 deletions docs/index.rst
@@ -170,6 +170,7 @@ Solving specific problems
topics/jobs
topics/coroutines
topics/asyncio
topics/class-methods

:doc:`faq`
Get answers to most frequently asked questions.
@@ -216,6 +217,9 @@ Solving specific problems
:doc:`topics/asyncio`
Use :mod:`asyncio` and :mod:`asyncio`-powered libraries.

:doc:`topics/class-methods`
Instantiate Scrapy objects from the crawler or from settings.

.. _extending-scrapy:

Extending Scrapy
17 changes: 7 additions & 10 deletions docs/topics/api.rst
@@ -167,16 +167,6 @@ SpiderLoader API
the :class:`scrapy.interfaces.ISpiderLoader` interface to guarantee an
errorless execution.

.. method:: from_settings(settings)

This class method is used by Scrapy to create an instance of the class.
It's called with the current project settings, and it loads the spiders
found recursively in the modules of the :setting:`SPIDER_MODULES`
setting.

:param settings: project settings
:type settings: :class:`~scrapy.settings.Settings` instance

.. method:: load(spider_name)

Get the Spider class with the given name. It'll look into the previously
@@ -198,6 +188,13 @@ SpiderLoader API
:param request: queried request
:type request: :class:`~scrapy.Request` instance

:meth:`from_settings`:

This class method is used by Scrapy to create an instance of the class.
It's called with the current project settings, and it loads the spiders
found recursively in the modules of the :setting:`SPIDER_MODULES`
setting.
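
For example, a minimal usage sketch, assuming it runs inside a project whose
:setting:`SPIDER_MODULES` define a spider named ``example_spider`` (a
hypothetical name)::

    from scrapy.spiderloader import SpiderLoader
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    loader = SpiderLoader.from_settings(settings)
    print(loader.list())  # all spider names found in SPIDER_MODULES
    spider_cls = loader.load('example_spider')  # the Spider class with that name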

.. _topics-api-signals:

Signals API
123 changes: 123 additions & 0 deletions docs/topics/class-methods.rst
@@ -0,0 +1,123 @@
===========================
Class Factory Methods
===========================

Factory methods create an instance of the implementer class by
extracting the components it needs from the single argument the method receives.
Throughout Scrapy, the most common factory methods are ``from_crawler`` and ``from_settings``,
which each take one parameter: a crawler or a settings object, respectively.


The ``from_crawler`` class method is implemented in the following objects:

* ItemPipeline
* DownloaderMiddleware
* SpiderMiddleware
* Scheduler
* BaseScheduler
* Spider

The ``from_settings`` class method is implemented in the following objects:

* MailSender
* SpiderLoader


.. py:classmethod:: from_crawler(cls, crawler)

   Factory method that, if present, is used to create an instance of the
   implementer class using a :class:`~scrapy.crawler.Crawler`. It must
   return a new instance of the implementer class. The Crawler object
   provides access to all Scrapy core components, like settings and signals;
   it is a way for the implementer class to access them and hook its
   functionality into Scrapy.

   :param crawler: crawler that uses this component
   :type crawler: :class:`~scrapy.crawler.Crawler` object


.. py:classmethod:: from_settings(cls, settings)

   This class method is used by Scrapy to create an instance of the
   implementer class using the settings passed as its argument.
   This class method will not be called at all if ``from_crawler`` is defined.

   :param settings: project settings
   :type settings: :class:`~scrapy.settings.Settings` instance
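
For instance, a minimal sketch of the precedence rule, using a hypothetical
component that defines both methods; only ``from_crawler`` would be called
by Scrapy::

    class MyComponent:
        def __init__(self, timeout):
            self.timeout = timeout

        @classmethod
        def from_crawler(cls, crawler):
            # Preferred: the crawler also exposes signals, stats, etc.
            return cls(crawler.settings.getint('MYCOMPONENT_TIMEOUT', 180))

        @classmethod
        def from_settings(cls, settings):
            # Ignored here, because from_crawler is defined
            return cls(settings.getint('MYCOMPONENT_TIMEOUT', 180))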



Implementing Factory Methods
============================

The goal when implementing these factory methods should be: given the argument
passed in, whether a crawler, a settings object, or something else, create and
return a class instance. The main reason to pass the Crawler or Settings
object is the amount of information these objects hold, which can be used
when instantiating the class.

``Crawler`` specifically gives access to ``settings``, ``signals``, ``stats``, ``extensions``,
``engine``, and ``spider``, which may be very useful when instantiating a class.

For example, let's say we want to create a new spider; ``TestSpider`` will look like this::

    import scrapy
    from scrapy import signals
    from scrapy.exceptions import NotConfigured


    class TestSpider(scrapy.Spider):
        name = 'testspider'  # spiders need a name

        def __init__(self, ex1, ex2, ex3, name=None, **kwargs):
            super().__init__(name, **kwargs)
            self.extra_param1: str = ex1
            self.extra_param2: int = ex2
            self.extra_param3: bool = ex3
        # Other methods are omitted for the sake of the example

        @classmethod
        def from_crawler(cls, crawler, ex1, ex2, ex3):
            # Do some configuration if needed.
            # For example, first check that the spider should be enabled
            # and raise NotConfigured otherwise:
            if not crawler.settings.getbool('MYEXT_ENABLED'):
                raise NotConfigured
            # E.g. get the number of items from the settings
            item_count = crawler.settings.getint('MYEXT_ITEMCOUNT', 1000)

            # Instantiate the spider
            spider = cls(ex1, ex2, ex3)
            spider.item_count = item_count

            # Maybe connect the spider to signals (spider_opened is assumed
            # to be defined among the omitted methods)
            crawler.signals.connect(spider.spider_opened,
                                    signal=signals.spider_opened)

            # Validate some more settings
            my_settings_dict = crawler.settings.getdict('MYEXT_DICT')
            if 'some_key' not in my_settings_dict:
                raise NotConfigured('MYEXT_DICT must contain some_key')
            # ...
            # Do some more configuration if needed
            # ...
            # Finally, return the spider object
            return spider
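
A usage sketch: :meth:`~scrapy.crawler.CrawlerProcess.crawl` forwards its
extra arguments to ``from_crawler``, so the spider above could be started like
this (hypothetical setting values; assumes the omitted methods exist)::

    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings={
        'MYEXT_ENABLED': True,
        'MYEXT_DICT': {'some_key': 'some_value'},
    })
    process.crawl(TestSpider, 'value1', 2, True)  # ex1, ex2, ex3
    process.start()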

Similarly, when you want to extend a class that implements the ``from_settings`` method,
it will look similar to the following example.
Say you want to create a new sender, ``MyNewSender``::

    class MyNewSender:
        def __init__(self, is_enabled, send_at):
            self.is_enabled = is_enabled
            self.send_at = send_at
        # Some more methods...

        @classmethod
        def from_settings(cls, settings):
            # Get the values needed to instantiate the class from the settings object
            is_enabled = settings.getbool('MY_SENDER_ENABLED')
            send_at = settings.get('DATETIME_OF_SENDING')

            # ...
            # Maybe some more configuration
            # ...
            # Finally, return the sender object
            return cls(is_enabled, send_at)
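
A short usage sketch, assuming ``MY_SENDER_ENABLED`` and
``DATETIME_OF_SENDING`` are defined in the project settings::

    from scrapy.utils.project import get_project_settings

    sender = MyNewSender.from_settings(get_project_settings())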
12 changes: 3 additions & 9 deletions docs/topics/downloader-middleware.rst
@@ -163,16 +163,10 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
:param spider: the spider for which this request is intended
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)
:meth:`from_crawler`:

If present, this classmethod is called to create a middleware instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the middleware. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for middleware to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this middleware
:type crawler: :class:`~scrapy.crawler.Crawler` object
Class method that, if present, is used to create a middleware
instance using a :class:`~scrapy.crawler.Crawler`.
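
A minimal sketch of such a middleware, using a hypothetical setting name::

    from scrapy.exceptions import NotConfigured

    class MyProxyMiddleware:
        def __init__(self, proxy_url):
            self.proxy_url = proxy_url

        @classmethod
        def from_crawler(cls, crawler):
            proxy_url = crawler.settings.get('MY_PROXY_URL')
            if not proxy_url:
                raise NotConfigured
            return cls(proxy_url)

        def process_request(self, request, spider):
            # Returning None lets the request continue through the chain
            request.meta['proxy'] = self.proxy_url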

.. _topics-downloader-middleware-ref:

14 changes: 5 additions & 9 deletions docs/topics/email.rst
@@ -67,14 +67,6 @@ rest of the framework.
:param smtpssl: enforce using a secure SSL connection
:type smtpssl: bool

.. classmethod:: from_settings(settings)

Instantiate using a Scrapy settings object, which will respect
:ref:`these Scrapy settings <topics-email-settings>`.

:param settings: the e-mail recipients
:type settings: :class:`scrapy.settings.Settings` object

.. method:: send(to, subject, body, cc=None, attachs=(), mimetype='text/plain', charset=None)

Send email to the given recipients.
@@ -102,8 +94,12 @@ rest of the framework.
:type mimetype: str

:param charset: the character encoding to use for the e-mail contents
:type charset: str
:type charset: str

:meth:`from_settings`:

Instantiate a MailSender using a Scrapy settings object, which will respect
:ref:`these Scrapy settings <topics-email-settings>`.
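
For example, a minimal sketch using the project settings::

    from scrapy.mail import MailSender
    from scrapy.utils.project import get_project_settings

    mailer = MailSender.from_settings(get_project_settings())
    mailer.send(to=['someone@example.com'], subject='Some subject',
                body='Some body')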

.. _topics-email-settings:

12 changes: 3 additions & 9 deletions docs/topics/item-pipeline.rst
@@ -60,16 +60,10 @@ Additionally, they may also implement the following methods:
:param spider: the spider which was closed
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)
:meth:`from_crawler`:

If present, this classmethod is called to create a pipeline instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the pipeline. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for pipeline to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this pipeline
:type crawler: :class:`~scrapy.crawler.Crawler` object
Class method that, if present, is used to create a pipeline
instance using a :class:`~scrapy.crawler.Crawler`.
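
A minimal sketch of such a pipeline, using a hypothetical setting name::

    from scrapy.exceptions import DropItem

    class DropLongNamesPipeline:
        def __init__(self, max_length):
            self.max_length = max_length

        @classmethod
        def from_crawler(cls, crawler):
            return cls(max_length=crawler.settings.getint('MAX_NAME_LENGTH', 100))

        def process_item(self, item, spider):
            if len(item.get('name', '')) > self.max_length:
                raise DropItem('name too long')
            return item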


Item pipeline example
14 changes: 4 additions & 10 deletions docs/topics/spider-middleware.rst
@@ -169,16 +169,10 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
:param spider: the spider to whom the start requests belong
:type spider: :class:`~scrapy.Spider` object

.. method:: from_crawler(cls, crawler)

If present, this classmethod is called to create a middleware instance
from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
of the middleware. Crawler object provides access to all Scrapy core
components like settings and signals; it is a way for middleware to
access them and hook its functionality into Scrapy.

:param crawler: crawler that uses this middleware
:type crawler: :class:`~scrapy.crawler.Crawler` object
:meth:`from_crawler`:

Class method that, if present, is used to create a middleware
instance using a :class:`~scrapy.crawler.Crawler`.
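
A minimal sketch of such a middleware, connecting to a signal through the
crawler::

    from scrapy import signals

    class SpiderOpenedLogMiddleware:
        @classmethod
        def from_crawler(cls, crawler):
            mw = cls()
            crawler.signals.connect(mw.spider_opened, signal=signals.spider_opened)
            return mw

        def spider_opened(self, spider):
            spider.logger.info('Spider opened: %s', spider.name)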

.. _topics-spider-middleware-ref:

2 changes: 1 addition & 1 deletion scrapy/downloadermiddlewares/cookies.py
@@ -183,4 +183,4 @@ def process_response(self, request, response, spider):

spider.set_cookie_jar(jar)

return response
return response
4 changes: 1 addition & 3 deletions scrapy/spiders/__init__.py
@@ -1,6 +1,5 @@
"""
Base class for Scrapy spiders
See documentation in docs/topics/spiders.rst
"""
import logging
@@ -38,7 +37,6 @@ def logger(self):

def log(self, message, level=logging.DEBUG, **kw):
"""Log the given message at the given log level
This helper wraps a log call to the logger within the spider, but you
can use it directly (e.g. Spider.logger.info('msg')) or use any other
Python logger too.
@@ -126,4 +124,4 @@ def clear_cookies(self):
# Top-level imports
from scrapy.spiders.crawl import CrawlSpider, Rule
from scrapy.spiders.feed import XMLFeedSpider, CSVFeedSpider
from scrapy.spiders.sitemap import SitemapSpider
from scrapy.spiders.sitemap import SitemapSpider
2 changes: 1 addition & 1 deletion tests/test_downloadermiddleware_cookies.py
@@ -4,7 +4,7 @@

import pytest

from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware, AccessCookiesMiddleware
from scrapy.downloadermiddlewares.defaultheaders import DefaultHeadersMiddleware
from scrapy.downloadermiddlewares.redirect import RedirectMiddleware
from scrapy.exceptions import NotConfigured
