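##### User Agent, abbreviated UA, is a special string header that lets a server identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plugins, and so on.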

1. There are two kinds of middleware. According to the Scrapy architecture diagram, a middleware can process responses as well as requests:
    * SPIDER_MIDDLEWARES
    * DOWNLOADER_MIDDLEWARES
2. Before a middleware can be used, it has to be configured in settings.py
```
SPIDER_MIDDLEWARES = {
    'ArticleSpider.middlewares.ArticlespiderSpiderMiddleware': 543,
}
DOWNLOADER_MIDDLEWARES = {
    'ArticleSpider.middlewares.ArticlespiderDownloaderMiddleware': 543,
}
```
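    * These settings work like the pipeline settings covered earlier: the number sets the execution order, and a smaller number runs earlier. For downloader middlewares this is the order in which process_request is called; process_response is called in the reverse order
    * You can define your own middleware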
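
To make the ordering rule concrete, here is a minimal pure-Python sketch (it does not import Scrapy) of how order numbers translate into call order for downloader middlewares. The middleware names `A`, `B`, `Disabled` and the helper `call_order` are hypothetical and exist only for illustration; the sketch models the rule that process_request runs in ascending order and process_response in descending order.

```python
def call_order(middlewares):
    """Return the (request_order, response_order) implied by order numbers.

    `middlewares` maps a middleware name to its order number, or to None
    when the middleware is disabled, mirroring the settings dict format.
    """
    # Disabled entries (value None) are dropped before sorting.
    enabled = {name: num for name, num in middlewares.items() if num is not None}
    # Requests travel from the engine to the downloader: ascending numbers.
    request_order = sorted(enabled, key=enabled.get)
    # Responses travel back the other way: descending numbers.
    response_order = list(reversed(request_order))
    return request_order, response_order


if __name__ == "__main__":
    req, resp = call_order({"A": 100, "B": 543, "Disabled": None})
    print("process_request order:", req)
    print("process_response order:", resp)
```

The same dict shape is used in settings.py, which is why setting a value to None is enough to switch a middleware off.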
3. How a middleware is written
    * Scrapy ships with a middleware for the User-Agent (useragent.py), which can serve as a reference when writing your own middleware
    ```
    from scrapy import signals


    class UserAgentMiddleware(object):
        """This middleware allows spiders to override the user_agent"""

        def __init__(self, user_agent='Scrapy'):
            self.user_agent = user_agent

        @classmethod
        def from_crawler(cls, crawler):
            # Build the middleware from the USER_AGENT setting
            o = cls(crawler.settings['USER_AGENT'])
            crawler.signals.connect(o.spider_opened, signal=signals.spider_opened)
            return o

        def spider_opened(self, spider):
            # A spider-level user_agent attribute overrides the setting
            self.user_agent = getattr(spider, 'user_agent', self.user_agent)

        def process_request(self, request, spider):
            if self.user_agent:
                # setdefault only writes the header when it is not already set
                request.headers.setdefault(b'User-Agent', self.user_agent)
    ```
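    * The def from_crawler(cls, crawler) seen in earlier notes appears again, this time defined on a middleware. Here from_crawler reads USER_AGENT from the crawler's settings, so the User-Agent can be set through USER_AGENT in settings.py. If settings.py does not define it, Scrapy's own default for USER_AGENT (a string of the form 'Scrapy/VERSION (+https://scrapy.org)') is used; the constructor default user_agent='Scrapy' only applies when the class is instantiated without an argument
    * def process_request(self, request, spider) is a very important method with a fixed signature. Requests are processed in this method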
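    * When configuring DOWNLOADER_MIDDLEWARES, remember to either set the built-in middleware to None in the dict or adjust the order numbers so that the custom middleware takes effect. The built-in middleware fills in the User-Agent header from the USER_AGENT setting, and both it and the custom code in these notes use headers.setdefault, which only writes the header when it is absent. Whichever middleware's process_request runs first therefore wins. Setting the built-in one to None is the simplest option; otherwise give the custom middleware a smaller number than the built-in one (500 in recent Scrapy versions), or assign the header directly instead of using setdefault
        * The built-in middleware is disabled with: 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None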
    * The main methods of a downloader middleware are:
        * process_response(request, response, spider)
        * process_request(request, spider)
        * process_exception(request, exception, spider)
        * from_crawler(cls, crawler)
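
The four methods listed above can be put together in one skeleton. This is a minimal sketch, not Scrapy's own code: the class name `SketchDownloaderMiddleware` is hypothetical, and it deliberately avoids importing Scrapy so it can be read and run on its own. The comments describe the return values Scrapy expects from each hook.

```python
class SketchDownloaderMiddleware:
    """Hypothetical middleware showing the four hooks and their return values."""

    def __init__(self, user_agent):
        self.user_agent = user_agent

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory to build the middleware; settings are read here.
        return cls(crawler.settings.get("USER_AGENT"))

    def process_request(self, request, spider):
        # Called for each outgoing request. setdefault only writes the
        # header when it is not already present.
        request.headers.setdefault("User-Agent", self.user_agent)
        return None  # None lets the request continue down the chain

    def process_response(self, request, response, spider):
        # Called for each response on its way back to the engine; it must
        # return a response (or a new request to be rescheduled).
        return response

    def process_exception(self, request, exception, spider):
        # Called when downloading raised an exception; None defers to the
        # other middlewares and to the default error handling.
        return None
```

Only the hooks a middleware actually needs have to be defined; the User-Agent examples in these notes use just from_crawler and process_request.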
4. A custom middleware (**the code below only shows how a custom middleware is written; its User-Agent handling can still be improved**)
```
import random


class RandomUserAgentMiddlware(object):
    # Pick a random User-Agent for each request
    def __init__(self, crawler):
        super(RandomUserAgentMiddlware, self).__init__()
        # Scrapy only loads upper-case names from settings.py,
        # so the list is read from USER_AGENT_LIST
        self.user_agent_list = crawler.settings.getlist("USER_AGENT_LIST", [])

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        # Leave the header untouched when no list is configured
        if self.user_agent_list:
            request.headers.setdefault('User-Agent', random.choice(self.user_agent_list))
```
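
A middleware only runs once it is registered. Below is a sketch of the matching settings.py entries, assuming the class lives in ArticleSpider/middlewares.py like the generated classes shown earlier. The module path, the order number 543, and the sample strings are illustrative assumptions. The list is named USER_AGENT_LIST because Scrapy only loads upper-case names from a settings module.

```python
# settings.py (fragment)

# Candidate User-Agent strings; sample values, replace with your own list.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

DOWNLOADER_MIDDLEWARES = {
    # Register the custom middleware (assumed module path).
    "ArticleSpider.middlewares.RandomUserAgentMiddlware": 543,
    # Disable the built-in middleware so it cannot set the header first.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```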
5. Improving the User-Agent middleware
    * There is an open-source project on GitHub, 'fake-useragent', dedicated to supplying random User-Agent strings
```
from fake_useragent import UserAgent


class RandomUserAgentMiddlware(object):
    # Pick a random User-Agent for each request
    def __init__(self, crawler):
        super(RandomUserAgentMiddlware, self).__init__()
        self.ua = UserAgent()
        # Set RANDOM_UA_TYPE = "random" in settings.py ("random" is also the fallback)
        self.ua_type = crawler.settings.get("RANDOM_UA_TYPE", "random")

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        def get_ua():
            return getattr(self.ua, self.ua_type)

        request.headers.setdefault('User-Agent', get_ua())
```
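
The getattr(self.ua, self.ua_type) call looks up an attribute by name, so the RANDOM_UA_TYPE setting decides which property of the UserAgent object is read (for example random or chrome). The sketch below shows that mechanism with a hypothetical stand-in class, `StubUserAgent`, so it runs without the library or network access. The helper `pick_user_agent` is also hypothetical, and the returned strings are placeholders rather than real User-Agent values.

```python
class StubUserAgent:
    """Hypothetical stand-in for fake_useragent.UserAgent."""

    @property
    def random(self):
        return "placeholder-random-ua"

    @property
    def chrome(self):
        return "placeholder-chrome-ua"


def pick_user_agent(ua, ua_type="random"):
    # getattr(ua, "chrome") is equivalent to ua.chrome; the attribute name
    # comes from configuration instead of being hard-coded.
    return getattr(ua, ua_type)


if __name__ == "__main__":
    ua = StubUserAgent()
    print(pick_user_agent(ua))
    print(pick_user_agent(ua, "chrome"))
```

Reading the attribute name from settings means the crawl can be switched from fully random strings to a single browser family without touching the middleware code.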