# __Find Specific Words On Web Pages With Scrapy__

- Scrapy is a fast, high-level web crawling and web scraping framework written in Python, used to crawl websites and extract structured data from their pages.  
- It can be used for a wide range of purposes, from data mining to monitoring and automated testing (see https://docs.scrapy.org/en/latest/).

#### __Its uses:__
- Scrapy can be used to find specific words on web pages and return the URL of each page where they occur.
- It can be used to collect specific information about products across multiple pages.

#### __In this example:__
- Let's say we want to search FlyerTalk articles for information on booking hotels by phone.
- The following code crawls the pages of the site and prints the word and the page URL every time "phone", "hotel", "reservation" or "booked" occurs on a page.
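
The matching step is a plain, case-sensitive substring search. A minimal sketch on a made-up sentence (the sample text is an assumption for illustration, not taken from FlyerTalk) shows what that implies: "smartphone" counts as a hit for "phone", while the capitalised "Hotel" is not a hit for "hotel".

```python
import re


def find_all_substrings(string, sub):
    """Return the start index of every case-sensitive occurrence of sub."""
    return [match.start() for match in re.finditer(re.escape(sub), string)]


# Hypothetical sample text, used only to illustrate the matching behaviour.
sample = (
    "Hotel staff said I booked by phone. "
    "The hotel confirmed my reservation on my smartphone."
)
wordlist = ["phone", "hotel", "reservation", "booked"]

# One count per word: the number of positions where the word starts.
counts = {word: len(find_all_substrings(sample, word)) for word in wordlist}
print(counts)
# {'phone': 2, 'hotel': 1, 'reservation': 1, 'booked': 1}
```

If whole-word or case-insensitive matching is wanted instead, the pattern passed to `re.finditer` is the place to change it.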

### _Running the code in Anaconda Prompt_
- 1) scrapy startproject wordlist_scraper
- 2) cd wordlist_scraper
- 3) __scrapy crawl webcrawler > hotel_phone_project.csv__

In [1]:
import re

from scrapy.item import Item
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


def find_all_substrings(string, sub):
    """Return the start index of every (case-sensitive) occurrence of sub in string."""
    return [match.start() for match in re.finditer(re.escape(sub), string)]


class WebsiteSpider(CrawlSpider):

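    name = "webcrawler"
    allowed_domains = ["www.flyertalk.com"]  ##### Change the website ########
    start_urls = ["https://www.flyertalk.com"]  ##### Change the website #####

    # Follow every extracted link (offsite links are filtered by allowed_domains)
    # and run check_buzzwords on each response.
    rules = [Rule(LinkExtractor(), follow=True, callback="check_buzzwords")]

    # Class-level counters shared by all callbacks.
    crawl_count = 0
    words_found = 0
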
    def check_buzzwords(self, response):
        """Print one "word;url;" line for every occurrence of a listed word on the page."""
        self.__class__.crawl_count += 1

        ##### Change the words  ##########
        wordlist = ["phone", "hotel", "reservation", "booked"]

        url = response.url
        # Decode leniently so that a page which is not valid UTF-8
        # does not raise an error inside the callback.
        data = response.body.decode("utf-8", errors="replace")

        for word in wordlist:
            for _pos in find_all_substrings(data, word):
                self.__class__.words_found += 1
                print(word + ";" + url + ";")

        return Item()

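    def _requests_to_follow(self, response):
        # Only extract links from responses that carry a text encoding;
        # binary responses (images, PDFs, ...) have no "encoding" attribute.
        if getattr(response, "encoding", None) is not None:
            return CrawlSpider._requests_to_follow(self, response)
        return []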


### Run the above code
- The code above crawls the FlyerTalk website and prints a page's URL whenever the page contains one of the words in wordlist = ["phone", "hotel", "reservation", "booked"].
- The run shown here was interrupted manually to keep the example output short.
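
Instead of interrupting the crawl by hand, Scrapy's `CLOSESPIDER_PAGECOUNT` setting can stop it after a fixed number of responses. The sketch below is one possible setup, not the configuration used for this run: the helper names `build_settings` and `run_limited_crawl` and the limit of 50 pages are assumptions chosen for illustration.

```python
def build_settings(page_limit=50):
    """Return crawl settings that close the spider after page_limit responses.

    The default of 50 is an arbitrary, assumed value.
    """
    return {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        "CLOSESPIDER_PAGECOUNT": page_limit,  # stop after this many crawled pages
        "LOG_LEVEL": "INFO",  # hide the per-request DEBUG lines
    }


def run_limited_crawl(spider_cls, page_limit=50):
    """Run spider_cls in this process until the page limit is reached."""
    # Imported lazily so the settings helper works without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=build_settings(page_limit))
    process.crawl(spider_cls)
    process.start()  # blocks until the spider closes


settings = build_settings(page_limit=10)
```

With the spider class from the first cell, `run_limited_crawl(WebsiteSpider, page_limit=10)` would be the call. Raising `LOG_LEVEL` to `INFO` also keeps the `word;url;` lines from being buried in request logs.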

In [2]:
from scrapy.crawler import CrawlerProcess

# Run the spider defined above in this process with a browser-like user agent.
process = CrawlerProcess(settings={'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'})

process.crawl(WebsiteSpider)
process.start()  # blocks until the crawl finishes or is interrupted

2019-07-31 16:40:49 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scrapybot)
2019-07-31 16:40:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.1, Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.1a  20 Nov 2018), cryptography 2.4.2, Platform Windows-10-10.0.17134-SP0
2019-07-31 16:40:49 [scrapy.crawler] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2019-07-31 16:40:49 [scrapy.extensions.telnet] INFO: Telnet Password: c6458e47474f4fa5
2019-07-31 16:40:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-07-31 16:40:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeo

phone;https://www.flyertalk.com;
phone;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
phone;https://www.flyertalk.com/help/rules.php;
hotel;https://www.flyertalk.com/help/rules.php;


2019-07-31 16:40:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.flyertalk.com/forum/information-desk/1520497-start-here-how-use-information-desk.html> from <GET https://www.flyertalk.com/forum/information-desk/1520497-welcome-what-information-desk-forum.html>
2019-07-31 16:40:52 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'gallery.flyertalk.com': <GET http://gallery.flyertalk.com/gallery/>
2019-07-31 16:40:52 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'ad.doubleclick.net': <GET https://ad.doubleclick.net/jump/ft_hp;kw=top;sect=hp;tile=1;sz=728x90;ord=123456789>
2019-07-31 16:40:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/forum/lounge_connect.php> (referer: https://www.flyertalk.com)
2019-07-31 16:40:52 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.frugaltravelguy.com': <GET http://www.frugaltravelguy.com>
2019-07-31 16:40:52 [scrapy.

hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/the-lobby/;
hotel;https://www.flyertalk.com/

2019-07-31 16:40:54 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.flyertalk.com/forum/flyertalk-cares-688/> (referer: None)
Traceback (most recent call last):
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\

hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/awards/;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;
hotel;https://www.flyertalk.com/bookclub.php;


2019-07-31 16:40:55 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.amazon.com': <GET http://www.amazon.com/exec/obidos/ASIN/0385722370/webflyer-20>
2019-07-31 16:40:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.flyertalk.com/forum/premium_subscription.php> from <GET https://www.flyertalk.com/forum/claim_subscription.php>
2019-07-31 16:40:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.flyertalk.com/articles> from <GET https://www.flyertalk.com/story>
2019-07-31 16:40:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.flyertalk.com/articles/author/%7B%7B%20oPost.post_user_nicename%20%7D%7D> (referer: https://www.flyertalk.com)
2019-07-31 16:40:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.internetbrands.com/travel/advertise/> from <GET http://www.internetbrands.com/travel/advertise/>
2019-07-31 16:40:56 [scrapy.spidermiddlewares

hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2012/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;
hotel;https://www.flyertalk.com/awards/2013/;


2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/awards/2015/> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:40:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.flyertalk.com/forum/ftchat.php> from <GET https://www.flyertalk.com/chat/chatlog.php>


hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2014/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;
hotel;https://www.flyertalk.com/awards/2015/;


2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/the-lobby/page/8> (referer: https://www.flyertalk.com/the-lobby/)
2019-07-31 16:40:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.internetbrands.com/travel/contact/> from <GET https://www.internetbrands.com/travel/advertise/>
2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/awards/2016/> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/awards/2017/> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:40:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.flyertalk.com/articles/category/news> from <GET https://www.flyertalk.com/articles>
2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/awards/2018/> (referer: https://www.flyertalk.com/award

hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;
hotel;https://www.flyertalk.com/the-lobby/page/8;


2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/awards/2019/> (referer: https://www.flyertalk.com/awards/)


hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2018/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;
hotel;https://www.flyertalk.com/awards/2019/;


2019-07-31 16:40:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/acl/> (referer: https://www.flyertalk.com/help/rules.php)
2019-07-31 16:40:59 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force 
2019-07-31 16:40:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:40:59 [scrapy.core.engine] INFO: Closing spider (shutdown)
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.ksat.com': <GET https://www.ksat.com/news/monkey-on-the-loose-at-san-antonio-airport>
2019-07-31 16:40:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.internetbrands.com/travel/contact/> (referer: None)


phone;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
phone;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.flyertalk.com/articles/monkey-named-dawkins-goes-on-two-hour-chase-at-sat.html;
hotel;https://www.fl

2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.fodors.com': <GET https://www.fodors.com>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'wikitravel.org': <GET https://wikitravel.org/>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.thehulltruth.com': <GET https://www.thehulltruth.com/>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.offshoreonly.com': <GET https://www.offshoreonly.com/>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.rctech.net': <GET https://www.rctech.net/>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.rcuniverse.com': <GET http://www.rcuniverse.com/>
2019-07-31 16:40:59 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.bikeforums.net': <GET https://www.bikeforums.ne

phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
phone;https://www.flyertalk.com/articles/a-boeing-787-captain-gives-tip-on-how-to-survive-if-your-plane-goes-down.html;
hotel;https://www.flyertalk.com/articles

2019-07-31 16:41:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/forum/external.php?type=RSS2> (referer: None)
2019-07-31 16:41:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:41:02 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.flyertalk.com/forum/forumdisplay.php?f=605> from <GET http://www.flyertalk.com/forum/forumdisplay.php?f=605>
2019-07-31 16:41:02 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.flyertalk.com/forum/external.php?type=RSS2> (referer: None)
Traceback (most recent call last):
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\robert.lowe\AppData\Local\Continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.p

phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
phone;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
hotel;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
hotel;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
hotel;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;
hotel;https://www.flyertalk.com/articles/pilot-takes-to-twitter-over-sexist-comments.html;

2019-07-31 16:41:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html> (referer: https://www.flyertalk.com/awards/)
2019-07-31 16:41:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/the-lobby/author/wendywilliams> (referer: https://www.flyertalk.com/the-lobby/page/8)
2019-07-31 16:41:03 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.businessinsider.in': <GET https://www.businessinsider.in/Chinas-social-credit-system-has-blocked-people-from-taking-11-million-flights-and-4-million-train-trips/articleshow/64255175.cms>
2019-07-31 16:41:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/the-gate/> (referer: https://www.flyertalk.com/help/rules.php)


phone;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
phone;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.flyertalk.com/articles/chinese-social-credit-blocks-11-14-million-flights.html;
hotel;https://www.fl

2019-07-31 16:41:04 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'feeds.flyertalk.com': <GET http://feeds.flyertalk.com/flyertalk>
2019-07-31 16:41:04 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'thetransnationalpost.com': <GET http://thetransnationalpost.com/blog/2016/12/10/flying-rights-granted-to-subsidiary-of-norwegian-air-shuttle-but-what-exactly-does-that-mean/>


hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
hotel;https://www.flyertalk.com/the-gate/;
reservation;https://www.flyertalk.com/the-gate/;
reservation;https://www.flyertalk.com/the-gate/;


2019-07-31 16:41:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/forum/premium_subscription.php> (referer: None)
2019-07-31 16:41:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/the-lobby/page/7> (referer: https://www.flyertalk.com/the-lobby/page/8)
2019-07-31 16:41:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flyertalk.com/the-lobby/page/6> (referer: https://www.flyertalk.com/the-lobby/page/8)
2019-07-31 16:41:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 45492,
 'downloader/request_count': 77,
 'downloader/request_method_count/GET': 77,
 'downloader/response_bytes': 777270,
 'downloader/response_count': 77,
 'downloader/response_status_count/200': 48,
 'downloader/response_status_count/301': 24,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/404': 3,
 'dupefilter/filtered': 1429,
 'finish_reason': 'shutdown',
 'finish_time': datetime.datetime

hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
hotel;https://www.flyertalk.com/the-lobby/page/7;
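
Because the spider prints one `word;url;` line per occurrence, the raw output repeats heavily. A minimal sketch of collapsing it into counts per word and page follows. The sample is a shortened set of lines in the same format as the output above, and `count_hits` is a hypothetical helper name. It assumes the URLs themselves contain no `;`.

```python
import csv
from collections import Counter
from io import StringIO

# Shortened sample in the spider's "word;url;" output format.
sample_output = """phone;https://www.flyertalk.com;
phone;https://www.flyertalk.com;
hotel;https://www.flyertalk.com;
phone;https://www.flyertalk.com/help/rules.php;
hotel;https://www.flyertalk.com/help/rules.php;
"""


def count_hits(lines):
    """Count occurrences per (word, url) pair, skipping anything that is not a hit line."""
    counts = Counter()
    for row in csv.reader(lines, delimiter=";"):
        # A hit line splits into [word, url, ""]; log lines and blanks are ignored.
        if len(row) >= 2 and row[1].startswith("http"):
            counts[(row[0], row[1])] += 1
    return counts


hits = count_hits(StringIO(sample_output))
for (word, url), n in sorted(hits.items()):
    print(word, url, n)
```

The same function accepts an open file object, so a saved file such as hotel_phone_project.csv could be passed in place of the `StringIO` sample.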
