# Google News Scraper
---

This is a scraper that I built to retrieve the top articles from the first page of Google's News tab for the search query **bitcoin**.

The scraping is chunked by month for successive years, from 2014 through 2017.
> This chunking keeps the request rate low so that the scraper does not look like a bot. Automated querying is restricted by Google's TOS, and being flagged could get your IP banned for a few minutes or even a few days.
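
The month-wise chunking described above can be sketched as a small generator. This is only an illustration: `month_chunks` is a hypothetical helper name and does not appear in the notebook, which writes out each month by hand.

```python
import datetime


def month_chunks(first_year, last_year):
    """Yield (start, end) datetimes for every month; end is exclusive."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = datetime.datetime(year, month, 1)
            # First day of the following month (rolls over to January of the next year)
            end = datetime.datetime(year + month // 12, month % 12 + 1, 1)
            yield start, end


chunks = list(month_chunks(2014, 2017))
print(len(chunks), chunks[0], chunks[-1])
```

Each `(start, end)` pair matches the `dt` and `end` values that the monthly cells below set manually, so one chunk can be scraped per session.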

Articles for each day of a particular month are extracted via the URLs scraped from the corresponding Google search result page for that day. The article text for each URL is then obtained via the [Newspaper3k library](https://github.com/codelucas/newspaper).

> Note: When Newspaper3k fails to process a URL, it throws a *You must download() an article first!* error, which usually indicates that the news source is not supported by the library.
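
One way to keep such failures from stopping a run is to wrap the extraction step and fall back to an empty string, as the cells below do. The sketch here is an assumption-laden illustration: `safe_article_text` and `newspaper_extract` are hypothetical names, and only `Article`, `download()`, `parse()` and `.text` come from the notebook's own use of Newspaper3k.

```python
def safe_article_text(url, extract):
    """Return the article text for url, or "" if extraction raises."""
    try:
        return extract(url)
    except Exception as exc:  # e.g. the "You must download() ..." error
        print(f"Skipping {url}: {exc}")
        return ""


def newspaper_extract(url):
    # Imported lazily so safe_article_text works without Newspaper3k installed
    from newspaper import Article
    article = Article(url)
    article.download()
    article.parse()
    return article.text
```

Calling `safe_article_text(url, newspaper_extract)` records an empty placeholder for a failed URL while still printing the reason, which makes the unsupported sources easy to list afterwards.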

Finally, I created a dictionary that stores the dates as keys and the article lists for those dates as values.
> I then merged all the collected CSV files into one CSV file containing the list of top articles for each date from January 7, 2014 to December 12, 2017. You can view the merging in my other notebook, [Sentiment Analysis of Top Google News Articles for keyword bitcoin](Sentiment Analysis of Top Google News Articles for keyword bitcoin.ipynb).
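
The actual merge lives in the other notebook and is not reproduced here. As a rough sketch of the idea only, the following uses the standard `csv` module; the function name `merge_csv_files` and the file pattern in the usage note are assumptions.

```python
import csv
import glob
import os

# Article lists can be long; raise the per-field limit of the csv module
csv.field_size_limit(10**9)


def merge_csv_files(pattern, out_path):
    """Concatenate all CSV files matching pattern into out_path, keeping one header."""
    out_abs = os.path.abspath(out_path)
    # Resolve inputs first so the output file is never read as an input
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != out_abs]
    header_written = False
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csv_files('articles_*.csv', 'articles_all.csv')` would combine the per-month and per-year exports. Files are processed in lexical order of their names, so the merged rows may need sorting by date afterwards.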

Consider the first implementation, for January 2014, as a worked example of how the code works; the same code was then extended to the remaining months and years.
> **Note**: In my implementation, I scraped all the articles for the whole of 2014 and then exported the data to a single CSV file, as opposed to the month-wise CSV files used for the rest of the years.

> The entire scraping process took a week, owing to the limits Google places on automated requests. So if you are running this notebook, run each block of code individually rather than all at once, to avoid a 503 error (usually a result of bot detection on Google's end) that will block your access for a few hours.
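
> If a 503 does show up, one common reaction is to wait progressively longer before retrying. The sketch below is not part of the original run; `get_with_backoff`, its parameters and the one-minute base delay are all assumptions chosen for illustration.

```python
import random
import time


def get_with_backoff(fetch, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 503, waiting longer each time.

    fetch is any zero-argument callable returning a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 503:
            return status, body
        # Exponential backoff with jitter: roughly 60 s, 120 s, 240 s, ...
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        print(f"Got 503, sleeping {delay:.0f}s before retrying")
        sleep(delay)
    raise RuntimeError(f"Still blocked after {max_attempts} attempts")
```

With `requests`, `fetch` could be a small function that performs the search request and returns `(r.status_code, r.text)`. Given that blocks can last for hours, giving up after a few attempts and resuming later is often the more practical choice.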

---

## Import Dependencies

In [1]:
from bs4 import BeautifulSoup
import requests
import time
from random import randint
import re
from selenium import webdriver
import datetime

## YEAR = 2014

### MONTH = JANUARY

### Create the Date list to iterate over

In [None]:
import datetime

dt = datetime.datetime(2014, 1, 1)
end = datetime.datetime(2014, 2, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

### Function to scrape News urls from Google Search News and Parse Article Text via Newspaper3k library

In [5]:
news_dictionary = {}

for dt in dates_list:
    def scrape_news_urls(link):
        # Time to wait before each request randomized to emulate human interaction with browser
        time.sleep(randint(0, 20))
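        # Custom date range limited to a single day (cd_min == cd_max); no trailing quote after the date
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}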
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
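
The search parameters used in the cell above can be factored into a small helper, which makes the single-day date range easier to read. This is a sketch: `build_news_params` is a hypothetical name, while the parameter keys and the `cdr:1,cd_min:...,cd_max:...` format are taken from the `payload` dictionary above.

```python
from urllib.parse import urlencode


def build_news_params(query, day):
    """Query parameters for a Google News search limited to one day.

    day is a string in the same '%m/%d/%Y' format used in dates_list.
    """
    return {
        'as_epq': query,                                   # search phrase
        'tbs': 'cdr:1,cd_min:' + day + ',cd_max:' + day,  # custom date range
        'tbm': 'nws',                                      # News tab
    }


params = build_news_params('bitcoin', '01/01/2014')
print(urlencode(params))  # one possible encoding of the query string
```

The returned dictionary can be passed directly as `params=` to `requests.get`, exactly like `payload` in the cell above.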


In [6]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014'])
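
Reading the printed keys by eye gets harder as more months accumulate. A small check along these lines could report the gaps instead; `missing_dates` is a hypothetical helper and assumes the same `'%m/%d/%Y'` key format used above.

```python
import datetime


def missing_dates(news_dictionary, start, end):
    """Return the '%m/%d/%Y' dates in [start, end) that have no dictionary entry."""
    missing = []
    day = start
    while day < end:
        key = day.strftime('%m/%d/%Y')
        if key not in news_dictionary:
            missing.append(key)
        day += datetime.timedelta(days=1)
    return missing


sample = {'01/01/2014': [], '01/03/2014': []}
print(missing_dates(sample, datetime.datetime(2014, 1, 1), datetime.datetime(2014, 1, 4)))
```

An empty result for a month's `start` and `end` means every day of that month was processed.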


### Read dictionary into Pandas dataframe and write results to CSV (Sample extended to years 2015-2017)

You can view the actual implementation, which covers only the year 2014, [here](#articles_2014_csv).

In [None]:
import pandas as pd

dates = list(news_dictionary.keys())
articleslist = list(news_dictionary.values())

df_date_articles = pd.DataFrame({'Date':dates})
df_date_articles['Articles'] = articleslist
# Only change will be in the csv file name for the rest of the code blocks
df_date_articles.to_csv('articles_2014_jan.csv', index=False)
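
One thing to keep in mind when reading such a file back: a cell holding a Python list is stored in the CSV as text, so it has to be parsed into a list again. The sketch below shows the round trip with the standard `csv` module and `ast.literal_eval`, on made-up sample data rather than the scraped articles.

```python
import ast
import csv
import io

news_sample = {'01/01/2014': ['first article', 'second article'], '01/02/2014': []}

# Write: the list is stored as its str() form, e.g. "['first article', ...]"
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Date', 'Articles'])
for date, articles in news_sample.items():
    writer.writerow([date, articles])

# Read: turn the text back into a real list
buffer.seek(0)
restored = {}
for row in csv.DictReader(buffer):
    restored[row['Date']] = ast.literal_eval(row['Articles'])

print(restored == news_sample)
```

The same `ast.literal_eval` step applies to the `Articles` column after loading the file with `pd.read_csv`.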

### MONTH = FEBRUARY

In [7]:
dt = datetime.datetime(2014, 2, 1)
end = datetime.datetime(2014, 3, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
Article `download()` failed with 403 Client Error: Forbidden for url: https://seekingalpha.com/article/1987521-bitchip-the-bitcoin-killer-the-merchants-solution-to-the-bitcoin-problem on URL https://seekingalpha.com/article/1987521-bitchip-the-bitcoin-killer-the-merchants-solution-to-the-bitcoin-problem
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


In [8]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014'])


### MONTH = MARCH

In [14]:
dt = datetime.datetime(2014, 3, 1)
end = datetime.datetime(2014, 4, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
                choice = input("Do you need to log into FT.com? (y/n): ")
                if choice == "y":
                    open_browser()
                    article_search(url)  # fetch the article once logged in
                elif choice == "n":
                    article_search(url)
                else:
                    exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
Do you need to log into FT.com? (y/n): y
Please enter your username (email address): [redacted]
Please type in your password, then answer the following:
Please enter your password: [redacted]
Have you typed in your password yet? (y/n): y
200
200
200
200
200
200
200
200
200
200
200
200
200
200
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
200
200
200
200
200
200
200
Do you need to log into FT.com? (y/n): n
200
200
200
200
200
200
200
200
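
The login prompts in the output above come from the Financial Times branch, which is selected with `re.match(r'https://www.ft.com/', url)`. Because the dots in that pattern are unescaped, it matches any character in those positions, and it ignores `http://` links. A stricter host check could look like the following sketch, where `is_ft_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def is_ft_url(url):
    """True if url points at ft.com or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "ft.com" or host.endswith(".ft.com")


for candidate in ("https://www.ft.com/content/abc",
                  "https://wwwxft.com/",
                  "https://seekingalpha.com/article/1"):
    print(candidate, is_ft_url(candidate))
```

Swapping this in for the `re.match` test would leave the rest of the branch unchanged.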


In [16]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = APRIL

In [17]:
dt = datetime.datetime(2014, 4, 1)
end = datetime.datetime(2014, 5, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


In [18]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = MAY

In [19]:
dt = datetime.datetime(2014, 5, 1)
end = datetime.datetime(2014, 6, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
You must `download()` an article first!
200
200
200
200
200


In [20]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = JUNE

In [21]:
dt = datetime.datetime(2014, 6, 1)
end = datetime.datetime(2014, 7, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Extraction failed; keep an empty placeholder for this URL
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
Article `download()` failed with 403 Client Error: Forbidden for url: https://seekingalpha.com/article/2247873-an-alternate-way-to-profit-off-bitcoin on URL https://seekingalpha.com/article/2247873-an-alternate-way-to-profit-off-bitcoin
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
You must `download()` an article first!


In [22]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = JULY

In [23]:
dt = datetime.datetime(2014, 7, 1)
end = datetime.datetime(2014, 8, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
You must `download()` an article first!
200
200
200
200
200


In [24]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20
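
Each monthly cell starts with the same date-building loop, with only the start and end dates changed. As a sketch, that loop could be wrapped in one helper. `dates_for_month` is a hypothetical name, not something defined elsewhere in this notebook, and it uses `datetime.date` rather than `datetime.datetime` because only the calendar day matters here.

```python
import datetime


def dates_for_month(year, month):
    """Return every day of the given month as 'MM/DD/YYYY' strings."""
    day = datetime.date(year, month, 1)
    # First day of the following month; December rolls over to January.
    end = datetime.date(year + month // 12, month % 12 + 1, 1)
    dates = []
    while day < end:
        dates.append(day.strftime('%m/%d/%Y'))
        day += datetime.timedelta(days=1)
    return dates


dates_list = dates_for_month(2014, 7)
print(dates_list[0], dates_list[-1], len(dates_list))
```

Calling `dates_for_month(2014, 8)` would then replace the three `datetime` assignments and the `while` loop at the top of the August cell.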

### MONTH = AUGUST

In [25]:
dt = datetime.datetime(2014, 8, 1)
end = datetime.datetime(2014, 9, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
Article `download()` failed with 503 Server Error: Service Temporarily Unavailable for url: http://www.moneylife.in/article/can-bitcoins-remain-unregulated/38388.html on URL http://www.moneylife.in/article/can-bitcoins-remain-unregulated/38388.html
Article `download()` failed with 403 Client Error: Forbidden for url: https://www.androidheadlines.com/2014/08/rushwallet-web-based-bitcoin-wallet-works-platform.html on URL https://www.androidheadlines.com/2014/08/rushwallet-web-based-bitcoin-wallet-works-platform.html
200
200
200
200
200
200
200
200
200
You must `download()` an article first!
200
200
200
200
200
200
200
200
200
Article `download()` failed with 403 Client Error: Forbidden for url: https://www.androidheadlines.com/2014/08/knc-wallet-bitcoin-gets-updated-easy-person-person-payments.html on URL https://www.androidheadlines.com/2014/08/knc-wallet-bitcoin-gets-updated-easy-person-person-payments.html
200
200
200
200
200


In [26]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = SEPTEMBER

In [27]:
dt = datetime.datetime(2014, 9, 1)
end = datetime.datetime(2014, 10, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
You must `download()` an article first!
200
200
200
200
200
200
You must `download()` an article first!
200
You must `download()` an article first!
You must `download()` an article first!
You must `download()` an article first!
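
Because the `except` branch stores an empty string whenever Newspaper3k cannot download or parse a URL, the failures in the output above can be counted per date after the fact. This is a rough sketch: `count_failed_articles` and the `sample` data are hypothetical, and it assumes an empty string only ever comes from a failed extraction.

```python
def count_failed_articles(news_dictionary):
    """Map each date to its number of empty article placeholders, skipping dates with none."""
    failures = {}
    for date, articles in news_dictionary.items():
        empty = sum(1 for text in articles if text == "")
        if empty:
            failures[date] = empty
    return failures


# Hypothetical sample: one failed article on September 30
sample = {
    '09/29/2014': ['some article text'],
    '09/30/2014': ['', 'another article text'],
}
print(count_failed_articles(sample))
```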


In [28]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = OCTOBER

In [29]:
dt = datetime.datetime(2014, 10, 1)
end = datetime.datetime(2014, 11, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


In [30]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = NOVEMBER

In [45]:
dt = datetime.datetime(2014, 11, 1)
end = datetime.datetime(2014, 12, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
                choice = input("Do you need to log into FT.com? (y/n): ")
                if choice == "y":
                    open_browser()    
                elif choice == "n":
                    article_search(url)        
                else:
                    exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
Do you need to log into FT.com? (y/n): y
Please enter your username (email address): [redacted]
Please type in your password, then answer the following:
Please enter your password: [redacted]
Have you typed in your password yet? (y/n): y
200
200
200
200
200
200
200
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
200
200
200
200
200
200
200
200
200
200
200
200
200
200
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
200
200
200
200
200
200
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
200
Do you need to log into FT.com? (y/n): n
Do you need to log into FT.com? (y/n): n
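
The login prompts use `input()`, which echoes whatever is typed into the saved cell output. One possible alternative, sketched below, is to read the password with the standard library's `getpass.getpass`, which does not echo it. `prompt_ft_credentials` is a hypothetical helper, not part of the original notebook, and the two callables are parameters only so the function can be exercised without a keyboard.

```python
import getpass


def prompt_ft_credentials(ask=input, ask_secret=getpass.getpass):
    """Ask for the FT username and password without echoing the password."""
    username = ask("Please enter your username (email address): ")
    password = ask_secret("Please enter your password: ")
    return username, password
```

Inside `open_browser()`, the two `input(...)` calls for the credentials could then be replaced by `username_input, password_input = prompt_ft_credentials()`.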


In [46]:
# Check if all dates were processed 
print(news_dictionary.keys())

# import pandas as pd

# dates_14 = list(news_dictionary.keys())
# articleslist_14 = list(news_dictionary.values())

# df_14 = pd.DataFrame({'Date':dates_14})
# df_14['Articles'] = articleslist_14
# df_14.to_csv('articles_2014_till_oct.csv', index=False)

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

### MONTH = DECEMBER

In [47]:
dt = datetime.datetime(2014, 12, 1)
end = datetime.datetime(2015, 1, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # keep an empty placeholder when download or parsing fails
                article_list.append("")

    news_dictionary[dt] = article_list

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


In [50]:
# Check if all dates were processed 
print(news_dictionary.keys())

dict_keys(['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014', '01/05/2014', '01/06/2014', '01/07/2014', '01/08/2014', '01/09/2014', '01/10/2014', '01/11/2014', '01/12/2014', '01/13/2014', '01/14/2014', '01/15/2014', '01/16/2014', '01/17/2014', '01/18/2014', '01/19/2014', '01/20/2014', '01/21/2014', '01/22/2014', '01/23/2014', '01/24/2014', '01/25/2014', '01/26/2014', '01/27/2014', '01/28/2014', '01/29/2014', '01/30/2014', '01/31/2014', '02/01/2014', '02/02/2014', '02/03/2014', '02/04/2014', '02/05/2014', '02/06/2014', '02/07/2014', '02/08/2014', '02/09/2014', '02/10/2014', '02/11/2014', '02/12/2014', '02/13/2014', '02/14/2014', '02/15/2014', '02/16/2014', '02/17/2014', '02/18/2014', '02/19/2014', '02/20/2014', '02/21/2014', '02/22/2014', '02/23/2014', '02/24/2014', '02/25/2014', '02/26/2014', '02/27/2014', '02/28/2014', '03/01/2014', '03/02/2014', '03/03/2014', '03/04/2014', '03/05/2014', '03/06/2014', '03/07/2014', '03/08/2014', '03/09/2014', '03/10/2014', '03/11/2014', '03/12/20

<h2><a id='articles_2014_csv'>Read dictionary into Pandas dataframe and write results to CSV</a></h2>

In [51]:
import pandas as pd

dates_14 = list(news_dictionary.keys())
articleslist_14 = list(news_dictionary.values())

df_14 = pd.DataFrame({'Date':dates_14})
df_14['Articles'] = articleslist_14
df_14.to_csv('articles_2014.csv', index=False)
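
One thing to keep in mind when reading this file back: each cell of the `Articles` column holds a Python list, and `to_csv` writes it as that list's string representation, so it comes back as a plain string. The sketch below shows one way to recover the lists with `ast.literal_eval`. It uses the standard `csv` module and a hypothetical in-memory sample in the same shape as `articles_2014.csv` so that it runs without the real file; `parse_articles_cell` is an illustrative name.

```python
import ast
import csv
import io


def parse_articles_cell(cell):
    """Turn the stringified list from the 'Articles' column back into a list."""
    return ast.literal_eval(cell)


# Hypothetical two-row sample shaped like articles_2014.csv
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["Date", "Articles"])
writer.writerow(["01/01/2014", str(["first article text", ""])])
writer.writerow(["01/02/2014", str([])])
sample.seek(0)

rows = {row["Date"]: parse_articles_cell(row["Articles"])
        for row in csv.DictReader(sample)}
print(rows["01/01/2014"])
```

With pandas, the same conversion could be applied to the loaded column, for example with `df['Articles'].apply(ast.literal_eval)`.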

## YEAR = 2015

### MONTH = JANUARY

In [None]:
import datetime

dt = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 2, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
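
Printing the keys shows which dates were stored, but a skipped day is easy to miss in a long list. The sketch below compares the keys against every calendar day of the month. `missing_dates` is a hypothetical helper that is not part of the original notebook; it assumes the keys use the same `'%m/%d/%Y'` format as `dates_list`.

```python
import datetime

def missing_dates(news_dictionary, year, month):
    """Return the days of the given month (as 'MM/DD/YYYY') that are not keys of news_dictionary."""
    day = datetime.date(year, month, 1)
    step = datetime.timedelta(days=1)
    missing = []
    # Walk forward one day at a time until the month changes
    while day.month == month:
        key = day.strftime('%m/%d/%Y')
        if key not in news_dictionary:
            missing.append(key)
        day += step
    return missing

# Toy dictionary standing in for the real scrape results
example = {'01/01/2015': ['some article text'], '01/02/2015': []}
print(missing_dates(example, 2015, 1)[:3])
```

An empty list back from `missing_dates(news_dictionary, 2015, 1)` would mean every day of January 2015 was processed.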

### MONTH = FEBRUARY

In [None]:
dt = datetime.datetime(2015, 2, 1)
end = datetime.datetime(2015, 3, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
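
Each monthly cell repeats the same two steps before scraping: build the list of dates, then build the query parameters for one day. As a rough sketch, these steps could be factored into two small functions so that only the year and month change between cells. `month_dates` and `build_payload` are hypothetical names, not part of the original notebook; the parameter keys (`as_epq`, `tbs`, `tbm`) are the ones used in the cells above.

```python
import datetime

def month_dates(year, month):
    """List every day of the month as an 'MM/DD/YYYY' string, matching the dictionary keys."""
    start = datetime.date(year, month, 1)
    # First day of the following month (rolls over to January after December)
    end = datetime.date(year + month // 12, month % 12 + 1, 1)
    return [(start + datetime.timedelta(days=i)).strftime('%m/%d/%Y')
            for i in range((end - start).days)]

def build_payload(date_str, query='bitcoin'):
    """Query parameters that restrict a Google News search to a single day."""
    return {
        'as_epq': query,                                         # exact-phrase search term
        'tbs': 'cdr:1,cd_min:{0},cd_max:{0}'.format(date_str),   # custom date range: one day
        'tbm': 'nws',                                            # News tab
    }

dates_list = month_dates(2015, 2)
print(len(dates_list), dates_list[0], dates_list[-1])
print(build_payload(dates_list[0])['tbs'])
```

With helpers like these, a monthly cell would only need to call `month_dates(2015, 3)` and loop over the result, which keeps the date handling in one place.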

### MONTH = MARCH

In [None]:
dt = datetime.datetime(2015, 3, 1)
end = datetime.datetime(2015, 4, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = APRIL

In [None]:
dt = datetime.datetime(2015, 4, 1)
end = datetime.datetime(2015, 5, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = MAY

In [None]:
dt = datetime.datetime(2015, 5, 1)
end = datetime.datetime(2015, 6, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JUNE

In [None]:
dt = datetime.datetime(2015, 6, 1)
end = datetime.datetime(2015, 7, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JULY

In [None]:
dt = datetime.datetime(2015, 7, 1)
end = datetime.datetime(2015, 8, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = AUGUST

In [None]:
dt = datetime.datetime(2015, 8, 1)
end = datetime.datetime(2015, 9, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = SEPTEMBER

In [None]:
dt = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2015, 10, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = OCTOBER

In [None]:
dt = datetime.datetime(2015, 10, 1)
end = datetime.datetime(2015, 11, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Log in to the Financial Times (subscription required) through a Selenium-driven browser
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = NOVEMBER

In [None]:
dt = datetime.datetime(2015, 11, 1)
end = datetime.datetime(2015, 12, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = DECEMBER

In [None]:
dt = datetime.datetime(2015, 12, 1)
end = datetime.datetime(2016, 1, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
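
Each monthly cell repeats the same date loop and query parameters. A minimal sketch of how they could be factored out follows, assuming the same `'%m/%d/%Y'` format and the same `as_epq`, `tbs`, and `tbm` parameters used in the cells of this notebook. The names `month_dates` and `build_payload` are hypothetical, and the `tbs` value is built here without any trailing quote character.

```python
import datetime


def month_dates(year, month):
    """Return every day of the month as a '%m/%d/%Y' string."""
    current = datetime.date(year, month, 1)
    dates = []
    while current.month == month:
        dates.append(current.strftime('%m/%d/%Y'))
        current += datetime.timedelta(days=1)
    return dates


def build_payload(date_str, query='bitcoin'):
    """Build the search parameters restricting News results to a single day."""
    return {
        'as_epq': query,
        # cd_min and cd_max are set to the same day to get one day of results
        'tbs': 'cdr:1,cd_min:{0},cd_max:{0}'.format(date_str),
        'tbm': 'nws',
    }


dates_list = month_dates(2015, 12)
print(dates_list[0], dates_list[-1], len(dates_list))
print(build_payload(dates_list[0])['tbs'])
```

With helpers like these, a monthly cell would only need to change the `year` and `month` arguments instead of duplicating the whole loop.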

## YEAR = 2016

### MONTH = JANUARY

In [None]:
import datetime

dt = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2016, 2, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
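
The scraping cells print the status code but carry on even when Google answers with a 503. A minimal sketch of pausing and retrying instead follows, assuming the caller passes a zero-argument function that performs the request. The name `fetch_with_backoff`, its parameters, and the default delays are hypothetical and are not tuned against Google's actual limits.

```python
import time


def fetch_with_backoff(fetch, max_tries=3, base_delay=60, sleep=time.sleep):
    """Call fetch() until it returns a non-503 response or max_tries is reached.

    fetch: zero-argument callable returning an object with a status_code attribute.
    The wait doubles after each 503: base_delay, 2 * base_delay, and so on.
    Returns the last response received, which may still be a 503.
    """
    response = None
    for attempt in range(max_tries):
        response = fetch()
        if response.status_code != 503:
            return response
        if attempt < max_tries - 1:
            # Back off before the next attempt; no wait after the final one
            sleep(base_delay * 2 ** attempt)
    return response
```

With `requests`, the callable could be `lambda: requests.get(URL, headers=headers, params=payload)`, reusing the names from the scraping cells.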

### MONTH = FEBRUARY

In [None]:
dt = datetime.datetime(2016, 2, 1)
end = datetime.datetime(2016, 3, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = MARCH

In [None]:
dt = datetime.datetime(2016, 3, 1)
end = datetime.datetime(2016, 4, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = APRIL

In [None]:
dt = datetime.datetime(2016, 4, 1)
end = datetime.datetime(2016, 5, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = MAY

In [None]:
dt = datetime.datetime(2016, 5, 1)
end = datetime.datetime(2016, 6, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JUNE

In [None]:
dt = datetime.datetime(2016, 6, 1)
end = datetime.datetime(2016, 7, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:  # Newspaper3k failed to download or parse the URL; keep an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JULY

In [None]:
dt = datetime.datetime(2016, 7, 1)
end = datetime.datetime(2016, 8, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
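
The search parameters in each cell are assembled by string concatenation, which makes the final `tbs` value hard to inspect. A minimal sketch of a small builder follows, assuming the same three parameters used in the cells above. The function name `build_payload` is hypothetical, and the meanings given in the comments reflect my understanding of Google's query parameters rather than official documentation.

```python
def build_payload(query, date_str):
    """Build search parameters restricted to a single day.

    date_str is expected in MM/DD/YYYY form, as produced by strftime('%m/%d/%Y').
    """
    return {
        'as_epq': query,  # exact-phrase query (assumed meaning)
        'tbs': 'cdr:1,cd_min:{0},cd_max:{0}'.format(date_str),  # custom date range
        'tbm': 'nws',  # restrict results to the News tab
    }


payload = build_payload('bitcoin', '07/01/2016')
print(payload['tbs'])
```

The returned dictionary can be passed as `params=payload` to `requests.get`, exactly as the cells above do.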

### MONTH = AUGUST

In [None]:
dt = datetime.datetime(2016, 8, 1)
end = datetime.datetime(2016, 9, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Answer "y" on the first run to log in; answer "n" on later runs to reuse the open browser session
            def start(url):
                choice = input("Do you need to log into FT.com? (y/n): ")
                if choice == "y":
                    open_browser()
                    article_search(url)  # fetch the article once logged in
                elif choice == "n":
                    article_search(url)
                else:
                    exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
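
Every month cell repeats the same `while dt < end` loop with hand-edited start and end dates. A minimal sketch of a reusable helper is shown below, assuming the same `%m/%d/%Y` format. The name `month_dates` is hypothetical and does not appear in the original cells.

```python
import datetime


def month_dates(year, month):
    """Return every day of the given month as MM/DD/YYYY strings."""
    current = datetime.date(year, month, 1)
    # First day of the following month, rolling over to the next year after December
    end = datetime.date(year + month // 12, month % 12 + 1, 1)
    dates = []
    while current < end:
        dates.append(current.strftime('%m/%d/%Y'))
        current += datetime.timedelta(days=1)
    return dates


dates_list = month_dates(2016, 8)
print(len(dates_list), dates_list[0], dates_list[-1])
```

Computing the end date from the month avoids mistakes when copying a cell, such as forgetting to bump the year in the December cell.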

### MONTH = SEPTEMBER

In [None]:
dt = datetime.datetime(2016, 9, 1)
end = datetime.datetime(2016, 10, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
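
The cells print `r.status_code` but keep going even when Google answers with a 503. A minimal sketch of a retry wrapper that pauses and tries again is given below. The name `fetch_with_backoff`, the attempt count, and the delays are all assumptions for illustration; they are not values I tuned against Google.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=60, sleep=time.sleep):
    """Call fetch() until it returns a non-503 status or the attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute, for example:
        lambda: requests.get(URL, headers=headers, params=payload)
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 503:
            return response
        if attempt < max_attempts - 1:
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 503; hand back the last response so the caller can decide
    return response
```

Passing the request as a callable keeps the helper independent of `requests`, so it can be exercised with a fake response before pointing it at a real search URL.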

### MONTH = OCTOBER

In [None]:
dt = datetime.datetime(2016, 10, 1)
end = datetime.datetime(2016, 11, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())


### MONTH = NOVEMBER

In [None]:
dt = datetime.datetime(2016, 11, 1)
end = datetime.datetime(2016, 12, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = DECEMBER

In [None]:
dt = datetime.datetime(2016, 12, 1)
end = datetime.datetime(2017, 1, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
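
Once a month (or a year) has been scraped, the dictionary is exported to a CSV file. A minimal sketch of such an export using only the standard library is shown below. The function name `write_articles_csv` and the column layout of one row per date and article are assumptions for illustration and may differ from the files I actually produced.

```python
import csv
import io


def write_articles_csv(news_dictionary, handle):
    """Write one row per (date, article) pair to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(['date', 'article_text'])
    for date, articles in news_dictionary.items():
        # Keep a row for dates with no articles so the date is not lost
        for text in articles or ['']:
            writer.writerow([date, text])


# In-memory buffer standing in for a real file (illustrative data only)
buffer = io.StringIO()
write_articles_csv({'12/01/2016': ['first', 'second'], '12/02/2016': []}, buffer)
print(buffer.getvalue())
```

To write to disk, open the target with `open(path, 'w', newline='', encoding='utf-8')` and pass that handle instead of the buffer.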

## YEAR = 2017

### MONTH = JANUARY

In [None]:
import datetime

dt = datetime.datetime(2017, 1, 1)
end = datetime.datetime(2017, 2, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = FEBRUARY

In [None]:
dt = datetime.datetime(2017, 2, 1)
end = datetime.datetime(2017, 3, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
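
The Financial Times branch is selected with `re.match(r'https://www.ft.com/', url)`, where the unescaped dots match any character and plain `http` links are skipped. A minimal sketch of a stricter check based on the parsed hostname follows. The helper name `is_ft_url` is hypothetical, and treating `ft.com` without the `www.` prefix as a match is my assumption.

```python
from urllib.parse import urlparse


def is_ft_url(url):
    """Return True when the URL points at the Financial Times site."""
    host = (urlparse(url).hostname or '').lower()
    return host in ('ft.com', 'www.ft.com')


for candidate in ('https://www.ft.com/content/example',
                  'https://www.example.com/ft.com/story'):
    print(candidate, is_ft_url(candidate))
```

Inside the loop this would replace the regular expression test, for example `if is_ft_url(url):`.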

### MONTH = MARCH

In [None]:
dt = datetime.datetime(2017, 3, 1)
end = datetime.datetime(2017, 4, 1)
step = datetime.timedelta(days=1)

dates_list = []
news_dictionary={}

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; store an empty placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = APRIL

In [None]:
dt = datetime.datetime(2017, 4, 1)
end = datetime.datetime(2017, 5, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = MAY

In [None]:
dt = datetime.datetime(2017, 5, 1)
end = datetime.datetime(2017, 6, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JUNE

In [None]:
dt = datetime.datetime(2017, 6, 1)
end = datetime.datetime(2017, 7, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = JULY

In [None]:
dt = datetime.datetime(2017, 7, 1)
end = datetime.datetime(2017, 8, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = AUGUST

In [None]:
dt = datetime.datetime(2017, 8, 1)
end = datetime.datetime(2017, 9, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
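
A date can be present in `news_dictionary` and still carry little usable text, because every URL that Newspaper3k fails to process is stored as an empty string. The sketch below counts those placeholders per date. The helper name `empty_article_counts` and the sample values are hypothetical and only illustrate the idea.

```python
def empty_article_counts(news_dictionary):
    """Map each date to (number of empty article texts, total number of articles)."""
    return {
        date: (sum(1 for text in articles if not text.strip()), len(articles))
        for date, articles in news_dictionary.items()
    }


# Made-up sample: one failed download on the first date, no results on the second
sample = {'08/01/2017': ['some article text', ''], '08/02/2017': []}
for date, (empty, total) in empty_article_counts(sample).items():
    print(date, empty, total)
```

Dates where the two numbers are equal, or where the total is zero, are likely candidates for re-scraping.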

### MONTH = SEPTEMBER

In [None]:
dt = datetime.datetime(2017, 9, 1)
end = datetime.datetime(2017, 10, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = OCTOBER

In [None]:
dt = datetime.datetime(2017, 10, 1)
end = datetime.datetime(2017, 11, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = NOVEMBER

In [None]:
dt = datetime.datetime(2017, 11, 1)
end = datetime.datetime(2017, 12, 1)
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())

### MONTH = DECEMBER

In [None]:
dt = datetime.datetime(2017, 12, 1)
end = datetime.datetime(2017, 12, 13)  # exclusive bound: the last scraped day is Dec 12, 2017
step = datetime.timedelta(days=1)

dates_list = []

while dt < end:
    dates_list.append(dt.strftime('%m/%d/%Y'))
    dt += step

for dt in dates_list:
    def scrape_news_urls(link):
        time.sleep(randint(0, 20))
        payload = {'as_epq': 'bitcoin', 'tbs': 'cdr:1,cd_min:' + dt + ',cd_max:' + dt, 'tbm': 'nws'}
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
        r = requests.get(link, headers=headers, params=payload)
        print(r.status_code)  # Print the status code
        content = r.text
        news_urls = []
        soup = BeautifulSoup(content, "lxml")
        news_url = soup.findAll("a", {"class": "l _PMs"}, href=True)
        for url in news_url:
            news_urls.append(url['href'])
        return news_urls
    
    URL = 'https://www.google.com/search'
    n_url = scrape_news_urls(URL)
    article_list = []
    
    for url in n_url:
        if re.match(r'https://www.ft.com/',url) is not None:
            from sys import exit
            from selenium import webdriver
            
            # Function to access Financial Times(needs mandatory subscription) through Selenium enabled authentication 
            def open_browser():
                global browser
                browser = webdriver.Firefox()
                browser.get('https://accounts.ft.com/login?location=https%3A%2F%2Fwww.ft.com%2F')
                
                username_input = input("Please enter your username (email address): ")
                username_submit = browser.find_element_by_id("email")
                username_submit.send_keys(username_input)
                browser.find_element_by_name("Next").click()
                
                print("""Please type in your password, then answer the following:""")
                password_input = input("Please enter your password: ")
                password_submit = browser.find_element_by_id("password")
                password_submit.send_keys(password_input)
                confirmation = input("Have you typed in your password yet? (y/n): ")
                
                if confirmation == "y":
                    browser.find_element_by_name("Sign in").click() #submit
                else:
                    browser.close()
                    exit()
            
            def article_search(url):
                browser.get(url)
                art = browser.find_element_by_xpath("//*[@id='site-content']/div[3]")
                article_list.append(art.text.rstrip('\n').replace('\n', ' '))
            
            # Uncomment if you are running for the first time to ensure access via authentication 
            def start(url):
#                 choice = input("Do you need to log into FT.com? (y/n): ")
#                 if choice == "y":
#                     open_browser()    
#                 elif choice == "n":
                article_search(url)        
#                 else:
#                     exit()

            start(url)
            
        else:
            try:
                from newspaper import Article
                article = Article(url)
                article.download()
                article.parse()
                article_list.append(article.text)
            except Exception:
                # Newspaper3k could not download or parse this URL; keep a placeholder
                article_list.append("")

    news_dictionary[dt] = article_list

In [None]:
# Check if all dates were processed 
print(news_dictionary.keys())
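
Once the dates check out, the dictionary can be exported to CSV, as the introduction describes. The column layout of the author's CSV files is not shown here, so the sketch below assumes one row per date, with the article list serialized as a JSON string. The helper name `write_news_csv` and the sample values are hypothetical.

```python
import csv
import io
import json


def write_news_csv(news_dictionary, fileobj):
    """Write one row per date: the date key and its article list as a JSON string."""
    writer = csv.writer(fileobj)
    writer.writerow(['date', 'articles'])
    for date, articles in news_dictionary.items():
        writer.writerow([date, json.dumps(articles)])


# Demonstration with an in-memory buffer and made-up values
buffer = io.StringIO()
write_news_csv({'12/01/2017': ['first article', 'second article']}, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, 'w', newline='', encoding='utf-8')`. Storing the list as JSON keeps the articles of one date in a single cell and lets `json.loads` recover the list when the file is read back.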