# Web Scraping of Machine Learning Mastery Blog Entries Using Python and BeautifulSoup
### David Lowe
### February 3, 2019

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping code was written in Python 3 and leverages the BeautifulSoup module.

INTRODUCTION: David Lowe hosts his blog at merelydoit.blog. The purpose of this exercise is to practice web scraping by gathering the blog entries from Merely Do It’s RSS feed. This iteration of the script automatically traverses the RSS feed to capture all blog entries.

Starting URLs: https://merelydoit.blog/feed or https://merelydoit.blog/feed/?paged=1

## Loading Libraries and Packages

In [1]:
import numpy as np
import pandas as pd
import os
import shutil
import smtplib
import sys
from email.message import EmailMessage
from datetime import datetime
import urllib.request
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from random import randint
from time import sleep

startTimeScript = datetime.now()

## Setting up the email notification function

In [2]:
def email_notify(msg_text):
    sender = "luozhi2488@gmail.com"
    receiver = "dave@contactdavidlowe.com"
    with open('../email_credential.txt') as f:
        password = f.readline()
        f.close()
    msg = EmailMessage()
    msg.set_content(msg_text)
    msg['Subject'] = 'Notification from Python Web Scraping Script'
    msg['From'] = sender
    msg['To'] = receiver
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login(sender, password)
    server.send_message(msg)
    server.quit()

In [3]:
email_notify("The web scraping process has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

## Setting up the necessary parameters

In [4]:
# Specifying the URL of desired web page to be scrapped
rss_url = "https://merelydoit.blog/feed/?paged="
pageNum = 1
starting_url = rss_url + str(pageNum)

# Creating an html document from the URL
uastring = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36"
req = urllib.request.Request(
    starting_url,
    data=None,
    headers={'User-Agent': uastring}
)

try:
    session = urllib.request.urlopen(req)
except HTTPError as e:
    print('The server could not serve up the rss page!')
    sys.exit("Script Processing Aborted!!!")
except URLError as e:
    print('The server could not be reached!')
    sys.exit("Script Processing Aborted!!!")

try:
    webpage = BeautifulSoup(session.read(), 'lxml-xml')
except AttributeError as e:
    print('Page title could not be found - Might indicate problems!')
    sys.exit("Script Processing Aborted!!!")
else:
    print('Successfully accessed the web page: ' + starting_url)

Successfully accessed the web page: https://merelydoit.blog/feed/?paged=1


## Performing the Scraping and Processing

In [5]:
email_notify("The web page loading and item extraction process has begun! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))

# Setting up a dataframe to capture the records
df = pd.DataFrame(columns=['blog_title','author','date','blog_url','blog_text'])
i = 0

article_list = webpage.find_all('item')

done = False

while not done :
    for article_item in article_list:
        blog_title = "[Not Found]"
        author = "[Not Found]"
        date = "[Not Found]"
        blog_url = "[Not Found]"
        blog_text = "[Not Found]"

        blog_title = article_item.title.string
        author = article_item.find('dc:creator').string
        date = article_item.pubDate.string
        blog_url = article_item.link.string

        if blog_url != "[Not Found]" :
            # Adding random wait time so we do not hammer the website needlessly
            waitTime = randint(3,8)
            print("Waiting " + str(waitTime) + " seconds to retrieve the next blog URL...")
            sleep(waitTime)
            req = urllib.request.Request(
                blog_url,
                data=None,
                headers={'User-Agent': uastring}
            )
            try:
                session = urllib.request.urlopen(req)
            except HTTPError as e:
                print('The server could not serve up the blog page!')
            else:
                try:
                    blogpage = BeautifulSoup(session.read(), 'html5lib')
                except AttributeError as e:
                    print('Page title could not be found - Might indicate problems!')
                    sys.exit("Script Processing Aborted!!!")
                else:
                    print('Successfully accessed the blog page: ' + blog_url)
                    blog_text = blogpage.find("div", class_="entry-content").get_text()

        df.loc[i] = [blog_title,author,date,blog_url,blog_text]
        i = i + 1

    if ((pageNum % 5)==0) :
        email_notify("Finished parsing web page: "+next_page_url+" at "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))
        
    pageNum = pageNum + 1
    next_page_url = rss_url + str(pageNum)
    # Adding random wait time so we do not hammer the website needlessly
    waitTime = randint(3,6)
    print("Waiting " + str(waitTime) + " seconds to process the next RSS URL...")
    sleep(waitTime)
    req = urllib.request.Request(
        next_page_url,
        data=None,
        headers={'User-Agent': uastring}
    )
    try:
        session = urllib.request.urlopen(req)
    except HTTPError as e:
        print("No more page to retrieve. The RSS feed processing has completed!")
        done = True
    else:
        try:
            webpage = BeautifulSoup(session.read(), 'lxml-xml')
        except AttributeError as e:
            print('Page title could not be found - Might indicate problems!')
            sys.exit("Script Processing Aborted!!!")
        else:
            print('Successfully accessed the web page: ' + next_page_url)


Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-class

Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r

Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 4 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=10
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 7 seconds to retrieve the next blog

Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting

Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 6 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=19
Waiting 3 seconds to r

Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 3 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=23
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 7 se

Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machin

Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 5 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=32
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 7 seconds to retrieve the next 

Waiting 6 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=36
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 3 s

Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 5 seconds to r

Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 5 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=45
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 3 seconds to retrieve the next blog URL...
Successfully

Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identificati

Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r

Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 4 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=58
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 7 seconds to retrieve the next blog

Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting

Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 3 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=67
Waiting 7 seconds to r

Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 3 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=71
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/22/its-your-turn/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 8 se

Successfully accessed the blog page: https://merelydoit.blog/2019/01/21/binary-classification-model-for-miniboone-particle-identification-using-r-take-4/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/20/web-scraping-of-merely-do-it-blog-entries-using-r/
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/19/10x%e7%9a%84%e7%b6%93%e9%a9%97/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/18/binary-classification-model-for-miniboone-particle-identification-using-python-take-3/
Waiting 6 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 4 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machin

Successfully accessed the blog page: https://merelydoit.blog/2019/01/17/drucker-on-effective-decisions-part-2/
Waiting 8 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/16/web-scraping-of-machine-learning-mastery-blog-using-r-take-2/
Waiting 5 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/15/waiting-for-godiva/
Waiting 7 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/14/binary-classification-model-for-miniboone-particle-identification-using-r-take-3/
Waiting 6 seconds to process the next RSS URL...
Successfully accessed the web page: https://merelydoit.blog/feed/?paged=80
Waiting 3 seconds to retrieve the next blog URL...
Successfully accessed the blog page: https://merelydoit.blog/2019/01/23/time-series-model-for-monthly-armed-robberies-in-boston-using-python/
Waiting 7 seconds to retrieve the next 

## Organizing Data and Producing Outputs

In [6]:
out_file = df.to_json(orient='records')
with open('web-scraping-py-bsoup-merelydoit-blog.json', 'w') as f:
    f.write(out_file)
email_notify("The web scraping process has completed! "+datetime.now().strftime('%a %B %d, %Y %I:%M:%S %p'))
print ('Total time for the script:',(datetime.now() - startTimeScript))

Total time for the script: 1:25:18.052981
