Adapted from the notebook found at [How to Build a Law Bot](https://lawyerist.com/how-build-law-bot/)

## Install libraries

If you haven't already, you may need to install some dependencies. On the command line, run the following to install/update gspread, oauth2client, PyOpenSSL, and python-twitter.
```
pip install gspread
pip install --upgrade oauth2client
pip install PyOpenSSL
pip install python-twitter
```
Library installs are one and done, so after doing this once you should be all set.

## Import modules and set variables

Now we're getting into the bot's code. This is what will run every time your bot is called. 

You will need to create a new Google Sheet (same instructions as [last time](https://lawyerist.com/126074/online-forms-meet-local-document-automation-cut-and-paste-coding/)). Delete rows 2-999, because the code below appends values to the end of your sheet: if you fail to remove rows 2-999, values will be appended at row 1000. The code also looks at the last row of the sheet for your old values, so right off the bat it will be looking at your one solitary row. Also, delete columns F through Z to avoid having to print a bunch of empty columns; the bot stores five values per row (columns A through E).
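To see why the blank rows matter, here is a small sketch that models a sheet as a plain Python list of rows. It does not talk to Google at all; `append_row` and `last_row` are hypothetical stand-ins that assume values are added after the last existing row and that old values are read from the last row, as described above.

```python
def append_row(sheet, values):
    """Add a row after the last existing row, whether or not that row is blank."""
    sheet.append(list(values))


def last_row(sheet):
    """Return the final row, which is where the bot looks for old values."""
    return sheet[-1]


blank = ["", "", "", "", ""]
new_values = ["time", "url 1", "title 1", "url 2", "title 2"]

# A fresh sheet that still has its blank rows 2-999 (row 1 plus 998 more).
untrimmed = [list(blank) for _ in range(999)]
append_row(untrimmed, new_values)
print(len(untrimmed))  # the new values land in row 1000

# The same sheet after deleting rows 2-999.
trimmed = [list(blank)]
print(last_row(trimmed))  # the one solitary row is compared on the first run
append_row(trimmed, new_values)
print(len(trimmed))  # the new values land in row 2
```

With the blank rows removed, the last row is always the most recent set of saved values, which is what the comparison later in the notebook relies on.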

As for a Twitter account and Twitter credentials, follow the instructions in [this post](https://lawyerist.com/?p=127093).

In [90]:
# Load the module for visiting and reading websites.
import urllib.request
# Load the module for running regular expressions (regex).
import re 
# Load the module for date and time stuff.
import datetime
# Define the variable now as equal to the current date and time.
now = datetime.datetime.now()

In [91]:
# Set the URLs you want to scrape.
url_1 = "http://www.insurancelitigationregulatorylaw.com/category/court-cases/"
url_2 = "http://www.grinsurancecoveragelawblog.com"

# To scrape more pages, replicate the lines above and the cells below,
# changing the number in each variable name (url_3, p_3, and so on).

In [92]:
# Load the module for accessing Google Sheets.
import gspread
# Load the module needed for securely communicating with Google Sheets.
from oauth2client.service_account import ServiceAccountCredentials
# The scope for your access credentials
scope = ['https://spreadsheets.google.com/feeds']

# Your spreadsheet's ID
document_key = "1owZJO_V-Wd1WT1QhP1PVLA2iGa37UezwY3aapWSO6VI" 
#              ^^^^^^^^^^^ SWAP OUT FOR YOUR DOCUMENT ID/KEY
# Your Google project's .json key
credentials = ServiceAccountCredentials.from_json_keyfile_name('../../../../../twitterbotkey.json', scope)
#                                                                              ^^^^^^^^ SWAP OUT FOR YOUR JSON KEY
# Use your credentials to authorize yourself.
gc = gspread.authorize(credentials)
# Open up the Sheet with the defined ID.
wks = gc.open_by_key(document_key)

#########################################
#
#  NOTE: The name of the sheet you are 
#  trying to access should be in the 
#  parentheses below (e.g., Data). By
#  default this is probably "Sheet1".
#
#########################################
worksheet = wks.worksheet("Sheet1")

# Count the number of rows in your Sheet &
# resize to remove blank rows.
worksheet.resize(worksheet.row_count)

In [93]:
# Print out the old values stored in your sheet.
# Note: The first time you run this code, the row will be empty because nothing has been stored in your sheet yet.

print(worksheet.row_values(worksheet.row_count))
#############################
# DELETE CELL AFTER TESTING
#############################

['', '', '', '', '']


In [94]:
# Import the relevant Twitter libraries so you can use Twitter.
import twitter
from twitter import TwitterError

# Create the following four text files and add them to the same directory as your
# Google API key. In each file, add the appropriate value found when retrieving your
# Twitter credentials.

with open('../../../../../key.txt', 'r') as myfile:
    # strip() drops any trailing newline your text editor may have added.
    key=myfile.read().strip()
    
with open('../../../../../secret.txt', 'r') as myfile:
    secret=myfile.read().strip()
    
with open('../../../../../token_key.txt', 'r') as myfile:
    token_key=myfile.read().strip()

with open('../../../../../token_secret.txt', 'r') as myfile:
    token_secret=myfile.read().strip()

# Set your Twitter API credentials.
api = twitter.Api(consumer_key=key,
                  consumer_secret=secret,
                  access_token_key=token_key,
                  access_token_secret=token_secret)
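
If the four `with open(...)` blocks feel repetitive, one option is to fold them into a small helper. This is only a sketch: `read_credential` and `load_twitter_credentials` are hypothetical names, and it assumes the four files are named `key.txt`, `secret.txt`, `token_key.txt`, and `token_secret.txt` and live in one folder, as in the cell above.

```python
from pathlib import Path


def read_credential(path):
    """Return a file's contents without surrounding whitespace or newlines."""
    return Path(path).read_text().strip()


def load_twitter_credentials(folder):
    """Read the four credential files from `folder` into a dictionary."""
    names = ["key", "secret", "token_key", "token_secret"]
    return {name: read_credential(Path(folder) / (name + ".txt")) for name in names}
```

You could then call `creds = load_twitter_credentials('../../../../..')` and pass `creds['key']`, `creds['secret']`, `creds['token_key']`, and `creds['token_secret']` to `twitter.Api` in place of the four separate variables.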

## Read the contents of your first webpage

When you run the next cell, your program will visit the first URL you defined above. It will then print out that page's HTML. 

In [95]:
p_1 = urllib.request.build_opener(urllib.request.HTTPCookieProcessor).open(url_1).read()
print(p_1)

b'<!DOCTYPE html>\n<html lang="en-US" prefix="og: http://ogp.me/ns#" class="no-js no-svg">\n<head>\n<meta charset="UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1">\n<link rel="profile" href="http://gmpg.org/xfn/11">\n\n<script>(function(html){html.className = html.className.replace(/\\bno-js\\b/,\'js\')})(document.documentElement);</script>\n<title>Court Cases Archives &#8902; Insurance Litigation and Regulatory Law</title>\n\n<!-- This site is optimized with the Yoast SEO plugin v5.6.1 - https://yoast.com/wordpress/plugins/seo/ -->\n<meta name="robots" content="noindex,follow"/>\n<link rel="canonical" href="http://www.insurancelitigationregulatorylaw.com/category/court-cases/" />\n<meta property="og:locale" content="en_US" />\n<meta property="og:type" content="object" />\n<meta property="og:title" content="Court Cases Archives &#8902; Insurance Litigation and Regulatory Law" />\n<meta property="og:url" content="http://www.insurancelitigationregulatorylaw.c

In [96]:
p_2 = urllib.request.build_opener(urllib.request.HTTPCookieProcessor).open(url_2).read()
print(p_2)

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xmlns:fb="http://ogp.me/ns/fb#" xmlns:addthis="http://www.addthis.com/help/api-spec" >\n<head profile="http://gmpg.org/xfn/11">\n\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n\t<title>Insurance Coverage Law Blog</title>\n\t<link rel="pingback" href="http://www.grinsurancecoveragelawblog.com/xmlrpc.php" />\n\t<link rel=\'dns-prefetch\' href=\'//s.w.org\' />\n<link rel="alternate" type="application/rss+xml" title="Insurance Coverage Law Blog &raquo; Feed" href="http://www.grinsurancecoveragelawblog.com/feed/" />\n<link rel="alternate" type="application/rss+xml" title="Insurance Coverage Law Blog &raquo; Comments Feed" href="http://www.grinsurancecoveragelawblog.com/comments/feed/" />\n\t\t<script type="text/javascript">\n\t\t\twindow._wpemojiSettings = {"baseUrl":"https:\\/\\/s.w.org
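
The two cells above repeat the same call and print the whole page. A small helper can wrap the fetch and show only the start of the page. Treat this as a sketch: `fetch`, `preview`, the 30-second timeout, and the 200-character preview length are hypothetical additions, not part of the original notebook.

```python
import urllib.request


def fetch(url, timeout=30):
    """Return the raw bytes of the page at `url`, using a cookie-aware opener."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def preview(page, limit=200):
    """Decode the first `limit` bytes for printing, replacing undecodable bytes."""
    return page[:limit].decode("UTF-8", errors="replace")
```

With these defined, `p_1 = fetch(url_1)` followed by `print(preview(p_1))` would show just enough HTML to confirm the page loaded.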

# Two Data Points, One Match

---------------------------
## Parse the site's contents

In [97]:
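# Find the first post on page 1. Group 1 is the post's URL; group 2 is its title.
# The r prefix makes these raw byte strings, so backslashes reach the regex engine untouched.
res_1 = re.search(rb'entry-title.*href="(.*)" .*>(.*)</a',p_1)
output_1 = res_1.group(1).decode('UTF-8')
print(output_1)
output_2 = res_1.group(2).decode('UTF-8')
print(output_2)
# Do the same for page 2, which lists each post's URL and title in a script block.
res_2 = re.search(rb'url:\s"(.*)",\s*title: "(.*)"',p_2)
output_3 = res_2.group(1).decode('UTF-8')
print(output_3)
output_4 = res_2.group(2).decode('UTF-8')
print(output_4)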

http://www.insurancelitigationregulatorylaw.com/supreme-court-case-united-states-vs-south-eastern-underwriters-association-explained/
The Supreme Court Case: United States vs South Eastern Underwriters Association Explained
http://www.grinsurancecoveragelawblog.com/winning-arbitration-battle-in-the-connecticut-supreme-court-regarding-historic-home-restoration-costs-still-leaves-insurer-defending-legal-war-in-state-trial-court/
Winning Arbitration Battle in the Connecticut Supreme Court Regarding Historic Home Restoration Costs Still Leaves Insurer Defending Legal War in State Trial Court
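
To see how the capture groups work without visiting a site, you can run the first pattern against a short piece of markup. The `sample` below is made up to resemble a post listing, and `first_post` is a hypothetical helper; it also returns `None` when nothing matches, a case the cell above would otherwise hit as an error on `.group(1)`.

```python
import re

# Hypothetical markup shaped like one entry in a blog's post listing.
sample = (b'<h2 class="entry-title"><a href="http://example.com/post/" '
          b'rel="bookmark">A Post Title</a></h2>')


def first_post(page):
    """Return (url, title) for the first match, or None if nothing matches."""
    match = re.search(rb'entry-title.*href="(.*)" .*>(.*)</a', page)
    if match is None:
        return None
    return match.group(1).decode("UTF-8"), match.group(2).decode("UTF-8")


print(first_post(sample))                   # ('http://example.com/post/', 'A Post Title')
print(first_post(b"<p>no posts here</p>"))  # None
```

Each pair of parentheses in the pattern becomes a numbered group, which is why `group(1)` holds the link and `group(2)` holds the title.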


In [98]:
print(output_4[0:110])

Winning Arbitration Battle in the Connecticut Supreme Court Regarding Historic Home Restoration Costs Still Le
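
The slice above keeps a long title from crowding out the link. One way to derive the cut-off rather than hard-code it is sketched below. The numbers are assumptions, not values taken from the notebook: a 140-character tweet limit and a fixed count of 23 characters per link. `build_tweet` is a hypothetical helper.

```python
def build_tweet(title, url, limit=140, link_length=23, separator=" . . . "):
    """Truncate the title so that title + separator + link fits within `limit`."""
    room = limit - link_length - len(separator)
    return "%s%s%s" % (title[:room], separator, url)


long_title = "x" * 200
tweet = build_tweet(long_title, "http://example.com/")
print(len(tweet.split(" . . . ")[0]))  # 110
```

With these assumed defaults the title gets 110 characters, which matches the slice printed above; the posting cell below uses a slightly more conservative 108.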


## Post to Twitter and Save to Google (Two Data Points, One Match)

In [99]:
# Read the last saved row once instead of calling the Sheets API for every comparison.
last_row = worksheet.row_values(worksheet.row_count)

# A post counts as new only if both its URL and its title differ from the saved values.
new_1 = last_row[1] != output_1 and last_row[2] != output_2
new_2 = last_row[3] != output_3 and last_row[4] != output_4

if new_1:
    try:
        # Post to Twitter.
        status = api.PostUpdate('%s . . . %s'%(output_2[0:108],output_1))
        print(status.text)
    except TwitterError:
        # Try once more; a second failure raises and stops the cell before anything is saved.
        status = api.PostUpdate('%s . . . %s'%(output_2[0:108],output_1))
        print(status.text)

if new_2:
    try:
        # Post to Twitter.
        status = api.PostUpdate('%s . . . %s'%(output_4[0:108],output_3))
        print(status.text)
    except TwitterError:
        # Try once more; a second failure raises and stops the cell before anything is saved.
        status = api.PostUpdate('%s . . . %s'%(output_4[0:108],output_3))
        print(status.text)

if new_1 or new_2:
    # Save to Google only after Tweeting
    worksheet.append_row([now,output_1,output_2,output_3,output_4])

The Supreme Court Case: United States vs South Eastern Underwriters Association Explained . . . https://t.co/ZfY8BpuzUH
Winning Arbitration Battle in the Connecticut Supreme Court Regarding Historic Home Restoration Costs Still  . . . https://t.co/cgrDKAwW5c
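
The test that decides whether to tweet can be tried out on its own, without a sheet or a Twitter account. In this sketch `is_new` is a hypothetical helper and the `saved` row holds made-up values; it assumes the same column layout as the cell above (timestamp, URL 1, title 1, URL 2, title 2).

```python
def is_new(last_row, url, title, url_col, title_col):
    """Return True when both the URL and the title differ from the saved row."""
    return last_row[url_col] != url and last_row[title_col] != title


# Made-up stand-in for the last row of the sheet.
saved = ["timestamp", "http://a.example/old", "Old A", "http://b.example/old", "Old B"]

print(is_new(saved, "http://a.example/old", "Old A", 1, 2))  # False: already saved
print(is_new(saved, "http://b.example/new", "New B", 3, 4))  # True: worth tweeting
```

Because the two conditions are joined with `and`, a post whose title changed but whose URL did not is still treated as already seen.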


In [100]:
print(worksheet.row_values(worksheet.row_count))
#############################
# DELETE CELL AFTER TESTING
#############################

['2017-10-23 17:47:05', 'http://www.insurancelitigationregulatorylaw.com/supreme-court-case-united-states-vs-south-eastern-underwriters-association-explained/', 'The Supreme Court Case: United States vs South Eastern Underwriters Association Explained', 'http://www.grinsurancecoveragelawblog.com/winning-arbitration-battle-in-the-connecticut-supreme-court-regarding-historic-home-restoration-costs-still-leaves-insurer-defending-legal-war-in-state-trial-court/', 'Winning Arbitration Battle in the Connecticut Supreme Court Regarding Historic Home Restoration Costs Still Leaves Insurer Defending Legal War in State Trial Court']
