Adapted from the notebook found at [How to Build a Law Bot](https://lawyerist.com/how-build-law-bot/)

My Twitter bot takes a quote from [Notable Law Quotes](http://www.notable-quotes.com/l/law_quotes.html), which lists quotes in order from "most liked" to "least liked." As the most-liked quote changes, my bot will tweet the new most-liked quote. My regular expression should capture the quote, the author, and the work in which the quote appears (such as a book title), so the bot can string them together and save them to a [Google Sheets spreadsheet I created](https://docs.google.com/spreadsheets/d/1-teq-x4A1ITVOcQQhZEysoDziNqdl2ZRwHGiipJHDUk/edit#gid=0). After that, it should post the quote in a single tweet to [this Twitter page](https://twitter.com/LawQuotes2?lang=en). A quote longer than 140 characters could be a problem; I am not sure what will happen in that case.

To ensure my bot complies with the CFAA, I appended "robots.txt" to the site's URL to see whether bots had permission. Notable Law Quotes doesn't have a robots.txt page. I then searched through its [privacy page](http://www.notable-quotes.com/privacy_policy.html). The website appears to want its quotes shared: the privacy page describes no prohibition, and I could not find a terms of service page.
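The robots.txt check described above can also be done in code. The following is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user agent `LawQuoteBot`, and the sample rules are all hypothetical and exist only for illustration; they are not the site's actual rules, since the site had no robots.txt page when I checked.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, NOT taken from notable-quotes.com.
sample_rules = "User-agent: *\nDisallow: /private/\n"
quotes_url = "http://www.notable-quotes.com/l/law_quotes.html"

print(is_allowed(sample_rules, "LawQuoteBot", quotes_url))
# An empty rule set places no restrictions on any bot.
print(is_allowed("", "LawQuoteBot", quotes_url))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before `can_fetch()` is called.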

In [16]:
# Load the module for visiting and reading websites.
import urllib.request
# Load the module for running regular expressions (regex).
import re 
# Load the module for date and time stuff.
import datetime
# Define the variable now as equal to the current date and time.
now = datetime.datetime.now()

In [17]:
# Set the URL you want to scrape.
url_1 = "http://www.notable-quotes.com/l/law_quotes.html"

# To scrape data from multiple pages, replicate the code above and below,
# changing url_1 to url_2, and so on.

In [18]:
# Load the module for accessing Google Sheets.
import gspread
# Load the module needed for securely communicating with Google Sheets.
from oauth2client.service_account import ServiceAccountCredentials
# The scope for your access credentials
scope = ['https://spreadsheets.google.com/feeds']

# Your spreadsheet's ID
document_key = "1-teq-x4A1ITVOcQQhZEysoDziNqdl2ZRwHGiipJHDUk" 
#              ^^^^^^^^^^^ SWAP OUT FOR YOUR DOCUMENT ID/KEY
# Your Google project's .json key
credentials = ServiceAccountCredentials.from_json_keyfile_name('../../../../../Twitterbot-f07c559e66ba.json', scope)
#                                                                              ^^^^^^^^ SWAP OUT FOR YOUR JSON KEY
# Use your credentials to authorize yourself.
gc = gspread.authorize(credentials)
# Open up the Sheet with the defined ID.
wks = gc.open_by_key(document_key)

#########################################
#
#  NOTE: The name of the sheet you are 
#  trying to access should be in the 
#  parenthetical below (e.g., Data). By
#  Default this is probably "Sheet1".
#
#########################################
worksheet = wks.worksheet("Sheet1")

# Count the number of rows in your Sheet &
# resize to remove blank rows.
worksheet.resize(worksheet.row_count)

In [19]:
# Import the relevant Twitter libraries so you can use Twitter.
import twitter
from twitter import TwitterError

# Create the following four text files and add them to the same directory as your
# Google API key. In each file, add the appropriate value found when retrieving your
# Twitter credentials.

# .strip() removes any trailing newline the text editor added to the file.
with open('../../../../../key.txt', 'r') as myfile:
    key=myfile.read().strip()

with open('../../../../../secret.txt', 'r') as myfile:
    secret=myfile.read().strip()

with open('../../../../../token_key.txt', 'r') as myfile:
    token_key=myfile.read().strip()

with open('../../../../../token_secret.txt', 'r') as myfile:
    token_secret=myfile.read().strip()

# Set your Twitter API credentials.
api = twitter.Api(consumer_key=key,
                  consumer_secret=secret,
                  access_token_key=token_key,
                  access_token_secret=token_secret)
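The four near-identical file reads above could be folded into one small helper. This is only a sketch: `read_secret` is a hypothetical name, and the temporary file with a placeholder value stands in for a real credential file so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def read_secret(path):
    """Return the file's contents with surrounding whitespace removed."""
    return Path(path).read_text(encoding="utf-8").strip()


# Demonstrate with a throwaway file holding a placeholder, not a real key.
with tempfile.TemporaryDirectory() as folder:
    demo_file = Path(folder) / "key.txt"
    demo_file.write_text("not-a-real-key\n", encoding="utf-8")
    demo_value = read_secret(demo_file)

print(repr(demo_value))
```

Stripping whitespace matters because many editors end a file with a newline, and a credential with a stray newline may be rejected.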

## Read the contents of your first webpage

When you run the next cell, your program will visit the first URL you defined above. It will then print out that page's HTML. 

In [20]:
p_1 = urllib.request.build_opener(urllib.request.HTTPCookieProcessor).open(url_1).read()
print(p_1)



# Two Data Points, One Match

---------------------------
## Parse the site's contents

In [21]:
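# Capture the quote, the author, and the work it appears in. A raw bytes literal
# (rb'...') keeps regex escapes such as \s from being read as string escapes.
res_1 = re.search(rb'<p class="quotation">([^<]*)</p><p class="attribution">([^,]*),\s*<em>([^<]*)',p_1)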
output_1 = res_1.group(1).decode('UTF-8')
print(output_1)
output_2 = res_1.group(2).decode('UTF-8')
print(output_2)
output_3 = res_1.group(3).decode('UTF-8')
print(output_3)

Laws grind the poor, and rich men rule the law.
OLIVER GOLDSMITH
The Traveller
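To see how the pattern behaves without visiting the site, it can be run against a short sample. The sketch below assumes markup shaped the way the regular expression expects; `sample_page` is a hand-written stand-in rather than a copy of the real page, and `extract_quote` is a hypothetical helper that returns `None` instead of raising an error when nothing matches.

```python
import re

# Same pattern as in the cell above, compiled once for reuse.
QUOTE_PATTERN = re.compile(
    rb'<p class="quotation">([^<]*)</p><p class="attribution">([^,]*),\s*<em>([^<]*)'
)


def extract_quote(page):
    """Return (quote, author, source) from page bytes, or None if no match."""
    match = QUOTE_PATTERN.search(page)
    if match is None:
        return None
    return tuple(group.decode("utf-8").strip() for group in match.groups())


# Hand-written sample in the shape the pattern expects (an assumption).
sample_page = (
    b'<p class="quotation">Laws grind the poor, and rich men rule the law.</p>'
    b'<p class="attribution">OLIVER GOLDSMITH, <em>The Traveller</em></p>'
)

print(extract_quote(sample_page))
print(extract_quote(b"<html></html>"))
```

Checking for `None` before calling `.group()` means a change in the page layout produces a clear "no match" result instead of an `AttributeError`.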


## Post to Twitter and Save to Google (Two Data Points, One Match)

In [23]:
# Read the most recently saved row once (columns: timestamp, quote, author, source).
last_row = worksheet.row_values(worksheet.row_count)

# Tweet only if a match was found and the quote, author, or source differs
# from the last saved row. Requiring all three to differ would skip a new
# quote by the same author.
if res_1 and last_row[1:4] != [output_1, output_2, output_3]:
    try:
        # Post to Twitter.
        status = api.PostUpdate('"%s" by: %s in: %s'%(output_1,output_2,output_3))
        print(status.text)
    except TwitterError:
        # If the first attempt is rejected, retry with slightly different wording.
        status = api.PostUpdate('"%s" by, %s in, %s'%(output_1,output_2,output_3))
        print(status.text)

    # Save to Google only after tweeting. The timestamp is stored as text.
    worksheet.append_row([str(now),output_1,output_2,output_3])
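The introduction leaves open what happens when a quote runs past 140 characters. One possible approach, offered only as a sketch and not as something this notebook does, is to shorten the quote before posting. The helper name `format_tweet` and the choice to truncate with "..." are assumptions; splitting the quote across several tweets would be another option.

```python
def format_tweet(quote, author, source, limit=140):
    """Build the tweet text, shortening the quote if the result is too long."""
    template = '"{}" by: {} in: {}'
    text = template.format(quote, author, source)
    if len(text) <= limit:
        return text
    # Characters of the quote that can stay once "..." is added.
    keep = len(quote) - (len(text) - limit) - 3
    if keep <= 0:
        raise ValueError("author and source alone exceed the limit")
    return template.format(quote[:keep].rstrip() + "...", author, source)


short_tweet = format_tweet(
    "Laws grind the poor, and rich men rule the law.",
    "OLIVER GOLDSMITH",
    "The Traveller",
)
long_tweet = format_tweet("word " * 60, "AUTHOR", "Source")

print(short_tweet)
print(len(long_tweet))
```

The returned string could then be passed to `api.PostUpdate` in place of the inline `%` formatting.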