## DIC Lab 2 - Common Crawl Data Extraction

Submitted by Aniruddha Sinha (asinha6@buffalo.edu, 50289428) and Shashank Dhar (sdhar2@buffalo.edu, 50289718)

### Import the necessary librariers

In [22]:
import requests
import argparse
import time
import json
import StringIO
import gzip
import csv
import codecs
import csv

In [23]:
from bs4 import BeautifulSoup

### Domain for Common Crawl is www.huffpost.com - has abundant data on political affairs and activities. Index chosen in Week 13 of 2019. 

In [24]:
domain = 'https://www.huffpost.com/news/politics'
index_list = ["2019-13"]

### Function to scrape data on the particular domain and obtain records

In [25]:
def searchInDomain(domain):
    CC_records = []
    
#     print "\n\nSearching content in %s domain" %domain
    
    for index in index_list:
#         print "\nSeeking data from domain: %s" %domain
        
        cc_url = "http://index.commoncrawl.org/CC-MAIN-%s-index?" %index
        cc_url += "url=%s&matchType=domain&output=json" %domain
        
        response = requests.get(cc_url)
        
        if response.status_code == 200:
            records = response.content.splitlines()
            
            for record in records:
                CC_records.append(json.loads(record))
                
#             print "\nAdded %d results." %len(records)
            
#         print "\n\n\n"
#     print "\nTotal hits = %d on domain ." %len(CC_records)
    
    return CC_records

### Following function visits AWS's Common Crawl repository to download all the HTML content scraped so far for the particular domain. Further, the content is extracted from the gzip format in which it is stored, cleaned and passed as a response to the next function that extracts all related links from the page.

In [26]:
def download_htmlPage(record):

    offset, length = int(record['offset']), int(record['length'])
    offset_end = offset + length - 1
    
    cc_aws_address = 'https://commoncrawl.s3.amazonaws.com/'

    aws_response = requests.get(cc_aws_address + record['filename'], headers={'Range': 'bytes={}-{}'.format(offset, offset_end)})
    
    raw_data = StringIO.StringIO(aws_response.content)
    f = gzip.GzipFile(fileobj=raw_data)
    pageData = f.read()
    
    response = ""

    if len(pageData):
        try:
            warc, header, response = pageData.strip().split('\r\n\r\n', 2)
        except:
            pass
        
    return response      

### Following function takes the page content from all HTML pages extracted from the Common Crawl repository, passes it to Beautiful Soup to convert it to json content and finds all the links related to the domain name. It returns a list of all such links obtained. 

In [27]:
def extractAllExternalLinks(page_content, links):
    
    parser = BeautifulSoup(page_content)
    
#     print "\n\nRunning extractAllExternalLinks"
    found_links = parser.find_all("a")
    
    if found_links:
        for link in found_links:
            href = link.attrs.get("href")
            
            if (href is not None) and (domain not in href) and (href not in links):
                if href.startswith("http"):
#                     print "\nFound external domain link: %s" %href
                    links.append(href)
    
#     print "\n Exiting extract_external_links"
    return links

### Following function finally writes the links to a CSV file row by row. 

In [28]:
def writeToCSV(links):
    count = 0
    with codecs.open("politics_US_CC.csv", "a", encoding="utf-8") as output:
        fields = ["URL"]

        logger = csv.DictWriter(output, fieldnames=fields)
        logger.writeheader()

        for link in links:
            logger.writerow({"URL":link})
            if count >= 1500:
                break
            count += 1

### We start calling the functions to get the Common Crawl Data. 

In [29]:
all_records = searchInDomain(domain)
links = []
check = 0

### Get data for a certain number of records found from searching the domain. We check for 5000 records. Also, we write the URLs to a CSV file. 

In [30]:
for i in range(1500):
    record = all_records[i]
    
    page_content = download_htmlPage(record)
#     print "[*] Retrieved %d bytes for %s" %(len(page_content), record['url'])  
    links = extractAllExternalLinks(page_content, links)
    check += 1
    print "\n\nRUNNING ", check, " iteration."
    
print "Total domain related links discovered: %d" %len(links)

writeToCSV(links)




RUNNING  1  iteration.


RUNNING  2  iteration.


RUNNING  3  iteration.


RUNNING  4  iteration.


RUNNING  5  iteration.


RUNNING  6  iteration.


RUNNING  7  iteration.


RUNNING  8  iteration.


RUNNING  9  iteration.


RUNNING  10  iteration.


RUNNING  11  iteration.


RUNNING  12  iteration.


RUNNING  13  iteration.


RUNNING  14  iteration.


RUNNING  15  iteration.


RUNNING  16  iteration.


RUNNING  17  iteration.


RUNNING  18  iteration.


RUNNING  19  iteration.


RUNNING  20  iteration.


RUNNING  21  iteration.


RUNNING  22  iteration.


RUNNING  23  iteration.


RUNNING  24  iteration.


RUNNING  25  iteration.


RUNNING  26  iteration.


RUNNING  27  iteration.


RUNNING  28  iteration.


RUNNING  29  iteration.


RUNNING  30  iteration.


RUNNING  31  iteration.


RUNNING  32  iteration.


RUNNING  33  iteration.


RUNNING  34  iteration.


RUNNING  35  iteration.


RUNNING  36  iteration.


RUNNING  37  iteration.


RUNNING  38  iteration.


RUNNING  39  iterat



RUNNING  309  iteration.


RUNNING  310  iteration.


RUNNING  311  iteration.


RUNNING  312  iteration.


RUNNING  313  iteration.


RUNNING  314  iteration.


RUNNING  315  iteration.


RUNNING  316  iteration.


RUNNING  317  iteration.


RUNNING  318  iteration.


RUNNING  319  iteration.


RUNNING  320  iteration.


RUNNING  321  iteration.


RUNNING  322  iteration.


RUNNING  323  iteration.


RUNNING  324  iteration.


RUNNING  325  iteration.


RUNNING  326  iteration.


RUNNING  327  iteration.


RUNNING  328  iteration.


RUNNING  329  iteration.


RUNNING  330  iteration.


RUNNING  331  iteration.


RUNNING  332  iteration.


RUNNING  333  iteration.


RUNNING  334  iteration.


RUNNING  335  iteration.


RUNNING  336  iteration.


RUNNING  337  iteration.


RUNNING  338  iteration.


RUNNING  339  iteration.


RUNNING  340  iteration.


RUNNING  341  iteration.


RUNNING  342  iteration.


RUNNING  343  iteration.


RUNNING  344  iteration.


RUNNING  345  iteration.





RUNNING  613  iteration.


RUNNING  614  iteration.


RUNNING  615  iteration.


RUNNING  616  iteration.


RUNNING  617  iteration.


RUNNING  618  iteration.


RUNNING  619  iteration.


RUNNING  620  iteration.


RUNNING  621  iteration.


RUNNING  622  iteration.


RUNNING  623  iteration.


RUNNING  624  iteration.


RUNNING  625  iteration.


RUNNING  626  iteration.


RUNNING  627  iteration.


RUNNING  628  iteration.


RUNNING  629  iteration.


RUNNING  630  iteration.


RUNNING  631  iteration.


RUNNING  632  iteration.


RUNNING  633  iteration.


RUNNING  634  iteration.


RUNNING  635  iteration.


RUNNING  636  iteration.


RUNNING  637  iteration.


RUNNING  638  iteration.


RUNNING  639  iteration.


RUNNING  640  iteration.


RUNNING  641  iteration.


RUNNING  642  iteration.


RUNNING  643  iteration.


RUNNING  644  iteration.


RUNNING  645  iteration.


RUNNING  646  iteration.


RUNNING  647  iteration.


RUNNING  648  iteration.


RUNNING  649  iteration.





RUNNING  917  iteration.


RUNNING  918  iteration.


RUNNING  919  iteration.


RUNNING  920  iteration.


RUNNING  921  iteration.


RUNNING  922  iteration.


RUNNING  923  iteration.


RUNNING  924  iteration.


RUNNING  925  iteration.


RUNNING  926  iteration.


RUNNING  927  iteration.


RUNNING  928  iteration.


RUNNING  929  iteration.


RUNNING  930  iteration.


RUNNING  931  iteration.


RUNNING  932  iteration.


RUNNING  933  iteration.


RUNNING  934  iteration.


RUNNING  935  iteration.


RUNNING  936  iteration.


RUNNING  937  iteration.


RUNNING  938  iteration.


RUNNING  939  iteration.


RUNNING  940  iteration.


RUNNING  941  iteration.


RUNNING  942  iteration.


RUNNING  943  iteration.


RUNNING  944  iteration.


RUNNING  945  iteration.


RUNNING  946  iteration.


RUNNING  947  iteration.


RUNNING  948  iteration.


RUNNING  949  iteration.


RUNNING  950  iteration.


RUNNING  951  iteration.


RUNNING  952  iteration.


RUNNING  953  iteration.





RUNNING  1213  iteration.


RUNNING  1214  iteration.


RUNNING  1215  iteration.


RUNNING  1216  iteration.


RUNNING  1217  iteration.


RUNNING  1218  iteration.


RUNNING  1219  iteration.


RUNNING  1220  iteration.


RUNNING  1221  iteration.


RUNNING  1222  iteration.


RUNNING  1223  iteration.


RUNNING  1224  iteration.


RUNNING  1225  iteration.


RUNNING  1226  iteration.


RUNNING  1227  iteration.


RUNNING  1228  iteration.


RUNNING  1229  iteration.


RUNNING  1230  iteration.


RUNNING  1231  iteration.


RUNNING  1232  iteration.


RUNNING  1233  iteration.


RUNNING  1234  iteration.


RUNNING  1235  iteration.


RUNNING  1236  iteration.


RUNNING  1237  iteration.


RUNNING  1238  iteration.


RUNNING  1239  iteration.


RUNNING  1240  iteration.


RUNNING  1241  iteration.


RUNNING  1242  iteration.


RUNNING  1243  iteration.


RUNNING  1244  iteration.


RUNNING  1245  iteration.


RUNNING  1246  iteration.


RUNNING  1247  iteration.


RUNNING  1248  ite

#### This function takes in a HTML page content, passes it to Beautiful soup, cleans it by removing all JS and CSS content and extracts only the relevant body content with the paragraphs. Returns the cleaned content. 

In [31]:
def cleanHTMLData(requested):
    soup = BeautifulSoup(requested.content, "html.parser")
    
    for script in soup(["script", "style"]):
        script.extract()
    
    text = []
    paras = soup.find_all("p")
    
    for i in range(len(paras)):
        if paras[i].text != 'Advertisement' and paras[i] != 'Supported by':
            text.append(paras[i].text)

#     for script in soup(["script", "style"]):
#         script.extract()

#     HTML_text = soup.get_text()
    
    HTML_text = " ".join(text)
    
    return HTML_text

#### Following part of code takes every URL from the CSV file, requests its content and cleans it after Beautiful Soup and writes the cleaned content to a txt file for MR and analysis. 

In [34]:
f1 = codecs.open("politics_US_CC.txt", "a+", encoding = 'utf-8')
# headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

count = 0
rowCount = 0
with open ("politics_US_CC.csv", "r") as f:
    rowCount = len(f.readlines())
    
with open ("politics_US_CC.csv", "r") as f:
    reader = csv.reader(f)    
    row = next(reader)


    for i in range(rowCount):
        row = next(reader)
        res = row[0]
        
        print "\n\nCurrent Link:   ", res
        print "\nRequesting URL content"
        
        try:
            r = requests.get(res)
        except:
            continue

#         r = requests.get(res)

        content = cleanHTMLData(r)
        f1.write("\n" + content + "\n\n\n\n\n\n")
        count += 1

f1.close()



Current Link:    https://www.huffpost.com

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2F

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2F

Requesting URL content


Current Link:    https://www.huffpost.com/entry/jeff-bezos-national-enquirer-michael-sanchez_n_5c904a40e4b04ed2c1ad9825

Requesting URL content


Current Link:    https://www.huffpost.com/entry/steve-king-civil-war-graphic_n_5c8ef5b9e4b03e83bdc25c86

Requesting URL content


Current Link:    https://www.huffpost.com/entry/kellyanne-conway-new-zealand-killer-white-supremacy-manifesto_n_5c8fa75fe4b0d7f6b0f61cc9

Requesting URL content


Current Link:    https://www.huffpost.com/entry/trump-raids-military_n_5c901777e4b0d50544fee195

Requesting URL content


Current Link:    https://www.huffpost.com/entry/trump-foundation-ny-ag-fine_n_5c8ef7bbe4b0db7da9f4cf3a

Requesting URL content


Current Lin



Current Link:    https://www.huffpost.com/impact/business

Requesting URL content


Current Link:    https://www.huffpost.com/impact/green

Requesting URL content


Current Link:    https://www.huffpost.com/section/health

Requesting URL content


Current Link:    https://www.huffpost.com/impact/topic/social-justice

Requesting URL content


Current Link:    https://www.huffpost.com/news/topic/us-congress

Requesting URL content


Current Link:    https://www.huffpost.com/news/topic/donald-trump

Requesting URL content


Current Link:    https://www.huffpost.com/news/topic/2018-elections

Requesting URL content


Current Link:    https://www.huffpost.com/news/topic/extremism

Requesting URL content


Current Link:    https://www.huffpost.com/entertainment/

Requesting URL content


Current Link:    https://www.huffpost.com/entertainment/arts

Requesting URL content


Current Link:    https://www.huffpost.com/entertainment/celebrity

Requesting URL content


Current Link:    https://w



Current Link:    https://www.huffpost.com/entry/trump-special-counsel-robert-mueller-investigation_n_5c5c9891e4b03afe8d65ada3

Requesting URL content


Current Link:    https://www.huffpost.com/entry/democrats-immediate-release-mueller-report_n_5c95524ee4b01ebeef0f3b74

Requesting URL content


Current Link:    https://www.huffpost.com/entry/trump-investigation-campaign-finance-charges_n_5c952574e4b01ebeef0eface

Requesting URL content


Current Link:    https://www.huffpost.com/entry/rupert-murdoch-executive-quit-fox-news-anti-immigrant-islamophobic-coverage_n_5c94ca5de4b0a6329e155dec

Requesting URL content


Current Link:    https://www.huffpost.com/entry/joe-biden-stacey-abrams_n_5c952b88e4b0a6329e16565f

Requesting URL content


Current Link:    https://www.huffpost.com/entry/beto-orourke-bernie-sanders-democratic-fundraising-primary_n_5c941176e4b01ebeef0d1cae

Requesting URL content


Current Link:    https://www.huffpost.com/author/daniel-marans

Requesting URL content


Curre



Current Link:    https://www.huffpost.com/entry/maryland-15-minimum-wage-bill_n_5c929006e4b0d952b222b9e5

Requesting URL content


Current Link:    https://www.huffpost.com/entry/machelle-hackney-abusive-youtube-channel_n_5c9225cfe4b0f7ed945e215f

Requesting URL content


Current Link:    https://www.huffpost.com/entry/tomi-lahren-liberal-indoctrination-school_n_5c925740e4b0d952b2222e94

Requesting URL content


Current Link:    https://www.huffpost.com/author/ron-dicker

Requesting URL content


Current Link:    https://www.huffpost.com/entry/beto-orourke-24-hours-fundraising-record_n_5c8f7d31e4b0d7f6b0f5b082

Requesting URL content


Current Link:    https://www.huffpost.com/entry/ilhan-omar-israel-palestine_n_5c8f7004e4b0db7da9f582e6

Requesting URL content


Current Link:    https://www.huffpost.com/entry/why-did-kamala-harris-let-herbalife-off-the-hook_n_5c8fab16e4b03e83bdc39a37

Requesting URL content


Current Link:    https://www.huffpost.com/author/christopher-mathias

Reque



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Farchive%2F2019-03-22

Requesting URL content


Current Link:    https://www.huffpost.com/entry/zodiac-erogenous-zones-sexual-pleasure-astrology_l_5c93cab1e4b0e9efc8b59ab3

Requesting URL content


Current Link:    https://www.huffpost.com/author/-brittany-wong

Requesting URL content


Current Link:    https://www.huffpost.com/entry/devincow-cowsuit-protest-fresno-event_n_5c940000e4b0a6329e142d67

Requesting URL content


Current Link:    https://www.huffpost.com/author/carla-herreria

Requesting URL content


Current Link:    https://www.huffpost.com/entry/oil-industry-climate_n_5c940962e4b0a6329e144e6c

Requesting URL content


Current Link:    https://www.huffpost.com/author/alexander-c-kaufman

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2F-brittany-wong

Requesting URL content


Current Link:    https://oidc.huffpost.c



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-maxman

Requesting URL content


Current Link:    https://www.twitter.com/abbymaxman

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-miller-king-216

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-miller-king-216

Requesting URL content


Current Link:    https://www.twitter.com/abbykingwriter

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-smith-rumsey

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-smith-rumsey

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fabby-sugar

Requesting URL content


Cu



Current Link:    https://www.twitter.com/happinessplunge

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-banner

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-banner

Requesting URL content


Current Link:    https://www.twitter.com/OKCDefenseLaw

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-c-levine

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-c-levine

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-forrest

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadam-forrest

Requesting URL content


Current Link:    http



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadinaciment-173

Requesting URL content


Current Link:    https://www.twitter.com/aciment

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadjoro-914

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadjoro-914

Requesting URL content


Current Link:    https://www.twitter.com/adjoro

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadmin-238

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadmin-238

Requesting URL content


Current Link:    https://www.twitter.com/adigaskell

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fadmi



Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faimee-chan

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faimee-chan

Requesting URL content


Current Link:    https://www.twitter.com/suitcasesstroll

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faimee-heckel

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faimee-heckel

Requesting URL content


Current Link:    https://www.twitter.com/Aimeemay

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faitda001-993

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Faitda001-993

Requesting URL content


Current Link:    https://oidc.



Current Link:    https://www.twitter.com/alanamoceri

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falana-pace

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falana-pace

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falanblack02-239

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falanblack02-239

Requesting URL content


Current Link:    https://www.twitter.com/alanblackwriter

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falankohll-296

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falankohll-296

Requesting URL content


Current Link:    ht



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falex-simon

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falex-stewart

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falex-stewart

Requesting URL content


Current Link:    http://www.putnielsingoal.com

Requesting URL content


Current Link:    https://oidc.huffpost.com/login

Requesting URL content


Current Link:    https://oidc.huffpost.com/create

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falexander-221

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falexander-221

Requesting URL content


Current Link:    https://www.twitter.com/alexkjerulf

Requesting URL content


Current Link:    



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falexpirouz-678

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falf-lamont

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falf-lamont

Requesting URL content


Current Link:    https://www.twitter.com/alflamont

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falfred-ryan-nerz

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falfred-ryan-nerz

Requesting URL content


Current Link:    http://www.amazon.com/Falling-Ryan-Love-Stories-No/dp/0553492527

Requesting URL content


Current Link:    http://www.thedailyshow.com/watch/tue-april-18-2006/ryan-nerz

Requesting URL content


Current Link:    http://



Current Link:    https://www.twitter.com/alisonchino

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-malmon

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-malmon

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-patton

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-patton

Requesting URL content


Current Link:    https://www.twitter.com/LemonadeDivorce

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-rose-levy

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Falison-rose-levy

Requesting URL content


Current Lin



Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fallisonabrams04-278

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fallisonabrams04-278

Requesting URL content


Current Link:    https://www.twitter.com/alliabramslcsw

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fallure-magazine

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fallure-magazine

Requesting URL content


Current Link:    https://www.twitter.com/Allure_Magazine

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fally-647

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fally-647

Requesting URL content


Current 



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-gardner

Requesting URL content


Current Link:    https://www.twitter.com/amandasgardner

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-golden

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-golden

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-heijbel-robin-jakobsson

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-heijbel-robin-jakobsson

Requesting URL content


Current Link:    http://www.mangomanjaro.se

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famanda-l-chan

Requesting URL co



Current Link:    http://www.mommysmetime.com/

Requesting URL content


Current Link:    https://www.instagram.com/ambermkuiper/

Requesting URL content


Current Link:    https://www.facebook.com/momsmetime

Requesting URL content


Current Link:    http://www.periscope.com/ambermkuiper

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famber-sabathia

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famber-sabathia

Requesting URL content


Current Link:    https://www.twitter.com/AmberSabathia

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fambmcg-507

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fambmcg-507

Requesting URL content


Current Link:    https://www.twitter.com/ambmcg

Requesting 



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famy-jurkowitz

Requesting URL content


Current Link:    http://materialusa.com/home.htm

Requesting URL content


Current Link:    http://jurkowitz.com/

Requesting URL content


Current Link:    http://www.getmilkshake.com/

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famy-kurzweil

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famy-kurzweil

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famy-l-freeman

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Famy-l-freeman

Requesting URL content


Current Link:    https://www.twitter.com/FreemanAmyL

Requesting URL content


Current Link:    https://o



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-jarrell

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-jimenez-rael

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-jimenez-rael

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-rhoades

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-rhoades

Requesting URL content


Current Link:    https://www.twitter.com/selfiesselfless

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrea-smith

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.



Current Link:    https://www.twitter.com/andrewmerle

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrew-offenburger

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrew-offenburger

Requesting URL content


Current Link:    http://www.persistentfrontiers.com

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrew-perez

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrew-perez

Requesting URL content


Current Link:    http://www.ibtimes.com/reporters/andrew-perez

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandrew-schmertz

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandy-watts-948

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandy-worthington

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandy-worthington

Requesting URL content


Current Link:    http://www.amazon.com/Guantanamo-Files-Stories-Detainees-Americas/dp/0745326641%3FSubscriptionId%3D15VEWHERF6Q30X94NX82%26tag%3Dthehuffingtop-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0745326641

Requesting URL content


Current Link:    http://www.andyworthington.co.uk/outside-the-law-stories-from-guantanamo/

Requesting URL content


Current Link:    http://www.andyworthington.co.uk/

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fandybard-842

Requesting



Current Link:    https://www.twitter.com/AnjaliSareen

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fanjhula-345

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fanjhula-345

Requesting URL content


Current Link:    https://www.twitter.com/Anjhula

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fanju-bhargava

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fanju-bhargava

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fann-brenoff

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fann-brenoff

Requesting URL content


Current Link:    https://www.twi



Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fanna-szymanski

Requesting URL content


Current Link:    https://www.twitter.com/three_guineas

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fannabelle-buggle

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fannabelle-buggle

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fannaleahy-849

Requesting URL content


Current Link:    https://oidc.huffpost.com/create?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fannaleahy-849

Requesting URL content


Current Link:    https://www.twitter.com/GenerationSpace

Requesting URL content


Current Link:    https://oidc.huffpost.com/login?dest=https%3A%2F%2Fwww.huffpost.com%2Fauthor%2Fannasowinski-429

Requesting URL content


Curre

StopIteration: 