# Scraping Web Data with Beautiful Soup

## Objectives

1. Use `requests` to download a web page
2. Use `BeautifulSoup` to parse the web page
3. Inspect elements in a browser to locate the desired information
4. Use various search methods to navigate the web page
    1. `find`
    2. `find_all`
    3. tag-like attributes (e.g. `tag.a`)
    4. searching with a function

**Source:** some material adapted from http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html

## Web pages and the DOM

* Text files
* Use html tags 
* Have a tree structure
    * Called the Document Object Model (DOM)

<img src="http://www.openbookproject.net/tutorials/getdown/css/images/lesson4/HTMLDOMTree.png">

<img src="http://www.cs.toronto.edu/~shiva/cscb07/img/dom/treeStructure.png">

## Using Firefox or Chrome to inspect a page

* Right click on something and select
    * Firefox: Inspect Element
    * Chrome: Inspect
* Opens a representation of the DOM
* Mouse over elements to highlight corresponding parts of the page.

## <font color="red"> Exercise 1</font>

Inspect the following parts of https://en.wikipedia.org/wiki/Web_scraping

* The table of contents
* The section headers
* A link

## HTML tags

* Use `<` and `>`
* Most have beginning and end tags
    `<p> a paragraph </p>`
* Some common tags
    * `<div>`
    * `<span>`
    * `<a>` (link)
    * `<img>`
* Can contain attributes
    * `<img src="my_image.png">`
    * `<div class="some-identifier">`

## Using `requests` to download a raw web page

* `requests` makes it easy to programmatically navigate a website
* Three steps
    * Create a session
    * Use the `get` method to request the page
    * Use `content` on the result to see the page as raw bytes (`text` gives the decoded string)

In [12]:
import requests

s = requests.Session()
r = s.get('https://en.wikipedia.org/wiki/Web_scraping')
r.content[:1000]

b'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Web scraping - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Web_scraping","wgTitle":"Web scraping","wgCurRevisionId":775652305,"wgRevisionId":775652305,"wgArticleId":2696619,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with limited geographic scope from October 2015","USA-centric","Web scraping","World Wide Web","Spamming"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","Janua
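
The three steps above can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function name `fetch`, the optional `session` argument, and the 10 second timeout are all assumptions made for illustration. It adds a status check so that an error page is not parsed as if it were the real content.

```python
import requests

def fetch(url, session=None, timeout=10):
    """Return the raw bytes of a page, raising on HTTP error codes."""
    if session is None:
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()   # raises requests.HTTPError for 4xx/5xx responses
    return response.content       # bytes; response.text is the decoded str
```

A call such as `fetch('https://en.wikipedia.org/wiki/Web_scraping')[:1000]` should give the same kind of output as the cell above.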

## Using Beautiful Soup (`bs4`) to parse a page

* Module is `bs4`
* `BeautifulSoup` takes the content from the `requests` result
* Parses and adds search tools

In [13]:
import bs4

soup = bs4.BeautifulSoup(r.content, 'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Web scraping - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Web_scraping","wgTitle":"Web scraping","wgCurRevisionId":775652305,"wgRevisionId":775652305,"wgArticleId":2696619,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with limited geographic scope from October 2015","USA-centric","Web scraping","World Wide Web","Spamming"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonth

## Using `find` to find the first instance

* Returns the first matching tag (or `None` if nothing matches)
* Searches recursively through all nested tags
* First argument is the html tag type
* Add additional information as needed
    * Frequently use `class_` for class
    * (`class` is a reserved keyword in Python)

In [25]:
tag = soup.find('div', class_='toctitle')
tag

<div class="toctitle" id="toctitle">
<h2>Contents</h2>
</div>
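
Because `find` returns `None` when nothing matches, it is worth checking the result before using it. The sketch below uses a small made-up HTML string (a simplified stand-in for the Wikipedia page) and the built-in `'html.parser'` so it runs without a network connection or `lxml`; the variable names are illustrative only.

```python
import bs4

# Simplified stand-in for the page; not the real Wikipedia markup.
html = '<div class="toc"><div class="toctitle" id="toctitle"><h2>Contents</h2></div></div>'
demo = bs4.BeautifulSoup(html, 'html.parser')

title_div = demo.find('div', class_='toctitle')     # first matching tag
missing = demo.find('div', class_='no-such-class')  # None when nothing matches

# Guard against None before drilling further into the tag
heading = title_div.h2.get_text() if title_div is not None else None
```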

## Pulling attribute information from a tag

* Use dictionary-style indexing to access a tag's attributes.

In [26]:
tag

<div class="toctitle" id="toctitle">
<h2>Contents</h2>
</div>

In [28]:
tag['id']

'toctitle'
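
A few related ways to read attributes, sketched on a made-up tag (the extra class name `extra` is an assumption added to show the multi-valued case). Indexing raises `KeyError` for a missing attribute, while `get` returns `None`.

```python
import bs4

# Simplified stand-in markup; 'html.parser' avoids needing lxml here.
demo = bs4.BeautifulSoup('<div class="toctitle extra" id="toctitle"></div>', 'html.parser')
div = demo.div

div_id = div['id']          # a single string
classes = div['class']      # class can hold several values, so it is a list
missing = div.get('href')   # None instead of a KeyError
all_attrs = div.attrs       # every attribute as a dict
```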

## Using tag attributes to get tags of a certain type

* Writing a tag name as an attribute (e.g. `tag.a`) returns the first nested tag of that type
* Equivalent to calling `tag.find('a')`


In [38]:
tag = soup.find('div', class_="mw-jump")
tag

<div class="mw-jump" id="jump-to-nav">
					Jump to:					<a href="#mw-head">navigation</a>, 					<a href="#p-search">search</a>
</div>

In [40]:
tag.a

<a href="#mw-head">navigation</a>
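
The attribute form only ever gives the first match. A short sketch on a simplified copy of the markup above shows how it relates to `find` and `find_all`; the variable names are illustrative.

```python
import bs4

html = ('<div class="mw-jump">Jump to: <a href="#mw-head">navigation</a>, '
        '<a href="#p-search">search</a></div>')
div = bs4.BeautifulSoup(html, 'html.parser').div

first_link = div.a                 # first nested <a> only
same_link = div.find('a')          # the same tag, found explicitly
all_hrefs = [a['href'] for a in div.find_all('a')]  # every nested <a>
no_span = div.span                 # None: there is no nested <span>
```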

## Searching for text

* Web page text is a string
* Use the `string=` argument to search for any text

In [48]:
soup.find(string="Contents")

'Contents'
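
A plain string passed to `string=` must match the whole text exactly. For partial or case-insensitive matches, a compiled regular expression can be passed instead. The sketch below uses made-up markup to show the difference.

```python
import re
import bs4

demo = bs4.BeautifulSoup('<h2>Contents</h2><p>Table of contents below</p>', 'html.parser')

exact = demo.find(string='Contents')   # whole-string match only
partial = demo.find_all(string=re.compile('contents', re.IGNORECASE))
parent_name = exact.parent.name        # the tag that holds the matched text
```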

## Use `find_all` to find all instances of a tag

* Called the same way as `find`
* Returns a list of tags
    * Process with a comprehension

In [49]:
soup.find_all('a')[:5]

[<a id="top"></a>,
 <a href="#mw-head">navigation</a>,
 <a href="#p-search">search</a>,
 <a href="/wiki/Data_scraping" title="Data scraping">Data scraping</a>,
 <a href="/wiki/Data_scraping" title="Data scraping">data scraping</a>]

In [52]:
# Shortcut: tag(args) is equivalent to tag.find_all(args)
soup('a')[:5]

[<a id="top"></a>,
 <a href="#mw-head">navigation</a>,
 <a href="#p-search">search</a>,
 <a href="/wiki/Data_scraping" title="Data scraping">Data scraping</a>,
 <a href="/wiki/Data_scraping" title="Data scraping">data scraping</a>]
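
The objectives also list searching with a function. `find_all` accepts a function that takes a tag and returns `True` for the tags to keep. The sketch below is an illustration on a simplified copy of the links above; the helper name `is_wiki_link` is hypothetical.

```python
import bs4

html = ('<a id="top"></a><a href="#mw-head">navigation</a>'
        '<a href="/wiki/Data_scraping" title="Data scraping">Data scraping</a>')
demo = bs4.BeautifulSoup(html, 'html.parser')

def is_wiki_link(tag):
    """Keep only <a> tags whose href points at another wiki article."""
    return tag.name == 'a' and tag.get('href', '').startswith('/wiki/')

wiki_hrefs = [a['href'] for a in demo.find_all(is_wiki_link)]
first_two = demo('a', limit=2)   # limit stops the search early
```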

## Important note

Always use `find` when the page/subpage has exactly one of something

## Searching for parents

* Use `find_parent` and `find_parents` to move back up the tree

In [58]:
# We know the Contents is in the toc
soup.find(string="Contents")

'Contents'

In [64]:
# Keep stepping up the tree until we have the whole toc
# Not yet
soup.find(string="Contents").find_parent('div')

<div class="toctitle" id="toctitle">
<h2>Contents</h2>
</div>

In [69]:
toc = soup.find(string="Contents").find_parent('div').find_parent('div')
print(toc.prettify())

<div class="toc" id="toc">
 <div class="toctitle" id="toctitle">
  <h2>
   Contents
  </h2>
 </div>
 <ul>
  <li class="toclevel-1 tocsection-1">
   <a href="#Techniques">
    <span class="tocnumber">
     1
    </span>
    <span class="toctext">
     Techniques
    </span>
   </a>
   <ul>
    <li class="toclevel-2 tocsection-2">
     <a href="#Human_copy-and-paste">
      <span class="tocnumber">
       1.1
      </span>
      <span class="toctext">
       Human copy-and-paste
      </span>
     </a>
    </li>
    <li class="toclevel-2 tocsection-3">
     <a href="#Text_pattern_matching">
      <span class="tocnumber">
       1.2
      </span>
      <span class="toctext">
       Text pattern matching
      </span>
     </a>
    </li>
    <li class="toclevel-2 tocsection-4">
     <a href="#HTTP_programming">
      <span class="tocnumber">
       1.3
      </span>
      <span class="toctext">
       HTTP programming
      </span>
     </a>
    </li>
    <li class="toclevel-2 tocsection-5
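
Instead of chaining `find_parent` calls, the target ancestor can often be named directly by its class. A minimal sketch on simplified stand-in markup:

```python
import bs4

html = ('<div class="toc" id="toc"><div class="toctitle"><h2>Contents</h2></div>'
        '<ul><li>Techniques</li></ul></div>')
demo = bs4.BeautifulSoup(html, 'html.parser')

text = demo.find(string='Contents')
nearest_div = text.find_parent('div')                 # closest enclosing div
whole_toc = text.find_parent('div', class_='toc')     # jump straight to the toc
div_classes = [d['class'] for d in text.find_parents('div')]  # nearest first
```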

## Using list comprehensions

* `find_all` returns a list of soup objects.
* Use a comprehension to process all tags
    * Start with one example tag
    * Put the resulting expression in a comprehension

In [80]:
tags = toc.find_all('span', class_="toctext")
tags[:3]

[<span class="toctext">Techniques</span>,
 <span class="toctext">Human copy-and-paste</span>,
 <span class="toctext">Text pattern matching</span>]

In [74]:
example_tag = tags[0]
example_tag

<span class="toctext">Techniques</span>

In [77]:
example_tag.next

'Techniques'

In [81]:
sections = [tag.next for tag in toc.find_all('span', 'toctext')]
sections[:3]

['Techniques', 'Human copy-and-paste', 'Text pattern matching']
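
`.next` returns whatever element comes next in the document, which here happens to be the text inside the tag. When the goal is specifically the text, `get_text` states that intent directly and can strip whitespace at the same time. A sketch on simplified markup:

```python
import bs4

html = ('<ul><li><span class="tocnumber">1</span> <span class="toctext">Techniques</span></li>'
        '<li><span class="tocnumber">1.1</span> <span class="toctext"> Human copy-and-paste </span></li></ul>')
demo = bs4.BeautifulSoup(html, 'html.parser')

# get_text(strip=True) returns the tag's text with surrounding whitespace removed
sections = [span.get_text(strip=True) for span in demo.find_all('span', class_='toctext')]
```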

## More information

There is much more to Beautiful Soup; take a look at the documentation for more information.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#going-down

# Case Study - The Current

* The Current is an alternative radio station
* We will pull information about the playlist.

## <font color="red">Exercise - Go to the following page and inspect the following </font>

* Song title
* Artist
* Play time
* Day, date, period (am/pm)

http://www.thecurrent.org/playlist/2014-01-01/01

In [85]:
import requests
import bs4
import datetime

In [86]:
example_url = 'http://www.thecurrent.org/playlist/2014-01-01/01'
s = requests.Session()
r = s.get(example_url)

soup = bs4.BeautifulSoup(r.content, "lxml")

# Pull off the period of the day (am/pm)

Pull out the "am"/"pm"

1. Inspect the element
2. Identify the html tag and class
3. Search the soup
    1. There should be one item returned
4. Use soup/string methods to pull out the info

In [91]:
# I should have used find here...Why?
soup('span', class_="hour-header open")

[<span class="hour-header open">
      1:00 am to  2:00 am
   </span>]

In [92]:
len(soup('span', class_="hour-header open"))

1

In [93]:
soup('span', class_="hour-header open")[0]

<span class="hour-header open">
     1:00 am to  2:00 am
  </span>

In [95]:
soup('span', class_="hour-header open")[0].next

'\n     1:00 am to  2:00 am\n  '

In [96]:
soup('span', class_="hour-header open")[0].next.split('to')

['\n     1:00 am ', '  2:00 am\n  ']

In [97]:

soup('span', class_="hour-header open")[0].next.split('to')[0]

'\n     1:00 am '

In [98]:
soup('span', class_="hour-header open")[0].next.split('to')[0].rstrip()

'\n     1:00 am'

In [99]:
period = soup('span', class_="hour-header open")[0].next.split('to')[0].rstrip()[-2:]
period

'am'
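
The chain of `split`, `rstrip`, and slicing works, but it depends on the exact spacing of the header. As an alternative sketch (not the approach used in the notebook), a regular expression can pull out the start time and period in one step. The header text is copied from the output above.

```python
import re

header_text = '\n     1:00 am to  2:00 am\n  '

# First occurrence of "H:MM am/pm" in the header is the start of the hour block
match = re.search(r'(\d{1,2}):(\d{2})\s*([ap]m)', header_text)
start_hour, start_minute, period = match.groups() if match else (None, None, None)
```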

## <font color="red"> Breakout activity </font>

* Pull out the day of the week

In [12]:
soup('a', class_="start-picker")


[<a class="start-picker" href="#archives">Wednesday, Jan 01, 2014</a>]

In [13]:
soup('a', class_="start-picker")[0]

<a class="start-picker" href="#archives">Wednesday, Jan 01, 2014</a>

In [14]:
soup('a', class_="start-picker")[0].next

'Wednesday, Jan 01, 2014'

In [16]:
soup('a', class_="start-picker")[0].next.split(',')

['Wednesday', ' Jan 01', ' 2014']

In [17]:
day_of_week = soup('a', class_="start-picker")[0].next.split(',')[0]
day_of_week

'Wednesday'
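
The same text can also be parsed into a real date, which is handy later for sorting or filtering. This is a sketch that assumes English month and day names (the default C/English locale); `strptime` would fail on the same text under a different locale.

```python
import datetime

date_text = 'Wednesday, Jan 01, 2014'   # copied from the output above

parsed = datetime.datetime.strptime(date_text, '%A, %b %d, %Y')
day_of_week = parsed.strftime('%A')      # weekday name derived from the date
iso_date = parsed.date().isoformat()     # same format as the date in the url
```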

# Title of each song

## Work through it together

1. Inspect the element
2. Identify the html tag and class
3. Use `soup.findAll` to make a list of all relevant tags
4. Pull off an example case
5. Use soup/string methods to pull out the title
6. Use a list comprehension to process all tags

In [18]:
soup.findAll('h5', class_ = "title")

[<h5 class="title">Holy Roller
         <h5 class="artist">Thao and The Get Down Stay Down
       </h5></h5>, <h5 class="title">Kingdom of Rust
         <h5 class="artist">Doves
       </h5></h5>, <h5 class="title">Black Dog
         <h5 class="artist">Frankie Lee
       </h5></h5>, <h5 class="title">Turn It Around
         <h5 class="artist">Lucius
       </h5></h5>, <h5 class="title">Flavor of the Month
         <h5 class="artist">The Posies
       </h5></h5>, <h5 class="title">Potential Wife
         <h5 class="artist">Strange Names
       </h5></h5>, <h5 class="title">24 Hours
         <h5 class="artist">Sky Ferreira
       </h5></h5>, <h5 class="title">Who's Gonna Shoe Your Pretty Little Feet?
         <h5 class="artist">Billie Joe and Norah
       </h5></h5>, <h5 class="title">Marigold
         <h5 class="artist">J. Roddy Walston and The Business
       </h5></h5>, <h5 class="title">High Road
         <h5 class="artist">Cults
       </h5></h5>, <h5 class="title">The Vampyre Of Ti

In [19]:
example_tag =  soup.findAll('h5', class_ = "title")[0]
example_tag

<h5 class="title">Holy Roller
        <h5 class="artist">Thao and The Get Down Stay Down
      </h5></h5>

In [20]:
example_tag.next

'Holy Roller\n        '

In [21]:
example_tag.next.strip()

'Holy Roller'

In [22]:
titles = [tag.next.strip() for tag in soup.findAll('h5', class_ = "title")]
titles

['Holy Roller',
 'Kingdom of Rust',
 'Black Dog',
 'Turn It Around',
 'Flavor of the Month',
 'Potential Wife',
 '24 Hours',
 "Who's Gonna Shoe Your Pretty Little Feet?",
 'Marigold',
 'High Road',
 'The Vampyre Of Time and Memory',
 'Valerie Plame',
 'Morning Song',
 '(You Will) Set The World On Fire',
 'Sixteen Saltines',
 'Wave of Mutilation']
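
Note that each title tag also contains the artist tag, so asking for all of the text would return both. The sketch below, on a simplified copy of the first tag, shows the difference between taking all the text and taking only the first piece.

```python
import bs4

html = ('<h5 class="title">Holy Roller\n  '
        '<h5 class="artist">Thao and The Get Down Stay Down\n  </h5></h5>')
demo = bs4.BeautifulSoup(html, 'html.parser')
title_tag = demo.find('h5', class_='title')

all_text = title_tag.get_text(' ', strip=True)   # title AND nested artist
title_only = next(title_tag.stripped_strings)    # first piece of text only
```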

# Name of each artist

## Break out session

1. Inspect the element
2. Identify the html tag and class
3. Use `soup.findAll` to make a list of all relevant tags
4. Pull off an example case
5. Use soup/string methods to pull out the artist
6. Use a list comprehension to process all tags

In [23]:
soup.findAll('h5',class_='artist')

[<h5 class="artist">Thao and The Get Down Stay Down
       </h5>, <h5 class="artist">Doves
       </h5>, <h5 class="artist">Frankie Lee
       </h5>, <h5 class="artist">Lucius
       </h5>, <h5 class="artist">The Posies
       </h5>, <h5 class="artist">Strange Names
       </h5>, <h5 class="artist">Sky Ferreira
       </h5>, <h5 class="artist">Billie Joe and Norah
       </h5>, <h5 class="artist">J. Roddy Walston and The Business
       </h5>, <h5 class="artist">Cults
       </h5>, <h5 class="artist">Queens of the Stone Age
       </h5>, <h5 class="artist">The Decemberists
       </h5>, <h5 class="artist">The Avett Brothers
       </h5>, <h5 class="artist">David Bowie
       </h5>, <h5 class="artist">Jack White
       </h5>, <h5 class="artist">Pixies
       </h5>]

In [24]:
example_tag =soup.findAll('h5',class_='artist')[0] 
example_tag

<h5 class="artist">Thao and The Get Down Stay Down
      </h5>

In [25]:
example_tag.next

'Thao and The Get Down Stay Down\n      '

In [26]:
example_tag.next.strip()

'Thao and The Get Down Stay Down'

In [27]:
artists = [tag.next.strip() for tag in  soup.findAll('h5',class_='artist')]
artists

['Thao and The Get Down Stay Down',
 'Doves',
 'Frankie Lee',
 'Lucius',
 'The Posies',
 'Strange Names',
 'Sky Ferreira',
 'Billie Joe and Norah',
 'J. Roddy Walston and The Business',
 'Cults',
 'Queens of the Stone Age',
 'The Decemberists',
 'The Avett Brothers',
 'David Bowie',
 'Jack White',
 'Pixies']

# Song Start Time

## Break out session

1. Inspect the element
    1. This one is tricky
    2. The `time` tag does not have a class, but
    3. The surrounding div does have a class
2. Identify the html tag and class
3. Use `soup.findAll` to make a list of all relevant tags
4. Pull off an example case
5. Use soup/string methods to pull out the start time
6. Use a list comprehension to process all tags

In [29]:
soup.findAll('time')

[<time datetime="2017-05-11">
               10 a.m.
             </time>, <time datetime="2017-05-11">
                2 p.m.
             </time>, <time datetime="2017-05-11">
                6 p.m.
             </time>, <time datetime="2017-05-11">
               10 p.m.
             </time>, <time datetime="2017-05-11">
               11 p.m.
             </time>, <time datetime="2017-05-12">5/12</time>, <time datetime="2017-05-18">5/18</time>, <time datetime="2017-09-29">9/29</time>, <time>  1:59 </time>, <time>  1:54 </time>, <time>  1:51 </time>, <time>  1:46 </time>, <time>  1:44 </time>, <time>  1:38 </time>, <time>  1:34 </time>, <time>  1:31 </time>, <time>  1:27 </time>, <time>  1:23 </time>, <time>  1:19 </time>, <time>  1:13 </time>, <time>  1:09 </time>, <time>  1:05 </time>, <time>  1:03 </time>, <time>  1:01 </time>]

In [30]:
tags = soup('div', class_="two columns songTime")
tags

[<div class="two columns songTime">
 <a href="#song226645">
 <time>  1:59 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song196069">
 <time>  1:54 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song229900">
 <time>  1:51 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song235779">
 <time>  1:46 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song132616">
 <time>  1:44 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song224268">
 <time>  1:38 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song236492">
 <time>  1:34 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song237794">
 <time>  1:31 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song234211">
 <time>  1:27 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a href="#song235959">
 <time>  1:23 </time>
 </a>
 </div>, <div class="two columns songTime">
 <a 

In [31]:
example_tag = tags[0]
example_tag 

<div class="two columns songTime">
<a href="#song226645">
<time>  1:59 </time>
</a>
</div>

In [32]:
example_tag.time

<time>  1:59 </time>

In [33]:
example_tag.time.next

'  1:59 '

In [34]:
example_tag.time.next.strip()

'1:59'

In [35]:
start_times = [tag.time.next.strip() for tag in  soup('div', class_="two columns songTime")]
start_times

['1:59',
 '1:54',
 '1:51',
 '1:46',
 '1:44',
 '1:38',
 '1:34',
 '1:31',
 '1:27',
 '1:23',
 '1:19',
 '1:13',
 '1:09',
 '1:05',
 '1:03',
 '1:01']
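
The start times are on a 12-hour clock, so they only become unambiguous when combined with the period found earlier. A minimal sketch of that combination, assuming an English locale for `%p`; the helper name `to_time` is hypothetical.

```python
import datetime

def to_time(clock, period):
    """Combine a '1:59' style string and 'am'/'pm' into a datetime.time."""
    text = '{0} {1}'.format(clock, period)
    return datetime.datetime.strptime(text, '%I:%M %p').time()

early = to_time('1:59', 'am')
late = to_time('1:05', 'pm')
```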

# Putting it all together

In [55]:
def get_period(soup):
    search = soup('span', class_="hour-header open")
    if len(search) > 0:
        return search[0].next.split('to')[0].rstrip()[-2:]
    else:
        return None

def get_day(soup):
    search = soup('a', class_="start-picker")
    if len(search) > 0:
        return search[0].next.split(',')[0]
    else:
        return None
    
def get_song_info(url):
    print("Starting {0} urls".format(url))
    date = url.split('/')[-2]
    s = requests.Session()
    r = s.get(url)
    soup = bs4.BeautifulSoup(r.content, 'lxml')
    period = get_period(soup)
    day_of_week = get_day(soup)
    titles = [t.next.strip() for t in soup.findAll('h5', class_="title")]
    artists = [a.next.strip() for a in soup.findAll('h5',class_='artist')]
    times = [d.time.next.strip() for d in soup('div', class_="two columns songTime")]
    song_info = [(day_of_week, date, time, period, title, artist) 
             for time, title, artist in zip(times, titles, artists)]
    return song_info

In [57]:
get_song_info(example_url)

Starting http://www.thecurrent.org/playlist/2014-01-01/01 urls

[('Wednesday',
  '2014-01-01',
  '1:59',
  'am',
  'Holy Roller',
  'Thao and The Get Down Stay Down'),
 ('Wednesday', '2014-01-01', '1:54', 'am', 'Kingdom of Rust', 'Doves'),
 ('Wednesday', '2014-01-01', '1:51', 'am', 'Black Dog', 'Frankie Lee'),
 ('Wednesday', '2014-01-01', '1:46', 'am', 'Turn It Around', 'Lucius'),
 ('Wednesday',
  '2014-01-01',
  '1:44',
  'am',
  'Flavor of the Month',
  'The Posies'),
 ('Wednesday', '2014-01-01', '1:38', 'am', 'Potential Wife', 'Strange Names'),
 ('Wednesday', '2014-01-01', '1:34', 'am', '24 Hours', 'Sky Ferreira'),
 ('Wednesday',
  '2014-01-01',
  '1:31',
  'am',
  "Who's Gonna Shoe Your Pretty Little Feet?",
  'Billie Joe and Norah'),
 ('Wednesday',
  '2014-01-01',
  '1:27',
  'am',
  'Marigold',
  'J. Roddy Walston and The Business'),
 ('Wednesday', '2014-01-01', '1:23', 'am', 'High Road', 'Cults'),
 ('Wednesday',
  '2014-01-01',
  '1:19',
  'am',
  'The Vampyre Of Time and Memory',
  'Queens of the Stone Age'),
 ('Wednesday',
  '2014-01-01',


# Collecting a year's worth of data

## Step 1 - Identify the url pattern

The Current uses urls of the following pattern

    'http://www.thecurrent.org/playlist/2017-05-04/10'

or 

    'http://www.thecurrent.org/playlist/year-month-day/hour'

## Question: How might you generate all combinations for a given year?

**Answer:** Python has a tool for that!

In [58]:
numdays = 365
base = datetime.datetime.today()
dts = [base - datetime.timedelta(hours = h) for h in range(0, numdays*24)]

In [59]:
def output_address(dt):
    fmt = 'http://www.thecurrent.org/playlist/%Y-%m-%d/%H'
    return dt.strftime(fmt)

def test_output_address():
    date = datetime.datetime(2000,1,1,1)
    assert output_address(date) == 'http://www.thecurrent.org/playlist/2000-01-01/01'
test_output_address()

In [60]:
urls = [output_address(d) for d in dts]
urls[:10]

['http://www.thecurrent.org/playlist/2017-05-12/11',
 'http://www.thecurrent.org/playlist/2017-05-12/10',
 'http://www.thecurrent.org/playlist/2017-05-12/09',
 'http://www.thecurrent.org/playlist/2017-05-12/08',
 'http://www.thecurrent.org/playlist/2017-05-12/07',
 'http://www.thecurrent.org/playlist/2017-05-12/06',
 'http://www.thecurrent.org/playlist/2017-05-12/05',
 'http://www.thecurrent.org/playlist/2017-05-12/04',
 'http://www.thecurrent.org/playlist/2017-05-12/03',
 'http://www.thecurrent.org/playlist/2017-05-12/02']
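
The cells above count back 365 days from today. To collect one specific calendar year instead, the same idea can step forward hour by hour between two fixed dates. This is a sketch; the function name `hourly_urls` is hypothetical and the url pattern is the one shown in Step 1.

```python
import datetime

def hourly_urls(year):
    """Build one playlist url per hour for every day of the given year."""
    fmt = 'http://www.thecurrent.org/playlist/%Y-%m-%d/%H'
    current = datetime.datetime(year, 1, 1)
    end = datetime.datetime(year + 1, 1, 1)
    urls = []
    while current < end:
        urls.append(current.strftime(fmt))
        current += datetime.timedelta(hours=1)
    return urls

urls_2014 = hourly_urls(2014)
```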

In [None]:
import csv

# newline='' lets the csv module control line endings
with open('the_current_last_year.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)  # quotes fields that contain commas
    writer.writerow(["Weekday", "Date", "Time", "Period", "Song_Title", "Artist"])
    count = 0
    for url in urls:
        for song_info in get_song_info(url):
            count += 1
            if count % 5 == 0:
                print("Processed {0} songs".format(count))
            writer.writerow(song_info)

Starting http://www.thecurrent.org/playlist/2017-05-12/11 urls






Starting http://www.thecurrent.org/playlist/2017-05-12/10 urls
Processed 5 songs
Processed 10 songs
Processed 15 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/09 urls
Processed 20 songs
Processed 25 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/08 urls
Processed 30 songs
Processed 35 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/07 urls
Processed 40 songs
Processed 45 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/06 urls
Processed 50 songs
Processed 55 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/05 urls
Processed 60 songs
Processed 65 songs
Processed 70 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/04 urls
Processed 75 songs
Processed 80 songs
Processed 85 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/03 urls
Processed 90 songs
Processed 95 songs
Processed 100 songs
Starting http://www.thecurrent.org/playlist/2017-05-12/02 urls
Processed 105 songs
Processed 110 songs
Processed 115
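
A full year is 8,760 requests, so it is polite to pause between them, and it helps to keep going when a single page fails. The sketch below is one possible shape for such a loop, not the notebook's method: the name `scrape_all`, the `get_rows` parameter (any function like `get_song_info` that takes a url and returns rows), and the one second default delay are all assumptions.

```python
import time

def scrape_all(urls, get_rows, delay=1.0):
    """Call get_rows on each url, pausing between requests and recording failures."""
    rows, failed = [], []
    for url in urls:
        try:
            rows.extend(get_rows(url))
        except Exception as error:          # keep going if one page fails
            failed.append((url, str(error)))
        time.sleep(delay)                   # be polite to the server
    return rows, failed
```

For example, `rows, failed = scrape_all(urls[:10], get_song_info)` would try the ten most recent hours and report any urls that could not be processed.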