# Find Cities in Songs

1. Load JSON of Dylan songs/lyrics
2. Use the `spacy` package for named entity recognition (NER)
3. Cross-reference the NER results with a CSV of cities and their coordinates, to produce a CSV with cities, lat/lon, and the number of songs that reference each city

## Load Song Data

In [1]:
import pandas as pd

In [2]:
# Load from my JSON
df = pd.read_json('data/songs.json')
df.set_index('title', inplace=True)

In [3]:
# Peek at first 8 entries
df.head(n=8)

title,albums,author,lyrics,url
‘Cross The Green Mountain,"[The Bootleg Series, Vol 8: Tell Tale Signs]",,,https://bobdylan.com/songs/cross-green-mountain/
‘Til I Fell In Love With You,[Time Out Of Mind],Bob Dylan,"Well, my nerves are exploding and my body’s te...",https://bobdylan.com/songs/til-i-fell-love-you/
"10,000 Men",[Under The Red Sky],Bob Dylan,Ten thousand men on a hill\r\nTen thousand men...,https://bobdylan.com/songs/10000-men/
2 Dollars and 99 Cents,"[The Bootleg Series, Vol. 11: The Basement Tap...",Bob Dylan,,https://bobdylan.com/songs/2-dollars-and-99-ce...
2 X 2,[Under The Red Sky],Bob Dylan,"One by one, they followed the sun\r\nOne by on...",https://bobdylan.com/songs/2-x-2/
32-20 Blues,"[The Bootleg Series, Vol 8: Tell Tale Signs]",Robert Johnson,,https://bobdylan.com/songs/32-20-blues/
900 Miles from My Home,"[The Bootleg Series, Vol. 11: The Basement Tap...","Traditional, arranged by Bob Dylan",,https://bobdylan.com/songs/900-miles-my-home/
A Fool Such As I,"[Dylan, The Bootleg Series, Vol. 11: The Basem...",B. Abner,,https://bobdylan.com/songs/fool-such-i/


## Identify Places in Lyrics

Use `spacy` for named entity recognition

In [4]:
import spacy
nlp = spacy.load('en')

In [5]:
# We want to extract entities labeled as 'GPE'
spacy.explain('GPE')

'Countries, cities, states'

In [6]:
def ner_cleanup(doc):
    """Post-processing step: drop whitespace-only entities and entities containing ’"""
    doc.ents = [
        e for e in doc.ents
        if not(e.text.isspace() or
               "’" in e.text)]
    return doc

# Clean up results after named entity recognition
nlp.add_pipe(ner_cleanup, after='ner')
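
The filter inside `ner_cleanup` can be tried on plain strings without loading a model. The sketch below is only an illustration: `is_bad_entity` and the `candidates` list are hypothetical names and values, not part of the notebook, and the predicate simply mirrors the two conditions used above.

```python
def is_bad_entity(text):
    """Return True for entity strings the cleanup step would discard."""
    # Same two conditions as ner_cleanup: whitespace-only text, or text
    # containing a typographic apostrophe (as in "Ev’rybody").
    return text.isspace() or "’" in text


# Hypothetical entity strings, chosen to exercise both conditions
candidates = ["New Orleans", "\r\n", "Ev’rybody", "El Paso", "  "]
kept = [t for t in candidates if not is_bad_entity(t)]
print(kept)
```

Note that `str.isspace()` returns `False` for an empty string, so an empty entity text would pass this filter.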

In [7]:
def extract_places(text):
    print('.', end='')  # progress dot, since each call is slow
    doc = nlp(text)
    return list(set([
        x.text.strip() for x in doc.ents
        if x.label_ == 'GPE']))

# Extract places from each set of lyrics,
# put resulting list in 'places' column
df['places'] = df.lyrics.apply(extract_places)
print("done")

.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done
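
What `extract_places` does to the entities of a single song can be sketched without spaCy. In the example below, `toy_ents` is a hypothetical list of `(text, label)` pairs standing in for `doc.ents`, and `places_from` is an illustrative helper. Unlike the cell above, it sorts the result so the output is deterministic; `list(set(...))` makes no ordering guarantee.

```python
# Hypothetical (text, label) pairs standing in for spaCy's doc.ents
toy_ents = [
    ("New Orleans", "GPE"),
    ("Willie", "PERSON"),
    (" Mississippi", "GPE"),
    ("New Orleans", "GPE"),
]


def places_from(ents):
    """Keep GPE entities, strip surrounding whitespace, drop repeats."""
    return sorted({text.strip() for text, label in ents if label == "GPE"})


result = places_from(toy_ents)
print(result)
```

Because repeats are dropped per song, a place mentioned several times in one song appears only once in that song's `places` list.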


In [8]:
# see some results
df[df.places.astype(str) != '[]'].sample(n=8)

title,albums,author,lyrics,url,places
"Baby, Let Me Follow You Down","[The Bootleg Series, Vol 4: Bob Dylan Live 196...",Eric von Schmidt,I first heard this from Ric von Schmidt. He li...,https://bobdylan.com/songs/baby-let-me-follow-...,[Cambridge]
Oxford Town,"[The Freewheelin’ Bob Dylan, The Original Mono...",Bob Dylan,"Oxford Town, Oxford Town\r\nEv’rybody’s got th...",https://bobdylan.com/songs/oxford-town/,[Mississippi]
Gonna Change My Way Of Thinking,[Slow Train Coming],Bob Dylan,Gonna change my way of thinking\r\nMake myself...,https://bobdylan.com/songs/gonna-change-my-way...,[Georgia]
"Rambling, Gambling Willie","[The Bootleg Series, Vol 9: The Witmark Demos:...",Bob Dylan,Come around you rovin’ gamblers and a story I ...,https://bobdylan.com/songs/rambling-gambling-w...,"[New Orleans, the Rocky Mountains, Mississippi]"
Floater (Too Much To Ask),[“Love And Theft”],Bob Dylan,Down over the window\r\nComes the dazzling sun...,https://bobdylan.com/songs/floater-too-much-ask/,"[Cumberland, Ohio, Tennessee]"
Hard Times In New York Town,"[The Bootleg Series, Vol 9: The Witmark Demos:...",Bob Dylan,"Come you ladies and you gentlemen, a-listen to...",https://bobdylan.com/songs/hard-times-new-york...,"[New York, Washington Heights, Oklahoma, the E..."
Tin Angel,[Tempest],Bob Dylan,It was late last night when the boss came home...,https://bobdylan.com/songs/tin-angel/,"[Husband, Insomnia]"
Billy 4,[Pat Garrett & Billy the Kid],Bob Dylan,There's guns across the river about to pound y...,https://bobdylan.com/songs/billy-4/,"[Gypsy, La Rio Pecas, El Paso]"


## Count place references

In [9]:
from collections import Counter, defaultdict

c = Counter()          # count appearances of each place
p = defaultdict(list)  # map places to songs

for title, place_list in df.places.items():  # .items() replaces the deprecated .iteritems()
    c.update(place_list)
    for pl in place_list:
        p[pl].append(title)
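
The counting step can be checked on a small example. The dictionary `toy_places` below is made-up data standing in for the `places` column, and the variable names are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical per-song place lists (each list already deduplicated)
toy_places = {
    "Song A": ["Memphis", "El Paso"],
    "Song B": ["Memphis"],
    "Song C": [],
}

counts = Counter()                  # place -> number of songs
songs_by_place = defaultdict(list)  # place -> titles of those songs

for title, place_list in toy_places.items():
    counts.update(place_list)
    for place in place_list:
        songs_by_place[place].append(title)

print(counts.most_common())
print(dict(songs_by_place))
```

Since each song contributes a place at most once, the count for a place equals the number of songs that mention it, not the total number of mentions.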

### Merge place count data with city/coordinates data

In [10]:
# Make place counts into df
places_df = pd.DataFrame(c.most_common(), columns=['city','cnt'])

# Make mapping of places to songs into df
song_map = pd.DataFrame(list(p.items()), columns=['city','songs'])

# Load city meta-data
city_meta_df = pd.read_csv('data/simplemaps-worldcities-basic.csv')

In [11]:
# Merge all dataframes together
city_df = pd.merge(places_df, song_map, on='city')
city_df = pd.merge(city_df, city_meta_df, on='city')
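
`pd.merge` defaults to an inner join, which has two effects worth seeing on a small example: extracted "places" with no match in the city table are dropped, and a place name shared by several cities yields one row per city. The frames below use made-up values (including the population figures) and are not the notebook's data.

```python
import pandas as pd

# Hypothetical extracted places, one of which is not a real city
places = pd.DataFrame({"city": ["El Paso", "Insomnia", "London"],
                       "cnt": [4, 1, 4]})

# Hypothetical city table; "London" appears twice (illustrative populations)
cities = pd.DataFrame({
    "city": ["El Paso", "London", "London"],
    "country": ["United States of America", "United Kingdom", "Canada"],
    "pop": [600_000, 8_000_000, 300_000],
})

merged = pd.merge(places, cities, on="city")  # how="inner" by default
print(merged)
```

Here "Insomnia" disappears from the result and "London" appears twice, which is why a de-duplication step is needed afterwards.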

In [12]:
# For duplicate cities, drop the less populated one
city_df = (city_df
           .sort_values(by='pop')
           .drop_duplicates(subset='city', keep='last'))
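
The sort-then-deduplicate idiom above can be sketched on toy data. The values below, including the populations, are invented for illustration.

```python
import pandas as pd

# Hypothetical merged rows with a duplicated city name
toy = pd.DataFrame({
    "city": ["London", "London", "El Paso"],
    "country": ["Canada", "United Kingdom", "United States of America"],
    "pop": [300_000, 8_000_000, 600_000],
})

# Sort ascending by population, then keep the last (largest) row per city
deduped = toy.sort_values(by="pop").drop_duplicates(subset="city", keep="last")
print(deduped)
```

One caveat: `sort_values` places missing values last by default, so a row with a missing `pop` would be the one kept for its city.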

In [13]:
# drop some columns
city_df.drop(labels=['city_ascii','pop'], axis=1, inplace=True)

In [14]:
# See results (sorted by count)
city_df.sort_values('cnt', ascending=False, inplace=True)
city_df.head(n=10)

,city,cnt,songs,lat,lng,country,iso2,iso3,province
0,New Orleans,6,"[Blind Willie McTell, Bob Dylan’s New Orleans ...",29.995002,-90.039967,United States of America,US,USA,Louisiana
6,London,4,"[Jack-A-Roe, Not Dark Yet, Something’s Burning...",51.499995,-0.116722,United Kingdom,GB,GBR,Westminster
4,Memphis,4,"[Gypsy Lou, Kingsport Town, Someone’s Got A Ho...",35.119987,-89.999995,United States of America,US,USA,Tennessee
1,El Paso,4,"[Billy 1, Billy 4, She’s Your Lover Now, Wante...",31.779984,-106.509995,United States of America,US,USA,Texas
3,San Francisco,4,"[California, Maybe Someday, She’s Your Lover N...",37.740008,-122.459978,United States of America,US,USA,California
8,New York,3,"[Hard Times In New York Town, Joey, Talkin’ Ne...",40.749979,-73.980017,United States of America,US,USA,New York
19,Boston,2,"[Highlands, Two Soldiers]",42.32996,-71.070014,United States of America,US,USA,Massachusetts
15,Tallahassee,2,"[Got My Mind Made Up, Wanted Man]",30.449988,-84.280034,United States of America,US,USA,Florida
18,Kansas City,2,"[High Water (For Charley Patton), Wanted Man]",39.107089,-94.604094,United States of America,US,USA,Missouri
21,Baltimore,2,"[The Lonesome Death Of Hattie Carroll, Tryin’ ...",39.29999,-76.619985,United States of America,US,USA,Maryland


In [15]:
# Save to csv
city_df.to_csv('data/city_counts.csv', index=False)
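
The `songs` column holds Python lists, which `to_csv` writes as their string representation. A reader of the CSV therefore gets strings back, not lists. The sketch below shows one possible way to restore them with `ast.literal_eval`, using an in-memory buffer instead of the real file; the frame `toy` is a one-row stand-in for `city_df`.

```python
import ast
import io

import pandas as pd

# One-row stand-in for city_df
toy = pd.DataFrame({"city": ["Boston"], "cnt": [2],
                    "songs": [["Highlands", "Two Soldiers"]]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

loaded = pd.read_csv(buffer)
raw_value = loaded.loc[0, "songs"]  # a string that looks like a list

# Parse the string representation back into a real list
loaded["songs"] = loaded["songs"].apply(ast.literal_eval)
print(type(raw_value).__name__, loaded.loc[0, "songs"])
```

If downstream tools are not Python, joining the titles with a delimiter before saving may be a simpler choice.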