Word! Automating a Hip-hop word of the day blog
===============================================

Chris Ing, @jsci http://rapwords.tumblr.com (Soon: https://github.com/cing/rapwords/)
----------------------------------------------------------------------------
* * *
![Rap Words Post](assets/Advisory.png)
* * *


![Rap Words Logo](assets/RapWordsLogo.png)

![Rap Words Post](assets/RapWordsPost1.png)
![Rap Words Post](assets/RapWordsPost2.png)

Requirements
============

- standard library (re, glob, collections, html, itertools, operator, datetime)
- pandas (http://pandas.pydata.org/), numpy
- wiktionaryparser (https://github.com/Suyash458/WiktionaryParser)
- spotipy (https://github.com/plamere/spotipy) / Google Data API (https://github.com/google/google-api-python-client)
- nltk (https://github.com/nltk/nltk)
- pronouncing (https://github.com/aparrish/pronouncingpy)
- pytumblr (Python3 fork) (https://github.com/jabbalaci/pytumblr) / oauthlib / requests_oauthlib

In [2]:
import pandas as pd
import numpy as np
import glob
import re
from collections import defaultdict

Loading Lyrics into Memory
==========================

In [3]:
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        # Let HTMLParser set itself up; convert_charrefs decodes entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

In [4]:
df_data = defaultdict(list)
for filename in glob.iglob('Lyrics/ohhla.com/*/*/*/*.txt', recursive=True):
    with open(filename, 'r', encoding = "ISO-8859-1") as f:
        stripped_lyrics = strip_tags(f.read())
        
        # Raw strings keep \s and \S as regex escapes rather than string escapes
        artist = re.search(r'Artist:\s*(.*)\s*\n', stripped_lyrics)
        song = re.search(r'Song:\s*(.*)\s*\n', stripped_lyrics)
        lyrics = re.search(r'Typed by:\s*(.*)\s*\n([\s\S]*)', stripped_lyrics)

        if artist is not None and song is not None and lyrics is not None:
            df_data["filename"].append(filename)
            df_data["artist"].append(artist.group(1))
            df_data["song"].append(song.group(1))
            df_data["lyrics"].append(lyrics.group(2).lower())  # group(1) is the transcriber
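The three patterns above assume the plain-text header that OHHLA transcriptions start with. A minimal sketch on a made-up file shows what each capture group picks up; the sample text and the helper name `parse_ohhla_text` are illustrative and not part of the notebook.

```python
import re

# Made-up sample in the header layout the regexes expect
sample = (
    "Artist: Example MC\n"
    "Album:  Example Album\n"
    "Song:   Example Song\n"
    "Typed by: someone@example.com\n"
    "\n"
    "First line of the verse\n"
    "Second line of the verse\n"
)

def parse_ohhla_text(text):
    """Hypothetical helper: return artist, song and lowercased lyrics, or None."""
    artist = re.search(r'Artist:\s*(.*)\s*\n', text)
    song = re.search(r'Song:\s*(.*)\s*\n', text)
    lyrics = re.search(r'Typed by:\s*(.*)\s*\n([\s\S]*)', text)
    if artist is None or song is None or lyrics is None:
        return None
    return {"artist": artist.group(1),
            "song": song.group(1),
            "lyrics": lyrics.group(2).lower()}  # group(1) is the transcriber

parsed = parse_ohhla_text(sample)
print(parsed)
```

Files that lack any of the three header fields come back as `None`, which mirrors how the loop above silently skips them.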

In [5]:
rap_data = pd.DataFrame(df_data)
rap_data.iloc[105:120]

| | artist | filename | lyrics | song |
|---|---|---|---|---|
| 105 | 2 Chainz f/ The Weeknd | Lyrics/ohhla.com/anonymous/2_chainz/TRUstory/l... | [chorus: the weeknd]\ngirl, i'm just another b... | Like Me |
| 106 | 2 Chainz | Lyrics/ohhla.com/anonymous/2_chainz/TRUstory/m... | [intro: 2 chainz]\n(yeah~!) i told them get on... | Money Machine |
| 107 | 2 Chainz f/ Drake | Lyrics/ohhla.com/anonymous/2_chainz/TRUstory/n... | [intro]\n{*mike will made it*}\nyo.. t.r.u.! (... | No Lie |
| 108 | 2 Chainz f/ Dolla Boy | Lyrics/ohhla.com/anonymous/2_chainz/TRUstory/s... | [intro: r&b samples]\nnothing in the whole wid... | Stop Me Now |
| 109 | 2 Chainz f/ Lil Wayne | Lyrics/ohhla.com/anonymous/2_chainz/TRUstory/y... | [intro]\nyuck daddy! yuck!\nyuck daddy! yuck!\... | Yuck! |
| 110 | 2 Hungry Bros. present 8thW1 f/ Janelle Renee | Lyrics/ohhla.com/anonymous/2_hungry/no_room/mo... | [intro]\ngoo-oh-oh-ohhh {*2x*}\n(c'mon people!... | More Go |
| 111 | 2 Hungry Bros. present 8thW1 | Lyrics/ohhla.com/anonymous/2_hungry/no_room/sh... | [intro/chorus - interpolation of krs-one]\nthe... | Short and Sweet |
| 112 | 2 Hungry Bros. present 8thW1 | Lyrics/ohhla.com/anonymous/2_hungry/no_room/sy... | [8thw1]\nthe teacher, class in session\nthe be... | Say My Name Right |
| 113 | 2 Hungry Bros. present 8thW1 | Lyrics/ohhla.com/anonymous/2_hungry/no_room/ta... | [8thw1]\nwhy stress mic skills? i upgrade to l... | Talkin |
| 114 | 2 Live Crew | Lyrics/ohhla.com/anonymous/2_live/2liveis/chec... | [mr. mixx]\ncheck it out y'all\nch-ch-check it... | Check it Out Y'all |


In [6]:
rap_data.shape

(33371, 4)

Checking Words
====================

In [7]:
for artist, song, lyrics in zip(df_data["artist"],
                                df_data["song"],
                                df_data["lyrics"]):
    if "python" in set(lyrics.split()):
        print(artist, " - ", song)

50 Cent  -  Leave the Lights On
Ab-Soul f/ A-Mack, Punch, SZA  -  Dub Sac
Action Bronson f/ Big Body Bes, Mac Miller  -  Twin Peugots
The Ambassador f/ LaKia Wise  -  Get You Open
Azealia Banks f/ MJ Cole, Peter Rosenberg  -  Desperado
Beast 1333  -  113 Bars
Beast 1333  -  Anonymous
Big Daddy Kane  -  Uncut, Pure (Original and Remix)
Childish Gambino (Donald Glover)  -  Lights Turned On
D-12  -  American Psycho
Deltron 3030 (Del the Funky Homosapien)  -  Battle Song
Fat Joe f/ Busta Rhymes, DJ Khaled, Jadakiss, Miguel, Mos Def, Roscoe Dash, Kanye West  -  Pride N Joy
G. Dep f/ Kool G. Rap, Rakim  -  I Am
Ghostface Killah, Masta Killa, U-God, Raekwon, Cappadonna  -  Winter Warz
Hell Razah f/ Fokis, Killah Priest  -  Gladiators
Ice-T f/ EJ Evil E the Great, Trigger Tha Gambler (SMG)  -  Cramp Your Style (Live)
Ice-T  -  Cramp Your Style
Joe Budden  -  Roll Call
Killah Priest f/ Antonio Chance  -  Atoms to Adams
DJ KaySlay f/ Fat Joe, Raekwon, Scarface  -  I Never Liked Ya Ass
DJ KaySlay

In [8]:
for artist, song, lyrics in zip(df_data["artist"],
                                df_data["song"],
                                df_data["lyrics"]):
    if "anaconda" in set(lyrics.split()):
        print(artist, " - ", song)

2Pac f/ Outlawz  -  Runnin On E
2nd II None f/ DJ Quik, AMG, Hi-C, Playa Hamm  -  Got A Nu Woman
All City  -  Ded Right
Big Ed  -  Head Busta
Big K.R.I.T. f/ Ludacris  -  What U Mean
Big L f/ Stan Spit  -  Who You Slidin' Wit'
Big Punisher  -  The Dream Shatterer
Big Punisher  -  The Dream Shatterer (Original)
Canibus  -  Who Stopped Ya?
Chamillionaire  -  Set it Off Freestyle
Chamillionaire f/ Rasaq  -  Panky Rang
Clika One f/ Kurupt, Chiko Dateh, Don Cisco  -  Gangsta Pimpin'
Danny!  -  D.A.N.N.Y.
Daz Dillinger & JT the Bigga Figga f/ Kurupt, Rappin' 4-Tay  -  Sweet Love
E-40  -  If If Was a 5th
Flo Rida f/ Robin Thicke, Verdine White  -  I Don't Like It, I Love It
Jim Crow f/ Jazze Pha, Too $hort  -  That Drama (Baby's Momma)
Juvenile f/ Jay Da Menace, Kango Slim  -  Make U Feel Alright
Kanye West f/ 2 Chainz, Marsha Ambrosius, Big Sean, James Fauntleroy II  -  The One
Kool G Rap  -  A Thug's Love Story (Chapter I, II, III)
Kurupt f/ Lil 1/2 Dead  -  On, OnSite
LL Cool J f/ Mashonda
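One caveat with `lyrics.split()`: punctuation stays attached to the tokens, so a line ending in "python," is not counted as a hit. A minimal sketch of a stricter tokenizer, assuming a simple letters-digits-apostrophes rule is good enough (the helper name `tokens` and the sample line are made up):

```python
import re

def tokens(lyrics):
    """Hypothetical helper: lowercase word tokens with surrounding punctuation removed."""
    return set(re.findall(r"[a-z0-9']+", lyrics.lower()))

line = "I'm cold-blooded like a python, (anaconda!) squeeze"

# Whitespace splitting keeps "python," and "(anaconda!)" as tokens
print("python" in set(line.lower().split()))   # False
# The regex tokenizer finds the bare words
print("python" in tokens(line), "anaconda" in tokens(line))   # True True
```

The trade-off is that hyphenated words are split into their parts, which may or may not be what a word-of-the-day search wants.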

Finding Rare Words
==================

In [9]:
# The 1/3 million most frequent words, all lowercase, with counts. 
# http://norvig.com/ngrams/
ngrams=pd.read_csv("count_1w.txt",sep="\t",names=["word","count"])
ngrams.tail(n=20)

| | word | count |
|---|---|---|
| 333313 | goooglo | 12711 |
| 333314 | gooogla | 12711 |
| 333315 | gooogd | 12711 |
| 333316 | gooofa | 12711 |
| 333317 | goooao | 12711 |
| 333318 | goollo | 12711 |
| 333319 | goolld | 12711 |
| 333320 | goolh | 12711 |
| 333321 | goolgee | 12711 |
| 333322 | googook | 12711 |


In [10]:
# The Tournament Word List (178,690 words) -- used by North American Scrabble players.
# http://norvig.com/ngrams/
twl_wordlist=pd.read_csv("TWL06.txt",names=["word"])
twl_wordlist=pd.DataFrame(twl_wordlist["word"].str.lower())

twl_w_ngrams = pd.merge(twl_wordlist, ngrams)

In [11]:
twl_w_ngrams.sort_values(by="count").head(n=20)

| | word | count |
|---|---|---|
| 73785 | triose | 12711 |
| 59922 | retinae | 12712 |
| 35948 | inhibitive | 12714 |
| 57100 | rasbora | 12714 |
| 40668 | loadstar | 12714 |
| 2765 | antemortem | 12714 |
| 45565 | munting | 12714 |
| 72551 | toits | 12716 |
| 11210 | cestus | 12716 |
| 36139 | insatiably | 12717 |


In [12]:
np.sum(twl_w_ngrams["count"] < 400000)

51195

In [13]:
all_bigwords = twl_w_ngrams[twl_w_ngrams["count"] < 400000]["word"].values
all_big_bigwords = [word for word in all_bigwords if len(word) > 4]
bigset = set(all_big_bigwords)

print(all_big_bigwords[:100])

['aardvarks', 'aardwolf', 'aargh', 'abaca', 'aback', 'abaft', 'abandonments', 'abandons', 'abase', 'abased', 'abasement', 'abashed', 'abated', 'abatements', 'abates', 'abating', 'abattoir', 'abattoirs', 'abaxial', 'abaya', 'abbes', 'abbess', 'abbesses', 'abbeys', 'abbots', 'abbreviate', 'abbreviates', 'abbreviating', 'abdicate', 'abdicated', 'abdicates', 'abdicating', 'abdication', 'abdomens', 'abdominals', 'abduct', 'abductee', 'abductees', 'abducting', 'abductions', 'abductor', 'abductors', 'abducts', 'abeam', 'abecedarian', 'abele', 'abeles', 'abelia', 'aberrant', 'abets', 'abetted', 'abetting', 'abettor', 'abeyance', 'abhor', 'abhorred', 'abhorrence', 'abhorrent', 'abhors', 'abided', 'abides', 'abiogenesis', 'abiotic', 'abject', 'abjection', 'abjectly', 'abjuration', 'abjure', 'abjured', 'ablate', 'ablated', 'ablations', 'ablative', 'ablaze', 'abled', 'abler', 'ables', 'ablest', 'ablution', 'ablutions', 'abnegation', 'abodes', 'abolishes', 'abolishing', 'abolishment', 'abolitionism
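The filter above boils down to three conditions: the word is in the Scrabble list, it has a web count below 400,000, and it is longer than four letters. A dependency-free sketch of the same logic on toy data (the counts and the helper name `rare_words` are made up for illustration):

```python
# Made-up counts standing in for count_1w.txt
toy_counts = {"the": 23135851162, "trifle": 150000, "abaft": 20000,
              "aback": 390000, "zzzq": 12711}
# Made-up stand-in for the TWL06 word list
toy_wordlist = {"the", "trifle", "abaft", "aback", "cat"}

def rare_words(word_list, counts, max_count=400000, min_len=5):
    """Words present in both sources, below max_count, with at least min_len letters."""
    return {w for w in word_list
            if w in counts and counts[w] < max_count and len(w) >= min_len}

print(sorted(rare_words(toy_wordlist, toy_counts)))
```

Words missing from either source drop out, which is the same effect the default inner join of `pd.merge` has in the notebook.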

In [14]:
df_data["matches"] = []
for lyrics in df_data["lyrics"]:
    found_words = set(lyrics.split()) & bigset
    df_data["matches"].append(found_words)

In [15]:
rap_data = pd.DataFrame(df_data)
rap_data = rap_data[rap_data.matches != set()]

In [16]:
rap_data.head(n=20)

| | artist | filename | lyrics | matches | song |
|---|---|---|---|---|---|
| 0 | 10 K.A.N.'s | Lyrics/ohhla.com/anonymous/10_kans/rm_bside/u_... | [* applause *]\n\n[dj]\nright about now we got... | {fellas, skeet, fiends} | U Need Dick In Your Life |
| 1 | 10sion | Lyrics/ohhla.com/anonymous/10sion/tension/lets... | -=talking=-\nlets get it on every time\nholler... | {playas, holler, hypnotized} | Let's Get it On |
| 2 | 113 | Lyrics/ohhla.com/anonymous/113/dans_lur/ausumm... | * send corrections to the typist\n\n[refrain]\... | {favelas, typist, prise, pavillon, spliff, dom... | Au Summum |
| 3 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/c_theory/e... | * send corrections to the typist\n\n*scratchin... | {typist, showdowns, shipwrecked, emcees, sicke... | Eye of the Storm |
| 4 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/c_theory/w... | [dj peril]\nnfamous - step, step, step, step, ... | {melanin, renegades, booed, befriending, albin... | Where Ur At? |
| 5 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/infinite/k... | * send corrections to the typist\n\n[prolouge]... | {typist, showdowns, hopscotch, quitters, fickl... | Karma |
| 6 | 12 O'Clock w/ Raekwon the Chef | Lyrics/ohhla.com/anonymous/12oclock/rm_bside/n... | intro: raekwon the chef \n\nyeah yeah, that's ... | {jubilant, boneyard, livest, beefs, gleam, swo... | Nasty Immigrants * |
| 8 | 1982 (Statik Selektah & Termanology) f/ Cassid... | Lyrics/ohhla.com/anonymous/1982/1982/goinback.... | (i'm goin back, back, back)\n\n[verse 1 - cass... | {backer, iliad, porky, caskets} | Goin Back |
| 9 | 1982 (Statik Selektah & Termanology) f/ Lil' F... | Lyrics/ohhla.com/anonymous/1982/rm_bside/thuga... | [intro]\nbrrrrrrrrrrrrrrrrr\nshow off, show of... | {hugger, mobsters} | Thugathon |
| 10 | 1.4.0. Productions f/ Chapel | Lyrics/ohhla.com/anonymous/1pt_four/po_poets/f... | [intro: chapel]\nlet me bless this shit, i'mma... | {fiends, deadliest, cocking} | Freestyle |


Finding Rare Rap Words
======================

In [17]:
from collections import Counter
rap_onegrams = Counter()

for lyrics in df_data["lyrics"]:
    rap_onegrams.update(lyrics.split())

In [18]:
rap_onegrams_df = pd.DataFrame.from_dict(rap_onegrams, orient='index').reset_index()
rap_onegrams_df.columns = ["word","count"]

In [19]:
rap_onegrams_df.sort_values(by="count").tail(n=20)

| | word | count |
|---|---|---|
| 192727 | is | 112697 |
| 36746 | get | 112745 |
| 39709 | we | 129374 |
| 360668 | your | 131125 |
| 155504 | with | 137393 |
| 110645 | of | 141163 |
| 6432 | like | 151442 |
| 146271 | that | 163367 |
| 138161 | on | 174972 |
| 200839 | me | 175888 |


In [20]:
raptwl_df = pd.merge(twl_wordlist, rap_onegrams_df, on="word")
raptwl_df.head()

all_bigrapwords = set(raptwl_df[raptwl_df["count"] < 5]["word"].values)
len(all_bigrapwords)

19515

In [21]:
df_data["rap_matches"] = []
for lyrics in df_data["lyrics"]:
    found_words = set(lyrics.split()) & all_bigrapwords
    df_data["rap_matches"].append(found_words)

In [22]:
rap_data = pd.DataFrame(df_data)
rap_data = rap_data[rap_data.rap_matches != set()]
rap_data

| | artist | filename | lyrics | matches | rap_matches | song |
|---|---|---|---|---|---|---|
| 2 | 113 | Lyrics/ohhla.com/anonymous/113/dans_lur/ausumm... | * send corrections to the typist\n\n[refrain]\... | {favelas, typist, prise, pavillon, spliff, dom... | {cartes, favelas, prise, pavillon, haute, domi... | Au Summum |
| 3 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/c_theory/e... | * send corrections to the typist\n\n*scratchin... | {typist, showdowns, shipwrecked, emcees, sicke... | {shipwrecked} | Eye of the Storm |
| 4 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/c_theory/w... | [dj peril]\nnfamous - step, step, step, step, ... | {melanin, renegades, booed, befriending, albin... | {albinos, befriending, constructing, swooned} | Where Ur At? |
| 5 | 1200 Techniques | Lyrics/ohhla.com/anonymous/1200tech/infinite/k... | * send corrections to the typist\n\n[prolouge]... | {typist, showdowns, hopscotch, quitters, fickl... | {dodgy, umbilicals} | Karma |
| 6 | 12 O'Clock w/ Raekwon the Chef | Lyrics/ohhla.com/anonymous/12oclock/rm_bside/n... | intro: raekwon the chef \n\nyeah yeah, that's ... | {jubilant, boneyard, livest, beefs, gleam, swo... | {jubilant, sweetened, boneyard, connives} | Nasty Immigrants * |
| 8 | 1982 (Statik Selektah & Termanology) f/ Cassid... | Lyrics/ohhla.com/anonymous/1982/1982/goinback.... | (i'm goin back, back, back)\n\n[verse 1 - cass... | {backer, iliad, porky, caskets} | {iliad} | Goin Back |
| 9 | 1982 (Statik Selektah & Termanology) f/ Lil' F... | Lyrics/ohhla.com/anonymous/1982/rm_bside/thuga... | [intro]\nbrrrrrrrrrrrrrrrrr\nshow off, show of... | {hugger, mobsters} | {hugger} | Thugathon |
| 10 | 1.4.0. Productions f/ Chapel | Lyrics/ohhla.com/anonymous/1pt_four/po_poets/f... | [intro: chapel]\nlet me bless this shit, i'mma... | {fiends, deadliest, cocking} | {cloves} | Freestyle |
| 11 | 1.4.0. Productions f/ Franky Botts, Molly-Q | Lyrics/ohhla.com/anonymous/1pt_four/po_poets/g... | [molly-q]\ndart double click, through your car... | {detonate, godfathers, plutons} | {plutons, sirloins} | Godfathers |
| 12 | 1.4.0. Productions f/ Cheesey Rat, Crunch Lo, ... | Lyrics/ohhla.com/anonymous/1pt_four/po_poets/g... | "only the gods could watch the earth twist" ->... | {willies, caramels, nobly, wheezy, realest} | {caramels, nobly} | God Twist |


In [23]:
#rap_data[rap_data["artist"].str.startswith("Kanye")]
#rap_data[rap_data["artist"].str.startswith("Wu-Tang") & rap_data["filename"].str.contains("enter")]
rap_data[rap_data["artist"].str.startswith("Nas") & rap_data["filename"].str.contains("illmatic")]

| | artist | filename | lyrics | matches | rap_matches | song |
|---|---|---|---|---|---|---|
| 23373 | Nas w/ AZ, The Firm | Lyrics/ohhla.com/anonymous/nas/illmatic/genesi... | *sound of a subway train going overhead*\n*in ... | {coupes, takin, infantryman, trifle} | {infantryman} | The Genesis |
| 23375 | Nas | Lyrics/ohhla.com/anonymous/nas/illmatic/memory... | \t(check that shit)\n\taight fuck that shit, w... | {dingbats, reminisce, gassed, takin, trifle, s... | {dingbats, physiology} | Memory Lane (Sittin' in Da Park) |
| 23376 | Nas | Lyrics/ohhla.com/anonymous/nas/illmatic/nysomi... | [intro: nas]\nyeah yeah, aiyyo black it's time... | {beepers, backtrack, snitch, gunfights, takin,... | {peepholes} | N.Y. State of Mind |
| 23379 | Nas | Lyrics/ohhla.com/anonymous/nas/illmatic/rep.na... | represent, represent!! (repeat 4x)\n\nstraight... | {guzzle, dissed, dweller, hillbillies, blunts} | {accelerator} | Represent |
| 23380 | Nas w/ Pete Rock (uncredited chorus vocals) | Lyrics/ohhla.com/anonymous/nas/illmatic/world_... | "it's yours!" --> t la rock\n\nchorus: nas, pe... | {amped, caved, phlegm, toothed} | {toothed} | The World is Yours |


In [24]:
print(rap_data.loc[23375]["matches"])
print(rap_data.loc[23375]["rap_matches"])

{'dingbats', 'reminisce', 'gassed', 'takin', 'trifle', 'shoelaces', 'fiends', 'ganja', 'overdoses', 'vexed'}
{'dingbats', 'physiology'}


In [25]:
word = "trifle"

* * *

Finding 'Good' Songs
====================

In [26]:
import spotipy
sp = spotipy.Spotify()  # anonymous client; the Spotify Web API now requires credentials
    
def test_track_search(sp, search_str):
    results = sp.search(q=search_str, type='track', limit=1)
    if len(results['tracks']['items']) > 0:
        print(results['tracks']['items'][0]['artists'][0]['name']," - ",
              results['tracks']['items'][0]['name'],
              results['tracks']['items'][0]['popularity'],)
    else:
        print("")

test_track_search(sp, 'Nas Memory Lane')

Nas  -  Memory Lane (Sittin' in da Park) 52


Finding 'Good' Songs with YouTube
=================================

In [27]:
# Activate the YouTube Data API in the Google developer console and paste the key here
DEVELOPER_KEY = ""

from googleapiclient.discovery import build  # apiclient.discovery is the legacy alias
from datetime import datetime

def youtube_search(q):
    """Return (video_id, view_count, days_since_posted) for the top hit, or None."""
    youtube = build("youtube", "v3", developerKey=DEVELOPER_KEY)

    search_response = youtube.search().list(
        q=q, type="video",
        part="id,snippet", maxResults=1
    ).execute()

    # Without this check an empty result left video_id undefined
    items = search_response.get("items", [])
    if not items:
        return None

    video_id = items[0]["id"]["videoId"]
    date_posted = items[0]["snippet"]["publishedAt"]
    results = youtube.videos().list(
        part="statistics", id=video_id
    ).execute()

    # Parse only "YYYY-MM-DDTHH:MM:SS" so a missing ".000" suffix does not break strptime
    posted = datetime.strptime(date_posted[:19], "%Y-%m-%dT%H:%M:%S")
    return (video_id,
            float(results["items"][0]["statistics"]["viewCount"]),
            float((datetime.now() - posted).days))

In [28]:
youtube_data = youtube_search("Nas Memory Lane")
print(youtube_data)

('JXBFG2vsyCM', 5865358.0, 3402.0)
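The view count and the video's age can be folded into a single number for comparing candidate songs. One possible score, offered as an assumption rather than what the blog actually uses, is average views per day; `views_per_day` is a hypothetical helper.

```python
def views_per_day(view_count, days_since_posted):
    """Average daily views; treats videos posted today as one day old."""
    return view_count / max(days_since_posted, 1.0)

# Numbers taken from the youtube_search output above
video_id, view_count, days = ('JXBFG2vsyCM', 5865358.0, 3402.0)
print(round(views_per_day(view_count, days), 2))
```

Dividing by age keeps an old upload with a large lifetime total from automatically outranking a newer one.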


In [1]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/JXBFG2vsyCM?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

* * *

Getting Definitions
===================

In [30]:
from wiktionaryparser import WiktionaryParser

def get_definition(word):
    parser = WiktionaryParser()
    worddef = parser.fetch(word)
    
    possible_defs = []
    for etymology in worddef:
        for dd in etymology["definitions"]:
            # Remove the headword followed by a left-to-right mark (U+200E)
            all_defs = re.sub(re.escape(word) + r"\s*\u200e", "",
                              dd['text']).strip().split("\n")
            # Skip lines that consist only of a parenthesised label
            all_gdefs = [d for d in all_defs if re.match(r"^\(.*\)$", d) is None]
            possible_defs.append((dd['partOfSpeech'],
                                  all_gdefs))
    return possible_defs

In [31]:
get_definition("racket")

[('noun',
  ['(countable) A racquet: an implement with a handle connected to a round frame strung with wire, sinew, or plastic cords, and used to hit a ball, such as in tennis or a birdie in badminton.',
   '(Canada) A snowshoe formed of cords stretched across a long and narrow frame of light wood.',
   'A broad wooden shoe or patten for a man or horse, to allow walking on marshy or soft ground.']),
 ('verb', ['To strike with, or as if with, a racket.']),
 ('noun',
  ['A loud noise.',
   'A fraud or swindle; an illegal scheme for profit.',
   '(dated, slang) A carouse; any reckless dissipation.',
   '(dated, slang) Something taking place considered as exciting, trying, unusual, etc. or as an ordeal.'])]

In [32]:
get_definition("trifle")

[('noun',
  ['An English dessert made from a mixture of thick custard, fruit, sponge cake, jelly and whipped cream.',
   'An insignificant amount.',
   'Anything that is of little importance or worth.',
   'A particular kind of pewter.',
   '(uncountable) Utensils made from this particular kind of pewter.']),
 ('verb',
  ['(intransitive) To deal with something as if it were of little importance or worth.',
   '(intransitive) To act, speak, or otherwise behave with jest.',
   '(intransitive) To inconsequentially toy with something.',
   '(transitive) To squander or waste.'])]

Get Definitions from Context
============================

In [33]:
lyrics = rap_data.loc[23375]["lyrics"]
rap_sentence = [line for line in lyrics.split("\n") if word in line][0]
print(rap_sentence)

word to christ, a disciple of streets, trifle on beats


In [34]:
import nltk
split_sentence = nltk.word_tokenize(rap_sentence)
tagged_sentence = nltk.pos_tag(split_sentence,tagset="universal")
print(tagged_sentence)

[('word', 'NOUN'), ('to', 'PRT'), ('christ', 'VERB'), (',', '.'), ('a', 'DET'), ('disciple', 'NOUN'), ('of', 'ADP'), ('streets', 'NOUN'), (',', '.'), ('trifle', 'NOUN'), ('on', 'ADP'), ('beats', 'NOUN')]


Universal Part of Speech Tags
-----------------------------

- VERB - verbs (all tenses and modes)
- NOUN - nouns (common and proper)
- PRON - pronouns 
- ADJ - adjectives
- ADV - adverbs
- ADP - adpositions (prepositions and postpositions)
- CONJ - conjunctions
- DET - determiners
- NUM - cardinal numbers
- PRT - particles or other function words
- X - other: foreign words, typos, abbreviations
- . - punctuation

In [35]:
rap_sentence = "word to christ, a disciple of streets, I trifle on beats"
split_sentence = nltk.word_tokenize(rap_sentence)
tagged_sentence = nltk.pos_tag(split_sentence,tagset="universal")
print(tagged_sentence)

[('word', 'NOUN'), ('to', 'PRT'), ('christ', 'VERB'), (',', '.'), ('a', 'DET'), ('disciple', 'NOUN'), ('of', 'ADP'), ('streets', 'NOUN'), (',', '.'), ('I', 'PRON'), ('trifle', 'VERB'), ('on', 'ADP'), ('beats', 'NOUN')]


In [36]:
def get_definition_with_sentence(word, rap_sentence):
    split_sentence = nltk.word_tokenize(rap_sentence)
    tagged_sentence = nltk.pos_tag(split_sentence,
                                   tagset="universal")
    
    # Universal tag of the target word, lowercased to compare with Wiktionary ("NOUN" -> "noun")
    index_of_word = split_sentence.index(word)
    pos_of_word = tagged_sentence[index_of_word][1].lower()
    
    parser = WiktionaryParser()
    worddef = parser.fetch(word)

    for etymology in worddef:
        for dd in etymology["definitions"]:
            # Take the first definition that matches part of speech
            if dd['partOfSpeech'] == pos_of_word:
                all_defs = re.sub(re.escape(word) + r"\s*\u200e", "",
                                  dd['text']).strip().split("\n")
                all_gdefs = [d for d in all_defs if re.match(r"^\(.*\)$", d) is None]
                return (dd['partOfSpeech'], all_gdefs)

    return ("N/A","N/A")

In [37]:
get_definition_with_sentence("trifle", rap_sentence)

('verb',
 ['(intransitive) To deal with something as if it were of little importance or worth.',
  '(intransitive) To act, speak, or otherwise behave with jest.',
  '(intransitive) To inconsequentially toy with something.',
  '(transitive) To squander or waste.'])
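`get_definition_with_sentence` compares the lowercased tag with the dictionary's part of speech, which lines up for `noun` and `verb` but presumably not for tags such as `ADJ` or `ADV`. A small lookup table could bridge the gap. The target names below are assumptions about how Wiktionary headings are spelled, so check them against real `partOfSpeech` values before relying on them; `wiktionary_pos` is a hypothetical helper.

```python
# Assumed Wiktionary-style names for each universal tag (illustrative, not exhaustive)
UNIVERSAL_TO_WIKTIONARY = {
    "NOUN": "noun", "VERB": "verb", "ADJ": "adjective", "ADV": "adverb",
    "PRON": "pronoun", "ADP": "preposition", "CONJ": "conjunction",
    "DET": "determiner", "NUM": "numeral", "PRT": "particle",
}

def wiktionary_pos(universal_tag):
    """Map a universal tag to a part-of-speech name, falling back to lowercase."""
    return UNIVERSAL_TO_WIKTIONARY.get(universal_tag, universal_tag.lower())

print(wiktionary_pos("VERB"), wiktionary_pos("ADJ"))
```

Swapping `.lower()` for a call like this in the function above would let adjectives and adverbs find their definitions too, if the assumed names hold.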

In [38]:
def get_definition_with_lyrics(word, lyrics):
    rap_sentence = [line for line in lyrics.split("\n") if word in line][0]
    return get_definition_with_sentence(word, rap_sentence)

* * *

Extract Surrounding Lines
=====================

In [39]:
lyrics_split=["Twinkle, twinkle, little star",
"How I wonder what you are",
"Up above the world so high",
"Like a diamond in the sky"]

In [40]:
def get_rhymegroup(target_word, lyrics_split):
    group_index_of_target = -1
    
    for groupid, line in enumerate(lyrics_split):
        if target_word in line:
            group_index_of_target = groupid

    if group_index_of_target > 0 and group_index_of_target < len(lyrics_split)-1:
        return lyrics_split[group_index_of_target-1:group_index_of_target+2]
    else:
        return "N/A"

In [41]:
get_rhymegroup("high", lyrics_split)

['How I wonder what you are',
 'Up above the world so high',
 'Like a diamond in the sky']
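Note that `get_rhymegroup` returns "N/A" when the word sits on the first or last line, because it insists on a neighbour on both sides. A variant that clamps the slice at the edges could look like the sketch below; `surrounding_lines` is a hypothetical name, and unlike the original it stops at the first matching line rather than the last.

```python
def surrounding_lines(target_word, lines, context=1):
    """Return the first line containing target_word plus up to `context` lines either side."""
    for i, line in enumerate(lines):
        if target_word in line:
            return lines[max(0, i - context): i + context + 1]
    return []

nursery_rhyme = ["Twinkle, twinkle, little star",
                 "How I wonder what you are",
                 "Up above the world so high",
                 "Like a diamond in the sky"]

print(surrounding_lines("star", nursery_rhyme))
print(surrounding_lines("sky", nursery_rhyme))
```

Returning an empty list for a miss also keeps the return type consistent, which makes the result easier to loop over later.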

Extract Rhyming Couplet
=====================

In [42]:
import pronouncing

In [43]:
pronouncing.rhymes("star")[0:20]

['adar',
 'afar',
 'ahr',
 'ajar',
 'algar',
 'all-star',
 'allar',
 'almanzar',
 'almodovar',
 'alvare',
 'amar',
 'andujar',
 'aquilar',
 'ar',
 'are',
 'avelar',
 'azar',
 'azhar',
 'baar',
 'babar']

In [44]:
print(pronouncing.phones_for_word("high"))
print(pronouncing.phones_for_word("sky"))

['HH AY1']
['S K AY1']


In [45]:
print(pronouncing.phones_for_word("orange"))
print(pronouncing.phones_for_word("hinge"))

['AO1 R AH0 N JH', 'AO1 R IH0 N JH']
['HH IH1 N JH']
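Two words rhyme when their phones match from the last stressed vowel onward. The stand-alone sketch below approximates that idea without the `pronouncing` package; it is my reading of what `pronouncing.rhyming_part` does, not a copy of it, and the function names are made up. The phone strings are the ones printed above.

```python
def rhyming_part_sketch(phones):
    """Everything from the last vowel with stress digit 1 or 2 to the end of the word."""
    phone_list = phones.split()
    for i in range(len(phone_list) - 1, -1, -1):
        if phone_list[i][-1] in "12":
            return " ".join(phone_list[i:])
    return phones  # no stressed vowel found: compare the whole word

def is_rhyme(phones_a, phones_b):
    return rhyming_part_sketch(phones_a) == rhyming_part_sketch(phones_b)

print(is_rhyme("HH AY1", "S K AY1"))              # high / sky
print(is_rhyme("AO1 R AH0 N JH", "HH IH1 N JH"))  # orange / hinge
```

For "orange" the only stressed vowel is the first phone, so its rhyming part is the whole word, which is why nothing short like "hinge" can match it.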


In [46]:
def rhymes_per_line(lyrics_split):
    rhymes = []
    for line in lyrics_split:
        words = line.strip().split()
        last_word = words[-1].strip('.,?!;:')
        last_word_p = pronouncing.phones_for_word(last_word)
        if len(last_word_p) > 0:
            rhymes.append((pronouncing.rhyming_part(last_word_p[0]),line))
    return rhymes

In [47]:
rhymes_per_line(lyrics_split)

[('AA1 R', 'Twinkle, twinkle, little star'),
 ('AA1 R', 'How I wonder what you are'),
 ('AY1', 'Up above the world so high'),
 ('AY1', 'Like a diamond in the sky')]

* * *

In [48]:
lyrics_split = [line for line in df_data["lyrics"][23375].split("\n") if len(line) > 0]
print(lyrics_split[10:-30])

['i rap for listeners, blunt heads, fly ladies and prisoners', 'henessey holders and old school niggaz, then i be dissin a', 'unofficial that smoke woolie thai', 'i dropped out of kooley high, gassed up by a cokehead cutie pie', "jungle survivor, fuck who's the liver", 'my man put the battery in my back, a differencem from energizer', 'sentence begins indented.. with formality', "my duration's infinite, moneywise or physiology", "poetry, that's a part of me, retardedly bop", 'i drop the ancient manifested hip-hop, straight off the block', 'i reminisce on park jams, my man was shot for his sheep coat', 'childhood lesson make me see him drop in my weed smoke', "it's real, grew up in trife life, did times or white lines", 'the hype vice, murderous nighttimes, and knife fights invite crimes', 'chill on the block with cog-nac, hold strap', "with my peeps that's into drug money, market into rap", 'no sign of the beast in the blue chrysler, i guess that means peace', 'for niggaz no sheisty vi

In [49]:
target_word = "trifle"

In [50]:
def get_rhymes(lyrics_split):
    lines_with_rhyming_parts = list()
    for line in lyrics_split:
        words = line.split()
        # Strip trailing punctuation; whitespace-only lines have no last word
        last_word = words[-1].strip('.,?!;:') if words else ""
        last_word_p = pronouncing.phones_for_word(last_word)

        if len(last_word_p) > 0:
            # Use only the first listed pronunciation
            rhyming_part = pronouncing.rhyming_part(last_word_p[0])
            # First two characters: the stressed vowel without its stress digit
            lines_with_rhyming_parts.append([rhyming_part[:2], line])
        else:
            lines_with_rhyming_parts.append(["N/A", line])

    return lines_with_rhyming_parts

In [51]:
rhyme_lines = get_rhymes(lyrics_split)

In [52]:
rhyme_lines[40:60]

[['OW', 'i rap divine gods check the prognosis, is it real or showbiz?'],
 ['OW', 'my window faces shootouts, drug overdoses'],
 ['IY', 'live amongst no roses, only the drama, for real'],
 ['N/A', 'a nickel-plate is my fate, my medicine is the ganja'],
 ['EY', "here's my basis, my razor embraces, many faces"],
 ['EY', 'your telephone blowin, black stitches or fat shoelaces'],
 ['OW', "peoples are petrol, dramatic automatic fo'-fo' i let blow"],
 ['OW', "and back down po-po when i'm vexed so"],
 ['AE', "my pen taps the paper then my brain's blank"],
 ['AE', 'i see dark streets, hustlin brothers who keep the same rank'],
 ['EY', 'pumpin for somethin, some uprise, plus some fail'],
 ['EY', 'judges hangin niggaz, uncorrect bails, for direct sales'],
 ['EY', 'my intellect prevails from a hangin cross with nails'],
 ['IY', "i reinforce the frail, with lyrics that's real"],
 ['IY', 'word to christ, a disciple of streets, trifle on beats'],
 ['IY', 'i decifer prophecies through a mic and say p

In [53]:
import itertools
import operator

grouped_rhymes = []
for key,group in itertools.groupby(rhyme_lines, operator.itemgetter(0)):
    merged_group = [g[1] for g in group]
    grouped_rhymes.append(list(merged_group))
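`itertools.groupby` only merges neighbouring items that share a key, which is what makes it fit here: consecutive lines with the same vowel become one rhyme group, and the same vowel turning up later starts a new group. A toy run with made-up keys and lines:

```python
import itertools
import operator

# Made-up [rhyme key, line] pairs in song order
toy_rhyme_lines = [["AA", "little star"],
                   ["AA", "what you are"],
                   ["AY", "so high"],
                   ["AY", "in the sky"],
                   ["AA", "from afar"]]

toy_groups = [[line for _, line in group]
              for _, group in itertools.groupby(toy_rhyme_lines,
                                                key=operator.itemgetter(0))]
print(toy_groups)
```

The final "AA" line lands in its own group instead of joining the first two, because the "AY" lines sit in between.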

In [54]:
grouped_rhymes[15:35]

[['chill on the block with cog-nac, hold strap',
  "with my peeps that's into drug money, market into rap"],
 ['no sign of the beast in the blue chrysler, i guess that means peace'],
 ['for niggaz no sheisty vice to just snipe ya'],
 ['start off the dice-rollin mats for craps to cee-lo ',
  'with sidebets, i roll a deuce, nothin below (peace god!)'],
 ['peace god -- now the shit is explained',
  "i'm takin niggaz on a trip straight through memory lane"],
 ["it's like that y'all .. it's like that y'all .. it's like that y'all"],
 ['chorus: repeat scratches 4x',
  '"now let me take a trip down memory lane" -> bizmarkie',
  '\t"comin outta queensbridge"',
  '[nas]'],
 ['one for the money'],
 ['two for pussy and foreign cars',
  'three for alize niggaz deceased or behind bars'],
 ['i rap divine gods check the prognosis, is it real or showbiz?',
  'my window faces shootouts, drug overdoses'],
 ['live amongst no roses, only the drama, for real'],
 ['a nickel-plate is my fate, my medicine is 

In [55]:
def rhymegroup(target_word, grouped_rhymes):
    group_index_of_target = -1
    
    for groupid, rhymes in enumerate(grouped_rhymes):
        #print(groupid, rhymes)
        for line in rhymes:
            if target_word in line:
                group_index_of_target = groupid
    
    if group_index_of_target != -1:
        return grouped_rhymes[group_index_of_target]
    else:
        return "N/A"

In [56]:
rhymegroup("trifle", grouped_rhymes)

["i reinforce the frail, with lyrics that's real",
 'word to christ, a disciple of streets, trifle on beats',
 'i decifer prophecies through a mic and say peace.']

In [57]:
def rhymegroup_from_word(word, lyrics):
    lyrics_split = [line for line in lyrics.split("\n") if len(line) > 0]
    # Build the rhyme keys from these lyrics rather than the global rhyme_lines
    rhyme_lines = get_rhymes(lyrics_split)
    grouped_rhymes = []
    for key,group in itertools.groupby(rhyme_lines, operator.itemgetter(0)):
        merged_group = [g[1] for g in group]
        grouped_rhymes.append(list(merged_group))
    return rhymegroup(word, grouped_rhymes)

* * *

Posting on Tumblr
=================

In [58]:
# Get an API key (https://www.tumblr.com/docs/en/api/v2) and complete the OAuth 1.0a flow
tumblr_client = ''
tumblr_secret = ''
access_key    = ''
access_secret = ''

import pytumblr

user = pytumblr.TumblrRestClient(
    tumblr_client, tumblr_secret,
    access_key, access_secret)

In [59]:
def post_template(word, part_of_speech, worddef, lyrics, artist, song, youtube=None):
    
    post = '''<p><a href="http://en.wiktionary.org/wiki/{}">{}</a> '''.format(word, word)
    post += '''- {} -\xa0 {}</p>'''.format(part_of_speech, worddef)

    post += "<p>"
    for line in lyrics:
        if word in line:
            post += line.replace(word,"<b>"+word+"</b>")+"<br />"
        else:
            post += line+"<br />"
    post += "</p>"
    
    if youtube is not None:
        post += '''<p>-{} on\xa0“'''.format(artist)
        post += '''<a href="https://www.youtube.com/watch?v={}">{}</a>”</p>'''.format(youtube,
                                                                                      song)
    else:
        post += '''<p>-{} on\xa0“{}”</p>'''.format(artist,song)        
    
    return post
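`post_template` drops lyric lines into HTML as-is and bolds every substring match, so a line containing `&` or a longer word such as "trifles" would not come out quite right. A hedged sketch of a stricter line formatter using the standard library; `bold_word` is a hypothetical helper, not part of the notebook:

```python
import html
import re

def bold_word(line, word):
    """Escape the line for HTML, then bold whole-word occurrences of word."""
    escaped = html.escape(line)
    pattern = r"\b" + re.escape(html.escape(word)) + r"\b"
    return re.sub(pattern, lambda m: "<b>" + m.group(0) + "</b>", escaped)

print(bold_word("trifle on beats & rhymes", "trifle"))
print(bold_word("no trifles here", "trifle"))
```

The word-boundary match means inflected forms are left alone, which may or may not be desirable for a word-of-the-day post.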

Puttin' it Together
===================

In [60]:
print(rap_data.loc[23375])

artist                                                       Nas
filename       Lyrics/ohhla.com/anonymous/nas/illmatic/memory...
lyrics         \t(check that shit)\n\taight fuck that shit, w...
matches        {dingbats, reminisce, gassed, takin, trifle, s...
rap_matches                               {dingbats, physiology}
song                            Memory Lane (Sittin' in Da Park)
Name: 23375, dtype: object


In [61]:
word = "trifle"
artist = rap_data.loc[23375].artist
song = rap_data.loc[23375].song

part_of_speech, worddef = get_definition_with_lyrics(word,
                                                       rap_data.loc[23375].lyrics)
lyrics = rhymegroup_from_word(word, rap_data.loc[23375].lyrics)
youtube = youtube_search(artist + " " + song)[0]

slug = word+"-"+part_of_speech+"-"+artist

#print(word, part_of_speech, worddef[0], lyrics, artist, song, youtube)

In [62]:
post_body = post_template(word, part_of_speech, worddef[0], lyrics, artist, song, youtube)

In [63]:
user.create_text("rapwords",
                 format="html",
                 state="published",
                 slug=slug,
                 body=post_body,
                 tags=[part_of_speech],)

{'id': 153129975615}

Peace
=====

![Rap Trifle](assets/RapTrifle.png)