# Got Raps?

## Background

I am someone who *loves* to music. One of my favorite genres to listen to is hip-hop/rap because of, in some cases, the focus on lyricism. 

In recent years, there has been less of a focus in this regard, leading some fans of "true hip-hop" to look down upon the musicians currently in the spotlight. Part of the criticism of modern rap is how forgettable the lyrics and songs are due to their simplistic nature. 

While learning about *recurrent neural networks* (RNNs)I came across a [video](https://www.youtube.com/watch?v=ZMudJXhsUpY) by Laurence Moroney explaining how AI can be used to generate poetry after training upon a corpus of Irish poems. This sparked an idea to try and do the same with a modern song lyrics. I specifically decided to choose rap because of my familiarity with the genre and thinking that the songs may have more words since the artists are not singing (most of the time).

I will be scraping lyrics from [azlyrics](https://www.azlyrics.com/t/tyga.html) using BeautifulSoup and Keras to create my RNN.

supplemental site https://www.allthelyrics.com/lyrics/tyga
alternative https://www.lyricsbox.com/tyga-lyrics-hdgvd.html

## Imports

In [14]:
import requests
import regex
import pickle
import time
import pandas as pd
import numpy as np
import functions as dlf
from bs4 import BeautifulSoup
from importlib import reload

In [27]:
reload(dlf)

<module 'functions' from 'C:\\Users\\d_ful\\Documents\\GitHub\\Rap_Generator\\functions.py'>

In [35]:
#### WRITING ####
# with open('Pickles/song_dict.pickle', 'wb') as f:
#     pickle.dump(song_dict, f)
#     f.close()

#### READING ####
with open('Pickles/song_dict.pickle', 'rb') as f:
    song_dict = pickle.load(f)
    f.close()


#### WRITING ####
# with open('Pickles/album_dict.pickle', 'wb') as f:
#     pickle.dump(album_dict, f)
#     f.close()

#### READING ####
# with open('Pickles/album_dict.pickle', 'rb') as f:
#     album_dict = pickle.load(f)
#     f.close()

## Getting Soupy

#### Connecting

In [2]:
## Starting page + Q.C. of response
# start_url = 'https://www.azlyrics.com/t/tyga.html'  ## Banned
# start_url = 'https://www.lyricsbox.com/tyga-lyrics-hdgvd.html' ## Doesn't work
start_url = 'https://www.allthelyrics.com/lyrics/tyga'
start_resp = requests.get(start_url)
print(f'Starting Response: {start_resp}')

Starting Response: <Response [200]>


#### Main Page Soup

In [3]:
## Creating soup + Q.C.
start_soup = BeautifulSoup(start_resp.text, 'html.parser')
print(start_soup.prettify()[13900:14500])

            TYGA lyrics: 'Never Be The Same', 'Hookah (feat. Young Thug)', 'Riot (feat. Honey Cocaine)', 'Make It Nasty', 'We Up'
              </div>
              <div class="content-top-lyricspopular">
               <div class="artist-lyrics-list artist-lyrics-list-popular">
                <h2 class="artist-lyrics-list-h2">
                 Top 5 songs
                </h2>
                <ol>
                 <li class="lyrics-list-item lyrics-list-item-1737736">
                  <a href="/lyrics/tyga-never_be_the_same">
                   Never Be The Same
                  </a>
     


#### Collecting Song Links + Names

In [None]:
## Create container for names/links + find links ('a') in soup
song_dict = {}
link_list = start_soup.findAll('a')

## Select and store links to Tyga songs
for a in link_list:
    if '/lyrics/tyga-' in a['href']:
        song_dict[a.text] = a['href']

## Q.C.
display(song_dict)

In [4]:
## Create iterable with song names for operations
song_names = list(song_dict.keys())

#### Collecting Lyrics - Test

In [4]:
#### Testing strategy for lyric removal ####

## Join strings to create full song URL
end_url = song_dict['Hookah (feat. Young Thug)']
start_url = 'https://www.allthelyrics.com'
full_url = start_url + end_url

## Generating soup
# song_resp = requests.get(full_url)
song_soup = BeautifulSoup(song_resp.text, 'html.parser')

## Q.C. + HTML structure info
print("Example of target 'div' and section names:\n")
print(song_soup.prettify()[12110:12210], '\n\t...')
print(song_soup.prettify()[13823:14000])

## Collect lyrics 'div' from song soup as a bs4 tag + store in song dict
lyrics = song_soup.findAll('div', attrs={'class': 'content-text-inner'}).pop()

song_dict['Hookah (feat. Young Thug)'] = lyrics

Example of target 'div' and section names:

       <div class="content-text-inner">
              <p>
               [Hook x4 - Young Thug:]
    
	...
               [Verse 2 - Tyga:]
               <br/>
               Rubbin on my chain blowing cloudmatic
               <br/>
               Smoke something with a G and bend 


#### Collecting Lyrics

In [33]:
dlf.song_scraper(song_dict, song_names, verbose=False)

----------------------------------------
Song to be scraped: B.M.F.
Something wrong with link for B.M.F.
----------------------------------------
Song to be scraped: Lil Homie (feat. Pharrell)
Sleeping 278.42 seconds...
Ding!
----------------------------------------
----------------------------------------
Song to be scraped: LovaGain
Sleeping 241.16 seconds...
Ding!
----------------------------------------
----------------------------------------
Song to be scraped: Love Game
Sleeping 226.26 seconds...
Ding!
----------------------------------------
----------------------------------------
Song to be scraped: Love T-Raww
Sleeping 285.88 seconds...
Ding!
----------------------------------------
----------------------------------------
Song to be scraped: Luv Dem
Sleeping 181.54 seconds...
Ding!
----------------------------------------
Total number of songs skipped: 101


In [34]:
dlf.song_scraping_stats(song_dict)

{'tag': 106, 'string': 77}

In [166]:
len(song_names)

183

## Lyric Filtering

In [7]:
for t in lyrics[0]:
    try:
        if 'Tyga' in t.text:
            print(t.text)
            print(10*'--')
    except AttributeError:
        print(10*'**')        

********************
********************
********************
[Verse 2 - Tyga:]
Rubbin on my chain blowing cloudmatic
Smoke something with a G and bend that ass backwards
Lay back relax and talk mathematics
Later on we test a little sex practice
Write my name on the wall
Money in the mattress bet she wanna get involved
She hopped on the blunt said 'Where the hookah y'all? '
I tell her pass back if the shit too strong
It's all set; Mozart art on the blog so wet
I got a where I want her and I ain't done yet
Looking at your future baby put down the cigarette
Come hop on this kush jet and take flight
Tell em bitches 'mmm fuck that'
You wanna lay in bed I got a magic carpet for that ass yes
I'm also on one
Got two Olsen's on me
Big homie
Young Thugger Thugger rolling
Rolls Royce so shorty
--------------------
********************
********************
[Verse 3 - Tyga:]
Ben Frank baddies in the Benz waggin'
You know she want a ride home hop on the band wagon
I got the chain saggy
You know th

In [10]:
for t in lyrics[0]:
    print(t)

<p>[Hook x4 - Young Thug:]<br/>
Baby pass me the hookah<br/>
Pass me the hookah<br/>
Pass the hookah<br/>
Pass the hookah</p>


<p>[Verse 1 - Young Thug:]<br/>
Tearing up the place<br/>
I'm a rich nigga got chanel on my waist<br/>
Run up on me playin I'm a aim it at ya face<br/>
And that go for anybody... anyways<br/>
I'm a rich blood by the way<br/>
And I have swag roll it all like a tape<br/>
Fish scale, yeah I got these bitches on the bait<br/>
Don't wanna talk, man I said I need some space (woop woop woop)<br/>
My new car get geeked up<br/>
I just paid a cop, now I'm running out of court<br/>
Panoramic top, I'm a put it on the rocks<br/>
Crawl, walk and hop, got all these bitches shocked<br/>
Stone molly whiter than my socks<br/>
I don't, I don't wanna talk if it ain't 'bout guap<br/>
Clowning you niggas I see you flop<br/>
I buy pints by the 2 no Pac</p>


<p>[Hook x7 - Young Thug]</p>


<p>[Verse 2 - Tyga:]<br/>
Rubbin on my chain blowing cloudmatic<br/>
Smoke something with a G 

#### AZ Lyrics

In [51]:
## THIS IS FOR AZLYRICS ##

## Creating soup + Q.C.
start_soup = BeautifulSoup(start_resp.text, 'html.parser')
print(start_soup.prettify()[8422:8946])

      <div class="album" id="9188">
       album:
       <b>
        "No Introduction"
       </b>
       (2008)
      </div>
      <div class="listalbum-item">
       <a href="../lyrics/tyga/diamondlife.html" target="_blank">
        Diamond Life
       </a>
      </div>
      <div class="listalbum-item">
       <a href="../lyrics/tyga/coconutjuice.html" target="_blank">
        Coconut Juice
       </a>
      </div>
      <div class="listalbum-item">
       <a href="../lyrics/tyga/supersizeme.html" target="_blank">
 


### Linking Up

In [8]:
## Collecting all album/song titles + song links
albums_songs = start_soup.findAll('div', attrs={'class': ['album', 'listalbum-item']})

In [16]:
## Agg. each album into a dict of a dict {Album: {song:song_link,...}}
album_mid = dlf.album_aggregator(albums_songs)
## Extra function needed for sorting 'Other songs'
album_dict = dlf.legendary_album_splitter(album_mid)

1st Album!
Empty list!


#### Link Scraping

**Albums Done**
* No Introduction
* The Potential
* Fan Of A Fan
* Well Done
* Black Thoughts Vol. 2
* Well Done 2
* #BitchImTheShit
* Careless World: Rise Of The Last King
* Well Done 3
* 187
* Hotel California
* Well Done 4
* Fan Of A Fan: The Album
* The Gold Album: 18th Dynasty
* Fuk Wat They Talkin Bout
* Rawwest Nigga Alive
* Bitch I'm The Shit 2
* Bugatti Raww
* Kyoto
* Legendary
* Other Songs

**STRATEGY FOR SCRAPING LYRICS**
* Copy `album_dict` into `res_dict`
* Iterate through each album:
    * Iterate through each song:
        * Scrape using link (**NEED TO CHECK FORMAT**)
        * Replace link w/scraped soup
* Return *copied* `res_dict`

In [33]:
# album_names = []
# for key in album_dict:
#     album_names.append(key)

albums_w_lyrics = album_dict.copy()

In [37]:
# alb_num = 0
# song_start_url = 'https://www.azlyrics.com'
# test_album = album_dict[album_names[alb_num]].copy()

def song_collector(start_url, dict_, alb_num=0, sleep_time=30, verbose=True):
    
    # import requests
    # from bs4 import BeautifulSoup
    
    ## Create list for iteration
    list_of_albums = []
    for album in dict_:
        list_of_albums.append(album)
    
    ## Setting album for use    
    album = list_of_albums[alb_num]
    
    ## Create and retrieve soup from full song URL
    for song in dict_[album]:
        end_url = dict_[album][song]
        full_url = start_url + end_url
        resp = requests.get(full_url)
        song_soup = BeautifulSoup(resp.text, 'html.parser')
        
        ## Optional display
        if verbose:
            print(f'Song: {song}')    
    
    ## Select divs from soup including lyrics
    song_lyrics = song_soup.findAll('div', attrs={'class': None})
    
    ## Store single div containing lyrics only (div w/o an id attr)
    for tag in song_lyrics:
        try:
            _ = tag['id']
        except KeyError:
            t_tag = tag
            ## Optional display
            if verbose:
                print('Lyrics!')
    
    ## Store
    dict_[album][song] = t_tag
    
    ## Optional display
    if verbose:
        print('Sleeping...')
        print('--'*30)
    
    ## Pacing for URL calls
    time.sleep(10)

    return dict_
    
# song_res = []
# song_dump = []
# switch = False

In [38]:
albums_w_lyrics['No Introduction']['Diamond Life']

'/lyrics/tyga/diamondlife.html'

In [36]:
albums_w_lyrics = song_collector('https://www.azlyrics.com', albums_w_lyrics, alb_num=0)

/lyrics/tyga/diamondlife.html
Song: Diamond Life
/lyrics/tyga/coconutjuice.html
Song: Coconut Juice
/lyrics/tyga/supersizeme.html
Song: Supersize Me
/lyrics/tyga/dontregretitnow.html
Song: Don't Regret It Now
/lyrics/tyga/pillowtalkin.html
Song: Pillow Talkin'
/lyrics/tyga/aim.html
Song: AIM
/lyrics/tyga/firsttimers.html
Song: First Timers
/lyrics/tyga/cartoonz.html
Song: Cartoonz
/lyrics/tyga/summertime.html
Song: Summertime
/lyrics/tyga/press7.html
Song: Press 7
/lyrics/tyga/woww.html
Song: Woww
/lyrics/tyga/2am.html
Song: 2 AM
/lyrics/tyga/est80sbaby.html
Song: Est. (80's Baby)
/lyrics/tyga/iam.html
Song: I Am(iTunes Bonus Track)
Lyrics!


NameError: name 'list_of_keys' is not defined

In [None]:
tester = t_tag.findAll('i')

for tag in tester:
    print(tag.text)

In [12]:
albums_w_lyrics['No Introduction']['Diamond Life']

<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
<i>[Chorus: Patty Cash]</i><br/>
Diamond Life<br/>
Sugar baby we dynamite<br/>
Playboy and socialites<br/>
Young and Fly Fly Fly<br/>
<br/>
<i>[Verse 1: Tyga]</i><br/>
1989 no pressure<br/>
But to be the bestest in my section<br/>
Levels of a professional<br/>
Skip school create my own lessons<br/>
Confessions of a mad rapper<br/>
Music got me rapping<br/>
Green stretching only leads to red stretches<br/>
He's next in line for the blessing<br/>
Get your mind off minds<br/>
Hustle something and stop relying on mine<br/>
More then a legend why you letting time fly by<br/>
At age 17 addicted to ink<br/>
A rap fein who had money dreams<br/>
My chase of fame couldn't compare to what I seen<br/>
Them die government lie hard for the paper cheese<br/>
Moms crying watching her only son<br/>
Through TV mtv bet<br/>
He on now wipe me down<br/>
No longer fight

In [35]:
test_start_url = 'https://www.azlyrics.com'
# print(test_start_url)
test_album = album_dict['Well Done 4']
test_end_url = test_album['Bang Out']
# print(test_end_url)
test_url = test_start_url + test_end_url
print(test_url)
# test_resp = requests.get(test_url)
print(f'Starting Response: {test_resp}')
test_soup = BeautifulSoup(test_resp.text, 'html.parser')
# test_soup

https://www.azlyrics.com/lyrics/tyga/bangout.html
Starting Response: <Response [200]>


In [34]:
test_lyrics = test_soup.findAll('div', attrs={'class': None})
test_res = []
test_dump = []
switch = False
for tag in test_lyrics:
    try:
        _ = tag['id']
    except KeyError:
        t_tag = tag
        print('Lyrics!')
        
tester = t_tag.findAll('i')

for tag in tester:
    print(tag.text)

Lyrics!
[Verse 1: Tyga]
[Hook]
[Verse 2: Tyga]
[Hook]
[Verse 3: Eazy-E]
[Outro: Ice Cube]


In [12]:
test_lyrics

[<div id="RTK_67Y3"></div>, <div>
 <!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
 <i>[Verse 1: Tyga]</i><br/>
 Hold up, money talk so you know what?<br/>
 Ainât nothing to talk about, you ainât got enough cuz<br/>
 Rock star drugs break a bitch heart, no love<br/>
 Emma Watts, Charlie Sheen, fuckin with no scrub<br/>
 Niggas want connects, got no plugs<br/>
 Nigga say they high, got no buzz<br/>
 Popsicle niggas wanna talk shit then say you froze up<br/>
 Young niggas wanna pop pills, just po up<br/>
 Went on a bang, Went on a bang<br/>
 Bitches came for me and my nigga eazy<br/>
 Threw that bitch out, got that ho one way<br/>
 Said she tryna stay, told that bitch no way<br/>
 Thatâs a preme nigga, B ripper, grim reaper<br/>
 I donât get mad bitch, I just get even<br/>
 T-Raw magician, I donât gotta trick or treat it<br/>
 That Ferrari California make a bitch a believer<br/>
 <br/>
 <i>[Ho

#### Link Saving

In [32]:
#### WRITING ####
# with open('Pickles/album_dict.pickle', 'wb') as f:
#     pickle.dump(album_dict, f)
#     f.close()

#### READING ####
with open('Pickles/album_dict.pickle', 'rb') as f:
    album_dict = pickle.load(f)
    f.close()