# Artist WordClouds

## Skills Shown:
> <font color='red'> Web Scraping </font> <br>
> <font color='blue'>  Data Visualization </font> <br>
> <font color='green'> Natural Language Processing </font> <br>

### As an avid music listener, I have always had a deep appreciation for great lyrics. In my opinion, an artist's lyrics can act as a mirror into their soul. Websites like Genius do a fantastic job of providing lyrics for a large number of artists. However, at this point, the website does not offer a way for users to summarize their favorite artist's lyrics.
### With this problem in mind, I set out to find a way to condense an artist's lyrics. After much thought, I figured the best way to do this would be to create a word cloud. For those who are unaware, a word cloud "is a novelty visual representation of text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color." (Wikipedia)

In [1]:
#Importing Dependencies
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup
import lyricsgenius as genius
import wikipedia
import nltk
from nltk.tokenize import word_tokenize
import string
from nltk.corpus import stopwords
import matplotlib.pyplot as plt
from collections import Counter
import numpy as np
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import random
%matplotlib inline
nltk.download("stopwords")

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/barbarapunturo/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Part 1
### Initializing the Genius API. Genius offers a simple API for accessing artists and lyrics from its website (more information can be found here: https://docs.genius.com/). It therefore made the most sense to get the lyrics from Genius.

In [2]:
api = genius.Genius('Your_Own_API_Key')
artist = api.search_artist('Kanye West', max_songs=4)

Searching for songs by Kanye West...

Song 1: "'03 Electric Relaxation"
Song 2: "18 Years"
Song 3: "1996 Fat Beats Freestyle"
Song 4: "2004 Tim Westwood Freestyle"

Reached user-specified song limit (4).
Found 4 songs.
Done.


In [3]:
#Adding common stopwords
stopword=set(stopwords.words('english'))
words=["thats","yeah","i'm","got","get"]
for x in words:
    stopword.add(x)

## Part 2
### Scraping Song Titles
### One of the problems with Genius is that the website has too much information. For many artists, there are entries for their appearances on guest shows (i.e., not their music). For this project, that information is not useful. So before I use the API, I have to scrape an artist's song titles from the web. I chose to scrape from two websites, songfacts.com and song-list.net, because for some artists songfacts.com has more songs and for others song-list.net does. To be as thorough as possible, I want the maximum number of lyrics for an individual artist.

In [4]:
def get_songs(artist_name):
    """ Input the name of an artist.
    This function returns a list of songs by that artist scraped from songfacts.com"""
    # songfacts.com URLs use lowercase names with underscores instead of spaces
    slug = artist_name.replace(" ", "_").lower()
    result = requests.get("http://www.songfacts.com/artist-{}.php".format(slug))
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup(result.content, "html5lib")
    # repr() wraps each string in quotes, so the returned titles are quoted too
    songs = [repr(text) for text in soup.strings]
    # The titles sit between the "List of songs by ..." header and the next bare newline
    index = songs.index("'List of songs by {}'".format(artist_name))
    songs_indexed = songs[index+1:]
    end_point = songs_indexed.index("'\\n'")
    return songs_indexed[:end_point]
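
The slicing step in `get_songs` can be tried offline. The sketch below is only an illustration: `slice_between` and the `page_strings` list are hypothetical stand-ins for the strings BeautifulSoup yields, and it uses plain strings rather than their `repr()` form.

```python
def slice_between(strings, start_marker, end_marker):
    """Return the items after start_marker, up to the next end_marker."""
    start = strings.index(start_marker) + 1
    # Search for the end marker only after the start marker
    end = strings.index(end_marker, start)
    return strings[start:end]

# Hypothetical page text, made up for this example
page_strings = ["Songfacts", "\n", "List of songs by Drake",
                "10 Bands", "6 God", "Back to Back", "\n", "Footer"]
titles = slice_between(page_strings, "List of songs by Drake", "\n")
print(titles)
```

If the header text on the page ever changes, `list.index` raises a `ValueError`, so the real function fails loudly rather than returning a wrong list.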

In [5]:
def get_other_songs(artist_name):
    """ Input the name of an artist.
    This function returns an array of song titles by that artist read from song-list.net"""
    # song-list.net URLs use lowercase names with the spaces removed
    slug = artist_name.replace(" ", "").lower()
    # Take the second table on the page and return its first column
    table = pd.read_html("http://www.song-list.net/{}/songs".format(slug))[1]
    return table[0].values
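
The two scrapers differ only in how they turn an artist's name into a URL fragment. A minimal sketch of that step, using a hypothetical helper named `to_slug` that is not part of the notebook:

```python
def to_slug(artist_name, separator):
    """Lowercase an artist name and replace spaces with the given separator."""
    return artist_name.lower().replace(" ", separator)

# songfacts.com style: underscores between words
songfacts_slug = to_slug("Kanye West", "_")
# song-list.net style: spaces removed entirely
songlist_slug = to_slug("Kanye West", "")
print(songfacts_slug, songlist_slug)
```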

## Part 3
### Getting the Lyrics. Once I had the song titles, I needed to get the lyrics for these songs. Using the Genius API, I searched for every song individually. Because I may have upwards of 100 songs to search for, this takes a good chunk of time to run.

In [6]:
def get_lyrics(list_of_songs,artist_yes):
    """Input a list of songs, and the name of the artist who performs them.
    This function looks up each song on Genius and saves the lyrics to a JSON file
    (via save_lyrics). It does not return anything."""
    # Fetch the artist with a single song, then add the scraped titles one by one
    artist = api.search_artist(artist_yes, max_songs=1)
    for song in list_of_songs:
        try:
            search = api.search_song(song,artist.name)
            artist.add_song(search)
        except Exception:
            # Skip titles that cannot be matched and added to this artist
            print ("Not the first result")
    artist.save_lyrics()
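
To see which titles get skipped, the search loop can be sketched without the API. Everything here is hypothetical: `collect_matches` is an illustrative helper, `fake_catalogue` stands in for Genius, and the sketch assumes a lookup that returns `None` on a miss, which is an assumption for this example rather than a statement about the Genius client.

```python
def collect_matches(titles, search):
    """Call search(title) for each title; split the titles into found and missed."""
    found, missed = {}, []
    for title in titles:
        result = search(title)
        if result is None:
            missed.append(title)
        else:
            found[title] = result
    return found, missed

# Hypothetical in-memory catalogue standing in for the Genius API
fake_catalogue = {"10 Bands": "lyrics for 10 Bands", "6 God": "lyrics for 6 God"}
found, missed = collect_matches(["10 Bands", "Behind Barz", "6 God"], fake_catalogue.get)
print(sorted(found), missed)
```

Keeping a list of misses makes it easier to retry those titles later with a different spelling.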



## Part 4
### Cleaning the Data
### OK, almost done. The most time-consuming task is out of the way. However, I still need to process the language. The lyrics sourced from Genius contain punctuation, full sentences, and capital letters. To get the most accurate results, I must break these sentences down into individual words and make sure all words are lowercase.


In [7]:
def clean_it(p):
    """ This function takes in a dataframe. It cleans up all of the lyrics for that dataframe.
    It returns a single list of all words. They are lowercased and all stopwords have been removed."""
    # Matches any single punctuation character
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    real_final=[]
    for number in range(len(p)):
        example_text=p["songs"][number]["lyrics"]
        clean_text=example_text.replace("\n"," ")
        # Drop bracketed annotations such as section headers
        clean_text=re.sub(r'\[(.*?)\]',"",clean_text)
        # Split on spaces, skip empty strings, and lowercase each word
        lower_words=[word.lower() for word in clean_text.split(" ") if word != ""]
        for word in lower_words:
            if word in stopword:
                continue
            # Strip punctuation, and keep the token only if something is left
            new_token = regex.sub('', word)
            if new_token != '':
                real_final.append(new_token)
    return real_final
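
The same cleaning steps can be tried on a single string. This is a minimal sketch, not the notebook's function: `clean_lyrics`, `demo_stopwords`, and the sample text are made up for illustration, and it strips punctuation before the stopword check, which is a variation on the order used in `clean_it`.

```python
import re
import string

# A tiny stand-in stopword set; the notebook uses NLTK's English list instead
demo_stopwords = {"i", "the", "a", "got", "yeah"}

def clean_lyrics(text, stopwords):
    """Lowercase, drop [bracketed] tags, strip punctuation, remove stopwords."""
    text = re.sub(r"\[.*?\]", "", text.replace("\n", " "))
    strip_punct = str.maketrans("", "", string.punctuation)
    tokens = (word.lower().translate(strip_punct) for word in text.split())
    return [token for token in tokens if token and token not in stopwords]

sample = "[Verse 1]\nYeah, I got the keys,\nthe keys to the city!"
cleaned = clean_lyrics(sample, demo_stopwords)
print(cleaned)
```

Stripping punctuation first means a token like "yeah," is recognized as the stopword "yeah", at the cost of turning contractions such as "i'm" into "im".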

## Final Part
### Creating the WordCloud. 
### Here comes the fun part: outputting the word cloud. Using the Python WordCloud library and the cleaned-up lyrics, I am able to create the word cloud.

In [8]:
colormaps=["GnBu","PuBu","BuGn","winter","autumn","Spectral","seismic","cool","summer","spring","Pastel1","Pastel2"]

In [9]:
def word_cloud(list_of_words):
    count=Counter(list_of_words)
    wordcloud = WordCloud(width=480, height=480, margin=0,colormap=random.choice(colormaps)).generate_from_frequencies(count)
    fig = plt.gcf()
    fig.set_size_inches(18.5, 10.5)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.margins(x=0, y=0)
    plt.show()
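
`generate_from_frequencies` takes a mapping of word to frequency, and a `Counter` is a `dict` subclass, which is why the word list can be counted and passed straight in. A small sketch of the counting step on a made-up word list:

```python
from collections import Counter

# Hypothetical cleaned words, made up for this example
words = ["keys", "city", "keys", "money", "keys", "city"]
frequencies = Counter(words)
# The most frequent words are the ones drawn largest in the cloud
top_two = frequencies.most_common(2)
print(top_two)
```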

In [None]:
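if __name__ == "__main__":
    artist="Drake"
    got_them=get_songs(artist)
    try:
        get_others=get_other_songs(artist)
    except Exception:
        # Fall back to the songfacts.com list if song-list.net cannot be read
        get_others=[]
    # Use whichever site returned more song titles
    if len(get_others)>len(got_them):
        print("yes")
        got_them=pd.Series(get_others)
        # Remove parenthetical notes from titles; pandas 2.0+ needs regex=True here
        got_them=got_them.str.replace(r"\((.*?)\)","",regex=True)
        got_them=list(got_them.values)
    get_lyrics(got_them,artist)
    df=pd.read_json('Lyrics_Drake.json')
    list_words=clean_it(df)
    word_cloud(list_words)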





Searching for songs by Drake...

Song 1: "0 to 100 / The Catch Up"

Reached user-specified song limit (1).
Found 1 songs.
Done.
Searching for "'0 to 100/ The Catch Up'" by Drake...
Specified song was not first result :(
Not the first result
Searching for "'10 Bands'" by Drake...
Done.
Searching for "'305 to My City'" by Drake...
Done.
Searching for "'5AM In Toronto'" by Drake...
Done.
Searching for "'6 God'" by Drake...
Done.
Searching for "'6 Man'" by Drake...
Done.
Searching for "'6PM in New York'" by Drake...
Done.
Searching for "'8 out of 10'" by Drake...
Done.
Searching for "'9'" by Drake...
Done.
Searching for "'All Me'" by Drake...
Done.
Searching for "'Back to Back'" by Drake...
Done.
Searching for "'Behind Barz'" by Drake...
Specified song was not first result :(
Not the first result
Searching for "'Best I Ever Had'" by Drake...
Done.
Searching for "'Blem'" by Drake...
Done.
Searching for "'Blue Tint'" by Drake...
Done.
Searching for "'Buried Alive Interlude'" by Drake...
Done