# Demo music query

We perform the following steps to generate recommendations.

1. Load an embeddings model and a cluster model.
2. Specify the query: a song title and any metadata to condition on.
3. Get lyrics through an API.
    - First with [this API](http://www.chartlyrics.com/api.aspx), as it's free and does not require an API key.
    - Otherwise fall back on [this API](https://github.com/johnwmillr/LyricsGenius) to access Genius. **Note that you will need an API key, which you can create [here](https://genius.com/api-clients).**
4. Get Spotify acoustic features and metadata with [this API](https://spotipy.readthedocs.io/en/2.19.0/). **Note that you will need a client ID and secret key, which you can create [here](https://developer.spotify.com).**
5. Return top K recommendations by:
    - Computing the embedding.
    - Identifying the corresponding cluster.
    - Subsetting based on the query.
    
First, some imports.

```python
import urllib.request
import json
import numpy as np
import lyricsgenius
import re
import xml.etree.ElementTree as ET
from pprint import pprint
import os
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from sentence_transformers import SentenceTransformer


def get_token(token_name, token_path="tokens.json"):
    """Return a credential from an environment variable, else from a JSON file."""
    token = os.environ.get(token_name)
    if token:
        return token
    if os.path.isfile(token_path):
        with open(token_path) as f:
            data = json.load(f)
        token = data.get(token_name)
    if token is None:
        raise ValueError(f"No value for {token_name}.")
    return token


def find_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or "" if absent.

    Assumed helper: standardize_lyrics() calls it, but the original notebook does not define it.
    """
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = text.find(end, begin)
    return text[begin:] if stop == -1 else text[begin:stop]


def standardize_lyrics(lyrics, i=0, verbose=False):
    """Flatten raw lyrics into a single line of text without section labels."""
    if verbose:
        print(i)
    # missing values (NaN/None) and empty strings
    if not isinstance(lyrics, str) or len(lyrics) == 0:
        return np.nan

    # replace escaped newlines (a literal backslash followed by "n") with sentence breaks
    clean = lyrics.replace("\\n\\n", ". ").replace("\\n", ". ").replace("\\", "")

    # remove square-bracketed section labels;
    # in verbose mode, first print the chorus, pre-chorus, post-chorus, bridge and verses
    song_parts = ["Chorus", "Pre-Chorus", "Post-Chorus", "Bridge", "Verse 1", "Verse 2", "Verse 3", "Verse 4"]
    if verbose:
        for part in song_parts:
            text = find_between(clean, f"[{part}]. ", "[")
            if len(text):
                print(f"\n{part} : {text}")

    for part in song_parts:
        clean = clean.replace(f"[{part}]. ", "")

    # remove anything else in square brackets (non-greedy: one label at a time)
    clean = re.sub(r"\[.*?\]", "", clean)

    # clean up: drop quotes, leading ". '" characters and one trailing apostrophe
    clean = clean.replace('"', "")
    clean = clean.lstrip(". '")
    if not clean:
        return np.nan
    if clean.endswith("'"):
        clean = clean[:-1]

    return clean.strip().replace("\n", " ")
```
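The bracket-stripping step relies on a non-greedy pattern: `.*?` stops at the first closing bracket, so each `[...]` section label is removed on its own rather than everything between the first `[` and the last `]`. A small standalone check, using a made-up lyric snippet for illustration:

```python
import re

# Made-up lyric snippet with Genius-style section labels (illustrative only).
text = "[Intro] la la [Verse 1: Artist] hello [Chorus] world"

non_greedy = re.sub(r"\[.*?\]", "", text)  # removes each [...] label separately
greedy = re.sub(r"\[.*\]", "", text)       # removes from the first [ to the last ]

print(repr(non_greedy))
print(repr(greedy))
```

The greedy version would silently delete real lyrics that sit between two labels, which is why the non-greedy form is the one to keep.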

# 1) load encoder and clusters 

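```python
# from model hub or local path
# TODO detect when local path and need to remove classification head
embeddings_model = "sentence-transformers/all-mpnet-base-v2"
clusters_path = ""  # TODO: the cluster model is not loaded yet


# a local checkpoint is a directory, so test with isdir rather than isfile
if not os.path.isdir(embeddings_model):
    # coming from model hub
    model = SentenceTransformer(embeddings_model)
else:
    raise NotImplementedError("Local checkpoints are not supported yet: strip the classification head first.")
```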

# 2) specify query

```python
# song_title = "car radio"
# artist = "twenty one pilots"
song_title = "we are the champions"
artist = "queen"
genre = "hip-hop/rap"  # get available genres from the Pandas DataFrame
danceability = 1  # positive for more, 0 for no preference, negative for less
```
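The `danceability` preference is not used further in this section. One way it could condition the ranking, sketched here as an assumption rather than the author's method, is to add a signed bonus proportional to each candidate's Spotify danceability (a value between 0.0 and 1.0) to its similarity score. `rerank_by_danceability`, `weight`, and the sample scores are hypothetical.

```python
import numpy as np


def rerank_by_danceability(similarities, danceabilities, preference, weight=0.1):
    """Hypothetical re-ranking: shift scores toward (or away from) danceable songs.

    preference > 0 favours danceable songs, < 0 penalises them, 0 leaves scores unchanged.
    """
    similarities = np.asarray(similarities, dtype=float)
    danceabilities = np.asarray(danceabilities, dtype=float)
    adjusted = similarities + weight * np.sign(preference) * danceabilities
    # indices sorted from best to worst adjusted score
    return np.argsort(-adjusted), adjusted


order, scores = rerank_by_danceability([0.80, 0.78, 0.60], [0.2, 0.9, 0.5], preference=1)
print(order, scores)
```

Using only the sign of the preference keeps the knob's meaning (more / none / less) while `weight` controls how strongly it can override lyric similarity.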

# 3) get lyrics

- http://www.chartlyrics.com/api.aspx
- https://github.com/johnwmillr/LyricsGenius

For the second approach, you need an [API token](https://genius.com/api-clients), which you should add to your environment variables (or to `tokens.json`, which `get_token` also reads):
```
export GENIUS_ACCESS_TOKEN="my_access_token_here"
```
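As an alternative to environment variables, `get_token` reads a `tokens.json` file and looks the token name up as a key of a single JSON object; an environment variable with the same name takes precedence. A minimal sketch of that file's shape, written to a temporary directory so no real credentials are touched (the values are placeholders):

```python
import json
import os
import tempfile

# Placeholder values; get_token() looks up data[token_name] in this object.
example_tokens = {
    "GENIUS_ACCESS_TOKEN": "my_access_token_here",
    "SPOTIPY_CLIENT_ID": "your-spotify-client-id",
    "SPOTIPY_CLIENT_SECRET": "your-spotify-client-secret",
}

with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "tokens.json")
    with open(example_path, "w") as f:
        json.dump(example_tokens, f, indent=2)
    # read it back the same way get_token() parses the file
    with open(example_path) as f:
        loaded = json.load(f)

print(loaded["GENIUS_ACCESS_TOKEN"])
```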

```python
GENIUS_ACCESS_TOKEN = get_token("GENIUS_ACCESS_TOKEN")
```

```python
import urllib.error
import urllib.parse

# try ChartLyrics first: free and no API key required
params = urllib.parse.urlencode({"artist": artist, "song": song_title}, quote_via=urllib.parse.quote)
url = f"http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?{params}"
lyrics = None
try:
    contents = urllib.request.urlopen(url).read()
    root = ET.fromstring(contents.decode("utf-8"))
    for child in root:
        tag = child.tag.split("}")[-1]  # drop the XML namespace, if any
        if tag == "Lyric":
            lyrics = child.text
except (urllib.error.URLError, ET.ParseError) as err:
    print(f"ChartLyrics failed: {err}")

if lyrics:
    lyrics = lyrics.strip().replace("\n", " ")
elif GENIUS_ACCESS_TOKEN:
    # use Genius API
    print("Using Genius...")
    genius = lyricsgenius.Genius(GENIUS_ACCESS_TOKEN)
    song = genius.search_song(song_title, artist)
    if song is None:
        raise ValueError("Could not find song.")
    lyrics = standardize_lyrics(song.lyrics)
    lyrics = " ".join(lyrics.split(" ")[:-1])[:-13]  # remove the trailing text Genius appends
else:
    raise ValueError("Could not find song.")

print(lyrics)
```

I've paid my dues Time after time I've done my sentence But committed no crime  And bad mistakes I've made a few I've had my share of sand kicked in my face But I've come through And we mean to go on and on and on  We are the champions - my friends And we'll keep on fighting - till the end We are the champions We are the champions No time for losers 'Cause we are the champions - of the world  I've taken my bows And my curtain calls You brought me fame and fortune and everything that goes with it I thank you all  But it's been no bed of roses No pleasure cruise I consider it a challenge before the whole human race And I ain't gonna lose And I need to go on and on and on  We are the champions - my friends And we'll keep on fighting - till the end We are the champions We are the champions No time for losers 'Cause we are the champions - of the world  We are the champions - my friends And we'll keep on fighting - till the end We are the champions We are the champions No time for losers 'Ca
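Escaping only spaces breaks on titles containing characters such as `&`. `urllib.parse.urlencode` escapes every reserved character, and taking the last piece of `tag.split("}")` works whether or not the response uses an XML namespace. A self-contained sketch; the XML string is a made-up stand-in for an API response, not real ChartLyrics output, and the `demo_` names are chosen so the notebook's own `url` and `lyrics` are not overwritten:

```python
import urllib.parse
import xml.etree.ElementTree as ET

base = "http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect"
demo_params = {"artist": "simon & garfunkel", "song": "mrs. robinson"}
# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+")
demo_url = f"{base}?{urllib.parse.urlencode(demo_params, quote_via=urllib.parse.quote)}"
print(demo_url)

# Made-up namespaced response (illustrative only).
sample = (
    '<GetLyricResult xmlns="http://example.com/">'
    "<LyricSong>Song</LyricSong><Lyric>line one\nline two</Lyric>"
    "</GetLyricResult>"
)

demo_lyrics = None
for child in ET.fromstring(sample):
    tag = child.tag.split("}")[-1]  # "{http://example.com/}Lyric" -> "Lyric"
    if tag == "Lyric":
        demo_lyrics = child.text
print(demo_lyrics)
```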

# 4) get spotipy metadata and features

Be sure to have credentials from [here](https://developer.spotify.com) and save them as environment variables (or in `tokens.json`):
```
export SPOTIPY_CLIENT_ID='your-spotify-client-id'
export SPOTIPY_CLIENT_SECRET='your-spotify-client-secret'
```

```python
SPOTIPY_CLIENT_ID = get_token("SPOTIPY_CLIENT_ID")
SPOTIPY_CLIENT_SECRET = get_token("SPOTIPY_CLIENT_SECRET")

auth_manager = SpotifyClientCredentials(client_id=SPOTIPY_CLIENT_ID, client_secret=SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)
```

```python
# search for song, https://developer.spotify.com/documentation/web-api/reference/#/operations/search
query = f"track:{song_title}"
if artist is not None:
    query += f" artist:{artist}"
res = sp.search(q=query, type="track")
items = res["tracks"]["items"]
if not items:
    raise ValueError("No matching track found on Spotify.")

# take top entry
_id = 0
rx_song = items[_id]["name"]
rx_artists = [a["name"] for a in items[_id]["artists"]]
print(f"{rx_song} by {rx_artists}")
```

```
We Are The Champions - Remastered 2011 by ['Queen']
```
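The search string uses Spotify's `track:` and `artist:` field filters. Wrapping that logic in a small helper (the name `build_search_query` is hypothetical) keeps the filter syntax in one place and also skips an empty artist string, not just `None`:

```python
def build_search_query(song_title, artist=None):
    """Build a Spotify search string with field filters (hypothetical helper)."""
    query = f"track:{song_title}"
    if artist:  # skip both None and an empty string
        query += f" artist:{artist}"
    return query


print(build_search_query("we are the champions", "queen"))
print(build_search_query("car radio"))
```

It would be used as `res = sp.search(q=build_search_query(song_title, artist), type="track")`.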


```python
song_metadata = dict()
song_metadata["release_year"] = int(res["tracks"]["items"][_id]["album"]["release_date"][:4])
song_metadata["popularity"] = res["tracks"]["items"][_id]["popularity"]

# get acoustic features
acoustic_features = ["mode", "acousticness", "danceability", "energy", "instrumentalness", "liveness", "loudness", "speechiness", "valence", "tempo"]
uri = res["tracks"]["items"][_id]["uri"]
feat_results = sp.audio_features(uri)[0]
if feat_results is None:
    raise ValueError(f"No audio features returned for {uri}.")
for _feat in acoustic_features:
    song_metadata[_feat] = feat_results[_feat]
pprint(song_metadata)

# genre metadata could probably also be retrieved from this API
```

```
{'acousticness': 0.378,
 'danceability': 0.268,
 'energy': 0.459,
 'instrumentalness': 0,
 'liveness': 0.119,
 'loudness': -6.948,
 'mode': 0,
 'popularity': 66,
 'release_year': 1977,
 'speechiness': 0.0346,
 'tempo': 64.223,
 'valence': 0.172}
```
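To compare songs on these features (for example, when conditioning on `danceability`), the metadata dict can be flattened into a fixed-order NumPy vector. A sketch assuming every key is present and numeric; `to_feature_vector` is a hypothetical name and the values are the ones printed above:

```python
import numpy as np


def to_feature_vector(metadata, keys):
    """Return metadata values as a float vector in the order given by keys."""
    return np.array([float(metadata[k]) for k in keys])


example_metadata = {
    "acousticness": 0.378, "danceability": 0.268, "energy": 0.459,
    "instrumentalness": 0, "liveness": 0.119, "loudness": -6.948,
    "mode": 0, "popularity": 66, "release_year": 1977,
    "speechiness": 0.0346, "tempo": 64.223, "valence": 0.172,
}
feature_keys = ["mode", "acousticness", "danceability", "energy", "instrumentalness",
                "liveness", "loudness", "speechiness", "valence", "tempo"]
vec = to_feature_vector(example_metadata, feature_keys)
print(vec.shape)
```

Note that `loudness` and `tempo` are on very different scales from the features in [0, 1], so the vector would need per-feature scaling before any distance computation.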


# 5) return top K recommendations

First, compute the embedding.

```python
embedding = model.encode(lyrics)
print(embedding.shape)
```

```
(768,)
```


Identify the corresponding cluster.
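This step is not implemented in the notebook yet. Assuming the cluster model is a set of centroids in the same 768-dimensional embedding space (as k-means would produce), assignment is a nearest-centroid lookup. A NumPy sketch with random stand-in centroids; `assign_cluster` and the `toy_` data are hypothetical:

```python
import numpy as np


def assign_cluster(embedding, centroids):
    """Return the index of the centroid closest to embedding (Euclidean distance)."""
    distances = np.linalg.norm(centroids - embedding, axis=1)
    return int(np.argmin(distances))


rng = np.random.default_rng(0)
toy_centroids = rng.normal(size=(8, 768))  # stand-in for a trained cluster model
toy_embedding = toy_centroids[5] + 0.01 * rng.normal(size=768)  # a point right next to centroid 5
print(assign_cluster(toy_embedding, toy_centroids))
```

If the clusters come from scikit-learn's `KMeans`, `kmeans.predict(embedding.reshape(1, -1))` performs the same lookup.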

Subset based on the query and return the top K recommendations.

```python
K = 3  # number of recommendations to return
```
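A sketch of this final step, under assumptions: the candidate songs (for instance, those in the query's cluster) come with a matrix of lyric embeddings and a parallel list of genres. Candidates are filtered to the requested genre and ranked by cosine similarity to the query embedding. `top_k_recommendations` and the toy 2-D data are hypothetical.

```python
import numpy as np


def top_k_recommendations(query_emb, cand_embs, cand_genres, genre=None, k=3):
    """Return indices of the k candidates most cosine-similar to query_emb.

    If genre is given, only candidates with that genre are considered.
    """
    cand_embs = np.asarray(cand_embs, dtype=float)
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate to the query
    idx = np.arange(len(cand_embs))
    if genre is not None:
        idx = idx[np.array(cand_genres) == genre]
    ranked = idx[np.argsort(-sims[idx])]  # best first
    return ranked[:k].tolist()


toy_query = np.array([1.0, 0.0])
toy_cands = np.array([[1.0, 0.1], [0.0, 1.0], [1.0, 0.5], [0.9, 0.0], [-1.0, 0.0]])
toy_genres = ["rock", "rock", "hip-hop/rap", "hip-hop/rap", "hip-hop/rap"]
print(top_k_recommendations(toy_query, toy_cands, toy_genres, genre="hip-hop/rap", k=2))
```

Filtering before ranking means a strong match in the wrong genre never displaces a weaker match in the requested one; with the notebook's variables the call would be `top_k_recommendations(embedding, cand_embs, cand_genres, genre=genre, k=K)`.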