# Enrich Data

We use the [Last.fm API](http://last.fm/api/) to fetch the tags for each artist, assuming that the first matching artist in the search results is the correct one. The tags add features that reflect the listener's preferences: for example, someone who never listens to artists tagged "rock" probably won't like The Beatles.

In [1]:
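# Read tab-separated key/value pairs; close the file and skip blank lines (e.g. a trailing newline)
with open('lastfm.conf') as f:
    lastfm_api_keys = dict(line.split('\t', 1) for line in f.read().splitlines() if line.strip())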
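The notebook does not show `lastfm.conf` itself. The parsing line implies one tab-separated key/value pair per line, and the later cells look up the keys `API Root` and `API key`. A minimal sketch under that assumption, parsing from a string rather than a file; the helper name `parse_conf` and both values are hypothetical placeholders, not taken from the source:

```python
# Hypothetical contents of lastfm.conf: one "key<TAB>value" pair per line.
# Both values are placeholders; substitute the real API root and your own key.
sample_conf = "API Root\thttps://example.invalid/2.0/\nAPI key\tYOUR_API_KEY\n"


def parse_conf(text):
    """Parse tab-separated key/value lines, skipping blank lines."""
    pairs = (line.split("\t", 1) for line in text.splitlines() if line.strip())
    return {key.strip(): value.strip() for key, value in pairs}


lastfm_conf_example = parse_conf(sample_conf)
print(lastfm_conf_example["API key"])
```

Splitting on the first tab only (`split("\t", 1)`) keeps any later tabs inside the value, and skipping blank lines avoids the error that a trailing newline would otherwise cause when building the dictionary.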

In [2]:
import requests
import pandas as pd

In [3]:
artists_df = pd.read_feather("artist_df.feather")

For each artist:

1. Search Last.fm by artist name (`artist.search`).
2. Take the MusicBrainz ID (`mbid`) of the first match.
3. Call `artist.getInfo` to get more data on the artist.
4. Extract the tags from the response.
5. Add each tag as a column and set it to `True` for that artist, meaning the tag was seen for that artist.
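The parsing parts of steps 2 and 4 can be sketched offline as two small functions that operate on already-decoded JSON. The key paths mirror the ones this notebook indexes; the function names `first_mbid` and `extract_tags` and the sample payloads are hypothetical and heavily trimmed, so treat this as an illustration rather than the full response format:

```python
def first_mbid(search_json):
    """Return the mbid of the first search match, or None if there is none."""
    matches = (search_json.get("results", {})
               .get("artistmatches", {})
               .get("artist", []))
    if not matches:
        return None
    # An empty mbid string is treated as missing.
    return matches[0].get("mbid") or None


def extract_tags(info_json):
    """Return normalized 'tag_<name>' column names from a getInfo-style dict."""
    raw = info_json.get("artist", {}).get("tags", {}).get("tag", [])
    return ["tag_{}".format(t["name"].lower().replace("-", " ")) for t in raw]


# Hypothetical, trimmed-down payloads for illustration only.
sample_search = {
    "results": {"artistmatches": {"artist": [{"name": "Muse", "mbid": "hypothetical-id"}]}}
}
sample_info = {"artist": {"tags": {"tag": [{"name": "alternative rock"}, {"name": "Hip-Hop"}]}}}

print(first_mbid(sample_search))
print(extract_tags(sample_info))
```

Returning `None` for a missing match makes the "first match is correct" assumption explicit at one point in the code, so a caller can decide whether to skip the artist or fall back to a name-only lookup.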



In [7]:
name = "Muse"
tags = []

res = requests.get(url=lastfm_api_keys['API Root'],
         params=dict(
             method="artist.search",
             artist=name,
             api_key=lastfm_api_keys['API key'],
             format="json"))

res_json = res.json()
mbid = res_json['results']['artistmatches']['artist'][0]['mbid']

res = requests.get(url=lastfm_api_keys['API Root'],
         params=dict(
             method="artist.getInfo",
             artist=name,
             mbid=mbid,
             api_key=lastfm_api_keys['API key'],
             format="json"))

res_json = res.json()
tags = ["tag_{}".format(x['name'].lower().replace("-", " ")) 
        for x in res_json['artist']['tags']['tag']]


tags

['tag_alternative rock',
 'tag_rock',
 'tag_alternative',
 'tag_progressive rock',
 'tag_seen live']

In [4]:

for index, row in artists_df.iterrows():
    name = row['artist_name']
    tags = []
    try:
        res = requests.get(url=lastfm_api_keys['API Root'],
                 params=dict(
                     method="artist.search",
                     artist=name,
                     api_key=lastfm_api_keys['API key'],
                     format="json"))

        res_json = res.json()
        mbid = res_json['results']['artistmatches']['artist'][0]['mbid']

        res = requests.get(url=lastfm_api_keys['API Root'],
                 params=dict(
                     method="artist.getInfo",
                     artist=name,
                     mbid=mbid,
                     api_key=lastfm_api_keys['API key'],
                     format="json"))

        res_json = res.json()
        tags = ["tag_{}".format(x['name'].lower().replace("-", " ")) 
                for x in res_json['artist']['tags']['tag']]
        for tag in tags:
            artists_df.at[index, tag] = True
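    except Exception as e:
        # Skip artists with no match or an unexpected response, but report which one failed
        print("{}: {!r}".format(name, e))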

In [5]:
artists_df.fillna(False, inplace=True)
artists_df.head(5)

| Unnamed: 0 | artist_name | tag_alternative | tag_alternative rock | tag_rock | tag_indie | tag_electronic | tag_classic rock | tag_british | tag_60s | tag_pop | ... | tag_east coast rap | tag_jay z | tag_shoegazer | tag_hair metal | tag_rapcore | tag_symphonic black metal | tag_darkwave | tag_world | tag_latin | tag_spanish |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Radiohead | True | True | True | True | True | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | The Beatles | False | False | True | False | False | True | True | True | True | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | Pink Floyd | False | False | True | False | False | True | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | Daft Punk | False | False | False | False | True | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | Muse | True | True | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
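A table of this shape can also be built in one pass from a name-to-tags mapping, which yields boolean columns directly. This is a different technique from the cell-by-cell assignment followed by `fillna(False)` used in this notebook; the names `toy_df` and `tags_by_artist` and the data are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for artists_df and for the tags fetched per artist.
toy_df = pd.DataFrame({"artist_name": ["Muse", "Daft Punk", "Unknown Artist"]})
tags_by_artist = {
    "Muse": ["tag_rock", "tag_alternative"],
    "Daft Punk": ["tag_electronic"],
}

# One boolean column per distinct tag; artists without tags get False everywhere.
all_tags = sorted({tag for tags in tags_by_artist.values() for tag in tags})
for tag in all_tags:
    toy_df[tag] = [tag in tags_by_artist.get(name, [])
                   for name in toy_df["artist_name"]]

print(toy_df)
```

Because every column is assigned as a complete list of booleans, no missing values appear and no fill step is needed; an artist whose lookup failed simply ends up with `False` in every tag column.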


In [6]:
artists_df.to_csv("bands_tags.csv")
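By default `to_csv` also writes the row index as an unnamed first column, so reading the file back without further arguments produces an extra `Unnamed: 0` column. A small sketch with toy data and an in-memory buffer instead of `bands_tags.csv`, showing how `index_col=0` restores the original layout:

```python
import io

import pandas as pd

toy_df = pd.DataFrame({"artist_name": ["Muse"], "tag_rock": [True]})

buffer = io.StringIO()
toy_df.to_csv(buffer)  # the index is written as an unnamed first column
csv_text = buffer.getvalue()

# Reading back without index_col keeps the index as a regular column.
plain = pd.read_csv(io.StringIO(csv_text))
# Reading back with index_col=0 uses that column as the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is the other option if the row numbers carry no meaning for later steps.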