# Last FM API, but with pandas!

Spotify's API is dead to us, so we're using Last.fm's - it's still music, just not as nice of an API.

1. Create an account at https://www.last.fm/api/
2. Create an "application" to get a key: https://www.last.fm/api/account/create
    - It isn't a real application, it's just your project
    - Name/description doesn't matter, ignore callback key and callback url
3. Save the API key that shows up on the next screen

We are going to use pandas instead of "normal" Python to do this analysis.

# FIRST: SETUP

## 1) Import the libraries/packages you might need

We need a library to read in the data for us! We don't like `urllib2`, so it must be something cooler and better.

In [1]:
# Import what you need here
import requests
import pandas as pd

## 2) Save your API key

Write your API key here so you don't forget it - it's the "api key" one, not the "shared secret" one

In [None]:
# e8f3c22a140c30779a9a652e0e209e5a

## 3) The death of an API

I used to have some code here that allowed you to display images, but _the images don't work any more._ Let this be an important lesson: when you depend on external services, they can die at any time.

# NOW: YOUR ASSIGNMENT

## 1) Search for and print a list of 50 musicians with `lil` in their name, along with the number of listeners they have

There are a lot of musicians with "Lil" in their name - it used to be all Lil Wayne and Lil Kim, but we live in a new world now!

**I've already gotten the data for you.** Your job is to put it in pandas.

In [2]:
# /2.0/?method=artist.search&artist=cher&api_key=YOUR_API_KEY&format=json
url = "http://ws.audioscrobbler.com/2.0/?method=artist.search&artist=lil&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json&limit=50"
response = requests.get(url)
data = response.json()
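
The query string in that URL is typed out by hand. Here is a minimal sketch of building the same URL from a dictionary instead. It assumes a placeholder key `YOUR_API_KEY`, and `build_search_url` is a made-up helper name, not part of the assignment.

```python
from urllib.parse import urlencode

API_ROOT = "http://ws.audioscrobbler.com/2.0/"

def build_search_url(artist, api_key, limit=50):
    """Return an artist.search URL (hypothetical helper for illustration)."""
    params = {
        "method": "artist.search",
        "artist": artist,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
    }
    # urlencode escapes spaces and punctuation in the artist name for us
    return API_ROOT + "?" + urlencode(params)

print(build_search_url("lil", "YOUR_API_KEY"))
```

If you would rather let `requests` do this, `requests.get(API_ROOT, params=params)` accepts the same dictionary and handles the encoding in a similar way.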

In [3]:
# STEP ONE: find a list of dictionaries
data['results']['artistmatches']['artist']

[{'name': 'LIL UZI VERT',
  'listeners': '1010914',
  'mbid': '',
  'url': 'https://www.last.fm/music/LIL+UZI+VERT',
  'streamable': '0',
  'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'small'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'medium'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'large'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'extralarge'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'mega'}]},
 {'name': 'LIL PEEP',
  'listeners': '814142',
  'mbid': '',
  'url': 'https://www.last.fm/music/LIL+PEEP',
  'streamable': '0',
  'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'small'

In [4]:
# STEP TWO: Feed it to pandas
df = pd.DataFrame(data['results']['artistmatches']['artist'])
df.head()

| |name|listeners|mbid|url|streamable|image|
|---|---|---|---|---|---|---|
|0|LIL UZI VERT|1010914||https://www.last.fm/music/LIL+UZI+VERT|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|1|LIL PEEP|814142||https://www.last.fm/music/LIL+PEEP|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|2|Lil Nas X|1378100||https://www.last.fm/music/Lil+Nas+X|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|3|Lil' Wayne|3377150||https://www.last.fm/music/Lil%27+Wayne|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|4|Lily Allen|2507905|6e0c7c0e-cba5-4c2c-a652-38f71ef5785d|https://www.last.fm/music/Lily+Allen|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|


In [5]:
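# listeners is still a string column here, so this sorts
# alphabetically ("814142" > "78778" > "693585"), not numerically
df.sort_values(by='listeners', ascending=False)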

| |name|listeners|mbid|url|streamable|image|
|---|---|---|---|---|---|---|
|1|LIL PEEP|814142||https://www.last.fm/music/LIL+PEEP|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|28|lilbubblegum|78778||https://www.last.fm/music/lilbubblegum|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|48|Lil Candy Paint|75139||https://www.last.fm/music/Lil+Candy+Paint|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|37|Lil Kleine|70940||https://www.last.fm/music/Lil+Kleine|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|34|LIL BO WEEP|70896||https://www.last.fm/music/LIL+BO+WEEP|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|9|Lil Wayne|693585|ac9a487a-d9d2-4f27-bb23-0f4686488345|https://www.last.fm/music/Lil+Wayne|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|42|Lil' Kleine|68885||https://www.last.fm/music/Lil%27+Kleine|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|24|Lilypichu|67558||https://www.last.fm/music/Lilypichu|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|25|Lil Jon|668967|a95384b1-6aec-468c-ae0d-8c6daf87c4c2|https://www.last.fm/music/Lil+Jon|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|47|Lil Boodang|66777||https://www.last.fm/music/Lil+Boodang|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|


In [6]:
int("5")

5

In [10]:
# Convert a column of strings into integers
# so you can sort it numerically and do math with it

# update the column with the integer version of the column
df['listeners'] = df['listeners'].astype(int)

In [12]:
df.dtypes

name          object
listeners      int32
mbid          object
url           object
streamable    object
image         object
dtype: object
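
To see why the data type matters for sorting, here is a small sketch on three rows copied from the results above. `pd.to_numeric` is used here as an alternative to `astype(int)`; the variable names are just for illustration.

```python
import pandas as pd

# Three rows copied from the search results; listeners start out as strings
sample = pd.DataFrame({
    "name": ["LIL PEEP", "lilbubblegum", "Lil Wayne"],
    "listeners": ["814142", "78778", "693585"],
})

# Sorting strings compares character by character, so "78778" beats "693585"
as_text = sample.sort_values(by="listeners", ascending=False)

# After conversion the same sort is numeric
sample["listeners"] = pd.to_numeric(sample["listeners"])
as_numbers = sample.sort_values(by="listeners", ascending=False)

print(as_text["name"].tolist())
print(as_numbers["name"].tolist())
```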

In [11]:
df.sort_values(by='listeners', ascending=False)

| |name|listeners|mbid|url|streamable|image|
|---|---|---|---|---|---|---|
|3|Lil' Wayne|3377150||https://www.last.fm/music/Lil%27+Wayne|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|4|Lily Allen|2507905|6e0c7c0e-cba5-4c2c-a652-38f71ef5785d|https://www.last.fm/music/Lily+Allen|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|2|Lil Nas X|1378100||https://www.last.fm/music/Lil+Nas+X|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|0|LIL UZI VERT|1010914||https://www.last.fm/music/LIL+UZI+VERT|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|1|LIL PEEP|814142||https://www.last.fm/music/LIL+PEEP|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|9|Lil Wayne|693585|ac9a487a-d9d2-4f27-bb23-0f4686488345|https://www.last.fm/music/Lil+Wayne|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|25|Lil Jon|668967|a95384b1-6aec-468c-ae0d-8c6daf87c4c2|https://www.last.fm/music/Lil+Jon|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|5|Lil Baby|623316||https://www.last.fm/music/Lil+Baby|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|27|Lil Jon & The East Side Boyz|598301|243c6f61-d83b-4459-bebd-5899df0da111|https://www.last.fm/music/Lil+Jon+&+The+East+S...|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
|7|Lil Yachty|585678||https://www.last.fm/music/Lil+Yachty|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|


Your results should begin something like this:
    
```
Lil' Wayne has 3086628 listeners
Lily Allen has 2074266 listeners
Lil B has 194116 listeners
Lilly Wood & The Prick has 359886 listeners
Lil Ugly Mane has 31955 listeners
LIL UZI VERT has 88517 listeners
```

## 2) How many listeners does your list have in total?

The answer was roughly **15,000,000** when this was written, but listener counts change over time (the run below gives 20,820,014). If yours is much lower, make sure you have 50 artists instead of 30 artists.

- *Tip: What's the data type of the `listeners` count? It's going to cause a problem!*
- *Tip: If you were crazy you could use sum and a list comprehension. But you really don't have to!*

In [13]:
df.listeners.sum()

20820014
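
For comparison, the "sum and a list comprehension" route from the tip looks something like this in plain Python. It is a sketch on just two records copied from the search results, so the total is only for those two artists.

```python
# Two records copied from the search results; listeners arrive as strings
artists = [
    {"name": "LIL UZI VERT", "listeners": "1010914"},
    {"name": "LIL PEEP", "listeners": "814142"},
]

# Convert each count to int before adding; sum() on the raw strings raises TypeError
total = sum([int(artist["listeners"]) for artist in artists])
print(total)
```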

## 3) Show each artist's name and the URL to the extra-large image

The images don't work any more, but we'll print their URLs out anyway.

Each artist **has a list of images of different sizes**. We're interested in the second-to-last one, where `size` is `extralarge`. Print their name and the URL of their extra-large image (there is no `display_image` to call any more, since the image-display code was removed).

- *Tip: The URL should look like this: `https://lastfm-img2.akamaized.net/i/u/300x300/0fc7d7a1812dc79e9925d80382cde594.png`*
- *Tip: You can always assume it's the second to the last, or assume it's `extralarge`, or whatever you want to do to find it.*
- *Tip: Make sure the URL is correct before you try to display it.*

Your output should look something like

```
Lil' Wayne
https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png
---
LIL UZI VERT
https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png
---
Lily Allen
https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png
---
```

(but with more people, obviously)
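
One way to pick out the extra-large URL is to look for the `size` you want rather than counting positions. This is a sketch, with `extralarge_url` as a made-up helper name and one artist record trimmed from the search results.

```python
def extralarge_url(images):
    """Return the '#text' URL of the image whose size is 'extralarge', or None."""
    for image in images:
        if image.get("size") == "extralarge":
            return image.get("#text")
    return None

# One artist record trimmed from the search results
artist = {
    "name": "LIL UZI VERT",
    "image": [
        {"#text": "https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png",
         "size": "large"},
        {"#text": "https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png",
         "size": "extralarge"},
        {"#text": "https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png",
         "size": "mega"},
    ],
}

print(artist["name"])
print(extralarge_url(artist["image"]))
print("---")
```

With the dataframe, `df.image.apply(extralarge_url)` should give you a whole column of these URLs.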

## 4) Find Lil Jon's `mbid` (or anyone else's!).

Oftentimes in an API, you can do a few things: you can **search** for items, and you can **see more information** about items. To find more information about an item, you need to use its **unique id**. In this dataset, it's called an `mbid`, short for MusicBrainz identifier (MusicBrainz is an open music database, separate from last.fm, whose IDs last.fm uses).

Go through the artists and print their **name and mbid**. Find Lil Jon's `mbid`. I *wanted* Lil Uzi Vert's, but for some reason it isn't there. Then I wanted us to look at Lily Allen's, but I just couldn't bring myself to do that. If you'd rather do someone else, go for it.

In [14]:
# Find Lil Jon's mbid
# Find the place where the artist's name is "Lil Jon"
# if you get a list of Trues and Falses, wrap it in df[...]
df[df.name == "Lil Jon"]

| |name|listeners|mbid|url|streamable|image|
|---|---|---|---|---|---|---|
|25|Lil Jon|668967|a95384b1-6aec-468c-ae0d-8c6daf87c4c2|https://www.last.fm/music/Lil+Jon|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|


In [15]:
df.query("name == 'Lil Jon'")

| |name|listeners|mbid|url|streamable|image|
|---|---|---|---|---|---|---|
|25|Lil Jon|668967|a95384b1-6aec-468c-ae0d-8c6daf87c4c2|https://www.last.fm/music/Lil+Jon|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|
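
Both filters give you a one-row dataframe. If you want the `mbid` itself as a plain string, a sketch like this works, shown on a two-row stand-in for `df` (the variable names are illustrative):

```python
import pandas as pd

# Two rows copied from the search results, standing in for the full df
artists = pd.DataFrame({
    "name": ["Lily Allen", "Lil Jon"],
    "mbid": ["6e0c7c0e-cba5-4c2c-a652-38f71ef5785d",
             "a95384b1-6aec-468c-ae0d-8c6daf87c4c2"],
})

# .loc[rows, column] filters the rows and picks the column in one step
matches = artists.loc[artists["name"] == "Lil Jon", "mbid"]

# Take the first match, or None if nobody had that exact name
lil_jon_mbid = matches.iloc[0] if not matches.empty else None
print(lil_jon_mbid)
```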


## 5) Find the artist's name and bio using their `mbid`.

It can either be Lil Jon or whoever you selected above.

If you look at the [last.fm documentation](http://www.last.fm/api/show/artist.getInfo), you can see how to use the artist's `mbid` to find more information about them. Print **every tag associated with your artist**.

- *Tip: It's a new request to the API*
- *Tip: Use the `mbid`, and make sure you delete the `&artist=Cher` from the sample endpoint*
- *Tip: If you use `print` for the bio it looks a little nicer than it would otherwise*

In [18]:
mbid = "a95384b1-6aec-468c-ae0d-8c6daf87c4c2"
url = f"http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&mbid={mbid}&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json"
response = requests.get(url)
data = response.json()

In [19]:
data

{'artist': {'name': 'Lil Jon',
  'mbid': 'a95384b1-6aec-468c-ae0d-8c6daf87c4c2',
  'url': 'https://www.last.fm/music/Lil+Jon',
  'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'small'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'medium'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'large'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'extralarge'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'mega'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': ''}],
  'streamable': '0',
  'ontour': '0',
  'stats': {'listeners': '668967', 'playcount': '3962471'},
  'similar': {'artist': [{'name': 'Lil Jon & The E

In [20]:
data['artist'].keys()

dict_keys(['name', 'mbid', 'url', 'image', 'streamable', 'ontour', 'stats', 'similar', 'tags', 'bio'])

In [21]:
data['artist']['bio']['summary']

'Jonathan Mortimer Smith (born January 27, 1971 in Atlanta, Georgia), better known by his stage name Lil Jon, is an American rapper, actor, producer and member of the crunk group Lil Jon & The East Side Boyz. Lil Jon formed the group with friends Big Sam and Lil Bo, and they released five studio albums and have had many hit songs. He\'s prehaps best known for his iconic single "Get Low", which featured the Ying Yang Twins and reached #2 on the Billboard 100. He released his debut solo album, \'Crunk Rock\', in June 2010. <a href="https://www.last.fm/music/Lil+Jon">Read more on Last.fm</a>'

## 6) Print every tag of that artist

In [22]:
data['artist']['tags']['tag']

[{'name': 'Crunk', 'url': 'https://www.last.fm/tag/Crunk'},
 {'name': 'Hip-Hop', 'url': 'https://www.last.fm/tag/Hip-Hop'},
 {'name': 'rap', 'url': 'https://www.last.fm/tag/rap'},
 {'name': 'Dirty South', 'url': 'https://www.last.fm/tag/Dirty+South'},
 {'name': 'hip hop', 'url': 'https://www.last.fm/tag/hip+hop'}]

In [23]:
pd.DataFrame(data['artist']['tags']['tag']).name.unique()

array(['Crunk', 'Hip-Hop', 'rap', 'Dirty South', 'hip hop'], dtype=object)

In [24]:
[tag['name'] for tag in data['artist']['tags']['tag']]

['Crunk', 'Hip-Hop', 'rap', 'Dirty South', 'hip hop']

# GETTING A LITTLE CRAZY

So you know your original list of musicians? I want to get tag data for ALL OF THEM. How are we going to do that?

## 7) Find the mbid URLs

If we have a musician with an mbid of `AAA-AAA-AAA`, we get their info from a url like `http://ws.audioscrobbler.com/blahblah/?api_key=12345&mbid=AAA-AAA-AAA`.

|artist|url|
|---|---|
|`AAA-AAA-AAA`|`http://ws.audioscrobbler.com/blahblah/?api_key=12345&mbid=AAA-AAA-AAA`|
|`BBB-BBB-BBB`|`http://ws.audioscrobbler.com/blahblah/?api_key=12345&mbid=BBB-BBB-BBB`|
|`CCC-CCC-CCC`|`http://ws.audioscrobbler.com/blahblah/?api_key=12345&mbid=CCC-CCC-CCC`|

Calculate a new column called `mbid_url` for the URLs.

In [25]:
"http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&mbid=" + mbid + "&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json"

'http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&mbid=a95384b1-6aec-468c-ae0d-8c6daf87c4c2&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json'

In [28]:
df['mbid_url'] = "http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&mbid=" + df.mbid + "&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json"
df.head()

| |name|listeners|mbid|url|streamable|image|mbid_url|
|---|---|---|---|---|---|---|---|
|0|LIL UZI VERT|1010914||https://www.last.fm/music/LIL+UZI+VERT|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|http://ws.audioscrobbler.com/2.0/?method=artis...|
|1|LIL PEEP|814142||https://www.last.fm/music/LIL+PEEP|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|http://ws.audioscrobbler.com/2.0/?method=artis...|
|2|Lil Nas X|1378100||https://www.last.fm/music/Lil+Nas+X|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|http://ws.audioscrobbler.com/2.0/?method=artis...|
|3|Lil' Wayne|3377150||https://www.last.fm/music/Lil%27+Wayne|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|http://ws.audioscrobbler.com/2.0/?method=artis...|
|4|Lily Allen|2507905|6e0c7c0e-cba5-4c2c-a652-38f71ef5785d|https://www.last.fm/music/Lily+Allen|0|[{'#text': 'https://lastfm.freetls.fastly.net/...|http://ws.audioscrobbler.com/2.0/?method=artis...|


In [29]:
# Pandas tries to be helpful and cuts off long cells
# but if you set max_colwidth it will show you more
# None means no limit
# usually I just say 1000
pd.set_option("display.max_colwidth", 1000)

## 7.5) Remove everyone who is missing an mbid

In [31]:
# in pandas, missing data is usually NaN
# but because we read this in from an API
# pandas didn't convert numbers to integers
# and didn't convert missing stuff to NaN,
# so df.mbid.notnull() won't help: a missing mbid is an empty string

# Hey dataframe, find the rows where mbid is not an empty string
df = df[df.mbid != ""]
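
If you would rather have pandas treat the empty strings as real missing values, one option is to swap them for NaN first and then use `dropna`. This is a sketch on a three-row stand-in for `df`, not the approach the notebook takes.

```python
import numpy as np
import pandas as pd

# Three rows copied from the search results; LIL UZI VERT has no mbid
artists = pd.DataFrame({
    "name": ["LIL UZI VERT", "Lily Allen", "Lil Jon"],
    "mbid": ["",
             "6e0c7c0e-cba5-4c2c-a652-38f71ef5785d",
             "a95384b1-6aec-468c-ae0d-8c6daf87c4c2"],
})

# Turn empty strings into NaN so the usual missing-data tools apply
artists["mbid"] = artists["mbid"].replace("", np.nan)

# Keep only the rows that actually have an mbid
with_mbid = artists.dropna(subset=["mbid"])
print(with_mbid["name"].tolist())
```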

## 9) Printing our API urls

To get tag data for each artist, you need to use those `mbid` values to access their artist page on the API. Loop through the mbids, displaying the URL you'll need to access.
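
A sketch of that loop, using three `mbid` values copied from the results above and a placeholder key. `artist_info_url` is a made-up helper name.

```python
API_KEY = "YOUR_API_KEY"  # placeholder; use your own key

def artist_info_url(mbid, api_key=API_KEY):
    """Build the artist.getinfo URL for one mbid (hypothetical helper)."""
    return ("http://ws.audioscrobbler.com/2.0/?method=artist.getinfo"
            f"&mbid={mbid}&api_key={api_key}&format=json")

mbids = [
    "6e0c7c0e-cba5-4c2c-a652-38f71ef5785d",  # Lily Allen
    "ac9a487a-d9d2-4f27-bb23-0f4686488345",  # Lil Wayne
    "a95384b1-6aec-468c-ae0d-8c6daf87c4c2",  # Lil Jon
]

for mbid in mbids:
    print(artist_info_url(mbid))
```

With the dataframe, `for mbid in df.mbid:` loops over the column the same way.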

## 10) Using the first three `mbids` and `scrape_artist`, request the API urls and print the artist's info.

You built the URLs in the last question... but we aren't going to use them! It's often important to build new columns, but in this case there's a better way.

In [None]:
def scrape_artist(row):
    # Fill in the blank for the URL mbid
    mbid = row['mbid']
    print("Requesting", mbid)

    url = f"http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&mbid={mbid}&api_key=e8f3c22a140c30779a9a652e0e209e5a&format=json"
    response = requests.get(url)
    data = response.json()

    # Get the info we want
    bio = data['artist']['bio']['summary']
    tags = [tag['name'] for tag in data['artist']['tags']['tag']]

    # Send it on back
    return pd.Series({
        'mbid': mbid,
        'bio': bio,
        'tags': tags        
    })
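
`scrape_artist` assumes every response has `artist`, `bio` and `tags` in it. Here is a sketch of a more forgiving version of just the "get the info we want" step, which you could call on `response.json()`. The name `parse_artist_info` and the choice to return `None` when there is no `artist` key are assumptions, not something the API documentation promises.

```python
def parse_artist_info(data):
    """Pull mbid, bio and tag names out of an artist.getinfo response dict.

    Returns None when there is no 'artist' key, which is an assumption
    about what a failed lookup looks like.
    """
    artist = data.get("artist")
    if artist is None:
        return None
    bio = artist.get("bio", {}).get("summary", "")
    tags = [tag["name"] for tag in artist.get("tags", {}).get("tag", [])]
    return {"mbid": artist.get("mbid"), "bio": bio, "tags": tags}

# Trimmed copy of the Lil Jon response shown earlier
sample = {
    "artist": {
        "name": "Lil Jon",
        "mbid": "a95384b1-6aec-468c-ae0d-8c6daf87c4c2",
        "bio": {"summary": "Jonathan Mortimer Smith (born January 27, 1971 in Atlanta, Georgia)"},
        "tags": {"tag": [
            {"name": "Crunk", "url": "https://www.last.fm/tag/Crunk"},
            {"name": "Hip-Hop", "url": "https://www.last.fm/tag/Hip-Hop"},
        ]},
    }
}

print(parse_artist_info(sample)["tags"])
print(parse_artist_info({}))  # a response without an 'artist' key
```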

In [32]:
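# Run the cell that defines scrape_artist before this one;
# otherwise you get the NameError shown below
df[:3].apply(scrape_artist, axis=1)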

NameError: name 'scrape_artist' is not defined

## 11) Using the first ten `mbids`, save the artist bio and tags as a dataframe called `df_bios`

In [None]:
df_bios = df[:10].apply(scrape_artist, axis=1)
df_bios

## 12) Merge this with the original dataframe, saving it as `merged`

In [None]:
merged = df.merge(df_bios, on='mbid')
merged
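
By default `merge` does an inner join, so artists with no matching row in `df_bios` drop out of `merged`. A sketch of the difference on a tiny stand-in (two artists, only one of whom has been scraped); the frame names here are illustrative:

```python
import pandas as pd

# Two artists from the search results
artists = pd.DataFrame({
    "name": ["Lily Allen", "Lil Jon"],
    "mbid": ["6e0c7c0e-cba5-4c2c-a652-38f71ef5785d",
             "a95384b1-6aec-468c-ae0d-8c6daf87c4c2"],
})

# Pretend we only scraped Lil Jon; tags copied from his response above
bios = pd.DataFrame({
    "mbid": ["a95384b1-6aec-468c-ae0d-8c6daf87c4c2"],
    "tags": [["Crunk", "Hip-Hop", "rap", "Dirty South", "hip hop"]],
})

inner = artists.merge(bios, on="mbid")             # only artists with a bio row
left = artists.merge(bios, on="mbid", how="left")  # every artist, NaN where missing

print(len(inner), len(left))
```

Since `df_bios` only covers the first ten artists, the inner join is probably what you want here; `how="left"` is for when you need to keep everyone.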

## 12.5) Only select the artists that have 'hip hop' in their tag list

In [None]:
merged.tags.apply(lambda tags: 'hip hop' in tags)

In [None]:
merged[merged.tags.apply(lambda tags: 'hip hop' in tags)]
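
Note that `'hip hop' in tags` only matches that exact spelling, and Lil Jon's tags included both `Hip-Hop` and `hip hop`. If you wanted to catch the variants, a sketch like this would do it. `has_tag` is a made-up helper, and treating hyphens as spaces is my own assumption about which spellings should count as the same tag.

```python
def has_tag(tags, wanted="hip hop"):
    """True if any tag matches `wanted`, ignoring case and hyphens."""
    def normalize(tag):
        return tag.lower().replace("-", " ").strip()
    return normalize(wanted) in {normalize(tag) for tag in tags}

print(has_tag(["Crunk", "Hip-Hop", "rap"]))  # matches via the 'Hip-Hop' spelling
print(has_tag(["pop", "british"]))           # made-up tag list with no match
```

You would use it the same way as the lambda: `merged[merged.tags.apply(has_tag)]`.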

## 13) What percent of "lil" results are rappers?

In [None]:
merged.tags.apply(lambda tags: 'hip hop' in tags).value_counts()

In [None]:
merged.tags.apply(lambda tags: 'hip hop' in tags).value_counts(normalize=True)
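
Another way to get a single percentage: the mean of a column of Trues and Falses is the fraction that are True. A sketch with invented tag lists (only the first one is real, copied from Lil Jon's response):

```python
import pandas as pd

# Stand-in for merged.tags; the last three lists are made up for illustration
tag_lists = pd.Series([
    ["Crunk", "Hip-Hop", "rap", "Dirty South", "hip hop"],
    ["pop"],
    ["rap"],
    ["electronic"],
])

has_hip_hop = tag_lists.apply(lambda tags: "hip hop" in tags)

# True counts as 1 and False as 0, so the mean is the share of Trues
percent = has_hip_hop.mean() * 100
print(percent)
```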

## 14) Seriously you are all-powerful now (it isn't cheating, it's PANDAS!)