# last.fm

By Alejandro Fernández Sánchez

This notebook studies [last.fm](https://www.last.fm/)'s API, a very large API that (hopefully) contains data that fits our use case.

## Imports

In [1]:
import networkx as nx
import json
import requests
import os
from dotenv import load_dotenv
load_dotenv()

API_KEY = os.getenv("LAST_FM_API_KEY")

## API

For starters, I was asked to check whether we can extract music genres for each artist using the [artist.getInfo](https://www.last.fm/api/show/artist.getInfo) endpoint.

In [2]:
artist_name = "Miley cyrus"

In [3]:
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getInfo&artist={artist_name}&api_key={API_KEY}&format=json").json()

{'artist': {'name': 'Miley Cyrus',
  'mbid': '7e9bd05a-117f-4cce-87bc-e011527a8b18',
  'url': 'https://www.last.fm/music/Miley+Cyrus',
  'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'small'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'medium'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'large'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'extralarge'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'mega'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': ''}],
  'streamable': '0',
  'ontour': '0',
  'stats': {'listeners': '3936207', 'playcount': '220845080'},
  'similar': {'artist': [{'name': 'Hann
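
The printed output is cut off before any genre information shows up, so here is a minimal sketch of how the genres could be read out of the response. It assumes the tags live under `artist → tags → tag` as a list of objects with a `name` key, which is the layout the update loop further down relies on. The `sample` payload and the `extract_tags` helper are hypothetical and only illustrate the shape.

```python
def extract_tags(artist_info: dict) -> list[str]:
    """Return the tag names of an artist.getInfo 'artist' object (assumed layout)."""
    # `or {}` also covers a missing or empty "tags" value
    tags = artist_info.get("tags") or {}
    tag_list = tags.get("tag", [])
    # Assumption: a lone tag might arrive as a dict instead of a list
    if isinstance(tag_list, dict):
        tag_list = [tag_list]
    return [tag["name"] for tag in tag_list if "name" in tag]


# Made-up payload, not real API output
sample = {"name": "Some Artist", "tags": {"tag": [{"name": "pop"}, {"name": "dance"}]}}
print(extract_tags(sample))
```

Whether last.fm tags are clean enough to be treated as genres is a separate question, since tags are user-generated.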

That's curious: the response JSON has a `similar` field, which overlaps with another endpoint I was asked to explore ([artist.getSimilar](https://www.last.fm/api/show/artist.getSimilar)). What does that one return, then?

In [4]:
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getSimilar&artist={artist_name}&api_key={API_KEY}&format=json").json()

{'similarartists': {'artist': [{'name': 'Hannah Montana',
    'mbid': 'e25744d3-ec9f-4bbf-8760-7bd3370906ea',
    'match': '1',
    'url': 'https://www.last.fm/music/Hannah+Montana',
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'mega'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': ''}],
    'streamable': '0'},
   {'name': 'ASHLEY O',
    'match': '0.4395

This is perfect! We don't need the `similar` field from the previous endpoint, and I bet that, given an artist name, we can gather all the information in one go. The `match` value even gives us a weight for the graph edges.
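
As a rough sketch of how those weights could be pulled out, the snippet below turns an `artist.getSimilar` response into a name-to-weight mapping. The `similar_weights` helper and the `sample` payload are made up for illustration; only the `similarartists → artist → name/match` layout comes from the output above, where `match` arrives as a string.

```python
def similar_weights(response: dict) -> dict[str, float]:
    """Map each similar artist's lowercased name to its match value as a float."""
    artists = response.get("similarartists", {}).get("artist", [])
    return {artist["name"].lower(): float(artist["match"]) for artist in artists}


# Made-up payload, not real API output
sample = {
    "similarartists": {
        "artist": [
            {"name": "Artist A", "match": "1"},
            {"name": "Artist B", "match": "0.25"},
        ]
    }
}
weights = similar_weights(sample)
print(weights)
```

Lowercasing the names here is a choice that matches how names are looked up in the rest of the notebook.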

Let's start by getting the information we already have and by creating some helper functions.
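
One thing worth keeping in mind for those helpers: artist names in our dataset can contain characters such as `&`, which change the meaning of a hand-built query string. The sketch below uses only the standard library to show the difference; `build_url` and the dummy key are hypothetical names, not part of the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://ws.audioscrobbler.com/2.0/"


def build_url(method: str, artist_name: str, api_key: str) -> str:
    """Build a request URL with every parameter percent-encoded."""
    query = urlencode(
        {"method": method, "artist": artist_name, "api_key": api_key, "format": "json"}
    )
    return f"{BASE_URL}?{query}"


name = "sonja & shanti sungkono"
encoded = build_url("artist.getInfo", name, "DUMMY_KEY")
naive = f"{BASE_URL}?method=artist.getInfo&artist={name}&api_key=DUMMY_KEY&format=json"

# Parse both URLs back to see which artist name a server would receive
print(parse_qs(urlsplit(encoded).query)["artist"])
print(parse_qs(urlsplit(naive).query)["artist"])
```

With `requests`, passing the same dictionary through the `params` argument of `requests.get` has the same effect as `urlencode` here.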

In [5]:
# I'll help myself with a Graph and a dictionary
id_dict = dict()
G = nx.Graph()
with open("artists.jsonl", "r", encoding="utf-8") as in_file:
    artist_data = [json.loads(line) for line in in_file]
    for artist in artist_data:
        known_names = artist["known_names"]
        G.add_node(artist["main_id"], known_names=artist["known_names"], listeners=0, playcount=0, tags=list())
        for name in known_names:
            id_dict[name.lower()] = artist["main_id"]

In [6]:
def get_artist_info(artist_name: str) -> dict | None:
    # Pass the query through `params` so requests URL-encodes names containing "&", "#", etc.
    url = "http://ws.audioscrobbler.com/2.0/"
    params = {"artist": artist_name, "api_key": API_KEY, "format": "json"}
    info = requests.get(url, params={**params, "method": "artist.getInfo"}).json()
    if "error" in info:
        error_code = info["error"]
        error_message = info["message"]
        print(f"An error with code {error_code} happened when trying to get info for {artist_name}.")
        print(f"Error message: {error_message}.")
        return None
    artist_info = info["artist"]
    similar = requests.get(url, params={**params, "method": "artist.getSimilar"}).json()
    # getSimilar can fail independently of getInfo; fall back to no similar artists
    artist_info["similar"] = similar.get("similarartists", {}).get("artist", [])
    return artist_info
get_artist_info("Melendi")

{'name': 'Melendi',
 'mbid': '86d61837-890c-4d04-aeec-30d70ad77298',
 'url': 'https://www.last.fm/music/Melendi',
 'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'small'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'medium'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'large'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'extralarge'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'mega'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': ''}],
 'streamable': '0',
 'ontour': '0',
 'stats': {'listeners': '255621', 'playcount': '5391939'},
 'similar': [{'name': 'Estopa',
   'mbid': '49fc7e46-5441-4b7e-996a-12dddf22a6

The idea now is to iterate through the names and update the graph with the information the API provides. Let's see how long a naive algorithm takes: I'll let it run for a while and extrapolate.

In [7]:
artists_updated = 0
artists_skipped = 0

# For each (name, id) pair in our dataset
# Note: some ids will be repeated, but that's ok, since an artist may be known by more than one name
# Repeated edges are handled by keeping the greatest weight found
for name, node_idx in id_dict.items():
    artist_node = G.nodes[node_idx]      # Node in our graph
    artist_info = get_artist_info(name)  # Information in last.fm's API
    
    # If the artist is not found in their API, skip to the next one
    if artist_info is None:
        artists_skipped += 1
        continue

    # Node update
    artist_node["listeners"] += int(artist_info["stats"].get("listeners", 0))
    artist_node["playcount"] += int(artist_info["stats"].get("playcount", 0))
    artist_node["tags"].extend([tag["name"] for tag in artist_info["tags"]["tag"]])
    
    # Create/modify edges for each similar artist
    for similar_artist in artist_info["similar"]:
        similar_artist_name = similar_artist["name"].lower()
        if similar_artist_name not in id_dict:  # If the artist is not in our dataset, continue
            continue
        # We get their id in our dataset and update/create the link/edge/connection
        # I'm assuming that A's match value with B can differ from B's with A, since I don't know that it can't
        similar_artist_id = id_dict[similar_artist_name]
        if G.has_edge(node_idx, similar_artist_id):
            current_weight = G.edges[node_idx, similar_artist_id]["weight"]
            found_weight = float(similar_artist["match"])
            if found_weight > current_weight:
                G.edges[node_idx, similar_artist_id]["weight"] = found_weight
        else:
            G.add_edge(node_idx, similar_artist_id, weight=float(similar_artist["match"]))
    artists_updated += 1


An error with code 6 happened when trying to get info for gianluca chiarini.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for d r woning.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for t.e.z p’unk.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for bill tesar.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for 宇田川美奈.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for d‐dre.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for sonja & shanti sungkono.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for the western ramblers.
Error 

KeyboardInterrupt: 
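
The edge rule used in the loop (keep the larger `match` when the same pair shows up again) can be looked at in isolation. The sketch below uses a plain dictionary instead of the `networkx` graph so it runs anywhere; `update_edge` and the example ids are made up for illustration.

```python
def update_edge(weights: dict[frozenset, float], a: int, b: int, match: float) -> None:
    """Store the largest match seen so far for the undirected pair (a, b)."""
    key = frozenset((a, b))  # undirected: (a, b) and (b, a) share one key
    weights[key] = max(weights.get(key, 0.0), match)


weights: dict[frozenset, float] = {}
update_edge(weights, 1, 2, 0.4)  # first time the pair is seen
update_edge(weights, 2, 1, 0.7)  # reverse direction, larger match wins
update_edge(weights, 1, 2, 0.5)  # smaller match is ignored
print(weights[frozenset((1, 2))])
```

Taking the maximum is one way to collapse two directed scores into a single undirected weight; averaging them would be another reasonable choice.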

In [8]:
print("artists_updated:", artists_updated)
print("artists_skipped:", artists_skipped)

artists_updated: 270
artists_skipped: 18


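In 184 seconds it went through 288 artists, which is about 0.64 seconds per artist. We have 783597 artists at the time of writing this markdown cell, so at that rate a single sequential process would need approximately 139 hours (184 / 288 × 783597 ≈ 500631 seconds), and more than that if every known name is looked up rather than every artist. We'd better get on with it, then! And that's without exporting the results.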
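
For reference, here is the linear extrapolation written out, using only the figures quoted in this cell. It assumes the request rate stays constant and that each artist needs a single lookup, which is optimistic because the loop iterates over names and an artist may have several.

```python
elapsed_seconds = 184      # measured run time before the interrupt
artists_processed = 288    # artists_updated + artists_skipped
total_artists = 783_597    # dataset size at the time of writing

seconds_per_artist = elapsed_seconds / artists_processed
estimated_hours = seconds_per_artist * total_artists / 3600
print(f"{seconds_per_artist:.3f} s per artist, about {estimated_hours:.1f} hours in total")
```

With these inputs the estimate comes out at roughly 139 hours, a little under six days.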