# last.fm

By Alejandro Fernández Sánchez

The subject of study in this notebook will be [last.fm](https://www.last.fm/)'s API, a very large API containing (hopefully) useful data. Data that might fit our use-case.

## Imports

In [1]:
import networkx as nx
import json
import requests
import os
from dotenv import load_dotenv
load_dotenv()

API_KEY = os.getenv("LAST_FM_API_KEY")

## Information of artists

For starters, I was asked to check if we can extract some music genres for each artist using the [show/artist.getInfo](https://www.last.fm/api/show/artist.getInfo) endpoint.

In [2]:
artist_name = "Miley cyrus"

In [3]:
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getInfo&artist={artist_name}&api_key={API_KEY}&format=json").json()

{'artist': {'name': 'Miley Cyrus',
  'mbid': '7e9bd05a-117f-4cce-87bc-e011527a8b18',
  'url': 'https://www.last.fm/music/Miley+Cyrus',
  'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'small'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'medium'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'large'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'extralarge'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': 'mega'},
   {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
    'size': ''}],
  'streamable': '0',
  'ontour': '0',
  'stats': {'listeners': '3972571', 'playcount': '223182339'},
  'similar': {'artist': [{'name': 'Hann

That's curious, there's a similar field in the response JSON to another endpoint that I was asked to explore ([show/artist.getSimilar](https://www.last.fm/api/show/artist.getSimilar)). What does it return then?

In [4]:
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getSimilar&artist={artist_name}&api_key={API_KEY}&format=json").json()

{'similarartists': {'artist': [{'name': 'Hannah Montana',
    'mbid': 'e25744d3-ec9f-4bbf-8760-7bd3370906ea',
    'match': '1',
    'url': 'https://www.last.fm/music/Hannah+Montana',
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'mega'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': ''}],
    'streamable': '0'},
   {'name': 'ASHLEY O',
    'match': '0.4453

This is perfect! This way we don't need the similar field in the previous API, I bet that, given an artist name, we can get all the information at once. We even have a weight measurement for the graph edges.

Let's start by getting the information we already have and by creating some helper functions.

In [5]:
# I'll help myself with a Graph and a dictionary
id_dict = dict()
G = nx.Graph()
with open("artists.jsonl", "r", encoding="utf-8") as in_file:
    artist_data = [json.loads(line) for line in in_file]
    for artist in artist_data:
        known_names = artist["known_names"]
        G.add_node(artist["main_id"], known_names=artist["known_names"], listeners=0, playcount=0, tags=list())
        for name in known_names:
            id_dict[name.lower()] = artist["main_id"]

In [6]:
def get_artist_info(artist_name: str) -> dict[str] | None:
    info = requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getInfo&artist={artist_name}&api_key={API_KEY}&format=json").json()
    if "error" in info:
        error_code = info["error"]
        error_message = info["message"]
        print(f"An error with code {error_code} happened when trying to get info for {artist_name}.")
        print(f"Error message: {error_message}.")
        return None
    artist_info = info["artist"]
    similar = requests.get(f"http://ws.audioscrobbler.com/2.0/?method=artist.getSimilar&artist={artist_name}&api_key={API_KEY}&format=json").json()["similarartists"]["artist"]
    artist_info["similar"] = similar
    return artist_info
get_artist_info("Melendi")

{'name': 'Melendi',
 'mbid': '86d61837-890c-4d04-aeec-30d70ad77298',
 'url': 'https://www.last.fm/music/Melendi',
 'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'small'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'medium'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'large'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'extralarge'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': 'mega'},
  {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
   'size': ''}],
 'streamable': '0',
 'ontour': '0',
 'stats': {'listeners': '258209', 'playcount': '5499596'},
 'similar': [{'name': 'Estopa',
   'mbid': '49fc7e46-5441-4b7e-996a-12dddf22a6

The idea is now to iterate through the names and update the graph with the information that the API provides. Let's see in how much time we can get results with a naive algorithm. I'll let this run for some time and interpolate.

In [7]:
artists_updated = 0
artists_skipped = 0

# For each combination of name and id in our dataset
# Note: some ids will be repeated, but that's ok, since an artist may be known by more than one name
# We're handling the case of repeating edges by finding out the greater weight found
for name, node_idx in id_dict.items():
    artist_node = G.nodes[node_idx]      # Node in our graph
    artist_info = get_artist_info(name)  # Information in last.fm's API
    
    # If artist not found in their API go next
    if artist_info is None:
        artists_skipped += 1
        continue

    # Node update
    artist_node["listeners"] += int(artist_info["stats"].get("listeners", 0))
    artist_node["playcount"] += int(artist_info["stats"].get("playcount", 0))
    artist_node["tags"].extend([tag["name"] for tag in artist_info["tags"]["tag"]])
    
    # Create/modify edges for each similar artist
    for similar_artist in artist_info["similar"]:
        similar_artist_name = similar_artist["name"].lower()
        if similar_artist_name not in id_dict:  # If the artist is not in our dataset, continue
            continue
        # We get their id in our dataset and update/create the link/edge/connection
        # I'm under the assumption that A can have a match value with B different that B with A because idk if that can't be
        similar_artist_id = id_dict[similar_artist_name]
        if G.has_edge(node_idx, similar_artist_id):
            current_weight = G.edges[node_idx, similar_artist_id]["weight"]
            found_weight = float(similar_artist["match"])
            if found_weight > current_weight:
                G.edges[node_idx, similar_artist_id]["weight"] = found_weight
        else:
            G.add_edge(node_idx, similar_artist_id, weight=float(similar_artist["match"]))
    artists_updated += 1


An error with code 6 happened when trying to get info for kaustuv kanti ganguli.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for chloe saavedra.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for pat boyack & the prowlers.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for sisma8h williams.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for de wolkendragers.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for immanuelkyrkans ungdomskör och orkester.
Error message: The artist you supplied could not be found.
An error with code 6 happened when trying to get info for toasting to messiah.
Error message: The artist you supplied could not be found.
An error with cod

KeyboardInterrupt: 

In [8]:
print("artists_updated:", artists_updated)
print("artists_skipped:", artists_skipped)

artists_updated: 231
artists_skipped: 20


In 180 seconds it went through 251 artists. We have 810336 artists at the time of writing this markdown cell, so that means that one core of this machine would get the job done in A LOT of hours. We better get into it then!

## Information on releases

We are also interested in getting some information about our releases (or tracks, like LastFM call them). Let's explore further into the API using the [show/track.getInfo](https://www.last.fm/api/show/track.getInfo) endpoint.

In [9]:
artist_name = "Daisy the great"
track_name = "record player"
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=track.getInfo&artist={artist_name}&track={track_name}&autocorrect=1&api_key={API_KEY}&format=json").json()

{'track': {'name': 'Record Player',
  'url': 'https://www.last.fm/music/Daisy+the+Great/_/Record+Player',
  'duration': '148000',
  'streamable': {'#text': '0', 'fulltrack': '0'},
  'listeners': '44549',
  'playcount': '241677',
  'artist': {'name': 'Daisy the Great',
   'url': 'https://www.last.fm/music/Daisy+the+Great'},
  'album': {'artist': 'Daisy the Great',
   'title': '2021-12-06: Pamnation HQ, New York, NY, USA',
   'url': 'https://www.last.fm/music/Daisy+the+Great/2021-12-06:+Pamnation+HQ,+New+York,+NY,+USA',
   'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/c26225604c57eed64657158e0fddd2ab.png',
     'size': 'small'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/c26225604c57eed64657158e0fddd2ab.png',
     'size': 'medium'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/c26225604c57eed64657158e0fddd2ab.png',
     'size': 'large'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/c26225604c57eed64657158e0fddd2ab.png',
     'si

Seems easy enough. Only problem is collaboration songs, after testing a song seems to be attributed to only one of the artists (although another artist can also be related to another entity that represents the same song). I'll have to iterate through the artists and stay with the maximum number of listeners and playcount, while also saving all the tags (without reps). Nevertheless, after some talks we've considered to stay away from networkx and to work with Neo4j. I will not test everything like I did with the artists.

Lastly, I've noticed another potentially useful endpoint worth exploring, [show/track.getSimilar](https://www.last.fm/api/show/track.getSimilar).

In [10]:
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=track.getSimilar&artist={artist_name}&track={track_name}&autocorrect=1&api_key={API_KEY}&format=json").json()

{'similartracks': {'track': [], '@attr': {'artist': 'Daisy the Great'}}}

In [11]:
# Maybe a more famous song?
artist_name = "Coldplay"
track_name = "Something like this"
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=track.getSimilar&artist={artist_name}&track={track_name}&autocorrect=1&api_key={API_KEY}&format=json").json()

{'similartracks': {'track': [], '@attr': {'artist': 'Coldplay'}}}

In [12]:
# ...maybe older?
artist_name = "Queen"
track_name = ("Bohemian rhapsody")
requests.get(f"http://ws.audioscrobbler.com/2.0/?method=track.getSimilar&artist={artist_name}&track={track_name}&autocorrect=1&api_key={API_KEY}&format=json").json()

{'similartracks': {'track': [{'name': 'We Are the Champions',
    'playcount': 5029320,
    'mbid': '4b3e7f93-5c74-4ad8-9a88-8b0676b5604e',
    'match': 1.0,
    'url': 'https://www.last.fm/music/Queen/_/We+Are+the+Champions',
    'streamable': {'#text': '0', 'fulltrack': '0'},
    'duration': 180,
    'artist': {'name': 'Queen',
     'mbid': '420ca290-76c5-41af-999e-564d7c71f1a7',
     'url': 'https://www.last.fm/music/Queen'},
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2

Doesn't seem very useful...