# FMA: A Dataset For Music Analysis

Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

## Free Music Archive web API

All the data in the `fma.json` DataFrame was collected from the Free Music Archive [public API](https://freemusicarchive.org/api). With this notebook, you can:
* reconstruct the original data,
* update some fields, e.g. the `track_listens` (play count),
* augment the data with other (potentially newer) fields provided by their API but not included in the release,
* update the dataset with new songs added to the archive.

Notes:
* You need a key to access the API, which you can [request online](https://freemusicarchive.org/api/agreement) and write into your `.env` file as a new line reading `FMA_KEY=MYPERSONALKEY`.
* Each request takes a few hundred milliseconds to complete.
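
How `utils` picks up the key from `.env` is not shown in this notebook. As a minimal sketch, assuming one `NAME=value` pair per line, the file's contents could be parsed with the standard library alone. The helper `read_env_key` is hypothetical and not part of `utils`.

```python
def read_env_key(text, name='FMA_KEY'):
    """Return the value assigned to `name` in .env-style text, or None."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        # Only accept lines that contain '=' and match the requested name.
        if sep and key.strip() == name:
            return value.strip()
    return None


# Example .env contents, using the placeholder key from the note above.
sample = '# credentials\nFMA_KEY=MYPERSONALKEY\n'
key = read_env_key(sample)
```

In practice the text would come from reading the `.env` file, and a dedicated package such as python-dotenv handles quoting and other edge cases that this sketch ignores.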

In [None]:
import utils
import IPython.display as ipd
import requests
import os

In [None]:
fma = utils.FreeMusicArchive(os.environ.get('FMA_KEY'))

## 1 Get recently added tracks

Note that `track_id` values are assigned in monotonically increasing order. Because tracks may be removed, the largest `track_id` does not indicate the number of available tracks.
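
A toy example makes the gap between the largest ID and the track count concrete. The IDs below are made up for illustration and are not taken from the archive.

```python
# Hypothetical IDs of the tracks that are still available.
available_ids = [2, 3, 6, 7, 10]

largest_id = max(available_ids)
n_available = len(available_ids)

# IDs between the smallest and largest that no longer resolve to a track.
all_ids = set(range(min(available_ids), largest_id + 1))
missing = sorted(all_ids - set(available_ids))

print(largest_id, n_available, missing)
```

Here the largest ID is 10, yet only 5 tracks are available because IDs 4, 5, 8 and 9 are missing.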

In [None]:
for track_id, artist_name, date_created in zip(*fma.get_recent_tracks()):
    print(track_id, date_created, artist_name)

## 2 Get meta-data about tracks, albums and artists

Given IDs, we can get information about tracks, albums and artists. See the available fields in the [API documentation](https://freemusicarchive.org/api).

In [None]:
fma.get_track(track_id=2, fields=['track_title', 'track_date_created',
                                  'track_duration', 'track_bit_rate',
                                  'track_listens', 'track_interest', 'track_comments', 'track_favorites',
                                  'artist_id', 'album_id'])

In [None]:
fma.get_track_genres(track_id=20)

In [None]:
fma.get_album(album_id=1, fields=['album_title', 'album_tracks',
                                  'album_listens', 'album_comments', 'album_favorites',
                                  'album_date_created', 'album_date_released'])

In [None]:
fma.get_artist(artist_id=1, fields=['artist_name', 'artist_location',
                                    'artist_comments', 'artist_favorites'])

## 3 Get data, i.e. raw audio

We can download the original track as well. The archive provides tracks as MP3 files at various bitrates.

In [None]:
fma.download_track(track_id=2, path='track.mp3')

## 4 Get genres

Instead of compiling the genres of each track, we can get all the genres present on the archive with a few API calls.

In [None]:
genres = utils.Genres(fma.get_all_genres())
print('{} genres'.format(genres.df.shape[0]))
genres.df[10:25]

And look for genres related to Rock.

In [None]:
genres.df[['Rock' in title for title in genres.df['genre_title']]]

In [None]:
genres.df[genres.df['genre_parent_id'] == 12]

As genres have parent genres, we can plot a tree using the [DOT] language.

[DOT]: https://en.wikipedia.org/wiki/DOT_(graph_description_language)
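
To give a rough idea of what a DOT description of a genre tree looks like, the sketch below writes one as a plain string from a map of parent IDs. The function `to_dot` and the toy genres are hypothetical, and `genres.create_tree` in `utils` may well work differently.

```python
def to_dot(titles, parents):
    """Return a DOT digraph with one edge per (parent, child) pair.

    titles:  {genre_id: title}
    parents: {genre_id: parent_id}, where 0 means "no parent"
    """
    lines = ['digraph genres {']
    # One labelled node per genre.
    for genre_id in sorted(titles):
        lines.append('    {} [label="{}"];'.format(genre_id, titles[genre_id]))
    # One edge from each parent to its child; roots have no incoming edge.
    for child in sorted(parents):
        parent = parents[child]
        if parent != 0:
            lines.append('    {} -> {};'.format(parent, child))
    lines.append('}')
    return '\n'.join(lines)


# Toy data, not actual archive IDs or titles.
titles = {1: 'Parent', 2: 'Child A', 3: 'Child B'}
parents = {1: 0, 2: 1, 3: 1}
dot = to_dot(titles, parents)
print(dot)
```

The resulting text can be rendered by Graphviz. Note that the sketch does not escape double quotes in titles, which a real implementation would need to do.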

In [None]:
graph = genres.create_tree([25, 31], 1)
ipd.Image(graph.create_png())

Data cleaning: some genres returned by the archive have a `genre_parent_id` that does not correspond to any existing genre.

In [None]:
# 13 (Easy Listening) has parent 126 which is missing
# --> a root genre on the website, although not in the genre menu
genres.df.loc[13, 'genre_parent_id'] = 0

# 580 (Abstract Hip-Hop) has parent 1172 which is missing
# --> listed as child of Hip-Hop on the website
genres.df.loc[580, 'genre_parent_id'] = 21

# 810 (Nu-Jazz) has parent 51 which is missing
# --> listed as child of Easy Listening on website
genres.df.loc[810, 'genre_parent_id'] = 13
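
Dangling parents like the three fixed above can be found programmatically rather than by inspection. The following is a minimal sketch on a plain dictionary, assuming that a parent ID of 0 marks a root genre (as in the fix for genre 13). The function `find_dangling_parents` is hypothetical and not part of `utils`.

```python
def find_dangling_parents(parents):
    """Return {genre_id: parent_id} for parents that are not genres themselves.

    A parent ID of 0 marks a root genre and is not reported.
    """
    return {child: parent for child, parent in parents.items()
            if parent != 0 and parent not in parents}


# Toy parent map. The three dangling links mirror the cases fixed above;
# the entries for 21 and 100 are made up for the example.
parents = {13: 126, 21: 0, 580: 1172, 810: 51, 100: 21}
dangling = find_dangling_parents(parents)
print(dangling)
```

On `genres.df`, a similar check could presumably be written by testing `genre_parent_id` against the DataFrame index, assuming the index holds the genre IDs.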

Save the full genre tree as a PDF.

In [None]:
roots = genres.find_roots()
print('{} roots'.format(len(roots)))
graph = genres.create_tree(roots)
graph.write_pdf('genres.pdf');