# Genres and MusicBrainz

By Alejandro Fernández Sánchez

## Setting up the connection

In [1]:
# Just in case you're the host and it's not already started
!service postgresql start

In [2]:
# Imports
import pandas as pd
from sqlalchemy import create_engine
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
DB_NAME = os.getenv("DB_NAME")
DB_HOST = os.getenv("DB_HOST")
DB_USER = os.getenv("DB_USER")
DB_PASS = os.getenv("DB_PASS")
DB_PORT = os.getenv("DB_PORT")

In [4]:
# Engine used to load query results into pandas DataFrames
engine_url = f"postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
engine = create_engine(engine_url)
engine

Engine(postgresql://musicbrainz:***@localhost:5432/musicbrainz_db)
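The URL above is assembled with a plain f-string, which can break if the password contains characters such as `@` or `/`, and which silently produces `None` segments if a variable is missing from `.env`. A minimal sketch of a more defensive builder follows; `build_engine_url`, `REQUIRED_VARS` and the example values are hypothetical names and made-up data, not part of the original notebook.

```python
from urllib.parse import quote

# Hypothetical helper: the setting names mirror the variables read from .env above
REQUIRED_VARS = ("DB_USER", "DB_PASS", "DB_HOST", "DB_PORT", "DB_NAME")


def build_engine_url(env):
    """Build a PostgreSQL URL from a mapping such as os.environ."""
    # Fail early instead of producing a URL that contains "None"
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")

    # Percent-encode the credentials so special characters cannot
    # be mistaken for URL delimiters
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASS"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


# Made-up values for illustration only
example_env = {
    "DB_USER": "musicbrainz",
    "DB_PASS": "p@ss/word",
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "musicbrainz_db",
}
example_url = build_engine_url(example_env)
print(example_url)
```

In the notebook this would be called as `create_engine(build_engine_url(os.environ))`.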

## Some information on genres

The `genre` table is not very useful in and of itself: it only lists the genre names. To make use of it, we need to connect genres to other entities, so let's see whether any `l_*_*` relationship table proves useful.

Let's start by getting a list of all possible relations.

In [5]:
query =\
"""
SELECT table_name
FROM information_schema.tables
WHERE table_name ~ 'genre'
AND table_name ~ 'l_'
AND table_name !~ 'redirect|edit|example'
;
"""
genre_tables = pd.read_sql_query(query, engine)["table_name"].to_list()
genre_tables

['l_area_genre',
 'l_artist_genre',
 'l_event_genre',
 'l_genre_genre',
 'l_genre_instrument',
 'l_genre_label',
 'l_genre_mood',
 'l_genre_place',
 'l_genre_recording',
 'l_genre_release',
 'l_genre_release_group',
 'l_genre_series',
 'l_genre_url',
 'l_genre_work']

How many rows are in each table?

In [6]:
for table in genre_tables:
    print(table, end=" => ")
    print(pd.read_sql_query(f"SELECT COUNT(*) FROM {table}", engine)["count"].to_list()[0])

l_area_genre => 1185
l_artist_genre => 1
l_event_genre => 0
l_genre_genre => 3118
l_genre_instrument => 228
l_genre_label => 2
l_genre_mood => 0
l_genre_place => 1
l_genre_recording => 0
l_genre_release => 0
l_genre_release_group => 1
l_genre_series => 0
l_genre_url => 4070
l_genre_work => 0


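- `l_area_genre`: Relates genres to their area of origin. It's not relevant to us.
- `l_genre_genre`: Relates genres to each other, so it is only useful if some other table connects genres to music.
- `l_genre_instrument`: Could only connect genres to music through instruments, but `l_instrument_release` and `l_instrument_recording` are empty, and `l_artist_instrument` has only 91 rows.

Every other table has at most two rows, which leaves only `l_genre_url`.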
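The row counts quoted for the instrument tables are not computed in any cell of this notebook. As a sketch of how they could be checked in a single round trip, the helper below builds one `UNION ALL` statement covering several tables. `build_count_query` and `instrument_tables` are hypothetical names introduced here for illustration.

```python
def build_count_query(tables):
    """Return one SQL statement that counts the rows of every table in `tables`."""
    for name in tables:
        # Table names cannot be passed as bound parameters,
        # so reject anything that is not letters, digits or underscores
        if not name.replace("_", "").isalnum():
            raise ValueError(f"Unexpected table name: {name!r}")

    # One SELECT per table, labelled with the table name
    selects = [
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM {name}"
        for name in tables
    ]
    return "\nUNION ALL\n".join(selects)


# The three instrument tables mentioned in the list above
instrument_tables = [
    "l_artist_instrument",
    "l_instrument_recording",
    "l_instrument_release",
]
count_query = build_count_query(instrument_tables)
print(count_query)
```

Passing the result to `pd.read_sql_query(count_query, engine)` should give one row per table, which avoids issuing a separate query for each name.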

## URLs

Let's start by checking if URLs connect to useful entities. We're only interested in the URLs that connect with at least one genre.

In [7]:
query =\
"""
SELECT table_name
FROM information_schema.tables
WHERE table_name ~ 'url'
AND table_name ~ 'l_'
AND table_name !~ 'redirect|example|edit'
AND NOT table_name LIKE 'url'
;
"""
url_tables = pd.read_sql_query(query, engine)["table_name"].to_list()
url_tables

['l_area_url',
 'l_artist_url',
 'l_event_url',
 'l_genre_url',
 'l_instrument_url',
 'l_label_url',
 'l_mood_url',
 'l_place_url',
 'l_recording_url',
 'l_release_url',
 'l_release_group_url',
 'l_series_url',
 'l_url_url',
 'l_url_work']

In [8]:
for table in url_tables:
    # The URL is entity0 in l_url_* tables and entity1 in all the others
    url_column = "entity0" if table.startswith("l_url") else "entity1"
    where = f"WHERE {url_column} IN (SELECT entity1 FROM l_genre_url)"
    print(table, end=" => ")
    print(pd.read_sql_query(f"SELECT COUNT(*) FROM {table} {where}", engine)["count"].to_list()[0])

l_area_url => 0
l_artist_url => 2
l_event_url => 0
l_genre_url => 4070
l_instrument_url => 1
l_label_url => 0
l_mood_url => 0
l_place_url => 0
l_recording_url => 0
l_release_url => 0
l_release_group_url => 0
l_series_url => 0
l_url_url => 0
l_url_work => 2
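The loop above relies on the naming convention of the `l_*_*` tables: the two entity types appear in alphabetical order, with the first stored in `entity0` and the second in `entity1`. A small sketch of a helper that derives the column from the table name, rather than from a prefix check, is shown below. `ENTITY_TYPES` and `entity_columns` are hypothetical names, and the set of types is taken only from the table names listed in this notebook.

```python
# Entity types seen in the l_*_* table names listed in this notebook
ENTITY_TYPES = {
    "area", "artist", "event", "genre", "instrument", "label", "mood",
    "place", "recording", "release", "release_group", "series", "url", "work",
}


def entity_columns(table, entity_type):
    """Return the columns of an l_<type0>_<type1> table that hold `entity_type`."""
    if not table.startswith("l_"):
        raise ValueError(f"Not a relationship table: {table!r}")

    parts = table[2:].split("_")
    # Try every split point, because a type such as release_group
    # contains an underscore itself
    for i in range(1, len(parts)):
        type0 = "_".join(parts[:i])
        type1 = "_".join(parts[i:])
        if type0 in ENTITY_TYPES and type1 in ENTITY_TYPES:
            columns = []
            if type0 == entity_type:
                columns.append("entity0")
            if type1 == entity_type:
                columns.append("entity1")
            return columns

    raise ValueError(f"Cannot parse entity types from {table!r}")


print(entity_columns("l_release_group_url", "url"))
print(entity_columns("l_url_work", "url"))
print(entity_columns("l_url_url", "url"))
```

A self-referencing table such as `l_url_url` yields both columns, whereas the loop above only checks its `entity0` side.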


With so few matches, I think it's safe to say that genres cannot be linked to artists, releases, or recordings through the `l_*_*` relationship tables in MusicBrainz.

## Cleanup

In [9]:
engine.dispose()

In [10]:
!service postgresql stop