In [1]:
import pandas as pd

# Overview of public-domain datasets

## Discogs Datasets (January 2025)

Source: https://www.kaggle.com/datasets/ofurkancoban/discogs-datasets-january-2025

This file contains detailed information about artists from the Discogs Data Dump as of January 1, 2025. It provides structured data about individual artists, including their names, unique IDs, profiles, and related information, all in CSV format.

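Columns (CSV column names in parentheses):
• Artist ID (artist_id): Unique identifier for each artist.
• Artist Name (artist_name): The name of the artist.
• Profile (artist_profile): A short description or biography of the artist.
• Real Name (artist_realname): The real name of the artist (if available).
• Aliases (aliases_name, aliases_name_id): Alternative names or pseudonyms used by the artist.
• Groups (groups_name, groups_name_id): Bands or groups the artist is associated with.
• URLs (urls_url): Links to external websites or profiles related to the artist.
• Data Quality (artist_data_quality): Indicates the quality or completeness of the data entry, for example Needs Vote or Needs Major Changes.

The loaded CSV also contains members_name, members_name_id, and namevariations_name, which the description above does not cover.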

Use Case:
This file is ideal for applications like music cataloging, trend analysis, artist mapping, or building recommendation systems based on artist relationships and metadata.

In [4]:
# The datasets directory is ignored by git
path1 = "../datasets/discogs_20250101_artists.csv"

# Keep the raw frame (data_1) as a backup and work on a copy (df1)
data_1 = pd.read_csv(path1)
df1 = data_1.copy()
df1.sample(10)

Unnamed: 0.1,Unnamed: 0,aliases_name,aliases_name_id,artist_data_quality,artist_id,artist_name,artist_profile,artist_realname,groups_name,groups_name_id,members_name,members_name_id,namevariations_name,urls_url
7902522,7902522,,,Needs Vote,1045631,Fred Robbins,Radio disc jockey/producer.,,,,,,Robbins,
8391085,8391085,,,Needs Major Changes,12844265,The Millennium Promise Jazz Project,,,,,,,,
4303827,4303827,,,Needs Major Changes,5795050,The Fuse!,,,,,,,The Fuse,
4088982,4088982,,,Needs Major Changes,5551723,Ena Beats,,,,,,,,
1557361,1557361,,,Needs Major Changes,2921278,Keijo Kaivo-ojan Yhtye,,,,,Paavo Kauppi,2921245.0,,
5391577,5391577,,,Needs Major Changes,6905196,The Kathleen Stobart Orchestra,,,,,Kathy Stobart,1744473.0,,
4303540,4303540,,,Needs Major Changes,5794732,Ya Nur,,,,,Bira (3),751467.0,,
3040798,3040798,,,Needs Major Changes,4464573,Gabriella Chiavarini,,,,,,,,
2918764,2918764,,,Needs Major Changes,4338251,Candé,,,,,,,,
9038380,9038380,,,Needs Major Changes,14657028,Torture Hammer,,,,,,,,
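
The sample above includes an `Unnamed: 0` column whose values match the row index, which usually means the CSV was written together with its index. A minimal sketch of reading such a file without the extra column, assuming that is what happened here. The inline CSV and the `load_artists` helper are hypothetical stand-ins, not part of the dataset or of this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for the artists CSV: the first, unnamed column is a
# saved row index, and members_name_id is blank for some rows.
SAMPLE_CSV = """,artist_id,artist_name,artist_data_quality,members_name_id
0,1045631,Fred Robbins,Needs Vote,
1,2921278,Keijo Kaivo-ojan Yhtye,Needs Major Changes,2921245.0
2,5794732,Ya Nur,Needs Major Changes,751467.0
"""


def load_artists(source):
    """Read an artists CSV, reusing the unnamed first column as the index."""
    frame = pd.read_csv(source, index_col=0)
    # Blank IDs force a float column; a nullable integer keeps the IDs exact.
    frame["members_name_id"] = frame["members_name_id"].astype("Int64")
    return frame


artists = load_artists(io.StringIO(SAMPLE_CSV))
print(artists.dtypes)
```

Passing the real path instead of the `StringIO` object should behave the same way, provided the file's first column really is a saved index.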


In [5]:
df1.shape

(9194907, 14)
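
The artist-mapping use case mentioned earlier depends on relationship columns such as `members_name` and `members_name_id`. The sketch below turns rows like the ones sampled above into a group-to-member edge list. It assumes that a non-empty `members_name_id` marks the row's artist as a group containing that member. The `group_id` and `member_id` names are chosen here for illustration.

```python
import pandas as pd

# A few rows copied from the sample output, limited to the relevant columns.
members = pd.DataFrame({
    "artist_id": [2921278, 6905196, 5794732, 4464573],
    "artist_name": [
        "Keijo Kaivo-ojan Yhtye",
        "The Kathleen Stobart Orchestra",
        "Ya Nur",
        "Gabriella Chiavarini",
    ],
    "members_name": ["Paavo Kauppi", "Kathy Stobart", "Bira (3)", None],
    "members_name_id": [2921245.0, 1744473.0, 751467.0, None],
})

# Keep only rows that name a member, restore integer IDs, and drop repeats.
edges = (
    members.dropna(subset=["members_name_id"])
    .astype({"members_name_id": "int64"})
    [["artist_id", "members_name_id"]]
    .drop_duplicates()
    .rename(columns={"artist_id": "group_id", "members_name_id": "member_id"})
)
print(edges)
```

An edge list like this can be fed to a graph library or joined back to `artist_name` for display.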

## Discogs Database All Release Data

Source: https://www.kaggle.com/datasets/sohrabdaemi/discogs-database-all-release-data

This dataset is described as encompassing over 13 million recordings; the CSV loaded below has 17,372,035 rows. It provides a broad view of music distribution and physical sales, particularly vinyl records.

In [6]:
# The datasets directory is ignored by git
path2 = "../datasets/release_data.csv"

# Keep the raw frame (data_2) as a backup and work on a copy (df2)
data_2 = pd.read_csv(path2)
df2 = data_2.copy()
df2.sample(10)

Unnamed: 0,release_id,country,year,genre,format
14203964,12881539,Germany,2018.0,Electronic,File
14935498,13541183,Australia,,Stage & Screen,Vinyl
5560994,4875823,Japan,1987.0,Rock,Vinyl
3782345,3349169,US,2011.0,Electronic,File
10161797,9088099,Europe,2014.0,Pop,Vinyl
3158778,2845130,France,2010.0,Electronic,File
4960018,4346109,Spain,1966.0,Latin,Vinyl
15455520,14004841,France,2007.0,Rock,CD
6401333,5638779,Australia,2010.0,Rock,CD
2101145,1985537,US,1989.0,"Folk, World, & Country",Vinyl


In [8]:
df2.shape

(17372035, 5)
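
As a first check on the claim that the data leans toward vinyl, the share of each format can be computed with `value_counts`. This is a minimal sketch on five rows copied from the sample above, not a result for the full dataset. It also shows one way to handle `year`, which loads as a float because some rows have no year.

```python
import pandas as pd

# Five rows copied from the sample output of release_data.csv.
releases = pd.DataFrame({
    "release_id": [12881539, 13541183, 4875823, 3349169, 9088099],
    "country": ["Germany", "Australia", "Japan", "US", "Europe"],
    "year": [2018.0, None, 1987.0, 2011.0, 2014.0],
    "genre": ["Electronic", "Stage & Screen", "Rock", "Electronic", "Pop"],
    "format": ["File", "Vinyl", "Vinyl", "File", "Vinyl"],
})

# Missing years force a float column; a nullable integer is easier to read.
releases["year"] = releases["year"].astype("Int64")

# Fraction of rows per format, sorted from most to least common.
format_share = releases["format"].value_counts(normalize=True)
print(format_share)
```

On the full frame the same two lines would apply to `df2`, although the resulting shares depend on how the dataset counts multi-genre or multi-format releases.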

## Bandcamp

Source: https://www.kaggle.com/datasets/mathurinache/1000000-bandcamp-sales

This dataset contains 1,000,000 items from Bandcamp's sales feed between September 9, 2020 and October 2, 2020, and is a slice of the whole dataset used in The Chaos Bazaar. The loaded CSV has 23 columns; the description documents the following ones:

- _id: unique identifier combining the sale's URL and UTC timestamp.
- url: the path to the item on Bandcamp. Use this column to join this dataset to the dataset of Bandcamp items.
- artist_name: Name of the artist.
- album_title: Title of the album, if applicable.
- art_url: path to the item's art image.
- item_type: denotes the type of object. a for digital albums, p for physical items, and t for digital tracks.
- slug_type: also denotes the type of object. a for all albums, p for merch, and t for tracks.
- utc_date: the UTC timestamp of the sale, in Unix epoch seconds.
- country_code: country code of the buyer.
- country: full country name of the buyer.
- item_price: price of the item in the seller's currency.
- currency: the seller's currency.
- amount_paid: amount paid in the seller's currency.
- amount_paid_fmt: amount paid in the seller's currency, with the currency symbol.
- amount_paid_usd: amount paid converted to US Dollars.
- amount_over_fmt: amount voluntarily paid over the item price in the seller's currency.
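
Two of the columns above usually need decoding before analysis: `utc_date` holds epoch seconds and `item_type` holds one-letter codes. The sketch below converts both on two hand-copied sample rows. The `sold_at` and `item_type_label` column names and the `ITEM_TYPE_LABELS` mapping are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Two rows with values copied from the sample output of the sales CSV.
sales = pd.DataFrame({
    "utc_date": [1599805000.0, 1601636000.0],
    "item_type": ["t", "p"],
    "amount_paid_usd": [2.36, 25.78],
})

# Interpret utc_date as seconds since the Unix epoch, in UTC.
sales["sold_at"] = pd.to_datetime(sales["utc_date"], unit="s", utc=True)

# Expand the one-letter item_type codes listed in the column description.
ITEM_TYPE_LABELS = {
    "a": "digital album",
    "p": "physical item",
    "t": "digital track",
}
sales["item_type_label"] = sales["item_type"].map(ITEM_TYPE_LABELS)

print(sales[["sold_at", "item_type_label", "amount_paid_usd"]])
```

With `sold_at` in place, sales can be resampled by day or hour using the usual pandas datetime tools.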

Sample code, charts and reports:
https://www.kaggle.com/code/mathurinache/bandcamp-dataset-starter


In [9]:
# The datasets directory is ignored by git
path3 = "../datasets/1000000-bandcamp-sales.csv"

# Keep the raw frame (data_3) as a backup and work on a copy (df3)
data_3 = pd.read_csv(path3)
df3 = data_3.copy()
df3.sample(10)

Unnamed: 0,_id,art_url,item_type,utc_date,country_code,track_album_slug_text,country,slug_type,amount_paid_fmt,item_price,...,amount_paid,releases,artist_name,currency,album_title,amount_paid_usd,package_image_id,amount_over_fmt,item_slug,addl_count
52057,1599804596.68458&//amphiarecords.bandcamp.com/...,https://f4.bcbits.com/img/a0826275630_7.jpg,t,1599805000.0,au,,Australia,t,€2,2.0,...,2.0,,Amorf,EUR,Dimensions EP,2.36,,,,
367661,1600461044.51009&//diamondortiz.bandcamp.com/a...,https://f4.bcbits.com/img/0021672698_37.jpg,p,1600461000.0,nl,,Netherlands,a,$25,25.0,...,25.0,,Diamond Ortiz,USD,Classy Chassis,25.0,21672698.0,,,
733003,1601234408.80457&//sundaybest.bandcamp.com/alb...,https://f4.bcbits.com/img/a1130873696_7.jpg,a,1601234000.0,us,,United States,a,£7,7.0,...,7.0,,Wild Smiles,GBP,,8.92,,,,
58964,1599819547.21387&//catherinemajor.bandcamp.com...,https://f4.bcbits.com/img/a0280256802_7.jpg,t,1599820000.0,fr,,France,t,$2 CAD,1.0,...,2.0,,Catherine Major,CAD,Carte mère,1.52,,,,
538282,1600864533.98496&//agvarcrds.bandcamp.com/trac...,https://f4.bcbits.com/img/a0664932743_7.jpg,t,1600865000.0,id,,Indonesia,t,$2,2.0,...,2.0,,O.L.I.V.I.A x MAJA,USD,,2.0,,,,
943465,1601633004.99411&//deathkvltproductions.bandca...,https://f4.bcbits.com/img/0021653633_37.jpg,p,1601633000.0,us,,United States,a,£7,7.0,...,7.0,,Lamp of Murmuur,GBP,Heir of Ecliptical Romanticism,9.0,21653633.0,,,
725414,1601220007.31343&//iamtheos.bandcamp.com/track...,https://f4.bcbits.com/img/a0668718392_7.jpg,t,1601220000.0,gb,,United Kingdom,t,€1.29,1.29,...,1.29,,THEOS,EUR,,1.5,,,,
951591,1601635557.96612&//glossymistakes.bandcamp.com...,https://f4.bcbits.com/img/0021814716_37.jpg,p,1601636000.0,gb,,United Kingdom,a,€22,22.0,...,22.0,,Yas-Kaz,EUR,Yas-Kaz - Jomon-sho (縄文頌),25.78,21814716.0,,,
724779,1601218902.48877&//twrp.bandcamp.com/album/gua...,https://f4.bcbits.com/img/a4126128372_7.jpg,a,1601219000.0,jp,,Japan,a,$6,6.0,...,6.0,,TWRP,USD,,6.0,,,,
406702,1600538676.38562&//boldconnectionsunlimited.ba...,https://f4.bcbits.com/img/a2614982620_7.jpg,t,1600539000.0,be,,Belgium,t,$5,5.0,...,5.0,,Barbara Marciniak,USD,,5.0,,,,


In [11]:
df3.shape

(1000000, 23)