Note: This is not a proper, production-ready database! It's sole purpose is to demonstrate some SQL performance optimization techniques in PostgreSQL.
Music Streamer DB is a database model for hypothetical music streaming service that suffers from performance issues when loaded with large quantities of data on a single machine.
DDL available here.
- 1,612,297 artists
- 23,969,487 tracks
- 4 levels of quality (recording types)
- 95,877,948 recordings
- 1,990,576 albums
- 1,001 user
- 1,040 playlists
- 5,214,584 records in play_history
Oracle VirtualBox virtual machine with 8GB of RAM and 2 CPUs, running Ubuntu linux 20.04.2 LTS and PostgreSQL 12.6.
PostgreSQL default configuration is updated with these settings.
Some random testing shows that performance of most of the queries is bad at best. Depending on the amount of buffered data in memory, some queries can last up to a minute. Imagine waiting for 48 seconds on Spotify to start playing a song 😦
Query | Best time | Worst time | Optimization steps | New time |
---|---|---|---|---|
Get recording | 4s | 48s | Steps | 2-10ms |
Artist search | 4.3s | 21.2s | Steps | <100ms |
Track search | 12.4s | 31.6s | Steps | 5-800ms* |
Album search | 4.8s | 22.2s | Steps | 5-150ms* |
Playlist search | 20ms | 1s | Steps | <3ms |
Recently played | 275ms | 44s | Steps | <10ms |
Popular artists | 4s | 54s | Steps | <3ms |
Favorite artists | 200ms | 11s | Steps | <3ms |
* Due to large amount of data, some data cannot be obtained faster when located farther within the index, like rarely played songs for example.
Moving complex trigger work into the background process is demonstrated here, with a sample app written in C#.
MIT