Convolutional Neural Network for multitrack mix leveling


Multitrack mix leveling with convolutional neural nets.


Install dependencies.

$ pip install --upgrade -r requirements.txt

Install the Python ITU-R BS.1770-4 loudness package (pyloudnorm).

$ git clone
$ cd pyloudnorm
$ python install


Download and extract the DSD100 dataset (12 GB).

Ensure that the extracted DSD100 directory is placed at the top level of the repository's directory structure.


To generate the input and output data, run the script.

$ python

This first measures the true mix loudness levels and calculates loudness ratios with respect to the bass, which are saved to a .csv file. All of the stems are then normalized to -24 LUFS. Finally, melspectrograms of the normalized stems are generated with a frame size of 1024 and a hop length of the same size, and stored in a pickle file.
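The loudness bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not the repository's script: it assumes the "loudness ratios" are stored as differences in LU (dB) relative to the bass stem, and the function names and example loudness values are hypothetical. The actual loudness measurement (done with pyloudnorm in this project) is left out; the sketch starts from already-measured LUFS values.

```python
TARGET_LUFS = -24.0  # normalization target stated in the text


def loudness_relative_to_bass(stem_lufs):
    """Return each stem's loudness relative to the bass stem, in LU (dB).

    Assumption: the "ratio" is expressed as a difference of LUFS values,
    which corresponds to a ratio of energies on a linear scale.
    """
    bass = stem_lufs["bass"]
    return {name: lufs - bass for name, lufs in stem_lufs.items()}


def normalization_gain(measured_lufs, target_lufs=TARGET_LUFS):
    """Linear amplitude gain that shifts a stem from measured_lufs to target_lufs.

    Scaling a signal's amplitude by g changes its loudness by 20*log10(g) dB.
    """
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)


# Hypothetical measurements for the four DSD100 stems of one song.
example = {"bass": -20.0, "drums": -18.0, "other": -22.0, "vocals": -16.0}
ratios = loudness_relative_to_bass(example)
vocal_gain = normalization_gain(example["vocals"])
```

Multiplying a stem's samples by its gain brings it to the -24 LUFS target, while the stored ratios keep the information about how loud each stem was in the original mix.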

During training, the melspectrograms of each subgroup are framed with a frame size of 128 (about 3 seconds of audio) and then stacked depth-wise to produce inputs of size 128x128x4. A single stack of TF-patches of length 128 is shown below for a single song in the data.


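The framing and depth-wise stacking described above can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the repository's code: it assumes 128 mel bands (implied by the 128x128x4 input size), non-overlapping patches with any leftover frames dropped, and a 44.1 kHz sample rate. The function name `make_patches` and the random input data are hypothetical.

```python
import numpy as np

PATCH_FRAMES = 128   # frames per patch, from the text
HOP_LENGTH = 1024    # hop length in samples, from the text
SAMPLE_RATE = 44100  # assumption: 44.1 kHz audio


def make_patches(stem_specs, patch_frames=PATCH_FRAMES):
    """Cut per-stem melspectrograms into patches and stack stems depth-wise.

    stem_specs: list of arrays, each of shape (n_mels, n_frames).
    Returns an array of shape (n_patches, n_mels, patch_frames, n_stems).
    """
    n_mels = stem_specs[0].shape[0]
    n_frames = min(spec.shape[1] for spec in stem_specs)
    n_patches = n_frames // patch_frames
    usable = n_patches * patch_frames
    # Trim every stem to a whole number of patches, then stack along depth.
    stacked = np.stack([spec[:, :usable] for spec in stem_specs], axis=-1)
    # Split the time axis into (patch index, frame within patch).
    patches = stacked.reshape(n_mels, n_patches, patch_frames, len(stem_specs))
    return patches.transpose(1, 0, 2, 3)


# Random stand-ins for the melspectrograms of four stems of one song.
rng = np.random.default_rng(0)
specs = [rng.random((128, 300)) for _ in range(4)]
patches = make_patches(specs)

# Duration covered by one patch, in seconds.
patch_seconds = PATCH_FRAMES * HOP_LENGTH / SAMPLE_RATE
```

With 300 frames per stem this yields two patches of shape 128x128x4, and under the 44.1 kHz assumption each patch spans just under 3 seconds, consistent with the figure quoted above.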

To train the CNN model, run the script.

$ python