# Cormorant 
---
## Summary
Everyone thinks they're special.  In fact, I feel that no one else's taste in music is quite close enough to mine.  This is probably false, but it does provide an interesting premise for an ML project.

Recommendation systems typically work (or at least used to) by saying Susie likes song A, Betsy likes song A and B, so Susie probably would like song B.  
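
To make the Susie/Betsy idea concrete, here is a minimal sketch of that style of recommendation. The `recommend` function and the `likes` dictionary are made up for illustration; real systems are far more elaborate.

```python
from collections import Counter

def recommend(likes, user):
    """Suggest songs liked by users who share at least one liked song with `user`."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue
        # Each song the neighbour likes that `user` has not liked yet
        # gets a vote weighted by how much the two tastes overlap.
        for song in theirs - mine:
            scores[song] += overlap
    return [song for song, _ in scores.most_common()]

likes = {"Susie": {"A"}, "Betsy": {"A", "B"}}
print(recommend(likes, "Susie"))  # prints ['B']
```

The weakness is visible right away: the suggestion depends entirely on what other people liked, not on anything about the song itself.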

That's not good enough for a special snowflake like myself, who only likes a very certain type of EDM (at least that's what I'm thinking at the outset).

What I'm going to do instead is train a classifier on song spectrograms to pick up the features of songs I enjoy.  From there, I'll make a YouTube crawler to bring me new music (hence the name Cormorant).

## Plan




1.  Pretraining
    - ImageNet for the base model.
    - Training on the larger dataset, then transferring the weights over.
    
    
2.  Data
    - Need to convert YouTube playlists to spectrograms.
    - Train, Test, Validation Sets
    
    

3.  Learning
    - This is a very simple binary classifier (do I like this music or not?)
    - Or do I want to do Regression for how MUCH I like the song?
    

4.  Action
    - YouTube crawler, with seed keywords and URLs.
    - Some mechanism for live learning and updating of the net - will probably end up collecting minibatches, then updating all at once.
    - Those that pass need to be sent up for review by me somehow - adding to Google Music probably.
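
For the train/test/validation item in the plan, a minimal sketch of a reproducible split. It splits by track ID rather than by image, on the assumption that one song may end up producing several spectrograms; `split_tracks` and the default fractions are illustrative choices, not something already in the repository.

```python
import random

def split_tracks(track_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle track IDs with a fixed seed and cut them into three disjoint sets."""
    ids = sorted(track_ids)              # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)     # private RNG, leaves the global random state alone
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "valid": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }

splits = split_tracks(range(1000))
print({name: len(ids) for name, ids in splits.items()})
```

Keeping every spectrogram of a given track in the same set avoids the validation score being inflated by near-duplicates of training data.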
    
    
    
### Key Technologies Explored

- YouTube API
- Online learning for a neural network
- Neural network with a smaller dataset



---

## Step 0: Getting a dataset to train on
---
Since the Million Song Dataset has become a bit hard to come by in the 9 years since it was published, we're using the FMA dataset from `https://github.com/mdeff/fma`.

Going to use the large dataset (`fma_large`): roughly 100k tracks, each a 30s clip, already classified.

It should be easy enough to run the spectrogram converter on them and classify them.

In [1]:
import os

In [3]:
!apt update && apt install -y wget

Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Get:4 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [116 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:6 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [10.1 kB]
Get:7 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [1089 kB]
Ign:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release [697 B]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
Get:11 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 

In [5]:
!wget https://os.unil.cloud.switch.ch/fma/fma_large.zip

--2020-09-16 05:06:32--  https://os.unil.cloud.switch.ch/fma/fma_large.zip
Resolving os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)... 86.119.28.16, 2001:620:5ca1:201::214
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100306112191 (93G) [application/zip]
Saving to: 'fma_large.zip'

fma_large.zip        33%[=====>              ]  31.35G  9.14MB/s    in 59m 9s  

2020-09-16 06:05:42 (9.05 MB/s) - Connection closed at byte 33662652544. Retrying.

--2020-09-16 06:05:43--  (try: 2)  https://os.unil.cloud.switch.ch/fma/fma_large.zip
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 100306112191 (93G), 66643459647 (62G) remaining [application/zip]
Saving to: 'fma_large.zip'


2020-09-16 07:05:03 (9.06 MB/s) - Connection closed at byte 67487679040. Retrying.

--2020-09-16 07:

### Worries
So I'm a little concerned at this point.  I looked at the Kaggle competition for the Million Song Dataset - the top results were not promising.  Hopefully, since I'm just classifying whether or not I like the song, I can get better results.  If this fails, I'll look back at this moment as the one where the small cloud on the horizon popped up.

I've always been a little concerned that the spectrograms couldn't capture enough subtle differences between songs to make a distinction - i.e. that the signal was too faint to detect.  Since it's very hard to hear a picture (outside of synesthesia), it's hard to reason intuitively about what the spectrogram actually means.
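
One way to build some intuition is to look at what a spectrogram does with a signal whose content is known. The sketch below is not the repository's converter; it is a stripped-down NumPy version with an arbitrary sample rate and frame size, fed a pure 440 Hz tone.

```python
import numpy as np

def power_spectrogram(samples, frame_size=1024, hop=512):
    """Slice the signal into overlapping Hann-windowed frames and take |FFT|^2 of each."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([samples[i * hop:i * hop + frame_size] for i in range(n_frames)])
    # Result has one row per time frame and one column per frequency bin.
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

sample_rate = 22050                                # Hz, chosen only for this example
t = np.arange(2 * sample_rate) / sample_rate       # two seconds of sample times
tone = np.sin(2 * np.pi * 440.0 * t)               # pure 440 Hz sine

spec = power_spectrogram(tone)
peak_bin = spec.mean(axis=0).argmax()              # brightest frequency column on average
peak_hz = peak_bin * sample_rate / 1024            # bin index -> frequency for frame_size=1024
print(f"strongest bin is centred at {peak_hz:.1f} Hz")
```

A single tone becomes one bright line at roughly its own frequency (to within one bin width). A real song is a stack of many such lines changing over time, which is the texture the classifier would have to pick up on.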

---

## Step 1: Spectrograms
---
This work is already done in the Cormorant GitHub repository, so we'll download a copy of that and go from there.

In [None]:
!apt update && apt -y install git

In [10]:
!git clone https://github.com/amunchet/cormorant.git

Cloning into 'cormorant'...
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 49 (delta 14), reused 42 (delta 10), pack-reused 0
Unpacking objects: 100% (49/49), done.


In [25]:
!apt install -y unzip

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  zip
The following NEW packages will be installed:
  unzip
0 upgraded, 1 newly installed, 0 to remove and 50 not upgraded.
Need to get 167 kB of archives.
After this operation, 558 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 unzip amd64 6.0-21ubuntu1 [167 kB]
Fetched 167 kB in 1s (256 kB/s)
debconf: delaying package configuration, since apt-utils is not installed

Selecting previously unselected package unzip.
(Reading database ... 16385 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-21ubuntu1_amd64.deb ...
Unpacking unzip (6.0-21ubuntu1) ...

In [None]:
!unzip fma_large.zip

## Step 2: Convert training data to spectrogram and move to proper folders
---
Will need to run spectrogram converter on all ~100k files and put them in the proper folders.  

Do we want to multithread this?  FFmpeg can use the GPU, but I'm not sure the spectrogram generation itself can.

Probably want to do 4 threads (seemed to work well last time).
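
A minimal sketch of what the 4-worker version could look like with the standard library. `convert_all` and `convert_one` are hypothetical names; `convert_one` stands in for whatever function turns one mp3 path into a spectrogram.

```python
from concurrent.futures import ThreadPoolExecutor

def convert_all(mp3_paths, convert_one, workers=4):
    """Run convert_one over every path with a small thread pool.

    Returns (path, result) pairs in input order; if a conversion raises,
    the exception object is stored as the result so one bad file does not
    abort the whole run.
    """
    def safe(path):
        try:
            return path, convert_one(path)
        except Exception as exc:
            return path, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe, mp3_paths))

# Toy usage with a stand-in converter:
results = convert_all(["000002.mp3", "000003.mp3"], lambda p: p.replace(".mp3", ".png"))
print(results)
```

Threads are a reasonable fit when each conversion mostly waits on an external `ffmpeg` process. If the heavy lifting happens in Python itself, swapping in `ProcessPoolExecutor` may work better.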

### More Thoughts
- Our average song length (NCS) is going to be around 3 minutes.  However, I don't want to train only on 30-second clips when the actual data is going to be around 1-5 minutes long.  I think I'm going to randomize and repeat the entries.
- This is going to take up quite a bit of space.  I'd better check that it's okay.
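
For the space check, a rough sketch using `shutil.disk_usage`. The helper name, the per-file size, and the safety margin are all placeholders; the real per-spectrogram size would have to be measured on a few converted files first.

```python
import shutil

def fits_on_disk(n_files, bytes_per_file, target_dir=".", safety=1.2):
    """Compare an estimated output size (with a safety margin) to the free space.

    Returns (fits, bytes_needed, bytes_free).
    """
    needed = int(n_files * bytes_per_file * safety)
    free = shutil.disk_usage(target_dir).free
    return needed <= free, needed, free

# ~100k files at an assumed 200 kB per spectrogram image
ok, needed, free = fits_on_disk(100_000, 200_000)
print(f"need about {needed / 1e9:.1f} GB, have {free / 1e9:.1f} GB free, fits: {ok}")
```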

In [2]:
pip install scipy

Collecting scipy
  Downloading https://files.pythonhosted.org/packages/c8/89/63171228d5ced148f5ced50305c89e8576ffc695a90b58fe5bb602b910c2/scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl (25.9MB)
    100% |################################| 25.9MB 83kB/s  eta 0:00:011
Installing collected packages: scipy
Successfully installed scipy-1.5.4
Note: you may need to restart the kernel to use updated packages.


In [1]:
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
import os

In [14]:
ls /mnt/sdf/fma_large/000 | head -n 20

000002.mp3
000003.mp3
000005.mp3
000010.mp3
000020.mp3
000026.mp3
000030.mp3
000046.mp3
000048.mp3
000134.mp3
000135.mp3
000136.mp3
000137.mp3
000138.mp3
000139.mp3
000140.mp3
000141.mp3
000142.mp3
000144.mp3
000145.mp3


In [16]:
cd phase_one

/data/Cormorant/cormorant/phase_one


In [18]:
cp /mnt/sdf/fma_large/000/000002.mp3 .

In [19]:
ls

000002.mp3  cormorant.py*  requirements.txt  spider/
README.md   generator/     spectrogram.py*


In [28]:
!apt update && apt install -y cython

Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:5 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:6 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:8 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:10 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
cython is already the newest version (0.26.1-0.4).
0 upgraded, 0 n

### Installing Librosa
This requires llvm-10 to be installed.  That is a pain and requires symlinking `llvm-config` in the path.

#### Final conclusion: Librosa installation is horrible.  Going to convert all files to wav instead - way easier.  Which is insane.

In [4]:
cat */spectrogram.py

#!/usr/bin/env python3
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
import os

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)   
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)    
    # cols for windowing
    cols = np.ceil( (len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    return np.fft.rfft(frames)    

""" scale frequency axis logarithmically """ 

### Step 2A: Converting mp3 files to wav

So the way that I did this in OneTrickFan was to use an `ffmpeg` Docker container with the Nvidia runtime.  Several problems have emerged, however.  The primary one is that MP3 compression is very efficient and WAV is uncompressed - meaning my disk usage will increase by about 20x.  

This means that I will instead have to generate the spectrograms in place, all in one shot, cleaning up any WAV files I leave behind.
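
A sketch of that convert-then-clean-up flow, so that at most one WAV per worker exists on disk at any time. It assumes an `ffmpeg` binary on the PATH rather than the Docker setup, and mixes down to mono purely to keep the example simple; `mp3_to_spectrogram`, `ffmpeg_to_wav`, and `make_spectrogram` are illustrative names, with `make_spectrogram(wav_path, out_path)` standing in for the repository's converter.

```python
import os
import subprocess
import tempfile

def ffmpeg_to_wav(mp3_path, wav_path):
    # Assumption: plain ffmpeg on PATH. -y overwrites the target, -ac 1 mixes to mono.
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-i", mp3_path, "-ac", "1", wav_path],
        check=True,
    )

def mp3_to_spectrogram(mp3_path, out_path, make_spectrogram, to_wav=ffmpeg_to_wav):
    """Decode to a temporary WAV, build the spectrogram, and always delete the WAV."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # only the path is needed; the decoder writes the file itself
    try:
        to_wav(mp3_path, wav_path)
        make_spectrogram(wav_path, out_path)
    finally:
        # Runs on success and on failure, so no stray WAVs pile up.
        if os.path.exists(wav_path):
            os.remove(wav_path)
    return out_path
```

Passing the decoder in as `to_wav` keeps the clean-up logic testable without ffmpeg and makes it easy to swap the Docker-based call back in.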

---



## Step 3: Initial Model training with spectrograms from large Dataset

One of the main questions is which to use: TensorFlow or FastAI?

I think I'll still use one of the larger ImageNet-pretrained models for transfer learning, so FastAI.  I should then be able to train on the large dataset, then transfer over to the much smaller actual dataset.

I don't really have a need for an autoencoder, since this is a pretty well-labelled dataset that I'll be transfer learning from.



## Step 4: Dataset splitting - songs I like/don't

## Step 5: Binary Classifier on Songs

### Step 5a: Data verification and spot checking

## Step 6: YouTube API exploration and crawling
- I've done this in `spider`


## Step 7: Online Learning/Updating a neural net

So what I'm actually thinking of doing is having batches of songs presented, then graded.  After grading, they'll be added to the data collection of spectrograms and then have a new model trained.  Then a new batch of top song prospects will be brought in, and so on, and so forth.

Eventually, there should be a large enough dataset to give good results.

This isn't exactly "online" learning, but it is incremental, automated improvement of the model.
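
The present/grade/retrain cycle could be sketched roughly as below. Everything here is a placeholder: `train`, `score`, and `grade` stand in for the real model training, model inference, and my manual review, and `run_round` is just a name for one pass of the loop.

```python
def run_round(labelled, candidates, train, score, grade, batch_size=20):
    """One pass: retrain, pick the top-scoring candidates, grade them, grow the dataset.

    labelled   -- list of (song, liked) pairs collected so far
    candidates -- songs the crawler found that have not been graded yet
    Returns the enlarged labelled list and the candidates still ungraded.
    """
    model = train(labelled)
    # Highest predicted score first; ties keep their original order.
    ranked = sorted(candidates, key=lambda song: score(model, song), reverse=True)
    batch, remaining = ranked[:batch_size], ranked[batch_size:]
    graded = [(song, grade(song)) for song in batch]
    return labelled + graded, remaining

# Toy stand-ins: a "song" is a number, the "model" is the mean of liked songs.
def toy_train(pairs):
    liked = [s for s, ok in pairs if ok]
    return sum(liked) / len(liked)

labelled, remaining = run_round(
    [(8, True), (1, False)], [2, 7, 9, 4],
    train=toy_train,
    score=lambda model, s: -abs(s - model),
    grade=lambda s: s > 5,
    batch_size=2,
)
print(labelled, remaining)
```

Calling `run_round` repeatedly, feeding each round's outputs into the next, gives the batch-by-batch improvement described above without needing true online updates.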