# **Preprocessing**

* Tune all guitars to 7-string B standard
* Lowest note in this tuning is 35 (low B)
* Highest note on a 24-fret guitar in this tuning is 88
* This gives a dictionary size of 55 tokens
    * 54 pitches in [35, 88] inclusive + a rest note
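These numbers can be double-checked by computing them from the open-string pitches. This is a small sketch that assumes standard MIDI note numbering with middle C = 60. The names `OPEN_STRINGS` and `NUM_FRETS` are illustrative and do not come from the notebook.

```python
# Open-string MIDI note numbers for 7-string B standard (B E A D G B E),
# assuming the common convention where middle C (C4) = 60.
OPEN_STRINGS = {"B1": 35, "E2": 40, "A2": 45, "D3": 50, "G3": 55, "B3": 59, "E4": 64}
NUM_FRETS = 24  # a 24-fret neck, as stated above

lowest = min(OPEN_STRINGS.values())               # open low B string
highest = max(OPEN_STRINGS.values()) + NUM_FRETS  # 24th fret on the high E string

num_pitches = highest - lowest + 1  # inclusive range of pitches
num_tokens = num_pitches + 1        # plus one rest token

print(lowest, highest, num_pitches, num_tokens)  # 35 88 54 55
```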

*To add note lengths, we must multiply this dictionary size by the number of distinct note lengths…*

Dictionary of note lengths:  
    [32nd, 16th, 8th, quarter, half, whole,  
     dotted {16th, 8th, quarter, half, whole},  
     triplet {16th, 8th, quarter}, two whole]

This brings our total dictionary size up to…  

**15 note lengths × 55 note tokens = 825 total**
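The multiplication can be made concrete by enumerating every (pitch, length) pair and giving each one an integer ID. This is a minimal sketch: representing the rest as `None` and ordering the IDs pitch-major are assumptions for illustration, not something the notebook specifies.

```python
from itertools import product

# The 15 note lengths listed above
NOTE_LENGTHS = (
    ["32nd", "16th", "8th", "quarter", "half", "whole"]
    + ["dotted " + n for n in ["16th", "8th", "quarter", "half", "whole"]]
    + ["triplet " + n for n in ["16th", "8th", "quarter"]]
    + ["two whole"]
)

# 54 pitches in [35, 88] plus a rest, represented here by None (an assumption)
PITCH_TOKENS = list(range(35, 89)) + [None]

# Every (pitch, length) pair becomes one vocabulary entry, pitch-major order
VOCAB = list(product(PITCH_TOKENS, NOTE_LENGTHS))
TOKEN_ID = {pair: idx for idx, pair in enumerate(VOCAB)}

print(len(NOTE_LENGTHS), len(PITCH_TOKENS), len(VOCAB))  # 15 55 825
print(TOKEN_ID[(40, "quarter")])  # 78
```

With pitch-major ordering, the ID is simply `pitch_index * 15 + length_index`, so it can also be computed without building the dictionary.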

Lengths below are in MIDI ticks, at 960 ticks per quarter note.

### Standard
32nd = 120  
16th = 240  
8th = 480  
Quarter = 960  
Half = 1920  
Whole = 3840  
Two whole = 7680  

### Dotted
16th = 360  
8th = 720  
Quarter = 1440  
Half = 2880  
Whole = 5760  

### Triplet
16th = 160  
8th = 320  
Quarter = 640
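All three tables follow from one fact: a quarter note is 960 ticks. A dot multiplies a length by 3/2, and a triplet fits three notes into the time of two (2/3 of the base length). The sketch below regenerates the tables from those rules using exact fractions; `TICKS_PER_QUARTER`, `BASE` and `ticks` are illustrative names.

```python
from fractions import Fraction

TICKS_PER_QUARTER = 960  # inferred from "Quarter = 960" above

# Base durations as fractions of a whole note
BASE = {"32nd": Fraction(1, 32), "16th": Fraction(1, 16), "8th": Fraction(1, 8),
        "quarter": Fraction(1, 4), "half": Fraction(1, 2), "whole": Fraction(1),
        "two whole": Fraction(2)}

def ticks(whole_fraction: Fraction) -> int:
    """Convert a fraction of a whole note to MIDI ticks (a whole note = 4 quarters)."""
    t = whole_fraction * 4 * TICKS_PER_QUARTER
    if t.denominator != 1:
        raise ValueError(f"{whole_fraction} is not a whole number of ticks")
    return int(t)

standard = {name: ticks(f) for name, f in BASE.items()}
# A dot adds half the note's own value: 3/2 of the base length
dotted = {name: ticks(BASE[name] * Fraction(3, 2))
          for name in ["16th", "8th", "quarter", "half", "whole"]}
# A triplet plays three notes in the time of two: 2/3 of the base length
triplet = {name: ticks(BASE[name] * Fraction(2, 3))
           for name in ["16th", "8th", "quarter"]}

print(standard)
print(dotted)
print(triplet)
```

Raising on a non-integer result is a cheap guard: at a lower resolution (for example 96 ticks per quarter), a 32nd-note triplet would no longer land on a whole tick.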

In [1]:
import os
import re
import numpy as np

from mido import MidiFile
from multiprocessing import pool

In [None]:
MIN_PITCH = 32

In [2]:
midi_dir = "./midi"
save_dir = "./data"

In [3]:
os.makedirs(save_dir, exist_ok=True)

In [4]:
files = [os.path.join(midi_dir, file) for file in os.listdir(midi_dir) if file.lower().endswith((".mid", ".midi"))]

In [5]:
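for file in files:
    mid = MidiFile(file)

    # Skip track 0, which typically holds only tempo and other meta events
    for i, track in enumerate(mid.tracks[1:]):

        note_on = False
        notes = dict(pitch=[], length=[])

        for msg in track:
            # MIDI allows a note_on with velocity 0 to act as a note_off
            is_note_off = msg.type == "note_off" or (
                msg.type == "note_on" and msg.velocity == 0)

            if msg.type == "note_on" and not is_note_off:
                pitch = msg.note

                if not note_on:
                    notes['pitch'].append(pitch)
                    note_on = True

                # For chords, keep only the lowest pitch
                elif pitch < notes['pitch'][-1]:
                    notes['pitch'][-1] = pitch

            elif is_note_off and note_on:
                # msg.time is the delta in ticks since the previous event;
                # convert it to a count of 16th notes (ticks // 240 at 960 PPQ)
                notes['length'].append(msg.time * 4 // mid.ticks_per_beat)
                note_on = False

        # Drop a trailing note that never received a note-off
        n = min(len(notes['pitch']), len(notes['length']))
        song = np.stack([notes['pitch'][:n], notes['length'][:n]], axis=1)
        basename = os.path.splitext(os.path.basename(file))[0]
        filename = basename + " - {}".format(i)

        np.save(os.path.join(save_dir, filename), arr=song)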
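The loop above keeps one entry per note-on/note-off cycle: the lowest pitch sounding in a chord, and the delta time of the first note-off, measured in 16th notes. The stdlib-only sketch below reproduces that pairing logic on a hand-built track so the behavior is easy to check. It assumes each MIDI message is reduced to a hypothetical `(type, note, velocity, time)` tuple instead of a mido `Message`, and, like the MIDI spec, it treats a velocity-0 note_on as a note-off. `extract_notes` and `TICKS_PER_16TH` are illustrative names.

```python
from typing import List, Tuple

# Hypothetical simplified message: (type, note, velocity, delta_time_in_ticks)
Msg = Tuple[str, int, int, int]

TICKS_PER_16TH = 240  # assumes 960 ticks per quarter note

def extract_notes(track: List[Msg]) -> List[Tuple[int, int]]:
    """Return (lowest_pitch, length_in_16ths) pairs using the notebook's pairing logic."""
    pairs = []
    sounding = False
    pitch = 0
    for kind, note, velocity, time in track:
        is_off = kind == "note_off" or (kind == "note_on" and velocity == 0)
        if kind == "note_on" and not is_off:
            if not sounding:
                pitch, sounding = note, True
            elif note < pitch:  # chord: keep the lowest note
                pitch = note
        elif is_off and sounding:
            # The first note-off's delta time stands in for the note length
            pairs.append((pitch, time // TICKS_PER_16TH))
            sounding = False
    return pairs

# A power chord (40 + 47) held for a quarter note, then a single 8th note (52)
track = [
    ("note_on", 40, 100, 0), ("note_on", 47, 100, 0),
    ("note_off", 40, 0, 960), ("note_off", 47, 0, 0),
    ("note_on", 52, 100, 0), ("note_on", 52, 0, 480),  # velocity-0 note_on = off
]
print(extract_notes(track))  # [(40, 4), (52, 2)]
```

Note that the second note-off of the chord is ignored because nothing is marked as sounding by then, which is why only one row per chord reaches the saved array.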