<a href="https://colab.research.google.com/github/Benned-H/LSTMjazz/blob/master/Keras_Time.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import os
import glob
import pandas as pd
import numpy as np

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

In [2]:
!ls

gdrive	sample_data


In [0]:
os.chdir('gdrive/My Drive/Datasets')

In [4]:
!ls

'Chords Bits'  'Chords Tokens'	'Melodies 18-bit'  'Melodies Piano Roll'


In [5]:
# We have all of our data in these subfolders.
len(glob.glob("*/*.csv"))

2400

# Additional Data Formatting Considerations

Drawing on my earlier summaries of previous work, here is an edited description of how to format the data for Keras:

We have a dataset $D=(X, Y)$ of "labelled" chord progression segments, where $X = \{X_1, X_2, ..., X_{|X|}\}$ and $Y = \{Y_1, Y_2, ..., Y_{|X|}\}$. We thus have $|X|$ chord progression segments, and each $Y_i$ is the melody label for the corresponding $X_i$. My original piano-roll matrix has dimensions $(\text{# timesteps}, |\text{note range}|)$. In the 18-bit case, this is more simply $(\text{# timesteps}, 18)$.

First, we need to sample these matrices into $n$-timestep-long sequences of chord data (these are our $X_i$). We'll then label each of these with the melody information from the timestep that follows it, i.e. timestep $n+1$ of the window (this is the matching $Y_i$). Because every sample needs $n+1$ consecutive timesteps, a song of $T$ timesteps yields $S = T - n$ samples.
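A minimal sketch of this windowing, assuming the chords and the melody of one song are NumPy arrays with one row per timestep. The function name `make_windows` and the toy shapes are hypothetical and are not taken from the notebook's data. In this sketch a song of $T$ timesteps gives $T - n$ windows, since each window needs one following timestep for its label.

```python
import numpy as np

def make_windows(chords, melody, n):
    """Cut one song into n-timestep chord windows, each labelled
    with the melody row of the timestep right after the window."""
    chords = np.asarray(chords)
    melody = np.asarray(melody)
    if len(chords) != len(melody):
        raise ValueError("chords and melody must have the same number of timesteps")
    num_samples = len(chords) - n  # each window needs one extra timestep for its label
    if num_samples <= 0:
        raise ValueError("song is too short for windows of length n")
    # X has shape (num_samples, n, chord_features)
    X = np.stack([chords[i:i + n] for i in range(num_samples)])
    # Y has shape (num_samples, melody_features); row i is the label for window i
    Y = melody[n:n + num_samples]
    return X, Y

# Toy song (hypothetical sizes): 10 timesteps, 4 chord features, 18 melody bits.
chords = np.arange(10 * 4).reshape(10, 4)
melody = np.arange(10 * 18).reshape(10, 18)
X, Y = make_windows(chords, melody, n=3)
print(X.shape, Y.shape)
```

The `(samples, timesteps, features)` layout of `X` is the input shape Keras recurrent layers expect.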

Also, in the work of Brunner et al. (2017), their LSTM received vectors of piano rolls with these appended features:
1. Embedded chord vector of the next time step.
2. Embedded chord vector of the chord following that chord.
3. A binary counter from 0 to 7 each bar.
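
To make the third feature concrete, here is one plausible reading of "a binary counter from 0 to 7": the position within the bar written as 3 bits. The exact encoding used in the paper is not reproduced here, so the helper `binary_count` and its bit order (most significant bit first) are assumptions for illustration only.

```python
def binary_count(step, num_bits=3):
    """Return `step` as a list of bits, most significant bit first."""
    if not 0 <= step < 2 ** num_bits:
        raise ValueError("step does not fit in num_bits bits")
    return [(step >> b) & 1 for b in reversed(range(num_bits))]

# Counting 0..7 within a bar takes 3 bits.
for step in range(8):
    print(step, binary_count(step))

# A plain counter over 48 positions would need 6 bits instead.
print(binary_count(47, num_bits=6))
```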

In my case, I would need either to count through all 48 offsets of a measure or to simplify the counter so that its bits only mark on-beat, off-beat, sixteenth, and triplet positions. What about:

Bit 8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1
--- | --- | --- | --- | --- | --- | --- | ---
Third triplet | Second triplet | Any triplet | On any down-beat | On any half-note | On any beat | On any 8th | On any 16th

So I'd calculate these based on the offset within each measure. The offset would range from 0 to 47:

Bit 8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1
--- | --- | --- | --- | --- | --- | --- | ---
Offset % 12 = 8 | Offset % 12 = 4 | Offset % 4 = 0 | Offset % 48 = 0 | Offset % 24 = 0 | Offset % 12 = 0 | Offset % 6 = 0 | Offset % 3 = 0

In [0]:
def binaryCounter(offset):
  """
  Returns a one-row DataFrame holding the eight timing bits for an offset
  within a measure (0 to 47), ordered from bit 8 down to bit 1 as in the
  encoding table above.
  """
  bit8 = (offset % 12 == 8)
  bit7 = (offset % 12 == 4)
  bit6 = (offset % 4 == 0)
  bit5 = (offset % 48 == 0)
  bit4 = (offset % 24 == 0)
  bit3 = (offset % 12 == 0)
  bit2 = (offset % 6 == 0)
  bit1 = (offset % 3 == 0)
  return pd.DataFrame(np.array([bit8, bit7, bit6, bit5, bit4, bit3, bit2, bit1])).T

In [17]:
binaryCounter(4)

Unnamed: 0,0,1,2,3,4,5,6,7
0,False,True,True,False,False,False,False,False
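
Building one DataFrame per timestep gets slow for whole songs, so a vectorized variant may be handy. The sketch below applies the same modulo rules to every timestep at once and appends the result to a piano roll. It assumes timestep 0 falls on the start of a measure; `counter_matrix`, `TICKS_PER_MEASURE`, and the all-zero `roll` are hypothetical names and data for illustration.

```python
import numpy as np

TICKS_PER_MEASURE = 48  # offsets run from 0 to 47 within a measure

def counter_matrix(num_timesteps):
    """Return a (num_timesteps, 8) array of 0/1 timing bits,
    with columns ordered from bit 8 down to bit 1."""
    # Assumption: timestep 0 is the first tick of a measure.
    offsets = np.arange(num_timesteps) % TICKS_PER_MEASURE
    bits = np.column_stack([
        offsets % 12 == 8,   # bit 8: third triplet
        offsets % 12 == 4,   # bit 7: second triplet
        offsets % 4 == 0,    # bit 6: any triplet
        offsets % 48 == 0,   # bit 5: down-beat
        offsets % 24 == 0,   # bit 4: half-note
        offsets % 12 == 0,   # bit 3: beat
        offsets % 6 == 0,    # bit 2: 8th
        offsets % 3 == 0,    # bit 1: 16th
    ])
    return bits.astype(np.int8)

features = counter_matrix(96)          # two measures
roll = np.zeros((96, 18), dtype=np.int8)  # placeholder 18-bit melody roll
augmented = np.hstack([roll, features])   # timing bits appended per timestep
print(features[4])       # same pattern as binaryCounter(4), as 0/1
print(augmented.shape)
```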
