# CL Bach: Using Machine Learning to Compose New Bach Chorales 
----------------
(Page 2)
The dataframe still has embedded music21 objects. (That is why I pickled it instead of exporting it to and importing it from a CSV file.)
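
The reason a CSV would not work is easy to see. CSV stores every cell as text, so Python objects do not survive a round trip, while pickle restores them as they were. The sketch below is illustrative only: it uses `fractions.Fraction` values as a stand-in for the music21 objects (so music21 is not needed), and it uses Python 3's `pickle` module, which plays the role of this notebook's Python 2 `cPickle`.

```python
import io
import pickle
from fractions import Fraction

import pandas as pd

# Stand-in for a column of Python objects (the real dataframe holds music21 notes)
df = pd.DataFrame({'obj': [Fraction(1, 3), Fraction(3, 2)]})

# CSV round trip: each object is written as text and read back as a plain string
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf)

# Pickle round trip: the original Python objects come back intact
from_pickle = pickle.loads(pickle.dumps(df))
```

After the CSV round trip the cells hold strings such as `'1/3'`, whereas the pickled copy still holds `Fraction` objects.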

To use the data in models, each note must be expressed as a categorical or continuous variable. This page does that conversion.

In [1]:
import music21 as m21

import pandas as pd
import numpy as np

import copy
import cPickle


In [2]:
with open('bach.p', 'rb') as f:
    bach_df = cPickle.load(f)

In [3]:
bach_df.head()

| | BWV | orig_key | beats_per_measure | forward_note | back_note | forward_measure | back_measure | note_in_measure_position | quarter_in_measure_position | soprano | alto | tenor | bass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bwv421_0 | bwv421 | a minor | 4 | 0 | 159 | 0 | 10 | 1 | 4 | `<music21.note.Note G>` | `<music21.note.Note D>` | `<music21.note.Note B->` | `<music21.note.Note G>` |
| bwv421_1 | bwv421 | a minor | 4 | 1 | 158 | 0 | 10 | 2 | 4 | `<music21.note.Note G>` | `<music21.note.Note D>` | `<music21.note.Note B->` | `<music21.note.Note G>` |
| bwv421_2 | bwv421 | a minor | 4 | 2 | 157 | 0 | 10 | 3 | 4 | `<music21.note.Note G>` | `<music21.note.Note D>` | `<music21.note.Note C>` | `<music21.note.Note A>` |
| bwv421_3 | bwv421 | a minor | 4 | 3 | 156 | 0 | 10 | 4 | 4 | `<music21.note.Note G>` | `<music21.note.Note D>` | `<music21.note.Note C>` | `<music21.note.Note A>` |
| bwv421_4 | bwv421 | a minor | 4 | 4 | 155 | 1 | 9 | 1 | 1 | `<music21.note.Note G>` | `<music21.note.Note D>` | `<music21.note.Note D>` | `<music21.note.Note B->` |


## Functions that create categorical and continuous pitch variables for each note in the dataframe

Each note's pitch is defined by its pitch letter (taken from the set [C, D, E, F, G, A, B]) and the octave it is in ([0, 1, 2, 3, 4, ...]; see https://en.wikipedia.org/wiki/Scientific_pitch_notation).

For categorical variables, I wanted to keep both pieces of information, so the variable is simply a string made of the pitch letter and the octave: for example, 'C4' is middle C and 'G4' is the G above middle C. In music21, sharps are notated as '#' and flats as '-'. For example, the C# above middle C is 'C#4', and the B-flat below middle C is 'B-3'.
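
To make the naming scheme concrete, here is a minimal, dependency-free sketch that turns a music21-style name such as 'C#4' or 'B-3' into a MIDI note number. This is not how the notebook works (it relies on music21, which exposes the same number through a pitch's `.midi` attribute); the helper name `name_to_midi` is hypothetical, the MIDI convention (middle C, C4 = 60) is an assumption, and so is the '##' / '--' spelling for double sharps and flats.

```python
import re

# Half steps from C up to each natural pitch letter within one octave
LETTER_TO_PC = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

# music21 spells sharps '#' and flats '-'; doubles assumed to be '##' and '--'
ACCIDENTAL_SHIFT = {'': 0, '#': 1, '##': 2, '-': -1, '--': -2}

# letter, optional accidental, then a non-negative octave number
NAME_RE = re.compile(r'^([A-G])(##|#|--|-)?(\d+)$')


def name_to_midi(name):
    """Convert a name like 'C#4' or 'B-3' to a MIDI number (C4 = 60)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError('not a pitch name: %r' % name)
    letter, accidental, octave = match.groups()
    accidental = accidental or ''
    # each octave spans 12 half steps; octave -1 starts at MIDI 0
    return 12 * (int(octave) + 1) + LETTER_TO_PC[letter] + ACCIDENTAL_SHIFT[accidental]
```

For example, `name_to_midi('C4')` is 60 and `name_to_midi('B-3')` is 58, two half steps below middle C.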

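For continuous variables, I set G4, the tonic and the most common note in the soprano line, equal to 1. Every other note is numbered by its distance from G4 in half steps (semitones), so notes below G4 get values below 1 and notes above G4 get values above 1. For example, D4 (five half steps below G4) is -4, and A4 (two half steps above G4) is 3.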
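
In other words, pitch number = 1 + (MIDI number - 67), where 67 is G4's MIDI number. The sketch below checks that formula against values that appear in the `bach_df.head()` output later on this page. The names `G4_MIDI` and `pitch_number_from_midi` are hypothetical, and MIDI numbering (C4 = 60) is an assumption; the notebook itself measures the interval from G4 with music21.

```python
# MIDI number of G4 under the usual convention that middle C (C4) is 60
G4_MIDI = 67


def pitch_number_from_midi(midi):
    """Continuous pitch variable used on this page: G4 -> 1, +1 per half step."""
    return 1 + (midi - G4_MIDI)


# (name, MIDI number, pitch number shown in this page's head() output)
EXAMPLES = [('G4', 67, 1), ('D4', 62, -4), ('B-3', 58, -8),
            ('C4', 60, -6), ('B-2', 46, -20), ('A2', 45, -21), ('G2', 43, -23)]

for name, midi, expected in EXAMPLES:
    assert pitch_number_from_midi(midi) == expected, name
```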

Rests were a little tricky to handle. For categorical variables, I replaced them with 'Rest'. For continuous variables, I would have loved to use np.nan, because it best matches what a rest actually means (the absence of a note), but most models do not handle NaNs well. So, although I'm not sure it will work, I encoded rests as a very large number (1,000,000). Read as half steps above G4, that is so far off the frequency charts that it is way beyond anything human ears can hear! We'll see later how the model likes or doesn't like that.
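
To put a number on "beyond anything human ears can hear", the sketch below converts a pitch number on the G4 = 1 scale to a frequency. It assumes equal temperament with A4 = 440 Hz and takes roughly 20,000 Hz as the upper limit of human hearing; neither assumption comes from the notebook, and the function name `log10_frequency` is hypothetical. It works in log10 because 2 raised to the rest value's exponent would overflow a float.

```python
import math

A4_HZ = 440.0               # assumed tuning: equal temperament with A4 = 440 Hz
HEARING_LIMIT_HZ = 20000.0  # rough upper limit of human hearing
REST_PITCH_NUMBER = 1000000


def log10_frequency(pitch_number):
    """log10 of the frequency (Hz) for a pitch number on the G4 = 1 scale.

    G4 is 1 and A4 is two half steps higher, so A4 is 3 and pitch number p
    lies (p - 3) half steps from A4. Each half step multiplies the frequency
    by 2 ** (1/12), which in log10 space becomes an addition.
    """
    return math.log10(A4_HZ) + (pitch_number - 3) / 12.0 * math.log10(2)


g4_hz = 10 ** log10_frequency(1)                    # about 392 Hz
rest_log10_hz = log10_frequency(REST_PITCH_NUMBER)  # a frequency with 25,000+ digits
```

G4 comes out at about 392 Hz, while the rest value corresponds to a "frequency" whose decimal representation has more than 25,000 digits.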

In [4]:
# A function that determines each pitch's 'pitch number'.
# G4 = 1 is the reference point. I am using G4 because I want to
# put all Bach chorales in the key of G.

REST_PITCH_NUMBER = 1000000  # stand-in value for rests, far above any real pitch

def define_pitch_number(a_note):
    if isinstance(a_note, m21.note.Note):

        # identify how far the pitch is from G4
        interval_from_G4 = m21.interval.Interval(
                                noteStart=m21.note.Note('G4'),
                                noteEnd=a_note)

        # express this interval as a number of half steps (semitones)
        pitch_offset = interval_from_G4.semitones

        # the new pitch number is relative to G4 = 1,
        # so offset 1 by the pitch offset
        new_pitch_number = 1 + pitch_offset

    else:
        # anything that is not a Note (in this dataframe, a Rest)
        new_pitch_number = REST_PITCH_NUMBER

    return new_pitch_number

In [5]:
# make a function to turn each note into a pitch category, i.e. 'G4'
# input is a df
# it adds a '<voice>_pitch_cat' column for each voice
# and returns the same (modified) df
def create_cat_pitch_vars(df):

    for column in ['soprano', 'alto', 'tenor', 'bass']:
        a_list = df[column].tolist()

        # create categorical pitch column
        cat_col_name = column + '_pitch_cat'

        # map each Note to its name with octave; anything else is a rest
        df[cat_col_name] = [x.pitch.nameWithOctave if isinstance(x, m21.note.Note) else 'Rest'
                            for x in a_list]

    return df
    

In [6]:
# make a function to turn each note into a continuous pitch number (G4 = 1)
# input is a df
# it adds a '<voice>_pitch_cont' column for each voice
# and returns the same (modified) df
def create_cont_pitch_vars(df):
    for column in ['soprano', 'alto', 'tenor', 'bass']:
        a_list = df[column]

        # create continuous pitch column
        cont_col_name = column + '_pitch_cont'

        df[cont_col_name] = [define_pitch_number(x) for x in a_list]

    return df

In [7]:
bach_df = create_cat_pitch_vars(bach_df)

In [8]:
bach_df[bach_df['soprano_pitch_cat']=='Rest']
# looking good

| | BWV | orig_key | beats_per_measure | forward_note | back_note | forward_measure | back_measure | note_in_measure_position | quarter_in_measure_position | soprano | alto | tenor | bass | soprano_pitch_cat | alto_pitch_cat | tenor_pitch_cat | bass_pitch_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bwv346_76 | bwv346 | C major | 4 | 76 | 167 | 5 | 10 | 9 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_77 | bwv346 | C major | 4 | 77 | 166 | 5 | 10 | 10 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_78 | bwv346 | C major | 4 | 78 | 165 | 5 | 10 | 11 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_79 | bwv346 | C major | 4 | 79 | 164 | 5 | 10 | 12 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_124 | bwv346 | C major | 4 | 124 | 119 | 8 | 7 | 9 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_125 | bwv346 | C major | 4 | 125 | 118 | 8 | 7 | 10 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_126 | bwv346 | C major | 4 | 126 | 117 | 8 | 7 | 11 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv346_127 | bwv346 | C major | 4 | 127 | 116 | 8 | 7 | 12 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv341_44 | bwv341 | g minor | 4 | 44 | 99 | 3 | 6 | 9 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |
| bwv341_45 | bwv341 | g minor | 4 | 45 | 98 | 3 | 6 | 10 | 3 | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | `<music21.note.Rest rest>` | Rest | Rest | Rest | Rest |


In [9]:
bach_df = create_cont_pitch_vars(bach_df)

In [10]:
bach_df.head()

| | BWV | orig_key | beats_per_measure | forward_note | back_note | forward_measure | back_measure | note_in_measure_position | quarter_in_measure_position | soprano | ... | tenor | bass | soprano_pitch_cat | alto_pitch_cat | tenor_pitch_cat | bass_pitch_cat | soprano_pitch_cont | alto_pitch_cont | tenor_pitch_cont | bass_pitch_cont |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bwv421_0 | bwv421 | a minor | 4 | 0 | 159 | 0 | 10 | 1 | 4 | `<music21.note.Note G>` | ... | `<music21.note.Note B->` | `<music21.note.Note G>` | G4 | D4 | B-3 | G2 | 1 | -4 | -8 | -23 |
| bwv421_1 | bwv421 | a minor | 4 | 1 | 158 | 0 | 10 | 2 | 4 | `<music21.note.Note G>` | ... | `<music21.note.Note B->` | `<music21.note.Note G>` | G4 | D4 | B-3 | G2 | 1 | -4 | -8 | -23 |
| bwv421_2 | bwv421 | a minor | 4 | 2 | 157 | 0 | 10 | 3 | 4 | `<music21.note.Note G>` | ... | `<music21.note.Note C>` | `<music21.note.Note A>` | G4 | D4 | C4 | A2 | 1 | -4 | -6 | -21 |
| bwv421_3 | bwv421 | a minor | 4 | 3 | 156 | 0 | 10 | 4 | 4 | `<music21.note.Note G>` | ... | `<music21.note.Note C>` | `<music21.note.Note A>` | G4 | D4 | C4 | A2 | 1 | -4 | -6 | -21 |
| bwv421_4 | bwv421 | a minor | 4 | 4 | 155 | 1 | 9 | 1 | 1 | `<music21.note.Note G>` | ... | `<music21.note.Note D>` | `<music21.note.Note B->` | G4 | D4 | D4 | B-2 | 1 | -4 | -4 | -20 |


![](../presentation/yay_gifs/dwight.gif)

In [11]:
bach_df.describe()


| | beats_per_measure | forward_note | back_note | forward_measure | back_measure | note_in_measure_position | quarter_in_measure_position | soprano_pitch_cont | alto_pitch_cont | tenor_pitch_cont | bass_pitch_cont |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 | 46676.0 |
| mean | 3.860828 | 165.51474 | 165.51474 | 11.208501 | 10.550947 | 8.098766 | 2.4296 | 11316.981554 | 11825.955737 | 12034.797498 | 12368.615134 |
| std | 0.34613 | 190.88104 | 190.88104 | 12.21657 | 12.216967 | 4.534788 | 1.10032 | 105755.325612 | 108104.594504 | 109068.178743 | 110591.6691 |
| min | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | -8.0 | -13.0 | -18.0 | -32.0 |
| 25% | 4.0 | 63.0 | 63.0 | 4.0 | 4.0 | 4.0 | 1.0 | 1.0 | -4.0 | -9.0 | -18.0 |
| 50% | 4.0 | 127.0 | 127.0 | 9.0 | 8.0 | 8.0 | 2.0 | 5.0 | 0.0 | -6.0 | -15.0 |
| 75% | 4.0 | 204.0 | 204.0 | 14.0 | 13.0 | 12.0 | 3.0 | 8.0 | 3.0 | -3.0 | -11.0 |
| max | 4.0 | 1699.0 | 1699.0 | 106.0 | 106.0 | 40.0 | 10.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 |
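
In this `describe()` output the rest value dominates the continuous pitch columns. Every maximum is 1,000,000, and the means (about 11,300 to 12,400) and standard deviations (about 106,000 to 111,000) sit far outside the real pitch range, even though the medians (5, 0, -6, -15) look sensible. A minimal sketch of how to summarize only the real pitches, using a hypothetical toy column rather than the actual data:

```python
import pandas as pd

REST_PITCH_NUMBER = 1000000

# Hypothetical toy column: four real pitch numbers and one rest
col = pd.Series([1, 5, 8, -4, REST_PITCH_NUMBER], name='soprano_pitch_cont')

# With the rest included, the mean is dragged far above every real pitch
raw = col.describe()

# where() turns the rest value into NaN, and describe() skips NaN values
masked = col.where(col != REST_PITCH_NUMBER).describe()
```

On the real dataframe the same idea would be `bach_df['soprano_pitch_cont'].where(bach_df['soprano_pitch_cont'] != 1000000).describe()`. This only changes the summary statistics; the columns passed to the models still carry the rest value described above.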


In [12]:
with open('complete_bach.p', 'wb') as f:
    cPickle.dump(bach_df, f)

In [13]:
clean_bach = bach_df.drop(['soprano','alto','tenor','bass'], axis=1)

In [14]:
with open('clean_bach.p', 'wb') as f:
    cPickle.dump(clean_bach, f)

## Data Visualizations
To be filled in later.

The purpose of this section is to show the underlying trends in the music. The hope is that the model picks up on these trends, which reflect the basics of composition that music majors learn.

I don't have a way to evaluate quantitatively how well the models perform; I only have my ears. But I can show you charts for the compositions I am making!

In [15]:
# create a histogram of how often each note occurs, by voice part.
# x-axis: all pitches, lined up from low to high; order them by the
# continuous variable but label them with the categorical one.
# y-axis: the number of times each pitch shows up. Use a different
# color for each voice part.
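
The counts behind this planned histogram could be computed along the following lines. This is a hypothetical sketch: the function name `pitch_counts_by_voice` is my own, and it assumes the `<voice>_pitch_cat` and `<voice>_pitch_cont` columns created earlier on this page. Rests are dropped so the 1,000,000 rest value does not stretch the x-axis.

```python
import pandas as pd

VOICES = ['soprano', 'alto', 'tenor', 'bass']


def pitch_counts_by_voice(df):
    """Count how often each pitch occurs in each voice, ordered from low to high.

    Returns a DataFrame indexed by (pitch_cont, pitch_cat) with one column per voice.
    """
    frames = []
    for voice in VOICES:
        cat_col, cont_col = voice + '_pitch_cat', voice + '_pitch_cont'
        # Put every voice into the same two columns, tagged with the voice name
        sub = df[[cat_col, cont_col]].rename(
            columns={cat_col: 'pitch_cat', cont_col: 'pitch_cont'})
        frames.append(sub.assign(voice=voice))
    long_df = pd.concat(frames, ignore_index=True)
    long_df = long_df[long_df['pitch_cat'] != 'Rest']

    # groupby sorts its keys, so rows come out ordered by the continuous pitch number
    counts = (long_df.groupby(['pitch_cont', 'pitch_cat', 'voice'])
                     .size()
                     .unstack('voice', fill_value=0)
                     .reindex(columns=VOICES, fill_value=0))
    return counts
```

With matplotlib installed, `pitch_counts_by_voice(bach_df).plot(kind='bar')` would draw one colored bar per voice at each pitch, with pitches ordered by the continuous variable.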

In [16]:
# can I show a histogram of pitch frequency by CHORD?
# i.e. I chord vs. V chord vs. IV chord?

In [None]:
# a visualization that shows frequency of pitch BY POSITION IN THE MEASURE,
# e.g. G occurs most often when quarter = 1, and F# shows up most when
# quarter = 4
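
One way to get at this is a cross-tabulation of pitch name against `quarter_in_measure_position`. In the hypothetical sketch below (the function name `pitch_share_by_beat` is my own), octave numbers are stripped so that G4 and G5 both count as 'G', and each column is normalized to sum to 1. That way `table[name].idxmax()` gives the beat on which that pitch shows up most often.

```python
import pandas as pd


def pitch_share_by_beat(df, voice='soprano'):
    """For each pitch name, the share of its occurrences at each quarter-note position."""
    notes = df[df[voice + '_pitch_cat'] != 'Rest']
    # 'F#4' -> 'F#', 'B-3' -> 'B-': drop the trailing octave digits
    names = notes[voice + '_pitch_cat'].str.rstrip('0123456789')
    # rows: beat position; columns: pitch name; each column sums to 1
    return pd.crosstab(notes['quarter_in_measure_position'], names,
                       normalize='columns')
```

Applied to `bach_df`, `pitch_share_by_beat(bach_df).idxmax()` would list, for each soprano pitch name, the quarter-note position where it is most common; the result could then be drawn as a heatmap or grouped bar chart.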

In [None]:
# histogram of the keys/harmonies that show up in each piece. 
# Need to add this to the dataframe later.
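
Part of this is already possible, because the `orig_key` column records each chorale's original key. Harmonies within each piece are not in the dataframe yet, so the hypothetical sketch below (the function name `chorales_per_key` is my own) covers keys only. It counts each chorale once by dropping repeated BWV numbers before tallying.

```python
import pandas as pd


def chorales_per_key(df):
    """Number of distinct chorales (BWV numbers) in each original key."""
    # one row per chorale, so long pieces are not counted many times
    per_piece = df.drop_duplicates(subset='BWV')
    return per_piece['orig_key'].value_counts()
```

With matplotlib installed, `chorales_per_key(bach_df).plot(kind='bar')` would draw the histogram of original keys.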

-----------
## Please go on to the third page of this technical report, available back in the GitHub folder.