
# Testing data structures, exploring what's possible

# Setup env


In [0]:
# print all cell output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"


## Google drive access

In [2]:
# mount google drive

from google.colab import drive
drive.mount('/content/drive', force_remount=True)


Mounted at /content/drive


In [3]:
# test, peek at data
! ls -al '/content/drive/My Drive/groove-v1.0.0-midionly/groove/drummer1/eval_session/'

# test, modules from local  'E:\Google Drive\LYIT\Dissertation\modules'
! ls -al '/content/drive/My Drive/LYIT/Dissertation/modules/'

total 35
-rw------- 1 root root 2589 Apr 27 12:01 10_soul-groove10_102_beat_4-4.mid
-rw------- 1 root root 4793 Apr 27 12:01 1_funk-groove1_138_beat_4-4.mid
-rw------- 1 root root 3243 Apr 27 12:01 2_funk-groove2_105_beat_4-4.mid
-rw------- 1 root root 4466 Apr 27 12:01 3_soul-groove3_86_beat_4-4.mid
-rw------- 1 root root 2551 Apr 27 12:01 4_soul-groove4_80_beat_4-4.mid
-rw------- 1 root root 3798 Apr 27 12:01 5_funk-groove5_84_beat_4-4.mid
-rw------- 1 root root 3760 Apr 27 12:01 6_hiphop-groove6_87_beat_4-4.mid
-rw------- 1 root root 1894 Apr 27 12:01 7_pop-groove7_138_beat_4-4.mid
-rw------- 1 root root 2437 Apr 27 12:01 8_rock-groove8_65_beat_4-4.mid
-rw------- 1 root root 3448 Apr 27 12:01 9_soul-groove9_105_beat_4-4.mid
total 21
-rw------- 1 root root 16580 May 25 20:01 data_prep.py
drwx------ 2 root root  4096 May 25 16:59 __pycache__


## Auto reload module

I'm now using library code I've written and saved to Google Drive, which syncs it to the cloud and makes it available in the Colab environment. The autoreload extension below should make imports pick up ('re-import') changes to that library code.

It's not the quickest or most reliable approach, so when in a hurry, brute-force the reload by restarting the runtime.

In [0]:
# tool to auto reload modules.
%load_ext autoreload

# config to auto-reload all modules before running each cell,
# which makes writing and testing modules much easier.
%autoreload 2
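When autoreload misbehaves, a middle ground between it and restarting the runtime is to reload a single module by hand with the standard library's `importlib.reload`. The minimal sketch below shows this on a hypothetical throwaway module, `demo_lib`, written to a temporary directory. In this notebook the equivalent call would be `importlib.reload(data_prep)`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# hypothetical throwaway module, written to a temp dir for demonstration only
sys.dont_write_bytecode = True  # avoid stale .pyc caches in this quick demo
mod_dir = Path(tempfile.mkdtemp())
mod_file = mod_dir / "demo_lib.py"
mod_file.write_text("def greet():\n    return 'v1'\n")

sys.path.append(str(mod_dir))
import demo_lib

first = demo_lib.greet()

# simulate editing the library file, then reload it explicitly
mod_file.write_text("def greet():\n    return 'version 2'\n")
importlib.invalidate_caches()
demo_lib = importlib.reload(demo_lib)
second = demo_lib.greet()
print(first, second)
```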

## Imports and accessing lib functions

In [5]:
# install required libs
!pip install mido



In [6]:
# import my modules
import sys
sys.path.append('/content/drive/My Drive/LYIT/Dissertation/modules/')
import data_prep

LOADING - data_prep.py module name is: data_prep


In [0]:
# imports
import pandas as pd
import math
import matplotlib.pyplot as plt
import numpy as np


# itertools.cycle, used to cycle through chart colours
from itertools import cycle



In [8]:
# testing auto reload of modules 
data_prep.test_function_call('bling')

test function called worked! :)  bling


## Pandas display options

In [0]:
def set_pandas_display_options() -> None:
    # Ref: https://stackoverflow.com/a/52432757/
    display = pd.options.display

    display.max_columns = 1000
    display.max_rows = 2000
    display.max_colwidth = 1000
    display.width = None
    # display.precision = 2  # set as needed

set_pandas_display_options()
#pd.reset_option('all')
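The function above changes display settings globally, which is why a `reset_option` call is kept around. For a one-off wide view, `pd.option_context` applies settings only inside a `with` block and restores them afterwards. Here is a small sketch:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# settings apply only inside the with-block
with pd.option_context("display.max_rows", 5, "display.max_columns", 3):
    inside = pd.get_option("display.max_rows")

# ...and are restored automatically on exit
after = pd.get_option("display.max_rows")
print(before, inside, after)
```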


# Setup test file

## Load file

In [0]:
gmt = data_prep.GrooveMidiTools()

In [11]:
file_name = '/content/drive/My Drive/groove-v1.0.0-midionly/groove/drummer5/eval_session/1_funk-groove1_138_beat_4-4.mid'
midi_file = data_prep.MIDI_File_Wrapper(file_name, gmt.mappings)
f = midi_file
f_df = f.df_midi_data

FILE: /content/drive/My Drive/groove-v1.0.0-midionly/groove/drummer5/eval_session/1_funk-groove1_138_beat_4-4.mid
    tracks: [<midi track 'Base Midi' 1037 messages>]
    time sig: <meta message time_signature numerator=4 denominator=4 clocks_per_click=24 notated_32nd_notes_per_beat=8 time=0>
    tempo: <meta message set_tempo tempo=434783 time=0>
    note_on span - first tick: 5 , last tick: 30634 
    good instruments: 4, {36.0: 'Bass Drum 1 (36)', 38.0: 'Acoustic Snare (38)', 42.0: 'Closed Hi Hat (42)', 51.0: 'Ride Cymbal 1 (51)'}


... the above verifies I'm able to create custom objects from custom code, great!


## Setup MIDI event timing bins...

In [12]:
# MTT object for parsing the file and
# calculating critical time metrics
mtt = data_prep.MidiTimingTools(file_name, f.ticks(), f.tempo_us(), f.ts_num(), f.ts_denom(), f.last_hit())

# values needed later for building the MultiIndex
quantize_level = mtt.bins_per_bar()
bars_in_file = mtt.bars_in_file()
tp_beat = mtt.ts_ticks_per_beat()
tp_bin = mtt.bin_size()

print('bar info - bars in file: {}, bar quantize level: {}'.format(bars_in_file, quantize_level))
print('tick info - ticks per time sig beat: {}, ticks per quantize bin: {}'.format(tp_beat, tp_bin))

# capture timing data from MidiTimingTools in df...
beats_col, offsets_col = mtt.get_offsets(f_df[f.cum_ticks_col])
f_df['beat_offset'] = offsets_col
f_df['beat_center'] = beats_col
f_df['file_beat_number'] = pd.Categorical(f_df.beat_center).codes

f_df.head(20)

bar info - bars in file: 16, bar quantize level: 16.0
tick info - ticks per time sig beat: 480.0, ticks per quantize bin: 120


,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
0,track_name,0,0,0.0,,,"{'type': 'track_name', 'name': 'Base Midi', 'time': 0}",0,0,0
1,instrument_name,0,0,0.0,,,"{'type': 'instrument_name', 'name': 'Brooklyn', 'time': 0}",0,0,0
2,time_signature,0,0,0.0,,,"{'type': 'time_signature', 'numerator': 4, 'denominator': 4, 'clocks_per_click': 24, 'notated_32nd_notes_per_beat': 8, 'time': 0}",0,0,0
3,key_signature,0,0,0.0,,,"{'type': 'key_signature', 'key': 'C', 'time': 0}",0,0,0
4,smpte_offset,0,0,0.0,,,"{'type': 'smpte_offset', 'frame_rate': 24, 'hours': 33, 'minutes': 1, 'seconds': 15, 'frames': 16, 'sub_frames': 24, 'time': 0}",0,0,0
5,set_tempo,0,0,0.0,,,"{'type': 'set_tempo', 'tempo': 434783, 'time': 0}",0,0,0
6,control_change,4,4,0.003623,,,"{'type': 'control_change', 'time': 4, 'control': 4, 'value': 77, 'channel': 9}",4,0,0
7,note_on,1,5,0.004529,42.0,55.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 55, 'channel': 9}",5,0,0
8,note_on,4,9,0.008152,36.0,39.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 39, 'channel': 9}",9,0,0
9,note_on,6,15,0.013587,51.0,67.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 67, 'channel': 9}",15,0,0
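`MidiTimingTools.get_offsets()` lives in the custom `data_prep` module and isn't shown here. Judging from the table, it appears to snap each event to the nearest 120-tick bin. For example, tick 226 gets bin centre 240 and offset -14. The sketch below reproduces those numbers, assuming that ties round up; the module's actual rounding rule may differ.

It also shows a caveat about using `pd.Categorical(...).codes` as a bin number. The codes only count bin centres that actually occur, so they drift from the true bin index whenever a bin contains no events.

```python
import pandas as pd

TICKS_PER_BIN = 120  # "ticks per quantize bin" printed above

# a few note_on tick positions taken from the table above
ticks = pd.Series([5, 226, 344, 513, 746])

# assumed rule: snap to the nearest bin, rounding ties up
bin_index = (ticks + TICKS_PER_BIN // 2) // TICKS_PER_BIN
beat_center = bin_index * TICKS_PER_BIN
beat_offset = ticks - beat_center  # negative = played early

# Categorical codes number only the distinct centres present,
# so bins 1 and 5 (empty in this sample) are skipped
codes = pd.Categorical(beat_center).codes

print(pd.DataFrame({"tick": ticks, "center": beat_center,
                    "offset": beat_offset, "bin": bin_index, "code": codes}))
```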


In [13]:
f_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1037 entries, 0 to 1036
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   msg_type          1037 non-null   string  
 1   delta_ticks       1037 non-null   int64   
 2   total_ticks       1037 non-null   int64   
 3   total_seconds     1037 non-null   float64 
 4   note              746 non-null    float64 
 5   velocity          746 non-null    float64 
 6   raw_data          1037 non-null   string  
 7   beat_offset       1037 non-null   int64   
 8   beat_center       1037 non-null   category
 9   file_beat_number  1037 non-null   int16   
dtypes: category(1), float64(3), int16(1), int64(3), string(2)
memory usage: 69.1 KB


## Setup columns indicating bars & beats

These new columns are needed to build the MultiIndex.

In [0]:
# make a copy, just to practice on 
tmp_df = f_df.copy(deep=True)

In [0]:
# add column for bar number (1-based)
tmp_df['bar_number'] = (tmp_df.file_beat_number // quantize_level) + 1
tmp_df['bar_number'] = tmp_df['bar_number'].astype(int) 
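The bar number and the position within the bar are the quotient and remainder of the same division, so both can come out of a single `divmod`. Below is a sketch using NumPy, assuming 16 bins per bar as in this file. The sample bin numbers come from outputs further down (for instance, bin 53 lands in bar 4, position 6).

```python
import numpy as np

BINS_PER_BAR = 16  # quantize level printed earlier for this file

file_beat_number = np.array([0, 15, 16, 53, 181])

# quotient = completed bars, remainder = bin within the bar (both 0-based)
bar_idx, bin_idx = np.divmod(file_beat_number, BINS_PER_BAR)
bar_number = bar_idx + 1
bar_beat_number = bin_idx + 1
print(list(zip(file_beat_number, bar_number, bar_beat_number)))
```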

In [0]:
# add column for the quantize bin (a 1/16th note here) within the bar, 1-based
tmp_df['bar_beat_number'] = (tmp_df.file_beat_number % int(quantize_level)) + 1

In [0]:
# filter to only note_on events
tmp_df = tmp_df[tmp_df['msg_type'] == 'note_on'].copy() 
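The `.copy()` after the boolean filter gives `tmp_df` its own data, so later column assignments don't trigger pandas' `SettingWithCopyWarning`. `DataFrame.query` is an equivalent way to write the same filter. The small self-contained check below uses made-up events:

```python
import pandas as pd

# made-up events, standing in for the parsed MIDI messages
events = pd.DataFrame({
    "msg_type": ["control_change", "note_on", "note_off", "note_on"],
    "note": [None, 42.0, 42.0, 36.0],
})

by_mask = events[events["msg_type"] == "note_on"].copy()
by_query = events.query("msg_type == 'note_on'")

# adding a column to the copy leaves the original untouched
by_mask["is_hit"] = True
print(by_mask)
```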


In [18]:
tmp_df.info()
tmp_df.head(20)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 373 entries, 7 to 1034
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   msg_type          373 non-null    string  
 1   delta_ticks       373 non-null    int64   
 2   total_ticks       373 non-null    int64   
 3   total_seconds     373 non-null    float64 
 4   note              373 non-null    float64 
 5   velocity          373 non-null    float64 
 6   raw_data          373 non-null    string  
 7   beat_offset       373 non-null    int64   
 8   beat_center       373 non-null    category
 9   file_beat_number  373 non-null    int16   
 10  bar_number        373 non-null    int64   
 11  bar_beat_number   373 non-null    int16   
dtypes: category(1), float64(3), int16(2), int64(4), string(2)
memory usage: 31.5 KB


,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number,bar_number,bar_beat_number
7,note_on,1,5,0.004529,42.0,55.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 55, 'channel': 9}",5,0,0,1,1
8,note_on,4,9,0.008152,36.0,39.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 39, 'channel': 9}",9,0,0,1,1
9,note_on,6,15,0.013587,51.0,67.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 67, 'channel': 9}",15,0,0,1,1
14,note_on,100,226,0.20471,36.0,41.0,"{'type': 'note_on', 'time': 100, 'note': 36, 'velocity': 41, 'channel': 9}",-14,240,2,1,3
15,note_on,32,258,0.233696,51.0,58.0,"{'type': 'note_on', 'time': 32, 'note': 51, 'velocity': 58, 'channel': 9}",18,240,2,1,3
18,note_on,7,344,0.311594,36.0,6.0,"{'type': 'note_on', 'time': 7, 'note': 36, 'velocity': 6, 'channel': 9}",-16,360,3,1,4
23,note_on,0,478,0.432971,42.0,67.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 67, 'channel': 9}",-2,480,4,1,5
24,note_on,17,495,0.44837,51.0,119.0,"{'type': 'note_on', 'time': 17, 'note': 51, 'velocity': 119, 'channel': 9}",15,480,4,1,5
26,note_on,0,513,0.464674,38.0,106.0,"{'type': 'note_on', 'time': 0, 'note': 40, 'velocity': 106, 'channel': 9}",33,480,4,1,5
30,note_on,122,746,0.675725,51.0,45.0,"{'type': 'note_on', 'time': 122, 'note': 51, 'velocity': 45, 'channel': 9}",26,720,6,1,7


In [0]:
tmp_df.set_index(['bar_number', 'bar_beat_number'], inplace=True)

In [20]:
tmp_df.index.names

FrozenList(['bar_number', 'bar_beat_number'])

In [21]:
# select first bar
tmp_df.loc[1]

bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
1,note_on,1,5,0.004529,42.0,55.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 55, 'channel': 9}",5,0,0
1,note_on,4,9,0.008152,36.0,39.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 39, 'channel': 9}",9,0,0
1,note_on,6,15,0.013587,51.0,67.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 67, 'channel': 9}",15,0,0
3,note_on,100,226,0.20471,36.0,41.0,"{'type': 'note_on', 'time': 100, 'note': 36, 'velocity': 41, 'channel': 9}",-14,240,2
3,note_on,32,258,0.233696,51.0,58.0,"{'type': 'note_on', 'time': 32, 'note': 51, 'velocity': 58, 'channel': 9}",18,240,2
4,note_on,7,344,0.311594,36.0,6.0,"{'type': 'note_on', 'time': 7, 'note': 36, 'velocity': 6, 'channel': 9}",-16,360,3
5,note_on,0,478,0.432971,42.0,67.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 67, 'channel': 9}",-2,480,4
5,note_on,17,495,0.44837,51.0,119.0,"{'type': 'note_on', 'time': 17, 'note': 51, 'velocity': 119, 'channel': 9}",15,480,4
5,note_on,0,513,0.464674,38.0,106.0,"{'type': 'note_on', 'time': 0, 'note': 40, 'velocity': 106, 'channel': 9}",33,480,4
7,note_on,122,746,0.675725,51.0,45.0,"{'type': 'note_on', 'time': 122, 'note': 51, 'velocity': 45, 'channel': 9}",26,720,6


In [22]:
# selects bars 3-4
tmp_df.loc[3:4]

bar_number,bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
3,1,note_on,24,3846,3.483699,51.0,68.0,"{'type': 'note_on', 'time': 24, 'note': 51, 'velocity': 68, 'channel': 9}",6,3840,32
3,1,note_on,3,3849,3.486416,36.0,43.0,"{'type': 'note_on', 'time': 3, 'note': 36, 'velocity': 43, 'channel': 9}",9,3840,32
3,1,note_on,0,3852,3.489134,42.0,62.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 62, 'channel': 9}",12,3840,32
3,2,note_on,13,3977,3.602358,38.0,32.0,"{'type': 'note_on', 'time': 13, 'note': 38, 'velocity': 32, 'channel': 9}",17,3960,33
3,3,note_on,96,4073,3.689315,36.0,46.0,"{'type': 'note_on', 'time': 96, 'note': 36, 'velocity': 46, 'channel': 9}",-7,4080,34
3,3,note_on,2,4075,3.691127,51.0,52.0,"{'type': 'note_on', 'time': 2, 'note': 51, 'velocity': 52, 'channel': 9}",-5,4080,34
3,4,note_on,4,4162,3.769931,36.0,6.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 6, 'channel': 9}",-38,4200,35
3,5,note_on,0,4303,3.897648,42.0,70.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 70, 'channel': 9}",-17,4320,36
3,5,note_on,34,4337,3.928446,51.0,107.0,"{'type': 'note_on', 'time': 34, 'note': 51, 'velocity': 107, 'channel': 9}",17,4320,36
3,5,note_on,8,4346,3.936598,38.0,127.0,"{'type': 'note_on', 'time': 8, 'note': 40, 'velocity': 127, 'channel': 9}",26,4320,36
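Range selections like `tmp_df.loc[3:4]` work here because the rows are already in time order, so the MultiIndex is sorted. Label slicing on a MultiIndex generally expects a sorted index, so if rows are ever reordered, call `sort_index()` first. Here is a sketch on a tiny made-up frame:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(2, 1), (1, 3), (1, 1), (3, 5)],
                                names=["bar_number", "bar_beat_number"])
hits = pd.DataFrame({"velocity": [60, 70, 80, 90]}, index=idx)

was_sorted = hits.index.is_monotonic_increasing  # False: rows out of order

hits = hits.sort_index()
bars_1_to_2 = hits.loc[1:2]  # label slice, end bar included
print(bars_1_to_2)
```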


In [23]:
tmp_df.index.names

FrozenList(['bar_number', 'bar_beat_number'])

In [24]:
# extract all notes in the first 1/16th note of every bar
# NOTE: 'level' refers to the index level: level 0 = bar_number, level 1 = bar_beat_number
tmp_df[tmp_df.index.isin([1], level=1)]

bar_number,bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
1,1,note_on,1,5,0.004529,42.0,55.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 55, 'channel': 9}",5,0,0
1,1,note_on,4,9,0.008152,36.0,39.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 39, 'channel': 9}",9,0,0
1,1,note_on,6,15,0.013587,51.0,67.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 67, 'channel': 9}",15,0,0
2,1,note_on,16,1935,1.752719,51.0,67.0,"{'type': 'note_on', 'time': 16, 'note': 51, 'velocity': 67, 'channel': 9}",15,1920,16
2,1,note_on,3,1938,1.755436,36.0,43.0,"{'type': 'note_on', 'time': 3, 'note': 36, 'velocity': 43, 'channel': 9}",18,1920,16
2,1,note_on,1,1944,1.760871,42.0,62.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 62, 'channel': 9}",24,1920,16
3,1,note_on,24,3846,3.483699,51.0,68.0,"{'type': 'note_on', 'time': 24, 'note': 51, 'velocity': 68, 'channel': 9}",6,3840,32
3,1,note_on,3,3849,3.486416,36.0,43.0,"{'type': 'note_on', 'time': 3, 'note': 36, 'velocity': 43, 'channel': 9}",9,3840,32
3,1,note_on,0,3852,3.489134,42.0,62.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 62, 'channel': 9}",12,3840,32
4,1,note_on,111,5767,5.223737,51.0,43.0,"{'type': 'note_on', 'time': 111, 'note': 51, 'velocity': 43, 'channel': 9}",7,5760,48
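Two other ways to select on the inner level are `DataFrame.xs` (with `drop_level=False` to keep both levels) and `pd.IndexSlice`, which reads closer to `.loc`. The sketch below uses a small synthetic frame with the same level names:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 16]],
                                 names=["bar_number", "bar_beat_number"])
hits = pd.DataFrame({"velocity": range(9)}, index=idx)

# first 1/16th of every bar, three ways
via_isin = hits[hits.index.isin([1], level=1)]
via_xs = hits.xs(1, level="bar_beat_number", drop_level=False)
ix = pd.IndexSlice
via_slice = hits.loc[ix[:, [1]], :]

# several positions at once: 1st and 16th
first_and_last = hits.loc[ix[:, [1, 16]], :]
print(first_and_last)
```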


In [25]:
# extract all notes in the first bar
# NOTE: 'level' refers to the index level: level 0 = bar_number, level 1 = bar_beat_number

# all I've changed from the last cell is the level; this now
# selects every note played in the first bar. It's a bit verbose,
# as tmp_df.loc[1] does much the same, although .loc[1] drops the
# bar_number level from the result while isin() keeps both levels
tmp_df[tmp_df.index.isin([1], level=0)]

bar_number,bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
1,1,note_on,1,5,0.004529,42.0,55.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 55, 'channel': 9}",5,0,0
1,1,note_on,4,9,0.008152,36.0,39.0,"{'type': 'note_on', 'time': 4, 'note': 36, 'velocity': 39, 'channel': 9}",9,0,0
1,1,note_on,6,15,0.013587,51.0,67.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 67, 'channel': 9}",15,0,0
1,3,note_on,100,226,0.20471,36.0,41.0,"{'type': 'note_on', 'time': 100, 'note': 36, 'velocity': 41, 'channel': 9}",-14,240,2
1,3,note_on,32,258,0.233696,51.0,58.0,"{'type': 'note_on', 'time': 32, 'note': 51, 'velocity': 58, 'channel': 9}",18,240,2
1,4,note_on,7,344,0.311594,36.0,6.0,"{'type': 'note_on', 'time': 7, 'note': 36, 'velocity': 6, 'channel': 9}",-16,360,3
1,5,note_on,0,478,0.432971,42.0,67.0,"{'type': 'note_on', 'time': 0, 'note': 44, 'velocity': 67, 'channel': 9}",-2,480,4
1,5,note_on,17,495,0.44837,51.0,119.0,"{'type': 'note_on', 'time': 17, 'note': 51, 'velocity': 119, 'channel': 9}",15,480,4
1,5,note_on,0,513,0.464674,38.0,106.0,"{'type': 'note_on', 'time': 0, 'note': 40, 'velocity': 106, 'channel': 9}",33,480,4
1,7,note_on,122,746,0.675725,51.0,45.0,"{'type': 'note_on', 'time': 122, 'note': 51, 'velocity': 45, 'channel': 9}",26,720,6
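`tmp_df.loc[1]` and the `isin(..., level=0)` filter return different shapes. A scalar key drops the matched level (compare the single `bar_beat_number` index in the `tmp_df.loc[1]` output earlier), while a list key such as `loc[[1]]` keeps the full MultiIndex. Here is a quick check on synthetic data:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2], [1, 5, 9]],
                                 names=["bar_number", "bar_beat_number"])
hits = pd.DataFrame({"velocity": [55, 67, 45, 62, 70, 48]}, index=idx)

scalar_key = hits.loc[1]     # bar_number level dropped
list_key = hits.loc[[1]]     # both levels kept
via_isin = hits[hits.index.isin([1], level=0)]

print(scalar_key.index.names, list_key.index.names)
```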


In [26]:
# extract all notes at particular 1/16th-note positions of every bar
# NOTE: 'level' refers to the index level: level 0 = bar_number, level 1 = bar_beat_number

# this selects all notes played on the 6th or 16th 1/16th note
# of every bar in the MIDI file ...
tmp_df[tmp_df.index.isin([6, 16], level=1)]

bar_number,bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
1,16,note_on,30,1808,1.637683,38.0,56.0,"{'type': 'note_on', 'time': 30, 'note': 38, 'velocity': 56, 'channel': 9}",8,1800,15
2,16,note_on,29,3711,3.361416,38.0,38.0,"{'type': 'note_on', 'time': 29, 'note': 38, 'velocity': 38, 'channel': 9}",-9,3720,31
4,6,note_on,42,6351,5.752723,38.0,39.0,"{'type': 'note_on', 'time': 42, 'note': 38, 'velocity': 39, 'channel': 9}",-9,6360,53
5,16,note_on,5,9472,8.579718,38.0,51.0,"{'type': 'note_on', 'time': 5, 'note': 38, 'velocity': 51, 'channel': 9}",-8,9480,79
6,16,note_on,13,11391,10.317944,38.0,53.0,"{'type': 'note_on', 'time': 13, 'note': 38, 'velocity': 53, 'channel': 9}",-9,11400,95
7,16,note_on,5,13288,12.036243,38.0,28.0,"{'type': 'note_on', 'time': 5, 'note': 38, 'velocity': 28, 'channel': 9}",-32,13320,111
8,16,note_on,12,15266,13.827911,38.0,85.0,"{'type': 'note_on', 'time': 12, 'note': 38, 'velocity': 85, 'channel': 9}",26,15240,127
9,16,note_on,18,17167,15.549833,38.0,46.0,"{'type': 'note_on', 'time': 18, 'note': 38, 'velocity': 46, 'channel': 9}",7,17160,143
10,16,note_on,25,19100,17.30074,38.0,48.0,"{'type': 'note_on', 'time': 25, 'note': 38, 'velocity': 48, 'channel': 9}",20,19080,159
12,6,note_on,10,21728,19.681177,38.0,31.0,"{'type': 'note_on', 'time': 10, 'note': 38, 'velocity': 31, 'channel': 9}",8,21720,181


In [27]:
# alternatively, this will select every note played
# in the 6th and 16th bars of the MIDI file ...
tmp_df.loc[tmp_df.index.isin([6, 16], level=0)]

bar_number,bar_beat_number,msg_type,delta_ticks,total_ticks,total_seconds,note,velocity,raw_data,beat_offset,beat_center,file_beat_number
6,1,note_on,6,9589,8.685696,36.0,45.0,"{'type': 'note_on', 'time': 6, 'note': 36, 'velocity': 45, 'channel': 9}",-11,9600,80
6,1,note_on,2,9591,8.687508,51.0,74.0,"{'type': 'note_on', 'time': 2, 'note': 51, 'velocity': 74, 'channel': 9}",-9,9600,80
6,1,note_on,1,9593,8.689319,42.0,70.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 70, 'channel': 9}",-7,9600,80
6,2,note_on,20,9724,8.807979,38.0,36.0,"{'type': 'note_on', 'time': 20, 'note': 38, 'velocity': 36, 'channel': 9}",4,9720,81
6,3,note_on,104,9828,8.902182,36.0,42.0,"{'type': 'note_on', 'time': 104, 'note': 36, 'velocity': 42, 'channel': 9}",-12,9840,82
6,3,note_on,6,9834,8.907617,51.0,54.0,"{'type': 'note_on', 'time': 6, 'note': 51, 'velocity': 54, 'channel': 9}",-6,9840,82
6,5,note_on,1,10063,9.115044,42.0,82.0,"{'type': 'note_on', 'time': 1, 'note': 44, 'velocity': 82, 'channel': 9}",-17,10080,84
6,5,note_on,19,10082,9.132255,51.0,125.0,"{'type': 'note_on', 'time': 19, 'note': 51, 'velocity': 125, 'channel': 9}",2,10080,84
6,5,note_on,8,10090,9.139501,38.0,108.0,"{'type': 'note_on', 'time': 8, 'note': 40, 'velocity': 108, 'channel': 9}",10,10080,84
6,7,note_on,122,10324,9.351458,51.0,48.0,"{'type': 'note_on', 'time': 122, 'note': 51, 'velocity': 48, 'channel': 9}",4,10320,86
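Index-level tests combine with ordinary column tests through `&` and `|`, using `index.get_level_values()` to pull out one level as a flat array. For example, snare hits (note 38) on the backbeats fall on positions 5 and 13 when a 4/4 bar has 16 bins. Here is a sketch on made-up rows:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 5), (1, 5), (1, 13), (2, 5), (2, 16)],
    names=["bar_number", "bar_beat_number"])
hits = pd.DataFrame({"note": [36.0, 51.0, 38.0, 38.0, 38.0, 38.0]}, index=idx)

# both masks as plain boolean arrays of the same length
on_backbeat = hits.index.get_level_values("bar_beat_number").isin([5, 13])
is_snare = (hits["note"] == 38.0).to_numpy()

backbeat_snares = hits[on_backbeat & is_snare]
print(backbeat_snares)
```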


# Pandas Series and DataFrame indexing techniques

Working through some of these...
- https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
- https://github.com/ZaxR/pandas_multiindex_tutorial/blob/master/Pandas%20MultiIndex%20Tutorial.ipynb

In [28]:
dates = pd.date_range('1/1/2000', periods=24)
dates

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-15', '2000-01-16',
               '2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20',
               '2000-01-21', '2000-01-22', '2000-01-23', '2000-01-24'],
              dtype='datetime64[ns]', freq='D')

In [29]:
df = pd.DataFrame(np.random.randn(24,4),
                  index=dates, columns=['A','B', 'C','D'])
df

,A,B,C,D
2000-01-01,0.305962,-0.172249,0.03309,-2.694771
2000-01-02,-0.420785,1.962162,0.458413,-0.237421
2000-01-03,-0.162125,0.074268,1.004685,-0.097182
2000-01-04,0.27238,-0.451521,-0.677017,-1.191242
2000-01-05,1.27599,0.777484,-0.29785,0.678866
2000-01-06,-1.580413,0.56329,0.682548,1.609484
2000-01-07,-0.500073,-0.34874,1.798803,0.305215
2000-01-08,0.65031,-0.60927,1.600144,-0.684545
2000-01-09,0.447592,0.20294,0.024391,-0.174897
2000-01-10,-0.570527,1.567525,-0.010495,-0.23508


In [30]:
s = df['A']
my_i = 13
print('index {}: {}'.format(my_i, s[dates[my_i]]))
s[:my_i]

index 13: 0.5766598749574005


2000-01-01    0.305962
2000-01-02   -0.420785
2000-01-03   -0.162125
2000-01-04    0.272380
2000-01-05    1.275990
2000-01-06   -1.580413
2000-01-07   -0.500073
2000-01-08    0.650310
2000-01-09    0.447592
2000-01-10   -0.570527
2000-01-11    1.021143
2000-01-12    0.050225
2000-01-13   -1.498815
Freq: D, Name: A, dtype: float64
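`s[dates[my_i]]` looks up by label while `s[:my_i]` slices by position, because plain `[]` switches between the two depending on the key. The explicit `.loc` (labels) and `.iloc` (positions) accessors state which one is meant.

The sketch below rebuilds a similar series. It adds a fixed seed, which the original cell doesn't have, so the numbers are reproducible.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=24)
rng = np.random.default_rng(0)  # seeded, unlike the cell above
s = pd.Series(rng.standard_normal(24), index=dates, name="A")

my_i = 13
by_label = s.loc[dates[my_i]]      # label lookup
by_position = s.iloc[my_i]         # positional lookup
first_13 = s.iloc[:my_i]           # same rows as s[:my_i]
print(by_label == by_position, len(first_13))
```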

In [31]:
s[::3]  # select every 3rd element

2000-01-01    0.305962
2000-01-04    0.272380
2000-01-07   -0.500073
2000-01-10   -0.570527
2000-01-13   -1.498815
2000-01-16   -1.154869
2000-01-19    0.089200
2000-01-22    0.059398
Freq: 3D, Name: A, dtype: float64

In [32]:
# select rows 3 thru 8, not including 8, then
# columns 2 thru 3, not including 3
df.iloc[3:8, 2:3] 

# select rows 3 and 4, column 2
df.iloc[[3,4], [2]]

,C
2000-01-04,-0.677017
2000-01-05,-0.29785
2000-01-06,0.682548
2000-01-07,1.798803
2000-01-08,1.600144


,C
2000-01-04,-0.677017
2000-01-05,-0.29785
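`.iloc` slices exclude the stop position, but `.loc` label slices include the end label. So the first selection above could also be written with dates:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=24)
df = pd.DataFrame(np.random.randn(24, 4), index=dates, columns=["A", "B", "C", "D"])

by_position = df.iloc[3:8, 2:3]                       # rows 3..7, column C
by_label = df.loc["2000-01-04":"2000-01-08", ["C"]]   # end label included
print(by_position.equals(by_label))
```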


In [33]:
df.sample(3)  # select random sample

,A,B,C,D
2000-01-14,0.57666,0.044007,-0.82479,1.914514
2000-01-23,0.241446,0.378895,-1.680329,-0.875916
2000-01-19,0.0892,0.493572,0.659653,0.19754
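`df.sample()` draws different rows on every run. Passing `random_state` makes the draw repeatable, which helps when later cells depend on it. Here is a sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24 * 4).reshape(24, 4), columns=list("ABCD"))

first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)
print(first.index.tolist(), second.index.tolist())
```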


## Boolean indexing series

This looks pretty powerful...
- https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

In [34]:
s = pd.Series(range(-3,4))
s
s[s > 0]
s[(s < -2)|(s>1.8)]

0   -3
1   -2
2   -1
3    0
4    1
5    2
6    3
dtype: int64

4    1
5    2
6    3
dtype: int64

0   -3
5    2
6    3
dtype: int64
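In the last expression each comparison needs its own parentheses, because `|` and `&` bind more tightly than `<` and `>` in Python. `Series.between()` (inclusive at both ends by default) and `~` (negation) cover the common range cases:

```python
import pandas as pd

s = pd.Series(range(-3, 4))

inside = s[s.between(-1, 1)]        # -1, 0, 1
outside = s[~s.between(-1, 1)]      # -3, -2, 2, 3
print(inside.tolist(), outside.tolist())
```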

## MultiIndex and 'isin' stuff

using Series...

In [35]:
s_mi = pd.Series(np.arange(6),
                 index=pd.MultiIndex.from_product([[0,1], ['a','b','c']]))
s_mi

0  a    0
   b    1
   c    2
1  a    3
   b    4
   c    5
dtype: int64

In [36]:
# match on both index levels at once, using (level 0, level 1) tuples
s_mi.iloc[s_mi.index.isin([(1,'a'), (0,'c')])]

0  c    2
1  a    3
dtype: int64

In [37]:
# specify the index level to test against

# pull out everything with 'c' in index level 1
# (levels are numbered from 0, so level 1 is
# the second index level)
s_mi.iloc[s_mi.index.isin(['c'], level=1)]

0  c    2
1  c    5
dtype: int64
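The same level-1 selection can be written with `get_level_values()`, which returns that level as a flat Index to compare against. For a single value, `xs()` is another option, although it drops the matched level by default:

```python
import numpy as np
import pandas as pd

s_mi = pd.Series(np.arange(6),
                 index=pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]]))

via_values = s_mi[s_mi.index.get_level_values(1) == "c"]
via_xs = s_mi.xs("c", level=1)   # index now holds only level 0: 0, 1
print(via_values.tolist(), via_xs.index.tolist())
```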

using DataFrames...

In [38]:
df = pd.DataFrame({'vals': [1,2,3,4], 'ids': ['a','b','f','n'],
                   'ids2': ['a','n','c','n']})
df

# search entire df for list of values
values = ['a','b',1,3]
df.isin(values)

,vals,ids,ids2
0,1,a,a
1,2,b,n
2,3,f,c
3,4,n,n


,vals,ids,ids2
0,True,True,True
1,False,True,False
2,True,False,False
3,False,False,False
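`isin` returns a mask with the same shape as the frame. To keep whole rows, reduce the mask across columns with `.any(axis=1)` (at least one cell matches) or `.all(axis=1)` (every cell matches). The example below reuses the list of values from the cell above:

```python
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
                   'ids2': ['a', 'n', 'c', 'n']})
values = ['a', 'b', 1, 3]

mask = df.isin(values)
rows_any = df[mask.any(axis=1)]   # at least one matching cell
rows_all = df[mask.all(axis=1)]   # every cell matches
print(rows_any.index.tolist(), rows_all.index.tolist())
```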


In [39]:
# search only specific columns
values = {'ids': ['a','n'], 'vals': [1,4]}
df.isin(values)

,vals,ids,ids2
0,True,True,False
1,False,False,False
2,False,False,False
3,True,True,False


Renaming MultiIndexes, their levels, and so on ...

In [40]:
index = pd.MultiIndex.from_product([range(3),['wun','too']], names=['furst', 'secund'])
index

MultiIndex([(0, 'wun'),
            (0, 'too'),
            (1, 'wun'),
            (1, 'too'),
            (2, 'wun'),
            (2, 'too')],
           names=['furst', 'secund'])

In [41]:
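print('look at second level index, at [1]: {}'.format(index.levels[1]))
# set_levels() returns a new MultiIndex (its inplace argument is deprecated and
# removed in recent pandas). Level values are stored sorted, ['too', 'wun'],
# so 'too' -> 'x' and 'wun' -> 'y'
index = index.set_levels(['x', 'y'], level=1)
index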


look at second level index, at [1]: Index(['too', 'wun'], dtype='object', name='secund')


MultiIndex([(0, 'y'),
            (0, 'x'),
            (1, 'y'),
            (1, 'x'),
            (2, 'y'),
            (2, 'x')],
           names=['furst', 'secund'])
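`set_levels` changes the level *values*. To rename the levels themselves, use `set_names`, which also returns a new index rather than modifying in place. The sketch below reuses the `furst`/`secund` index:

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ['wun', 'too']],
                                   names=['furst', 'secund'])

renamed_all = index.set_names(['bar', 'beat'])           # rename every level
renamed_one = index.set_names('beat', level='secund')    # rename a single level
print(renamed_all.names, renamed_one.names, index.names)
```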

## Setting an index/MultiIndex from columns, and resetting it

More info here ...
- https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#set-reset-index
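The round trip used earlier for `tmp_df` can be sketched on a small made-up frame. `set_index` turns columns into (Multi)Index levels, and `reset_index` turns all or some of them back into columns. Returning new frames, rather than using `inplace=True`, keeps the steps easy to chain.

```python
import pandas as pd

hits = pd.DataFrame({"bar_number": [1, 1, 2],
                     "bar_beat_number": [1, 5, 1],
                     "velocity": [55, 106, 67]})

indexed = hits.set_index(["bar_number", "bar_beat_number"])
restored = indexed.reset_index()                            # all levels back to columns
partial = indexed.reset_index(level="bar_beat_number")      # only the inner level

print(indexed.index.names)
print(restored.equals(hits))
```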


# MultiIndex in depth

Digging in here; refs I found useful...
- https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html
- https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html
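A MultiIndex can be built several ways. `from_product` (used above) is the shortcut for every combination of the input lists. `from_arrays`, `from_tuples` and `from_frame` build the same object from other shapes of input. Here is a quick equivalence check:

```python
import pandas as pd

names = ["bar_number", "bar_beat_number"]

a = pd.MultiIndex.from_product([[1, 2], [1, 5]], names=names)
b = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [1, 5, 1, 5]], names=names)
c = pd.MultiIndex.from_tuples([(1, 1), (1, 5), (2, 1), (2, 5)], names=names)
d = pd.MultiIndex.from_frame(
    pd.DataFrame({"bar_number": [1, 1, 2, 2], "bar_beat_number": [1, 5, 1, 5]}))

print(a.equals(b) and a.equals(c) and a.equals(d))
```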
