# Skipgram Generation from Mozart Piano Sonatas

Compute skipgrams over skipgrams: a first pass collects groups of notes, and a second pass collects sequences of those groups.

## Setup

Add worker processes (local and remote).
The code must be in the same directory on all nodes; otherwise `addprocs` must be told which directory to start in (via its `dir` keyword).

In [None]:
# remotes go first, otherwise a two-way ssh connection is needed
#addprocs([("remote-name-or-addr", :auto)])

addprocs(3) # leave one core free locally

In [None]:
# loading code on all workers

# musicology library
# Pkg.clone("https://github.com/DCMLab/DigitalMusicology.jl.git")
using DigitalMusicology

# load the schema skipgrams code
@everywhere include("skipgrams.jl")

In [None]:
# set the corpus directory (must be the same on all nodes, otherwise don't use @everywhere)
#@everywhere DigitalMusicology.usekern("/path/to/kern/dir");
@everywhere DigitalMusicology.usekern("/home/chfin/Uni/phd/data/csapp/mozart-piano-sonatas/");

To schedule the pieces efficiently across multiple processes, we sort them by their estimated complexity, most complex first:

In [None]:
# heuristic found experimentally; it only needs to be a very rough estimate
function complexity(id)
    notes = getpiece(id, :notes_wholes)
    # squared note count divided by the time between the first and last onset
    (id, length(notes)^2.0/(onset(notes[end])-onset(notes[1])))
end
sortedpieces = sort(map(complexity, allpieces()), by=x->x[2], rev=true)
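
Why the descending order helps can be illustrated with a small simulation. This is only a sketch under assumptions: the notebook does not show how `Unsims.countpiecesschemasbars` distributes pieces, so the code below assumes each piece goes to whichever worker is currently least loaded. The helper `makespan` and the cost values are hypothetical and not part of the notebook's code.

```julia
# Hypothetical helper: assign jobs, in the given order, to the worker that is
# currently least loaded, and return the load of the busiest worker.
function makespan(costs, nworkers)
    loads = zeros(Float64, nworkers)
    for c in costs
        i = findmin(loads)[2]  # index of the least-loaded worker
        loads[i] += c
    end
    maximum(loads)
end

costs = [1.0, 1.0, 1.0, 1.0, 4.0]  # made-up complexity estimates

t_unsorted = makespan(costs, 2)                 # expensive job arrives last
t_sorted = makespan(sort(costs, rev=true), 2)   # expensive job arrives first
println((t_unsorted, t_sorted))
```

With these made-up costs, the unsorted order leaves one worker finishing the expensive job alone at the end, while the sorted order lets the cheap jobs fill the gap on the other worker.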

## Parameters

To regenerate the skipgram counts for a different set of parameters, change the variables below and then run the notebook from here.

In [None]:
# number of voices per stage
voices = 3

# number of stages
stages = 2

In [None]:
# sampling in the first pass (groups of notes)
# p1 = 1.0 # 2 voices
p1 = 0.1 # 3 voices

# sampling in the second pass (sequences of groups)
p2 = 1.0 # 2x2, 3x2
#p2 = 0.001 # 2x3, 3x3
#p2 = 1.0e-6 # 2x4
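
The effect of such sampling probabilities can be sketched as follows. This assumes that each candidate is kept independently with probability `p`, which the notebook does not confirm; the actual sampling lives in `skipgrams.jl`. The helper `subsample` and the candidate list are hypothetical stand-ins.

```julia
# Hypothetical helper: keep each candidate independently with probability p.
subsample(candidates, p) = filter(c -> rand() < p, collect(candidates))

candidates = 1:1000                  # stand-ins for enumerated skipgrams
kept = subsample(candidates, 0.1)    # about 100 survive on average

# Dividing the sampled count by p gives an estimate of the full count.
estimate = length(kept) / 0.1

# The boundary cases are deterministic, since rand() lies in [0, 1):
# p = 1.0 keeps everything, p = 0.0 keeps nothing.
everything = subsample(candidates, 1.0)
nothing_kept = subsample(candidates, 0.0)
```

Under this assumption, lowering `p` trades accuracy of the counts for running time, which is why the larger configurations above use much smaller values.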

In [None]:
# list pieces with their respective bar lengths
pieces = map(p -> (p, Unsims.piecebarlen(p)), map(first, sortedpieces))

In [None]:
srand(111)

## Enumerate Skipgrams

In [None]:
@time counts = Unsims.countpiecesschemasbars(pieces, voices, stages, p2, p1) # note the argument order: p2 comes before p1

In [None]:
# save computed skipgrams
fn = "counts_$(voices)x$(stages)_p1_$(p1)_p2_$(p2)_$(now()).jls"
open(f -> serialize(f, counts), joinpath("official_counts", fn), "w")

Execution times:

| voices | stages | p1  | p2    | time  |
|--------|--------|-----|-------|-------|
| 2      | 2      | 1.0 | 1.0   | 796s  |
| 2      | 3      | 1.0 | 0.001 | 4930s |
| 3      | 2      | 0.1 | 1.0   | 2300s |

In [None]:
# drop the reference so the counts can be garbage-collected
counts = nothing