# Simple Data Generation for Tableau

We will store the data in a data frame to simplify exporting it to a \*.csv file.

In [1]:
using DataFrames

We initialize the data frame with 3 columns and 1024 rows:

1. The first column, `subject`, is drawn at random from 64 identifiers (1 to 64) and sorted so that each subject's rows are contiguous.
2. The second column holds the precursor to `time`: the cumulative sum of random increments from 1 to 8.
3. The third column, `event`, is drawn at random from the letters A to E with a distribution centered on C (probabilities 0.1, 0.2, 0.4, 0.2 and 0.1 for A through E).
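
Because `rand` picks uniformly from a collection, repeating letters in the pool weights each letter by its multiplicity. The sketch below, using only Base Julia, computes the exact probabilities implied by the pool used in the next cell and compares them with empirical frequencies from a large sample; the names `pool`, `probs`, `draws` and `freqs` are illustrative, not part of the original notebook.

```julia
# Pool used in the notebook: each letter's multiplicity sets its weight
pool = ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'E']

# Exact probability of drawing each letter with rand(pool)
probs = Dict(c => count(==(c), pool) / length(pool) for c in 'A':'E')

# Empirical frequencies from a large sample should approach probs
draws = rand(pool, 100_000)
freqs = Dict(c => count(==(c), draws) / length(draws) for c in 'A':'E')
```

StatsBase.jl also offers weighted sampling with explicit weights, but the repeated-pool approach needs no extra package.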

In [25]:
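exportdata = DataFrame(
    subject = sort(rand(1:64, 1024)),   # sorted so each subject's rows are contiguous
    time = cumsum(rand(1:8, 1024)),     # running total of random increments 1 to 8
    event = rand(['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'E'], 1024)  # weighted toward C
)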

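| Row | subject | time | event |
|---|---|---|---|
| 1 | 1 | 6 | A |
| 2 | 1 | 14 | A |
| 3 | 1 | 21 | D |
| 4 | 1 | 24 | A |
| 5 | 1 | 26 | B |
| 6 | 1 | 31 | C |
| 7 | 1 | 32 | B |
| 8 | 1 | 37 | C |
| 9 | 1 | 41 | D |
| 10 | 1 | 48 | C |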


Notice that `time` keeps increasing across subjects. Instead, we want it to be a cumulative sum within each subject. To accomplish this, we use logical indexing to pull the last time of each subject and subtract it from the times of the next subject.
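
A different technique, not the one this notebook uses, is to recover each row's increment with `diff` and re-accumulate it per subject with `groupby` and `transform!` from DataFrames.jl (1.x syntax assumed). The sketch below applies it to a small hypothetical data frame; the `increment` column is an illustrative helper. On this toy input it gives the same per-subject times as the index arithmetic in the following steps.

```julia
using DataFrames

# Hypothetical toy data: times accumulate across subject boundaries,
# as they do after cumsum over the whole column
df = DataFrame(subject = [1, 1, 2, 2, 2, 3], time = [2, 5, 6, 9, 13, 14])

# Recover each row's increment from the global cumulative sum
df.increment = [df.time[1]; diff(df.time)]

# Re-accumulate the increments separately within each subject,
# overwriting :time with the per-subject running total
transform!(groupby(df, :subject), :increment => cumsum => :time)

df.time  # [2, 5, 1, 4, 8, 1]
```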

First, find the logical index of the last row before the subject changes.

In [26]:
exportdata.before = [exportdata.subject[1:end-1] .!= exportdata.subject[2:end]; false];  # true on the last row of each subject except the final one
exportdata

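| Row | subject | time | event | before |
|---|---|---|---|---|
| 1 | 1 | 6 | A | false |
| 2 | 1 | 14 | A | false |
| 3 | 1 | 21 | D | false |
| 4 | 1 | 24 | A | false |
| 5 | 1 | 26 | B | false |
| 6 | 1 | 31 | C | false |
| 7 | 1 | 32 | B | false |
| 8 | 1 | 37 | C | false |
| 9 | 1 | 41 | D | false |
| 10 | 1 | 48 | C | false |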
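
To see what the shifted comparison does, here it is on a short hypothetical `subject` vector, using Base Julia only. Comparing each element with its successor marks the last row of every subject that is followed by another subject; the padded `false` covers the final row, which has no successor.

```julia
# Hypothetical toy subject column, sorted as in the notebook
subject = [1, 1, 2, 2, 2, 3]

# Compare each row with the next; true marks the last row of a subject
# whose successor belongs to a different subject. The final row has no
# successor, so it is padded with false.
before = [subject[1:end-1] .!= subject[2:end]; false]
```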


Similarly, find the logical index of the first row after the subject changes.

In [27]:
exportdata.after = [false; exportdata.before[1:end-1]];  # shift the mask down one row: first row of each new subject
exportdata

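| Row | subject | time | event | before | after |
|---|---|---|---|---|---|
| 1 | 1 | 6 | A | false | false |
| 2 | 1 | 14 | A | false | false |
| 3 | 1 | 21 | D | false | false |
| 4 | 1 | 24 | A | false | false |
| 5 | 1 | 26 | B | false | false |
| 6 | 1 | 31 | C | false | false |
| 7 | 1 | 32 | B | false | false |
| 8 | 1 | 37 | C | false | false |
| 9 | 1 | 41 | D | false | false |
| 10 | 1 | 48 | C | false | false |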
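
Continuing with the same hypothetical six-row example (the `before` values are copied from it), shifting the mask down by one row marks the first row of each subject after the first:

```julia
# before mask from the hypothetical example: last rows of subjects 1 and 2
before = [false, true, false, false, true, false]

# Shift the mask down one row: true now marks the first row of subjects 2 and 3
after = [false; before[1:end-1]]
```

Because `before` always ends in `false`, `before` and `after` contain the same number of `true` entries, which the next steps rely on.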


Next, initialize a column to hold the baseline time shift for each subject.

In [28]:
exportdata.shift = zeros(Int, nrow(exportdata));
exportdata

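| Row | subject | time | event | before | after | shift |
|---|---|---|---|---|---|---|
| 1 | 1 | 6 | A | false | false | 0 |
| 2 | 1 | 14 | A | false | false | 0 |
| 3 | 1 | 21 | D | false | false | 0 |
| 4 | 1 | 24 | A | false | false | 0 |
| 5 | 1 | 26 | B | false | false | 0 |
| 6 | 1 | 31 | C | false | false | 0 |
| 7 | 1 | 32 | B | false | false | 0 |
| 8 | 1 | 37 | C | false | false | 0 |
| 9 | 1 | 41 | D | false | false | 0 |
| 10 | 1 | 48 | C | false | false | 0 |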


In the first row of each new subject, store the difference between the last time of the preceding subject and the last time of the subject before that (or 0 for the first subject).

In [29]:
# First row of each new subject gets the gap between consecutive subjects' last times
exportdata.shift[exportdata.after] =
    exportdata.time[exportdata.before] .-
    [0; exportdata.time[exportdata.before][1:end-1]];
exportdata

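| Row | subject | time | event | before | after | shift |
|---|---|---|---|---|---|---|
| 1 | 1 | 6 | A | false | false | 0 |
| 2 | 1 | 14 | A | false | false | 0 |
| 3 | 1 | 21 | D | false | false | 0 |
| 4 | 1 | 24 | A | false | false | 0 |
| 5 | 1 | 26 | B | false | false | 0 |
| 6 | 1 | 31 | C | false | false | 0 |
| 7 | 1 | 32 | B | false | false | 0 |
| 8 | 1 | 37 | C | false | false | 0 |
| 9 | 1 | 41 | D | false | false | 0 |
| 10 | 1 | 48 | C | false | false | 0 |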
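
Here is this step worked on the hypothetical six-row example (subjects `[1, 1, 2, 2, 2, 3]`), with `times` standing in for the `time` column so as not to shadow `Base.time`; `last_times` is an illustrative name. Only the first row of each later subject receives a nonzero shift.

```julia
# Hypothetical global cumulative times and the masks derived earlier
times  = [2, 5, 6, 9, 13, 14]
before = [false, true, false, false, true, false]
after  = [false, false, true, false, false, true]

# Last time of each subject that is followed by another subject
last_times = times[before]                      # [5, 13]

# Difference between consecutive last times, the first taken relative to 0
shift = zeros(Int, length(times))
shift[after] = last_times .- [0; last_times[1:end-1]]
```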


Fill in the zeroes by taking the cumulative sum over the shift column.

In [30]:
exportdata.shift = cumsum(exportdata.shift);
exportdata

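| Row | subject | time | event | before | after | shift |
|---|---|---|---|---|---|---|
| 1 | 1 | 6 | A | false | false | 0 |
| 2 | 1 | 14 | A | false | false | 0 |
| 3 | 1 | 21 | D | false | false | 0 |
| 4 | 1 | 24 | A | false | false | 0 |
| 5 | 1 | 26 | B | false | false | 0 |
| 6 | 1 | 31 | C | false | false | 0 |
| 7 | 1 | 32 | B | false | false | 0 |
| 8 | 1 | 37 | C | false | false | 0 |
| 9 | 1 | 41 | D | false | false | 0 |
| 10 | 1 | 48 | C | false | false | 0 |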
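
On the hypothetical six-row example, the running total spreads each offset over the remaining rows, so every row of a subject ends up carrying the last time of the subject before it:

```julia
# Shift from the hypothetical example: nonzero only at the first row of subjects 2 and 3
shift = [0, 0, 5, 0, 0, 8]

# Running total: rows of subject 2 carry 5, rows of subject 3 carry 5 + 8 = 13
shift = cumsum(shift)
```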


Rectify the time for each subject by subtracting the baseline shift.

In [31]:
exportdata.time = exportdata.time .- exportdata.shift;
exportdata

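| Row | subject | time | event | before | after | shift |
|---|---|---|---|---|---|---|
| 1 | 1 | 6 | A | false | false | 0 |
| 2 | 1 | 14 | A | false | false | 0 |
| 3 | 1 | 21 | D | false | false | 0 |
| 4 | 1 | 24 | A | false | false | 0 |
| 5 | 1 | 26 | B | false | false | 0 |
| 6 | 1 | 31 | C | false | false | 0 |
| 7 | 1 | 32 | B | false | false | 0 |
| 8 | 1 | 37 | C | false | false | 0 |
| 9 | 1 | 41 | D | false | false | 0 |
| 10 | 1 | 48 | C | false | false | 0 |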
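
Finishing the hypothetical six-row example, subtracting the shift restarts the clock for each subject. The loop is a sanity check one might add (an assumption, not part of the original notebook): each subject's first time and every step between its consecutive times should fall within the increment range 1 to 8.

```julia
# Hypothetical example data and the accumulated shift from the previous step
subject = [1, 1, 2, 2, 2, 3]
times   = [2, 5, 6, 9, 13, 14]
shift   = [0, 0, 5, 5, 5, 13]

rectified = times .- shift   # [2, 5, 1, 4, 8, 1]

# Within each subject: the first time and every increment lie in 1:8
for s in unique(subject)
    t = rectified[subject .== s]
    @assert first(t) in 1:8
    @assert all(d -> d in 1:8, diff(t))
end
```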


We can now export the data, keeping only the first three columns (`subject`, `time`, `event`) and dropping the helper columns.

In [34]:
using CSV
# writetable was removed from DataFrames.jl; CSV.jl handles file output
CSV.write("export.csv", select(exportdata, :subject, :time, :event))
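
To confirm the export, one can read the file back with CSV.jl. This sketch writes a small hypothetical data frame to a temporary path (an assumption; the notebook writes `export.csv` in the working directory) and reads it back with `CSV.read(path, DataFrame)`.

```julia
using DataFrames, CSV

# Hypothetical data frame standing in for the exported columns
df = DataFrame(subject = [1, 1, 2], time = [2, 5, 1], event = ["A", "C", "B"])

# Write to a temporary file, then read it back into a DataFrame
path = tempname() * ".csv"
CSV.write(path, select(df, :subject, :time, :event))
back = CSV.read(path, DataFrame)
```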