# Performance to Tiny Performance Format - Differential

This script converts all of the Metatone touch data into one long Tiny Touchscreen Performance array. Each performance is split per player, cleaned into `x`, `y`, `moving`, `dx`, `dy` and `dt` columns indexed by `time`, and then concatenated into one very long table. Only the `[dx,dy,dt]` columns are written to the final array, which is stored in `h5` format for later use in training ANNs.

This script uses x, y values in [0,1000] and t values in ms.
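
To make the units concrete, here is a minimal sketch of the differential conversion on three made-up touches. It assumes the 1024x768 pixel scaling used later in this notebook; the `raw` frame and its values are hypothetical and not taken from the corpus.

```python
import pandas as pd

# Hypothetical raw touches in pixels on a 1024x768 screen, indexed by timestamp.
raw = pd.DataFrame(
    {"x_pos": [512.0, 768.0, 256.0], "y_pos": [384.0, 384.0, 192.0]},
    index=pd.to_datetime([
        "2015-01-01 10:00:00.000",
        "2015-01-01 10:00:00.020",
        "2015-01-01 10:00:00.050",
    ]),
)

# Scale positions to [0,1000].
x = raw["x_pos"] * 1000.0 / 1024.0
y = raw["y_pos"] * 1000.0 / 768.0

# Seconds since the first touch, as a float Series aligned with the touches.
seconds = pd.Series((raw.index - raw.index[0]).total_seconds(), index=raw.index)

# Differences between consecutive touches; dt is converted to ms.
tiny = pd.DataFrame({
    "dx": x.diff(),
    "dy": y.diff(),
    "dt": seconds.diff() * 1000.0,
})
print(tiny)
```

The first row has no predecessor, so its differences come out as NaN here; the notebook's own cleaning function fills those with fixed values instead.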

Verdict: the differential format doesn't really help!

In [3]:
from __future__ import print_function
import os
import time
from datetime import timedelta
import pandas as pd
import numpy as np
import random
import h5py

### Load up data

- Loads metatone logs.
- Checks if output directory exists, otherwise creates it.
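
The directory check in the list above does not appear in the cell shown here. A minimal sketch of what it might look like follows. It borrows the `output_directory` and `output_fileending` names from the commented-out `to_csv` line further down; the values assigned to them and the `ensure_directory` helper are hypothetical.

```python
import os
import tempfile

def ensure_directory(path):
    """Create path (and any missing parents) if it is not already a directory."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# Hypothetical values; the notebook never defines these two names.
output_directory = os.path.join(tempfile.gettempdir(), "tiny_perf_logs") + os.sep
output_fileending = "-tinyperf.csv"
ensure_directory(output_directory)
```

Calling `ensure_directory` a second time is harmless, so the cell can be re-run safely.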

In [4]:
log_files = []
log_frames = []

for local_file in os.listdir("../data"):
    if local_file.endswith("-touches.csv"):
        log_files.append("../data/" + local_file)

print("Loading all the frames.")
for log in log_files:
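    # DataFrame.from_csv was removed from pandas; read_csv with these arguments matches its defaults.
    log_frames.append(pd.read_csv(log, index_col=0, parse_dates=True, header=0))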
print("Done Loading", len(log_frames), "logs.")

Loading all the frames.
Done Loading 163 logs.


# Process touch logs

- Divide each touch log by performer
- Convert each log to tiny performance format (time as float, x, y, moving)
- Save each log individually (currently commented out)
- Concatenate all logs into one
- Save as one big file in float32 format.
- In h5 format the 4.3M touches end up around 52MB.
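
The file size quoted above can be checked with a little arithmetic. This is a rough sketch that assumes the dataset is stored uncompressed as three `float32` columns and ignores HDF5 metadata overhead.

```python
touches = 4298418        # total touches reported in the notebook output
columns = 3              # dx, dy, dt
bytes_per_value = 4      # float32

size_bytes = touches * columns * bytes_per_value
size_mb = size_bytes / 1e6
print(size_bytes, round(size_mb, 1))  # prints: 51581016 51.6
```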

### Problems

- A small number of logs are in y,x format (rotated), so some x,y values end up less than 0 or more than 1000. These are clipped to [0,1000] below.
- Velocity is dropped in the interest of simplicity.
- Some pauses are greater than 5 seconds and should really start again as zero.
- The last two problems are kicked down the road to the next processing script.
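
As an illustration of the pause problem, here is one possible reading of "start again as zero": any gap longer than the threshold has its `dt` reset to 0 rather than clipped. This is only a sketch; the `reset_long_pauses` helper and the sample values are hypothetical, and the follow-up script may handle pauses differently.

```python
import pandas as pd

def reset_long_pauses(dt_ms, threshold_ms=5000.0):
    """Return a copy of dt_ms in which gaps longer than threshold_ms become 0."""
    # Series.where keeps values where the condition holds and substitutes 0.0 elsewhere.
    return dt_ms.where(dt_ms <= threshold_ms, 0.0)

dt = pd.Series([16.0, 20.0, 7200.0, 18.0])  # made-up time gaps in ms
print(reset_long_pauses(dt).tolist())  # prints: [16.0, 20.0, 0.0, 18.0]
```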


In [5]:
def clean_sound_object(frame):
    """Cleans up sound object frames by removing unneeded
        columns and changing positions and times to differences."""
    first_time = frame.index[0].to_pydatetime()
    output = frame[['x_pos','y_pos','velocity']].copy() # copy so the assignments below don't write to a view
    output['time'] = output.index
    output.velocity = output.velocity/output.velocity # nonzero velocity -> 1, zero velocity -> NaN
    output.velocity = output.velocity.fillna(0) # NaN (zero velocity) -> 0, i.e. not moving
    output.x_pos = output.x_pos * 1000.0/1024.0 # convert to [0,1000]
    output.y_pos = output.y_pos * 1000.0/768.0 # convert to [0,1000]
    output.time = (output.time - first_time).dt.total_seconds() # seconds since first touch
    output['dt'] = output.time.diff() * 1000.0 # time diff, convert to ms
    output.time = output.time.fillna(0) # give a decent value for gaps
    output.dt = output.dt.fillna(1000) # first touch has no predecessor
    output['dx'] = output.x_pos.diff()
    output.dx = output.dx.fillna(500) # first touch has no predecessor
    output['dy'] = output.y_pos.diff()
    output.dy = output.dy.fillna(500) # first touch has no predecessor
    output = output.rename(columns={'x_pos': 'x', 'y_pos': 'y', 'velocity': 'moving'})
    return output

total_touches = 0
total_performances = 0
total_performers = 0
individual_tiny_perfs = []

print("Dividing all performances by player and converting to tiny performance format.")

for log in log_frames:
    total_performances += 1
    for n in log.device_id.unique():
        total_performers += 1
        l = log[log.device_id == n]
        individual_log_title = l.index[0].to_pydatetime().strftime("%Y-%m-%d-%H-%M-%S")
        individual_log_title += "-" + n
        l = clean_sound_object(l)
        l = l.set_index('time')
        total_touches += l.x.count() # Add to total touches processed
        #l.to_csv(output_directory + individual_log_title + output_fileending)
        individual_tiny_perfs.append(l)
                
print()
print("Processed", total_performances, "performances.")
print("There were", total_performers, "performers in total.")
print("Total touches recorded was:", total_touches)
print("Now saving a big file with all performances concatenated.")
total_perf_df = pd.concat(individual_tiny_perfs)
# total_perf_df.to_csv("metatone_corpus_tiny_perf_format.csv") # not saving to CSV anymore.

xy_max = 1000 * 1.0
dt_max = 1000 * 5.0

# clip x,y to [0,1000]
# (set_value was removed from pandas, and the time index is not unique across performers)
total_perf_df['x'] = total_perf_df['x'].clip(0.0, xy_max)
total_perf_df['y'] = total_perf_df['y'].clip(0.0, xy_max)

# clip dt to at most 5000 ms
total_perf_df['dt'] = total_perf_df['dt'].clip(upper=dt_max)

# make into one huge array
total_perf_array = np.array(total_perf_df[['dx','dy','dt']])

# save huge array in h5 format
data_file_name = "MetatoneDifferentialTinyPerfCorpus-1000.h5"
with h5py.File(data_file_name, 'w') as data_file:
    dset = data_file.create_dataset('total_performances', data=total_perf_array, dtype='float32')

# done.
print("Done.")
print("Data looks something like this:")
print(total_perf_df.describe())

Dividing all performances by player and converting to tiny performance format.

Processed 163 performances.
There were 548 performers in total.
Total touches recorded was: 4298418
Now saving a big file with all performances concatenated.
Done.
Data looks something like this:
                  x             y        moving            dt            dx  \
count  4.298418e+06  4.298418e+06  4.298418e+06  4.298418e+06  4.298418e+06   
mean   4.976371e+02  5.115529e+02  9.120721e-01  4.144776e+01  6.109386e-02   
std    2.349282e+02  2.473850e+02  2.831901e-01  2.099729e+02  1.571173e+02   
min    0.000000e+00  0.000000e+00  0.000000e+00  5.100000e-02 -1.000000e+03   
25%    3.271484e+02  3.248698e+02  1.000000e+00  8.463000e+00 -9.277344e+00   
50%    4.892578e+02  5.019531e+02  1.000000e+00  1.616600e+01  0.000000e+00   
75%    6.611328e+02  6.829427e+02  1.000000e+00  2.026100e+01  8.789062e+00   
max    1.000000e+03  1.000000e+03  1.000000e+00  5.000000e+03  1.000000e+03   

            