# Welcome!

PyMouseGesture shows how to build a simple Keras RNN model, using the interpretation of mouse movements as gestures as a worked example.

## Organization
1. [Collecting data](#Collecting_data)
1. [Cleaning and Structuring data](#Process_data)
1. [Data Visualization](#data_vis)
1. [Building an RNN in Keras](#build)
1. [Training the model](#train)
1. [Live inference !](#Live_Inference)

<a id = "Collecting_data"></a>

## Collecting data


The data was collected using the Python library [pynput](https://pypi.org/project/pynput/). The helper script mouse_data_collector.py collects and labels the mouse data at the same time. Note that when you exit the script, the collected data is written to data.csv in the directory, overwriting any previous file of the same name.

The logic behind the script is simple:
- create a mouse event listener
- define the methods to be called when the mouse is moved and when it is clicked
- while the mouse is moving, collect data for the current gesture
- when the mouse is clicked, stop collecting data for the current gesture
- ask the user to label the gesture
- save the collected data as lists of x and y coordinates
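
The steps above can be sketched roughly as follows. This is not the actual mouse_data_collector.py: the `GestureRecorder` class and the `collect` function are hypothetical names used for illustration, and the CSV layout is assumed from the column names shown later in this notebook. Only `pynput.mouse.Listener` and its `on_move` / `on_click` callbacks come from pynput itself.

```python
import csv


class GestureRecorder:
    """Collects mouse positions for one gesture at a time (hypothetical helper)."""

    def __init__(self, ask_label=input):
        self.ask_label = ask_label      # how the user supplies a label
        self.xs, self.ys = [], []       # points of the gesture in progress
        self.rows = []                  # finished (label, xs, ys) gestures

    def on_move(self, x, y):
        # Mouse moved: extend the current gesture.
        self.xs.append(int(x))
        self.ys.append(int(y))

    def on_click(self, x, y, button, pressed):
        # Mouse button pressed: close the current gesture and ask for its label.
        if pressed and self.xs:
            label = self.ask_label("Label for this gesture: ")
            self.rows.append((label, self.xs, self.ys))
            self.xs, self.ys = [], []

    def save(self, path="data.csv"):
        # Mode "w" overwrites any existing file of the same name.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["sequence", "x_coordinates", "y_coordinates"])
            for label, xs, ys in self.rows:
                writer.writerow([label, str(xs), str(ys)])


def collect(path="data.csv"):
    # pynput is imported here so the recorder above can be used without it.
    from pynput import mouse

    recorder = GestureRecorder()
    with mouse.Listener(on_move=recorder.on_move,
                        on_click=recorder.on_click) as listener:
        try:
            listener.join()             # blocks until the listener stops
        except KeyboardInterrupt:
            pass                        # Ctrl+C ends the session
    recorder.save(path)
```

Calling `collect()` would start listening; the recorder is kept separate from the listener so the gesture logic can be exercised without a mouse or a display.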

In [1]:
#first let us import the necessary packages and finish setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#set numpy random seed for reproducible results
np.random.seed(0)

%matplotlib inline

<a id = 'Process_data'></a>

## Cleaning and Structuring Data

Let us first read our collected data before doing anything else. We are using [pandas](https://pandas.pydata.org/) to read and manipulate the data.

In [13]:
#read data.csv using pd.read_csv
data = pd.read_csv('data.csv')
print(data.columns)
data.head()

Index(['sequence', 'x_coordinates', 'y_coordinates'], dtype='object')


| | sequence | x_coordinates | y_coordinates |
|---|---|---|---|
| 0 | 0 | [305, 305, 303, 299, 295, 289, 281, 272, 264, ... | [141, 141, 141, 141, 141, 139, 138, 137, 136, ... |
| 1 | 0 | [238, 234, 226, 214, 200, 175, 149, 124, 103, ... | [166, 165, 165, 165, 167, 174, 183, 194, 206, ... |
| 2 | 0 | [162, 157, 146, 134, 102, 69, 17, -37, -66, -5... | [257, 256, 253, 252, 250, 250, 255, 267, 293, ... |
| 3 | 0 | [239, 237, 234, 230, 223, 212, 199, 181, 162, ... | [205, 204, 204, 204, 204, 205, 209, 214, 219, ... |
| 4 | 0 | [248, 247, 245, 240, 226, 214, 188, 160, 120, ... | [221, 220, 219, 218, 217, 215, 214, 214, 214, ... |


Because we saved our data to a csv file, the list variables are read back as strings rather than lists. The helper function *string_to_list*, defined below, converts them back to lists of integers.
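
To see why this happens, here is a small self-contained sketch that round-trips a toy DataFrame through an in-memory CSV. The toy values and the variable names (`original`, `restored`, `cell`) are made up for illustration; only the column names match data.csv.

```python
import io

import pandas as pd

# A toy frame whose cells hold Python lists, like the collected gestures.
original = pd.DataFrame({
    "sequence": [0],
    "x_coordinates": [[305, 303, 299]],
    "y_coordinates": [[141, 141, 139]],
})

# Write to an in-memory CSV and read it back, mimicking data.csv.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# CSV has no list type, so the cell comes back as plain text.
cell = restored.at[0, "x_coordinates"]
print(type(cell).__name__, repr(cell))  # str '[305, 303, 299]'
```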

In [62]:
type(data.at[0,'x_coordinates'])

str

In [63]:
def string_to_list(string_list):
    """
    Converts a list read from csv as a string back into a list of integers.
    Raises ValueError if the string contains non-numeric entries.
    
    Args:
    ----
    string_list - A list of integers read as a string literal, e.g. "[305, 303, 299]"
    
    Returns:
    -------
    int_list - A list of integers corresponding to the input string list
    
    Example:
    -------
    z = string_to_list(data.at[0,'x_coordinates']) where data is a pandas DataFrame
    """
    # drop the surrounding brackets, split on commas, and convert each piece to int
    return list(map(int, string_list.strip('[]').split(',')))
    



In the next cell we set the maximum sequence length, *maxLen*, to 75. This value was chosen by inspecting the lengths of the sequences returned by *string_to_list*.

*m* is the number of training examples. Here it is 101.

Feel free to change the index in the last line of the cell from 50 to any other value. You should see lengths of roughly 60 to 80.
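
One way to inspect the sequence lengths all at once is sketched below. The notebook does not show how the value 75 was actually chosen, so treat this as an illustration only: the `toy` frame stands in for data.csv and its values are made up.

```python
import pandas as pd

# Toy stand-in for data.csv: coordinate lists stored as strings.
toy = pd.DataFrame({
    "x_coordinates": ["[1, 2, 3]", "[4, 5]", "[6, 7, 8, 9, 10]"],
})

# Length of each sequence: strip the brackets and count the comma-separated items.
lengths = toy["x_coordinates"].map(lambda s: len(s.strip("[]").split(",")))

print(lengths.tolist())  # [3, 2, 5]
print(lengths.max())     # 5
print(lengths.describe())
```

With the real data, the same expression applied to `data["x_coordinates"]` gives the distribution from which a cut-off such as 75 can be picked.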

In [64]:
#let maxLen be 75
maxLen = 75
m = data.shape[0]
print(m)
len(string_to_list(data.at[50,'x_coordinates']))

101


73

Now let us create our input and output variables. X is our input variable, of shape (m, 2, maxLen), and Y is our output variable, of shape (m,).

X is initialized as a numpy array of zeros with m rows, 2 columns (one for the x coordinates, one for the y coordinates) and depth equal to maxLen, so sequences shorter than maxLen stay zero-padded at the end.
Y is assigned directly by converting the 'sequence' dataframe column to a numpy array.

In [38]:
X = np.zeros((m,2,maxLen))
Y = data['sequence'].values
print("The shape of X is ",X.shape)
print("The shape of Y is ",Y.shape)

The shape of X is  (101, 2, 75)
The shape of Y is  (101,)


In [57]:
for i in range(m):
    #truncate each sequence to at most maxLen points
    x = string_to_list(data.at[i,'x_coordinates'])[:maxLen]
    y = string_to_list(data.at[i,'y_coordinates'])[:maxLen]
    #shorter sequences fill only the leading entries; the rest stay zero
    X[i,0,:len(x)] = x
    X[i,1,:len(y)] = y
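
The loop above truncates long sequences and zero-pads short ones at the end. The same idea on a single sequence, as a minimal sketch: `pad_or_truncate` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np


def pad_or_truncate(points, max_len):
    """Return a float array of length max_len: points cut or zero-padded at the end."""
    out = np.zeros(max_len)
    clipped = points[:max_len]          # keep at most max_len points
    out[:len(clipped)] = clipped        # remaining entries stay zero
    return out


short = pad_or_truncate([5, 6, 7], max_len=5)
longer = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_len=5)
print(short)   # [5. 6. 7. 0. 0.]
print(longer)  # [1. 2. 3. 4. 5.]
```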

Now comes the time for data processing. We'll first do data normalization then proceed to 

## Live Inference <a id = 'Live_Inference' ></a> 