#### Mood:
- A state of mind that is not as specific as emotion
- Biases which emotions are felt
(DOI:10.1080/10803548.2003.11076589)

#### Behavioral correlates of mood include
- Voice modulation
- Gestures
- Cognitive performance
- Cognitive strategy
- Motor behavior 
(DOI:10.1080/10803548.2003.11076589)

**Open question: does our SensorKit data provide these metrics?**

Keyboard strokes may predict future moods.
- But it is not known how long data must be collected before the prediction becomes reliable
(https://doi.org/10.1016/j.asej.2021.101660)

### Data Available to us
- Phone Usage
- Keyboard Metrics

#### Data that seems interesting
- Total Deletions
- Total AutoCorrects
- Total emojiCount (we get the total of each emoji type)
- Total ScreenTime

#### Data that can be used as labels
- Self report survey scores
- Calculated IB Gaps

## Desired folders
RK.8D1DBFAD.DJW Thesis_20220930-20221001/sensorkit-keyboard-metrics

In [10]:
import pandas as pd
import numpy as np
import pickle
import datetime
import json
import re
import os

In [3]:
## data directory
## directory = "/Users/farhan/DNL/BuddingScholar/Budding_Scholar_22-23/Data_20220930-20221001/sensorkit-keyboard-metrics/iPhone/1e7aef96-16cc-43f8-95d4-e3bc582eb6d3"

In [11]:
## Good Subjects
participant = "1e7aef96-16cc-43f8-95d4-e3bc582eb6d3"

#### Functions to unzip files in a folder

In [12]:
## Recursively unzip everything
import fnmatch
import gzip
import shutil

def gunzip(file_path, output_path):
    with gzip.open(file_path,"rb") as f_in, open(output_path,"wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

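def recurse_and_gunzip(root):
    ## Walk the tree and decompress every .gz file next to its archive
    for rootx, dirs, files in os.walk(root):
        for f in files:
            if fnmatch.fnmatch(f, "*.gz"):
                src = os.path.join(rootx, f)
                ## Strip only the trailing ".gz" (str.replace would also hit ".gz" mid-name)
                gunzip(src, src[:-len(".gz")])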
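A quick way to check the unzip step is to run it on a throwaway directory before pointing it at real participant data. The following is a minimal, self-contained sketch using `pathlib`; the helper name `gunzip_tree`, the folder name `device-a`, and the file content are hypothetical and not part of the dataset.

```python
import gzip
import json
import shutil
import tempfile
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.gz file under root, writing the output next to it."""
    written = []
    for gz_path in sorted(Path(root).rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(out_path)
    return written


# Build a throwaway tree with one compressed JSON file (made-up content).
demo_root = Path(tempfile.mkdtemp())
nested = demo_root / "device-a"
nested.mkdir()
with gzip.open(nested / "metrics.json.gz", "wt") as f:
    json.dump({"samples": []}, f)

unzipped = gunzip_tree(demo_root)
print([p.name for p in unzipped])
```

Returning the list of written paths makes it easy to confirm how many files were actually decompressed.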

##### Each JSON File contains two top-level keys:
    - device
    - samples
________________________________________________________________________________
##### The device dictionary contains
    - name
    - phone type
________________________________________________________________________________

##### The samples key holds a list of samples
##### Each sample inside this list has the following variable types of interest:
    - Corrections
    - Errors
________________________________________________________________________________
1. Corrections of interest:
    - total Retro Corrections
    - total Insert Key Corrections
    - total Near Key Corrections
    - total Hit Test Corrections
    - total Substitution Corrections
________________________________________________________________________________
2. Errors of interest:
    - shortWordCharKeyUpErrorDistance
    - shortWordCharKeyDownErrorDistance
    - spaceUpErrorDistance
________________________________________________________________________________
3. Other variables of interest:
    - total Typing Episodes
    - timestamp
________________________________________________________________________________
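The structure described above can be sketched with a tiny hand-written file. The key names (`device`, `samples`, `timestamp`, `sample`, `distributionSampleValues`) follow the parsing code in this notebook; the values are invented for illustration, and a real file contains many more fields.

```python
import json

# Hypothetical, heavily trimmed file content: the key names follow the
# notebook's parsing code, the values are invented for illustration.
raw = """
{
  "device": {"name": "iPhone"},
  "samples": [
    {
      "timestamp": "2022-09-29T12:03:33-0400",
      "sample": {
        "totalTypingEpisodes": 3,
        "totalRetroCorrections": 1,
        "spaceUpErrorDistance": {"distributionSampleValues": [50.0, 70.0]}
      }
    }
  ]
}
"""

loaded = json.loads(raw)
rows = []
for entry in loaded["samples"]:
    metrics = entry["sample"]
    values = metrics["spaceUpErrorDistance"]["distributionSampleValues"]
    rows.append({
        "name": loaded["device"]["name"],
        "timeStamp": entry["timestamp"],
        "totalRetroCorrections": metrics["totalRetroCorrections"],
        # Distributions are reduced to their mean, one value per sample
        "spaceUpErrorDistance": sum(values) / len(values),
    })

print(rows)
```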

In [13]:
## Functions for single participants
## Get the data and the corresponding timestamps

## Need a loop here to loop over all files in the directory
directory = "/Users/farhan/DNL/BuddingScholar/Budding_Scholar_22-23/Data_20220930-20221001/sensorkit-keyboard-metrics/iPhone/1b9b62f1-095b-4819-92a0-ea8e7abee884/C4168B14-53AD-4091-97B5-7A3E4EB4A738"
recurse_and_gunzip(directory)

correctionsList = []
errorsList = []

## Loop over all files in the directory
for fname in os.listdir(directory):

    ## Skip anything that is not a JSON file
    if not fname.endswith(".json"):
        continue

    ## Full path of the file
    filename = os.path.join(directory, fname)
    
    ## Load the JSON File
    ## Need to use json.load (file object) and not json.loads (string)
    with open(filename) as file:
        loaded_file = json.load(file)

    ## Get the samples list
    samples = loaded_file["samples"]

    ## Get the name
    name = loaded_file["device"]["name"]

    ## Iterate over all samples
    for sample_entry in samples:

        ## Get the TimeStamp for the current sample
        timeStamp = sample_entry["timestamp"]

        ## Get the sample dictionary
        sample_dict_iterator = sample_entry["sample"]

        ## Get the variables for the current dict iterator
        totalTypingEpisodes = sample_dict_iterator["totalTypingEpisodes"]

        ## Correction variables
        correction_dict_temp = {
            "name": name,
            "timeStamp": timeStamp,
            "totalRetroCorrections": sample_dict_iterator["totalRetroCorrections"],
            "totalInsertKeyCorrections": sample_dict_iterator["totalInsertKeyCorrections"],
            "totalNearKeyCorrections": sample_dict_iterator["totalNearKeyCorrections"],
            "totalHitTestCorrections": sample_dict_iterator["totalHitTestCorrections"],
            "totalSubstitutionCorrections": sample_dict_iterator["totalSubstitutionCorrections"],
            "totalTranspositionCorrections": sample_dict_iterator["totalTranspositionCorrections"],
            "totalSpaceCorrections": sample_dict_iterator["totalSpaceCorrections"],
            "totalAutoCorrections": sample_dict_iterator["totalAutoCorrections"]
        }
        correctionsList.append(correction_dict_temp)

        ## Error variables
        ## Each of these is a distribution, so take the mean of its sample values
        keyUpValues = sample_dict_iterator["shortWordCharKeyUpErrorDistance"]["distributionSampleValues"]
        keyDownValues = sample_dict_iterator["shortWordCharKeyDownErrorDistance"]["distributionSampleValues"]
        spaceUpValues = sample_dict_iterator["spaceUpErrorDistance"]["distributionSampleValues"]
        shortWordCharKeyUpErrorDistance = sum(keyUpValues)/len(keyUpValues)
        shortWordCharKeyDownErrorDistance = sum(keyDownValues)/len(keyDownValues)
        spaceUpErrorDistance = sum(spaceUpValues)/len(spaceUpValues)

        error_dict_temp = {
            "name": name,
            "timeStamp": timeStamp,
            "shortWordCharKeyUpErrorDistance": shortWordCharKeyUpErrorDistance,
            "shortWordCharKeyDownErrorDistance": shortWordCharKeyDownErrorDistance,
            "spaceUpErrorDistance": spaceUpErrorDistance
        }
        errorsList.append(error_dict_temp)
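The error means in the loop above divide by the number of distribution sample values, which would raise `ZeroDivisionError` if a distribution were ever empty. Whether the export can contain an empty or missing distribution is not confirmed here, so the following is only a defensive sketch; the helper name `mean_of_distribution` and the sample dictionaries are hypothetical.

```python
import math


def mean_of_distribution(sample, key):
    """Mean of sample[key]["distributionSampleValues"], or NaN if it is missing or empty."""
    values = sample.get(key, {}).get("distributionSampleValues", [])
    if not values:
        return math.nan
    return sum(values) / len(values)


# Made-up sample dictionaries for illustration.
full = {"spaceUpErrorDistance": {"distributionSampleValues": [40.0, 60.0]}}
empty = {"spaceUpErrorDistance": {"distributionSampleValues": []}}

print(mean_of_distribution(full, "spaceUpErrorDistance"))
print(mean_of_distribution(empty, "spaceUpErrorDistance"))
```

Returning NaN rather than 0 keeps the gap visible, since pandas treats NaN as a missing value in later aggregations.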

In [14]:
correctionsDF = pd.DataFrame(correctionsList)
errorDF = pd.DataFrame(errorsList)
errorDF

Unnamed: 0,name,timeStamp,shortWordCharKeyUpErrorDistance,shortWordCharKeyDownErrorDistance,spaceUpErrorDistance
0,iPhone,2022-09-29T12:17:14-0400,12.189729,11.946486,60.635624
1,iPhone,2022-09-29T12:03:33-0400,11.714285,12.805,60.977142
2,iPhone,2022-09-29T12:17:14-0400,12.189729,11.946486,60.635624
3,iPhone,2022-09-29T16:05:18-0400,12.383846,12.293846,46.893332
4,iPhone,2022-09-29T20:43:16-0400,10.823333,10.742291,56.617776
5,iPhone,2022-09-29T20:43:16-0400,10.823333,10.742291,56.617776
6,iPhone,2022-09-29T12:03:33-0400,11.714285,12.805,60.977142
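
The output above contains repeated rows (for example, indices 0 and 2 are identical), possibly because the same sample appears in more than one file. A minimal sketch of one way to clean this up, assuming exact duplicates carry no extra information: drop them, parse `timeStamp` into a timezone-aware datetime, and sort. The rows below are copied from the output above, with two of the three error columns kept for brevity.

```python
import pandas as pd

# Rows copied from the errorDF output above (two of the three error columns).
errorDF = pd.DataFrame({
    "name": ["iPhone"] * 7,
    "timeStamp": [
        "2022-09-29T12:17:14-0400", "2022-09-29T12:03:33-0400",
        "2022-09-29T12:17:14-0400", "2022-09-29T16:05:18-0400",
        "2022-09-29T20:43:16-0400", "2022-09-29T20:43:16-0400",
        "2022-09-29T12:03:33-0400",
    ],
    "shortWordCharKeyUpErrorDistance": [
        12.189729, 11.714285, 12.189729, 12.383846,
        10.823333, 10.823333, 11.714285,
    ],
    "spaceUpErrorDistance": [
        60.635624, 60.977142, 60.635624, 46.893332,
        56.617776, 56.617776, 60.977142,
    ],
})

# Drop exact duplicate rows, then parse and sort the timestamps (stored as UTC).
clean = errorDF.drop_duplicates().copy()
clean["timeStamp"] = pd.to_datetime(
    clean["timeStamp"], format="%Y-%m-%dT%H:%M:%S%z", utc=True
)
clean = clean.sort_values("timeStamp").reset_index(drop=True)

print(len(errorDF), "->", len(clean))
```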


In [128]:
# ## Want to extract the keyboard metrics in a good way
# file_path = "RK.8D1DBFAD.DJW Thesis_20220930-20221001/sensorkit-keyboard-metrics/iPhone/2f32cd19-e9c5-4aad-8999-6f4646169ab6/3400296D-7399-44F9-9E9D-2CA824598AE8/2022-09-28T163510-0400_2022-09-29T071630-0400.json.gz"
# a = gzip.open(file_path, 'rb')
# contents = json.loads(a.read())
# print(pd.DataFrame(contents))