# Our Data-driven Development

Originally we planned to analyze our data in conjunction with facial recognition APIs. We stored pictures of ourselves using a quick-and-dirty [Node.js script](https://gist.github.com/aaronkh/add182a3b12a2a5b3d3487b54a93b7c9), which left us with the data below. Unfortunately, we couldn't get to analyzing the data in time, but feel free to look at the cleaning process below.

In [65]:
import pandas as pd
import matplotlib.pyplot as plt
import requests
from IPython.display import display, HTML
import json

# Step 0: Data Cleaning
This is a snapshot of our raw data. It was uploaded to an Azure server for NLP and facial recognition APIs.

In [35]:
# Download the raw table (the file is named data.csv but is tab-separated)
raw_tsv = requests.get('https://hacktech-2020.azurewebsites.net/data.csv').text
with open('data.csv', 'w') as f:  # the with block closes the file for us
    f.write(raw_tsv)
df = pd.read_csv('./data.csv', sep='\t', header=None)  # the file has no header row
df.head() # Let's take a look at what's in our table

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1583551162518 | aaron | 1583551162284-230606255.jpg | t | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"9ae8eefa-9a7c-4137-8859-590a885f57... |
| 1 | 1583563313049 | aaron | 1583563313575-827655847.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"83405c3c-5f02-48be-b85e-81ddc158d0... |
| 2 | 1583563364060 | aaron | 1583563364482-79528200.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"ee6683dd-fd5b-4005-b59c-8bb99ca5df... |
| 3 | 1583563387332 | aaron | 1583563387718-833058606.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"b2caf758-9d55-4a4c-87c5-f64d7cca71... |
| 4 | 1583563423251 | aaron | 1583563423677-886855802.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"20ecd124-c47a-4ff6-b5b6-6547eff4e1... |
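
The file round-trip above is not strictly needed: pandas can read tab-separated text straight from memory. The sketch below is only an illustration under stated assumptions. `sample_tsv` is a made-up two-row stand-in for the downloaded text, and `columns` is a hypothetical list holding the column names used in this notebook.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same tab-separated, header-less layout.
# The JSON columns are replaced by empty placeholders to keep the sample short.
sample_tsv = (
    "1583551162518\taaron\ta.jpg\tt\t{}\t[]\n"
    "1583563313049\taaron\tb.jpg\t{krystal}\t{}\t[]\n"
)

columns = ["timestamp", "author", "img", "message", "nlp", "face"]

# header=None keeps the first line as data; names= labels the columns at read time.
sample_df = pd.read_csv(
    io.StringIO(sample_tsv), sep="\t", header=None, names=columns
)
print(sample_df)
```

Passing `names=` while reading gives the columns their labels in one step instead of assigning `df.columns` afterwards.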


In [36]:
df.columns = ['timestamp', 'author', 'img', 'message', 'nlp', 'face'] # rename the columns
df = df.drop(0) # drop the first row (index label 0)
df.head()

|   | timestamp | author | img | message | nlp | face |
|---|---|---|---|---|---|---|
| 1 | 1583563313049 | aaron | 1583563313575-827655847.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"83405c3c-5f02-48be-b85e-81ddc158d0... |
| 2 | 1583563364060 | aaron | 1583563364482-79528200.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"ee6683dd-fd5b-4005-b59c-8bb99ca5df... |
| 3 | 1583563387332 | aaron | 1583563387718-833058606.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"b2caf758-9d55-4a4c-87c5-f64d7cca71... |
| 4 | 1583563423251 | aaron | 1583563423677-886855802.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"20ecd124-c47a-4ff6-b5b6-6547eff4e1... |
| 5 | 1583563604700 | skylar | 1583563605003-602064868.jpg | message | {"documents":[{"id":"1","sentiment":"neutral",... | [] |


In [37]:
df['timestamp'] = pd.to_datetime(df.timestamp, unit="ms") # Cast time to something more pleasant than epoch milliseconds
df.head()

|   | timestamp | author | img | message | nlp | face |
|---|---|---|---|---|---|---|
| 1 | 2020-03-07 06:41:53.049 | aaron | 1583563313575-827655847.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"83405c3c-5f02-48be-b85e-81ddc158d0... |
| 2 | 2020-03-07 06:42:44.060 | aaron | 1583563364482-79528200.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"ee6683dd-fd5b-4005-b59c-8bb99ca5df... |
| 3 | 2020-03-07 06:43:07.332 | aaron | 1583563387718-833058606.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"b2caf758-9d55-4a4c-87c5-f64d7cca71... |
| 4 | 2020-03-07 06:43:43.251 | aaron | 1583563423677-886855802.jpg | {krystal} | {"documents":[{"id":"1","sentiment":"neutral",... | [{"faceId":"20ecd124-c47a-4ff6-b5b6-6547eff4e1... |
| 5 | 2020-03-07 06:46:44.700 | skylar | 1583563605003-602064868.jpg | message | {"documents":[{"id":"1","sentiment":"neutral",... | [] |
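
The converted timestamps carry no timezone; epoch values are UTC by definition. As a minimal sketch, the two epoch values below are copied from the table above, while the choice of `America/Los_Angeles` as the local zone is purely an assumption on our part, not something recorded in the data.

```python
import pandas as pd

# Two epoch-millisecond values copied from the table above.
epoch_ms = pd.Series([1583563313049, 1583563604700])

# Naive conversion, as in the cell above: values are UTC but not labeled as such.
as_naive = pd.to_datetime(epoch_ms, unit="ms")

# utc=True attaches the UTC timezone explicitly.
as_utc = pd.to_datetime(epoch_ms, unit="ms", utc=True)

# Assumption: the commits were made in US Pacific time.
as_pacific = as_utc.dt.tz_convert("America/Los_Angeles")

print(as_naive.iloc[0])
print(as_pacific.iloc[0])
```

Keeping the timezone explicit would matter for any later question about the time of day at which commits happened.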


In [38]:
# Turn raw JSON into a Python dict for sentiment analysis
df.nlp = df.nlp.map(lambda item: json.loads(item)['documents'][0]['documentScores']) 
df.head()

|   | timestamp | author | img | message | nlp | face |
|---|---|---|---|---|---|---|
| 1 | 2020-03-07 06:41:53.049 | aaron | 1583563313575-827655847.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | [{"faceId":"83405c3c-5f02-48be-b85e-81ddc158d0... |
| 2 | 2020-03-07 06:42:44.060 | aaron | 1583563364482-79528200.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | [{"faceId":"ee6683dd-fd5b-4005-b59c-8bb99ca5df... |
| 3 | 2020-03-07 06:43:07.332 | aaron | 1583563387718-833058606.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | [{"faceId":"b2caf758-9d55-4a4c-87c5-f64d7cca71... |
| 4 | 2020-03-07 06:43:43.251 | aaron | 1583563423677-886855802.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | [{"faceId":"20ecd124-c47a-4ff6-b5b6-6547eff4e1... |
| 5 | 2020-03-07 06:46:44.700 | skylar | 1583563605003-602064868.jpg | message | {'positive': 0.1, 'neutral': 0.73, 'negative':... | [] |
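
A column of dicts is awkward to aggregate, so one possible next step is to spread the scores into their own numeric columns. The sketch below assumes payloads shaped like the `nlp` column; only the key names come from the output above, and the `negative` values are invented because the real ones are truncated in the display.

```python
import json

import pandas as pd

# Hypothetical payloads: key names match the notebook output, while the
# 'negative' scores are made up for illustration.
raw_nlp = pd.Series([
    '{"documents":[{"id":"1","sentiment":"neutral",'
    '"documentScores":{"positive":0.02,"neutral":0.98,"negative":0.0}}]}',
    '{"documents":[{"id":"1","sentiment":"neutral",'
    '"documentScores":{"positive":0.1,"neutral":0.73,"negative":0.17}}]}',
])

# Same extraction as the cell above: first document, its score dict.
scores = raw_nlp.map(lambda item: json.loads(item)["documents"][0]["documentScores"])

# One column per score, aligned to the original row labels.
score_columns = pd.DataFrame(scores.tolist(), index=scores.index)

# Label with the highest score in each row.
dominant = score_columns.idxmax(axis=1)
print(score_columns)
print(dominant)
```

With numeric columns in place, per-author means or trends over time become ordinary `groupby` operations.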


In [50]:
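# ... do the same for face, parsing each row's own JSON
# (the output captured below came from a version that reused df.face[2] for every row,
# which is why its rectangles are all identical)
def first_face_rectangle(item):
    faces = json.loads(item)  # a list with one entry per detected face
    return faces[0]['faceRectangle'] if faces else None  # None when no face was found
df.face = df.face.map(first_face_rectangle)
df.head()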

|   | timestamp | author | img | message | nlp | face |
|---|---|---|---|---|---|---|
| 1 | 2020-03-07 06:41:53.049 | aaron | 1583563313575-827655847.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | {'top': 108, 'left': 211, 'width': 124, 'heigh... |
| 2 | 2020-03-07 06:42:44.060 | aaron | 1583563364482-79528200.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | {'top': 108, 'left': 211, 'width': 124, 'heigh... |
| 3 | 2020-03-07 06:43:07.332 | aaron | 1583563387718-833058606.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | {'top': 108, 'left': 211, 'width': 124, 'heigh... |
| 4 | 2020-03-07 06:43:43.251 | aaron | 1583563423677-886855802.jpg | {krystal} | {'positive': 0.02, 'neutral': 0.98, 'negative'... | {'top': 108, 'left': 211, 'width': 124, 'heigh... |
| 5 | 2020-03-07 06:46:44.700 | skylar | 1583563605003-602064868.jpg | message | {'positive': 0.1, 'neutral': 0.73, 'negative':... | {'top': 108, 'left': 211, 'width': 124, 'heigh... |
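
To follow a face over time, the rectangle is easier to work with as numeric columns plus a center point. This is a rough sketch rather than what we ran: the face ids, every `height`, and the whole third rectangle are hypothetical, and `first_rectangle` is a helper name introduced here only for illustration.

```python
import json

import pandas as pd

# Hypothetical `face` payloads. Only top/left/width of the first rectangle
# match the notebook output; ids, heights and the third row are made up.
raw_face = pd.Series([
    '[{"faceId":"example-1","faceRectangle":'
    '{"top":108,"left":211,"width":124,"height":124}}]',
    '[]',  # no face detected, as in the skylar row
    '[{"faceId":"example-2","faceRectangle":'
    '{"top":90,"left":200,"width":130,"height":130}}]',
])


def first_rectangle(raw):
    """Return the first detected face's rectangle, or an empty dict if none."""
    faces = json.loads(raw)
    return faces[0]["faceRectangle"] if faces else {}


# Empty dicts become rows of NaN, so rows without a face stay in the frame.
rect = pd.DataFrame(raw_face.map(first_rectangle).tolist(), index=raw_face.index)

# Center of each rectangle in pixel coordinates.
rect["center_x"] = rect["left"] + rect["width"] / 2
rect["center_y"] = rect["top"] + rect["height"] / 2
print(rect)
```

Rows without a detected face end up as NaN, which pandas skips by default in means and plots.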


As you can see, all of our data now lives in this table. You can check the images we collected with the link below:

In [69]:
image_index = 1 # Pick a row label to see the image associated with that git commit
display(HTML('<a href="https://hacktech-2020.azurewebsites.net/' + df.img[image_index] + '">Click me!</a>'))
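
String concatenation works for our file names, but a slightly more defensive way to build the link is sketched below. `image_link` and `BASE_URL` are hypothetical names introduced for this example; the base address and the file name are the ones used above.

```python
from html import escape
from urllib.parse import quote, urljoin

BASE_URL = "https://hacktech-2020.azurewebsites.net/"


def image_link(filename, label="Click me!"):
    """Build an HTML anchor pointing at one stored image."""
    # quote() percent-encodes characters that are unsafe in a URL path.
    url = urljoin(BASE_URL, quote(filename))
    # escape() keeps the URL and label from breaking out of the HTML markup.
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


link_html = image_link("1583563313575-827655847.jpg")
print(link_html)
```

In the notebook, the returned string could be passed to `display(HTML(...))` in place of the hand-built one.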

# Next Steps

Obviously, there is a lot we can do with this data. We were originally planning on analyzing how our faces moved through time, whether our messages got more negative or positive, and how facial emotions compared with git commits. But due to the time crunch we 