# Analysis of Workshops

WearAQ's first workshop took place at the Tower Hamlets council on Wednesday, March 14th. This notebook runs some tests on, and an analysis of, the data from both the participants and the Airbeam device.

First, let's load the libraries and then move on to Workshop 1.

In [35]:
# Loading libraries
import pandas as pd
import numpy as np
import seaborn as sns
import os
import pymc as pm

# for plots
from pymc.Matplot import plot as mcplot
from matplotlib import pyplot as plt
import matplotlib as mpl
from IPython.core.pylabtools import figsize


Now read in the data we have so far from the different workshops.

# Workshop 1
## Reading in the data

In [33]:
# Read in Airbeam data

filenames = os.listdir("data/Workshop Data/Workshop 1")
print(filenames)

for i in range(len(filenames)):
    filenames[i] = "data/Workshop Data/Workshop 1/" + filenames[i]

w1 = [pd.read_csv(filename, header = 2) for filename in filenames]

for i in range(len(filenames)):
    w1[i]["Timestamp"] = pd.to_datetime(w1[i].Timestamp)

print()   

# Read in the perception data
w1_percep = pd.read_csv("data/Workshop data/Perception/W1_perception.csv",float_precision = 'high')

# Check the results
print(w1_percep.head(5))

['Airbeam V1 - Temperature.csv', 'Airbeam V1 - PM2.5.csv', 'Airbeam V1 - Humidity.csv']

      source  Location  gesture       lon        lat
0   Device 1         1        1 -0.006205  51.509988
1  Device 10         1        4 -0.004851  51.510117
2  Device 12         1        5 -0.004498  51.509808
3  Device 13         1        4 -0.006051  51.510064
4  Device 15         1        2 -0.004816  51.509993


This data was already cleaned up a little before processing here: the location column was added and erroneous values were removed. However, some values are still missing. Let's take a look at a brief summary.

In [3]:
w1_percep.describe().head(1)  # Look at the count of non-null values per column

       Location  gesture   lon   lat
count      84.0     84.0  61.0  61.0
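
As a cross-check on the summary above, the gaps can also be counted directly rather than read off the `count` row. The sketch below is a minimal illustration on a small made-up frame: `toy` and its values are hypothetical and are not the workshop data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the perception table (values are made up)
toy = pd.DataFrame({
    "Location": [1, 1, 2, 3],
    "gesture": [1, 3, 4, 3],
    "lon": [-0.0062, np.nan, -0.0045, np.nan],
    "lat": [51.5100, np.nan, 51.5098, np.nan],
})

# count() skips NaN, so it matches the "count" row of describe()
non_null = toy.count()

# isna().sum() reports the number of missing values per column
missing = toy.isna().sum()

print(non_null)
print(missing)
```

On the real table, the same two calls on `w1_percep` should show which columns are incomplete and by how much.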


We registered a total of 84 measurements by location and gesture, but 23 of them are missing **lon** and **lat** values. This is because, as part of the experiment, we have a *neutral* point for *OK* air quality, which is not associated with any gesture recorded in the tables. If, over the course of a walk, a device has no gesture recorded at a location, we add a reading of *Gesture 3* at that location. For those rows, we set **lon** and **lat** to the coordinates of the location itself. So, to clean this up a little:

## Clean up

In [4]:
# Initialize locations for the first workshop
w1_loc = np.array([[ 51.509841, -0.0047257],
       [ 51.510383, -0.003696],
       [ 51.511037, -0.002661],
       [ 51.511411, -0.003664],
       [ 51.510954, -0.004286],
       [ 51.510426, -0.005086]])


w1_loc = pd.DataFrame(w1_loc)
w1_loc.columns = ['lat','lon']
w1_loc.index += 1

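# Fill missing coordinates with the coordinates of each row's location
w1_percep.loc[w1_percep['lon'].isnull(),'lon'] = w1_percep['Location'].map(w1_loc.lon)
w1_percep.loc[w1_percep['lat'].isnull(),'lat'] = w1_percep['Location'].map(w1_loc.lat)

w1_percep.describe().head(1)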

       Location  gesture   lon   lat
count      84.0     84.0  84.0  84.0


There we go. Now we have a cleaner dataset and we can compare the perception data with the Airbeam data that we read in.
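
The fill step can be tried in isolation on a toy example. The sketch below is a variant that uses `fillna` instead of a masked assignment; the frames `loc` and `obs` and all their values are hypothetical, chosen only to show that measured coordinates are kept and only the gaps are replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical reference coordinates, indexed by location number
loc = pd.DataFrame(
    {"lat": [51.5098, 51.5104], "lon": [-0.0047, -0.0037]},
    index=[1, 2],
)

# Hypothetical readings; rows 1 and 3 stand in for neutral readings without GPS
obs = pd.DataFrame({
    "Location": [1, 1, 2, 2],
    "gesture": [4, 3, 2, 3],
    "lon": [-0.0062, np.nan, -0.0040, np.nan],
    "lat": [51.5100, np.nan, 51.5103, np.nan],
})

# map() looks up each row's location in the reference table;
# fillna() aligns on the row index, so only NaN cells are replaced
for col in ("lon", "lat"):
    obs[col] = obs[col].fillna(obs["Location"].map(loc[col]))

print(obs)
```

Rows that already had coordinates keep them, while the neutral rows pick up the coordinates of their location.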

### Plot the data

# Workshop 2

In [34]:
# Read in Airbeam data

filenames = os.listdir("data/Workshop Data/Workshop 2")
print(filenames)

for i in range(len(filenames)):
    filenames[i] = "data/Workshop Data/Workshop 2/" + filenames[i]

w2 = [pd.read_csv(filename, header = 2) for filename in filenames]

for i in range(len(filenames)):
    w2[i]["Timestamp"] = pd.to_datetime(w2[i].Timestamp)

print()   

# Read in the perception data
w2_percep = pd.read_csv("data/Workshop data/Perception/W2_perception.csv",float_precision = 'high')

# Check the results
print(w2_percep.head(5))

['Airbeam V1 - Temperature.csv', 'Airbeam V2 - b - PM 2.5.csv', 'Airbeam V2 - a - Humidity.csv', 'Airbeam V2 - a - PM 1.csv', 'Airbeam V2 - b - PM10.csv', 'Airbeam V2 - b - Humidity.csv', 'Airbeam V1 - PM 2.5.csv', 'Airbeam V2 - a - PM 10.csv', 'Airbeam V1 - Humidity.csv', 'Airbeam V2 - a - Temperature.csv', 'Airbeam V2 - b - PM 1.csv', 'Airbeam V2 - a - PM 2.5.csv', 'Airbeam V2 - b - Temperature.csv']

     id            timestamp       lon        lat  gesture  workshop_id  \
0  7234  2018-03-28 12:05:54 -0.013466  51.518584        4            3   
1  7235  2018-03-28 12:06:00 -0.013353  51.518540        4            3   
2  7236  2018-03-28 12:06:04 -0.013400  51.518504        2            3   
3  7237  2018-03-28 12:06:04 -0.013455  51.518759        2            3   
4  7238  2018-03-28 12:06:04 -0.013390  51.518503        2            3   

      source  
0  Device 13  
1   Device 1  
2  Device 12  
3  Device 20  
4   Device 4  


# Workshop 3

In [31]:
# Read in Airbeam data

filenames = os.listdir("data/Workshop Data/Workshop 3")
print(filenames)

for i in range(len(filenames)):
    filenames[i] = "data/Workshop Data/Workshop 3/" + filenames[i]

w3 = [pd.read_csv(filename, header = 2) for filename in filenames]

for i in range(len(filenames)):
    w3[i]["Timestamp"] = pd.to_datetime(w3[i].Timestamp)
    
print()
    
# Read in the perception data
w3_clean = pd.read_csv("data/Workshop data/Perception/W3_perception.csv",float_precision = 'high')
w3_percep = pd.DataFrame(w3_clean)

# Check the results
print(w3_percep.head(5))

['Microphone - Mic.csv', 'Airbeam V2 - b - PM 2.5.csv', 'Airbeam V2 - b - PM 10.csv', 'Airbeam V2 - a - Humidity.csv', 'Airbeam V2 - a - PM 1.csv', 'Airbeam V1 - Termperature.csv', 'Airbeam V2 - b - Humidity.csv', 'Airbeam V1 - PM 2.5.csv', 'Airbeam V2 - a - PM 10.csv', 'Airbeam V1 - Humidity.csv', 'Airbeam V2 - a - Temperature.csv', 'Airbeam V2 - b - PM 1.csv', 'Airbeam V2 - a - PM 2.5.csv', 'Airbeam V2 - b - Temperature.csv']

     id            timestamp       lon        lat  gesture  workshop_id  \
0  7272  2018-03-31 11:38:08 -0.009107  51.491517        4            4   
1  7273  2018-03-31 11:38:13 -0.009109  51.491518        4            4   
2  7274  2018-03-31 11:38:17 -0.011903  51.487252        2            4   
3  7275  2018-03-31 11:38:18 -0.009119  51.491494        2            4   
4  7276  2018-03-31 11:38:18 -0.009227  51.491518        4            4   

      source  
0   Device 5  
1   Device 4  
2  Device 17  
3  Device 12  
4  Device 19  
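
The same loading pattern is repeated for all three workshops, and the resulting lists follow the arbitrary order returned by `os.listdir`. One possible way to tidy this up is a small helper that returns the frames in a dict keyed by file name. This is only a sketch: `load_sensor_folder` is a hypothetical name, and it assumes every CSV in the folder has its header on the third row and a `Timestamp` column, as in the cells above.

```python
import os
import pandas as pd


def load_sensor_folder(folder, header_row=2):
    """Read every CSV in `folder` into a dict keyed by file name (without extension)."""
    frames = {}
    # sorted() makes the order reproducible; os.listdir order is arbitrary
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        df = pd.read_csv(os.path.join(folder, name), header=header_row)
        df["Timestamp"] = pd.to_datetime(df["Timestamp"])
        frames[os.path.splitext(name)[0]] = df
    return frames
```

With this in place, something like `w3 = load_sensor_folder("data/Workshop Data/Workshop 3")` would allow a sensor to be looked up by name, for example `w3["Airbeam V1 - PM 2.5"]`, instead of by list position.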
