# CPC Logging Validation

This notebook validates and analyzes the CPC/Raspi logging from the sensor box prior to its deployment in Revere in March 2021. I will start by loading and exploring the datasets, and then I will examine how accurate and reliable the Raspberry Pi's data logging is.

In [1]:
import bisect
import datetime as dt
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats

First, parse the data recorded by the CPC itself.

In [5]:
cpc = pd.read_csv('data/cpc-logged-data.txt', sep='\t')
cpc.describe()


Unnamed: 0,aveconc,concent,rawconc,cnt_sec,condtmp,satttmp,satbtmp,optctmp,inlttmp,smpflow,...,condpwr,sattpwr,satbpwr,optcpwr,satfpwr,exhfpwr,fillcnt,err_num,mcpcpmp,mcpcpwr
count,91185.0,91185.0,91185.0,91185.0,91185.0,91185.0,91185.0,91185.0,91185.0,91185.0,...,91184.0,91184.0,91184.0,91184.0,91184.0,91184.0,91184.0,91184.0,91184.0,91184.0
mean,637.529407,637.6094,630.513211,3559.149597,21.978993,47.421997,46.996892,32.09804,31.742437,338.022361,...,249.318422,0.514443,66.05774,17.678705,43.170622,200.0,0.002358,0.0,1.0,1.0
std,1227.801271,1227.686741,1208.463286,6823.93614,2.05786,2.187245,2.247951,2.092854,2.238682,1.856988,...,9.104697,9.8256,28.358681,16.470893,1.318105,0.0,0.057406,0.0,0.0,0.0
min,0.0,0.0,0.0,0.0,18.4,18.8,18.8,19.1,19.2,71.0,...,0.0,0.0,0.0,0.0,29.0,200.0,0.0,0.0,1.0,1.0
25%,0.0,0.53,0.53,3.0,20.6,46.0,45.6,30.7,30.5,337.0,...,250.0,0.0,45.0,0.0,42.0,200.0,0.0,0.0,1.0,1.0
50%,99.0,99.87,99.87,562.0,21.3,46.8,46.4,31.4,31.3,338.0,...,250.0,0.0,45.0,26.0,43.0,200.0,0.0,0.0,1.0,1.0
75%,506.0,505.0,505.0,2854.0,22.8,48.3,47.9,33.0,32.8,339.0,...,250.0,0.0,91.0,26.0,44.0,200.0,0.0,0.0,1.0,1.0
max,5730.0,5730.0,5604.0,31177.0,28.8,54.4,53.9,38.9,37.7,427.0,...,250.0,200.0,200.0,200.0,73.0,200.0,3.0,0.0,1.0,1.0


Next, parse the data collected on the raspi.

In [3]:
raspi = pd.read_csv('data/raspi-logged-data.csv')
raspi.describe()

Unnamed: 0,concent,rawconc,cnt_sec,condtmp,satttmp,satbtmp,optctmp,inlttmp,smpflow,satflow,...,condpwr,sattpwr,satbpwr,optcpwr,satfpwr,exhfpwr,fillcnt,err_num,mcpcpmp,mcpcpwr
count,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,...,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,54272.0,0.0,0.0
mean,832.132866,822.539761,4643.132812,20.970677,46.456383,46.026273,31.088816,30.818142,338.060455,338.109393,...,249.602189,0.0,65.569926,17.94277,42.597564,200.0,0.002248,0.0,,
std,1374.440554,1352.582235,7638.008854,0.776005,0.760655,0.778519,0.795496,1.029014,1.531202,1.63755,...,0.82069,0.0,26.013666,15.011424,0.531678,0.0,0.053222,0.0,,
min,0.0,0.0,0.0,19.8,45.3,44.7,29.8,28.6,318.0,329.0,...,247.0,0.0,0.0,0.0,40.0,200.0,0.0,0.0,,
25%,0.88,0.88,5.0,20.1,45.7,45.2,30.3,30.1,337.0,337.0,...,250.0,0.0,45.0,0.0,42.0,200.0,0.0,0.0,,
50%,219.1,219.1,1234.0,21.0,46.5,46.1,31.2,31.0,338.0,338.0,...,250.0,0.0,45.0,26.0,43.0,200.0,0.0,0.0,,
75%,738.8,738.8,4161.0,21.5,47.0,46.6,31.6,31.5,339.0,340.0,...,250.0,0.0,91.0,26.0,43.0,200.0,0.0,0.0,,
max,5078.0,4979.0,28193.0,22.9,48.3,48.0,33.1,33.0,347.0,343.0,...,250.0,0.0,200.0,200.0,45.0,200.0,2.0,0.0,,


# Pre-Processing
I'll start by converting the timestamps in each file to datetime objects and using them as the index of each DataFrame.

In [50]:
# For CPC data.
index = []
for date, time in zip(cpc['#YY/MM/DD'], cpc['HR:MN:SC']):
    Y = int(date[:2]) + 2000
    M = int(date[3:5])
    D = int(date[6:8])
    H = int(time[:2])
    T = int(time[3:5])
    S = int(time[6:8])
    index.append(dt.datetime(Y, M, D, H, T, S))
cpc.index = index

In [51]:
# For Raspi Data
index = []
for timestamp in raspi['#YY/MM/DD:HR:MN:SC']:
    Y = int(timestamp[:4])
    M = int(timestamp[5:7])
    D = int(timestamp[8:10])
    H = int(timestamp[11:13])
    T = int(timestamp[14:16])
    S = int(timestamp[17:19]) # Truncates value instead of rounding
    index.append(dt.datetime(Y, M, D, H, T, S))
raspi.index = index
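The two loops above can also be written without explicit Python iteration. The following is a minimal sketch, not the notebook's method: `datetime_index` is a hypothetical helper, and `cpc_demo` and `raspi_demo` are made-up sample rows. The separators in the sample strings are assumptions; only the character positions are taken from the slices used above.

```python
import pandas as pd

def datetime_index(year, month, day, hour, minute, second):
    """Assemble a DatetimeIndex from six Series of date/time components."""
    frame = pd.DataFrame({
        'year': year.astype(int), 'month': month.astype(int), 'day': day.astype(int),
        'hour': hour.astype(int), 'minute': minute.astype(int), 'second': second.astype(int),
    })
    # pd.to_datetime accepts a DataFrame whose columns are named year, month, day, ...
    return pd.DatetimeIndex(pd.to_datetime(frame))

# Hypothetical CPC rows: two-digit year, date and time in separate columns.
cpc_demo = pd.DataFrame({'#YY/MM/DD': ['21/03/01', '21/03/01'],
                         'HR:MN:SC': ['12:00:00', '12:00:01']})
d, t = cpc_demo['#YY/MM/DD'], cpc_demo['HR:MN:SC']
cpc_demo.index = datetime_index(d.str[:2].astype(int) + 2000, d.str[3:5], d.str[6:8],
                                t.str[:2], t.str[3:5], t.str[6:8])

# Hypothetical raspi rows: four-digit year, one column, fractional seconds.
raspi_demo = pd.DataFrame({'#YY/MM/DD:HR:MN:SC': ['2021-03-01 12:00:00.73',
                                                  '2021-03-01 12:00:01.98']})
s = raspi_demo['#YY/MM/DD:HR:MN:SC']
# Slicing [17:19] drops the fractional part, so seconds are truncated, not rounded.
raspi_demo.index = datetime_index(s.str[:4], s.str[5:7], s.str[8:10],
                                  s.str[11:13], s.str[14:16], s.str[17:19])
```

Slicing by position keeps the same behavior as the loops, including the truncation of the raspi seconds, without depending on which separator characters the files use.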

# Timestamp Observations
TODO:
1. Confirm that the CPC logs at 1 Hz for the entire recording.
2. Measure how far off the raspi is over the recording as a whole.
3. Measure how far off the raspi's logging frequency is.
4. Characterize the discrepancies in ways that are relevant to implementation.
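One possible starting point for the first three items is to summarize the spacing between consecutive timestamps. This is a rough sketch under assumptions: `interval_summary` is a hypothetical helper, the 1-second expected interval is a default parameter, and `demo_index` is made-up data with one missing second.

```python
import pandas as pd

def interval_summary(index, expected=pd.Timedelta(seconds=1)):
    """Summarize how closely a datetime index follows a fixed sampling interval."""
    idx = pd.DatetimeIndex(index)
    deltas = idx.to_series().diff().dropna()  # time between consecutive samples
    span = idx[-1] - idx[0]
    return {
        'span': span,                                     # total logged duration
        'n_samples': len(idx),                            # samples actually logged
        'n_expected': int(span / expected) + 1,           # samples a perfect logger would have
        'n_off_cadence': int((deltas != expected).sum()), # intervals not equal to `expected`
        'max_gap': deltas.max(),                          # longest interval between samples
        'interval_counts': deltas.value_counts().sort_index(),
    }

# Made-up index with the 12:00:03 sample missing.
demo_index = pd.to_datetime(['2021-03-01 12:00:00', '2021-03-01 12:00:01',
                             '2021-03-01 12:00:02', '2021-03-01 12:00:04',
                             '2021-03-01 12:00:05'])
summary = interval_summary(demo_index)
```

Calling `interval_summary(cpc.index)` and `interval_summary(raspi.index)` would give comparable numbers for both logs. Because the raspi seconds are truncated when parsed, some of its intervals may show up as 0 or 2 seconds even if the underlying logging is regular.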

# Data Validation
TODO:
1. Validate that data at specific timestamps matches in both logs.
2. Characterize the discrepancies in ways that are relevant to implementation.
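A first pass at the timestamp-matching check could compare the shared columns at the timestamps present in both logs. The sketch below is an assumption about how that might be done, not a finished validation: `compare_logs` is a hypothetical helper, the tolerance `atol` is an illustrative parameter, and the demo frames contain made-up values for the `concent` and `cnt_sec` columns that appear in both files.

```python
import pandas as pd

def compare_logs(a, b, columns, atol=0.0):
    """Compare the given columns of two logs at the timestamps they share."""
    # Truncated timestamps can repeat; keep the first row for each timestamp.
    a = a[~a.index.duplicated(keep='first')]
    b = b[~b.index.duplicated(keep='first')]
    common = a.index.intersection(b.index)
    diff = (a.loc[common, columns] - b.loc[common, columns]).abs()
    return pd.DataFrame({
        'n_compared': len(common),           # timestamps present in both logs
        'n_mismatch': (diff > atol).sum(),   # rows differing by more than atol
        'max_abs_diff': diff.max(),          # largest absolute difference
    })

# Made-up logs sharing two of three timestamps; `concent` differs at one of them.
times = pd.to_datetime(['2021-03-01 12:00:00', '2021-03-01 12:00:01',
                        '2021-03-01 12:00:02'])
cpc_demo = pd.DataFrame({'concent': [10.0, 20.0, 30.0],
                         'cnt_sec': [56.0, 112.0, 168.0]}, index=times)
raspi_demo = pd.DataFrame({'concent': [10.0, 25.0],
                           'cnt_sec': [56.0, 112.0]}, index=times[:2])
report = compare_logs(cpc_demo, raspi_demo, ['concent', 'cnt_sec'])
```

The result has one row per compared column, which makes it easy to see whether mismatches are confined to particular fields.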

In [1]:
cpc.index
raspi.index

NameError: name 'cpc' is not defined