# Sonyc Moving-Window Anomaly Analysis

**Joint work by Xurui Chen and Pengzi Li**

[Get Moving Window](#Get_Moving_Window)

[Abnormal Detection](#Abnormal_Detection)

In [10]:
import pandas as pd
import numpy as np
import utils
import matplotlib  # needed for matplotlib.rcParams below
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
import ipywidgets as widgets
import matplotlib.dates as mdates
from datetime import datetime

In [2]:
matplotlib.rcParams['timezone'] = 'America/New_York'
df = pd.read_csv('sonycnode-b827eb491436.sonyc.csv')

In [3]:
df['time'] = pd.to_datetime(df['timestamp'].values.astype(np.int64), unit='s')
df['time'] = df['time'].dt.tz_localize('UTC').dt.tz_convert('America/New_York')
df.set_index(pd.DatetimeIndex(df['time']), inplace=True)
df.drop(['time', 'timestamp'], axis=1, inplace=True)

In [4]:
df['weekday'] = df.index.weekday
df['min_of_day'] = (df.index.hour * 60.0) + df.index.minute
df['hour_of_day'] = df.index.hour

In [5]:
df.head()

| time | dBAS | weekday | min_of_day | hour_of_day |
|---|---|---|---|---|
| 2019-04-11 11:14:09-04:00 | 64.35 | 3 | 674.0 | 11 |
| 2019-04-11 11:14:10-04:00 | 65.92 | 3 | 674.0 | 11 |
| 2019-04-11 11:14:11-04:00 | 69.24 | 3 | 674.0 | 11 |
| 2019-04-11 11:14:12-04:00 | 71.78 | 3 | 674.0 | 11 |
| 2019-04-11 11:14:13-04:00 | 70.7 | 3 | 674.0 | 11 |


## Get_Moving_Window

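Compute the moving average and standard deviation of `dBAS` over a 60-sample window. The sensor records one reading per second, so the window covers 60 s as long as the recording has no gaps.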
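`rolling(window=60)` counts rows, not seconds, so after a gap in the recording the last 60 rows can span more than a minute. pandas also accepts a time-based window given as an offset string such as `'60s'`. The sketch below compares the two on a made-up series with a 30-second gap (not sensor data); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up 1 Hz series with a 30-second recording gap (not sensor data).
idx = pd.date_range('2019-04-11 11:14:09', periods=120, freq='s', tz='America/New_York')
idx = idx.delete(np.arange(40, 70))  # drop 30 consecutive readings
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Count-based window: the last 60 rows, whatever wall-clock span they cover.
by_count = s.rolling(window=60).mean()
# Time-based window: every row within the 60 s ending at each timestamp.
# Offset windows need a monotonic DatetimeIndex; min_periods defaults to 1.
by_time = s.rolling('60s').mean()

print(by_count.iloc[-1], by_time.iloc[-1])  # 59.5 vs 64.5: the windows differ after the gap
```

Because `min_periods` defaults to 1 for offset windows, the time-based version has no leading `NaN` values; passing `min_periods=60` would restore the warm-up period of the count-based window.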

In [11]:
rolling_mean = df.dBAS.rolling(window=60).mean()

In [14]:
rolling_std = df.dBAS.rolling(window=60).std()

In [15]:
df['rolling_mean'] = rolling_mean

In [16]:
df['rolling_std'] = rolling_std

In [17]:
df.head()

| time | dBAS | weekday | min_of_day | hour_of_day | rolling_mean | rolling_std |
|---|---|---|---|---|---|---|
| 2019-04-11 11:14:09-04:00 | 64.35 | 3 | 674.0 | 11 | NaN | NaN |
| 2019-04-11 11:14:10-04:00 | 65.92 | 3 | 674.0 | 11 | NaN | NaN |
| 2019-04-11 11:14:11-04:00 | 69.24 | 3 | 674.0 | 11 | NaN | NaN |
| 2019-04-11 11:14:12-04:00 | 71.78 | 3 | 674.0 | 11 | NaN | NaN |
| 2019-04-11 11:14:13-04:00 | 70.7 | 3 | 674.0 | 11 | NaN | NaN |


## Abnormal_Detection

In [20]:
# Print high level information
print('')
print('Start: \t %s' % df.index[0])
print('End: \t %s' % df.index[-1])
print('')
print('Total avg SPL: \t %0.2f dBA' % df['dBAS'].mean())
print('Total std SPL: \t %0.2f dBA' % df['dBAS'].std())
print('Total max SPL: \t %0.2f dBA' % df['dBAS'].max())
print('Total min SPL: \t %0.2f dBA' % df['dBAS'].min())


Start: 	 2019-04-11 11:14:09-04:00
End: 	 2019-04-23 09:02:06-04:00

Total avg SPL: 	 58.60 dBA
Total std SPL: 	 7.72 dBA
Total max SPL: 	 107.73 dBA
Total min SPL: 	 41.88 dBA


Based on the overall statistics above, we assume that a truck is passing by whenever the noise level exceeds the rolling mean of its window by more than three rolling standard deviations.
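Written out, the rule flags the reading $x_t$ at second $t$ when the first inequality below holds. The definitions of $\mu_t$ and $\sigma_t$ follow pandas' defaults for `rolling(window=60)` (the window ends at, and includes, the current row) and `.std()` (sample standard deviation, `ddof=1`):

$$x_t - \mu_t > 3\,\sigma_t, \qquad \mu_t = \frac{1}{60}\sum_{k=0}^{59} x_{t-k}, \qquad \sigma_t = \sqrt{\frac{1}{59}\sum_{k=0}^{59}\left(x_{t-k} - \mu_t\right)^2}$$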

In [21]:
def truckDetection(df):
    # Keep readings more than 3 rolling std above the rolling mean (60-sample window).
    return df[df['dBAS'] - df['rolling_mean'] > 3 * df['rolling_std']]
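Because the 60-sample window includes the current reading, a loud sample also raises its own rolling mean and standard deviation, which shrinks its margin above the threshold. A variant that compares each reading only with the 60 readings *before* it can be sketched with `shift(1)`. The function name `truck_detection_prior_window` and the toy series are hypothetical and not part of the original analysis.

```python
import numpy as np
import pandas as pd


def truck_detection_prior_window(df, window=60, k=3.0):
    """Hypothetical variant of truckDetection that excludes the current reading
    from its own baseline by using the statistics of the preceding `window` rows."""
    # shift(1) moves each window's statistics one row forward, so the value
    # at row t summarises rows t-window .. t-1.
    prior_mean = df['dBAS'].rolling(window=window).mean().shift(1)
    prior_std = df['dBAS'].rolling(window=window).std().shift(1)
    # Rows with NaN statistics (the first `window` rows) compare as False.
    return df[df['dBAS'] - prior_mean > k * prior_std]


# Made-up data: a quiet level alternating 60/61 dBA with one 90 dBA spike at row 100.
levels = np.where(np.arange(200) % 2 == 0, 60.0, 61.0)
levels[100] = 90.0
demo = pd.DataFrame({'dBAS': levels},
                    index=pd.date_range('2019-04-11', periods=200, freq='s'))
flagged = truck_detection_prior_window(demo)
```

On this toy series only the spike is flagged. Whether excluding the current reading changes the per-day counts on the real sensor data would have to be checked separately.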

In [23]:
truckNoisePoint = truckDetection(df)

In [24]:
truckNoisePoint.shape

(23799, 6)

In [26]:
truckNoisePoint.head()

| time | dBAS | weekday | min_of_day | hour_of_day | rolling_mean | rolling_std |
|---|---|---|---|---|---|---|
| 2019-04-11 11:26:36-04:00 | 73.76 | 3 | 686.0 | 11 | 59.271167 | 4.547259 |
| 2019-04-11 11:26:40-04:00 | 80.16 | 3 | 686.0 | 11 | 60.348667 | 5.996387 |
| 2019-04-11 11:26:41-04:00 | 87.45 | 3 | 686.0 | 11 | 60.878667 | 6.909823 |
| 2019-04-11 11:30:25-04:00 | 80.5 | 3 | 690.0 | 11 | 62.893833 | 5.619611 |
| 2019-04-11 11:38:40-04:00 | 82.62 | 3 | 698.0 | 11 | 61.627167 | 5.737159 |


These are the unaggregated truck-noise data points: each row is a single flagged one-second reading.

Next, count the flagged readings for each day.

In [36]:
truckNoisePoint = truckNoisePoint.copy()  # explicit copy avoids SettingWithCopyWarning
truckNoisePoint['Date'] = truckNoisePoint.index.date  # local calendar date of each flagged reading


In [42]:
truckNoiseDate = truckNoisePoint.groupby('Date').size().to_frame().rename(columns = {0: 'DateCount'})

In [43]:
truckNoiseDate

| Date | DateCount |
|---|---|
| 2019-04-11 | 216 |
| 2019-04-12 | 2112 |
| 2019-04-13 | 1717 |
| 2019-04-14 | 1679 |
| 2019-04-15 | 2113 |
| 2019-04-16 | 1838 |
| 2019-04-17 | 1831 |
| 2019-04-18 | 2112 |
| 2019-04-19 | 2009 |
| 2019-04-20 | 2079 |
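Raw daily counts are only directly comparable between days recorded for the same length of time; for example, the recording starts at 11:14 on 2019-04-11. One option is to divide each day's count by the number of readings recorded that day. The helper `flagged_fraction_per_day` below is a hypothetical sketch of that normalization, shown on made-up timestamps. With this notebook's data it would be called as `flagged_fraction_per_day(df, truckNoisePoint)`.

```python
import pandas as pd


def flagged_fraction_per_day(all_samples: pd.DataFrame, flagged: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-day flagged count, total readings, and their ratio.

    Both frames are assumed to have a DatetimeIndex in local time, as `df` and
    `truckNoisePoint` do in this notebook.
    """
    samples = all_samples.groupby(all_samples.index.date).size()
    hits = flagged.groupby(flagged.index.date).size()
    # Align on date; days with readings but no detections get a count of 0.
    out = pd.DataFrame({'DateCount': hits, 'Samples': samples}).fillna(0)
    out['FlaggedFraction'] = out['DateCount'] / out['Samples']
    return out


# Made-up example: two readings on the first day, four on the second.
times = pd.to_datetime(['2019-04-11 22:00', '2019-04-11 23:00', '2019-04-12 00:00',
                        '2019-04-12 01:00', '2019-04-12 02:00', '2019-04-12 03:00'])
readings = pd.DataFrame({'dBAS': [60.0, 75.0, 76.0, 74.0, 59.0, 58.0]}, index=times)
summary = flagged_fraction_per_day(readings, readings.iloc[[1, 2, 3]])
```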


Future work: this analysis runs on the raw one-second records with a 60-sample moving window. A reading is flagged as noise when it exceeds the moving average (the mean of the 60 readings ending at, and including, that second) by more than three moving standard deviations (computed over the same window). However, a single truck passing through the study area can keep the level elevated for several seconds or even minutes, so it is counted once per flagged second rather than once per pass. As a next step, we should aggregate the flagged readings over intervals of a few seconds or one minute to obtain a more accurate truck count.
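One simple way to do this aggregation is to merge flagged readings separated by short gaps into a single event. The sketch below assumes a gap threshold of 10 s (`max_gap`, an untuned, hypothetical parameter); the helper name `group_into_events` is also hypothetical, and this is not necessarily the approach the authors will take.

```python
import pandas as pd


def group_into_events(flag_times: pd.DatetimeIndex, max_gap: str = '10s') -> pd.DataFrame:
    """Hypothetical helper: merge flagged timestamps into events.

    A new event starts whenever the gap since the previous flagged reading
    exceeds `max_gap` (an assumed, untuned threshold).
    """
    if len(flag_times) == 0:
        return pd.DataFrame(columns=['start', 'end', 'n_samples'])
    times = flag_times.sort_values().to_series().reset_index(drop=True)
    # diff() is NaT for the first reading, so mark it as an event start explicitly.
    new_event = times.diff() > pd.Timedelta(max_gap)
    new_event.iloc[0] = True
    event_id = new_event.cumsum()  # running event number for each reading
    events = times.groupby(event_id).agg(['min', 'max', 'count'])
    return events.rename(columns={'min': 'start', 'max': 'end', 'count': 'n_samples'})


# The five detections shown in truckNoisePoint.head() above.
flags = pd.to_datetime(['2019-04-11 11:26:36-04:00', '2019-04-11 11:26:40-04:00',
                        '2019-04-11 11:26:41-04:00', '2019-04-11 11:30:25-04:00',
                        '2019-04-11 11:38:40-04:00'])
events = group_into_events(pd.DatetimeIndex(flags))
```

On those five readings the sketch yields three events: one cluster from 11:26:36 to 11:26:41 and two isolated readings. For the fixed one-minute alternative, the number of minutes containing at least one flagged reading could be taken from `(truckNoisePoint.resample('1min').size() > 0).sum()`. Either threshold would need to be checked against known truck passes.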