# Exploring the relationship between cod & DeviceClass
Per [our documentation](https://github.com/CityofToronto/bdit_data-sources/blob/master/bluetooth/README.md#2-table-structure-aakash), the `device_class` field is a combination of bit flags whose meaning depends on the report configuration. For some reports these flags are supposed to align with the `cod` field, which holds the Bluetooth Class of Device, so that each observation can be identified as coming from a Bluetooth device or a WiFi one.

In [1]:
from psycopg2 import connect
import psycopg2.sql as pg
import configparser
import datetime
%matplotlib inline
import numpy as np
import pandas as pd
import pandas.io.sql as pandasql
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
sns.set(color_codes=True)
from IPython.display import HTML
def print_table(sql, con):
    """Run `sql` on connection `con` and render the result as an HTML table without the index."""
    return HTML(pandasql.read_sql(sql, con).to_html(index=False))

In [2]:
CONFIG = configparser.ConfigParser()
CONFIG.read('../../db.cfg')
dbset = CONFIG['DBSETTINGS']
con = connect(**dbset)

From the `bluetooth.all_analyses` table we can see which set of device classes is used for which routes. "BT or WiFi" is used for arterials, including the DT set of routes; in this set, 1 is Bluetooth and 2 is WiFi. This is how observations were distributed between the two classes in November 2017 across all of those reports:

In [5]:
sql = '''SELECT device_class, COUNT(1)
FROM bluetooth.observations_201711
INNER JOIN bluetooth.all_analyses USING (analysis_id)
WHERE device_class_set_name = 'BT or WiFi' 
GROUP BY device_class
order by count DESC'''

print_table(sql, con)

device_class,count
2,13805703
1,5426366


So about **28%** of these observations are from Bluetooth devices, which is in line with previous analysis showing **>60% WiFi** devices on Adelaide.
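
As a quick check, the split between the two classes can be recomputed from the counts in the output above. This is only a sketch with the two counts hard-coded; the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the query output above (November 2017, 'BT or WiFi' reports).
counts = {1: 5426366, 2: 13805703}   # device_class -> number of observations
labels = {1: 'Bluetooth', 2: 'WiFi'}  # meaning of each class in this set

total = sum(counts.values())
shares = {labels[k]: 100.0 * v / total for k, v in counts.items()}

for name, pct in shares.items():
    print(f'{name}: {pct:.1f}%')
```
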
How well do these match up with the class of device reported by the Bluetooth devices themselves, i.e. the `cod` field? (I know this is confusing.)

In [10]:
sql = '''WITH total_cnt AS (SELECT COUNT(1) AS total 
FROM bluetooth.observations_201711
INNER JOIN bluetooth.all_analyses USING (analysis_id)
WHERE device_class_set_name = 'BT or WiFi' )

SELECT device_class, CASE WHEN cod = 0 THEN 'WiFi' ELSE 'Bluetooth' END AS cod_val, COUNT(1),
to_char(100.0*COUNT(1)/total, '99.9%') AS "Percent of observations"
FROM bluetooth.observations_201711
INNER JOIN bluetooth.all_analyses USING (analysis_id)
CROSS JOIN total_cnt
WHERE device_class_set_name = 'BT or WiFi' 
-- AND cod = 5898764
GROUP BY device_class, cod_val, total
order by count DESC'''
print_table(sql, con)

device_class,cod_val,count,Percent of observations
2,WiFi,13805505,71.8%
1,Bluetooth,5233311,27.2%
1,WiFi,193055,1.0%
2,Bluetooth,198,.0%


Approximately **1.0%** of all observations on these routes are classified inconsistently by this filter: they are flagged as Bluetooth by `device_class` but report a `cod` of 0. That is approximately **3.6%** of the observations the filter identifies as "Bluetooth".
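
Both percentages follow from the four counts in the table above. A minimal sketch, assuming the counts are simply hard-coded rather than re-queried (the dictionary layout is an illustration, not how the notebook stores them):

```python
# (device_class, label derived from cod) -> count, copied from the table above.
crosstab = {
    (2, 'WiFi'): 13805505,
    (1, 'Bluetooth'): 5233311,
    (1, 'WiFi'): 193055,      # flagged Bluetooth, but cod = 0
    (2, 'Bluetooth'): 198,    # flagged WiFi, but cod != 0
}

total = sum(crosstab.values())
bt_flagged = crosstab[(1, 'Bluetooth')] + crosstab[(1, 'WiFi')]
contradictory = crosstab[(1, 'WiFi')]

share_of_all = 100.0 * contradictory / total       # share of all observations
share_of_bt = 100.0 * contradictory / bt_flagged   # share of device_class = 1

print(f'{share_of_all:.1f}% of all observations')
print(f'{share_of_bt:.1f}% of device_class = 1 observations')
```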

In [14]:
sql = '''SELECT report_name,
COUNT(1) AS "Number of Observations",
to_char(100.0*SUM(CASE WHEN device_class = 2 AND cod = 0 THEN 1 ELSE 0 END)/COUNT(1), '90.9%') AS "device_class = WiFi, cod = WiFi",
to_char(100.0*SUM(CASE WHEN device_class = 1 AND cod != 0 THEN 1 ELSE 0 END)/COUNT(1), '90.9%') AS "device_class = BT, cod = BT",
to_char(100.0*SUM(CASE WHEN device_class = 1 AND cod = 0 THEN 1 ELSE 0 END)/COUNT(1), '90.9%') AS "device_class = BT, cod = WiFi",
to_char(100.0*SUM(CASE WHEN device_class = 1 AND cod = 0 THEN 1 ELSE 0 END)/SUM(CASE WHEN device_class =1 THEN 1 END), '90.9%') AS "cod = WiFi / device_class = BT",
to_char(100.0*SUM(CASE WHEN device_class = 2 AND cod != 0 THEN 1 ELSE 0 END)/COUNT(1), '90.9%') AS "device_class = WiFi, cod = BT"

FROM bluetooth.observations_201711
INNER JOIN bluetooth.all_analyses USING (analysis_id)

WHERE device_class_set_name = 'BT or WiFi' AND report_name LIKE 'DT-%'
GROUP BY report_name
order by "device_class = BT, cod = WiFi" DESC'''
print_table(sql, con)

report_name,Number of Observations,"device_class = WiFi, cod = WiFi","device_class = BT, cod = BT","device_class = BT, cod = WiFi",cod = WiFi / device_class = BT,"device_class = WiFi, cod = BT"
DT-0046. Adelaide-EB_Yonge-to-Jarvis,84475,60.3%,36.6%,3.1%,7.8%,0.0%
DT-0051. King-EB_Strachan-to-Bathurst,25557,42.8%,54.1%,3.1%,5.5%,0.0%
DT-0063. King-WB_Spadina-to-Bathurst,17203,45.2%,51.9%,2.9%,5.4%,0.0%
DT-0047. Adelaide-EB_Jarvis-to-Parliament,67011,56.6%,40.6%,2.8%,6.5%,0.0%
DT-0068. Wellington-WB_Yonge-to-University,43437,64.9%,32.4%,2.7%,7.8%,0.0%
DT-0064. King-WB_Bathurst-to-Strachan,20357,36.9%,60.3%,2.7%,4.3%,0.0%
DT-0037. Eastern_Richmond-WB_Broadview-to-Parl...,27761,63.6%,33.8%,2.6%,7.1%,0.0%
DT-0094. Bathurst-SB_King-to-Front,20056,31.2%,66.2%,2.6%,3.8%,0.0%
DT-0045. Adelaide-EB_University-to-Yonge,100679,68.0%,29.4%,2.6%,8.1%,0.0%
DT-0037. Eastern/Richmond-WB_Broadview-to-Parl...,27761,63.6%,33.8%,2.6%,7.1%,0.0%


I was expecting to find some surprising variation in the fifth column: that some routes have a high percentage of these contradictory BT vs. WiFi classifications. Instead they vary steadily between **0.4%** and **3.1%** of total observations. As a _percentage of `device_class`=Bluetooth_ observations, this can be as high as **8.9%**. There doesn't appear to be any discernible pattern to this contradictory classification, however. Amongst the routes with the highest proportion of contradictory observations, the ratio of contradictory observations to Bluetooth observations varies between 3 and 8%.

The **more stunning finding** of the above table is the **amount of variation in the proportion of Bluetooth and WiFi devices on the different routes**: the WiFi share ranges from 31.2% to 90%. The table below shows the absolute numbers of BT observations and of contradictory observations, along with the percentage of observations which are WiFi.

There's a commonality across all the routes with a minority of WiFi observations: they all start or end at King and Bathurst. In subsequent months there are **no** WiFi observations, implying that the WiFi antenna failed in mid-November.

In [15]:
sql = '''SELECT report_name,
to_char(100.0*SUM(CASE WHEN device_class = 2 AND cod = 0 THEN 1 ELSE 0 END)/COUNT(1), '90.9%') AS "device_class = WiFi, cod = WiFi",
SUM(CASE WHEN device_class = 1 AND cod != 0 THEN 1 ELSE 0 END) AS "device_class = BT, cod = BT",
SUM(CASE WHEN device_class = 1 AND cod = 0 THEN 1 ELSE 0 END) AS "device_class = BT, cod = WiFi",
SUM(CASE WHEN device_class = 2 AND cod != 0 THEN 1 ELSE 0 END) AS "device_class = WiFi, cod = BT"

FROM bluetooth.observations_201711
INNER JOIN bluetooth.all_analyses USING (analysis_id)

WHERE device_class_set_name = 'BT or WiFi' AND report_name LIKE 'DT-%'
GROUP BY report_name
order by "device_class = WiFi, cod = WiFi" DESC'''
print_table(sql, con)

report_name,"device_class = WiFi, cod = WiFi","device_class = BT, cod = BT","device_class = BT, cod = WiFi","device_class = WiFi, cod = BT"
DT-0053. King-EB_Spadina-to-University,90.0%,8144,512,2
DT-0062. King-WB_University-to-Spadina,89.3%,6682,446,0
DT-0124. Yonge-NB_Queen-to-Dundas,89.2%,12250,1148,3
DT-0055. King-EB_Yonge-to-Jarvis,89.2%,8347,443,2
DT-0060. King-WB_Jarvis-to-Yonge,89.1%,6917,403,2
DT-0119. Yonge-SB_Dundas-to-Queen,88.6%,13739,947,5
DT-0023. Queen-EB_Spadina-to-University,88.0%,11963,623,1
DT-0030. Queen-WB_Jarvis-to-Yonge,87.7%,10612,655,2
DT-0108. Spadina-NB_Queen-to-Dundas,87.5%,8691,286,0
DT-0025. Queen-EB_Yonge-to-Jarvis,87.5%,11239,708,1
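
Because this table reports absolute counts, the share of `device_class` = BT observations that report `cod` = 0 (the "cod = WiFi / device_class = BT" column of the earlier table) can be derived from it. A rough sketch for the first three rows, with the counts hard-coded and a hypothetical helper name:

```python
# (report_name, device_class = BT & cod != 0, device_class = BT & cod = 0),
# copied from the first three rows of the table above.
rows = [
    ('DT-0053. King-EB_Spadina-to-University', 8144, 512),
    ('DT-0062. King-WB_University-to-Spadina', 6682, 446),
    ('DT-0124. Yonge-NB_Queen-to-Dundas', 12250, 1148),
]

def contradictory_share(bt_cod_bt, bt_cod_wifi):
    """Percent of device_class = BT observations that report cod = 0."""
    bt_total = bt_cod_bt + bt_cod_wifi
    # Guard against routes with no Bluetooth-flagged observations at all.
    return 100.0 * bt_cod_wifi / bt_total if bt_total else float('nan')

ratios = {name: contradictory_share(ok, bad) for name, ok, bad in rows}
for name, pct in ratios.items():
    print(f'{name}: {pct:.1f}%')
```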


## Highways
Highways mostly use the "TrafficFlow" device class set. This is what its bit mapping looks like:

In [11]:
sql = '''SELECT x.* FROM bluetooth.all_analyses, json_to_recordset(bluetooth.all_analyses.outcomes) AS x("deviceClassMask" int, id int, name TEXT) 
WHERE analysis_id = 1390125
'''

print_table(sql, con)

deviceClassMask,id,name
16,1390209,BT
64,1390208,Both
32,1390210,WiFi


But! This is what those look like in the `observations` table.

In [13]:
sql= '''SELECT device_class, CASE WHEN cod = 0 THEN 'WiFi' ELSE 'Bluetooth' END AS cod_val, COUNT(1)
FROM bluetooth.observations_201710
INNER JOIN bluetooth.all_analyses USING (analysis_id)
WHERE device_class_set_name = 'TrafficFlow' AND analysis_id!=1388022
GROUP BY device_class, cod_val 
ORDER BY count DESC
'''

print_table(sql, con)

device_class,cod_val,count
96,WiFi,1111481
80,Bluetooth,906246
98,WiFi,137354
103,WiFi,103757
82,Bluetooth,61339
87,Bluetooth,35813
97,WiFi,22655
80,WiFi,18835
81,Bluetooth,6932
82,WiFi,1194


The top two values, 96 and 80, are the sums 64 + 32 and 64 + 16 respectively, so the `Both` bit is being set regardless of whether the device is Bluetooth or WiFi.
We can also see that there's some crossover, like in the arterial example above: {80, WiFi} and {96, Bluetooth} make a _very minor_ appearance.
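
To make the bit arithmetic concrete, here is a small sketch that splits a `device_class` value into the three masks listed for the "TrafficFlow" set. The function name is hypothetical. The lower bits (1, 2 and 4) that show up in values such as 98 or 103 are not described by the mask table, so the sketch only reports them as leftover bits without interpreting them.

```python
# Masks copied from the `outcomes` output above for the "TrafficFlow" set.
MASKS = {'BT': 16, 'WiFi': 32, 'Both': 64}
KNOWN_BITS = sum(MASKS.values())  # 16 + 32 + 64 = 112

def decode_device_class(value):
    """Return (documented flags that are set, leftover undocumented bits)."""
    flags = [name for name, mask in MASKS.items() if value & mask]
    leftover = value & ~KNOWN_BITS  # whatever the three masks do not explain
    return flags, leftover

# device_class values seen in the observations table above.
for value in (96, 80, 98, 103, 82, 87, 97, 81):
    flags, leftover = decode_device_class(value)
    print(value, flags, 'leftover bits:', leftover)
```

For 96 this yields the `WiFi` and `Both` flags, and for 80 the `BT` and `Both` flags, with no leftover bits in either case.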