# Exploring Volume Relationships
Author(s): Raphael Dumas
*Attempting to decipher the relationships between the various tables in the FLOW database*

In [3]:
from tabulate import tabulate
import pandas as pd
import pandas.io.sql as pandasql
import datetime
import configparser
from psycopg2 import connect

In [4]:
CONFIG = configparser.ConfigParser()
CONFIG.read('db.cfg')
dbset = CONFIG['DBSETTINGS']
# Set up the PostgreSQL connection
con = connect(database=dbset['database'],
              host=dbset['host'],
              user=dbset['user'],
              password=dbset['password'])

## countinfo & countinfomics
Both tables appear to be central to count information. Confusingly, both use `count_info_id` as their primary key, and some ids appear in both tables.

In [8]:
sql = ''' SELECT COUNT(DISTINCT c.count_info_id) AS "countinfo count",
 COUNT(DISTINCT cim.count_info_id) "countinfomics count",
 SUM(CASE WHEN c.count_info_id = cim.count_info_id THEN 1 ELSE 0 END) AS "Both"
 FROM traffic.countinfo c
 FULL OUTER JOIN traffic.countinfomics cim ON c.count_info_id = cim.count_info_id'''

data = pandasql.read_sql(sql, con)
print(tabulate(data, headers="keys", tablefmt="pipe"))

|    |   countinfo count |   countinfomics count |   Both |
|---:|------------------:|----------------------:|-------:|
|  0 |            762388 |                 20455 |   4777 |
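
The same overlap check can be reproduced in pandas once both id columns are loaded. This is a minimal sketch on toy data; the ids below are hypothetical and stand in for `traffic.countinfo` and `traffic.countinfomics`.

```python
import pandas as pd

# Toy stand-ins for the two tables (hypothetical ids, not real data).
countinfo = pd.DataFrame({"count_info_id": [1, 2, 3, 4]})
countinfomics = pd.DataFrame({"count_info_id": [3, 4, 5]})

# An outer merge with indicator=True labels each id as found in the
# left table only, the right table only, or both.
merged = countinfo.merge(
    countinfomics, on="count_info_id", how="outer", indicator=True
)
overlap = merged["_merge"].value_counts()

print(overlap["left_only"], overlap["right_only"], overlap["both"])
```

The `_merge` column is the pandas analogue of the `CASE WHEN` comparison in the SQL above.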


### countinfomics and the `cal` and `det` tables
The query below, together with [`cal_dictionary.md`](cal_dictionary.md) and [`det_dictionary.md`](det_dictionary.md), indicates that the `count_info_id` column from `countinfomics` is a foreign key in both the `cal` and `det` tables. Thus `countinfomics` appears to contain exclusively turning movement counts.

In [9]:
sql = '''SELECT COUNT(DISTINCT c.count_info_id) AS "countinfomics count",
COUNT(DISTINCT det.count_info_id) "det count",
SUM(CASE WHEN c.count_info_id = det.count_info_id THEN 1 ELSE 0 END) AS "Both"
FROM traffic.countinfomics c
FULL OUTER JOIN traffic.det det ON c.count_info_id = det.count_info_id'''

data = pandasql.read_sql(sql, con)
print(tabulate(data, headers="keys", tablefmt="pipe"))

|    |   countinfomics count |   det count |   Both |
|---:|----------------------:|------------:|-------:|
|  0 |                 20455 |       20454 | 661937 |
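
The "Both" value is far larger than either distinct count because `SUM(CASE ...)` counts joined rows, and `det` evidently holds many rows per `count_info_id`. The sketch below illustrates the difference between counting rows and counting distinct ids on a one-to-many join; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data: three counts, two of which have detail rows in `det`.
countinfomics = pd.DataFrame({"count_info_id": [10, 11, 12]})
det = pd.DataFrame({"count_info_id": [10, 10, 10, 11, 11]})

# One-to-many join: each count is repeated once per matching detail row.
joined = countinfomics.merge(det, on="count_info_id", how="inner")

matched_rows = len(joined)                       # analogous to SUM(CASE ... THEN 1 ...)
matched_ids = joined["count_info_id"].nunique()  # analogous to COUNT(DISTINCT ...)

print(matched_rows, matched_ids)
```

Counting distinct ids instead would make "Both" directly comparable to the other two columns.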


This is further supported by joining to the `category` table, where every row falls under a single category:

In [11]:
sql = '''SELECT category_name, COUNT(*)
FROM traffic.countinfomics
NATURAL JOIN traffic.category
GROUP BY category_name
ORDER BY count DESC'''

data = pandasql.read_sql(sql, con)
print(tabulate(data, headers="keys", tablefmt="pipe"))

|    | category_name   |   count |
|---:|:----------------|--------:|
|  0 | MANUAL          |   20455 |


### countinfo
The counts in `countinfo` appear to be more automated. Here is the breakdown of count sources; note that a large share of rows have no source recorded.

In [12]:
sql = '''SELECT source1, COUNT(1)
FROM traffic.countinfo 
GROUP BY source1
ORDER BY count DESC'''

data = pandasql.read_sql(sql, con)
print(tabulate(data, headers="keys", tablefmt="pipe"))

|    | source1   |   count |
|---:|:----------|--------:|
|  0 | RESCU     |  298463 |
|  1 |           |  264233 |
|  2 | JAMAR     |   92935 |
|  3 | PERMSTA   |   61114 |
|  4 | TRANSCORE |   41555 |
|  5 | 24HOUR    |    3163 |
|  6 | SENSYS    |     645 |
|  7 | TRANSUITE |     267 |
|  8 | MTO       |       8 |
|  9 | MTSS      |       5 |


## Counts over time
This shows the number of counts of each type per year.

In [14]:
sql = '''SELECT extract('year' FROM COALESCE(c.count_date, cim.count_date)) AS "Year", COUNT(DISTINCT c.count_info_id) AS "countinfo count",
 COUNT(DISTINCT cim.count_info_id) "countinfomics count"
FROM traffic.countinfo c
FULL OUTER JOIN traffic.countinfomics cim ON c.count_date = cim.count_date::DATE
GROUP BY "Year"
ORDER BY "Year"'''

data = pandasql.read_sql(sql, con, index_col='Year')
print(tabulate(data, headers="keys", tablefmt="pipe"))

|      |   countinfo count |   countinfomics count |
|-----:|------------------:|----------------------:|
| 1984 |                 0 |                   565 |
| 1985 |                 0 |                   495 |
| 1986 |                 0 |                   585 |
| 1987 |                 0 |                   408 |
| 1988 |                 0 |                   143 |
| 1989 |                 0 |                   689 |
| 1990 |                 0 |                   376 |
| 1991 |                 0 |                   591 |
| 1992 |                 0 |                   408 |
| 1993 |              2627 |                   488 |
| 1994 |              2162 |                   452 |
| 1995 |             16524 |                   600 |
| 1996 |             24063 |                   566 |
| 1997 |             24117 |                   493 |
| 1998 |             27771 |                   491 |
| 1999 |             26263 |                   635 |
| 2000 |             24676 |                  
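
An alternative to joining the two tables on `count_date` is to aggregate each table by year separately and then align the results. This is a sketch under assumed toy data, not the query used above; the helper `counts_per_year` is hypothetical.

```python
import pandas as pd

# Hypothetical count dates standing in for the two tables.
countinfo = pd.DataFrame({
    "count_info_id": [1, 2, 3],
    "count_date": pd.to_datetime(["1993-05-01", "1993-06-12", "1994-03-03"]),
})
countinfomics = pd.DataFrame({
    "count_info_id": [7, 8],
    "count_date": pd.to_datetime(["1992-09-15", "1993-01-20"]),
})


def counts_per_year(df, name):
    """Count distinct count_info_id values per calendar year."""
    years = df["count_date"].dt.year
    return df.groupby(years)["count_info_id"].nunique().rename(name)


# Align the two yearly series on the year; years missing from one table become 0.
by_year = (
    pd.concat(
        [
            counts_per_year(countinfo, "countinfo count"),
            counts_per_year(countinfomics, "countinfomics count"),
        ],
        axis=1,
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)
by_year.index.name = "Year"
print(by_year)
```

Aggregating first keeps the intermediate result small, since no row-by-row pairing of counts that share a date is needed.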

## Geographic data
[`gis_table_schema.sql`](sql/gis_table_schema.sql) contains the SQL for the tables that appear, so far, to hold geographic information, excluding `arterydata`. Aakash's investigation shows that 1/3 of rows in `arterydata` have a `geo_id` that matches a `geo_id` in `gis.street_centreline`. The remaining rows in `arterydata` seem to link to nodes.
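
A match rate like the one above could be checked in pandas roughly as follows. This is a sketch on made-up rows; the `arterycode` column and all values are assumptions for illustration, and real data would come from `arterydata` and `gis.street_centreline`.

```python
import pandas as pd

# Hypothetical rows (not real data); a missing geo_id is represented as None.
arterydata = pd.DataFrame({"arterycode": [1, 2, 3], "geo_id": [100, None, 300]})
centreline = pd.DataFrame({"geo_id": [100, 200]})

# True where the artery's geo_id exists in the centreline table.
has_match = arterydata["geo_id"].isin(centreline["geo_id"])

# Fraction of arterydata rows with a matching centreline geo_id.
share_matched = has_match.mean()
print(share_matched)
```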