# Queries

### NOTE: All queries cover January 2010 in the San Diego geographic area
1. Which traffic station has the largest difference in average speed between the first and second weeks of the month?
2. How significant is the difference in traffic throughput on a rainy day vs a non-rainy day?
3. Is there an increase in CHP traffic incidents throughout the day?
4. Does a trace amount of precipitation affect the number of CHP traffic incidents on a given day? (Trace precipitation means a weather station registered precipitation, but less than the unit granularity of the sensor.)
5. Does weather have an effect on the severity of CHP incidents on a given day?
6. Identify the top 5 freeways with respect to traffic throughput.
7. Identify the top 5 freeways with respect to traffic speed.
8. Is the traffic throughput of one freeway indicative of others?

In [1]:
import dbtemplate as dbt
import numpy as np
import pandas as pd
from dbtemplate import StatementExecutorTemplateCallback
from dbtemplate import StatementExecutorTemplate

In [2]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [3]:
db_name = 'dyerke'
username = 'dyerke'
password = 'dyerke'
hostname = 'localhost'
port = 5432

template = StatementExecutorTemplate(db_name, username, password, hostname, port)

# Results

#### 1. Which traffic station has the largest difference in average speed between the first and second weeks of the month?

In [10]:
class InternalWeekOneCallback(StatementExecutorTemplateCallback):
    def _get_query(self):
        query= """
        select t.pems_id, t.name, avg(o.avg_speed) as week_one_avg_speed
        from traffic_station t
        inner join observation o on t.id = o.station_id
        where o.time >= '2010-01-01 00:00:00' and o.time < '2010-01-08 00:00:00'
        group by t.pems_id, t.name
        """
        return query
    
    def _map_row(self, row):
        m_pems_id, m_traffic_station_name, week_one_avg_speed= row
        return (m_pems_id, m_traffic_station_name, week_one_avg_speed)

class InternalWeekTwoCallback(StatementExecutorTemplateCallback):
    def _get_query(self):
        query= """
        select t.pems_id, t.name, avg(o.avg_speed) as week_two_avg_speed
        from traffic_station t
        inner join observation o on t.id = o.station_id
        where o.time >= '2010-01-08 00:00:00' and o.time < '2010-01-15 00:00:00'
        group by t.pems_id, t.name
        """
        return query
    
    def _map_row(self, row):
        m_pems_id, m_traffic_station_name, week_two_avg_speed= row
        return (m_pems_id, m_traffic_station_name, week_two_avg_speed)

week_one_callback= InternalWeekOneCallback()
week_two_callback= InternalWeekTwoCallback()

m_week_one_list= template.execute(week_one_callback)
m_week_two_list= template.execute(week_two_callback)

week_one_df= dbt.to_data_frame(m_week_one_list, ['pems_id', 'traffic_station_name', 'week_one_avg_speed'])
week_two_df= dbt.to_data_frame(m_week_two_list, ['pems_id', 'traffic_station_name', 'week_two_avg_speed'])

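if week_one_df is None or week_two_df is None:
    print("No Results")
else:
    # Join the two weekly averages per station and compute the week-over-week change.
    m_working_df = pd.merge(week_one_df, week_two_df, on=['pems_id', 'traffic_station_name'])
    m_working_df['delta'] = m_working_df['week_two_avg_speed'] - m_working_df['week_one_avg_speed']
    # DataFrame.sort was removed from pandas; sort_values is the replacement.
    m_working_df = m_working_df.sort_values('delta', ascending=False)
    m_result = m_working_df.iloc[0]
    # A bare expression inside an if/else block is not displayed by the notebook, so print it.
    print(m_result)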

No Results
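
Because the database returned no rows here, the merge-and-compare step can be checked separately on in-memory data. The following is a minimal sketch, not the notebook's actual result: the station rows are made up for illustration, and it builds the frames with `pd.DataFrame` directly rather than `dbt.to_data_frame`. It also ranks by the absolute value of the delta, which is one reading of "largest difference"; the cell above ranks by the signed delta and so finds the largest increase.

```python
import pandas as pd

# Hypothetical rows in the shape returned by the two weekly queries:
# (pems_id, traffic_station_name, average speed). All values are made up.
week_one_rows = [(1001, 'Station A', 62.0),
                 (1002, 'Station B', 58.5),
                 (1003, 'Station C', 65.0)]
week_two_rows = [(1001, 'Station A', 60.5),
                 (1002, 'Station B', 66.0),
                 (1004, 'Station D', 55.0)]

week_one_df = pd.DataFrame(
    week_one_rows,
    columns=['pems_id', 'traffic_station_name', 'week_one_avg_speed'])
week_two_df = pd.DataFrame(
    week_two_rows,
    columns=['pems_id', 'traffic_station_name', 'week_two_avg_speed'])

# The default inner join keeps only stations observed in both weeks.
merged = pd.merge(week_one_df, week_two_df,
                  on=['pems_id', 'traffic_station_name'])
merged['delta'] = merged['week_two_avg_speed'] - merged['week_one_avg_speed']

# Rank by magnitude so a large slowdown counts as much as a large speedup.
merged['abs_delta'] = merged['delta'].abs()

if merged.empty:
    largest = None
    print("No Results")
else:
    largest = merged.loc[merged['abs_delta'].idxmax()]
    print(largest)
```

Stations that appear in only one week (1003 and 1004 in this made-up data) drop out of the inner join, so a station with missing observations in either week is never reported as the answer.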


#### 2. How significant is the difference in traffic throughput on a rainy day vs a non-rainy day?