# Lab 3 - Joining Uber Pick-Ups, Stations, and Boroughs - SQL

FiveThirtyEight obtained information about Uber pick-ups in NYC for two six-month periods (April-September 2014 and January-June 2015) through a Freedom of Information Law (FOIL) request.  More information about the data and subsequent analyses can be found [here](https://github.com/fivethirtyeight/uber-tlc-foil-response).

The combined data are too large for Binder, so we have included a sample of 100,000 rows from each table.  In this exercise, you will join the base name and taxi zone name onto the original data from January-June 2015, add various date parts to the data, and finally aggregate the data to answer some questions about the frequency of Uber pick-ups.

The sampled data have been provided in the SQLite database named `uber_samples.db`.

In [2]:
%load_ext pyensae
%SQL_connect ./databases/uber_samples.db
%SQL_tables

['apr14',
 'aug14',
 'base_lookup',
 'janjune15',
 'jul14',
 'jun14',
 'may14',
 'sep14',
 'taxi_zone_lookup']

## <font color="red"> Problem 1 - Inspect the column names for the three tables of interest.</font>

First, we will focus on the January-June 2015 data.  **Inspect the column schema for this table.**

In [3]:
%SQL_schema janjune15

{0: ('Dispatching_base_num', str),
 1: ('Pickup_date', datetime.datetime),
 2: ('Affiliated_base_num', str),
 3: ('locationID', int)}

The taxi zone name for each location is provided in `taxi_zone_lookup`, and the Uber base station names can be found in `base_lookup`.  **Inspect the column schema for each table.**

In [4]:
%SQL_schema taxi_zone_lookup

{0: ('LocationID', int), 1: ('Borough', str), 2: ('Zone', str)}

In [5]:
%SQL_schema base_lookup

{0: ('base_code', str), 1: ('base_name', str)}

Your next task will be joining all the names onto the January-June 2015 data.  **Discuss the columns that will need to be joined.  What type of join should be used?**

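* We need to join `base_lookup` to `janjune15` using `janjune15.Dispatching_base_num == base_lookup.base_code`
* We need to join `taxi_zone_lookup` to `janjune15` using `janjune15.locationID == taxi_zone_lookup.LocationID`
* Both should be left joins, so that every pick-up in `janjune15` is kept even when a lookup table has no matching row.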
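
To see why the join type matters, the following is a minimal sketch using Python's built-in `sqlite3` module and a tiny in-memory database. The table names mirror the lab, but the rows and base codes (`B1`, `B2`, `B9`) are made up for illustration and are not taken from `uber_samples.db`.

```python
import sqlite3

# Toy in-memory database; all rows are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE janjune15 (Dispatching_base_num TEXT, locationID INTEGER);
    CREATE TABLE base_lookup (base_code TEXT, base_name TEXT);
    INSERT INTO janjune15 VALUES ('B1', 141), ('B2', 65), ('B9', 7);
    INSERT INTO base_lookup VALUES ('B1', 'Alpha'), ('B2', 'Beta');
""")

def count_rows(join_type):
    """Count the rows produced by joining janjune15 to base_lookup."""
    query = f"""
        SELECT count(*)
        FROM janjune15
        {join_type} JOIN base_lookup
            ON janjune15.Dispatching_base_num = base_lookup.base_code
    """
    return con.execute(query).fetchone()[0]

left_count = count_rows("LEFT")    # keeps the pick-up with the unmatched code 'B9'
inner_count = count_rows("INNER")  # drops that pick-up
print(left_count, inner_count)
```

In this toy case the left join returns all three pick-ups, while the inner join returns only the two whose base code appears in the lookup table.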

## <font color="red"> Problem 2 - Joining the tables</font>

**Use SQL to join the three tables together into one table.**

In [6]:
%%SQL
SELECT janjune15.Pickup_date, base_lookup.base_name, taxi_zone_lookup.Borough, taxi_zone_lookup.Zone
FROM janjune15
LEFT JOIN base_lookup ON janjune15.Dispatching_base_num == base_lookup.base_code
LEFT JOIN taxi_zone_lookup ON janjune15.locationID == taxi_zone_lookup.LocationID

Unnamed: 0,Pickup_date,base_name,Borough,Zone
0,2015-05-17 09:49:00.000000,Weiter,Manhattan,Upper West Side North
1,2015-05-17 09:56:00.000000,Weiter,Manhattan,East Village
2,2015-05-17 10:25:00.000000,Weiter,Manhattan,Upper West Side North
3,2015-01-18 18:44:30.000000,Hinter,Manhattan,West Village
4,2015-01-18 15:43:28.000000,Hinter,Manhattan,Union Sq
5,2015-05-17 10:47:00.000000,Weiter,Manhattan,Lower East Side
6,2015-01-18 16:57:09.000000,Hinter,Manhattan,East Village
7,2015-01-18 11:02:38.000000,Hinter,Manhattan,West Chelsea/Hudson Yards
8,2015-01-18 08:01:14.000000,Hinter,Manhattan,West Village
9,2015-01-18 10:36:07.000000,Hinter,Manhattan,Washington Heights South


## <font color="red"> Problem 3 - Adding Date Parts</font>

The questions in the next section concern the hour of the day.  The hour can be computed with SQLite's `strftime` function by adding `strftime('%H', column)` to the `SELECT` statement; the day of the week (0 for Sunday through 6 for Saturday) is available through `strftime('%w', column)`.  **Use SQL to add the hour of the day and day of the week to the table.**
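
As a quick check of how `strftime` behaves, here is a small sketch that applies the `'%H'` (hour) and `'%w'` (day of week, 0 = Sunday) format codes to a single literal timestamp. It uses Python's built-in `sqlite3` module with an in-memory database rather than `uber_samples.db`; the timestamp is chosen to match the format of `Pickup_date`.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One literal timestamp in the same text format as Pickup_date.
hour, weekday = con.execute(
    """
    SELECT strftime('%H', ts) AS hour,
           strftime('%w', ts) AS weekday
    FROM (SELECT '2015-05-17 09:49:00.000000' AS ts)
    """
).fetchone()

# Both values come back as text: a zero-padded hour and a weekday digit.
print(hour, weekday)
```

Note that `strftime` returns strings, so the hour is `'09'` rather than the integer 9; wrap it in `CAST(... AS INTEGER)` if a numeric hour is needed.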

In [7]:
%%SQL
SELECT janjune15.Pickup_date, strftime('%H', janjune15.Pickup_date) AS hour, base_lookup.base_name, taxi_zone_lookup.Borough, taxi_zone_lookup.Zone
FROM janjune15
LEFT JOIN base_lookup ON janjune15.Dispatching_base_num == base_lookup.base_code
LEFT JOIN taxi_zone_lookup ON janjune15.locationID == taxi_zone_lookup.LocationID

Unnamed: 0,Pickup_date,hour,base_name,Borough,Zone
0,2015-05-17 09:49:00.000000,9,Weiter,Manhattan,Upper West Side North
1,2015-05-17 09:56:00.000000,9,Weiter,Manhattan,East Village
2,2015-05-17 10:25:00.000000,10,Weiter,Manhattan,Upper West Side North
3,2015-01-18 18:44:30.000000,18,Hinter,Manhattan,West Village
4,2015-01-18 15:43:28.000000,15,Hinter,Manhattan,Union Sq
5,2015-05-17 10:47:00.000000,10,Weiter,Manhattan,Lower East Side
6,2015-01-18 16:57:09.000000,16,Hinter,Manhattan,East Village
7,2015-01-18 11:02:38.000000,11,Hinter,Manhattan,West Chelsea/Hudson Yards
8,2015-01-18 08:01:14.000000,8,Hinter,Manhattan,West Village
9,2015-01-18 10:36:07.000000,10,Hinter,Manhattan,Washington Heights South


## <font color="red"> Problem 4 - Answer some questions</font>

We are interested in answering the following questions.

1. Which of the Uber base stations dispatched the most pick-ups?
2. Which of the Boroughs had the most pick-ups?
3. Is there a difference between the Boroughs in terms of the distribution of pick-ups across the hours of the day?

**Use SQL to answer the questions above.**

#### Question 1

In [13]:
%%SQL
SELECT base_name, cnt
FROM (
        SELECT base_name, count(*) AS cnt
        FROM (SELECT janjune15.Pickup_date, strftime('%H', janjune15.Pickup_date) AS hour, base_lookup.base_name, taxi_zone_lookup.Borough, taxi_zone_lookup.Zone 
              FROM janjune15 LEFT JOIN base_lookup ON janjune15.Dispatching_base_num == base_lookup.base_code LEFT JOIN taxi_zone_lookup ON janjune15.locationID == taxi_zone_lookup.LocationID)
        GROUP BY base_name)
ORDER BY cnt DESC

Unnamed: 0,base_name,cnt
0,Danach-NY,40185
1,Schmecken,24561
2,Weiter,14501
3,Hinter,10754
4,Grun,7982
5,Unter,1817
6,Dreist,181
7,Drinnen,19
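
The nested subqueries are not strictly required for this kind of ranking. As a sketch, the same grouping and ordering can be written as a single `SELECT` with `GROUP BY` and `ORDER BY`. The example below uses Python's built-in `sqlite3` module with a toy in-memory database; the base codes and names are hypothetical, not the real values from `base_lookup`.

```python
import sqlite3

# Toy in-memory database; codes and names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE janjune15 (Dispatching_base_num TEXT);
    CREATE TABLE base_lookup (base_code TEXT, base_name TEXT);
    INSERT INTO janjune15 VALUES ('B1'), ('B2'), ('B2'), ('B2'), ('B1'), ('B3');
    INSERT INTO base_lookup VALUES ('B1', 'Alpha'), ('B2', 'Beta'), ('B3', 'Gamma');
""")

# Join, group, and sort in one statement instead of nesting subqueries.
ranking = con.execute("""
    SELECT base_lookup.base_name, count(*) AS cnt
    FROM janjune15
    LEFT JOIN base_lookup
        ON janjune15.Dispatching_base_num = base_lookup.base_code
    GROUP BY base_lookup.base_name
    ORDER BY cnt DESC
""").fetchall()
print(ranking)
```

The flatter form is easier to read, while the nested form makes it easy to reuse the joined table from Problem 3 unchanged; either is a reasonable choice for this lab.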


#### Question 2

In [14]:
%%SQL
SELECT Borough, cnt
FROM (
        SELECT Borough, count(*) AS cnt
        FROM (SELECT janjune15.Pickup_date, strftime('%H', janjune15.Pickup_date) AS hour, base_lookup.base_name, taxi_zone_lookup.Borough, taxi_zone_lookup.Zone 
              FROM janjune15 LEFT JOIN base_lookup ON janjune15.Dispatching_base_num == base_lookup.base_code LEFT JOIN taxi_zone_lookup ON janjune15.locationID == taxi_zone_lookup.LocationID)
        GROUP BY Borough)
ORDER BY cnt DESC

Unnamed: 0,Borough,cnt
0,Manhattan,72635
1,Brooklyn,16283
2,Queens,9452
3,Bronx,1542
4,Staten Island,44
5,Unknown,44


#### Question 3

In [15]:
%%SQL --df=out
SELECT Borough, hour, count(*) AS cnt
FROM (SELECT janjune15.Pickup_date, strftime('%H', janjune15.Pickup_date) AS hour, base_lookup.base_name, taxi_zone_lookup.Borough, taxi_zone_lookup.Zone 
      FROM janjune15 LEFT JOIN base_lookup ON janjune15.Dispatching_base_num == base_lookup.base_code LEFT JOIN taxi_zone_lookup ON janjune15.locationID == taxi_zone_lookup.LocationID)
GROUP BY Borough, hour

Unnamed: 0,Borough,hour,cnt
0,Bronx,0,73
1,Bronx,1,38
2,Bronx,2,41
3,Bronx,3,24
4,Bronx,4,27
5,Bronx,5,22
6,Bronx,6,33
7,Bronx,7,63
8,Bronx,8,92
9,Bronx,9,53


In [16]:
out

Unnamed: 0,Borough,hour,cnt
0,Bronx,00,73
1,Bronx,01,38
2,Bronx,02,41
3,Bronx,03,24
4,Bronx,04,27
5,Bronx,05,22
6,Bronx,06,33
7,Bronx,07,63
8,Bronx,08,92
9,Bronx,09,53


In [12]:
import seaborn as sns

sns.barplot(x='hour', y='cnt', hue='Borough', data=out)

<matplotlib.axes._subplots.AxesSubplot at 0x1a1f866ef0>
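
Because Manhattan has far more pick-ups than the other boroughs, raw counts make the hourly shapes hard to compare. One possible refinement, sketched below, is to divide each hourly count by its borough's total so that every borough's shares sum to 1. The example uses Python's built-in `sqlite3` module and a hypothetical `hourly` table with made-up counts standing in for the `Borough`, `hour`, `cnt` result above.

```python
import sqlite3

# Toy stand-in for the Borough/hour/cnt result; the counts are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hourly (Borough TEXT, hour TEXT, cnt INTEGER);
    INSERT INTO hourly VALUES
        ('Manhattan', '08', 300), ('Manhattan', '18', 700),
        ('Bronx',     '08',  30), ('Bronx',     '18',  10);
""")

# Join each row to its borough total and convert the count to a share.
shares = con.execute("""
    SELECT h.Borough, h.hour, h.cnt * 1.0 / t.total AS share
    FROM hourly AS h
    JOIN (SELECT Borough, sum(cnt) AS total
          FROM hourly
          GROUP BY Borough) AS t
      ON h.Borough = t.Borough
    ORDER BY h.Borough, h.hour
""").fetchall()

for borough, hour, share in shares:
    print(borough, hour, round(share, 2))
```

Plotting `share` instead of `cnt` would put all boroughs on the same scale, which makes differences in the time-of-day pattern easier to see.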