# CTA Analysis - Visualizations
---

CTA - Ridership - Avg. Weekday Bus Stop Boardings in October 2012
This dataset shows approximate average weekday boardings by bus stop for October 2012.

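* `stop_id`: the CTA bus stop identifier
* `on_street`: the street the stop is on
* `cross_street`: the nearest cross street
* `routes`: the bus routes that serve the stop
* `month_beginning`: the first day of the month the figures cover (October 2012)
* `day_type`: the type of day the averages describe (see "Daytype" below)
* `location`: the stop's latitude and longitude
* `boardings`: the average number of passengers boarding a bus at the stop
* `alightings`: the average number of passengers exiting a bus at the stop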
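The `routes` field packs every route serving a stop into a single value, while the queries further down use a `stop_routes` table with one row per stop and route. A minimal sketch of how such rows could be derived, assuming `routes` is a comma-separated string such as `"126,20"` (that format is an assumption, and `explode_routes` is a hypothetical helper, not part of the original notebook):

```python
def explode_routes(stop_id, routes):
    """Split a comma-separated routes string into (stop_id, route) rows.

    Assumes `routes` looks like "126,20"; empty or missing values yield no rows,
    and blank entries (e.g. from a trailing comma) are skipped.
    """
    if not routes:
        return []
    return [(stop_id, r.strip()) for r in routes.split(",") if r.strip()]


rows = explode_routes(1, "126, 20")
print(rows)  # [(1, '126'), (1, '20')]
```

Rows in this shape can be bulk-loaded with `cursor.executemany("INSERT INTO stop_routes VALUES (?, ?)", rows)`.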


Ridership Readme 
12-Aug-2011
Chicago Transit Authority

**About CTA ridership numbers**
Ridership statistics are provided on a system-wide and bus route/station-level basis. Ridership is primarily counted as boardings, that is, customers boarding a transit vehicle (bus or rail). On the rail system, there is a distinction between station entries and total rides, or boardings. Each dataset's file name and description state which of these measures it uses.

**How people are counted on the 'L'**
On the rail system, a customer is counted as an "entry" each time he or she passes through a turnstile to enter a station. Customers are not counted as "entries" when they make a "cross-platform" transfer from one rail line to another, since they don't pass through a turnstile. Where rail figures are given as "boardings," the number is a statistically valid estimate of the actual number of boardings onto the rail system.

**How people are counted on buses**
Boardings are recorded using the bus farebox and farecard reader. In the uncommon case where the farebox has an operating error and the onboard systems cannot determine which route a trip's boardings belong to, those boardings are tallied as Route 0 in some reports. Route 1001 denotes shuttle buses used for construction or other unforeseen events.
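Because Route 0 collects boardings the farebox could not assign and Route 1001 covers temporary shuttles, route-level rankings may be easier to interpret with both excluded. A small sketch, assuming route identifiers are stored as strings or integers; `PSEUDO_ROUTES` and `is_regular_route` are illustrative names, not part of the original notebook. In SQL the same filter would be `WHERE route NOT IN ('0', '1001')`.

```python
# Route codes from the readme that do not correspond to a regular bus route:
# 0 = boardings the farebox could not assign, 1001 = construction/event shuttles.
PSEUDO_ROUTES = {"0", "1001"}


def is_regular_route(route):
    """Return True unless the route is one of the readme's pseudo-routes."""
    return str(route).strip() not in PSEUDO_ROUTES


routes = ["20", "0", "66", "1001"]
print([r for r in routes if is_regular_route(r)])  # ['20', '66']
```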

**"Daytype"**
Daytype fields in the data are coded as "W" for Weekday, "A" for Saturday and "U" for Sunday/Holidays. New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving, and Christmas Day are treated as Sundays for ridership reporting. All other holidays are reported as the type of day they fall on.
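The Daytype rule can be expressed as a small function. This is a sketch, assuming each holiday counts on its calendar date (the readme does not say how a holiday falling on a weekend is handled); `day_type` and its helpers are hypothetical names.

```python
import calendar
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0 ... Sunday=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Date of the last given weekday in a month."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def sunday_holidays(year):
    """The six holidays the readme reports as Sundays."""
    return {
        date(year, 1, 1),                             # New Year's Day
        last_weekday(year, 5, calendar.MONDAY),       # Memorial Day
        date(year, 7, 4),                             # Independence Day
        nth_weekday(year, 9, calendar.MONDAY, 1),     # Labor Day
        nth_weekday(year, 11, calendar.THURSDAY, 4),  # Thanksgiving
        date(year, 12, 25),                           # Christmas Day
    }


def day_type(d):
    """Return 'W' (weekday), 'A' (Saturday) or 'U' (Sunday/holiday)."""
    if d.weekday() == calendar.SUNDAY or d in sunday_holidays(d.year):
        return "U"
    if d.weekday() == calendar.SATURDAY:
        return "A"
    return "W"


print(day_type(date(2012, 10, 1)))   # W (a Monday)
print(day_type(date(2012, 11, 22)))  # U (Thanksgiving 2012)
```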

In [2]:
import collections
import itertools
import sqlite3
import pandas as pd
import numpy as np

# import visualizations libraries
import folium
import matplotlib.pyplot as plt
import seaborn as sns
from folium import plugins

%matplotlib inline

In [3]:
# helper functions for sqlite3 
def connect(sqlite_file):
    """
    Make connection to an SQLite database file.
    """
    conn = sqlite3.connect(sqlite_file)
    c = conn.cursor()
    return conn, c

def close(conn):
    """ 
    Commit changes and close connection to the database.
    """
    conn.commit()
    conn.close()
    
def table_col_info(cursor, table_name, print_out=False):
    """
    Returns a list of tuples with column information:
    (id, name, type, notnull, default_value, primary_key)
    """
    cursor.execute('PRAGMA TABLE_INFO({})'.format(table_name))
    info = cursor.fetchall()

    if print_out:
        print("\nColumn Info:\nID, Name, Type, NotNull, DefaultVal, PrimaryKey")
        for col in info:
            print(col)
    return info
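A context manager is one way to make sure the connection is always committed and closed, even if a query fails partway through. A minimal sketch with a hypothetical `open_db` helper (not part of the original notebook), built on `contextlib.contextmanager`:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def open_db(sqlite_file):
    """Yield (conn, cursor); commit on success, roll back on error, always close."""
    conn = sqlite3.connect(sqlite_file)
    try:
        yield conn, conn.cursor()
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Usage with an in-memory database and a toy table.
with open_db(":memory:") as (conn, cur):
    cur.execute("CREATE TABLE bus_stops (stop_id INTEGER PRIMARY KEY, on_street TEXT)")
    cur.execute("PRAGMA table_info(bus_stops)")
    columns = [row[1] for row in cur.fetchall()]  # row[1] is the column name

print(columns)  # ['stop_id', 'on_street']
```

Note that `sqlite3.connect` used directly as a `with` statement commits or rolls back but does not close the connection, which is why the helper closes it explicitly.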

In [4]:
# db is my connection and sql_command is my cursor
db, sql_command = connect('cta_bus_ridership.db')


In [5]:
test_map = pd.read_sql_query("select * from bus_stops;", db)
test_map.head()


| Unnamed: 0 | stop_id | on_street | cross_street | longitude | latitude |
|---|---|---|---|---|---|
| 0 | 1 | JACKSON | AUSTIN | -87.774105 | 41.876322 |
| 1 | 2 | JACKSON | MAYFIELD (EXTENDED) | -87.771318 | 41.877067 |
| 2 | 3 | JACKSON | MENARD | -87.76975 | 41.876957 |
| 3 | 4 | JACKSON | 5700 WEST | -87.767451 | 41.877024 |
| 4 | 6 | JACKSON | LOTUS | -87.761446 | 41.876513 |


In [None]:
# Pair each stop's coordinates as a (latitude, longitude) tuple, the order folium expects
test_map['new_col'] = list(zip(test_map.latitude, test_map.longitude))
test_map.dtypes

In [None]:
# Base map centered on downtown Chicago
m = folium.Map([41.8781, -87.6298], zoom_start=11)
m
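The map is centered on a fixed downtown point. Another option is to derive the view from the data: a minimal sketch computing the south-west and north-east corners of a list of `(latitude, longitude)` pairs, in the `[[south, west], [north, east]]` form that folium's `Map.fit_bounds` accepts. `stop_bounds` is a hypothetical helper; the sample coordinates are the first five stops returned by the `bus_stops` query.

```python
def stop_bounds(points):
    """Return [[min_lat, min_lon], [max_lat, max_lon]] for (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


sample = [
    (41.876322, -87.774105),
    (41.877067, -87.771318),
    (41.876957, -87.76975),
    (41.877024, -87.767451),
    (41.876513, -87.761446),
]
print(stop_bounds(sample))
# [[41.876322, -87.774105], [41.877067, -87.761446]]
```

In the notebook this would be applied as `m.fit_bounds(stop_bounds(test_map.new_col.tolist()))`.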

In [None]:
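# PolyLine joins every stop in row order into a single line: a quick check
# that the coordinate pairs plot correctly, not a map of actual routes
points = test_map.new_col.tolist()
folium.PolyLine(points).add_to(m)
m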

In [None]:
num_routes = '''SELECT stop_id
            ,on_street
            ,cross_street
            ,COUNT(*) AS num_routes 
             FROM stop_routes 
             JOIN stops ON stops.id = stop_id
             GROUP BY stop_id 
             ORDER BY num_routes DESC LIMIT 10'''

longest_route = '''SELECT route
			,on_street
			,COUNT(*) AS num_stops
			 FROM stop_routes 
			 JOIN stops ON stops.id = stop_id
			 GROUP BY route 
			 ORDER BY num_stops DESC LIMIT 10'''

most_boardings = '''SELECT stop_id
				,on_street
				,cross_street
				,boardings
				,alightings
				FROM boardings
				JOIN stops ON stops.id = stop_id
				ORDER BY boardings DESC LIMIT 10'''

most_alightings = '''SELECT stop_id
				,on_street
				,cross_street
				,alightings
				,boardings
				FROM boardings
				JOIN stops ON stops.id = stop_id
				ORDER BY alightings DESC LIMIT 10'''

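# Alightings at bus stops whose cross street names a rail line (a rough proxy
# for bus-to-rail transfers). GET_STATION is not built into SQLite; it must be
# registered on the connection with db.create_function before this runs.
rail_transfers = '''SELECT
                    GET_STATION(cross_street) AS line
                    ,SUM(alightings) AS total_alightings
                    FROM stops
                    JOIN boardings ON stop_id = stops.id
                    WHERE cross_street LIKE '%blue%line%'
                    OR cross_street LIKE '%red%line%'
                    OR cross_street LIKE '%brown%line%'
                    OR cross_street LIKE '%purple%line%'
                    OR cross_street LIKE '%green%line%'
                    OR cross_street LIKE '%orange%line%'
                    GROUP BY line
                    ORDER BY total_alightings DESC'''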
					
# Boardings at the same rail-adjacent stops (a rough proxy for rail-to-bus
# transfers); also requires GET_STATION to be registered on the connection.
rail_transfers_from_train = '''SELECT
                    GET_STATION(cross_street) AS line
                    ,SUM(boardings) AS total_boardings
                    FROM stops
                    JOIN boardings ON stop_id = stops.id
                    WHERE cross_street LIKE '%blue%line%'
                    OR cross_street LIKE '%red%line%'
                    OR cross_street LIKE '%brown%line%'
                    OR cross_street LIKE '%purple%line%'
                    OR cross_street LIKE '%green%line%'
                    OR cross_street LIKE '%orange%line%'
                    GROUP BY line
                    ORDER BY total_boardings DESC'''
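`GET_STATION` is not a built-in SQLite function, so the two rail-transfer queries only run after a Python function has been registered under that name with `sqlite3.Connection.create_function`. The notebook does not show the original implementation; the sketch below uses a hypothetical stand-in that extracts the line color from a `cross_street` value such as `"RED LINE STATION"` (that format is assumed), and runs the aggregation on a toy in-memory database with invented rows. For brevity it filters with `GET_STATION(cross_street) IS NOT NULL` instead of the six `LIKE` clauses.

```python
import re
import sqlite3

LINES = ("blue", "red", "brown", "purple", "green", "orange")


def get_station(cross_street):
    """Hypothetical stand-in: return e.g. 'Red Line' if a color precedes 'line'."""
    if cross_street is None:
        return None
    text = cross_street.lower()
    for color in LINES:
        if re.search(color + r".*line", text):  # same pattern as LIKE '%red%line%'
            return color.title() + " Line"
    return None


conn = sqlite3.connect(":memory:")
conn.create_function("GET_STATION", 1, get_station)
conn.executescript("""
    CREATE TABLE stops (id INTEGER PRIMARY KEY, on_street TEXT, cross_street TEXT);
    CREATE TABLE boardings (stop_id INTEGER, boardings REAL, alightings REAL);
    INSERT INTO stops VALUES (1, 'CERMAK', 'RED LINE STATION'),
                             (2, 'CERMAK', 'RED LINE'),
                             (3, 'HALSTED', 'ORANGE LINE STATION');
    INSERT INTO boardings VALUES (1, 100, 40), (2, 50, 30), (3, 80, 90);
""")

# Order by the aggregate alias, not the bare column, so groups rank by their totals.
rows = conn.execute("""
    SELECT GET_STATION(cross_street) AS line, SUM(alightings) AS total_alightings
    FROM stops JOIN boardings ON stop_id = stops.id
    WHERE GET_STATION(cross_street) IS NOT NULL
    GROUP BY line
    ORDER BY total_alightings DESC
""").fetchall()
print(rows)  # [('Orange Line', 90.0), ('Red Line', 70.0)]
```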
	

In [None]:
# # https://stackoverflow.com/questions/36392735/how-to-combine-multiple-rows-into-a-single-row-with-pandas
# t = b.groupby('stop_id')['route'].apply(','.join).reset_index()
# t.head()
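The commented-out pandas snippet collapses one row per route into a single comma-separated string per stop. SQLite can do the same with its built-in `GROUP_CONCAT` aggregate; a minimal sketch on a toy `stop_routes` table (the `stop_id` and `route` column names follow the queries above, the rows are invented for illustration). SQLite leaves the concatenation order unspecified, so the pieces are sorted afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stop_routes (stop_id INTEGER, route TEXT);
    INSERT INTO stop_routes VALUES (1, '126'), (1, '20'), (2, '66');
""")
rows = conn.execute("""
    SELECT stop_id, GROUP_CONCAT(route, ',') AS routes
    FROM stop_routes
    GROUP BY stop_id
    ORDER BY stop_id
""").fetchall()

# Normalize the order of the concatenated routes (string sort).
combined = {stop_id: ",".join(sorted(routes.split(","))) for stop_id, routes in rows}
print(combined)  # {1: '126,20', 2: '66'}
```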
