# Final Project - Super Blocks

In [155]:
#imports
%matplotlib inline

import pandas as pd
import math
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import patsy
import statsmodels.api as sm
from scipy.stats import ttest_ind
import folium
import geocoder
DEFAULT_LOCATION = [32.821932,-117.1509477] #center location to start the map at 
def new_map():
    """Return a map centered on San Diego with the zoom constrained to levels 10-11."""
    return folium.Map(location=DEFAULT_LOCATION, zoom_start=10, max_zoom=11, min_zoom=10)

# six independent map objects with identical settings
map_sd, map_sd2, map_sd3, map_sd4, map_sd5, map_sd6 = (new_map() for _ in range(6))



## Introduction and Background

Most American cities are designed so that a car is almost a necessity. Yet overpopulation, air and noise pollution, and a shortage of open space all call for a more efficient way for people to meet their daily needs. Urban neighborhoods are an intertwined mesh of commerce, transit and housing, and people often live close to the places they visit most, so a car is not strictly necessary when a more efficient alternative exists. However, concerns such as road safety, pollution and missing bicycle lanes often push us into our cars without real need, which only adds to the problem. This calls for a solution that is better for the environment and for our time, and that also brings economic and commercial benefits to society.

Faced with growing pollution, traffic and crowding, Barcelona’s city planners designed, and are now implementing, a system that closes certain urban areas to vehicle traffic and promotes commercial and community use of those areas. The results were astounding: 60% of the city’s streets have been freed up, overall traffic has fallen by 21%, 300km of new cycling lanes have been developed around the city, commercial activity has boomed, and the accessibility of transit stops has improved markedly.

Given the success of this project in Barcelona, we built a similar model for the city of San Diego. Our analysis of foot traffic across the city identifies where a ‘Superblock’ could be placed to improve commercial activity and community engagement and to reduce pollution. We derived an index of how ‘busy’ each location in San Diego is from three datasets, covering local businesses, transit stations and parking meters, which together approximate the foot traffic of an area and are compared using geographic coordinates. From this information we identified locations in San Diego that could be cordoned off to create a healthier and more efficient community, which we illustrate geospatially in this project.

##### References:
###### 1) https://www.theguardian.com/cities/2016/may/17/superblocks-rescue-barcelona-spain-plan-give-streets-back-residents
###### 2) https://www.walkscore.com/professional/research.php
###### 3) https://www.wired.com/2017/04/brilliant-simplicity-new-yorks-new-times-square/
###### 4) http://krqe.com/2017/03/06/city-councilor-wants-wider-sidewalks-to-help-businesses-impacted-by-art/
###### 5) http://www.nyc.gov/html/dot/downloads/pdf/dot-economic-benefits-of-sustainable-streets.pdf

## Data Description

'Business.csv' is an official San Diego government dataset that lists all active registered businesses in San Diego as of April 2017. 

'FY2014 Ridership_Trolley_Sept2013Booking.csv' is an official San Diego government dataset that gives the number of people using the transit system in San Diego as of April 2017. 

'treas_parking_payments_2017_datasd.csv' is an official San Diego government dataset that records all parking meter transactions up to April 2017. 

## Data Cleaning / Pre-processing

In [2]:
#Importing the original file for active business data in SD

fname = "Business.csv"
business_df = pd.read_csv(fname)

In [3]:
#Importing the clean file for train stops data in SD

fname1 = "Stop_Counts.csv"
stops_df = pd.read_csv(fname1)

In [4]:
#Importing the clean file for parking meter transactions data in SD

fname2 = "Parking_Counts.csv"
parking_df = pd.read_csv(fname2)

### Active Businesses 

In [5]:
#Removing unnecessary columns, keeping only places that serve food/drinks, and renaming columns to layman terms

FOOD_AND_DRINK_TYPES = [
    'full-service restaurants',
    'cafeterias',
    'food services & drinking places',
    'limited-service eating places',
    'limited-service restaurants',
    'mobile food services',
    'drinking places (alcoholic beverages)',
    'snack & nonalcoholic beverage bars',
]

business_df = business_df[['doing_bus_as_name','zip','naics_description','lat','lon']]
# isin() replaces the chain of == comparisons joined with |
business_df = business_df.loc[business_df['naics_description'].isin(FOOD_AND_DRINK_TYPES)]
# rename on the filtered result instead of in place, which avoids modifying a slice
business_df = business_df.rename(columns = {'doing_bus_as_name':'Business title','naics_description':'Type of Place'})
business_df = business_df.reset_index(drop=True)

In [6]:
#Adding the 5 columns for the different time brackets

business_df.insert(2,'AM early','Null')
business_df.insert(3,'AM peak','Null')
business_df.insert(4,'Mid-day','Null')
business_df.insert(5,'PM peak','Null')
business_df.insert(6,'PM late','Null')

In [7]:
#generalising the different categories into 3 categories: Only Food, Food & Drinks, Only Drinks

business_df.loc[(business_df['Type of Place'] == 'mobile food services') |
                (business_df['Type of Place'] == 'cafeterias') |
                (business_df['Type of Place'] == 'snack & nonalcoholic beverage bars') |
                (business_df['Type of Place'] == 'limited-service eating places') |
                (business_df['Type of Place'] == 'limited-service restaurants') 
                , 'Type of Place'] = 'Only Food'

business_df.loc[(business_df['Type of Place'] == 'full-service restaurants') 
                , 'Type of Place'] = 'Food & Drinks'

business_df.loc[(business_df['Type of Place'] == 'drinking places (alcoholic beverages)') |
                (business_df['Type of Place'] == 'food services & drinking places') 
                , 'Type of Place'] = 'Only Drinks'
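
The same grouping can also be written as a single lookup table. The sketch below is an illustration on toy rows, not the notebook's own cell; `CATEGORY_BY_NAICS` and `sample` are hypothetical names, and the mapping simply restates the three groups used above.

```python
import pandas as pd

# Hypothetical lookup table restating the three categories used in the notebook
CATEGORY_BY_NAICS = {
    'mobile food services': 'Only Food',
    'cafeterias': 'Only Food',
    'snack & nonalcoholic beverage bars': 'Only Food',
    'limited-service eating places': 'Only Food',
    'limited-service restaurants': 'Only Food',
    'full-service restaurants': 'Food & Drinks',
    'drinking places (alcoholic beverages)': 'Only Drinks',
    'food services & drinking places': 'Only Drinks',
}

# Toy rows standing in for business_df
sample = pd.DataFrame({'Type of Place': ['cafeterias',
                                         'full-service restaurants',
                                         'drinking places (alcoholic beverages)']})

# map() replaces each description with its category in one pass
sample['Type of Place'] = sample['Type of Place'].map(CATEGORY_BY_NAICS)
print(sample)
```

A description missing from the table would become NaN, which makes unmapped types easy to spot.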

In [8]:
#Assigning values to different time brackets by assuming foot traffic according to the type of place 
#(scores are stored as numbers rather than strings so they can be used in arithmetic directly)

time_cols = ['AM early','AM peak','Mid-day','PM peak','PM late']
business_df.loc[(business_df['Type of Place'] == 'Only Food'), time_cols] = [3, 25, 20, 18, 5]
business_df.loc[(business_df['Type of Place'] == 'Food & Drinks'), time_cols] = [7, 28, 25, 35, 25]
business_df.loc[(business_df['Type of Place'] == 'Only Drinks'), time_cols] = [27, 6, 3, 27, 30]

In [9]:
#Assigning a total score to all three categories (sum of the scores of all time brackets)

business_df.loc[(business_df['Type of Place'] == 'Only Food'), 
                'Total Score'] = 71   # 3 + 25 + 20 + 18 + 5
business_df.loc[(business_df['Type of Place'] == 'Food & Drinks'),
                'Total Score'] = 120  # 7 + 28 + 25 + 35 + 25
business_df.loc[(business_df['Type of Place'] == 'Only Drinks'),
                'Total Score'] = 93   # 27 + 6 + 3 + 27 + 30
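
The totals can be derived from the per-bracket scores rather than typed by hand. The following is a minimal sketch on toy rows; `SCORES`, `score_table` and `sample` are hypothetical names, and the numbers are the ones assumed in the cells above.

```python
import pandas as pd

SLOTS = ['AM early', 'AM peak', 'Mid-day', 'PM peak', 'PM late']

# Assumed foot-traffic scores per category, copied from the notebook
SCORES = {
    'Only Food': [3, 25, 20, 18, 5],
    'Food & Drinks': [7, 28, 25, 35, 25],
    'Only Drinks': [27, 6, 3, 27, 30],
}

# One row per category, one column per time bracket
score_table = pd.DataFrame.from_dict(SCORES, orient='index', columns=SLOTS)
score_table['Total Score'] = score_table[SLOTS].sum(axis=1)

# Toy rows standing in for business_df; join looks up each row's category
sample = pd.DataFrame({'Type of Place': ['Only Food', 'Only Drinks']})
scored = sample.join(score_table, on='Type of Place')
print(scored)
```

Computing the total this way keeps it consistent if any bracket score is later adjusted.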

In [10]:
business_df

Unnamed: 0,Business title,zip,AM early,AM peak,Mid-day,PM peak,PM late,Type of Place,lat,lon,Total Score
0,c r e a m,92115-1939,3,25,20,18,5,Only Food,32.767243,-117.096294,71
1,snow cones y raspados,92113-2911,3,25,20,18,5,Only Food,32.697974,-117.096250,71
2,jalapeno taco shop,92104-2047,7,28,25,35,25,Food & Drinks,32.748683,-117.126968,120
3,tacos el campechano inc,91950-1121,3,25,20,18,5,Only Food,32.697974,-117.096250,71
4,up2you cafe llc,92111-5000,3,25,20,18,5,Only Food,32.718370,-117.157817,71
5,awash ethiopian restaurant,92104-1102,7,28,25,35,25,Food & Drinks,32.677079,-117.107167,120
6,gueros taco shop,92102-4019,3,25,20,18,5,Only Food,32.890137,-117.150877,71
7,the fire spot,92111-2315,7,28,25,35,25,Food & Drinks,32.773069,-117.156144,120
8,up2you cafe llc,92111-1545,27,6,3,27,30,Only Drinks,33.016118,-117.075608,93
9,cold beers & cheeseburgers,92101-6910,7,28,25,35,25,Food & Drinks,32.708791,-117.160357,120


In [11]:
business_df.to_csv('Restaurant_Counts.csv')

### Transit Stops Dataset
The following cells show how the original dataset was cleaned to produce the cleaned dataset. They are NOT MEANT TO BE RUN, since we already have the cleaned dataset.

#Code we used to clean the original dataset, which is too big for GitHub 

trips = pd.read_csv('FY2014 Ridership_Trolley_Sept2013Booking.csv')

# Removing unnecessary columns: all we need is stop_id, passengers getting off on the station 
# and time_arrival to get the time split for the number of people getting off at a station at a time_period

trips = trips[['STOP_ID', 'PASSENGERS_OFF', 'TIME_ACTUAL_ARRIVE']]
trips.columns = ['stop_id', 'count', 'time']
trips

##### clean the data: remove any rows with NA values, then remove all rows whose count is 0

# dropna returns a new DataFrame, so the result must be assigned back
trips = trips.dropna(how='any')
trips = trips[trips['count'] != 0]
trips

# Using stop_id we can connect with lat/long. We will be grouping by stop_id to create a table with 
# the following columns: (stop_id, latitude, longitude, count_am_early, count_am_peak, count_midday, 
# count_pm_peak, count_pm_late, count_daily)

#### Getting location data for all the stops using another dataset
stop_locs = pd.read_csv('FY2014 Ridership_Trolley_Sept2013_Stops.csv')
stop_locs = stop_locs[['STOP_ID', 'LAT', 'LON']]
stop_locs

stop_locs.columns = ['stop_id', 'latitude', 'longitude']

stops_df = trips.merge(stop_locs, how='left')
stops_df = stops_df.dropna(how='any')
stops_df

##### grouping by stop_id and aggregating over the count values

stop_counts = stops_df.groupby(['stop_id']).agg('count')

##### remove unnecessary columns

stop_counts = stop_counts['time'] 
stop_counts = stop_counts.to_frame()
stop_counts['stop_id'] = stop_counts.index
stop_counts = stop_counts.merge(stop_locs, how='left')
stop_counts.columns = ['total_count', 'stop_id', 'latitude', 'longitude']
stop_counts

##### We need to now find the number of days in the transactions dataset
##### We will be using this in order to get the count of transactions PER DAY

stops_df['time'] = pd.to_datetime(stops_df['time'])
dates = stops_df['time']
am_early_d = {}
am_peak_d = {}
midday_d = {}
pm_peak_d = {}
pm_late_d = {}

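##### The time ranges used to split the transit ridership data, as implemented by classify below:
am_early = hours 0-6 (12:00AM-6:59AM)
am_peak = hours 7-9 (7:00AM-9:59AM)
midday = hours 10-14 (10:00AM-2:59PM)
pm_peak = hours 15-19 (3:00PM-7:59PM)
pm_late = hours 20-23 (8:00PM-11:59PM)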


#### Classify the time slot based on the times during the day
def classify(x): 
    hour = x.time().hour
    if hour <=6:
        return 'am_early'
    elif hour <=9:
        return 'am_peak'
    elif hour <=14:
        return 'midday'
    elif hour <=19:
        return 'pm_peak'
    else:
        return 'pm_late'
stops_df['time_slot'] = stops_df['time'].apply(classify)
stops_df
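
The hour-based bucketing can also be expressed with `pd.cut`, which bins a whole column at once. This is only a sketch of an alternative, using made-up timestamps; the bin edges mirror the `<=6`, `<=9`, `<=14`, `<=19` thresholds in `classify`.

```python
import pandas as pd

# Hypothetical timestamps standing in for stops_df['time']
times = pd.Series(pd.to_datetime([
    '2013-09-01 05:30', '2013-09-01 06:59', '2013-09-01 07:00',
    '2013-09-01 14:10', '2013-09-01 19:45', '2013-09-01 23:05',
]))

# Right-inclusive bins: (-1, 6], (6, 9], (9, 14], (14, 19], (19, 23]
slots = pd.cut(
    times.dt.hour,
    bins=[-1, 6, 9, 14, 19, 23],
    labels=['am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late'],
)
print(slots.tolist())
```

Because the bins are right-inclusive, an arrival at 06:59 still counts as `am_early`, exactly as in the function above.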

##### Helper: check whether a value_counts result contains a given time slot

def checkSeriesColumn(s, col):
    # True if any label in the index of s contains col
    return bool(s.keys().to_series().str.contains(col).any())

### Setting the values of the transactions for each time period during the day
def set_temporal_counts(p_id):
    v_counts = stops_df.loc[stops_df['stop_id'] == p_id]['time_slot'].value_counts(dropna=False)
    stop_counts.loc[stop_counts['stop_id'] == p_id,'am_early'] = 0 if not checkSeriesColumn(v_counts, 'am_early') else v_counts['am_early']
    stop_counts.loc[stop_counts['stop_id'] == p_id,'am_peak'] =  0 if not checkSeriesColumn(v_counts, 'am_peak') else v_counts['am_peak']
    stop_counts.loc[stop_counts['stop_id'] == p_id,'midday'] =  0 if not checkSeriesColumn(v_counts, 'midday') else v_counts['midday']
    stop_counts.loc[stop_counts['stop_id'] == p_id,'pm_peak'] =  0 if not checkSeriesColumn(v_counts, 'pm_peak') else v_counts['pm_peak']
    stop_counts.loc[stop_counts['stop_id'] == p_id,'pm_late'] = 0 if not checkSeriesColumn(v_counts, 'pm_late') else v_counts['pm_late']

stop_counts['stop_id'].apply(set_temporal_counts)

stop_counts.to_csv('Stop_Counts.csv')
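
The per-stop, per-slot counts built above with a helper and a row-by-row `apply` can also be produced in one step with `pd.crosstab`. The sketch below uses a toy `events` frame (a hypothetical stand-in for `stops_df`) and is not the code that generated `Stop_Counts.csv`.

```python
import pandas as pd

SLOTS = ['am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late']

# Toy events: one row per recorded arrival, already labelled with a time slot
events = pd.DataFrame({
    'stop_id': [75000, 75000, 75000, 75002],
    'time_slot': ['am_early', 'am_early', 'pm_peak', 'midday'],
})

# Count rows per (stop, slot); reindex adds slots that never occur, filled with 0
slot_counts = (
    pd.crosstab(events['stop_id'], events['time_slot'])
      .reindex(columns=SLOTS, fill_value=0)
)
print(slot_counts)
```

The `reindex` step plays the role of the "0 if the slot is missing" check in `set_temporal_counts`.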

In [12]:
#cleaned dataset for the transit spots
stops_df

Unnamed: 0.1,Unnamed: 0,total_count,stop_id,latitude,longitude,am_early,am_peak,midday,pm_peak,pm_late
0,0,214,75000,32.54,-117.03,48.0,36.0,40.0,70.0,20.0
1,1,214,75000,32.54,-117.03,48.0,36.0,40.0,70.0,20.0
2,2,101,75002,32.56,-117.05,21.0,18.0,20.0,32.0,10.0
3,3,98,75003,32.56,-117.05,20.0,16.0,20.0,32.0,10.0
4,4,101,75004,32.57,-117.07,21.0,18.0,20.0,32.0,10.0
5,5,101,75005,32.57,-117.07,23.0,17.0,19.0,32.0,10.0
6,6,101,75006,32.59,-117.08,20.0,19.0,20.0,32.0,10.0
7,7,95,75007,32.59,-117.08,16.0,17.0,20.0,32.0,10.0
8,8,101,75008,32.60,-117.08,20.0,19.0,20.0,31.0,11.0
9,9,99,75009,32.60,-117.08,21.0,16.0,20.0,32.0,10.0


### Parking meters 

###### Importing data and initial cleaning

In [13]:
#importing original dataset
parking = pd.read_csv('treas_parking_payments_2017_datasd.csv')

In [14]:
#removing unnecessary columns: all we need is pole_id and trans_start
parking = parking.drop(['uuid', 'trans_amt', 'pay_method', 'meter_type', 'meter_expire'], axis=1)

In [15]:
#clean the data, removing any na rows (dropna returns a new DataFrame, so assign it back)
parking = parking.dropna(how='any')
parking

Unnamed: 0,pole_id,trans_start
0,SL-216,01/01/17 0:15
1,5-402,01/01/17 1:03
2,G-503,01/01/17 1:23
3,G-503,01/01/17 1:27
4,G-503,01/01/17 1:27
5,G-503,01/01/17 1:27
6,G-503,01/01/17 1:27
7,Mar-41,01/01/17 1:29
8,Mar-41,01/01/17 1:30
9,Apr-65,01/01/17 3:01


###### Connecting with other dataset (with lat/long pairs)

In [16]:
# Using pole_id we can connect with lat/long. We will be grouping by pole_id to create
#     a table with the following columns:
#     (pole_id, latitude, longitude, count_am_early, count_am_peak, 
#        count_midday, count_pm_peak, count_pm_late, count_daily)
# NOTE: we will need to decide whether to use raw numbers or averages of counts per section per day

In [17]:
park_loc = pd.read_csv('treas_parking_meters_loc_datasd.csv')

In [18]:
park_loc = park_loc[['pole', 'longitude', 'latitude']]
park_loc.columns = ['pole_id', 'longitude', 'latitude']
park_loc = park_loc.dropna(how='any')

In [19]:
park_df = parking.merge(park_loc, how='left')
park_df = park_df.dropna(how='any')
park_df

Unnamed: 0,pole_id,trans_start,longitude,latitude
0,SL-216,01/01/17 0:15,-117.162112,32.710495
1,5-402,01/01/17 1:03,-117.160212,32.709683
2,G-503,01/01/17 1:23,-117.159897,32.712561
3,G-503,01/01/17 1:27,-117.159897,32.712561
4,G-503,01/01/17 1:27,-117.159897,32.712561
5,G-503,01/01/17 1:27,-117.159897,32.712561
6,G-503,01/01/17 1:27,-117.159897,32.712561
12,AH-715,01/01/17 5:18,-117.157665,32.719824
13,7-1000,01/01/17 5:51,-117.158444,32.715845
14,7-1000,01/01/17 5:51,-117.158444,32.715845
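
The index in the output above jumps from 6 to 12, so some transactions had no matching location and were dropped. A left merge with `indicator=True` is one way to see which pole IDs failed to match before dropping them. The frames below are toy data; only the column names come from the notebook.

```python
import pandas as pd

# Toy transactions and locations; 'Mar-41' has no entry in the location table
transactions = pd.DataFrame({
    'pole_id': ['SL-216', 'G-503', 'Mar-41'],
    'trans_start': ['01/01/17 0:15', '01/01/17 1:23', '01/01/17 1:29'],
})
locations = pd.DataFrame({
    'pole_id': ['SL-216', 'G-503'],
    'longitude': [-117.162112, -117.159897],
    'latitude': [32.710495, 32.712561],
})

# indicator=True adds a '_merge' column telling where each row was found
merged = transactions.merge(locations, on='pole_id', how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'pole_id'].tolist()
print(unmatched)
```

Listing the unmatched IDs helps tell genuinely missing meters apart from IDs that were altered on export.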


In [20]:
park_counts = park_df.groupby(['pole_id']).agg('count')
park_counts = park_counts['trans_start'] #remove unnecessary columns
park_counts = park_counts.to_frame()
park_counts['pole_id'] = park_counts.index
park_counts = park_counts.merge(park_loc, how='left')
park_counts.columns = ['total_count', 'pole_id', 'longitude', 'latitude']
park_counts

Unnamed: 0,total_count,pole_id,longitude,latitude
0,237,1-1004,-117.163929,32.715904
1,237,1-1006,-117.163930,32.716037
2,228,1-1008,-117.163931,32.716169
3,208,1-1020,-117.161278,32.717890
4,239,1-1310,-117.163951,32.719024
5,254,1-1312,-117.163952,32.719161
6,217,1-1313,-117.163770,32.719298
7,286,1-1314,-117.163953,32.719453
8,224,1-1315,-117.163772,32.719571
9,238,1-1317,-117.163772,32.719707


In [21]:
# we need to now find the number of days in the transactions dataset
# we will be using this in order to get the count of transactions PER DAY
park_df['trans_start'] = pd.to_datetime(park_df['trans_start'])
dates = park_df['trans_start']
am_early_d = {}
am_peak_d = {}
midday_d = {}
pm_peak_d = {}
pm_late_d = {}

def classify(x): 
    hour = x.time().hour
    if hour <=6:
        return 'am_early'
    elif hour <=9:
        return 'am_peak'
    elif hour <=14:
        return 'midday'
    elif hour <=19:
        return 'pm_peak'
    else:
        return 'pm_late'
park_df['time_slot'] = park_df['trans_start'].apply(classify)
park_df

Unnamed: 0,pole_id,trans_start,longitude,latitude,time_slot
0,SL-216,2017-01-01 00:15:00,-117.162112,32.710495,am_early
1,5-402,2017-01-01 01:03:00,-117.160212,32.709683,am_early
2,G-503,2017-01-01 01:23:00,-117.159897,32.712561,am_early
3,G-503,2017-01-01 01:27:00,-117.159897,32.712561,am_early
4,G-503,2017-01-01 01:27:00,-117.159897,32.712561,am_early
5,G-503,2017-01-01 01:27:00,-117.159897,32.712561,am_early
6,G-503,2017-01-01 01:27:00,-117.159897,32.712561,am_early
12,AH-715,2017-01-01 05:18:00,-117.157665,32.719824,am_early
13,7-1000,2017-01-01 05:51:00,-117.158444,32.715845,am_early
14,7-1000,2017-01-01 05:51:00,-117.158444,32.715845,am_early


In [22]:
def checkSeriesColumn(s, col):
    # True if any label in the index of s contains col
    return bool(s.keys().to_series().str.contains(col).any())
def set_temporal_counts(p_id):
    v_counts = park_df.loc[park_df['pole_id'] == p_id]['time_slot'].value_counts(dropna=False)
    park_counts.loc[park_counts['pole_id'] == p_id,'am_early'] = 0 if not checkSeriesColumn(v_counts, 'am_early') else v_counts['am_early']
    park_counts.loc[park_counts['pole_id'] == p_id,'am_peak'] =  0 if not checkSeriesColumn(v_counts, 'am_peak') else v_counts['am_peak']
    park_counts.loc[park_counts['pole_id'] == p_id,'midday'] =  0 if not checkSeriesColumn(v_counts, 'midday') else v_counts['midday']
    park_counts.loc[park_counts['pole_id'] == p_id,'pm_peak'] =  0 if not checkSeriesColumn(v_counts, 'pm_peak') else v_counts['pm_peak']
    park_counts.loc[park_counts['pole_id'] == p_id,'pm_late'] = 0 if not checkSeriesColumn(v_counts, 'pm_late') else v_counts['pm_late']


park_counts['pole_id'].apply(set_temporal_counts)





In [23]:
max_date = dates.max()
min_date = dates.min()
days_elapsed = (max_date - min_date).days + 1 #to round off this number
days_elapsed

335

In [24]:
park_counts.to_csv("Parking_Counts.csv");

In [25]:
# let's now get average counts of parked vehicles
# dividing by total days to give expected counts per day
count_cols = ['total_count', 'am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late']
park_counts[count_cols] = park_counts[count_cols] / days_elapsed

In [26]:
# we then multiply this amount by the mean number of people per vehicle
# as per https://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/highlights_of_the_2001_national_household_travel_survey/html/table_a15.html
ppl_per_vehicle = 1.63
count_cols = ['total_count', 'am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late']
park_counts[count_cols] = park_counts[count_cols] * ppl_per_vehicle
park_counts

Unnamed: 0,total_count,pole_id,longitude,latitude,am_early,am_peak,midday,pm_peak,pm_late
0,1.153164,1-1004,-117.163929,32.715904,0.014597,0.209224,0.544955,0.379522,0.004866
1,1.153164,1-1006,-117.163930,32.716037,0.034060,0.204358,0.574149,0.340597,0.000000
2,1.109373,1-1008,-117.163931,32.716169,0.000000,0.267612,0.540090,0.287075,0.014597
3,1.012060,1-1020,-117.161278,32.717890,0.014597,0.155701,0.394119,0.447642,0.000000
4,1.162896,1-1310,-117.163951,32.719024,0.038925,0.398985,0.617940,0.107045,0.000000
5,1.235881,1-1312,-117.163952,32.719161,0.019463,0.350328,0.783373,0.082716,0.000000
6,1.055851,1-1313,-117.163770,32.719298,0.063254,0.262746,0.656866,0.068119,0.004866
7,1.391582,1-1314,-117.163953,32.719453,0.063254,0.423313,0.797970,0.107045,0.000000
8,1.089910,1-1315,-117.163772,32.719571,0.009731,0.311403,0.622806,0.145970,0.000000
9,1.158030,1-1317,-117.163772,32.719707,0.014597,0.321134,0.739582,0.082716,0.000000
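
As a quick check of the scaling, the first row of the table can be reproduced by hand from values already shown in the notebook: 237 transactions at pole 1-1004, 335 days elapsed, and 1.63 people per vehicle.

```python
days_elapsed = 335        # from cell 23
ppl_per_vehicle = 1.63    # occupancy figure used in cell 26
raw_total = 237           # total_count for pole 1-1004 before scaling

# transactions per day, then people per day
people_per_day = raw_total / days_elapsed * ppl_per_vehicle
print(round(people_per_day, 6))
```

The result agrees with the `total_count` of 1.153164 in the first row, so each scaled value reads as the expected number of people arriving by car at that meter per day.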


In [27]:
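# NOTE: this reloads the raw counts saved in cell 24, so the per-day and
# per-vehicle scaling applied in cells 25 and 26 is not carried forward
park_counts = pd.read_csv("Parking_Counts.csv")
#### Rough calculations for the grid algorithm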
long_max = park_counts['longitude'].max()
long_min = park_counts['longitude'].min()
lat_max = park_counts['latitude'].max()
lat_min = park_counts['latitude'].min()
lat_dif = lat_max - lat_min
long_dif = long_max - long_min
NUMBER_BLOCKS_ROOT = 10 #this means 100 blocks 10x10
lat_gap = lat_dif / NUMBER_BLOCKS_ROOT
long_gap = long_dif / NUMBER_BLOCKS_ROOT

def classify_blocks(s):
    park_counts.loc[park_counts['pole_id'] == s,'row'] =  (park_counts.loc[park_counts['pole_id'] == s,'latitude'] - lat_min) // lat_gap
    park_counts.loc[park_counts['pole_id'] == s,'col'] = (park_counts.loc[park_counts['pole_id'] == s,'longitude'] - long_min) // long_gap

park_counts['pole_id'].apply(classify_blocks)

park_counts

Unnamed: 0.1,Unnamed: 0,total_count,pole_id,longitude,latitude,am_early,am_peak,midday,pm_peak,pm_late,row,col
0,0,237,1-1004,-117.163929,32.715904,3.0,43.0,112.0,78.0,1.0,9.0,9.0
1,1,237,1-1006,-117.163930,32.716037,7.0,42.0,118.0,70.0,0.0,9.0,9.0
2,2,228,1-1008,-117.163931,32.716169,0.0,55.0,111.0,59.0,3.0,9.0,9.0
3,3,208,1-1020,-117.161278,32.717890,3.0,32.0,81.0,92.0,0.0,9.0,9.0
4,4,239,1-1310,-117.163951,32.719024,8.0,82.0,127.0,22.0,0.0,9.0,9.0
5,5,254,1-1312,-117.163952,32.719161,4.0,72.0,161.0,17.0,0.0,9.0,9.0
6,6,217,1-1313,-117.163770,32.719298,13.0,54.0,135.0,14.0,1.0,9.0,9.0
7,7,286,1-1314,-117.163953,32.719453,13.0,87.0,164.0,22.0,0.0,9.0,9.0
8,8,224,1-1315,-117.163772,32.719571,2.0,64.0,128.0,30.0,0.0,9.0,9.0
9,9,238,1-1317,-117.163772,32.719707,3.0,66.0,152.0,17.0,0.0,9.0,9.0


In [28]:
print(long_max)
print(long_min)
print(long_dif)
print(lat_max)
print(lat_min)
print(lat_dif)
park_counts = park_counts.loc[park_counts['longitude'] != -180.0 ] #remove the outlier
long_min = park_counts['longitude'].min()
long_min

-117.0
-180.0
63.0
32.772126
32.1
0.672126


-117.250691
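
The printed bounds show why every meter landed in column 9: a single longitude of -180.0 stretches the grid to 63 degrees wide, so all real points fall into the last cell. The sketch below illustrates the effect on toy longitudes; `grid_index` is a hypothetical helper, and clipping the top edge into the last bin is an added assumption rather than something the notebook does.

```python
import pandas as pd

def grid_index(values, n_bins=10):
    """Assign each value to one of n_bins equal-width bins between min and max."""
    lo, hi = values.min(), values.max()
    gap = (hi - lo) / n_bins
    # floor-divide the offset by the bin width; the maximum value would land
    # on bin n_bins, so clip it back into the last valid bin
    return ((values - lo) // gap).clip(upper=n_bins - 1).astype(int)

# Toy longitudes, including the -180.0 outlier seen in the data
lon = pd.Series([-180.0, -117.25, -117.16, -117.0])

with_outlier = grid_index(lon)
without_outlier = grid_index(lon[lon != -180.0])

print(with_outlier.tolist())
print(without_outlier.tolist())
```

Removing the outlier before computing the minimum and the bin width is what lets the real points spread across the grid.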

In [29]:
park_counts.to_csv("Parking_Counts_grid.csv");

## Data Analysis and Results 

In [115]:
park_counts = pd.read_csv('Parking_Counts.csv')
park_counts = park_counts.loc[park_counts['longitude'] != -180.0 ] #remove the outlier
park_counts.rename(columns={'pole_id':'id'}, inplace= True)
park_counts['type'] = 'parking'
del park_counts['Unnamed: 0']
park_counts 

Unnamed: 0,total_count,id,longitude,latitude,am_early,am_peak,midday,pm_peak,pm_late,type
0,237,1-1004,-117.163929,32.715904,3.0,43.0,112.0,78.0,1.0,parking
1,237,1-1006,-117.163930,32.716037,7.0,42.0,118.0,70.0,0.0,parking
2,228,1-1008,-117.163931,32.716169,0.0,55.0,111.0,59.0,3.0,parking
3,208,1-1020,-117.161278,32.717890,3.0,32.0,81.0,92.0,0.0,parking
4,239,1-1310,-117.163951,32.719024,8.0,82.0,127.0,22.0,0.0,parking
5,254,1-1312,-117.163952,32.719161,4.0,72.0,161.0,17.0,0.0,parking
6,217,1-1313,-117.163770,32.719298,13.0,54.0,135.0,14.0,1.0,parking
7,286,1-1314,-117.163953,32.719453,13.0,87.0,164.0,22.0,0.0,parking
8,224,1-1315,-117.163772,32.719571,2.0,64.0,128.0,30.0,0.0,parking
9,238,1-1317,-117.163772,32.719707,3.0,66.0,152.0,17.0,0.0,parking


In [116]:
restaurants = pd.read_csv('Restaurant_Counts.csv')
restaurants.columns = ['id', 'title', 'zip', 'am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late', 'type', 'latitude', 'longitude', 'total_count']
# scale every restaurant score by 2/3
score_cols = ['total_count', 'am_peak', 'am_early', 'midday', 'pm_peak', 'pm_late']
restaurants[score_cols] = restaurants[score_cols] * 2/3
restaurants['id'] = restaurants['id'].apply(lambda x : ('R-' + str(x) ))
restaurants = restaurants[['id','type','title', 'am_early', 'am_peak', 'midday', 'pm_peak', 'pm_late', 'total_count','latitude', 'longitude']]

restaurants

Unnamed: 0,id,type,title,am_early,am_peak,midday,pm_peak,pm_late,total_count,latitude,longitude
0,R-0,Only Food,c r e a m,2.000000,16.666667,13.333333,12.000000,3.333333,47.333333,32.767243,-117.096294
1,R-1,Only Food,snow cones y raspados,2.000000,16.666667,13.333333,12.000000,3.333333,47.333333,32.697974,-117.096250
2,R-2,Food & Drinks,jalapeno taco shop,4.666667,18.666667,16.666667,23.333333,16.666667,80.000000,32.748683,-117.126968
3,R-3,Only Food,tacos el campechano inc,2.000000,16.666667,13.333333,12.000000,3.333333,47.333333,32.697974,-117.096250
4,R-4,Only Food,up2you cafe llc,2.000000,16.666667,13.333333,12.000000,3.333333,47.333333,32.718370,-117.157817
5,R-5,Food & Drinks,awash ethiopian restaurant,4.666667,18.666667,16.666667,23.333333,16.666667,80.000000,32.677079,-117.107167
6,R-6,Only Food,gueros taco shop,2.000000,16.666667,13.333333,12.000000,3.333333,47.333333,32.890137,-117.150878
7,R-7,Food & Drinks,the fire spot,4.666667,18.666667,16.666667,23.333333,16.666667,80.000000,32.773069,-117.156144
8,R-8,Only Drinks,up2you cafe llc,18.000000,4.000000,2.000000,18.000000,20.000000,62.000000,33.016118,-117.075608
9,R-9,Food & Drinks,cold beers & cheeseburgers,4.666667,18.666667,16.666667,23.333333,16.666667,80.000000,32.708791,-117.160357


In [117]:
transit = pd.read_csv('Stop_Counts.csv')
transit.rename(columns={'stop_id':'id'}, inplace= True)
transit['type'] = 'transit_stop'
del transit['Unnamed: 0'] 
transit

Unnamed: 0,total_count,id,latitude,longitude,am_early,am_peak,midday,pm_peak,pm_late,type
0,214,75000,32.54,-117.03,48.0,36.0,40.0,70.0,20.0,transit_stop
1,214,75000,32.54,-117.03,48.0,36.0,40.0,70.0,20.0,transit_stop
2,101,75002,32.56,-117.05,21.0,18.0,20.0,32.0,10.0,transit_stop
3,98,75003,32.56,-117.05,20.0,16.0,20.0,32.0,10.0,transit_stop
4,101,75004,32.57,-117.07,21.0,18.0,20.0,32.0,10.0,transit_stop
5,101,75005,32.57,-117.07,23.0,17.0,19.0,32.0,10.0,transit_stop
6,101,75006,32.59,-117.08,20.0,19.0,20.0,32.0,10.0,transit_stop
7,95,75007,32.59,-117.08,16.0,17.0,20.0,32.0,10.0,transit_stop
8,101,75008,32.60,-117.08,20.0,19.0,20.0,31.0,11.0,transit_stop
9,99,75009,32.60,-117.08,21.0,16.0,20.0,32.0,10.0,transit_stop


In [118]:
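# pd.concat replaces DataFrame.append, which was removed in pandas 2.0;
# sort=True orders the combined columns alphabetically
location_counts = pd.concat([restaurants, park_counts, transit], sort=True)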
# Setting a new index
#location_counts.index = idx # new ad hoc index
# location_counts.index = range(len(location_counts)) # set with list
# location_counts = location_counts.reset_index() # replace old w new
# location_counts.rename(columns={'index':'idx'}, inplace= True)
# location_counts.rename(columns={'id':'type_id'}, inplace= True)
# del location_counts['Unnamed: 0']
location_counts

Unnamed: 0,am_early,am_peak,id,latitude,longitude,midday,pm_late,pm_peak,title,total_count,type
0,2.000000,16.666667,R-0,32.767243,-117.096294,13.333333,3.333333,12.000000,c r e a m,47.333333,Only Food
1,2.000000,16.666667,R-1,32.697974,-117.096250,13.333333,3.333333,12.000000,snow cones y raspados,47.333333,Only Food
2,4.666667,18.666667,R-2,32.748683,-117.126968,16.666667,16.666667,23.333333,jalapeno taco shop,80.000000,Food & Drinks
3,2.000000,16.666667,R-3,32.697974,-117.096250,13.333333,3.333333,12.000000,tacos el campechano inc,47.333333,Only Food
4,2.000000,16.666667,R-4,32.718370,-117.157817,13.333333,3.333333,12.000000,up2you cafe llc,47.333333,Only Food
5,4.666667,18.666667,R-5,32.677079,-117.107167,16.666667,16.666667,23.333333,awash ethiopian restaurant,80.000000,Food & Drinks
6,2.000000,16.666667,R-6,32.890137,-117.150878,13.333333,3.333333,12.000000,gueros taco shop,47.333333,Only Food
7,4.666667,18.666667,R-7,32.773069,-117.156144,16.666667,16.666667,23.333333,the fire spot,80.000000,Food & Drinks
8,18.000000,4.000000,R-8,33.016118,-117.075608,2.000000,20.000000,18.000000,up2you cafe llc,62.000000,Only Drinks
9,4.666667,18.666667,R-9,32.708791,-117.160357,16.666667,16.666667,23.333333,cold beers & cheeseburgers,80.000000,Food & Drinks
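
Stacking the three sources leaves `title` empty for rows that are not restaurants, and each frame keeps its own 0-based index unless a fresh one is requested. A toy sketch with `pd.concat` shows both effects; the frames here are hypothetical one-row stand-ins for the real tables.

```python
import pandas as pd

# One-row stand-ins for the three sources
restaurants = pd.DataFrame({'id': ['R-0'], 'type': ['Only Food'],
                            'title': ['c r e a m'], 'total_count': [47.33]})
parking = pd.DataFrame({'id': ['1-1004'], 'type': ['parking'], 'total_count': [237]})
transit = pd.DataFrame({'id': [75000], 'type': ['transit_stop'], 'total_count': [214]})

# ignore_index=True gives a single unique 0..n-1 index across all sources
combined = pd.concat([restaurants, parking, transit], ignore_index=True, sort=False)
print(combined)
```

A unique index matters for any later step that selects or assigns rows by label.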


In [119]:
long_max = location_counts['longitude'].max()
long_min = location_counts['longitude'].min()
lat_max = location_counts['latitude'].max()
lat_min = location_counts['latitude'].min()
lat_dif = lat_max - lat_min
long_dif = long_max - long_min
NUMBER_BLOCKS_ROOT = 10 #this means 100 blocks 10x10
lat_gap = lat_dif / NUMBER_BLOCKS_ROOT
long_gap = long_dif / NUMBER_BLOCKS_ROOT

# Floor-divide each coordinate's offset from the minimum by the cell size to get
# its grid row/col. The point at lat_max (or long_max) lands on index
# NUMBER_BLOCKS_ROOT, one past the last grid cell.
location_counts['row'] = (location_counts['latitude'] - lat_min) // lat_gap
location_counts['col'] = (location_counts['longitude'] - long_min) // long_gap

location_counts

Unnamed: 0,am_early,am_peak,id,latitude,longitude,midday,pm_late,pm_peak,title,total_count,type,row,col
0,2.000000,16.666667,R-0,32.767243,-117.096294,13.333333,3.333333,12.000000,c r e a m,47.333333,Only Food,3.0,7.0
1,2.000000,16.666667,R-1,32.697974,-117.096250,13.333333,3.333333,12.000000,snow cones y raspados,47.333333,Only Food,2.0,7.0
2,4.666667,18.666667,R-2,32.748683,-117.126968,16.666667,16.666667,23.333333,jalapeno taco shop,80.000000,Food & Drinks,2.0,7.0
3,2.000000,16.666667,R-3,32.697974,-117.096250,13.333333,3.333333,12.000000,tacos el campechano inc,47.333333,Only Food,2.0,7.0
4,2.000000,16.666667,R-4,32.718370,-117.157817,13.333333,3.333333,12.000000,up2you cafe llc,47.333333,Only Food,2.0,7.0
5,4.666667,18.666667,R-5,32.677079,-117.107167,16.666667,16.666667,23.333333,awash ethiopian restaurant,80.000000,Food & Drinks,2.0,7.0
6,2.000000,16.666667,R-6,32.890137,-117.150878,13.333333,3.333333,12.000000,gueros taco shop,47.333333,Only Food,3.0,7.0
7,4.666667,18.666667,R-7,32.773069,-117.156144,16.666667,16.666667,23.333333,the fire spot,80.000000,Food & Drinks,3.0,7.0
8,18.000000,4.000000,R-8,33.016118,-117.075608,2.000000,20.000000,18.000000,up2you cafe llc,62.000000,Only Drinks,4.0,8.0
9,4.666667,18.666667,R-9,32.708791,-117.160357,16.666667,16.666667,23.333333,cold beers & cheeseburgers,80.000000,Food & Drinks,2.0,7.0


In [120]:
print(location_counts['row'].min())
print(location_counts['row'].max())
print(location_counts['row'].mean())
print(location_counts['row'].std())
print(location_counts['col'].min())
print(location_counts['col'].max())
print(location_counts['col'].mean())
print(location_counts['col'].std())

-0.0
10.0
2.321217127908578
0.5463381063385144
-0.0
10.0
7.050530083987333
0.33117124780078416
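
The maximum of 10.0 printed above shows that the point sitting exactly at `lat_max` (or `long_max`) is floor-divided into an eleventh bin, index 10, while the grid built in the next cell only has indices 0 to 9. Below is a minimal sketch of one way to keep every point on the grid, assuming it is acceptable to fold that boundary point into the last cell. `cell_index` is a hypothetical helper that is not part of the notebook, and the upper bound of 34.3 is an illustrative number rather than the real `lat_max`.

```python
import math

def cell_index(value, lower, upper, n_cells):
    """Return the 0-based cell index of value within [lower, upper].

    The span is split into n_cells equal cells; a value equal to upper
    is folded into the last cell instead of getting index n_cells.
    """
    gap = (upper - lower) / n_cells
    index = math.floor((value - lower) / gap)
    # Clamp so boundary values (and tiny float overshoots) stay on the grid
    return min(max(index, 0), n_cells - 1)

# Illustrative latitude range split into 10 cells
print(cell_index(32.1, 32.1, 34.3, 10))  # lower edge → 0
print(cell_index(34.3, 32.1, 34.3, 10))  # upper edge → 9
```

The same function could be applied to the `latitude` and `longitude` columns in turn; the clamp is the only difference from the floor division used above.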


In [121]:
grid = []
for i in range(NUMBER_BLOCKS_ROOT):
    for j in range(NUMBER_BLOCKS_ROOT):
        grid.append([i, j, (lat_min+ i*lat_gap), (long_min + j*long_gap)])    
grid = pd.DataFrame(grid)
grid.columns = ['row','col','lat','lon']
grid
#grid.loc[(grid['row']==142) & (grid['col']==378)]

Unnamed: 0,row,col,lat,lon
0,0,0,32.100000,-118.426461
1,0,1,32.100000,-118.259176
2,0,2,32.100000,-118.091892
3,0,3,32.100000,-117.924607
4,0,4,32.100000,-117.757323
5,0,5,32.100000,-117.590038
6,0,6,32.100000,-117.422753
7,0,7,32.100000,-117.255469
8,0,8,32.100000,-117.088184
9,0,9,32.100000,-116.920899


In [122]:
location_counts

Unnamed: 0,am_early,am_peak,id,latitude,longitude,midday,pm_late,pm_peak,title,total_count,type,row,col
0,2.000000,16.666667,R-0,32.767243,-117.096294,13.333333,3.333333,12.000000,c r e a m,47.333333,Only Food,3.0,7.0
1,2.000000,16.666667,R-1,32.697974,-117.096250,13.333333,3.333333,12.000000,snow cones y raspados,47.333333,Only Food,2.0,7.0
2,4.666667,18.666667,R-2,32.748683,-117.126968,16.666667,16.666667,23.333333,jalapeno taco shop,80.000000,Food & Drinks,2.0,7.0
3,2.000000,16.666667,R-3,32.697974,-117.096250,13.333333,3.333333,12.000000,tacos el campechano inc,47.333333,Only Food,2.0,7.0
4,2.000000,16.666667,R-4,32.718370,-117.157817,13.333333,3.333333,12.000000,up2you cafe llc,47.333333,Only Food,2.0,7.0
5,4.666667,18.666667,R-5,32.677079,-117.107167,16.666667,16.666667,23.333333,awash ethiopian restaurant,80.000000,Food & Drinks,2.0,7.0
6,2.000000,16.666667,R-6,32.890137,-117.150878,13.333333,3.333333,12.000000,gueros taco shop,47.333333,Only Food,3.0,7.0
7,4.666667,18.666667,R-7,32.773069,-117.156144,16.666667,16.666667,23.333333,the fire spot,80.000000,Food & Drinks,3.0,7.0
8,18.000000,4.000000,R-8,33.016118,-117.075608,2.000000,20.000000,18.000000,up2you cafe llc,62.000000,Only Drinks,4.0,8.0
9,4.666667,18.666667,R-9,32.708791,-117.160357,16.666667,16.666667,23.333333,cold beers & cheeseburgers,80.000000,Food & Drinks,2.0,7.0


In [123]:
location_counts.to_csv('Location_grid.csv')

In [151]:
total_loc_counts = location_counts[['total_count','row','col']]
amearly_loc_counts = location_counts[['am_early','row','col']]
ampeak_loc_counts = location_counts[['am_peak','row','col']]
midday_loc_counts = location_counts[['midday','row','col']]
pmlate_loc_counts = location_counts[['pm_late','row','col']]
pmpeak_loc_counts = location_counts[['pm_peak','row','col']]
total_loc_counts

Unnamed: 0,total_count,row,col
0,47.333333,3.0,7.0
1,47.333333,2.0,7.0
2,80.000000,2.0,7.0
3,47.333333,2.0,7.0
4,47.333333,2.0,7.0
5,80.000000,2.0,7.0
6,47.333333,3.0,7.0
7,80.000000,3.0,7.0
8,62.000000,4.0,8.0
9,80.000000,2.0,7.0


In [152]:
weighted_grid = total_loc_counts.groupby(['row','col']).sum().sort_values(by='total_count',ascending=False)
weighted_grid2 = amearly_loc_counts.groupby(['row','col']).sum().sort_values(by='am_early',ascending=False)
weighted_grid3 = ampeak_loc_counts.groupby(['row','col']).sum().sort_values(by='am_peak',ascending=False)
weighted_grid4 = midday_loc_counts.groupby(['row','col']).sum().sort_values(by='midday',ascending=False)
weighted_grid5 = pmlate_loc_counts.groupby(['row','col']).sum().sort_values(by='pm_late',ascending=False)
weighted_grid6 = pmpeak_loc_counts.groupby(['row','col']).sum().sort_values(by='pm_peak',ascending=False)
weighted_grid

Unnamed: 0_level_0,Unnamed: 1_level_0,am_early
row,col,Unnamed: 2_level_1
2.0,7.0,14550.666667
3.0,7.0,9146.333333
3.0,8.0,1298.333333
2.0,8.0,1282.0
3.0,6.0,792.0
4.0,8.0,682.666667
4.0,7.0,252.0
4.0,6.0,28.666667
4.0,9.0,24.666667
7.0,1.0,18.0


In [153]:
# Left-join the per-cell sums onto the full grid so every cell is kept;
# cells with no locations get NaN in the count column.
def merge_onto_grid(weighted):
    return pd.merge(grid, weighted.reset_index(), on=['row', 'col'], how='left')

big_grid = merge_onto_grid(weighted_grid)
big_grid2 = merge_onto_grid(weighted_grid2)
big_grid3 = merge_onto_grid(weighted_grid3)
big_grid4 = merge_onto_grid(weighted_grid4)
big_grid5 = merge_onto_grid(weighted_grid5)
big_grid6 = merge_onto_grid(weighted_grid6)

# fillna returns a new frame for display; the big_grid* frames keep their NaNs
big_grid6.fillna(0.0)

Unnamed: 0.1,Unnamed: 0,row,col,lat,lon,total_count_x,total_count_y
0,0,0,0,32.100000,-118.426461,0.000000,0.000000
1,1,0,1,32.100000,-118.259176,0.000000,0.000000
2,2,0,2,32.100000,-118.091892,0.000000,0.000000
3,3,0,3,32.100000,-117.924607,0.000000,0.000000
4,4,0,4,32.100000,-117.757323,0.000000,0.000000
5,5,0,5,32.100000,-117.590038,0.000000,0.000000
6,6,0,6,32.100000,-117.422753,0.000000,0.000000
7,7,0,7,32.100000,-117.255469,0.000000,0.000000
8,8,0,8,32.100000,-117.088184,254.000000,254.000000
9,9,0,9,32.100000,-116.920899,0.000000,0.000000


In [127]:
big_grid['total_count'].max()

772543.66666666686

In [128]:
big_grid2['am_early'].max()

14550.666666666686

In [129]:
big_grid3['am_peak'].max()

135688.99999999983

In [130]:
big_grid4['midday'].max()

374653.33333333326

In [131]:
big_grid5['pm_late'].max()

17042.999999999942

In [132]:
big_grid6['pm_peak'].max()

232405.33333333337

In [133]:
big_grid

Unnamed: 0,row,col,lat,lon,total_count
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,254.000000
9,0,9,32.100000,-116.920899,


In [134]:
big_grid2

Unnamed: 0,row,col,lat,lon,am_early
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,0.000000
9,0,9,32.100000,-116.920899,


In [135]:
big_grid3

Unnamed: 0,row,col,lat,lon,am_peak
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,57.000000
9,0,9,32.100000,-116.920899,


In [136]:
big_grid4

Unnamed: 0,row,col,lat,lon,midday
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,129.000000
9,0,9,32.100000,-116.920899,


In [137]:
big_grid5

Unnamed: 0,row,col,lat,lon,pm_late
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,0.000000
9,0,9,32.100000,-116.920899,


In [138]:
big_grid6

Unnamed: 0,row,col,lat,lon,pm_peak
0,0,0,32.100000,-118.426461,
1,0,1,32.100000,-118.259176,
2,0,2,32.100000,-118.091892,
3,0,3,32.100000,-117.924607,
4,0,4,32.100000,-117.757323,
5,0,5,32.100000,-117.590038,
6,0,6,32.100000,-117.422753,
7,0,7,32.100000,-117.255469,
8,0,8,32.100000,-117.088184,68.000000
9,0,9,32.100000,-116.920899,


In [139]:
big_grid.to_csv('Heat_Map1.csv')
big_grid2.to_csv('Heat_Map_am_early.csv')
big_grid3.to_csv('Heat_Map_am_peak.csv')
big_grid4.to_csv('Heat_Map_midday.csv')
big_grid5.to_csv('Heat_Map_pm_late.csv')
big_grid6.to_csv('Heat_Map_pm_peak.csv')


## Data Visualization

In [198]:
grid = pd.read_csv('Heat_Map.csv')
grid2 = pd.read_csv('AM_Early_Heat_Map.csv')
grid3 = pd.read_csv('AM_Peak_Heat_Map.csv')
grid4 = pd.read_csv('Midday_Heat_Map.csv')
grid5 = pd.read_csv('PM_Late_Heat_Map.csv')
grid6 = pd.read_csv('PM_Peak_Heat_Map.csv')

In [199]:
#print values as strings or empty str if they are null
def null_print(val):
    if(val is not None):
        return str(val)
    else:
        return ''

#get_addresses
def get_address(loc):
    geo = geocoder.google(loc, method='reverse')
    if(geo is not None):
        address = null_print(geo.housenumber) + ' ' + null_print(geo.street) + ' ' + null_print(geo.city) + ' ' + null_print(geo.postal)
    else: 
        address = "Unknown"
    return address
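
Each of the six map cells that follow reverse-geocodes the same grid points, so the same coordinates are looked up repeatedly. Below is a minimal sketch of memoizing those lookups, assuming addresses do not change between maps. `cached_lookup` is a hypothetical wrapper, and `fake_lookup` is a stand-in for `get_address` so the example runs without network access.

```python
def cached_lookup(lookup):
    """Wrap a lookup(loc) function so each (lat, lon) pair is resolved once."""
    cache = {}

    def wrapper(loc):
        key = (loc[0], loc[1])  # lists are unhashable, so key on a tuple
        if key not in cache:
            cache[key] = lookup(loc)
        return cache[key]

    return wrapper

calls = []

def fake_lookup(loc):
    # Stand-in for get_address: records the call instead of hitting the network
    calls.append(tuple(loc))
    return 'address near {:.3f}, {:.3f}'.format(loc[0], loc[1])

get_address_cached = cached_lookup(fake_lookup)
first = get_address_cached([32.1, -118.426461])
second = get_address_cached([32.1, -118.426461])
print(first == second, len(calls))  # → True 1
```

Wrapping the real function would look like `get_address = cached_lookup(get_address)`, after which the map loops would not need to change.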


In [200]:
weight_feature = 'total_count'#change this for different timeslots


#stats to use for classification
mean = grid[weight_feature].astype(float).mean()
std = grid[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid['lat'] = grid['lat'].astype(float)
grid['lon'] = grid['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd)

map_sd.save("total_count.html")
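
In `weight_to_color`, every comparison against a NaN weight is false, so a grid cell with a missing value falls through to the final `else` and is drawn in the maximum color. Below is a sketch of a variant that takes the statistics as arguments and gives missing values their own color. The name `classify_weight` and the gray `COLOR_MISSING` are assumptions for illustration, not part of the notebook.

```python
import math

COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'
COLOR_MISSING = '#cccccc'  # hypothetical: neutral gray for cells with no data

def classify_weight(w, mean, std):
    """Map a weight to a color by its distance from the mean in std units."""
    if w is None or math.isnan(w):
        return COLOR_MISSING
    if w < mean - std:
        return COLOR_LOW
    if w < mean:
        return COLOR_MILD
    if w < mean + std:
        return COLOR_MEDIUM
    if w < mean + 2 * std:
        return COLOR_HIGH
    return COLOR_MAX

print(classify_weight(float('nan'), 100.0, 50.0))  # gray instead of purple
print(classify_weight(260.0, 100.0, 50.0))         # purple: above mean + 2*std
```

Because `mean` and `std` are passed in rather than read from globals, a single definition like this could serve all six time slots.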


In [201]:
weight_feature = 'am_early'#change this for different timeslots


#stats to use for classification
mean = grid2[weight_feature].astype(float).mean()
std = grid2[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid2['lat'] = grid2['lat'].astype(float)
grid2['lon'] = grid2['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid2.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd2)

map_sd2.save("am_early.html")


In [202]:
weight_feature = 'am_peak'#change this for different timeslots


#stats to use for classification
mean = grid3[weight_feature].astype(float).mean()
std = grid3[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid3['lat'] = grid3['lat'].astype(float)
grid3['lon'] = grid3['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid3.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd3)

map_sd3.save("am_peak.html")


In [203]:
weight_feature = 'midday'#change this for different timeslots


#stats to use for classification
mean = grid4[weight_feature].astype(float).mean()
std = grid4[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid4['lat'] = grid4['lat'].astype(float)
grid4['lon'] = grid4['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid4.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd4)

map_sd4.save("midday.html")


In [204]:
weight_feature = 'pm_late'#change this for different timeslots


#stats to use for classification
mean = grid5[weight_feature].astype(float).mean()
std = grid5[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid5['lat'] = grid5['lat'].astype(float)
grid5['lon'] = grid5['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid5.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd5)

map_sd5.save("pm_late.html")


In [205]:
weight_feature = 'pm_peak'#change this for different timeslots


#stats to use for classification
mean = grid6[weight_feature].astype(float).mean()
std = grid6[weight_feature].astype(float).std()

#convert to floats for folium to read these
grid6['lat'] = grid6['lat'].astype(float)
grid6['lon'] = grid6['lon'].astype(float)

#color constants for viz
COLOR_LOW = '#50f442'
COLOR_MILD = '#fff375'
COLOR_MEDIUM = '#ffa154'
COLOR_HIGH = '#ff5454'
COLOR_MAX = '#a43dff'

#depending on the weight we assign graded colors
def weight_to_color(w):
    if(w < mean - std):
        return COLOR_LOW
    elif (w < mean):
        return COLOR_MILD
    elif (w < mean + std):
        return COLOR_MEDIUM
    elif (w < (mean + 2 * std)):
        return COLOR_HIGH
    else:
        return COLOR_MAX


for index, row in grid6.iterrows():
    # plain Python floats for folium
    lat = float(row['lat'])
    lon = float(row['lon'])
    loc = [lat, lon]
    if not math.isnan(lon) and not math.isnan(lat):
        weight = float(row[weight_feature])
        color = weight_to_color(weight)
        address = get_address(loc)
        folium.RegularPolygonMarker(loc, popup=address, fill_color=color, rotation=-45, number_of_sides=4, radius=10, fill_opacity=0.66).add_to(map_sd6)

map_sd6.save("pm_peak.html")


In [217]:
"""
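Key:

Purple = Very High (at least mean + 2 std)
Red = High (between mean + std and mean + 2 std)
Orange = Medium (between mean and mean + std)
Yellow = Low (between mean - std and mean)
Green = Very Low (below mean - std)

Note: The visualisations can also be viewed by opening the corresponding HTML files, which are uploaded as well.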

"""

map_sd #Map for Total counts

In [212]:
map_sd2 #Map for am early

In [213]:
map_sd3 #Map for am peak

In [214]:
map_sd4 #Map for midday

In [215]:
map_sd5 #Map for pm late

In [216]:
map_sd6 #Map for pm peak

## Conclusions/Discussion

The six geo-visualizations obtained at the end of our project, drawn on the grid we created (which in practice stands in for the roads of San Diego), show how busy different areas of San Diego are based on the foot traffic they receive. One map covers the total count, and the other five cover separate time slots of the day (early morning, morning peak, midday, evening peak and late evening). The different colored blocks make it evident that certain places receive significantly more foot traffic than others: purple blocks mark the highest-traffic areas, while yellow blocks mark low-traffic areas.

It is interesting to note that certain areas are busy throughout the day, such as parts of Coronado and the area near the international airport. Also, in the earliest morning and latest evening hours, significantly higher foot traffic is seen in popular areas of San Diego (such as Coronado, Pacific Beach and parts of La Jolla), which shows up as a wider spread of purple blocks over these areas.