# Best Locations for Fast Food in Downtown Chicago

This file has three sections. The first contains the introduction, the data, and the methodology. The second contains the Python code used to obtain and process the data. The final section presents the results and conclusion.

## Intro
Suppose one were to open a new fast food restaurant in downtown Chicago, IL. The restaurant's main selling points would be speed and ease of access, and its main target customers would be commuter traffic and the local lunch crowd.

The intended audience for this project is chains or small business owners aiming to open a new location in downtown Chicago that fits these criteria (fast service and high volume).
    
## Data
The data will come from the Foursquare API, queried for venues near downtown Chicago. Mass transit stops will be the main points of interest. The 'L' (the elevated train) will take priority over bus lines because of its higher passenger volume and likely longer trip distances. Two starting points will be used: 50 E Madison and 500 W Madison. The first is closer to the center of downtown and to more businesses; the second is the location of the largest Metra commuter stations in Chicago and is therefore likely to see a large portion of the commuter traffic.

The data will be separated into three categories: bus stops, train stops, and other fast food restaurants. The data will then be sanitized and organized before analysis.


## Code 

In [12]:
import requests
import pandas as pd
import numpy as np
import random
from geopy.geocoders import Nominatim
from IPython.display import Image
from IPython.core.display import HTML
import folium
from sklearn.cluster import KMeans

In [119]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared files)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 10000

In [165]:
address = '500 West Madison St, Chicago, IL'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

radius = 1500

print(latitude, longitude)

41.8837713 -87.6405031


In [166]:
search_query = 'CTA'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

In [167]:
search_query = 'Fast food'
url2 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
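
The two URLs above are built with `str.format`, which leaves the space in `'Fast food'` unencoded. As a minimal sketch of an alternative, the standard library's `urlencode` can assemble the same query string with every parameter URL-encoded. The helper name `build_search_url` and the credential and limit values are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

# Endpoint taken from the notebook; the helper below is a hypothetical convenience.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Return the venue-search URL with every parameter URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and an illustrative limit; substitute your own values.
example_url = build_search_url(
    "MY_ID", "MY_SECRET", 41.8837713, -87.6405031,
    "20180604", "Fast food", 1500, 50,
)
print(example_url)
```

The same helper could be called once with `'CTA'` and once with `'Fast food'`, replacing the two near-identical format strings.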


In [168]:
CTAresults = requests.get(url).json()

In [169]:
FFresults = requests.get(url2).json()

In [170]:
CTAven = CTAresults['response']['venues']
FFven = FFresults['response']['venues']

CTAdf = pd.json_normalize(CTAven)
FFdf = pd.json_normalize(FFven)

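# Split CTA venues into bus stops and everything else (treated as train stops).
# Match "bus" case-insensitively in both filters so the two sets are complementary.
is_bus = CTAdf['name'].str.contains("bus", case=False)
CTAbus = CTAdf[is_bus]
CTAtrain = CTAdf[~is_bus]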

In [171]:
def filter_dataframe(df):

    # keep only the venue name, category, id, and anything associated with location
    filtered_columns = ['name', 'categories'] + [col for col in df.columns if col.startswith('location.')] + ['id']
    # copy so the column assignment below does not write into a view of df
    dataframe_filtered = df.loc[:, filtered_columns].copy()

    # function that extracts the category of the venue
    def get_category_type(row):
        try:
            categories_list = row['categories']
        except KeyError:
            categories_list = row['venue.categories']

        if len(categories_list) == 0:
            return None
        else:
            return categories_list[0]['name']

    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    return dataframe_filtered
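
To make the effect of `filter_dataframe` easier to see, here is a small sketch that applies the same flattening to a single hand-made venue record using plain dictionaries. The record's shape (a `categories` list of dicts and a nested `location` dict) is an assumption based on the columns the function above relies on, and the values are made up.

```python
def flatten_venue(venue):
    """Flatten one venue record into a single-level dict.

    Mirrors filter_dataframe: keep the name, the first category's name
    (or None when there is no category), all location fields, and the id.
    """
    categories = venue.get("categories", [])
    flat = {
        "name": venue.get("name"),
        "categories": categories[0]["name"] if categories else None,
    }
    # Copy every location field (lat, lng, ...) to the top level.
    flat.update(venue.get("location", {}))
    flat["id"] = venue.get("id")
    return flat


# Made-up record for illustration only.
sample_venue = {
    "id": "abc123",
    "name": "Example Bus Stop",
    "categories": [{"name": "Bus Stop"}],
    "location": {"lat": 41.88, "lng": -87.64},
}
flat = flatten_venue(sample_venue)
print(flat)
```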

In [172]:
Busdf = filter_dataframe(CTAbus)
Traindf= filter_dataframe(CTAtrain)
FastFooddf = filter_dataframe(FFdf)

In [173]:
myMap = folium.Map(location = [Traindf['lat'].iloc[0],Traindf['lng'].iloc[0]], zoom_start= 15)
myFoodMap = folium.Map(location = [Traindf['lat'].iloc[0],Traindf['lng'].iloc[0]], zoom_start= 15)
for i in range(0, len(Traindf)):
    folium.CircleMarker([Traindf['lat'].iloc[i],Traindf['lng'].iloc[i]], radius = 5, color = 'blue', fill = True, fill_color='blue', popup=Traindf['name'].iloc[i]).add_to(myMap)

for i in range(0, len(Busdf)):
    folium.CircleMarker([Busdf['lat'].iloc[i],Busdf['lng'].iloc[i]], radius = 3, color = 'red', fill = True, fill_color='red', popup=Busdf['name'].iloc[i]).add_to(myMap)

for i in range(0, len(FastFooddf)):
    folium.CircleMarker([FastFooddf['lat'].iloc[i],FastFooddf['lng'].iloc[i]], radius = 3, color = 'green', fill = True, fill_color='green', popup=FastFooddf['name'].iloc[i]).add_to(myFoodMap)

myMap

In [174]:
def getKMcluster(df):
    kmeans = KMeans(init="k-means++", n_clusters=1, n_init=12)
    X = np.vstack((df['lat'],df['lng'])).T
    kmeans.fit(X)
    kmClusterCenters = kmeans.cluster_centers_
    return kmClusterCenters
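
With `n_clusters=1`, k-means has only one center to place, and the point that minimizes the total squared distance to a set of points is their arithmetic mean. The function above should therefore return, up to floating-point error, the mean latitude and longitude of the input. The sketch below computes that mean directly with the standard library; the coordinates are made up for illustration.

```python
from statistics import fmean


def centroid(points):
    """Return the mean (lat, lng) of a list of coordinate pairs."""
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return fmean(lats), fmean(lngs)


# Made-up stop coordinates, not real data.
stops = [(41.88, -87.64), (41.89, -87.63), (41.87, -87.62)]
center = centroid(stops)
print(center)
```

Using more than one cluster would be the case where `KMeans` does work that a plain mean cannot.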

In [175]:
X1 = getKMcluster(Traindf)
X2 = getKMcluster(Busdf)
X3 = getKMcluster(FastFooddf)

In [176]:
Thismap = folium.Map(location = [X1[0][0],X1[0][1]], zoom_start= 15)
folium.CircleMarker([X1[0][0],X1[0][1]], radius = 7, color = 'red', fill = True, fill_color='red', popup="Trains").add_to(Thismap)
folium.CircleMarker([X2[0][0],X2[0][1]], radius = 7, color = 'blue', fill = True, fill_color='blue', popup="Buses").add_to(Thismap)
folium.CircleMarker([X3[0][0],X3[0][1]], radius = 7, color = 'green', fill = True, fill_color='green', popup="Fast Food").add_to(Thismap)

Thismap

## Results

Based on the locations of trains, buses, and other fast food places, the data points to two options. The first targets Metra commuters, and the prime spot would be Wacker and Washington. This is the area of higher transit density for someone coming from or going to the Metra, and it still reaches locals using the CTA. The second option ignores the Metra, in which case the data points to Washington and State.

## Conclusion

This is a rough example of using Foursquare geodata to find CTA stops and then finding a location near most of them. Several data sets that I was unable to find would make the analysis much more complete. These include the throughput of each CTA stop, the total number of commuters and their transit times, and information on how many people both commute via mass transit and eat fast food. It would also make sense to survey whether an area can support another fast food restaurant or whether the market is already saturated. Finally, the price of each location would be the last data point.

All that being said, the data is useful for people who routinely commute in and out of the downtown area.
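
If per-stop throughput figures did become available, one possible way to use them would be a weighted centroid, where each stop pulls the suggested location toward itself in proportion to its ridership. The sketch below is only an illustration of that idea: the function name and the ridership numbers are hypothetical, and no such data was used in this analysis.

```python
def weighted_centroid(stops):
    """Return the ridership-weighted mean (lat, lng).

    `stops` is a list of (lat, lng, riders) tuples.
    """
    total = sum(riders for _, _, riders in stops)
    if total <= 0:
        raise ValueError("total ridership must be positive")
    lat = sum(lat * riders for lat, _, riders in stops) / total
    lng = sum(lng * riders for _, lng, riders in stops) / total
    return lat, lng


# Hypothetical stops and ridership counts, for illustration only.
sample_stops = [(41.88, -87.64, 3000), (41.89, -87.63, 1000)]
weighted = weighted_centroid(sample_stops)
print(weighted)
```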