# Final Project

Chang Che\
Dec 2019

## Research Question Description

This first section describes the problem and discusses its background.

I am curious: __What venues are missing in the South Bend/Mishawaka area of Indiana__ since Pete Buttigieg, the mayor of South Bend, took office?

Mayor Pete Buttigieg has become a rising star since announcing his run for President of the United States in the 2020 election. He has been the mayor of South Bend since 2012 (seven years ago). Aged only 37, he is the youngest Democratic candidate, and he has also pursued many local policies to improve our communities and neighborhoods.

The city of South Bend is the fourth largest city in Indiana. However, for many years, it has been seen as one of the most decaying cities in the Midwest. __My main interest here is comparing it with Indianapolis, the largest city in Indiana.__

To address this question, I will use venue data for two places, South Bend/Mishawaka and Indianapolis, as of December 8th, 2019.

I will compare the clustering results for the two areas and look for interesting differences in the local venues.

## Data Scraping and Description

In this section, I will use the Foursquare API to obtain venue data for the South Bend/Mishawaka area of Indiana. I will then give a basic description of the data and how it will be used to answer the question.

First, I need the coordinates of the center of the South Bend/Mishawaka area.

In [1]:
# Load related libraries:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import csv

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

%matplotlib inline

### Map of South Bend/Mishawaka Area

When talking about the South Bend area, we usually also include a nearby city: Mishawaka. Here I map South Bend first; Mishawaka is partly visible on the same map, and its coordinates are retrieved afterwards.

In [2]:
# Obtain locations: latitude and longitude
address_sb = 'South Bend, Indiana'
geolocator = Nominatim(user_agent="ny_explorer")
location_sb = geolocator.geocode(address_sb)
latitude_sb = location_sb.latitude
longitude_sb = location_sb.longitude
print('The geographical coordinates of South Bend are {}, {}.'.format(latitude_sb, longitude_sb))

The geographical coordinates of South Bend are 41.6833813, -86.2500066.


A first map of South Bend:

In [3]:
# create map of South Bend using latitude and longitude values
map_sb = folium.Map(location=[latitude_sb, longitude_sb], zoom_start=12)
map_sb

In [4]:
# Obtain locations: latitude and longitude
address_mi = 'Mishawaka, Indiana'
geolocator = Nominatim(user_agent="ny_explorer")
location_mi = geolocator.geocode(address_mi)
latitude_mi = location_mi.latitude
longitude_mi = location_mi.longitude
print('The geographical coordinates of Mishawaka are {}, {}.'.format(latitude_mi, longitude_mi))

The geographical coordinates of Mishawaka are 41.6619927, -86.1586156.
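
The lookups above repeat the same steps, and `geocode` returns `None` when an address cannot be resolved, in which case the `.latitude` access would fail. A minimal sketch of a reusable helper follows; the name `geocode_city` and the injected `geocode` callable are illustrative assumptions, not part of the original notebook.

```python
from typing import Callable, Optional, Tuple


def geocode_city(address: str,
                 geocode: Callable[[str], Optional[object]]) -> Tuple[float, float]:
    """Return (latitude, longitude) for an address.

    `geocode` is assumed to behave like `Nominatim(...).geocode`: it returns
    an object with `.latitude` and `.longitude`, or None if nothing is found.
    """
    location = geocode(address)
    if location is None:
        # Fail with a clear message instead of an AttributeError on None
        raise ValueError("Could not geocode address: {!r}".format(address))
    return location.latitude, location.longitude
```

In the notebook, this could be called as `geocode_city('Mishawaka, Indiana', geolocator.geocode)`, reusing the `geolocator` created above.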


### Map of Indianapolis

In [5]:
# Obtain locations: latitude and longitude
address_polis = 'Indianapolis, Indiana'
geolocator = Nominatim(user_agent="ny_explorer")
location_polis = geolocator.geocode(address_polis)
latitude_polis = location_polis.latitude
longitude_polis = location_polis.longitude
# create map of Indianapolis using latitude and longitude values
map_polis = folium.Map(location=[latitude_polis, longitude_polis], zoom_start=12)
map_polis

The South Bend/Mishawaka area is clearly not a metropolis like New York City or Toronto: it has only a few neighborhoods and far fewer venues on the map. South Bend and Mishawaka are also similar in many respects, so I will treat them as a single area and use the venue data without separating it into neighborhoods. The clustering will be done on the venue information for the whole area.

To do this, the latitude and longitude of the area are set to the midpoint of South Bend and Mishawaka:

In [6]:
name = 'South Bend, Mishawaka'
latitude = (latitude_sb + latitude_mi)/2
longitude = (longitude_sb + longitude_mi)/2
print("For the whole "+ name + " area, the location information is: latitude " + str(round(latitude,3)) + " and longitude " + str(round(longitude,3)) + ".")

For the whole South Bend, Mishawaka area, the location information is: latitude 41.673 and longitude -86.204.
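
As a quick sanity check on the midpoint, the sketch below computes the great-circle distance between the two city centers with the haversine formula, using the coordinates printed by the geocoding cells. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates as printed by the geocoding cells
south_bend = (41.6833813, -86.2500066)
mishawaka = (41.6619927, -86.1586156)

# Simple arithmetic midpoint, as used in the notebook
midpoint = ((south_bend[0] + mishawaka[0]) / 2,
            (south_bend[1] + mishawaka[1]) / 2)
center_distance_km = haversine_km(*south_bend, *mishawaka)

print("Midpoint: {:.3f}, {:.3f}".format(*midpoint))
print("Distance between city centers: {:.1f} km".format(center_distance_km))
```

The two centers are roughly 8 km apart, so each lies about 4 km from the midpoint, comfortably inside the 15 km search radius used for the Foursquare query.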


Next, I locate the venues by using the Foursquare API to explore the area.

In [7]:
# Foursquare API
CLIENT_ID = '5WD4QPP0N04UIK5QSHQGOE4N40MK3BNFZGZICX0YNBXWGNB5' # your Foursquare ID
CLIENT_SECRET = 'K12Q2SP11PV0YTSRIKRDB1YVKUERVF3B3YM0JDADNZXH0LP0' # your Foursquare Secret
VERSION_new = '20191208' # Foursquare API version for the current time point
radius = 15000 # 15 kilometers
LIMIT = 100 # the maximum number of venues the API will return to me is 100
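
Hard-coding API credentials in a notebook makes them easy to leak when the notebook is shared. The following is a minimal sketch of reading them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    `env` defaults to os.environ and can be any mapping (useful for testing).
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "Missing environment variable: {}".format(missing)
        ) from None
```

With the variables set in the shell before starting Jupyter, the cell would become `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.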

### Retrieve South Bend Data Set

In [8]:
# ----------------------------- 2019 Venues of SB ----------------------------- #
# create the API request URL
url_2019 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION_new, # 2019 data using the Version_new
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# make the GET request
results_2019 = requests.get(url_2019).json()["response"]['groups'][0]['items']

venues_list_2019=[]
# return only relevant information for each nearby venue
venues_list_2019.append([(v['venue']['name'], 
    v['venue']['location']['city'],
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_2019])
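
The list comprehension above assumes every venue has a `city` field and at least one category; a venue lacking either would raise `KeyError` or `IndexError`. The sketch below shows one defensive way to build the request URL and flatten the items. The helper names `build_explore_url` and `extract_venue_rows` are hypothetical, and the second sample item is made up to illustrate a venue with missing fields.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL, letting urlencode handle escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "{}?{}".format(EXPLORE_ENDPOINT, urlencode(params))


def extract_venue_rows(items):
    """Flatten explore items into (name, city, lat, lng, category) tuples.

    Missing fields become None instead of raising an exception.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append((
            venue.get("name"),
            location.get("city"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"] if categories else None,
        ))
    return rows


# Sample payload: the first item mirrors a row from the notebook output,
# the second is invented to show a venue without city or category.
sample_items = [
    {"venue": {"name": "Potawatomi Zoo",
               "location": {"city": "South Bend", "lat": 41.6694, "lng": -86.217684},
               "categories": [{"name": "Zoo"}]}},
    {"venue": {"name": "Venue Without City",
               "location": {"lat": 41.67, "lng": -86.2},
               "categories": []}},
]
rows = extract_venue_rows(sample_items)
print(rows)
```

Rows with `None` values can then be dropped or filled in pandas before clustering, depending on how many venues are affected.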

Next, aggregate the 2019 data into a dataframe and check the size of the result for South Bend:

In [9]:
# flatten the nested list; a distinct loop variable avoids shadowing venues_list_2019
nearby_venues_2019 = pd.DataFrame([item for venue_list in venues_list_2019 for item in venue_list])
nearby_venues_2019.columns = ['Venue','City','Venue Latitude','Venue Longitude','Venue Category']
nearby_venues_2019.head()

| | Venue | City | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|
| 0 | The Galley At Tradewinds | Mishawaka | 41.67984 | -86.193735 | Bar |
| 1 | Planet Fitness | Mishawaka | 41.682551 | -86.18865 | Gym / Fitness Center |
| 2 | Farmer's Market of South Bend | South Bend | 41.666112 | -86.234187 | Farmers Market |
| 3 | Compton Family Ice Arena | South Bend | 41.693928 | -86.231229 | Hockey Arena |
| 4 | Potawatomi Zoo | South Bend | 41.6694 | -86.217684 | Zoo |


In [10]:
nearby_venues_2019.shape

(100, 5)

### Retrieve Indianapolis Data Set

In [11]:
# ----------------------------- 2019 Venues of Indianapolis ----------------------------- #
# create the API request URL
url_2019_2 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION_new, # 2019 data using the Version_new
    latitude_polis, 
    longitude_polis, 
    radius, 
    LIMIT)

# make the GET request
results_2019_2 = requests.get(url_2019_2).json()["response"]['groups'][0]['items']

venues_list_2019_polis=[]
# return only relevant information for each nearby venue
venues_list_2019_polis.append([(v['venue']['name'], 
    v['venue']['location']['city'],
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_2019_2])

Next, aggregate the 2019 data into a dataframe and check the size of the result for Indianapolis:

In [12]:
# flatten the nested list; a distinct loop variable avoids shadowing venues_list_2019_polis
nearby_venues_2019_polis = pd.DataFrame([item for venue_list in venues_list_2019_polis for item in venue_list])
nearby_venues_2019_polis.columns = ['Venue','City','Venue Latitude','Venue Longitude','Venue Category']
nearby_venues_2019_polis.head()

| | Venue | City | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|
| 0 | Monument Circle | Indianapolis | 39.768382 | -86.158059 | Plaza |
| 1 | Hilbert Circle Theatre | Indianapolis | 39.768311 | -86.157551 | Concert Hall |
| 2 | Rocket Fizz | Indianapolis | 39.768417 | -86.157645 | Candy Store |
| 3 | Qdoba Mexican Grill | Indianapolis | 39.767481 | -86.15786 | Mexican Restaurant |
| 4 | PEARings Frozen Yogurt & Beyond | Indianapolis | 39.767196 | -86.158345 | Frozen Yogurt Shop |


In [13]:
nearby_venues_2019_polis.shape

(100, 5)
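
With both data sets in hand, one simple way to look for venue types that are missing in South Bend/Mishawaka is to compare the sets of categories. The sketch below uses only the five rows shown in each `head()` output, so its result is illustrative and not a finding; the function name `missing_categories` is an assumption.

```python
from collections import Counter


def missing_categories(reference_categories, target_categories):
    """Categories present in the reference city but absent from the target.

    Returns a dict mapping each missing category to its count in the reference.
    """
    reference_counts = Counter(reference_categories)
    target = set(target_categories)
    return {cat: n for cat, n in reference_counts.items() if cat not in target}


# Categories taken from the first five rows of each data set
south_bend_sample = ["Bar", "Gym / Fitness Center", "Farmers Market",
                     "Hockey Arena", "Zoo"]
indianapolis_sample = ["Plaza", "Concert Hall", "Candy Store",
                       "Mexican Restaurant", "Frozen Yogurt Shop"]

missing_in_south_bend = missing_categories(indianapolis_sample, south_bend_sample)
print(missing_in_south_bend)
```

On the full data, the inputs would be `nearby_venues_2019_polis['Venue Category']` and `nearby_venues_2019['Venue Category']`.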