# Capstone Project Report – The Battle of Neighbourhoods in Chicago

## Introduction/Business Problem

A sushi franchise owner is looking for the best locations to open new branches and introduce the finest sushi to the residents of Chicago. However, he is new to the city and cannot decide where the business should put down roots. As the saying goes, the three rules for starting a business are 1) location, 2) location, and 3) location! He therefore turns to data scientists and engineers for help with a decision that could determine whether the expansion succeeds.

## Data

- Retrieve postal and geolocation data from Wikipedia to locate the neighbourhoods in Chicago.
- Use the Foursquare API to explore venues in the neighbourhoods for analysis.
- Use the Foursquare API to search for sushi restaurants in the neighbourhoods.
- Use the Foursquare API to extract the number of users who have liked each restaurant.
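
As a rough illustration of the Foursquare steps listed above, the sketch below only assembles the request URL for a venue search around one neighbourhood; it does not send the request. It assumes the legacy Foursquare v2 `venues/explore` endpoint with `client_id`/`client_secret` authentication. The helper name `build_explore_url`, the credentials, the version date, the radius and the limit are all placeholders that do not come from this report.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(latitude, longitude, client_id, client_secret,
                      query=None, radius=1000, limit=100, version='20180605'):
    """Assemble (but do not send) a venue search request around one point."""
    params = {
        'client_id': client_id,          # placeholder credential
        'client_secret': client_secret,  # placeholder credential
        'v': version,                    # API version date (assumed value)
        'll': '{},{}'.format(latitude, longitude),
        'radius': radius,                # search radius in metres (assumed value)
        'limit': limit,                  # maximum number of venues (assumed value)
    }
    if query is not None:
        params['query'] = query          # e.g. 'sushi' to narrow the search
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# coordinates of Rogers Park, the first community area in the table
url = build_explore_url(42.01, -87.67, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', query='sushi')
print(url)
```

Building the URL separately from sending it keeps the credentials and search parameters in one place and makes the request easy to inspect before any API quota is spent.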

In [93]:
import bs4
from bs4 import BeautifulSoup

import requests

import pandas as pd

import re

In [67]:
url = 'https://en.wikipedia.org/wiki/Community_areas_in_Chicago'
data = requests.get(url).text # send GET request and store as text data
my_soup = BeautifulSoup(data, 'html5lib') # parse the data with beautifulsoup

# search for the target table
tables = my_soup.find_all('table')

target_table_index = None # stays None if no table matches
for index, table in enumerate(tables):
    if 'Chicago community areas by number, population, and area' in str(table): # this text appears only in the target table
        target_table_index = index
print('There are {} tables found.\nTarget Table Index : {}'.format(len(tables), target_table_index))

There are 4 tables found.
Target Table Index : 0
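
The search loop above leaves `target_table_index` without a usable value when no table contains the marker text. A minimal sketch of the same lookup as a function that fails loudly is shown here. `find_table_index` is a hypothetical helper, and plain strings stand in for the parsed tables so that the example runs without a network request.

```python
def find_table_index(tables, marker):
    """Return the index of the first table whose text contains marker."""
    for index, table in enumerate(tables):
        if marker in str(table):
            return index
    # fail early instead of carrying an unusable index into later cells
    raise ValueError('No table contains {!r}'.format(marker))

# stand-in strings, not real Wikipedia markup
sample_tables = [
    '<table><caption>Chicago community areas by number, population, and area</caption></table>',
    '<table><caption>Another table</caption></table>',
]

target = find_table_index(sample_tables, 'Chicago community areas by number, population, and area')
print(target)
```

Unlike the loop in the notebook, which keeps the last matching table, this sketch returns the first match.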


In [100]:
# convert a DMS coordinate string (e.g. 41°52′55″N) to decimal degrees
def dms2dd(s):
    if '″' in s:
        # degrees, minutes and seconds are all present
        degrees, minutes, seconds, direction = re.split(r'[°′″]+', s)
        dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60)
    else:
        # only degrees and minutes are present
        degrees, minutes, direction = re.split(r'[°′]+', s)
        dd = float(degrees) + float(minutes)/60

    # southern and western hemispheres are negative
    if direction in ('S','W'):
        dd *= -1

    return dd

# get the coordinates of a community area from its Wikipedia sub page
def get_coordinate(row, name):
    link = 'https://en.wikipedia.org' + row.find('a')['href']
    data = requests.get(link).text
    sub_soup = BeautifulSoup(data,'html5lib')

    # the coordinates are stored in the infobox of the sub page
    table = sub_soup.find('table', {'class':'infobox geography vcard'})
    if table is None:
        raise ValueError('No infobox found for {} at {}'.format(name, link))
    latitude = table.find('span', {'class':'latitude'}).getText()
    longitude = table.find('span', {'class':'longitude'}).getText()

    # convert the DMS strings to decimal degrees
    latitude = dms2dd(latitude)
    longitude = dms2dd(longitude)

    return latitude, longitude

# collect one record per community area from wikipedia : https://en.wikipedia.org/wiki/Community_areas_in_Chicago
# (DataFrame.append was removed in pandas 2.0, so the rows are gathered in a list first)
records = []

for count, row in enumerate(tables[target_table_index].tbody.find_all('tr')):
    if 1 < count < 79: # only rows 2 to 78 are read
        number = row.find('td').getText().replace('\n', '')
        name = row.find('a').getText()

        latitude, longitude = get_coordinate(row, name)

        records.append({'No.' : number, 'Name' : name, 'Latitude' : latitude, 'Longitude' : longitude})

# build the dataframe once, after all rows have been collected
community_df = pd.DataFrame(records, columns = ['No.', 'Name', 'Latitude', 'Longitude'])

community_df

| | No. | Name | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 01 | Rogers Park | 42.010000 | -87.670000 |
| 1 | 02 | West Ridge | 42.000000 | -87.690000 |
| 2 | 03 | Uptown | 41.970000 | -87.660000 |
| 3 | 04 | Lincoln Square | 41.970000 | -87.690000 |
| 4 | 05 | North Center | 41.950000 | -87.680000 |
| ... | ... | ... | ... | ... |
| 72 | 73 | Washington Heights | 41.703833 | -87.653667 |
| 73 | 74 | Mount Greenwood | 41.700000 | -87.710000 |
| 74 | 75 | Morgan Park | 41.690000 | -87.670000 |
| 75 | 76 | O'Hare | 42.000000 | -87.920000 |
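
To make the coordinate conversion concrete: decimal degrees are degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below restates that rule in a compact, self-contained form and applies it to illustrative coordinates that are not taken from the scraped pages. `dms_to_decimal` is a hypothetical name, not the notebook's `dms2dd`.

```python
import re

def dms_to_decimal(s):
    """Convert a string such as 41°52′55″N or 42°00′N to decimal degrees."""
    # split on the degree, minute and second marks; the last piece is the hemisphere
    *numbers, direction = re.split(r'[°′″]+', s.strip())
    # degrees / 1 + minutes / 60 + seconds / 3600
    dd = sum(float(n) / 60 ** power for power, n in enumerate(numbers))
    # southern and western hemispheres are negative
    return -dd if direction in ('S', 'W') else dd

# illustrative inputs only
latitude = dms_to_decimal('41°52′55″N')
longitude = dms_to_decimal('87°37′40″W')
print(round(latitude, 6), round(longitude, 6))
```

Because the seconds part is optional in this form, strings with only degrees and minutes go through the same code path.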
