# Applied Data Science Capstone Project.

## The Battle of Neighborhoods - A Comparative Analysis of Neighborhoods in San Francisco, CA & Chicago Illinios.

----------------------------------------------------------------------------------------------------------------------

### Part 1 - Exploratory Data Analysis
#### 1.1 Data Import & Cleaning

In [1]:
# Import Libraries
! conda install lxml --yes
! conda install html5lib  --yes
! conda install BeautifulSoup4  --yes
import html5lib
import lxml
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\user\Anaconda3

  added / updated specs:
    - lxml


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.6.24  |                0         125 KB
    certifi-2020.6.20          |           py37_0         156 KB
    conda-4.8.3                |           py37_0         2.8 MB
    ------------------------------------------------------------
                                           Total:         3.1 MB

The following packages will be UPDATED:

  ca-certificates    conda-forge::ca-certificates-2020.6.2~ --> pkgs/main::ca-certificates-2020.6.24-0

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            conda-forge::certifi-2020.6.20-py37hc~ --> pkgs/main::certifi-2020.6.20

#### 1.2 San Francisco, CA

In [73]:
# Import data set that includes the zip code, and coordinates
url = pd.read_csv('https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&timezone=America/Chicago&use_labels_for_header=true', sep = ';')

# Delete unwanted columns
url = url.drop(['Timezone', 'Daylight savings time flag', 'geopoint'], axis = 1)

# Create a dataframe for Dallas Zip Codes
sanf_zip = url[(url['City'] == 'San Francisco') & (url['State'] == 'CA')].reset_index(drop=True)
sanf_zip.rename(columns={'Zip':'Zip_Code'}, inplace=True) # rename column name
sanf_zip.head()

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude
0,94146,San Francisco,CA,37.784827,-122.727802
1,94165,San Francisco,CA,37.784827,-122.727802
2,94155,San Francisco,CA,37.784827,-122.727802
3,94152,San Francisco,CA,37.784827,-122.727802
4,94117,San Francisco,CA,37.770937,-122.44276


In [74]:
sanf_zip.shape

(71, 5)

In [159]:
# Import a data set that has the names of neighborhoods along with the zip codes for San Francisco
zipcode = pd.read_html('http://www.healthysf.org/bdi/outcomes/zipmap.htm')[4] # using pandas
sanf_column_names = ['ZipCode', 'AreaName'] # rename columns
zipcode.columns = sanf_column_names

# Delete unwanted rows and reset index
sanf_zipcode = zipcode.drop([0, 22]).reset_index(drop=True)
sanf_zipcode.head()

Unnamed: 0,ZipCode,AreaName
0,94102,Hayes Valley/Tenderloin/North of Market
1,94103,South of Market
2,94107,Potrero Hill
3,94108,Chinatown
4,94109,Polk/Russian Hill (Nob Hill)


In [160]:
# Check the datatype prior to merging
print(sanf_zip.dtypes)
print(sanf_zipcode.dtypes)

Zip_Code       int64
City          object
State         object
Latitude     float64
Longitude    float64
dtype: object
ZipCode     object
AreaName    object
dtype: object


In [161]:
# convert ZipCode in sanf_zipcode dataframe from object to int
sanf_zipcode['ZipCode'] = sanf_zipcode['ZipCode'].astype('int')
sanf_zipcode.dtypes

ZipCode      int32
AreaName    object
dtype: object

In [162]:
# Merge both dataframes to create a new dataframe 
sanf_data = pd.merge(sanf_zip, sanf_zipcode, left_on='Zip_Code', right_on='ZipCode')

# Ensure there are no missing values
print(sanf_data.isna().any())
print(sanf_data.isnull().any())

# Delete unwanted columns
del sanf_data['ZipCode'] # delete the second ZipCode column

Zip_Code     False
City         False
State        False
Latitude     False
Longitude    False
ZipCode      False
AreaName     False
dtype: bool
Zip_Code     False
City         False
State        False
Latitude     False
Longitude    False
ZipCode      False
AreaName     False
dtype: bool


In [166]:
# Check the merged data
sanf_data

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName
0,94117,San Francisco,CA,37.770937,-122.44276,Haight-Ashbury
1,94131,San Francisco,CA,37.741797,-122.4378,Twin Peaks-Glen Park
2,94114,San Francisco,CA,37.758434,-122.43512,Castro/Noe Valley
3,94107,San Francisco,CA,37.766529,-122.39577,Potrero Hill
4,94116,San Francisco,CA,37.743381,-122.48578,Parkside/Forest Hill
5,94133,San Francisco,CA,37.801878,-122.41018,North Beach/Chinatown
6,94108,San Francisco,CA,37.792678,-122.40793,Chinatown
7,94121,San Francisco,CA,37.778729,-122.49265,Outer Richmond
8,94127,San Francisco,CA,37.734964,-122.4597,St. Francis Wood/Miraloma/West Portal
9,94118,San Francisco,CA,37.782029,-122.46158,Inner Richmond


#### 1.3 Chicago, IL

In [99]:
# Import data set that includes the zip code, and coordinates
url = pd.read_csv('https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&timezone=America/Chicago&use_labels_for_header=true', sep = ';')
# Delete unwanted columns
url = url.drop(['Timezone', 'Daylight savings time flag', 'geopoint'], axis = 1)
# Create a dataframe for Chicago Zip Codes
chi_zip = url[(url['City'] == 'Chicago') & (url['State'] == 'IL')].reset_index(drop=True)
chi_zip.rename(columns={'Zip':'Zip_Code'}, inplace=True) # rename column name
chi_zip.head()

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude
0,60691,Chicago,IL,41.811929,-87.68732
1,60699,Chicago,IL,41.811929,-87.68732
2,60634,Chicago,IL,41.944454,-87.79654
3,60602,Chicago,IL,41.882937,-87.62874
4,60670,Chicago,IL,41.811929,-87.68732


In [101]:
# Import a data set that has the names of neighborhoods along with the zip codes for Chicago
zipcode = pd.read_html('https://www.chicagotribune.com/chi-community-areas-htmlstory.html')[3] # using pandas
chi_column_names = ['ZipCode', 'AreaName'] # rename columns
zipcode.columns = chi_column_names
zipcode.head()

Unnamed: 0,ZipCode,AreaName
0,60601,Loop
1,60602,Loop
2,60603,Loop
3,60604,Loop
4,60605,"Loop, Near South Side"


In [105]:
# Check the datatype prior to merging
print(chi_zip.dtypes)
print(zipcode.dtypes)

Zip_Code       int64
City          object
State         object
Latitude     float64
Longitude    float64
dtype: object
ZipCode      int64
AreaName    object
dtype: object


In [106]:
# Merge both dataframes to create a new dataframe 
chi_data = pd.merge(chi_zip, zipcode, left_on='Zip_Code', right_on='ZipCode')

# Ensure there are no missing values
print(chi_data.isna().any())
print(chi_data.isnull().any())

# Delete unwanted columns
del chi_data['ZipCode'] # delete the second ZipCode column

Zip_Code     False
City         False
State        False
Latitude     False
Longitude    False
ZipCode      False
AreaName     False
dtype: bool
Zip_Code     False
City         False
State        False
Latitude     False
Longitude    False
ZipCode      False
AreaName     False
dtype: bool


In [109]:
# Check the data 
chi_data.head()

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName
0,60634,Chicago,IL,41.944454,-87.79654,"Belmont Cragin, Dunning, Montclare, Portage Park"
1,60602,Chicago,IL,41.882937,-87.62874,Loop
2,60601,Chicago,IL,41.886456,-87.62325,Loop
3,60645,Chicago,IL,42.008956,-87.69634,West Ridge
4,60651,Chicago,IL,41.901485,-87.74055,"Austin, Humboldt Park"


### Part 2 - Segmentation and Clustering

#### 2.1.1 Segmentation & Clustering - Downtown San Francisco

In [110]:
# import libraries 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\user\Anaconda3

  added / updated specs:
    - geopy


The following packages will be UPDATED:

  conda                       pkgs/main::conda-4.8.3-py37_0 --> conda-forge::conda-4.8.3-py37hc8dfbb8_1

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2020.6.24-0 --> conda-forge::ca-certificates-2020.6.20-hecda079_0
  certifi               pkgs/main::certifi-2020.6.20-py37_0 --> conda-forge::certifi-2020.6.20-py37hc8dfbb8_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


#### 2.1.2 Create a map for San Francisco

In [119]:
# Get the coordinates to create the map of San Francisco
address = 'San Francisco, Califonia USA'

geolocator = Nominatim(user_agent="sanf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of San Francisco are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of San Francisco are 37.5320542, -121.919973.


In [120]:
# create map of San Francisco using latitude and longitude values
map_San_Francisco = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city, areaName in zip(sanf_data['Latitude'], sanf_data['Longitude'], sanf_data['City'], sanf_data['Neighborhood']):
    label = '{}, {}'.format(city, areaName)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_San_Francisco)  
    
map_San_Francisco

In [169]:
# Create a dataframe for neighborhoods in Downtown San Francisco
sanf_downtown_data = sanf_data[sanf_data['AreaName'].str.contains('Market')].reset_index(drop=True)
sanf_downtown_data.head()

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName
0,94103,San Francisco,CA,37.772329,-122.41087,South of Market
1,94102,San Francisco,CA,37.779329,-122.41915,Hayes Valley/Tenderloin/North of Market


In [170]:
# Get the coordinates to create the map of Downtown San Francisco
address = 'Market, San Francisco'

geolocator = Nominatim(user_agent="sanf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Downtown San Francisco are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Downtown San Francisco are 37.7790262, -122.4199061.


In [171]:
# create map of Downtown Downtown San Francisco using latitude and longitude values
map_sanf_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(sanf_downtown_data['Latitude'], sanf_downtown_data['Longitude'], sanf_downtown_data['AreaName']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sanf_downtown)  
    
map_sanf_downtown

#### 2.1.3 Define Foursquare credentials and version

In [172]:
CLIENT_ID = 'GNYYZNSWIKYNSZDXOW2RKHRCEJD3SGBTMVQEBECBB4HTQRGX' # your Foursquare ID
CLIENT_SECRET = 'S0IDJ2S3AXP42WCQYQE2Y0COI32Y0TWAICQOZR32VJXL04FL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GNYYZNSWIKYNSZDXOW2RKHRCEJD3SGBTMVQEBECBB4HTQRGX
CLIENT_SECRET:S0IDJ2S3AXP42WCQYQE2Y0COI32Y0TWAICQOZR32VJXL04FL


In [173]:
# Explore the first area
sanf_downtown_data.loc[0, 'AreaName']

'South of Market'

#### 2.1.4 Get the coordinates for the first area

In [174]:
neighbourhood_latitude = sanf_downtown_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = sanf_downtown_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = sanf_downtown_data.loc[0, 'AreaName'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of South of Market are 37.772329, -122.41086999999999.


#### 2.1.5 Get the top 100 venues of Hayes Valley/Tenderloin/North of Market within a 500 mile radius

In [175]:
# type your answer here
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighbourhood_latitude, neighbourhood_longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=GNYYZNSWIKYNSZDXOW2RKHRCEJD3SGBTMVQEBECBB4HTQRGX&client_secret=S0IDJ2S3AXP42WCQYQE2Y0COI32Y0TWAICQOZR32VJXL04FL&ll=37.772329,-122.41086999999999&v=20180605&radius=500&limit=100'

In [176]:
# Send the GET request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2b4c9451e46d2576d73090'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'SoMa',
  'headerFullLocation': 'SoMa, San Francisco',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 77,
  'suggestedBounds': {'ne': {'lat': 37.7768290045, 'lng': -122.40518767065868},
   'sw': {'lat': 37.7678289955, 'lng': -122.4165523293413}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bfdd7772b83b71364cba998',
       'name': 'City Dance Annex',
       'location': {'address': '1420 Harrison St',
        'crossStreet': 'btwn 10th & 11th Sts',
        'lat': 37.77141709937545,
        'lng': -122.41178415825

In [177]:
# Function to extract the category of the venues
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [178]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,City Dance Annex,Dance Studio,37.771417,-122.411784
1,The Cake Gallery,Bakery,37.773997,-122.411882
2,El Tonayense Taco Truck,Food Truck,37.771126,-122.412003
3,Piston & Chain,Motorcycle Shop,37.773841,-122.411247
4,Mr. S Leather & Mr. S Locker Room,Clothing Store,37.774117,-122.408853


In [179]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

77 venues were returned by Foursquare.


#### 2.1.6 Explore neighborhoods in Downtown San Francisco

In [180]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [181]:
# Create a dataframe for the Downtown San Francisco Venues
sanf_downtown_venues = getNearbyVenues(names=sanf_downtown_data['AreaName'],
                                   latitudes=sanf_downtown_data['Latitude'],
                                   longitudes=sanf_downtown_data['Longitude']
                                  )

South of Market
Hayes Valley/Tenderloin/North of Market


In [182]:
# Check size of dataframe
print(sanf_downtown_venues.shape)
sanf_downtown_venues.head()

(164, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,South of Market,37.772329,-122.41087,City Dance Annex,37.771417,-122.411784,Dance Studio
1,South of Market,37.772329,-122.41087,The Cake Gallery,37.773997,-122.411882,Bakery
2,South of Market,37.772329,-122.41087,El Tonayense Taco Truck,37.771126,-122.412003,Food Truck
3,South of Market,37.772329,-122.41087,Piston & Chain,37.773841,-122.411247,Motorcycle Shop
4,South of Market,37.772329,-122.41087,Mr. S Leather & Mr. S Locker Room,37.774117,-122.408853,Clothing Store


In [183]:
# Number of venues per neighbourhood
sanf_downtown_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Hayes Valley/Tenderloin/North of Market,87,87,87,87,87,87
South of Market,77,77,77,77,77,77


In [184]:
# Number of unique venue categories
print('There are {} uniques categories.'.format(len(sanf_downtown_venues['Venue Category'].unique())))

There are 85 uniques categories.


#### 2.1.7 Analyze each neighborhood

In [185]:
# one hot encoding
sanf_downtown_onehot = pd.get_dummies(sanf_downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sanf_downtown_onehot['Neighbourhood'] = sanf_downtown_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [sanf_downtown_onehot.columns[-1]] + list(sanf_downtown_onehot.columns[:-1])
sanf_downtown_onehot = sanf_downtown_onehot[fixed_columns]

sanf_downtown_onehot.head()

Unnamed: 0,Neighbourhood,Art Gallery,Art Museum,Auto Garage,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Boutique,Bubble Tea Shop,Burger Joint,Café,Camera Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cosmetics Shop,Credit Union,Dance Studio,Deli / Bodega,Dessert Shop,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Event Space,Farmers Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Gay Bar,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Hotel,Hotel Bar,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Lounge,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Music School,Music Venue,New American Restaurant,Nightclub,Opera House,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Poke Place,Ramen Restaurant,Restaurant,Rock Club,Sandwich Place,Shipping Store,Southern / Soul Food Restaurant,Sports Bar,Street Food Gathering,Sushi Restaurant,Taco Place,Thai Restaurant,Theater,Tiki Bar,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,South of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,South of Market,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,South of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,South of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,South of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [186]:
sanf_downtown_onehot.shape

(164, 86)

#### 2.1.8 Group each neighborhood

In [187]:
sanf_downtown_grouped = sanf_downtown_onehot.groupby('Neighbourhood').mean().reset_index()
sanf_downtown_grouped

Unnamed: 0,Neighbourhood,Art Gallery,Art Museum,Auto Garage,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Boutique,Bubble Tea Shop,Burger Joint,Café,Camera Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cosmetics Shop,Credit Union,Dance Studio,Deli / Bodega,Dessert Shop,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Event Space,Farmers Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Gay Bar,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Hotel,Hotel Bar,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Lounge,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Music School,Music Venue,New American Restaurant,Nightclub,Opera House,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Poke Place,Ramen Restaurant,Restaurant,Rock Club,Sandwich Place,Shipping Store,Southern / Soul Food Restaurant,Sports Bar,Street Food Gathering,Sushi Restaurant,Taco Place,Thai Restaurant,Theater,Tiki Bar,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,Hayes Valley/Tenderloin/North of Market,0.0,0.011494,0.0,0.0,0.022989,0.011494,0.022989,0.011494,0.022989,0.011494,0.022989,0.045977,0.0,0.011494,0.0,0.022989,0.045977,0.022989,0.0,0.011494,0.011494,0.0,0.011494,0.0,0.011494,0.0,0.0,0.0,0.011494,0.011494,0.0,0.0,0.034483,0.011494,0.0,0.0,0.0,0.0,0.0,0.045977,0.011494,0.011494,0.011494,0.011494,0.011494,0.0,0.011494,0.0,0.011494,0.011494,0.011494,0.011494,0.0,0.0,0.0,0.011494,0.022989,0.011494,0.0,0.011494,0.022989,0.0,0.022989,0.022989,0.011494,0.022989,0.011494,0.011494,0.011494,0.022989,0.0,0.022989,0.011494,0.011494,0.0,0.0,0.022989,0.011494,0.011494,0.034483,0.011494,0.022989,0.011494,0.034483,0.011494
1,South of Market,0.025974,0.0,0.012987,0.012987,0.012987,0.025974,0.0,0.0,0.0,0.0,0.0,0.025974,0.012987,0.0,0.025974,0.038961,0.025974,0.0,0.025974,0.0,0.025974,0.012987,0.012987,0.012987,0.0,0.012987,0.012987,0.012987,0.0,0.0,0.012987,0.025974,0.0,0.025974,0.051948,0.012987,0.012987,0.025974,0.025974,0.0,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.012987,0.025974,0.0,0.0,0.025974,0.012987,0.012987,0.038961,0.0,0.012987,0.0,0.090909,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.025974,0.012987,0.012987,0.0,0.0,0.012987,0.012987,0.025974,0.0,0.038961,0.0,0.0,0.0,0.0,0.025974,0.0


In [188]:
# Confirm new size
sanf_downtown_grouped.shape

(2, 86)

#### 2.1.9 Top 5 Venues in each neighborhood

In [189]:
num_top_venues = 5

for hood in sanf_downtown_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = sanf_downtown_grouped[sanf_downtown_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Hayes Valley/Tenderloin/North of Market----
               venue  freq
0              Hotel  0.05
1               Café  0.05
2        Coffee Shop  0.05
3           Wine Bar  0.03
4  French Restaurant  0.03


----South of Market----
             venue  freq
0        Nightclub  0.09
1          Gay Bar  0.05
2  Motorcycle Shop  0.04
3     Cocktail Bar  0.04
4  Thai Restaurant  0.04




In [190]:
# function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### 2.1.10 New Dataframe with Top 10 venues in each Neighborhood

In [191]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = sanf_downtown_grouped['Neighbourhood']

for ind in np.arange(sanf_downtown_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sanf_downtown_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hayes Valley/Tenderloin/North of Market,Café,Coffee Shop,Hotel,Wine Bar,Theater,French Restaurant,Pizza Place,Optical Shop,Park,Performing Arts Venue
1,South of Market,Nightclub,Gay Bar,Motorcycle Shop,Thai Restaurant,Cocktail Bar,Art Gallery,Restaurant,Cosmetics Shop,Lounge,Coffee Shop


#### 2.1.11 Clustering of Neighborhoods

#### _**Since the number of grouped neighborhoods are 2, the number of clusters will be limited to 2**_

In [192]:
# set number of clusters
kclusters = 2

sanf_downtown_grouped_clustering = sanf_downtown_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sanf_downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1])

In [193]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sanf_downtown_merged = sanf_downtown_data

# merge San Francisco_grouped with San Francisco_data to add latitude/longitude for each neighborhood
sanf_downtown_merged = sanf_downtown_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='AreaName')

sanf_downtown_merged.head() # check the last columns!

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,94103,San Francisco,CA,37.772329,-122.41087,South of Market,1,Nightclub,Gay Bar,Motorcycle Shop,Thai Restaurant,Cocktail Bar,Art Gallery,Restaurant,Cosmetics Shop,Lounge,Coffee Shop
1,94102,San Francisco,CA,37.779329,-122.41915,Hayes Valley/Tenderloin/North of Market,0,Café,Coffee Shop,Hotel,Wine Bar,Theater,French Restaurant,Pizza Place,Optical Shop,Park,Performing Arts Venue


#### 2.1.12 Visualize the clusters

In [195]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sanf_downtown_merged['Latitude'], sanf_downtown_merged['Longitude'], sanf_downtown_merged['AreaName'], sanf_downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

..

### 2.2 Segmentation & Clustering - Downtown Chicago

#### 2.2.1 Create a map for Chicago

In [114]:
# Get the coordinates to create the map of Chicago
address = 'Chicago, Illinois'

geolocator = Nominatim(user_agent="chi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Chicago are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Chicago are 41.8755616, -87.6244212.


In [115]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city, areaName in zip(chi_data['Latitude'], chi_data['Longitude'], chi_data['City'], chi_data['AreaName']):
    label = '{}, {}'.format(city, areaName)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

In [122]:
# Create a dataframe for neighborhoods in Downtown Chicago
chi_downtown_data = chi_data[chi_data['AreaName'].str.contains('Loop')].reset_index(drop=True)
chi_downtown_data.head()

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName
0,60602,Chicago,IL,41.882937,-87.62874,Loop
1,60601,Chicago,IL,41.886456,-87.62325,Loop
2,60606,Chicago,IL,41.882582,-87.6376,"Loop, Near West Side"
3,60603,Chicago,IL,41.880446,-87.63014,Loop
4,60661,Chicago,IL,41.882082,-87.64461,"Loop, Near West Side"


In [123]:
# Get the coordinates to create the map of Downtown Chicago
address = 'Loop, Chicago'

geolocator = Nominatim(user_agent="chi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Downtown Chicago are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Downtown Chicago are 41.8755616, -87.6244212.


In [124]:
# create map of Downtown Chicago using latitude and longitude values
map_chi_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(chi_downtown_data['Latitude'], chi_downtown_data['Longitude'], chi_downtown_data['AreaName']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chi_downtown)  
    
map_chi_downtown

In [226]:
# Explore the first area
chi_downtown_data.loc[0, 'AreaName']

'Loop'

#### 2.2.2 Get the coordinates for the first area

In [234]:

neighbourhood_latitude2 = chi_downtown_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude2 = chi_downtown_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name2 = chi_downtown_data.loc[0, 'AreaName'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name2, 
                                                               neighbourhood_latitude2, 
                                                               neighbourhood_longitude2))

Latitude and longitude values of Loop are 41.882937, -87.62874000000001.


#### _**Get the top 100 venues of Loop within a 500 mile radius**_

In [237]:
# type your answer here
radius = 500
LIMIT = 100
url2 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighbourhood_latitude, neighbourhood_longitude, VERSION, radius, LIMIT)
url2

'https://api.foursquare.com/v2/venues/explore?client_id=GNYYZNSWIKYNSZDXOW2RKHRCEJD3SGBTMVQEBECBB4HTQRGX&client_secret=S0IDJ2S3AXP42WCQYQE2Y0COI32Y0TWAICQOZR32VJXL04FL&ll=41.882937,-87.62874000000001&v=20180605&radius=500&limit=100'

In [238]:
# Send the GET request
results2 = requests.get(url2).json()
results2

{'meta': {'code': 200, 'requestId': '5f2b5b7bf39d6963a4eba1e0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'The Loop',
  'headerFullLocation': 'The Loop, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 157,
  'suggestedBounds': {'ne': {'lat': 41.8874370045, 'lng': -87.62270703861752},
   'sw': {'lat': 41.878436995499996, 'lng': -87.6347729613825}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4a7ccf53f964a520b8ed1fe3',
       'name': 'James M. Nederlander Theatre',
       'location': {'address': '24 W Randolph St',
        'crossStreet': 'btwn State St & Dearborn St',
        'lat': 41.88441581705103,
      

In [239]:
# Function to extract the category of the venues
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [240]:
chi_venues = results2['response']['groups'][0]['items']
    
nearby_venues2 = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns2 = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues2 =nearby_venues2.loc[:, filtered_columns2]

# filter the category for each row
nearby_venues2['venue.categories'] = nearby_venues2.apply(get_category_type, axis=1)

# clean columns
nearby_venues2.columns = [col.split(".")[-1] for col in nearby_venues2.columns]

nearby_venues2.head()

Unnamed: 0,name,categories,lat,lng
0,James M. Nederlander Theatre,Theater,41.884416,-87.628861
1,Hamilton The Musical,Theater,41.881049,-87.628811
2,Broadway In Chicago,Performing Arts Venue,41.88283,-87.627864
3,Pret A Manger,Sandwich Place,41.883872,-87.628652
4,The Dearborn,Gastropub,41.884415,-87.629554


In [241]:
print('{} venues were returned by Foursquare.'.format(nearby_venues2.shape[0]))

100 venues were returned by Foursquare.


#### 2.2.3 Explore neighborhoods in Downtown Chicago

In [233]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [242]:
# Create a dataframe for the Downtown Chicago Venues
chi_downtown_venues = getNearbyVenues(names=chi_downtown_data['AreaName'],
                                   latitudes=chi_downtown_data['Latitude'],
                                   longitudes=chi_downtown_data['Longitude']
                                  )

Loop
Loop
Loop, Near West Side
Loop
Loop, Near West Side
Loop
Loop, Near South Side
Loop, Near West Side, Near South Side


In [243]:
# Check size of dataframe
print(chi_downtown_venues.shape)
chi_downtown_venues.head()

(661, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Loop,41.882937,-87.62874,James M. Nederlander Theatre,41.884416,-87.628861,Theater
1,Loop,41.882937,-87.62874,Hamilton The Musical,41.881049,-87.628811,Theater
2,Loop,41.882937,-87.62874,Broadway In Chicago,41.88283,-87.627864,Performing Arts Venue
3,Loop,41.882937,-87.62874,Pret A Manger,41.883872,-87.628652,Sandwich Place
4,Loop,41.882937,-87.62874,The Dearborn,41.884415,-87.629554,Gastropub


In [244]:
# Number of venues per neighbourhood
chi_downtown_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Loop,400,400,400,400,400,400
"Loop, Near South Side",21,21,21,21,21,21
"Loop, Near West Side",179,179,179,179,179,179
"Loop, Near West Side, Near South Side",61,61,61,61,61,61


In [245]:
# Number of unique venue categories
print('There are {} uniques categories.'.format(len(chi_downtown_venues['Venue Category'].unique())))

There are 156 uniques categories.


#### 2.2.4 Analyze each neighborhood

In [246]:
# one hot encoding
chi_downtown_onehot = pd.get_dummies(chi_downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chi_downtown_onehot['Neighbourhood'] = chi_downtown_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [chi_downtown_onehot.columns[-1]] + list(chi_downtown_onehot.columns[:-1])
chi_downtown_onehot = chi_downtown_onehot[fixed_columns]

chi_downtown_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Amphitheater,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Cafeteria,Café,Cajun / Creole Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Stadium,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Dive Bar,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,English Restaurant,Event Service,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,New American Restaurant,Nightlife Spot,Opera House,Optical Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Poke Place,Polish Restaurant,Portuguese Restaurant,Pub,Public Art,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Tiki Bar,Tour Provider,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Whisky Bar,Winery
0,Loop,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,Loop,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,Loop,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Loop,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Loop,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [247]:
chi_downtown_onehot.shape

(661, 157)

#### 2.2.5 Group each neighborhood

In [248]:
chi_downtown_grouped = chi_downtown_onehot.groupby('Neighbourhood').mean().reset_index()
chi_downtown_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Amphitheater,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Cafeteria,Café,Cajun / Creole Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Stadium,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Dive Bar,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,English Restaurant,Event Service,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,New American Restaurant,Nightlife Spot,Opera House,Optical Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Poke Place,Polish Restaurant,Portuguese Restaurant,Pub,Public Art,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Tiki Bar,Tour Provider,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Whisky Bar,Winery
0,Loop,0.0325,0.0025,0.0075,0.0,0.005,0.0075,0.01,0.0,0.0025,0.0175,0.015,0.0025,0.0075,0.0025,0.0,0.0025,0.0025,0.005,0.0,0.005,0.0,0.005,0.0025,0.015,0.0,0.0,0.0,0.0025,0.0175,0.0075,0.0075,0.0,0.005,0.0075,0.0675,0.0,0.005,0.01,0.005,0.0125,0.0,0.0075,0.005,0.0075,0.0075,0.0025,0.0025,0.0025,0.0175,0.0025,0.0,0.0,0.0,0.005,0.0025,0.005,0.01,0.005,0.0,0.0075,0.0075,0.0,0.0025,0.0,0.005,0.005,0.0,0.01,0.0025,0.005,0.0,0.0,0.0,0.0025,0.005,0.0025,0.0025,0.0,0.0,0.005,0.08,0.0075,0.0025,0.005,0.0,0.0225,0.0,0.0025,0.0,0.0,0.0075,0.0025,0.01,0.0025,0.0075,0.02,0.0025,0.0025,0.005,0.0175,0.0025,0.0075,0.0,0.0,0.0025,0.005,0.0,0.0075,0.0,0.005,0.0075,0.0225,0.0175,0.0025,0.0,0.01,0.0075,0.0125,0.005,0.0,0.0075,0.0175,0.0,0.0375,0.0025,0.0025,0.0225,0.0,0.015,0.0025,0.0025,0.01,0.02,0.0,0.0075,0.0,0.0,0.0,0.0,0.0025,0.0075,0.0025,0.0025,0.005,0.0,0.04,0.0025,0.0025,0.005,0.0075,0.0025,0.01,0.0,0.0025,0.0025,0.0025
1,"Loop, Near South Side",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.047619,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Loop, Near West Side",0.01676,0.0,0.0,0.005587,0.0,0.0,0.0,0.0,0.03352,0.011173,0.027933,0.0,0.0,0.0,0.0,0.0,0.005587,0.005587,0.0,0.011173,0.005587,0.005587,0.005587,0.022346,0.005587,0.0,0.0,0.0,0.005587,0.0,0.0,0.0,0.0,0.022346,0.089385,0.0,0.0,0.0,0.01676,0.005587,0.005587,0.005587,0.0,0.0,0.005587,0.01676,0.0,0.0,0.022346,0.0,0.005587,0.0,0.0,0.0,0.0,0.01676,0.0,0.011173,0.0,0.0,0.0,0.0,0.0,0.005587,0.0,0.0,0.0,0.005587,0.0,0.0,0.005587,0.011173,0.01676,0.022346,0.011173,0.005587,0.0,0.0,0.0,0.011173,0.022346,0.0,0.011173,0.0,0.0,0.022346,0.005587,0.011173,0.005587,0.005587,0.005587,0.011173,0.027933,0.0,0.03352,0.005587,0.0,0.0,0.0,0.0,0.0,0.044693,0.005587,0.005587,0.0,0.0,0.005587,0.011173,0.0,0.0,0.0,0.005587,0.0,0.011173,0.005587,0.0,0.005587,0.0,0.0,0.0,0.022346,0.011173,0.005587,0.067039,0.005587,0.0,0.005587,0.005587,0.0,0.0,0.0,0.0,0.022346,0.0,0.0,0.022346,0.0,0.005587,0.0,0.0,0.0,0.01676,0.0,0.0,0.005587,0.011173,0.0,0.0,0.0,0.0,0.011173,0.01676,0.011173,0.0,0.0,0.0
3,"Loop, Near West Side, Near South Side",0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.04918,0.0,0.0,0.016393,0.0,0.0,0.065574,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.131148,0.0,0.032787,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.032787,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.04918,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.065574,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.032787,0.0,0.032787,0.016393,0.0,0.0,0.0,0.016393,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0


In [249]:
# Confirm new size
chi_downtown_grouped.shape

(4, 157)

#### 2.2.7 Top 5 Venues in each neighborhood

In [250]:
num_top_venues = 5

for hood in chi_downtown_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = chi_downtown_grouped[chi_downtown_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Loop----
                 venue  freq
0                Hotel  0.08
1          Coffee Shop  0.07
2              Theater  0.04
3       Sandwich Place  0.04
4  American Restaurant  0.03


----Loop, Near South Side----
                venue  freq
0                Park  0.10
1       Historic Site  0.10
2  Athletics & Sports  0.10
3    Football Stadium  0.10
4      History Museum  0.05


----Loop, Near West Side----
                     venue  freq
0              Coffee Shop  0.09
1           Sandwich Place  0.07
2  New American Restaurant  0.04
3                      Bar  0.03
4                BBQ Joint  0.03


----Loop, Near West Side, Near South Side----
              venue  freq
0  Greek Restaurant  0.13
1       Coffee Shop  0.07
2    Sandwich Place  0.07
3       Pizza Place  0.05
4              Café  0.05




In [217]:
# function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### 2.2.8 New Dataframe with Top 10 venues in each Neighborhood

In [251]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted2 = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted2['Neighbourhood'] = chi_downtown_grouped['Neighbourhood']

for ind in np.arange(chi_downtown_grouped.shape[0]):
    neighbourhoods_venues_sorted2.iloc[ind, 1:] = return_most_common_venues(chi_downtown_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted2.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Loop,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
1,"Loop, Near South Side",Historic Site,Football Stadium,Park,Athletics & Sports,Sushi Restaurant,History Museum,English Restaurant,Museum,Donut Shop,Parking
2,"Loop, Near West Side",Coffee Shop,Sandwich Place,New American Restaurant,BBQ Joint,Mexican Restaurant,Bar,Mediterranean Restaurant,Burger Joint,Italian Restaurant,Donut Shop
3,"Loop, Near West Side, Near South Side",Greek Restaurant,Sandwich Place,Coffee Shop,Pizza Place,Café,Gym,Intersection,Spa,Dance Studio,Sports Bar


#### 2.2.9 Cluster Neighborhoods

#### _**Since the number of grouped neighborhoods are 4, the number of clusters will be limited to 2**_

In [252]:
# set number of clusters
kclusters = 2

chi_downtown_grouped_clustering = chi_downtown_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chi_downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0])

In [254]:
# add clustering labels
neighbourhoods_venues_sorted2.insert(0, 'Cluster Labels', kmeans.labels_)

chi_downtown_merged = chi_downtown_data

# merge chicago_grouped with chicago_data to add latitude/longitude for each neighborhood
chi_downtown_merged = chi_downtown_merged.join(neighbourhoods_venues_sorted2.set_index('Neighbourhood'), on='AreaName')

chi_downtown_merged.head() # check the last columns!

Unnamed: 0,Zip_Code,City,State,Latitude,Longitude,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,60602,Chicago,IL,41.882937,-87.62874,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
1,60601,Chicago,IL,41.886456,-87.62325,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
2,60606,Chicago,IL,41.882582,-87.6376,"Loop, Near West Side",0,Coffee Shop,Sandwich Place,New American Restaurant,BBQ Joint,Mexican Restaurant,Bar,Mediterranean Restaurant,Burger Joint,Italian Restaurant,Donut Shop
3,60603,Chicago,IL,41.880446,-87.63014,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
4,60661,Chicago,IL,41.882082,-87.64461,"Loop, Near West Side",0,Coffee Shop,Sandwich Place,New American Restaurant,BBQ Joint,Mexican Restaurant,Bar,Mediterranean Restaurant,Burger Joint,Italian Restaurant,Donut Shop


#### 2.2.10 Visualize the clusters

In [257]:
# create map
map_clusters22 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chi_downtown_merged['Latitude'], chi_downtown_merged['Longitude'], chi_downtown_merged['AreaName'], chi_downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters22)
       
map_clusters22

### Part 3 -  Clustering Analysis and Results

#### 3.1 San Francisco cluster analysis

#### _**Cluster 1**_

In [258]:
sanf_downtown_merged.loc[sanf_downtown_merged['Cluster Labels'] == 0, sanf_downtown_merged.columns[[1] + list(range(5, sanf_downtown_merged.shape[1]))]]

Unnamed: 0,City,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,San Francisco,Hayes Valley/Tenderloin/North of Market,0,Café,Coffee Shop,Hotel,Wine Bar,Theater,French Restaurant,Pizza Place,Optical Shop,Park,Performing Arts Venue


This cluster is characterized by Hotels, Café/Coffee shop and pizza place which are commonly find in business district. Hence, this cluster can be regarded as  _**"The Business District."**_

#### _**Cluster 2**_

In [259]:
sanf_downtown_merged.loc[sanf_downtown_merged['Cluster Labels'] == 1, sanf_downtown_merged.columns[[1] + list(range(5, sanf_downtown_merged.shape[1]))]]

Unnamed: 0,City,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,San Francisco,South of Market,1,Nightclub,Gay Bar,Motorcycle Shop,Thai Restaurant,Cocktail Bar,Art Gallery,Restaurant,Cosmetics Shop,Lounge,Coffee Shop


Cluster 2, *unlike cluster 1*, is characterized by relaxation venues such as Nightclub, Gay Bar, Restaurant and lounge. It depicts that a lot of leisure activities occur in this area. This cluster cn be named _**"Home of Relaxation."**_

#### 3.2 Chicago cluster analysis

#### _**Cluster 1**_

In [260]:
chi_downtown_merged.loc[chi_downtown_merged['Cluster Labels'] == 0, chi_downtown_merged.columns[[1] + list(range(5, chi_downtown_merged.shape[1]))]]

Unnamed: 0,City,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Chicago,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
1,Chicago,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
2,Chicago,"Loop, Near West Side",0,Coffee Shop,Sandwich Place,New American Restaurant,BBQ Joint,Mexican Restaurant,Bar,Mediterranean Restaurant,Burger Joint,Italian Restaurant,Donut Shop
3,Chicago,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
4,Chicago,"Loop, Near West Side",0,Coffee Shop,Sandwich Place,New American Restaurant,BBQ Joint,Mexican Restaurant,Bar,Mediterranean Restaurant,Burger Joint,Italian Restaurant,Donut Shop
5,Chicago,Loop,0,Hotel,Coffee Shop,Theater,Sandwich Place,American Restaurant,Italian Restaurant,Seafood Restaurant,Pizza Place,Middle Eastern Restaurant,Snack Place
7,Chicago,"Loop, Near West Side, Near South Side",0,Greek Restaurant,Sandwich Place,Coffee Shop,Pizza Place,Café,Gym,Intersection,Spa,Dance Studio,Sports Bar


Cluster 1 is characterized by Hotels, Coffee shops which are typical venues present in a business district area. This cluster can be referred to as  _**"The Business District."**_

#### _**Cluster 2**_

In [261]:
chi_downtown_merged.loc[chi_downtown_merged['Cluster Labels'] == 1, chi_downtown_merged.columns[[1] + list(range(5, chi_downtown_merged.shape[1]))]]

Unnamed: 0,City,AreaName,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Chicago,"Loop, Near South Side",1,Historic Site,Football Stadium,Park,Athletics & Sports,Sushi Restaurant,History Museum,English Restaurant,Museum,Donut Shop,Parking


Cluster 2 is characterized by sport centers like stadium, as well as Historic or tourist attraction centers. It is assumed that leisure activities happen in this area. This cluster is called _**"The Home of Sport & Tourism."**_

### Part 4 - Discussion.

According to the analyses on both cities data set, it is observed that San Francisco and Chicago downtown areas have a lot of similarities as shown in clusters 1 for both cities. Both clusters (1) are mainly characterized by Hotels, Café or Coffee shops and pizza places which are commonly found in business district area. Therefore, clusters 1 in both San Francisco, California and Chicago, Illinois downtown areas can be referred to as  _**"The Business District."**_

*Cluster 2* for San Francisco data set, *unlike cluster 1*, is characterized by relaxation venues such as Nightclub, Gay Bar, Restaurant and lounge. It shows that a lot of leisure activities and relaxations occur in this area. This cluster can be named _**"Home of Relaxation."**_ While *cluster 2* for Chicago data is characterized by sport centers like football stadium, athletics & sports venue as well as historic or tourist sites such as museum. This cluster can be called _**"The Home of Sport & Tourism."**_


### Part 5 - Summary and Conclusion.

This project made use of data set extracted from different open or public domain sources mainly internet websites. Various python libraries were utilized to fetch, clean, manipulate and visualize the data while foursquare API was used to focus on the venue details of each neighborhood of both *San Francisco, California* and *Chicago, Illinois* downtown areas. 
Machine learning algorithm was applied for segmentation and clustering analysis to gain more insights on the data.

This analysis has provided us with some good insights and preliminary information on neighborhood categorization and various activities centers for quick understanding of the cities for newcomers either for job search or to open businesses.  
The objectives of the project were met and, with additional data such as crime rates, further comparative analysis can be carried out  on both cities which can then be a useful information to newcomers or prospective immigrants who may want to compare both cities either for settling, jobs or other new opportunities.
