# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera
## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>
An investor is looking to open a new buffet in Adelaide (serving all kinds of cuisines), but she is unsure about the best location for her new venue and needs input for making the decision. Adelaide is a rather busy city known for its tourist attractions. The goal is to choose a venue location that maximizes profit. According to an analysis published by Small Business, the 5 factors for choosing a new restaurant location are:

* **Parking:** Ideally, a new restaurant location should have its own parking lot. If that isn’t an option—for example, in a major city—consider partnering with a hotel in the area that has its own parking options. Many famous restaurants are housed in hotels, and for a good reason. Not only is there parking, but the benefit of foot traffic that is staying right upstairs is incalculable.
* **Accessibility:** There’s a reason that major restaurant chains are often located near highway exits: it makes them accessible for customers. Certain restaurants can get away with food or service that isn't the best simply because their locations are so accessible, like restaurants near the Eiffel Tower or the Colosseum. There is plenty of foot traffic in urbanized areas, and restaurants only need to attract customers from the street into their business. Most successful restaurants (other than the truly elite) are easy to find, and you will find them in city centers or unique locations throughout the world.
* **Visibility:** This goes along with accessibility and is very important for new restaurant locations. People have to know the restaurant is there, either in person or on their mobile devices. It is why property prices in downtown districts and developed strips are higher than in other areas. They offer a level of visibility that can bring in a great deal of walk-in business. Consider advertising in search engines and social media to enhance your presence across all forms of media. Make sure to register your restaurant in search engines as the type of food you offer and your price point, as it will be easier to attract the clientele you want when they go to search.
* **Population Base:** Are there enough people in the area to support your business? There need to be enough people who live in or pass through the area regularly to keep you busy. To determine a particular area's population base, you could do a site study. However, these can cost up to 25,000 dollars. Most people looking at their first restaurant don’t have enough money in their budget for a professional survey. A less expensive method to determine the population base of a certain area is to use a pie chart, as well as asking the local chamber of commerce and town office for more information. If you would rather pound the pavement, simply walk around the area where you plan to build. Intuition can play a big role in choosing your site.
* **Safety:** Workplace safety is important for the restaurant owner as well as the workers, so crime-laden areas of the city should be avoided.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>
Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* population base in each neighborhood

We use the suburbs listed on Wikipedia's List of Adelaide suburbs to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* the coordinates of each suburb will be obtained by geocoding its postcode with the **ArcGIS geocoder** (via the `geocoder` package)
* the number of restaurants, their type and location in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the Adelaide city center will be obtained by geocoding "Adelaide, South Australia" with **Nominatim** (via `geopy`)

The dataset is derived from: 
1. List of Adelaide suburbs: https://www.costlessquotes.com.au/postcode_tool/postcode_list_NSW.php
2. Crime dataset 2020: https://data.gov.au/dataset/ds-sa-860126f7-eeb5-4fbc-be44-069aa0467d11/details?q=crime

### Import the Packages

In [5]:
# Import libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
import datetime as dt # Datetime

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

#from bs4 import BeautifulSoup

# Import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


### 1. Load the data 
We will scrape the data from the source page and tidy it up by using only Adelaide for this project. 
The dataset is derived from: https://en.wikipedia.org/wiki/List_of_Adelaide_suburbs

In [104]:
import pandas as pd
source = pd.read_html('https://en.wikipedia.org/wiki/List_of_Adelaide_suburbs')
df1 = source[0]
print (df1.shape)
df1.head(5)

(433, 7)


| | Suburb | PostCode | LGA | `YearEstab.[citation needed]` | `Dist.[4](km)[citation needed]` | `Area(ha)[citation needed]` | `Population[citation needed]` |
|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 5000 | City of Adelaide | 1837 | - | 1005.0 | 15,115[5] |
| 1 | North Adelaide | 5006 | City of Adelaide | 1837 | 0.5 | 420.0 | 6,950 [6] |
| 2 | Auldana | 5072 | City of Burnside | 1847 [7] | 9 | 312.0 | 625[8] |
| 3 | Beaumont | 5066 | City of Burnside | 1870 | 5.9 | 158.0 | 2,557[9] |
| 4 | Beulah Park | 5067 | City of Burnside | 1941[10] | 5 | 60.0 | 1,602[11] |


### 2. Data Preprocessing
* Rename the columns
* Drop unneeded columns: keep only Suburb, PostCode and Population
* Drop rows with NaN in Population, since our project takes the population factor into account
* Remove the citation markers from the Population column

In [105]:
df1 = df1.rename(columns = {"Suburb":"Neighborhood","Population[citation needed]":"Population"})
# Drop LGA, YearEstab., Dist. and Area by position rather than by their long column names
df1.drop(df1.columns[[2,3,4,5]], axis = 1, inplace = True) 
df1.dropna(inplace=True)
# Remove citation markers such as "[5]", thousands separators and leftover spaces
df1['Population'] = df1['Population'].str.replace(r"\[.*?\]", "", regex=True)
df1['Population'] = df1['Population'].str.replace(",", "", regex=False).str.strip()

In [106]:
df = df1.reset_index(drop=True)

In [107]:
df

| | Neighborhood | PostCode | Population |
|---|---|---|---|
| 0 | Adelaide | 5000 | 15115 |
| 1 | North Adelaide | 5006 | 6950 |
| 2 | Auldana | 5072 | 625 |
| 3 | Beaumont | 5066 | 2557 |
| 4 | Beulah Park | 5067 | 1602 |
| 5 | Burnside | 5066 | 2930 |
| 6 | Dulwich | 5065 | 1678 |
| 7 | Eastwood | 5063 | 764 |
| 8 | Erindale | 5066 | 1186 |
| 9 | Frewville | 5063 | 874 |

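The Population cleaning above can be checked in isolation. A minimal sketch, assuming a hypothetical helper `clean_population` applied to values in the same format as the Wikipedia column; unlike the cells above, it also converts the result to numbers (unparseable entries become NaN) so that populations can be compared numerically:

```python
import pandas as pd


def clean_population(raw: pd.Series) -> pd.Series:
    """Strip citation markers like '[5]' and thousands separators, then convert to numbers."""
    cleaned = (
        raw.astype(str)
        .str.replace(r"\[.*?\]", "", regex=True)  # drop citation markers such as [5]
        .str.replace(",", "", regex=False)        # drop thousands separators
        .str.strip()                              # drop the space left in e.g. "6,950 [6]"
    )
    # anything that is still not a number becomes NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.Series(["15,115[5]", "6,950 [6]", "625[8]", "2,557[9]"])
print(clean_population(raw).tolist())
```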

### Conclusion 1: Population Base
Adelaide (15115) and Golden Grove (9664) have the largest populations. Investors should primarily consider opening their restaurants in these two suburbs.

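Conclusion 1 does not show the code behind the ranking. One way to reproduce it, sketched here on a hypothetical subset whose values are copied from this notebook (Golden Grove from the figure quoted above), is to convert Population to numbers and take the two largest rows with `DataFrame.nlargest`. On the full data this would be `pd.to_numeric(df['Population'], errors='coerce')` followed by `df.nlargest(2, 'Population')`.

```python
import pandas as pd

# Hypothetical subset; values copied from this notebook (Population is still text after cleaning)
suburbs = pd.DataFrame({
    "Neighborhood": ["Adelaide", "North Adelaide", "Beaumont", "Burnside", "Golden Grove"],
    "Population": ["15115", "6950", "2557", "2930", "9664"],
})

# Convert to numbers, then keep the two most populous suburbs
suburbs["Population"] = pd.to_numeric(suburbs["Population"], errors="coerce")
top2 = suburbs.nlargest(2, "Population")
print(top2)
```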
## Methodology <a name="methodology"></a>

### 1. Adding the latitude and longitude

In [2]:
!conda install -c conda-forge geocoder --yes
print("Installation Done!")
import geocoder # import geocoder
print("Geo Coder imported!")


In [108]:
def get_geocoder(postal_code_from_df):
    # initialize the variable to None
    lat_lng_coords = None
    # loop until the geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Adelaide, Australia'.format(postal_code_from_df))
        lat_lng_coords = g.latlng
    # unpack only after a result was returned (indexing None would raise a TypeError)
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude, longitude

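The `while` loop in `get_geocoder` keeps retrying forever if ArcGIS never returns a match. A minimal sketch of a bounded alternative, assuming a hypothetical helper `geocode_with_retry` that receives the geocoding call as a function (in the notebook, e.g. `lambda q: geocoder.arcgis(q).latlng`) so that it can be tested offline with a stand-in:

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def geocode_with_retry(
    query: str,
    geocode_fn: Callable[[str], Optional[Sequence[float]]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Optional[Tuple[float, float]]:
    """Call geocode_fn until it returns a (lat, lng) pair or the attempts run out."""
    for attempt in range(max_attempts):
        latlng = geocode_fn(query)
        if latlng is not None:
            return float(latlng[0]), float(latlng[1])
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # wait before trying again
    return None  # give up instead of looping forever


# Offline demo with a stand-in geocoder that fails twice, then succeeds
responses = iter([None, None, [-34.943867, 138.67046]])
print(geocode_with_retry("5066, Adelaide, Australia", lambda q: next(responses), delay_seconds=0))
```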
In [109]:
df['latitude'], df['longitude'] = zip(*df['PostCode'].apply(get_geocoder))
df.head()

| | Neighborhood | PostCode | Population | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Adelaide | 5000 | 15115 | -34.92565 | 138.599907 |
| 1 | North Adelaide | 5006 | 6950 | -34.909165 | 138.594803 |
| 2 | Auldana | 5072 | 625 | -34.91054 | 138.697384 |
| 3 | Beaumont | 5066 | 2557 | -34.943867 | 138.67046 |
| 4 | Beulah Park | 5067 | 1602 | -34.924875 | 138.630063 |

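Because `get_geocoder` looks up the postcode rather than the suburb name, suburbs that share a postcode receive identical coordinates (Beaumont, Burnside and Erindale, all 5066, for example). A minimal sketch for spotting such duplicates before querying Foursquare, assuming a hypothetical helper `shared_coordinates` and rows copied from the outputs in this notebook; in the notebook you would call `shared_coordinates(df)`.

```python
import pandas as pd


def shared_coordinates(frame: pd.DataFrame) -> pd.DataFrame:
    """Group neighborhoods by (latitude, longitude) and keep points shared by 2+ suburbs."""
    groups = (
        frame.groupby(["latitude", "longitude"])["Neighborhood"]
        .agg(list)
        .reset_index()
    )
    return groups[groups["Neighborhood"].map(len) > 1].reset_index(drop=True)


# Rows copied from the geocoded tables in this notebook
coords_sample = pd.DataFrame({
    "Neighborhood": ["Beaumont", "Burnside", "Erindale", "Eastwood", "Frewville", "Adelaide"],
    "latitude": [-34.943867, -34.943867, -34.943867, -34.952412, -34.952412, -34.92565],
    "longitude": [138.67046, 138.67046, 138.67046, 138.62242, 138.62242, 138.599907],
})
print(shared_coordinates(coords_sample))
```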

### 2. Explore the neighborhoods in Adelaide

In [110]:
address = 'Adelaide, South Australia'

geolocator = Nominatim(user_agent="Adelaide_SA")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Adelaide, SA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Adelaide, SA are -34.9281805, 138.5999312.


In [112]:
map_adelaide = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, long, label in zip(df['latitude'], df['longitude'], df['PostCode']):
    # Popup expects text, so convert the integer postcode to a string
    label = folium.Popup(str(label), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_adelaide)
    
map_adelaide

### Foursquare API

In [113]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200304' # Foursquare API version


### 3. Let's create a function to explore all the suburbs in Adelaide.

In [114]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [115]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    # build the dataframe once, after every neighborhood has been queried
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                 'Neighborhood Latitude', 
                 'Neighborhood Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])
    
    return(nearby_venues)

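`getNearbyVenues` indexes straight into the JSON response and raises an `IndexError` on any venue without a category. A minimal sketch of a more defensive parser, assuming the response layout implied by the keys the function reads (`response`, then `groups[0]`, then `items`, then `venue`); `parse_explore_items` and the sample payload are hypothetical and were not checked against the live Foursquare API.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_explore_items(name: str, lat: float, lng: float,
                        payload: Dict[str, Any]) -> List[Tuple[Any, ...]]:
    """Flatten one explore-style response into rows matching the ade_venues columns."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # some venues carry no category; keep them with None instead of raising IndexError
        category: Optional[str] = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-built payload shaped like the keys getNearbyVenues reads (not a real API response)
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Proof", "location": {"lat": -34.925730, "lng": 138.598453},
               "categories": [{"name": "Wine Bar"}]}},
    {"venue": {"name": "Unnamed kiosk", "location": {"lat": -34.9, "lng": 138.6},
               "categories": []}},
]}]}}
for row in parse_explore_items("Adelaide", -34.925650, 138.599907, sample_payload):
    print(row)
```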
Now we run the above function on each neighborhood and create a new dataframe called **ade_venues**.

In [116]:
ade_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Adelaide
North Adelaide
Auldana
Beaumont
Beulah Park
Burnside
Dulwich
Eastwood
Erindale
Frewville
Glenelg
Glen Osmond
Glenside
Glenunga
Hazelwood Park
Kensington Gardens
Kensington Park
Leabrook
Leawood Gardens
Linden Park
Magill
Mount Osmond
Rose Park
Rosslyn Park
Skye
St Georges
Stonyfell
Toorak Gardens
Tusmore
Waterfall Gully
Wattle Park
Athelstone
Campbelltown
Hectorville
Magill
Newton
Paradise
Rostrevor
Tranmere
Albert Park
Allenby Gardens
Athol Park
Beverley
Bowden
Brompton
Cheltenham
Croydon
Devon Park
Findon
Flinders Park
Fulham Gardens
Grange
Henley Beach
Henley Beach South
Kidman Park
Port Adelaide
Woodville Gardens
Salisbury
Salisbury East
Salisbury Heights
Golden Grove
Greenwith
Modbury
Salisbury Heights
Walkerville


In [117]:
ade_venues

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Adelaide | -34.925650 | 138.599907 | Proof | -34.925730 | 138.598453 | Wine Bar |
| 1 | Adelaide | -34.925650 | 138.599907 | Pizza E Mozzarella Bar | -34.925683 | 138.600868 | Pizza Place |
| 2 | Adelaide | -34.925650 | 138.599907 | Blefari Caffe & Cucina | -34.927201 | 138.600413 | Café |
| 3 | Adelaide | -34.925650 | 138.599907 | Press\* food & wine | -34.925819 | 138.598261 | Australian Restaurant |
| 4 | Adelaide | -34.925650 | 138.599907 | Pranzo | -34.925364 | 138.600672 | Italian Restaurant |
| 5 | Adelaide | -34.925650 | 138.599907 | Sazon Kitchen | -34.924764 | 138.600554 | Mexican Restaurant |
| 6 | Adelaide | -34.925650 | 138.599907 | BTS Cafe | -34.925655 | 138.601133 | Cupcake Shop |
| 7 | Adelaide | -34.925650 | 138.599907 | Sushi Train | -34.924376 | 138.599883 | Sushi Restaurant |
| 8 | Adelaide | -34.925650 | 138.599907 | Soonta | -34.923560 | 138.600546 | Asian Restaurant |
| 9 | Adelaide | -34.925650 | 138.599907 | Fair Espresso | -34.923546 | 138.600651 | Coffee Shop |


In [118]:
ade_venues.shape

(462, 7)

In [119]:
print('There are {} unique categories.'.format(ade_venues['Venue Category'].nunique()))

There are 111 unique categories.

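The Data section lists the number of existing restaurants per neighborhood as a decision factor, but the cells above only count categories. A minimal sketch of that count, assuming (our assumption, not the original analysis) that a venue counts as a restaurant when its Foursquare category name contains "Restaurant", so a Pizza Place or Café does not count. The sample rows are copied from outputs in this notebook (only the first 10 Adelaide venues); in the notebook you would call `count_restaurants(ade_venues)`.

```python
import pandas as pd


def count_restaurants(venues: pd.DataFrame) -> pd.Series:
    """Count venues whose category name contains 'Restaurant', per neighborhood (zeros included)."""
    is_restaurant = venues["Venue Category"].str.contains("Restaurant", case=False, na=False)
    counts = venues.loc[is_restaurant].groupby("Neighborhood").size()
    # neighborhoods with no matching venue are missing from `counts`; add them back as 0
    all_hoods = venues["Neighborhood"].unique()
    return counts.reindex(all_hoods, fill_value=0).sort_values(ascending=False)


adelaide = ["Wine Bar", "Pizza Place", "Café", "Australian Restaurant", "Italian Restaurant",
            "Mexican Restaurant", "Cupcake Shop", "Sushi Restaurant", "Asian Restaurant", "Coffee Shop"]
albert_park = ["Vietnamese Restaurant", "IT Services", "Grocery Store", "Malay Restaurant"]
athelstone = ["Pizza Place", "Grocery Store", "Bakery", "Shopping Mall"]
rows = ([("Adelaide", c) for c in adelaide]
        + [("Albert Park", c) for c in albert_park]
        + [("Athelstone", c) for c in athelstone])
sample_venues = pd.DataFrame(rows, columns=["Neighborhood", "Venue Category"])
print(count_restaurants(sample_venues))
```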

### 4. Analyze Each Neighborhood <a name="analysis"></a>

In [120]:
ade_onehot = pd.get_dummies(ade_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ade_onehot['Neighborhood'] = ade_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ade_onehot.columns[-1]] + list(ade_onehot.columns[:-1])
ade_onehot = ade_onehot[fixed_columns]
ade_onehot.head()

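| | Neighborhood | Afghan Restaurant | Airport Lounge | American Restaurant | Argentinian Restaurant | Asian Restaurant | Athletics & Sports | Australian Restaurant | BBQ Joint | Bagel Shop | ... | Thai Restaurant | Theater | Thrift / Vintage Store | Trail | Train Station | Tram Station | Tunnel | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | Adelaide | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Adelaide | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Adelaide | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Adelaide | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |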


Next, let's group the rows by neighborhood and take the **mean of the frequency of occurrence of each category**.

In [121]:
ade_grouped = ade_onehot.groupby('Neighborhood').mean().reset_index()
ade_grouped

| | Neighborhood | Afghan Restaurant | Airport Lounge | American Restaurant | Argentinian Restaurant | Asian Restaurant | Athletics & Sports | Australian Restaurant | BBQ Joint | Bagel Shop | ... | Thai Restaurant | Theater | Thrift / Vintage Store | Trail | Train Station | Tram Station | Tunnel | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 0.01 | 0.0 | 0.01 | 0.01 | 0.03 | 0.0 | 0.01 | 0.0 | 0.01 | ... | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.03 | 0.03 |
| 1 | Albert Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 |
| 2 | Allenby Gardens | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Athelstone | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Athol Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 |
| 5 | Beaumont | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Beulah Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Beverley | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Bowden | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Brompton | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

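Why does the mean of the one-hot columns give a frequency? Each venue row has a 1 in exactly one category column, so averaging the rows of a neighborhood gives the share of its venues in each category, and each neighborhood's shares add up to 1. A minimal sketch on made-up data (neighborhoods `A` and `B` are hypothetical), using the same `get_dummies` call as above:

```python
import pandas as pd

# Hypothetical venues for two made-up neighborhoods A and B
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Bar", "Pub", "Bar", "Park"],
})

# One column per category, holding 1 for the venue's own category and 0 elsewhere
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="").astype(int)
onehot["Neighborhood"] = venues["Neighborhood"]

# The mean of each column within a neighborhood is the share of its venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```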

### Let's print each neighborhood along with the top 5 most common venues

In [122]:
num_top_venues = 5

for hood in ade_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ade_grouped[ade_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
              venue  freq
0              Café  0.11
1               Bar  0.09
2       Coffee Shop  0.08
3  Sculpture Garden  0.04
4             Hotel  0.04


----Albert Park----
                   venue  freq
0  Vietnamese Restaurant  0.25
1            IT Services  0.25
2          Grocery Store  0.25
3       Malay Restaurant  0.25
4      Afghan Restaurant  0.00


----Allenby Gardens----
                venue  freq
0  Italian Restaurant  0.25
1    Asian Restaurant  0.25
2                Café  0.25
3           Pet Store  0.25
4               Motel  0.00


----Athelstone----
               venue  freq
0        Pizza Place  0.25
1      Grocery Store  0.25
2             Bakery  0.25
3      Shopping Mall  0.25
4  Afghan Restaurant  0.00


----Athol Park----
                   venue  freq
0  Vietnamese Restaurant  0.50
1              Pet Store  0.25
2          Grocery Store  0.25
3      Afghan Restaurant  0.00
4      Mobile Phone Shop  0.00


----Beaumont----

### Let's put that into a pandas dataframe
First, let's write a function to **sort the venues in descending order**.

In [123]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the **top 10 venues for each neighborhood.**

In [132]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ade_grouped['Neighborhood']

for ind in np.arange(ade_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ade_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | Café | Bar | Coffee Shop | Hotel | Sculpture Garden | Wine Bar | Japanese Restaurant | Italian Restaurant | Asian Restaurant | Vietnamese Restaurant |
| 1 | Albert Park | IT Services | Vietnamese Restaurant | Grocery Store | Malay Restaurant | Fast Food Restaurant | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store |
| 2 | Allenby Gardens | Pet Store | Italian Restaurant | Asian Restaurant | Café | Wine Bar | Farmers Market | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega |
| 3 | Athelstone | Pizza Place | Grocery Store | Bakery | Shopping Mall | Electronics Store | Cocktail Bar | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop |
| 4 | Athol Park | Vietnamese Restaurant | Pet Store | Grocery Store | Farmers Market | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store |


### 5. Clustering the Neighborhoods
Run k-means to cluster the neighborhoods into **5 clusters.**

In [133]:
kclusters = 5

ade_grouped_clustering = ade_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init set explicitly; 10 was the long-standing scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(ade_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0], dtype=int32)

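The number of clusters is fixed at 5 above. One common sanity check, which is not part of the original analysis, is to compare mean silhouette scores across several values of k. The sketch below assumes a hypothetical helper `silhouette_by_k` and uses made-up, well-separated points so the expected answer (3 groups) is obvious; in the notebook you would pass `ade_grouped_clustering` and a range such as `range(2, 9)`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Made-up data: three tight groups of four points, far apart from each other
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
X = np.vstack([offsets + np.array(center) for center in ([0.0, 0.0], [10.0, 0.0], [0.0, 10.0])])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```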
Now, let's create a new dataframe that includes the **cluster** as well as the **top 10 venues for each neighborhood.**

In [134]:
neighborhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)
neighborhoods_venues_sorted=neighborhoods_venues_sorted.set_index('Neighborhood')

df_right = pd.merge(df, neighborhoods_venues_sorted, on='Neighborhood', how='right')
df_right.head(2)

| | Neighborhood | PostCode | Population | latitude | longitude | Cluster_Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 5000 | 15115 | -34.92565 | 138.599907 | 0 | Café | Bar | Coffee Shop | Hotel | Sculpture Garden | Wine Bar | Japanese Restaurant | Italian Restaurant | Asian Restaurant | Vietnamese Restaurant |
| 1 | North Adelaide | 5006 | 6950 | -34.909165 | 138.594803 | 0 | Pub | Burger Joint | Italian Restaurant | Thai Restaurant | Asian Restaurant | Café | Coffee Shop | Gym | Restaurant | Park |


In [135]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(df_right['latitude'], df_right['longitude'], df_right['Neighborhood'], df_right['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [136]:
is_0 =  df_right['Cluster_Labels']==0
cluster0 = df_right[is_0]
cluster0

| | Neighborhood | PostCode | Population | latitude | longitude | Cluster_Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 5000 | 15115 | -34.92565 | 138.599907 | 0 | Café | Bar | Coffee Shop | Hotel | Sculpture Garden | Wine Bar | Japanese Restaurant | Italian Restaurant | Asian Restaurant | Vietnamese Restaurant |
| 1 | North Adelaide | 5006 | 6950 | -34.909165 | 138.594803 | 0 | Pub | Burger Joint | Italian Restaurant | Thai Restaurant | Asian Restaurant | Café | Coffee Shop | Gym | Restaurant | Park |
| 3 | Beulah Park | 5067 | 1602 | -34.924875 | 138.630063 | 0 | Liquor Store | Pub | Noodle House | Pizza Place | Baseball Field | Bus Station | Men's Store | Eastern European Restaurant | Park | Bookstore |
| 5 | Dulwich | 5065 | 1678 | -34.93856 | 138.638117 | 0 | Chinese Restaurant | Café | Coffee Shop | Electronics Store | Supermarket | Clothing Store | Pizza Place | Movie Theater | Deli / Bodega | Burger Joint |
| 6 | Eastwood | 5063 | 764 | -34.952412 | 138.62242 | 0 | Vietnamese Restaurant | Asian Restaurant | Café | Bakery | Flea Market | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store | Dessert Shop |
| 8 | Frewville | 5063 | 874 | -34.952412 | 138.62242 | 0 | Vietnamese Restaurant | Asian Restaurant | Café | Bakery | Flea Market | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store | Dessert Shop |
| 9 | Glenelg | 5045 | 3349 | -34.977801 | 138.52215 | 0 | Gym / Fitness Center | Thai Restaurant | Sushi Restaurant | Hotel | Vietnamese Restaurant | Japanese Restaurant | Mobile Phone Shop | Newsstand | Chinese Restaurant | Café |
| 11 | Glenside | 5065 | 2422 | -34.93856 | 138.638117 | 0 | Chinese Restaurant | Café | Coffee Shop | Electronics Store | Supermarket | Clothing Store | Pizza Place | Movie Theater | Deli / Bodega | Burger Joint |
| 17 | Leawood Gardens | 5150 | 2375 | -34.970402 | 138.67647 | 0 | Tunnel | Trail | Australian Restaurant | Wine Bar | Fast Food Restaurant | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega |
| 18 | Linden Park | 5065 | 1910 | -34.93856 | 138.638117 | 0 | Chinese Restaurant | Café | Coffee Shop | Electronics Store | Supermarket | Clothing Store | Pizza Place | Movie Theater | Deli / Bodega | Burger Joint |


Applying the same filter to the other clusters, we find:

**Cluster0:** restaurants and similar businesses.    
**Cluster1:** clothing stores and many restaurants.  
**Cluster2:** a scenic lookout and some food stalls/delis  
**Cluster3:** cafés and snack places  
**Cluster4:** beer garden, bus station, motel  
### Conclusion 2:  
Investors should consider choosing neighborhoods where they can avoid fierce competition. 
Based on our observation, **Cluster2 could be the ideal place to launch a restaurant business**, as it is known primarily for its scenic lookout and may therefore have a large enough customer base (e.g., tourists) to support the business in the long run.

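The cluster descriptions above were obtained by filtering each cluster in turn. A minimal sketch of the same summary in one pass, assuming a hypothetical helper `summarize_clusters` that counts the 1st most common venue within each cluster label; the sample rows are copied from the cluster tables in this notebook, and in the notebook you would call `summarize_clusters(df_right)`.

```python
import pandas as pd


def summarize_clusters(frame: pd.DataFrame, column: str = "1st Most Common Venue",
                       top_n: int = 3) -> dict:
    """Map each cluster label to its most frequent values in `column`."""
    summary = {}
    for label, group in frame.groupby("Cluster_Labels"):
        summary[int(label)] = group[column].value_counts().head(top_n).index.tolist()
    return summary


# A few rows copied from the cluster tables in this notebook
sample_clusters = pd.DataFrame({
    "Neighborhood": ["Adelaide", "North Adelaide", "Dulwich", "Glenside",
                     "Linden Park", "Beaumont", "Burnside"],
    "Cluster_Labels": [0, 0, 0, 0, 0, 2, 2],
    "1st Most Common Venue": ["Café", "Pub", "Chinese Restaurant", "Chinese Restaurant",
                              "Chinese Restaurant", "Scenic Lookout", "Scenic Lookout"],
})
print(summarize_clusters(sample_clusters))
```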
In [137]:
df_right

| | Neighborhood | PostCode | Population | latitude | longitude | Cluster_Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide | 5000 | 15115 | -34.92565 | 138.599907 | 0 | Café | Bar | Coffee Shop | Hotel | Sculpture Garden | Wine Bar | Japanese Restaurant | Italian Restaurant | Asian Restaurant | Vietnamese Restaurant |
| 1 | North Adelaide | 5006 | 6950 | -34.909165 | 138.594803 | 0 | Pub | Burger Joint | Italian Restaurant | Thai Restaurant | Asian Restaurant | Café | Coffee Shop | Gym | Restaurant | Park |
| 2 | Beaumont | 5066 | 2557 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 3 | Beulah Park | 5067 | 1602 | -34.924875 | 138.630063 | 0 | Liquor Store | Pub | Noodle House | Pizza Place | Baseball Field | Bus Station | Men's Store | Eastern European Restaurant | Park | Bookstore |
| 4 | Burnside | 5066 | 2930 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 5 | Dulwich | 5065 | 1678 | -34.93856 | 138.638117 | 0 | Chinese Restaurant | Café | Coffee Shop | Electronics Store | Supermarket | Clothing Store | Pizza Place | Movie Theater | Deli / Bodega | Burger Joint |
| 6 | Eastwood | 5063 | 764 | -34.952412 | 138.62242 | 0 | Vietnamese Restaurant | Asian Restaurant | Café | Bakery | Flea Market | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store | Dessert Shop |
| 7 | Erindale | 5066 | 1186 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 8 | Frewville | 5063 | 874 | -34.952412 | 138.62242 | 0 | Vietnamese Restaurant | Asian Restaurant | Café | Bakery | Flea Market | Convenience Store | Cupcake Shop | Deli / Bodega | Department Store | Dessert Shop |
| 9 | Glenelg | 5045 | 3349 | -34.977801 | 138.52215 | 0 | Gym / Fitness Center | Thai Restaurant | Sushi Restaurant | Hotel | Vietnamese Restaurant | Japanese Restaurant | Mobile Phone Shop | Newsstand | Chinese Restaurant | Café |


## Results <a name="results"></a>
We took a closer look at two factors: population base and visibility. Using the population data, we conclude that Adelaide and Golden Grove have the largest populations of all neighborhoods (within the data we could access). Investors should primarily consider opening their restaurants in these two suburbs.
As for visibility, the neighborhoods in Cluster2 appear to be a good investment, as their most common venue is a scenic lookout and there is less competition for opening a restaurant. One caveat: all seven Cluster2 suburbs share postcode 5066 and were geocoded to the same point (-34.943867, 138.67046), so their venue profiles come from a single location and are identical.  

In [138]:
is_2 =  df_right['Cluster_Labels']==2
cluster2 = df_right[is_2]
cluster2

| | Neighborhood | PostCode | Population | latitude | longitude | Cluster_Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Beaumont | 5066 | 2557 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 4 | Burnside | 5066 | 2930 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 7 | Erindale | 5066 | 1186 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 13 | Hazelwood Park | 5066 | 1874 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 22 | Stonyfell | 5066 | 1326 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 25 | Waterfall Gully | 5066 | 2522 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |
| 26 | Wattle Park | 5066 | 1830 | -34.943867 | 138.67046 | 2 | Scenic Lookout | Department Store | Wine Bar | Clothing Store | Coffee Shop | Comfort Food Restaurant | Convenience Store | Cupcake Shop | Deli / Bodega | Dessert Shop |

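Since every Cluster2 suburb has the same venue profile here, population is the main thing that distinguishes them in this data. A minimal sketch of a shortlist ranked by population, using the rows copied from the Cluster2 table above; Population is still stored as text after the cleaning step, so it is converted first, otherwise the sort would be alphabetical. In the notebook the same steps apply to `cluster2`.

```python
import pandas as pd

# Cluster2 rows copied from the table above; Population is text, as in the notebook
cluster2_sample = pd.DataFrame({
    "Neighborhood": ["Beaumont", "Burnside", "Erindale", "Hazelwood Park",
                     "Stonyfell", "Waterfall Gully", "Wattle Park"],
    "Population": ["2557", "2930", "1186", "1874", "1326", "2522", "1830"],
})

# Convert to numbers first so the sort is numeric rather than alphabetical
cluster2_sample["Population"] = pd.to_numeric(cluster2_sample["Population"], errors="coerce")
shortlist = (cluster2_sample.sort_values("Population", ascending=False)
             .reset_index(drop=True))
print(shortlist.head(3))
```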

## Discussion
This project aims to find the best location for clients who want to open a restaurant in Adelaide, the capital of South Australia. Many of the other factors mentioned above could also affect a restaurant's performance; in future studies, we could examine factors such as safety and affordability to give a more comprehensive basis for the decision.

We may also conduct a competitor analysis to understand what our competitors are doing and how well they are performing.  

Parking: This is an important factor that our analysis does not address. We do not have data on public or private parking in Adelaide. A next step is to obtain the distribution of parking spaces, or to extract parking information in real time, for example by using the ParkWhiz API.

## Conclusion <a name="conclusion"></a>
In this capstone project, we address the business problem of finding a good location in Adelaide for opening a new restaurant. We identify the most important factors that could affect the choice. Using the list of Adelaide suburbs, we extract the key information on the population base. 
Using venue data from the Foursquare API, we cluster the neighborhoods into 5 clusters and find that Cluster 2 is the most investment-worthy cluster; it contains Beaumont, Burnside, Stonyfell, Wattle Park, Erindale, Waterfall Gully, and Hazelwood Park.
In future studies, we could adopt machine learning algorithms to help our clients make predictions for cost-revenue analysis.