# Exploring the neighborhoods of Auckland and choosing a suitable residential area

## 1. Introduction


Although Wellington is the capital of New Zealand, Auckland is the country's largest urban area, with an urban population of around 1.65 million (2017), roughly 35% of the national population. Auckland is also New Zealand's economic, cultural, shipping and tourism hub. If you are not familiar with a city, it is difficult to decide where to live, and the final choice is always a balance of several factors. I would like to approach this problem by analyzing the data I could obtain. I will use Foursquare location data to explore Auckland, get a better understanding of the city, and then choose a community that suits me personally. The audience of this report is therefore people who want to choose where to live in a more data-driven way and who are curious about exploring their city.

## 2. Data

- List of suburbs in Auckland City
    - Use the BeautifulSoup package to scrape the Wikipedia page that lists the suburbs of Auckland [^1]
    - Use the geopy library to obtain the latitude and longitude of each Auckland neighborhood
    - Use Foursquare location data[^2] to obtain the categories of the venues that exist in each neighborhood
- 2013 Census data[^3]
    - median rent paid
    - median personal income
    
[^1]: https://en.wikipedia.org/wiki/List_of_suburbs_of_Auckland
[^2]: https://foursquare.com/
[^3]: http://archive.stats.govt.nz/Census/2013-census/data-tables/meshblock-dataset.aspx#csv


When choosing a place to live, people apply different standards, and the same conditions attract different people to different degrees. From a macro perspective, Foursquare location data will be used to segment and cluster communities in order to gain a basic insight into their characteristics. From my personal perspective, I pay more attention to the atmosphere of the neighborhood. Specifically, I hope there are coffee shops near where I live, so that I can relax with a cup of delicious coffee. A neighborhood close to Chinese restaurants is also more attractive to me. Foursquare location data will therefore be used to count the coffee shops and Chinese restaurants in each neighborhood. Although the Auckland CBD meets these standards perfectly, I prefer to live somewhere less crowded, so the Auckland CBD is out of consideration. Rent will also be considered because of cost constraints. Lastly, I treat the median personal income of residents as a factor, because I prefer good facilities and well-educated neighbors. For these two considerations, I will use the 2013 Census data from Stats NZ, which records median rent paid and median personal income for each community.

## 3. Data Gathering & Cleaning

### Import libraries

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


import requests # library to handle requests
import urllib
import re
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors


# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Gather Data - Scrape from Wikipedia

Use requests to fetch the page at the URL.

In [2]:
#specify the url
url = "https://en.wikipedia.org/wiki/List_of_suburbs_of_Auckland"

#Query the website and return the html to the variable 'page'
page = requests.get(url)

Use BeautifulSoup to parse the HTML.

In [3]:
html = page.text
soup = BeautifulSoup(html, 'html.parser')

Let's look at the HTML.

In [4]:
soup.prettify() 

'<!DOCTYPE html>\n<html class="client-nojs" dir="ltr" lang="en">\n <head>\n  <meta charset="utf-8"/>\n  <title>\n   List of suburbs of Auckland - Wikipedia\n  </title>\n  <script>\n   document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );\n  </script>\n  <script>\n   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_suburbs_of_Auckland","wgTitle":"List of suburbs of Auckland","wgCurRevisionId":889958428,"wgRevisionId":889958428,"wgArticleId":6326964,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles lacking sources from April 2017","All articles lacking sources","Use dmy dates from August 2015","Use New Zealand English from August 2015","All Wikipedia articles written in New Zealand English","Suburbs of Auckland"],"wgBreakFrames":

Use the .find method to narrow down the part of the page we need.

In [5]:
first_div = soup.find("div","div-col columns column-width")
print(first_div)

<div class="div-col columns column-width" style="-moz-column-width: 18em; -webkit-column-width: 18em; column-width: 18em;">
<ul><li><a href="/wiki/Arch_Hill,_New_Zealand" title="Arch Hill, New Zealand">Arch Hill</a></li>
<li><a href="/wiki/Auckland_CBD" title="Auckland CBD">Auckland CBD</a></li>
<li><a href="/wiki/Avondale,_Auckland" title="Avondale, Auckland">Avondale</a></li>
<li><a href="/wiki/Balmoral,_New_Zealand" title="Balmoral, New Zealand">Balmoral</a></li>
<li><a class="mw-redirect" href="/wiki/Blockhouse_Bay,_New_Zealand" title="Blockhouse Bay, New Zealand">Blockhouse Bay</a></li>
<li><a href="/wiki/Eden_Terrace" title="Eden Terrace">Eden Terrace</a></li>
<li><a href="/wiki/Eden_Valley,_New_Zealand" title="Eden Valley, New Zealand">Eden Valley</a></li>
<li><a href="/wiki/Ellerslie,_New_Zealand" title="Ellerslie, New Zealand">Ellerslie</a></li>
<li><a href="/wiki/Epsom,_New_Zealand" title="Epsom, New Zealand">Epsom</a></li>
<li><a href="/wiki/Freemans_Bay" title="Freemans Bay

After inspecting the output, we know that the list items we want are contained in the first div, so we can use the .findAll method and a for loop to collect all the suburbs in Auckland City.

In [6]:
a=first_div.findAll("li")
sub_list= []
for i in a:
    lis = i("a")  # calling a tag is shorthand for find_all("a")
    sub_list.append(lis[0].text)  # the text of the first link is the suburb name
print(sub_list)

['Arch Hill', 'Auckland CBD', 'Avondale', 'Balmoral', 'Blockhouse Bay', 'Eden Terrace', 'Eden Valley', 'Ellerslie', 'Epsom', 'Freemans Bay', 'Glendowie', 'Glen Innes', 'Grafton', 'Greenlane', 'Greenwoods Corner', 'Grey Lynn', 'Herne Bay', 'Hillsborough', 'Kingsland', 'Kohimarama', 'Lynfield', 'Meadowbank', 'Mission Bay', 'Morningside', 'Mount Albert', 'Mount Eden', 'Mount Roskill', 'Mount Wellington', 'Newmarket', 'Newton', 'New Windsor', 'Onehunga', 'One Tree Hill', 'Orakei', 'Oranga', 'Otahuhu', 'Owairaka', 'Panmure', 'Parnell', 'Penrose', 'Point England', 'Point Chevalier', 'Ponsonby', 'Remuera', 'Royal Oak', 'Saint Heliers', 'Saint Johns', 'Saint Marys Bay', 'Sandringham', 'Stonefields', 'Tamaki', 'Te Papapa', 'Three Kings', 'Waikowhai', 'Wai o Taiki Bay', 'Waterview', 'Western Springs', 'Westfield', 'Westmere']


We now have all the suburbs in Auckland City.

In [7]:
#check how many suburbs are in the list
print ('There are {} suburbs in Auckland City.'.format(len(sub_list)))

There are 59 suburbs in Auckland City.


### Load Data - Median personal income

Here, we load the median personal income data previously cleaned from the 2013 census.

In [8]:
#read csv file
df_income = pd.read_csv('2013_Median_personal_income.csv')

In [9]:
#check info
df_income.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 437 entries, 0 to 436
Data columns (total 3 columns):
Area_unit_code            437 non-null int64
Area_unit_description     437 non-null object
Median_personal_income    437 non-null object
dtypes: int64(1), object(2)
memory usage: 10.3+ KB


In [10]:
#check dataframe
df_income.head()

Unnamed: 0,Area_unit_code,Area_unit_description,Median_personal_income
0,505300,Wellsford,20700
1,505400,Leigh,25300
2,505500,Warkworth,26300
3,505601,Waimauku,39500
4,505602,Huapai,33900


### Clean Data - Median personal income

#### Find the df_income suburbs whose names exactly match those in sub_list

In [11]:
# use .isin() to find the suburbs whose names also appear in sub_list
area = list(df_income.loc[df_income['Area_unit_description'].isin(sub_list)]['Area_unit_description'])

In [12]:
print (area)
print('')
print ('{} suburbs in df_income have exactly the same name as in sub_list.'.format(len(area)))

['Freemans Bay', 'Newton', 'New Windsor', 'Blockhouse Bay', 'Waterview', 'Westmere', 'Herne Bay', 'Arch Hill', 'Eden Terrace', 'Mission Bay', 'Glendowie', 'Point England', 'Stonefields', 'Newmarket', 'Kingsland', 'Balmoral', 'Three Kings', 'Royal Oak', 'Penrose', 'Oranga', 'Te Papapa', 'Tamaki']

22 suburbs in df_income have exactly the same name as in sub_list.


#### Create area_df dataframe to store result

In [13]:
#create area_df dataframe to store result
#(DataFrame.append was removed in pandas 2.0, so collect the matches and concatenate once)
matches = [df_income.loc[df_income['Area_unit_description'].str.contains(item)] for item in area]
area_df = pd.concat(matches, ignore_index=True)

area_df

Unnamed: 0,Area_unit_code,Area_unit_description,Median_personal_income
0,514000,Freemans Bay,49100
1,514200,Newton,32500
2,514500,New Windsor,22600
3,514700,Blockhouse Bay,25000
4,514900,Waterview,27600
5,515100,Westmere,49500
6,515201,Herne Bay,57500
7,515500,Arch Hill,44900
8,515600,Eden Terrace,37700
9,516500,Mission Bay,45500


In [14]:
area_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 3 columns):
Area_unit_code            24 non-null int64
Area_unit_description     24 non-null object
Median_personal_income    24 non-null object
dtypes: int64(1), object(2)
memory usage: 656.0+ bytes
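
Note that `area` holds 22 names while `area_df` has 24 rows. A likely explanation is that `.str.contains()` matches substrings: 'Tamaki' also matches 'Tidal-Tamaki' and 'Tamaki Strait', both of which appear later in the geo dataframe. The sketch below reproduces the effect on a toy series (the series is illustrative, not the real census file) and contrasts it with the exact match given by `.isin()`.

```python
import pandas as pd

# Toy stand-in for df_income['Area_unit_description'].
units = pd.Series(["Tamaki", "Tidal-Tamaki", "Tamaki Strait", "Newton"])

# Exact match: only rows equal to the name are kept.
exact = units[units.isin(["Tamaki"])].tolist()

# Substring match: every row that contains the name is kept.
substring = units[units.str.contains("Tamaki", regex=False)].tolist()

print(exact)
print(substring)
```

If only exact names are wanted at this step, `.isin()` is the safer filter; the substring search is useful when the subdivided units are wanted as well.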


Next, we need to find out the neighborhoods without corresponding names.

In [15]:
needed_area = [x for x in sub_list if x not in area]

print (needed_area)
print('')
print ('There are {} suburbs in sub_list without an exactly matching name in df_income.'.format(len(needed_area)))

['Auckland CBD', 'Avondale', 'Eden Valley', 'Ellerslie', 'Epsom', 'Glen Innes', 'Grafton', 'Greenlane', 'Greenwoods Corner', 'Grey Lynn', 'Hillsborough', 'Kohimarama', 'Lynfield', 'Meadowbank', 'Morningside', 'Mount Albert', 'Mount Eden', 'Mount Roskill', 'Mount Wellington', 'Onehunga', 'One Tree Hill', 'Orakei', 'Otahuhu', 'Owairaka', 'Panmure', 'Parnell', 'Point Chevalier', 'Ponsonby', 'Remuera', 'Saint Heliers', 'Saint Johns', 'Saint Marys Bay', 'Sandringham', 'Waikowhai', 'Wai o Taiki Bay', 'Western Springs', 'Westfield']

There are 37 suburbs in sub_list without an exactly matching name in df_income.


### Let's use `.str.contains()` method to find corresponding names

#### Create needed_df dataframe to store result

In [16]:
#create needed_df dataframe to store result
#(DataFrame.append was removed in pandas 2.0, so collect the matches and concatenate once)
matches = [df_income.loc[df_income['Area_unit_description'].str.contains(item)] for item in needed_area]
needed_df = pd.concat(matches, ignore_index=True)

needed_df

Unnamed: 0,Area_unit_code,Area_unit_description,Median_personal_income
0,514600,Avondale South,24200
1,514802,Avondale West,19500
2,520201,Ellerslie North,41000
3,520202,Ellerslie South,42000
4,515700,Epsom North,27200
5,515801,Epsom Central,23300
6,515802,Epsom South,32300
7,516900,Glen Innes North,35300
8,517001,Glen Innes West,19700
9,517002,Glen Innes East,19300


As expected, the same neighborhoods are present but split across several more specific names. The earlier exact match failed because the neighborhoods in df_income are more finely subdivided. I decided to keep these subdivided neighborhoods, as the extra detail may improve our understanding of them.

Using `.str.contains()`, we found 50 detailed communities belonging to 22 communities in the sub_list. There are still 15 unidentified communities, which we will try to locate next.
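
The list of still-unidentified suburbs can also be derived instead of being typed by hand. The following is a minimal sketch on a few rows copied from the preview above; the helper `parent_suburb` and the toy `suburbs` list are hypothetical names introduced only for illustration. It tags each area unit with the suburb name it starts with, then reports the suburbs that matched nothing.

```python
import pandas as pd

# Toy rows copied from the needed_df preview above.
units = pd.DataFrame({
    "Area_unit_description": ["Avondale South", "Avondale West", "Epsom North", "Glen Innes East"],
    "Median_personal_income": [24200, 19500, 27200, 19300],
})
suburbs = ["Avondale", "Epsom", "Glen Innes", "Grafton"]

def parent_suburb(unit, candidates):
    """Return the first candidate that is a prefix of `unit`, else None."""
    for name in candidates:
        if unit.startswith(name):
            return name
    return None

# Tag every area unit with its parent suburb.
units["Suburb"] = units["Area_unit_description"].apply(lambda u: parent_suburb(u, suburbs))

# Suburbs that no area unit starts with still need manual investigation.
unmatched = [s for s in suburbs if s not in set(units["Suburb"])]
print(units)
print(unmatched)
```

Keeping the parent suburb as its own column also makes it possible to aggregate the subdivided units back to suburb level later, if that turns out to be useful.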

#### Prepare data for finding unidentified neighborhoods

In [17]:
#convert the column values to a list
income_list = df_income['Area_unit_description'].values.tolist()

#replace " " with "_" so that each neighborhood becomes a single token
New_income_list = [name.replace(" ","_") for name in income_list]


print (New_income_list)
#join the list into one space-separated string
income_str = " ".join(New_income_list)

['Wellsford', 'Leigh', 'Warkworth', 'Waimauku', 'Huapai', 'Riverhead_Urban', 'Kumeu_East', 'Kumeu_West', 'Waipareira_West', 'Waiwera', 'Hatfields_Beach', 'Orewa', 'Silverdale_Central', 'Red_Beach_West', 'Red_Beach_East', 'Manly', 'Army_Bay', 'Vipond', 'Stanmore_Bay_West', 'Stanmore_Bay_East', 'Wade_Heads', 'Gulf_Harbour', 'Gulf_Harbour_Marina', 'Weiti_River', 'Stillwater', 'Silverdale_South', 'Silverdale_North', 'Orewa_West', 'Dairy_Flat-Redvale', 'Paremoremo_West', 'Tauhoa-Puhoi', 'Tahekeroa', 'Matheson_Bay', 'Kawau', 'Snells_Beach', 'Algies_Bay', 'Mahurangi', 'South_Head', 'Parakai_Rural', 'Parakai_Urban', 'Kaukapakapa_Rural', 'Kaukapakapa', 'Helensville_South', 'Rewiti', 'Riverhead', 'Muriwai_Beach', 'Muriwai_Valley', 'Waitakere_West', 'Point_Wells', 'Omaha', 'Matakana', 'Cape_Rodney', 'Cape_Rodney_South', 'Helensville', 'Awaruku', 'Glamorgan', 'Torbay', 'Waiake', 'Browns_Bay', 'Oaktree', 'Rothesay_Bay', 'Murrays_Bay', 'Mairangi_Bay', 'Campbells_Bay', 'Castor_Bay', 'Crown_Hill', 'La

In [18]:
# create find_list, which holds the 15 unidentified neighborhoods
find_list= ['Auckland CBD','Eden Valley','Greenlane','Greenwoods Corner','Morningside','Mount Albert','Mount Eden','Mount Roskill','Mount Wellington','Remuera','Saint Heliers','Saint Johns','Saint Marys Bay','Wai o Taiki Bay','Westfield']

### Use regular expressions to find the corresponding names in income_str

##### Let's create a list to keep our findings

In [19]:
find_list_succuess = []

##### 1. Auckland CBD

In [20]:
pattern = r'.{10}Auckland_.{15}'
string = income_str
re.findall(pattern, string)

['emans_Bay Auckland_Harbourside Auc',
 'tral_West Auckland_Central_East Ne',
 'y Oceanic-Auckland_Region_East Tid',
 's Oceanic-Auckland_Region_West Tid']

In [21]:
# add eligible elements to find_list_succuess
find_list_succuess.append('Auckland_Harbourside')
find_list_succuess.append('Auckland_Central_East')
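
One caveat of fixed-width context patterns such as `.{10}Auckland_.{15}` is that `re.findall` returns non-overlapping matches, so a name that falls inside the context of a previous match is not reported on its own. The fragment 'tral_West' in the second match above suggests a name like 'Auckland_Central_West' that was skipped this way. A token-based pattern avoids the problem. The sketch below uses a short hand-written stand-in for `income_str`; the exact sequence of names is an assumption reconstructed from the fragments above.

```python
import re

# Hand-written stand-in for income_str (names joined by spaces, inner spaces replaced by "_").
sample = ("Freemans_Bay Auckland_Harbourside Auckland_Central_West "
          "Auckland_Central_East Newton Oceanic-Auckland_Region_East")

# Fixed-width context: matches cannot overlap, so a neighbouring name can be swallowed.
fixed_width = re.findall(r".{10}Auckland_.{15}", sample)

# Token-based: \S* extends the match to the whole space-delimited name.
tokens = re.findall(r"\S*Auckland_\S*", sample)

print(fixed_width)
print(tokens)
```

Because every name in `income_str` is a single token, the token-based pattern returns complete names that can be appended to the result list directly.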

##### 2. Eden Valley

In [22]:
pattern = r'.{10}_Eden_.{5}\b'
string = income_str
re.findall(pattern, string)

['dglen Glen_Eden_East ',
 'ka_East Mt_Eden_North',
 'almoral Mt_Eden_East ',
 'ngawhau Mt_Eden_South']

In [23]:
pattern = r'.{10}_Valley.?\b'
string = income_str
re.findall(pattern, string)

['ch Muriwai_Valley ']

##### There is no 'Eden Valley' in income_str, but 'Mount' is abbreviated as 'Mt' and three neighborhoods relate to 'Mt_Eden'.

In [24]:
# add eligible elements to find_list_succuess
find_list_succuess.append('Mt_Eden_North')
find_list_succuess.append('Mt_Eden_East')
find_list_succuess.append('Mt_Eden_South')
print (find_list_succuess)

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South']


##### 3. Greenlane

In [25]:
pattern = r'Greenlane'
string = income_str
re.findall(pattern, string)

[]

No matching result

##### 4. Greenwoods

In [26]:
pattern = r'Greenwoods'
string = income_str
re.findall(pattern, string)

[]

No matching result

##### 5. Morningside

In [27]:
pattern = r'Morningside'
string = income_str
re.findall(pattern, string)

[]

No matching result

##### 6. Mt Albert

In [28]:
pattern = r'.{5}_Albert'
string = income_str
re.findall(pattern, string)

['st Mt_Albert']

In [29]:
find_list_succuess.append('Mt_Albert')
print (find_list_succuess)

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South', 'Mt_Albert']


##### 7. Mt Roskill

In [30]:
pattern = r'Roskill'
string = income_str
re.findall(pattern, string)

[]

No matching result

##### 8. Mt Wellington

In [31]:
pattern = r'.{5}_Wellington_.{10}'
string = income_str
re.findall(pattern, string)

['th Mt_Wellington_Domain Mt_',
 'st Mt_Wellington_North Fern',
 'in Mt_Wellington_South Tama']

In [32]:
find_list_succuess.append('Mt_Wellington_Domain')
find_list_succuess.append('Mt_Wellington_North')
find_list_succuess.append('Mt_Wellington_South')
print (find_list_succuess)

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South', 'Mt_Albert', 'Mt_Wellington_Domain', 'Mt_Wellington_North', 'Mt_Wellington_South']


##### 9. Remuera

In [33]:
pattern = r'Remuera.{9}'
string = income_str
re.findall(pattern, string)

['Remuera_South Ab', 'Remuera_West Wai']

In [34]:
find_list_succuess.append('Remuera_South')
find_list_succuess.append('Remuera_West')
print (find_list_succuess)

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South', 'Mt_Albert', 'Mt_Wellington_Domain', 'Mt_Wellington_North', 'Mt_Wellington_South', 'Remuera_South', 'Remuera_West']


##### 10. Saint Johns
##### 11. Saint Marys Bay
##### 12. Saint Heliers

Let's try the abbreviation 'St'.

In [35]:
pattern = r'St_.{10}'
string = income_str
re.findall(pattern, string)

['St_Marys Pons',
 'St_Lukes_Nort',
 'St_Heliers Gl',
 'St_Johns Ston',
 'St_Lukes Sand',
 'St_John One_T']

We successfully found matching results.

In [36]:
find_list_succuess.append('St_Marys')
find_list_succuess.append('St_John')
find_list_succuess.append('St_Heliers')
print (find_list_succuess)

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South', 'Mt_Albert', 'Mt_Wellington_Domain', 'Mt_Wellington_North', 'Mt_Wellington_South', 'Remuera_South', 'Remuera_West', 'St_Marys', 'St_John', 'St_Heliers']
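
Since the mismatches for 'Mount' and 'Saint' come from abbreviations, the Wikipedia names could also be rewritten in census style before searching. This is a minimal sketch, assuming that 'Mt' and 'St' are the only abbreviations involved; `ABBREVIATIONS` and `to_census_style` are hypothetical names introduced here.

```python
import re

# Abbreviations observed in the census names above; extend as needed.
ABBREVIATIONS = {"Mount": "Mt", "Saint": "St"}

def to_census_style(name):
    """Rewrite a Wikipedia-style suburb name using the census abbreviations."""
    for long_form, short_form in ABBREVIATIONS.items():
        # \b keeps the replacement to whole words only.
        name = re.sub(r"\b{}\b".format(long_form), short_form, name)
    return name

wiki_names = ["Mount Eden", "Mount Albert", "Saint Heliers", "Saint Johns", "Remuera"]
census_style = [to_census_style(n) for n in wiki_names]
print(census_style)
```

The rewritten names can then be passed to `.str.contains()` in the same way as the other suburb names.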


##### 13. Wai o Taiki Bay

In [37]:
pattern = r'Taiki'
string = income_str
re.findall(pattern, string)

[]

No matching result

##### 14. Westfield

In [38]:
pattern = r'Westfield'
string = income_str
re.findall(pattern, string)

[]

No matching result

In [39]:
print (find_list_succuess)
print (len(find_list_succuess))

['Auckland_Harbourside', 'Auckland_Central_East', 'Mt_Eden_North', 'Mt_Eden_East', 'Mt_Eden_South', 'Mt_Albert', 'Mt_Wellington_Domain', 'Mt_Wellington_North', 'Mt_Wellington_South', 'Remuera_South', 'Remuera_West', 'St_Marys', 'St_John', 'St_Heliers']
14


We successfully found 14 neighborhoods.

### Use find_list_succuess to get the median income

Before we use the previous method to find the median income, we have to replace the "_" with " " to fit the original string.

In [40]:
#replace "_" with " "
New_find_list = [name.replace("_"," ") for name in find_list_succuess]


print (New_find_list)

['Auckland Harbourside', 'Auckland Central East', 'Mt Eden North', 'Mt Eden East', 'Mt Eden South', 'Mt Albert', 'Mt Wellington Domain', 'Mt Wellington North', 'Mt Wellington South', 'Remuera South', 'Remuera West', 'St Marys', 'St John', 'St Heliers']


In [41]:
#create find_df dataframe to store result
#(DataFrame.append was removed in pandas 2.0, so collect the matches and concatenate once)
matches = [df_income.loc[df_income['Area_unit_description'].str.contains(item)] for item in New_find_list]
find_df = pd.concat(matches, ignore_index=True)

find_df

Unnamed: 0,Area_unit_code,Area_unit_description,Median_personal_income
0,514101,Auckland Harbourside,40400
1,514103,Auckland Central East,16200
2,518101,Mt Eden North,33700
3,518202,Mt Eden East,38100
4,518302,Mt Eden South,38700
5,517800,Mt Albert Central,33900
6,520301,Mt Wellington Domain,31000
7,520303,Mt Wellington North,25800
8,520500,Mt Wellington South,24700
9,516002,Remuera South,36500


### Combine the three dataframes

In [42]:
frames = [find_df, needed_df, area_df]

#use pd.concat to combine datasets
income_com = pd.concat(frames,axis=0,sort=False,ignore_index=True)


#remove unnecessary columns
income_com.drop(columns=['Area_unit_code '],inplace=True)


income_com.head()

Unnamed: 0,Area_unit_description,Median_personal_income
0,Auckland Harbourside,40400
1,Auckland Central East,16200
2,Mt Eden North,33700
3,Mt Eden East,38100
4,Mt Eden South,38700


In [43]:
income_com.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 2 columns):
Area_unit_description     89 non-null object
Median_personal_income    89 non-null object
dtypes: object(2)
memory usage: 1.5+ KB


### Load Data - Median rent paid

In [44]:
df_rent = pd.read_csv('2013_Median_weekly_rent_paid.csv')

In [45]:
df_rent.head()

Unnamed: 0,Area_unit_description,Median_weekly_rent_paid
0,Wellsford,280
1,Leigh,300
2,Warkworth,350
3,Waimauku,330
4,Huapai,380


In [46]:
df_rent.shape

(437, 2)

### Clean Data - Median rent paid

#### Filter the df_rent suburbs using updated New_sub_list

Append New_find_list to sub_list

In [47]:
New_sub_list = sub_list+ New_find_list
len (New_sub_list)

73

In [48]:
print (New_sub_list)

['Arch Hill', 'Auckland CBD', 'Avondale', 'Balmoral', 'Blockhouse Bay', 'Eden Terrace', 'Eden Valley', 'Ellerslie', 'Epsom', 'Freemans Bay', 'Glendowie', 'Glen Innes', 'Grafton', 'Greenlane', 'Greenwoods Corner', 'Grey Lynn', 'Herne Bay', 'Hillsborough', 'Kingsland', 'Kohimarama', 'Lynfield', 'Meadowbank', 'Mission Bay', 'Morningside', 'Mount Albert', 'Mount Eden', 'Mount Roskill', 'Mount Wellington', 'Newmarket', 'Newton', 'New Windsor', 'Onehunga', 'One Tree Hill', 'Orakei', 'Oranga', 'Otahuhu', 'Owairaka', 'Panmure', 'Parnell', 'Penrose', 'Point England', 'Point Chevalier', 'Ponsonby', 'Remuera', 'Royal Oak', 'Saint Heliers', 'Saint Johns', 'Saint Marys Bay', 'Sandringham', 'Stonefields', 'Tamaki', 'Te Papapa', 'Three Kings', 'Waikowhai', 'Wai o Taiki Bay', 'Waterview', 'Western Springs', 'Westfield', 'Westmere', 'Auckland Harbourside', 'Auckland Central East', 'Mt Eden North', 'Mt Eden East', 'Mt Eden South', 'Mt Albert', 'Mt Wellington Domain', 'Mt Wellington North', 'Mt Wellingto

#### Create df_rent_sub dataframe to store result

In [49]:
#create df_rent_sub dataframe to store result
#filter neighborhoods in New_sub_list
#(DataFrame.append was removed in pandas 2.0, so collect the matches and concatenate once)
matches = [df_rent.loc[df_rent['Area_unit_description'].str.contains(item)] for item in New_sub_list]
df_rent_sub = pd.concat(matches, ignore_index=True)
    
df_rent_sub.head()

Unnamed: 0,Area_unit_description,Median_weekly_rent_paid
0,Arch Hill,500
1,Avondale South,350
2,Avondale West,320
3,Balmoral,350
4,Blockhouse Bay,350


In [50]:
df_rent_sub.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 2 columns):
Area_unit_description       89 non-null object
Median_weekly_rent_paid     89 non-null object
dtypes: object(2)
memory usage: 1.5+ KB


In [51]:
df_rent_sub[df_rent_sub.duplicated()]

Unnamed: 0,Area_unit_description,Median_weekly_rent_paid
83,Remuera South,460
84,Remuera West,550


In [52]:
income_com.head()

Unnamed: 0,Area_unit_description,Median_personal_income
0,Auckland Harbourside,40400
1,Auckland Central East,16200
2,Mt Eden North,33700
3,Mt Eden East,38100
4,Mt Eden South,38700


In [53]:
income_com.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 2 columns):
Area_unit_description     89 non-null object
Median_personal_income    89 non-null object
dtypes: object(2)
memory usage: 1.5+ KB


In [54]:
income_com[income_com.duplicated()]

Unnamed: 0,Area_unit_description,Median_personal_income
58,Remuera South,36500
59,Remuera West,45300


#### Merge the income_com and df_rent_sub

In [55]:
left=income_com
right=df_rent_sub
income_rent = pd.merge(left,right,on='Area_unit_description')
income_rent.head()

Unnamed: 0,Area_unit_description,Median_personal_income,Median_weekly_rent_paid
0,Auckland Harbourside,40400,500
1,Auckland Central East,16200,350
2,Mt Eden North,33700,340
3,Mt Eden East,38100,360
4,Mt Eden South,38700,380


In [56]:
income_rent.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 93 entries, 0 to 92
Data columns (total 3 columns):
Area_unit_description       93 non-null object
Median_personal_income      93 non-null object
Median_weekly_rent_paid     93 non-null object
dtypes: object(3)
memory usage: 2.9+ KB
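
Both inputs have 89 rows, yet the merged frame has 93. This is consistent with the duplicated keys found above: 85 names appear once on each side, while 'Remuera South' and 'Remuera West' appear twice on each side, and a merge pairs every left duplicate with every right duplicate (85 + 2 × 4 = 93). The sketch below shows the effect on toy frames and one way to guard against it; dropping duplicates before the merge is an alternative to the approach used in this notebook, not the method used here.

```python
import pandas as pd

# Toy frames: "Remuera South" appears twice on each side, as in the duplicated() output above.
income = pd.DataFrame({
    "Area_unit_description": ["Arch Hill", "Remuera South", "Remuera South"],
    "Median_personal_income": [44900, 36500, 36500],
})
rent = pd.DataFrame({
    "Area_unit_description": ["Arch Hill", "Remuera South", "Remuera South"],
    "Median_weekly_rent_paid": [500, 460, 460],
})

# Merging as-is pairs each left duplicate with each right duplicate: 1 + 2*2 rows.
naive = pd.merge(income, rent, on="Area_unit_description")

# Dropping duplicates first keeps the key unique; validate= raises MergeError if it is not.
clean = pd.merge(
    income.drop_duplicates(),
    rent.drop_duplicates(),
    on="Area_unit_description",
    validate="one_to_one",
)
print(len(naive), len(clean))
```

Either order gives the same final rows here; the `validate` argument simply makes an unexpected duplicate fail loudly instead of silently inflating the result.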


In [57]:
income_rent[income_rent.duplicated()]

Unnamed: 0,Area_unit_description,Median_personal_income,Median_weekly_rent_paid
10,Remuera South,36500,460
11,Remuera South,36500,460
12,Remuera South,36500,460
14,Remuera West,45300,550
15,Remuera West,45300,550
16,Remuera West,45300,550


In [58]:
income_rent = income_rent.drop_duplicates()

In [59]:
income_rent.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 87 entries, 0 to 92
Data columns (total 3 columns):
Area_unit_description       87 non-null object
Median_personal_income      87 non-null object
Median_weekly_rent_paid     87 non-null object
dtypes: object(3)
memory usage: 2.7+ KB
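
Both value columns are still stored as `object` rather than as numbers. Rows shown further down contain the marker '..C', which appears to stand for a suppressed value, and that would explain the dtype. A minimal sketch of converting such columns, using a few rows copied from this notebook's outputs:

```python
import pandas as pd

# Toy rows taken from outputs in this notebook; "..C" appears to mark a suppressed value.
sample = pd.DataFrame({
    "Suburb": ["Arch Hill", "Tamaki Strait", "Tidal-Tamaki"],
    "Median_personal_income": ["44900", "15800", "..C"],
    "Median_weekly_rent_paid": ["500", "..C", "..C"],
})

numeric_cols = ["Median_personal_income", "Median_weekly_rent_paid"]
# errors="coerce" turns anything that cannot be parsed (such as "..C") into NaN.
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(sample.dtypes)
print(sample["Median_weekly_rent_paid"].isna().sum())
```

Numeric columns are needed before the income and rent values can be compared, averaged or plotted.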


#### Use geopy library to get the latitude and longitude values of Auckland City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ak_explorer</em>, as shown below.

In [60]:
address = 'Auckland City, New Zealand'
geolocator = Nominatim(user_agent='ak_explorer')

location = geolocator.geocode(address,addressdetails=True)
latitude = location.latitude
longitude = location.longitude
print ('The geographical coordinates of Auckland are {},{}.'.format(latitude, longitude))

The geographical coordinates of Auckland are -36.8534665,174.7655514.


#### Use geopy library to get the latitude and longitude values of neighborhoods in Auckland City.

##### Create a suburb list using income_rent dataframe

In [61]:
income_rent_sub = income_rent['Area_unit_description'].values.tolist()

In [62]:
len(income_rent_sub)

87

In [63]:
column_names = ['Suburb', 'Latitude', 'Longitude']
rows = []
error_list = []
# create the geocoder once and reuse it for every suburb
geolocator = Nominatim(user_agent='ak_explorer')
for s in income_rent_sub:
    try:
        address = '{}, Auckland, New Zealand'.format(s)
        location = geolocator.geocode(address,addressdetails=True)

        # geocode() returns None when nothing is found, which raises AttributeError here
        lat = location.latitude
        lon = location.longitude
        rows.append({'Suburb':s,'Latitude':lat, 'Longitude':lon})

        print ('The geographical coordinates of {0}, Auckland are {1},{2}.'.format(s,lat, lon))

    # collect the suburbs that could not be geocoded in error_list
    except Exception as e:
        error_list.append(s)
        #print the name of the suburb that failed
        print(s)
geo = pd.DataFrame(rows, columns=column_names)
print('error:{}'.format(error_list))

Auckland Harbourside
The geographical coordinates of Auckland Central East, Auckland are -36.848911,174.7652256.
The geographical coordinates of Mt Eden North, Auckland are -36.8771724,174.7642863.
The geographical coordinates of Mt Eden East, Auckland are -36.87620535,174.764606161694.
The geographical coordinates of Mt Eden South, Auckland are -36.87620535,174.764606161694.
The geographical coordinates of Mt Albert Central, Auckland are -36.8913732,174.7201737.
The geographical coordinates of Mt Wellington Domain, Auckland are -36.8925891,174.846566650119.
The geographical coordinates of Mt Wellington North, Auckland are -36.8920447,174.8465941.
The geographical coordinates of Mt Wellington South, Auckland are -36.8920447,174.8465941.
The geographical coordinates of Remuera South, Auckland are -36.8759344,174.8014178.
The geographical coordinates of Remuera West, Auckland are -36.8759344,174.8014178.
The geographical coordinates of St Marys, Auckland are -36.8534665,174.7655514.
The geographical coordinate
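
The geocoding loop can also be written as a small function that treats a missing result explicitly instead of relying on a broad `except`. The sketch below is one possible shape, not the code used in this notebook: `geocode_suburbs` is a hypothetical helper, and it accepts any callable as the geocoder so that it can be tried offline with a dictionary lookup.

```python
from collections import namedtuple

def geocode_suburbs(names, geocode):
    """Geocode each name once and return (rows, failures).

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found
    (geopy's Nominatim.geocode behaves this way).
    """
    rows, failures = [], []
    for name in names:
        location = geocode('{}, Auckland, New Zealand'.format(name))
        if location is None:
            failures.append(name)
            continue
        rows.append({'Suburb': name,
                     'Latitude': location.latitude,
                     'Longitude': location.longitude})
    return rows, failures

# Offline stand-in for a real geocoder, using coordinates printed above.
Location = namedtuple('Location', ['latitude', 'longitude'])
known = {'Mt Eden North, Auckland, New Zealand': Location(-36.8771724, 174.7642863)}

rows, failures = geocode_suburbs(['Auckland Harbourside', 'Mt Eden North'], known.get)
print(rows)
print(failures)
```

With geopy, the same function could be called with `geolocator.geocode` as the second argument, and `pd.DataFrame(rows)` would give the geo dataframe. geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which can wrap the geocode call to space out requests to the Nominatim service.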

In [64]:
geo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86 entries, 0 to 85
Data columns (total 3 columns):
Suburb       86 non-null object
Latitude     86 non-null float64
Longitude    86 non-null float64
dtypes: float64(2), object(1)
memory usage: 2.1+ KB
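
The printed results show that several subdivided units were given identical coordinates (for example Mt Eden East and Mt Eden South), and that 'St Marys' received the same coordinates as Auckland City itself. The geocoder seems to fall back to a broader place when it does not know the exact area unit. A short sketch for listing such cases, using coordinates copied from the output above:

```python
import pandas as pd

# Coordinates copied from the printed geocoding results above.
geo_sample = pd.DataFrame({
    'Suburb': ['Mt Eden North', 'Mt Eden East', 'Mt Eden South', 'Remuera South', 'Remuera West'],
    'Latitude': [-36.8771724, -36.87620535, -36.87620535, -36.8759344, -36.8759344],
    'Longitude': [174.7642863, 174.764606161694, 174.764606161694, 174.8014178, 174.8014178],
})

# keep=False flags every member of a group with shared coordinates, not only the repeats.
shared = geo_sample[geo_sample.duplicated(subset=['Latitude', 'Longitude'], keep=False)]
print(shared['Suburb'].tolist())
```

This matters for the later Foursquare queries, because units that share coordinates will return the same venues.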


Geocoding failed for 'Auckland Harbourside', so we use 'Auckland Harbour' instead.

In [65]:
address = 'Auckland Harbour, Auckland'
geolocator = Nominatim(user_agent='ak_explorer')

location = geolocator.geocode(address,addressdetails=True)
latitude = location.latitude
longitude = location.longitude

# note: the row is stored as 'Auckland Harbour', while income_rent uses 'Auckland Harbourside'
harbour = pd.DataFrame([{'Suburb':'Auckland Harbour','Latitude':latitude, 'Longitude':longitude}])
geo = pd.concat([geo, harbour], ignore_index=True)
print ('The geographical coordinates of Auckland Harbour are {},{}.'.format(latitude, longitude))

The geographical coordinates of Auckland Harbour are -36.8312522,174.7452021.


In [66]:
#check dataframe
geo.tail()

Unnamed: 0,Suburb,Latitude,Longitude
82,Te Papapa,-36.919669,174.79824
83,Tamaki,-36.8928,174.860764
84,Tidal-Tamaki,-36.974624,174.816122
85,Tamaki Strait,-36.861791,175.079986
86,Auckland Harbour,-36.831252,174.745202


#### Create a map of Auckland City with neighborhoods superimposed on top.

In [67]:
# create map of Auckland using latitude and longitude values
map_auckland = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(geo['Latitude'], geo['Longitude'], geo['Suburb']):
    label = folium.Popup(label, parse_html=True,max_width=300)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_auckland)  
    
map_auckland

After inspecting the map above, I decided to remove the locations that are too far from Auckland City, namely Tamaki Strait, Otahuhu West and Tidal-Tamaki.

In [68]:
geo.loc[geo['Suburb']=='Tamaki Strait']

Unnamed: 0,Suburb,Latitude,Longitude
85,Tamaki Strait,-36.861791,175.079986


In [69]:
geo.loc[geo['Suburb']=='Otahuhu West']

Unnamed: 0,Suburb,Latitude,Longitude
46,Otahuhu West,-36.943722,174.843724


In [70]:
geo.loc[geo['Suburb']=='Tidal-Tamaki']

Unnamed: 0,Suburb,Latitude,Longitude
84,Tidal-Tamaki,-36.974624,174.816122


#### Remove those rows

In [71]:
#drop rows by index
geo.drop(index=[85,46,84],axis=0,inplace=True)
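
The three rows were picked by eye from the map. As a cross-check, the distance from the city centre can be computed with the haversine formula. The sketch below uses coordinates from the outputs above; the 11 km cut-off is a hypothetical value chosen only so that these sample points separate, and other suburbs in the full dataset may fall on either side of it.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

CENTRE = (-36.8534665, 174.7655514)   # Auckland City, from the geocoding cell above
MAX_KM = 11                           # hypothetical cut-off, not part of the original analysis

points = {
    'Arch Hill': (-36.866092, 174.745972),
    'Tamaki': (-36.8928, 174.860764),
    'Otahuhu West': (-36.943722, 174.843724),
    'Tidal-Tamaki': (-36.974624, 174.816122),
    'Tamaki Strait': (-36.861791, 175.079986),
}

# Keep the names whose distance from the centre exceeds the cut-off.
too_far = [name for name, (lat, lon) in points.items()
           if haversine_km(CENTRE[0], CENTRE[1], lat, lon) > MAX_KM]
print(too_far)
```

A rule of this kind makes the removal step reproducible if the suburb list or the geocoding results change.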

In [72]:
#sort suburb
geo.sort_values(by='Suburb',inplace=True)
geo = geo.reset_index(drop=True)
geo.head()

Unnamed: 0,Suburb,Latitude,Longitude
0,Arch Hill,-36.866092,174.745972
1,Auckland Central East,-36.848911,174.765226
2,Auckland Harbour,-36.831252,174.745202
3,Avondale South,-36.893058,174.692814
4,Avondale West,-36.893058,174.692814


#### Sort income_rent before merging

In [73]:
income_rent.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 87 entries, 0 to 92
Data columns (total 3 columns):
Area_unit_description       87 non-null object
Median_personal_income      87 non-null object
Median_weekly_rent_paid     87 non-null object
dtypes: object(3)
memory usage: 2.7+ KB


In [74]:
income_rent.columns

Index(['Area_unit_description', 'Median_personal_income',
       'Median_weekly_rent_paid '],
      dtype='object')

In [75]:
income_rent.sort_values(by='Area_unit_description',inplace=True)
income_rent = income_rent.reset_index(drop=True)
income_rent.head()

Unnamed: 0,Area_unit_description,Median_personal_income,Median_weekly_rent_paid
0,Arch Hill,44900,500
1,Auckland Central East,16200,350
2,Auckland Harbourside,40400,500
3,Avondale South,24200,350
4,Avondale West,19500,320


In [76]:
#rename column names
names = ['Suburb','Median_personal_income','Median_weekly_rent_paid']
income_rent.columns = names

income_rent.head()

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid
0,Arch Hill,44900,500
1,Auckland Central East,16200,350
2,Auckland Harbourside,40400,500
3,Avondale South,24200,350
4,Avondale West,19500,320


#### Remove Tamaki Strait, Otahuhu West, Tidal-Tamaki

In [77]:
income_rent.loc[income_rent['Suburb']=='Tamaki Strait']

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid
79,Tamaki Strait,15800,..C


In [78]:
income_rent.loc[income_rent['Suburb']=='Otahuhu West']

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid
55,Otahuhu West,17500,270


In [79]:
income_rent.loc[income_rent['Suburb']=='Tidal-Tamaki']

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid
82,Tidal-Tamaki,..C,..C
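
The `..C` entries above are placeholder strings rather than numbers, which is consistent with `income_rent.info()` reporting every column as `object`. If the income and rent columns are needed as numbers later on, one possible approach is to coerce them with `pd.to_numeric`. The sketch below is only an illustration on a toy frame (the name `sample` is made up here; the values are copied from the outputs above), not part of the original workflow.

```python
import pandas as pd

# Toy rows copied from the outputs above; the real frame is income_rent.
sample = pd.DataFrame({
    'Suburb': ['Arch Hill', 'Tamaki Strait', 'Tidal-Tamaki'],
    'Median_personal_income': ['44900', '15800', '..C'],
    'Median_weekly_rent_paid': ['500', '..C', '..C'],
})

numeric_cols = ['Median_personal_income', 'Median_weekly_rent_paid']

# errors='coerce' turns anything that is not a number (here '..C') into NaN
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors='coerce')

print(sample.dtypes)
print(sample)
```

After the conversion, rows with missing values can be found with `isna()` instead of being matched by their string placeholder.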


In [80]:
#drop rows by index
income_rent.drop(index=[79,55,82],axis=0,inplace=True)

In [81]:
#sort by suburb
income_rent.sort_values(by='Suburb',inplace=True)
income_rent = income_rent.reset_index(drop=True)
income_rent.head()

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid
0,Arch Hill,44900,500
1,Auckland Central East,16200,350
2,Auckland Harbourside,40400,500
3,Avondale South,24200,350
4,Avondale West,19500,320


Now, we have two prepared dataframes ready for merging.

### Merge geo dataframe and income_rent dataframe

In [82]:
left = income_rent
right = geo

df = pd.merge(left,right,on='Suburb')
df.head()

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid,Latitude,Longitude
0,Arch Hill,44900,500,-36.866092,174.745972
1,Auckland Central East,16200,350,-36.848911,174.765226
2,Avondale South,24200,350,-36.893058,174.692814
3,Avondale West,19500,320,-36.893058,174.692814
4,Balmoral,39900,350,-36.889205,174.748694
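
`pd.merge` performs an inner join by default, so a suburb whose name differs between the two frames is dropped without warning. The outputs above suggest this happened at least once: `income_rent` lists `Auckland Harbourside`, `geo` lists `Auckland Harbour`, and neither appears in `df.head()`. A minimal sketch of how such mismatches could be listed, using toy frames (`income_rent_demo` and `geo_demo` are hypothetical names) built from values shown above:

```python
import pandas as pd

# Toy frames using suburb names from the outputs above
income_rent_demo = pd.DataFrame({
    'Suburb': ['Arch Hill', 'Auckland Harbourside', 'Avondale South'],
    'Median_weekly_rent_paid': ['500', '500', '350'],
})
geo_demo = pd.DataFrame({
    'Suburb': ['Arch Hill', 'Auckland Harbour', 'Avondale South'],
    'Latitude': [-36.866092, -36.831252, -36.893058],
    'Longitude': [174.745972, 174.745202, 174.692814],
})

# how='outer' keeps every row; indicator=True adds a '_merge' column
# saying whether a row came from the left frame, the right frame, or both
check = pd.merge(income_rent_demo, geo_demo, on='Suburb', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['Suburb', '_merge']]
print(unmatched)
```

Any suburb listed in `unmatched` would need its name reconciled by hand before it can take part in the inner merge.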


## 4. Data Exploring

### 1. Segment and Cluster Neighborhoods

### 1.1 Explore Data - Using Foursquare API

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [83]:
CLIENT_ID = '14IPFMUFTF5UH14I1V1OIOYMQVZ3Q04W0CR10LRG3EHPODEG' # your Foursquare ID
CLIENT_SECRET = 'T1KILPQ2LITERM30VOIOTSA0MM51QESPF1TDOSIS42XZYMCI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 14IPFMUFTF5UH14I1V1OIOYMQVZ3Q04W0CR10LRG3EHPODEG
CLIENT_SECRET:T1KILPQ2LITERM30VOIOTSA0MM51QESPF1TDOSIS42XZYMCI
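
Credentials written into a notebook end up in every exported copy of it. As a hedged alternative, they could be read from environment variables and masked when printed. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `mask` are assumptions made for this sketch; they are not defined by Foursquare or used elsewhere in this notebook.

```python
import os

# Hypothetical variable names; any names work as long as they are set
# in the environment before the notebook starts.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20180605'  # Foursquare API version, as above


def mask(value, visible=4):
    """Show only the first few characters of a secret."""
    if len(value) <= visible:
        return '*' * len(value)
    return value[:visible] + '*' * (len(value) - visible)


print('CLIENT_ID: ' + mask(CLIENT_ID))
print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))
```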


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [84]:
df.loc[0,'Suburb']

'Arch Hill'

Get the neighborhood's latitude and longitude values.

In [85]:
suburb_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
suburb_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

suburb_name = df.loc[0, 'Suburb'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(suburb_name, 
                                                               suburb_latitude, 
                                                               suburb_longitude))

Latitude and longitude values of Arch Hill are -36.8660924, 174.7459717.


#### Now, let's get the top 100 venues that are in Arch Hill within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [86]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    suburb_latitude, 
    suburb_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=14IPFMUFTF5UH14I1V1OIOYMQVZ3Q04W0CR10LRG3EHPODEG&client_secret=T1KILPQ2LITERM30VOIOTSA0MM51QESPF1TDOSIS42XZYMCI&v=20180605&ll=-36.8660924,174.7459717&radius=500&limit=100'
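
The same URL can also be assembled from a dictionary of parameters, which keeps each query parameter next to its value and takes care of escaping. This is a sketch only: `build_explore_url` and `demo_url` are hypothetical names, and the credentials passed in are dummy strings.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL used above from a dict of parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode joins the pairs with '&' and percent-encodes the values
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


demo_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', -36.8660924, 174.7459717)
print(demo_url)
```

Note that `urlencode` writes the comma in `ll` as `%2C`; this is the percent-encoded form of the same character.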

Send the GET request and examine the results.

In [87]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cc894964434b9316ae0e596'},
 'response': {'headerLocation': 'Arch Hill',
  'headerFullLocation': 'Arch Hill, Auckland',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': -36.8615923955, 'lng': 174.75158592346068},
   'sw': {'lat': -36.8705924045, 'lng': 174.74035747653934}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '55285dd1498e816af25ff94c',
       'name': 'Crumb',
       'location': {'lat': -36.862473,
        'lng': 174.745015,
        'labeledLatLngs': [{'label': 'display',
          'lat': -36.862473,
          'lng': 174.745015}],
        'distance': 411,
        'cc': 'NZ',
        'country': 'New Zealand',
        'formattedAddress': ['New Zealand']},
       'categori

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [88]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened rows from json_normalize use the prefixed column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [89]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Crumb,Café,-36.862473,174.745015
1,Philippes Chocolat,Bakery,-36.865305,174.744822
2,Charlie Boys,Café,-36.862712,174.748618
3,Funk Estate Brewery,Brewery,-36.867468,174.741958


And how many venues were returned by Foursquare?

In [90]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


<a id='item2'></a>

### 1.2. Explore Neighborhoods in Auckland City

#### Let's create a function to repeat the same process for all the neighborhoods in Auckland City

In [91]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Create a new dataframe called *auckland_venues* to store results.

In [92]:
auckland_venues = getNearbyVenues(names=df['Suburb'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Arch Hill
Auckland Central East
Avondale South
Avondale West
Balmoral
Blockhouse Bay
Eden Terrace
Ellerslie North
Ellerslie South
Epsom Central
Epsom North
Epsom South
Freemans Bay
Glen Innes East
Glen Innes North
Glen Innes West
Glendowie
Grafton East
Grafton West
Grey Lynn East
Grey Lynn West
Herne Bay
Hillsborough East
Hillsborough West
Kingsland
Kohimarama East
Kohimarama West
Lynfield North
Lynfield South
Meadowbank North
Meadowbank South
Mission Bay
Mt Albert Central
Mt Eden East
Mt Eden North
Mt Eden South
Mt St John
Mt Wellington Domain
Mt Wellington North
Mt Wellington South
New Windsor
Newmarket
Newton
One Tree Hill Central
One Tree Hill East
Onehunga North East
Onehunga North West
Onehunga South East
Onehunga South West
Orakei North
Orakei South
Oranga
Otahuhu East
Otahuhu North
Owairaka East
Owairaka West
Panmure Basin
Parnell East
Parnell West
Penrose
Point Chevalier East
Point Chevalier South
Point Chevalier West
Point England
Ponsonby East
Ponsonby West
Remuera South
Rem

#### Let's check the size of the resulting dataframe

In [93]:
print(auckland_venues.shape)
auckland_venues.head()

(974, 7)


Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arch Hill,-36.866092,174.745972,Crumb,-36.862473,174.745015,Café
1,Arch Hill,-36.866092,174.745972,Philippes Chocolat,-36.865305,174.744822,Bakery
2,Arch Hill,-36.866092,174.745972,Charlie Boys,-36.862712,174.748618,Café
3,Arch Hill,-36.866092,174.745972,Funk Estate Brewery,-36.867468,174.741958,Brewery
4,Auckland Central East,-36.848911,174.765226,Elliott Stables,-36.85037,174.763591,Food Court


#### Let's count the venues in each category

In [94]:
auckland_venues['Venue Category'].value_counts().head()

Café                    155
Coffee Shop              34
Bar                      34
Indian Restaurant        32
Fast Food Restaurant     31
Name: Venue Category, dtype: int64

##### Cafés far outnumber every other category.

#### Let's plot the number of venues in the 15 most common categories

In [95]:
auckland_venues['Venue Category'].value_counts(ascending=False).head(15).plot(kind='barh');

#### Let's see how many venues were returned for each neighborhood

In [96]:
auckland_venues.groupby('Suburb').count()

Unnamed: 0_level_0,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Suburb,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Arch Hill,4,4,4,4,4,4
Auckland Central East,100,100,100,100,100,100
Avondale South,4,4,4,4,4,4
Avondale West,4,4,4,4,4,4
Balmoral,19,19,19,19,19,19
Blockhouse Bay,7,7,7,7,7,7
Eden Terrace,23,23,23,23,23,23
Ellerslie North,12,12,12,12,12,12
Ellerslie South,12,12,12,12,12,12
Epsom Central,7,7,7,7,7,7


#### Visualize the distribution of venues

In [97]:
from folium import plugins

# let's start again with a clean copy of the map of Auckland City
auckland_map2 = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the venues in the dataframe
venue = plugins.MarkerCluster().add_to(auckland_map2)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(auckland_venues['Venue Latitude'], auckland_venues['Venue Longitude'], auckland_venues['Venue Category']):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(venue)

# display map
auckland_map2

#### Let's find out how many venues are in our dataset

In [98]:
print('There are a total of {} venues.'.format(auckland_venues.shape[0]))

There are a total of 974 venues.


#### Let's find out how many unique categories and venues can be curated from all the returned venues

In [99]:
print('There are {} unique categories.'.format(len(auckland_venues['Venue Category'].unique())))

There are 138 unique categories.


In [100]:
print('There are {} unique venues.'.format(len(auckland_venues['Venue'].unique())))

There are 591 unique venues.


#### Let's find out which venues appear most often in our dataset

In [101]:
auckland_venues['Venue'].value_counts().head(10)

McDonald's                    10
Domino's Pizza                10
KFC                           10
Countdown                      7
Pizza Hut                      5
Subway                         5
Cafe Tran                      4
Tanto Japanese Dining          4
Enchanted Forest Mini Golf     4
Dress Smart Mall               4
Name: Venue, dtype: int64

##### The top six are all well-known chain stores.

In [102]:
auckland_venues.query("Venue == 'KFC'")

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
126,Balmoral,-36.889205,174.748694,KFC,-36.886627,174.747217,Fast Food Restaurant
341,Lynfield North,-36.927373,174.718848,KFC,-36.924284,174.722394,Fast Food Restaurant
348,Lynfield South,-36.927373,174.718848,KFC,-36.924284,174.722394,Fast Food Restaurant
620,Otahuhu East,-36.943722,174.843724,KFC,-36.946017,174.845774,Fast Food Restaurant
628,Otahuhu North,-36.943722,174.843724,KFC,-36.946017,174.845774,Fast Food Restaurant
689,Point Chevalier East,-36.866529,174.708087,KFC,-36.870191,174.711329,Fast Food Restaurant
699,Point Chevalier South,-36.866529,174.708087,KFC,-36.870191,174.711329,Fast Food Restaurant
709,Point Chevalier West,-36.866529,174.708087,KFC,-36.870191,174.711329,Fast Food Restaurant
753,Ponsonby East,-36.8502,174.741825,KFC,-36.852194,174.744922,Fast Food Restaurant
795,Ponsonby West,-36.8502,174.741825,KFC,-36.852194,174.744922,Fast Food Restaurant


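The result above shows that the same venue can be counted several times: neighboring suburbs such as Lynfield North and Lynfield South were geocoded to identical coordinates, so Foursquare returns the same venues for each of them.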
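
To see how many distinct physical venues sit behind the repeated rows, one option is to treat the venue name together with its coordinates as a key and drop duplicates on it. The sketch below uses a hypothetical toy frame `kfc` built from the first five rows of the query above; it is meant for counting only, since the per-suburb analysis that follows still needs one row per suburb and venue.

```python
import pandas as pd

# Rows copied from the KFC query above
kfc = pd.DataFrame({
    'Suburb': ['Balmoral', 'Lynfield North', 'Lynfield South',
               'Otahuhu East', 'Otahuhu North'],
    'Venue': ['KFC'] * 5,
    'Venue Latitude': [-36.886627, -36.924284, -36.924284, -36.946017, -36.946017],
    'Venue Longitude': [174.747217, 174.722394, 174.722394, 174.845774, 174.845774],
})

# A physical venue is identified here by its name plus its coordinates
venue_key = ['Venue', 'Venue Latitude', 'Venue Longitude']
distinct = kfc.drop_duplicates(subset=venue_key)

print('{} rows, {} distinct venues'.format(len(kfc), len(distinct)))
```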

<a id='item3'></a>

### 1.3. Analyze Each Neighborhood

In [103]:
# one hot encoding
auckland_onehot = pd.get_dummies(auckland_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
auckland_onehot['Suburb'] = auckland_venues['Suburb'] 

# move neighborhood column to the first column
fixed_columns = [auckland_onehot.columns[-1]] + list(auckland_onehot.columns[:-1])
auckland_onehot = auckland_onehot[fixed_columns]

auckland_onehot.head()

Unnamed: 0,Suburb,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Beer Bar,Beer Garden,Bistro,Bookstore,Bowling Green,Brazilian Restaurant,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Business Service,Café,Chinese Restaurant,Circus,City Hall,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Golf Course,Gourmet Shop,Grocery Store,Gym,Health & Beauty Service,History Museum,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mountain,Movie Theater,Multiplex,Music Store,Music Venue,Nail Salon,Neighborhood,Noodle House,Organic Grocery,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool Hall,Portuguese Restaurant,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Restaurant,Sandwich Place,Scenic Lookout,Shopping Mall,Skating Rink,Snack Place,Spa,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,Arch Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Arch Hill,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Arch Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Arch Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Auckland Central East,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [104]:
auckland_onehot.shape

(974, 139)

#### Next, let's group rows by neighborhood and take the mean of each category's one-hot column, which gives its frequency of occurrence

In [105]:
auckland_grouped = auckland_onehot.groupby('Suburb').mean().reset_index()
auckland_grouped

Unnamed: 0,Suburb,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Beer Bar,Beer Garden,Bistro,Bookstore,Bowling Green,Brazilian Restaurant,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Business Service,Café,Chinese Restaurant,Circus,City Hall,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Golf Course,Gourmet Shop,Grocery Store,Gym,Health & Beauty Service,History Museum,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mountain,Movie Theater,Multiplex,Music Store,Music Venue,Nail Salon,Neighborhood,Noodle House,Organic Grocery,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool Hall,Portuguese Restaurant,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Restaurant,Sandwich Place,Scenic Lookout,Shopping Mall,Skating Rink,Snack Place,Spa,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,Arch Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Auckland Central East,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.1,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.04,0.01,0.02,0.04,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.03,0.0,0.03,0.0,0.02,0.02,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,Avondale South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Avondale West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Balmoral,0.0,0.0,0.0,0.210526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Blockhouse Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0
6,Eden Terrace,0.043478,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ellerslie North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Ellerslie South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Epsom Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.285714,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [106]:
auckland_grouped.shape

(83, 139)
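
Why the mean of the one-hot columns is a frequency can be seen on a small example. The sketch below uses a hypothetical frame `toy` with the four Arch Hill venues returned earlier plus one other row; `dtype=float` is passed to `get_dummies` here only so the columns are numeric regardless of the pandas version.

```python
import pandas as pd

# The four Arch Hill venues from the first request plus one other row
toy = pd.DataFrame({
    'Suburb': ['Arch Hill'] * 4 + ['Auckland Central East'],
    'Venue Category': ['Café', 'Bakery', 'Café', 'Brewery', 'Food Court'],
})

# One 0/1 column per category, one row per venue
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Suburb', toy['Suburb'])

# Mean of the 0/1 columns = share of the suburb's venues in each category
grouped = onehot.groupby('Suburb').mean().reset_index()
print(grouped)
```

For Arch Hill, two of the four venues are cafés, so the `Café` column averages to 0.5, matching the first row of `auckland_grouped` above.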

#### Let's print each neighborhood along with the top 5 most common venues

In [107]:
num_top_venues = 5

for hood in auckland_grouped['Suburb']:
    print("----"+hood+"----")
    temp = auckland_grouped[auckland_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arch Hill----
                 venue  freq
0                 Café  0.50
1              Brewery  0.25
2               Bakery  0.25
3  American Restaurant  0.00
4         Noodle House  0.00


----Auckland Central East----
                 venue  freq
0                 Café  0.10
1           Restaurant  0.05
2    Indian Restaurant  0.04
3         Burger Joint  0.04
4  Japanese Restaurant  0.04


----Avondale South----
                venue  freq
0           Racetrack  0.25
1                Café  0.25
2  Chinese Restaurant  0.25
3              Market  0.25
4       Movie Theater  0.00


----Avondale West----
                venue  freq
0           Racetrack  0.25
1                Café  0.25
2  Chinese Restaurant  0.25
3              Market  0.25
4       Movie Theater  0.00


----Balmoral----
                  venue  freq
0      Asian Restaurant  0.21
1    Chinese Restaurant  0.21
2  Fast Food Restaurant  0.11
3       Thai Restaurant  0.11
4   Japanese Restaurant  0.05


----Blockhouse B

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [108]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [109]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
suburb_venues_sorted = pd.DataFrame(columns=columns)
suburb_venues_sorted['Suburb'] = auckland_grouped['Suburb']

for ind in np.arange(auckland_grouped.shape[0]):
    suburb_venues_sorted.iloc[ind, 1:] = return_most_common_venues(auckland_grouped.iloc[ind, :], num_top_venues)

suburb_venues_sorted.head()

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arch Hill,Café,Brewery,Bakery,Farmers Market,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
1,Auckland Central East,Café,Restaurant,Burger Joint,Indian Restaurant,Japanese Restaurant,Lounge,Hotel,Department Store,Steakhouse,Pizza Place
2,Avondale South,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
3,Avondale West,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
4,Balmoral,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Dumpling Restaurant,Park,Dessert Shop,Gym,Japanese Restaurant,Coffee Shop
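
Note that a suburb with fewer than ten categories still gets ten entries: Arch Hill has only three categories with a non-zero frequency, so its 4th to 10th "most common" venues are categories that do not occur there at all. A possible variant that leaves those slots empty is sketched below; `top_present_venues` is a hypothetical helper written for this illustration, not a replacement the notebook relies on.

```python
import pandas as pd


def top_present_venues(row, num_top_venues):
    """Like return_most_common_venues, but skips categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)   # drop the leading 'Suburb' entry
    freqs = freqs[freqs > 0]             # keep only categories that occur
    # mergesort is a stable sort, which keeps tie-breaking deterministic
    ranked = freqs.sort_values(ascending=False, kind='mergesort')
    names = list(ranked.index[:num_top_venues])
    # pad with None so every suburb yields the same number of entries
    return names + [None] * (num_top_venues - len(names))


row = pd.Series({'Suburb': 'Arch Hill', 'Bakery': 0.25, 'Brewery': 0.25,
                 'Café': 0.5, 'Yoga Studio': 0.0})
print(top_present_venues(row, 5))
```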


<a id='item4'></a>

### 1.4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [110]:
# set number of clusters
kclusters = 5

auckland_grouped_clustering = auckland_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(auckland_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 3, 3, 0, 0, 0, 0, 0, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [111]:
# add clustering labels
suburb_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)

auckland_merged = df

# join the cluster labels and top venues onto df, which holds latitude/longitude for each suburb
auckland_merged = auckland_merged.join(suburb_venues_sorted.set_index('Suburb'), on='Suburb')

auckland_merged.head() # check the last columns!

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arch Hill,44900,500,-36.866092,174.745972,3,Café,Brewery,Bakery,Farmers Market,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
1,Auckland Central East,16200,350,-36.848911,174.765226,0,Café,Restaurant,Burger Joint,Indian Restaurant,Japanese Restaurant,Lounge,Hotel,Department Store,Steakhouse,Pizza Place
2,Avondale South,24200,350,-36.893058,174.692814,3,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
3,Avondale West,19500,320,-36.893058,174.692814,3,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
4,Balmoral,39900,350,-36.889205,174.748694,0,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Dumpling Restaurant,Park,Dessert Shop,Gym,Japanese Restaurant,Coffee Shop
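
The join above is a left join keyed on `Suburb`. Here is a rough plain-Python sketch of the same idea, using two rows copied from the table above plus one hypothetical suburb (`Example Suburb`, with made-up figures) that has no venue data. The helper name `left_join` is an assumption for illustration, not a pandas function.

```python
# Census-style rows; the first two are copied from the merged table above.
census = [
    {"Suburb": "Arch Hill", "Median_personal_income": 44900, "Median_weekly_rent_paid": 500},
    {"Suburb": "Balmoral", "Median_personal_income": 39900, "Median_weekly_rent_paid": 350},
    # Hypothetical suburb with invented values and no venue data.
    {"Suburb": "Example Suburb", "Median_personal_income": 30000, "Median_weekly_rent_paid": 300},
]

# Cluster label and top venue per suburb, keyed the same way.
venues = {
    "Arch Hill": {"Cluster Labels": 3, "1st Most Common Venue": "Café"},
    "Balmoral": {"Cluster Labels": 0, "1st Most Common Venue": "Asian Restaurant"},
}


def left_join(rows, lookup, key, extra_columns):
    """Keep every row; fill the extra columns with None when the key has no match."""
    merged = []
    for row in rows:
        extra = lookup.get(row[key], {col: None for col in extra_columns})
        merged.append({**row, **extra})
    return merged


merged = left_join(census, venues, "Suburb", ["Cluster Labels", "1st Most Common Venue"])
for row in merged:
    print(row["Suburb"], row["Cluster Labels"])
```

In pandas the unmatched cells would hold `NaN` rather than `None`, which is worth checking before the cluster labels are used as list indices on the map.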


Finally, let's visualize the resulting clusters on a map.

In [112]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(auckland_merged['Latitude'], auckland_merged['Longitude'], auckland_merged['Suburb'], auckland_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index the palette directly by cluster label (0..kclusters-1)
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
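
The colour assignment can be sketched on its own: one evenly spaced colour per cluster label, looked up directly by the label. The notebook builds its palette with matplotlib; the standard-library `colorsys` module is used here only as a stand-in, and `cluster_palette` is a hypothetical helper name.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues around the colour wheel at full saturation and brightness.
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


kclusters = 5
palette = cluster_palette(kclusters)

# Look up the marker colour by cluster label, with no offset.
color_for = {label: palette[label] for label in range(kclusters)}
print(color_for)
```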

<a id='item5'></a>

### 1.5. Examine Clusters

Now, based on the defining categories, I will assign a name to each cluster.

#### Cluster 0 - Life experiential

In [113]:
auckland_merged.loc[auckland_merged['Cluster Labels'] == 0, auckland_merged.columns[[0] + list(range(5, auckland_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Auckland Central East,0,Café,Restaurant,Burger Joint,Indian Restaurant,Japanese Restaurant,Lounge,Hotel,Department Store,Steakhouse,Pizza Place
4,Balmoral,0,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Dumpling Restaurant,Park,Dessert Shop,Gym,Japanese Restaurant,Coffee Shop
5,Blockhouse Bay,0,Ice Cream Shop,Video Store,Mediterranean Restaurant,Café,Grocery Store,Neighborhood,Fish & Chips Shop,Dessert Shop,Diner,Donut Shop
6,Eden Terrace,0,Café,Indian Restaurant,American Restaurant,Sushi Restaurant,Japanese Restaurant,Kebab Restaurant,French Restaurant,Music Venue,Park,Cocktail Bar
7,Ellerslie North,0,Pizza Place,Café,Hotel,Turkish Restaurant,Coffee Shop,Chinese Restaurant,Bar,Bakery,Park,Grocery Store
8,Ellerslie South,0,Pizza Place,Café,Hotel,Turkish Restaurant,Coffee Shop,Chinese Restaurant,Bar,Bakery,Park,Grocery Store
9,Epsom Central,0,Chinese Restaurant,Fish & Chips Shop,Tailor Shop,Japanese Restaurant,Café,Circus,Yoga Studio,Farmers Market,Filipino Restaurant,Fast Food Restaurant
10,Epsom North,0,Chinese Restaurant,Fish & Chips Shop,Tailor Shop,Japanese Restaurant,Café,Circus,Yoga Studio,Farmers Market,Filipino Restaurant,Fast Food Restaurant
11,Epsom South,0,Chinese Restaurant,Fish & Chips Shop,Tailor Shop,Japanese Restaurant,Café,Circus,Yoga Studio,Farmers Market,Filipino Restaurant,Fast Food Restaurant
17,Grafton East,0,Coffee Shop,Café,Gym,Pharmacy,Food,Sandwich Place,Hostel,Deli / Bodega,Grocery Store,Train Station


#### Cluster 1 - Life Essential

In [114]:
auckland_merged.loc[auckland_merged['Cluster Labels'] == 1, auckland_merged.columns[[0] + list(range(5, auckland_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Glen Innes East,1,Supermarket,Turkish Restaurant,Fish & Chips Shop,Pizza Place,Gym,Grocery Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store
14,Glen Innes North,1,Supermarket,Turkish Restaurant,Fish & Chips Shop,Pizza Place,Gym,Grocery Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store
15,Glen Innes West,1,Supermarket,Turkish Restaurant,Fish & Chips Shop,Pizza Place,Gym,Grocery Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store


#### Cluster 2 - Sports Experiential

In [115]:
auckland_merged.loc[auckland_merged['Cluster Labels'] == 2, auckland_merged.columns[[0] + list(range(5, auckland_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,Remuera South,2,Bowling Green,Yoga Studio,Food Court,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant
67,Remuera West,2,Bowling Green,Yoga Studio,Food Court,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant


#### Cluster 3 - Daily life

In [116]:
auckland_merged.loc[auckland_merged['Cluster Labels'] == 3, auckland_merged.columns[[0] + list(range(5, auckland_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arch Hill,3,Café,Brewery,Bakery,Farmers Market,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
2,Avondale South,3,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
3,Avondale West,3,Market,Racetrack,Café,Chinese Restaurant,Yoga Studio,Farmers Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
12,Freemans Bay,3,Café,Bar,Italian Restaurant,Park,Gym,Vietnamese Restaurant,Japanese Restaurant,Mexican Restaurant,Thai Restaurant,Restaurant
19,Grey Lynn East,3,Café,Bar,Vietnamese Restaurant,Coffee Shop,Farmers Market,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant
20,Grey Lynn West,3,Café,Bar,Vietnamese Restaurant,Coffee Shop,Farmers Market,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant
29,Meadowbank North,3,Park,Café,Yoga Studio,Electronics Store,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant
30,Meadowbank South,3,Park,Café,Yoga Studio,Electronics Store,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant
41,Newmarket,3,Café,Chinese Restaurant,Coffee Shop,Sushi Restaurant,Department Store,Yoga Studio,Indian Restaurant,Multiplex,Movie Theater,Mexican Restaurant
45,Onehunga North East,3,Café,Fast Food Restaurant,Clothing Store,Shopping Mall,Lingerie Store,Sporting Goods Shop,Pizza Place,Grocery Store,BBQ Joint,Golf Course


#### Cluster 4 - Nature Experiential

In [117]:
auckland_merged.loc[auckland_merged['Cluster Labels'] == 4, auckland_merged.columns[[0] + list(range(5, auckland_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Glendowie,4,Park,Thai Restaurant,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Electronics Store
32,Mt Albert Central,4,Park,History Museum,Yoga Studio,Electronics Store,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Donut Shop
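
One simple way to sanity-check a cluster name is to tally the most common venue category within the cluster. The sketch below does this for the "1st Most Common Venue" column of the ten Cluster 3 rows displayed above. It covers only those displayed rows, not the whole cluster.

```python
from collections import Counter

# "1st Most Common Venue" for the ten Cluster 3 rows shown above, in table order.
cluster3_first_venues = [
    "Café", "Market", "Market", "Café", "Café",
    "Café", "Park", "Park", "Café", "Café",
]

counts = Counter(cluster3_first_venues)
top_category, top_count = counts.most_common(1)[0]
print(top_category, top_count)
```

Cafés dominate the displayed rows, which fits the "Daily life" label given to this cluster.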


### 2. Median Income and rent paid

#### Create a new dataframe named df_filter to do further analysis

In [118]:
df_filter = pd.DataFrame(auckland_merged)

#### Let's see the summary statistics

In [119]:
df_filter.describe()

Unnamed: 0,Latitude,Longitude,Cluster Labels
count,83.0,83.0,83.0
mean,-36.885332,174.778621,0.939759
std,0.025281,0.049383,1.391169
min,-36.943722,174.692814,0.0
25%,-36.900172,174.737612,0.0
50%,-36.884707,174.773462,0.0
75%,-36.865671,174.818424,3.0
max,-36.842507,174.870537,4.0


##### The ideal suburb has an above-average Median_personal_income and a comparatively low Median_weekly_rent_paid.

#### Convert the income and rent columns to int

In [120]:
df_filter['Median_personal_income'] = df_filter['Median_personal_income'].astype(int)
df_filter['Median_weekly_rent_paid'] = df_filter['Median_weekly_rent_paid'].astype(int)

#### Keep suburbs whose Median_personal_income is above the mean

In [121]:
df_filter = df_filter[df_filter['Median_personal_income'] >df_filter['Median_personal_income'].mean()]

In [122]:
df_filter.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 41 entries, 0 to 82
Data columns (total 16 columns):
Suburb                     41 non-null object
Median_personal_income     41 non-null int64
Median_weekly_rent_paid    41 non-null int64
Latitude                   41 non-null float64
Longitude                  41 non-null float64
Cluster Labels             41 non-null int32
1st Most Common Venue      41 non-null object
2nd Most Common Venue      41 non-null object
3rd Most Common Venue      41 non-null object
4th Most Common Venue      41 non-null object
5th Most Common Venue      41 non-null object
6th Most Common Venue      41 non-null object
7th Most Common Venue      41 non-null object
8th Most Common Venue      41 non-null object
9th Most Common Venue      41 non-null object
10th Most Common Venue     41 non-null object
dtypes: float64(2), int32(1), int64(2), object(11)
memory usage: 5.3+ KB


##### 41 suburbs meet the income criterion.

#### Keep suburbs whose Median_weekly_rent_paid is below 430, the 75th-percentile cutoff

In [123]:
df_filter = df_filter[df_filter['Median_weekly_rent_paid']< 430.000000]

In [124]:
df_filter.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23 entries, 4 to 77
Data columns (total 16 columns):
Suburb                     23 non-null object
Median_personal_income     23 non-null int64
Median_weekly_rent_paid    23 non-null int64
Latitude                   23 non-null float64
Longitude                  23 non-null float64
Cluster Labels             23 non-null int32
1st Most Common Venue      23 non-null object
2nd Most Common Venue      23 non-null object
3rd Most Common Venue      23 non-null object
4th Most Common Venue      23 non-null object
5th Most Common Venue      23 non-null object
6th Most Common Venue      23 non-null object
7th Most Common Venue      23 non-null object
8th Most Common Venue      23 non-null object
9th Most Common Venue      23 non-null object
10th Most Common Venue     23 non-null object
dtypes: float64(2), int32(1), int64(2), object(11)
memory usage: 3.0+ KB


##### Only 23 suburbs meet both the income and the rent criteria.

##### Based on the exploration above, I prefer Cluster 0 - Life experiential, which offers a wide variety of venues in the neighborhood. The filter below also keeps Cluster 3 - Daily life, so neighborhoods labelled cluster 0 or cluster 3 are retained.

In [125]:
df_filter = df_filter[(df_filter['Cluster Labels']==0)|(df_filter['Cluster Labels']==3)]

In [126]:
df_filter.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22 entries, 4 to 77
Data columns (total 16 columns):
Suburb                     22 non-null object
Median_personal_income     22 non-null int64
Median_weekly_rent_paid    22 non-null int64
Latitude                   22 non-null float64
Longitude                  22 non-null float64
Cluster Labels             22 non-null int32
1st Most Common Venue      22 non-null object
2nd Most Common Venue      22 non-null object
3rd Most Common Venue      22 non-null object
4th Most Common Venue      22 non-null object
5th Most Common Venue      22 non-null object
6th Most Common Venue      22 non-null object
7th Most Common Venue      22 non-null object
8th Most Common Venue      22 non-null object
9th Most Common Venue      22 non-null object
10th Most Common Venue     22 non-null object
dtypes: float64(2), int32(1), int64(2), object(11)
memory usage: 2.8+ KB


##### Only 22 suburbs meet all of our criteria.
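
The three filters can be read as one pipeline. Here is a rough plain-Python sketch using the five suburbs from the merged table earlier in this section. Note one assumption: the mean income is computed over these five rows only, whereas the notebook uses all 83 suburbs, so the surviving suburbs differ from the real result.

```python
# (suburb, median personal income, median weekly rent, cluster label),
# copied from the first five rows of the merged table.
rows = [
    ("Arch Hill", 44900, 500, 3),
    ("Auckland Central East", 16200, 350, 0),
    ("Avondale South", 24200, 350, 3),
    ("Avondale West", 19500, 320, 3),
    ("Balmoral", 39900, 350, 0),
]

RENT_CUTOFF = 430        # weekly rent threshold used in the notebook
KEEP_CLUSTERS = {0, 3}   # cluster labels kept by the notebook's last filter

# Mean of this five-row sample only (assumption, see the note above).
mean_income = sum(income for _, income, _, _ in rows) / len(rows)

# Step 1: income above the mean.
step1 = [r for r in rows if r[1] > mean_income]
# Step 2: rent below the cutoff.
step2 = [r for r in step1 if r[2] < RENT_CUTOFF]
# Step 3: preferred clusters only.
step3 = [r for r in step2 if r[3] in KEEP_CLUSTERS]

print(mean_income, [r[0] for r in step3])
```

Applying the filters one at a time, as the notebook does, makes it easy to see how many suburbs each criterion removes.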

#### Let's see our dataframe

In [127]:
df_filter.sort_values(by='Median_personal_income',ascending=False)

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Herne Bay,57500,400,-36.842507,174.736383,0,Café,American Restaurant,Japanese Restaurant,Fast Food Restaurant,French Restaurant,Bar,Pharmacy,Italian Restaurant,Wine Bar,Grocery Store
12,Freemans Bay,49100,410,-36.85315,174.750954,3,Café,Bar,Italian Restaurant,Park,Gym,Vietnamese Restaurant,Japanese Restaurant,Mexican Restaurant,Thai Restaurant,Restaurant
19,Grey Lynn East,47700,410,-36.859922,174.736418,3,Café,Bar,Vietnamese Restaurant,Coffee Shop,Farmers Market,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant
31,Mission Bay,45500,420,-36.849862,174.833645,0,Italian Restaurant,Ice Cream Shop,Fish & Chips Shop,Café,Mexican Restaurant,Dessert Shop,Pizza Place,Coffee Shop,Pub,Movie Theater
29,Meadowbank North,44500,370,-36.870254,174.82494,3,Park,Café,Yoga Studio,Electronics Store,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant
8,Ellerslie South,42000,400,-36.897603,174.81503,0,Pizza Place,Café,Hotel,Turkish Restaurant,Coffee Shop,Chinese Restaurant,Bar,Bakery,Park,Grocery Store
45,Onehunga North East,41000,360,-36.923792,174.785774,3,Café,Fast Food Restaurant,Clothing Store,Shopping Mall,Lingerie Store,Sporting Goods Shop,Pizza Place,Grocery Store,BBQ Joint,Golf Course
7,Ellerslie North,41000,360,-36.897603,174.81503,0,Pizza Place,Café,Hotel,Turkish Restaurant,Coffee Shop,Chinese Restaurant,Bar,Bakery,Park,Grocery Store
49,Orakei North,40800,320,-36.856841,174.821819,0,Convenience Store,Athletics & Sports,Tennis Court,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Electronics Store
4,Balmoral,39900,350,-36.889205,174.748694,0,Asian Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Dumpling Restaurant,Park,Dessert Shop,Gym,Japanese Restaurant,Coffee Shop


##### `Ellerslie South` looks like a good choice because Chinese restaurants and coffee shops are among its most common venues. Would the choice change if we put more weight on cost-effectiveness, measured as the ratio of rent to income?

### Best ratio after filtering

#### Set Index to measure the cost-effectiveness

Add a column named `Index`: the median weekly rent multiplied by 48 (an approximate yearly rent), divided by the median personal income. A lower value means that rent takes a smaller share of income.

In [128]:
df_filter['Index'] = (df_filter['Median_weekly_rent_paid']*48)/df_filter['Median_personal_income']

In [129]:
df_filter.sort_values(by='Index',ascending=True).head()

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Index
21,Herne Bay,57500,400,-36.842507,174.736383,0,Café,American Restaurant,Japanese Restaurant,Fast Food Restaurant,French Restaurant,Bar,Pharmacy,Italian Restaurant,Wine Bar,Grocery Store,0.333913
49,Orakei North,40800,320,-36.856841,174.821819,0,Convenience Store,Athletics & Sports,Tennis Court,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Electronics Store,0.376471
29,Meadowbank North,44500,370,-36.870254,174.82494,3,Park,Café,Yoga Studio,Electronics Store,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant,0.399101
12,Freemans Bay,49100,410,-36.85315,174.750954,3,Café,Bar,Italian Restaurant,Park,Gym,Vietnamese Restaurant,Japanese Restaurant,Mexican Restaurant,Thai Restaurant,Restaurant,0.400815
19,Grey Lynn East,47700,410,-36.859922,174.736418,3,Café,Bar,Vietnamese Restaurant,Coffee Shop,Farmers Market,Yoga Studio,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,0.412579


`Herne Bay` looks like the better choice if we put more weight on this ratio: its index is 0.33. `Ellerslie South` ranks 12th, with an index of 0.46.
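
As a worked check of the index, the sketch below recomputes it for three suburbs using the income and rent figures from the tables above. The function name `rent_to_income_index` is an assumption for illustration. The 48-week multiplier is the one used in the notebook cell.

```python
WEEKS_PER_YEAR = 48  # multiplier used in the notebook's Index column


def rent_to_income_index(weekly_rent, yearly_income, weeks=WEEKS_PER_YEAR):
    """Approximate yearly rent divided by median personal income."""
    return weekly_rent * weeks / yearly_income


# suburb -> (median personal income, median weekly rent), from the tables above.
suburbs = {
    "Herne Bay": (57500, 400),
    "Orakei North": (40800, 320),
    "Ellerslie South": (42000, 400),
}

index = {
    name: round(rent_to_income_index(rent, income), 6)
    for name, (income, rent) in suburbs.items()
}

# Lower index = rent takes a smaller share of income.
ranked = sorted(index, key=index.get)
print(index)
print(ranked)
```

Swapping 48 for 52 weeks would scale every index by the same factor, so the ranking would stay the same.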

#### Let's find out which neighborhood is the most cost-effective if we drop all the other criteria

In [130]:
df_filter2 = pd.DataFrame(auckland_merged)

In [131]:
df_filter2['Index'] = (df_filter2['Median_weekly_rent_paid']*48)/df_filter2['Median_personal_income']
df_filter2.sort_values(by='Index',ascending=True).head()

Unnamed: 0,Suburb,Median_personal_income,Median_weekly_rent_paid,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Index
13,Glen Innes East,19300,130,-36.875526,174.859947,1,Supermarket,Turkish Restaurant,Fish & Chips Shop,Pizza Place,Gym,Grocery Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,0.323316
21,Herne Bay,57500,400,-36.842507,174.736383,0,Café,American Restaurant,Japanese Restaurant,Fast Food Restaurant,French Restaurant,Bar,Pharmacy,Italian Restaurant,Wine Bar,Grocery Store,0.333913
63,Point England,14600,110,-36.884707,174.865295,0,Train Station,Park,Fish & Chips Shop,Bus Station,Bakery,Falafel Restaurant,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,0.361644
49,Orakei North,40800,320,-36.856841,174.821819,0,Convenience Store,Athletics & Sports,Tennis Court,Falafel Restaurant,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Electronics Store,0.376471
51,Oranga,26100,210,-36.909534,174.801514,3,Café,Bakery,Yoga Studio,Farmers Market,Food,Flower Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,0.386207


`Glen Innes East` is the most cost-effective community overall, with an index of 0.32. However, that is only slightly lower than `Herne Bay` (0.33), so giving up the other criteria for it may not be worthwhile.

## 4. Methodology

### Exploratory data analysis

 - There are 974 venue records in total, of which 591 are unique venues.
 - There are 138 unique categories.
 - The top six venues are world-class chains. Some duplication in the list is likely because these communities are close together, but the pattern still suggests that those chains choose their locations carefully.
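
The gap between total and unique venue counts is what you would expect when neighboring suburbs are geocoded to the same point. For example, Avondale South and Avondale West have identical coordinates in the merged table. Here is a small sketch of the counting, with hypothetical venue names:

```python
# (suburb, venue name, category); the venue names are hypothetical.
venues = [
    ("Avondale South", "Example Market", "Market"),
    ("Avondale West", "Example Market", "Market"),
    ("Avondale South", "Example Café", "Café"),
    ("Avondale West", "Example Café", "Café"),
    ("Balmoral", "Example Dumplings", "Dumpling Restaurant"),
]

total_rows = len(venues)                                  # every (suburb, venue) pair
unique_venues = len({name for _, name, _ in venues})      # each venue counted once
unique_categories = len({cat for _, _, cat in venues})    # distinct categories

print(total_rows, unique_venues, unique_categories)
```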

### Machine learning

- Clustering the neighborhoods into 5 clusters, each with its own character:
    - Cluster 0 - Life experiential
    - Cluster 1 - Life Essential
    - Cluster 2 - Sports Experiential
    - Cluster 3 - Daily life
    - Cluster 4 - Nature Experiential

## 5. Result

After segmenting and clustering these communities, we have a general picture of each one, which is useful for a newcomer to the city.
However, every choice is a trade-off: after exploring the data, I found no community that meets all the criteria perfectly. If we make concessions on cost-effectiveness, we can choose Ellerslie South as our residential area. If instead we insist on affordability while keeping a good standard of living, the index analysis above points to Herne Bay.

## 6. Discussion

I tried to draw a choropleth map based on median income and median rent, but after a long search I found that the data needed for it are not publicly available from the NZ government. With that data, the visualizations in this report would have been more vivid. In addition, the criteria I chose are personal and will differ from person to person, so the result is tailored to my own preferences. Lastly, there may be a better way to measure cost-effectiveness than the index used here.

## 7. Conclusion

In this report, I explored the neighborhoods of Auckland City to find a suitable area to live in, one that offers specific venue categories and is cost-effective as well. I used Foursquare location data to gather information about venues and combined it with data scraped from Wikipedia and data from Stats NZ. I used *k*-means clustering to group the neighborhoods into 5 clusters, each with its own characteristics. After the analysis, we have an initial picture of the neighborhoods and a better idea of our ideal living area.