# Capstone Project - Battle of Neighborhood
## London, UK


<a> <img src = "london-pic.jpg"> </a>

## 1. Introduction: Business Problem

People often explore a neighborhood before moving there, to understand various aspects of the location: rentals, nearby shops, clinics and hospitals, schools, safety, distance to the office, and so on. In this project, I have chosen to explore the London neighborhoods where most of the Asian / Indian community resides, along with the restaurants and other venues around them, using the clustering and segmentation techniques learned in this course. This project will also help me compare two neighborhoods and choose the better-suited location based on the top 10 most common venues surrounding each.

**Objective:**
* Extract the top trending venues of London using the Foursquare API
* Form neighborhood clusters based on venue categories using the unsupervised k-means clustering algorithm
* Understand the similarities and differences between two neighborhoods to gain more insight and conclude which neighborhood is best suited to an individual's needs.

## 2. Datasets and APIs:

#### DataSet1 - List of areas in London
I will be extracting the list of areas in London from this Wikipedia page: https://en.wikipedia.org/wiki/List_of_areas_of_London
BeautifulSoup and pandas will be used to clean this data down to the required London neighborhoods with their postal codes.

#### DataSet2 - Demography of London
I will also be using the demographic data of London from this Wikipedia page: https://en.wikipedia.org/wiki/Demography_of_London to find the London boroughs with the largest Asian population.

#### Important Libraries / API - geocoder, Folium, kMeans, Foursquare
- I will be using the geocoder library to fetch the latitude and longitude of each postal code of London, UK
- Folium is a great visualization library for plotting the map. This will be used to visualize the neighborhoods cluster distribution of London city over an interactive leaflet map.
- Foursquare API will help fetch the neighborhood for the given postal codes and find the top common venues from it.
- kMeans clustering will be used to form the clusters of surrounding neighborhoods and compare the neighborhood information
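
To make the clustering idea concrete, here is a minimal pure-Python sketch of what k-means does: assign every point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes. It is an illustration only and not scikit-learn's `KMeans`; the function name `simple_kmeans`, the toy frequency vectors, and the choice of the first k points as starting centroids are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_kmeans(points, k, max_iter=100):
    """Toy k-means: returns one cluster label per point.

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the result deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # assignment step: nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# hypothetical venue-frequency vectors: [hotel, fast food, park]
toy_points = [
    [0.15, 0.00, 0.05],
    [0.00, 0.20, 0.10],
    [0.14, 0.01, 0.05],
    [0.01, 0.18, 0.12],
]
toy_labels = simple_kmeans(toy_points, k=2)
print(toy_labels)
```

In the toy data, the two hotel-heavy vectors end up in one cluster and the two fast-food-heavy vectors in the other, which is the kind of grouping the project aims for with real venue frequencies.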



### i) Let us download all the dependent libraries for the project

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

from bs4 import BeautifulSoup #to handle html crawling

!pip -q install geocoder
import geocoder

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

<a id='item1'></a>

### ii) Download and Transform Dataset1 (Areas of London)

We will download the areas of London from this Wikipage: https://en.wikipedia.org/wiki/List_of_areas_of_London

BeautifulSoup and pandas are used to clean this data down to the required London neighborhoods with their postal codes.

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
wikipage = requests.get(url)

#Clean html file
soup = BeautifulSoup(wikipage.content, 'html.parser')

#Extract "tbody" in the table where the class is "wikitable sortable"
table = soup.find('table', {'class':'wikitable sortable'}).tbody

#Extracting all "tr" in the above table
rows = table.find_all('tr')

#Extracting column headers, removes and replaces new line with '' for 'th' tag
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]

#Collect the cleaned cell text of every data row (newlines and non-breaking spaces removed)
records = []
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    #Keep only rows whose number of cells matches the header
    if len(values) == len(columns):
        records.append(values)

#Build the dataframe from the collected rows
df0 = pd.DataFrame(records, columns = columns)

#Create a clean 'Borough' column from 'London borough' by stripping trailing reference markers such as [1]
df0['Borough'] = df0['London borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('[').strip())
#Copy the post town and postcode district into columns with simpler names
df0['Post_town'] = df0['Post town']
df0['PostalCode'] = df0['Postcode district']
df0.head()


Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref,Borough,Post_town,PostalCode
0,Abbey Wood,"Bexley, Greenwich [1]",LONDON,SE2,20,TQ465785,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham[2]",LONDON,"W3, W4",20,TQ205805,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon[2],CROYDON,CR0,20,TQ375645,Croydon,CROYDON,CR0
3,Addiscombe,Croydon[2],CROYDON,CR0,20,TQ345665,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
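
The chained `rstrip` calls used for the `Borough` column only remove a single reference marker at the very end of the string. As an alternative sketch, a regular expression can remove markers such as `[1]` wherever they appear. The helper name `strip_footnote_markers` is an assumption made for this example and is not part of the notebook.

```python
import re


def strip_footnote_markers(text):
    """Remove Wikipedia-style reference markers such as '[1]' and trim spaces."""
    return re.sub(r'\s*\[\d+\]', '', text).strip()


# sample values copied from the 'London borough' column shown above
print(strip_footnote_markers('Bexley, Greenwich [1]'))
print(strip_footnote_markers('Ealing, Hammersmith and Fulham[2]'))
```

If this approach were adopted, it could be applied to the column with `df0['London borough'].map(strip_footnote_markers)`.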


### iii) Final version of dataset1 after cleansing

In [5]:
#Keep only the columns we need
df1 = df0[['Location', 'Borough', 'Post_town','PostalCode']].reset_index(drop=True)
#Split rows with several postal codes into one row per code, stripping the space left after each comma
df2 = df1.drop('PostalCode', axis=1).join(df1['PostalCode'].str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True).rename('PostalCode'))

df2.head()

Unnamed: 0,Location,Borough,Post_town,PostalCode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W4
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
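
The split-and-stack step above can be hard to read. As a hedged alternative, assuming a pandas version that provides `DataFrame.explode` (0.25 or later), the same one-row-per-postal-code reshaping can be sketched on a toy frame as follows. The frame `toy` is made up for the example, with values copied from the output above.

```python
import pandas as pd

# toy rows mirroring the structure of df1
toy = pd.DataFrame({
    'Location': ['Abbey Wood', 'Acton'],
    'PostalCode': ['SE2', 'W3, W4'],
})

# split the comma-separated codes into lists, then emit one row per code
exploded = (toy.assign(PostalCode=toy['PostalCode'].str.split(','))
               .explode('PostalCode')
               .reset_index(drop=True))
# remove the space left behind after each comma
exploded['PostalCode'] = exploded['PostalCode'].str.strip()

print(exploded)
```

Unlike the `join` approach, this version resets the index, so the repeated index values seen in the output above (two rows labelled 1 for Acton) do not appear.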


### iv) Demographic data of London will be our dataset2

**We will extract the demographic data and sort it to find the top 5 boroughs by Asian population**

In [6]:
#reading demographic data of LONDON
demographic = 'https://en.wikipedia.org/wiki/Demography_of_London'
demographic_read = requests.get(demographic)
soup1 = BeautifulSoup(demographic_read.content, 'html.parser')
table1 = soup1.find('table', {'class':'wikitable sortable'}).tbody
rows1 = table1.find_all('tr')
columns1 = [i.text.replace('\n', '')
            for i in rows1[0].find_all('th')]

#Collect the cleaned cell text of every data row (newlines and non-breaking spaces removed)
records1 = []
for j in range(1, len(rows1)):
    tds1 = rows1[j].find_all('td')
    values1 = [td1.text.replace('\n', '').replace('\xa0','') for td1 in tds1]
    #Keep only rows whose number of cells matches the header
    if len(values1) == len(columns1):
        records1.append(values1)

#Build the dataframe from the collected rows
df_london = pd.DataFrame(records1, columns = columns1)

#d_london

df_london['Asian'] = df_london['Asian'].astype('float')

df_london_sorted = df_london.sort_values(by='Asian', ascending = False)

df_london_sorted.head(5)

Unnamed: 0,Local authority,White,Mixed,Asian,Black,Other
24,Newham,29.0,4.5,43.5,19.6,3.5
13,Harrow,42.2,4.0,42.6,8.2,2.9
25,Redbridge,42.5,4.1,41.8,8.9,2.7
29,Tower Hamlets,45.2,4.1,41.1,7.3,2.3
17,Hounslow,51.4,4.1,34.4,6.6,3.6


### v) Merge Datasets to retrieve areas where the Asian community lives

**We can see that Newham, Harrow, Redbridge, Tower Hamlets and Hounslow are the top 5 boroughs by share of Asian population**

Now let us merge this data with our first dataset to extract only the locations in these boroughs.

In [7]:
df_london_top5_borough_asian = df_london_sorted['Local authority'].iloc[0:5]

#df2 has the final list of location - london borough
df_london_top_asian_loc = df2[df2['Borough'].isin(df_london_top5_borough_asian) & df2['Post_town'].str.contains('LONDON')].reset_index(drop=True)
df_copy = df_london_top_asian_loc.copy()
df_london_top_asian_loc.head()

Unnamed: 0,Location,Borough,Post_town,PostalCode
0,Beckton,Newham,"LONDON, BARKING",E6
1,Beckton,Newham,"LONDON, BARKING",E16
2,Beckton,Newham,"LONDON, BARKING",IG11
3,Bethnal Green,Tower Hamlets,LONDON,E2
4,Blackwall,Tower Hamlets,LONDON,E14
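
One point worth noting about the filter above: `isin` compares whole cell values, so a row whose `Borough` cell lists several boroughs (as `Bexley, Greenwich` does in the first dataset) only matches if the entire string is in the list. The sketch below shows one way to test each listed borough separately. The helper name `in_any_borough` is an assumption for this example, and the last test value is hypothetical.

```python
def in_any_borough(borough_cell, wanted):
    """True if any comma-separated borough in the cell is in `wanted`."""
    parts = [part.strip() for part in borough_cell.split(',')]
    return any(part in wanted for part in parts)


top_boroughs = {'Newham', 'Harrow', 'Redbridge', 'Tower Hamlets', 'Hounslow'}

print(in_any_borough('Newham', top_boroughs))
print(in_any_borough('Bexley, Greenwich', top_boroughs))
print(in_any_borough('Hackney, Newham', top_boroughs))  # hypothetical cell value
```

Applied to the dataframe, this would look like `df2['Borough'].map(lambda cell: in_any_borough(cell, top_boroughs))` in place of the `isin` call.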



**Function to retrieve latitude and longitude of a specific area in London, UK**

Pass a postal code and this function will return its latitude and longitude coordinates.

In [8]:
#Function for getting latitude longitude
def getlatlong(postalCode):
    ll_cords = None
    
    while(ll_cords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postalCode))
        ll_cords = g.latlng
    return ll_cords
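
The `while` loop above keeps retrying until the geocoder answers, so a postal code that never resolves would make it loop forever. A bounded retry is one way to guard against that. The sketch below is an assumption-laden illustration: `geocode_with_retry` and its parameters are hypothetical names, and `lookup` stands in for whatever geocoding call is used.

```python
import time


def geocode_with_retry(lookup, query, max_tries=3, delay_seconds=0.0):
    """Call `lookup(query)` until it returns coordinates or tries run out.

    `lookup` is a hypothetical stand-in for a geocoding call; it should
    return a [lat, lng] pair or None.
    """
    for attempt in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)
    return None


# fake lookup for demonstration: fails once, then succeeds
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 2 else [51.53292, 0.05461]


result = geocode_with_retry(flaky_lookup, 'E6, London, United Kingdom')
print(result)
```

Callers then need to handle a `None` result explicitly, for example by skipping that postal code.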


### vi) Including Latitude and Longitude to the main dataset

In [9]:
latitude=[] #List to collect the latitudes
longitude=[] #List to collect the longitudes

for i in df_copy['PostalCode']: #Iterating through Postalcodes to collect the locations data
    #loc_search=df_copy['PostalCode'] + ' ' + df_copy['Location']
    location = getlatlong(i)
    latitude.append(location[0])
    longitude.append(location[1])

df_copy['Latitude']=latitude #Adding a column in the main dataframe for Latitude  
df_copy['Longitude']=longitude #Adding a column in the main dataframe for Longitude
df_copy.head()

Unnamed: 0,Location,Borough,Post_town,PostalCode,Latitude,Longitude
0,Beckton,Newham,"LONDON, BARKING",E6,51.53292,0.05461
1,Beckton,Newham,"LONDON, BARKING",E16,51.50913,0.01528
2,Beckton,Newham,"LONDON, BARKING",IG11,51.532674,0.085256
3,Bethnal Green,Tower Hamlets,LONDON,E2,51.52669,-0.06257
4,Blackwall,Tower Hamlets,LONDON,E14,51.51122,-0.01264
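
Several rows share a postal code, so the loop above geocodes the same code more than once. Caching the result per code avoids the repeated calls. This is a minimal sketch using `functools.lru_cache`; `cached_lookup` is a hypothetical stand-in for the geocoding function and returns placeholder coordinates, not real geocoder output.

```python
from functools import lru_cache

lookup_log = []  # records which codes actually triggered a lookup


@lru_cache(maxsize=None)
def cached_lookup(postal_code):
    """Hypothetical stand-in for a geocoding call; results are memoised."""
    lookup_log.append(postal_code)
    # placeholder coordinates, not real geocoder output
    return (0.0, 0.0)


codes = ['E6', 'E16', 'IG11', 'E16', 'E16']
coords = [cached_lookup(code) for code in codes]
print(len(coords), lookup_log)
```

Five coordinates are produced from only three lookups, because the repeated `E16` entries are served from the cache.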


### vii) Create a map of London, United Kingdom with neighborhoods superimposed on top

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>eng_explorer</em>, as shown below.

In [10]:
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="eng_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London, United Kingdom are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London, United Kingdom are 51.5073219, -0.1276474.


**Using Folium to create the map**

In [11]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_copy['Latitude'], df_copy['Longitude'], df_copy['Borough'], df_copy['Location']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood and its respective borough.

Let us simplify the above map and segment and cluster only the neighborhoods of Newham borough. 

So let's slice the original dataframe and create a new dataframe of the Newham data.

In [12]:
df_newham = df_copy[df_copy['Borough'] == 'Newham'].reset_index(drop=True)
df_newham.head()

Unnamed: 0,Location,Borough,Post_town,PostalCode,Latitude,Longitude
0,Beckton,Newham,"LONDON, BARKING",E6,51.53292,0.05461
1,Beckton,Newham,"LONDON, BARKING",E16,51.50913,0.01528
2,Beckton,Newham,"LONDON, BARKING",IG11,51.532674,0.085256
3,Canning Town,Newham,LONDON,E16,51.50913,0.01528
4,Custom House,Newham,LONDON,E16,51.50913,0.01528


Let's get the geographical coordinates of Newham, London.

In [13]:
address = 'Newham, London'

geolocator = Nominatim(user_agent="nh_eng_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Newham are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Newham are 51.52999955, 0.0293179602938221.


As we did with all of London, let's visualize Newham and the neighborhoods in it.

In [14]:
# create map of Newham using latitude and longitude values
map_newham = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_newham['Latitude'], df_newham['Longitude'], df_newham['Location']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newham)  
    
map_newham

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

The cell below holds confidential credentials, so its code is hidden from display.

In [15]:
# The code was removed by Watson Studio for sharing.

#### Let's explore one neighborhood in our dataframe (the row at index 1).

Get the neighborhood's name.

In [16]:
df_newham.loc[1, 'Location']

'Beckton'

Get the neighborhood's latitude and longitude values.

In [17]:
neighborhood_latitude = df_newham.loc[1, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_newham.loc[1, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_newham.loc[1, 'Location'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Beckton are 51.50913000000003, 0.015280000000075233.


#### Now, let's get the top 100 venues that are in Beckton within a radius of 1000 meters.

First, let's create the GET request URL and store it in the variable **url**.

In [19]:
# build the explore request URL
radius = 1000
LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
#url
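
Building the query string with `str.format` works, but it does not escape special characters in the values. As a hedged alternative, the standard library's `urllib.parse.urlencode` can assemble and encode the parameters. The function name `build_explore_url`, the credentials and the version string below are placeholders for illustration only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble the explore URL with properly encoded query parameters."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return base + '?' + urlencode(params)


# placeholder credentials and version string, for illustration only
demo_url = build_explore_url('MY_ID', 'MY_SECRET', 51.50913, 0.01528, '20180605', 1000, 100)

# parse the URL back to confirm the parameters survive the round trip
demo_query = parse_qs(urlsplit(demo_url).query)
print(demo_query['ll'], demo_query['radius'])
```

The `requests` library can also take the same dictionary directly through the `params` argument of `requests.get`, which encodes it in the same spirit.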


Send the GET request and examine the results

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d106c6ff129b500259fd753'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b770677f964a52096752ee3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/thai_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d149941735',
         'name': 'Thai Restaurant',
         'pluralName': 'Thai Restaurants',
         'primary': True,
         'shortName': 'Thai'}],
       'id': '4b770677f964a52096752ee3',
       'location': {'address': 'Waterfront Studios, 1 Dock Rd',
        'cc': 'GB',
        'city': 'Silvertown',
        'country': 'United Kingdom',
        'distance': 340,
        'formattedAddress': ['Waterfront Studios, 1 Dock Rd',
         'Silvertown',
         'Greater London',
         'E16 1AH',
         'United Kingdom

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [23]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Nakhon Thai Restaurant,Thai Restaurant,51.506144,0.016371
1,Trinity Buoy Wharf,Pier,51.508088,0.008368
2,Ibis Hotel,Hotel,51.514561,0.009151
3,Sunborn Yacht Hotel London,Hotel,51.507236,0.024166
4,The Lighthouse,Lighthouse,51.507711,0.008203
5,Oiler bar,Bar,51.506463,0.017404
6,The Crystal,Science Museum,51.50721,0.016777
7,Zero Sette,Italian Restaurant,51.508611,0.02528
8,Costa Pronto,Coffee Shop,51.514415,0.008015
9,Crowne Plaza London Docklands,Hotel,51.508268,0.02032


And how many venues were returned by Foursquare?

In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

85 venues were returned by Foursquare.


<a id='item2'></a>

## 3. Explore Neighborhoods in Newham

#### Let's create a function to repeat the same process for all the neighborhoods in Newham

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
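
The function above reads `v['venue']['categories'][0]['name']`, which would raise an `IndexError` for a venue that has an empty category list. A small guard avoids that. The helper name `venue_category`, the default label and the second sample dictionary are assumptions made for this sketch.

```python
def venue_category(venue, default='Unknown'):
    """Return the first category name of a venue dict, or `default` if none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else default


# hand-written dictionaries shaped like the 'venue' objects in the response above
with_category = {'name': 'Nakhon Thai Restaurant',
                 'categories': [{'name': 'Thai Restaurant'}]}
without_category = {'name': 'Unnamed spot', 'categories': []}  # hypothetical venue

print(venue_category(with_category))
print(venue_category(without_category))
```

Inside the list comprehension, `venue_category(v['venue'])` could replace the direct index access.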

#### Now let's run the above function on each neighborhood and create a new dataframe called *newham_venues*.

In [26]:
# run getNearbyVenues for every Newham neighborhood

newham_venues = getNearbyVenues(names=df_newham['Location'],
                                   latitudes=df_newham['Latitude'],
                                   longitudes=df_newham['Longitude']
                                  )



Beckton
Beckton
Beckton
Canning Town
Custom House
East Ham
Forest Gate
Little Ilford
Manor Park
Maryland
North Woolwich
Plaistow
Silvertown
Stratford
Upton Park
Upton Park
West Ham
West Ham


#### Let's check the size of the resulting dataframe

In [27]:
print(newham_venues.shape)
newham_venues.head()

(361, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Beckton,51.53292,0.05461,McDonald's,51.53404,0.053628,Fast Food Restaurant
1,Beckton,51.53292,0.05461,The Miller's Well (Wetherspoon),51.533406,0.056379,Pub
2,Beckton,51.53292,0.05461,Central Park,51.528808,0.052901,Park
3,Beckton,51.53292,0.05461,Costa Coffee,51.534517,0.053365,Coffee Shop
4,Beckton,51.53292,0.05461,Primark,51.535303,0.052308,Clothing Store


Let's check how many venues were returned for each neighborhood

In [28]:
newham_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Beckton,42,42,42,42,42,42
Canning Town,20,20,20,20,20,20
Custom House,20,20,20,20,20,20
East Ham,17,17,17,17,17,17
Forest Gate,19,19,19,19,19,19
Little Ilford,6,6,6,6,6,6
Manor Park,6,6,6,6,6,6
Maryland,53,53,53,53,53,53
North Woolwich,20,20,20,20,20,20
Plaistow,5,5,5,5,5,5


#### Let's find out how many unique categories can be curated from all the returned venues

In [29]:
print('There are {} unique categories.'.format(len(newham_venues['Venue Category'].unique())))

There are 61 unique categories.


<a id='item3'></a>

## 4. Analyze Each Neighborhood

In [30]:
# one hot encoding
newham_onehot = pd.get_dummies(newham_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newham_onehot['Neighborhood'] = newham_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [newham_onehot.columns[-1]] + list(newham_onehot.columns[:-1])
newham_onehot = newham_onehot[fixed_columns]

newham_onehot.head()

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bar,Beach,Bookstore,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Diner,Discount Store,Doner Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hotel,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Italian Restaurant,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Moroccan Restaurant,Movie Theater,Moving Target,Park,Pharmacy,Pier,Pizza Place,Platform,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Thai Restaurant,Train Station,Turkish Restaurant,Warehouse Store
0,Beckton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Beckton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Beckton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Beckton,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Beckton,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [31]:
newham_onehot.shape

(361, 62)

Let's find the Indian restaurants within Newham

In [32]:
#Check Indian restaurant in Newham
newham_onehot.loc[newham_onehot['Indian Restaurant'] != 0]

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bar,Beach,Bookstore,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Diner,Discount Store,Doner Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hotel,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Italian Restaurant,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Moroccan Restaurant,Movie Theater,Moving Target,Park,Pharmacy,Pier,Pizza Place,Platform,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Thai Restaurant,Train Station,Turkish Restaurant,Warehouse Store
117,Forest Gate,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
171,Maryland,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
269,Stratford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
349,West Ham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [33]:
newham_grouped = newham_onehot.groupby('Neighborhood').mean().reset_index()
newham_grouped

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bar,Beach,Bookstore,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Diner,Discount Store,Doner Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hotel,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Italian Restaurant,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Moroccan Restaurant,Movie Theater,Moving Target,Park,Pharmacy,Pier,Pizza Place,Platform,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Shopping Mall,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Thai Restaurant,Train Station,Turkish Restaurant,Warehouse Store
0,Beckton,0.02381,0.02381,0.02381,0.02381,0.0,0.047619,0.02381,0.02381,0.047619,0.02381,0.0,0.047619,0.047619,0.0,0.02381,0.02381,0.0,0.02381,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.02381,0.02381,0.071429,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.02381,0.02381,0.02381,0.02381,0.02381,0.02381,0.0,0.02381,0.02381,0.0,0.02381,0.02381
1,Canning Town,0.05,0.0,0.05,0.05,0.0,0.05,0.0,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0
2,Custom House,0.05,0.0,0.05,0.05,0.0,0.05,0.0,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0
3,East Ham,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.117647,0.058824,0.0,0.0,0.058824,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824
4,Forest Gate,0.0,0.052632,0.0,0.0,0.0,0.0,0.105263,0.0,0.052632,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.105263,0.052632,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0
5,Little Ilford,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
6,Manor Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
7,Maryland,0.0,0.018868,0.018868,0.0,0.037736,0.018868,0.018868,0.0,0.037736,0.0,0.0,0.0,0.056604,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.037736,0.0,0.0,0.0,0.0,0.075472,0.018868,0.018868,0.018868,0.018868,0.0,0.0,0.018868,0.0,0.018868,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.018868,0.09434,0.018868,0.113208,0.0,0.056604,0.0,0.0,0.018868,0.0,0.0,0.018868,0.056604,0.0,0.018868,0.0,0.018868
8,North Woolwich,0.05,0.0,0.05,0.05,0.0,0.05,0.0,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0
9,Plaistow,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [34]:
newham_grouped.shape

(14, 62)
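
Taking the mean of one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. The pure-Python sketch below makes this explicit with `collections.Counter`. The venue pairs are made up, chosen only so that the proportions match the Plaistow row shown above.

```python
from collections import Counter, defaultdict

# hypothetical (neighborhood, category) pairs standing in for newham_venues
pairs = [
    ('Plaistow', 'Bus Station'),
    ('Plaistow', 'Bus Station'),
    ('Plaistow', 'Café'),
    ('Plaistow', 'Gym'),
    ('Plaistow', 'Grocery Store'),
]

# count how often each category occurs in each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in pairs:
    counts[neighborhood][category] += 1

# relative frequency = category count / total venues in the neighborhood
frequencies = {
    hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for hood, counter in counts.items()
}
print(frequencies['Plaistow'])
```

Two of the five venues are bus stations, giving the 0.4 seen in the grouped table, and each remaining category accounts for 0.2.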

#### Let's print each neighborhood along with the top 10 most common venues

In [35]:
num_top_venues = 10

for hood in newham_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = newham_grouped[newham_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Beckton----
                  venue  freq
0                 Hotel  0.07
1           Coffee Shop  0.05
2                  Café  0.05
3  Fast Food Restaurant  0.05
4        Clothing Store  0.05
5         Grocery Store  0.05
6           Bus Station  0.05
7            Steakhouse  0.02
8                  Pier  0.02
9              Gym Pool  0.02


----Canning Town----
                       venue  freq
0                      Hotel  0.15
1         Athletics & Sports  0.05
2                Coffee Shop  0.05
3         Italian Restaurant  0.05
4                       Pier  0.05
5       Gym / Fitness Center  0.05
6              Grocery Store  0.05
7  Middle Eastern Restaurant  0.05
8             Science Museum  0.05
9                      Diner  0.05


----Custom House----
                       venue  freq
0                      Hotel  0.15
1         Athletics & Sports  0.05
2                Coffee Shop  0.05
3         Italian Restaurant  0.05
4                       Pier  0.05
5       Gym /

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
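
As a quick check of what this function returns, here is a minimal sketch that applies it to a single toy row. The neighborhood name `Toyville` and its frequencies are hypothetical and are not taken from the Foursquare results.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the remaining categories
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: one neighborhood name followed by venue-category frequencies
toy_row = pd.Series(
    {"Neighborhood": "Toyville", "Café": 0.2, "Hotel": 0.5, "Pub": 0.3}
)

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

Categories with equal frequency can come out in either order, because `sort_values` does not use a stable sort by default. This may explain small differences between the printed lists and the table for tied categories.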

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = newham_grouped['Neighborhood']

for ind in np.arange(newham_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newham_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
1,Canning Town,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
2,Custom House,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
3,East Ham,Clothing Store,Warehouse Store,Sandwich Place,Grocery Store,Turkish Restaurant,Fast Food Restaurant,Electronics Store,Discount Store,Park,Pub
4,Forest Gate,Grocery Store,Fast Food Restaurant,Bus Stop,Pub,Comfort Food Restaurant,Chinese Restaurant,Indian Restaurant,Market,Fish & Chips Shop,Moving Target
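
The column names above rely on a three-item suffix list plus a fallback to "th", which is enough for a top 10. As a sketch only, a small helper (the name `ordinal` is hypothetical and not part of the notebook) could build the same column names and also handle larger values such as 21 or 112.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) always take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```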


<a id='item4'></a>

## 5. Cluster Neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [38]:
# set number of clusters
kclusters = 5

newham_grouped_clustering = newham_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newham_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 4, 4, 0, 0, 2, 2, 3, 4, 1], dtype=int32)
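
To illustrate what the labels mean, here is a minimal sketch of the same `KMeans` call on a tiny, made-up frequency table. The four rows and three venue categories are hypothetical, and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical venue-frequency rows (columns: Hotel, Pub, Grocery Store)
toy_frequencies = np.array([
    [0.90, 0.05, 0.05],  # hotel-heavy
    [0.85, 0.10, 0.05],  # hotel-heavy
    [0.05, 0.90, 0.05],  # pub-heavy
    [0.10, 0.85, 0.05],  # pub-heavy
])

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_frequencies)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

Rows with similar venue profiles receive the same label. The label numbers themselves are arbitrary identifiers, so cluster 0 is not "better" or "closer" than cluster 4.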

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

newham_merged = df_newham

# join neighborhoods_venues_sorted onto df_newham to add the cluster label and top venues to each location
newham_merged = newham_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Location')

newham_merged.head() # check the last columns!

Unnamed: 0,Location,Borough,Post_town,PostalCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Newham,"LONDON, BARKING",E6,51.53292,0.05461,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
1,Beckton,Newham,"LONDON, BARKING",E16,51.50913,0.01528,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
2,Beckton,Newham,"LONDON, BARKING",IG11,51.532674,0.085256,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
3,Canning Town,Newham,LONDON,E16,51.50913,0.01528,4,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
4,Custom House,Newham,LONDON,E16,51.50913,0.01528,4,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
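
The join above repeats the same cluster label and venue list for every postal-code row of a location, which is why Beckton appears three times. A minimal sketch with hypothetical toy frames shows this behavior, and also what happens to a location that has no matching neighborhood row.

```python
import pandas as pd

# Hypothetical location table: one row per (location, postal code) pair
toy_locations = pd.DataFrame({
    "Location": ["Beckton", "Beckton", "Plaistow", "Nowhere"],
    "PostalCode": ["E6", "E16", "E13", "X1"],
})

# Hypothetical per-neighborhood summary: one row per neighborhood
toy_sorted = pd.DataFrame({
    "Neighborhood": ["Beckton", "Plaistow"],
    "Cluster Labels": [0, 1],
})

# Match the 'Location' column against the 'Neighborhood' index (a left join)
toy_merged = toy_locations.join(toy_sorted.set_index("Neighborhood"), on="Location")
print(toy_merged)
```

A location without a match keeps its row but gets `NaN` in the joined columns, which also turns the label column into floats. If that happens in the real data, dropping those rows with `dropna(subset=['Cluster Labels'])` before plotting is one possible way to handle it.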


Finally, let's visualize the resulting clusters.

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(newham_merged['Latitude'], newham_merged['Longitude'], newham_merged['Location'], newham_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
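
In the color-scheme step of the cell above, `x` and `ys` are only used for the length of `ys`, which equals `kclusters`. As a sketch under that reading, the palette can be built directly from the number of clusters:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample kclusters evenly spaced colors from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# Convert each RGBA tuple to a hex string that folium accepts
rainbow = [colors.rgb2hex(c) for c in colors_array]
print(rainbow)
```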

<a id='item5'></a>

## 6. Examine Clusters

Now we can examine each cluster and determine the venue categories that distinguish it. Based on those categories, we can then assign a name to each cluster.
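
One possible way to find a distinguishing category, sketched here on hypothetical data rather than on `newham_grouped`, is to average the venue frequencies within each cluster and take the category with the highest mean.

```python
import pandas as pd

# Hypothetical frequency table with cluster labels already attached
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
    "Hotel": [0.6, 0.4, 0.0, 0.1],
    "Pub": [0.1, 0.2, 0.5, 0.7],
    "Grocery Store": [0.3, 0.4, 0.5, 0.2],
})

# Mean frequency of every venue category within each cluster
cluster_means = toy.drop(columns="Neighborhood").groupby("Cluster Labels").mean()

# Category with the highest mean frequency per cluster
top_category = cluster_means.idxmax(axis=1)
print(top_category)
```

Such a summary is only a starting point for naming a cluster. The per-cluster tables below remain the place to check the full top 10.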

#### Cluster 1

In [41]:
newham_merged.loc[newham_merged['Cluster Labels'] == 0, newham_merged.columns[[0] + list(range(7, newham_merged.shape[1]))]]

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
1,Beckton,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
2,Beckton,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
5,East Ham,Clothing Store,Warehouse Store,Sandwich Place,Grocery Store,Turkish Restaurant,Fast Food Restaurant,Electronics Store,Discount Store,Park,Pub
6,Forest Gate,Grocery Store,Fast Food Restaurant,Bus Stop,Pub,Comfort Food Restaurant,Chinese Restaurant,Indian Restaurant,Market,Fish & Chips Shop,Moving Target
14,Upton Park,Café,Grocery Store,Bus Station,Clothing Store,Warehouse Store,Gym,Turkish Restaurant,Fast Food Restaurant,Electronics Store,Discount Store
15,Upton Park,Café,Grocery Store,Bus Station,Clothing Store,Warehouse Store,Gym,Turkish Restaurant,Fast Food Restaurant,Electronics Store,Discount Store


#### Cluster 2

In [42]:
newham_merged.loc[newham_merged['Cluster Labels'] == 1, newham_merged.columns[[0] + list(range(7, newham_merged.shape[1]))]]

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Plaistow,Bus Station,Gym,Grocery Store,Café,Warehouse Store,Discount Store,Gym Pool,Gym / Fitness Center,General Entertainment,Gas Station


#### Cluster 3

In [43]:
newham_merged.loc[newham_merged['Cluster Labels'] == 2, newham_merged.columns[[0] + list(range(7, newham_merged.shape[1]))]]

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Little Ilford,Turkish Restaurant,Gym / Fitness Center,Pub,Gas Station,Restaurant,Fried Chicken Joint,Warehouse Store,Fast Food Restaurant,Doner Restaurant,Electronics Store
8,Manor Park,Turkish Restaurant,Gym / Fitness Center,Pub,Gas Station,Restaurant,Fried Chicken Joint,Warehouse Store,Fast Food Restaurant,Doner Restaurant,Electronics Store


#### Cluster 4

In [44]:
newham_merged.loc[newham_merged['Cluster Labels'] == 3, newham_merged.columns[[0] + list(range(7, newham_merged.shape[1]))]]

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Maryland,Pub,Platform,Hotel,Sandwich Place,Supermarket,Coffee Shop,Bookstore,Café,General Entertainment,Indoor Play Area
13,Stratford,Pub,Platform,Hotel,Sandwich Place,Supermarket,Coffee Shop,Bookstore,Café,General Entertainment,Indoor Play Area
16,West Ham,Pub,Platform,Hotel,Sandwich Place,Supermarket,Bus Station,Café,Coffee Shop,Bookstore,General Entertainment
17,West Ham,Pub,Platform,Hotel,Sandwich Place,Supermarket,Bus Station,Café,Coffee Shop,Bookstore,General Entertainment


#### Cluster 5

In [45]:
newham_merged.loc[newham_merged['Cluster Labels'] == 4, newham_merged.columns[[0] + list(range(7, newham_merged.shape[1]))]]

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Canning Town,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
4,Custom House,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
10,North Woolwich,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
12,Silvertown,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car


### 7. Compare two locations within Newham and choose the best neighborhood

Set the index of the dataset before taking inputs.

In [46]:
df_newham = newham_merged.set_index("Location",drop=True)
df_newham.head()

Location,Borough,Post_town,PostalCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Beckton,Newham,"LONDON, BARKING",E6,51.53292,0.05461,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
Beckton,Newham,"LONDON, BARKING",E16,51.50913,0.01528,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
Beckton,Newham,"LONDON, BARKING",IG11,51.532674,0.085256,0,Hotel,Coffee Shop,Bus Station,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Discount Store,Turkish Restaurant,Gym Pool
Canning Town,Newham,LONDON,E16,51.50913,0.01528,4,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car
Custom House,Newham,LONDON,E16,51.50913,0.01528,4,Hotel,Athletics & Sports,Scenic Lookout,Grocery Store,Italian Restaurant,Middle Eastern Restaurant,Diner,Coffee Shop,Café,Cable Car


**In the findings above, "Forest Gate" and "Maryland" were among the neighborhoods with Indian restaurants.
Let's compare these two neighborhoods.**

In [47]:
N1=input("Enter Neighborhood1 to compare: ")

Enter Neighborhood1 to compare: Forest Gate


In [48]:
N2=input("Enter Neighborhood2 to compare: ")

Enter Neighborhood2 to compare: Maryland


In [49]:
Neigh_comparison=df_newham.loc[[N1,N2]].T
Neigh_comparison

Location,Forest Gate,Maryland
Borough,Newham,Newham
Post_town,LONDON,LONDON
PostalCode,E7,E15
Latitude,51.5467,51.54
Longitude,0.02558,0.00289
Cluster Labels,0,3
1st Most Common Venue,Grocery Store,Pub
2nd Most Common Venue,Fast Food Restaurant,Platform
3rd Most Common Venue,Bus Stop,Hotel
4th Most Common Venue,Pub,Sandwich Place
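
To make the side-by-side table easier to read, the venue lists can be split into shared and unique categories. The sketch below uses the four venues shown above for each neighborhood; the variable names are illustrative only.

```python
# Top four venues for each neighborhood, copied from the comparison table above
venues_n1 = ["Grocery Store", "Fast Food Restaurant", "Bus Stop", "Pub"]  # Forest Gate
venues_n2 = ["Pub", "Platform", "Hotel", "Sandwich Place"]                # Maryland

# Keep list order so the ranking within each neighborhood is preserved
shared = [v for v in venues_n1 if v in venues_n2]
only_n1 = [v for v in venues_n1 if v not in venues_n2]
only_n2 = [v for v in venues_n2 if v not in venues_n1]

print("Shared:", shared)
print("Only in Forest Gate:", only_n1)
print("Only in Maryland:", only_n2)
```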


### Conclusion

The example above compares two neighborhoods, Forest Gate and Maryland, side by side by cluster label and most common venues.

From such a comparison, an individual can choose the neighborhood best suited to their needs.
