# Capstone Project - The Battle of the Neighborhoods (Analysis)
### Applied Data Science Capstone by IBM/Coursera

#### Welcome back to the fourth part of Capstone Project. Here, we will analyse the data and calculate the results to meet the customer requirements

 ## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data setup](#data)
* [London Map with area clusters](#map1)
* [Foursquare API setup](#foursq)
* [Generate venue list](#venuelist)
* [Generate common venues](#commonvenues)
* [Set customer parameters](#params)
* [Top areas](#topareas)
* [Hotel Recommendation](#hotels)
* [Footer Note](#footer)

# A Recommender System for Travel Consultants

### Introduction <a name="introduction"></a>

Our clients "Hoteliers Everywhere Inc." require us to generate a recommender system for cusomters staying in areas of London.
As discussed, we will be using the https://en.wikipedia.org/wiki/List_of_areas_of_London wikipedia page to fetch a list of areas in the city. 
We will use the Foursquare API to generate a list of common venues for each area.
We will then use the generated data to calculate the best optinum area and the list of Hotels within those areas recommended to the customers.
<br><br><hr><br>

### Data Setup <a name="data"></a>

###### Import required Libraries

In [1]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium 
from pandas.io.json import json_normalize

website_url = requests.get('https://en.wikipedia.org/wiki/List_of_areas_of_London').text
soup = BeautifulSoup(website_url,'lxml')
table = soup.find("table", { "class" : "wikitable sortable" })

In [2]:
table = soup.find("table", { "class" : "wikitable sortable" })

In [3]:
Location=[]
Borough=[]
Town=[]
Postcode=[]
mycounter = 0


for row in table.findAll("tr"):
    cells = row.findAll("td")
    #For each "tr", assign each "td" to a variable.
    if len(cells) == 6:
        Location.append(cells[0].find(text=True))
        Borough.append(cells[1].find(text=True))
        Town.append(cells[2].find(text=True))
        Postcode.append(cells[3].find(text=True).replace("\n", ""))
     

In [4]:
df=pd.DataFrame(Location,columns=['Location'])
df['Borough']=Borough
df['Town']=Town
df['Postcode']=Postcode
df.head(3)


Unnamed: 0,Location,Borough,Town,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon,CROYDON,CR0


In [5]:
df.dtypes

Location    object
Borough     object
Town        object
Postcode    object
dtype: object

In [6]:
df['Latitude'] = ''
df['Longitude'] = ''

In [8]:
#counter = 0
for i in df.index:
    #if counter <= 100:
    if df.at[i, 'Town'] == "Not assigned":
        #print("Replaced for ", df.at[i, 'Neighbourhood'],  "with", df.at[i, 'Borough'], "at index ", i )
        df.at[i, 'Town'] = df.at[i, 'Borough']
    address = df.at[i, 'Location'] + ', ' + df.at[i, 'Town']
    try:
        geolocator = Nominatim(user_agent="df_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        df.at[i, 'Latitude'] = latitude
        df.at[i, 'Longitude'] = longitude
    except:
        continue

#    counter = counter + 1    

    
    

In [9]:
df.shape
#df.head(5)

(533, 6)

In [11]:
# Let's take it over to a new dataframe. Preserve the original fetch if required to reanalyse 
neighborhoods = df

In [12]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 60 boroughs and 533 neighborhoods.


In [14]:
#Let's test latitude and longitude results for a single area

address = 'Bedford Park, LONDON'

geolocator = Nominatim(user_agent="df_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of ', address, ' are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of  Bedford Park, LONDON  are 51.49805155, -0.255735576874332.


In [16]:
#Cleanup the dataframe and then check type
neighborhoods["Borough"].replace("", np.nan, inplace=True)
neighborhoods["Latitude"].replace("", np.nan, inplace=True)
neighborhoods["Longitude"].replace("", np.nan, inplace=True)
neighborhoods.dropna(subset=["Latitude"], axis=0, inplace=True)
neighborhoods.dropna(subset=["Longitude"], axis=0, inplace=True)
neighborhoods.dropna(subset=["Borough"], axis=0, inplace=True)
neighborhoods.dropna(subset=["Town"], axis=0, inplace=True)
# reset index, because we droped rows where Borough are Not assigned
neighborhoods.reset_index(drop=True, inplace=True)


neighborhoods['Latitude'] = neighborhoods['Latitude'].astype(float)
neighborhoods['Longitude'] = neighborhoods['Longitude'].astype(float)

neighborhoods.dtypes

Location      object
Borough       object
Town          object
Postcode      object
Latitude     float64
Longitude    float64
dtype: object

In [None]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Location']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

![alt text](Capstone_Images_02_Analysis_01.JPG "InitialData")

<br><br><hr><br>
#### Good so far, now let's setup the Foursquare API <a name="foursq"></a>

In [18]:
CLIENT_ID = 'NGSXGLNDW5OZHJIE04Y5FBO3ADLN5ARR5JVSEKT3A5IG4BUV' # your Foursquare ID
CLIENT_SECRET = 'J4UE3C3UTQFV30F1BTG5A34WV3JQSCB2IL1K3II1THVOOF0W' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [19]:
neighborhoods.loc[0, 'Location']

'Abbey Wood'

In [20]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

                                           
neighborhood_name = neighborhoods.loc[0, 'Location'] # neighborhood name
neighborhood_name

'Abbey Wood'

In [21]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=NGSXGLNDW5OZHJIE04Y5FBO3ADLN5ARR5JVSEKT3A5IG4BUV&client_secret=J4UE3C3UTQFV30F1BTG5A34WV3JQSCB2IL1K3II1THVOOF0W&v=20180605&ll=51.487621,0.1140504&radius=500&limit=100'

In [22]:
results = requests.get(url).json()

In [23]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Co-op Food,Grocery Store,51.48765,0.11349
1,Bostal Gardens,Playground,51.48667,0.110462
2,Cheers Off License,Grocery Store,51.486808,0.107396
3,Abbey Wood Caravan Club,Campground,51.485502,0.120014


<br><br><hr><br>
### Explore Neighborhoods in London

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Running the above function on each neighborhood and create a new dataframe called *london_venues*. 
<a name="introduction"></a>

In [27]:
london_venues = getNearbyVenues(names=neighborhoods['Location'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Abbey Wood
Acton
Addington
Addiscombe
Albany Park
Aldgate
Aldwych
Alperton
Anerley
Angel
Aperfield
Archway
Ardleigh Green
Arkley
Arnos Grove
Balham
Bankside
Barbican
Barking
Barkingside
Barnehurst
Barnes
Barnet Gate
Barnet
Barnsbury
Battersea
Bayswater
Beckenham
Beckton
Becontree
Becontree Heath
Beddington
Bedford Park
Belgravia
Bellingham
Belmont
Belmont
Belsize Park
Belvedere
Bermondsey
Berrylands
Bethnal Green
Bexley
Bexleyheath
Bickley
Biggin Hill
Blackfen
Blackfriars
Blackheath
Blackheath Royal Standard
Blackwall
Blendon
Bloomsbury
Botany Bay
Bounds Green
Bow
Bowes Park
Brentford
Brent Cross
Brent Park
Brimsdown
Brixton
Brockley
Bromley
Bromley
Bromley Common
Brompton
Brondesbury
Brunswick Park
Bulls Cross
Burnt Oak
Burroughs, The
Camberwell
Cambridge Heath
Camden Town
Canary Wharf
Cann Hall
Canning Town
Canonbury
Carshalton
Castelnau
Castle Green
Catford
Chadwell Heath
Chalk Farm
Charing Cross
Charlton
Chase Cross
Cheam
Chelsea
Chelsfield
Chessington
Childs Hill
Chinatown
Chinbro

In [28]:
#print(london_venues.shape)
london_venues.head(20)
#london_venues

Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbey Wood,51.487621,0.11405,Co-op Food,51.48765,0.11349,Grocery Store
1,Abbey Wood,51.487621,0.11405,Bostal Gardens,51.48667,0.110462,Playground
2,Abbey Wood,51.487621,0.11405,Cheers Off License,51.486808,0.107396,Grocery Store
3,Abbey Wood,51.487621,0.11405,Abbey Wood Caravan Club,51.485502,0.120014,Campground
4,Acton,51.50814,-0.273261,London Star Hotel,51.509624,-0.272456,Hotel
5,Acton,51.50814,-0.273261,The Aeronaut,51.508376,-0.275216,Pub
6,Acton,51.50814,-0.273261,Dragonfly Brewery at George & Dragon,51.507378,-0.271702,Brewery
7,Acton,51.50814,-0.273261,Bake Me,51.508452,-0.268543,Creperie
8,Acton,51.50814,-0.273261,Everyone Active,51.506608,-0.266878,Gym / Fitness Center
9,Acton,51.50814,-0.273261,Amigo's Peri Peri,51.508396,-0.274561,Fast Food Restaurant


### Check venues  returned for each neighborhood

In [193]:
print('There are {} uniques categories.'.format(len(london_venues['Venue Category'].unique())))

There are 428 uniques categories.


<br><br><hr><br>
## <font color="black"> Analyse Each Neighborhood</font>

In [31]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Location'] = london_venues['Location'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Location,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Windmill,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,Abbey Wood,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Abbey Wood,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Abbey Wood,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Abbey Wood,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
london_onehot.shape

(12713, 429)

In [33]:
london_grouped = london_onehot.groupby('Location').mean().reset_index()


In [34]:
london_grouped.shape

(502, 429)

#### Let's print each neighborhood along with the top 3 most common venues <a name="commonvenues"></a>

In [40]:
# This section takes a lot of space. Displaying only 10 venues

num_top_venues = 5
DisplayRange = 0

for hood in london_grouped['Location']:
    if DisplayRange < 10:
        print("----"+hood+"----")
        temp = london_grouped[london_grouped['Location'] == hood].T.reset_index()
        temp.columns = ['venue','freq']
        temp = temp.iloc[1:]
        temp['freq'] = temp['freq'].astype(float)
        temp = temp.round({'freq': 2})
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        DisplayRange += 1
        print('\n')

----Abbey Wood----
                venue  freq
0       Grocery Store  0.50
1          Campground  0.25
2          Playground  0.25
3   Accessories Store  0.00
4  Persian Restaurant  0.00


----Acton----
                  venue  freq
0  Gym / Fitness Center  0.17
1                   Pub  0.17
2  Fast Food Restaurant  0.09
3   Japanese Restaurant  0.04
4               Brewery  0.04


----Addiscombe----
                  venue  freq
0                  Park  0.27
1         Grocery Store  0.18
2                Bakery  0.09
3    Chinese Restaurant  0.09
4  Fast Food Restaurant  0.09


----Albany Park----
                  venue  freq
0    Italian Restaurant  0.14
1                   Pub  0.14
2                   Bar  0.07
3                  Café  0.07
4  Fast Food Restaurant  0.07


----Aldgate----
               venue  freq
0              Hotel  0.11
1        Coffee Shop  0.11
2  Indian Restaurant  0.06
3                Pub  0.05
4       Cocktail Bar  0.04


----Aldwych----
                

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = london_grouped['Location']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Grocery Store,Playground,Campground,Zoo Exhibit,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
1,Acton,Gym / Fitness Center,Pub,Fast Food Restaurant,Creperie,Hotel,Grocery Store,Thai Restaurant,Sandwich Place,Shopping Mall,Cocktail Bar
2,Addiscombe,Park,Grocery Store,Café,Pub,Cosmetics Shop,Chinese Restaurant,Fast Food Restaurant,Bakery,Fish & Chips Shop,Exhibit
3,Albany Park,Italian Restaurant,Pub,Café,Beer Bar,Train Station,Grocery Store,Fast Food Restaurant,Coffee Shop,Bar,Hotel
4,Aldgate,Hotel,Coffee Shop,Indian Restaurant,Pub,Cocktail Bar,Middle Eastern Restaurant,Salad Place,Thai Restaurant,Gym / Fitness Center,Japanese Restaurant


In [43]:
from sklearn.cluster import KMeans

In [44]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Location', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 2, 0, 0, 0, 0, 1, 0, 0])

In [45]:
london_data = neighborhoods[neighborhoods['Borough'] != 'Dummy'].reset_index(drop=True) #Get all list
london_data.head()

Unnamed: 0,Location,Borough,Town,Postcode,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,51.487621,0.11405
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.50814,-0.273261
2,Addington,Croydon,CROYDON,CR0,44.42064,-76.978248
3,Addiscombe,Croydon,CROYDON,CR0,51.379692,-0.074282
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.434017,0.1032


In [46]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = london_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

london_merged.head() # check the last columns!

Unnamed: 0,Location,Borough,Town,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,51.487621,0.11405,3.0,Grocery Store,Playground,Campground,Zoo Exhibit,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.50814,-0.273261,0.0,Gym / Fitness Center,Pub,Fast Food Restaurant,Creperie,Hotel,Grocery Store,Thai Restaurant,Sandwich Place,Shopping Mall,Cocktail Bar
2,Addington,Croydon,CROYDON,CR0,44.42064,-76.978248,,,,,,,,,,,
3,Addiscombe,Croydon,CROYDON,CR0,51.379692,-0.074282,2.0,Park,Grocery Store,Café,Pub,Cosmetics Shop,Chinese Restaurant,Fast Food Restaurant,Bakery,Fish & Chips Shop,Exhibit
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.434017,0.1032,0.0,Italian Restaurant,Pub,Café,Beer Bar,Train Station,Grocery Store,Fast Food Restaurant,Coffee Shop,Bar,Hotel


In [47]:
#neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = london_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

london_merged.head() # check the last columns!

Unnamed: 0,Location,Borough,Town,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,51.487621,0.11405,3.0,Grocery Store,Playground,Campground,Zoo Exhibit,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.50814,-0.273261,0.0,Gym / Fitness Center,Pub,Fast Food Restaurant,Creperie,Hotel,Grocery Store,Thai Restaurant,Sandwich Place,Shopping Mall,Cocktail Bar
2,Addington,Croydon,CROYDON,CR0,44.42064,-76.978248,,,,,,,,,,,
3,Addiscombe,Croydon,CROYDON,CR0,51.379692,-0.074282,2.0,Park,Grocery Store,Café,Pub,Cosmetics Shop,Chinese Restaurant,Fast Food Restaurant,Bakery,Fish & Chips Shop,Exhibit
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.434017,0.1032,0.0,Italian Restaurant,Pub,Café,Beer Bar,Train Station,Grocery Store,Fast Food Restaurant,Coffee Shop,Bar,Hotel


In [48]:
import matplotlib.cm as cm
import matplotlib.colors as colors


In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Location'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=rainbow[cluster-1],
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

![alt text](Capstone_Images_02_Analysis_02.JPG "Cluster")

<br><br><hr><br>

##### <font color="blue"> Great, we now have the data setup complete. Everything from here is re-runnable to regenerate different results after changing parameters.</font> 

#### <font color="black"> We will first create a list of parameters that can be used here.</font> 

In [222]:
neighborhoods_venues_sorted.columns = ['ClusterLabels','Location','First','Second','Third','Fourth', 'Fifth', 'Sixth', 'Seventh', 'Eighth', 'Ninth', 'Tenth']
my_list = london_venues["Venue Category"].values
uniqueVals = np.unique(my_list)
uniqueVals

array(['Accessories Store', 'Adult Boutique', 'Afghan Restaurant',
       'African Restaurant', 'Airport Lounge', 'Airport Terminal',
       'American Restaurant', 'Antique Shop', 'Aquarium', 'Arcade',
       'Arepa Restaurant', 'Argentinian Restaurant', 'Art Gallery',
       'Art Museum', 'Arts & Crafts Store', 'Arts & Entertainment',
       'Asian Restaurant', 'Athletics & Sports', 'Australian Restaurant',
       'Austrian Restaurant', 'Auto Dealership', 'Auto Garage',
       'Auto Workshop', 'Automotive Shop', 'BBQ Joint', 'Baby Store',
       'Badminton Court', 'Bagel Shop', 'Bakery', 'Bar', 'Baseball Field',
       'Bathing Area', 'Beach', 'Bed & Breakfast', 'Beer Bar',
       'Beer Garden', 'Beer Store', 'Belgian Restaurant',
       'Bike Rental / Bike Share', 'Bike Shop', 'Bistro',
       'Boarding House', 'Boat or Ferry', 'Bookstore', 'Border Crossing',
       'Botanical Garden', 'Boutique', 'Bowling Alley', 'Boxing Gym',
       'Brazilian Restaurant', 'Breakfast Spot', 'Brewer

#### <font color="black"> Here's a list of 5 parameters used. </font>  <a name="params"></a>

In [251]:
ClientParameter1 = 'Italian Restaurant'
ClientParameter2 = 'Karaoke Bar'
ClientParameter3 = 'Coffee Shop'
ClientParameter4 = 'Shopping Mall'
ClientParameter5 = 'Bakery'


###### Collect results by venue for each client parameter.

In [252]:

df_collect = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == 'DummyEntry')]
                                             
df_mysorted1 = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == ClientParameter1) 
                                              | (neighborhoods_venues_sorted['Second'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Third'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Fourth'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Fifth'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Sixth'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Seventh'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Eighth'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Ninth'] == ClientParameter1)
                                              | (neighborhoods_venues_sorted['Tenth'] == ClientParameter1)]

df_mysorted2 = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == ClientParameter2) 
                                              | (neighborhoods_venues_sorted['Second'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Third'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Fourth'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Fifth'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Sixth'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Seventh'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Eighth'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Ninth'] == ClientParameter2)
                                              | (neighborhoods_venues_sorted['Tenth'] == ClientParameter2)]

df_mysorted3 = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == ClientParameter3) 
                                              | (neighborhoods_venues_sorted['Second'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Third'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Fourth'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Fifth'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Sixth'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Seventh'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Eighth'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Ninth'] == ClientParameter3)
                                              | (neighborhoods_venues_sorted['Tenth'] == ClientParameter3)]

df_mysorted4 = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == ClientParameter4) 
                                              | (neighborhoods_venues_sorted['Second'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Third'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Fourth'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Fifth'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Sixth'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Seventh'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Eighth'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Ninth'] == ClientParameter4)
                                              | (neighborhoods_venues_sorted['Tenth'] == ClientParameter4)]

df_mysorted5 = neighborhoods_venues_sorted.loc[(neighborhoods_venues_sorted['First'] == ClientParameter5) 
                                              | (neighborhoods_venues_sorted['Second'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Third'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Fourth'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Fifth'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Sixth'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Seventh'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Eighth'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Ninth'] == ClientParameter5)
                                              | (neighborhoods_venues_sorted['Tenth'] == ClientParameter5)]


df_collect = pd.concat([df_mysorted1, df_mysorted2, df_mysorted3, df_mysorted4, df_mysorted5])



###### Add a new column to the df_collect dataframe and add values to it. 
###### 1st Most Common Venue will add 10 to the Location's score, 2nd Most Common Venue will add 9 to the score and so on

In [253]:

df_collect['Score'] = 0

df_collect['Score'] = np.where(df_collect['First'] == ClientParameter1, df_collect['Score'] + 10, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Second'] == ClientParameter1, df_collect['Score'] + 9, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Third'] == ClientParameter1, df_collect['Score'] + 8, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fourth'] == ClientParameter1, df_collect['Score'] + 7, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fifth'] == ClientParameter1, df_collect['Score'] + 6, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Sixth'] == ClientParameter1, df_collect['Score'] + 5, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Seventh'] == ClientParameter1, df_collect['Score'] + 4, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Eighth'] == ClientParameter1, df_collect['Score'] + 3, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Ninth'] == ClientParameter1, df_collect['Score'] + 2, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Tenth'] == ClientParameter1, df_collect['Score'] + 1, df_collect['Score'])

df_collect['Score'] = np.where(df_collect['First'] == ClientParameter2, df_collect['Score'] + 10, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Second'] == ClientParameter2, df_collect['Score'] + 9, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Third'] == ClientParameter2, df_collect['Score'] + 8, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fourth'] == ClientParameter2, df_collect['Score'] + 7, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fifth'] == ClientParameter2, df_collect['Score'] + 6, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Sixth'] == ClientParameter2, df_collect['Score'] + 5, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Seventh'] == ClientParameter2, df_collect['Score'] + 4, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Eighth'] == ClientParameter2, df_collect['Score'] + 3, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Ninth'] == ClientParameter2, df_collect['Score'] + 2, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Tenth'] == ClientParameter2, df_collect['Score'] + 1, df_collect['Score'])

df_collect['Score'] = np.where(df_collect['First'] == ClientParameter3, df_collect['Score'] + 10, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Second'] == ClientParameter3, df_collect['Score'] + 9, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Third'] == ClientParameter3, df_collect['Score'] + 8, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fourth'] == ClientParameter3, df_collect['Score'] + 7, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fifth'] == ClientParameter3, df_collect['Score'] + 6, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Sixth'] == ClientParameter3, df_collect['Score'] + 5, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Seventh'] == ClientParameter3, df_collect['Score'] + 4, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Eighth'] == ClientParameter3, df_collect['Score'] + 3, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Ninth'] == ClientParameter3, df_collect['Score'] + 2, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Tenth'] == ClientParameter3, df_collect['Score'] + 1, df_collect['Score'])

df_collect['Score'] = np.where(df_collect['First'] == ClientParameter4, df_collect['Score'] + 10, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Second'] == ClientParameter4, df_collect['Score'] + 9, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Third'] == ClientParameter4, df_collect['Score'] + 8, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fourth'] == ClientParameter4, df_collect['Score'] + 7, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fifth'] == ClientParameter4, df_collect['Score'] + 6, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Sixth'] == ClientParameter4, df_collect['Score'] + 5, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Seventh'] == ClientParameter4, df_collect['Score'] + 4, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Eighth'] == ClientParameter4, df_collect['Score'] + 3, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Ninth'] == ClientParameter4, df_collect['Score'] + 2, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Tenth'] == ClientParameter4, df_collect['Score'] + 1, df_collect['Score'])

df_collect['Score'] = np.where(df_collect['First'] == ClientParameter5, df_collect['Score'] + 10, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Second'] == ClientParameter5, df_collect['Score'] + 9, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Third'] == ClientParameter5, df_collect['Score'] + 8, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fourth'] == ClientParameter5, df_collect['Score'] + 7, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Fifth'] == ClientParameter5, df_collect['Score'] + 6, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Sixth'] == ClientParameter5, df_collect['Score'] + 5, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Seventh'] == ClientParameter5, df_collect['Score'] + 4, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Eighth'] == ClientParameter5, df_collect['Score'] + 3, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Ninth'] == ClientParameter5, df_collect['Score'] + 2, df_collect['Score'])
df_collect['Score'] = np.where(df_collect['Tenth'] == ClientParameter5, df_collect['Score'] + 1, df_collect['Score'])



###### Now that the results are collected, let's sort the results to descending values of Scores. 


In [254]:
df_final_sorted.sort_values('Score', ascending=False).groupby('Location').head()

Unnamed: 0,ClusterLabels,Location,First,Second,Third,Fourth,Fifth,Sixth,Seventh,Eighth,Ninth,Tenth,Score
443,0,Turnpike Lane,Coffee Shop,Fast Food Restaurant,Clothing Store,Bus Stop,Optical Shop,Bakery,Supermarket,Italian Restaurant,Pharmacy,Pub,27
26,0,Beckenham,Italian Restaurant,Coffee Shop,Indian Restaurant,Fast Food Restaurant,Mediterranean Restaurant,Café,Irish Pub,Pub,Supermarket,Grocery Store,26
454,0,Uxbridge,Coffee Shop,Italian Restaurant,Clothing Store,Fast Food Restaurant,Pub,Gym,Sandwich Place,Bookstore,Grocery Store,Pharmacy,26
92,0,Chiswick,Italian Restaurant,Pub,Coffee Shop,Bakery,Café,Bookstore,Park,Sushi Restaurant,Vietnamese Restaurant,Pizza Place,25
90,0,Chingford,Italian Restaurant,Pub,Coffee Shop,Bakery,Park,Convenience Store,Grocery Store,Indian Restaurant,Sandwich Place,Café,25
142,0,Edgware,Sushi Restaurant,Coffee Shop,Italian Restaurant,Supermarket,Bakery,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Café,Fast Food Restaurant,Bookstore,25
300,0,New Malden,Korean Restaurant,Grocery Store,Fast Food Restaurant,Coffee Shop,Supermarket,Italian Restaurant,Bakery,Sandwich Place,Bookstore,Gastropub,24
161,0,Fitzrovia,Coffee Shop,Italian Restaurant,Café,Hotel,Restaurant,Bakery,French Restaurant,Pizza Place,Hotel Bar,Burger Joint,24
337,0,Petersham,Pub,Coffee Shop,Italian Restaurant,Bakery,Grocery Store,Café,Thai Restaurant,Theater,Sushi Restaurant,Sandwich Place,24
381,0,Soho,Theater,Italian Restaurant,Coffee Shop,Bakery,Ice Cream Shop,Cocktail Bar,Restaurant,Lounge,Seafood Restaurant,Japanese Restaurant,24


###### Remove duplicates from the results if any and get top 10 preferred areas

In [255]:
df_collect.drop_duplicates(subset ="Location", 
                     keep = "first", inplace = True) 
df_final_sorted = df_collect.sort_values(by=['Score'], ascending = False)
PreferredArea = df_final_sorted.iloc[0,1]
PreferredArea2 = df_final_sorted.iloc[1,1]
PreferredArea3 = df_final_sorted.iloc[2,1]
PreferredArea4 = df_final_sorted.iloc[3,1]
PreferredArea5 = df_final_sorted.iloc[4,1]
PreferredArea6 = df_final_sorted.iloc[5,1]
PreferredArea7 = df_final_sorted.iloc[6,1]
PreferredArea8 = df_final_sorted.iloc[7,1]
PreferredArea9 = df_final_sorted.iloc[8,1]
PreferredArea10 = df_final_sorted.iloc[9,1]



print("Preferred areas of interest (in order) are ", PreferredArea, ",", PreferredArea2, \
      ",", PreferredArea3, ",", PreferredArea4, ",", PreferredArea5, \
      ",", PreferredArea6, ",", PreferredArea7, ",", PreferredArea8, \
      ",", PreferredArea9, ",", PreferredArea10)


Preferred areas of interest (in order) are  Chiswick , Chingford , Soho , Fitzrovia , Petersham , Canary Wharf , Edgware , Parsons Green , Swiss Cottage , Fulham


In [256]:
%matplotlib inline 

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') # optional: for ggplot-like style

# check for latest version of Matplotlib
print ('Matplotlib version: ', mpl.__version__) # >= 2.0.0

Matplotlib version:  3.0.2


<br><br><hr><br>
##### Let's visualise the top 10 results  <a name="topareas"></a>

In [None]:
df_final_sorted[0:10].plot(x='Location', y='Score', figsize=(15,7), grid=True, kind='bar')

![alt text](Capstone_Images_02_Analysis_03.JPG "TopAreas")

###### Super, so we have a list of 10 optimum areas for the customers to stay (based on the 5 parameters selected)
###### Now, let's get a list of Hotels in these areas <a name="hotels"></a>

In [263]:
df_hotels1 = london_venues.loc[(london_venues['Location'] == PreferredArea)  & (london_venues['Venue Category'] == 'Hotel')]
df_hotels2 = london_venues.loc[(london_venues['Location'] == PreferredArea2) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels3 = london_venues.loc[(london_venues['Location'] == PreferredArea3) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels4 = london_venues.loc[(london_venues['Location'] == PreferredArea4) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels5 = london_venues.loc[(london_venues['Location'] == PreferredArea5) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels6 = london_venues.loc[(london_venues['Location'] == PreferredArea6)  & (london_venues['Venue Category'] == 'Hotel')]
df_hotels7 = london_venues.loc[(london_venues['Location'] == PreferredArea7) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels8 = london_venues.loc[(london_venues['Location'] == PreferredArea8) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels9 = london_venues.loc[(london_venues['Location'] == PreferredArea9) & (london_venues['Venue Category'] == 'Hotel')]
df_hotels10 = london_venues.loc[(london_venues['Location'] == PreferredArea10) & (london_venues['Venue Category'] == 'Hotel')]

df_hotels_list = pd.concat([df_hotels1, df_hotels2, df_hotels3, df_hotels4, df_hotels5, df_hotels6, df_hotels7, \
                            df_hotels8, df_hotels9, df_hotels10])
df_hotels_list[['Location', 'Venue', 'Venue Category']]

Unnamed: 0,Location,Venue,Venue Category
9783,Soho,Dean Street Townhouse,Hotel
9795,Soho,Soho Hotel,Hotel
4473,Fitzrovia,The Langham,Hotel
4495,Fitzrovia,Sanderson Hotel,Hotel
4529,Fitzrovia,The London Edition,Hotel
4553,Fitzrovia,Charlotte Street Hotel,Hotel
8847,Petersham,Travelodge,Hotel
2052,Canary Wharf,Novotel London Canary Wharf,Hotel
2084,Canary Wharf,Hilton London Canary Wharf,Hotel
11094,Swiss Cottage,London Marriott Hotel Regents Park,Hotel


##### Let's plot the area for these Hotels

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)] * 10
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_hotels_list['Venue Latitude'], df_hotels_list['Venue Longitude'], df_hotels_list['Location'], ys):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=rainbow[cluster-1],
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

![alt text](Capstone_Images_02_Analysis_04.JPG "TopHotels")

* [Back to customer parameters](#params)

<br><br><hr><br><br>
###### Footnote: This concludes the data analysis for this project. Thank you for reviewing my work <a name="footer"></a>