# IBM Applied Data Science Capstone Course by Coursera
## Week 5 Final Report
### Opening a New Shopping Mall in Liverpool, UK

<ul>
<li> Build a DataFrame of neighborhoods in Liverpool by scraping the Wikipedia category page. </li>
<li> Get the geographical coordinates of the neighborhoods. </li>
<li> Obtain the venue data for the neighborhoods from the Foursquare API. </li>
<li> Explore and cluster the neighborhoods. </li>
<li> Select the best cluster in which to open a new shopping mall. </li>
</ul>

### 1. Import libraries

In [1]:

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from Wikipedia into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Areas_of_Liverpool").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)
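The loop above relies on Wikipedia category pages listing each member as an `<li>` inside a `<div class="mw-category">`. To check that assumption offline without BeautifulSoup, here is a stdlib-only sketch using `html.parser.HTMLParser` that collects the `<li>` texts of the first such div, mirroring the `[0]` index. `CategoryListParser` is a hypothetical helper, the HTML snippet is made up rather than copied from the live page, and unlike `row.text` the sketch strips surrounding whitespace.

```python
from html.parser import HTMLParser

class CategoryListParser(HTMLParser):
    """Collect the text of every <li> inside the first <div class="mw-category">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside the target div (counts nested divs)
        self.done = False    # True once the first mw-category div has closed
        self.in_li = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth:
                self.div_depth += 1
            elif "mw-category" in classes and not self.done:
                self.div_depth = 1
        elif tag == "li" and self.div_depth:
            self.in_li = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.done = True
        elif tag == "li" and self.in_li:
            self.items.append("".join(self._buf).strip())
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self._buf.append(data)

# made-up snippet shaped like a category page (not copied from Wikipedia)
html = """
<div id="mw-pages">
  <div class="mw-category mw-category-columns">
    <div class="mw-category-group"><h3>A</h3>
      <ul><li><a href="/wiki/Aigburth">Aigburth</a></li>
          <li><a href="/wiki/Allerton,_Liverpool">Allerton, Liverpool</a></li></ul>
    </div>
  </div>
</div>
<div class="mw-category"><ul><li>Not collected</li></ul></div>
"""

parser = CategoryListParser()
parser.feed(html)
print(parser.items)  # ['Aigburth', 'Allerton, Liverpool']
```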

In [6]:
# create a new DataFrame from the list
Lpool_df = pd.DataFrame({"Neighborhood": neighborhoodList})

Lpool_df.head()

| | Neighborhood |
|---|---|
| 0 | Aigburth |
| 1 | Allerton, Liverpool |
| 2 | Anfield (suburb) |
| 3 | Belle Vale, Liverpool |
| 4 | Broadgreen |


In [7]:
# print the number of rows of the dataframe
Lpool_df.shape

(41, 1)
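Several scraped names carry Wikipedia disambiguation text, such as "Allerton, Liverpool" or "Anfield (suburb)", so the geocoding query in the next step becomes, for example, "Allerton, Liverpool, Liverpool, United Kingdom". A minimal sketch of a cleaning step, assuming that dropping a trailing ", Liverpool" and any trailing parenthetical yields a cleaner geocoder query; `clean_name` is a hypothetical helper, and its effect on the ArcGIS results has not been checked here.

```python
import re

def clean_name(name):
    """Hypothetical helper: strip Wikipedia disambiguation text from a neighborhood title."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name)   # drop a trailing "(suburb)", "(district)", ...
    name = re.sub(r",\s*Liverpool\s*$", "", name)  # drop a trailing ", Liverpool"
    return name.strip()

for raw in ["Aigburth", "Allerton, Liverpool", "Anfield (suburb)", "Sefton Park (district)"]:
    print(raw, "->", clean_name(raw))
```

Applied as `Lpool_df["Neighborhood"].map(clean_name)`, it would give a separate query column while keeping the original titles for display.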

### 3. Get the geographical coordinates

In [8]:
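import time

def get_latlng(neighborhood, max_retries=5):
    # query ArcGIS for the neighborhood; retry a bounded number of times instead of looping forever
    for _ in range(max_retries):
        g = geocoder.arcgis('{}, Liverpool, United Kingdom'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # brief pause before retrying
    return [None, None]  # give up; pandas stores these as missing values (NaN)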

In [9]:
coordinates = [ get_latlng(neighborhood) for neighborhood in Lpool_df["Neighborhood"].tolist() ]

In [10]:
coordinates

[[53.36806000000007, -2.9236399999999776],
 [53.388737202074076, -2.913576192936634],
 [53.430540000000065, -2.947469999999953],
 [53.38510325184009, -2.857563741784356],
 [53.40820000000008, -2.8972799999999665],
 [53.397890000000075, -2.9664399999999773],
 [53.39581000000004, -2.8892499999999472],
 [53.39996000000008, -2.976189999999974],
 [53.40575000000007, -2.9921599999999557],
 [53.43463000000003, -2.9336399999999685],
 [53.461820000000046, -2.895369999999957],
 [53.3776525721108, -2.956362538735448],
 [53.41879004538934, -2.8776599107671785],
 [53.40364000000005, -2.948479999999961],
 [53.43134512995583, -2.971376568461553],
 [53.41694418482603, -2.9351098942268266],
 [53.469100000000026, -2.915269999999964],
 [53.41005000000007, -2.9783899999999335],
 [53.38330000000008, -2.8621199999999476],
 [53.456692622795565, -2.901452331680126],
 [53.355508736353634, -2.900061232137948],
 [53.35987000000006, -2.856179999999938],
 [53.41201950839983, -2.9503748296923713],
 [53.430350051927

In [11]:
df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
Lpool_df['Latitude'] = df_coordinates['Latitude']
Lpool_df['Longitude'] = df_coordinates['Longitude']

In [13]:
print(Lpool_df.shape)
Lpool_df

(41, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Aigburth | 53.36806 | -2.92364 |
| 1 | Allerton, Liverpool | 53.388737 | -2.913576 |
| 2 | Anfield (suburb) | 53.43054 | -2.94747 |
| 3 | Belle Vale, Liverpool | 53.385103 | -2.857564 |
| 4 | Broadgreen | 53.4082 | -2.89728 |
| 5 | Canning, Liverpool | 53.39789 | -2.96644 |
| 6 | Childwall | 53.39581 | -2.88925 |
| 7 | Chinatown, Liverpool | 53.39996 | -2.97619 |
| 8 | Liverpool city centre | 53.40575 | -2.99216 |
| 9 | Clubmoor | 53.43463 | -2.93364 |


In [14]:
# save the DataFrame as CSV file
Lpool_df.to_csv("Lpool_df.csv", index=False)

### 4. Create a map of Liverpool with neighborhoods 

In [15]:
# Get the coordinates of Liverpool
address = 'Liverpool, United Kingdom'

geolocator = Nominatim(user_agent="my-app")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Liverpool, United Kingdom are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Liverpool, United Kingdom are 53.407154, -2.991665.


In [16]:
# create map of Liverpool using latitude and longitude values
map_Lpool = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(Lpool_df['Latitude'], Lpool_df['Longitude'], Lpool_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Lpool)  
    
map_Lpool

In [17]:
# save the map as HTML file
map_Lpool.save('map_Lpool.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [18]:
# define Foursquare credentials and API version (replace the placeholders with your own keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (never print or commit it)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


#### Let's get up to 100 venues within a radius of 2000 meters of each neighborhood.

In [19]:
radius = 2000  # search radius in meters
LIMIT = 100    # maximum number of venues returned per request

venues = []

for lat, lng, neighborhood in zip(Lpool_df['Latitude'], Lpool_df['Longitude'], Lpool_df['Neighborhood']):

    # build the API request; requests encodes the query parameters into the URL
    url = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": LIMIT,
    }

    # make the GET request and fail loudly on HTTP errors
    response = requests.get(url, params=params)
    response.raise_for_status()
    results = response.json()["response"]["groups"][0]["items"]

    # keep only the relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
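The explore request packs the credentials, API version, coordinates, radius and limit into the URL's query string. A stdlib sketch of building that URL with `urllib.parse.urlencode`, which percent-encodes each value (the comma in `ll` becomes `%2C`), much like passing a parameter dictionary to `requests.get`; parsing the URL back serves as a check. `build_explore_url` is a hypothetical helper and the credential values are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: build the explore request URL with encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)

url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        53.36806, -2.92364, 2000, 100)
print(url)

# round-trip check: parse the query string back into its parameters
query = parse_qs(urlparse(url).query)
print(query["ll"], query["radius"])  # ['53.36806,-2.92364'] ['2000']
```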

In [20]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2059, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Aigburth | 53.36806 | -2.92364 | Otterspool Promenade | 53.362505 | -2.931786 | Other Great Outdoors |
| 1 | Aigburth | 53.36806 | -2.92364 | Mossley Hill Athletics Club | 53.374798 | -2.919895 | Athletics & Sports |
| 2 | Aigburth | 53.36806 | -2.92364 | Sefton Park | 53.381713 | -2.936611 | Park |
| 3 | Aigburth | 53.36806 | -2.92364 | Childhood Home of Paul McCartney | 53.369586 | -2.897883 | Historic Site |
| 4 | Aigburth | 53.36806 | -2.92364 | The Palm House | 53.381339 | -2.935269 | Botanical Garden |


### Check how many venues were returned for each neighborhood

In [21]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Aigburth | 31 | 31 | 31 | 31 | 31 | 31 |
| Allerton, Liverpool | 64 | 64 | 64 | 64 | 64 | 64 |
| Anfield (suburb) | 45 | 45 | 45 | 45 | 45 | 45 |
| Belle Vale, Liverpool | 24 | 24 | 24 | 24 | 24 | 24 |
| Broadgreen | 47 | 47 | 47 | 47 | 47 | 47 |
| Canning, Liverpool | 100 | 100 | 100 | 100 | 100 | 100 |
| Childwall | 44 | 44 | 44 | 44 | 44 | 44 |
| Chinatown, Liverpool | 100 | 100 | 100 | 100 | 100 | 100 |
| Clubmoor | 29 | 29 | 29 | 29 | 29 | 29 |
| Croxteth | 12 | 12 | 12 | 12 | 12 | 12 |
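With a 2000 m search radius, two neighborhoods whose geocoded centers lie less than 4000 m apart have overlapping search circles, so the same venue can be counted for both. A minimal sketch to check this for Canning and Chinatown (coordinates from the tables above), using the haversine great-circle formula with an assumed mean Earth radius of 6,371 km:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

radius = 2000  # same search radius as the Foursquare request
canning = (53.39789, -2.96644)
chinatown = (53.39996, -2.97619)

d = haversine_m(*canning, *chinatown)
print("Canning to Chinatown: {:.0f} m".format(d))
print("Search circles overlap:", d < 2 * radius)
```

The two centers come out well under 1 km apart, and both neighborhoods hit the 100-venue limit, which suggests that much of their venue data overlaps.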


### Let's find out how many unique categories there are among all the returned venues

In [22]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 141 unique categories.


In [23]:
# print the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Other Great Outdoors', 'Athletics & Sports', 'Park',
       'Historic Site', 'Botanical Garden', 'Steakhouse',
       'Fast Food Restaurant', 'English Restaurant', 'Grocery Store',
       'Indian Restaurant', 'Cricket Ground', 'Gym / Fitness Center',
       'Café', 'Pharmacy', 'Discount Store', 'Hotel', 'Sandwich Place',
       'Gastropub', 'Supermarket', 'Pub', 'Coffee Shop',
       'Outdoor Sculpture', 'Tennis Court', 'Tapas Restaurant',
       'Mexican Restaurant', 'Pizza Place', 'Road', 'Thai Restaurant',
       'Beer Bar', 'Bar', 'Pool', 'Cocktail Bar', 'Gym', 'Bookstore',
       'Gas Station', 'Playground', 'Restaurant', 'Chinese Restaurant',
       'Convenience Store', 'Pet Store', 'Art Museum', 'Platform',
       'Soccer Stadium', 'Souvenir Shop', 'Museum', 'Stadium',
       'Music Venue', 'Furniture / Home Store', 'Sporting Goods Shop',
       'Warehouse Store'], dtype=object)

In [24]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [25]:
# One Hot Encoding
Lpool_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Lpool_onehot['Neighborhoods'] = venues_df['Neighborhood']

fixed_columns = [Lpool_onehot.columns[-1]] + list(Lpool_onehot.columns[:-1])
Lpool_onehot = Lpool_onehot[fixed_columns]

print(Lpool_onehot.shape)
Lpool_onehot.head()

(2059, 142)


Unnamed: 0,Neighborhoods,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Brazilian Restaurant,Brewery,Burger Joint,Burrito Place,Bus Stop,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,Comic Shop,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Diner,Discount Store,Dive Bar,Donut Shop,Duty-free Shop,Eastern European Restaurant,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Go Kart Track,Golf Course,Golf Driving Range,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hostel,Hotel,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Liquor Store,Lounge,Malay Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Racecourse,Restaurant,Road,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Warehouse Store,Waterfront,Wine Bar
0,Aigburth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Aigburth,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aigburth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aigburth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aigburth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Group rows by neighborhood, taking the mean frequency of occurrence of each category

In [26]:
Lpool_grouped = Lpool_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(Lpool_grouped.shape)
Lpool_grouped

(41, 142)


Unnamed: 0,Neighborhoods,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Bowling Alley,Brazilian Restaurant,Brewery,Burger Joint,Burrito Place,Bus Stop,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,Comic Shop,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Diner,Discount Store,Dive Bar,Donut Shop,Duty-free Shop,Eastern European Restaurant,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Go Kart Track,Golf Course,Golf Driving Range,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hostel,Hotel,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Liquor Store,Lounge,Malay Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Park,Pedestrian Plaza,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Racecourse,Restaurant,Road,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Soccer Field,Soccer Stadium,Souvenir Shop,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Warehouse Store,Waterfront,Wine Bar
0,Aigburth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.096774,0.0,0.064516,0.0,0.0,0.0,0.064516,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Allerton, Liverpool",0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.03125,0.015625,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.015625,0.0,0.0,0.0,0.015625,0.109375,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.015625,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.09375,0.015625,0.03125,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.015625,0.015625,0.03125,0.015625,0.015625,0.0,0.015625,0.140625,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046875,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Anfield (suburb),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.022222,0.088889,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.177778,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.088889,0.022222,0.0,0.044444,0.0,0.022222,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0
3,"Belle Vale, Liverpool",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.208333,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0
4,Broadgreen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.042553,0.0,0.042553,0.0,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.106383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12766,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.021277,0.021277,0.0,0.0,0.0,0.12766,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Canning, Liverpool",0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.1,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.02,0.09,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.05,0.01,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.08,0.0,0.02,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0
6,Childwall,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.113636,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.113636,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.022727,0.022727,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chinatown, Liverpool",0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.07,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.02,0.1,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.05,0.01,0.01,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.07,0.0,0.02,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0
8,Clubmoor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.206897,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.034483,0.0,0.0,0.0,0.034483,0.034483,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0
9,Croxteth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
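Averaging the 0/1 one-hot columns within a neighborhood gives, for each category, the share of that neighborhood's returned venues in that category. A stdlib-only sketch of the same computation, standing in for `pd.get_dummies` plus `groupby().mean()`, on made-up toy data: a neighborhood with 100 venues, one of which is a shopping mall, gets a Shopping Mall frequency of 0.01, the value seen for Canning and Chinatown. Unlike the one-hot table, categories with no venues are simply absent here rather than stored as 0.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}.

    Equivalent in spirit to one-hot encoding the categories and averaging
    the 0/1 columns within each neighborhood.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        n: {cat: k / sum(c.values()) for cat, k in c.items()}
        for n, c in counts.items()
    }

# toy data (not from Foursquare): 100 venues, one of them a shopping mall
toy = [("Toy Centre", "Shopping Mall")] + [("Toy Centre", "Pub")] * 99
freqs = category_frequencies(toy)
print(freqs["Toy Centre"]["Shopping Mall"])  # 0.01
```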


In [27]:
len(Lpool_grouped[Lpool_grouped["Shopping Mall"] > 0])

4

### Create a new DataFrame for Shopping Mall data only

In [28]:
Lpool_mall = Lpool_grouped[["Neighborhoods","Shopping Mall"]]

In [29]:
Lpool_mall.head()

| | Neighborhoods | Shopping Mall |
|---|---|---|
| 0 | Aigburth | 0.0 |
| 1 | Allerton, Liverpool | 0.0 |
| 2 | Anfield (suburb) | 0.0 |
| 3 | Belle Vale, Liverpool | 0.0 |
| 4 | Broadgreen | 0.0 |


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Liverpool into 2 clusters, using the Shopping Mall frequency as the only feature.

In [30]:
Lclusters = 2

Lpool_clustering = Lpool_mall.drop(columns=["Neighborhoods"])  # keep only the Shopping Mall frequency as the feature

# run k-means clustering
kmeans = KMeans(n_clusters=Lclusters, random_state=0).fit(Lpool_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0])
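Because the clustering input is a single feature, k-means with two clusters effectively separates neighborhoods with no returned mall from those with one. A minimal pure-Python sketch of Lloyd's algorithm on one feature illustrates this. It initializes the centroids at the minimum and maximum values for determinism, whereas scikit-learn's `KMeans` uses k-means++ initialization by default; the input is illustrative (37 zeros and four values of 0.01, matching the four mall neighborhoods counted above).

```python
def kmeans_1d(values, n_iter=100):
    """Two-cluster Lloyd's algorithm on a list of floats.

    Centroids start at min(values) and max(values) (an assumption for
    determinism). Returns (labels, centroids).
    """
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # assignment step: each value joins its nearest centroid
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1 for v in values]
        # update step: each centroid moves to the mean of its members
        new = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new.append(sum(members) / len(members) if members else centroids[k])
        if new == centroids:  # converged
            break
        centroids = new
    return labels, centroids

mall_freq = [0.0] * 37 + [0.01] * 4
labels, centroids = kmeans_1d(mall_freq)
print(labels.count(0), labels.count(1))  # 37 4
print(centroids)
```

For this data, then, the two clusters coincide with the simple filter `Lpool_mall["Shopping Mall"] > 0`.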

In [31]:
# create a new dataframe with each neighborhood's Shopping Mall frequency and its cluster label
Lpool_merged = Lpool_mall.copy()

# add clustering labels
Lpool_merged["Cluster Labels"] = kmeans.labels_

In [32]:
Lpool_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
Lpool_merged.head(10)

| | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | Aigburth | 0.0 | 0 |
| 1 | Allerton, Liverpool | 0.0 | 0 |
| 2 | Anfield (suburb) | 0.0 | 0 |
| 3 | Belle Vale, Liverpool | 0.0 | 0 |
| 4 | Broadgreen | 0.0 | 0 |
| 5 | Canning, Liverpool | 0.01 | 1 |
| 6 | Childwall | 0.0 | 0 |
| 7 | Chinatown, Liverpool | 0.01 | 1 |
| 8 | Clubmoor | 0.0 | 0 |
| 9 | Croxteth | 0.0 | 0 |


In [33]:
# merge with Lpool_df to add the latitude/longitude of each neighborhood
Lpool_merged = Lpool_merged.merge(Lpool_df, on="Neighborhood")

print(Lpool_merged.shape)
Lpool_merged.head(10)

(41, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Aigburth | 0.0 | 0 | 53.36806 | -2.92364 |
| 1 | Allerton, Liverpool | 0.0 | 0 | 53.388737 | -2.913576 |
| 2 | Anfield (suburb) | 0.0 | 0 | 53.43054 | -2.94747 |
| 3 | Belle Vale, Liverpool | 0.0 | 0 | 53.385103 | -2.857564 |
| 4 | Broadgreen | 0.0 | 0 | 53.4082 | -2.89728 |
| 5 | Canning, Liverpool | 0.01 | 1 | 53.39789 | -2.96644 |
| 6 | Childwall | 0.0 | 0 | 53.39581 | -2.88925 |
| 7 | Chinatown, Liverpool | 0.01 | 1 | 53.39996 | -2.97619 |
| 8 | Clubmoor | 0.0 | 0 | 53.43463 | -2.93364 |
| 9 | Croxteth | 0.0 | 0 | 53.46182 | -2.89537 |


In [34]:
# sort the results by Cluster Labels
print(Lpool_merged.shape)
Lpool_merged.sort_values(["Cluster Labels"], inplace=True)
Lpool_merged

(41, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Aigburth | 0.0 | 0 | 53.36806 | -2.92364 |
| 22 | Kirkdale, Liverpool | 0.0 | 0 | 53.43035 | -2.976595 |
| 23 | Knotty Ash | 0.0 | 0 | 53.41772 | -2.8894 |
| 25 | Mossley Hill | 0.0 | 0 | 53.38022 | -2.91348 |
| 26 | Netherley, Liverpool | 0.0 | 0 | 53.3923 | -2.83941 |
| 27 | Norris Green | 0.0 | 0 | 53.44209 | -2.91886 |
| 28 | Old Swan | 0.0 | 0 | 53.41349 | -2.91274 |
| 29 | Orrell Park | 0.0 | 0 | 53.46351 | -2.96501 |
| 21 | Kensington, Liverpool | 0.0 | 0 | 53.41202 | -2.950375 |
| 30 | Sefton Park (district) | 0.0 | 0 | 53.38915 | -2.94953 |


### Finally, let's visualize the resulting clusters

In [38]:
import numpy as np
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one color per cluster label 0..Lclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, Lclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the color list by the cluster label itself
for lat, lon, poi, cluster in zip(Lpool_merged['Latitude'], Lpool_merged['Longitude'], Lpool_merged['Neighborhood'], Lpool_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
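A stdlib-only sketch of building one hex color per cluster label and looking it up with the label itself (labels run from 0 to `Lclusters - 1`, so no offset is needed). It uses evenly spaced hues from `colorsys` as an assumed substitute for matplotlib's `cm.rainbow`, so the exact colors differ; `cluster_palette` is a hypothetical helper.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters distinct hex colors, one per cluster label 0..n_clusters-1."""
    palette = []
    for k in range(n_clusters):
        hue = k / n_clusters  # evenly spaced hues around the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(2)
for cluster in (0, 1):
    print(cluster, palette[cluster])  # index by the label itself, no offset
```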

In [39]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [40]:
Lpool_merged.loc[Lpool_merged['Cluster Labels'] == 0]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Aigburth | 0.0 | 0 | 53.36806 | -2.92364 |
| 22 | Kirkdale, Liverpool | 0.0 | 0 | 53.43035 | -2.976595 |
| 23 | Knotty Ash | 0.0 | 0 | 53.41772 | -2.8894 |
| 25 | Mossley Hill | 0.0 | 0 | 53.38022 | -2.91348 |
| 26 | Netherley, Liverpool | 0.0 | 0 | 53.3923 | -2.83941 |
| 27 | Norris Green | 0.0 | 0 | 53.44209 | -2.91886 |
| 28 | Old Swan | 0.0 | 0 | 53.41349 | -2.91274 |
| 29 | Orrell Park | 0.0 | 0 | 53.46351 | -2.96501 |
| 21 | Kensington, Liverpool | 0.0 | 0 | 53.41202 | -2.950375 |
| 30 | Sefton Park (district) | 0.0 | 0 | 53.38915 | -2.94953 |


#### Cluster 1

In [41]:
Lpool_merged.loc[Lpool_merged['Cluster Labels'] == 1]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 7 | Chinatown, Liverpool | 0.01 | 1 | 53.39996 | -2.97619 |
| 5 | Canning, Liverpool | 0.01 | 1 | 53.39789 | -2.96644 |
| 24 | Liverpool city centre | 0.01 | 1 | 53.40575 | -2.99216 |
| 16 | Garston, Liverpool | 0.01 | 1 | 53.41005 | -2.97839 |


# Conclusion

All four neighborhoods where Foursquare returned a shopping mall fall into cluster 1, in the central area of Liverpool. Cluster 0, by contrast, returned no shopping malls at all within the 2000 m search radius. This makes the cluster 0 neighborhoods promising locations for a new shopping mall, as they face no competition from existing malls nearby. A new mall in cluster 1, on the other hand, would likely face intense competition because the existing malls are concentrated there. Seen from another perspective, shopping malls are concentrated in the city centre, while the suburbs still have none. Therefore, this project recommends that property developers capitalize on these findings by opening new shopping malls in the cluster 0 neighborhoods, where there is no competition.
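The recommendation treats every cluster 0 neighborhood as equally free of competition. One way to refine it, offered only as a sketch, is to rank cluster 0 neighborhoods by their distance to the nearest cluster 1 neighborhood, on the assumption that greater distance from existing malls means less competition. The coordinates below are copied from the cluster tables above (all four cluster 1 neighborhoods and four of the cluster 0 ones), and the haversine helper assumes a mean Earth radius of 6,371 km.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed mean Earth radius 6,371 km)."""
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# cluster 1: neighborhoods where Foursquare returned a shopping mall
mall_areas = {
    "Chinatown, Liverpool": (53.39996, -2.97619),
    "Canning, Liverpool": (53.39789, -2.96644),
    "Liverpool city centre": (53.40575, -2.99216),
    "Garston, Liverpool": (53.41005, -2.97839),
}

# a few cluster 0 neighborhoods (candidate locations)
candidates = {
    "Aigburth": (53.36806, -2.92364),
    "Kensington, Liverpool": (53.41202, -2.950375),
    "Netherley, Liverpool": (53.3923, -2.83941),
    "Orrell Park": (53.46351, -2.96501),
}

def nearest_mall_km(point):
    """Distance from a point to the closest cluster 1 neighborhood center."""
    return min(haversine_km(*point, *m) for m in mall_areas.values())

# farthest from existing malls first
ranking = sorted(candidates, key=lambda n: nearest_mall_km(candidates[n]), reverse=True)
for name in ranking:
    print("{:<25} {:.1f} km".format(name, nearest_mall_km(candidates[name])))
```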