# DESCRIPTION OF DATA

### We will obtain our data by:
#### 1. Building a dataframe of neighborhoods in Lagos, Nigeria, by scraping the data from a Wikipedia page
#### 2. Getting the geographical coordinates of the neighborhoods
#### 3. Obtaining venue data for the neighborhoods from the Foursquare API
#### 4. Exploring and clustering the neighborhoods based on the data gathered above
#### 5. Finally, selecting the best cluster in which to open a new shopping mall

# 1. Building a Dataframe

## Import Libraries

In [15]:
# the leading "!" runs pip as a shell command from inside the notebook;
# html5lib is the parser passed to BeautifulSoup below
!pip install beautifulsoup4 html5lib

In [87]:
import numpy as np # library to analyze vectorized data
import pandas as pd # data analysis library
import requests 

from bs4 import BeautifulSoup

### 2. Scrape data from Wikipedia page into a DataFrame

In [17]:
URL = "https://en.wikipedia.org/wiki/Category:Local_Government_Areas_in_Lagos_State"
r = requests.get(URL) 
  
soup = BeautifulSoup(r.content, 'html5lib') 
table = soup.find('div', attrs = {'id':'container'})

In [18]:
# create a list to store neighborhood data
neighborhoodList = []

In [19]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [34]:
# create a new DataFrame from the list
lg_df = pd.DataFrame({"Neighborhood": neighborhoodList})

lg_df.head()

Unnamed: 0,Neighborhood
0,List of Lagos State local government areas by ...
1,Agege
2,Ajeromi-Ifelodun
3,Alimosho
4,Amuwo-Odofin


In [35]:
# print the number of rows of the dataframe
lg_df.shape

(25, 1)
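
The first row of the output above is a Wikipedia list page rather than a local government area, and some entries carry a ", Lagos" disambiguation suffix. The block below is a minimal sketch of one way to clean the scraped names before geocoding. The helper name `clean_neighborhoods` and its filtering rule are assumptions made for illustration; they are not part of the original notebook, and applying them would change the row counts reported later.

```python
def clean_neighborhoods(names):
    """Drop 'List of ...' pages and strip a trailing ', Lagos' suffix.

    Hypothetical helper: the filtering rule is an assumption based on the
    entries visible in the scraped output, not part of the original notebook.
    """
    cleaned = []
    for name in names:
        name = name.strip()
        # Category pages also link to list articles, which are not neighborhoods.
        if name.startswith("List of"):
            continue
        # Remove the disambiguation suffix, e.g. "Epe, Lagos" -> "Epe".
        if name.endswith(", Lagos"):
            name = name[: -len(", Lagos")]
        cleaned.append(name)
    return cleaned


# Entries copied from the outputs shown in this notebook.
sample = [
    "List of Lagos State local government areas by ...",
    "Agege",
    "Ejigbo, Lagos",
    "Epe, Lagos",
]
print(clean_neighborhoods(sample))
```

In the notebook this could be applied to `neighborhoodList` before the DataFrame is built.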

### 3. Get the Geographical Coordinates

In [27]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Lagos, Nigeria'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
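
The loop above retries until the geocoder returns coordinates, so it never finishes if a name cannot be resolved. The following sketch shows a bounded-retry variant. The function name `get_latlng_bounded`, the `geocode` callable, and the `max_tries` and `delay` parameters are all hypothetical and introduced only for illustration.

```python
import time


def get_latlng_bounded(neighborhood, geocode, max_tries=3, delay=0.0):
    """Return [lat, lng] for a neighborhood, or None after max_tries failures.

    `geocode` is any callable that takes an address string and returns
    [lat, lng] or None. It is a hypothetical stand-in for a wrapper around
    the geocoding call used in this notebook.
    """
    address = '{}, Lagos, Nigeria'.format(neighborhood)
    for _ in range(max_tries):
        coords = geocode(address)
        if coords is not None:
            return coords
        time.sleep(delay)  # pause before the next attempt
    return None


# Fake geocoder for demonstration: fails on the first call, then succeeds.
calls = []


def flaky_geocode(address):
    calls.append(address)
    return None if len(calls) < 2 else [6.62561, 3.31262]


print(get_latlng_bounded('Agege', flaky_geocode))
```

With the library used in this notebook, the callable could be `lambda address: geocoder.arcgis(address).latlng`. Callers would then need to handle a `None` result before building the coordinates DataFrame.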

In [29]:
import sys
!{sys.executable} -m pip install geocoder

print('Packages installed.')

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Packages installed.


In [39]:
# Create the dataframe columns
column_names = [ 'Neighborhood', 'Latitude', 'Longitude'] 

# Dataframe
suburbs = pd.DataFrame(columns=column_names)

suburbs

Unnamed: 0,Neighborhood,Latitude,Longitude


In [63]:
import geocoder

In [64]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(neighborhood) for neighborhood in lg_df["Neighborhood"].tolist() ]

In [65]:
coords

[[6.562960000000032, 3.346040000000073],
 [6.6256100000000515, 3.312620000000038],
 [6.459410000000048, 3.3405500000000643],
 [6.609270000000038, 3.255800000000022],
 [6.445430000000044, 3.2675400000000536],
 [6.437950000000058, 3.3643600000000333],
 [6.432160000000067, 2.89265000000006],
 [6.542899587275793, 3.308255190037968],
 [6.582122018656454, 3.9608476268710713],
 [6.4666800000000535, 3.5832600000000525],
 [6.4666800000000535, 3.5832600000000525],
 [6.5036700000000565, 3.7330100000000357],
 [6.651110000000074, 3.3232900000000427],
 [6.573000000000036, 3.5925000000000296],
 [6.607760000000042, 3.348540000000071],
 [6.6235600000000545, 3.5048300000000268],
 [6.599990000000048, 3.4150900000000206],
 [6.454700000000059, 3.3887600000000475],
 [6.506430000000023, 3.375530000000026],
 [6.444980178179933, 3.3725468414927926],
 [6.530016085396608, 3.3495585299693253],
 [6.528991559260941, 3.3549415609283675],
 [6.521350000000041, 3.3186300000000415],
 [6.537850000000049, 3.38534000000004

In [66]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [68]:
# merge the coordinates into the original dataframe
lg_df['Latitude'] = df_coords['Latitude']
lg_df['Longitude'] = df_coords['Longitude']

In [69]:
# check the neighborhoods and the coordinates
print(lg_df.shape)
lg_df

(25, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,List of Lagos State local government areas by ...,6.56296,3.34604
1,Agege,6.62561,3.31262
2,Ajeromi-Ifelodun,6.45941,3.34055
3,Alimosho,6.60927,3.2558
4,Amuwo-Odofin,6.44543,3.26754
5,Apapa,6.43795,3.36436
6,Badagry,6.43216,2.89265
7,"Ejigbo, Lagos",6.5429,3.308255
8,"Epe, Lagos",6.582122,3.960848
9,Eti-Osa,6.46668,3.58326


In [70]:
# save the DataFrame as CSV file
lg_df.to_csv("lg_df.csv", index=False)

# Map of Lagos with Neighborhoods

In [71]:
# get the coordinates of Lagos
from geopy.geocoders import Nominatim

address = 'Lagos, Nigeria'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Lagos, Nigeria are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Lagos, Nigeria are 6.4550575, 3.3941795.


In [82]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [108]:
!pip install folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


In [110]:
import folium

In [111]:
# create map of Lagos using latitude and longitude values

map_lg = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(lg_df['Latitude'], lg_df['Longitude'], lg_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_lg)

# display the map once all markers have been added
map_lg

In [112]:
# save the map as HTML file
map_lg.save('map_lg.html')

In [113]:
# define Foursquare credentials and version
# NOTE: the environment variable names below are assumptions; reading the
# keys from the environment keeps them out of the notebook and its output
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20200521' # Foursquare API version


In [114]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(lg_df['Latitude'], lg_df['Longitude'], lg_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
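
The loop above indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. The sketch below shows a more tolerant way to flatten the items. The helper name `extract_venues` and the `'Unknown'` placeholder are assumptions, and the sample items are hand-made to mimic only the fields accessed above.

```python
def extract_venues(neighborhood, lat, lng, items):
    """Flatten venue items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue has no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((neighborhood, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Hand-made items that mimic the structure accessed in the loop above.
items = [
    {'venue': {'name': 'Indomie HQ',
               'location': {'lat': 6.479493, 'lng': 3.370367},
               'categories': [{'name': 'Noodle House'}]}},
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 6.48, 'lng': 3.36},
               'categories': []}},
]
for row in extract_venues('Surulere', 6.48932, 3.358, items):
    print(row)
```

Keeping the flattening step in a function also makes it possible to test it without calling the API.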

In [117]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.tail()

(243, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
238,Surulere,6.48932,3.358,Molade Okoya Thomas Indoor Sports Hall,6.498561,3.359283,Stadium
239,Surulere,6.48932,3.358,Mile 2 Bus Stop,6.480848,3.349324,Bike Rental / Bike Share
240,Surulere,6.48932,3.358,Premium Seafoods,6.480462,3.369235,Fish Market
241,Surulere,6.48932,3.358,Costain Busstop,6.477154,3.365701,Bus Stop
242,Surulere,6.48932,3.358,Indomie HQ,6.479493,3.370367,Noodle House


## Number of venues returned from each Neighborhood

In [120]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Agege,9,9,9,9,9,9
Ajeromi-Ifelodun,4,4,4,4,4,4
Alimosho,4,4,4,4,4,4
Amuwo-Odofin,5,5,5,5,5,5
Apapa,6,6,6,6,6,6
Badagry,3,3,3,3,3,3
"Ejigbo, Lagos",4,4,4,4,4,4
"Epe, Lagos",1,1,1,1,1,1
Eti-Osa,8,8,8,8,8,8
Eti-Osa East,8,8,8,8,8,8
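
The counts above vary widely, and a category frequency computed from only a handful of venues is coarse. As a rough sketch, neighborhoods below a minimum venue count could be listed for review before clustering. The helper name `neighborhoods_with_few_venues` and the threshold of 5 are assumptions chosen for illustration, not values used in this analysis.

```python
from collections import Counter


def neighborhoods_with_few_venues(venue_rows, min_venues=5):
    """Return {neighborhood: count} for neighborhoods below min_venues.

    `venue_rows` is an iterable of tuples whose first element is the
    neighborhood name, as in the `venues` list built earlier.
    """
    counts = Counter(row[0] for row in venue_rows)
    return {name: n for name, n in counts.items() if n < min_venues}


# Toy rows reproducing three of the counts shown above.
toy = [("Agege",)] * 9 + [("Badagry",)] * 3 + [("Epe, Lagos",)] * 1
print(neighborhoods_with_few_venues(toy))
```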


### Number of Unique Categories

In [121]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 82 unique categories.


In [122]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Bar', 'Shopping Mall', 'Nightclub', 'Convenience Store', 'Spa',
       'Golf Course', 'Hotel', 'Fast Food Restaurant', 'Park',
       'Bus Station', 'Flea Market', 'Department Store',
       'Indian Restaurant', 'Toll Plaza', 'Airport', 'Clothing Store',
       'Campground', 'Market', 'Gym', 'Bus Stop', 'Food Truck', 'Pub',
       'Harbor / Marina', 'Playground', 'Pizza Place', 'Beer Garden',
       'Museum', 'Burger Joint', 'Resort', 'Sushi Restaurant',
       'Salon / Barbershop', 'Fried Chicken Joint', 'Light Rail Station',
       'Grocery Store', 'Lounge', 'BBQ Joint', 'Chinese Restaurant',
       'Ice Cream Shop', 'Multiplex', 'Coffee Shop', 'Soup Place',
       'Steakhouse', 'Performing Arts Venue', 'Bakery',
       'Seafood Restaurant', 'Electronics Store', 'African Restaurant',
       'Boutique', 'Plaza', 'Mobile Phone Shop'], dtype=object)

In [124]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### Analyzing each Neighborhood

In [126]:
# one hot encoding
lg_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lg_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [lg_onehot.columns[-1]] + list(lg_onehot.columns[:-1])
lg_onehot = lg_onehot[fixed_columns]

print(lg_onehot.shape)
lg_onehot.tail()

(243, 83)


Unnamed: 0,Neighborhoods,African Restaurant,Airport,Art Gallery,Arts & Entertainment,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Stadium,Basketball Court,Beer Garden,Bike Rental / Bike Share,Boat or Ferry,Boutique,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Convention Center,Cupcake Shop,Department Store,Diner,Electronics Store,Fast Food Restaurant,Fish Market,Flea Market,Food Truck,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Metro Station,Mobile Phone Shop,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Optical Shop,Park,Performing Arts Venue,Pharmacy,Photography Studio,Pier,Pizza Place,Playground,Plaza,Pub,Resort,Salon / Barbershop,Seafood Restaurant,Shopping Mall,Soccer Field,Soup Place,Spa,Stadium,Steakhouse,Sushi Restaurant,Toll Plaza,Train Station,Vineyard,Wine Shop
238,Surulere,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
239,Surulere,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
240,Surulere,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
241,Surulere,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
242,Surulere,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Group rows by neighborhood and take the mean frequency of occurrence of each category

In [127]:
lg_grouped = lg_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(lg_grouped.shape)
lg_grouped

(24, 83)


Unnamed: 0,Neighborhoods,African Restaurant,Airport,Art Gallery,Arts & Entertainment,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Stadium,Basketball Court,Beer Garden,Bike Rental / Bike Share,Boat or Ferry,Boutique,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Convention Center,Cupcake Shop,Department Store,Diner,Electronics Store,Fast Food Restaurant,Fish Market,Flea Market,Food Truck,Fried Chicken Joint,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Metro Station,Mobile Phone Shop,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Optical Shop,Park,Performing Arts Venue,Pharmacy,Photography Studio,Pier,Pizza Place,Playground,Plaza,Pub,Resort,Salon / Barbershop,Seafood Restaurant,Shopping Mall,Soccer Field,Soup Place,Spa,Stadium,Steakhouse,Sushi Restaurant,Toll Plaza,Train Station,Vineyard,Wine Shop
0,Agege,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ajeromi-Ifelodun,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alimosho,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Amuwo-Odofin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Apapa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Badagry,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Ejigbo, Lagos",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Epe, Lagos",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Eti-Osa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
9,Eti-Osa East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
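
Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall in the category. The sketch below computes the same quantity directly on toy data, using only the standard library. The function name `category_frequencies` and the neighborhood labels "A" and "B" are made up for illustration.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}.

    `pairs` is an iterable of (neighborhood, category) tuples.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


# Toy data: four venues in one neighborhood, one in another.
pairs = [
    ("A", "Bus Station"), ("A", "Bus Station"),
    ("A", "Bus Stop"), ("A", "Food Truck"),
    ("B", "Resort"),
]
freqs = category_frequencies(pairs)
print(freqs["A"])
print(freqs["B"])
```

A neighborhood with a single venue, like "B" here, always gets a frequency of 1.0 for that venue's category, which is why rows with very few venues should be read with caution.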


In [128]:
len(lg_grouped[lg_grouped["Shopping Mall"] > 0])

7

### New DataFrame for Shopping Mall Data Alone

In [129]:
lg_mall = lg_grouped[["Neighborhoods","Shopping Mall"]]

In [130]:
lg_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Agege,0.0
1,Ajeromi-Ifelodun,0.0
2,Alimosho,0.0
3,Amuwo-Odofin,0.0
4,Apapa,0.166667


### Clustering Neighborhoods with K-means (2 clusters)

In [132]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 2

# keep only the feature column used for clustering
lg_clustering = lg_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lg_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1], dtype=int32)
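
Because the clustering uses a single feature (the shopping mall frequency), two clusters amount to a split into a "low" and a "high" group. The sketch below illustrates this with a plain implementation of Lloyd's algorithm on one dimension. It is not scikit-learn's `KMeans`: the function name `two_means_1d` and the min/max initialisation are assumptions, and the label numbering (0 = low, 1 = high) is arbitrary and may be the reverse of the labels shown above. The input values are reconstructed from outputs in this notebook: the seven non-zero frequencies plus zeros for the remaining 17 rows.

```python
def two_means_1d(values, n_iter=100):
    """Split 1-D values into a low (0) and high (1) group with Lloyd's algorithm.

    Illustrative only: centres start at the minimum and maximum value,
    which is not how scikit-learn initialises KMeans.
    """
    low, high = min(values), max(values)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value joins the nearer centre.
        new_labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        lows = [v for v, lab in zip(values, labels) if lab == 0]
        highs = [v for v, lab in zip(values, labels) if lab == 1]
        if lows:
            low = sum(lows) / len(lows)
        if highs:
            high = sum(highs) / len(highs)
    return labels, (low, high)


# 17 neighborhoods with no shopping malls, then the seven non-zero values.
values = [0.0] * 17 + [0.166667, 0.136364, 0.090909, 0.095238,
                       0.076923, 0.041667, 0.045455]
labels, centres = two_means_1d(values)
print(labels)
print(centres)
```

In this sketch the five largest frequencies end up in the high group, while the two small non-zero values stay with the zeros.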

In [133]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
lg_merged = lg_mall.copy()

# add clustering labels
lg_merged["Cluster Labels"] = kmeans.labels_

In [134]:
lg_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
lg_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Agege,0.0,1
1,Ajeromi-Ifelodun,0.0,1
2,Alimosho,0.0,1
3,Amuwo-Odofin,0.0,1
4,Apapa,0.166667,0


In [135]:
lg_merged.tail()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
19,"Ojo, Lagos",0.0,1
20,"Ojuwoye, Mushin",0.0,1
21,Oshodi-Isolo,0.0,1
22,Somolu,0.0,1
23,Surulere,0.136364,0


In [136]:
# join lg_merged with lg_df to add latitude/longitude for each neighborhood
lg_merged = lg_merged.join(lg_df.set_index("Neighborhood"), on="Neighborhood")

print(lg_merged.shape)
lg_merged.head() 

(24, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Agege,0.0,1,6.62561,3.31262
1,Ajeromi-Ifelodun,0.0,1,6.45941,3.34055
2,Alimosho,0.0,1,6.60927,3.2558
3,Amuwo-Odofin,0.0,1,6.44543,3.26754
4,Apapa,0.166667,0,6.43795,3.36436


In [137]:
# sort the results by Cluster Labels
print(lg_merged.shape)
lg_merged.sort_values(["Cluster Labels"], inplace=True)
lg_merged

(24, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
23,Surulere,0.136364,0,6.48932,3.358
4,Apapa,0.166667,0,6.43795,3.36436
18,"Mushin, Lagos",0.090909,0,6.44498,3.372547
17,List of Lagos State local government areas by ...,0.095238,0,6.56296,3.34604
12,Ikeja,0.076923,0,6.60776,3.34854
21,Oshodi-Isolo,0.0,1,6.52135,3.31863
20,"Ojuwoye, Mushin",0.0,1,6.528992,3.354942
19,"Ojo, Lagos",0.0,1,6.530016,3.349559
16,Lagos Mainland,0.041667,1,6.50643,3.37553
15,Lagos Island,0.045455,1,6.4547,3.38876


### Visualizing the Cluster

In [138]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lg_merged['Latitude'], lg_merged['Longitude'], lg_merged['Neighborhood'], lg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [139]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## Cluster Examination


#### Cluster 0

In [140]:
lg_merged.loc[lg_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
23,Surulere,0.136364,0,6.48932,3.358
4,Apapa,0.166667,0,6.43795,3.36436
18,"Mushin, Lagos",0.090909,0,6.44498,3.372547
17,List of Lagos State local government areas by ...,0.095238,0,6.56296,3.34604
12,Ikeja,0.076923,0,6.60776,3.34854


#### Cluster 1

In [141]:
lg_merged.loc[lg_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
21,Oshodi-Isolo,0.0,1,6.52135,3.31863
20,"Ojuwoye, Mushin",0.0,1,6.528992,3.354942
19,"Ojo, Lagos",0.0,1,6.530016,3.349559
16,Lagos Mainland,0.041667,1,6.50643,3.37553
15,Lagos Island,0.045455,1,6.4547,3.38876
14,Kosofe,0.0,1,6.59999,3.41509
13,Ikorodu,0.0,1,6.62356,3.50483
0,Agege,0.0,1,6.62561,3.31262
10,Ifako-Ijaiye,0.0,1,6.65111,3.32329
9,Eti-Osa East,0.0,1,6.46668,3.58326


## Conclusion and Findings

### Most of the shopping malls are concentrated in the neighborhoods of Cluster 0, with very few in Cluster 1. Cluster 1 therefore represents a good opportunity to set up a shopping mall, because there is little or no competition. Property developers and investors could see a quick return on investment by investing in this area. Further strategic study is still required to understand why shopping malls are scarce in this cluster, but from a data science standpoint Cluster 1 presents a very good opportunity.