## CAPSTONE PROJECT (BATTLE OF NEIGHBORHOODS)

### BUSINESS PROBLEM: Building or opening a new shopping mall in Lagos, Nigeria

#### The objective is to determine a suitable location for a new shopping mall by comparing the different localities of Lagos and the shopping malls already established in them, so as to maximize revenue and reduce losses.

### DATA

#### This project will use data from:

 Geopy - For getting the co-ordinates of different locations.
 Foursquare API - To get the list of venues and the details around a given location.
 Wikipedia - To get the Localities in Lagos

### METHODOLOGY 

1. Getting the co-ordinates of the city.
2. Getting the list of neighborhoods and their co-ordinates.
3. Exploring the most visited venues in the target localities.
4. Clustering of the different localities.
5. Analysis of the clusters formed.


In [74]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [75]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [76]:
pip install BeautifulSoup4

Note: you may need to restart the kernel to use updated packages.


In [77]:
#Importing other required libraries
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim
try:
    import geocoder
except ImportError:
    !pip install geocoder
    import geocoder

import requests
from bs4 import BeautifulSoup

try:
    import folium
except ImportError:
    !pip install folium
    import folium
    
from sklearn.cluster import KMeans
print ('done')

done


### Getting the location

In [78]:
g = geocoder.arcgis('Lagos, Nigeria')
Lagos_lat = g.latlng[0]
Lagos_lng = g.latlng[1]
print("The Latitude and Longitude of Lagos is {} and {}".format(Lagos_lat, Lagos_lng))

The Latitude and Longitude of Lagos is 6.454700000000059 and 3.3887600000000475


In [90]:
#scraping the table from wikipedia
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_Lagos_State_local_government_areas_by_population')[2]
df


Unnamed: 0,Rank,LGA,Population
0,1,Alimosho,11456783
1,2,Ajeromi-Ifelodun,2000346
2,3,Kosofe,665421
3,4,Mushin,633543
4,5,Oshodi-Isolo,10621789
5,6,Ojo,598336
6,7,Ikorodu,535811
7,8,Surulere,504409
8,9,Agege,461123
9,10,Ifako-Ijaiye,428812


In [91]:
df.shape

(20, 3)

In [92]:
df.size

60

In [93]:
#renaming LGA to Locality
df.rename(columns = {'LGA': 'Locality'}, inplace = True)
df

Unnamed: 0,Rank,Locality,Population
0,1,Alimosho,11456783
1,2,Ajeromi-Ifelodun,2000346
2,3,Kosofe,665421
3,4,Mushin,633543
4,5,Oshodi-Isolo,10621789
5,6,Ojo,598336
6,7,Ikorodu,535811
7,8,Surulere,504409
8,9,Agege,461123
9,10,Ifako-Ijaiye,428812


In [94]:
# Dropping the Rank and Population columns
df.drop(columns=['Rank', 'Population'], inplace=True)
df

Unnamed: 0,Locality
0,Alimosho
1,Ajeromi-Ifelodun
2,Kosofe
3,Mushin
4,Oshodi-Isolo
5,Ojo
6,Ikorodu
7,Surulere
8,Agege
9,Ifako-Ijaiye


In [95]:
#shape of dataframe
df.shape

(20, 1)

## Defining a function to get the location of the localities

In [96]:
#Defining a function to get the location of the localities
def get_location(localities):
    g = geocoder.arcgis('{}, Lagos, Nigeria'.format(localities))
    get_latlng = g.latlng
    return get_latlng
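
A geocoding lookup can fail or come back empty, and a missing entry in the coordinate list may break the later DataFrame construction. The sketch below is one hedged way to guard against that. The names `safe_location` and `fake_lookup` are hypothetical and not part of the notebook. The lookup is passed in as a function so the example runs without network access; in the notebook it could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def safe_location(locality: str,
                  lookup: Callable[[str], Optional[Sequence[float]]],
                  retries: int = 3) -> Tuple[float, float]:
    """Return (lat, lng) for a locality, or (nan, nan) if every attempt fails."""
    query = '{}, Lagos, Nigeria'.format(locality)
    for _ in range(retries):
        try:
            latlng = lookup(query)
        except Exception:
            latlng = None  # treat a failed request like an empty result
        if latlng and len(latlng) == 2:
            return float(latlng[0]), float(latlng[1])
    return math.nan, math.nan


# Stand-in lookup used only for this demonstration (hypothetical).
def fake_lookup(query):
    known = {'Ikeja, Lagos, Nigeria': [6.60776, 3.34854]}
    return known.get(query)


found = safe_location('Ikeja', fake_lookup)
missing = safe_location('Nowhere', fake_lookup)
print(found, missing)
```

Returning a pair of NaN values keeps the list the same length as the locality column, so rows that failed to geocode can be spotted and dropped afterwards.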

In [97]:
co_ordinates = []
for i in df["Locality"].tolist():
    co_ordinates.append(get_location(i))
print(co_ordinates)

[[6.609270000000038, 3.255800000000022], [6.459410000000048, 3.3405500000000643], [6.599990000000048, 3.4150900000000206], [6.53174000000007, 3.3470100000000684], [6.521350000000041, 3.3186300000000415], [6.462620000000072, 3.166960000000074], [6.6235600000000545, 3.5048300000000268], [6.489320000000021, 3.358000000000061], [6.6256100000000515, 3.312620000000038], [6.651110000000074, 3.3232900000000427], [6.537850000000049, 3.385340000000042], [6.445430000000044, 3.2675400000000536], [6.506430000000023, 3.375530000000026], [6.607760000000042, 3.348540000000071], [6.4666800000000535, 3.5832600000000525], [6.432160000000067, 2.89265000000006], [6.437950000000058, 3.3643600000000333], [6.454700000000059, 3.3887600000000475], [6.583750000000066, 3.975530000000049], [6.5036700000000565, 3.7330100000000357]]


In [98]:
#Creating a dataframe from the list of location
co_ordinates_df = pd.DataFrame(co_ordinates, columns=['Latitudes', 'Longitudes'])


In [99]:
#Adding the coordinates to the dataframe
df["Latitudes"] = co_ordinates_df["Latitudes"]
df["Longitudes"] = co_ordinates_df["Longitudes"]

In [100]:
df.head()

Unnamed: 0,Locality,Latitudes,Longitudes
0,Alimosho,6.60927,3.2558
1,Ajeromi-Ifelodun,6.45941,3.34055
2,Kosofe,6.59999,3.41509
3,Mushin,6.53174,3.34701
4,Oshodi-Isolo,6.52135,3.31863


### Plotting the localities on a map

In [102]:
#Creating a map
Lagos_map = folium.Map(location=[Lagos_lat, Lagos_lng],zoom_start=11)

#adding markers to the map for localities
#marker for Lagos
folium.Marker([Lagos_lat, Lagos_lng], popup='<i>Lagos</i>', icon=folium.Icon(color='red'), tooltip="Click to see").add_to(Lagos_map)

#markers for localities
for latitude,longitude,name in zip(df["Latitudes"], df["Longitudes"], df["Locality"]):
    folium.CircleMarker(
        [latitude, longitude],
        radius=6,
        color='blue',
        popup=name,
        fill=True,
        fill_color='#3186ff'
    ).add_to(Lagos_map)

Lagos_map

### Using Foursquare API to explore the different localities

In [103]:
#Foursquare Credentials
# @hidden_cell
CLIENT_ID = 'JTB4R2ZERJU1QIVN1L4DXTEHZZS3ALDRVPDITI5KSV45D0DG'
CLIENT_SECRET = 'ICQ5C1WJOIFWHALH01K3XKDN4UFX3Q5PT3I4ZBNVW3P1SVKD'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + "CLIENT_ID")
print('CLIENT_SECRET:' + "CLIENT_SECRET")

Your credentials:
CLIENT_ID: CLIENT_ID
CLIENT_SECRET:CLIENT_SECRET


In [None]:
#Getting the top 100 venues in each locality
radius = 2000
LIMIT = 100

venues = []

for lat, lng, locality in zip(df["Latitudes"], df["Longitudes"], df["Locality"]):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, radius, LIMIT)
    results = requests.get(url).json()['response']['groups'][0]['items']

    for venue in results:
        venues.append((locality, lat, lng, venue['venue']['name'], venue['venue']['location']['lat'], venue['venue']['location']['lng'], venue['venue']['categories'][0]['name']))
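
The loop above indexes straight into the JSON response, so one venue without a category or one empty response would raise an error. As a hedged sketch, the request URL can be built from a parameter dictionary and the parsing moved into a tolerant helper. The function names `build_explore_url` and `extract_venues` are hypothetical, and the sample payload is hand-written to mirror only the fields the notebook reads; it is not a real API response.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=2000, limit=100):
    """Assemble the explore URL with the same parameters as the loop above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)


def extract_venues(payload, locality, lat, lng):
    """Flatten one explore response into rows, skipping incomplete venues."""
    rows = []
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if not categories:
            continue  # no category to analyse, so skip instead of raising
        rows.append((locality, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'),
                     categories[0]['name']))
    return rows


# Hand-written sample payload (assumption: only the fields used above).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Ipaja market ipaja',
               'location': {'lat': 6.602248491329565, 'lng': 3.2555854768399923},
               'categories': [{'name': 'Market'}]}},
    {'venue': {'name': 'Uncategorised place',
               'location': {'lat': 6.6, 'lng': 3.25},
               'categories': []}},
]}]}}

url = build_explore_url('ID', 'SECRET', 6.60927, 3.2558)
rows = extract_venues(sample, 'Alimosho', 6.60927, 3.2558)
empty = extract_venues({}, 'Alimosho', 6.60927, 3.2558)
print(rows)
```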


In [106]:
venues[0]

('Alimosho',
 6.609270000000038,
 3.255800000000022,
 'Ipaja market ipaja',
 6.602248491329565,
 3.2555854768399923,
 'Market')

In [107]:
#Convert the venue list into dataframe
venues_df = pd.DataFrame(venues)
venues_df.columns = ['Locality', 'Latitude', 'Longitude', 'Venue name', 'Venue Lat', 'Venue Lng', 'Venue Category']
venues_df.head()

Unnamed: 0,Locality,Latitude,Longitude,Venue name,Venue Lat,Venue Lng,Venue Category
0,Alimosho,6.60927,3.2558,Ipaja market ipaja,6.602248,3.255585,Market
1,Alimosho,6.60927,3.2558,De Grange suites & bar,6.602309,3.267038,Bar
2,Alimosho,6.60927,3.2558,Baruwa,6.602284,3.267039,Bus Station
3,Alimosho,6.60927,3.2558,mm international airport terminal d lagos nige...,6.598807,3.267039,Airport
4,Ajeromi-Ifelodun,6.45941,3.34055,Food step,6.463992,3.345732,Food Truck


In [108]:
#Number of venues for each Locality
venues_df.groupby(['Locality']).count()

Unnamed: 0_level_0,Latitude,Longitude,Venue name,Venue Lat,Venue Lng,Venue Category
Locality,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agege,8,8,8,8,8,8
Ajeromi-Ifelodun,5,5,5,5,5,5
Alimosho,4,4,4,4,4,4
Amuwo-Odofin,4,4,4,4,4,4
Apapa,7,7,7,7,7,7
Badagry,4,4,4,4,4,4
Epe,1,1,1,1,1,1
Eti-Osa,7,7,7,7,7,7
Ifako-Ijaiye,5,5,5,5,5,5
Ikeja,42,42,42,42,42,42


In [109]:
#Total number of venues returned (one row per venue)
print('There are {} venues in total.'.format(venues_df.shape[0]))

There are 203 venues in total.


In [110]:
#Number of unique categories
print('Total number of unique categories is {}'.format(len(venues_df['Venue Category'].unique())))
#Full list of categories
venues_df['Venue Category'].unique().tolist()

Total number of unique categories is 82


['Market',
 'Bar',
 'Bus Station',
 'Airport',
 'Food Truck',
 'Lake',
 'Bus Stop',
 'Metro Station',
 'Burger Joint',
 'Steakhouse',
 'Flea Market',
 'Vineyard',
 'Department Store',
 'Diner',
 'Wine Shop',
 'Fast Food Restaurant',
 'Food',
 'Hotel',
 'Stadium',
 'Park',
 'African Restaurant',
 'Café',
 'Historic Site',
 'Electronics Store',
 'Auto Workshop',
 'Bank',
 'Multiplex',
 'Movie Theater',
 'Shopping Mall',
 'Clothing Store',
 'Pizza Place',
 'Art Gallery',
 'Fried Chicken Joint',
 'Soccer Field',
 'Jewelry Store',
 'BBQ Joint',
 'Bike Rental / Bike Share',
 'Fish Market',
 'Noodle House',
 'Campground',
 'Gym',
 'Light Rail Station',
 'Grocery Store',
 'Beer Garden',
 'IT Services',
 'Arts & Entertainment',
 'Convenience Store',
 'Boat or Ferry',
 'Lounge',
 'Harbor / Marina',
 'Nightclub',
 'Cupcake Shop',
 'Bagel Shop',
 'Bakery',
 'Optical Shop',
 'Basketball Court',
 'German Restaurant',
 'Boutique',
 'Pharmacy',
 'Chinese Restaurant',
 'Ice Cream Shop',
 'Coffee Shop',

### Analyzing the Localities according to the venues

In [111]:
#one hot encoding
Lagos_onehot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

Lagos_onehot['Locality'] = venues_df['Locality']

#move the locality column to the front
Lagos_onehot = Lagos_onehot[['Locality'] + [col for col in Lagos_onehot.columns if col != 'Locality']]
Lagos_onehot.head()

Unnamed: 0,Locality,African Restaurant,Airport,Art Gallery,Arts & Entertainment,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,...,Shopping Mall,Soccer Field,Soup Place,Stadium,Steakhouse,Sushi Restaurant,Trail,Train Station,Vineyard,Wine Shop
0,Alimosho,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Alimosho,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Alimosho,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Alimosho,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ajeromi-Ifelodun,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Grouping the categories

In [113]:
Lagos_grouped = Lagos_onehot.groupby(['Locality']).mean().reset_index()
print(Lagos_grouped.shape)
Lagos_grouped.head()

(19, 83)


Unnamed: 0,Locality,African Restaurant,Airport,Art Gallery,Arts & Entertainment,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,...,Shopping Mall,Soccer Field,Soup Place,Stadium,Steakhouse,Sushi Restaurant,Trail,Train Station,Vineyard,Wine Shop
0,Agege,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ajeromi-Ifelodun,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alimosho,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Amuwo-Odofin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Apapa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
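
Taking the mean of one-hot columns per locality gives the share of that locality's venues that fall in each category. The sketch below reproduces that idea with plain Python counters on the first five venue rows shown earlier. The function name `category_frequencies` is hypothetical; the notebook itself uses `get_dummies` followed by `groupby(...).mean()`.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """rows: iterable of (locality, category) pairs.

    Returns {locality: {category: share of that locality's venues}}.
    """
    counts = defaultdict(Counter)
    for locality, category in rows:
        counts[locality][category] += 1
    return {
        locality: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for locality, counter in counts.items()
    }


# The first five rows of venues_df as displayed above.
rows = [
    ('Alimosho', 'Market'),
    ('Alimosho', 'Bar'),
    ('Alimosho', 'Bus Station'),
    ('Alimosho', 'Airport'),
    ('Ajeromi-Ifelodun', 'Food Truck'),
]

freq = category_frequencies(rows)
print(freq['Alimosho']['Airport'])
```

Alimosho has four venues and one of them is an airport, which is why the grouped table shows 0.25 in its Airport column.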


In [114]:
#numbers of localities having Shopping malls
len(Lagos_grouped[Lagos_grouped['Shopping Mall'] > 0])

5

### Dataframe for shopping mall

In [115]:
Shopping_mall = Lagos_grouped[['Locality', 'Shopping Mall']]
Shopping_mall.head()

Unnamed: 0,Locality,Shopping Mall
0,Agege,0.0
1,Ajeromi-Ifelodun,0.0
2,Alimosho,0.0
3,Amuwo-Odofin,0.0
4,Apapa,0.142857


In [116]:
#K-means clustering
cluster = 3 

#Dataframe for clustering
Lagos_clustering = Shopping_mall.drop(columns=['Locality'])

#run K-means clustering
k_means = KMeans(init="k-means++", n_clusters=cluster, n_init=12).fit(Lagos_clustering)

#getting the labels for first 10 localities
print(k_means.labels_[0:10])

[0 0 0 0 1 0 0 0 0 2]
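
Because the clustering uses a single feature (the shopping mall share), k-means here amounts to splitting values along a line. The sketch below is a simplified, deterministic stand-in for that procedure, not scikit-learn's implementation. The function name `kmeans_1d` and the starting centers are assumptions chosen for illustration; the values are shares that appear in the notebook's outputs.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalars, starting from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Shopping mall shares taken from the notebook's outputs (a subset of localities).
shares = [0.0, 0.0, 0.0, 0.037037, 0.041667, 0.071429, 0.12, 0.142857]

# Hypothetical starting centers; KMeans picks its own via k-means++.
labels, centers = kmeans_1d(shares, centers=[0.0, 0.05, 0.15])
print(labels)
print([round(c, 2) for c in centers])
```

The label numbers themselves are arbitrary. What matters is which localities end up together, so the numbering can differ between this sketch and a `KMeans` run, or between two `KMeans` runs without a fixed `random_state`.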


In [117]:
#Creating a dataframe
Lagos_labels = Shopping_mall.copy()

#adding labels
Lagos_labels["Cluster Label"] = k_means.labels_

Lagos_labels.head()

Unnamed: 0,Locality,Shopping Mall,Cluster Label
0,Agege,0.0,0
1,Ajeromi-Ifelodun,0.0,0
2,Alimosho,0.0,0
3,Amuwo-Odofin,0.0,0
4,Apapa,0.142857,1


In [119]:
#Merging the Lagos_labels and first dataframe(df) to get the latitude and longitudes for each locality
Lagos_labels = Lagos_labels.join(df.set_index('Locality'), on='Locality')
Lagos_labels.head()

Unnamed: 0,Locality,Shopping Mall,Cluster Label,Latitudes,Longitudes
0,Agege,0.0,0,6.62561,3.31262
1,Ajeromi-Ifelodun,0.0,0,6.45941,3.34055
2,Alimosho,0.0,0,6.60927,3.2558
3,Amuwo-Odofin,0.0,0,6.44543,3.26754
4,Apapa,0.142857,1,6.43795,3.36436


In [121]:
#Grouping the localities according to their Cluster Labels
Lagos_labels.sort_values(["Cluster Label"], inplace=True)
Lagos_labels.head()

Unnamed: 0,Locality,Shopping Mall,Cluster Label,Latitudes,Longitudes
0,Agege,0.0,0,6.62561,3.31262
1,Ajeromi-Ifelodun,0.0,0,6.45941,3.34055
2,Alimosho,0.0,0,6.60927,3.2558
3,Amuwo-Odofin,0.0,0,6.44543,3.26754
16,Oshodi-Isolo,0.0,0,6.52135,3.31863


In [124]:
#Plot the cluster on map
cluster_map = folium.Map(location=[Lagos_lat, Lagos_lng], zoom_start = 10)

#marker for Lagos
folium.Marker([Lagos_lat, Lagos_lng], popup='<i>Lagos</i>', icon=folium.Icon(color='blue'), tooltip="Click to see").add_to(cluster_map)

#Getting the colors for the clusters
col = ['red', 'green', 'blue']

#markers for localities
for latitude,longitude,name,clus in zip(Lagos_labels["Latitudes"], Lagos_labels["Longitudes"], Lagos_labels["Locality"], Lagos_labels["Cluster Label"]):
    label = folium.Popup(name + ' - Cluster ' + str(clus))
    folium.CircleMarker(
        [latitude, longitude],
        radius=6,
        color=col[clus],
        popup=label,
        fill=True,
        fill_color=col[clus],
        fill_opacity=0.3
    ).add_to(cluster_map)
       
cluster_map

## Analysing the cluster

In [126]:
#First Cluster
cluster_1 = Lagos_labels[Lagos_labels['Cluster Label'] == 0]
print("There are {} localities in cluster-1".format(cluster_1.shape[0]))
mean_presence_1 = cluster_1['Shopping Mall'].mean()
print("The mean occurrence of Shopping Mall in cluster-1 is {0:.2f}".format(mean_presence_1))
cluster_1

There are 14 localities in cluster-1
The mean occurrence of Shopping Mall in cluster-1 is 0.00


Unnamed: 0,Locality,Shopping Mall,Cluster Label,Latitudes,Longitudes
0,Agege,0.0,0,6.62561,3.31262
1,Ajeromi-Ifelodun,0.0,0,6.45941,3.34055
2,Alimosho,0.0,0,6.60927,3.2558
3,Amuwo-Odofin,0.0,0,6.44543,3.26754
16,Oshodi-Isolo,0.0,0,6.52135,3.31863
5,Badagry,0.0,0,6.43216,2.89265
6,Epe,0.0,0,6.58375,3.97553
7,Eti-Osa,0.0,0,6.46668,3.58326
8,Ifako-Ijaiye,0.0,0,6.65111,3.32329
17,Somolu,0.0,0,6.53785,3.38534


In [128]:
#Second Cluster
cluster_2 = Lagos_labels[Lagos_labels['Cluster Label'] == 1]
print("There are {} localities in cluster-2".format(cluster_2.shape[0]))
mean_presence_2 = cluster_2['Shopping Mall'].mean()
print("The mean occurrence of Shopping Mall in cluster-2 is {0:.2f}".format(mean_presence_2))
cluster_2

There are 2 localities in cluster-2
The mean occurrence of Shopping Mall in cluster-2 is 0.13


Unnamed: 0,Locality,Shopping Mall,Cluster Label,Latitudes,Longitudes
18,Surulere,0.12,1,6.48932,3.358
4,Apapa,0.142857,1,6.43795,3.36436


In [129]:
#Third Cluster
cluster_3 = Lagos_labels[Lagos_labels['Cluster Label'] == 2]
print("There are {} localities in cluster-3".format(cluster_3.shape[0]))
mean_presence_3 = cluster_3['Shopping Mall'].mean()
print("The mean occurrence of Shopping Mall in cluster-3 is {0:.2f}".format(mean_presence_3))
cluster_3

There are 3 localities in cluster-3
The mean occurrence of Shopping Mall in cluster-3 is 0.05


Unnamed: 0,Locality,Shopping Mall,Cluster Label,Latitudes,Longitudes
12,Lagos Island,0.037037,2,6.4547,3.38876
13,Lagos Mainland,0.041667,2,6.50643,3.37553
9,Ikeja,0.071429,2,6.60776,3.34854
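
The three cluster cells above repeat the same filter, count and mean. A single grouped pass gives the same summary; in pandas this would likely be `Lagos_labels.groupby('Cluster Label')['Shopping Mall'].agg(['count', 'mean'])`. The sketch below shows the same aggregation in plain Python on a subset of the rows displayed above, so the cluster-1 count here is smaller than the notebook's 14. The function name `summarize` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (locality, shopping mall share, cluster label) for a subset of the rows above.
rows = [
    ('Agege', 0.0, 0),
    ('Alimosho', 0.0, 0),
    ('Surulere', 0.12, 1),
    ('Apapa', 0.142857, 1),
    ('Lagos Island', 0.037037, 2),
    ('Lagos Mainland', 0.041667, 2),
    ('Ikeja', 0.071429, 2),
]


def summarize(rows):
    """Return {label: (number of localities, mean share)} ordered by label."""
    by_cluster = defaultdict(list)
    for _, share, label in rows:
        by_cluster[label].append(share)
    return {label: (len(shares), mean(shares))
            for label, shares in sorted(by_cluster.items())}


summary = summarize(rows)
for label, (count, avg) in summary.items():
    print('cluster-{}: {} localities, mean share {:.2f}'.format(label + 1, count, avg))
```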


# CONCLUSION


### From the above analysis, we can see that Cluster 2 (green) has the highest concentration of shopping malls (mean frequency 0.13), followed by Cluster 3 (blue, mean frequency 0.05), while Cluster 1 (red) has no shopping malls among the venues returned.
### This analysis gives potential entrepreneurs an insight into suitable localities for opening a shopping mall. Cluster 2 already has the most competition, so the more promising areas are those in Clusters 1 and 3, where few or no shopping malls were found.