# Coursera Capstone Project

## The Battle of Neighborhoods (Week 2)

Segmenting and clustering the neighborhoods of Casablanca to identify the best cluster in which to open a new shopping mall.


### Importing and downloading all the dependencies

In [6]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge geocoder --yes 
import geocoder

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')


### Scraping data from a Wikipedia page into a DataFrame

In [7]:
# Get request
url = 'https://fr.wikipedia.org/wiki/Cat%C3%A9gorie:Quartier_de_Casablanca'
data = requests.get(url).text

# parsing data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

# creating a list to store neighborhoods
neighborhoodList = []

# appending the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)


In [10]:
# creating a DataFrame from the list
df1 = pd.DataFrame({"Neighborhood": neighborhoodList})

df1.head()

Unnamed: 0,Neighborhood
0,Aïn Diab
1,Belvédère (Casablanca)
2,Bourgogne (Casablanca)
3,Californie (quartier)
4,CIL (Casablanca)
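
Several scraped names carry a Wikipedia disambiguation suffix such as "(Casablanca)" or "(quartier)". The sketch below shows one possible way to strip such a suffix before the names are sent to a geocoder. The helper name `clean_neighborhood_name` is hypothetical and not part of the original notebook, and whether cleaning actually improves the geocoding results has not been tested here.

```python
import re

# Matches a trailing parenthesised qualifier such as " (Casablanca)" or " (quartier)".
_QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")


def clean_neighborhood_name(name):
    """Return the name without a trailing parenthesised qualifier (hypothetical helper)."""
    return _QUALIFIER.sub("", name).strip()


# Names taken from the DataFrame preview above.
sample = ["Aïn Diab", "Belvédère (Casablanca)", "Californie (quartier)", "CIL (Casablanca)"]
cleaned = [clean_neighborhood_name(n) for n in sample]
print(cleaned)
```

If this were adopted, the cleaned names could be stored in a separate column so that the original Wikipedia titles remain available.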


In [11]:
# Number of rows of dataframe df1
df1.shape[0]

21

### Getting geographical coordinates

In [12]:
# function to get coordinates
def get_coordinates(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Casablanca, Morocco'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
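
The `while` loop above never terminates if the geocoder keeps returning `None` for a name. A minimal sketch of a bounded alternative follows. The function name `get_coordinates_bounded`, its parameters, and the injected `geocode` callable are assumptions made for illustration. In the notebook, the callable could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import time


def get_coordinates_bounded(neighborhood, geocode, max_attempts=5, delay_seconds=1.0):
    """Try to geocode a neighborhood at most `max_attempts` times.

    `geocode` is a hypothetical callable that takes a query string and
    returns [lat, lng] or None.
    """
    query = '{}, Casablanca, Morocco'.format(neighborhood)
    for attempt in range(max_attempts):
        lat_lng = geocode(query)
        if lat_lng is not None:
            return lat_lng
        # wait before retrying, except after the final attempt
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None  # the caller decides how to handle a missing result


# Stand-in geocoder for demonstration only: fails twice, then succeeds.
calls = {"n": 0}


def fake_geocode(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [33.5, -7.6]


result = get_coordinates_bounded("Oulfa", fake_geocode, delay_seconds=0)
print(result, calls["n"])
```

Returning `None` after the last attempt makes missing coordinates visible in the resulting DataFrame instead of blocking the notebook.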

In [14]:
# getting the coordinates
coordinates = [get_coordinates(neighborhood) for neighborhood in df1["Neighborhood"].tolist()]
coordinates

[[33.596610000000055, -7.618889999999965],
 [33.595120000000065, -7.58809999999994],
 [33.602670000000046, -7.645299999999963],
 [35.78204570453261, -5.823913642809645],
 [33.596610000000055, -7.618889999999965],
 [33.57593000000003, -7.629709999999932],
 [33.57227000000006, -7.5954099999999585],
 [33.605190834735005, -7.652688191920623],
 [33.58062000000007, -7.665269999999964],
 [33.575960000000066, -7.67665999999997],
 [33.596610000000055, -7.618889999999965],
 [33.60107000000005, -7.584429999999941],
 [33.57367000000005, -7.598109999999963],
 [33.55119000000008, -7.5515799999999444],
 [33.55741000000006, -7.6815299999999525],
 [33.58921000000004, -7.640609999999981],
 [33.59946000000008, -7.583719999999971],
 [33.53983000000005, -7.568619999999953],
 [33.53825000000006, -7.55350999999996],
 [33.546910000000025, -7.575049999999976],
 [33.524820000000034, -7.650489999999934]]
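
Two things stand out in this output. The fourth pair (latitude about 35.78) lies far from all the others, and the pair `[33.59661, -7.61889]` appears more than once, which might mean the geocoder fell back to a generic location for some names. The sketch below is one way to flag both cases. The city-center value, the 30 km threshold, and the variable names are assumptions chosen for illustration, and only the first five pairs (rounded) are used.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# First five coordinate pairs from the output above, rounded.
sample = [
    [33.59661, -7.61889],
    [33.59512, -7.58810],
    [33.60267, -7.64530],
    [35.78205, -5.82391],
    [33.59661, -7.61889],
]

CENTER = (33.595, -7.619)  # approximate center of Casablanca (assumption)
MAX_KM = 30.0              # assumed plausibility threshold

# indices of points that are implausibly far from the city center
far = [i for i, (lat, lng) in enumerate(sample)
       if haversine_km(lat, lng, *CENTER) > MAX_KM]

# groups of indices that share exactly the same coordinates
by_point = defaultdict(list)
for i, (lat, lng) in enumerate(sample):
    by_point[(lat, lng)].append(i)
duplicates = [idx for idx in by_point.values() if len(idx) > 1]

print(far, duplicates)
```

Flagged rows could then be geocoded again with a different query or corrected by hand before the venue search.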

In [15]:
# creating dataframe for latitude and longitude
df2 = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])

In [16]:
# adding the latitude and longitude columns from df2 to df1
df1['Latitude'] = df2['Latitude']
df1['Longitude'] = df2['Longitude']
df1

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Aïn Diab,33.59661,-7.61889
1,Belvédère (Casablanca),33.59512,-7.5881
2,Bourgogne (Casablanca),33.60267,-7.6453
3,Californie (quartier),35.782046,-5.823914
4,CIL (Casablanca),33.59661,-7.61889
5,Derb Ghallef,33.57593,-7.62971
6,Derb Sultan,33.57227,-7.59541
7,Habous (Casablanca),33.605191,-7.652688
8,Hay El Hanaa,33.58062,-7.66527
9,Hay El Hassani,33.57596,-7.67666


In [17]:
# Shape (rows, columns) of dataframe df1
df1.shape

(21, 3)

In [18]:
# saving the DataFrame as CSV file
df1.to_csv("df1.csv", index=False)

### Creating a map of Casablanca with neighborhoods

Let's visualize Casablanca and the neighborhoods in it.

In [19]:
# get the coordinates of Casablanca
address = 'Casablanca, Morocco'

geolocator = Nominatim(user_agent="casablanca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Casablanca, Morocco are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Casablanca, Morocco are 33.5950627, -7.6187768.


In [20]:
# create map of Casablanca using latitude and longitude values
map_Casablanca = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df1['Latitude'], df1['Longitude'], df1['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Casablanca)  
    
map_Casablanca

In [22]:
# save the map as HTML file
map_Casablanca.save('map_Casablanca.html')

### Using the Foursquare API to explore the neighborhoods

In [23]:
# define Foursquare Credentials and Version
CLIENT_ID = '********************' # your Foursquare ID
CLIENT_SECRET = '****************' # your Foursquare Secret
VERSION = '20200729' # Foursquare API version date (YYYYMMDD)


Now, let's get the top 100 venues within a radius of 2000 meters of each neighborhood in Casablanca.

In [28]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df1['Latitude'], df1['Longitude'], df1['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
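
The loop above indexes directly into the JSON response, so it raises an exception if a response lacks the expected keys or if a venue has an empty `categories` list. The following sketch shows a more defensive way to flatten one payload. The function name `extract_venues` and the hand-made payload are hypothetical, and the payload only mirrors the keys that the loop above reads.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style payload into venue tuples (hypothetical helper).

    Returns an empty list when the expected keys are missing and uses None
    as the category when a venue has no categories.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"] if categories else None,
        ))
    return rows


# Hand-made payload for demonstration; the second venue has no category.
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Casa Jose",
               "location": {"lat": 33.597823, "lng": -7.615341},
               "categories": [{"name": "Tapas Restaurant"}]}},
    {"venue": {"name": "Unnamed venue",
               "location": {"lat": 33.59, "lng": -7.61},
               "categories": []}},
]}]}}

rows = extract_venues(payload, "Aïn Diab", 33.59661, -7.61889)
empty = extract_venues({"meta": {"code": 400}}, "Aïn Diab", 33.59661, -7.61889)
print(rows, empty)
```

In the request itself, passing `timeout=` to `requests.get` and calling `raise_for_status()` on the response before parsing would make network and quota problems easier to diagnose.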



In [30]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1107, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Aïn Diab,33.59661,-7.61889,Casa Jose,33.597823,-7.615341,Tapas Restaurant
1,Aïn Diab,33.59661,-7.61889,Sofitel Casablanca Tour Blanche,33.597748,-7.614201,Hotel
2,Aïn Diab,33.59661,-7.61889,La Bodega,33.59522,-7.611576,Pub
3,Aïn Diab,33.59661,-7.61889,Le Riad Restaurant,33.593936,-7.614676,Moroccan Restaurant
4,Aïn Diab,33.59661,-7.61889,Six PM,33.59594,-7.618684,Hotel Bar


Let's check how many venues were returned for each neighborhood

In [31]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aïn Diab,100,100,100,100,100,100
Belvédère (Casablanca),27,27,27,27,27,27
Bourgogne (Casablanca),100,100,100,100,100,100
CIL (Casablanca),100,100,100,100,100,100
Californie (quartier),64,64,64,64,64,64
Derb Ghallef,100,100,100,100,100,100
Derb Sultan,49,49,49,49,49,49
Habous (Casablanca),74,74,74,74,74,74
Hay El Hanaa,93,93,93,93,93,93
Hay El Hassani,38,38,38,38,38,38


Let's find out how many unique categories can be curated from all the returned venues

In [32]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 115 unique categories.


In [33]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Tapas Restaurant', 'Hotel', 'Pub', 'Moroccan Restaurant',
       'Hotel Bar', 'French Restaurant', 'Seafood Restaurant',
       'Coffee Shop', 'Sandwich Place', 'Mediterranean Restaurant',
       'Lounge', 'Plaza', 'Italian Restaurant', 'Gastropub', 'Café',
       'Bar', 'Brazilian Restaurant', 'Fast Food Restaurant',
       'Ice Cream Shop', 'Burger Joint', 'Salad Place', 'Pizza Place',
       'Japanese Restaurant', 'Restaurant',
       'Vegetarian / Vegan Restaurant', 'Art Gallery', 'Sushi Restaurant',
       'Cupcake Shop', 'Library', 'Middle Eastern Restaurant', 'Bakery',
       'American Restaurant', 'Latin American Restaurant', 'Diner',
       'Farmers Market', 'Clothing Store', 'Vietnamese Restaurant',
       'Wings Joint', 'Asian Restaurant', 'Hot Dog Joint',
       'Big Box Store', 'Scenic Lookout', 'Noodle House', 'Pool Hall',
       'Shopping Mall', 'Flower Shop', 'Performing Arts Venue', 'Theater',
       'Concert Hall', 'Boarding House'], dtype=object)

In [35]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### Analyzing Each Neighborhood

In [36]:
# one hot encoding
df1_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df1_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [df1_onehot.columns[-1]] + list(df1_onehot.columns[:-1])
df1_onehot = df1_onehot[fixed_columns]

print(df1_onehot.shape)
df1_onehot.head()

(1107, 116)


Unnamed: 0,Neighborhoods,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beach Bar,Beer Garden,Big Box Store,Boarding House,Bookstore,Brazilian Restaurant,Burger Joint,Café,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Creperie,Cupcake Shop,Department Store,Diner,Doner Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,French Restaurant,Garden Center,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Latin American Restaurant,Library,Lighthouse,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Multiplex,Neighborhood,Nightclub,Noodle House,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pool Hall,Pub,Racetrack,Resort,Restaurant,Rock Club,Salad Place,Sandwich Place,Scenic Lookout,School,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,Aïn Diab,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,Aïn Diab,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aïn Diab,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aïn Diab,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aïn Diab,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group the rows by neighborhood and take the mean of the frequency of occurrence of each category

In [37]:
df1_grouped = df1_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(df1_grouped.shape)
df1_grouped

(21, 116)


Unnamed: 0,Neighborhoods,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beach Bar,Beer Garden,Big Box Store,Boarding House,Bookstore,Brazilian Restaurant,Burger Joint,Café,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Creperie,Cupcake Shop,Department Store,Diner,Doner Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,French Restaurant,Garden Center,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Latin American Restaurant,Library,Lighthouse,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Multiplex,Neighborhood,Nightclub,Noodle House,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pool Hall,Pub,Racetrack,Resort,Restaurant,Rock Club,Salad Place,Sandwich Place,Scenic Lookout,School,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,Aïn Diab,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.12,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.01,0.02,0.0,0.03,0.02,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.01,0.03,0.0,0.0,0.03,0.0,0.01,0.02,0.01,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0
1,Belvédère (Casablanca),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.148148,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.148148,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.074074,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bourgogne (Casablanca),0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.07,0.0,0.0,0.1,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.04,0.0,0.05,0.02,0.0,0.01,0.01,0.0,0.01,0.03,0.0,0.0,0.01,0.04,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.02,0.02,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0
3,CIL (Casablanca),0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.12,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.01,0.02,0.0,0.03,0.02,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.01,0.03,0.0,0.0,0.03,0.0,0.01,0.02,0.01,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0
4,Californie (quartier),0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.046875,0.015625,0.0,0.0,0.015625,0.0,0.0,0.03125,0.0,0.0,0.171875,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.03125,0.0,0.078125,0.0,0.015625,0.0,0.046875,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.03125,0.0,0.0625,0.015625,0.015625,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0,0.015625,0.03125,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.046875,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Derb Ghallef,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.12,0.02,0.0,0.08,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.02,0.07,0.03,0.0,0.02,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.02,0.0,0.0,0.03,0.0,0.02,0.02,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0
6,Derb Sultan,0.020408,0.020408,0.0,0.0,0.0,0.020408,0.020408,0.061224,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.163265,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.102041,0.020408,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.040816,0.020408,0.040816,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.040816,0.040816,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Habous (Casablanca),0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.027027,0.027027,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.013514,0.094595,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.0,0.0,0.040541,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.013514,0.0,0.027027,0.013514,0.0,0.0,0.0,0.0,0.027027,0.040541,0.0,0.0,0.013514,0.040541,0.0,0.027027,0.0,0.013514,0.0,0.027027,0.0,0.013514,0.0,0.0,0.027027,0.0,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.013514,0.013514,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514
8,Hay El Hanaa,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.043011,0.021505,0.032258,0.010753,0.0,0.0,0.0,0.0,0.0,0.010753,0.215054,0.0,0.0,0.043011,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.010753,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.010753,0.010753,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.086022,0.0,0.021505,0.0,0.043011,0.021505,0.010753,0.0,0.0,0.0,0.0,0.032258,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.010753,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.010753,0.010753,0.053763,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.010753,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753
9,Hay El Hassani,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.131579,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0


In [39]:
len(df1_grouped[df1_grouped["Shopping Mall"] > 0])

15

Creating a DataFrame for Shopping Mall data 

In [42]:
casa_mall = df1_grouped[["Neighborhoods","Shopping Mall"]]
casa_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Aïn Diab,0.02
1,Belvédère (Casablanca),0.037037
2,Bourgogne (Casablanca),0.01
3,CIL (Casablanca),0.02
4,Californie (quartier),0.03125
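
The `Shopping Mall` values are shares, not counts: the mean of a one-hot column is the fraction of a neighborhood's returned venues that fall in that category. As a small worked check, Belvédère returned 27 venues and its share is 0.037037, which corresponds to exactly one shopping mall. The list below is an illustrative reconstruction, and only the venue count and the resulting share come from the notebook.

```python
from collections import Counter

# Illustrative reconstruction: 27 venues (the count reported for Belvédère),
# exactly one of them a shopping mall. "Other" is a placeholder label.
categories = ["Shopping Mall"] + ["Other"] * 26

counts = Counter(categories)
share = counts["Shopping Mall"] / len(categories)
print(round(share, 6))  # 0.037037
```

The same arithmetic gives 2 / 49, or about 0.040816, for Derb Sultan, which returned 49 venues. Because the denominators differ between neighborhoods, equal shares do not imply equal numbers of malls.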


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Casablanca into 3 clusters.

In [47]:
# set number of clusters
kclusters = 3

casa_clustering = casa_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(casa_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0], dtype=int32)

In [48]:
# creating a dataframe that holds the Shopping Mall frequency and the cluster label for each neighborhood
casa_merged = casa_mall.copy()

# add clustering labels
casa_merged["Cluster Labels"] = kmeans.labels_

In [50]:
casa_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
casa_merged

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Aïn Diab,0.02,1
1,Belvédère (Casablanca),0.037037,0
2,Bourgogne (Casablanca),0.01,1
3,CIL (Casablanca),0.02,1
4,Californie (quartier),0.03125,0
5,Derb Ghallef,0.0,1
6,Derb Sultan,0.040816,0
7,Habous (Casablanca),0.013514,1
8,Hay El Hanaa,0.010753,1
9,Hay El Hassani,0.026316,0


In [51]:
# join casa_merged with df1 to add latitude/longitude for each neighborhood
casa_merged = casa_merged.join(df1.set_index("Neighborhood"), on="Neighborhood")

print(casa_merged.shape)
casa_merged.head() 

(21, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Aïn Diab,0.02,1,33.59661,-7.61889
1,Belvédère (Casablanca),0.037037,0,33.59512,-7.5881
2,Bourgogne (Casablanca),0.01,1,33.60267,-7.6453
3,CIL (Casablanca),0.02,1,33.59661,-7.61889
4,Californie (quartier),0.03125,0,35.782046,-5.823914


In [52]:
# sort the results by Cluster Labels
print(casa_merged.shape)
casa_merged.sort_values(["Cluster Labels"], inplace=True)
casa_merged

(21, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
20,Sidi Maârouf,0.032258,0,33.52482,-7.65049
1,Belvédère (Casablanca),0.037037,0,33.59512,-7.5881
4,Californie (quartier),0.03125,0,35.782046,-5.823914
6,Derb Sultan,0.040816,0,33.57227,-7.59541
13,Les Roches Noires,0.043478,0,33.59946,-7.58372
9,Hay El Hassani,0.026316,0,33.57596,-7.67666
12,La Colline (Casablanca),0.036364,0,33.57367,-7.59811
11,Inara,0.045455,0,33.60107,-7.58443
18,Salmia 2,0.0,1,33.53825,-7.55351
17,Salmia 1,0.0,1,33.53983,-7.56862


### Visualization of the resulting clusters

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(casa_merged['Latitude'], casa_merged['Longitude'], casa_merged['Neighborhood'], casa_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining Clusters

Cluster 0

In [54]:
casa_merged.loc[casa_merged['Cluster Labels'] == 0]


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
20,Sidi Maârouf,0.032258,0,33.52482,-7.65049
1,Belvédère (Casablanca),0.037037,0,33.59512,-7.5881
4,Californie (quartier),0.03125,0,35.782046,-5.823914
6,Derb Sultan,0.040816,0,33.57227,-7.59541
13,Les Roches Noires,0.043478,0,33.59946,-7.58372
9,Hay El Hassani,0.026316,0,33.57596,-7.67666
12,La Colline (Casablanca),0.036364,0,33.57367,-7.59811
11,Inara,0.045455,0,33.60107,-7.58443


Cluster 1

In [55]:
casa_merged.loc[casa_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
18,Salmia 2,0.0,1,33.53825,-7.55351
17,Salmia 1,0.0,1,33.53983,-7.56862
16,Racine (Casablanca),0.0,1,33.58921,-7.64061
14,Oasis (Casablanca),0.0,1,33.55119,-7.55158
0,Aïn Diab,0.02,1,33.59661,-7.61889
8,Hay El Hanaa,0.010753,1,33.58062,-7.66527
7,Habous (Casablanca),0.013514,1,33.605191,-7.652688
5,Derb Ghallef,0.0,1,33.57593,-7.62971
3,CIL (Casablanca),0.02,1,33.59661,-7.61889
2,Bourgogne (Casablanca),0.01,1,33.60267,-7.6453


Cluster 2

In [57]:
casa_merged.loc[casa_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
15,Oulfa,0.076923,2,33.55741,-7.68153
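
K-means assigns cluster labels arbitrarily, so label 0, 1, or 2 says nothing by itself about which cluster has more shopping malls. A quick way to order the clusters is to compare their mean shopping-mall share. The sketch below does this with the rows listed in the three tables above. The variable names are illustrative, and the list contains only the rows that are displayed.

```python
from collections import defaultdict

# (shopping-mall share, cluster label) pairs copied from the tables above.
rows = [
    (0.032258, 0), (0.037037, 0), (0.03125, 0), (0.040816, 0),
    (0.043478, 0), (0.026316, 0), (0.036364, 0), (0.045455, 0),
    (0.0, 1), (0.0, 1), (0.0, 1), (0.0, 1), (0.02, 1),
    (0.010753, 1), (0.013514, 1), (0.0, 1), (0.02, 1), (0.01, 1),
    (0.076923, 2),
]

# collect the shares per cluster label
shares = defaultdict(list)
for share, label in rows:
    shares[label].append(share)

# mean share per cluster, then labels ordered from lowest to highest mean
means = {label: sum(values) / len(values) for label, values in shares.items()}
ranking = sorted(means, key=means.get)
print(ranking)  # [1, 0, 2]
```

With these rows, cluster 1 has the lowest mean share, cluster 0 sits in the middle, and cluster 2 has the highest.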


#### Conclusion

The clusters separate the neighborhoods by the share of their Foursquare venues that are categorized as Shopping Mall. Cluster 2 has the highest share (Oulfa, about 0.077), cluster 0 has a moderate share (roughly 0.026 to 0.045), and cluster 1 has the lowest (0 to 0.02 in the rows listed above, with several neighborhoods showing no shopping malls at all).

Cluster 1 neighborhoods therefore represent the best opportunity to open new shopping centers, as there is very little or no competition from existing ones. This project recommends that real estate developers capitalize on these results and open new shopping centers in cluster 1 neighborhoods. Developers with unique selling propositions that set them apart from the competition can also consider cluster 0 neighborhoods, where competition is moderate. Finally, developers are advised to avoid cluster 2, which already has the highest concentration of shopping centers and is likely to suffer from the most intense competition.