# Capstone Project - Moroccan Cities Clustering

# ---------------------------------------------------------------------------

## Introduction

In this project, we cluster the cities of my home country, Morocco, based on their characteristics, in order to identify similar cities and to recommend new cities according to people's preferences. Such a recommender system can be used in a variety of areas. It may be used, for example, as a tool for assisting tourists. It can also be extended to compare cities from different countries. The collected data will also be used to cluster the cities based on the presence of a single type of venue, which can help an investor decide, for example, where to put money into a project.

## Data Description

The dataset consists of the city names and the characteristics of each city, derived from Foursquare location data. We extract the Moroccan cities and their populations from a Wikipedia page, then geocode each city name to obtain its latitude and longitude. Then, we use the Foursquare API to get the venues of each city. The resulting dataset is used for the clustering and the recommender system.

## Methodology

#### Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


#### Get the list of Moroccan cities

In [2]:
URL_Moroccan_Cities = 'https://en.wikipedia.org/wiki/List_of_cities_in_Morocco'
# Load article, turn into soup and get the <table>s
url = requests.get(URL_Moroccan_Cities).text
soup = BeautifulSoup(url,'lxml')
My_table = soup.find('table',{'class':'wikitable sortable'})
#My_table

#Correct headers (the scraped <th> texts are not used)
headers = ['Rank', 'City', 'Population', 'Region']

# Collect the cell texts of every data row; the header row has no <td> cells
rows = []
for tr in My_table.find_all('tr'):
    tds = tr.find_all('td')
    if tds:
        rows.append([td.text.strip() for td in tds])

df = pd.DataFrame(rows, columns=headers)

# Remove footnote markers such as [a], [b] from the cells
df = df.replace(r'\[.\]', '', regex=True)

# Drop the Rank column
df.drop(['Rank'], axis=1, inplace=True)

# Manually add two more cities
df.loc[df.index.max() + 1] = ['Laayoune', 217732, 'Sahara']
df.loc[df.index.max() + 1] = ['Dakhla', 106277, 'Sahara']
df.head()

Unnamed: 0,City,Population,Region
0,Casablanca,3359818,Casablanca-Settat
1,Fez,1112072,Fès-Meknès
2,Tangier,947952,Tanger-Tetouan-Al Hoceima
3,Marrakesh,928850,Marrakesh-Safi
4,Salé,890403,Rabat-Salé-Kénitra
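
The pattern `\[.\]` used above removes only single-character markers such as `[a]`. As a minimal sketch, assuming longer markers such as `[12]` or `[note 1]` could also appear in the table (the sample strings below are made up for illustration, and `strip_footnotes` is a hypothetical helper), a slightly broader pattern handles all of them:

```python
import re

# Matches any bracketed footnote marker, e.g. "[a]", "[12]" or "[note 1]"
FOOTNOTE = re.compile(r'\[[^\]]*\]')

def strip_footnotes(text):
    """Remove bracketed footnote markers and surrounding whitespace."""
    return FOOTNOTE.sub('', text).strip()

# Hypothetical cell values, not taken from the Wikipedia page
samples = ['Laayoune[a]', 'Dakhla[12]', 'Fez [note 1]', 'Casablanca']
cleaned = [strip_footnotes(s) for s in samples]
print(cleaned)
```

The same pattern could be passed to `df.replace(..., regex=True)` in place of the narrower one.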


In [3]:
df.dtypes

City          object
Population    object
Region        object
dtype: object

#### Let's convert the population to int

In [4]:
# Remove the thousands separators; assigning the result back avoids a chained in-place update
df['Population'] = df['Population'].replace(',', '', regex=True)
df.head()

Unnamed: 0,City,Population,Region
0,Casablanca,3359818,Casablanca-Settat
1,Fez,1112072,Fès-Meknès
2,Tangier,947952,Tanger-Tetouan-Al Hoceima
3,Marrakesh,928850,Marrakesh-Safi
4,Salé,890403,Rabat-Salé-Kénitra


In [5]:
df['Population'] = df['Population'].astype(int)
df.dtypes

City          object
Population     int64
Region        object
dtype: object
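
The two conversion steps can also be combined. The sketch below uses a toy column (illustrative values only) that mixes formatted strings with a plain integer, as happens when rows are appended by hand; it is one possible approach, not the only one:

```python
import pandas as pd

# Toy column mixing formatted strings and a plain integer
population = pd.Series(['3,359,818', '1,112,072', 217732])

# Cast everything to str first so the replacement applies uniformly,
# then convert; errors='raise' surfaces any value that is not a clean number
as_int = pd.to_numeric(
    population.astype(str).str.replace(',', '', regex=False),
    errors='raise',
).astype(int)

print(as_int.tolist())
```

Raising on bad values makes a malformed cell visible immediately instead of letting it reach the clustering step.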

In [6]:
df.head()

Unnamed: 0,City,Population,Region
0,Casablanca,3359818,Casablanca-Settat
1,Fez,1112072,Fès-Meknès
2,Tangier,947952,Tanger-Tetouan-Al Hoceima
3,Marrakesh,928850,Marrakesh-Safi
4,Salé,890403,Rabat-Salé-Kénitra


#### Get latitude and longitude from address

In [7]:
import time

address = df['City']
geolocator = Nominatim(user_agent="foursquare_agent")
df['Latitude'] = np.nan
df['Longitude'] = np.nan

for city in address:
    location = None
    # Retry until the geocoder returns a result, pausing between requests
    while location is None:
        location = geolocator.geocode('{}, Morocco'.format(city))
        time.sleep(1)
    # Assign through df.loc to avoid writing to a copy of the column
    df.loc[df['City'] == city, 'Latitude'] = location.latitude
    df.loc[df['City'] == city, 'Longitude'] = location.longitude
    print(city, ' : OK !')

print('Done !')
df.head()




Casablanca  : OK !
Fez  : OK !
Tangier  : OK !
Marrakesh  : OK !
Salé  : OK !
Meknes  : OK !
Rabat  : OK !
Oujda  : OK !
Kenitra  : OK !
Agadir  : OK !
Tetouan  : OK !
Temara  : OK !
Safi  : OK !
Mohammedia  : OK !
Khouribga  : OK !
El Jadida  : OK !
Beni Mellal  : OK !
Aït Melloul  : OK !
Nador  : OK !
Dar Bouazza  : OK !
Taza  : OK !
Settat  : OK !
Berrechid  : OK !
Khemisset  : OK !
Inezgane  : OK !
Ksar El Kebir  : OK !
Larache  : OK !
Guelmim  : OK !
Khenifra  : OK !
Berkane  : OK !
Taourirt  : OK !
Bouskoura  : OK !
Fquih Ben Salah  : OK !
Dcheira El Jihadia  : OK !
Oued Zem  : OK !
El Kelaa Des Sraghna  : OK !
Sidi Slimane  : OK !
Errachidia  : OK !
Guercif  : OK !
Oulad Teima  : OK !
Ben Guerir  : OK !
Tifelt  : OK !
Lqliaa  : OK !
Taroudant  : OK !
Sefrou  : OK !
Essaouira  : OK !
Fnideq  : OK !
Sidi Kacem  : OK !
Tiznit  : OK !
Tan-Tan  : OK !
Ouarzazate  : OK !
Souk El Arbaa  : OK !
Youssoufia  : OK !
Lahraouyine  : OK !
Martil  : OK !
Ain Harrouda  : OK !
Suq as-Sabt Awlad 

Unnamed: 0,City,Population,Region,Latitude,Longitude
0,Casablanca,3359818,Casablanca-Settat,33.595063,-7.618777
1,Fez,1112072,Fès-Meknès,34.034653,-5.016193
2,Tangier,947952,Tanger-Tetouan-Al Hoceima,35.777103,-5.803792
3,Marrakesh,928850,Marrakesh-Safi,31.625826,-7.989161
4,Salé,890403,Rabat-Salé-Kénitra,34.015678,-6.756799
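
A geocoding loop that retries forever can hang when a city name is never resolved. The following is a minimal sketch of a bounded retry, assuming a hypothetical helper `geocode_with_retry` that wraps any callable with the same shape as `geolocator.geocode`; the fake geocoder exists only to make the example self-contained:

```python
import time

def geocode_with_retry(geocode, query, max_attempts=3, delay_seconds=1.0):
    """Call geocode(query) up to max_attempts times, pausing between tries.

    Returns the first non-None result, or None if every attempt fails.
    """
    for attempt in range(max_attempts):
        result = geocode(query)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None

# Stand-in for geolocator.geocode: fails twice, then succeeds
calls = {'count': 0}
def fake_geocode(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else (33.59, -7.62)

found = geocode_with_retry(fake_geocode, 'Casablanca, Morocco', delay_seconds=0.0)
missing = geocode_with_retry(lambda q: None, 'Nowhere, Morocco', delay_seconds=0.0)
print(found, missing)
```

Cities that come back as `None` can then be reported and handled explicitly. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that may cover the same need.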


#### Visualize the map

In [9]:
location = geolocator.geocode('Morocco')
latitude = location.latitude
longitude = location.longitude
# create map of Morocco using latitude and longitude values
map_Morocco = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, city in zip(df['Latitude'], df['Longitude'], df['City']):
    label = '{} : {}, {} '.format(city, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='Green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Morocco)  
    
map_Morocco

Now that we have our location candidates, let's use the Foursquare API to get information on the venues in each city.

#### Define Foursquare Credentials and Version

In [11]:
CLIENT_ID = 'JDL4X53XBSDRDHG5H4CYLYEFJQYS5AYSFA5LTJA4VP3PXV3T' # your Foursquare ID
CLIENT_SECRET = 'AQA54H5YE3CE02SONVFCCBNV4K5CFCHGO3UT1ZVPQ3WVWOLG' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
radius = 1000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: JDL4X53XBSDRDHG5H4CYLYEFJQYS5AYSFA5LTJA4VP3PXV3T
CLIENT_SECRET:AQA54H5YE3CE02SONVFCCBNV4K5CFCHGO3UT1ZVPQ3WVWOLG
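
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. As a hedged sketch, the credentials could be read from environment variables and passed to the endpoint as URL-encoded query parameters. The environment variable names and both helper functions are hypothetical, and sending `limit` and `radius` (from the `LIMIT` and `radius` values defined above) is an assumption about what the explore endpoint accepts:

```python
import os
from urllib.parse import urlencode

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are hypothetical; any names would do.
    """
    return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180604', limit=30, radius=1000):
    """Assemble the venues/explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'limit': limit,    # assumed to be accepted by the endpoint
        'radius': radius,  # assumed to be accepted by the endpoint
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Demo with placeholder values instead of real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_credentials(demo_env)
url = build_explore_url(client_id, client_secret, 33.595063, -7.618777)
print(url)
```

Building the query string from a dictionary also avoids mistakes that are easy to make when formatting a long URL by hand.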


#### Get the venues of each city

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

venue_frames = []

for lat, lng, city, population in zip(df['Latitude'], df['Longitude'], df['City'], df['Population']):
    # Create the explore url
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION)

    # Send the GET request and keep the list of venues
    results = requests.get(url).json()
    venues = results['response']['groups'][0]['items']

    nearby_venues = json_normalize(venues) # flatten JSON

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    # filter the category for each row
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # clean columns
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

    # attach the city and its population to every venue row
    nearby_venues['City'] = city
    nearby_venues['Population'] = population
    nearby_venues = nearby_venues[['City', 'Population', 'name', 'categories', 'lat', 'lng']]

    venue_frames.append(nearby_venues)
    print(city, ' : Done !')

# Concatenate once at the end instead of appending inside the loop
df_venues = pd.concat(venue_frames, ignore_index=True)
df_venues.head()


Casablanca  : Done !
Fez  : Done !
Tangier  : Done !
Marrakesh  : Done !
Salé  : Done !
Meknes  : Done !
Rabat  : Done !
Oujda  : Done !
Kenitra  : Done !
Agadir  : Done !
Tetouan  : Done !
Temara  : Done !
Safi  : Done !
Mohammedia  : Done !
Khouribga  : Done !
El Jadida  : Done !
Beni Mellal  : Done !
Aït Melloul  : Done !
Nador  : Done !
Dar Bouazza  : Done !
Taza  : Done !
Settat  : Done !
Berrechid  : Done !
Khemisset  : Done !
Inezgane  : Done !
Ksar El Kebir  : Done !
Larache  : Done !
Guelmim  : Done !
Khenifra  : Done !
Berkane  : Done !
Taourirt  : Done !
Bouskoura  : Done !
Fquih Ben Salah  : Done !
Dcheira El Jihadia  : Done !
Oued Zem  : Done !
El Kelaa Des Sraghna  : Done !
Sidi Slimane  : Done !
Errachidia  : Done !
Guercif  : Done !
Oulad Teima  : Done !
Ben Guerir  : Done !
Tifelt  : Done !
Lqliaa  : Done !
Taroudant  : Done !
Sefrou  : Done !
Essaouira  : Done !
Fnideq  : Done !
Sidi Kacem  : Done !
Tiznit  : Done !
Tan-Tan  : Done !
Ouarzazate  : Done !
Souk El Arbaa

Unnamed: 0,City,Population,name,categories,lat,lng
0,Casablanca,3359818,Casa Jose,Tapas Restaurant,33.597823,-7.615341
1,Casablanca,3359818,Six PM,Hotel Bar,33.59594,-7.618684
2,Casablanca,3359818,Sofitel Casablanca Tour Blanche,Hotel,33.598251,-7.61396
3,Casablanca,3359818,Le Rouget de l'Isle,French Restaurant,33.592591,-7.622857
4,Casablanca,3359818,Hyatt Regency Casablanca,Hotel,33.596195,-7.618708
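
To make the shape of the data clearer, here is a small sketch of the same flattening done with plain dictionaries. The sample items are hand-written to mimic `results['response']['groups'][0]['items']` and are not real API output; `first_category_name` and `flatten_items` are hypothetical helpers:

```python
def first_category_name(venue):
    """Return the name of the first category, or None if there is none."""
    categories = venue.get('categories', [])
    return categories[0]['name'] if categories else None

def flatten_items(items, city, population):
    """Turn the nested 'items' list into flat rows, one per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append({
            'City': city,
            'Population': population,
            'name': venue['name'],
            'categories': first_category_name(venue),
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows

# Hand-written sample mimicking the nested structure of the response
sample_items = [
    {'venue': {'name': 'Casa Jose',
               'categories': [{'name': 'Tapas Restaurant'}],
               'location': {'lat': 33.597823, 'lng': -7.615341}}},
    {'venue': {'name': 'Unnamed spot',
               'categories': [],
               'location': {'lat': 33.6, 'lng': -7.6}}},
]
rows = flatten_items(sample_items, 'Casablanca', 3359818)
print(rows[0]['categories'], rows[1]['categories'])
```

Each row keeps only the first category of a venue, which matches what the category-extraction function in the notebook returns.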


In [13]:
df_categories = pd.get_dummies(df_venues['categories'])
print(df_venues.shape, df_categories.shape)
df_categories.head()


(1096, 6) (1096, 132)


Unnamed: 0,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
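
One-hot encoding turns each venue's category into a row of indicator columns, and summing those rows per city gives the number of venues of each category. A minimal sketch on a toy table (illustrative values, not the notebook's data):

```python
import pandas as pd

# Toy venue table; names and counts are illustrative only
toy_venues = pd.DataFrame({
    'City': ['Agadir', 'Agadir', 'Agadir', 'Azrou'],
    'categories': ['Beach', 'Hotel', 'Beach', 'Café'],
})

# One indicator column per category (dtype=int gives 0/1 rather than booleans)
one_hot = pd.get_dummies(toy_venues['categories'], dtype=int)

# Summing the indicators per city yields the number of venues of each category
counts = pd.concat([toy_venues[['City']], one_hot], axis=1).groupby('City').sum()
print(counts)
```

The resulting table has one row per city and one column per category, which is the feature layout that the clustering step works on.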


In [14]:
df_venues_categories = pd.concat([df_venues, df_categories], axis=1)
df_venues_categories.head()

Unnamed: 0,City,Population,name,categories,lat,lng,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,Casablanca,3359818,Casa Jose,Tapas Restaurant,33.597823,-7.615341,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,Casablanca,3359818,Six PM,Hotel Bar,33.59594,-7.618684,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Casablanca,3359818,Sofitel Casablanca Tour Blanche,Hotel,33.598251,-7.61396,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Casablanca,3359818,Le Rouget de l'Isle,French Restaurant,33.592591,-7.622857,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Casablanca,3359818,Hyatt Regency Casablanca,Hotel,33.596195,-7.618708,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [15]:
df_venues_categories.shape
df_venues_categories.columns[6:145]

Index(['Airport', 'Airport Terminal', 'American Restaurant', 'Amphitheater',
       'Antique Shop', 'Art Gallery', 'Art Museum', 'Arts & Crafts Store',
       'BBQ Joint', 'Bakery',
       ...
       'Tennis Stadium', 'Theater', 'Theme Park', 'Trail', 'Train Station',
       'Travel & Transport', 'Vegetarian / Vegan Restaurant', 'Water Park',
       'Wings Joint', 'Zoo'],
      dtype='object', length=132)

#### Finalize the dataset by grouping the population, the venue counts, and the total number of venues of each city

In [16]:
data = df_venues_categories[['City','Population']+df_venues_categories.columns[6:145].tolist()].groupby(['City', 'Population'],as_index = False).sum()
data.insert(loc = 2,column = 'Latitude', value = df[['City', 'Latitude']].sort_values(by=['City']).reset_index(drop=True).iloc[:,1].values)
data.insert(loc = 3,column = 'Longitude', value = df[['City', 'Longitude']].sort_values(by=['City']).reset_index(drop=True).iloc[:,1].values)
data.insert(loc = 4,column = 'Venues Total Number', value = df_venues[['City', 'categories']].groupby(['City'],as_index = False).count().iloc[:,1].values)
data.to_csv('data.csv', sep=';')
data.head()

Unnamed: 0,City,Population,Latitude,Longitude,Venues Total Number,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,Agadir,421844,30.421114,-9.583063,30,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1
1,Ain Harrouda,62420,33.635107,-7.450797,30,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,1,0,2,0,0,0,2,0,1,0,0,0,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,Al Hoceima,56716,35.245114,-3.930186,13,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Azrou,54350,33.436117,-5.221913,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aït Melloul,171847,30.338128,-9.504277,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,0,0,0,4,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


As you can see, the data we'll be using is stored in the data frame "data". It contains the Moroccan cities with their population, latitude, and longitude, the total number of venues, and the number of venues of each type. This data is used for clustering in the next section.
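
The cell above attaches latitude and longitude by sorting both tables by city name and relying on matching row positions, which only works if both tables contain exactly the same cities. As a hedged alternative, the columns can be joined on the city name. The frames below are toy stand-ins with illustrative values:

```python
import pandas as pd

# Toy inputs; values are illustrative
counts = pd.DataFrame({'City': ['Agadir', 'Azrou'], 'Café': [1, 3]})
coords = pd.DataFrame({
    'City': ['Azrou', 'Casablanca', 'Agadir'],
    'Latitude': [33.436117, 33.595063, 30.421114],
    'Longitude': [-5.221913, -7.618777, -9.583063],
})

# A left merge keeps every row of counts and looks up coordinates by city name,
# so row order and cities without venues cannot misalign the columns
merged = counts.merge(coords, on='City', how='left', validate='one_to_one')

# Put the identifying columns first and the features last
merged = merged[['City', 'Latitude', 'Longitude', 'Café']]
print(merged)
```

The `validate='one_to_one'` argument makes pandas raise an error if a city name appears more than once in either table.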

#### Data clustering 

After creating the dataset, we use it in this section to cluster the cities based on their characteristics. We will use *k*-means, which is widely used for clustering in data science applications.

The KMeans class has many parameters, but these are the ones that matter here:
<ul>
    <li> <strong>init</strong>: Initialization method of the centroids. </li>
    <ul>
        <li> Value will be: "k-means++". k-means++ selects initial cluster centers for <em>k</em>-means clustering in a smart way to speed up convergence.</li>
    </ul>
    <li> <strong>n_clusters</strong>: The number of clusters to form as well as the number of centroids to generate. </li>
    <ul> <li> Value will be: 10 (set through the variable kclusters)</li> </ul>
    <li> <strong>n_init</strong>: Number of times the <em>k</em>-means algorithm will be run with different centroid seeds. The final result is the best output of the n_init runs in terms of inertia. </li>
    <ul> <li> Not set in the code below, so the scikit-learn default applies. </li> </ul>
    <li> <strong>random_state</strong>: Seed of the random number generator used for centroid initialization. </li>
    <ul> <li> Value will be: 0, so that the results are reproducible. </li> </ul>
</ul>
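
As a minimal sketch of how these parameters are passed, here is KMeans applied to a toy matrix rather than to the city data. The points and the explicit `n_init=10` are illustrative choices, not values taken from this project:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix: two tight groups of points (illustrative values)
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

model = KMeans(init="k-means++", n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(labels)          # one label per row of X
print(model.inertia_)  # within-cluster sum of squared distances
```

The `inertia_` attribute is the quantity that the `n_init` runs are compared on: the run with the lowest value is kept.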


In [18]:
kclusters = 10
k_means = KMeans(init="k-means++", n_clusters = kclusters,  random_state=0)

Now let's fit the KMeans model with the feature matrix we created above, <b> data </b>.

In [19]:
data_clustering = data.drop(['City', 'Population', 'Latitude', 'Longitude'], axis=1)

# run k-means clustering
kmeans = k_means.fit(data_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30] 


array([0, 4, 2, 1, 4, 9, 4, 9, 1, 9, 9, 4, 4, 9, 2, 0, 4, 8, 1, 9, 0, 4,
       8, 9, 9, 9, 0, 4, 9, 9], dtype=int32)

Let's create a new dataframe that includes the cluster as well as venues for each city.

In [20]:
# add clustering labels
data_clusters = data.copy()
data_clusters.insert(0, 'Cluster Labels', kmeans.labels_)

data_clusters.head() # check the first column!

Unnamed: 0,Cluster Labels,City,Population,Latitude,Longitude,Venues Total Number,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,0,Agadir,421844,30.421114,-9.583063,30,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1
1,4,Ain Harrouda,62420,33.635107,-7.450797,30,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,1,0,2,0,0,0,2,0,1,0,0,0,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,2,Al Hoceima,56716,35.245114,-3.930186,13,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,Azrou,54350,33.436117,-5.221913,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,Aït Melloul,171847,30.338128,-9.504277,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,0,0,0,4,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


Finally, let's visualize the resulting clusters.

In [21]:
# create map
location = geolocator.geocode('Morocco')
latitude = location.latitude
longitude = location.longitude
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=6)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, city, cluster in zip(data_clusters['Latitude'], data_clusters['Longitude'], data_clusters['City'], data_clusters['Cluster Labels']):
    label = folium.Popup(str(city) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
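
In the cell above, `rainbow[cluster-1]` maps label 0 to the last color through negative indexing; each cluster still gets its own color, but the mapping is easy to misread. The sketch below shows a simpler palette indexed directly by the label. It uses the standard library's `colorsys` instead of the matplotlib colormap, and `cluster_colors` is a hypothetical helper:

```python
import colorsys

def cluster_colors(n_clusters):
    """Return n_clusters hex colors spread evenly around the hue wheel."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette

palette = cluster_colors(10)
# Index with the label itself, so cluster 0 maps to palette[0]
print(palette[0], palette[9])
```

With such a palette, the marker color would simply be `palette[cluster]`.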

#### Examine the clusters

In [23]:
data_clusters_show = data_clusters.copy()
for k in range(0,kclusters) :
    print('Cluster N° : ', k+1)
    print(data_clusters_show.loc[data_clusters_show['Cluster Labels'] == k].iloc[:,[0,1]].reset_index(drop=True))
    


Cluster N° :  1
   Cluster Labels                City
0               0              Agadir
1               0  Dcheira El Jihadia
2               0           Essaouira
3               0            Inezgane
4               0              Tifelt
Cluster N° :  2
    Cluster Labels                       City
0                1                      Azrou
1                1                 Benslimane
2                1       El Kelaa Des Sraghna
3                1                     Lqliaa
4                1                     Midelt
5                1                   Oued Zem
6                1                       Safi
7                1                     Sefrou
8                1                     Settat
9                1               Sidi Slimane
10               1                    Skhirat
11               1  Suq as-Sabt Awlad an-Nama
Cluster N° :  3
   Cluster Labels         City
0               2   Al Hoceima
1               2  Dar Bouazza
2               2        Oujda
Cl
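
A more compact way to inspect the clusters is to group the cities by label. This sketch uses a toy stand-in for `data_clusters` with illustrative labels and cities:

```python
import pandas as pd

# Toy stand-in for data_clusters; labels and cities are illustrative
toy_clusters = pd.DataFrame({
    'Cluster Labels': [0, 1, 0, 2, 1],
    'City': ['Agadir', 'Azrou', 'Essaouira', 'Oujda', 'Safi'],
})

# Map each label to the list of cities assigned to it
cities_by_cluster = toy_clusters.groupby('Cluster Labels')['City'].apply(list).to_dict()

for label, cities in cities_by_cluster.items():
    print('Cluster {} ({} cities): {}'.format(label, len(cities), ', '.join(cities)))
```

Printing the size next to each cluster makes very small or very large clusters stand out at a glance.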

We can also use the data to cluster the Moroccan cities based on the presence of a single type of venue, for example to decide where to invest in a project. In the next part, we cluster the cities based on the presence of cafés and coffee shops.

#### Moroccan cities clustering based on the presence of cafés and coffee shops

In [42]:
data.head()

Unnamed: 0,City,Population,Latitude,Longitude,Venues Total Number,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,Agadir,421844,30.421114,-9.583063,30,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1
1,Ain Harrouda,62420,33.635107,-7.450797,30,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,1,0,2,0,0,0,2,0,1,0,0,0,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,Al Hoceima,56716,35.245114,-3.930186,13,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Azrou,54350,33.436117,-5.221913,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aït Melloul,171847,30.338128,-9.504277,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,0,0,0,4,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


In [44]:
data_clustring_coffees = data[['Café', 'Coffee Shop']]
data_clustring_coffees.head()

Unnamed: 0,Café,Coffee Shop
0,1,2
1,3,2
2,1,0
3,3,1
4,5,4
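
The clustering cell that follows fixes the number of clusters at 10. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) over a range of values of k. The sketch below is only an illustration under stated assumptions: the matrix `X` is randomly generated stand-in data rather than the notebook's `data_clustring_coffees`, and the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the two feature columns (Café, Coffee Shop):
# 40 "cities" with small integer counts. This is NOT the notebook's data.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[2.0, 1.5], size=(40, 2))

# Fit k-means for several values of k and record the inertia of each fit.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

A value of k beyond which the inertia stops dropping sharply is a reasonable candidate. With only two low-count features, that point may well lie below 10.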


In [62]:
kclusters_coffees = 10
k_means_coffees = KMeans(init="k-means++", n_clusters=kclusters_coffees, random_state=0)

# run k-means clustering on the 'Café' and 'Coffee Shop' columns
kmeanscoffees = k_means_coffees.fit(data_clustring_coffees)

# add the cluster labels as the first column of a copy of the data
data_clusters_coffees = data.copy()
data_clusters_coffees.insert(0, 'Coffees Cluster Labels', kmeanscoffees.labels_)

data_clusters_coffees.head() # check the first column!

Unnamed: 0,Coffees Cluster Labels,City,Population,Latitude,Longitude,Venues Total Number,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Board Shop,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Café,Campground,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Creperie,Cupcake Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Falafel Restaurant,Farm,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Travel,Golf Course,Grocery Store,Gym,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hotel Pool,Housing Development,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Lighthouse,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Mountain,Movie Theater,Museum,Nature Preserve,Neighborhood,Night Market,Nightclub,Other Great Outdoors,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Recreation Center,Resort,Rest Area,Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Stadium,Theater,Theme Park,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Water Park,Wings Joint,Zoo
0,5,Agadir,421844,30.421114,-9.583063,30,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1
1,4,Ain Harrouda,62420,33.635107,-7.450797,30,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,1,0,2,0,0,0,2,0,1,0,0,0,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,7,Al Hoceima,56716,35.245114,-3.930186,13,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,4,Azrou,54350,33.436117,-5.221913,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,Aït Melloul,171847,30.338128,-9.504277,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,0,0,0,4,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
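
The values in `Coffees Cluster Labels` are arbitrary identifiers: k-means does not order them from "few" to "many" venues. The following is a minimal sketch of one way to derive an ordered label from the cluster centers. It assumes a small hypothetical table `toy` in place of the real data, and the column name `Ordered Cluster` is made up for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy counts (hypothetical, not the notebook's data): one row per city.
toy = pd.DataFrame({
    'Café':        [0, 0, 1, 5, 5, 6, 10, 11, 10],
    'Coffee Shop': [0, 1, 0, 4, 5, 4,  9,  9, 10],
})

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(toy)

# Total number of venues at each cluster center.
centre_totals = km.cluster_centers_.sum(axis=1)

# Rank of each raw label by its center total (0 = fewest venues).
rank_of_label = np.argsort(np.argsort(centre_totals))

# Replace each raw label with its rank so that the labels become ordered.
toy['Ordered Cluster'] = rank_of_label[km.labels_]
print(toy)
```

With ordered labels, cluster 0 always denotes the group with the fewest cafés and coffee shops, whatever identifiers k-means happened to assign.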


In [63]:
# create a map centered on Morocco
location = geolocator.geocode('Morocco')
latitude = location.latitude
longitude = location.longitude
map_clusters_coffees = folium.Map(location=[latitude, longitude], zoom_start=6)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters_coffees))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per city, colored by its cluster label (labels run from 0 to kclusters_coffees - 1)
for lat, lon, city, cluster in zip(data_clusters_coffees['Latitude'], data_clusters_coffees['Longitude'], data_clusters_coffees['City'], data_clusters_coffees['Coffees Cluster Labels']):
    label = folium.Popup(str(city) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_coffees)

map_clusters_coffees

In [60]:
print(data_clusters_coffees.columns.get_loc('Café'),(data_clusters_coffees.columns.get_loc('Coffee Shop')))

30 34


In [64]:
data_clusters_coffees_show = data_clusters_coffees.copy()

for k in range(0,kclusters_coffees) :
    print('Cluster N° : ', k+1)
    print(data_clusters_coffees_show.loc[data_clusters_coffees_show['Coffees Cluster Labels'] == k].iloc[:,[0,1,2,30,34]].reset_index(drop=True))

Cluster N° :  1
    Coffees Cluster Labels                City  Population  Café  Coffee Shop
0                        0          Ben Guerir       88626     0            0
1                        0           Berrechid      136634     0            1
2                        0  Dcheira El Jihadia      100336     0            1
3                        0          Errachidia       92374     0            0
4                        0            Inezgane      130333     0            1
5                        0           Khouribga      196196     0            0
6                        0            Laayoune      217732     0            0
7                        0              Lqliaa       83235     0            0
8                        0          Ouarzazate       71067     0            0
9                        0              Settat      142250     0            1
10                       0          Sidi Kacem       75672     0            0
11                       0        Sidi Slimane  

## Results and Discussion

The dataset was first used to cluster the Moroccan cities based on their venues, and the map shows which cities are similar to one another. In the second part, our analysis shows that many cities have few cafés and coffee shops. Even though the data is not 100% accurate, the result is consistent with what I know about Moroccan cities. The analysis identifies 14 cities with a low number of cafés and coffee shops, which may be good places for such a project. However, investing in some of these cities may still be a poor idea once the population and its culture are taken into account. The recommended cities should therefore be considered only as a starting point for a more detailed analysis, one that looks beyond the absence of nearby competition and checks that all other relevant conditions are met.
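
As a small illustration of taking population into account, the sketch below relates the venue counts to city size. It uses only the five rows shown by `data.head()` above. The per-100,000-inhabitants indicator and the column names `Coffee Venues` and `Per 100k` are assumptions made for this example and are not part of the original analysis.

```python
import pandas as pd

# The five rows displayed by data.head() earlier in this section.
sample = pd.DataFrame({
    'City': ['Agadir', 'Ain Harrouda', 'Al Hoceima', 'Azrou', 'Aït Melloul'],
    'Population': [421844, 62420, 56716, 54350, 171847],
    'Café': [1, 3, 1, 3, 5],
    'Coffee Shop': [2, 2, 0, 1, 4],
})

# Hypothetical indicator: cafés plus coffee shops per 100,000 inhabitants.
sample['Coffee Venues'] = sample['Café'] + sample['Coffee Shop']
sample['Per 100k'] = sample['Coffee Venues'] / sample['Population'] * 100_000

# Cities with the fewest venues relative to their population come first.
ranked = sample.sort_values('Per 100k').reset_index(drop=True)
print(ranked[['City', 'Population', 'Coffee Venues', 'Per 100k']].round(2))
```

On these five rows Agadir ranks lowest relative to its population even though it has more venues than Al Hoceima in absolute terms, which is the kind of difference a raw-count clustering cannot show.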

## Conclusion

The purpose of this project was to cluster the cities of my own country, Morocco, based on their characteristics, and then to cluster them based on the presence of a single venue type. We used a Wikipedia page to extract the Moroccan cities with their populations, latitudes and longitudes. Then we used the Foursquare API to get the venues in each city. The collected data was used to cluster the cities and to see the distribution of venues across Morocco. The dataset was also used to cluster the cities based on the presence of cafés and coffee shops. We grouped the cities into 10 clusters, ranging from cities with few cafés and coffee shops to cities with many. The result is 14 cities with a low number of cafés and coffee shops, which means low competition. However, further analysis may be needed to decide more precisely where to invest.
