# Morocco Cities Clustering

## Introduction and description 

The goal of this project is to **cluster** the cities of my country, **Morocco**. The clustering algorithm will use **geographical** data such as *population*, *number of hotels* and *number and type of industries*. This clustering could serve several purposes:

- Say I had to move from my current city: I would like to choose another city that is similar to it.

- For a foreign tourist: say you visited city $A$ and liked it, but didn't like city $B$. On a future visit to Morocco, you would want to avoid the cities in the $B$ cluster and discover more cities in the $A$ cluster. A **recommender system** would be a better fit for this use case, but *clustering* can also be used.
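
The tourist scenario can be sketched in a few lines once every city has a cluster label. This is only an illustration: the function `suggest_cities` is a hypothetical helper, and the labels below are placeholders for the cities $A$, $B$, ... of the example, not results of any clustering.

```python
def suggest_cities(labels, liked, disliked=None):
    """Suggest the cities that share a cluster with `liked`.

    `labels` maps a city name to its cluster id. If `liked` and
    `disliked` fall in the same cluster, nothing is suggested.
    """
    target = labels[liked]
    avoid = labels[disliked] if disliked is not None else None
    if target == avoid:
        return []
    return sorted(
        city for city, cluster in labels.items()
        if cluster == target and city != liked
    )


# placeholder labels (hypothetical, for illustration only)
labels = {"A": 0, "C": 0, "D": 0, "B": 1, "E": 1}
suggestions = suggest_cities(labels, liked="A", disliked="B")
print(suggestions)
```

Here the tourist who liked $A$ and disliked $B$ would be pointed to the other cities labelled like $A$, and none of the cities labelled like $B$.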


## Data description

The first step of the data preparation is to get a list of **cities** in Morocco. Luckily, this [Wikipedia page](https://en.wikipedia.org/wiki/List_of_cities_in_Morocco) lists all the important cities in a table.

In [42]:
import pandas as pd
import matplotlib.pyplot as plt
import folium
from geopy import Nominatim
import requests
import numpy as np

In [8]:
morocco_cities_link = "https://en.wikipedia.org/wiki/List_of_cities_in_Morocco"

In [63]:
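# fail early on an HTTP error rather than parsing an error page
response = requests.get(morocco_cities_link)
response.raise_for_status()
morocco_cities = pd.read_html(response.content)[0]  # first table: the list of cities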

In [64]:
morocco_cities.head()

| | Rank | City | Population(2014 census)[5][6] | Region |
|---|---|---|---|---|
| 0 | 1 | Casablanca[b] | 3359818 | Casablanca-Settat |
| 1 | 2 | Fez[c] | 1112072 | Fès-Meknès |
| 2 | 3 | Tangier[d] | 947952 | Tanger-Tetouan-Al Hoceima |
| 3 | 4 | Marrakesh[e] | 928850 | Marrakesh-Safi |
| 4 | 5 | Salé[f] | 890403 | Rabat-Salé-Kénitra |


In [65]:
#rename the long population column
morocco_cities.rename(columns={'Population(2014 census)[5][6]': 'Population'}, inplace=True)
#use the city name as the index (this also removes the City column)
morocco_cities.set_index('City', inplace=True)

#drop the rank column
morocco_cities.drop(columns='Rank', inplace=True)
morocco_cities.head()

| City | Population | Region |
|---|---|---|
| Casablanca[b] | 3359818 | Casablanca-Settat |
| Fez[c] | 1112072 | Fès-Meknès |
| Tangier[d] | 947952 | Tanger-Tetouan-Al Hoceima |
| Marrakesh[e] | 928850 | Marrakesh-Safi |
| Salé[f] | 890403 | Rabat-Salé-Kénitra |


In [66]:
#strip the Wikipedia footnote markers (e.g. "[b]") from the city names
morocco_cities.index = morocco_cities.index.map(lambda x: x.split('[')[0])
morocco_cities.head()

| City | Population | Region |
|---|---|---|
| Casablanca | 3359818 | Casablanca-Settat |
| Fez | 1112072 | Fès-Meknès |
| Tangier | 947952 | Tanger-Tetouan-Al Hoceima |
| Marrakesh | 928850 | Marrakesh-Safi |
| Salé | 890403 | Rabat-Salé-Kénitra |
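
The cell above removes the footnote markers by cutting each name at the first `[`, which would also truncate a name that contains a bracket for any other reason. A slightly more targeted alternative, sketched here with a hypothetical helper `strip_footnotes`, removes only the bracketed markers themselves using the standard `re` module.

```python
import re

# a footnote marker is a pair of square brackets and whatever is between them
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def strip_footnotes(text):
    """Remove every bracketed marker such as "[b]" or "[5]" from `text`."""
    return FOOTNOTE.sub("", text).strip()


city = strip_footnotes("Casablanca[b]")
column = strip_footnotes("Population(2014 census)[5][6]")
print(city)
print(column)
```

In the notebook, the same helper could be passed to `morocco_cities.index.map(...)`, and it also works on column headers that carry citation markers.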


The table is _bare-bones_ for now, as we only have:

- City name (as the index)
- Population
- Region

As a first step, we will add the **geolocation** (latitude and longitude) of each city.

In [67]:
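import time

def gps_coordinates(description, max_attempts=5, delay=1.0):
    """
    Return the GPS coordinates (latitude, longitude) of `description`
    using the Nominatim geocoder.

    Raises ValueError if no result is found after `max_attempts` tries.
    """
    geolocator = Nominatim(user_agent='foursquare_agent')

    # retry a bounded number of times instead of looping forever,
    # pausing between requests to respect Nominatim's rate limit
    for _ in range(max_attempts):
        location = geolocator.geocode(description)
        if location is not None:
            return location.latitude, location.longitude
        time.sleep(delay)

    raise ValueError(f"could not geocode {description!r}")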


In [69]:
# geocode every city; the index holds the city names
coordinates = [gps_coordinates(city) for city in morocco_cities.index]

# split the (latitude, longitude) pairs into two columns
morocco_cities['latitude'] = [lat for lat, _ in coordinates]
morocco_cities['longitude'] = [lon for _, lon in coordinates]

In [70]:
morocco_cities.head()

| City | Population | Region | latitude | longitude |
|---|---|---|---|---|
| Casablanca | 3359818 | Casablanca-Settat | 33.595063 | -7.618777 |
| Fez | 1112072 | Fès-Meknès | 34.034653 | -5.016193 |
| Tangier | 947952 | Tanger-Tetouan-Al Hoceima | 35.777103 | -5.803792 |
| Marrakesh | 928850 | Marrakesh-Safi | 31.625826 | -7.989161 |
| Salé | 890403 | Rabat-Salé-Kénitra | 34.044889 | -6.814017 |
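
Geocoding a whole list of names can fail for individual entries, so it may help to collect the failures instead of stopping at the first one. The sketch below assumes only that the geocoder is a callable returning `None` or an object with `latitude` and `longitude` attributes, which is how `Nominatim.geocode` behaves. The names `geocode_all` and `fake_geocode` are hypothetical, and the stand-in geocoder exists only so the example runs without network access.

```python
from types import SimpleNamespace


def geocode_all(names, geocode):
    """Geocode each name with the callable `geocode`.

    Returns (found, missing): a dict name -> (latitude, longitude)
    and the list of names for which no result came back.
    """
    found, missing = {}, []
    for name in names:
        location = geocode(name)
        if location is None:
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


def fake_geocode(query):
    """Stand-in geocoder for the demonstration: knows a single city."""
    known = {
        "Casablanca": SimpleNamespace(latitude=33.595063, longitude=-7.618777),
    }
    return known.get(query)


found, missing = geocode_all(["Casablanca", "Nowhere"], fake_geocode)
print(found)
print(missing)
```

In the notebook, `geocode` could be `geolocator.geocode` wrapped in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`, with `min_delay_seconds=1`) so that the requests are spaced out automatically.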


## Showing the cities

In [72]:
morocco_position = gps_coordinates('Morocco')

In [85]:
morocco = folium.Map(location=morocco_position, zoom_start=6)
# one small blue circle per city, with the city name as popup
for city, lat, long in zip(morocco_cities.index, morocco_cities.latitude, morocco_cities.longitude):
    folium.CircleMarker(
        [lat, long],
        popup=city,
        fill_color='blue',
        radius=3,
        fill=True
    ).add_to(morocco)
morocco
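
As an alternative to geocoding the string `'Morocco'`, the map view could be derived from the city coordinates themselves, which saves one network request. The sketch below is one possible approach, not the notebook's method: `map_view` is a hypothetical helper, and the input is the five coordinate pairs shown in the table above.

```python
from statistics import fmean


def map_view(coordinates):
    """Return (center, bounds) for a list of (latitude, longitude) pairs.

    `center` is the mean position; `bounds` is the
    [[south, west], [north, east]] bounding box.
    """
    lats = [lat for lat, _ in coordinates]
    lons = [lon for _, lon in coordinates]
    center = (fmean(lats), fmean(lons))
    bounds = [[min(lats), min(lons)], [max(lats), max(lons)]]
    return center, bounds


# the five cities from the table above
coordinates = [
    (33.595063, -7.618777),  # Casablanca
    (34.034653, -5.016193),  # Fez
    (35.777103, -5.803792),  # Tangier
    (31.625826, -7.989161),  # Marrakesh
    (34.044889, -6.814017),  # Salé
]
center, bounds = map_view(coordinates)
print(center)
print(bounds)
```

The center can be passed as `location` to `folium.Map`, and the bounding box to the map's `fit_bounds` method so that every marker is visible regardless of the initial zoom.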

Now we nee