# Cities of Egypt
## IBM Data Science Professional Certificate -- Capstone Project
### By: Abdullah M. Mustafa


## Introduction:

Egypt is a large country with a population of over 100 million, divided into 27 governorates. These governorates differ both culturally and economically. Tourism is a main source of Egypt's national income; however, not all governorates are considered attractive destinations for tourists. In this project, we aim to better understand the most popular venues across Egypt using the Foursquare API.
The popular venues for the capital city of each governorate are analyzed, and these cities are clustered to better understand their tourist attractions. We expect cities like Cairo, Luxor, and Hurghada to be popular destinations, whereas poorer governorates are expected to be less popular. Our objective is to help make these poorer cities more attractive.


#### Load necessary libraries

In [1]:
import requests # request data from a URL
import json # parse JSON files into Python data structures
import pandas as pd # tabular data manipulation
import numpy as np # numerical data manipulation
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from sklearn.cluster import KMeans # centroid-based clustering algorithm
import folium # Map visualisation package
import matplotlib.pyplot as plt # python plotting library
from matplotlib import cm, colors
from IPython.display import HTML, display 
%matplotlib inline

## Getting Data:

To proceed, we first need the location data to feed to the Foursquare API. The data can be retrieved in JSON format from this [simplemaps.com url](https://simplemaps.com/static/data/country-cities/eg/eg.json). Out of the available columns, we are mainly interested in each governorate's name, latitude, and longitude.

In [2]:
r = requests.get('https://simplemaps.com/static/data/country-cities/eg/eg.json', allow_redirects=True)
r.raise_for_status()  # stop early if the download failed

# parse the response body directly instead of reading an unsaved local file
data = r.json()

In [3]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,city,admin,country,population_proper,iso2,capital,lat,lng,population
0,Cairo,Al Qāhirah,Egypt,7734614,EG,primary,30.07708,31.285909,11893000
1,Alexandria,Al Iskandarīyah,Egypt,3811516,EG,admin,31.215645,29.955266,4165000
2,Al Jīzah,Al Jīzah,Egypt,2681863,EG,admin,30.008079,31.210931,2681863
3,Ismailia,Al Ismā‘īlīyah,Egypt,284813,EG,admin,30.604272,32.272252,656135
4,Port Said,Būr Sa‘īd,Egypt,500000,EG,admin,31.256541,32.284115,623864
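
As a minimal sketch of how a list of JSON records becomes a DataFrame, the example below parses two records copied from the table above. Storing the coordinates as strings is an assumption about the file, made here only to show why a float conversion may be needed.

```python
import json
import pandas as pd

# Two records copied from the table above; the real file holds many more rows.
# Coordinates are written as strings here (an assumption about the file format).
sample_json = '''
[
  {"city": "Cairo", "admin": "Al Qāhirah", "capital": "primary",
   "lat": "30.07708", "lng": "31.285909"},
  {"city": "Alexandria", "admin": "Al Iskandarīyah", "capital": "admin",
   "lat": "31.215645", "lng": "29.955266"}
]
'''

records = json.loads(sample_json)   # list of dicts, one per city
sample_df = pd.DataFrame(records)   # one row per record, one column per key

# Cast the coordinate columns so they can be averaged and plotted.
sample_df[["lat", "lng"]] = sample_df[["lat", "lng"]].astype(float)
print(sample_df.dtypes)
```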


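We keep only the necessary columns: the governorate name (`admin`), which we label `City`, and the latitude and longitude, which we convert to floats. The coordinates are then averaged over all cities listed for each governorate.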

In [4]:
City = df['admin']
Latitude = df.lat.astype('float')
Longitude = df.lng.astype('float')
df_egypt = pd.DataFrame({'City':City,'Latitude': Latitude, 'Longitude': Longitude})
df_egypt = df_egypt.groupby('City').mean().reset_index()
df_egypt

Unnamed: 0,City,Latitude,Longitude
0,Ad Daqahlīyah,31.036373,31.380691
1,Al Baḩr al Aḩmar,26.991034,33.87731
2,Al Buḩayrah,31.033452,30.446752
3,Al Fayyūm,29.309949,30.841804
4,Al Gharbīyah,30.788471,31.001921
5,Al Iskandarīyah,31.215645,29.955266
6,Al Ismā‘īlīyah,30.604272,32.272252
7,Al Jīzah,30.008079,31.210931
8,Al Minyā,28.165388,30.777255
9,Al Minūfīyah,30.552581,31.009035
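
To illustrate what the `groupby('City').mean()` step does, here is a small sketch on made-up data (the names `A` and `B` and their coordinates are hypothetical). A governorate with several listed cities ends up with the average of their coordinates, while one with a single city keeps that city's coordinates.

```python
import pandas as pd

# Hypothetical input: governorate "A" has two listed cities, "B" has one.
toy = pd.DataFrame({
    "City": ["A", "A", "B"],
    "Latitude": [30.0, 31.0, 25.0],
    "Longitude": [31.0, 33.0, 32.0],
})

# One row per governorate, holding the mean latitude and longitude.
centres = toy.groupby("City").mean().reset_index()
print(centres)
```

Note that an averaged point is not guaranteed to fall inside a city, which may matter when it is used as the center of a venue search.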


Using the Nominatim geocoder from geopy, we look up the latitude and longitude of Egypt. These are needed to center the map on the country.

In [5]:
address = 'Egypt'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Egypt are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Egypt are 26.2540493, 29.2675469.


We can now use the folium package to visualize the country with the governorates marked on it.

In [70]:
# create map
map_clusters = folium.Map(location=[latitude, longitude],zoom_start=6)

# add one marker per city
for lat, lon, city in zip(df_egypt['Latitude'], df_egypt['Longitude'], df_egypt['City']):
    label = folium.Popup(city, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label).add_to(map_clusters)
       
map_clusters

We need to provide API credentials to use the Foursquare API.

In [68]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


After collecting the locations of the governorates across Egypt, we can use the Foursquare API to extract the most popular venues at each location. To better explore each city, we set the search radius to 10 km and limit the number of top venues to 100.
For each venue, we extract its name, location, and category. Different cities can then be compared based on the popularity of each category.

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'Country Latitude', 
                  'Country Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
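
The parsing step can be tried without network access. The sketch below uses a hypothetical payload that mirrors only the keys read by `getNearbyVenues`; the helper `parse_items` and the venue "Unnamed Spot" are illustrative and not part of the notebook. It also shows one way to tolerate a response with no groups or a venue with no category, which the function above does not handle.

```python
import pandas as pd

# Hypothetical payload: only the keys that getNearbyVenues reads are included.
mock_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Origo Kaffee",
                   "location": {"lat": 31.037354, "lng": 31.361005},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Unnamed Spot",
                   "location": {"lat": 31.04, "lng": 31.36},
                   "categories": []}},
    ]}]}
}


def parse_items(city, payload):
    """Flatten one response into rows, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "City": city,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return pd.DataFrame(rows)


parsed = parse_items("Ad Daqahlīyah", mock_payload)
empty = parse_items("Nowhere", {"response": {}})  # no groups: empty frame, no error
print(parsed)
```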

We now retrieve the popular venues across the country. The request returns 822 venues in total.

In [9]:
# retrieve the venues around every governorate
egypt_venues = getNearbyVenues(names=df_egypt['City'],
                                   latitudes=df_egypt['Latitude'],
                                   longitudes=df_egypt['Longitude']
                                  )
egypt_venues.head()

Unnamed: 0,City,Country Latitude,Country Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ad Daqahlīyah,31.036373,31.380691,Bremer (بريما),31.044445,31.365417,Fast Food Restaurant
1,Ad Daqahlīyah,31.036373,31.380691,Bayro,31.038255,31.366137,Juice Bar
2,Ad Daqahlīyah,31.036373,31.380691,Pen & Paper,31.045199,31.361419,Bookstore
3,Ad Daqahlīyah,31.036373,31.380691,Titanium Gym,31.046116,31.360996,Gym / Fitness Center
4,Ad Daqahlīyah,31.036373,31.380691,Origo Kaffee,31.037354,31.361005,Coffee Shop


In [10]:
egypt_venues.shape

(822, 7)

We need to process the data to get the most popular categories. This can be done by one-hot encoding the category of each venue as follows.

In [11]:
# one hot encoding
egypt_onehot = pd.get_dummies(egypt_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
egypt_onehot['City'] = egypt_venues['City']
# move city column to the first column
fixed_columns = ['City'] + list(set(egypt_onehot.columns) - set(['City']))

egypt_onehot = egypt_onehot[fixed_columns]

egypt_onehot.head()

Unnamed: 0,City,Harbor / Marina,Kebab Restaurant,Scenic Lookout,Multiplex,Sports Club,Church,Gym / Fitness Center,Food Court,Shopping Mall,...,Perfume Shop,Entertainment Service,Burger Joint,Italian Restaurant,Fondue Restaurant,Canal,River,History Museum,Port,American Restaurant
0,Ad Daqahlīyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ad Daqahlīyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ad Daqahlīyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ad Daqahlīyah,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ad Daqahlīyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
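
As a small illustration of one-hot encoding, the sketch below encodes three made-up venues (the cities `A` and `B` are hypothetical). It places the `City` column first with `DataFrame.insert`, which is an alternative to the set-based reordering used above and keeps the category columns in a stable order.

```python
import pandas as pd

# Hypothetical venues: two in city "A", one in city "B".
toy_venues = pd.DataFrame({
    "City": ["A", "A", "B"],
    "Venue Category": ["Café", "Hotel", "Café"],
})

# One indicator column per category; each row has exactly one active column.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")

# Put the city label in the first position without disturbing the other columns.
toy_onehot.insert(0, "City", toy_venues["City"])
print(toy_onehot)
```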


We can then group the rows by city and take the mean of each category column, which gives the share of each category among a city's venues.

In [12]:
egypt_grouped = egypt_onehot.groupby('City').mean().reset_index()
egypt_grouped

Unnamed: 0,City,Harbor / Marina,Kebab Restaurant,Scenic Lookout,Multiplex,Sports Club,Church,Gym / Fitness Center,Food Court,Shopping Mall,...,Perfume Shop,Entertainment Service,Burger Joint,Italian Restaurant,Fondue Restaurant,Canal,River,History Museum,Port,American Restaurant
0,Ad Daqahlīyah,0.0,0.052632,0.0,0.0,0.0,0.0,0.035088,0.0,0.017544,...,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al Baḩr al Aḩmar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0
2,Al Buḩayrah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Fayyūm,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Al Gharbīyah,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02
5,Al Iskandarīyah,0.0,0.03,0.0,0.01,0.03,0.0,0.01,0.02,0.03,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01
6,Al Ismā‘īlīyah,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0
7,Al Jīzah,0.0,0.01,0.0,0.01,0.03,0.01,0.02,0.04,0.01,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01
8,Al Minyā,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Al Minūfīyah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
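
The mean of a 0/1 indicator column is the fraction of rows where it equals 1, so each grouped row sums to 1. A minimal sketch with made-up data (cities `A` and `B` are hypothetical):

```python
import pandas as pd

# Hypothetical one-hot rows: city "A" has four venues, city "B" has one.
toy_onehot = pd.DataFrame({
    "City": ["A", "A", "A", "A", "B"],
    "Café": [1, 1, 1, 0, 0],
    "Hotel": [0, 0, 0, 1, 1],
})

# Share of each category among the venues of each city.
toy_grouped = toy_onehot.groupby("City").mean().reset_index()
print(toy_grouped)
```

A city with very few venues is dominated by single entries: a share of 0.25 can come from just one venue out of four, so such values should be read with the venue count in mind.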


The results cover 146 categories across 26 governorates (for one governorate, Foursquare did not return any venues).

We use these results to obtain the popular categories in each governorate.

In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [58]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
egypt_venues_sorted = pd.DataFrame(columns=columns)
egypt_venues_sorted['City'] = egypt_grouped['City']

for ind in np.arange(egypt_grouped.shape[0]):
    egypt_venues_sorted.iloc[ind, 1:] = return_most_common_venues(egypt_grouped.iloc[ind, :], num_top_venues)

egypt_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ad Daqahlīyah,Café,Coffee Shop,Fast Food Restaurant,Kebab Restaurant,Lounge,Juice Bar,Gaming Cafe,Clothing Store,Bookstore,Dessert Shop
1,Al Baḩr al Aḩmar,Resort,Hotel,Beach,Hotel Bar,Restaurant,Lounge,Pool,Bar,Chinese Restaurant,Mexican Restaurant
2,Al Buḩayrah,Café,Coffee Shop,Hookah Bar,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock,Organic Grocery,American Restaurant
3,Al Fayyūm,Café,Scenic Lookout,American Restaurant,Hookah Bar,Cupcake Shop,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock
4,Al Gharbīyah,Café,Coffee Shop,Restaurant,Juice Bar,Fast Food Restaurant,Lebanese Restaurant,Pizza Place,Music Store,Mobile Phone Shop,Fried Chicken Joint


## Clustering the data

In [59]:
# set number of clusters
kclusters = 5

egypt_grouped_clustering = egypt_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,n_init=1000).fit(egypt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 0, 0, 1, 3, 1, 3, 1, 1])
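
K-means starts from random centroids, so cluster numbering and sometimes membership can change between runs. A minimal sketch, on synthetic points rather than the venue data, of how fixing `random_state` makes a run repeatable (the blob positions and the seed are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: two tight, well-separated blobs of ten points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.05, size=(10, 2))
blob_b = rng.normal(loc=1.0, scale=0.05, size=(10, 2))
points = np.vstack([blob_a, blob_b])

# Same seed and settings: both fits produce identical labels.
model_1 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
model_2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
same_labels = np.array_equal(model_1.labels_, model_2.labels_)
print(same_labels, model_1.labels_)
```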

In [60]:
# add clustering labels
egypt_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge df_egypt with egypt_venues_sorted to add latitude/longitude for each city
df_egypt_clust = df_egypt.merge(egypt_venues_sorted, on='City')

df_egypt_clust.head() # check the last columns!

Unnamed: 0,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ad Daqahlīyah,31.036373,31.380691,1,Café,Coffee Shop,Fast Food Restaurant,Kebab Restaurant,Lounge,Juice Bar,Gaming Cafe,Clothing Store,Bookstore,Dessert Shop
1,Al Baḩr al Aḩmar,26.991034,33.87731,3,Resort,Hotel,Beach,Hotel Bar,Restaurant,Lounge,Pool,Bar,Chinese Restaurant,Mexican Restaurant
2,Al Buḩayrah,31.033452,30.446752,0,Café,Coffee Shop,Hookah Bar,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock,Organic Grocery,American Restaurant
3,Al Fayyūm,29.309949,30.841804,0,Café,Scenic Lookout,American Restaurant,Hookah Bar,Cupcake Shop,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock
4,Al Gharbīyah,30.788471,31.001921,1,Café,Coffee Shop,Restaurant,Juice Bar,Fast Food Restaurant,Lebanese Restaurant,Pizza Place,Music Store,Mobile Phone Shop,Fried Chicken Joint


In [61]:
# create map
map_clusters = folium.Map(location=[latitude, longitude],zoom_start=6)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_egypt_clust['Latitude'], df_egypt_clust['Longitude'], df_egypt_clust['City'], df_egypt_clust['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
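
The color scheme only needs one hex color per cluster label. A compact sketch of that mapping, assuming labels run from 0 to `kclusters - 1` as returned by `KMeans` (the dictionary `color_for` is a hypothetical helper):

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5

# Sample the rainbow colormap at evenly spaced points and convert to hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index the palette directly by label so cluster 0 gets the first color.
color_for = {label: palette[label] for label in range(kclusters)}
print(color_for)
```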

In [66]:
df_egypt_clust.loc[df_egypt_clust['Cluster Labels'] == 0, df_egypt_clust.columns[[0] + list(range(4, df_egypt_clust.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Al Buḩayrah,Café,Coffee Shop,Hookah Bar,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock,Organic Grocery,American Restaurant
3,Al Fayyūm,Café,Scenic Lookout,American Restaurant,Hookah Bar,Cupcake Shop,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock
21,Kafr ash Shaykh,Café,Mobile Phone Shop,Fast Food Restaurant,Bus Station,Coffee Shop,Modern European Restaurant,Waterfront,Bakery,Liquor Store,Canal Lock


In [62]:
df_egypt_clust.loc[df_egypt_clust['Cluster Labels'] == 1, df_egypt_clust.columns[[0] + list(range(4, df_egypt_clust.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ad Daqahlīyah,Café,Coffee Shop,Fast Food Restaurant,Kebab Restaurant,Lounge,Juice Bar,Gaming Cafe,Clothing Store,Bookstore,Dessert Shop
4,Al Gharbīyah,Café,Coffee Shop,Restaurant,Juice Bar,Fast Food Restaurant,Lebanese Restaurant,Pizza Place,Music Store,Mobile Phone Shop,Fried Chicken Joint
6,Al Ismā‘īlīyah,Café,Beach,Dessert Shop,Seafood Restaurant,Fried Chicken Joint,Pizza Place,Coffee Shop,Modern European Restaurant,Plaza,Soccer Stadium
8,Al Minyā,Café,Restaurant,Shopping Mall,Hotel,Sports Club,Dessert Shop,Fast Food Restaurant,Coffee Shop,Waterfront,Opera House
9,Al Minūfīyah,Café,Restaurant,Sandwich Place,Hookah Bar,Ice Cream Shop,Train Station,Fried Chicken Joint,Juice Bar,Chinese Restaurant,Opera House
10,Al Qalyūbīyah,Café,Coffee Shop,Restaurant,Fried Chicken Joint,Waterfront,Soccer Stadium,Sports Club,Snack Place,Shopping Mall,Fast Food Restaurant
14,Ash Sharqīyah,Café,Fast Food Restaurant,Pizza Place,Steakhouse,Plaza,Juice Bar,BBQ Joint,Restaurant,Fried Chicken Joint,Gym / Fitness Center
16,Asyūţ,Café,Restaurant,Fast Food Restaurant,Park,Fried Chicken Joint,Lounge,Nightclub,Bus Station,American Restaurant,Canal Lock
17,Banī Suwayf,Café,Restaurant,Waterfront,Dessert Shop,Hotel,Fried Chicken Joint,Chinese Restaurant,Opera House,Gaming Cafe,Modern European Restaurant
18,Būr Sa‘īd,Café,Seafood Restaurant,Hotel,Fried Chicken Joint,Waterfront,Fast Food Restaurant,Restaurant,Beach,Outlet Mall,Museum


In [67]:
df_egypt_clust.loc[df_egypt_clust['Cluster Labels'] == 2, df_egypt_clust.columns[[0] + list(range(4, df_egypt_clust.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Sūhāj,Train Station,American Restaurant,Hookah Bar,Cupcake Shop,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock,Organic Grocery


In [64]:
df_egypt_clust.loc[df_egypt_clust['Cluster Labels'] == 3, df_egypt_clust.columns[[0] + list(range(4, df_egypt_clust.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Al Baḩr al Aḩmar,Resort,Hotel,Beach,Hotel Bar,Restaurant,Lounge,Pool,Bar,Chinese Restaurant,Mexican Restaurant
5,Al Iskandarīyah,Café,Coffee Shop,Dessert Shop,Juice Bar,Restaurant,Clothing Store,Bar,Kebab Restaurant,Beach,Sports Club
7,Al Jīzah,Hotel,Café,Historic Site,Food Court,French Restaurant,Lebanese Restaurant,Lounge,Pastry Shop,Coffee Shop,Middle Eastern Restaurant
11,Al Qāhirah,Hotel,Café,Egyptian Restaurant,Historic Site,Bookstore,Ice Cream Shop,Italian Restaurant,Dessert Shop,Bakery,Pastry Shop
12,Al Uqşur,Historic Site,Hotel,Boat or Ferry,Middle Eastern Restaurant,Flea Market,History Museum,Fast Food Restaurant,Café,Hostel,Mediterranean Restaurant
13,As Suways,Boat or Ferry,Seafood Restaurant,Plaza,Bus Station,Canal,Restaurant,Italian Restaurant,Fried Chicken Joint,Fast Food Restaurant,Market
15,Aswān,Hotel,Historic Site,Perfume Shop,Egyptian Restaurant,Resort,Boat or Ferry,Café,Coffee Shop,Pier,Miscellaneous Shop
20,Janūb Sīnā’,Bus Station,Airport,Indian Restaurant,Rest Area,Bay,Track,Cupcake Shop,Waterfront,Bakery,Liquor Store
23,Qinā,Historic Site,Hotel,Middle Eastern Restaurant,Flea Market,History Museum,Boat or Ferry,Fast Food Restaurant,Café,Hostel,Mediterranean Restaurant


In [65]:
df_egypt_clust.loc[df_egypt_clust['Cluster Labels'] == 4, df_egypt_clust.columns[[0] + list(range(4, df_egypt_clust.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Maţrūḩ,Airport Terminal,American Restaurant,Hookah Bar,Cupcake Shop,Waterfront,Bakery,Liquor Store,Bus Station,Canal Lock,Organic Grocery
