<a href="https://colab.research.google.com/github/CherifArsanious/The-Battle-of-Neighbordoods/blob/master/Copy_of_The_Battle_of_Neighborhoods_in_Cairo_Egypt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Battle of Neighborhoods in Cairo, Egypt

# Overview of Project
## Introduction to Problem
I want to help entrepreneurs explore better locations for their startup projects (restaurants, cafés, pharmacies) across the neighborhoods of Cairo, Egypt, so that they can make smart and efficient decisions about where to open. This kind of information has always been difficult to find in Egypt. Entrepreneurs usually have to rely on very scarce data, and on their own observations and hunches, when selecting a location for a café or restaurant.
 
## The Location
Cairo is the capital of Egypt and a popular destination for new immigrants to Egypt who want to explore or start new projects. Cairo Governorate is the most populated of the governorates of Egypt. Its capital, the city of Cairo, is the national capital and is part of the Greater Cairo metropolitan area.
## Foursquare API
This project uses the Foursquare API as its primary data source. Foursquare has a database of millions of places, and its Places API supports location search, location sharing, and retrieving details about a business.
## Wikipedia and Web Scraping
Web scraping is used to get the neighborhoods of Cairo. The latitudes and longitudes of the neighborhoods were gathered through a mix of web scraping and manual search, because this kind of information is not always readily available as ready-made tables.
## Work Flow
Using Foursquare API credentials, nearby places are retrieved for each neighborhood. Because of HTTP request limits, the number of places per neighborhood is capped at 100 and the search radius is set to 500 meters.
## Libraries Used to Develop the Project
* Pandas: for creating and manipulating dataframes.
* Folium: Python visualization library used to draw the neighborhood clusters on an interactive Leaflet map.
* Scikit-learn: for k-means clustering.
* JSON: library to handle JSON files.
* XML: stores data in plain text format, separate from its presentation.
* Geocoder: to retrieve location data.
* Beautiful Soup and Requests: to scrape web pages and handle HTTP requests.
* Matplotlib: Python plotting module.



## Gathering Data about Cairo Neighborhoods in Egypt

### Importing required libraries


In [0]:
# Importing required libraries to start working on the project
import numpy as np
import pandas as pd
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#import clustering module from scipy library
from scipy.cluster.vq import kmeans, vq

### Importing Cairo Neighborhoods and cleaning Data

In [0]:
# wikipedia url of Cairo Neighborhoods
url = 'https://en.wikipedia.org/wiki/Cairo_Governorate'

In [0]:
# importing neighborhoods of cairo district and their population
cairo_data = pd.read_html(url)
cairo_df=cairo_data[2]
cairo_df

Unnamed: 0,Anglicized name,Native name,Arabic transliteration,Population(July 2017 Est.),Type
0,15 May City,قسم 15 مايو,Māyū,93879,Kism (fully urban)
1,Abdeen,قسم عابدين,'Ābidīn,40450,Kism (fully urban)
2,El Darb El Ahmar,قسم الدرب الأحمر,Al-Darb Al-Aḥmar[10],58677,Kism (fully urban)
3,Ain Shams,قسم عين شمس,'Ain Schams,616374,Kism (fully urban)
4,Amreya,قسم الاميريه,Al-Amīriīah,153046,Kism (fully urban)
5,Azbakeya,قسم الأزبكية,Al-Azbakiyah,19826,Kism (fully urban)
6,El Basatin,قسم البساتين,Al-Basātīn,497041,Kism (fully urban)
7,El Gamaliya,قسم الجمالية,Al-Jamāliyah,36485,Kism (fully urban)
8,El Khalifa,قسم الخليفة,Al-Khalīfah,105578,Kism (fully urban)
9,Maadi,قسم المعادي,Al-Ma'ādī,88869,Kism (fully urban)


In [0]:
# Cleaning cairo_df
# drop unnecessary columns
cairo_df = cairo_df.drop(['Native name','Arabic transliteration','Type'],axis=1)
cairo_df.head(1)

Unnamed: 0,Anglicized name,Population(July 2017 Est.)
0,15 May City,93879


In [0]:
# Changing columns names
cairo_df.columns = ['Neighborhood','Population']
cairo_df.head(1)

Unnamed: 0,Neighborhood,Population
0,15 May City,93879


In [0]:
cairo_df.sort_values(by=['Population'],ascending=False,inplace = True)

In [0]:
cairo_df.reset_index(drop=True,inplace=True)
cairo_df.head()

Unnamed: 0,Neighborhood,Population
0,El Marg,801222
1,Nasr City 1,636864
2,Ain Shams,616374
3,El Matareya,604428
4,Dar El Salam,527335


In [0]:
# Exporting the dataframe to Excel so the latitudes and longitudes can be looked up manually,
# as no website in Egypt readily offers that information as a dataset
cairo_df.to_excel(excel_writer='cairo_neighborhood_latitude_longitude.xlsx',sheet_name='sheet1',index=False)


### Getting the latitude and longitude coordinates of each neighborhood

In [0]:
#importing the excel sheet that holds the latitudes and longitudes after manually fetching them from different websites
lat_long=pd.read_excel('cairo_neighborhood_latitude_longitude.xlsx',sheet_name='sheet1')
lat_long

Unnamed: 0,Neighborhood,Latitude,Longitude
0,El Marg,30.1521,31.3357
1,Nasr City 1,30.0773,31.3195
2,Ain Shams,30.1279,31.33
3,El Matareya,31.2064,32.2216
4,Helwan,29.8403,31.2982
5,El Basatin,29.9793,31.3027
6,Hada'iq El Qobbah,30.086,31.2827
7,Old Cairo,30.0078,31.234
8,El Nozha,30.1074,31.3885
9,El Mokattam,30.0217,31.3033
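
Because these coordinates were collected by hand, a quick distance check can catch rows that landed on the wrong place. The sketch below is a minimal illustration, not part of the original notebook: it computes the haversine distance from central Cairo (30.0444, 31.2357, the map center used later in this notebook) for the ten rows shown above and flags anything beyond a cutoff. The 50 km cutoff and the helper name `haversine_km` are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

CAIRO_CENTER = (30.0444, 31.2357)  # map center used later in this notebook
MAX_KM = 50  # assumed cutoff; tune it to the extent of the governorate

# the ten rows shown in the table above
coords = {
    'El Marg': (30.1521, 31.3357),
    'Nasr City 1': (30.0773, 31.3195),
    'Ain Shams': (30.1279, 31.33),
    'El Matareya': (31.2064, 32.2216),
    'Helwan': (29.8403, 31.2982),
    'El Basatin': (29.9793, 31.3027),
    "Hada'iq El Qobbah": (30.086, 31.2827),
    'Old Cairo': (30.0078, 31.234),
    'El Nozha': (30.1074, 31.3885),
    'El Mokattam': (30.0217, 31.3033),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# keep the names whose distance from the center exceeds the cutoff
suspicious = [name for name, (lat, lon) in coords.items()
              if haversine_km(*CAIRO_CENTER, lat, lon) > MAX_KM]
print(suspicious)
```

With these rows, only El Matareya is flagged: (31.2064, 32.2216) lies well over 100 km from central Cairo, so that entry is worth re-checking by hand.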


In [0]:
cairo_neighborhoods = cairo_df.merge(lat_long, on='Neighborhood',how='inner',right_index=False)
cairo_neighborhoods

Unnamed: 0,Neighborhood,Population,Latitude,Longitude
0,El Marg,801222,30.1521,31.3357
1,Nasr City 1,636864,30.0773,31.3195
2,Ain Shams,616374,30.1279,31.33
3,El Matareya,604428,31.2064,32.2216
4,Helwan,522927,29.8403,31.2982
5,El Basatin,497041,29.9793,31.3027
6,Hada'iq El Qobbah,317092,30.086,31.2827
7,Old Cairo,251125,30.0078,31.234
8,El Nozha,231987,30.1074,31.3885
9,El Mokattam,224860,30.0217,31.3033
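
An inner merge silently drops neighborhoods that have no match in the coordinate sheet. For example, Dar El Salam appears in the `cairo_df.head()` output above but not in the merged table. A minimal sketch of how such rows could be listed, using the `indicator` option of `DataFrame.merge`; the small frames below are stand-ins built from values shown in this notebook, not the full data.

```python
import pandas as pd

# stand-in for cairo_df
population = pd.DataFrame({
    'Neighborhood': ['El Marg', 'Nasr City 1', 'Dar El Salam'],
    'Population': [801222, 636864, 527335],
})
# stand-in for lat_long
coordinates = pd.DataFrame({
    'Neighborhood': ['El Marg', 'Nasr City 1'],
    'Latitude': [30.1521, 30.0773],
    'Longitude': [31.3357, 31.3195],
})

# a left merge keeps every neighborhood; '_merge' records where each row came from
checked = population.merge(coordinates, on='Neighborhood', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(unmatched)
```

Names listed this way usually point to a spelling difference between the two sources or to a row that was never filled in.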


## Getting the top venues in each neighborhood of Cairo district
By getting the top venues in each neighborhood of Cairo, investors gain valuable information about where to open a café, restaurant, or any other kind of store.

In [0]:
# Importing my Foursquare credentials
CLIENT_ID = ''  # your Foursquare ID
CLIENT_SECRET = ''  # your Foursquare Secret
VERSION = '20200430' # Foursquare API version
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
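
To keep credentials out of the notebook, they can be read from environment variables instead of being typed into a cell. This is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions for illustration, not names that Foursquare prescribes.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the given mapping, or raise if either is missing."""
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

# demonstration with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
})
print('CLIENT_ID loaded:', bool(demo_id))
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_credentials()`.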

In [0]:
#Creating a function to explore the venues of each neighborhood using foursquare data
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
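
The function above assumes that every response contains `response -> groups[0] -> items` and that every venue has at least one category; a failed request or a venue without categories raises an exception. Below is a minimal sketch of a more forgiving parser, assuming the response layout that the function above already relies on. The helper name `extract_venues` and the `'Unknown'` fallback are illustrative choices, and the sample payloads are stand-ins.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload, tolerating gaps."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # fall back to a placeholder when the category list is missing or empty
        categories = venue.get('categories') or [{'name': 'Unknown'}]
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'),
                     categories[0]['name']))
    return rows

# stand-in payloads shaped like the response parsed above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Sushi Ya', 'location': {'lat': 30.078917, 'lng': 31.317740},
               'categories': [{'name': 'Sushi Restaurant'}]}},
    {'venue': {'name': 'Uncategorized place', 'location': {'lat': 30.0, 'lng': 31.0},
               'categories': []}},
]}]}}
print(extract_venues(sample))
print(extract_venues({'meta': {'code': 400}}))  # error-style payload without 'response'
```

A neighborhood whose request fails then contributes no rows, rather than stopping the whole loop.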

#### Run the function above on each neighborhood and create a new dataframe called *cairo_venues*.

In [0]:
import requests

In [0]:
LIMIT = 100
cairo_venues = getNearbyVenues(names=cairo_neighborhoods['Neighborhood'],
                                   latitudes=cairo_neighborhoods['Latitude'],
                                   longitudes=cairo_neighborhoods['Longitude']
                                  )


El Marg
Nasr City 1
Ain Shams
El Matareya
Helwan
El Basatin
Hada'iq El Qobbah
Old Cairo
El Nozha
El Mokattam
El Sharabiya
Zeitoun
Rod El Farag
El Sayeda Zeinab
Heliopolis
El Khalifa
15 May City
Maadi
El Shorouk
Shubra
El Zaher
New Cairo 3
El Darb El Ahmar
Bulaq
Bab El Sharia
Abdeen
El Gamaliya
Badr City
Azbakeya
El Muski
Zamalek
Qasr El Nil


In [0]:
# Checking Cairo venues of each neighborhood
print(cairo_venues.shape)
cairo_venues

(306, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,El Marg,30.1521,31.3357,Popeye Pastries,30.151228,31.338115,Middle Eastern Restaurant
1,El Marg,30.1521,31.3357,New El Marg Metro Station (محطة مترو المرج الج...,30.155651,31.337470,Metro Station
2,El Marg,30.1521,31.3357,Top Dawgs,30.155656,31.337468,Hot Dog Joint
3,Nasr City 1,30.0773,31.3195,Sushi Ya,30.078917,31.317740,Sushi Restaurant
4,Nasr City 1,30.0773,31.3195,Sonesta Lobby Bar,30.079098,31.317922,Hotel Bar
...,...,...,...,...,...,...,...
301,Qasr El Nil,30.0433,31.2326,Il Ritrovo,30.045709,31.228713,Whisky Bar
302,Qasr El Nil,30.0433,31.2326,Al Andalus Park (حديقة الأندلس),30.044485,31.227974,Park
303,Qasr El Nil,30.0433,31.2326,Animal Mummy Room,30.047403,31.233559,History Museum
304,Qasr El Nil,30.0433,31.2326,Tag W Dagg (دق وطق),30.045286,31.228265,Nightclub


Let's check how many venues were returned for each neighborhood

In [0]:
cairo_venues.groupby('Neighborhood')['Venue Category'].count()

Neighborhood
Abdeen               34
Ain Shams             3
Azbakeya              9
Bab El Sharia         5
Badr City             1
Bulaq                14
El Basatin            2
El Darb El Ahmar      5
El Gamaliya          24
El Marg               3
El Mokattam           1
El Muski             12
El Nozha              7
El Sayeda Zeinab      8
El Sharabiya          1
El Zaher              6
Hada'iq El Qobbah     6
Heliopolis           11
Maadi                20
Nasr City 1          14
New Cairo 3           3
Old Cairo            10
Qasr El Nil          52
Rod El Farag          7
Shubra                6
Zamalek              37
Zeitoun               5
Name: Venue Category, dtype: int64
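
The loop printed 32 neighborhood names, but only 27 appear in the counts above, because a neighborhood with no venues inside the search radius leaves no rows in `cairo_venues`. A short sketch of how the missing ones could be surfaced by reindexing the counts against the full list of queried names; the data below is a small stand-in using values from this notebook.

```python
import pandas as pd

# stand-ins: the queried names and the venue rows that came back
queried = ['El Marg', 'Helwan', 'Badr City', 'Ain Shams']
venues = pd.DataFrame({
    'Neighborhood': ['El Marg', 'El Marg', 'El Marg', 'Badr City',
                     'Ain Shams', 'Ain Shams', 'Ain Shams'],
    'Venue Category': ['Middle Eastern Restaurant', 'Metro Station', 'Hot Dog Joint', 'Farm',
                       'Coffee Shop', 'Mobile Phone Shop', 'Seafood Restaurant'],
})

counts = venues.groupby('Neighborhood')['Venue Category'].count()
# reindex against every queried name so that absent neighborhoods show up as 0
counts = counts.reindex(queried, fill_value=0)
empty = counts[counts == 0].index.tolist()
print(empty)
```

Neighborhoods with very few venues (Badr City and El Mokattam have one each in the output above) produce frequency profiles dominated by a single category, which is worth keeping in mind when reading the clusters later on.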

#### Let's find out how many unique categories can be curated from all the returned venues

In [0]:
print('There are {} unique categories.'.format(len(cairo_venues['Venue Category'].unique())))

There are 101 unique categories.


<a id='item3'></a>

## Analyze Each Neighborhood

In [0]:
# one hot encoding
cairo_onehot = pd.get_dummies(cairo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
cairo_onehot['Neighborhood'] = cairo_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [cairo_onehot.columns[-1]] + list(cairo_onehot.columns[:-1])
cairo_onehot = cairo_onehot[fixed_columns]

cairo_onehot.head()

Unnamed: 0,Yoga Studio,Airport Lounge,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Bakery,Bar,Basketball Court,Beer Garden,Bistro,Bookstore,Boutique,Buffet,Burger Joint,Cafeteria,Café,Carpet Store,Casino,Chinese Restaurant,Church,Coffee Shop,Concert Hall,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Food Stand,Food Truck,French Restaurant,...,Mobile Phone Shop,Mosque,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Music Venue,Neighborhood,Nightclub,Other Nightlife,Park,Pastry Shop,Performing Arts Venue,Pharmacy,Pie Shop,Pizza Place,Platform,Plaza,Pool,Restaurant,River,Roof Deck,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shopping Mall,Soccer Field,Souvenir Shop,Spa,Supermarket,Sushi Restaurant,Syrian Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Toll Plaza,Track,Waterfront,Whisky Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,El Marg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,El Marg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,El Marg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,Nasr City 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,Nasr City 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [0]:
cairo_onehot.shape

(306, 101)
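
The dataframe has 101 columns, the same as the number of unique categories, and `Neighborhood` sits in the middle of the column list rather than at the front. This suggests that one of the returned venue categories is itself named `Neighborhood`, so the assignment overwrote that dummy column instead of adding a new one. Below is a hedged sketch of one way to avoid the collision, shown on a tiny stand-in frame; the replacement name `Neighborhood (category)` is an arbitrary choice.

```python
import pandas as pd

# stand-in for cairo_venues, including a venue whose category is literally 'Neighborhood'
venues = pd.DataFrame({
    'Neighborhood': ['El Marg', 'El Marg', 'Zamalek'],
    'Venue Category': ['Metro Station', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
# rename the clashing category before adding the label column
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
# insert() places the label column first and raises if the name already exists
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot.columns.tolist())
```

Applying a change like this to the notebook would add one column to the shapes reported here and could shift the downstream frequencies and clusters, so the outputs would need to be regenerated.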

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [0]:
cairo_grouped = cairo_onehot.groupby('Neighborhood').mean().reset_index()
cairo_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport Lounge,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Bakery,Bar,Basketball Court,Beer Garden,Bistro,Bookstore,Boutique,Buffet,Burger Joint,Cafeteria,Café,Carpet Store,Casino,Chinese Restaurant,Church,Coffee Shop,Concert Hall,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Food Stand,Food Truck,...,Middle Eastern Restaurant,Mobile Phone Shop,Mosque,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Music Venue,Nightclub,Other Nightlife,Park,Pastry Shop,Performing Arts Venue,Pharmacy,Pie Shop,Pizza Place,Platform,Plaza,Pool,Restaurant,River,Roof Deck,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shopping Mall,Soccer Field,Souvenir Shop,Spa,Supermarket,Sushi Restaurant,Syrian Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Toll Plaza,Track,Waterfront,Whisky Bar
0,Abdeen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.323529,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.088235,0.0,0.0,0.029412,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
1,Ain Shams,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Azbakeya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0
3,Bab El Sharia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Badr City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bulaq,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.142857,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,El Basatin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,El Darb El Ahmar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,El Gamaliya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,El Marg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [0]:
cairo_grouped.shape

(27, 101)
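
Because each venue row is one-hot encoded, the mean of a category column within a neighborhood equals the share of that neighborhood's venues that fall in that category. A small worked example with made-up rows (the neighborhoods `A` and `B` and their venues are illustrative only):

```python
import pandas as pd

# three venues in neighborhood A (two cafés, one theater) and one venue in B (a farm)
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Café':         [1, 1, 0, 0],
    'Farm':         [0, 0, 0, 1],
    'Theater':      [0, 0, 1, 0],
})

# the mean of each 0/1 column per neighborhood is the category's relative frequency
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of frequencies sums to 1, so a neighborhood with a single venue, such as Badr City above, gets a frequency of 1.0 for that one category.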

#### Let's print each neighborhood along with the top 5 most common venues
By inspecting this output, investors can see at a glance the top 5 venue categories in any Cairo neighborhood, which gives them a first, data-based indication of where best to open their startup store.

In [0]:
num_top_venues = 5

for hood in cairo_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = cairo_grouped[cairo_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abdeen----
                 venue  freq
0                 Café  0.32
1   Falafel Restaurant  0.09
2  Egyptian Restaurant  0.09
3     Kebab Restaurant  0.06
4              Theater  0.06


----Ain Shams----
                venue  freq
0         Coffee Shop  0.33
1   Mobile Phone Shop  0.33
2  Seafood Restaurant  0.33
3         Yoga Studio  0.00
4       Moving Target  0.00


----Azbakeya----
                   venue  freq
0          Movie Theater  0.22
1                Theater  0.22
2                   Café  0.22
3  Performing Arts Venue  0.11
4     Falafel Restaurant  0.11


----Bab El Sharia----
              venue  freq
0             Plaza   0.2
1  Kebab Restaurant   0.2
2      Dessert Shop   0.2
3              Café   0.2
4          Pie Shop   0.2


----Badr City----
                   venue  freq
0                   Farm   1.0
1            Yoga Studio   0.0
2          Movie Theater   0.0
3               Pharmacy   0.0
4  Performing Arts Venue   0.0


----Bulaq----
                

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [0]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = cairo_grouped['Neighborhood']

for ind in np.arange(cairo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cairo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abdeen,Café,Falafel Restaurant,Egyptian Restaurant,Theater,Kebab Restaurant,Coffee Shop,Seafood Restaurant,Hotel Bar,Beer Garden,Herbs & Spices Store
1,Ain Shams,Mobile Phone Shop,Coffee Shop,Seafood Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
2,Azbakeya,Theater,Café,Movie Theater,Convenience Store,Falafel Restaurant,Performing Arts Venue,Whisky Bar,Farm,Creperie,Cruise
3,Bab El Sharia,Pie Shop,Plaza,Kebab Restaurant,Café,Dessert Shop,Farm,Convenience Store,Creperie,Cruise,Cupcake Shop
4,Badr City,Farm,Whisky Bar,Fast Food Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
5,Bulaq,Nightclub,Hotel,Restaurant,Lounge,Coffee Shop,Asian Restaurant,Moving Target,Pool,Lebanese Restaurant,Other Nightlife
6,El Basatin,Soccer Field,Carpet Store,Coffee Shop,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
7,El Darb El Ahmar,Historic Site,Park,Café,Concert Hall,Whisky Bar,Farm,Creperie,Cruise,Cupcake Shop,Dairy Store
8,El Gamaliya,Café,Historic Site,Coffee Shop,Souvenir Shop,Hotel,Kebab Restaurant,Market,Mosque,Egyptian Restaurant,Dessert Shop
9,El Marg,Hot Dog Joint,Metro Station,Middle Eastern Restaurant,Historic Site,Herbs & Spices Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop
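
Several rows above are padded with categories whose frequency is zero. Badr City has a single venue (a farm), yet its row still lists ten categories, because sorting a row of mostly zeros still returns ten column names. Below is a minimal sketch of a variant that keeps only categories that actually occur; the function name `top_nonzero_venues` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0, most frequent first."""
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return freqs.index[:num_top_venues].tolist()

# stand-in for one row of cairo_grouped
row = pd.Series({'Neighborhood': 'Badr City', 'Café': 0.0, 'Farm': 1.0, 'Whisky Bar': 0.0})
print(top_nonzero_venues(row, 10))
```

Rows shortened this way would need padding (for example with empty strings) before being written into a fixed-width table like the one above.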


<a id='item4'></a>

In [0]:
neighborhoods_venues_sorted.to_excel(excel_writer='top_10_venues.xlsx',sheet_name='sheet1',index=False)

## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [0]:
# set number of clusters
kclusters = 5

cairo_grouped_clustering = cairo_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cairo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 1, 3, 0, 0, 1, 1, 0], dtype=int32)
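
The notebook fixes the number of clusters at 5 without comparing alternatives. One common way to compare candidate values of *k* is the silhouette score. The sketch below is an illustration on synthetic data, not a result for the Cairo data: it builds three well-separated blobs and picks the *k* with the highest score. On the notebook's data, `X` would be `cairo_grouped_clustering`; the candidate range 2 to 6 is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in: three tight, well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([center + rng.normal(scale=0.3, size=(20, 2)) for center in centers])

# fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

With only 27 neighborhoods, several of which have a single venue, any such score should be read as a rough guide rather than a firm answer.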

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [0]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [0]:
cairo_merged = cairo_neighborhoods

# merge cairo_neighborhoods with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
cairo_merged = cairo_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood',how='inner')

cairo_merged

Unnamed: 0,Neighborhood,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,El Marg,801222,30.1521,31.3357,0,Hot Dog Joint,Metro Station,Middle Eastern Restaurant,Historic Site,Herbs & Spices Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop
1,Nasr City 1,636864,30.0773,31.3195,0,Gym / Fitness Center,Hotel Bar,Sushi Restaurant,Dairy Store,Dessert Shop,Restaurant,Metro Station,Lebanese Restaurant,Bakery,Food Truck
2,Ain Shams,616374,30.1279,31.33,0,Mobile Phone Shop,Coffee Shop,Seafood Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
5,El Basatin,497041,29.9793,31.3027,0,Soccer Field,Carpet Store,Coffee Shop,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
6,Hada'iq El Qobbah,317092,30.086,31.2827,0,Creperie,Fried Chicken Joint,Supermarket,Sandwich Place,Shopping Mall,Whisky Bar,Farm,Concert Hall,Convenience Store,Cruise
7,Old Cairo,251125,30.0078,31.234,0,Historic Site,Church,Arts & Crafts Store,Metro Station,Middle Eastern Restaurant,Restaurant,History Museum,Art Gallery,Arts & Entertainment,Cupcake Shop
8,El Nozha,231987,30.1074,31.3885,0,Airport Lounge,Middle Eastern Restaurant,Toll Plaza,Airport Terminal,Café,Whisky Bar,Cruise,Cupcake Shop,Dairy Store,Dessert Shop
9,El Mokattam,224860,30.0217,31.3033,4,Mobile Phone Shop,Fast Food Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant,Falafel Restaurant
10,El Sharabiya,187806,30.0816,31.2598,2,Platform,Whisky Bar,Farmers Market,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
11,Zeitoun,174738,30.1074,31.3157,1,Café,Coffee Shop,Fried Chicken Joint,Fast Food Restaurant,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant


Finally, let's visualize the resulting clusters

In [0]:
# Cairo latitude and longitude
Latitude = 30.0444
Longitude = 31.2357
# create map
map_clusters = folium.Map(location=[Latitude, Longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cairo_merged['Latitude'], cairo_merged['Longitude'], cairo_merged['Neighborhood'], cairo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## Examine Clusters

#### Cluster 1

In [0]:
cairo_merged.loc[cairo_merged['Cluster Labels'] == 0, cairo_merged.columns[[1] + list(range(5, cairo_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,801222,Hot Dog Joint,Metro Station,Middle Eastern Restaurant,Historic Site,Herbs & Spices Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop
1,636864,Gym / Fitness Center,Hotel Bar,Sushi Restaurant,Dairy Store,Dessert Shop,Restaurant,Metro Station,Lebanese Restaurant,Bakery,Food Truck
2,616374,Mobile Phone Shop,Coffee Shop,Seafood Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
5,497041,Soccer Field,Carpet Store,Coffee Shop,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
6,317092,Creperie,Fried Chicken Joint,Supermarket,Sandwich Place,Shopping Mall,Whisky Bar,Farm,Concert Hall,Convenience Store,Cruise
7,251125,Historic Site,Church,Arts & Crafts Store,Metro Station,Middle Eastern Restaurant,Restaurant,History Museum,Art Gallery,Arts & Entertainment,Cupcake Shop
8,231987,Airport Lounge,Middle Eastern Restaurant,Toll Plaza,Airport Terminal,Café,Whisky Bar,Cruise,Cupcake Shop,Dairy Store,Dessert Shop
12,146102,Shopping Mall,Plaza,Cafeteria,Burger Joint,Egyptian Restaurant,Men's Store,Middle Eastern Restaurant,Farmers Market,Creperie,Cruise
13,136722,Coffee Shop,Farmers Market,Historic Site,Kebab Restaurant,Plaza,Asian Restaurant,Egyptian Restaurant,Falafel Restaurant,Convenience Store,Creperie
14,134549,Gym / Fitness Center,Basketball Court,Italian Restaurant,Plaza,Pizza Place,German Restaurant,Falafel Restaurant,Syrian Restaurant,Hotel,Track


#### Cluster 2

In [0]:
cairo_merged.loc[cairo_merged['Cluster Labels'] == 1, cairo_merged.columns[[1] + list(range(5, cairo_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,174738,Café,Coffee Shop,Fried Chicken Joint,Fast Food Restaurant,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant
19,76942,Pharmacy,Supermarket,Creperie,Café,Restaurant,Metro Station,German Restaurant,Egyptian Restaurant,Convenience Store,Historic Site
20,72101,Café,History Museum,Plaza,Juice Bar,Metro Station,Farmers Market,Creperie,Cruise,Cupcake Shop,Dairy Store
22,58677,Historic Site,Park,Café,Concert Hall,Whisky Bar,Farm,Creperie,Cruise,Cupcake Shop,Dairy Store
24,46823,Pie Shop,Plaza,Kebab Restaurant,Café,Dessert Shop,Farm,Convenience Store,Creperie,Cruise,Cupcake Shop
25,40450,Café,Falafel Restaurant,Egyptian Restaurant,Theater,Kebab Restaurant,Coffee Shop,Seafood Restaurant,Hotel Bar,Beer Garden,Herbs & Spices Store
26,36485,Café,Historic Site,Coffee Shop,Souvenir Shop,Hotel,Kebab Restaurant,Market,Mosque,Egyptian Restaurant,Dessert Shop
28,19826,Theater,Café,Movie Theater,Convenience Store,Falafel Restaurant,Performing Arts Venue,Whisky Bar,Farm,Creperie,Cruise
29,16715,Historic Site,Café,Music Venue,Plaza,Arts & Entertainment,Market,Mosque,Whisky Bar,Farm,Cruise


#### Cluster 3

In [0]:
cairo_merged.loc[cairo_merged['Cluster Labels'] == 2, cairo_merged.columns[[1] + list(range(5, cairo_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,187806,Platform,Whisky Bar,Farmers Market,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant


#### Cluster 4

In [0]:
cairo_merged.loc[cairo_merged['Cluster Labels'] == 3, cairo_merged.columns[[1] + list(range(5, cairo_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,31398,Farm,Whisky Bar,Fast Food Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant


#### Cluster 5

In [0]:
cairo_merged.loc[cairo_merged['Cluster Labels'] == 4, cairo_merged.columns[[1] + list(range(5, cairo_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,224860,Mobile Phone Shop,Fast Food Restaurant,Convenience Store,Creperie,Cruise,Cupcake Shop,Dairy Store,Dessert Shop,Egyptian Restaurant,Falafel Restaurant
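
To compare the clusters at a glance, the merged table can be summarized by label. Below is a minimal sketch that uses a handful of rows copied from the merged table above as a stand-in for `cairo_merged`; the output column names `neighborhoods` and `total_population` are illustrative.

```python
import pandas as pd

# a few rows copied from the merged table as a stand-in for cairo_merged
sample = pd.DataFrame({
    'Neighborhood': ['El Marg', 'Nasr City 1', 'Ain Shams',
                     'El Mokattam', 'El Sharabiya', 'Zeitoun'],
    'Population': [801222, 636864, 616374, 224860, 187806, 174738],
    'Cluster Labels': [0, 0, 0, 4, 2, 1],
})

# count neighborhoods and add up population within each cluster label
summary = sample.groupby('Cluster Labels').agg(
    neighborhoods=('Neighborhood', 'count'),
    total_population=('Population', 'sum'),
)
print(summary)
```

In the results above, the three single-neighborhood clusters (labels 2, 3 and 4) each hold a neighborhood for which only one venue was returned, so they may reflect sparse data more than a distinct neighborhood profile.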
