# Segmenting and Clustering Jordan

## A. Introduction
### A.1. Background
**Amman** is the capital of Jordan, with a population of 2,165,925 in 2020. Because it is the capital, most job opportunities are there, and people from other Jordanian cities come to Amman for work.

**Irbid** is the second most populous city of Jordan, with a population of 307,480 in 2020.
### A.2. Problem Description:
**The scenario of this Capstone project**.

Say you live in Irbid. You love your neighborhood, mainly because of its great amenities and other venues, such as gourmet fast food joints, pharmacies, parks, graduate schools and so on. Now say you receive a job offer from a great company in Amman. Given the long distance from your current home, you must move if you decide to accept the offer.

It would be great to find the most convenient neighborhood in Amman, both in terms of the **shortest distance** from the company headquarters and in terms of the **similarity of the amenities** to those of your home neighborhood.
### A.3 Objective
This project aims to analyze the neighborhoods of Amman and Irbid and group them into clusters of similar neighborhoods.
By analyzing these clusters we can gather meaningful information that will be used to **find the neighborhoods that are similar to your current neighborhood**.

## B. Data Description:
We are going to use the data sources below to achieve the above objective.

**List Of All ZIP/POSTAL Codes In JORDAN**: The following page was scraped to pull out all the necessary information: https://gpsarab.com/shop11/en/content/12-zip-code-in-jordan

All of the information required for our project is listed on the above page: city name, neighborhood name, zip code of each neighborhood, and geographical coordinates (latitude, longitude) of each neighborhood.

The table of postal codes obtained from the page was transformed into a pandas DataFrame for further analysis.

**Foursquare API:** to collect information about the venues in the neighborhoods of Amman and Irbid

## C. Methodology

### C.1. Scrape the web page and gather data into a Pandas DF

To start, we used the **BeautifulSoup** package to extract the table from the web page and load it into a pandas DF.

In [1]:
# Import the needed packages\libraries

import pandas as pd
import numpy as np 
import requests
from bs4 import BeautifulSoup as bs

In [2]:
# Request the page, parse it with BeautifulSoup, then load the first table into a pandas DF.
from io import StringIO

url='https://gpsarab.com/shop11/en/content/12-zip-code-in-jordan'
response= requests.get(url)
response.raise_for_status()  # stop early if the page could not be fetched
soup= bs(response.text,'html.parser')
table= soup.find('table')
df= pd.read_html(StringIO(str(table)))[0]  # read_html returns a list of DataFrames
df.head()

Unnamed: 0,City,Location,Zip Code,Latitude,Longitude
0,Amman | عمان,Abdoun Al Janobi (S) | عبدون الجنوبي,11183,31.942011,35.881741
1,Amman | عمان,Abdoun Alshamali (N) | عبدون الشمالي,11183,31.948469,35.893509
2,Amman | عمان,Abu Alanda | ابو علندا,11592,31.905396,35.960555
3,Amman | عمان,Abu Alya | أبو عليا,11946,32.001043,35.970014
4,Amman | عمان,Abu-Nsair | أبو نصير,11937,32.05286,35.87648
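To make the scraping step more concrete, here is a minimal sketch of what pulling rows out of an HTML table involves, using only Python's standard `html.parser`. The `TableRows` class and the HTML snippet are illustrative assumptions and not the actual markup of the scraped page; the notebook itself relies on BeautifulSoup and `pd.read_html` for this.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hand-written sample markup (an assumption, not the real page source)
html = """
<table>
  <tr><th>City</th><th>Location</th><th>Zip Code</th></tr>
  <tr><td>Amman | عمان</td><td>Abu Alanda | ابو علندا</td><td>11592</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html)
header, *records = parser.rows
print(header)
print(records)
```

The first parsed row serves as the header and the remaining rows as records, which is roughly the shape `pd.read_html` turns into a DataFrame.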


### C.2. Preprocessing
The data is almost clean, but we still need to:
 - Split the English and Arabic names into individual columns.
 - Rename the columns to more convenient names.
 - Remove the extra spaces at the beginning and end of the values using the strip function.

In [3]:
jordan_df= df.copy()

# rename Location column to Neighborhood
jordan_df.rename(columns={'Location':'Neighborhood','Zip Code':'ZipCode'},inplace=True)

# Split the English and Arabic values
jordan_df['City']=jordan_df['City'].str.split('|')
jordan_df['Neighborhood']=jordan_df['Neighborhood'].str.split('|')

In [4]:
# convert the lists of split values into individual columns
jordan_df['City_Arabic']= jordan_df.apply(lambda row: row['City'][1], axis=1)
jordan_df['City']= jordan_df.apply(lambda row: row['City'][0],axis=1)

jordan_df['Neighborhood_Arabic']= jordan_df.apply(lambda row: row['Neighborhood'][-1], axis=1)
jordan_df['Neighborhood']= jordan_df.apply(lambda row: row['Neighborhood'][0], axis=1)

In [5]:
# move the new columns: 
temp=jordan_df.pop('City_Arabic')
jordan_df.insert(1,'City_Arabic',temp)

temp= jordan_df.pop('Neighborhood_Arabic')
jordan_df.insert(3,'Neighborhood_Arabic',temp)

In [6]:
# remove leading and trailing spaces from the values
jordan_df['City'] = jordan_df['City'].str.strip()
jordan_df['City_Arabic']= jordan_df['City_Arabic'].str.strip()

jordan_df['Neighborhood']= jordan_df['Neighborhood'].str.strip()
jordan_df['Neighborhood_Arabic']= jordan_df['Neighborhood_Arabic'].str.strip()
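The split and strip steps above can also be expressed with pandas' vectorized string accessor. The sketch below is one possible alternative, not the notebook's own code; `split_bilingual` is a hypothetical helper name and the two sample rows are copied from the scraped table.

```python
import pandas as pd

# Two sample rows in the same "English | Arabic" format as the scraped table
sample = pd.DataFrame({
    'City': ['Amman | عمان', 'Irbid | اربد'],
    'Neighborhood': ['Abu Alanda | ابو علندا', 'Aidun | ايدون'],
})


def split_bilingual(series):
    """Return (english, arabic) Series from 'English | Arabic' strings."""
    parts = series.str.split('|')
    english = parts.str[0].str.strip()   # first piece, whitespace removed
    arabic = parts.str[-1].str.strip()   # last piece, whitespace removed
    return english, arabic


sample['City'], sample['City_Arabic'] = split_bilingual(sample['City'])
sample['Neighborhood'], sample['Neighborhood_Arabic'] = split_bilingual(sample['Neighborhood'])
print(sample)
```

Taking the first and last pieces mirrors the indexing used in the cells above, so a value with more than one `|` would still yield a single English and a single Arabic name.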

In [7]:
jordan_df.head()

Unnamed: 0,City,City_Arabic,Neighborhood,Neighborhood_Arabic,ZipCode,Latitude,Longitude
0,Amman,عمان,Abdoun Al Janobi (S),عبدون الجنوبي,11183,31.942011,35.881741
1,Amman,عمان,Abdoun Alshamali (N),عبدون الشمالي,11183,31.948469,35.893509
2,Amman,عمان,Abu Alanda,ابو علندا,11592,31.905396,35.960555
3,Amman,عمان,Abu Alya,أبو عليا,11946,32.001043,35.970014
4,Amman,عمان,Abu-Nsair,أبو نصير,11937,32.05286,35.87648


In [8]:
# Check how many neighborhoods each city has
jordan_df.groupby('City')['ZipCode'].count().sort_values(ascending=False)

City
Amman        178
Irbid        113
Karak         53
Al Mafraq     32
Al-Balqa'     26
Ma'an         22
Az Zarqa      16
Tafileh       11
Aqaba          6
Madaba         2
Name: ZipCode, dtype: int64

#### C.2.1 Create DataFrames for each city (Amman and Irbid)
As our project focuses on only two cities, we pull the data of each city into its own DF.

In [9]:
irbid_df= jordan_df[jordan_df['City']=='Irbid'] # Home city 
amman_df= jordan_df[jordan_df['City']=='Amman'] # destination city

### C.3. Analysis

#### C.3.1) Let's start by creating the functions that will help us in the analysis
Below are the five functions used in the analysis.

**Analysis functions**

**1st** This function will return the **geographic coordinates**

In [10]:
from geopy.geocoders import Nominatim

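def get_coordinates(city,country):
    # Geocode "<city>,<country>" with Nominatim and return [latitude, longitude]
    address='{},{}'.format(city,country)
    geolocator= Nominatim(user_agent='foursquare_agent')
    location=geolocator.geocode(address)

    # geocode returns None when the address cannot be resolved
    if location is None:
        raise ValueError('Could not geocode {}'.format(address))

    return [location.latitude,location.longitude]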

**2nd** This function will return a **Map** for given city and its neighborhoods

In [11]:
import folium

def get_neighborhoods_map(location,names,latitudes,longitudes):
    map_= folium.Map(location=location,zoom_start=10,width=700)
    
    for neighborhood, lat, lng in zip(names,latitudes,longitudes):
        label= str(neighborhood)
        label= folium.Popup(label, parse_html=True)
        
        folium.CircleMarker(
            [lat,lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7
        ).add_to(map_)
        
    return map_

**3rd** This function will use the **Foursquare API** to explore the venues of the given neighborhoods

In [12]:
# Foursquare credentials
CLIENT_ID ='~'
CLIENT_SECRET= '~'
VERSION= '20180605'
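Rather than typing the credentials into the notebook, they could be read from environment variables. This is only a sketch: `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical choices, not something the Foursquare API prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from a mapping (the process environment by default)."""
    # Hypothetical variable names; use whatever names you exported in your shell
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret


# Demonstration with a plain dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, the returned pair would be assigned to `CLIENT_ID` and `CLIENT_SECRET`.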

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500,LIMIT=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #create the Foursquare API request
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # GET request
        results= requests.get(url).json()
        venues= results['response']['groups'][0]['items']

        # return only relevant info. for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) 
            for v in venues])

    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                            'Neighborhood Latitude',
                            'Neighborhood Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category'
                            ]
    return(nearby_venues)
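The function above indexes straight into the JSON response, so a neighborhood with an unexpected response shape or a venue without a category would raise an error. A more defensive way to flatten one response might look like the sketch below. It assumes the same response layout as the code above; `extract_venue_rows`, the `'Uncategorized'` label, and the second venue in the fake response are made up for illustration.

```python
def extract_venue_rows(name, lat, lng, results):
    """Flatten one explore response into rows, tolerating missing pieces."""
    groups = results.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Placeholder label (an assumption) for venues that carry no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Fake response in the layout used above; the second venue is invented
fake_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Galaxy Gym',
                   'location': {'lat': 32.523849, 'lng': 35.855335},
                   'categories': [{'name': 'Gym / Fitness Center'}]}},
        {'venue': {'name': 'Unnamed kiosk',
                   'location': {'lat': 32.5240, 'lng': 35.8540},
                   'categories': []}},
    ]}]}
}

rows = extract_venue_rows('Aidun', 32.523781, 35.85358, fake_response)
empty = extract_venue_rows('Aidun', 32.523781, 35.85358, {})
print(rows)
```

An empty or malformed response simply yields no rows, so the loop over neighborhoods can continue.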

**4th** This function will create a new DF that contains the mean frequency of each venue category, grouped by neighborhood

In [14]:
def get_group_df(venues):    
    # create a one-hot encoded pandas DF
    onehot_df= pd.get_dummies(venues['Venue Category'])
    
    # Some venues have a category called Neighborhood, so rename it to avoid a clash
    onehot_df.rename(columns={'Neighborhood':'Category_Neighborhood'},inplace=True)

    # add the Neighborhood column at the beginning of our df
    onehot_df.insert(0,'Neighborhood',venues['Neighborhood'])
    
    # Group rows by Neighborhood and take the mean frequency of each category
    grouped_df=onehot_df.groupby('Neighborhood').mean().reset_index()

    return grouped_df
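A small worked example may help show why the mean of one-hot columns is a frequency. The sketch below uses a hand-made sample of seven venues in two neighborhoods; the rows are illustrative and not the full Foursquare result.

```python
import pandas as pd

# Hand-made sample: 3 venues in Aidun, 4 in Ajloun Community College
venues = pd.DataFrame({
    'Neighborhood': ['Aidun'] * 3 + ['Ajloun Community College'] * 4,
    'Venue Category': ['Gym / Fitness Center', 'Pastry Shop', 'Market',
                       'Campground', 'Coffee Shop', 'Coffee Shop', 'Soccer Field'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Averaging the 0/1 indicators per neighborhood gives the share of each category
freq = onehot.groupby('Neighborhood').mean()
print(freq.round(2))
```

Each of Aidun's three categories gets 1/3, while Coffee Shop accounts for 2 of the 4 venues in Ajloun Community College and therefore gets 0.5. Every row sums to 1.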

**5th** This function will get the top venue categories of each neighborhood based on their frequency

In [15]:
def get_top_categories_for_each_neighborhood(venues,num_top_venues,grouped_df):
   
    # Helper that sorts a row's categories by frequency and returns the top ones
    def return_most_common_venues(row,num_top_venues): 
        row_categories= row.iloc[1:] # skip the first entry (the neighborhood name)
        row_categories_sorted= row_categories.sort_values(ascending=False)
        return row_categories_sorted.index.values[0:num_top_venues]
        
    
    indicators= ['st','nd','rd']
    columns= ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1,indicators[ind]))
        except IndexError: 
            columns.append('{}th Most Common Venue'.format(ind+1))

    neighborhoods_venues_sorted=pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood']=grouped_df['Neighborhood']


    for ind in np.arange(grouped_df.shape[0]): 
        neighborhoods_venues_sorted.iloc[ind,1:]=return_most_common_venues(grouped_df.iloc[ind,:],num_top_venues)
    
    return neighborhoods_venues_sorted
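The column labels depend on English ordinal suffixes. As a side note, a small helper can produce the suffix for any positive integer, including cases such as 11th or 22nd. The `ordinal` function below is a hypothetical helper written for this illustration and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


labels = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(labels)
```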

#### C.3.2) Analyze Irbid data

**First**, let's analyze the home city, Irbid.

The cell below shows the number of Irbid's neighborhoods and a sample of them with their geographic coordinates.

In [16]:
print('Irbid city has {} neighborhoods \n'.format(len(irbid_df['Neighborhood'].unique())))
irbid_df.head()

Irbid city has 113 neighborhoods 



Unnamed: 0,City,City_Arabic,Neighborhood,Neighborhood_Arabic,ZipCode,Latitude,Longitude
258,Irbid,اربد,Aidun,ايدون,21166,32.523781,35.85358
259,Irbid,اربد,Ain Janna,عين جنا,26813,32.340159,35.758824
260,Irbid,اربد,Ajloun Community College,كلية عجلون,26816,32.317935,35.75074
261,Irbid,اربد,Ajlun Al Markazi,عجلون المركزي,26810,32.5,35.8
262,Irbid,اربد,Al Ashrafiyyah,الأشرفية,21753,32.5,35.8


In [17]:
irbid_location=get_coordinates('Irbid','Jordan')
print(irbid_location)

[32.363397, 35.5610167]


Create a Map for IRBID city with its Neighborhoods

In [18]:
irbid_map= get_neighborhoods_map(irbid_location,irbid_df['Neighborhood'],irbid_df['Latitude'],irbid_df['Longitude'])
irbid_map

The DataFrame below is a sample of Irbid's neighborhoods and their venues

In [23]:
irbid_venues= getNearbyVenues(irbid_df['Neighborhood'],irbid_df['Latitude'],irbid_df['Longitude'])

In [24]:
print('Irbid city has {} unique venue categories spread across {} neighborhoods. \n'.format(len(irbid_venues['Venue Category'].unique()),
                                                                                          len(irbid_venues['Neighborhood'].unique())
                                                                                          ))
irbid_venues.head()

Irbid city has 29 unique venue categories spread across 14 neighborhoods. 



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Aidun,32.523781,35.85358,Galaxy Gym,32.523849,35.855335,Gym / Fitness Center
1,Aidun,32.523781,35.85358,حلويات القاضي,32.524639,35.853708,Pastry Shop
2,Aidun,32.523781,35.85358,ملحمة المصري,32.526812,35.853753,Market
3,Ajloun Community College,32.317935,35.75074,Ajloun Baptist Center,32.32101,35.751678,Campground
4,Ajloun Community College,32.317935,35.75074,Ajloun Castel Cup,32.320178,35.751278,Coffee Shop


Below is the number of venues in each of the most frequent venue categories in Irbid

In [25]:
irbid_venues[['Neighborhood','Venue Category']].groupby('Venue Category').count().sort_values(by='Neighborhood',ascending=False).head(7)

Unnamed: 0_level_0,Neighborhood
Venue Category,Unnamed: 1_level_1
Café,9
Coffee Shop,4
Middle Eastern Restaurant,4
Fast Food Restaurant,4
Asian Restaurant,3
Farm,2
Soccer Field,2


The DataFrame below is a sample of the mean frequency of each venue category per neighborhood

In [26]:
irbid_grouped= get_group_df(irbid_venues)
irbid_grouped.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Campground,Coffee Shop,Creperie,...,Middle Eastern Restaurant,Pastry Shop,Pizza Place,Pool,Restaurant,Road,Sandwich Place,Seafood Restaurant,Soccer Field,Theme Park Ride / Attraction
0,Aidun,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ajloun Community College,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
2,Al Bariha,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Hai Al Shamali,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
4,Al Husun,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5


The DataFrame below lists the most common venue categories of each neighborhood, sorted by frequency

In [28]:
irbid_neighborhoods_venues_sorted= get_top_categories_for_each_neighborhood(irbid_venues,10,irbid_grouped)
irbid_neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aidun,Pastry Shop,Market,Gym / Fitness Center,Theme Park Ride / Attraction,Farm,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café
1,Ajloun Community College,Coffee Shop,Campground,Soccer Field,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café
2,Al Bariha,Bus Station,Café,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Campground,Coffee Shop,Creperie
3,Al Hai Al Shamali,Sandwich Place,Campground,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Coffee Shop
4,Al Husun,Theme Park Ride / Attraction,Grocery Store,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Campground,Coffee Shop
5,Al Mazar,Food Court,Middle Eastern Restaurant,Café,Coffee Shop,Fast Food Restaurant,Asian Restaurant,Pool,Diner,Bagel Shop,Burger Joint
6,Hakma,Farm,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Campground,Coffee Shop
7,Hawwara,Seafood Restaurant,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Campground,Coffee Shop
8,Irbid Camp,Seafood Restaurant,Theme Park Ride / Attraction,Fast Food Restaurant,Bagel Shop,Burger Joint,Burrito Place,Bus Station,Café,Campground,Coffee Shop
9,Irbid Central,Asian Restaurant,Restaurant,Café,Middle Eastern Restaurant,Grocery Store,Fast Food Restaurant,Donut Shop,Bagel Shop,Burger Joint,Burrito Place


#### C.3.3) Analyze Amman Data

In [29]:
print('Amman city has {} neighborhoods \n'.format(len(amman_df['Neighborhood'].unique())))
amman_df.head()

Amman city has 177 neighborhoods 



Unnamed: 0,City,City_Arabic,Neighborhood,Neighborhood_Arabic,ZipCode,Latitude,Longitude
0,Amman,عمان,Abdoun Al Janobi (S),عبدون الجنوبي,11183,31.942011,35.881741
1,Amman,عمان,Abdoun Alshamali (N),عبدون الشمالي,11183,31.948469,35.893509
2,Amman,عمان,Abu Alanda,ابو علندا,11592,31.905396,35.960555
3,Amman,عمان,Abu Alya,أبو عليا,11946,32.001043,35.970014
4,Amman,عمان,Abu-Nsair,أبو نصير,11937,32.05286,35.87648


Get the geographic coordinates

In [30]:
amman_location=get_coordinates('Amman','Jordan')
print(amman_location)

[31.9515694, 35.9239625]


Create a Map for AMMAN city with its Neighborhoods

In [31]:
amman_map= get_neighborhoods_map(amman_location,amman_df['Neighborhood'],amman_df['Latitude'],amman_df['Longitude'])
amman_map

This is a sample of Amman's neighborhoods and their venues

In [32]:
amman_venues= getNearbyVenues(amman_df['Neighborhood'],amman_df['Latitude'],amman_df['Longitude'])

In [33]:
print('Amman city has {} unique venue categories spread across {} neighborhoods. \n'.format(len(amman_venues['Venue Category'].unique()),
                                                                                          len(amman_venues['Neighborhood'].unique())
                                                                                          ))
amman_venues.head()

Amman city has 161 unique venue categories spread across 122 neighborhoods. 



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abdoun Al Janobi (S),31.942011,35.881741,Adam,31.941975,35.881291,Restaurant
1,Abdoun Al Janobi (S),31.942011,35.881741,خاشوكة,31.941328,35.882684,Breakfast Spot
2,Abdoun Al Janobi (S),31.942011,35.881741,Buffalo Wings & Rings,31.941047,35.884596,Wings Joint
3,Abdoun Al Janobi (S),31.942011,35.881741,Aitch's Cheesecake,31.942194,35.880633,Dessert Shop
4,Abdoun Al Janobi (S),31.942011,35.881741,Halim Zaman (حليم زمان),31.940512,35.88293,Café


In [34]:
amman_venues[['Neighborhood','Venue Category']].groupby('Venue Category').count().sort_values(by='Neighborhood',ascending=False).head(10)

Unnamed: 0_level_0,Neighborhood
Venue Category,Unnamed: 1_level_1
Café,184
Middle Eastern Restaurant,80
Hotel,48
Coffee Shop,47
Dessert Shop,39
Fast Food Restaurant,35
Restaurant,32
Bakery,31
Ice Cream Shop,30
Burger Joint,26


The DataFrame below contains the mean frequency of each venue category, grouped by neighborhood

In [35]:
amman_grouped= get_group_df(amman_venues)
amman_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport Service,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,Art Museum,...,Theme Park,Thrift / Vintage Store,Toy / Game Store,Train Station,Tunnel,Turkish Restaurant,Video Store,Wings Joint,Women's Store,Zoo Exhibit
0,Abdoun Al Janobi (S),0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0
1,Abdoun Alshamali (N),0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Abu Alanda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Abu Alya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Abu-Nsair,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
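Each row of the grouped DataFrame is a vector of category frequencies, so the similarity of two neighborhoods' amenities can be measured directly on these vectors. The sketch below uses cosine similarity as one possible measure. It is an illustration, not the method used in this notebook (which relies on K-Means clustering), and the vectors and candidate names are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a neighborhood with no venues is similar to nothing
    return dot / (norm_a * norm_b)


# Made-up frequencies over the categories [Café, Coffee Shop, Restaurant, Farm]
home = [0.5, 0.25, 0.25, 0.0]
candidates = {
    'Candidate A': [0.4, 0.3, 0.3, 0.0],
    'Candidate B': [0.0, 0.0, 0.0, 1.0],
}

# Most similar candidate first
ranked = sorted(candidates,
                key=lambda name: cosine_similarity(home, candidates[name]),
                reverse=True)
print(ranked)
```

Candidate B shares no category with the home vector, so its similarity is 0 and it ranks last.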


The DataFrame below lists the most common venue categories of each neighborhood, sorted by frequency

In [37]:
amman_neighborhoods_venues_sorted= get_top_categories_for_each_neighborhood(amman_venues,10,amman_grouped)
amman_neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abdoun Al Janobi (S),Café,Restaurant,Burger Joint,Wings Joint,Breakfast Spot,Ice Cream Shop,Gym,Middle Eastern Restaurant,Coffee Shop,Electronics Store
1,Abdoun Alshamali (N),Café,Restaurant,Ice Cream Shop,Fast Food Restaurant,Lounge,Burger Joint,Middle Eastern Restaurant,Bakery,Doner Restaurant,Donut Shop
2,Abu Alanda,Snack Place,Department Store,Cheese Shop,Plaza,Supermarket,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant,Dog Run
3,Abu Alya,Coffee Shop,Zoo Exhibit,Dessert Shop,Electronics Store,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant,Dog Run,Diner
4,Abu-Nsair,Pizza Place,Gym / Fitness Center,Shopping Mall,Food & Drink Shop,Ice Cream Shop,Comedy Club,Dessert Shop,Cocktail Bar,Duty-free Shop,Donut Shop
...,...,...,...,...,...,...,...,...,...,...,...
117,Wadi Al Hidadah,Café,Health & Beauty Service,Dessert Shop,Electronics Store,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant,Dog Run,Diner
118,Wadi Al Suroor,Café,Arts & Crafts Store,Coffee Shop,Juice Bar,Middle Eastern Restaurant,Italian Restaurant,Street Fair,Bookstore,Gastropub,Dance Studio
119,Wadi as Sir,Coffee Shop,Café,Ice Cream Shop,Fast Food Restaurant,Pizza Place,Burger Joint,Dessert Shop,Bakery,Pub,Clothing Store
120,Zahran,Ice Cream Shop,Café,Burger Joint,Middle Eastern Restaurant,Spa,Restaurant,American Restaurant,Fast Food Restaurant,Pizza Place,Asian Restaurant


## D. Results
Now that the data is analyzed and prepared, let's cluster the neighborhoods of both cities into 10 clusters.

### D.1 Let's start by preparing the functions needed in the clustering phase.

**Clustering functions**

**1st** This function will create K-Means cluster model, fit it and return the cluster labels

In [38]:
from sklearn.cluster import KMeans

def get_kmeans_cluster_labels(k,grouped_df): 
    
    # the features needed to fit our model
    features= grouped_df.drop(columns='Neighborhood')

    # Create the kmeans model and fit it
    kmeans= KMeans(n_clusters=k,random_state=4).fit(features)
    
    labels=kmeans.labels_
    return labels
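To give an idea of what fitting a K-Means model involves, here is a minimal NumPy sketch of the basic assign-and-update loop (often called Lloyd's algorithm). It is a simplified illustration under stated assumptions: `lloyd_kmeans` is a hypothetical function, the four frequency vectors are made up, and scikit-learn's `KMeans` adds smarter initialization and convergence checks on top of this idea.

```python
import numpy as np


def lloyd_kmeans(points, k, n_iter=20, seed=4):
    """Cluster the rows of `points` into k groups with a plain assign/update loop."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random
    centers = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k)
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers


# Made-up frequency vectors: two café-heavy rows and two farm-heavy rows
points = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
])
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

The two café-heavy rows end up in one cluster and the two farm-heavy rows in the other. Which cluster gets which label number depends on the random start.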

**2nd** This function will merge the main df of the city and its neighborhoods_venues_sorted df.

In [39]:
def get_merged_df(city_df,neighborhoods_venues_sorted,labels):
    
    # create the DF that will contain the merged DFs
    city_merged= city_df.copy()
    
    # insert the Cluster Labels as a new column to the sorted df 
    neighborhoods_venues_sorted.insert(0,'Cluster Labels',labels)
    # merge the 2 DFs 
    city_merged= city_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'),left_on='Neighborhood',right_on='Neighborhood')
    
    return city_merged
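Because `merge` performs an inner join by default, neighborhoods for which no venues were returned drop out of the merged result. The sketch below illustrates this on three Irbid neighborhoods and uses `indicator=True` to list the ones that were dropped. The cluster labels and venue values are placeholders chosen for the example.

```python
import pandas as pd

# Three neighborhoods from the city table
city_df = pd.DataFrame({
    'Neighborhood': ['Aidun', 'Ain Janna', 'Kufr Jayez'],
    'Latitude': [32.523781, 32.340159, 32.619701],
    'Longitude': [35.85358, 35.758824, 35.825802],
})

# Only two of them have venue data; the labels below are placeholders
sorted_df = pd.DataFrame({
    'Neighborhood': ['Aidun', 'Kufr Jayez'],
    'Cluster Labels': [1, 8],
    'Top Venue': ['Pastry Shop', 'Farm'],
})

# Default inner join: rows without a match on the right side are dropped
merged = city_df.merge(sorted_df, on='Neighborhood')

# A left join with indicator=True reveals which neighborhoods had no match
check = city_df.merge(sorted_df, on='Neighborhood', how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(merged)
print(missing)
```

This is one reason the merged DataFrame has fewer rows than the city tables.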

**3rd** This function will return the **Map** of the clusters of the given city

In [52]:
import matplotlib.cm as cm
import matplotlib.colors as colors

def get_clusters_map(k,location,city_merged):

    map_clusters= folium.Map(location=location,zoom_start=9,width=700)

    # one color per cluster, evenly spaced along the rainbow colormap
    colors_array= cm.rainbow(np.linspace(0,1,k))
    rainbow= [colors.rgb2hex(i) for i in colors_array]
    for lat, lon, poi, cluster in zip(city_merged['Latitude'],city_merged['Longitude'],city_merged['Neighborhood'],city_merged['Cluster Labels']):
        label= folium.Popup(str(poi)+' | Cluster: '+str(cluster),parse_html=True)
        folium.CircleMarker(
            [lat,lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7
        ).add_to(map_clusters)

    return map_clusters    
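The map needs one visually distinct color per cluster. As an alternative to the matplotlib colormap, the same idea can be sketched with the standard library's `colorsys` module by spacing hues evenly around the color wheel. `cluster_colors` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_colors(k, saturation=0.8, value=0.9):
    """Return k hex colors with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_colors(10)
print(palette)
```

A marker for cluster `c` would then use `palette[c]` as its color.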

### D.2) Prepare the data for Clustering

In [41]:
# first let's combine both main dataframes
main_df= pd.concat([amman_df,irbid_df]).reset_index(drop=True)

In [42]:
# second let's combine the grouped by categories dataframes
grouped_df= pd.concat([amman_grouped,irbid_grouped]).reset_index(drop=True).fillna(0)
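The `fillna(0)` matters because the two cities do not share the same set of venue categories. When the frames are stacked, a category that exists in only one city becomes `NaN` in the rows of the other. The sketch below shows this with two one-row frames whose values are made up.

```python
import pandas as pd

# Made-up one-row frames: 'Hotel' exists only in Amman, 'Farm' only in Irbid
amman_part = pd.DataFrame({'Neighborhood': ['Zahran'], 'Café': [0.5], 'Hotel': [0.5]})
irbid_part = pd.DataFrame({'Neighborhood': ['Hakma'], 'Café': [0.0], 'Farm': [1.0]})

# Columns are aligned by name; missing categories appear as NaN
stacked = pd.concat([amman_part, irbid_part]).reset_index(drop=True)

# A missing category means a frequency of zero
combined = stacked.fillna(0)
print(combined)
```

After filling, every neighborhood has a value for every category, which K-Means requires since it cannot handle missing values.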

In [43]:
# third let's combine the venues dataframes
venues_df= pd.concat([amman_venues,irbid_venues]).reset_index(drop=True)

In [45]:
# fourth let's create a new sorted dataframes for both cities
neighborhoods_venues_sorted= get_top_categories_for_each_neighborhood(venues_df,10,grouped_df)

### D.3) Clustering

In [46]:
# fifth let's get the labels of the clusters of the whole grouped dataframe
labels= get_kmeans_cluster_labels(10,grouped_df)

In [47]:
# sixth let's merge the label
merged_df= get_merged_df(main_df,neighborhoods_venues_sorted,labels)

The final DataFrame, which contains the cluster label of each neighborhood

In [48]:
merged_df

Unnamed: 0,City,City_Arabic,Neighborhood,Neighborhood_Arabic,ZipCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amman,عمان,Abdoun Al Janobi (S),عبدون الجنوبي,11183,31.942011,35.881741,1,Café,Restaurant,Burger Joint,Wings Joint,Gym,Breakfast Spot,Middle Eastern Restaurant,Ice Cream Shop,Salad Place,Sandwich Place
1,Amman,عمان,Abdoun Alshamali (N),عبدون الشمالي,11183,31.948469,35.893509,1,Café,Restaurant,Ice Cream Shop,Fast Food Restaurant,Lounge,Burger Joint,Middle Eastern Restaurant,Donut Shop,Coffee Shop,Spa
2,Amman,عمان,Abu Alanda,ابو علندا,11592,31.905396,35.960555,1,Department Store,Snack Place,Plaza,Cheese Shop,Supermarket,Dessert Shop,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant
3,Amman,عمان,Abu Alya,أبو عليا,11946,32.001043,35.970014,9,Coffee Shop,Theme Park Ride / Attraction,Diner,Falafel Restaurant,Electronics Store,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant,Dog Run
4,Amman,عمان,Abu-Nsair,أبو نصير,11937,32.052860,35.876480,1,Pizza Place,Ice Cream Shop,Shopping Mall,Food & Drink Shop,Gym / Fitness Center,Gift Shop,Eastern European Restaurant,Donut Shop,Grocery Store,Doner Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
132,Irbid,اربد,Irbid Central,وسط البلد,21110,32.543850,35.859501,1,Asian Restaurant,Middle Eastern Restaurant,Grocery Store,Fast Food Restaurant,Café,Restaurant,Dessert Shop,Eastern European Restaurant,Duty-free Shop,Donut Shop
133,Irbid,اربد,University of Science and Technology,جامعة العلوم,22110,32.494578,35.990078,1,Restaurant,Burrito Place,Pool,Dance Studio,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant,Dog Run,Diner
134,Irbid,اربد,Juhfiyah,جحفيه,21622,32.545000,35.856000,1,Food Court,Asian Restaurant,Café,Middle Eastern Restaurant,Coffee Shop,Fast Food Restaurant,Go Kart Track,Gift Shop,Electronics Store,Eastern European Restaurant
135,Irbid,اربد,Kufr Jayez,كفر جايز,21125,32.619701,35.825802,8,Farm,Café,Fried Chicken Joint,Diner,Falafel Restaurant,Electronics Store,Eastern European Restaurant,Duty-free Shop,Donut Shop,Doner Restaurant


### D.4) Visualize the clusters for each neighborhood

Below is the **Map of Amman** city with the **clusters** of venues categories of its neighborhoods

In [53]:
clusters_map= get_clusters_map(10,amman_location,merged_df)
clusters_map

## E. Discussion 

The goal of this project was to find the neighborhood in Amman that is most similar to the home neighborhood and nearest to the company headquarters.

The home neighborhood in Irbid is **Yarmouk University**, which belongs to Cluster 1.
Most of the neighborhoods in Amman fall in the same cluster.
Among them, the neighborhood nearest to the company headquarters with similar amenities is **Jabal Al Hussein Al Gharbi**.

Let's have a look at the most common venues of these two neighborhoods in Cluster 1.

In [54]:
neighborhoods_venues_sorted[(neighborhoods_venues_sorted['Cluster Labels']==1)&(neighborhoods_venues_sorted['Neighborhood'].isin(['Jabal Al Hussein Al Gharbi','Yarmouk University']))]

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
75,1,Jabal Al Hussein Al Gharbi,Café,Ice Cream Shop,Dessert Shop,Shopping Mall,Hookah Bar,Turkish Restaurant,Hotel,Mediterranean Restaurant,Intersection,Eastern European Restaurant
135,1,Yarmouk University,Café,Donut Shop,Pizza Place,Dessert Shop,Fried Chicken Joint,Middle Eastern Restaurant,Fast Food Restaurant,Burger Joint,Bagel Shop,Road
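The distance criterion can be made explicit with the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. The sketch below is an illustration only: the headquarters location is a placeholder (Amman's geocoded center from earlier in the notebook), since the company's actual coordinates are not part of this project, and the three candidates are simply the first Amman rows of the postal code table.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Placeholder headquarters: the geocoded center of Amman (an assumption)
headquarters = (31.9515694, 35.9239625)

# Neighborhood coordinates taken from the postal code table
candidates = {
    'Abdoun Al Janobi (S)': (31.942011, 35.881741),
    'Abdoun Alshamali (N)': (31.948469, 35.893509),
    'Abu Alanda': (31.905396, 35.960555),
}

distances = {name: haversine_km(*headquarters, *coords) for name, coords in candidates.items()}
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest], 2))
```

Restricting `candidates` to the Amman neighborhoods that share the home neighborhood's cluster and taking the minimum distance would combine both criteria of the problem statement.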


## F. Conclusion: 
Data science is a widely used field that can be applied to a variety of real-world problems, such as the one above, where we used data to cluster neighborhoods in Jordan based on their most common venues.

## G. References: 
- LIST OF ALL ZIP/POSTAL CODES IN JORDAN: __[https://gpsarab.com/shop11/en/content/12-zip-code-in-jordan](https://gpsarab.com/shop11/en/content/12-zip-code-in-jordan)__
- Foursquare API