# Lahore Battle of Neighbourhood Councils

Lahore is the capital of Punjab province, with a population of more than 11 million according to the 2017 census, and it keeps growing as people migrate from other cities. Lahore exerts a strong cultural influence over Pakistan. The city also hosts much of Pakistan's tourist industry, with major attractions including the Walled City, the famous Badshahi and Wazir Khan mosques, and several Sikh shrines. Lahore is also home to the Lahore Fort and Shalimar Gardens, both of which are UNESCO World Heritage Sites.

Let's assume that you live in one of Lahore's neighborhood councils and you love your neighborhood, mainly because of its amenities and venues, such as restaurants, shopping malls, and historic sites. Now you receive a job offer from a great company based in a different neighborhood council. It is a great opportunity, but the office is far from your current residence, so you want to relocate to another union council. Wouldn't it be great if you could identify neighborhoods on the other side of the city that are the same as your current one or, failing that, similar neighborhoods that are at least closer to your new job?

The aim of this project is to explore the neighborhood councils of Lahore and find the 10 most common venues in each union council.

The information in this report should be useful to people who plan to relocate to a different part of the city and want to find neighborhoods that closely resemble their current one.


## Importing Libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import geopy
from geopy.extra.rate_limiter import RateLimiter

import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas dataframe

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

## Scraping Neighborhood Councils

In [2]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_towns_in_Lahore').text
soup = BeautifulSoup(source,'lxml')

In [3]:
uc= soup.find("div", class_="div-col columns column-width").text
print(uc)


Union Council No 1 Begum Kot Shahdara
Union Council No 2 Yousif Park
Union Council No 3 Kot Kamboh
Union Council No 4 Shamsabad Shahdara
Union Council No 5 Chah Jhabbay Wala
Union Council No 6 Aziz Colony Shahdra
Union Council No 7 Lajpat Nagar
Union Council No 8 Faisal Park
Union Council No 9 Javid Park
Union Council No 10 Qaisar Town
Union Council No 11 Majeed Park
Union Council No 12 Qazi Park
Union Council No 13 Ravi Clifton Colony
Union Council No 14 Ladhay Shah
Union Council No 15 Qila Lakshan Singh
Union Council No 16 Auqaf Colony
Union Council No 17 Farooq Ganjj
Union Council No 18 Hanif Park
Union Council No 19 Siddique Pura
Union Council No 20 Larix Park
Union Council No 21 Badar Colony
Union Council No 22 Data Nagar
Union Council No 23 Siddiqa Colony
Union Council No 24 Bhagat Pura
Union Council No 25 Jhuggian
Union Council No 26 Akram Park
Union Council No 27 Fazal Park
Union Council No 28 Jahangir Park
Union Council No 29 Usman Ganjj
Union Council No 30 Manzorabad
Union C

## Data Cleaning
The scraped text contains blank lines. We remove them and convert the resulting list into a dataframe.

In [4]:
uc_list=uc.split("\n")
uc_list[:] = [x for x in uc_list if x] #remove empty elements
uc_list

['Union Council No 1 Begum Kot Shahdara',
 'Union Council No 2 Yousif Park',
 'Union Council No 3 Kot Kamboh',
 'Union Council No 4 Shamsabad Shahdara',
 'Union Council No 5 Chah Jhabbay Wala',
 'Union Council No 6 Aziz Colony Shahdra',
 'Union Council No 7 Lajpat Nagar',
 'Union Council No 8 Faisal Park',
 'Union Council No 9 Javid Park',
 'Union Council No 10 Qaisar Town',
 'Union Council No 11 Majeed Park',
 'Union Council No 12 Qazi Park',
 'Union Council No 13 Ravi Clifton Colony',
 'Union Council No 14 Ladhay Shah',
 'Union Council No 15 Qila Lakshan Singh',
 'Union Council No 16 Auqaf Colony',
 'Union Council No 17 Farooq Ganjj',
 'Union Council No 18 Hanif Park',
 'Union Council No 19 Siddique Pura',
 'Union Council No 20 Larix Park',
 'Union Council No 21 Badar Colony',
 'Union Council No 22 Data Nagar',
 'Union Council No 23 Siddiqa Colony',
 'Union Council No 24 Bhagat Pura',
 'Union Council No 25 Jhuggian',
 'Union Council No 26 Akram Park',
 'Union Council No 27 Fazal Park

In [5]:
df=pd.DataFrame(uc_list)
df=df[0].str.split(n=4,expand=True)
df.drop(columns=[0,1,2],inplace=True)
df.rename(columns={3: "UCNo", 4: "Name"},inplace=True)
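
To make the split step easier to follow, here is a minimal sketch on two entries copied from the scraped list. The names `sample`, `parts` and `sample_df` are illustrative only and do not appear in the notebook.

```python
import pandas as pd

# Two sample entries in the same format as the scraped list.
sample = pd.DataFrame([
    "Union Council No 1 Begum Kot Shahdara",
    "Union Council No 2 Yousif Park",
])

# n=4 allows at most four splits on whitespace, so everything after the
# fourth token stays together as the council name.
parts = sample[0].str.split(n=4, expand=True)

# Columns 0-2 hold "Union", "Council" and "No"; keep the number and the name.
sample_df = parts[[3, 4]].rename(columns={3: "UCNo", 4: "Name"})
print(sample_df)
```

Limiting the number of splits is what keeps multi-word names such as "Begum Kot Shahdara" in a single column. Note that `UCNo` is still a string at this point.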

## Geolocation
To get more accurate results from the geopy library, we append the city and country name to each neighborhood council name and then use geopy's RateLimiter to look up the coordinates.

In [6]:
df['NL'] =  df['Name'].astype(str)+ ",Lahore Pakistan"
df.head()

Unnamed: 0,UCNo,Name,NL
0,1,Begum Kot Shahdara,"Begum Kot Shahdara,Lahore Pakistan"
1,2,Yousif Park,"Yousif Park,Lahore Pakistan"
2,3,Kot Kamboh,"Kot Kamboh,Lahore Pakistan"
3,4,Shamsabad Shahdara,"Shamsabad Shahdara,Lahore Pakistan"
4,5,Chah Jhabbay Wala,"Chah Jhabbay Wala,Lahore Pakistan"


In [7]:
locator = geopy.geocoders.Nominatim(user_agent="myGeocoder")
# convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
# create location column
df['location'] = df['NL'].apply(geocode)
# create latitude, longitude and altitude from location column (returns tuple)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)

In [8]:
df

Unnamed: 0,UCNo,Name,NL,location,point
0,1,Begum Kot Shahdara,"Begum Kot Shahdara,Lahore Pakistan",,
1,2,Yousif Park,"Yousif Park,Lahore Pakistan",,
2,3,Kot Kamboh,"Kot Kamboh,Lahore Pakistan",,
3,4,Shamsabad Shahdara,"Shamsabad Shahdara,Lahore Pakistan",,
4,5,Chah Jhabbay Wala,"Chah Jhabbay Wala,Lahore Pakistan",,
...,...,...,...,...,...
269,270,Araiyan,"Araiyan,Lahore Pakistan","(Faisalabad-Sheikhupura-Lahore Road, Toll Plaz...","(31.4783407, 73.2237338, 0.0)"
270,271,Jia Bagga,"Jia Bagga,Lahore Pakistan","(Jia Bagga, Lake City Main Boulevard, Fazaia H...","(31.3304027, 74.2667851, 0.0)"
271,272,Raiwind Rural,"Raiwind Rural,Lahore Pakistan","(Raiwind, Lahore District, پنجاب, 55150, پاکست...","(31.2442237, 74.215911, 0.0)"
272,273,Raiwind Urban,"Raiwind Urban,Lahore Pakistan",,


Some councils could not be geocoded, so we remove the rows with a null value in location.

In [9]:
df.dropna(inplace=True)
df.reset_index(drop=True)

Unnamed: 0,UCNo,Name,NL,location,point
0,8,Faisal Park,"Faisal Park,Lahore Pakistan","(Faisal Park, Makhan Pura, China Scheme, لاہور...","(31.6037915, 74.35944568430989, 0.0)"
1,11,Majeed Park,"Majeed Park,Lahore Pakistan","(Abid Majeed Road (Featherston Road), St. John...","(31.5223002, 74.3723858, 0.0)"
2,12,Qazi Park,"Qazi Park,Lahore Pakistan","(Qazi Muhammad Isa Road, Faisal Town, Johar To...","(31.4756793, 74.3053802, 0.0)"
3,18,Hanif Park,"Hanif Park,Lahore Pakistan","(Hanif Garder Street 1, Sharif Pura, Haji Pura...","(31.5871834, 74.4201815, 0.0)"
4,19,Siddique Pura,"Siddique Pura,Lahore Pakistan","(Jamia Masjid Siddique-e-Akbar, Canal Bank Roa...","(31.5748716, 74.42893671669796, 0.0)"
...,...,...,...,...,...
119,262,Shamkay Bhattian,"Shamkay Bhattian,Lahore Pakistan","(Shamkay Bhattian, Lahore District, پنجاب, پاک...","(31.3305959, 74.1076545, 0.0)"
120,263,Manga,"Manga,Lahore Pakistan","(Manga, Lahore District, پنجاب, 55270, پاکستان...","(31.3070462, 74.0485141, 0.0)"
121,270,Araiyan,"Araiyan,Lahore Pakistan","(Faisalabad-Sheikhupura-Lahore Road, Toll Plaz...","(31.4783407, 73.2237338, 0.0)"
122,271,Jia Bagga,"Jia Bagga,Lahore Pakistan","(Jia Bagga, Lake City Main Boulevard, Fazaia H...","(31.3304027, 74.2667851, 0.0)"
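
As a small illustration of this filtering step, the sketch below drops ungeocoded rows from a toy frame. The frame `toy` and its placeholder location strings are assumptions made for the example; only the column names mirror the notebook.

```python
import pandas as pd

# Toy frame: the middle council has no geocoding result.
toy = pd.DataFrame({
    "Name": ["Faisal Park", "Kot Kamboh", "Majeed Park"],
    "location": ["placeholder address A", None, "placeholder address B"],
    "point": [(31.60, 74.36, 0.0), None, (31.52, 74.37, 0.0)],
})

# Restrict the null check to the geocoding column, then renumber the rows.
# reset_index returns a new frame, so the result is assigned explicitly.
located = toy.dropna(subset=["location"]).reset_index(drop=True)
print(located)
```

Passing `subset` is a design choice rather than what the notebook does: it keeps a row from being dropped just because some unrelated column happens to be empty.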


#### Split point column into latitude, longitude and altitude columns

In [10]:
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
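# drop() and reset_index() return new dataframes; nothing is assigned here,
# so df keeps all of its columns and its original index
df.drop(columns=['NL', 'location','point','altitude'])
df.reset_index(drop=True)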

Unnamed: 0,UCNo,Name,NL,location,point,latitude,longitude,altitude
0,8,Faisal Park,"Faisal Park,Lahore Pakistan","(Faisal Park, Makhan Pura, China Scheme, لاہور...","(31.6037915, 74.35944568430989, 0.0)",31.603791,74.359446,0.0
1,11,Majeed Park,"Majeed Park,Lahore Pakistan","(Abid Majeed Road (Featherston Road), St. John...","(31.5223002, 74.3723858, 0.0)",31.522300,74.372386,0.0
2,12,Qazi Park,"Qazi Park,Lahore Pakistan","(Qazi Muhammad Isa Road, Faisal Town, Johar To...","(31.4756793, 74.3053802, 0.0)",31.475679,74.305380,0.0
3,18,Hanif Park,"Hanif Park,Lahore Pakistan","(Hanif Garder Street 1, Sharif Pura, Haji Pura...","(31.5871834, 74.4201815, 0.0)",31.587183,74.420181,0.0
4,19,Siddique Pura,"Siddique Pura,Lahore Pakistan","(Jamia Masjid Siddique-e-Akbar, Canal Bank Roa...","(31.5748716, 74.42893671669796, 0.0)",31.574872,74.428937,0.0
...,...,...,...,...,...,...,...,...
119,262,Shamkay Bhattian,"Shamkay Bhattian,Lahore Pakistan","(Shamkay Bhattian, Lahore District, پنجاب, پاک...","(31.3305959, 74.1076545, 0.0)",31.330596,74.107654,0.0
120,263,Manga,"Manga,Lahore Pakistan","(Manga, Lahore District, پنجاب, 55270, پاکستان...","(31.3070462, 74.0485141, 0.0)",31.307046,74.048514,0.0
121,270,Araiyan,"Araiyan,Lahore Pakistan","(Faisalabad-Sheikhupura-Lahore Road, Toll Plaz...","(31.4783407, 73.2237338, 0.0)",31.478341,73.223734,0.0
122,271,Jia Bagga,"Jia Bagga,Lahore Pakistan","(Jia Bagga, Lake City Main Boulevard, Fazaia H...","(31.3304027, 74.2667851, 0.0)",31.330403,74.266785,0.0
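
The tuple-to-columns step can be sketched on a two-row toy frame as follows. The variable names `toy`, `coords` and `slim` are hypothetical; the coordinates are taken from the output above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Faisal Park", "Majeed Park"],
    "point": [(31.6037915, 74.3594457, 0.0), (31.5223002, 74.3723858, 0.0)],
})

# Expand each (latitude, longitude, altitude) tuple into three columns,
# reusing the original index so the rows stay aligned.
coords = pd.DataFrame(
    toy["point"].tolist(),
    index=toy.index,
    columns=["latitude", "longitude", "altitude"],
)
toy = toy.join(coords)

# drop() returns a new frame; assign it if the slimmer version is wanted.
slim = toy.drop(columns=["point", "altitude"])
print(slim)
```

Here `toy` still contains every column after the call to `drop`, and only `slim` has the reduced set.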


In [11]:
df.to_csv("data.csv") # Save the location Data

## Data Visualization

In [12]:
Lahore_map = folium.Map(location=[df.latitude.mean(), df.longitude.mean()], zoom_start=11)

In [13]:
for lat, lng, ucno, name in zip(df.latitude, df.longitude, df.UCNo, df.Name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=f'{ucno},{name}',
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(Lahore_map)

In [14]:
Lahore_map

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

### Define Foursquare Credentials and Version

In [15]:
CLIENT_ID = 'WZD**********************************************QB' #  Foursquare ID
CLIENT_SECRET = '0SB*******************************************3N' # Foursquare Secret
VERSION = '20200227' # Foursquare API version
LIMIT = 100
radius = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WZD**********************************************QB
CLIENT_SECRET:0SB*******************************************3N


#### Let's create a function that processes all the neighborhoods in Lahore and gets up to 100 venues within a radius of 500 meters of each council

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Name', 
                  'Name Latitude', 
                  'Name Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
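
The parsing part of the function can be tried offline with a mock payload. The sketch below assumes the response has the nesting that the function above reads (`response` → `groups[0]` → `items` → `venue`); `extract_venues` and `mock_payload` are hypothetical names, and the fallback to `None` for a venue without categories is an addition that the notebook function does not have.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Mock payload: the first venue mirrors a row of the notebook output,
# the second is a made-up venue without any category.
mock_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Optp",
                           "location": {"lat": 31.601458, "lng": 74.358464},
                           "categories": [{"name": "Restaurant"}]}},
                {"venue": {"name": "Uncategorized Venue",
                           "location": {"lat": 31.6, "lng": 74.36},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues("Faisal Park", 31.603791, 74.359446, mock_payload)
for row in rows:
    print(row)
```

Keeping the parsing separate from the HTTP request makes it possible to test the row layout without credentials or network access.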

In [17]:
lahore_venues = getNearbyVenues(names=df['Name'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Faisal Park
Majeed Park
Qazi Park
Hanif Park
Siddique Pura
Badar Colony
Data Nagar
Jhuggian
Akram Park
Fazal Park
Jahangir Park
Faiz Bagh
Mochi Gate
Azam Market
Shah Alam Market
Rang Mahal
Lohari Gate
Hussain Park
Makhan Pura
Dhobi Ghat
Sultan Pura
Misri Shah
Chah Miran
Kasur Pura
Amin Park
Nasir Park
Karim Park
Mian Munshi Park
Tauheed Park
Chohan Park
Beadon Road
New Anarkali
Riwaz Garden
Islam Pura
Sanda
Mozang
Sir Ganga Ram Hospital
Sarai Sultan
Raj Garh
Gulshan e Ravi
New Chauburji Park
Gulshan Ravi A Block
G.O.R Lahore
Islamia Park
Bahawalpur House
Rehman Pura
New Samanabad
Muhammad Pura
Union Park
Sabzazar Block B
Sabzazar K Block
Mustafa Park
Pakki Thatti
Thokar Niaz Baig
Hanjarwal
Mustafa Town
Johar Town
Johar Town / PIA Society
EME Society
Railway Colony
Crown Park
Garhi Shahu Lahore
Baghban Pura
Begumpura
Madina Colony
Naseerabad
Darogha Wala
Mominpura
Nishtar Colony
Fatehgarh
Rasheed Pura
Salamat Pura
Harbans Pura
Panj Pir
Tajpura
Al Faisal Town
Faisal Park
Ghaziabad
Bilal 

#### Let's check the size of the resulting dataframe

In [18]:
print(lahore_venues.shape)
lahore_venues.rename(columns={'Neighborhood':'Name','Neighborhood Latitude':'latitude','Neighborhood Longitude':'longitude'},inplace=True)
lahore_venues.head()

(354, 7)


Unnamed: 0,Name,Name Latitude,Name Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Faisal Park,31.603791,74.359446,Optp,31.601458,74.358464,Restaurant
1,Majeed Park,31.5223,74.372386,Aramish,31.524728,74.371419,Spa
2,Majeed Park,31.5223,74.372386,GOGO Resturant,31.522615,74.368309,Restaurant
3,Majeed Park,31.5223,74.372386,Shell,31.521304,74.368287,Gas Station
4,Majeed Park,31.5223,74.372386,CSD Mall Road,31.526227,74.371276,Department Store


Let's check how many venues were returned for each neighborhood

In [19]:
lahore_venues.groupby('Name').count()

Name,Name Latitude,Name Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Akram Park,2,2,2,2,2,2
Al Faisal Town,8,8,8,8,8,8
Amin Park,4,4,4,4,4,4
Azam Market,5,5,5,5,5,5
Baghban Pura,2,2,2,2,2,2
...,...,...,...,...,...,...
Township Sector B-2,4,4,4,4,4,4
Union Park,6,6,6,6,6,6
Wafaqi Colony,2,2,2,2,2,2
Wapda Town,4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [20]:
print('There are {} unique categories.'.format(len(lahore_venues['Venue Category'].unique())))

There are 92 unique categories.


## Analyze Each Neighborhood

In [21]:
# one hot encoding
lahore_onehot = pd.get_dummies(lahore_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lahore_onehot['Name'] = lahore_venues['Name'] 

# move neighborhood column to the first column
fixed_columns = [lahore_onehot.columns[-1]] + list(lahore_onehot.columns[:-1])
lahore_onehot = lahore_onehot[fixed_columns]

lahore_onehot.head()

Unnamed: 0,Name,Accessories Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Basketball Court,Bike Rental / Bike Share,Board Shop,Bookstore,...,Sporting Goods Shop,Supplement Shop,Sushi Restaurant,Tailor Shop,Tea Room,Theme Park,Tourist Information Center,Travel & Transport,Warehouse Store,Zoo
0,Faisal Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Majeed Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Majeed Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Majeed Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Majeed Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [22]:
lahore_onehot.shape

(354, 93)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [23]:
lahore_grouped = lahore_onehot.groupby('Name').mean().reset_index()
lahore_grouped

Unnamed: 0,Name,Accessories Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Basketball Court,Bike Rental / Bike Share,Board Shop,Bookstore,...,Sporting Goods Shop,Supplement Shop,Sushi Restaurant,Tailor Shop,Tea Room,Theme Park,Tourist Information Center,Travel & Transport,Warehouse Store,Zoo
0,Akram Park,0.0,0.500,0.0,0.000,0.00,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
1,Al Faisal Town,0.0,0.125,0.0,0.375,0.00,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
2,Amin Park,0.0,0.000,0.0,0.000,0.00,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
3,Azam Market,0.2,0.000,0.0,0.000,0.20,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
4,Baghban Pura,0.0,0.000,0.0,0.000,0.00,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,Township Sector B-2,0.0,0.000,0.0,0.000,0.00,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
91,Union Park,0.0,0.000,0.0,0.000,0.00,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
92,Wafaqi Colony,0.0,0.000,0.0,0.000,0.50,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
93,Wapda Town,0.0,0.000,0.0,0.000,0.25,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0
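
The one-hot encoding and the grouped mean can be followed on a three-row toy example. The frame `toy_venues` is an assumption made for illustration; its Akram Park rows match the two categories reported for that council in this notebook.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Name": ["Akram Park", "Akram Park", "Faisal Park"],
    "Venue Category": ["Asian Restaurant", "Breakfast Spot", "Restaurant"],
})

# One column per category, with 1 where the venue belongs to that category.
onehot = pd.get_dummies(
    toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int
)
onehot.insert(0, "Name", toy_venues["Name"])

# The mean of the 0/1 indicators is the share of each category
# among the venues found for that council.
grouped = onehot.groupby("Name").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so a council with only two venues gets 0.5 for each of its categories. This is why sparsely covered councils show a few large values and many zeros.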


#### Let's print each neighborhood along with the top 5 most common venues

In [24]:
num_top_venues = 5

for hood in lahore_grouped['Name']:
    print("----"+hood+"----")
    temp = lahore_grouped[lahore_grouped['Name'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Akram Park----
                  venue  freq
0      Asian Restaurant   0.5
1        Breakfast Spot   0.5
2           Men's Store   0.0
3  Pakistani Restaurant   0.0
4           Outlet Mall   0.0


----Al Faisal Town----
              venue  freq
0         BBQ Joint  0.38
1  Asian Restaurant  0.12
2    Breakfast Spot  0.12
3           Brewery  0.12
4     Shopping Mall  0.12


----Amin Park----
                  venue  freq
0  Pakistani Restaurant  0.25
1    Travel & Transport  0.25
2          Gourmet Shop  0.25
3            Restaurant  0.25
4           Men's Store  0.00


----Azam Market----
               venue  freq
0      Historic Site   0.4
1             Bakery   0.2
2             Market   0.2
3  Accessories Store   0.2
4          BBQ Joint   0.0


----Baghban Pura----
                  venue  freq
0              Cemetery   0.5
1  Fast Food Restaurant   0.5
2     Accessories Store   0.0
3              Pharmacy   0.0
4  Pakistani Restaurant   0.0


----Bahawalpur House----
      

              venue  freq
0  Asian Restaurant  0.14
1         BBQ Joint  0.14
2       Pizza Place  0.14
3         Juice Bar  0.14
4       Flea Market  0.14


----Lohari Gate----
                  venue  freq
0         Historic Site  0.25
1  Pakistani Restaurant  0.25
2             Bookstore  0.25
3       Supplement Shop  0.25
4           Men's Store  0.00


----Majeed Park----
                venue  freq
0    Department Store   0.2
1  Chinese Restaurant   0.2
2         Gas Station   0.2
3          Restaurant   0.2
4                 Spa   0.2


----Mian Mir----
                venue  freq
0               Hotel  0.25
1              Bakery  0.25
2            Cemetery  0.25
3   Convenience Store  0.25
4  Miscellaneous Shop  0.00


----Mian Munshi Park----
                  venue  freq
0           Bus Station  0.33
1         Auto Workshop  0.33
2           Tailor Shop  0.33
3     Accessories Store  0.00
4  Pakistani Restaurant  0.00


----Misri Shah----
                       venue  freq
0 

4        Outlet Mall   0.0


----Sector 2 Township----
               venue  freq
0         Restaurant   0.5
1               Park   0.5
2  Accessories Store   0.0
3        Men's Store   0.0
4        Outlet Mall   0.0


----Shadman Colony----
                venue  freq
0         Pizza Place  0.14
1              Bakery  0.14
2  Italian Restaurant  0.14
3           Juice Bar  0.14
4                Café  0.14


----Shah Alam Market----
                       venue  freq
0              Historic Site  0.25
1             Clothing Store  0.25
2             Breakfast Spot  0.25
3                     Market  0.25
4  Middle Eastern Restaurant  0.00


----Shah Jamal----
                    venue  freq
0               Juice Bar  0.25
1  Furniture / Home Store  0.25
2        Business Service  0.25
3             Bus Station  0.25
4       Accessories Store  0.00


----Siddique Pura----
                       venue  freq
0                Bus Station   1.0
1          Accessories Store   0.0
2  Middle E

#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
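
A quick way to check the function is to call it on a single toy row. The sketch below repeats the function so that it runs on its own; `toy_row` and the non-zero variant at the end are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Same logic as the notebook function: skip the leading 'Name' entry,
    # sort the frequencies in descending order and return the category names.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Toy row shaped like one row of lahore_grouped (values are illustrative).
toy_row = pd.Series({
    "Name": "Toy Council",
    "Asian Restaurant": 0.125,
    "BBQ Joint": 0.375,
    "Bakery": 0.0,
    "Zoo": 0.0,
})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))

# Hypothetical variant: keep only the categories that actually occur.
freqs = toy_row.iloc[1:].astype(float)
present = freqs[freqs > 0].sort_values(ascending=False)
print(list(present.index))
```

When a council has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories in whatever order the sort leaves them. Entries such as "Zoo" in the later columns of the top-10 table should therefore not be read as venues that were actually found.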

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Name'] = lahore_grouped['Name']

for ind in np.arange(lahore_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lahore_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Akram Park,Asian Restaurant,Breakfast Spot,Zoo,Food,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
1,Al Faisal Town,BBQ Joint,Brewery,Shopping Mall,Asian Restaurant,Breakfast Spot,Food,Department Store,Dessert Shop,Diner,Doctor's Office
2,Amin Park,Gourmet Shop,Restaurant,Pakistani Restaurant,Travel & Transport,Gas Station,Garden,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store
3,Azam Market,Historic Site,Accessories Store,Bakery,Market,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
4,Baghban Pura,Cemetery,Fast Food Restaurant,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop


## 4. Cluster Neighborhoods
Run *k*-means to cluster the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

lahore_grouped_clustering = lahore_grouped.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lahore_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 4, 2, 4, 2, 1, 2, 4])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

lahore_merged = df.reset_index(drop=True)

# join neighborhoods_venues_sorted onto the council data to add the cluster label and top venues for each neighborhood
lahore_merged = lahore_merged.join(neighborhoods_venues_sorted.set_index('Name'), on='Name',how='inner')

lahore_merged.head() # check the last columns!

Unnamed: 0,UCNo,Name,NL,location,point,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,8,Faisal Park,"Faisal Park,Lahore Pakistan","(Faisal Park, Makhan Pura, China Scheme, لاہور...","(31.6037915, 74.35944568430989, 0.0)",31.603791,74.359446,0.0,0,Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
76,155,Faisal Park,"Faisal Park,Lahore Pakistan","(Faisal Park, Makhan Pura, China Scheme, لاہور...","(31.6037915, 74.35944568430989, 0.0)",31.603791,74.359446,0.0,0,Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
1,11,Majeed Park,"Majeed Park,Lahore Pakistan","(Abid Majeed Road (Featherston Road), St. John...","(31.5223002, 74.3723858, 0.0)",31.5223,74.372386,0.0,1,Chinese Restaurant,Gas Station,Department Store,Restaurant,Spa,Fish & Chips Shop,Coffee Shop,Comfort Food Restaurant,Convenience Store,Dessert Shop
2,12,Qazi Park,"Qazi Park,Lahore Pakistan","(Qazi Muhammad Isa Road, Faisal Town, Johar To...","(31.4756793, 74.3053802, 0.0)",31.475679,74.30538,0.0,1,BBQ Joint,Brewery,Shopping Mall,Asian Restaurant,Breakfast Spot,Food,Department Store,Dessert Shop,Diner,Doctor's Office
4,19,Siddique Pura,"Siddique Pura,Lahore Pakistan","(Jamia Masjid Siddique-e-Akbar, Canal Bank Roa...","(31.5748716, 74.42893671669796, 0.0)",31.574872,74.428937,0.0,1,Bus Station,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant,Fish & Chips Shop
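
The effect of the inner join on `Name` can be seen in a small sketch. The frames `councils` and `summary` are toy stand-ins for `lahore_merged` and `neighborhoods_venues_sorted`, with council numbers taken from the tables above.

```python
import pandas as pd

# Two councils share the name "Faisal Park"; "Hanif Park" has no venue summary.
councils = pd.DataFrame({
    "UCNo": ["8", "18", "155"],
    "Name": ["Faisal Park", "Hanif Park", "Faisal Park"],
})
summary = pd.DataFrame({"Name": ["Faisal Park"], "Cluster Labels": [0]})

# An inner join keeps only councils whose name appears in the summary,
# and every council with a matching name receives the same summary row.
merged = councils.join(summary.set_index("Name"), on="Name", how="inner")
print(merged)
```

This explains two features of the output above: councils without any returned venues disappear from the merged frame, and councils that share a name end up with identical cluster labels and venue lists.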


Finally, let's visualize the resulting clusters

In [29]:
# create map
map_clusters = folium.Map(location=[df.latitude.mean(), df.longitude.mean()], zoom_start=11)

# set color scheme for the clusters: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(lahore_merged['latitude'], lahore_merged['longitude'], lahore_merged['Name'], lahore_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters
Now you can examine each cluster and determine the venue categories that distinguish it. Based on those defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
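
One possible way to find the defining categories is to average the frequency vectors within each cluster and look at the largest values. The sketch below uses a toy frame and hard-coded labels; `toy_grouped`, `toy_labels` and `profile` are hypothetical names, and in the notebook the same idea would presumably be applied to `lahore_grouped` together with `kmeans.labels_`.

```python
import pandas as pd

# Toy frequency table in the same shape as lahore_grouped.
toy_grouped = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Bus Station": [1.0, 0.5, 0.0, 0.0],
    "Bakery": [0.0, 0.5, 0.25, 0.0],
    "Fast Food Restaurant": [0.0, 0.0, 0.75, 1.0],
})
toy_labels = [1, 1, 2, 2]  # stand-in for the k-means labels

# Average the category frequencies of all councils in each cluster.
profile = (
    toy_grouped.assign(cluster=toy_labels)
    .drop(columns="Name")
    .groupby("cluster")
    .mean()
)

# List the two strongest categories of every cluster.
for cluster, row in profile.iterrows():
    top = row.sort_values(ascending=False).head(2)
    print(cluster, list(top.index))
```

In this toy data, cluster 1 is dominated by "Bus Station" and cluster 2 by "Fast Food Restaurant", which would suggest names such as "transit" and "fast food" clusters.
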
#### Cluster 1

In [30]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 0, lahore_merged.columns[[1] + list(range(5, lahore_merged.shape[1]))]]

Unnamed: 0,Name,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Faisal Park,31.603791,74.359446,0.0,0,Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
76,Faisal Park,31.603791,74.359446,0.0,0,Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
109,Sector 2 Township,31.456506,74.32008,0.0,0,Park,Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
110,Sector 1 Township,31.456506,74.32008,0.0,0,Park,Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office


#### Cluster 2

In [31]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 1, lahore_merged.columns[[1] + list(range(5, lahore_merged.shape[1]))]]

Unnamed: 0,Name,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Majeed Park,31.5223,74.372386,0.0,1,Chinese Restaurant,Gas Station,Department Store,Restaurant,Spa,Fish & Chips Shop,Coffee Shop,Comfort Food Restaurant,Convenience Store,Dessert Shop
2,Qazi Park,31.475679,74.30538,0.0,1,BBQ Joint,Brewery,Shopping Mall,Asian Restaurant,Breakfast Spot,Food,Department Store,Dessert Shop,Diner,Doctor's Office
4,Siddique Pura,31.574872,74.428937,0.0,1,Bus Station,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant,Fish & Chips Shop
6,Data Nagar,31.5988,74.318827,0.0,1,Bus Station,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant,Fish & Chips Shop
8,Akram Park,31.601234,74.341886,0.0,1,Asian Restaurant,Breakfast Spot,Zoo,Food,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
10,Jahangir Park,31.623016,74.300961,0.0,1,Historic Site,Flower Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant,Fish & Chips Shop
11,Faiz Bagh,31.582511,74.337926,0.0,1,Bookstore,Shopping Mall,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
12,Mochi Gate,31.57695,74.321564,0.0,1,Clothing Store,Neighborhood,Café,Breakfast Spot,Flea Market,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
14,Shah Alam Market,31.577838,74.317898,0.0,1,Clothing Store,Historic Site,Breakfast Spot,Market,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
15,Rang Mahal,31.576582,74.321149,0.0,1,Clothing Store,Neighborhood,Café,Breakfast Spot,Flea Market,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office


#### Cluster 3

In [32]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 2, lahore_merged.columns[[1] + list(range(5, lahore_merged.shape[1]))]]

Unnamed: 0,Name,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Sabzazar Block B,31.521793,74.270126,0.0,2,BBQ Joint,Fast Food Restaurant,Zoo,Food,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
50,Sabzazar K Block,31.521793,74.270126,0.0,2,BBQ Joint,Fast Food Restaurant,Zoo,Food,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
59,Railway Colony,31.572491,74.351196,0.0,2,Fast Food Restaurant,Zoo,Flower Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
62,Baghban Pura,31.580327,74.36831,0.0,2,Cemetery,Fast Food Restaurant,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
63,Begumpura,31.579621,74.365473,0.0,2,Cemetery,Fast Food Restaurant,Zoo,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
82,Habibullah Road,31.558383,74.34404,0.0,2,Fast Food Restaurant,Zoo,Flower Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fish & Chips Shop
86,Barki,31.525978,74.457172,0.0,2,Multiplex,Fast Food Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office


#### Cluster 4

In [33]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 3, lahore_merged.columns[[1] + list(range(5, lahore_merged.shape[1]))]]

Unnamed: 0,Name,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,Darogha Wala,31.58709,74.387794,0.0,3,Pakistani Restaurant,Flea Market,Coffee Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant
71,Salamat Pura,31.584019,74.394746,0.0,3,Pakistani Restaurant,Flea Market,Coffee Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office,Fast Food Restaurant


#### Cluster 5

In [34]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 4, lahore_merged.columns[[1] + list(range(5, lahore_merged.shape[1]))]]

Unnamed: 0,Name,latitude,longitude,altitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Fazal Park,31.523174,74.298467,0.0,4,Food & Drink Shop,Bakery,Fast Food Restaurant,Breakfast Spot,Food,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
13,Azam Market,31.584047,74.320878,0.0,4,Historic Site,Accessories Store,Bakery,Market,Flower Shop,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
23,Kasur Pura,31.592512,74.303201,0.0,4,Bakery,National Park,Gym / Fitness Center,Garden,Tea Room,Burger Joint,Flea Market,Convenience Store,Department Store,Dessert Shop
26,Karim Park,31.59097,74.298861,0.0,4,Bakery,Burger Joint,Gym / Fitness Center,Garden,Tea Room,River,Zoo,Convenience Store,Department Store,Dessert Shop
29,Chohan Park,31.566751,74.286396,0.0,4,Pakistani Restaurant,Pizza Place,Bakery,Flea Market,Coffee Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner
35,Mozang,31.553745,74.314149,0.0,4,BBQ Joint,Bakery,Pakistani Restaurant,Breakfast Spot,Flower Shop,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner
37,Sarai Sultan,31.57886,74.330742,0.0,4,Flea Market,Pizza Place,Bakery,Zoo,Comfort Food Restaurant,Convenience Store,Department Store,Dessert Shop,Diner,Doctor's Office
39,Gulshan e Ravi,31.550405,74.277338,0.0,4,Garden,Tea Room,Bakery,Rest Area,Soup Place,Zoo,Fish & Chips Shop,Comfort Food Restaurant,Convenience Store,Department Store
44,Bahawalpur House,31.550756,74.309178,0.0,4,Department Store,Bakery,Pakistani Restaurant,Zoo,Flea Market,Comfort Food Restaurant,Convenience Store,Dessert Shop,Diner,Doctor's Office
45,Rehman Pura,31.522471,74.321575,0.0,4,BBQ Joint,Pharmacy,Bakery,Department Store,Ice Cream Shop,Zoo,Flower Shop,Convenience Store,Dessert Shop,Diner
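
The cluster cells above repeat the same selection with a different label each time. As a minimal sketch, the selection could be wrapped in a small helper. The names `cluster_members` and `cluster_sizes` are hypothetical and not part of the original notebook, and the `toy` frame copies a few rows from the tables above as a stand-in for `lahore_merged`.

```python
import pandas as pd


def cluster_members(df, label, label_col="Cluster Labels"):
    """Return the rows of df whose cluster label equals `label`."""
    return df.loc[df[label_col] == label]


def cluster_sizes(df, label_col="Cluster Labels"):
    """Count how many rows fall into each cluster, ordered by label."""
    return df[label_col].value_counts().sort_index()


# Toy stand-in for lahore_merged (assumption: same column names as the tables above).
toy = pd.DataFrame(
    {
        "Name": ["Darogha Wala", "Salamat Pura", "Fazal Park", "Mozang"],
        "Cluster Labels": [3, 3, 4, 4],
        "1st Most Common Venue": [
            "Pakistani Restaurant",
            "Pakistani Restaurant",
            "Food & Drink Shop",
            "BBQ Joint",
        ],
    }
)

# Print every cluster in turn instead of writing one cell per label.
for label in sorted(toy["Cluster Labels"].unique()):
    print(f"Cluster {label + 1}")
    print(cluster_members(toy, label).to_string(index=False))
```

With the real data, `cluster_sizes(lahore_merged)` would give a quick check of how unevenly the neighborhoods are spread across the clusters.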


## Conclusion and Discussion
I grouped the neighborhood councils of Lahore into 5 clusters; the councils within each cluster share similar venue profiles, such as restaurants, parks and historic sites. The purpose of this project was to give people options for where to relocate within the city of Lahore.
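
The relocation use case can be sketched as a lookup: take the cluster of the current neighborhood, keep the other neighborhoods with the same label, and rank them by distance to the new workplace. This is an illustration under stated assumptions, not part of the original analysis: `haversine_km` and `similar_neighborhoods` are hypothetical helpers, the workplace coordinates are an arbitrary example, and the small frame reuses rows from the cluster tables in place of `lahore_merged`.

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def similar_neighborhoods(df, home, job_lat, job_lon):
    """Neighborhoods in the same cluster as `home`, nearest to the job first."""
    home_label = df.loc[df["Name"] == home, "Cluster Labels"].iloc[0]
    peers = df[(df["Cluster Labels"] == home_label) & (df["Name"] != home)].copy()
    peers["km_to_job"] = [
        haversine_km(lat, lon, job_lat, job_lon)
        for lat, lon in zip(peers["latitude"], peers["longitude"])
    ]
    return peers.sort_values("km_to_job")


# Rows copied from the cluster tables (assumption: same column names as lahore_merged).
sample = pd.DataFrame(
    {
        "Name": ["Kasur Pura", "Karim Park", "Mozang", "Rehman Pura", "Darogha Wala"],
        "latitude": [31.592512, 31.59097, 31.553745, 31.522471, 31.58709],
        "longitude": [74.303201, 74.298861, 74.314149, 74.321575, 74.387794],
        "Cluster Labels": [4, 4, 4, 4, 3],
    }
)

# Example: living in Kasur Pura, with a new job located at the Rehman Pura coordinates.
ranked = similar_neighborhoods(sample, "Kasur Pura", 31.522471, 74.321575)
print(ranked[["Name", "km_to_job"]].to_string(index=False))
```

Straight-line distance is only a rough proxy for commuting effort; travel time by road would be a more realistic ranking criterion if such data were available.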

For those who plan to relocate, venue similarity is only one of the factors to consider. The final decision on relocation will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in each recommended zone, taking into account additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and the status of each neighborhood.