# Exploring Neighborhoods in Karachi, Pakistan

## Table of Contents
1. [Introduction](#intro)
2. [Data Collection](#dc)
3. [Methodology](#meth)
4. [Analysis](#ana)
5. [Results and Discussion](#rd)
6. [Conclusion](#conc)

## Introduction <a name="intro"></a>

This project identifies popular neighborhoods in Karachi, Pakistan, and explores their venues using the Foursquare API. Venues are compared by overall rating, reviews, and price range, and the neighborhoods and venues are then clustered to help visitors choose the restaurants that suit them best.

Visitors to a city usually start by looking for places to go during their stay. They mainly choose on the basis of venue ratings and average prices, so that the places fit their budget.

Here, we identify places suited to different visitors, based on the information collected from the API and data science techniques. Once the neighborhoods are plotted with their venues, any company could build an application on the same data to offer users such suggestions.

## Data Collection <a name="dc"></a>

### Import Libraries

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_Union_Councils_of_Karachi').text
soup = BeautifulSoup(source,'lxml')
uc_tab = soup.find('div', class_='toc').ul
district = []
for lis in uc_tab.find_all('li',class_='toclevel-2'):
    district.append(lis.find('span', class_='toctext').text)

In [5]:
district

['Baldia Town',
 'Bin Qasim Town',
 'Gadap Town',
 'Gulberg Town',
 'Gulshan Town',
 'Jamshed Town',
 'Kemari Town',
 'Korangi Town',
 'Landhi Town',
 'Liaquatabad Town',
 'Lyari Town',
 'Malir Town',
 'New Karachi Town',
 'North Nazimabad Town',
 'Orangi Town',
 'Saddar Town',
 'Shah Faisal Town',
 'S.I.T.E. Town (Sindh Industrial & Trading Estate)']

## Scrape the neighborhoods of each town

In [6]:
town = []
districts = []
for dis_links in district:
    source2 = requests.get('https://en.wikipedia.org/wiki/{}'.format(dis_links)).text
    soup2 = BeautifulSoup(source2,'lxml')
    # The neighborhood list sits in a multi-column <div>; pages without it are skipped
    neigh_tab = soup2.find('div',class_='div-col columns column-width')
    if neigh_tab is None:
        continue
    for lis2 in neigh_tab.find_all('li'):
        town.append(lis2.text)
        districts.append(dis_links)

dist_and_town = {'Districts':districts,'Town':town}
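
Pages that lack the expected list are skipped silently, so it is hard to tell afterwards which towns contributed no neighborhoods. A minimal sketch of one way to keep track of them is shown below. `collect_towns` and the `fetch_towns` callable are hypothetical names introduced for this illustration; they are not part of the original notebook.

```python
def collect_towns(district_names, fetch_towns):
    """Gather (district, town) rows and remember which districts yielded nothing.

    `fetch_towns` is a hypothetical callable that returns a list of town
    names for one district; it may raise or return an empty list on failure.
    """
    rows, skipped = [], []
    for name in district_names:
        try:
            towns = fetch_towns(name)
        except Exception as exc:  # e.g. the expected <div> is missing
            skipped.append((name, repr(exc)))
            continue
        if not towns:
            skipped.append((name, "no towns found"))
            continue
        rows.extend((name, town) for town in towns)
    return rows, skipped
```

The `skipped` list then gives a starting point for the neighborhoods that have to be entered by hand.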

## Create the DataFrame

In [7]:
df_khi = pd.DataFrame(dist_and_town)
df_khi.head()

| | Districts | Town |
|---|---|---|
| 0 | Bin Qasim Town | Abdullah Goth |
| 1 | Bin Qasim Town | Cattle Colony |
| 2 | Bin Qasim Town | Gaghar |
| 3 | Bin Qasim Town | Green Park City |
| 4 | Bin Qasim Town | Gulshan-e-Hadeed |


In [8]:
df_khi.shape

(220, 2)

**Some neighborhoods were entered manually**

In [9]:
# Save the file so the missing neighborhoods can be entered manually
#df_khi.to_csv('karachi.csv')

In [10]:
#read file
df_khi = pd.read_csv('karachi_district.csv')
df_khi.head()

| | Districts | Town |
|---|---|---|
| 0 | Bin Qasim Town | Abdullah Goth |
| 1 | Bin Qasim Town | Cattle Colony |
| 2 | Bin Qasim Town | Gaghar |
| 3 | Bin Qasim Town | Green Park City |
| 4 | Bin Qasim Town | Gulshan-e-Hadeed |


In [11]:
df_khi.shape

(294, 2)

In [12]:
import geocoder
from geopy.geocoders import Nominatim

**Find the coordinates of each neighborhood**

In [11]:
latitude = []
longitude = []
# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent='khi_explorer')
for t_loc in df_khi['Town']:
    address = '{},Karachi,Pakistan'.format(t_loc)
    location = geolocator.geocode(address,timeout=1000)
    if location is None:
        # Address not found: append empty strings so the lists stay aligned with df_khi
        latitude.append("")
        longitude.append("")
        continue
    latitude.append(location.latitude)
    longitude.append(location.longitude)
    # Prints the full lists collected so far
    print('Latitude {} and Longitude {}'.format(latitude,longitude))

Latitude [24.86654] and Longitude [67.2836719]
Latitude [24.86654, 24.8793108] and Longitude [67.2836719, 67.1987233]
Latitude [24.86654, 24.8793108, '', 24.8576588] and Longitude [67.2836719, 67.1987233, '', 67.2224746]
Latitude [24.86654, 24.8793108, '', 24.8576588, 24.869998799999998] and Longitude [67.2836719, 67.1987233, '', 67.2224746, 67.36005178758305]
Latitude [24.86654, 24.8793108, '', 24.8576588, 24.869998799999998, 24.79160805] and Longitude [67.2836719, 67.1987233, '', 67.2224746, 67.36005178758305, 67.14091644916715]
Latitude [24.86654, 24.8793108, '', 24.8576588, 24.869998799999998, 24.79160805, 24.8520926] and Longitude [67.2836719, 67.1987233, '', 67.2224746, 67.36005178758305, 67.14091644916715, 67.1864717]
Latitude [24.86654, 24.8793108, '', 24.8576588, 24.869998799999998, 24.79160805, 24.8520926, 24.9005091] and Longitude [67.2836719, 67.1987233, '', 67.2224746, 67.36005178758305, 67.14091644916715, 67.1864717, 67.103604]
Latitude [24.86654, 24.8793108, '', 24.85765
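
The loop above sends one request per neighborhood in quick succession. Nominatim's usage policy asks for at most one request per second, so a pause between calls is worth adding. The sketch below shows the idea with the geocoder passed in as a callable, which also makes it easy to try without network access. `geocode_all` and its `delay` parameter are names assumed for this illustration.

```python
import time


def geocode_all(names, geocode, delay=1.0):
    """Geocode each name with the callable `geocode`, pausing between calls.

    `geocode` is assumed to return an object with .latitude and .longitude,
    or None when the address is not found (as Nominatim.geocode does).
    """
    coords = []
    for name in names:
        location = geocode("{},Karachi,Pakistan".format(name))
        if location is None:
            coords.append((name, None, None))  # keep the row, mark it missing
        else:
            coords.append((name, location.latitude, location.longitude))
        time.sleep(delay)  # stay within one request per second by default
    return coords
```

With the notebook's objects this could be called as `geocode_all(df_khi['Town'], geolocator.geocode)`. Using `None` instead of an empty string for missing values also lets pandas treat them as missing data.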

In [12]:
len(latitude)

294

In [13]:
len(longitude)

294

In [14]:
df_khi['Latitude'] = latitude
df_khi['Longitude'] = longitude
df_khi.head()

| | Districts | Town | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bin Qasim Town | Abdullah Goth | 24.8665 | 67.2837 |
| 1 | Bin Qasim Town | Cattle Colony | 24.8793 | 67.1987 |
| 2 | Bin Qasim Town | Gaghar | | |
| 3 | Bin Qasim Town | Green Park City | 24.8577 | 67.2225 |
| 4 | Bin Qasim Town | Gulshan-e-Hadeed | 24.87 | 67.3601 |


In [15]:
len(df_khi[df_khi['Latitude'] == ''])

81

In [16]:
df_khi.drop(df_khi[df_khi['Latitude'] == ""].index,axis=0,inplace=True)

In [17]:
len(df_khi[df_khi['Latitude'] == ''])

0

In [19]:
#df_khi.to_csv('karachi_lat_lng.csv',index=False)

In [13]:
df_khi = pd.read_csv('karachi_lat_lng.csv')
df_khi.head()

| | Districts | Town | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bin Qasim Town | Abdullah Goth | 24.86654 | 67.283672 |
| 1 | Bin Qasim Town | Cattle Colony | 24.879311 | 67.198723 |
| 2 | Bin Qasim Town | Green Park City | 24.857659 | 67.222475 |
| 3 | Bin Qasim Town | Gulshan-e-Hadeed | 24.869999 | 67.360052 |
| 4 | Bin Qasim Town | Ibrahim Hyderi | 24.791608 | 67.140916 |


In [14]:
df_khi.shape

(213, 4)

## Map the neighborhoods with folium

In [15]:
import folium
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

In [16]:
map_khi = folium.Map(
    location=[24.9008, 67.1681], 
    zoom_start=10
)

In [17]:
for lat, lng, borough, neighborhood in zip(df_khi['Latitude'], df_khi['Longitude'], df_khi['Districts'], df_khi['Town']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_khi)  
    
map_khi

In [18]:
CLIENT_ID = 'KAJZOUZF20UXDL2H3MOEBLTIHGQ2HK1PDJ14Z1R0QCAJCYVA' # your Foursquare ID
CLIENT_SECRET = 'UOJCAODP5HOBJS1W451QQ3JQ3HSAWPQ2TL2WWMCJXLFAYD33' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: KAJZOUZF20UXDL2H3MOEBLTIHGQ2HK1PDJ14Z1R0QCAJCYVA
CLIENT_SECRET:UOJCAODP5HOBJS1W451QQ3JQ3HSAWPQ2TL2WWMCJXLFAYD33
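
Keys written directly into a notebook are easy to leak when the notebook is shared. One common alternative is to read them from environment variables and print only a masked version. The sketch below assumes the variables are called `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; these names, like the two helper functions, are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare keys from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        client_id = env["FOURSQUARE_CLIENT_ID"]
        client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters so the full key is never printed."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

A call such as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` followed by `print(mask(CLIENT_SECRET))` would then confirm that the keys are loaded without exposing them.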


In [19]:
df_khi[df_khi['Town'] == 'Clifton'].index

Int64Index([202], dtype='int64')

In [20]:
neighborhood_latitude = df_khi['Latitude'][202] # neighborhood latitude value
neighborhood_longitude = df_khi['Longitude'][202] # neighborhood longitude value

neighborhood_name = df_khi['Town'][202] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Clifton are 24.8190552, 67.0262397.


In [21]:

LIMIT = 100  # maximum number of venues returned by the Foursquare API
radius = 1000  # search radius in meters

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=KAJZOUZF20UXDL2H3MOEBLTIHGQ2HK1PDJ14Z1R0QCAJCYVA&client_secret=UOJCAODP5HOBJS1W451QQ3JQ3HSAWPQ2TL2WWMCJXLFAYD33&v=20180605&ll=24.8190552,67.0262397&radius=1000&limit=100'
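
Formatting the query string by hand works, but it becomes error-prone as parameters are added. As a sketch of an alternative, the same URL can be assembled from a dictionary with the standard library's `urlencode`. The function name `build_explore_url` is introduced here for illustration only.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL used above from a dict of parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value readable instead of encoding it as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")
```

Since the notebook already uses `requests`, passing the same dictionary as `requests.get(base_url, params=params)` is another option that leaves the encoding to the library.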

In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e82333eedbcad001ba82516'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Clifton',
  'headerFullLocation': 'Clifton, Karachi',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 39,
  'suggestedBounds': {'ne': {'lat': 24.82805520900001, 'lng': 67.036137044161},
   'sw': {'lat': 24.810055190999993, 'lng': 67.01634235583901}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b5fdbf6f964a520f9ce29e3',
       'name': 'Bar-B-Q Tonight',
       'location': {'address': 'Com 5/1, Sh-e-Firdousi, Boat Basin, Clifton Block-5',
        'crossStreet': 'Khayaban-e-Saadi, opp. Bilawal Chowrangi',
        'lat': 24.816200703197044,
        'lng

In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # The column keeps its 'venue.' prefix until the columns are renamed
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues[filtered_columns].copy()  # copy so the assignment below does not act on a view

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



| | id | name | categories | lat | lng |
|---|---|---|---|---|---|
| 0 | 4b5fdbf6f964a520f9ce29e3 | Bar-B-Q Tonight | BBQ Joint | 24.816201 | 67.021181 |
| 1 | 4ba4e10ef964a52036be38e3 | Mohatta Palace | History Museum | 24.81455 | 67.032652 |
| 2 | 4dab1f0b1e72c1ab9c032a0b | Karachi Broast | Fast Food Restaurant | 24.826819 | 67.026328 |
| 3 | 54c66c2c498e215fade90b7e | Tao - Pan Asian Cuisine | Asian Restaurant | 24.827452 | 67.027946 |
| 4 | 4b8b3a6cf964a520919832e3 | Boat Basin | Street Food Gathering | 24.826821 | 67.025774 |


In [26]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

39 venues were returned by Foursquare.


## Explore Neighborhoods in Karachi

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue ID','Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
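
`getNearbyVenues` reads `v['venue']['categories'][0]['name']` directly, which raises an `IndexError` for any venue whose category list is empty. A minimal sketch of a more defensive row builder follows. `venue_to_row` is a hypothetical helper, not part of the original notebook, and it assumes the item layout seen in the explore response above.

```python
def venue_to_row(neighborhood, lat, lng, item):
    """Flatten one item of the explore response into a tuple.

    Falls back to None when a venue has an empty 'categories' list,
    where indexing [0] would otherwise raise IndexError.
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        neighborhood,
        lat,
        lng,
        venue["id"],
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )
```

Inside the function above, the list comprehension could then become `[venue_to_row(name, lat, lng, v) for v in results]`.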

In [29]:
khi_venues = getNearbyVenues(names=df_khi['Town'],
                                   latitudes=df_khi['Latitude'],
                                   longitudes=df_khi['Longitude']
                                  )


Abdullah Goth
Cattle Colony
Green Park City
Gulshan-e-Hadeed
Ibrahim Hyderi
Landhi Colony
Mujahid Colony
Quaidabad
Razzaqabad
Rehri Goth
Shah Latif Town
Sherpao Colony
Steel Town
Sindh Baloch Cooperative Housing Society
Essa Nagri
Gulshan-e-Iqbal I
Gulshan-e-Iqbal II
Gulzar-e-Hijri
Gulistan-e-Johar
Abbas Town
Jamali Colony
Metroville Colony
Pehlwan Goth
Safoora Goth
Shanti Nagar
Sachal Goth
KESC Society
Abyssinia Lines
Akhtar Colony
Azam Basti
Baloch Colony
Baltistani Society
Catholic Colony No. 1
Central Jacob Lines
Chanesar Goth
Defence View
Garden East
Garden West
Gulistan-e-Zafar
Jamshed Quarters
Mahmudabad
Manzoor Colony
Nursery
Pakistan Quarters
Patel Para
Sohrab Katrak Parsi Colony
Soldier Bazaar
Abdul Rehman Goth
Arbian
Darvesh Goth
Goth Lashkari
Goth Mohammad Ali
Goth Shaikhan
Gulshan-e-Sikandarabad
Haji Ali Goth
Jamali Goth
Machar Colony
Maripur
Masroor Colony
Mubarak Goth
Rais Goth
Rehman Goth
Salehabad
Sher Shah
Somar Goth
Sultanabad
Bilal Colony
Gulzar Colony
Hasrat Mohani

In [32]:
#khi_venues.to_csv('karachi_venues.csv',index=False)

In [33]:
print(khi_venues.shape)
khi_venues.head()

(3113, 8)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue ID | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Cattle Colony | 24.879311 | 67.198723 | 4ca21645e44d6dcb4dde0375 | Anwar Baloch | 24.86962 | 67.200499 | BBQ Joint |
| 1 | Cattle Colony | 24.879311 | 67.198723 | 4dbadad51e72b351ca89bc12 | The Broast Restaurant | 24.884536 | 67.182993 | Fast Food Restaurant |
| 2 | Cattle Colony | 24.879311 | 67.198723 | 4fe5a492e4b02e4293b64533 | Student Biryani | 24.884636 | 67.18226 | Restaurant |
| 3 | Cattle Colony | 24.879311 | 67.198723 | 4ec2103893ad36d7aa12a7c5 | Taj Chaiye Ka Hotel | 24.887062 | 67.183052 | Tea Room |
| 4 | Cattle Colony | 24.879311 | 67.198723 | 4ec785a0722e1437965e2e90 | Liaquat Market | 24.887072 | 67.183061 | Market |


In [34]:
khi_venues.groupby('Neighborhood')['Venue'].count().sort_values(ascending=False)

Neighborhood
Clifton               100
Mahmudabad            100
Bath Island           100
Shah Rasool Colony    100
Civil Lines            91
                     ... 
Razzaqabad              1
Qaim Khani              1
Landhi Colony           1
Majeed Colony           1
Mubarak Goth            1
Name: Venue, Length: 185, dtype: int64

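These counts can also be visualized, for example as a bar chart.

Clifton, Mahmudabad, Bath Island and Shah Rasool Colony each return 100 venues. Because 100 is the `LIMIT` passed to the API, the true number of venues in these neighborhoods may be higher.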
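
As a rough illustration of how the counts could be displayed, the sketch below builds a text bar chart with the standard library only. It is a stand-in for a proper plot; in the notebook the grouped Series could instead be drawn with pandas' `plot(kind='barh')`. The function name `text_bar_chart` and its parameters are assumptions made for this example.

```python
from collections import Counter


def text_bar_chart(neighborhoods, top=5, width=40):
    """Return one text bar per neighborhood for the `top` largest venue counts."""
    counts = Counter(neighborhoods).most_common(top)
    if not counts:
        return []
    largest = counts[0][1]
    lines = []
    for name, n in counts:
        # Scale each bar relative to the largest count; always draw at least one mark
        bar = "#" * max(1, round(width * n / largest))
        lines.append("{:<20} {} {}".format(name, bar, n))
    return lines
```

Calling `print("\n".join(text_bar_chart(khi_venues['Neighborhood'])))` would list the five neighborhoods with the most venues.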

In [35]:
print('There are {} unique categories.'.format(len(khi_venues['Venue Category'].unique())))

There are 150 unique categories.


The venues fall into 150 unique categories.

We first plot the venue data on the map.

In [36]:
map_khi_venues = folium.Map(
    location=[24.9008, 67.1681], 
    zoom_start=11
)

for lat,lng,name in zip(khi_venues['Venue Latitude'],khi_venues['Venue Longitude'],khi_venues['Venue']):
    label = name
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius = 5,
        popup=label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(map_khi_venues)
    
map_khi_venues

## Venue ratings

In [37]:
khi_venues['Venue ID']

0       4ca21645e44d6dcb4dde0375
1       4dbadad51e72b351ca89bc12
2       4fe5a492e4b02e4293b64533
3       4ec2103893ad36d7aa12a7c5
4       4ec785a0722e1437965e2e90
                  ...           
3108    50e1a253e4b0a6d1d2e151cc
3109    4e8b59507ee6dd016f730976
3110    517cebefe4b0ea421e2f2fdd
3111    54318e35498e08f48d9c47d8
3112    55310e94498e6ce7cf529afa
Name: Venue ID, Length: 3113, dtype: object

In [55]:
venues_data = []
# Request the details of the first five venues
for v_id in khi_venues['Venue ID'].head():
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
        v_id,  # use the ID from the loop rather than a hard-coded one
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION
    )
    
    result = requests.get(url).json()
    print(result)
    print("\n")

{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e8236ca0be7b4001cbd8cc1'}, 'response': {}}


{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e8236aa60ba08001bcaf8bd'}, 'response': {}}


{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e8236e6618f43001bbc4b31'}, 'response': {}}


{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e8236fd77af030028b4567b'}, 'response': {}}


{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5e82362c0f5968001bf0e54f'}, 'response': {}}
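
Every call above comes back with code 429 (`quota_exceeded`) and an empty `response`, so it is worth checking `meta` before reading any venue fields. A minimal sketch, assuming the payload layout shown in the output above, is given below. `check_foursquare_response`, `fetch_details`, and the `fetch` callable are hypothetical names used only for this illustration.

```python
def check_foursquare_response(payload):
    """Return the 'response' dict, or raise if the API reported an error."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError("Foursquare error {}: {}".format(
            meta.get("code"), meta.get("errorDetail", "unknown error")))
    return payload["response"]


def fetch_details(venue_ids, fetch):
    """Call `fetch(venue_id)` for each ID and stop at the first API error.

    `fetch` is a hypothetical callable returning the decoded JSON payload.
    """
    details = {}
    for venue_id in venue_ids:
        try:
            details[venue_id] = check_foursquare_response(fetch(venue_id))
        except RuntimeError as err:
            print("Stopping early:", err)
            break  # further calls would likely fail the same way until the quota resets
    return details
```

Stopping at the first error avoids spending further requests while the quota is exhausted.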




In [42]:
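# `result` is a dict, so iterating over it yields its keys ('meta', 'response').
# The keys are strings, and indexing a string with 'response' raises the TypeError below.
for v in result:
    v['response']['venue']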

TypeError: string indices must be integers

In [56]:
for v_id in khi_venues['Venue ID'].head():
    # NOTE: a fixed venue ID is used here for testing, so v_id is unused;
    # replace it with v_id to query each venue in turn
    venue_id = '4eb7057d1081376a2c5b5f05'
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
        venue_id,
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION
    )
    
    result = requests.get(url).json()
    
    # Each field falls back to a default when it is missing from the response
    try:
        venue_loc = result['response']['venue']['location']['address']
    except (KeyError, TypeError):
        venue_loc = "Not Given"
    try:
        venue_total_rating = result['response']['venue']['ratingSignals']
    except (KeyError, TypeError):
        venue_total_rating = 0
    try:
        venue_time = result['response']['venue']['hours']['richStatus']['text']
    except (KeyError, TypeError):
        venue_time = "Not listed"
    try:
        venue_rating = result['response']['venue']['rating']
    except (KeyError, TypeError):
        venue_rating = 0
    try:
        venue_tips = result['response']['venue']['tips']['count']
    except (KeyError, TypeError):
        venue_tips = 0
    try:
        venue_price = result['response']['venue']['price']['message']
    except (KeyError, TypeError):
        venue_price = "Not listed"

    print("This {},{} has {} rating with total {} people rated and {} tips and timing {} and price {}".format(
        venue_id,
        venue_loc,
        venue_rating,
        venue_total_rating,
        venue_tips,
        venue_time,
        venue_price
    ))


This 4eb7057d1081376a2c5b5f05,Not Given has 0 rating with total 0 people rated and 0 tips and timing Not listed and price Not listed
This 4eb7057d1081376a2c5b5f05,Not Given has 0 rating with total 0 people rated and 0 tips and timing Not listed and price Not listed
This 4eb7057d1081376a2c5b5f05,Not Given has 0 rating with total 0 people rated and 0 tips and timing Not listed and price Not listed
This 4eb7057d1081376a2c5b5f05,Not Given has 0 rating with total 0 people rated and 0 tips and timing Not listed and price Not listed
This 4eb7057d1081376a2c5b5f05,Not Given has 0 rating with total 0 people rated and 0 tips and timing Not listed and price Not listed
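
All six fields fall back to their defaults here because the quota error leaves `response` empty. The repeated `try`/`except` blocks can also be expressed with one small lookup helper, sketched below. `dig` and `summarize_venue` are hypothetical names, and the field paths are the ones used in the cell above.

```python
def dig(data, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` if any is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def summarize_venue(result):
    """Pull the fields used above out of one venue-details payload."""
    venue = dig(result, "response", "venue", default={})
    return {
        "address": dig(venue, "location", "address", default="Not Given"),
        "rating": dig(venue, "rating", default=0),
        "rating_signals": dig(venue, "ratingSignals", default=0),
        "tips": dig(venue, "tips", "count", default=0),
        "hours": dig(venue, "hours", "richStatus", "text", default="Not listed"),
        "price": dig(venue, "price", "message", default="Not listed"),
    }
```

A list of such dictionaries, one per venue, can be passed straight to `pd.DataFrame` to build the `venues_data` table.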


In [59]:
result = requests.get(url).json()

In [60]:
result['response']['venue'].keys()

KeyError: 'venue'

In [61]:
venues_data = pd.DataFrame()
venues_data