# IBM Applied Data Science Capstone Course by Coursera

### Week 5 Final Report

__*Opening a new Café in Lahore, Pakistan*__
<br>
<ul>
    <li> Build a dataframe of suburbs in Lahore, Pakistan by web scraping the internet
    <li> Fetch the geospatial coordinates of the neighborhoods
    <li> Get the venue data from the Foursquare API
    <li> Explore different areas of Lahore using clustering
    <li> Select the best cluster to open a new cafe
</ul>

## Importing required libraries

In [0]:
import numpy as np
import pandas as pd; pd.set_option("display.max_columns", None); pd.set_option("display.max_rows", None)
from pandas import json_normalize  # the pandas.io.json location is deprecated since pandas 1.0
import json
import requests
from urllib.request import urlopen
import geocoder
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

## Scraping data from the Wikipedia Page into a DataFrame

In [3]:
web = requests.get("https://en.wikipedia.org/wiki/List_of_towns_in_Lahore").text
soup = BeautifulSoup(web, 'lxml')
table = soup.find('table', {'class':'wikitable'})
links = table.find_all('a')
print(links)

[<a href="/wiki/Ravi_Town" title="Ravi Town">Ravi</a>, <a href="/wiki/Shahdara_Bagh" title="Shahdara Bagh">Shahdara Bagh</a>, <a class="mw-redirect" href="/wiki/Shalimar,_Lahore" title="Shalimar, Lahore">Shalamar</a>, <a href="/wiki/Begampura" title="Begampura">Begampura</a>, <a href="/wiki/Shad_Bagh" title="Shad Bagh">Shad Bagh</a>, <a href="/wiki/Baghbanpura" title="Baghbanpura">Baghbanpura</a>, <a class="mw-redirect" href="/wiki/Wagha" title="Wagha">Wagha</a>, <a href="/wiki/Batapur" title="Batapur">Batapur</a>, <a href="/wiki/Barki,_Pakistan" title="Barki, Pakistan">Barki</a>, <a href="/wiki/Ghurki,_Pakistan" title="Ghurki, Pakistan">Ghurki</a>, <a href="/wiki/Aziz_Bhatti_Town" title="Aziz Bhatti Town">Aziz Bhatti</a>, <a href="/wiki/Harbanspura" title="Harbanspura">Harbanspura</a>, <a class="mw-redirect" href="/wiki/Mughalpura" title="Mughalpura">Mughalpura</a>, <a class="mw-redirect" href="/wiki/Data_Gunj_Buksh_Town" title="Data Gunj Buksh Town">Data Gunj Buksh</a>, <a href="/wik

In [4]:
towns = []
towns_clean = []
for link in links:
  towns.append(link.get("title"))
print(towns)
len(towns)
for t in towns:
  t = t.replace(", Lahore", "")
  t = t.replace(" (Lahore)", "")
  t = t.replace(", Pakistan", "")
  t = t.replace(" (Pakistan)", "")
  towns_clean.append(t)
print(towns_clean)

['Ravi Town', 'Shahdara Bagh', 'Shalimar, Lahore', 'Begampura', 'Shad Bagh', 'Baghbanpura', 'Wagha', 'Batapur', 'Barki, Pakistan', 'Ghurki, Pakistan', 'Aziz Bhatti Town', 'Harbanspura', 'Mughalpura', 'Data Gunj Buksh Town', 'Anarkali', 'Gawalmandi', 'Qila Gujar Singh', 'Mozang Chungi', 'Islampura', 'Krishan Nagar', 'Sanda, Lahore', 'Gulberg, Lahore', 'Garhi Shahu', 'Mayo Gardens', 'Gulberg, Lahore', 'Garden Town (Pakistan)', 'Model Town, Lahore', 'Faisal Town', 'Mochi Pura', 'Kot Lakhpat', 'Samanabad Town', 'Gulshan-e-Ravi', 'Islamia Park', 'Ichhra', 'Baba Shah Jamal', 'Samanabad', 'Mustafa Town', 'Muslim Town, Lahore', 'Iqbal Town, Lahore', 'Awan Town', 'Hassan Town', 'Sabzazar', 'Johar Town', 'Mansoorah, Lahore', 'Education Town', 'Abdalian Cooperative Housing Society', 'WAPDA Town', 'Jati Umra (Lahore)', 'Township, Lahore', 'Raiwind', 'Nishtar Town', 'Green Town', 'Valencia, Lahore', 'NFC Employees Cooperative Housing Society', 'Kahna Nau', 'Pandoke, Lahore', 'Ladheke', 'Lahore Cant
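
The chained `replace` calls above can also be expressed as a single regular expression anchored at the end of the title. The following is a minimal sketch, not the notebook's own method; `clean_town_name` and `SUFFIX_PATTERN` are illustrative names, and the sample titles are taken from the scraped list.

```python
import re

# Matches the trailing qualifiers seen in the scraped titles:
# ", Lahore", ", Pakistan", " (Lahore)" and " (Pakistan)".
SUFFIX_PATTERN = re.compile(r"(,\s*|\s*\()(Lahore|Pakistan)\)?$")


def clean_town_name(title):
    """Strip a trailing city/country qualifier from a Wikipedia link title."""
    return SUFFIX_PATTERN.sub("", title).strip()


sample_titles = ["Shalimar, Lahore", "Barki, Pakistan",
                 "Garden Town (Pakistan)", "Jati Umra (Lahore)", "Ravi Town"]
cleaned = [clean_town_name(t) for t in sample_titles]
print(cleaned)
```

Because the pattern only matches at the end of the string and requires a comma or parenthesis first, a name such as "Lahore Cantonment" is left untouched.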

In [5]:
columns = ["Town"]
df = pd.DataFrame(towns_clean, columns=columns)
df.head()

Unnamed: 0,Town
0,Ravi Town
1,Shahdara Bagh
2,Shalimar
3,Begampura
4,Shad Bagh


In [6]:
df.shape

(62, 1)

## Getting the Geospatial Coordinates

In [0]:
def get_latlng(neighborhood, max_attempts=5):
    # query ArcGIS until coordinates come back, but give up after max_attempts
    # so that a town the geocoder cannot resolve does not loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Lahore, Pakistan'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    raise RuntimeError('No coordinates found for {}'.format(neighborhood))

In [0]:
coords = [ get_latlng(neighborhood) for neighborhood in df["Town"].tolist()]

In [9]:
coords

[[31.614900000000034, 74.29570000000007],
 [31.627200000000073, 74.29250000000008],
 [31.549720000000036, 74.34361000000007],
 [31.58000000000004, 74.36580000000004],
 [31.60020000000003, 74.33960000000008],
 [31.224170000000072, 74.09972000000005],
 [31.549720000000036, 74.34361000000007],
 [31.594170000000076, 74.49361000000005],
 [31.549720000000036, 74.34361000000007],
 [31.549720000000036, 74.34361000000007],
 [31.549720000000036, 74.34361000000007],
 [31.60378000000003, 74.56776000000008],
 [25.975820000000056, 68.63502000000005],
 [31.549720000000036, 74.34361000000007],
 [31.566670000000045, 74.31611000000004],
 [33.60280000000006, 73.06040000000007],
 [31.549720000000036, 74.34361000000007],
 [31.549720000000036, 74.34361000000007],
 [31.63020000000006, 74.27330000000006],
 [31.562200000000075, 74.28970000000004],
 [31.549720000000036, 74.34361000000007],
 [24.94210000000004, 67.07040000000006],
 [31.561100000000067, 74.35110000000003],
 [31.555400000000077, 74.35370000000006]
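
Several towns in the output above share the exact pair (31.54972, 74.34361). One plausible reading, which is an assumption and not something the geocoder reports, is that it fell back to a generic Lahore location for names it could not resolve. A small sketch for spotting such repeats; `repeated_coordinates` is a hypothetical helper and the sample holds the first nine pairs of the output, rounded to five decimals.

```python
from collections import Counter


def repeated_coordinates(coords, ndigits=5):
    """Return {(lat, lng): count} for coordinate pairs that occur more than once."""
    rounded = [(round(lat, ndigits), round(lng, ndigits)) for lat, lng in coords]
    counts = Counter(rounded)
    return {pair: n for pair, n in counts.items() if n > 1}


# First nine pairs from the output above, rounded to five decimals.
sample_coords = [
    (31.6149, 74.2957), (31.6272, 74.2925), (31.54972, 74.34361),
    (31.58, 74.3658), (31.6002, 74.3396), (31.22417, 74.09972),
    (31.54972, 74.34361), (31.59417, 74.49361), (31.54972, 74.34361),
]
repeats = repeated_coordinates(sample_coords)
print(repeats)
```

Towns that land on a repeated pair are worth checking by hand before they are used for venue queries, since they will all return the same venues.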

In [0]:
df_coords = pd.DataFrame(coords, columns=["Latitude", "Longitude"])

In [11]:
df_coords.head()

Unnamed: 0,Latitude,Longitude
0,31.6149,74.2957
1,31.6272,74.2925
2,31.54972,74.34361
3,31.58,74.3658
4,31.6002,74.3396


In [0]:
df["Latitude"] = df_coords["Latitude"]
df["Longitude"] = df_coords["Longitude"]

In [13]:
df.head()

Unnamed: 0,Town,Latitude,Longitude
0,Ravi Town,31.6149,74.2957
1,Shahdara Bagh,31.6272,74.2925
2,Shalimar,31.54972,74.34361
3,Begampura,31.58,74.3658
4,Shad Bagh,31.6002,74.3396


In [0]:
df.to_csv("lahore_df.csv", index=False)

## Creating a map of Lahore with neighborhoods

In [15]:
address = "Lahore, Pakistan"

geolocator = Nominatim(user_agent="my-app")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The geographical coordinates of Lahore, Pakistan are {}, {}".format(latitude, longitude))

The geographical coordinates of Lahore, Pakistan are 31.5656079, 74.3141775
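
With the city centre known, the geocoded towns can be sanity-checked by their great-circle distance from it. The sketch below uses the haversine formula; `haversine_km` is an illustrative helper, and the 50 km threshold is an assumption chosen for the example, not a value from this report. The two test points are pairs that appear in the geocoder output earlier in the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


center = (31.5656079, 74.3141775)   # Lahore, as printed above
ravi_town = (31.6149, 74.2957)      # first pair in the geocoder output
suspect = (24.9421, 67.0704)        # a pair from the output far to the south-west
MAX_DISTANCE_KM = 50                # assumed cut-off for "inside the Lahore area"

for name, point in [("Ravi Town", ravi_town), ("suspect pair", suspect)]:
    distance = haversine_km(*center, *point)
    flag = "ok" if distance <= MAX_DISTANCE_KM else "check geocoding"
    print("{}: {:.1f} km ({})".format(name, distance, flag))
```

Rows flagged this way could be re-geocoded with a more specific query or dropped before the venue lookup.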


In [16]:
map_lahore = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, town in zip(df["Latitude"], df["Longitude"], df["Town"]):
  label = "{}".format(town)
  label = folium.Popup(label, parse_html=True)

  folium.CircleMarker(
      [lat, lng],
      radius = 5,
      popup=label,
      color='blue',
      fill=True,
      fill_color="#3186cc",
      fill_opacity=0.7
  ).add_to(map_lahore)

map_lahore

In [0]:
map_lahore.save("map_lahore.html")

## Using Foursquare API to explore the neighborhoods

In [1]:
CLIENT_ID = 'Your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'Your Foursquare Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: Your Foursquare ID
CLIENT_SECRET: Your Foursquare Secret
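
Rather than pasting real credentials into a notebook cell, they can be read from the environment. This is a minimal sketch under stated assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, not something Foursquare or the original notebook defines.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # assumed names
    missing = [name for name in names if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Demonstration with a plain dict instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + client_id)
```

In the notebook itself the call would be made without arguments so that the values come from the shell that launched Jupyter.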


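**Let's request up to 500 venues within a radius of 5,000 meters of each town**

Note that the per-town counts shown further down never exceed 100, so the API appears to cap each response regardless of `LIMIT`.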

In [0]:
radius = 5000
LIMIT = 500

venues = []

for lat, lng, town in zip(df["Latitude"], df["Longitude"], df["Town"]):
  url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
      CLIENT_ID,
      CLIENT_SECRET,
      VERSION,
      lat,
      lng,
      radius,
      LIMIT
  )

  results = requests.get(url).json()['response']['groups'][0]['items']
  
  for venue in results:
    venues.append((
        town,
        lat, 
        lng, 
        venue["venue"]["name"],
        venue["venue"]["location"]["lat"],
        venue["venue"]["location"]["lng"],
        venue["venue"]["categories"][0]["name"]
    ))
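
The loop above indexes straight into the JSON, so a response without groups or a venue without categories would raise an exception. A defensive variant of the parsing step could look like the sketch below. `extract_venue_rows` is a hypothetical helper; the response shape mirrors the keys used in the loop, the first venue comes from the output of this notebook, and the second venue is invented to show the missing-category case.

```python
def extract_venue_rows(response_json, town, lat, lng):
    """Flatten one explore response into (town, lat, lng, name, vlat, vlng, category) rows."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((town, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


fake_response = {"response": {"groups": [{"items": [
    {"venue": {"name": "Ilyas Karahi",
               "location": {"lat": 31.606977, "lng": 74.306366},
               "categories": [{"name": "Pakistani Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",  # invented example venue
               "location": {"lat": 31.6, "lng": 74.3},
               "categories": []}},
]}]}}
rows = extract_venue_rows(fake_response, "Ravi Town", 31.6149, 74.2957)
print(rows)
```

Keeping the parsing in its own function also makes it testable without calling the API.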

In [20]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ["Town", "Latitude", "Longitude", "VenueName", 
                     "VenueLatitude", "VenueLongitude", "VenueCategory"]
print(venues_df.shape)
venues_df.head()

(4012, 7)


Unnamed: 0,Town,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ravi Town,31.6149,74.2957,Ilyas Karahi,31.606977,74.306366,Pakistani Restaurant
1,Ravi Town,31.6149,74.2957,Minar-e-Pakistan,31.591604,74.309481,Monument / Landmark
2,Ravi Town,31.6149,74.2957,Fort Food Street,31.587092,74.311538,Food Court
3,Ravi Town,31.6149,74.2957,Badshahi Masjid,31.588195,74.311354,Mosque
4,Ravi Town,31.6149,74.2957,Fort View,31.587374,74.31201,Restaurant


**Let's see how many venues each neighborhood has**

In [21]:
venues_df.groupby(["Town"]).count()

Town,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abdalian Cooperative Housing Society,100,100,100,100,100,100
Anarkali,68,68,68,68,68,68
Awan Town,8,8,8,8,8,8
Aziz Bhatti Town,100,100,100,100,100,100
Baba Shah Jamal,100,100,100,100,100,100
Barki,100,100,100,100,100,100
Batapur,6,6,6,6,6,6
Begampura,59,59,59,59,59,59
Cavalry Ground,100,100,100,100,100,100
Data Gunj Buksh Town,100,100,100,100,100,100


**Let's get all the unique categories**

In [22]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 127 unique categories.


In [23]:
venues_df['VenueCategory'].unique()[:10]

array(['Pakistani Restaurant', 'Monument / Landmark', 'Food Court',
       'Mosque', 'Restaurant', 'Historic Site', 'Department Store',
       'Market', 'BBQ Joint', 'Bookstore'], dtype=object)

In [25]:
"Café" in venues_df['VenueCategory'].unique()

True

## Analyzing each neighborhood

In [26]:
lahore_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

lahore_onehot['Town'] = venues_df['Town'] 

fixed_columns = [lahore_onehot.columns[-1]] + list(lahore_onehot.columns[:-1])
lahore_onehot = lahore_onehot[fixed_columns]

print(lahore_onehot.shape)
lahore_onehot.head()

(4012, 128)


Unnamed: 0,Town,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Asian Restaurant,Auto Dealership,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Basketball Court,Beach,Bistro,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Camera Store,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,English Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hookah Bar,Hotel,Housing Development,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Mosque,Movie Theater,Multiplex,Neighborhood,Other Nightlife,Paintball Field,Pakistani Restaurant,Park,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Recreation Center,Resort,Restaurant,River,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Social Club,Spa,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Toll Plaza,Tourist Information Center,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park,Zoo
0,Ravi Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ravi Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ravi Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ravi Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ravi Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Grouping the rows by town and taking the mean of each one-hot column gives the frequency of occurrence of each category**

In [27]:
lahore_grouped = lahore_onehot.groupby(["Town"]).mean().reset_index()

print(lahore_grouped.shape)
lahore_grouped

(52, 128)


Unnamed: 0,Town,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Asian Restaurant,Auto Dealership,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Basketball Court,Beach,Bistro,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Camera Store,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,English Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hookah Bar,Hotel,Housing Development,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Mosque,Movie Theater,Multiplex,Neighborhood,Other Nightlife,Paintball Field,Pakistani Restaurant,Park,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Recreation Center,Resort,Restaurant,River,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Social Club,Spa,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Toll Plaza,Tourist Information Center,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store,Water Park,Zoo
0,Abdalian Cooperative Housing Society,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.05,0.0,0.0,0.12,0.0,0.02,0.0,0.08,0.01,0.0,0.0,0.0,0.04,0.03,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.06,0.02,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anarkali,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.014706,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.014706,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.014706,0.029412,0.014706,0.0,0.088235,0.0,0.014706,0.0,0.029412,0.029412,0.029412,0.0,0.0,0.014706,0.044118,0.014706,0.014706,0.014706,0.0,0.0,0.0,0.014706,0.014706,0.0,0.014706,0.014706,0.0,0.0,0.132353,0.029412,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.058824,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0
2,Awan Town,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
3,Aziz Bhatti Town,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.13,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.03,0.0,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Baba Shah Jamal,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.14,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.05,0.01,0.02,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Barki,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.13,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.03,0.0,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Batapur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
7,Begampura,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.016949,0.016949,0.033898,0.033898,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.016949,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.016949,0.050847,0.0,0.0,0.101695,0.0,0.016949,0.016949,0.016949,0.033898,0.016949,0.0,0.0,0.016949,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.101695,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.016949,0.050847,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949
8,Cavalry Ground,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.13,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.03,0.0,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Data Gunj Buksh Town,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.13,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.03,0.0,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
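
The mean of a one-hot column is simply the share of a town's venues that fall in that category. Anarkali, for example, has 68 venues and a Café value of 0.014706, which is consistent with one café out of 68, and a Pakistani Restaurant value of 0.132353, consistent with 9 out of 68. The sketch below reproduces that arithmetic without pandas; `category_frequencies` and the placeholder category "Other" are illustrative.

```python
from collections import Counter


def category_frequencies(categories):
    """Return {category: share of venues} for one town's list of venue categories."""
    total = len(categories)
    if total == 0:
        return {}
    counts = Counter(categories)
    return {category: count / total for category, count in counts.items()}


# A stand-in for a town with 68 venues: 1 café, 9 Pakistani restaurants, 58 others.
town_categories = ["Café"] + ["Pakistani Restaurant"] * 9 + ["Other"] * 58
freqs = category_frequencies(town_categories)
print(round(freqs["Café"], 6), round(freqs["Pakistani Restaurant"], 6))
```

Since the values are shares, towns with very few venues (such as Batapur with 6) produce coarse frequencies that should be compared with care.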


In [28]:
len(lahore_grouped[lahore_grouped["Café"] > 0])

43

**Let's create a new DataFrame for Café data only**

In [29]:
lahore_hotel = lahore_grouped[["Town","Café"]]
lahore_hotel.head()

Unnamed: 0,Town,Café
0,Abdalian Cooperative Housing Society,0.12
1,Anarkali,0.014706
2,Awan Town,0.0
3,Aziz Bhatti Town,0.13
4,Baba Shah Jamal,0.14


## Clustering Neighborhoods/Suburbs

In [30]:
# setting the number of clusters
kclusters = 3

lahore_clustering = lahore_hotel.drop(columns=["Town"])

# running the k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lahore_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 2, 1, 1, 1, 2, 2, 1, 1], dtype=int32)
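
Because the clustering uses a single feature, the café frequency, k-means here amounts to cutting a number line into three ranges. The hand-rolled sketch below illustrates the assign-and-update loop on a dozen café frequencies taken from the tables in this report. It is not what scikit-learn does internally (which uses k-means++ initialization by default), `kmeans_1d` is a hypothetical helper, and its labels are ordered low to high whereas the label numbers from `KMeans` are arbitrary.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalars (k >= 2); returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids evenly over the sorted values.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


cafe_shares = [0.12, 0.014706, 0.0, 0.13, 0.09, 0.07,
               0.06, 0.166667, 0.0, 0.019231, 0.09, 0.13]
centroids, labels = kmeans_1d(cafe_shares, k=3)
print([round(c, 4) for c in centroids])
print(labels)
```

On this sample the three groups are the towns with almost no cafés, those with a moderate share, and those with the highest share.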

In [0]:
# copy the café-frequency dataframe so that the cluster labels can be added to it
lahore_merged = lahore_hotel.copy()

# add clustering labels
lahore_merged["Cluster Labels"] = kmeans.labels_

In [32]:
lahore_merged.head()

Unnamed: 0,Town,Café,Cluster Labels
0,Abdalian Cooperative Housing Society,0.12,1
1,Anarkali,0.014706,2
2,Awan Town,0.0,2
3,Aziz Bhatti Town,0.13,1
4,Baba Shah Jamal,0.14,1


In [33]:
# join lahore_merged with df to add latitude/longitude for each town
lahore_merged = lahore_merged.join(df.set_index("Town"), on="Town")

print(lahore_merged.shape)
lahore_merged.head() # check the last columns

(54, 5)


Unnamed: 0,Town,Café,Cluster Labels,Latitude,Longitude
0,Abdalian Cooperative Housing Society,0.12,1,31.4847,74.3934
1,Anarkali,0.014706,2,31.56667,74.31611
2,Awan Town,0.0,2,33.6227,72.9904
3,Aziz Bhatti Town,0.13,1,31.54972,74.34361
4,Baba Shah Jamal,0.14,1,31.5301,74.3276
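
The joined frame has 54 rows although `lahore_grouped` had 52. A join adds rows whenever the key on the right-hand side is not unique, and the scraped list does contain repeats ("Gulberg, Lahore" appears twice). A small sketch for finding and removing such repeats before the join; both helper names are illustrative, and the sample is a slice of the cleaned town list.

```python
from collections import Counter


def duplicated_names(names):
    """Return {name: count} for names that occur more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def drop_duplicates_keep_first(names):
    """Remove repeated names while preserving the original order."""
    return list(dict.fromkeys(names))


sample_towns = ["Sanda", "Gulberg", "Garhi Shahu", "Mayo Gardens", "Gulberg", "Garden Town"]
print(duplicated_names(sample_towns))
print(drop_duplicates_keep_first(sample_towns))
```

In pandas the equivalent step would be calling `drop_duplicates` on `df` by town before using it in the join.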


In [34]:
# sort the results by Cluster Labels
print(lahore_merged.shape)
lahore_merged.sort_values(["Cluster Labels"], inplace=True)
lahore_merged

(54, 5)


Unnamed: 0,Town,Café,Cluster Labels,Latitude,Longitude
33,Mayo Gardens,0.09,0,31.5554,74.3537
17,Green Town,0.04878,0,31.4289,74.2985
14,Garhi Shahu,0.09,0,31.5611,74.3511
12,Faisal Town,0.07,0,31.4835,74.3064
23,Islamia Park,0.06,0,31.5471,74.3083
37,Mustafa Town,0.061856,0,31.4887,74.2771
22,Iqbal Town,0.04,0,33.6441,73.0995
38,NFC Employees Cooperative Housing Society,0.12,1,31.4847,74.3934
40,Qila Gujar Singh,0.13,1,31.54972,74.34361
35,Mozang Chungi,0.13,1,31.54972,74.34361


**Let's now visualize the resulting clusters**

In [35]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lng, poi, cluster in zip(lahore_merged["Latitude"], 
                                  lahore_merged["Longitude"], lahore_merged["Town"], 
                                  lahore_merged["Cluster Labels"]):
  label = folium.Popup(str(poi) + " - Cluster " + str(cluster), parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius = 5,
      popup = label,
      color = rainbow[cluster-1],
      fill=True,
      fill_color=rainbow[cluster-1],
      fill_opacity=0.7
  ).add_to(map_clusters)

map_clusters

In [0]:
map_clusters.save('map_clusters.html')

## Examining Clusters

**Cluster 0**

In [37]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 0]

Unnamed: 0,Town,Café,Cluster Labels,Latitude,Longitude
33,Mayo Gardens,0.09,0,31.5554,74.3537
17,Green Town,0.04878,0,31.4289,74.2985
14,Garhi Shahu,0.09,0,31.5611,74.3511
12,Faisal Town,0.07,0,31.4835,74.3064
23,Islamia Park,0.06,0,31.5471,74.3083
37,Mustafa Town,0.061856,0,31.4887,74.2771
22,Iqbal Town,0.04,0,33.6441,73.0995


**Cluster 1**

In [38]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 1]

Unnamed: 0,Town,Café,Cluster Labels,Latitude,Longitude
38,NFC Employees Cooperative Housing Society,0.12,1,31.4847,74.3934
40,Qila Gujar Singh,0.13,1,31.54972,74.34361
35,Mozang Chungi,0.13,1,31.54972,74.34361
34,Model Town,0.166667,1,30.2073,67.0055
44,Sanda,0.13,1,31.54972,74.34361
47,Shalimar,0.13,1,31.54972,74.34361
32,Mansoorah,0.13,1,31.54972,74.34361
48,Township,0.13,1,31.54972,74.34361
31,Lahore Cantonment,0.13,1,31.54972,74.34361
31,Lahore Cantonment,0.13,1,31.54972,74.34361


**Cluster 2**

In [39]:
lahore_merged.loc[lahore_merged['Cluster Labels'] == 2]

Unnamed: 0,Town,Café,Cluster Labels,Latitude,Longitude
1,Anarkali,0.014706,2,31.56667,74.31611
2,Awan Town,0.0,2,33.6227,72.9904
24,Islampura,0.0,2,31.6302,74.2733
46,Shahdara Bagh,0.0,2,31.6272,74.2925
45,Shad Bagh,0.019231,2,31.6002,74.3396
19,Harbanspura,0.0,2,31.60378,74.56776
43,Sabzazar,0.034091,2,24.8717,67.0969
15,Gawalmandi,0.021277,2,33.6028,73.0604
42,Ravi Town,0.0,2,31.6149,74.2957
18,Gulberg,0.025316,2,24.9421,67.0704
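
To compare the clusters at a glance, the café frequencies can be averaged per cluster. The sketch below uses only the rows displayed above (with the repeated Lahore Cantonment row counted once), so the means describe those rows and not necessarily the full clusters; the variable names are illustrative.

```python
from statistics import mean

# Café shares copied from the three cluster tables above.
cafe_share_by_cluster = {
    0: [0.09, 0.04878, 0.09, 0.07, 0.06, 0.061856, 0.04],
    1: [0.12, 0.13, 0.13, 0.166667, 0.13, 0.13, 0.13, 0.13, 0.13],
    2: [0.014706, 0.0, 0.0, 0.0, 0.019231, 0.0, 0.034091, 0.021277, 0.0, 0.025316],
}

cluster_means = {label: mean(shares) for label, shares in cafe_share_by_cluster.items()}
# Cluster labels ordered from the lowest to the highest average café share.
ranking = sorted(cluster_means, key=cluster_means.get)

for label in ranking:
    print("Cluster {}: mean café share {:.4f}".format(label, cluster_means[label]))
```

On these rows Cluster 2 has the lowest average café share, Cluster 0 sits in the middle and Cluster 1 is the highest.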


**Observation and Conclusion**
<br>
This project covered the full data science pipeline for our business problem: specifying the
problem, obtaining, extracting and wrangling the data, preprocessing it, applying machine
learning to test our hypothesis, and providing recommendations to the stakeholders. <br>
In the rows shown above, Cluster 0 has moderate café frequencies (roughly 0.04 to 0.09),
Cluster 1 has the highest (roughly 0.12 to 0.17) and Cluster 2 the lowest (0 to roughly 0.03). <br>
The solution provided by this project for the business problem discussed in the Introduction
section is as follows: <br>
**_“The neighborhoods/suburbs in Cluster-0 are the preferred locations to
open a new cafe in the city of Lahore, Pakistan”_** <br>
These findings will help the stakeholders decide which neighborhood might be the
best option for opening a new cafe, which in turn could increase their revenue or strengthen
their investment choices.