## The Battle of Neighborhoods - Delhi - Final Report

## Introduction & Business Problem:

A retail company wants to set up supermarket stores in Delhi city but is not exactly sure about which Neighborhood(s) to open the store(s) in. The chosen locations should ideally have a considerable population so that there is more store footfall & near to work centers/residential districts for easier access to a large number of citizens.
There are 2 business questions that need to be answered.
1. Which part (area) of the city should the company open the supermarket first.
2. Which Neighborhood(s) would be ideal in that part (as in point 1) setting up such a supermarket in the city.
The company would ideally prefer to open the store/s in Neighborhoods where there is a comparatively lower real estate prices (not absolutely low). But the same time, they want to choose the Neighborhoods with a high population and more number of venues, since it should result in more footfall for the store. When we consider the business problem, we can create a map and information chart where the real estate prices are placed on Hyderabad and each area is clustered according to the venue density.

## Background :

I have selected delhi for my project since I am familiar with the same, being a resident of the city. Delhi district is a metropolitan with a population of roughly 5 million and 150 Neighborhoods (GHMC) . The city has a high population and population density. Being a crowded city leads the owners of shops and social sharing places in the city where the population is dense. This clustering will ensure that Neighborhoods with moderate real estate price and more number of venues will be in single clustered together and hence would be used to answer the business problem.

## Data Description

In order to solve the business problem, I have decided to use the following data as listed below, which includes the Foursquare Location data API.
Geographical co-ordinates data of Neighborhoods in Hyderabad city by zip code from GitHub repository.
Source : https://github.com/sanand0/pincode/blob/master/data/IN.csv
Venue data for each Neighborhood in the city using Foursquare API. I included venues within a 1000 meter radius from each neighborhood.
The data helps us to identify similar Neighborhoods using venues and also helps in clustering algorithm.
Geo-Json data for GHMC (Hyderabad Municipality) for Choropleth Maps (to show real estate prices).
Use:
Mapping Neighborhoods on Folium Map. Generating centers for each Neighborhood using geo co-ordinates.
The data helps us to show real estate prices on Choropleth/Folium Maps.
Average House prices (per square feet) for each Neighborhood in Hyderabad city.
Source: https://www.makaan.com/price-trends/property-rates-for-buy-in-delhi
Use:
The data helps us to show real estate prices on Choropleth Maps and to identify potential Neighborhoods where stores can be opened.

## Problem Statement

1. Which part (area) of the city should the company open the supermarket first.
2. Which Neighborhood(s) would be ideal in that part (as in point 1) setting up such a supermarket in the city

## Methodology

* For the House prices, I have used web scraping to extract data from a house finder website in my project. A part of the table shown below.
* I have used python folium library to visualize geographic details of Hyderabad by creating a map of Hyderabad with Neighborhoods superimposed on top. I used latitude and longitude values to get the visual as below:
* Using Hyderabad Geojson data (with boundary co-ordinates for Neighborhoods), I calculated the center co-ordinates for each Neighborhoods using python code & list comprehension. Then, I used Folium Library to visualize the centers on map.
* I utilized the Foursquare API to explore Neighborhoods and segment them. I kept the limit as 100 venues and the radius 1000 meters for each Neighborhood centers (calculated above) from their given latitude and longitude data. Here is a head of the list Venues name, category, latitude and longitude information from Foursquare API.
* In summary of this data ~ 1400 venues were returned by Foursquare for Neighborhoods in Hyderabad


### Load all necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


### Downloading and exploring dataset

In [2]:
url = "https://raw.githubusercontent.com/sanand0/pincode/master/data/IN.csv"
hyderabad_data = pd.read_csv(url,delimiter = ',')
hyderabad_data.head()

Unnamed: 0,key,place_name,admin_name1,latitude,longitude,accuracy
0,IN/110001,Connaught Place,New Delhi,28.6333,77.2167,4.0
1,IN/110002,Darya Ganj,New Delhi,28.6333,77.25,4.0
2,IN/110003,Aliganj,New Delhi,28.65,77.2167,
3,IN/110004,Rashtrapati Bhawan,New Delhi,28.65,77.2167,
4,IN/110005,Lower Camp Anand Parbat,New Delhi,28.65,77.2,


In [3]:
hyderabad_data[['place_name']].dropna()

Unnamed: 0,place_name
0,Connaught Place
1,Darya Ganj
2,Aliganj
3,Rashtrapati Bhawan
4,Lower Camp Anand Parbat
5,Bara Tooti
6,Birla Lines
7,Patel Nagar
8,Delhi Cantt
9,Nirman Bhawan


In [4]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::openssl-1.1.1d-he774522_4, defaults/win-64::ca-certificates-2020.1.1-0
  - defaults/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4done

# All requested packages already installed.



In [5]:
from geopy.geocoders import Nominatim

In [8]:
address = "India, Del"

geolocator = Nominatim(user_agent="Delhi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of delhi city are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of delhi city are 28.55489735, 77.08467458266915.


In [9]:
#web scraping from housefinder website to find average house prices for each neighborhood in hyderabad
url2 = "https://www.makaan.com/price-trends/property-rates-for-buy-in-delhi" 
source = requests.get(url2).text
hyderabad_pricedata = BeautifulSoup(source,'html.parser')
hyderabad_pricedata 

<!DOCTYPE html>
 <html lang="en"><head><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><title>Property Rates in Delhi - 2020 - Real Estate Property Price &amp; Trends in Delhi</title><meta content="2020 Property Rates in Delhi - Search for residential property price &amp; real estate trends &amp; Compare area wise property rates across Delhi. Check Property Index Delhi on Makaan.com." name="description"/><meta content="Property rates in Delhi, Property trends in Delhi, Delhi property rates, Delhi property rates area wise, Delhi property price, real estate trends in Delhi, Delhi property rates sector wise" name="keywords"/><meta content="#fff" id="themeColor" name="theme-color"/><meta content="origin" name="referrer"/><meta content="55ce01b3ca93c05fd5a41439a23dd0d9" name="p:domain_verify"/><meta content="155462194517712" name="fb:pages"/><meta content="India" name="country"/><meta content="website" name="og:type"/><meta content="Makaan.com" name="og:site_name"/><meta

In [10]:
soup = hyderabad_pricedata.find('table')
table_soup = str(soup)

In [11]:
from IPython.core.display import HTML
HTML(table_soup)

Locality Name,Buy Rates,Buy Rates,Buy Rates,Buy Rates,Buy Rates
"{""city"":""Delhi"",""typeAheadType"":""locality"",""templatePath"":""doT!modules/typeAhead/views/locality"",""rows"":5,""includeGoogplePlace"":true,""includeSuggestion"":true,""placeholder"":""search locality in delhi""}",Price range per sqft,Avg price per sqft,Price rise,Trend,View properties
Uttam Nagar,"250 - 38,000 / sqft","22,576.18 / sqft",5.4%,See trend,View 3213 properties
Greater kailash 1,"2,621 - 70,547 / sqft","41,200.19 / sqft",-15%,See trend,View 76 properties
Malviya Nagar,"4,286 - 26,087 / sqft","15,221.7 / sqft",41.1%,See trend,View 32 properties
Dwarka Mor,"563 - 29,091 / sqft","12,81.17 / sqft",-,See trend,View 1004 properties
Saket,"3,435 - 1,60,256 / sqft","73,265.13 / sqft",-7%,See trend,View 103 properties
Uttam Nagar west,"3,548 - 5,926 / sqft","5,340.79 / sqft",-25.7%,See trend,View 18 properties
Patel Nagar,"8,665 - 40,000 / sqft","23,39.52 / sqft",160.8%,See trend,View 22 properties
Safdarjung Enclave,"2,333 - 73,601 / sqft","37,11.29 / sqft",-20.9%,See trend,View 54 properties
Burari,"2,615 - 46,667 / sqft","6,502.2 / sqft",59.9%,See trend,View 616 properties
Vasant Kunj,"1,000 - 1,11,111 / sqft","33,599.83 / sqft",7%,See trend,View 602 properties


In [12]:
b = hyderabad_pricedata.find('table',attrs = {'class':'tbl'}).find_all('td')
l =[]

for i in range(len(b)):
    l.append(b[i])

    
content = []
for i in l:
    content.append(i.get_text().replace('\n',''))

content

['Uttam Nagar',
 ' 250 - 38,000 / sqft',
 ' 22,576.18 / sqft',
 '5.4%',
 'See trend',
 'View 3213 properties',
 'Greater kailash 1',
 ' 2,621 - 70,547 / sqft',
 ' 41,200.19 / sqft',
 '-15%',
 'See trend',
 'View 76 properties',
 'Malviya Nagar',
 ' 4,286 - 26,087 / sqft',
 ' 15,221.7 / sqft',
 '41.1%',
 'See trend',
 'View 32 properties',
 'Dwarka Mor',
 ' 563 - 29,091 / sqft',
 ' 12,81.17 / sqft',
 '-',
 'See trend',
 'View 1004 properties',
 'Saket',
 ' 3,435 - 1,60,256 / sqft',
 ' 73,265.13 / sqft',
 '-7%',
 'See trend',
 'View 103 properties',
 'Uttam Nagar west',
 ' 3,548 - 5,926 / sqft',
 ' 5,340.79 / sqft',
 '-25.7%',
 'See trend',
 'View 18 properties',
 'Patel Nagar',
 ' 8,665 - 40,000 / sqft',
 ' 23,39.52 / sqft',
 '160.8%',
 'See trend',
 'View 22 properties',
 'Safdarjung Enclave',
 ' 2,333 - 73,601 / sqft',
 ' 37,11.29 / sqft',
 '-20.9%',
 'See trend',
 'View 54 properties',
 'Burari',
 ' 2,615 - 46,667 / sqft',
 ' 6,502.2 / sqft',
 '59.9%',
 'See trend',
 'View 616 proper

In [13]:
columns = ['Locality','Price range per sqft','Average_price','price rise','Trend','View properties']

In [14]:
stacked_values = list(zip(*[content[i::6] for i in range(6)]))
content_list = []
for i in stacked_values:
    c = list(i)
    content_list.append(c)
print(content_list)

[['Uttam Nagar', ' 250 - 38,000 / sqft', ' 22,576.18 / sqft', '5.4%', 'See trend', 'View 3213 properties'], ['Greater kailash 1', ' 2,621 - 70,547 / sqft', ' 41,200.19 / sqft', '-15%', 'See trend', 'View 76 properties'], ['Malviya Nagar', ' 4,286 - 26,087 / sqft', ' 15,221.7 / sqft', '41.1%', 'See trend', 'View 32 properties'], ['Dwarka Mor', ' 563 - 29,091 / sqft', ' 12,81.17 / sqft', '-', 'See trend', 'View 1004 properties'], ['Saket', ' 3,435 - 1,60,256 / sqft', ' 73,265.13 / sqft', '-7%', 'See trend', 'View 103 properties'], ['Uttam Nagar west', ' 3,548 - 5,926 / sqft', ' 5,340.79 / sqft', '-25.7%', 'See trend', 'View 18 properties'], ['Patel Nagar', ' 8,665 - 40,000 / sqft', ' 23,39.52 / sqft', '160.8%', 'See trend', 'View 22 properties'], ['Safdarjung Enclave', ' 2,333 - 73,601 / sqft', ' 37,11.29 / sqft', '-20.9%', 'See trend', 'View 54 properties'], ['Burari', ' 2,615 - 46,667 / sqft', ' 6,502.2 / sqft', '59.9%', 'See trend', 'View 616 properties'], ['Vasant Kunj', ' 1,000 - 1,

In [15]:
df = pd.DataFrame(content_list)
df.columns = columns
df

Unnamed: 0,Locality,Price range per sqft,Average_price,price rise,Trend,View properties
0,Uttam Nagar,"250 - 38,000 / sqft","22,576.18 / sqft",5.4%,See trend,View 3213 properties
1,Greater kailash 1,"2,621 - 70,547 / sqft","41,200.19 / sqft",-15%,See trend,View 76 properties
2,Malviya Nagar,"4,286 - 26,087 / sqft","15,221.7 / sqft",41.1%,See trend,View 32 properties
3,Dwarka Mor,"563 - 29,091 / sqft","12,81.17 / sqft",-,See trend,View 1004 properties
4,Saket,"3,435 - 1,60,256 / sqft","73,265.13 / sqft",-7%,See trend,View 103 properties
5,Uttam Nagar west,"3,548 - 5,926 / sqft","5,340.79 / sqft",-25.7%,See trend,View 18 properties
6,Patel Nagar,"8,665 - 40,000 / sqft","23,39.52 / sqft",160.8%,See trend,View 22 properties
7,Safdarjung Enclave,"2,333 - 73,601 / sqft","37,11.29 / sqft",-20.9%,See trend,View 54 properties
8,Burari,"2,615 - 46,667 / sqft","6,502.2 / sqft",59.9%,See trend,View 616 properties
9,Vasant Kunj,"1,000 - 1,11,111 / sqft","33,599.83 / sqft",7%,See trend,View 602 properties


In [16]:
df.shape

(60, 6)

In [17]:
realestate_price = df[['Locality','Average_price']]
realestate_price 

Unnamed: 0,Locality,Average_price
0,Uttam Nagar,"22,576.18 / sqft"
1,Greater kailash 1,"41,200.19 / sqft"
2,Malviya Nagar,"15,221.7 / sqft"
3,Dwarka Mor,"12,81.17 / sqft"
4,Saket,"73,265.13 / sqft"
5,Uttam Nagar west,"5,340.79 / sqft"
6,Patel Nagar,"23,39.52 / sqft"
7,Safdarjung Enclave,"37,11.29 / sqft"
8,Burari,"6,502.2 / sqft"
9,Vasant Kunj,"33,599.83 / sqft"


In [18]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::openssl-1.1.1d-he774522_4, defaults/win-64::ca-certificates-2020.1.1-0
  - anaconda/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4
  - defaults/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4done

# All requested packages already installed.



In [19]:
import folium

In [20]:
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=10)
map_delhi

In [21]:
hyderabad_data.shape

(11042, 6)

In [22]:
hyderabad_data.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)

In [23]:
for lat, lng, place_name in zip(
        hyderabad_data['latitude'], 
        hyderabad_data['longitude'], 
        hyderabad_data['place_name']):
    label = '{}'.format(place_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  

map_delhi

In [34]:
CLIENT_ID = 'ZVLHJZ33WTETAWCHLCZGBLOEHC5W00ZFSIKKNOD4UPWW5BTJ'
CLIENT_SECRET = 'SIZ04Q1SZXAJHBJX0GSFAGR2NVIJ5X33VIUI140MT3JHJQS5'
VERSION = '20180605'

In [35]:
neighborhood_name = hyderabad_data.loc[0, 'place_name']
print(f"The first neighborhood's name is '{place_name}'.")

The first neighborhood's name is 'Nawabganj'.


In [36]:
neighborhood_latitude = hyderabad_data.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = hyderabad_data.loc[0, 'longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(place_name, 
                                                               latitude, 
                                                               longitude))

Latitude and longitude values of Nawabganj are 28.55489735, 77.08467458266915.


In [37]:
from pandas.io.json import json_normalize 

In [38]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

# get the result to a json file
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb99b3d69babe001bc263de'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'N.D. Charge 1',
  'headerFullLocation': 'N.D. Charge 1, Delhi',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 62,
  'suggestedBounds': {'ne': {'lat': 28.637800004500004,
    'lng': 77.2218174420243},
   'sw': {'lat': 28.628799995499993, 'lng': 77.2115825579757}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '519ba450498eb0c559152d94',
       'name': 'HOTEL SARAVANA BHAVAN',
       'location': {'address': 'P-13/90, Connaught Circus',
        'lat': 28.632319466435003,
        'lng': 77.21644531748599,
        'labeledLatLngs': [{'label': 'display',
  

In [39]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [40]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

  


Unnamed: 0,name,categories,lat,lng
0,HOTEL SARAVANA BHAVAN,South Indian Restaurant,28.632319,77.216445
1,Wenger's,Bakery,28.633412,77.218292
2,Fabindia,Clothing Store,28.632012,77.217729
3,Starbucks,Coffee Shop,28.632011,77.217731
4,Wenger's Deli,Deli / Bodega,28.633658,77.218139
5,Connaught Place | कनॉट प्लेस (Connaught Place),Plaza,28.632731,77.220018
6,Immigrants Project - A Cafe in History,Café,28.634055,77.218867
7,Berco's,Chinese Restaurant,28.632407,77.217323
8,Nizam's Kathi Kabab | निजा़म काठी कबाब,Indian Restaurant,28.634858,77.219462
9,Jain Chawal Wale,Food Truck,28.630052,77.217649


In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [43]:
hyderabad_venues = getNearbyVenues(names=hyderabad_data['place_name'],
                                   latitudes=hyderabad_data['latitude'],
                                   longitudes=hyderabad_data['longitude']
                                  )

ConnectionError: HTTPSConnectionPool(host='api.foursquare.com', port=443): Max retries exceeded with url: /v2/venues/explore?&client_id=ZVLHJZ33WTETAWCHLCZGBLOEHC5W00ZFSIKKNOD4UPWW5BTJ&client_secret=SIZ04Q1SZXAJHBJX0GSFAGR2NVIJ5X33VIUI140MT3JHJQS5&v=20180605&ll=28.6333,77.2167&radius=500&limit=100 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001B249D24DC8>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

In [None]:
hyderabad_venues.head()