## The Battle of Neighborhoods - Hyderabad - Final Report

## Introduction & Business Problem:

A retail company wants to open supermarket stores in Hyderabad but is unsure which Neighborhood(s) to open them in. The chosen locations should have a sizeable population, so that store footfall is high, and should be close to work centers or residential districts, so that a large number of residents can reach them easily.
Two business questions need to be answered:
1. In which part (area) of the city should the company open its first supermarket?
2. Which Neighborhood(s) in that part of the city would be ideal for setting up the supermarket?
The company would prefer to open the store(s) in Neighborhoods where real estate prices are comparatively low, though not the very lowest. At the same time, it wants Neighborhoods with a high population and a large number of venues, since both should bring more footfall to the store. To address the business problem, we build a map and an information chart that overlay real estate prices on Hyderabad and cluster each area according to its venue density.
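The selection criteria above (moderate price, many venues) can be sketched as a simple filter. This is a minimal illustration, not the report's method: the neighborhood labels, prices, venue counts, quantile band, and the `shortlist` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from the report's data.
neighborhoods = pd.DataFrame({
    "neighborhood": ["N1", "N2", "N3", "N4", "N5"],
    "price_per_sqft": [3000, 4500, 5200, 7800, 9500],
    "venue_count": [4, 22, 35, 40, 12],
})

def shortlist(df, low_q=0.2, high_q=0.6, min_venues=20):
    """Keep rows with a moderate price (between two quantiles) and enough venues."""
    low = df["price_per_sqft"].quantile(low_q)
    high = df["price_per_sqft"].quantile(high_q)
    moderate = df["price_per_sqft"].between(low, high)  # inclusive on both ends
    busy = df["venue_count"] >= min_venues
    # Rank the survivors by venue count, busiest first.
    return df[moderate & busy].sort_values("venue_count", ascending=False)

candidates = shortlist(neighborhoods)
print(candidates)
```

The quantile band excludes the cheapest areas as well as the expensive ones, which matches the "comparatively lower, not absolutely low" preference; the exact cut-offs would need tuning against real prices.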

## Background:

I selected Hyderabad for this project because I am familiar with it as a resident of the city. Hyderabad district is a metropolitan area with a population of roughly 5 million and 150 Neighborhoods (GHMC). The city has both a high population and a high population density, and shops and social venues tend to concentrate where the population is densest. Clustering the Neighborhoods is therefore intended to group those with moderate real estate prices and a large number of venues into the same cluster, which can then be used to answer the business problem.

## Data Description

To solve the business problem, I use the data listed below, which includes data from the Foursquare location API.

1. Geographical coordinates of Neighborhoods in Hyderabad city, by zip code, from a GitHub repository.
   Source: https://github.com/sanand0/pincode/blob/master/data/IN.csv
2. Venue data for each Neighborhood in the city, from the Foursquare API. I included venues within a 1000 meter radius of each Neighborhood.
   Use: The data helps us identify similar Neighborhoods by their venues and serves as input to the clustering algorithm.
3. GeoJSON data for GHMC (Hyderabad Municipality) for choropleth maps (to show real estate prices).
   Use: Mapping Neighborhoods on a Folium map and generating a center for each Neighborhood from its geo-coordinates. The data also helps us show real estate prices on choropleth/Folium maps.
4. Average house prices (per square foot) for each Neighborhood in Hyderabad city.
   Source: https://www.makaan.com/price-trends/property-rates-for-buy-in-hyderabad
   Use: The data helps us show real estate prices on choropleth maps and identify potential Neighborhoods where stores can be opened.
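One common way to turn venue data into clustering input is to one-hot encode venue categories and average them per Neighborhood. The sketch below assumes a table with one row per venue; the column names and the sample rows are hypothetical stand-ins for what the Foursquare API would return.

```python
import pandas as pd

# Hypothetical venue rows; in the report these would come from the Foursquare API.
venues = pd.DataFrame({
    "neighborhood": ["N1", "N1", "N1", "N2", "N2"],
    "venue_category": ["Cafe", "Cafe", "Bakery", "Gym", "Cafe"],
})

# One-hot encode the categories, then average per neighborhood so that each
# row holds the share of that neighborhood's venues in each category.
onehot = pd.get_dummies(venues["venue_category"], dtype=float)
onehot.insert(0, "neighborhood", venues["neighborhood"])
profile = onehot.groupby("neighborhood").mean()

# Venue count per neighborhood, a simple proxy for venue density.
venue_count = venues.groupby("neighborhood").size().rename("venue_count")

print(profile.join(venue_count))
```

Rows of `profile` with similar category shares describe similar Neighborhoods, which is the property a clustering algorithm such as k-means would rely on.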

## Problem Statement

1. In which part (area) of the city should the company open its first supermarket?
2. Which Neighborhood(s) in that part of the city would be ideal for setting up the supermarket?

### Load all necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a flat pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


### Downloading and exploring dataset

In [33]:
url = "https://raw.githubusercontent.com/sanand0/pincode/master/data/IN.csv"
hyderabad_data = pd.read_csv(url,delimiter = ',')
hyderabad_data.head()

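|   | key | place_name | admin_name1 | latitude | longitude | accuracy |
|---|-----|------------|-------------|----------|-----------|----------|
| 0 | IN/110001 | Connaught Place | New Delhi | 28.6333 | 77.2167 | 4.0 |
| 1 | IN/110002 | Darya Ganj | New Delhi | 28.6333 | 77.25 | 4.0 |
| 2 | IN/110003 | Aliganj | New Delhi | 28.65 | 77.2167 | NaN |
| 3 | IN/110004 | Rashtrapati Bhawan | New Delhi | 28.65 | 77.2167 | NaN |
| 4 | IN/110005 | Lower Camp Anand Parbat | New Delhi | 28.65 | 77.2 | NaN |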


In [34]:
hyderabad_data.shape

(11042, 6)
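The table loaded above covers all of India, so it has to be narrowed down to Hyderabad before use. A minimal sketch follows, assuming that Hyderabad PIN codes begin with `500` and that the `key` column always has the form `IN/<pincode>`; the helper name and the two Hyderabad place names are hypothetical.

```python
import pandas as pd

# Small stand-in for the IN.csv table; the "Example Area" rows are placeholders.
sample = pd.DataFrame({
    "key": ["IN/110001", "IN/500001", "IN/500034"],
    "place_name": ["Connaught Place", "Example Area 1", "Example Area 2"],
})

def filter_by_pincode_prefix(df, prefix="500"):
    """Return rows whose PIN code (the part of `key` after 'IN/') starts with `prefix`."""
    pincode = df["key"].str.split("/").str[1]
    mask = pincode.str.startswith(prefix)
    return df[mask].reset_index(drop=True)

hyd = filter_by_pincode_prefix(sample)
print(hyd.shape)
```

On the real data this would be called as `filter_by_pincode_prefix(hyderabad_data)`. The prefix may also match localities just outside the city limits, so the result is worth checking against the GHMC boundary.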

In [35]:
!conda install -c conda-forge geopy --yes


# All requested packages already installed.



In [36]:
from geopy.geocoders import Nominatim

In [37]:
address = "India, HYD"

geolocator = Nominatim(user_agent="Hyderabad_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad city are 17.23092405, 78.431848261532.
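Two things are worth guarding against in the geocoding step. First, `geocode` returns `None` when nothing matches, so reading `.latitude` directly can fail. Second, `HYD` is also the IATA code of the city's airport, and the latitude returned above appears closer to the airport than to the city center, so a fuller query such as `"Hyderabad, Telangana, India"` may be a safer choice (an assumption, not verified here). The sketch below wraps the lookup in a hypothetical helper, `city_coordinates`, and uses an offline stub in place of Nominatim so that it runs without network access.

```python
def city_coordinates(geocode, query):
    """Return (latitude, longitude) for `query`, or raise if nothing is found.

    `geocode` is any callable shaped like geopy's Nominatim.geocode: it takes
    a string and returns an object with .latitude/.longitude, or None.
    """
    location = geocode(query)
    if location is None:
        raise ValueError(f"No geocoding result for {query!r}")
    return location.latitude, location.longitude

class StubLocation:
    """Offline stand-in for a geopy Location; the coordinates are placeholders."""
    latitude = 1.0
    longitude = 2.0

def stub_geocode(query):
    # Pretend only one query is known, so the failure path can be exercised too.
    return StubLocation() if query == "known place" else None

print(city_coordinates(stub_geocode, "known place"))

# With geopy (requires network access) the call would look like:
#   geolocator = Nominatim(user_agent="Hyderabad_explorer")
#   latitude, longitude = city_coordinates(geolocator.geocode, "Hyderabad, Telangana, India")
```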
