## Data Description

To address the business problem, I use the data sources listed below, including the Foursquare location data API.

• List of neighbourhoods in Hyderabad, which I scraped from `Wikipedia` using `BeautifulSoup`, a Python package for parsing HTML. Reference link:
  **https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India**

• Geographical coordinates (latitude and longitude) of each neighbourhood in Hyderabad, obtained with the `geocoder` package. These values are used to map the neighbourhoods and to display points on Folium-based maps.

• Venue data for each neighbourhood in the city, retrieved using the `Foursquare API`. I included venues within a `2000` metre radius of each neighbourhood centre. This data is used to identify similar neighbourhoods by their venues and serves as the input to the clustering algorithm.
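
The venue lookup described above can be sketched as a small request-URL builder. This is a minimal sketch, assuming the legacy Foursquare v2 `venues/explore` endpoint. The credentials, the version date, and the `limit` value are placeholders rather than values taken from this notebook, and `build_explore_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint; newer API versions differ.
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=2000, limit=100):
    """Build the request URL for venues around one neighbourhood centre."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (assumed value)
        "ll": "{},{}".format(lat, lng),  # centre point as "lat,lng"
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues (assumed value)
    }
    return FOURSQUARE_EXPLORE_URL + "?" + urlencode(params)


# Example coordinates for one neighbourhood centre
url = build_explore_url(17.3898, 78.47658, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```

Building the URL separately from sending the request makes it easy to inspect the query parameters before any API quota is spent.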

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON data into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### Scraping data from Wikipedia using BeautifulSoup

In [2]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India").text

soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Category:Neighbourhoods in Hyderabad, India - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"6bf0d119-58bc-4b67-9817-c71552aab245","wgCSPNonce":!1,"wgCanonicalNamespace":"Category","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":14,"wgPageName":"Category:Neighbourhoods_in_Hyderabad,_India","wgTitle":"Neighbourhoods in Hyderabad, India","wgCurRevisionId":955880704,"wgRevisionId":955880704,"wgArticleId":3839100,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Neighbourhoods in Telangana","Geography of Hyderabad, India"]

In [3]:
neighborhoodList = []

for row in soup.find_all("div", class_="mw-category")[0].find_all('li'):
    neighborhoodList.append(row.text)

In [13]:
dft = pd.DataFrame({'Neighborhood' : neighborhoodList})
dft.head()

|   | Neighborhood |
|---|---|
| 0 | A. S. Rao Nagar |
| 1 | A.C. Guards |
| 2 | Abhyudaya Nagar |
| 3 | Abids |
| 4 | Adibatla |


In [5]:
dft.shape

(200, 1)
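
Some category entries carry a disambiguation suffix, for example `Aliabad, Hyderabad`. Formatting such a name into a query like `'{}, Hyderabad, India'` repeats the city name. The sketch below shows one way to normalise names before geocoding. `clean_name` is a hypothetical helper that is not part of the original notebook, and the suffix pattern is an assumption based on that single example.

```python
import re

# Assumption: the disambiguation suffix has the form ", Hyderabad" at the end of the name.
_SUFFIX = re.compile(r"\s*,\s*Hyderabad\s*$", re.IGNORECASE)


def clean_name(name):
    """Strip a trailing ", Hyderabad" suffix and surrounding whitespace."""
    return _SUFFIX.sub("", name).strip()


sample = ["Abids", "Aliabad, Hyderabad", "Adikmet "]
cleaned = [clean_name(n) for n in sample]
```

The cleaned names could then be stored in a separate column so that the original Wikipedia titles remain available.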

### Getting latitude & longitude using Geocoder

In [7]:
def get_latlng(neighborhood, max_attempts=5):
    # query the ArcGIS geocoder until it returns coordinates, at most max_attempts times
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up instead of looping forever; [None, None] keeps the two-column row shape
    return [None, None]

coords = [ get_latlng(neighborhood) for neighborhood in dft["Neighborhood"].tolist() ]
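
Geocoding services can return no result for a query, either transiently or because the place name is unknown. The sketch below shows one way to bound the number of attempts and pause between them. `geocode_with_retry`, its parameters, and the stand-in `fake_lookup` function are hypothetical and only illustrate the pattern. In this notebook the `lookup` argument would be a function wrapping the `geocoder.arcgis` call.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=3, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is any callable returning [lat, lng] or None (hypothetical interface).
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            # wait longer after each failed attempt (linear backoff)
            time.sleep(delay_seconds * attempt)
    return None


# Stand-in for a real geocoder: fails on the first call, then succeeds.
calls = {"count": 0}


def fake_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 2 else [17.3898, 78.47658]


result = geocode_with_retry(fake_lookup, "Abids, Hyderabad, India", delay_seconds=0.0)
```

Returning `None` after the final attempt lets the caller decide whether to drop the neighbourhood or fill in its coordinates by hand.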


In [14]:
df = pd.DataFrame(coords, columns = ['Latitude', 'Longitude'])

In [15]:
df.head()

|   | Latitude | Longitude |
|---|---|---|
| 0 | 17.4112 | 78.50824 |
| 1 | 17.393001 | 78.4569 |
| 2 | 17.33765 | 78.56414 |
| 3 | 17.3898 | 78.47658 |
| 4 | 17.23579 | 78.54132 |


### Combining the two dataframes

In [16]:
dfh = pd.concat([dft,df], axis = 1)

dfh

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | A. S. Rao Nagar | 17.4112 | 78.50824 |
| 1 | A.C. Guards | 17.393001 | 78.4569 |
| 2 | Abhyudaya Nagar | 17.33765 | 78.56414 |
| 3 | Abids | 17.3898 | 78.47658 |
| 4 | Adibatla | 17.23579 | 78.54132 |
| 5 | Adikmet | 17.41061 | 78.51513 |
| 6 | Afzal Gunj | 17.37751 | 78.48005 |
| 7 | Aghapura | 17.387385 | 78.466995 |
| 8 | Aliabad, Hyderabad | 17.34259 | 78.47626 |
| 9 | Alijah Kotla | 17.36068 | 78.47998 |


In [17]:
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad, India are 17.3616079, 78.4746286.
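
A quick sanity check on geocoded coordinates is to measure how far each point lies from the city centre. A point hundreds of kilometres away suggests the geocoder matched a different place. This is a minimal sketch using the haversine formula with a mean Earth radius of 6371 km. `haversine_km` and the 50 km threshold are assumptions for illustration, not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


centre = (17.3616079, 78.4746286)  # Hyderabad coordinates returned by Nominatim
abids = (17.3898, 78.47658)        # one geocoded neighbourhood from the table

distance = haversine_km(*centre, *abids)
is_plausible = distance < 50       # assumed threshold in kilometres
```

The same function can be applied to every row of the combined dataframe to flag rows that need manual review.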


In [18]:
map_h = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(dfh['Latitude'], dfh['Longitude'], dfh['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_h)  
    
map_h