# Capstone Project - The Battle of the Neighborhoods (Week 2)

##### Applied Data Science Capstone by IBM/Coursera

## Introduction: Business Problem

In this project we will try to find an optimal location for a cafe. Specifically, this report is targeted at stakeholders interested in opening a cafe in Pune, India.

Since Pune already has many cafes, we will try to detect locations that are not already crowded with restaurants. We are also particularly interested in areas with no cafes in the vicinity. We would also prefer locations as close to the city center as possible, assuming that the first two conditions are met.

We will use data science techniques to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

## Data 

Based on the definition of our problem, the factors that will influence our decision are:

- number of existing restaurants in the neighborhood (any type of restaurant)
- number of and distance to cafes in the neighborhood, if any
- distance of neighborhood from city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract or generate the required information:

* The centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using Google Maps API reverse geocoding.
* The number of cafes in every neighborhood, along with their type and location, will be obtained using the Foursquare API and the Zomato API.
* The coordinates of the Pune city center will be obtained using Google Maps API geocoding of a well-known Pune location.

## Fetching Data

In [1]:
# The code was removed by Watson Studio for sharing.

#### Defining a function to get coordinates

In [2]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # request failure, unexpected JSON shape, or empty result list
        return [None, None]
    
address = 'Pune, Maharashtra'
pune_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, pune_center))

Coordinate of Pune, Maharashtra: [18.5204303, 73.8567437]


In [3]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

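# NOTE: UTM zone 33 is hard-coded below, but Pune (longitude ~73.86 E) lies in
# UTM zone 43. The outputs shown in this notebook were produced with zone 33;
# projecting this far outside the zone distorts distances, so zone=43 would be
# the appropriate value when re-running the analysis.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]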

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Pune center longitude={}, latitude={}'.format(pune_center[1], pune_center[0]))
x, y = lonlat_to_xy(pune_center[1], pune_center[0])
print('Pune center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
PUNE_LATITUDE=la
PUNE_LONGITUDE=lo
print('Pune center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Pune center longitude=73.8567437, latitude=18.5204303
Pune center UTM X=7722847.371412157, Y=3662737.095041096
Pune center longitude=73.8567436999998, latitude=18.520430299995276
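
The UTM zone passed to `pyproj` should match the longitude of the area being projected. As a minimal sketch, the standard 6-degree rule can be written in a few lines; `utm_zone_for_longitude` is a hypothetical helper that is not part of the notebook, and it ignores the special-case zones around Norway and Svalbard.

```python
import math

def utm_zone_for_longitude(lon_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide and numbering starts at 180 degrees west.
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1

# Pune longitude taken from the geocoding output above
print(utm_zone_for_longitude(73.8567437))  # prints 43
```

Under this rule the Pune longitude falls in zone 43, whereas zone 33 covers longitudes from 12 to 18 degrees east.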


#### Get the nearby place coordinates

In [47]:
pune_center_x, pune_center_y = lonlat_to_xy(pune_center[1], pune_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = pune_center_x - 6000
x_step = 800
y_min = pune_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 800 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(pune_center_x, pune_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

206 candidate neighborhood centers generated.
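
For comparison, a regular hexagonal lattice shifts every other row by half the horizontal step and spaces rows by `step * sqrt(3) / 2`; the cell above uses an offset of 300 with a step of 800, so its rows are not exactly hexagonal. The following is a sketch only, with `hex_grid` as a hypothetical helper working in plain Cartesian meters around an origin rather than the projected Pune center.

```python
import math

def hex_grid(center_x, center_y, radius, spacing):
    """Return (x, y) points of a hexagonal lattice within `radius` of the center."""
    row_height = spacing * math.sqrt(3) / 2   # vertical distance between rows
    n_rows = int(radius / row_height)
    n_cols = int(radius / spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        # Shift every other row by half the spacing
        x_offset = spacing / 2 if i % 2 else 0.0
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            # Keep only the points inside the circular area of interest
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0, radius=6000, spacing=800)
print(len(points), 'candidate centers')
```

With this layout every point has the same distance (the spacing) to each of its nearest neighbors, which is the property that makes a hexagonal grid convenient for defining equally sized neighborhoods.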


Import the folium library for map rendering.

In [5]:
#!pip install folium

import folium

Mark each candidate area we are interested in with a circle.

In [51]:
map_pune = folium.Map(location=pune_center, zoom_start=13)
folium.Marker(pune_center, popup='Pune').add_to(map_pune)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=250, color='blue', fill=False).add_to(map_pune)
map_pune

The API returns the category list under **categories** in some cases and under **venue.categories** in others, so the helper below handles both and returns the name of the first category for further use.

In [7]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the nested key used by the explore endpoint
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
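
As an illustration of the two response shapes, here is a minimal sketch using plain dictionaries in place of DataFrame rows. `first_category_name` and the sample rows are hypothetical and only mirror the logic of `get_category_type`.

```python
def first_category_name(row):
    """Return the name of the first category, or None if there is none."""
    # Look under either key; fall back to an empty list when both are missing.
    categories = row.get('categories') or row.get('venue.categories') or []
    return categories[0]['name'] if categories else None

# Hypothetical rows imitating the two response shapes
search_row = {'categories': [{'name': 'Café'}]}
explore_row = {'venue.categories': [{'name': 'Juice Bar'}]}
empty_row = {'venue.categories': []}

for row in (search_row, explore_row, empty_row):
    print(first_category_name(row))
```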

Fetch all venues within a radius of 4 km of the city center using the Foursquare API.

In [19]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas import json_normalize  # top-level since pandas 1.0; the pandas.io.json path is deprecated
import requests

pd.set_option('display.max_rows', None)

offset = 0
total_venues = 0
foursquare_venues = pd.DataFrame(columns = ['name', 'categories', 'lat', 'lng'])

while (True):
    url = ('https://api.foursquare.com/v2/venues/explore?client_id={}'
           '&client_secret={}&v={}&ll={},{}&radius={}&limit={}&offset={}').format(FOURSQUARE_CLIENT_ID, 
                                                                        FOURSQUARE_CLIENT_SECRET, 
                                                                        VERSION, 
                                                                        PUNE_LATITUDE, 
                                                                        PUNE_LONGITUDE, 
                                                                        RADIUS,
                                                                        NO_OF_VENUES,
                                                                        offset)
    result = requests.get(url).json()
    venues_fetched = len(result['response']['groups'][0]['items'])
    total_venues = total_venues + venues_fetched
    # RADIUS is in meters, so divide by 1000 to report kilometers
    print("Total {} venues fetched within a total radius of {} Km".format(venues_fetched, RADIUS/1000))

    venues = result['response']['groups'][0]['items']
    venues = json_normalize(venues)
    # Filter the columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, filtered_columns]

    # Filter the category for each row
    venues['venue.categories'] = venues.apply(get_category_type, axis = 1)

    # Clean all column names
    venues.columns = [col.split(".")[-1] for col in venues.columns]
    foursquare_venues = pd.concat([foursquare_venues, venues], axis = 0, sort = False)

    if (venues_fetched < 100):
        break
    else:
        offset = offset + 100

foursquare_venues = foursquare_venues.reset_index(drop = True)
print("\nTotal {} venues fetched".format(total_venues))

Total 100 venues fetched within a total radius of 4.0 Km
Total 100 venues fetched within a total radius of 4.0 Km
Total 8 venues fetched within a total radius of 4.0 Km

Total 208 venues fetched
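
The loop above is an instance of offset-based pagination: request a page, advance the offset by the page size, and stop when a page comes back short. A minimal sketch of that pattern, with a stand-in for the API call, is shown here. `fetch_all` and `fake_fetch` are hypothetical names, and no network request is made.

```python
def fetch_all(fetch_page, page_size):
    """Collect items from an offset-paginated source until a short page arrives."""
    items, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:   # a short page means there is nothing left
            break
        offset += page_size
    return items

# Stand-in for the API: 208 fake venues served in slices
fake_venues = ['venue_{}'.format(i) for i in range(208)]

def fake_fetch(offset, limit):
    return fake_venues[offset:offset + limit]

all_venues = fetch_all(fake_fetch, page_size=100)
print(len(all_venues))  # prints 208
```

Tying the stop condition and the offset increment to the same `page_size` avoids a mismatch if the request limit is later changed.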


Now we will plot the data on the map

In [9]:
pune_map = folium.Map(location = [PUNE_LATITUDE, PUNE_LONGITUDE], zoom_start = 13)
for name, latitude, longitude in zip(foursquare_venues['name'], foursquare_venues['lat'], foursquare_venues['lng']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(pune_map)  
pune_map

Using the **Zomato API**, fetch the nearest **cafe** to each venue previously fetched with the **Foursquare API**.

In [10]:
venues_information = []
print("Fetching data..",end='')
for index, row in foursquare_venues.iterrows():
    print(".", end='')
    venue = []
    url = ('https://developers.zomato.com/api/v2.1/search?q={}' + 
          '&start=0&count=1&lat={}&lon={}&sort=real_distance').format('cafe', row['lat'], row['lng'])
    result = requests.get(url, headers = headers).json()
    try:
        #print(result)
        if (len(result['restaurants']) > 0):
            venue.append(result['restaurants'][0]['restaurant']['name'])
            venue.append(result['restaurants'][0]['restaurant']['location']['latitude'])
            venue.append(result['restaurants'][0]['restaurant']['location']['longitude'])
            venue.append(result['restaurants'][0]['restaurant']['average_cost_for_two'])
            venue.append(result['restaurants'][0]['restaurant']['price_range'])
            venue.append(result['restaurants'][0]['restaurant']['user_rating']['aggregate_rating'])
            if('cafe' in str(result['restaurants'][0]['restaurant']['cuisines']).lower()):
                venues_information.append(venue)
        else:
            venues_information.append(np.zeros(6))
    except Exception:  # keep the loop (and the kernel) alive on malformed responses
        #print(result)
        if isinstance(result, dict) and result.get('code') == 440:
            break
print("Fetching Done.")
try:
    zomato_venues = pd.DataFrame(venues_information,columns = ['venue', 'latitude','longitude', 'price_for_two','price_range', 'rating'])
except:
    pass

Fetching data.....................................................................................Fetching Done.


In [11]:
zomato_venues=zomato_venues.drop(['price_for_two','price_range','rating'],axis='columns')
zomato_venues['isCafe']=True

In [20]:
foursquare_venues = foursquare_venues[['name','lat','lng','categories']]
foursquare_venues.columns = ['venue','latitude','longitude','isCafe']
forCafe=['Snack Place','Fast Food Restaurant','Breakfast Spot','Café','Burger Joint','Dessert Shop','Bakery','Sandwich Place','Donut Shop','Tea Room','Juice Bar','Coffee Shop','Ice Cream Shop']

In [21]:
lst=[]
for i in list(foursquare_venues['isCafe']):
    if i not in list(forCafe):
        lst.append("False")
    else:
        lst.append("True")
foursquare_venues['isCafe']=lst
foursquare_venues.head()

| | venue | latitude | longitude | isCafe |
|---|---|---|---|---|
| 0 | Lal Mahal | 18.51872 | 73.856556 | False |
| 1 | Sujata Mastani | 18.511793 | 73.852145 | True |
| 2 | Mad Over Donuts | 18.519335 | 73.84532 | True |
| 3 | Raja Dinkar Kelkar museum | 18.510744 | 73.854389 | False |
| 4 | Krishna Juice Bar | 18.523553 | 73.847651 | True |
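
The labeling step above stores the strings "True" and "False". As a sketch of an alternative, a set lookup can produce real booleans directly. `CAFE_CATEGORIES` below is only a subset of the `forCafe` list, and `is_cafe` and the sample categories are hypothetical.

```python
# Subset of the cafe-like categories used in the notebook
CAFE_CATEGORIES = {'Café', 'Coffee Shop', 'Tea Room', 'Juice Bar', 'Donut Shop'}

def is_cafe(category):
    """Return True when the category counts as cafe-like."""
    return category in CAFE_CATEGORIES

# Hypothetical category values, including a venue with no category
sample_categories = ['Juice Bar', 'Historic Site', 'Donut Shop', None]
labels = [is_cafe(c) for c in sample_categories]
print(labels)  # prints [True, False, True, False]
```

Keeping the column boolean means later filters can use the column itself as a mask instead of comparing against the string "True".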


In [23]:
newDf = pd.concat([zomato_venues, foursquare_venues], axis=0, join='outer', ignore_index=True)
newDf.head()

| | venue | latitude | longitude | isCafe |
|---|---|---|---|---|
| 0 | Indo Western Food Station | 18.52043 | 73.85674 | True |
| 1 | McCafe by McDonald's | 18.5183620187 | 73.8446574658 | True |
| 2 | Tea Post | 18.5207772108 | 73.8463311642 | True |
| 3 | Words and Sips Book Cafe | 18.521195 | 73.841722 | True |
| 4 | Boka Book Cafe | 18.5167984999 | 73.8416151702 | True |


Plotting the cafes on the map

In [24]:
def str2bool(v):
    # Accept real booleans (Zomato rows) as well as their string forms (Foursquare rows)
    return str(v).strip().lower() in ("true", "t", "1", "yes")

In [25]:
pune_map = folium.Map(location = [PUNE_LATITUDE, PUNE_LONGITUDE], zoom_start = 13)

for venue, latitude, longitude,isCafe in zip(newDf['venue'], newDf['latitude'], newDf['longitude'],newDf['isCafe']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = 'red' if str2bool(isCafe) else 'blue',
        fill = True,
        fill_color = '#cc3535',
        fill_opacity = 0.7,
        parse_html = False).add_to(pune_map)  

pune_map

In [55]:
pune_latlons = newDf[['latitude','longitude']]

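# isCafe mixes boolean True (Zomato rows) with the strings "True"/"False" (Foursquare rows),
# so compare on the string form to keep the cafes from both sources
cafe_latlons = newDf[newDf['isCafe'].astype(str) == "True"][['latitude','longitude']]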


In [28]:
from folium import plugins
from folium.plugins import HeatMap

pune_map = folium.Map(location=[PUNE_LATITUDE, PUNE_LONGITUDE], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(pune_map) #cartodbpositron cartodbdark_matter
HeatMap(pune_latlons).add_to(pune_map)
folium.Marker([PUNE_LATITUDE, PUNE_LONGITUDE]).add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=1000, fill=False, color='white').add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=2000, fill=False, color='white').add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=3000, fill=False, color='white').add_to(pune_map)
pune_map

In [29]:
from folium import plugins
from folium.plugins import HeatMap

pune_map = folium.Map(location=[PUNE_LATITUDE, PUNE_LONGITUDE], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(pune_map) #cartodbpositron cartodbdark_matter
HeatMap(cafe_latlons).add_to(pune_map)
folium.Marker([PUNE_LATITUDE, PUNE_LONGITUDE]).add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=1000, fill=False, color='white').add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=2000, fill=False, color='white').add_to(pune_map)
folium.Circle([PUNE_LATITUDE, PUNE_LONGITUDE], radius=3000, fill=False, color='white').add_to(pune_map)
pune_map
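
The heat maps show density qualitatively. The factors listed in the Data section (number of venues near a location, and distance to them) can also be computed directly from latitude/longitude pairs. The following is a minimal sketch using the haversine great-circle formula with a mean Earth radius of 6,371 km; `haversine_m` and `count_within` are hypothetical helpers, and the three sample points are the cafe-like venues from the Foursquare table above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0  # mean Earth radius (assumed spherical model)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def count_within(center, points, radius_m):
    """Count the points lying within radius_m meters of center (lat, lon)."""
    return sum(1 for lat, lon in points
               if haversine_m(center[0], center[1], lat, lon) <= radius_m)

pune_center = (18.5204303, 73.8567437)
sample_cafes = [
    (18.511793, 73.852145),   # Sujata Mastani
    (18.519335, 73.84532),    # Mad Over Donuts
    (18.523553, 73.847651),   # Krishna Juice Bar
]
for lat, lon in sample_cafes:
    print(round(haversine_m(pune_center[0], pune_center[1], lat, lon)), 'm from center')
print(count_within(pune_center, sample_cafes, 2000), 'cafes within 2 km')
```

Applying `count_within` to each candidate center, once for all venues and once for cafes only, would give the per-neighborhood counts that the stated criteria call for.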